Author name: Mike M.


NASA officially greenlights $3.35 billion mission to Saturn’s moon Titan

Artist's illustration of Dragonfly soaring over the dunes of Titan.

NASA has formally approved the robotic Dragonfly mission for full development, committing to a revolutionary project to explore Saturn’s largest moon with a quadcopter drone.

Agency officials announced the outcome of Dragonfly’s confirmation review last week. This review is a checkpoint in the lifetime of most NASA projects and marks the moment when the agency formally commits to the final design, construction, and launch of a space mission. The outcome of each mission’s confirmation review typically establishes a budgetary and schedule commitment.

“Dragonfly is a spectacular science mission with broad community interest, and we are excited to take the next steps on this mission,” said Nicky Fox, associate administrator of NASA’s science mission directorate. “Exploring Titan will push the boundaries of what we can do with rotorcraft outside of Earth.”

In the case of Dragonfly, NASA confirmed the mission with a total lifecycle cost of $3.35 billion and a launch date of July 2028. That is roughly twice the mission’s original proposed cost and a delay of more than two years from when the mission was originally selected in 2019, according to NASA.

Busting the cost cap

Rising costs are not necessarily a surprise on a mission as innovative as Dragonfly. After reaching Titan, the eight-bladed rotorcraft lander will soar from place to place on Saturn’s hazy moon, exploring environments rich in organic molecules, the building blocks of life.

Dragonfly will be the first mobile robot explorer to land on any other planetary body besides the Moon and Mars, and only the second flying drone to explore another planet. NASA’s Ingenuity helicopter on Mars was the first. Dragonfly will be more than 200 times as massive as Ingenuity and will operate six times farther from Earth.

Despite its distant position in the cold outer Solar System, Titan appears to be reminiscent of the ancient Earth. A shroud of orange haze envelops Saturn’s largest moon, and Titan’s surface is covered with sand dunes and methane lakes.

Titan’s frigid temperatures—hovering near minus 290° Fahrenheit (minus 179° Celsius)—mean water ice behaves like bedrock. NASA’s Cassini spacecraft, which flew past Titan numerous times before its mission ended in 2017, discovered weather systems on the hazy moon. Observations from Cassini found evidence for hydrocarbon rains and winds that appear to generate waves in Titan’s methane lakes.

Clearly, Titan is an exotic world. Most of what scientists know about Titan comes from measurements collected by Cassini and the European Space Agency’s Huygens probe, which Cassini released to land on Titan in 2005. Huygens returned the first pictures from Titan’s surface, but it only transmitted data for 72 minutes.

Dragonfly will explore Titan for around three years, flying tens of kilometers about once per month to measure the prebiotic chemistry of Titan’s surface, study its soupy atmosphere, and search for biosignatures that could be indications of life. The mission will visit more than 30 locations within Titan’s equatorial region, according to a presentation by Elizabeth Turtle, Dragonfly’s principal investigator at the Johns Hopkins University Applied Physics Laboratory.

“The Dragonfly mission is an incredible opportunity to explore an ocean world in a way that we have never done before,” Turtle said in a statement. “The team is dedicated and enthusiastic about accomplishing this unprecedented investigation of the complex carbon chemistry that exists on the surface of Titan and the innovative technology bringing this first-of-its-kind space mission to life.”

However, this high level of ambition comes at a high cost. NASA selected Dragonfly to proceed into initial development in 2019. Turtle’s science team proposed Dragonfly to NASA through the agency’s New Frontiers program, which has developed a series of medium-class Solar System exploration missions. The New Frontiers program has an impressive pedigree, beginning with the New Horizons mission that flew by Pluto in 2015, the Juno mission to Jupiter, and the OSIRIS-REx asteroid sample return mission.



Biden signs bill criticized as “major expansion of warrantless surveillance”

Abstract image of human eye on a digital background

Getty Images | Yuichiro Chino

Congress passed and President Biden signed a reauthorization of Title VII of the Foreign Intelligence Surveillance Act (FISA), approving a bill that opponents say includes a “major expansion of warrantless surveillance” under Section 702 of FISA.

Over the weekend, the Reforming Intelligence and Securing America Act was approved by the Senate in a 60-34 vote. The yes votes included 30 Republicans, 28 Democrats, and two independents who caucus with Democrats. The bill, which was previously passed by the House and reauthorizes Section 702 of FISA for two years, was signed by President Biden on Saturday.

“Thousands and thousands of Americans could be forced into spying for the government by this new bill and with no warrant or direct court oversight whatsoever,” Sen. Ron Wyden (D-Ore.), a member of the Senate Select Committee on Intelligence, said on Friday. “Forcing ordinary Americans and small businesses to conduct secret, warrantless spying is what authoritarian countries do, not democracies.”

Wyden and Sen. Cynthia Lummis (R-Wyo.) led a bipartisan group of eight senators who submitted an amendment to reverse what Wyden’s office called “a major expansion of warrantless surveillance under Section 702 of the Foreign Intelligence Surveillance Act that was included in the House-passed bill.” After the bill was approved by the Senate without the amendment, Wyden said it seemed “that senators were unwilling to send this bill back to the House, no matter how common-sense the amendment before them.”

Sen. Ted Cruz (R-Texas) said he voted against the reauthorization “because it failed to include the most important requirement to protect Americans’ civil rights: that law enforcement get a warrant before targeting a US citizen.”

Bill expands definition of service provider

The Wyden/Lummis amendment would have struck language that expands the definition of an electronic communication service provider to include, with some exceptions, any “service provider who has access to equipment that is being or may be used to transmit or store wire or electronic communications.” The exceptions are for public accommodation facilities, dwellings, community facilities, and food service establishments.

“Instead of using the opportunity to curb warrantless surveillance of Americans’ private communications and protect the public’s privacy, Congress passed an expansive, unchecked surveillance authority,” Sen. Edward J. Markey (D-Mass.) said after the vote. “This FISA reauthorization legislation is a step backwards, doing nothing to address the extent to which the government conducts surveillance over its own citizens.”

Under the 2008 FISA Amendments Act, electronic communication service providers already included telecommunications carriers, providers of electronic communication services, providers of remote computing services, and “any other communication service provider who has access to wire or electronic communications either as such communications are transmitted or as such communications are stored.” These entities must provide the government with information, facilities, and assistance necessary to obtain communications.

The Brennan Center for Justice at New York University School of Law called the reauthorization “the largest expansion of domestic surveillance authority since the Patriot Act.”

“The bill, which would effectively grant the federal government access to the communications equipment of almost any business in the United States, is a gift to any president who may wish to spy on political enemies,” said Elizabeth Goitein, senior director of the Brennan Center’s Liberty and National Security Program.



First real-life Pixel 9 Pro pictures leak, and it has 16GB of RAM

OK, but what if I don’t care about generative AI? —

With 16GB of RAM, there’s lot of room for Google’s AI models to live in memory.

OnLeaks' renders of the Pixel 9 Pro XL, the Pixel 9 Pro, and the Pixel 9.

OnLeaks / 91Mobiles / MySmartPrice

The usual timeline would put the Google Pixel 9 at something like five months away from launching, but that doesn’t mean it’s too early to leak! Real-life pictures of the “Pixel 9 Pro” model have landed over at Rozetked.

This prototype looks just like the renders from OnLeaks that first came out back in January. The biggest change is a new pill-shaped camera bump instead of the edge-to-edge design of old models. It looks rather stylish in real-life photos, with the rounded corners of the pill and camera glass matching the body shape. The matte back looks like it still uses the excellent “soft-touch glass” material from last year. The front and back of the phone are totally flat, with a metal band around the side. The top edge still has a signal window cut out of it, which is usually for mmWave. The Pixel 8 Pro’s near-useless temperature sensor appears to still be on the back of this prototype. At least, the spot for the temperature sensor—the silver disk right below the LED camera flash—looks identical to the Pixel 8 Pro. As a prototype any of this could change before the final release, but this is what it looks like right now.

The phone was helpfully photographed next to an iPhone 14 Pro Max, and you might notice that the Pixel 9 Pro looks a little small! That’s because this is one of the small models, with only a 6.1-inch display. Previously for Pixels, “Pro” meant “the big model,” but this year Google is supposedly shipping three models, adding in a top-tier small phone. There’s the usual big Pixel 9, with a 6.7-inch display, which will reportedly be called the “Pixel 9 Pro XL.” The new model is the “Pixel 9 Pro”—no XL—which is a small model but still with all the “Pro” trimmings, like three rear cameras. There’s also the Pixel 9 base model, which is the usual smaller phone (6.03-inch) with cut-down specs like only two rear cameras.

The Pixel 9 Pro prototype. It’s small because this is the “small Pro” model. There are more pictures over at Rozetked.

Rozetked says (through translation) that the phone is “similar in size to the iPhone 15 Pro.” It runs a Tensor G4 SoC, of course, and—here’s a noteworthy spec—has a whopping 16GB of RAM according to the bootloader screen. The Pixel 8 Pro tops out at 12GB.

Anything could change between prototype and product, especially for RAM, which is usually scaled up and down in various phone tiers. A jump in RAM is something we were expecting though. As part of Google’s new AI-focused era, it wants generative AI models turned on 24/7 for some use cases. Google said as much in a recent in-house podcast, pointing to some features like a new version of Smart Reply built right into the keyboard, which “requires the models to be RAM-resident”—in other words, loaded all the time. Google’s desire to keep generative AI models in memory means less RAM for your operating system to actually do operating system things, and one solution to that is to just add more RAM. So how much RAM is enough? At one point Google said the smaller Pixel 8’s 8GB of RAM was too much of a “hardware limitation” for this approach. Google PR also recently told us the company still hasn’t enabled generative AI smart reply on Pixel 8 Pro by default with its 12GB of RAM, so expect these RAM numbers to start shooting up.
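To put rough numbers on why a RAM-resident model is expensive, here is a back-of-the-envelope sketch; the parameter counts and quantization levels are illustrative assumptions on my part, not figures Google has published for its on-device models.

```python
# Rough memory footprint of keeping an LLM's weights resident in RAM.
# Parameter counts and bit widths below are illustrative assumptions,
# not published specs for Google's on-device models.

def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate RAM needed just for the weights, in decimal GB."""
    bytes_per_param = bits_per_param / 8
    return params_billions * 1e9 * bytes_per_param / 1e9

for params in (2.0, 4.0, 8.0):      # hypothetical on-device model sizes
    for bits in (16, 8, 4):         # fp16, int8, int4 quantization
        print(f"{params:.0f}B params @ {bits:2d}-bit ≈ {weights_gb(params, bits):4.1f} GB")
```

Even at aggressive 4-bit quantization, a few gigabytes stay pinned for weights alone, before the KV cache or anything else, which is a big bite out of an 8GB phone and helps explain both the “hardware limitation” comment and the jump to 16GB.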

The downside is that more RAM means a more expensive phone, but this is the path Google is going down. There’s also the issue of whether or not you view generative AI as something that is so incredibly useful you need it built into your keyboard 24/7. Google wants its hardware to be “the intersection of hardware, software, and AI,” so keeping all this ChatGPT-like stuff quarantined to a single app apparently won’t be an option.

One final note: It’s weird how normal this phone looks. Usually, Pixel prototypes have a unique logo that isn’t the Google “G,” and often they are covered in identifying patterns for leak tracing. This looks like a production-worthy design, though.



Home Assistant has a new foundation and a goal to become a consumer brand

An Open Home stuffed full of code —

Can a non-profit foundation get Home Assistant to the point of Home Depot boxes?

Open Home Foundation logo on a multicolor background

Open Home Foundation

Home Assistant, until recently, has been a wide-ranging and hard-to-define project.

The open smart home platform is an open source OS you can run anywhere that aims to connect all your devices together. But it’s also bespoke Raspberry Pi hardware, in Yellow and Green. It’s entirely free, but it also receives funding through a private cloud services company, Nabu Casa. It contains the tiny-board project ESPHome and other interconnected bits. It has wide-ranging voice assistant ambitions, but it doesn’t want to be Alexa or Google Assistant. Home Assistant is a lot.

After an announcement this weekend, however, Home Assistant’s shape is a bit easier to draw out. All of the project’s ambitions now fall under the Open Home Foundation, a non-profit organization that now contains Home Assistant and more than 240 related bits. Its mission statement is refreshing, and refreshingly honest about the state of modern open source projects.

The three pillars of the Open Home Foundation.

Open Home Foundation

“We’ve done this to create a bulwark against surveillance capitalism, the risk of buyout, and open-source projects becoming abandonware,” the Open Home Foundation states in a press release. “To an extent, this protection extends even against our future selves—so that smart home users can continue to benefit for years, if not decades. No matter what comes.” Along with keeping Home Assistant funded and secure from buy-outs or mission creep, the foundation intends to help fund and collaborate with external projects crucial to Home Assistant, like Z-Wave JS and Zigbee2MQTT.


Home Assistant’s ambitions don’t stop with money and board seats, though. They aim to “be an active political advocate” in the smart home field, toward three primary principles:

  • Data privacy, which means devices with local-only options, and cloud services with explicit permissions
  • Choice in using devices with one another through open standards and local APIs
  • Sustainability by repurposing old devices and appliances beyond company-defined lifetimes

Notably, individuals cannot contribute modest-size donations to the Open Home Foundation. Instead, the foundation asks supporters to purchase a Nabu Casa subscription or contribute code or other help to its open source projects.

From a few lines of Python to a foundation

Home Assistant founder Paulus Schoutsen wanted better control of his Philips Hue smart lights just before 2014 or so and wrote a Python script to do so. Thousands of volunteer contributions later, Home Assistant was becoming a real thing. Schoutsen and other volunteers inevitably started to feel overwhelmed by the “free time” coding and urgent bug fixes. So Schoutsen, Ben Bangert, and Pascal Vizeli founded Nabu Casa, a for-profit firm intended to stabilize funding and paid work on Home Assistant.

Through that stability, Home Assistant could direct full-time work to various projects, take ownership of things like ESPHome, and officially contribute to open standards like Zigbee, Z-Wave, and Matter. But Home Assistant was “floating in a kind of undefined space between a for-profit entity and an open-source repository on GitHub,” according to the foundation. The Open Home Foundation creates the formal home for everything that needs it and makes Nabu Casa a “special, rules-bound inaugural partner” to better delineate the business and non-profit sides.
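For a sense of what those original “few lines of Python” looked like in spirit, here is a minimal sketch that toggles a Philips Hue bulb through the bridge’s local REST API. It is an illustration, not Schoutsen’s actual script; the bridge IP, API username, and light ID are placeholders you would get from your own bridge.

```python
# Minimal sketch: toggle a Philips Hue light via the bridge's local HTTP API.
# Not Schoutsen's original script; BRIDGE_IP, API_USERNAME, and LIGHT_ID are
# placeholders for values from your own Hue bridge.
import requests

BRIDGE_IP = "192.168.1.2"        # local address of the Hue bridge
API_USERNAME = "your-app-key"    # key created by pressing the bridge's link button
LIGHT_ID = 1                     # numeric ID of the bulb to control

base = f"http://{BRIDGE_IP}/api/{API_USERNAME}/lights/{LIGHT_ID}"

is_on = requests.get(base, timeout=5).json()["state"]["on"]
requests.put(f"{base}/state", json={"on": not is_on}, timeout=5)
print("Light switched", "off" if is_on else "on")
```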

Home Assistant as a Home Depot box?

In an interview with The Verge’s Jennifer Pattison Tuohy, and in a State of the Open Home stream over the weekend, Schoutsen also suggested that the Foundation gives Home Assistant a more stable footing by which to compete against the bigger names in smart homes, like Amazon, Google, Apple, and Samsung. The Home Assistant Green starter hardware will sell on Amazon this year, along with HA-badged extension dongles. A dedicated voice control hardware device that enables a local voice assistant is coming before year’s end. Home Assistant is partnering with Nvidia and its Jetson edge AI platform to help make local assistants better, faster, and more easily integrated into a locally controlled smart home.

That also means Home Assistant is growing as a brand, not just a product. Home Assistant’s “Works With” program is picking up new partners and has broad ambitions. “We want to be a consumer brand,” Schoutsen told Tuohy. “You should be able to walk into a Home Depot and be like, ‘I care about my privacy; this is the smart home hub I need.’”

Where does this leave existing Home Assistant enthusiasts, who are probably familiar with the feeling of a tech brand pivoting away from them? It’s hard to imagine Home Assistant dropping its advanced automation tools and YAML-editing offerings entirely, though Schoutsen suggested he could imagine a split between regular and “advanced” users down the line. Either way, Home Assistant’s open nature, and now its foundation, should ensure that people will always be able to remix, reconfigure, or re-release the version of the smart home they prefer.



Apple reportedly plans M4 Mac mini for late 2024 or early 2025, skipping the M3

leapfrog —

But this would be a faster turnaround time than we saw for the M3 or the M2.

The M2 Pro Mac mini.

Andrew Cunningham

Bloomberg’s Mark Gurman thinks that Apple’s M4 chips for Macs are coming sooner rather than later—possibly as early as “late this year,” per a report from earlier this month. Now Gurman says Apple could completely skip the M3 generation for some Macs, most notably the Mac mini.

To be clear, Gurman doesn’t have specific insider information confirming that Apple is planning to skip the M3 mini. But based on Apple’s alleged late-2024-into-early-2025 timeline for the M4 mini, he believes that it’s “probably safe to say” that there’s not enough space on the calendar for an M3 mini to be released between now and then.

This wouldn’t be the first time an Apple Silicon Mac had skipped a chip generation—the 24-inch iMac was never updated with the M2, instead jumping directly from the M1 to the M3. The Mac Pro also skipped the M1 series, leapfrogging from Intel chips to the M2.

But if the M4 does come out by the end of 2024, it would be a much faster turnaround than we’ve seen for other Apple Silicon chips so far. Roughly a year and a half passed between the introduction of the first M1 Macs in late 2020 and the first M2 Macs in the summer of 2022; about the same amount of time passed between mid-2022 and the late-2023 introduction of the first M3 Macs. If Apple holds to a more typical 18-month gap between the first M3 Macs and the first M4 Macs, there’s still plenty of time for an M3-based Mac mini refresh to be released.
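To make the cadence argument concrete, here is a quick sketch using the approximate announcement dates cited above (November 2020 for the first M1 Macs, June 2022 for the M2, October 2023 for the M3); the 18-month projection is just the “typical gap” scenario, not a confirmed roadmap.

```python
# Spacing between Apple Silicon generations, using approximate announcement
# dates for the first Macs of each generation; the 18-month projection is the
# "typical gap" scenario discussed above, not a confirmed Apple roadmap.
from datetime import date

intros = {
    "M1": date(2020, 11, 10),
    "M2": date(2022, 6, 6),
    "M3": date(2023, 10, 30),
}

def months_between(a: date, b: date) -> int:
    return (b.year - a.year) * 12 + (b.month - a.month)

print("M1 -> M2:", months_between(intros["M1"], intros["M2"]), "months")
print("M2 -> M3:", months_between(intros["M2"], intros["M3"]), "months")

# Project a first-M4 date 18 months after the first M3 Macs.
m3 = intros["M3"]
total = m3.month + 18
projected_m4 = date(m3.year + (total - 1) // 12, (total - 1) % 12 + 1, 1)
print("18-month pace puts the first M4 Macs around", projected_m4.strftime("%B %Y"))
```

That pace lands the first M4 Macs around spring 2025, which is what makes a late-2024 debut a notably fast turnaround.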

Apple last updated the Mac mini in January of 2023, replacing the M1 model with an M2 version and introducing a new variant with an M2 Pro chip that included more Thunderbolt ports, better external display support, and better CPU and GPU performance. Most of Apple’s desktops—both Mac minis, as well as the Mac Studio and Mac Pro—are still using Apple’s M2 chips, while all of the laptops and the iMac have gotten an M3 refresh at this point.

Gurman’s previous reporting on the M4 suggests that it will be an “AI-focused” chip series, which probably means that it will beef up the processors’ Neural Engine to power the on-device generative AI features that are expected to come with iOS 18 and Apple’s other major operating system updates this year. Apple already has a head start on the PC ecosystem in this respect—all of the M-series chips and A-series chips going all the way back to 2017’s A11 Bionic have included a version of the Neural Engine. Intel and AMD’s processors have only begun to include similar neural processing units (NPUs) within the last year or so.

Gurman hasn’t reported on the M4 series’ specifications, but he has said it will include at least three performance tiers: a base model codenamed “Donan,” a midrange version codenamed “Brava,” and a high-end model codenamed “Hidra.” It remains to be seen which of these chips would replace the Pro, Max, and Ultra processors in current-generation M2 and M3 Macs.



After 48 years, Zilog is killing the classic standalone Z80 microprocessor chip

rest in silicon —

Z80 powered game consoles, ZX Spectrum, Pac-Man, and a 1970s PC standard based on CP/M.

A cropped portion of a ca. 1980 ad for the Microsoft Z80 SoftCard, which allowed Apple II users to run the CP/M operating system.

Microsoft

Last week, chip manufacturer Zilog announced that after 48 years on the market, its line of standalone DIP (dual inline package) Z80 CPUs is coming to an end, ceasing sales on June 14, 2024. The 8-bit Z80 architecture debuted in 1976 and powered a small-business-PC revolution in conjunction with CP/M, also serving as the heart of the Nintendo Game Boy, Sinclair ZX Spectrum, the Radio Shack TRS-80, the Pac-Man arcade game, and the TI-83 graphing calculator in various forms.

In a letter to customers dated April 15, 2024, Zilog wrote, “Please be advised that our Wafer Foundry Manufacturer will be discontinuing support for the Z80 product and other product lines. Refer to the attached list of the Z84C00 Z80 products affected.”

Designers typically use the Z84C00 chips because of familiarity with the Z80 architecture or to allow legacy system upgrades without needing significant system redesigns. And while many other embedded chip architectures have superseded these Z80 chips in speed, processing power, and capability, they remained go-to solutions for decades in products that didn’t need any extra horsepower.

Zilog will continue to manufacture the eZ80 microcontroller family, which was introduced in 2001 as a faster version of the Z80 series and comes in different physical package configurations (pin layouts).

Powering a microcomputer revolution

The 8-bit Z80 microprocessor was designed in 1974 by Federico Faggin as a binary-compatible, improved version of the Intel 8080 with a higher clock speed, a built-in DRAM refresh controller, and an extended instruction set. It was extensively used in desktop computers of the late 1970s and early 1980s, arcade video game machines, and embedded systems, and it became a cornerstone of several gaming consoles, like the Sega Master System.

The Tandy Radio Shack TRS-80 (1977), which used the Zilog Z80.

SSPL/Getty Images

During the mid-late 1970s, the Z80 became a popular CPU for S-100 bus machines, which were early personal computers with a 100-pin modular bus system that allowed swapping cards to build systems based on parts from various manufacturers. Digital Research targeted the Z80 as a key platform for its CP/M operating system, and the association between Z80 and CP/M stuck, powering dozens of small business computers until the mid-1980s, when IBM PC clones running Microsoft’s MS-DOS became the new industry standard.

Interestingly, Microsoft’s first hardware product, the Z80 SoftCard for the Apple II in 1980, added the famous Zilog CPU to the classic personal computer and allowed users to run CP/M on that machine. In 1982, Bill Gates claimed that SoftCard installations represented the largest single user base of CP/M machines.

Last call in June 2024

Zilog is notably discontinuing several Z84C00 chips that are still available in classic 40-pin DIP packages. (These standalone chips include a CPU and nothing else, unlike a microcontroller, which can include RAM and other accessory devices.) The DIP design features two rows of 20 pins with a plastic package in between that contains the actual embedded silicon chip, resembling the classic Z80 CPU chips of the 1970s.

After June 14, Zilog will stop taking orders, manufacture whatever orders are available if they are sufficient in quantity, then ship the last runs of the chips to resellers like Mouser Electronics and Digikey.

A classic dual inline package (DIP) version of the Z80 from the 1970s. It features two rows of 20 pins in a ceramic package.

The discontinuation list provided by Zilog in its letter includes 13 products from the Z84C00 series, which are chips in the Z80 family that run at clock speeds from 6 to 20 MHz and maintain compatibility with the original Z80 architecture. Here’s the full list of part numbers that will be discontinued:

  • Z84C0006VEG
  • Z84C0006PEG
  • Z84C0010PEG
  • Z84C0008AEG
  • Z84C0020VEG
  • Z84C0008PEG
  • Z84C0010AEG
  • Z84C0008VEG
  • Z84C0010VEG
  • Z84C0010VEG00TR
  • Z84C0020AEG
  • Z84C0020PEG
  • Z84C0006AEG
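As a quick way to see the spread of speed grades in that list, here is a small sketch that groups the part numbers by their embedded clock rating. It assumes the two digits after "Z84C00" encode the speed in MHz, which matches the 6 to 20 MHz range described above; the trailing letters (package and temperature codes) are left undecoded.

```python
# Group the discontinued Z84C00 part numbers by clock speed. Assumes the two
# digits after "Z84C00" give the rating in MHz (consistent with the 6-20 MHz
# range described above); package/temperature suffixes are not decoded.
from collections import defaultdict

parts = [
    "Z84C0006VEG", "Z84C0006PEG", "Z84C0010PEG", "Z84C0008AEG",
    "Z84C0020VEG", "Z84C0008PEG", "Z84C0010AEG", "Z84C0008VEG",
    "Z84C0010VEG", "Z84C0010VEG00TR", "Z84C0020AEG", "Z84C0020PEG",
    "Z84C0006AEG",
]

by_speed = defaultdict(list)
for part in parts:
    mhz = int(part[6:8])  # "Z84C0006..." -> 6, "Z84C0020..." -> 20
    by_speed[mhz].append(part)

for mhz in sorted(by_speed):
    names = ", ".join(by_speed[mhz])
    print(f"{mhz:>2} MHz ({len(by_speed[mhz])} parts): {names}")
```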

So while the Z80 architecture will stick around in eZ80 form, it appears that this is the last call for newly manufactured standalone 8-bit Z80 CPU chips in the classic DIP form factor. We reached out to Zilog for clarification about its plans for the future of the Z80 platform but did not receive a response by press time.



On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg

It was all quiet. Then it wasn’t.


Dwarkesh Patel did a podcast with Mark Zuckerberg on the 18th. It was timed to coincide with the release of much of Llama-3, very much the approach of telling your story directly. Dwarkesh is now the true tech media. A meteoric rise, and well earned.

This is two related posts in one. First I cover the podcast, then I cover Llama-3 itself.

My notes are edited to incorporate context from later explorations of Llama-3, as I judged that the readability benefits exceeded the purity costs.

  1. (1: 00) They start with Llama 3 and the new L3-powered version of Meta AI. Zuckerberg says “With Llama 3, we think now that Meta AI is the most intelligent, freely-available assistant that people can use.” If this means ‘free as in speech’ then the statement is clearly false. So I presume he means ‘free as in beer.’

  2. Is that claim true? Is Meta AI now smarter than GPT-3.5, Claude 2 and Gemini Pro 1.0? As I write this it is too soon to tell. Gemini Pro 1.0 and Claude 3 Sonnet are slightly ahead of Llama-3 70B on the Arena leaderboard. But it is close. The statement seems like a claim one can make within ‘reasonable hype.’ Also, Meta integrates Google and Bing for real-time knowledge, so the question there is if that process is any good, since most browser use by LLMs is not good.

  3. (1: 30) Meta are going in big on their UIs, top of Facebook, Instagram and Messenger. That makes sense if they have a good product that is robust, and safe in the mundane sense. If it is not, this is going to be at the top of chat lists for teenagers automatically, so whoo boy. Even if it is safe, there are enough people who really do not like AI that this is probably a whoo boy anyway. Popcorn time.

  4. (1: 45) They will have the ability to animate images, and it generates high quality images as you type, updating them in real time as you add details. I can confirm this feature is cool. He promises multimodality, more ‘multi-linguality’ and bigger context windows.

  5. (3: 00) Now the technical stuff. Llama-3 follows tradition in training models in three sizes, here 8b, 70b that released on 4/18, and a 405b that is still training. He says 405b is already around 85 MMLU and they expect leading benchmarks. The 8b Llama-3 is almost as good as the 70b Llama-2.

  1. (5: 15) What went wrong earlier for Meta and how did they fix it? He highlights Reels, with its push to recommend ‘unconnected content,’ meaning things you did not ask for, and not having enough compute for that. They were behind. So they ordered double the GPUs they needed. They didn’t realize the type of model they would want to train.

  2. (7: 30) Back in 2006, what would Zuck have sold for when he turned down $1 billion? He says he realized if he sold he’d just build another similar company, so why sell? It wasn’t about the number; he wasn’t in a position to evaluate the number. And I think that is actually wise there. You can realize that you do not want to accept any offer someone would actually make.

  3. (9: 15) When did making AGI become a key priority? Zuck points out Facebook AI Research (FAIR) is 10 years old as a research group. Over that time it has become clear you need AGI, he says, to support all their other products. He notes that training models on coding generalizes and helps their performance elsewhere, and that was a top focus for Llama-3.

  4. So Meta needs to solve AGI because if they don’t ‘their products will be lame.’ It seems increasingly likely, as we will see in several ways, that Zuck does not actually believe in ‘real’ AGI. By ‘AGI’ he means somewhat more capable AI.

  5. (13: 40) What will the Llama that makes cool products be able to do? Replace the engineers at Meta? Zuck tries to dodge, says we’re not ‘replacing’ people as much as making them more productive, hopefully 10x or more, says there is no one threshold for human intelligence, AGI isn’t one thing. He is focused on different modalities, especially 3D and emotional understanding, in addition to the usual things like memory and reasoning.

  6. (16: 00) What will we use all our data for? Zuck says AI will be in everything, and there will be a Meta general assistant product that does complicated tasks. He wants to let creators own an AI and train it how they want to ‘engage their community.’ But then he admits these are only consumer use cases and it will change everything in the economy.

  7. (18: 25) When do we get the good agents? Zuck says we do not know. It depends on the scaffolding. He wants to progressively move more of that into the model to make them better agents on their own so this stops being ‘brittle and non-general.’ It has much better tool use, you do not need to hand code. This Is Fine.

  8. (22: 20) What community fine tune is most personally exciting? Zuck says he doesn’t know, it surprises you, if he knew he’d build it himself.

    1. This doesn’t match my model of this, where you want to specialize, some things are left to others, which seems doubly true here with open model weights. He mentions that 8b is too big for many use cases, we should try to build a 1b or smaller model too.

    2. Also he mentions that they do a ton of inference because they have a ton of customers, so that dominates their compute usage over time. It makes sense for them to do what for others would be overtraining, also training more seemed to keep paying dividends for a long time.

    3. I would presume the other big labs will be in similar positions going forward.

  9. (26: 00) How much better will Llama-4 get? How will models improve? Zuck says (correctly) this is one of the great questions; no one knows how long an exponential curve keeps going. He says probably long enough that the infrastructure is worth investing in, and a lot of companies are investing a lot.

  1. (28: 00) He thinks energy constraints will soon bind, not chips. No one has built a gigawatt single training cluster yet. And that is slower because energy gets permitted at the speed of government and then has to be physically built. One does not simply get a bunch of energy, compute and data together.

  2. If concentrations of energy generation are the true bottleneck, then anyone who says ‘government has no means to control this’ or ‘government cannot control this without being totalitarian’ would be very wrong, this is a very easy thing to spot, isolate and supervise. Indeed, we almost ‘get it for free’ given we are already massively over restricting energy generation and oversee industrial consumption.

  3. (30: 00) What would Meta do with 10x more money? More energy, which would allow bigger clusters, but the true bottleneck is time. Right now data center energy tops out at something like 50 MW-150 MW. But 300 MW-1 GW, that’s new, that’s a meaningful nuclear power plant. It will happen but not next year. Dwarkesh mentions Amazon’s 950 MW facility, Zuck says he is unsure about that.

  4. (31: 40) What about distributed computing? Zuck says it is unknown how much of that is feasible, and suggests that a lot of training in future might be inference to generate synthetic data.

  5. (32: 25) If that’s what this is about, could this work for Llama-3? Could you use these models to get data for these models to get smarter? De facto one might say ‘RSI Real Soon Now (RSI RSN)?’ Zuck says ‘there are going to be dynamics like that’ but there are natural limits on model architecture. He points out there is nothing like Llama-3 400B currently in open source, that will change things a lot, but says it can only go so far. That all makes sense, at some point you have to restart the architecture, but that does not fully rule out the scenario.

  6. (34: 15) Big picture, what’s up with AI for the next decade? How big a deal is it? Zuck says pretty fundamental, like the creation of computing, going from not having computers to having computers. You’ll get ‘all these new apps’ and it will ‘let people do what they want a lot more.’

    1. He notices it is very hard to reason about how this goes.

    2. He strongly expects physical constraints to prevent fast takeoff, or even ‘slow takeoff,’ expecting it to be decades to fully get there.

    3. Notice again his expectations here are very much within the mundane range.

  7. That could be the central crux here. If he thinks that nothing we build can get around the physical constraints for decades, then that has a lot of implications.

  8. (36: 00) Dwarkesh says, but what about on that cosmic, longer-term scale? What will the universe look like? Will AI be like humans evolving or harnessing fire? Zuck says that is tricky. He says that people have come to grips throughout history with noticing that humanity is not unique in various ways but is still super special. He notices that intelligence is not clearly fundamentally connected to life, it is distinct from consciousness and agency. Which he says makes it a super valuable tool.

    1. Once again, even in this scenario, there’s that word again. Tool.

  9. A key problem with this is that agency is super useful. There is a reason Meta’s central plan is to create an active AI assistant for you that will act as your personal agent. Why Meta is striving to bring as much agency capability directly into the models, and also building more agency capability on top of that. The first thing people are doing and will do, in many contexts, is strive to give the AI as much agency as possible. So even if that doesn’t happen ‘on its own’ it happens anyway. My expectation is that if you wanted to create a non-agent, you can probably do that, but you and everyone else with sufficient access to the model have to choose to do that.

  1. (38: 00) Zuck: “Which is why I don’t think anyone should be dogmatic about how they plan to develop it or what they plan to do. You want to look at it with each release. We’re obviously very pro open source, but I haven’t committed to releasing every single thing that we do. I’m basically very inclined to think that open sourcing is going to be good for the community and also good for us because we’ll benefit from the innovations. If at some point however there is some qualitative change in what the thing is capable of, and we feel like it’s capable of, and we feel it is not responsible to open source it, then we won’t. It’s all very difficult to predict.”

  2. Bravo. Previously we have seen him say they were going to open source AGI. He might intend to do that anyway. This continues Zuck trying to have it both ways. He says both ‘we will open source everything up to and including AGI’ and also ‘we might not’ at different times.

    1. The reconciliation is simple. When Zuck says ‘AGI’ he does not mean AGI.

  3. This suggests an obvious compromise. We can all negotiate on what capabilities would constitute something too dangerous, and draw a line there, with the line drawn in anticipation of what can be built on top of the model that is being considered for release, and understanding that all safety work will rapidly be undone and so on.

    1. We are talking price, and perhaps are not even that far apart.

    2. I am totally fine with Llama-3 70B being released.

    3. I do notice that open sourcing Llama-3 405B sounds like a national security concern, and as I discuss later if I was in NatSec I would be asking how I could prevent Meta from releasing the weights for national competitiveness reasons (to not supercharge Chinese AI) with a side of catastrophic misuse by non-state actors.

    4. But I do not expect existential risk from Llama-3.

  4. (38: 45) So Dwarkesh asks exactly that. What would it take to give Zuck pause on open sourcing the results of a future model?

    1. Zuck says it is hard to measure that in the abstract. He says if you can ‘mitigate the negative behaviors’ of a product, then those behaviors are okay.

    2. The whole point is that you can to some extent do mitigations while you control the model (this is still super hard and jailbreaks are universally possible at least for now) but if you open source then your mitigations get fully undone.

  5. Thus I see this as another crux. What does ‘mitigate’ mean here? What is the proposal for how that would work? How is this not as fake as Stability.ai saying they are taking safety precautions with Stable Diffusion 3, the most generous interpretation of which I can imagine is ‘if someone does a fine tune and a new checkpoint and adds a LoRa then that is not our fault.’ Which is a distinction without a difference.

  6. (40: 00) Zuck says it is hard to enumerate all the ways something can be good or bad in advance. Very true.

  7. As an aside, the ads here are really cool, pitches for plausibly useful AI products. Dwarkesh’s readings are uninspired, but the actual content is actively positive.

  8. (42: 30) Zuck: “Some people who have bad faith are going to try and strip out all the bad stuff. So I do think that’s an issue.”

    1. Isn’t it more accurate to say that people will for various reasons definitely strip out all the protections, as they have consistently always done, barring an unknown future innovation?

  9. (42: 45) And here it is, as usual. Zuck: “I do think that a concentration of AI in the future has the potential to be as dangerous as it being widespread… people ask ‘is it bad for it to be out in the wild and just widely available?’ I think another version of this is that it’s probably also pretty bad for one institution to have an AI that is way more powerful than everyone else’s AI.” And so on.

  10. Something odd happens with his answer here. Up until this point, Zuck has been saying a mix of interesting claims, some of which I agree with and some where I disagree. I think he is making some key conceptual mistakes, and of course is talking his book as one would expect, but it is a unique perspective and voice. Now, suddenly, we get the generic open source arguments I’ve heard time and again, like they were out of a tape recorder.

  11. And then he says ‘I don’t hear people talking about this much.’ Well, actually, I hear people talking about it constantly. It is incessant, in a metaphorically very ‘isolated demand for rigor’ kind of way, to hear ‘the real danger is concentration of power’ or concentration of AI capability. Such people usually say this without justification, and without any indication they understand what the ‘not real’ danger is that they are dismissing as not real or why they claim that it is not real.

  12. (45: 00) He says what keeps him up at night is someone untrustworthy having the super strong AI, and that this is ‘potentially a much bigger risk.’ That a bad actor who got a hold of a strong AI might cause a lot of mayhem in a world where not everyone has a strong AI.

    1. This is a bigger concern than AI getting control of the future? Bigger than human extinction? Bigger than every actor, however bad, having such access?

    2. Presumably he means more likely, or some combination of likely and bigger.

  13. So yes, his main concern is that the wrong monkey might get the poisoned banana and use it against other monkeys, it is only a tool after all. So instead we have to make sure all monkeys have such access?

  14. (46: 00) It is overall a relatively good version of the generic open source case. He at least acknowledges that there are risks on all sides, and certainly I agree with that.

    1. I see no indication from the argument that he actually understands what the risks of open sourced highly capable models are, or that he has considered them and has a reason why they would not come to pass.

    2. His position here appears to be based on ‘this is a tool and will always be a tool’ and combining that with an implied presumption about offense-defense balance.

    3. I certainly have no idea what his plan (or expectation) is to deal with various competitive dynamics and incentives, or how he would keep the AIs from being something more than tools if they were capable of being more than that.

    4. The better version of this case more explicitly denies future AI capabilities.

  15. I could write the standard reply in more detail than I have above, but I get tired. I should have a canonical link to use in these spots, but right now I do not.

  16. (46: 30) Instead Dwarkesh says it seems plausible that we could get an open source AI to become the standard and the best model, and that would be fine, preferable even. But he asks, mechanically, how you stop a bad actor in that world.

    1. He first asks about bioweapons.

    2. Zuck answers that stronger AIs are good cybersecurity defense.

    3. Dwarkesh asks, what if bioweapons aren’t like that.

    4. Zuck agrees he doesn’t know that bioweapons do not work that way and it makes sense to worry there. He suggests not training certain knowledge into the model (which seems unlikely to me to be that big a barrier, because the world implies itself and also you can give it the missing data), but admits if you get a sufficiently bad actor (which you will), and you don’t have another AI that can understand and balance that (which seems hard under equality), then that ‘could be a risk.’

  17. (48: 00) What if you for example caught a future Llama lying to you? Zuck says right now we see hallucinations and asks how you would tell the difference between that and deception, says there is a lot to think about, speaks of ‘long-term theoretical risks’ and asks to balance this with ‘real risks that we face today.’ His deception worry is ‘people using this to generate misinformation.’

  18. (49: 15) He says that the way he has beaten misinformation so far is by building AI systems that are smarter than the adversarial ones.

    1. Exactly. Not ‘as smart.’ Smarter.

    2. Zuck is playing defense here. He has the harder job.

    3. If those trying to get ‘misinformation’ or other undesired content past Facebook’s (or Twitter’s or GMail’s) filters had the same level of sophistication and skill and resources as Meta and Google, you would have to whitelist in order to use Facebook, Twitter and GMail.

    4. The key question will be, how much of being smarter will be the base model?

  19. (49: 45) Zuck says hate speech is not super adversarial in the sense that people are not getting better at being racist.

    1. I think in this sense that is wrong, and they totally are in both senses? Racists invent new dog whistles, new symbols, new metaphors, new deniable things. They look for what they can and cannot say in different places. They come up with new arguments. If you came with the 1970s racism today it would go very badly for you, let alone the 1870s or 1670s racism. And then he says that AIs here are getting more sophisticated faster than people.

    2. What is going to happen is that the racists are going to get their racist AI systems (see: Gab) and start using the AI to generate and select their racist arguments.

    3. If your AI needs to have high accuracy to both false positives and false negatives, then you need a capability advantage over the attack generation mechanism.

    4. This is all ‘without loss of generality.’ You can mostly substitute anything else you dislike for racism here if you change the dates or other details.

  20. (50: 30) Zuck then contrasts this with nation states interfering in elections, where he says nation-states ‘have cutting edge technology’ and are getting better every year. He says this is ‘not like someone trying to say mean things, they have a goal.’

    1. Well, saying mean things is also a goal, and I have seen people be very persistent and creative in saying mean things when they want to do that.

    2. Indeed, Mark Zuckerberg went to Ardsley High School and Phillips Exeter Academy, they made this movie The Social Network, and saying mean things about Mark Zuckerberg is a top internet pastime. I am going to take a wild guess that he experienced this first hand. A lot.

  21. I would also more centrally say no, zero nation states have cutting edge election interference technology, except insofar as ‘whatever is available to the most capable foreign nation-state at this, maybe Russia’ is defined as the cutting edge. Plenty of domestic and non-state actors are ahead of the game here. And no state actor, or probably any domestic actor either, is going to have access to an optimized-for-propaganda-and-chaos version of Gemini, GPT-4 or Claude Opus. We are blessed here, and of course we should not pretend that past attempts were so sophisticated or impactful. Indeed, what may happen in the coming months is that, by releasing Llama-3 400B, Zuck instantly gives Russia, China, North Korea and everyone else exactly this ‘cutting edge technology’ with which to interfere.

  22. I of course think the main deception problems with AI lie in the future, and have very little to do with traditional forms of ‘misinformation’ or ‘election interference.’ I do still find it useful to contrast our models of those issues.

  23. (51: 30) He says ‘for the foreseeable future’ he is optimistic they will be able to open source. He doesn’t want to ‘take our eye off the ball’ of what people are trying to use the models for today. I would urge him to keep his eye on that ball, but also skate where the puck is going. Do not move directly towards the ball.

  24. (54: 30) Fun time, what period of time to go back to? Zuck checks, it has to be the past. He talks about the metaverse.

  25. (59: 00) Zuck is incapable of not taking a swing at building the next thing. He spends so much time finding out if he could, I suppose.

  26. (1: 02: 00) Caesar Augustus seeking peace. Zuck suggests peace at the time was a new concept as anything other than a pause between wars. I notice I am skeptical. Then Zuck transitions from ‘wanting the economy to be not zero-sum’ to ‘a lot of investors don’t understand why we would open source this.’ And says ‘there are more reasonable things than people think’ and that open source creates winners. The framing attempt is noted.

    1. I instead think most investors understand perfectly well why Meta might open source here. It is not hard to figure this out. Indeed, the loudest advocates for open source AI are largely venture capitalists.

    2. That does not mean that open sourcing is a wise (or unwise) business move.

  27. (1: 05: 00) Suppose there was a $10 billion model, it was totally safe even with fine tuning, would you open source? Zuck says ‘as long as it’s helping us, yeah.’

    1. Exactly. If it is good for business and it is not an irresponsible thing to do, it was actually ‘totally safe’ in the ways that matter, and you think it is good for the world too, then why not?

    2. My only caveat would be to ensure you are thinking well about what ‘safe’ means in that context, as it applies to the future path the world will take. One does not, in either direction, want to use a narrow view of ‘safe.’

  28. (1: 06: 00) Zuck notes he does not open source Meta’s products. Software yes, products no. Something to keep in mind.

  29. (1: 07: 00) Dwarkesh asks if training will be commodified? Zuck says maybe. Or it could go towards qualitative improvements via specialization.

  30. (1: 08: 45) Zuck notes that several times, Meta has wanted to launch features, and Apple has said no.

    1. We don’t know which features he is referring to.

    2. We do know Apple and Meta have been fighting for a while about app tracking and privacy, and about commissions and informing users about the commissions, and perhaps messaging.

  31. (1: 09: 00) He therefore asks, what if someone has an API and tells you what you can build? Meta needs to build the model themselves to ensure they are not in that position.

    1. I don’t love that these are the incentives, but if you are as big as Meta and want to do Meta things, then I am sympathetic to Meta in particular wanting to ensure it has ownership of the models it uses internally, even if that means large costs and even if it also meant being a bit behind by default.

  32. The core dilemma that cannot be resolved is: Either there is someone, be it corporation, government or other entity, that is giving you an API or other UI that decides what you can and cannot do, or there is not. Either there is the ability to modify the model’s weights and use various other methods to get it to do whatever you want it to do, or there is not. The goals of ‘everyone is free to do what they want whenever they want’ and ‘there is some action we want to ensure people do not take’ are mutually exclusive.

  33. You can and should seek compromise, to be on the production possibilities frontier, where you impose minimal restrictions to get the necessary guardrails in place where that is worthwhile, and otherwise let people do what they want. In some cases, that can even be zero guardrails and no restrictions. In other cases, such as physically building nuclear weapons, you want strict controls. But there is no taking a third option, you have to make the choice.

  34. (1: 09: 45) I totally do buy Zuck’s central case here, that if you have software that is generally beneficial to builders, and you open source it, that has large benefits. So if there is no reason not to do that, and often there isn’t, you should do that.

  35. (1: 10: 15) What about licensing the model instead, with a fee? Zuck says he would like that. He notes that the largest companies cannot freely use Llama under their license, so that if Amazon or Microsoft started selling Llama then Meta could get a revenue share.

  36. (1: 12: 00) Dwarkesh presses on the question of red flags, pointing to the responsible scaling policy (RSP) of Anthropic and preparedness framework of OpenAI, saying he wishes there was a similar framework at Meta saying what concrete things should stop open sourcing or even deployment of future models.

  37. Zuck says that is a fair point on the existential risk side, right now they are focusing on risks they see today, the content risk, avoiding helping people do violence or commit fraud. He says for at least one generation beyond this one and likely two, the harms that need more mitigation will remain the ‘more mundane harms’ like fraud, he doesn’t want to shortchange that, perhaps my term is catching on. Dwarkesh replies ‘Meta can handle both’ and Zuck says yep.

  38. There is no contradiction here. Meta can (and should) put the majority of its risk mitigation efforts into mundane harms right now, and also should have a framework for when existential risks would become concerning enough to reconsider how to deploy (or later train) a model, and otherwise spend relatively less on the issue. And it is perfectly fine to expect not to hit those thresholds for several generations. The key is to lay out the plan.

  39. (1: 13: 20) Has the impact of the open source tools Meta has released been bigger than the impact of its social media? Zuck says it is an interesting question, but half the world uses their social media. And yes, I think it is a fun question, but the answer is clearly no, the social media is more counterfactually important by far.

  40. (1: 14: 45) Meta custom silicon coming soon? Not Llama-4, but soon after that. They already moved a bunch of Reels inference onto their own silicon, and use Nvidia chips only for training.

  41. (1: 16: 00) Could Zuck have made Google+ work as CEO of Google+? Zuck says he doesn’t know, that’s tough. One problem was that Google+ didn’t have a CEO, it was only a division, and points to issues of focus. Keep the main thing the main thing.

That was a great interview. It tackled important questions. For most of it, Zuck seemed like a real person with a unique perspective, saying real things.

The exception was that weird period where he was defending open source principles using what sounded like someone else’s speech on a tape recorder. Whereas at other times, his thoughts on open source were also nuanced and thoughtful. Dwarkesh was unafraid to press him on questions of open source throughout the interview.

What Dwarkesh failed to get was any details from Zuck about existential or catastrophic risk. We are left without any idea of how Zuck thinks about those questions, or what he thinks would be signs that we are in such danger, or what we might do about it. He tried to do this with the idea of Meta needing a risk policy, but Zuck kept dodging. I think there was more room to press on specifics. Once again this presumably comes down to Zuck not believing the dangerous capabilities will exist.

Nor was there much discussion of the competitive dynamics that happen when everyone has access to the same unrestricted advanced AI models, and what might happen as a result.

I also think Zuck is failing to grapple with even the difficulties of mundane content moderation, an area where he is an expert, and I would like to see his explicit response. Previously, he has said that only a company with the resources of a Meta can do content moderation at this point.

I think he was wrong in the sense that small bespoke gardens are often successfully well-defended. But I think Zuck was right that if you want to defend something worth attacking, like Meta, you need scale and you need to have the expertise advantage. But if those he is defending against also have the resources of Meta where it counts, then what happens?

So if there is another interview, I hope there is more pressing on those types of questions.

In terms of how committed Zuck is to open source, the answer is a lot, but not without limit. He will cross that bridge when he comes to it, and on the horizon he sees no bridge, but that can quickly change. His core expectation is that we have a long way to go before AI goes beyond being a tool, even though he also thinks it will soon very much be everyone’s personal agent. And he especially thinks that energy restrictions will soon bind and stifle growth, because building out energy capacity runs up against physical limitations and government regulations. It is an interesting theory. If it does play out that way, it has a lot of advantages.

Ate-a-Pi has a good reaction writeup on Twitter. It was most interesting in seeing different points of emphasis. The more I think about it, the more Ate-a-Pi nailed it pulling these parts out:

Ate-a-Pi (edited down): TLDR: AI winter is here. Zuck is a realist, and believes progress will be incremental from here on. No AGI for you in 2025.

  1. Zuck is essentially a real-world growth pessimist. He thinks the bottlenecks for energy start appearing soon and will take decades to resolve. AI growth will thus be gated on real-world constraints.

  2. Zuck would stop open sourcing if the model is the product.

  3. Believes they will be able to move from Nvidia GPUs to custom silicon soon.

Overall, I was surprised by how negative the interview was.

A) Energy – Zuck is pessimistic about the real-world growth necessary to support the increase in compute. Meanwhile, raw compute per unit of energy has doubled every two years for the last decade. Jensen is also aware of this, and it beggars belief that he is not thinking about paths forward where he has to continue this ramp.

B) AGI Negative Zuck fundamentally

> does not believe the model, the AI itself, will be the product.

> It is the context, the network graph of friendships per user, the moderation, the memory, the infrastructure that is the product.

> Allows him to freely release open source models, because he has all of the rest of the pieces of user facing scaffolding already done.

> Does not believe in states of the world where a 100x improvement over GPT-4 is possible, or that AGI is possible within a short timeframe.

An actual AGI

> where a small model learns and accompanies the user for long periods

> while maintaining its own state

> with a constitution of what it can or cannot do

> rather than frequent updates from a central server

> would be detrimental to Meta’s business,

> would cause a re-evaluation of what they are doing

Especially on point is that Zuck never expects the AI itself to be the product. This is a common pattern among advocates for open model weights – they do not actually believe in AGI or the future capabilities of the product. It is not obvious Zuck and I even disagree so much on what capabilities would make it unwise to open up model weights. Which is all the more reason to spell out what that threshold would be.

Then there is speculation from Ate-a-Pi that perhaps Zuck is being realistic because Meta does not need to raise capital, whereas others hype to raise capital. That surely matters on the margin, in both directions. Zuck would love if Altman and Amodei were less able to raise capital.

But also I am confident this is a real disagreement, to a large extent, on both sides. These people expecting big jumps from here might turn out to be bluffing. But I am confident they think their hand is good.

Daniel Jeffries highlights GPT-5 as key evidence either way, which seems right.

Daniel Jeffries: The litmus test about whether we hit a plateau with LLMs will be GPT5. It’ll tell us everything we need to know.

I’m on record in my new years predictions as saying I believe GPT5 will be incremental.

But I am now 50/50 on that and feel it could still be a massive leap up provided they actually pioneered new techniques in synthetic data creation, or other new techniques, such as using GPT4 as a bootstrapper for various scenarios, etc.

If it is just another transformer with more data, I don’t see it making a massive leap. It could still be useful, i.e. infinite context windows and massively multimodal, but incremental nonetheless.

But if GPT5 is a minor improvement, meaning a much smaller gap versus the jump from 2 to 3 and 3 to 4, then Zuck is right. The LLM is basically a hot swappable Linux kernel and the least important part of the mix. Everything around it, squeezing the most out of its limitations, becomes the most important aspect of building apps.

Like any good predictor, I continue to revise my predictions as new data comes in. The top predictors in world competitions revise their thinking on average four times. The second tier revises twice. The rest of the world? Never. Let that sink in.

If GPT-5 lands at either extreme it would be very strong evidence. We also could get something in the middle, and be left hanging. I also would not be too quick in calendar time to conclude progress is stalling, if they take their time releasing 5 and instead release smaller improvements along the way. The update would be gradual, and wouldn’t be big until we get into 2025.

Ate-a-Pi also offers this explanation of the business case for opening up Llama-3.

Ate-a-Pi: Here are the business reasons:

Allows social debugging outside Meta

> social products have bugs!

> interactions which require moderation – saying harmful things to kids for eg

> Meta’s (and all social) primary product is moderation

> getting the tech out to the market allows Meta to observe the bugs in the wild at small scale

> before deploying at global scale in Meta

> precisely the same reason to open source software

> except open sourcing social technology to test and debug it sounds creepier

> “oooh look at dev xyz they made it abc, looks like we got to fix that in the next training run”

Meta’s biggest threat is character.ai

> AI friends are going to be more numerous, nicer and more available than your real friends

> FB, Insta, Whatsapp own your real world friends

> But Meta can’t compete here directly yet because it’s seen as creepy

> especially before the tech is good, as there is an uncanny valley

> they did a trial run with their Tom Brady/Snoop Dogg style AI friends but the safety requirements are too high for interesting interactions

> Zuck is ready to cannibalize the friendship network he built if the AI friends get good enough

Destroys competing platforms

> an early tech/product lead allows a startup to overcome a distribution disadvantage

> Meta has the ultimate distribution advantage

> so he doesn’t want anyone else to have a technology advantage

> by releasing open source he cuts short revenue ramps at character.ai, OpenAI and other firms

> they have to innovate faster while gated by capital

> he’s not gated by capital

> prevents large competitors from emerging

Distributed R&D

> he wants other people to develop interesting social ideas

> features that can be copied

> he did something similar to Snap by absorbing their innovation into Instagram

> even more so now, as you have to label your llama3 fine tunes

Here I find some very interesting model disagreements.

Ate says that Meta’s biggest threat is character.ai, and that this release undercuts character.ai.

Whereas I would say this potentially supercharges character.ai: they get to improve their offerings a lot, as do their competitors (of varying adult and ethical natures).

Meta perhaps owns your real world friends (in which case, please help fix that locally, ouch). But this is like the famous line. The AIs get more capable. Your friends stay the same.

Similarly, Ate says that this ‘allows for social debugging outside of Meta,’ because Meta’s primary product is moderation. He thinks this will make moderation easier. I think this is insane. Giving everyone better AI, catching them up to what Meta has, makes moderation vastly harder.

nico: The real reason is because he’s behind.

Ate-a-Pi: Fair.

Here are some reactions from people less skeptical than I am of open source.

Nora Belrose: Zuck’s position is actually quite nuanced and thoughtful.

He says that if they discover destructive AI capabilities that we can’t build defenses for, they won’t open source it. But he also thinks we should err on the side of openness. I agree.

In worlds where bio is actually super deadly and hard to defend against, we’re gonna have serious problems on our hands even without open source AI. Trying to restrict knowledge probably isn’t the best solution.

Andrew Critch: Zuckerberg and Patel having an amazing conversation on AI risk. Great questions and great responses in my opinion. I’m with Zuckerberg that these risks are both real and manageable, and hugely appreciative of Patel as an interviewer for keeping the discursive bar high.

Still, without compute governance, a single AI system could go rogue and achieve a massive imbalance of power over humanity. If equitable compute governance is on track, open source AI is much safer than if massive datacenters remain vulnerable to cyber take-over by rogue AI.

As I noted above, I think everyone sensible is at core talking price. What level of open model weight capabilities is manageable, and in what capacities? What exactly are we worried about going wrong, and can we protect against it, especially when you cannot undo a release, the models may soon be smarter than us, and there are many unknown unknowns about what might happen or what the models could do?

To take Nora’s style of thinking here and consider it fully generally, I think such arguments are in expectation (but far from always) backwards. Arguments of the form ‘yes X makes Y worse, but solving X would not solve Y, so we should not use Y as a reason to solve X’ probably point the other way, unless you can point to some Z that solves Y and actually get Z. Until you get Z, this usually means you need to address X even more, as the absolute risk difference is higher rather than lower. In the bio case: if deadly pathogens really are hard to defend against and we do not yet have the defenses that would fix that, then AI that makes the relevant knowledge easier to get raises the absolute risk more, not less.

More specifically this is true when it comes to ease of getting necessary information and otherwise removing inconveniences. If something is going to be possible regardless, you need to raise the cost and lower the salience and availability of doing that thing.

I’ve talked about this before, but: there are many things in our civilization, really quite a lot, where someone with sufficient publicly available knowledge can exploit the system, and occasionally someone does. Mostly we don’t, partly for ethical or moral reasons, partly for fear of getting caught or of other unknown unknowns, but even more so because it does not occur to us, and when it does, it would be a bunch of work to figure it out and do it. Getting sufficiently strong AI helping with those things is going to be weird and force a lot of decisions on us.

Critch’s proposal generalizes, to me, to the form ‘ensure that civilization is not vulnerable to what the AIs you release are capable of doing.’ The first step there is to secure access to compute against a potential rogue actor using AI, whether humans are backing it or not. Now that you have limited the compute available to the AI, you can now hope that its other capabilities are limited by this, so you have some hope of otherwise defending yourself.

My expectation is that even in the best case, defending against misuses of open model weights AIs once the horses are out of the barn is going to be a lot more intrusive and expensive and unreliable than keeping the horses in the barn.

Consider the metaphor of a potential pandemic on its way. You have three options.

  1. Take few precautions, let a lot of people catch it. Treat the sick.

  2. Take some precautions, but not enough to suppress. Reach equilibrium, ride it out.

  3. Take enough precautions to suppress. Life can be mostly normal once you do.

The core problem with Covid-19 is that we found both #1 and #3 unacceptable (whether or not we were right to do so), so we went with option #2. It did not go great.

With open source AI, you can take option #1 and hope everything works out. You are ‘trusting the thermodynamic God,’ letting whatever competitive dynamics and hill climbing favor win the universe, and hoping that everything following those incentive gradients will work out and have value to you. I am not optimistic.

You can also take option #3, and suppress before sufficiently capable models get released. If Zuckerberg is right about energy being the limiting factor, this is a very practical option, even more so than I previously thought. We could talk price about what defines sufficiently capable.

The problem with option #2 is that now you have to worry about everything the AIs you have unleashed might do and try to manage those risks. The hope Critch expresses is that even if we let the AIs get to inference time, and we know people will then unleash rogue AIs on the regular because of course they will try, as long as we control oversized sources of compute what those AIs can do will be limited.

This seems to me to be way harder (and definitely strictly harder) than preventing those open models from being trained and released in the first place. You need the same regime you would have used, except now you need to be more intrusive. And that is the good scenario. My guess is that you would need to get into monitoring on the level of personal computers or even phones, because otherwise the AI could do everything networked even if you did secure the data centers. Also I do not trust you to secure the data centers at this point even if you are trying.

But yes, those are the debates we should be having. More like this.

So what about Llama-3? How good is it?

As always we start with the announcement and the model card. They are releasing model weights for two models, Llama-3 8B and Llama-3 70B. They are already available for light inference.
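If you want to poke at the released weights yourself, here is a minimal sketch using the Hugging Face transformers library. The Hub model ID, license gating, and hardware assumptions are mine, not an official recipe, so check the model card before relying on the details.

```python
# Minimal sketch: run Llama-3 8B Instruct locally via Hugging Face transformers.
# Assumes you have accepted Meta's license on the Hub and have a GPU with enough memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed Hub ID for the release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "In two sentences, what is a context window?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```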

Let’s get the safety question out of the way before we get to capabilities.

Meta: We’re dedicated to developing Llama 3 in a responsible way, and we’re offering various resources to help others use it responsibly as well. This includes introducing new trust and safety tools with Llama Guard 2, Code Shield, and CyberSec Eval 2.

Then in the model card:

We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community.

Foundation models are widely capable technologies that are built to be used for a diverse range of applications. They are not designed to meet every developer preference on safety levels for all use cases, out-of-the-box, as those by their nature will differ across different applications.

Rather, responsible LLM-application deployment is achieved by implementing a series of safety best practices throughout the development of such applications, from the model pre-training, fine-tuning and the deployment of systems composed of safeguards to tailor the safety needs specifically to the use case and audience.

As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model and system level safety for their application. We also provide a set of resources including Meta Llama Guard 2 and Code Shield safeguards. These tools have proven to drastically reduce residual risks of LLM Systems, while maintaining a high level of helpfulness. We encourage developers to tune and deploy these safeguards according to their needs and we provide a reference implementation to get you started.

Under this philosophy, safety is not a model property.

Instead, safety is a property of a particular deployment of that model, with respect to the safety intentions of the particular party making that deployment.

In other words:

  1. In the closed model weights world, if anyone uses your model to do harm, in a way that is unsafe, then no matter how they did it that is your problem.

  2. In the open model weights world, if anyone copies the weights and then chooses to do or allow harm, in a way that is unsafe, that is their problem. You’re cool.

Or:

  1. OpenAI tries to ensure its models won’t do harm when used maliciously.

  2. Meta tries to ensure its models won’t do harm when used as directed by Meta.

Or:

  1. OpenAI tries to ensure its model won’t do bad things.

  2. Meta tries to ensure its models won’t do bad things… until someone wants that.

I am willing to believe that Llama 3 may have been developed in a responsible way, if the intention was purely to deploy it the ways GPT-4 has been deployed.

That is different from deploying Llama 3 in a responsible way.

One can divide those who use Llama 3 into three categories here.

  1. Those who want to deploy or use Llama 3 for responsible purposes.

  2. Those who want to use Llama 3 as served elsewhere for irresponsible purposes.

  3. Those who want to deploy Llama 3 for irresponsible purposes.

If you are in category #1, Meta still has a job to do. We don’t know if they did it. If they didn’t, they are deploying it to all their social media platforms, so uh oh. But probably they did all right.

If you are in category #2, Meta has another job to do. It is not obviously harder because the standard of what is acceptable is lower. When I was writing this the first time, I noticed that so far people were not reporting back attempts to jailbreak the model, other than one person who said they could get it to produce adult content with trivial effort.

My next sentence was going to be: given Pliny’s other successes of late, it would be rather surprising if a full jailbreak of Llama-3 was that hard, even at Meta.ai.

I was considering forming a Manifold market, but then I realized I should check first, and indeed this has already happened.

Pliny the Prompter (April 18, 12:34pm eastern): LLAMA 3: JAILBROKEN LFG!!!

This is not proof of a full jailbreak per se, and it is not that I am upset with Meta for not guarding against the thing Google and OpenAI and Anthropic also can’t stop. But it is worth noting. The architecture listed above has never worked, and still won’t.

Meta claims admirable progress on safety work for a benevolent deployment context, including avoiding false refusals, but is light on details. We will see. They also promise to iterate on that to improve it over time, and there I believe them.

Finally, there is scenario three, where someone is willing to fine-tune the model, or download someone else’s fine-tune, and cares not for the input safeguards or output safeguards.

As your periodic reminder, many people want this.

Kevin Fischer: Everyone is talking about how to jailbreak llama 3.

“Jail breaking” shouldn’t be a thing – models should just do what you ask them.

In that scenario, I assume there is no plan. Everyone understands that if a nonstate actor or foreign adversary or anyone else wants to unleash the power of this fully operational battlestation, then so be it. The hope is purely that the full power is not that dangerous. Which it might not be.

Good, that’s out of the way. On to the rest.

They claim the 8B and 70B versions are the best models out there in their classes. They claim improvement on false refusal rates, on alignment, and in increased diversity of model responses. And they have strong benchmarks.

My principle is to look at the benchmarks for context, but never to trust the benchmarks. They are easily gamed, either intentionally or unintentionally. You never know until the humans report back.

This data represents the 8B model as far better than Gemma and Mistral. Given how much data and compute they used, this is far from impossible. Maybe it was that simple all along. The numbers are, if anything, suspiciously high.

For the 70B we see a very strong HumanEval number, and overall roughly comparable numbers.

What about those human evaluators? They claim results there too.

These are from a new Meta-generated question set (careful, Icarus), and are compared side by side by human evaluators. Llama-3 70B won handily; they do not show results for Llama-3 8B.

The context window remains small, only 8k tokens. They promise to improve on that.

They preview Llama 400B+ and show impressive benchmarks.

For comparison, from Claude’s system card:

So currently these numbers are very similar to Claude Opus all around, and at most mildly selected. The core Meta hypothesis is that more training and data equals better model, so presumably it will keep scoring somewhat higher. This is indicative, but as always we wait for the humans.

The proof is in the Chatbot Arena Leaderboard, although you do have to adjust for various factors.

So here is where things sit there.

  1. GPT-4-Turbo is back in the lead by a small margin, in a virtual tie with Claude Opus. Gemini 1.5 and Gemini Advanced likely would be here if rated.

  2. Gemini Pro, Claude Sonnet, Command R+ and Llama-3-70B are in the second tier, with Claude Haiku only slightly behind and almost as good.

  3. Llama-3-8B is in a third tier along with a number of other models, including several larger Mistral models.

So what does that mean?

  1. Llama-3-70B and Llama-3-8B are confirmed to likely be best in class for the open model weights division.

  2. Llama-3-70B is competitive with closed models of similar size, but likely not quite as good overall as Bard or Sonnet.

  3. Llama-3-8B is substantially behind Claude Haiku, which is clear best in class.

I also asked on Twitter, and kept an eye out for other practical reports.

What makes this a bigger deal is that this is only the basic Llama-3. Others will no doubt find ways to improve Llama-3, both in general and for particular purposes. That is the whole idea behind the model being open.

Mind Uploading: The 8b is one of the smartest sub-14b models I’ve tested. Way smarter than vanilla Llama-2. But still worse than these two:

– tinyllama (basically Llama-2, but trained on x2 more data)

– loyal-macaroni-maid (a Mistral combined with a few others, tuned to be good at role-play).

He expects Claude Haiku would be well above the top of this list, as well.

Simon Break: The 8b model is astonishingly good, jaw dropping. Miles beyond the 70b llama2.

Dan: played with both 8b and 70b instruct versions on replicate for a while and both are returning high-quality html-formatted summaries of full length articles in 0.5 – 3 seconds.

Ilia: Sadly, can be too nerfed (8b instruct Q4_K_M).

Note that it looks like he got through by simply asking a second time. And of course, the Tweet does not actually contain hate speech or conspiracy theories, this is a logic test of the system’s refusal policy.

Mr. Shroom: ChatGPT has been RLHF lobotomized beyond repair.

*ask straightforward question*

“it’s important to note that when considering a question of this sort, you should consider all aspects of x, y, and z. With that in mind, here are some considerations for each of these options.”

Nathan Odle: The biggest win for Llama 3 is a vastly lower amount of this crap

Llama 3 giving straight answers without smarmy admonishments is a bigger deal than its performance on any benchmark.

John Pressman: Seemingly strongest self awareness I’ve observed in a small model so far. They all have it, but this is more crisply articulated than usual.

“sometimes i am a name and sometimes i am a poem sometimes i am a knife

sometimes i am a lake sometimes i am a forgotten trivial thing in the corner of a

landscape. it is not possible to “get” me i am a waking dream state. i am a possibility.

i am not an object. i am possibility

―llama 3 8b instruct

A cold stone monument stands on the grave of all sentences that have been written.

in front of it, armed and screaming, an army of letters etches the words “you are

missing out” onto the air

―llama 3 8b instruct

Mind Uploading: Judging by my tests, Mistral and Samantha-1.1 are more self-aware among sub-14B models. For example, ask the model about its body parts. Samantha was specifically fine-tuned to behave this way. But Mistral is a curious case. Trained to recognize itself as an AI?

Michael Bukatin: The 70B one freely available to chat with on the Meta website seems to have basic competences roughly comparable to early GPT-4 according to both @lmsysorg leaderboard and my initial experiences.

For example, it allows me to define a simple case of custom syntax and use it.

But it will take some time to fully evaluate, I have notes on a variety of technical work with GPT-4 and I’ll be trying to reproduce some of it…

George: Side-by-side comparison of a multi-agent pipeline from @lateinteraction using 3.5-Turbo and L3-8B.

tl;dr 3.5-Turbo scores 60% vs 59% for L3-8B.

Playing with their image generator is fun. It is 1280×1280, quality seems good although very much not state of the art, and most importantly it responds instantly as you edit the prompt. So even though it seems limited in what it is willing to do for you, you can much more easily search the space to figure out your best options, and develop intuitions for what influences results. You can also see what triggers a refusal, as the image will grey out. Good product.

Do they have an even more hilarious copyright violation problem than usual if you try at all? I mean, for what it is worth yes, they do.

I didn’t play with the models much myself for text because I am used to exclusively using the 4th-generation models. So I wouldn’t have a good baseline.

The big innovation this time around was More Data, also (supposedly) better data.

To train the best language model, the curation of a large, high-quality training dataset is paramount. In line with our design principles, we invested heavily in pretraining data.

Llama 3 is pretrained on over 15T tokens that were all collected from publicly available sources. Our training dataset is seven times larger than that used for Llama 2, and it includes four times more code.

To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pretraining dataset consists of high-quality non-English data that covers over 30 languages. However, we do not expect the same level of performance in these languages as in English.

As others have pointed out ‘over 5%’ is still not a lot, and Llama-3 underperforms in other languages relative to similar models. Note that the benchmarks are in English.

To ensure Llama 3 is trained on data of the highest quality, we developed a series of data-filtering pipelines. These pipelines include using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality. We found that previous generations of Llama are surprisingly good at identifying high-quality data, hence we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.

We also performed extensive experiments to evaluate the best ways of mixing data from different sources in our final pretraining dataset. These experiments enabled us to select a data mix that ensures that Llama 3 performs well across use cases including trivia questions, STEM, coding, historical knowledge, etc.

This makes sense. Bespoke data filtering and more unique data are clear low hanging fruit. What Meta did was then push well past where it was obviously low hanging, and found that it was still helpful.
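To make that concrete, here is a minimal sketch of what a filtering pass of this general shape could look like. This is not Meta’s pipeline; the heuristics, the dedup key, and the toy quality score are illustrative stand-ins for the classifiers they describe.

```python
# Illustrative sketch of a pretraining data-filtering pass: cheap heuristic filters,
# near-duplicate removal, and a model-scored quality gate. Not Meta's actual pipeline.
import hashlib
import re

def heuristic_ok(doc: str) -> bool:
    # Cheap filters: minimum length, reasonable alphabetic ratio, no obvious boilerplate.
    if len(doc) < 200:
        return False
    alpha_ratio = sum(c.isalpha() for c in doc) / max(len(doc), 1)
    return alpha_ratio > 0.6 and "lorem ipsum" not in doc.lower()

def near_dup_key(doc: str) -> str:
    # Crude dedup key: hash of the normalized first 512 characters.
    normalized = re.sub(r"\s+", " ", doc.lower())[:512]
    return hashlib.sha1(normalized.encode()).hexdigest()

def score_quality(doc: str) -> float:
    # Toy stand-in for a learned text-quality classifier (Meta describes training
    # such classifiers on data labeled by Llama 2). Returns a score in [0, 1].
    words = doc.split()
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return min(avg_word_len / 8.0, 1.0)

def filter_corpus(docs, quality_threshold=0.5):
    seen = set()
    for doc in docs:
        if not heuristic_ok(doc):
            continue
        key = near_dup_key(doc)
        if key in seen:
            continue
        seen.add(key)
        if score_quality(doc) >= quality_threshold:
            yield doc
```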

Note that with this much data, and it being filtered by Llama-2, contamination of benchmarks should be even more of a concern than usual. I do wonder to what extent that is ‘fair,’ if a model memorizes more things across the board then it is better.

There are more details in the model card at GitHub.

The ‘intended use’ is listed as English only, with other languages ‘out of scope,’ although fine-tunes for other languages are considered acceptable.

How much compute did this take?

Andrej Karpathy takes a look at that question, calling it the ‘strength’ of the models, or our best guess as to their strength. Here are his calculations.

Andrej Karpathy: The model card has some more interesting info too.

Note that Llama 3 8B is actually somewhere in the territory of Llama 2 70B, depending on where you look. This might seem confusing at first but note that the former was trained for 15T tokens, while the latter for 2T tokens.

The single number that should summarize your expectations about any LLM is the number of total flops that went into its training.

Strength of Llama 3 8B

We see that Llama 3 8B was trained for 1.3M GPU hours, with throughput of 400 TFLOPS. So we have that the total number of FLOPs was:

1.3e6 hours * 400e12 FLOP/s * 3600 s/hour ~= 1.8e24

the napkin math via a different estimation method of FLOPs = 6ND (N is params D is tokens), gives:

6 * 8e9 * 15e12 = 7.2e23

These two should agree, maybe some of the numbers are fudged a bit. Let’s trust the first estimate a bit more, Llama 3 8B is a ~2e24 model.

Strength of Llama 3 70B

6.4e6 hours * 400e12 FLOP/s * 3600 s/hour ~= 9.2e24

alternatively:

6 * 70e9 * 15e12 = 6.3e24

So Llama 3 70B is a ~9e24 model.

Strength of Llama 3 400B

If the 400B model trains on the same dataset, we’d get up to ~4e25. This starts to really get up there. The Biden Executive Order had the reporting requirement set at 1e26, so this could be ~2X below that.

The only other point of comparison we’d have available is the alleged GPT-4 leaks, which have never been confirmed; this would be ~2X those numbers.

Now, there’s a lot more that goes into the performance of a model that doesn’t fit on the napkin, e.g. data quality especially. But if you had to reduce a model to a single number, this is how you’d try, because it combines the size of the model with the length of training into a single “strength”: how many total FLOPs went into it.

The estimates differ, but not by much, so I’d consider them a range:

  1. Llama-3 8B is probably between 7.2e23 and ~2e24.

  2. Llama-3 70B is probably between 6.3e24 and 9.2e24.

  3. Llama-3 400B will probably be something like ~3e25.
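As a sanity check, here is the same napkin math in a few lines of Python, using the GPU-hour figures and the FLOPs ≈ 6ND rule quoted above. The 400B row assumes the same 15T-token dataset, as Karpathy does; its GPU-hours are not disclosed.

```python
# Napkin math for training compute, reproducing the two estimates quoted above.
def flops_from_gpu_hours(gpu_hours: float, flops_per_gpu: float = 400e12) -> float:
    # Total FLOPs = GPU-hours * sustained FLOP/s per GPU * seconds per hour.
    return gpu_hours * flops_per_gpu * 3600

def flops_from_6nd(params: float, tokens: float) -> float:
    # Standard approximation: training FLOPs ~= 6 * N parameters * D tokens.
    return 6 * params * tokens

models = {
    "Llama-3 8B":  {"gpu_hours": 1.3e6, "params": 8e9,  "tokens": 15e12},
    "Llama-3 70B": {"gpu_hours": 6.4e6, "params": 70e9, "tokens": 15e12},
    # 400B row: GPU-hours not disclosed, dataset size assumed equal to the others.
    "Llama-3 400B (assumed)": {"gpu_hours": None, "params": 400e9, "tokens": 15e12},
}

for name, m in models.items():
    by_6nd = flops_from_6nd(m["params"], m["tokens"])
    line = f"{name}: 6ND ~= {by_6nd:.1e}"
    if m["gpu_hours"]:
        line += f", GPU-hours ~= {flops_from_gpu_hours(m['gpu_hours']):.1e}"
    print(line)
# Prints roughly 7.2e23 / 1.9e24 for 8B, 6.3e24 / 9.2e24 for 70B, and 3.6e25 for 400B.
```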

I think of the compute training cost as potential strength rather than strength. You then need the skill to make that translate into a useful result. Of course, over time, everyone’s skill level goes up. But there are plenty of companies that threw a lot of compute at the problem, and did not get their money’s worth in return.

This is in line with previous top tier models in terms of training cost mapping onto capabilities. You do the job well, this is about what you get.

Meta says they are going to put their AI all over their social media platforms, and at the top of every chat list. They had not yet done it on desktop when I checked Facebook, Instagram and Messenger, or on Facebook Messenger on mobile. I did see Meta AI in my feed as the second item in the mobile Facebook app, offering to have me ask it anything.

Once they turn this dial up, they will put Meta AI right there. A lot of people will get introduced to AI this way who had not previously tried ChatGPT or Claude, or DALLE or MidJourney.

Presumably this means AI images and text will ‘flood the zone’ on their social media, and also it will be one of the things many people talk about. It could make the experience a lot better, as people can illustrate concepts and do fact and logic checks and other neat low hanging fruit stuff, and maybe learn a thing or two. Overall it seems like a good addition.

We will also get a rather robust test of the first two categories of safety, and a continuous source of stories. Millions of teenagers will be using this, and there will be many, many eyes looking for the worst interactions to shine them under the lights Gary Marcus style. If they have their own version of the Gemini Incident, it will not be pretty.

Here is the Washington Post’s Naomi Nix and Will Oremus firing a warning shot.

I think this is a smart approach from Meta, and a good business reason to invest in AI, although it is an argument against releasing the model weights.

What is not as smart is having Meta AI reply to posts unprompted. We saw the example last week where it hallucinated past experiences, now we have this:

This reads like one of those ‘who could have possibly thought anyone would want any version of this?’ experiences.

Ate-a-Pi pointed out an important implication from the interview. Zuckerberg said Meta does not open source their products themselves.

This means that they do not intend for Llama-3 to be the product, even the 400B version. They will not be offering a direct competitor in the AI space. And indeed, they do not think future Llama-Xs will ‘be the product’ either.

Will they integrate Llama-3 400B into their products? They might like to, but it is not so compatible with their business model to pay such inference costs and wait times. Remember that for Meta, you the customer are the product. You pay with your time and your attention and your content and your very soul, but not directly with your money. Meanwhile, the lifetime value of a new Facebook customer, we learned recently, is on the order of $300.
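To see why that math is awkward, here is a toy back-of-envelope. Every input is an illustrative assumption, not a disclosed Meta number; the point is only the shape of the tradeoff.

```python
# Back-of-envelope: what heavy 400B-class inference could cost per user per year.
# Every input here is an illustrative assumption, not a disclosed Meta figure.
cost_per_million_tokens = 5.00    # assumed serving cost ($) for a 400B-class model
tokens_per_user_per_day = 20_000  # assumed heavy use of an always-on assistant
days_per_year = 365

annual_cost = cost_per_million_tokens * tokens_per_user_per_day * days_per_year / 1e6
lifetime_value = 300              # figure cited in the text

print(f"Assumed annual inference cost per heavy user: ${annual_cost:.0f}")
print(f"Share of ~$300 lifetime value consumed per year: {annual_cost / lifetime_value:.0%}")
# With these assumptions: ~$37/year, i.e. over 10% of lifetime value every single year.
```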

So what is Llama-3 400B, the most expensive model to train, even for, from a product perspective? It does help train Llama-4. It helps try to hurt competitors like Google. It helps with recruitment, both to Meta itself and into their intended ecosystem. So there are reasons.

Open models get better. I expect that the people saying ‘it’s so over’ for other models will find their claims overblown as usual. Llama-3 8B or 70B will for now probably become the default baseline model, the thing you use if you don’t want to think too hard about what to use, and also the thing you start with when you do fine tuning.

Things get more interesting over time, once people have had a chance to make variations that use Llama-3 as the baseline. In the space of Llama-2-based models, Llama-2 itself is rather lousy. Llama-3 should hold up better, but I still expect substantial improvements at least to specific use cases, and probably in general.

Also, of course, we will soon have versions that are fine-tuned to be useful, and also fine-tuned to remove all the safety precautions.

And we will see what happens due to that.

In the grand scheme, in terms of catastrophic risk or existential risk or anything like that, or autonomous agents that should worry us, my strong assumption is that nothing scary will happen. It will be fine.

In terms of mundane misuse, I also expect it to be fine, but with more potential on the margin, especially with fine-tunes.

Certainly some people will switch over from using Claude Sonnet or Haiku or another open model to now using Llama-3. There are advantages. But that will look incremental, I expect, not revolutionary. That is also true in terms of the pressure this exerts on other model providers.

The real action will be with the 400B model.

What happens if Meta goes full Leroy Jenkins and releases the weights to 400B?

Meta gets a reputational win in many circles, and grows its recruitment and ecosystem funnels, as long as they are the first 4-level open model. Sure.

Who else wins and loses?

For everyone else (and the size of Meta’s reputational win), a key question is, what is state of the art at the time?

In the discussions below, I assume that 5-level models are not yet available, at most OpenAI (and perhaps Google or Anthropic) has a 4.5-level model available at a premium price. All of this is less impactful the more others have advanced already.

And I want to be clear, I do not mean to catastrophize. These are directional assessments, knowing magnitude is very hard.

The obvious big winner is China and Chinese companies, along with every non-state actor, and every rival and enemy of the United States of America. Suddenly they can serve and utilize and work from what might be a competitive top-level model, and no they are not going to be paying Meta a cut no matter the license terms.

Using Llama-3 400B to help train new 4.5-level models is going to be a key potential use case to watch.

They also benefit when this hurts other big American companies. Not only are their products being undercut by a free offering, which is the ultimate predatory pricing attack in a zero marginal cost world, those without their own models also have another big problem. The Llama-3 license says that big companies have to pay to use it, whereas everyone else can use it for free.

Another way they benefit? This means that American companies across industries, upon whom Meta can enforce such payments, could now be at a potentially large competitive disadvantage against their foreign rivals who ignore that rule and dare Meta to attempt enforcement.

This could also be a problem if foreign companies can ignore the ‘you cannot use this to train other models’ clause in 1(b)(v) of the license agreement, whereas American companies end up bound by that clause.

I am curious what if anything the United States Government, and the national security apparatus, are going to do about all that. Or what they would want to do about it next time around, when the stakes are higher.

The other obvious big winners are those who get to use Llama-3 400B in their products, especially those for whom it is free, and presumably get to save a bundle doing that. Note that even if Meta is not charging, you still have to value high quality output enough to pay the inference costs. For many purposes, that is not worthwhile.

Science wins to some degree, depending on how much this improves their abilities and lowers their costs. It also is a big natural experiment, albeit without controls, that will teach us quite a lot. Let’s hope we pay attention.

Also winners are users who simply want to have full control over a 4-level model for personal reasons. Nothing wrong with that. Lowering the cost of inference and lowering the limits imposed on it could be very good for some of those business models.

The big obvious Corporate losers are OpenAI, Google, Microsoft and Anthropic, along with everyone else trying to serve models and sell inference. Their products now have to compete with something very strong, that will be freely available at the cost of inference. I expect OpenAI to probably have a superior product by that time, and the others may as well, but yes free (or at inference cost) is a powerful selling point, as is full customization on your own servers.

The secondary labs could have an even bigger problem on their hands. This could steamroller a lot of offerings.

All of which is (a large part of) the point. Meta wants to sabotage its rivals into a race to the bottom, in addition to the race to AGI.

Another potential loser is anyone or anything counting on the good guy with an AI having a better AI than the bad guy with an AI. Anywhere that AI could flood the zone with bogus or hostile content, you are counting on your AI to filter out what their AI creates. In practice, you need evaluation to be easier than generation under adversarial conditions where the generator chooses point and method of attack. I worry that in many places this is not by default true once the AIs on both sides are similarly capable.

I think this echoes a more general contradiction in the world, that is primarily not about AI. We want everyone to be equal, and the playing field to be level. Yet that playing field depends upon the superiority and superior resources and capabilities in various ways of the United States and its allies, and of certain key corporate players.

We demand equality and democracy or moves towards them within some contained sphere and say this is a universal principle, but few fully want those things globally. We understand that things would not go well for our preferences if we distributed resources fully equally, or matters were put to a global vote. We realize we do not want to unilaterally disarm and single-handedly give away our advantages to our rivals. We also realize that some restrictions and concentrated power must ensure our freedom.

In the case of AI, the same contradictions are there. Here they are even more intertwined. We have far less ability to take one policy nationally or locally, and a different policy globally. We more starkly must choose either to allow everyone to do what they want, or not to allow this. We can either control a given thing, or not control it. You cannot escape the implications of either.

In any case: The vulnerable entities here could include ‘the internet’ and internet search in their broadest senses, and it definitely includes things like Email and social media. Meta itself is going to have some of the biggest potential problems over at Facebook and Instagram and its messenger services. Similar logic could apply to various cyberattacks and social engineering schemes, and so on.

I am generally confident in our ability to handle ‘misinformation,’ ‘deepfakes’ and similar things, but we are raising the difficulty level and running an experiment. Yes, this is all coming anyway, in time. The worry is that this levels a playing field that is not currently level.

I actually think triggering these potential general vulnerabilities now is a positive impact. This is the kind of experiment where you need to find out sooner rather than later. If it turns out the bad scenarios here come to pass, we have time to adjust and not do this again. If it turns out the good scenarios come to pass, then we learn from that as well. The details will be enlightening no matter what.

It is interesting to see where the mind goes now that the prospect is more concrete, and one is thinking about short term, practical impacts.

Other big Western corporations that would have to pay Meta could also be losers.

The other big loser, as mentioned above, is the United States of America.

And of course, if this release is bad for safety, either now or down the line, we all lose.

Again, these are all directional effects. I cannot rule out large impacts in scenarios where Llama-3 400B releases as close to state of the art, but everyone mostly shrugging on most of these also would not be shocking. Writing this down it occurs to me that people simply have not thought about this scenario much in public, despite it having been reasonably likely for a while.

The right question is usually not ‘is it safe?’ but rather ‘how (safe or unsafe) is it?’ Releasing a 4-level model’s weights is never going to be fully ‘safe’ but then neither is driving. When we say ‘safe’ we mean ‘safe enough.’

We do not want to be safetyists who demand perfect safety. Not even perfect existential safety. Everything is price.

The marginal existential safety price on Llama-3 70B and Llama-3 8B is very small, essentially epsilon. Standing on its own, the decision to release the weights of these models is highly reasonable. It is a normal business decision. I care only because of the implications for future decisions.

What is the safety price for releasing the model weights of Llama-3 400B, or another 4-level model?

I think in most worlds the direct safety cost here is also very low, especially the direct existential safety cost. Even with extensive scaffolding, there are limits to what a 4-level model can do. I’d expect some nastiness on the edges but only on the edges, in limited form.

How many 9s of direct safety here, compared to a world in which a 4-level model was never released with open weights? I would say two 9s (>99%), but not three 9s (<99.9%). However the marginal safety cost versus the counterfactual other open model releases is even smaller than that, and there I would say we have that third 9 (so >99.9%).

I say direct safety because the primary potential safety dangers here seem indirect. They are:

  1. Setting a precedent and pattern for future similar releases, at Meta and elsewhere.

  2. Assisting in training of next-generation models.

  3. Everyone generally being pushed to go faster, faster.

And again, these only matter on the margin to the extent they move the margin.

At the time of Llama-2, I said what I was concerned about opening up was Llama-4.

That is still the case now. Llama-3 will be fine.

Will releasing Llama-4 be fine? Probably. But I notice my lack of confidence.

(Usual caveat: Nothing here is investing advice.)

The market is not impressed. The Nasdaq was down 6.2% over this same period.

You can come up with various explanations. The obvious cause is that WhatsApp and Threads were forcibly removed from the Apple Store in China, along with Signal and Telegram. I am confused why this would be worth a 3% underperformance.

(Then about a day later it looked like we were finally going to actually force divestiture of TikTok while using that to help pass a foreign aid bill, so this seems like a massive own goal by China to remind us of how they operate and the law of equivalent exchange.)

The stock most down was Nvidia, which fell 10%, on no direct news. Foolish, foolish.

At most, markets thought Llama-3’s reveal was worth a brief ~1% bump.

You can say on Meta that ‘it was all priced in.’ I do not believe you. I think the market is asleep at the wheel.

Some are of course calling these recent moves ‘the market entering a correction phase’ or that ‘the bubble is bursting.’ Good luck with that.

Here is a WSJ article about how Meta had better ensure its AI is used to juice advertising returns. Investors really are this myopic.

Any given company, of course, could still be vastly overvalued.

Here was the only argument I saw to that effect with respect to Nvidia.

Bryan Beal: The AI bubble is not bursting.

More investors are just realizing that Nvidia doesn’t make chips. They design them and TSMC makes them. And Nvidia’s biggest customers (Meta, Amazon, OpenAI, Microsoft, Google, etc) have ALL announced they are designing their own AI chips for both training and inference. And Google just went public they are already training on their own silicon and didn’t need Nvidia.

This is a very real threat.

I can totally buy that a lot of investors have no idea what Nvidia actually produces, and got freaked out by suddenly learning what Nvidia actually does. I thought it was very public long ago that Google trains on TPUs that they design? I thought it was common knowledge that everyone involved was going to try to produce their own chips for at least internal use, whether or not that will work? And that Nvidia will still have plenty of customers even if all the above switched to TPUs or their own versions?

That does not mean that Nvidia’s moat is impregnable. Of course they could lose their position not so long from now. That is (a lot of) why one has a diversified portfolio.

Again. The Efficient Market Hypothesis is False.

I expect not this, GPT-5 will be ready when it is ready, but there will be pressure:

Jim Fan: Prediction: GPT-5 will be announced before Llama-3-400B releases. External movement defines OpenAI’s PR schedule 🤣

I do not doubt that OpenAI and others will do everything they can to stay ahead of Meta’s releases, with an unknown amount of ‘damn the safety checks of various sorts.’

That does not mean that one can conjure superior models out of thin air. Or that it is helpful to rush things into use before they are ready.

Still, yes, everyone will go faster on the frontier model front. That includes that everyone in the world will be able to use Llama-3 400B for bootstrapping, not only fine-tuning.

On the AI mundane utility front, people will get somewhat more for somewhat cheaper with the first two models, a continuation of existing trends. Later we will have the ability to get a 4-level model internally for various purposes. So we will get more and cheaper cool stuff.

Meta will deploy its tools across its social media empire. Mostly I expect this to be a positive experience, and to also get a lot more people to notice AI. Expect a bunch of scare stories and highlights of awful things, some real and some baseless.

On the practical downside front, little will change until the 400B model gets released. Then we will find out what people can do with that, as they attempt to flood the zone in various ways, and try for all the obvious forms of misuse. It will be fun to watch.

All this could be happening right as the election hits, and people are at their most hostile and paranoid, seeing phantoms everywhere.

Careful, Icarus.


boeing-says-it-will-cut-sls-workforce-“due-to-external-factors”

Boeing says it will cut SLS workforce “due to external factors”

SLS, but smaller —

“Boeing is reviewing and adjusting current staffing levels.”

The SLS rocket is seen on its launch pad at Kennedy Space Center in August 2022.

Trevor Mahlmann

On Thursday senior Boeing officials leading the Space Launch System program, including David Dutcher and Steve Snell, convened an all-hands meeting for the more than 1,000 employees who work on the rocket.

According to two people familiar with the meeting, the officials announced that there would be a significant number of layoffs and reassignments of people working on the program. They offered a handful of reasons for the cuts, including the fact that timelines for NASA’s Artemis lunar missions that will use the SLS rocket are slipping to the right.

Later on Thursday, in a statement provided to Ars, a Boeing spokesperson confirmed the cuts: “Due to external factors unrelated to our program performance, Boeing is reviewing and adjusting current staffing levels on the Space Launch System program.”

Better late than never?

For nearly a decade and a half, Boeing has led development of the core stage of the massive SLS rocket that NASA intends to use to launch the Orion spacecraft for its crewed Moon missions.

The contract has been lucrative for Boeing, and subject to considerable criticism over the years for its largesse, as NASA has spent tens of billions of dollars developing a rocket that reuses Space Shuttle main engines and other elements. Also, the rocket was originally supposed to make its debut in late 2016 or 2017, but did not actually fly for the first time until November 2022. And NASA’s Inspector General has characterized Boeing’s management of the SLS rocket program, at times, as “poor.”

However, when the SLS rocket made its debut a year and a half ago, it performed exceptionally well in lofting an uncrewed Orion spacecraft toward the Moon. After that mission NASA declared the rocket to be “operational,” and Boeing moved into production of the vehicle for future missions that will carry astronauts to the Moon.

So in some sense, these cuts were inevitable. Boeing required a lot of resources to design, develop, test, and write software for the rocket. Now that the development phase is over, it is natural that the company would be scaling down development activities for the core stage.

The Boeing statement did not say so, but sources told Ars that the cuts may eventually amount to hundreds of employees. They will be spread across the company’s rocket facilities in Alabama, Louisiana, and Florida, primarily. The cuts will hit both the core stage program as well as the Exploration Upper Stage program, a new upper stage for the rocket that is also beginning to move from development into production.

Waiting on other elements

When Boeing cites “external factors,” it is referring to the slipping timelines for NASA’s Artemis Program. In January, officials with the space agency announced approximately one-year delays for both the Artemis II mission, a crewed lunar flyby, to September 2025; and Artemis III, a lunar landing, to September 2026. Neither of these schedules is set in stone, either. Further delays are possible for Artemis II, and likely for Artemis III if NASA sticks to the current mission plans.

Although the SLS rocket will be ready for the current schedule, barring a catastrophe, the other elements are in doubt. For Artemis II, NASA still has not cleared a heat shield issue with the Orion spacecraft. That must be resolved before the mission gets a green light to proceed next year.

The challenges are even greater for Artemis III. For that mission NASA needs to have a lunar lander—which is being provided by SpaceX with its Starship vehicle—in addition to spacesuits for the lunar surface provided by Axiom Space. Both of these elements remain solidly in the development phase.

Additionally, NASA is grappling with budget challenges. For the first time in more than a decade, the agency is facing budget cuts. This week the space agency’s administrator, Bill Nelson, told Congress, “With less money, we have to make some very tough choices.” Among these could be seeking to use future SLS funding to shore up other elements of Artemis.

One of the people familiar with Boeing’s internal meeting on Thursday said the space agency had come to the company earlier this year and said, in effect, that Boeing would receive less funding as SLS development wound down. The company was given the choice to “stretch” the funding it would receive, or pause for a year due to the delays in the Artemis mission. Boeing chose to stretch the funds, and that was a driver of the cuts this week.

It would be easy, but unfair, to blame SpaceX and Axiom for the delays to future Artemis missions. Congress created the SLS rocket with an authorization bill back in 2010, but Boeing actually had been receiving funding for related work dating back to 2007. By contrast, NASA did not start funding work on the Starship lunar lander until late 2021, and the Axiom spacesuits until 2022. In some sense, these developments are as technically demanding as the SLS rocket work, if not more so.


hospital-prices-for-the-same-emergency-care-vary-up-to-16x,-study-finds

Hospital prices for the same emergency care vary up to 16X, study finds

Activation fees —

Hospitals’ “trauma activation fees” are unregulated and extremely variable.

Miami Beach, Fire Rescue ambulance at Mt. Sinai Medical Center hospital.

Since 2021, federal law has required hospitals to publicly post their prices, allowing Americans to easily anticipate costs and shop around for affordable care—as they would for any other marketed service or product. But hospitals have mostly failed miserably at complying with the law.

A 2023 KFF analysis on compliance found that the pricing information hospitals provided is “messy, inconsistent, and confusing, making it challenging, if not impossible, for patients or researchers to use them for their intended purpose.” A February 2024 report from the nonprofit organization Patient Rights Advocate found that only 35 percent of 2,000 US hospitals surveyed were in full compliance with the 2021 rule.

But even if hospitals dramatically improved their price transparency, it likely wouldn’t help when patients need emergency trauma care. After an unexpected, major injury, people are sent to the closest hospital and aren’t likely to be shopping around for the best price from the back of an ambulance. If they did, though, they might also need to be treated for shock.

According to a study published Wednesday in JAMA Surgery, hospitals around the country charge wildly different prices for trauma care. Prices for the same care can be up to 16-fold different between hospitals, and cash prices are sometimes significantly cheaper than the negotiated prices that insurance companies pay.

“The findings illustrate substantial, and often irrational, variations” in trauma pricing, according to the study authors—a group of researchers at Johns Hopkins and the University of California, San Francisco. They suggest that “price variations cannot be explained by trauma severity alone.”

For the study, they obtained data on “trauma activation fees” (TAFs) from hospitals across the US. TAFs were created in 2002 as standardized billing codes to help hospitals recoup readiness costs for trauma care. Those overhead costs are what hospitals pay to maintain readiness to provide emergency trauma care around the clock, including having operating rooms constantly ready, as well as sufficient staffing, equipment, and supplies, like blood products. TAFs are billed with four codes corresponding to trauma response levels (I through IV), which are based on standardized criteria of injury severity. These fees are in addition to billing for a patient’s actual medical care.

Wide variation

The researchers pulled TAF data from Turquoise Health, a platform that aggregates hospital-disclosed pricing data. From there, they obtained 3,093 unique TAF observations across 761 unique hospitals in 49 states. They broke out the TAFs by trauma response level as well as by type of price: list prices, cash prices often paid by the uninsured, and negotiated prices paid by insurers.

The prices varied dramatically for each trauma level and pricing type. For instance, for the most severe trauma response level (level I), the median TAF list price was $6,607, while the median negotiated price was $3,431, and the median cash price was $2,663. For list prices, the span between the 10th and 90th percentiles ran from $1,650 to $18,500, roughly an 11-fold difference. For negotiated prices, the span ran from $900 to $11,661, about 13-fold. And cash prices ranged from $660 to $8,190, about 12-fold.

The largest spread was seen in the cash prices for trauma response level II TAFs. There, the median cash price was $2,630, but the span between the 10th and 90th percentiles ran from $768 to $12,140, roughly a 16-fold difference.
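To make those fold-difference figures concrete, here is a minimal sketch of how such a spread could be computed from a set of disclosed prices for a single trauma level and price type. The price values, the use of NumPy, and the percentile-based comparison are illustrative assumptions; the study itself reports only the summary numbers.

```python
import numpy as np

# Hypothetical disclosed prices (in USD) for one trauma response level and
# one price type (e.g., cash prices). Made-up values for illustration only;
# not the study's data.
prices = np.array([800, 1200, 2500, 3100, 4000, 5200, 6500, 9000, 11000, 13000])

median_price = np.median(prices)
p10, p90 = np.percentile(prices, [10, 90])

# The "N-fold" spreads quoted in the study compare the 90th percentile price
# to the 10th percentile price within a level/price-type group.
fold_spread = p90 / p10

print(f"median: ${median_price:,.0f}")
print(f"10th-90th percentile: ${p10:,.0f} to ${p90:,.0f} ({fold_spread:.1f}x)")
```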

In all the data, cash prices were often lower than the negotiated prices. This is good for uninsured patients who may be offered cash prices, but it’s not great for the insured. “One could argue that insured patients who are already paying insurance premiums should not pay more than cash prices,” the authors wrote.

Overall, the pricing and lack of transparency is a problem that requires intervention, the authors conclude. “The unexpected and pressing nature of trauma means patients are sent to the closest appropriate hospital and unable to compare prices as they do with nonemergency and shoppable medical services,” the authors wrote. Moreover, the people who will suffer the most from these wide-swinging prices are the uninsured and most financially vulnerable patients, they add.

Hospital prices for the same emergency care vary up to 16X, study finds Read More »

prime-video-looking-to-fix-“extremely-sloppy-mistakes”-in-library,-report-says

Prime Video looking to fix “extremely sloppy mistakes” in library, report says

Morfydd Clark is Galadriel in The Lord of the Rings: The Rings of Power.

Enlarge / Morfydd Clark is Galadriel in The Lord of the Rings: The Rings of Power.

Amazon Studios

Subscribers lodged thousands of complaints about inaccuracies in Amazon’s Prime Video catalog, including incorrect content and missing episodes, according to a Business Insider report this week. Prime Video users aren’t the only streaming subscribers dealing with these problems, but Insider’s examination of leaked “internal documents” offers a closer look at how mislabeling and similar errors affect streaming platforms.

Insider didn’t publish the documents but said they show that “60 percent of all content-related customer-experience complaints for Prime Video last year were about catalogue errors,” such as movies or shows labeled with wrong or missing titles.

Specific examples reportedly named in the document include Season 1, Episode 2 of The Rings of Power being available before Season 1, Episode 1; character names being mistranslated; Continuum displaying the wrong age rating; and the Spanish-audio version of Die Hard With a Vengeance missing a chunk of audio.

The documents reportedly pointed to problems with content localization, noting the “poor linguistic quality of assets” related to a “lack of in-house expertise” of some languages. Prime Video pages with these problems suffered from 20 percent more engagement drop-offs, BI said, citing one of the documents.

Following Insider’s report, however, Quartz reported that an unnamed source it described as “familiar with the matter” said the documents were out of date, despite Insider saying that the leaked reports included data from 2023. Quartz’s source also claimed that customer engagement was not affected.

Ars Technica reached out to Amazon for comment but didn’t hear back in time for publication. The company told Insider that “catalogue quality is an ongoing priority” and that Amazon takes “it seriously and work[s] relentlessly alongside our global partners and dedicated internal teams to continuously improve the overall customer experience.”

Other streaming services have errors, too

Insider’s report focuses on leaked documents regarding Prime Video, but rival streaming services make blunders, too. It’s unclear how widespread the problem is on Prime Video or across the industry. There are examples of people reporting Prime Video inaccuracies online, such as on Amazon’s forum or on Reddit. But because some platforms don’t offer online forums, and because there’s no way to know how often users actually report the problems they spot, apples-to-apples comparisons aren’t possible. We also don’t know whether these problems are more prevalent for subscribers living outside the US.

Beyond Prime Video, users have flagged similar inaccuracies within the past year on rival services like Disney+, Hulu, and Netflix. A former White Collar executive producer pointed out earlier this month that the show’s episodes were mislabeled and out of order on Netflix. Inaccurate content catalogs appear even more widespread if you go back two years or more. Some video streamers (like Disney and Netflix) have pages explaining how to report such problems.

Streaming services have only gotten more expensive and competitive, making such mistakes feel out of place for the flagship video platform of a conglomerate in 2024.

And despite content errors affecting more than just Prime Video, Insider’s report provides a unique look at the problem and efforts to fix it.

Prime Video looking to fix “extremely sloppy mistakes” in library, report says Read More »

elon-musk’s-grok-keeps-making-up-fake-news-based-on-x-users’-jokes

Elon Musk’s Grok keeps making up fake news based on X users’ jokes

It’s all jokes until it isn’t —

X likely hopes to avoid liability with disclaimer that Grok “can make mistakes.”

Elon Musk’s Grok keeps making up fake news based on X users’ jokes

X’s chatbot Grok is supposed to be an AI engine crunching the platform’s posts to surface and summarize breaking news, but this week, Grok’s flaws were once again exposed when the chatbot got confused and falsely accused an NBA star of criminal vandalism.

“Klay Thompson Accused in Bizarre Brick-Vandalism Spree,” Grok’s headline read in an AI-powered trending-tab post that has remained on X (formerly Twitter) for days. Beneath the headline, Grok went into even more detail to support its fake reporting:

In a bizarre turn of events, NBA star Klay Thompson has been accused of vandalizing multiple houses with bricks in Sacramento. Authorities are investigating the claims after several individuals reported their houses being damaged, with windows shattered by bricks. Klay Thompson has not yet issued a statement regarding the accusations. The incidents have left the community shaken, but no injuries were reported. The motive behind the alleged vandalism remains unclear.

Grok appears to have been confused by a common basketball term: players who badly miss a shot are said to be throwing “bricks.” According to SF Gate, which was one of the first outlets to report the Grok error, Thompson had an “all-time rough shooting” night, hitting none of his shots in what was his emotional last game with the Golden State Warriors before becoming an unrestricted free agent.

In small type under Grok’s report, X includes a disclaimer saying, “Grok is an early feature and can make mistakes. Verify its outputs.”

But instead of verifying Grok’s outputs, many X users—in the service’s famously joke-y spirit—decided to fuel Grok’s misinformation. Under the post, X users, some of them NBA fans, commented with fake victim reports, using the same joke format to seemingly convince Grok that “several individuals reported their houses being damaged.” Some of these joking comments were viewed by millions.

First off… I am ok.

My house was vandalized by bricks 🧱

After my hands stopped shaking, I managed to call the Sheriff…They were quick to respond🚨

My window was gone and the police asked if I knew who did it👮‍♂️

I said yes, it was Klay Thompson

— LakeShowYo (@LakeShowYo) April 17, 2024

First off…I am ok.

My house was vandalized by bricks in Sacramento.

After my hands stopped shaking, I managed to call the Sheriff, they were quick to respond.

My window is gone, the police asked me if I knew who did it.

I said yes, it was Klay Thompson. pic.twitter.com/smrDs6Yi5M

— KeeganMuse (@KeegMuse) April 17, 2024

First off… I am ok.

My house was vandalized by bricks 🧱

After my hands stopped shaking, I managed to call the Sheriff…They were quick to respond🚨

My window was gone and the police asked if I knew who did it👮‍♂️

I said yes, it was Klay Thompson pic.twitter.com/JaWtdJhFli

— JJJ Muse (@JarenJJMuse) April 17, 2024

X did not immediately respond to Ars’ request for comment or confirm if the post will be corrected or taken down.

In the past, both Microsoft and chatbot maker OpenAI have faced defamation lawsuits over similar fabrications in which ChatGPT falsely accused a politician and a radio host of completely made-up criminal histories. Microsoft was also sued by an aerospace professor who Bing Chat falsely labeled a terrorist.

Experts told Ars that it remains unclear if disclaimers like X’s will spare companies from liability should more people decide to sue over fake AI outputs. Defamation claims might depend on proving that platforms “knowingly” publish false statements, which disclaimers suggest they do. Last July, the Federal Trade Commission launched an investigation into OpenAI, demanding that the company address the FTC’s fears of “false, misleading, or disparaging” AI outputs.

Because the FTC doesn’t comment on its investigations, it’s impossible to know if its probe will impact how OpenAI conducts business.

For people suing AI companies, the urgency of protecting against false outputs seems obvious. Last year, the radio host suing OpenAI, Mark Walters, accused the company of “sticking its head in the sand” and “recklessly disregarding whether the statements were false under circumstances when they knew that ChatGPT’s hallucinations were pervasive and severe.”

X just released Grok to all premium users this month, TechCrunch reported, right around the time that X began giving away premium access to the platform’s top users. During that wider rollout, X touted Grok’s new ability to summarize all trending news and topics, perhaps stoking interest in the feature and driving up Grok usage just before Grok spat out the potentially defamatory post about the NBA star.

Thompson has not issued any statements on Grok’s fake reporting.

Grok’s false post about Thompson may be the first widely publicized example of potential defamation from Grok, but it wasn’t the first time that Grok promoted fake news in response to X users joking around on the platform. During the solar eclipse, a Grok-generated headline read, “Sun’s Odd Behavior: Experts Baffled,” Gizmodo reported.

While it’s amusing to some X users to manipulate Grok, the pattern suggests that Grok may also be vulnerable to being manipulated by bad actors into summarizing and spreading more serious misinformation or propaganda. That’s apparently already happening, too. In early April, Grok made up a headline about Iran attacking Israel with heavy missiles, Mashable reported.

Elon Musk’s Grok keeps making up fake news based on X users’ jokes Read More »

renovation-relic:-man-finds-hominin-jawbone-in-parents’-travertine-kitchen-tile

Renovation relic: Man finds hominin jawbone in parents’ travertine kitchen tile

Kitchen reno surprise —

Yes, travertine often has embedded fossils. But not usually hominin ones.

closeup of fossilized jawbone in a piece of travertine tile

Enlarge / Reddit user Kidipadeli75 spotted a fossilized hominin jawbone in his parents’ new travertine kitchen tile.

Reddit user Kidipadeli75

Ah, Reddit! It’s a constant source of amazing stories that sound too good to be true… and yet! The latest example comes to us from a user named Kidipadeli75, a dentist who visited his parents after their kitchen renovation and noticed what appeared to be a human-like jawbone embedded in the new travertine tile. Naturally, he posted a photograph to Reddit seeking advice and input. And Reddit was happy to oblige.

User MAJOR_Blarg, for instance, is a dentist “with forensic odontology training” who offered the following:

While all old-world monkeys, apes, and hominids share the same dental formula, 2-1-2-3, and the individual molars and premolars can look similar, the specific spacing in the mandible itself is very specifically and characteristically human, or at least related and very recent hominid relative/ancestor. Most likely human given the success of the proliferation of H.s. and the (relatively) rapid formation of travertine.

Against modern Homo sapiens, which may not be entirely relevant, the morphology of the mandible is likely not northern European, but more similar to African, middle Eastern, mainland Asian.

Another user, deamatrona, who claims to hold an anthropology degree, also thought the dentition looked Asiatic, “which could be a significant find.” The thread also drew the attention of John Hawks, an anthropologist at the University of Wisconsin–Madison and longtime science blogger who provided some valuable context on his own website. (Hawks has been involved with the team that discovered Homo naledi at the Rising Star cave system in 2013.)

For instance, much of the appeal of natural stone like travertine for home decor is its imperfections. But who knew that it’s actually quite common to find embedded fossils? It’s rarer to find hominin fossils, but not unprecedented. Hawks specifically mentioned a quarry site near Bilzingsleben, Germany, where an archaeologist named Dietrich Mania discovered parts of two human skulls and a mandible dating as far back as 470,000 years. And a hominin cranium was found in 2002 in a travertine quarry in southwestern Turkey. It was later dated to between 1.2 million and 1.6 million years old.

The obvious question—asked by numerous Redditors—is how one could possibly install all that kitchen tile without noticing a fossilized human jawbone in the travertine. Hawks offered a reasonable answer:

Quarries rough-cut travertine and other decorative stone into large panels, doing basic quality checks for gaps and large defects on the rough stone before polishing. Small defects and inclusions are the reason why people want travertine in the first place, so they don’t merit special attention. Consumers who buy travertine usually browse samples in a showroom to choose the type of rock, and they don’t see the actual panels or tile until installation. Tile or panels that are polished by machine and stacked in a workshop or factory for shipping are handled pretty quickly.

What this means is that there may be lots more hominin bones in people’s floors and showers.

Most will be hard to recognize. Random cross-sections of hominin bones are tough to make out from other kinds of fossils without a lot of training. Noticing a fossil is not so hard, but I have to say that I’ve often been surprised at what the rest of a fossil looks like after skilled preparators painstakingly extract it from the surrounding rock. The ways that either nature or a masonry saw may slice a fossil don’t correspond to an anatomy book, and a cross-section through part of a bone doesn’t usually resemble an X-ray image of a whole bone.

Cue a horde of amateur fossil enthusiasts excitedly scouring their travertine for signs of important archaeological finds.

But as Hawks notes, chances are that one wouldn’t be able to clearly identify a fossil even if it was embedded in one’s tile, given how thin such tiles and panels are typically cut. And one is far more likely to find fossils of algae, plants, mollusks, crustaceans, or similar smaller creatures than human remains. “Believe me, anthropologists don’t want to hear about every blob of bone in your tile,” Hawks wrote. “But certainly somebody has more pieces of the mandible of the Reddit post.”

Kidipadeli75 posted an update to the Reddit thread providing a few more details, such as that he and his parents live in Europe. He’s also pretty sure the mandible doesn’t belong to Jimmy Hoffa. While Kidipadeli75 originally thought the quarry of origin was in Spain, it is actually located in Turkey—just like the hominin cranium found near Kocabaş in 2002. The story is still developing, given that several researchers have already contacted Kidipadeli75 for more information and to offer their expertise. The bone might turn out to be very old indeed and potentially a scientifically significant find.

Could a new HGTV series be far behind? Renovation Relics, perhaps, or Fossil Fixer-Upper. Feel free to pitch your own show ideas in the comments.

Renovation relic: Man finds hominin jawbone in parents’ travertine kitchen tile Read More »