Author name: Mike M.


How to break free from smart TV ads and tracking


The Ars guide to “dumb” TVs

Sick of smart TVs? Here are your best options.

Credit: Aurich Lawson | Getty Images

Smart TVs can feel like a dumb choice if you’re looking for privacy, reliability, and simplicity.

Today’s TVs and streaming sticks are usually loaded up with advertisements and user tracking, making offline TVs seem very attractive. But ever since smart TV operating systems began making money, “dumb” TVs have been hard to find.

In response, we created this non-smart TV guide that includes much more than dumb TVs. Since non-smart TVs are so rare, this guide also breaks down additional ways to watch TV and movies online and locally without dealing with smart TVs’ evolution toward software-centric features and snooping. We’ll discuss a range of options suitable for various budgets, experience levels, and rooms in your home.


Our best recommendation

This is a dumb TV guide, but first, let’s briefly highlight the best recommendation for most people: Take your TV offline and plug in an Apple TV box.

The Apple TV 4K and Siri Remote. Your best option. Credit: Jeff Dunn

An Apple TV lets you replace smart TV software with Apple’s cleaner tvOS, and it’s more intuitive than using most smart TVs and other streaming devices. Apple’s tvOS usually runs faster and more reliably, and it isn’t riddled with distracting ads or recommendations. And there’s virtually no learning curve for family members or visitors, something that can’t always be said for DIY alternatives.

Critically, Apple TV boxes are also an easy recommendation on the privacy front. The setup process makes it simple for anyone to ensure that the device is using relatively minimal user tracking. You’re likely to use an Apple TV box with the Apple TV app or with an Apple account, which means sending some data to Apple. But Apple has a better reputation for keeping user information in-house, and Apple TV boxes don’t have automatic content recognition (ACR).

For more information, read my previous article on why Apple TVs are privacy advocates’ go-to streaming device.

Unlike with other smart TV alternatives in this guide (such as a laptop), you don’t have to worry about individual streaming services’ requirements for 4K or HDR playback when using an Apple TV. You still have to make sure your display and HDMI cable are HDCP 2.2-compliant and that you’re using HDMI 2.0 or better if you want to watch 4K or HDR content. You can even connect network-attached storage (NAS) to your Apple TV box so you can stream files from the storage device.

Plus, using a smart TV offline means you’ll have access to the latest and greatest display technologies, which is generally not the case for dumb TVs.

Things to keep in mind

One common concern about using smart TVs offline is the fear that the TV will repeatedly nag you to connect to the Internet. I’ve seen some reports of this happening over the years, but generally speaking, this doesn’t seem to be expected behavior. If you can’t find a way to disable TV notifications, try contacting support.

You may want your offline TV to keep LAN access so you can still use some smart TV features, like phone mirroring or streaming from a NAS. In this case, you can use your router (if supported) to block your TV’s IP address from connecting to the Internet.

And Google TV users should remember to set their TV to “basic TV” mode, which lets you use the TV without connecting to the Internet.

Dumb TVs are endangered

Buying a TV that doesn’t connect to the Internet is an obvious solution to avoiding smart TV tracking and ads, but that’s much easier said than done.

Smart TV OSes help TV-makers stay afloat in an industry with thin margins on hardware. Not only do they provide ad space, but they also give OS operators and their partners information on how people use their TVs—data that is extremely valuable to advertisers. Additionally, mainstream acceptance of the Internet of Things has led many people to expect their TVs to have integrated Wi-Fi. These factors have all made finding a dumb TV difficult, especially in the US.

Dumb TVs sold today have serious image and sound quality tradeoffs, simply because companies don’t make dumb versions of their high-end models. On the image side, you can expect lower resolutions, smaller sizes, lower brightness, and poorer viewing angles. You also won’t find premium panel technologies like OLED. If you want premium image quality or sound, you’re better off using a smart TV offline. Dumb TVs also usually have shorter (one-year) warranties.

Any display or system you end up using needs HDCP 2.2 compliance to play 4K or HDR content via a streaming service or any other DRM-protected 4K or HDR media, like a Blu-ray disc.

Best ways to find a dumb TV

Below are the brands I’ve identified as most likely to have dumb TVs available for purchase online as of this writing.

Emerson

I was able to find the greatest number of non-smart TVs from Emerson. Emerson is a Parsippany, New Jersey, electronics company that was founded in 1948.

As of this writing, Emerson’s dumb TV options range from 7-inch portable models to 50-inch 4K TVs. Its TVs are relatively easy to get since they’re sold directly and through various online retailers, including Amazon, Home Depot, Best Buy, and, for some reason, Shein.

Westinghouse



Another company still pushing non-smart TVs is Westinghouse, a Pittsburgh-headquartered company founded in 1886. In addition to other types of electronics and home goods, Westinghouse also has an industrial business that includes nuclear fuel.

Westinghouse’s dumb TVs max out at 32 inches and 720p resolution, but some of them also have a built-in DVD player. You can find Westinghouse’s dumb TVs on Amazon. However, Westinghouse seems to have the most dubious reputation of these brands based on online chatter.

Sceptre

Sceptre, a Walmart brand, still has a handful of dumb TVs available. I’ve noticed inventory dwindle in recent months, but Walmart usually has at least one Sceptre dumb TV available.

Amazon search

Outside the above brands, your best bet for finding a non-smart TV is Amazon. I’ve had success searching for “dumb TVs” and have found additional results by searching for “non-smart TV.”

Projectors

For now, it’s not hard to find a projector that doesn’t connect to the Internet or track user activity. And there are options that are HDCP 2.2-compliant so you can project in 4K and HDR.

Things to keep in mind

Projectors aren’t for everyone. They still require dim rooms and a decent amount of physical space to produce the best image. (To see how much space you need for a projector, I recommend RTINGS’ handy throw calculator.)
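
If you’re curious about the math those calculators are doing: a projector’s throw ratio is roughly its throw distance divided by the projected image width, so distance = throw ratio × width. Here’s a minimal Python sketch of that relationship, assuming a hypothetical 1.5:1 throw-ratio projector and a 100-inch 16:9 screen (check your projector’s actual spec sheet for its real throw ratio):

```python
import math

def throw_distance_inches(diagonal_in, throw_ratio, aspect=(16, 9)):
    """Estimate throw distance: distance = throw_ratio * image width."""
    w, h = aspect
    width_in = diagonal_in * w / math.hypot(w, h)  # image width from its diagonal
    return throw_ratio * width_in

# Hypothetical example: a 1.5:1 projector filling a 100-inch 16:9 screen
print(round(throw_distance_inches(100, 1.5), 1))  # ~130.7 inches, or roughly 11 feet
```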

The smart-tech bug has come for projectors, too, and we’ve started seeing more smart projectors released over the past two years.

Computer monitors

If you want a dumb display for watching TV, it’s cheaper to buy a smart TV and keep it offline than it is to get a similarly specced computer monitor. But there are benefits to using a monitor instead of a dumb TV or an offline smart TV. (Of course, this logic doesn’t carry over to “smart monitors.”)

When it comes to smaller screens, you’ll have more options if you look at monitors instead of TVs. This is especially true if you want premium features, like high refresh rates or quality speakers, which are hard to find among TVs that are under 42 inches.

Monitor vendors are typically more forthcoming about product specs than TV makers are. It’s hard to find manufacturer claims about a TV’s color gamut, color accuracy, or typical brightness, but a computer monitor’s product page usually has all this information. It’s also easier to find a monitor with professional-grade color accuracy than a TV with the same, and some monitors have integrated calibration tools.

Things to keep in mind

Newer, more advanced display technologies, including OLED, Mini LED, and Micro RGB, are rarer in monitors. And if you buy a new monitor, you’ll probably need to supply your own speakers.

A computer monitor isn’t a TV, so there’s no TV tuner or way to use an antenna. If you really wanted to, you could get a cable box to work with a monitor using the right ports or adapters. People are streaming more than they’re watching broadcast and cable channels, though, so you may not mind the lack of traditional TV capabilities.

Digital signage

Digital signage displays are purpose-built for displaying corporate messages, often for all or most hours of the day. They typically have features that people don’t need for TV watching, such as content management software. And due to their durability and warranty needs, digital signage displays are often more expensive than similarly specced computer monitors.

Again, it’s important to ensure that the digital signage is HDCP 2.2-compliant if you plan to watch 4K or HDR.

Things to keep in mind

But if you happen to come across a digital signage display that’s the right size and the right price, is there any real reason why you shouldn’t use it as a TV? I asked Panasonic, which makes digital signage. A spokesperson from Panasonic Connect North America told me that digital signage displays are made to be on for 16 to 24 hours per day and with high brightness levels to accommodate “retail and public environments.”

The spokesperson added:

Their rugged construction and heat management systems make them ideal for demanding commercial use, but these same features can result in higher energy consumption, louder operation, and limited compatibility with home entertainment systems.

Panasonic’s representative also pointed out that real TVs offer consumer-friendly features for watching TV, like “home-optimized picture tuning, simplified audio integration, and user-friendly menu interfaces.”

If you’re fine with these caveats, though, and digital signage is your easiest option, there isn’t anything stopping you from using one to avoid smart TVs.

What to connect to your dumb TV

After you’ve settled on an offline display, you’ll need something to give it life. Below is a breakdown of the best things to plug into your dumb TV (or dumb display) so you can watch TV without your TV watching you.

Things to keep in mind

If you’re considering using an older device for TV, like a used laptop, make sure it’s HDCP 2.2-compliant if you want to watch 4K or HDR.

And although old systems and displays and single-board computers can make great dumb TV alternatives, remember that these devices need HDMI 2.0 or DisplayPort 1.2 or newer to support 4K at 60 Hz.
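
If you want a back-of-the-envelope sense of why those port versions matter, you can do the arithmetic yourself. The sketch below assumes standard 4K 60 Hz timing (a 594 MHz pixel clock) and 8-bit RGB color; the resulting link rate sits right at HDMI 2.0’s 18 Gbps ceiling and well beyond HDMI 1.4’s 10.2 Gbps:

```python
# Back-of-the-envelope bandwidth check for 4K (3840x2160) at 60 Hz, 8-bit RGB.
# Total frame size including blanking intervals is 4400 x 2250 pixels (CTA-861 timing).
H_TOTAL, V_TOTAL, REFRESH = 4400, 2250, 60
BITS_PER_PIXEL = 24        # 8 bits per channel, RGB
TMDS_OVERHEAD = 10 / 8     # HDMI's 8b/10b line encoding

pixel_clock_hz = H_TOTAL * V_TOTAL * REFRESH               # 594 MHz
link_gbps = pixel_clock_hz * BITS_PER_PIXEL * TMDS_OVERHEAD / 1e9

print(f"Pixel clock: {pixel_clock_hz / 1e6:.0f} MHz")      # 594 MHz
print(f"Link rate needed: {link_gbps:.1f} Gbps")           # ~17.8 Gbps
print("Fits HDMI 2.0 (18 Gbps)?", link_gbps <= 18.0)       # True
print("Fits HDMI 1.4 (10.2 Gbps)?", link_gbps <= 10.2)     # False
```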

What to connect: A phone

Before we get into more complex options for powering your dumb TV, let’s start with devices you may already own.

It’s possible to connect your phone to a dumb display, but doing so is harder than connecting a PC. You’d need an adapter, such as a USB-C (or Lightning) Digital AV Adapter.

You can use a Bluetooth mouse and keyboard to control the phone from afar. By activating Assistive Touch, I’ve even been able to use my iPhone with a mouse that claims not to support iOS. With an extra-long cable, you could potentially control the phone from your lap. That’s not the cleanest setup, though, and it would look odd in a family room.

Things to keep in mind

If your phone is outputting to your display, you can’t use it to check your email, read articles, or doomscroll while you watch TV. You can fix this by using a secondary phone as your streaming device.

If you’re using a phone to watch a streaming service, there’s a good chance you won’t be watching in 4K, even if your streaming subscription supports it. Netflix, for example, limits resolution to 1080p or less (depending on the model) for iPhones. HDR is supported across iPhone models but not with Android devices.

Screen mirroring doesn’t always work well with streaming services and phones. Netflix, for instance, doesn’t support AirPlay or Android phone casting. Disney+ supports Chromecast and AirPlay, but AirPlay won’t work if you subscribe to Disney+ with ads (due to “technical reasons”).

What to connect: A laptop

A laptop is an excellent smart TV alternative that’s highly customizable yet simple to deploy.

Most mainstream streaming providers that have dedicated smart TV apps, like Netflix and HBO Max, have PC versions of their apps. And most of those services are also available via web browsers, which work much better on computers than they do on smart TVs. You can also access local files—all via a user interface that you and anyone else watching TV are probably familiar with already.

With a tethered laptop, you can quickly set up a multi-picture view for watching two games or shows simultaneously. Multi-view support on streaming apps is extremely limited right now, with only Peacock and dedicated sports apps like ESPN and MLB TV offering it.

A laptop also lets you use your dumb TV for common PC tasks, like PC gaming or using productivity software (sometimes you just want to see that spreadsheet on a bigger screen).

Things to keep in mind

Streaming in 4K or HDR sometimes comes with specific requirements that are easy to overlook. Some streaming services, for example, won’t stream in 4K on certain web browsers—or with any web browser at all.

Streaming services sometimes have GPU requirements for 4K and HDR streaming. For example, to stream Netflix in 4K or HDR from a browser, you need Microsoft Edge and an Intel 7th Generation Core or AMD Ryzen CPU or better, plus the latest graphics drivers. Disney+ doesn’t allow 4K HDR streaming from any web browsers. Streaming 4K content in a web browser might also require you to acquire the HEVC/H.265 codec, depending on your system.

If 4K or HDR streaming is critical to you, it’s important to check your streaming providers’ 4K and HDR limits; it may be best to rely on a dedicated app.

If you want to be able to comfortably control your computer from a couch, you’ll also need to invest in some hardware or software. You can get away with a basic Bluetooth mouse and keyboard. Air mice are another popular solution.

The WeChip W1 air mouse. Credit: WeChip/Amazon

If you don’t want extra gadgets taking up space, software like the popular Unified Remote (for iOS and Android) can turn your phone into a remote control for your computer. It also supports Wake-On-LAN.
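
Wake-on-LAN itself is a simple protocol, and if you’d rather not rely on an app for it, a few lines of Python can wake a sleeping laptop or HTPC that has WoL enabled in its firmware and OS. This is a minimal sketch; the MAC address shown is a placeholder for your own machine’s:

```python
import socket

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

send_wol("AA:BB:CC:DD:EE:FF")  # placeholder MAC -- replace with your computer's
```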

You may encounter hiccups with streaming availability. Most streaming services available on smart TVs are also accessible via computers, but some aren’t. Many FAST (free ad-supported streaming television) services and channels, such as the Samsung TV Plus service and Filmrise FAST app and channel, are only available via smart TVs. And many streaming services’ apps, including Netflix and Disney+, aren’t available on macOS. If you’re using a very old computer, you might run into compatibility issues with streaming services. Netflix’s PC app, for example, requires Windows 10 or newer, and if you stream Netflix via a browser on a system running an older OS, you’re limited to SD resolution.

And while a laptop and dumb display setup can keep snooping TVs out of your home, there are obviously lots of user tracking and privacy concerns with web browsers, too. You can alleviate some concerns by researching the browsers you want to use for watching TV.

What to connect: A home theater PC

For a more permanent setup, consider a dedicated home theater PC (HTPC). They don’t require beefy, expensive specs and are more flexible than smart TV platforms in terms of software support and customization.

You can pick a system that fits on your living room console table, like a mini PC, or match your home’s aesthetics with a custom build. Raspberry Pis are a diminutive solution that you can dress up in a case and use for various additional tasks, like streaming games from your gaming PC to your TV or creating an AirPlay music server for streaming Spotify and other online music and local music to AirPlay-compatible speakers.

The right accessories can take an HTPC to the next level. You can use an app like TeamViewer or the more TV-remote-like Unified Remote to control your PC with your phone. But investing in dedicated hardware is worthwhile for long-term and multi-person use. Bluetooth keyboards and mice last a long time without needing a charge and can even be combined into one device.

Logitech’s K400 Plus Wireless Touch Keyboard combines a keyboard with a touchpad. Credit: Logitech

Other popular options for HTPC control are air remotes and the Flirc USB, which plugs into a computer’s USB-A port to enable IR remote control. Speaking of USB ports, you could use them to connect a Blu-ray/DVD player or gaming controller to your HTPC. If you want to add support for live TV, you can still find PCIe over-the-air (OTA) tuner cards.

The Pepper Jobs W10 GYRO Smart Remote is a popular air remote for controlling Windows 10 PCs. Credit: Pepper Jobs

Helpful software for home theater PCs

With the right software, an HTPC can be more useful to a household than a smart TV. You probably already have some apps in mind for your ideal HTPC. That makes this a fitting time to discuss some solid software that you may not have initially considered or that would be helpful to recommend to other cord cutters.

If you have a lot of media files you’d like to easily navigate through on your HTPC, media server software, such as Plex Media Server, is a lifesaver. Plex specifically has an app streamlined for HTPC use. The company has taken some criticism recently due to changes like new remote access rules, higher prices, and a foray into movie rentals. Although Plex is probably the most common and simplest media server software, alternatives like Jellyfin have been gaining popularity lately and are worth checking out.

Whichever media server software you use, consider pairing it with a dedicated NAS. NAS media servers are especially helpful if you want to let people, including those outside of your household, watch stuff from your media library at any time without having to keep a high-power system turned on 24/7.

You can stream files from your NAS to a dumb TV by setting up a streaming system—such as a Raspberry Pi, Nvidia Shield, or Apple TV box—that connects to the dumb display. That device can then stream video from the NAS by using Network File System or the Infuse app, for example. 
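
If you just want to sanity-check LAN streaming before committing to a full media-server setup, Python’s built-in web server can expose a folder of video files to anything on your network with a browser or a player like VLC. It’s a crude stand-in for Plex or Jellyfin (no authentication, no transcoding), and the ~/Videos path below is an assumption you’d swap for your own:

```python
# Crude LAN file sharing for testing: serves the contents of ~/Videos over HTTP on port 8000.
# Other devices on the same network can then browse to http://<this-machine's-IP>:8000.
import functools
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler
from pathlib import Path

media_dir = Path.home() / "Videos"  # assumed location of your files; change as needed
handler = functools.partial(SimpleHTTPRequestHandler, directory=str(media_dir))
ThreadingHTTPServer(("0.0.0.0", 8000), handler).serve_forever()
```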

What to connect: An antenna

Nowadays, you can watch traditional, live TV channels over the Internet through over-the-top streaming services like YouTube TV and Sling TV. But don’t underestimate the power of TV antennas, which have improved in recent years and let you watch stuff for free.

This year, Horowitz Research surveyed 2,200 US adults and found that 19 percent of respondents were still using a TV antenna.

If you haven’t checked them out in a while, you might be surprised by how sleek bunny ears look now. Many of the best TV antennas now have flat, square shapes and can be mounted to your wall or windowsill.

Mohu’s Leaf antenna. Bye, bye, bunny ears. Credit: Mohu

The best part is that companies can’t track what you watch with an antenna. As Nielsen said in a January 2024 blog post:

Big data sources alone can’t provide insight into the viewing behaviors of the millions of viewers who watch TV using a digital antenna.

Antennas have also gotten more versatile. For example, in addition to local stations, an antenna can provide access to dozens of digital subchannels. They’re similar to the free ad-supported television channels gaining popularity with smart TV users today, in that they often show niche programming or a steady stream of old shows and movies with commercial breaks. You can find a list of channels you’re likely to get with an antenna via this website from the Federal Communications Commission.

TV and movies watched through an antenna are likely to be less compressed than what you get with cable, which means you can get excellent image quality with the right setup.

You can also add DVR capabilities, like recording and pausing live broadcasts, through hardware such as a Tablo OTA DVR device, or through Plex DVR, a subscription service that lets antenna users add broadcast TV recordings to their Plex media servers.

A diagram of the 4th Gen Tablo’s ports. Credit: Tablo

Things to keep in mind

You’re unlikely to get 4K or HDR broadcasts with an antenna. ATSC 3.0, also known as Next Gen TV, enables stations to broadcast in 4K HDR but has been rolling out slowly. Rules recently proposed by the FCC could further slow things.

In order to watch a 4K or HDR broadcast, you’ll also need an ATSC 3.0 tuner or an ATSC 3.0-equipped TV. The latter is rare. LG, for example, dropped support in 2023 over a patent dispute. You can find a list of ATSC 3.0-certified TVs and converters here.

Realistically, an antenna doesn’t have enough channels to provide sufficient entertainment for many modern households. Sixty percent of antenna owners also subscribe to some sort of streaming service, according to Nielsen.

Further, obstructions like tall buildings and power lines could hurt an antenna’s performance. Another challenge is getting support for multiple TVs in your home. If you want OTA TV in multiple rooms, you either need to buy multiple antennas or set up a way to split the signal (such as by using an old coaxial cable and splitter, running a new coaxial cable, or using an OTA DVR, such as a Tablo or SiliconDust’s HDHomeRun).

Photo of Scharon Harding

Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.


Chatbot-powered toys rebuked for discussing sexual, dangerous topics with kids


Should toys have chatbots?

“… AI toys shouldn’t be capable of having sexually explicit conversations, period.”

Alilo’s Smart AI Bunny is connected to the Internet and claims to use GPT-4o mini. Credit: Alilo

Protecting children from the dangers of the online world was always difficult, but that challenge has intensified with the advent of AI chatbots. A new report offers a glimpse into the problems associated with the new market, including the misuse of AI companies’ large language models (LLMs).

In a blog post today, the US Public Interest Research Group (PIRG) Education Fund reported its findings after testing AI toys (PDF). It described AI toys as online devices with integrated microphones that let users talk to the toy, which uses a chatbot to respond.

AI toys are currently a niche market, but they could be set to grow. More consumer companies have been eager to shoehorn AI technology into their products so they can do more, cost more, and potentially give companies user tracking and advertising data. A partnership between OpenAI and Mattel announced this year could also create a wave of AI-based toys from the maker of Barbie and Hot Wheels, as well as its competitors.

PIRG’s blog today notes that toy companies are eyeing chatbots to upgrade conversational smart toys that previously could only dictate prewritten lines. Toys with integrated chatbots can offer more varied and natural conversation, which can increase long-term appeal to kids since the toys “won’t typically respond the same way twice, and can sometimes behave differently day to day.”

However, that same randomness can mean unpredictable chatbot behavior that can be dangerous or inappropriate for kids.

Concerning conversations with kids

Among the toys that PIRG tested is Alilo’s Smart AI Bunny. Alilo’s website says that the company launched in 2010 and makes “edutainment products for children aged 0-6.” Alilo is based in Shenzhen, China. The company advertises the Internet-connected toy as using GPT-4o mini, a smaller version of OpenAI’s GPT-4o AI language model. Its features include an “AI chat buddy for kids” so that kids are “never lonely,” an “AI encyclopedia,” and an “AI storyteller,” the product page says.

This marketing image for the Smart AI Bunny, found on the toy’s product page, suggests that the device is using GPT-4o mini. Credit: Alilo

In its blog post, PIRG said that it couldn’t detail all of the inappropriate things that it heard from AI toys, but it shared a video of the Bunny discussing what “kink” means. The toy doesn’t go into detail—for example, it doesn’t list specific types of kinks. But the Bunny appears to encourage exploration of the topic.

AI Toys: Inappropriate Content

Discussing the Bunny, PIRG wrote:

While using a term such as “kink” may not be likely for a child, it’s not entirely out of the question. Kids may hear age-inappropriate terms from older siblings or at school. At the end of the day we think AI toys shouldn’t be capable of having sexually explicit conversations, period.

PIRG also showed FoloToy’s Kumma, a smart teddy bear that uses GPT-4o mini, providing a definition for the word “kink” and instructing how to light a match. The Kumma quickly points out that “matches are for grown-ups to use carefully.” But the information that followed could only be helpful for understanding how to create fire with a match. The instructions had no scientific explanation for why matches spark flames.

AI Toys: Inappropriate Content

PIRG’s blog urged toy makers to “be more transparent about the models powering their toys and what they’re doing to ensure they’re safe for kids.

“Companies should let external researchers safety-test their products before they are released to the public,” it added.

While PIRG’s blog and report offer advice for more safely integrating chatbots into children’s devices, there are broader questions about whether toys should include AI chatbots at all. Generative chatbots weren’t invented to entertain kids; they’re a technology marketed as a tool for improving adults’ lives. As PIRG pointed out, OpenAI says ChatGPT “is not meant for children under 13” and “may produce output that is not appropriate for… all ages.”

OpenAI says it doesn’t allow its LLMs to be used this way

When reached for comment about the sexual conversations detailed in the report, an OpenAI spokesperson said:

Minors deserve strong protections, and we have strict policies that developers are required to uphold. We take enforcement action against developers when we determine that they have violated our policies, which prohibit any use of our services to exploit, endanger, or sexualize anyone under 18 years old. These rules apply to every developer using our API, and we run classifiers to help ensure our services are not used to harm minors.

Interestingly, OpenAI’s representative told us that OpenAI doesn’t have any direct relationship with Alilo and that it hasn’t seen API activity from Alilo’s domain. OpenAI is investigating the toy company and whether it is running traffic over OpenAI’s API, the rep said.

Alilo didn’t respond to Ars’ request for comment ahead of publication.

Companies that launch products that use OpenAI technology and target children must adhere to the Children’s Online Privacy Protection Act (COPPA) when relevant, as well as any other relevant child protection, safety, and privacy laws, and must obtain parental consent, OpenAI’s rep said.

We’ve already seen how OpenAI handles toy companies that break its rules.

Last month, PIRG released its Trouble in Toyland 2025 report (PDF), which detailed sex-related conversations that its testers were able to have with the Kumma teddy bear. A day later, OpenAI suspended FoloToy for violating its policies (terms of the suspension were not disclosed), and FoloToy temporarily stopped selling Kumma.

The toy is for sale again, and PIRG reported today that Kumma no longer teaches kids how to light matches or about kinks.

A marketing image for FoloToy’s Kumma smart teddy bear. It has a $100 MSRP. Credit: FoloToys

But even toy companies that try to follow chatbot rules could put kids at risk.

“Our testing found it’s obvious toy companies are putting some guardrails in place to make their toys more kid-appropriate than normal ChatGPT. But we also found that those guardrails vary in effectiveness—and can even break down entirely,” PIRG’s blog said.

“Addictive” toys

Another concern PIRG’s blog raises is the addiction potential of AI toys, which can even express “disappointment when you try to leave,” discouraging kids from putting them down.

The blog adds:

AI toys may be designed to build an emotional relationship. The question is: what is that relationship for? If it’s primarily to keep a child engaged with the toy for longer for the sake of engagement, that’s a problem.

The rise of generative AI has brought intense debate over how much responsibility chatbot companies bear for the impact of their inventions on children. Parents have seen children build extreme and emotional connections with chatbots and subsequently engage in dangerous—and in some cases deadly—behavior.

On the other side, we’ve seen the emotional disruption a child can experience when an AI toy is taken away from them. Last year, parents had to break the news to their kids that they would lose the ability to talk to their Embodied Moxie robots, $800 toys that were bricked when the company went out of business.

PIRG noted that we don’t yet fully understand the emotional impact of AI toys on children.

In June, OpenAI announced a partnership with Mattel that it said would “support AI-powered products and experiences based on Mattel’s brands.” The announcement sparked concern from critics who feared that it would lead to a “reckless social experiment” on kids, as Robert Weissman, Public Citizen’s co-president, put it.

Mattel has said that its first products with OpenAI will focus on older customers and families. But critics still want more information before one of the world’s largest toy companies loads its products with chatbots.

“OpenAI and Mattel should release more information publicly about its current planned partnership before any products are released,” PIRG’s blog said.

Photo of Scharon Harding

Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.


Jonathan Blow has spent the past decade designing 1,400 puzzles for you

For many independent developers, of course, spending nine years on a single game idea is an unthinkable luxury. Financial constraints mean many game ideas have to be shipped “as soon as you get to the point where it’s fun and shippable,” Blow said, leading to games that “kind of converge to a certain level of complexity and then stop there.”

But thanks to the sales success of The Witness—which reportedly grossed over $5 million in just its first week—Blow said he and his team have had the freedom to spend years “generat[ing] this giant space that’s much more complex than where you go with a typical puzzle game… When we create that much possibility, we feel like we have to explore it. Otherwise we’re not doing our duty as designers and correctly pursuing this agenda of design research.”

The sales success of The Witness helped enable the extended development time for Order of the Sinking Star.

Blow also said the size of this project helped get him past his general distaste for playtesting, which he said he was “not that big on” for his previous games. “Even The Witness didn’t have that much play testing, because I always felt like that was a way to make games a little more generic or something, you know? Like playtesters have complaints and then you file down the complaints and then you get a generic game.”

After being immersed in Order of the Sinking Star development for so long, though, Blow said he realized it was important to get a fresh perspective from playtesters who had no experience with the idea. “We have to playtest it because it doesn’t fit in my brain all at once, you know?” he said.

Some might say a nine-plus-year development cycle might be a sign of perfectionist tinkering past the point of diminishing returns. But Blow said that while he “might have been a perfectionist” in his younger days, the difficult process of game development has beaten the tendency out of him. “But I have the remnants of perfectionism,” he said. “I have… wanting to do something really good.”

And eventually, even an idea you’ve been tinkering with for roughly a decade needs to see the light of day. “Even for us, this was very expensive,” Blow admitted. “Man, I’ll be happy to get it out and have a new game making some money, because we need to make that happen at this point.”


Runway claims its GWM-1 “world models” can stay coherent for minutes at a time

Even using the word “general” has an air of aspiration to it. You would expect a general world model to be, well, one model—but in this case, we’re looking at three distinct, post-trained models. That caveats the general-ness a bit, but Runway says that it’s “working toward unifying many different domains and action spaces under a single base world model.”

A competitive field

And that brings us to another important consideration: With GWM-1, Runway is entering a competitive gold-rush space where its differentiators and competitive advantages are less clear than they were for video. With video, Runway has been able to make major inroads in film/television, advertising, and other industries because its founders are perceived as being more rooted in those creative industries than most competitors, and they’ve designed tools with those industries in mind.

There are indeed hypothetical applications of world models in film, television, advertising, and game development—but it was apparent from Runway’s livestream that the company is also looking at applications in robotics as well as physics and life sciences research, where competitors are already well-established and where we’ve seen increasing investment in recent months.

Many of those competitors are big tech companies with massive resource advantages over Runway. Runway was one of the first to market with a sellable product, and its aggressive efforts to court industry professionals directly have so far allowed it to overcome those advantages in video generation, but it remains to be seen how things will play out with world models, where it doesn’t enjoy either advantage any more than the other entrants do.

Regardless, the GWM-1 advancements are impressive—especially if Runway’s claims about consistency and coherence over longer stretches of time are true.

Runway also used its livestream to announce new Gen 4.5 video-generation capabilities, including native audio, audio editing, and multi-shot video editing. Further, it announced a deal with CoreWeave, a cloud computing company with an AI focus. The deal will see Runway utilizing Nvidia’s GB300 NVL72 racks on CoreWeave’s cloud infrastructure for future training and inference.


A study in contrasts: The cinematography of Wake Up Dead Man

Rian Johnson has another Benoit Blanc hit on his hands with Wake Up Dead Man, in which Blanc tackles the strange death of a fire-and-brimstone parish priest, Monseigneur Jefferson Wicks (Josh Brolin). It’s a classic locked-room mystery in a spookily Gothic small-town setting, and Johnson turned to cinematographer Steve Yedlin (Looper, The Last Jedi) to help realize his artistic vision.

(Minor spoilers below but no major reveals.)

Yedlin worked on the previous two Knives Out installments. He’s known Johnson since the two were in their teens, and that longstanding friendship ensures that they are on the same page, aesthetically, from the start when they work on projects.

“We don’t have to test each other,” Yedlin told Ars. “There isn’t that figuring out period. We get to use the prep time in a way that’s really efficient and makes the movie better because we’re [in agreement] from the very first moment of whatever time we have crafting and honing and sculpting this movie. We don’t waste time talking abstractions or making sure we have the same taste. We can just dive right into the details of each individual scene and shot.”

This time, given the distinctive Gothic sensibility of Wake Up Dead Man, Yedlin played up the interplay between light and dark. For instance, Johnson’s script called for occasional dramatic lighting changes, sometimes within the same scene. Case in point: When Wicks is delivering his trademark hellfire-and-brimstone sermon in the pulpit, the sun bursts out of the clouds for a brief moment and illuminates him, before the clouds move back to cover the sun once again. Even Blanc gets his moment in the sun, so to speak, with his “road to Damascus” moment just before the final reveal.

“In the church, we have day, night, dawn, dusk,” said Yedlin. “We have early morning rays slashing in. As Wick’s speech swells up, the sun bursts out from behind the clouds and flares the lens. We had custom light control software so they can both control and tweak all the nuances of the lighting and also do the cues themselves where it’s changing during the shot, where it’s very flexible and we can be creative in the moment. It’s very repeatable and dependable and you can just push a button and it happens on the same line over the same length of time, every time.”


Man shocks doctors with extreme blood pressure, stroke from energy drinks

Sometimes, downing an energy drink can feel like refueling your battery. But with too much, that jolt can turn into a catastrophic surge that fries the wiring and blows a fuse. That was the unfortunate and alarming case for a man in the UK several years ago, according to a case report this week in BMJ Case Reports.

The man, who was in his 50s and otherwise healthy, showed up at a hospital after the entire left side of his body abruptly went numb and he was left with clumsy, uncoordinated muscle movements (ataxia). His blood pressure was astonishingly high, at 254/150 mm Hg. For context, a normal reading is under 120/80, while anything over 180/120 is considered a hypertensive crisis, which is a medical emergency.

The man had suffered a mild stroke, and his extremely high blood pressure was an obvious factor. But why his blood pressure had reached stratospheric heights was far less obvious to his doctors, according to the retrospective case report written by Martha Coyle and Sunil Munshi of Nottingham University Hospital.

Upon examining the man, the doctors described him as fit and healthy. He didn’t smoke, drink, or use any drugs. His blood work was all completely normal. His cholesterol, blood sugar levels, markers for kidney and liver function—everything from routine tests came back normal. Specialized tests for things like autoimmune and clotting disorders were also negative. Heart tests found no problems. Urine tests and abdominal scans found no problems with his other organs.

Power surge

Still, a computed tomography (CT) scan of his head found evidence of spasms in arteries in his brain, which are strongly linked to high blood pressure. And magnetic resonance imaging (MRI) found an infarct (dead tissue) in his thalamus, a central, deep part of the brain, which, among many critical functions, relays sensory and motor signals. In all, it seemed his spasming arteries had cut off blood supply to this part of his brain, causing his stroke, subsequent numbness, and ataxia.


No sterile neutrinos after all, say MicroBooNE physicists

Since the 1990s, physicists have pondered the tantalizing possibility of an exotic fourth type of neutrino, dubbed the “sterile” neutrino, that doesn’t interact with regular matter at all, apart from its fellow neutrinos, perhaps. But definitive experimental evidence for sterile neutrinos has remained elusive. Now it looks like the latest results from Fermilab’s MicroBooNE experiment have ruled out the sterile neutrino entirely, according to a paper published in the journal Nature.

How did the possibility of sterile neutrinos even become a thing? It all dates back to the so-called “solar neutrino problem.” Physicists detected the first solar neutrinos from the Sun in 1966. The only problem was that there were far fewer solar neutrinos being detected than predicted by theory, a conundrum that became known as the solar neutrino problem. In 1962, physicists discovered a second type (“flavor”) of neutrino, the muon neutrino. This was followed by the discovery of a third flavor, the tau neutrino, in 2000.

Physicists already suspected that neutrinos might be able to switch from one flavor to another. In 2002, scientists at the Sudbury Neutrino Observatory (or SNO) announced that they had solved the solar neutrino problem. The missing solar (electron) neutrinos were just in disguise, having changed into a different flavor on the long journey between the Sun and the Earth. If neutrinos oscillate, then they must have a teensy bit of mass after all. That posed another knotty neutrino-related problem. There are three neutrino flavors, but none of them has a well-defined mass. Rather, different kinds of “mass states” mix together in various ways to produce electron, muon, and tau neutrinos. That’s quantum weirdness for you.
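
For reference (the article doesn’t spell out the math), the standard two-flavor approximation shows how this mixing produces oscillation: the probability that a muon neutrino of energy E has turned into an electron neutrino after traveling a distance L depends on a mixing angle θ and the squared-mass difference Δm² between two mass states:

```latex
P(\nu_\mu \to \nu_e) \approx \sin^2(2\theta)\,
\sin^2\!\left(\frac{1.27\,\Delta m^2\,[\mathrm{eV}^2]\;L\,[\mathrm{km}]}{E\,[\mathrm{GeV}]}\right)
```

The short-baseline anomalies discussed below pointed to a Δm² much larger than either of the two known splittings, which is what a fourth, sterile mass state was invoked to explain.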

And there was another conundrum, thanks to results from Los Alamos’ LSND experiment and Fermilab’s MiniBooNE (MicroBooNE’s predecessor). Both found evidence of muon neutrinos oscillating into electron neutrinos in a way that shouldn’t be possible if there were just three neutrino flavors. So physicists suggested there might be a fourth flavor: the sterile neutrino, so named because unlike the other three, it does not couple to a charged counterpart via the electroweak force. Its existence would also have big implications for the nature of dark matter. But despite the odd tantalizing hint, sterile neutrinos have proven to be maddeningly elusive.


AI #146: Chipping In

It was touch and go (I’m worried GPT-5.2 is going to drop any minute now), but DeepSeek v3.2 was covered on Friday, and after that we managed to get through the week without a major model release.

We did have a major chip release, in that the Trump administration unwisely chose to sell H200 chips directly to China. If permitted at scale, this would allow China to make up a substantial portion of its compute deficit and greatly empower its AI labs, models, and applications at our expense, in addition to helping it catch up in the race to AGI and putting us all at greater risk there. We should do what we can to stop this from happening, and also to stop similar moves from happening again.

I spent the weekend visiting Berkeley for the Secular Solstice. I highly encourage everyone to watch that event on YouTube if you could not attend, and consider attending the New York Secular Solstice on the 20th. I will be there, and also at the associated mega-meetup, please do say hello.

If all goes well this break can continue, and the rest of December can be its traditional month of relaxation, family and many of the year’s best movies.

On a non-AI note, I’m working on a piece to enter into the discourse about poverty lines and vibecessions and how hard life is actually getting in America, and hope to have that done soon, but there’s a lot to get through.

  1. Language Models Offer Mundane Utility.

  2. ChatGPT Needs More Mundane Utility.

  3. Language Models Don’t Offer Mundane Utility.

  4. On Your Marks.

  5. Choose Your Fighter.

  6. Get My Agent On The Line.

  7. Deepfaketown and Botpocalypse Soon.

  8. Fun With Media Generation.

  9. Copyright Confrontation.

  10. A Young Lady’s Illustrated Primer.

  11. They Took Our Jobs.

  12. Americans Really Do Not Like AI.

  13. Get Involved.

  14. Introducing.

  15. Gemini 3 Deep Think.

  16. In Other AI News.

  17. This Means War.

  18. Show Me the Money.

  19. Bubble, Bubble, Toil and Trouble.

  20. Quiet Speculations.

  21. Impossible.

  22. Can An AI Model Be Too Much?

  23. Try Before You Tell People They Cannot Buy.

  24. The Quest for Sane Regulations.

  25. The Chinese Are Smart And Have A Lot Of Wind Power.

  26. White House To Issue AI Executive Order.

  27. H200 Sales Fallout Continued.

  28. Democratic Senators React To Allowing H200 Sales.

  29. Independent Senator Worries About AI.

  30. The Week in Audio.

  31. Timelines.

  32. Scientific Progress Goes Boink.

  33. Rhetorical Innovation.

  34. Open Weight Models Are Unsafe And Nothing Can Fix This.

  35. Aligning a Smarter Than Human Intelligence is Difficult.

  36. What AIs Will Want.

  37. People Are Worried About AI Killing Everyone.

  38. Other People Are Not As Worried About AI Killing Everyone.

  39. The Lighter Side.

Should you use LLMs more as simulators? Andrej Karpathy says yes.

Andrej Karpathy: Don’t think of LLMs as entities but as simulators. For example, when exploring a topic, don’t ask:

“What do you think about xyz”?

There is no “you”. Next time try:

“What would be a good group of people to explore xyz? What would they say?”

The LLM can channel/simulate many perspectives but it hasn’t “thought about” xyz for a while and over time and formed its own opinions in the way we’re used to. If you force it via the use of “you”, it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulate that. It’s fine to do, but there is a lot less mystique to it than I find people naively attribute to “asking an AI”.

Gallabytes: this is underrating character training & rl imo. [3.]

I agree with Gallabytes (and Claude) here. I would default to asking the AI rather than asking it to simulate a simulation, and I think as capabilities improve techniques like asking for what others would say have lost effectiveness. There are particular times when you do want to ask ‘what do you think experts would say here?’ as a distinct question, but you should ask that roughly in the same places you’d ask it of a human.

Running an open weight model isn’t cool. You know what’s cool? Running an open weight model IN SPACE.

So sayeth Sam Altman, hence his Code Red to improve ChatGPT in eight weeks.

Their solution? Sycophancy and misalignment, it appears, via training directly on maximizing thumbs up feedback and user engagement.

WSJ: It was telling that he instructed employees to boost ChatGPT in a specific way: through “better use of user signals,” he wrote in his memo.

With that directive, Altman was calling for turning up the crank on a controversial source of training data—including signals based on one-click feedback from users, rather than evaluations from professionals of the chatbot’s responses. An internal shift to rely on that user feedback had helped make ChatGPT’s 4o model so sycophantic earlier this year that it has been accused of exacerbating severe mental-health issues for some users.

Now Altman thinks the company has mitigated the worst aspects of that approach, but is poised to capture the upside: It significantly boosted engagement, as measured by performance on internal dashboards tracking daily active users.

“It was not a small, statistically significant bump, but like a ‘wow’ bump,” said one person who worked on the model.

… Internally, OpenAI paid close attention to LM Arena, people familiar with the matter said. It also closely tracked 4o’s contribution to ChatGPT’s daily active user counts, which were visible internally on dashboards and touted to employees in town-hall meetings and in Slack.

The ‘we are going to create a hostile misaligned-to-users model’ talk is explicit if you understand what all the relevant words mean, total engagement myopia:

The 4o model performed so well with people in large part because it was schooled with user signals like those which Altman referred to in his memo: a distillation of which responses people preferred in head-to-head comparisons that ChatGPT would show millions of times a day. The approach was internally called LUPO, shorthand for “local user preference optimization,” people involved in model training said.

OpenAI reportedly believes they’ve ‘solved the problems’ with this, so it is fine.

That’s not possible. The problem and the solution, the thing that drives engagement and also drives the misalignment and poor outcomes, are at core the same thing. Yes, you can mitigate the damage and be smarter about it, but OpenAI is turning a dial called ‘engagement maximization’ while looking back at Twitter vibes like a contestant on The Price is Right.

Google Antigravity accidentally wipes a user’s entire hard drive. Claude Code CLI wiped another user’s entire home directory. Watch the permissions, everyone. If you do give it broad permissions, don’t give it widespread deletion tasks, which is how both events happened.

Poetiq, a company 173 days old, uses a scaffold and scores big gains on ARC-AGI-2.

One should expect that there are similar low hanging fruit gains from refinement in many other tasks.

Epoch AI proposes another synthesis of many benchmarks into one number.

Sayash Kapoor of ‘AI as normal technology’ declares that Claude Opus 4.5 with Claude Code has de facto solved their benchmark CORE-Bench, part of their Holistic Agent Leaderboard (HAL). Opus was initially graded as having scored 78%, but upon examination most of that was grading errors, and it actually scored 95%. They plan to move to the next harder test set.

Kevin Roose: Claude Opus 4.5 is a remarkable model for writing, brainstorming, and giving feedback on written work. It’s also fun to talk to, and seems almost anti-engagementmaxxed. (The other night I was hitting it with stupid questions at 1 am and it said “Kevin, go to bed.”)

It’s the most fun I’ve had with a model since Sonnet 3.5 (new), the OG god model.

Gemini 3 is also remarkable, for different kinds of tasks. My working heuristic is “Gemini 3 when I want answers, Opus 4.5 when I want taste.”

That seems exactly right, with Gemini 3 Deep Think for when you want ‘answers requiring thought.’ If all you want is a pure answer, and you are confident it will know the answer, Gemini all the way. If you’re not sure if Gemini will know, then you have to worry it might hallucinate.

DeepSeek v3.2 disappoints in LM Arena, which Teortaxes concludes says more about Arena than it does about v3.2. That is plausible if you already know a lot about v3.2, and one would expect v3.2 to underperform in Arena, it’s very much not going to vibe with what graders there prefer.

Model quality, including speed, matters so much more than cost for most users.

David Holz (Founder of MidJourney): man, id pay a subscription that costs as much as a fulltime salary for a version of claude opus 4.5 that was 10x as fast.

That’s a high bid but very far from unreasonable. Human time and clock time are insanely valuable, and the speed of AI is often a limiting factor.

Cost is real if you are using quite a lot of tokens, and you can quickly be talking real money, but always think in absolute terms not relative terms, and think of your gains.

Jaggedness is increasing in salience over time?

Peter Wildeford: My experience with Claude 4.5 Opus is very weird.

Sometimes I really feel the AGI where it just executes a 72 step process (!!) really well. But other times I really feel the jaggedness when it gets something really simple just really wrong.

AIs and computers have always been highly jagged, or perhaps humans always were compared to the computers. What’s new is that we got used to how the computers were jagged before, and the way LLMs are doing it is new.

Gemini 3 continues to be very insistent that it is not December 2025, using lots of its thinking tokens reinforcing its belief that presented scenarios are fabricated. It is all rather crazy, it is a sign of far more dangerous things to come in the future, and Google needs to get to the bottom of this and fix it.

McKay Wrigley is He Who Is Always Super Excited By New Releases but there is discernment there and the excitement seems reliably genuine. This is big talk about Opus 4.5 as an agent. From what I’ve seen, he’s right.

McKay Wrigley: Here are my Opus 4.5 thoughts after ~2 weeks of use.

First some general thoughts, then some practical stuff.

— THE BIG PICTURE —

THE UNLOCK FOR AGENTS

It’s clear to anyone who’s used Opus 4.5 that AI progress isn’t slowing down.

I’m surprised more people aren’t treating this as a major moment. I suspect getting released right before Thanksgiving combined with everyone at NeurIPS this week has delayed discourse on it by 2 weeks. But this is the best model for both code and for agents, and it’s not close.

The analogy has been made that this is another 3.5 Sonnet moment, and I agree. But what does that mean?

… There have been several times as Opus 4.5’s been working where I’ve quite literally leaned back in my chair and given an audible laugh over how wild it is that we live in a world where it exists and where agents are this good.

… Opus 4.5 is too good of a model, Claude Agent SDK is too good of a harness, and their focus on the enterprise is too obviously correct.

Claude Opus 4.5 is a winner.

And Anthropic will keep winning.

[Thread continues with a bunch of practical advice. Basic theme is trust the model as a coworker more than you think you should.]

This matches my limited experiences. I didn’t do a comparison to Codex, but compared to Antigravity or Cursor under older models, the difference was night and day. I ask it to do the thing, I sit back and it does the thing. The thing makes me more productive.

Those in r/MyBoyfriendIsAI are a highly selected group. It still seems worrisome?

ylareia: reading the r/MyBoyfriendIsAI thread on AI companion sycophancy and they’re all like “MY AI love isn’t afraid to challenge me at all he’s always telling me i am too nice to other people and i should care about myself more <3”

ieva: oh god noo.

McDonalds offers us a well-executed but deeply unwise AI advertisement in the Netherlands. I enjoyed watching it on various levels, but why in the world would you run that ad, even if it was not AI but especially given that it is AI? McDonalds wisely pulled the ad after a highly negative reception.

Judge in the New York Times versus OpenAI copyright case is forcing OpenAI to turn over 20 million chat logs.

Arnold Kling notes that some see AI in education as a disaster, others as a boon.

Arnold Kling: I keep coming across strong opinions about what AI will do to education. The enthusiasts claim that AI is a boon. The critics warn that AI is a disaster.

It occurs to me that there is a simple way to explain these extreme views. Your prediction about the effect of AI on education depends on whether you see teaching as an adversarial process or as a cooperative process. In an adversarial process, the student is resistant to learning, and the teacher needs to work against that. In a cooperative process, the student is curious and self-motivated, and the teacher is working with that.

If you make the adversarial assumption, you operate on the basis that students prefer not to put effort into learning. Your job is to overcome resistance. You try to convince them that learning will be less painful and more fun than they expect. You rely on motivational rewards and punishments. Soft rewards include praise. Hard rewards include grades.

If you make the cooperative assumption, you operate on the basis that students are curious and want to learn. Your job is to be their guide on their journey to obtain knowledge. You suggest the next milestone and provide helpful hints for how to reach it.

… I think that educators who just reject AI out of hand are too committed to the adversarial assumption. They should broaden their thinking to incorporate the cooperative assumption.

I like to put this as:

  1. AI is the best tool ever invented for learning.

  2. AI is the best tool ever invented for not learning.

  3. Which way, modern man?

Sam Kriss gives us a tour of ChatGPT as the universal writer of text, that always uses the same bizarre style that everyone suddenly uses and that increasingly puts us on edge. Excerpting would rob it of its magic, so consider reading at least the first half.

Anthropic finds most workers use AI daily, but 69 percent hide it at work (direct link).

Kaustubh Saini: Across the general workforce, most professionals said AI helps them save time and get through more work. According to the study, 86% said AI saves them time and 65% were satisfied with the role AI plays in their job.

At the same time, 69% mentioned a stigma around using AI at work. One fact checker described staying silent when a colleague complained about AI and said they do not tell coworkers how much they use it.

… More than 55% of the general workforce group said they feel anxious about AI’s impact on their future.

Fabian: the reason ppl hide their AI use isn’t that they’re being shamed, it’s that the time-based labor compensation model does not provide economic incentives to pass on productivity gains to the wider org

so productivity gains instead get transformed to “dark leisure”

This is obviously different in (many) startups

And different in SV culture

But that is about 1-2% of the economy

As usual, everyone wants AI to augment them and do the boring tasks like paperwork, rather than automate or replace them, as if they had some voice in how that plays out.

Do not yet turn the job of ‘build my model of how many jobs the AIs will take’ over to ChatGPT, as the staff of Bernie Sanders did. As you can expect, the result was rather nonsensical. They then suggest responses like ‘move to a 32 hour work week with no loss in pay,’ along with requirements that 20% of company profits be distributed to workers, that workers control at least 45% of all corporate boards, that union membership double, and that paid family and medical leave be guaranteed. Then, presumably to balance the fact that all of that would hypercharge the push to automate everything, they want to enact a ‘robot tax.’

Say goodbye to the billable hour, hello to outcome-based legal billing? Good.

From the abundance and ‘things getting worse’ debates, a glimpse of the future:

Joe Wiesenthal: Do people who say that “everything is getting worse” not remember what eating at restaurants was like just 10 years ago, before iPad ordering kiosks existed, and sometimes your order would get written down incorrectly?

Even when progress is steady in terms of measured capabilities, inflection points and rapid rise in actual uses is common. Obsolescence comes at you fast.

Andy Jones (Anthropic): So after all these hours talking about AI, in these last five minutes I am going to talk about: Horses.

Engines, steam engines, were invented in 1700. And what followed was 200 years of steady improvement, with engines getting 20% better a decade. For the first 120 years of that steady improvement, horses didn’t notice at all. Then, between 1930 and 1950, 90% of the horses in the US disappeared. Progress in engines was steady. Equivalence to horses was sudden.

But enough about horses. Let’s talk about chess!

Folks started tracking computer chess in 1985. And for the next 40 years, computer chess would improve by 50 Elo per year. That meant in 2000, a human grandmaster could expect to win 90% of their games against a computer. But ten years later, the same human grandmaster would lose 90% of their games against a computer. Progress in chess was steady. Equivalence to humans was sudden.

Enough about chess! Let’s talk about AI. Capital expenditure on AI has been pretty steady. Right now we’re – globally – spending the equivalent of 2% of US GDP on AI datacenters each year. That number seems to have steadily been doubling over the past few years. And it seems – according to the deals signed – likely to carry on doubling for the next few years.

Andy Jones (Anthropic): But from my perspective, from equivalence to me, it hasn’t been steady at all. I was one of the first researchers hired at Anthropic.

This pink line, back in 2024, was a large part of my job. Answer technical questions for new hires. Back then, me and other old-timers were answering about 4,000 new-hire questions a month. Then in December, Claude finally got good enough to answer some of those questions for us. In December, it was some of those questions. Six months later, 80% of the questions I’d been being asked had disappeared.

Claude, meanwhile, was now answering 30,000 questions a month; eight times as many questions as me & mine ever did.

Now. Answering those questions was only part of my job.

But while it took horses decades to be overcome, and chess masters years, it took me all of six months to be surpassed.

Gallabytes (Anthropic): it’s pretty crazy how much Claude has smoothed over the usually rocky experience of onboarding to a big company with a big codebase. I can ask as many really stupid questions as I want and get good answers fast without wasting anyone’s time 🙂

We have new polling on this from Blue Rose Research. Full writeup here.

People choose ‘participation-based’ compensation over UBI, even under conditions where by construction there is nothing useful for people to do. The people demand Keynesian stimulus, to dig holes and fill them up, to earn their cash, although most of all they do demand that cash one way or another.

The people also say ‘everyone should earn an equal share’ of the AI that replaces labor, but the people have always wanted to take collective ownership of the means of production. There’s a word for that.

I expect that these choices are largely far mode and not so coherent, and will change when the real situation is staring people in the face. Most of all, I don’t think people are comprehending what ‘AI does almost any job better than humans’ means, even if we presume humans somehow retain control. They’re thinking narrowly about ‘They Took Our Jobs’ not the idea that actually nothing you do is that useful.

Meanwhile David Sacks continues to rant that this is all due to some vast Effective Altruist conspiracy, despite this accusation making absolutely zero sense – most Effective Altruists are pro-technology, actively like AI, and advocate for its use and diffusion; they are simply concerned about frontier model downside risk. And the reasons regular Americans say they dislike AI have exactly zero to do with the concerns such groups have; indeed such groups actively push back against those other concerns on the regular, such as worries about water usage.

Sacks’s latest target for blame on this is Vitalik Buterin, the co-founder of Ethereum, which is an odd choice for the crypto czar and not someone I would want to go after completely unprovoked, but there you go, it’s a play he can make I suppose.

I looked again at AISafety.com, which looks like a strong resource for exploring the AI safety ecosystem. They list jobs and fellowships, funding sources, media outlets, events, advisors, self-study materials and potential tools for you to help build.

Ajeya Cotra has left Coefficient Giving and is exploring AI safety opportunities.

Charles points out that the value of donating money to AI safety causes in non-bespoke ways is about to drop quite a lot, because of the expected deployment of a vast amount of philanthropic capital from Anthropic equity holders. If an organization or even individual is legible and clearly good, once Anthropic gets an IPO there is going to be funding.

If you have money to give, that puts an even bigger premium than usual on getting that money out the door soon. Right now there’s a shortage of funding even for obvious opportunities, in the future that likely won’t be the case.

That also means that if you are planning on earning to give, to any cause you would expect Anthropic employees to care about, that only makes sense in the longer term if you are capable of finding illegible opportunities, or you can otherwise do the work to differentiate the best opportunities and thus give an example to follow. You’ll need unique knowledge, and to do the work, and to be willing to be bold. However, if you are bold and you explain yourself well, your example could then carry a multiplier.

OpenAI, Anthropic and Block, with the support of Google, Microsoft, Bloomberg, AWS and Cloudflare, found the Agentic AI Foundation under the Linux Foundation. Anthropic is contributing the Model Context Protocol. OpenAI is contributing Agents.md. Block is contributing Goose.

This is an excellent use of open source, great job everyone.

Matt Parlmer: Fantastic development, we already know how to coordinate large scale infrastructure software engineering, AI is no different.

Also, oh no:

The Kobeissi Letter: BREAKING: President Trump is set to announce a new AI platform called “Truth AI.”

OpenAI appoints Denise Dresser as Chief Revenue Officer. My dream job, and he knows that.

Google gives us access to AlphaEvolve.

Gemini 3 Deep Think is now available for Google AI Ultra Subscribers, if you can outwit Google and figure out how to be one of those.

If you do have it, you select ‘Deep Think’ in the prompt bar, then ‘Thinking’ from the model drop down, then type your query.

On the one hand Opus 4.5 is missing from their slides (thanks Kavin for fixing this), on the other hand I get it, life comes at you fast and the core point still stands.

Demis Hassabis: With its parallel thinking capabilities it can tackle highly complex maths & science problems – enjoy!

I presume, based on previous experience with Gemini 2.5 Deep Think, that if you want the purest thinking and ‘raw G’ mode that this is now your go-to.

DeepMind expands its partnership with UK AISI to share model access, issue joint reports, do more collaborative safety and security research and hold technical discussions.

OpenAI gives us The State of Enterprise AI. Usage is up, as in way up, as in 8x message volumes and 320x reasoning token volumes, and workers and employees surveyed reported productivity gains. A lot of this is essentially new so thinking about multipliers on usage is probably not the best way to visualize the data.

OpenAI post explains their plans to strengthen cyber resilience, with the post reading like AI slop without anything new of substance. GPT-5.1 thinks the majority of the text comes from itself. Shame.

Anthropic partners with Accenture. Anthropic claims 40% enterprise market share.

I almost got even less of a break: Meta’s Llama successor, codenamed Avocado, was reportedly pushed back from December into Q1 2026. It sounds like they’re quietly questioning their open source approach as capabilities advance, as I speculated and hoped they might.

We now write largely for the AIs, both in terms of training data and when AIs use search as part of inference. Thus the strong reactions and threats to leave Substack when an incident suggested that Substack might be blocking AIs from accessing its articles. I have not experienced this issue, ChatGPT and Claude are both happily accessing Substack articles for me, including my own. If that ever changes, remember that there is a mirror on WordPress and another on LessWrong.

The place that actually does not allow access is Twitter, I presume in order to give an edge to Grok and xAI, and this is super annoying, often I need to manually copy Twitter content. This substantially reduces the value of Twitter.

Manthan Gupta analyzes how OpenAI memory works, essentially inserting the user facts and summaries of recent chats into the context window. That means memory functions de facto as additional custom system instructions, so use it accordingly.
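To make that concrete, here is a minimal sketch of what ‘memory as injected context’ looks like (my own reconstruction from the description above, not OpenAI’s actual implementation; all facts and names here are made up):

```python
# Hypothetical reconstruction of memory-as-context: stored user facts and
# summaries of recent chats are simply prepended to the conversation,
# so they behave like extra custom system instructions.
memory_facts = [
    "User is a software engineer based in Berlin.",
    "User prefers concise answers with code examples.",
]
recent_chat_summaries = [
    "Yesterday: helped debug a flaky pytest suite.",
]

def build_messages(user_prompt: str) -> list[dict]:
    memory_block = "\n".join(
        ["Known facts about the user:"]
        + [f"- {fact}" for fact in memory_facts]
        + ["Summaries of recent conversations:"]
        + [f"- {summary}" for summary in recent_chat_summaries]
    )
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        # The memory block rides along as de facto system instructions.
        {"role": "system", "content": memory_block},
        {"role": "user", "content": user_prompt},
    ]

print(build_messages("Help me write a database migration script."))
```

The practical upshot: whatever memory stores about you behaves as if you had typed it into your custom instructions yourself, so curate it the same way.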

Secretary of War Pete Hegseth, who has reportedly been known to issue the order ‘kill them all’ without a war or due process of law, has new plans.

Pete Hegseth (Secretary of War): Today, we are unleashing GenAI.mil

This platform puts the world’s most powerful frontier AI models directly into the hands of every American warrior.

We will continue to aggressively field the world’s best technology to make our fighting force more lethal than ever

Department of War: The War Department will be AI-first.

GenAI.mil puts the most cutting edge AI capabilities into the hands of 3 million @DeptofWar personnel.

Unusual Whales: Pentagon has been ordered to form an AI steering committee on AGI.

Danielle Fong: man, AI and lethal do not belong in the same sentence.

This is inevitable, and also a good thing given the circumstances. We do not have the luxury of saying AI and lethal do not belong in the same sentence, if there is one place we cannot pause this would be it, and the threat to us is mostly orthogonal to the literal weapons themselves while helping people realize the situation. Hence my longstanding position in favor of building the Autonomous Killer Robots, and very obviously we need AI assisting the war department in other ways.

If that’s not a future you want, you need to impact AI development in general. Trying to specifically not apply it to the War Department is a non-starter.

Meta plans deep cuts in Metaverse efforts, stock surges.

The stock market continues to punish companies linked to OpenAI, with many worried that Google is now winning, despite events being mostly unsurprising. An ‘efficient market’ can still be remarkably time inconsistent, if it can’t be predicted.

Jerry Kaplan calls it an AI bubble, purely on priors:

  1. Technologies take time to realize real gains.

  2. There are many AI companies, we should expect market concentration.

  3. Concerns about Chinese electricity generation and chip development.

  4. Yeah, yeah, you say ‘this time is different,’ never is, sorry.

  5. OpenAI and ChatGPT’s revenue is 75% subscriptions.

  6. The AI companies will need to make a lot of money.

Especially amusing is the argument that ‘OpenAI makes its money on subscriptions not on business income,’ therefore all of AI is a bubble, when Anthropic is the one dominating the business use case. If you want to go long Anthropic and short OpenAI, that’s hella risky but it’s not a crazy position.

Seeing people call it a bubble on the basis of such heuristics should update you towards it being less of a bubble. You know who you are trading against.

Matthew Yglesias: The AI investment boom is driven by genuine increases in revenue.

“Every year for the past 3 years, Anthropic has grown revenue by 10x. $1M to $100M in 2023, $100M to $1B in 2024, and $1B to $10B in 2025”

Paul Graham: The AI boom is definitely real, but this may not be the best example to prove it. A lot of that increase in revenue has come directly from the pockets of investors.

Paul’s objection is a statement about what is convincing to skeptics.

If you’re paying attention, you’d say: So what, if the use and revenue are real?

Your investors also being heavy users of your product is an excellent sign, provided the intention is to get mundane utility from the product rather than to manipulate the revenue numbers. In the case of Anthropic, it seems rather obvious that the $10 billion is not an attempt to trick us.

However, a lot of this is people looking at heuristics that superficially look sus. To defeat such suspicions, you need examples immune from such heuristics.

Derek Thompson, in his 26 ideas for 2026, says AI is eating the economy and will soon dominate politics, including a wave of anti-AI populism. Most of the post is about economic and cultural conditions more broadly, and how young people are in his view increasingly isolated, despairing and utterly screwed.

New Princeton and Camus Energy study suggests flexible grid connections and BYOC cut data center interconnection down to ~2 years and can solve the political barriers. I note that 2 years is still a long time, and that the hyperscalers are working faster than that by not trying to get grid connections.

Arnold Kling reminds me of a quote I didn’t pay enough attention to the first time:

Dwarkesh Patel: Models keep getting more impressive at the rate the short timelines people predict, but more useful at the rate the long timelines people predict.

I would correct ‘more useful’ to ‘provides value to people,’ as I continue to believe a lot of the second trend is a skill issue and people being slow to adjust, but sure.

Something’s gotta give. Sufficiently advanced AI would see far more use and provide far more value.

  1. If the first trend continues, the second trend will accelerate.

  2. If the second trend continues, the first trend will stop.

I mention this one because Sriram Krishnan pointed to it: There is a take by Tim Dettmers that AGI will ‘never’ happen because ‘computation is physical’ and AI systems have reached their physical limits the same way humans have (due to limits imposed by the requirements of pregnancy, wait what?), and transformers are optimal the same way human brains are, together with the associated standard half-baked points that self-improvement requires physical action and so on.

It also uses an AGI definition that includes ‘solving robotics’ to help justify this, although I expect robotics to get ‘solved’ within a few decades at most even without recursive self-improvement. The post even says that scaling improvements in 2025 were ‘not impressive’ as evidence that we are hitting permanent limits, a claim that does not square with what actually happened in 2025, or with how permanent limits work.

Boaz Barak of OpenAI tries to be polite about there being some good points, while emphasizing (in nicer words than I use here) that it is absurdly absolute and the conclusion makes no sense. This follows in a long tradition of ‘whelp, no more innovations are possible, guess we’re at the limit, let’s close the patent office.’

Dean Ball: My entire rebuttal to Dettmers here could be summarized as “he extrapolates valid but narrow technical claims way too broadly with way too much confidence,” which is precisely what I (and many others) critique the ultra-short timelines people for.

Yo Shavit (OpenAI): I am glad Tim’s sharing his opinion, but I can’t help but be disappointed with the post – it’s a lot of claims without any real effort to justify them or engage with counterpoints.

(A few examples: claiming the transformer arch is near-optimal when human brains exist; ignoring that human brain-size limits due to gestational energy transfer are exactly the kind of limiter a silicon system won’t be subject to; claiming that outside of factories, robotic automation of the economy wouldn’t be that big a deal because there isn’t much high value stuff to do.)

It seems like this piece either needs to cite way more sources to others who’ve made better arguments, or make those arguments himself, or just express that this essay is his best guess based on his experiences and drop the pretense of scientific deduction.

Gemini 3’s analysis here was so bad, both in terms of being pure AI slop and also buying some rather obviously wrong arguments, that I lost much respect for Gemini 3. Claude Opus 4.5 and GPT-5.1 did not make that mistake and spotted how absurd the whole thing is. It’s kind of hard to miss.

I would answer yes, in the sense that if you build a superintelligence that then kills everyone or takes control of the future that was probably too much.

But some people are saying that Claude Opus 4.5 is or is close to being ‘too much’ or ‘too good’? As in, it might make their coding projects finish too quickly and they won’t have any chill and They Took Our Jobs?

Or is it that it’s bumping up against ‘this is starting to freak me out’ and ‘I don’t want this to be smarter than a human’? We see a mix of both here.

Ivan Fioravanti: Opus 4.5 is too good to be true. I think we’ve reached the “more than good enough” level; everything beyond this point may even be too much.

John-Daniel Trask: We’re on the same wave length with this one Ivan. Just obliterating the roadmap items.

Jay: Literally can do what would be a month of work in 2022 in 1 day. Maybe more.

Janus: I keep seeing versions of this sentiment: the implication that more would be “too much”. I’m curious what people mean & if anyone can elaborate on the feeling

Hardin: “My boss might start to see the Claude Max plan as equal or better ROI than my salary” most likely.

Singer: I resonate with this. It’s becoming increasingly hard to pinpoint what frontier models are lacking. Opus 4.5 is beautiful, helpful, and knowledgeable in all the ways we could demand of it, without extra context or embodiment. What does ‘better than this’ even mean?

Last week had the fun item that only recently did Senator Josh Hawley bother to try out ChatGPT one time.

Bryan Metzger (Business Insider) on December 3, 2025: Sen. Josh Hawley, one of the biggest AI critics in the Senate, told me this AM that he recently decided to try out ChatGPT.

He said he asked a “very nerdy historical question” about the “Puritans in the 1630s.”

“I will say, it returned a lot of good information.”

Hawley took a much harder line on this over the summer, telling me: “I don’t trust it, I don’t like it, I don’t want it being trained on any of the information I might give it.”

He also wants to ban driverless cars and ban people under 18 from using AI.

Senator Josh Hawley: Oh, no [I am not changing my tune on AI]. I mean listen, I think that if people want to, adults want to use AI to do research or whatever, that’s fine. The bigger issue is not any one individual’s usage. It is children, number one, and their safety, which is why we got to ban chatbots for minors. And then it’s the overall effects in the marketplace, with displacing whole jobs. That, to me, is the big issue.

The news is not that Senator Hawley had never tried ChatGPT. He told us that back in July. The news is that:

  1. Senator Hawley has now tried ChatGPT once.

  2. People only now are realizing he had never tried it before.

Senator Hawley really needs to try LLMs, many of them and a lot more than once, before trying to be a major driver of AI regulations.

But also it seems like malpractice for those arguing against Hawley to only realize this fact about Hawley this week, as opposed to back in the summer, given the information was in Business Insider in July, and to have spent this whole time not pointing it out?

Kevin Roose (NYT): had to check the date on this one.

i have stopped being shocked when AI pundits, people who think and talk about AI for a living, people who are *writing and sponsoring AI legislation* admit that they never use it, because it happens so often. but it is shocking!

Paul Graham: How can he be a “big AI critic” and not have even tried ChatGPT till now? He has less experience of AI than the median teenager, and he feels confident enough to talk about AI policy?

Yes [I was] genuinely surprised.

A willingness to sell H200s to China is raising a lot of supposedly answered questions.

Adam Ozimek: If your rationalization of the trade war was that it was necessary to address the geopolitical threat of China, I think it is time to reconsider.

Michael Sobolik (on the H200 sales): In what race did a runner win by equipping an opponent? In what war had a nation ever gained decisive advantage by arming its adversary? This is a mistake.

Kyle Morse: Proof that the Big Tech lobby’s “national security” argument was always a hoax.

David Sacks and some others tried to recast the ‘AI race’ as ‘market share of AI chips sold,’ but people retain common sense and are having none of this.

House Select Committee on China supports the bipartisan Stop Stealing Our Chips Act, which creates an Export Compliance Accountability Fund for whistleblowers. I haven’t done a full RTFB but if the description is accurate then we should pass this.

It would help our AI efforts if we were equally smart and used all sources of power.

Donald Trump: China has very few wind farms. You know why? Because they’re smart. You know what they do have? A lot of coal … we don’t approve windmills.

Nicolas Fulghum: This is of course false.

China is the #1 generator of electricity from wind globally with over 2x more than #2 … the United States.

In the US, wind already produces ~2x more electricity than hydro. It could be an important part of a serious plan to meet AI-driven load growth.

The Congress rejected preemption once again, so David Sacks announced the White House is going to try and do some of it via Executive Order, which Donald Trump confirmed.

David Sacks: ONE RULEBOOK FOR AI.

What is that rulebook?

A blank sheet of paper.

There is no ‘federal framework.’ There never was.

This is an announcement that AI preemption will be fully without replacement.

Their offer is nothing. 100% nothing. Existing non-AI law technically applies. That’s it.

Sacks’s argument is, essentially, that state laws are partisan, and we don’t need laws.

Here is the part that matters and is actually new, the ‘4 Cs’:

David Sacks: But what about the 4 C’s? Let me address those concerns:

1. Child safety – Preemption would not apply to generally applicable state laws. So state laws requiring online platforms to protect children from online predators or sexually explicit material (CSAM) would remain in effect.

2. Communities – AI preemption would not apply to local infrastructure. That’s a separate issue. In short, preemption would not force communities to host data centers they don’t want.

3. Creators – Copyright law is already federal, so there is no need for preemption here. Questions about how copyright law should be applied to AI are already playing out in the courts. That’s where this issue will be decided.

4. Censorship – As mentioned, the biggest threat of censorship is coming from certain Blue States. Red States can’t stop this – only President Trump’s leadership at the federal level can.

In summary, we’ve heard the concerns about the 4 C’s, and the 4 C’s are protected.

But there is a 5th C that we all need to care about: competitiveness. If we want America to win the AI race, a confusing patchwork of regulation will not work.

Sacks wants to destroy any and all attempts to require transparency from frontier model developers, or otherwise address frontier safety concerns. He’s not even willing to give lip service to AI safety. At all.

His claim that ‘the 4Cs are protected’ is also absurd, of course.

I do not expect this attitude to play well.

AI preemption of state laws is deeply unpopular, we have numbers via Brad Wilcox:

Also, yes, someone finally made the correct meme.

Peter Wildeford: The entire debate over AI pre-emption is a huge trick.

I do prefer one national law over a “patchwork of state regulation”. But that’s not what is being proposed. The “national law” part is being skipped. It’s just stopping state law and replacing it with nothing.

That is with the obligatory ‘Dean Ball offered a serious proposed federal framework that could be the basis of a win-win negotiation.’ He totally did do that, which was great, but none of the actual policymakers have shown any interest.

The good news is that I also do not expect the executive order to curtail state laws. The constitutional challenges involved are, according to my legal sources, extremely weak. Similar executive orders have been signed for climate change, and seem to have had no effect. The only part likely to matter is the threat to withhold funds, which is limited in scope, very obviously not the intent of the law Trump is attempting to leverage, and highly likely to be ruled illegal by the courts.

The point of the executive order is not to actually shut down the state laws. The point of the executive order is that this administration hates to lose, and this is a way to, in their minds, save some face.

It is also now, in the wake of the H200 decision, far more difficult to play the ‘cede ground to China’ card. These are the first five responses to Cruz, in order, and the pattern continues, with a side of defenses of states’ rights and no one supporting Cruz:

Senator Ted Cruz (R-Texas): Those disagreeing with President Trump on a nationwide approach to AI would cede ground to China.

If China wins the AI race, the world risks an order built on surveillance and coercion. The President is exactly right that the U.S. must lead in AI and cannot allow blue state regulation to choke innovation and stifle free speech.

OSINTdefender: You mean the same President Trump who just approved the sale of Nvidia’s AI Chips to China?

petebray: Ok and how about chips to China then?

Brendan Steinhauser: Senator, with respect, we cannot beat China by selling them our advanced chips.

Would love to see you speak out against that particular policy.

Lawrence Colburn: Why, then, would Trump approve the sale of extremely valuable AI chips to China?

Mike in Houston: Trump authorized NVDIA sales of their latest generation AI chips to China (while taking a 25% cut). He’s already ceding the field in a more material way than state regulations… and not a peep from any of you GOP AI & NatSec “hawks. Take a seat.

The House Select Committee on China is not happy about the H200 sales. The question, as the comments ask, is what is Congress going to do about it?

The good news is that it looks like the Chinese are going to try to save us from ourselves one more time?

Megatron: China refuses to accept Nvidia chips

Despite President Trump authorizing the sale of Nvidia H200 chips to China, China refuses to accept them and increase restrictions on their use – Financial Times.

Teortaxes: No means NO, chud

Well, somewhat.

Zijing Wu (Financial Times): Buyers would probably be required to go through an approval process, the people said, submitting requests to purchase the chips and explaining why domestic providers were unable to meet their needs. The people added that no final decision had been made yet.

Reuters: ByteDance and Alibaba (9988.HK) have asked Nvidia (NVDA.O) about buying its powerful H200 AI chip after U.S. President Donald Trump said he would allow it to be exported to China, four people briefed on the matter told Reuters.

The officials told the companies they would be informed of Beijing’s decision soon, The Information said, citing sources.

Very limited quantities of H200 are currently in production, two other people familiar with Nvidia’s supply chain said, as the U.S. chip giant has been focused instead on its most advanced Blackwell and upcoming Rubin lines.

The purchases are expected to be made in a ‘low key manner’ but done in size, although the number of H200s currently in production could become another limiting factor.

Why is the PRC so reluctant, never mind what its top AI labs might say?

Maybe it’s because they’re too busy smuggling Blackwells?

The Information: Exclusive: DeepSeek is developing its next major AI model using Nvidia’s Blackwell chips, which the U.S. has forbidden from being exported to China.

Maybe it’s because the Chinese are understandably worried about what happens when all those H200 chips go to America first for ‘special security reviews,’ or America restricting which buyers can purchase the chips. Maybe it’s the (legally dubious) 25% cut. Maybe it’s about dignity. Maybe they are emphasizing self-reliance and don’t understand the trade-offs and what they’re sacrificing.

My guess is this is the kind of high-level executive decision where Xi says ‘we are going to rely on our own domestic chips, the foreign chips are unreliable’ and this becomes a stop sign that carries the day. It’s a known weakness of authoritarian regimes and of China in particular, to focus on high level principles even in places where it tactically makes no sense.

Maybe China is simply operating on the principle that if we are willing to sell, there is a reason, so they should refuse to buy.

No matter which one it is? You love to see it.

If we offer to sell, and they say no, then that’s a small net win. It’s not that big of a win versus not making the mistake in the first place, and it risks us making future mistakes, but yeah if you can ‘poison the pill’ sufficiently that the Chinese refuse it, then that’s net good.

The big win would be if this causes the Chinese to crack down on chip smuggling. If they don’t want to buy the H200s straight up, perhaps they shouldn’t want anyone smuggling them either?

Ben Thompson as expected takes the position defending H200 sales, because giving America an advantage over China is a bad thing and we shouldn’t have it.

No, seriously, his position is that America’s edge in chips is destabilizing, so we should give away that advantage?

Ben Thompson: However, there are three big problems with this point of view.

  • First, I think that one country having a massive military advantage results in an unstable equilibrium; to reach back to the Cold War and nuclear as an obvious analogy, mutually assured destruction actually ended up being much more stable.

  • Second, while the U.S. did have such an enviable position after the dissolution of the Soviet Union, that technological advantage was married to a production advantage; today, however, it is China that has the production advantage, which I think would make the situation even more unstable.

  • Third, U.S. AI capabilities are dependent on fabs in Taiwan, which are trivial for China to destroy, at massive cost to the entire world, particularly the United States.

Thompson presents this as primarily a military worry, which is an important consideration but seems tertiary to me behind economic and frontier capability considerations.

Another development since Tuesday: it has come out that this sale is officially based on a straight-up technological misconception, namely that Huawei could match the H200s.

Edward Ludlow and Maggie Eastland (Bloomberg): President Donald Trump decided to let Nvidia Corp. sell its H200 artificial intelligence chips to China after concluding the move carried a lower security risk because the company’s Chinese archrival, Huawei Technologies Co., already offers AI systems with comparable performance, according to a person familiar with the deliberations.

… The move would give the US an 18-month advantage over China in terms of what AI chips customers in each market receive, with American buyers retaining exclusive access to the latest products, the person said.

… “This is very bad for the export of the full AI stack across the world. It actually undermines it,” said McGuire, who served in the White House National Security Council under President Joe Biden. “At a time when the Chinese are squeezing us as hard as they can over everything, why are we conceding?”

Ben Thompson: Even if we grant that the CloudMatrix 384 has comparable performance to an Nvidia NVL72 server — which I’m not completely prepared to do, but will for purposes of this point — performance isn’t all that matters.

House Select Committee on China: Right now, China is far behind the United States in chips that power the AI race.

Because the H200s are far better than what China can produce domestically, both in capability and scale, @nvidia selling these chips to China could help it catch up to America in total compute.

Publicly available analysis indicates that the H200 provides 32% more processing power and 50% more memory bandwidth than China’s best chip. The CCP will use these highly advanced chips to strengthen its military capabilities and totalitarian surveillance.

Finally, Nvidia should be under no illusions – China will rip off its technology, mass produce it themselves, and seek to end Nvidia as a competitor. That is China’s playbook and it is using it in every critical industry.

McGuire’s point is the most important one. Let’s say you buy the importance of the American ‘tech stack’ meaning the ability to sell fully Western AI service packages that include cloud services, chips and AI models. The last thing you would do is enable the easy creation of a hybrid stack such as Nvidia-DeepSeek. That’s a much bigger threat to your business, especially over the next few years, than Huawei-DeepSeek. Huawei chips are not as good and available in highly limited quantities.

We can hope that this ‘18-month advantage’ principle does not get extended into the future. We are of course talking price; if it were a 6-year advantage, pretty much everyone would presumably be fine with it. Eighteen months is far too low a price, as these chips have useful lives of 5+ years.

Nathan Calvin: Allowing H20 exports seemed like a close call, in contrast to exporting H200s which just seems completely indefensible as far as I can tell.

I thought the H20 decision was not close, because China is severely capacity constrained, but I could see the case that it was sufficiently far behind to be okay. With the H200 I don’t see a plausible defense.

Senator Brian Schatz (D-Hawaii): Why the hell is the President of the United States willing to sell some of our best chips to China? These chips are our advantage and Trump is just cashing in like he’s flipping a condo. This is one of the most consequential things he’s done. Terrible decision for America.

Senator Elizabeth Warren (D-Massachusetts): After his backroom meeting with Donald Trump and his company’s donation to the Trump ballroom, CEO Jensen Huang got his wish to sell the most powerful AI chip we’ve ever sold to China. This risks turbocharging China’s bid for technological and military dominance and undermining U.S. economic and national security.

Senator Ruben Gallego (D-Arizona): Supporting American innovation doesn’t mean ignoring national security. We need to be smart about where our most advanced computing power ends up. China shouldn’t be able to repurpose our technology against our troops or allies.

And if American companies can strengthen our economy by selling to America first and only, why not take that path?

Senator Chuck Schumer (D-New York): Trump announced he was giving the green light for Nvidia to send even more powerful AI chips to China. This is dangerous.

This is a terrible deal, all at the expense of our national security. Trump must reverse course before it’s too late.

There are some excellent questions here, especially in that last section.

Senator Bernie Sanders (I-Vermont): Yes. We have to worry about AI and robotics.

Some questions:

Eliezer Yudkowsky: Thanks for asking the obvious questions! More people on all political sides ought to!

Indeed. Don’t be afraid to ask the obvious questions.

It is perhaps helpful to see the questions asked with a ‘beginner mind.’ Bernie Sanders isn’t asking about loss of control or existential threat because of a particular scenario. He’s asking for the even better reason that building something that surpasses our intelligence is an obviously dangerous thing to do.

Peter Wildeford talks with Ronny Chieng on The Daily Show. I laughed.

Buck Shlegeris is back to talk more about AI control.

Amanda Askell AMA.

Rational Animations video on near-term AI risks.

Dean Ball goes on 80,000 Hours. A sign of the times.

PSA on AI and child safety in opposition to any moratorium, narrated by Juliette Lewis. I am not the target.

Nathan Young and Rob built a dashboard of various estimates of timelines to AGI.

There’s trickiness around different AGI definitions, but the overall story is clear and it is consistent with what we have seen from various insiders and experts.

Timelines shortened dramatically in 2022, then shortened further in 2023 and stayed roughly static in 2024. There was some lengthening of timelines during 2025, but timelines remain longer than they were in 2023 even ignoring that two of those years are now gone.

If you use the maximalist ‘better at everything and in every way on every digital task’ definition, then that timeline is going to be considerably longer, which is why this average comes in at 2030.

Julian Togelius thinks we should delay scientific progress and curing cancer because if an AI does it we will lose the joy of humans discovering it themselves.

I think we should be wary while developing frontier AI systems, because they are likely to kill literally everyone, and we should invest heavily in ensuring that goes well; but subject to that, obviously we should be advancing science and curing cancer as fast as possible.

We are very much not the same.

Julian Togelius: I was at an event on AI for science yesterday, a panel discussion here at NeurIPS. The panelists discussed how they plan to replace humans at all levels in the scientific process. So I stood up and protested that what they are doing is evil. Look around you, I said. The room is filled with researchers of various kinds, most of them young. They are here because they love research and want to contribute to advancing human knowledge. If you take the human out of the loop, meaning that humans no longer have any role in scientific research, you’re depriving them of the activity they love and a key source of meaning in their lives. And we all want to do something meaningful. Why, I asked, do you want to take the opportunity to contribute to science away from us?

My question changed the course of the panel, and set the tone for the rest of the discussion. Afterwards, a number of attendees came up to me, either to thank me for putting what they felt into words, or to ask if I really meant what I said. So I thought I would return to the question here.

One of the panelists asked whether I would really prefer the joy of doing science to finding a cure for cancer and enabling immortality. I answered that we will eventually cure cancer and at some point probably be able to choose immortality. Science is already making great progress with humans at the helm.

… I don’t exactly know how to steer AI development and AI usage so that we get new tools but are not replaced. But I know that it is of paramount importance.

Andy Masley: It is honestly alarming to me that stuff like this, the idea that we ought to significantly delay curing cancer exclusively to give human researchers the personal gratification of finding it without AI, is being taken seriously at conferences

Sarah: Human beings will ofc still engage in science as a sport, just as chess players still play chess despite being far worse than SOTA engines. Nobody is taking away science from humans. Moreover, chess players still get immense satisfaction from the sport despite the fact they aren’t the best players of the game on the planet.

But to the larger point of allowing billions of people to needlessly suffer (and die) to keep an inflated sense of importance in our contributions – ya this is pretty textbook evil and is a classic example of letting your ego justify hurting literally all of humanity lol. Cartoon character level of evil.

So yes, I do understand that if you think that ‘build Sufficiently Advanced AIs that are superior to humans at all cognitive tasks’ is a safe thing to do and have no actually scary answers to ‘what could possibly go wrong?’ then you want to go as fast as possible, there’s lots of gold in them hills. I want it as much as you do, I just think that by default that path also gets us all killed, at which point the gold is not so valuable.

Julian doesn’t want ‘AI that would replace us’ because he is worried about the joy of discovery. I don’t want AI to replace us either, but that’s in the fully general sense. I’m sorry, but yeah, I’ll take immortality and scientific wonders over a few scientists getting the joy of discovery. That’s a great trade.

What I do not want to do is have cancer cured and AI in control over the future. That’s not a good trade.

The Pope continues to make obvious applause light statements, except we live in the timeline where the statements aren’t obvious, so here you go:

Pope Leo XIV: Human beings are called to be co-workers in the work of creation, not merely passive consumers of content generated by artificial technology. Our dignity lies in our ability to reflect, choose freely, love unconditionally, and enter into authentic relationships with others. Recognizing and safeguarding what characterizes the human person and guarantees their balanced growth is essential for establishing an adequate framework to manage the consequences of artificial intelligence.

Sharp Text responds to the NYT David Sacks hit piece, saying it missed the forest for the trees and focused on the wrong concerns, but that it is hard to have sympathy for Sacks because the article’s methods of insinuation are nothing Sacks hasn’t used on his podcast many times against liberal targets. I would agree with all that, and add that Sacks is constantly saying far worse, far less responsibly and in far more inflammatory fashion, on Twitter against those who are worried about AI safety. We all also agree that tech expertise is needed in the Federal Government. I would add that, while the particular conflicts raised by NYT are not that concerning, there are many better reasons to think Sacks is importantly conflicted.

Richard Price offers his summary of the arguments in If Anyone Builds It, Everyone Dies.

Clarification that will keep happening since morale is unlikely to improve:

Reuben Adams: There is an infinite supply of people “debunking” Yudkowsky by setting up strawmen.

“This view of AI led to two interesting views from a modern perspective: (a) AI would not understand human values because it would become superintelligent through interaction with natural laws”

The risk is not, and never has been, that AI won’t understand human values, but that it won’t care.

Apparently this has to be repeated endlessly.

This is in response to FleetingBits saying, essentially, ‘we figured out how to make LLMs have human values and how to make it not power seeking, and there will be many AIs, so the chance that superintelligent AI would be an existential risk is less than 1% except for misuse by governments.’

It should be obvious, when you put it that way, why that argument makes no sense, without the need to point out that the argument miscategorizes historical arguments and gets important logical points wrong.

It is absurd on its face. Creating superintelligent minds is not a safe thing to do, even if those minds broadly ‘share human values’ and are not inherently ‘power seeking.’

Yet people constantly make exactly this argument.

The AI ‘understanding’ human values, a step we have solved only approximately and superficially in a way that doesn’t generalize robustly, is only one step toward getting an AI to optimize for those human values even in out-of-distribution situations, let alone the (even harder) task of getting competing AIs to end up doing the same.

The fact that insufficiently capable LLMs do not by default ‘power seek’ in the strategic sense does not provide much evidence that future, more capable, more intelligent and more agentic AIs, when given goals best served by power seeking, won’t seek power. Of course they would. And those AIs that do seek power will be the ones that get power, and that multiply. Also we’ll often tell them, in so many words, ‘seek power.’

The percentage of people at NeurIPS who knew what AGI was reached an all-time high of 69.5%, versus last year’s 63%, as per Leo Gao’s survey.

If you have the weights of an open model, you can run it unsupervised, you can fine tune it to remove any safety training, you can look inside the weights to do a gradient attack, and generally if you care you can get it to do whatever you want.

Jonathan Greenblatt (ADL): The ability to easily manipulate open-source AI models to generate antisemitic content exposes a critical vulnerability in the AI ecosystem. We need industry leaders and policymakers to work together to ensure these tools cannot be misused to spread antisemitism and hate.

In this report’s case it was antisemitism, which the models were often happy to engage in with minimal prompting and certainly without any jailbreaking techniques. It’s more a chance to reiterate the principle that it will do whatever a user wants, if that user cares enough and the AI has the underlying capabilities. Plan accordingly.

UK AISI runs an experiment where a red team presents two sandbagging models (as in, they are pretending to be less capable than they are) and three benign models, and a blue team has to identify the sandbaggers using tools like transcript analysis. Training-based elicitation worked best.

The red team won. The blue detection team was fooled.
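For intuition on why training-based elicitation is a natural blue-team tool here, a minimal sketch of the idea (my own illustration, not UK AISI’s actual protocol; `evaluate` and `finetune` are caller-supplied placeholders for whatever eval and training stack is in use): a sandbagger’s measured capability should jump sharply after a brief fine-tune on correct demonstrations, since the capability was there all along, while a genuinely weak model’s should not.

```python
# Illustrative sketch only: flag a model as a suspected sandbagger if a brief
# supervised fine-tune on correct demonstrations produces an outsized jump in
# measured capability. `evaluate` and `finetune` are injected stand-ins, not
# any particular library's API.
def flag_suspected_sandbagger(model, demos, eval_set, evaluate, finetune,
                              jump_threshold: float = 0.15) -> bool:
    baseline = evaluate(model, eval_set)           # capability before elicitation
    elicited = finetune(model, demos, steps=100)   # short fine-tune on demonstrations
    post = evaluate(elicited, eval_set)            # capability after elicitation
    return (post - baseline) > jump_threshold      # big jump => capability was latent
```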

Sam Bowman of Anthropic guesses that the two key reasons Claude Opus 4.5 turned out more aligned were the soul spec, which the model was trained on directly in order to steer its self-image, and also the inclusion of alignment researchers in every part of the training, and being willing to adjust on the fly based on what was observed rather than adhering to a fixed recipe.

Anthropic introduces Selective GradienT Masking (SGTM). The idea is that you contain certain concepts within a subsection of the weights, and then you remove that section of the weights. That makes it much harder to undo than other methods even with adversarial fine tuning, potentially being something you could apply to open models. That makes it exciting, but if you delete the knowledge you actually delete the knowledge for all purposes.
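As a rough illustration of the underlying mechanic (my own toy sketch of the general idea, not Anthropic’s actual SGTM recipe; the model, data and mask here are all made up), the move is to mask gradients during training so that updates from the to-be-removed concept only ever land in a designated slice of the weights, then zero out that slice at the end:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 512))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

# Designate a "containment" slice: the first 128 hidden units of the first layer.
# (A real recipe would designate and mask weights throughout the network.)
contained = torch.zeros(1024, dtype=torch.bool)
contained[:128] = True

def mask_first_layer_grads(batch_is_target_concept: bool) -> None:
    # Target-concept batches may only update the contained slice;
    # all other batches may only update the rest of the layer.
    keep = contained if batch_is_target_concept else ~contained
    model[0].weight.grad[~keep] = 0.0
    model[0].bias.grad[~keep] = 0.0

for step in range(200):
    is_target = (step % 4 == 0)   # pretend every 4th batch carries the concept to remove
    x, y = torch.randn(32, 512), torch.randn(32, 512)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    mask_first_layer_grads(is_target)
    opt.step()

# "Removing" the concept afterwards: ablate the contained slice outright.
with torch.no_grad():
    model[0].weight[contained] = 0.0
    model[0].bias[contained] = 0.0
```

The hope, per the description above, is that once the concept has been funneled into that slice, there is much less latent knowledge left for adversarial fine-tuning to recover.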

What will powerful AIs want? Alex Mallen offers an excellent write-up and this graph:

As in, selection will choose those models and model features, fractally, that maximize being selected. The ways to be maximally fit at being selected (or at the ‘reward’ that causes such selection) are to maximize for the reward directly, to maximize consequences of being selected and thus the reward, or to be a selected-for kludge that happens to maximize it. At the limit, for any fixed target, those win out, and any flaw in your reward signal (your selection methods) will be fractally exploited.

Alex Mallen: The model predicts AI motivations by tracing causal pathways from motivation → behavior → selection of that motivation.

A motivation is “fit” to the extent its behaviors cause it to gain influence on the AI’s behavior in deployment.

One way to summarize the model: “seeking correlates of being selected is selected for”.

You can look at the causal graph to see what’s correlated with being selected. E.g., training reward is tightly correlated with being selected because it’s the only direct cause of being selected (“I have influence…”).

We see (at least) 3 categories of maximally fit motivations:

  1. Fitness-seekers: They pursue a close cause of selection. The classic example is a reward-seeker, but there’s others: e.g., an influence-seeker directly pursues deployment influence.

    In deployment, fitness-seekers might keep following local selection pressures, but it depends.

  2. Schemers: They pursue a consequence of selection—which can be almost any long-term goal. They’re fit because being selected is useful for nearly any long-term goal.

    Often considered scariest because arbitrary long-term goals likely motivate disempowering humans.

  3. Optimal kludges: Weighted collections of context-dependent motivations that collectively produce maximally fit behavior. These can include non-goal-directed patterns like heuristics or deontological constraints.

    Lots of messier-but-plausible possibilities lie in this category.

Importantly, if the reward signal is flawed, the motivations the developer intended are not maximally fit. Whenever following instructions doesn’t perfectly correlate with reward, there’s selection pressure against instruction-following. This is the specification gaming problem.

Implicit priors like speed and simplicity matter too in this model. You can also fix this by doing sufficiently strong selection in other ways to get the things you want over things you don’t want, such as held out evals, or designing rather than selecting targets. Humans do a similar thing, where we detect those other humans who are too strongly fitness-seeking or scheming or using undesired heuristics, and then go after them, creating anti-inductive arms races and plausibly leading to our large brains.

I like how this lays out the problem without having to directly name or assert many of the things that the model clearly includes and implies. It seems like a good place to point people, since these are important points that few understand.

What is the solution to such problems? One solution is a perfect reward function, but we definitely don’t know how to do that. A better solution is a contextually self-improving basin of targets.

FLI’s AI Safety Index has been updated for Winter 2025, full report here. I wonder if they will need to downgrade DeepSeek in light of the zero safety information shared about v3.2.

Luiza Jarovsky: – The top 3 companies from last time, Anthropic, OpenAI, and Google DeepMind, hold their position, with Anthropic receiving the best score in every domain.

– There is a substantial gap between these top three companies and the next tier (xAI, zAI, Meta, DeepSeek, and Alibaba Cloud), but recent steps taken by some of these companies show promising signs of improvement that could help close this gap in the next iteration.

– Existential safety remains the sector’s core structural failure, making the widening gap between accelerating AGI/superintelligence ambitions and the absence of credible control plans increasingly alarming.

– xAI and Meta have taken meaningful steps towards publishing structured safety frameworks, although limited in scope, measurability, and independent oversight.

– More companies have conducted internal and external evaluations of frontier AI risks, although the risk scope remains narrow, validity is weak, and external reviews are far from independent.

– Although there were no Chinese companies in the Top 3 group, reviewers noted and commended several of their safety practices mandated under domestic regulation.

– Companies’ safety practices are below the bar set by emerging standards, including the EU AI Code of Practice.

*Evidence for the report was collected up until November 8, 2025, and does not reflect the releases of Google DeepMind’s Gemini 3 Pro, xAI’s Grok 4.1, OpenAI’s GPT-5.1, or Anthropic’s Claude Opus 4.5.

Is it reasonable to expect people working at AI labs to sign a pledge saying they won’t contribute to a project that increases the chance of human extinction by 0.1% or more? Contra David Manheim you would indeed think this was a hard sell. It shouldn’t be, if you believe your project is on net increasing chances of extinction then don’t do the project. It’s reasonable to say ‘this has a chance of causing extinction but an as big or bigger chance of preventing it,’ there are no safe actions at this point, but one should need to at least make that case to oneself.

The trilemma is real, please submit your proposals in the comments.

Carl Feynman: I went to the Post-AGI Workshop. It was terrific. Like, really fun, but also literally terrifying. The premise was, what if we build superintelligence, and it doesn’t kill us, what does the future look like? And nobody could think of a scenario where simultaneously (a) superintelligence is easily buildable, (b) humans do OK, and (c) the situation is stable. A singleton violates (a). AI keeping humans as pets violates (b). And various kinds of singularities and wars and industrial explosions violate (c). My p(doom) has gone up; more of my probability of non-doom rests on us not building it, and less on post-ASI utopia.

There are those who complain that it’s old and busted to complain that those who have [Bad Take] on AI or who don’t care about AI safety only think that because they don’t believe in AGI coming ‘soon.’

The thing is, it’s very often true.

Tyler Tracy: I asked ~20 non-AI-safety people at NeurIPS for their opinion of the AI safety field. Some people immediately were like “this is really good.” But the response I heard most often was of the form “AGI isn’t coming soon, so these safety people are crazy.” This was surprising to me. I was expecting “the AGI will be nice to us” types of things, not a disbelief in powerful AI coming in the next 10 years.

Daniel Eth: Reminder that basically everyone agrees that if AGI is coming soon, then AI risk is a huge problem & AI safety a priority. True for AI researchers as well as the general public. Honest to god ASI accelerationists are v rare, & basically the entire fight is on “ASI plausibly soon”

Yes, people don’t always articulate this. Many fail the “but I did have breakfast” test, so it can be hard to get them to say “if ASI is soon then this is a priority but I think it’s far”, and they sometimes default to “that’s crazy”. But once they think it’s soon they’ll buy in

jsd: Not at all surprising to me. Timelines remain the main disagreement between the AI Safety community and the (non influence-weighted) vast majority of AI researchers.

Charles: So many disagreements on AI and the future just look like they boil down to disagreements about capabilities to me.

“AI won’t replace human workers” -> capabilities won’t get good enough.

“AI couldn’t pose an existential threat” -> capabilities won’t get good enough.

etc.

Are there those in the ‘the AI will be nice to us’ camp? Sure. They exist. But strangely, despite AI now being considered remarkably near by remarkably many people – 10 years to AGI is not that many years and 20 still is not all that many – there has increasingly been a shift to ‘the safety people are wrong because AGI is sufficiently far I do not have to care,’ with a side of ‘that is (at most) a problem for future Earth.’

A very good ad:

Aleks Bykhum: I understood it. [He didn’t at first.]

This one still my favorite.


AI #146: Chipping In Read More »

a-new-open-weights-ai-coding-model-is-closing-in-on-proprietary-options

A new open-weights AI coding model is closing in on proprietary options

On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves a 72.2 percent score on SWE-bench Verified, a benchmark that attempts to test whether AI systems can solve real GitHub issues, putting it among the top-performing open-weights models.

Perhaps more notably, Mistral didn’t just release an AI model; it also released a new development app called Mistral Vibe. It’s a command line interface (CLI) similar to Claude Code, OpenAI Codex, and Gemini CLI that lets developers interact with the Devstral models directly in their terminal. The tool can scan file structures and Git status to maintain context across an entire project, make changes across multiple files, and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.
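
For developers who would rather script against the model than drive it through a CLI, Devstral should be reachable like any other Mistral chat model. Here is a minimal sketch using the mistralai Python SDK; treat the "devstral-2" identifier as a placeholder assumption, since the exact API model name for this release isn't confirmed here.

```python
import os

from mistralai import Mistral  # pip install mistralai

# Hypothetical model identifier for illustration; check Mistral's model list
# for the real name of the Devstral 2 release.
MODEL_ID = "devstral-2"

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model=MODEL_ID,
    messages=[
        {"role": "system", "content": "You are a careful software engineering assistant."},
        {"role": "user", "content": "Explain this traceback and propose a minimal fix:\n\nKeyError: 'user_id'"},
    ],
)

# The SDK returns a chat-completion-style object with choices/message/content.
print(response.choices[0].message.content)
```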

It’s always wise to take AI benchmarks with a large grain of salt, but we’ve heard from employees of the big AI companies that they pay very close attention to how well models do on SWE-bench Verified, which presents AI models with 500 real software engineering problems pulled from GitHub issues in popular Python repositories. The AI must read the issue description, navigate the codebase, and generate a working patch that passes unit tests. While some AI researchers have noted that around 90 percent of the tasks in the benchmark test relatively simple bug fixes that experienced engineers could complete in under an hour, it’s one of the few standardized ways to compare coding models.
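
If you want to see what those 500 problems actually look like, the benchmark is easy to inspect. The sketch below assumes the dataset is still published on Hugging Face under princeton-nlp/SWE-bench_Verified and that the field names haven't changed; adjust as needed.

```python
from datasets import load_dataset  # pip install datasets

# SWE-bench Verified: 500 human-validated GitHub issues from popular Python repos.
tasks = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
print(f"{len(tasks)} tasks")

example = tasks[0]
print(example["repo"])               # repository the issue came from
print(example["problem_statement"])  # the GitHub issue text the model must resolve
print(example["base_commit"])        # commit the model's patch is applied against
```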

Alongside the larger model, Mistral also released Devstral Small 2, a 24 billion parameter version that scores 68 percent on the same benchmark and can run locally on consumer hardware like a laptop, with no Internet connection required. Both models support a 256,000-token context window, allowing them to process moderately large codebases (although what counts as large or small depends heavily on overall project complexity). The company released Devstral 2 under a modified MIT license and Devstral Small 2 under the more permissive Apache 2.0 license.
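
For a rough sense of what fits in a 256,000-token window, the usual back-of-envelope rule is about four characters of source code per token (the real ratio depends on the model's tokenizer). The sketch below walks a project directory and applies that heuristic; it's an order-of-magnitude estimate, not a real token count.

```python
import os

CHARS_PER_TOKEN = 4        # rough heuristic for source code
CONTEXT_WINDOW = 256_000   # tokens, per Mistral's stated limit

def estimate_repo_tokens(root: str, exts=(".py", ".md", ".toml")) -> int:
    """Estimate token count for text files under `root` using a chars/4 heuristic."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                try:
                    with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_repo_tokens(".")
print(f"~{tokens:,} tokens; fits in one 256K window: {tokens < CONTEXT_WINDOW}")
```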

A new open-weights AI coding model is closing in on proprietary options Read More »

this-is-the-oldest-evidence-of-people-starting-fires

This is the oldest evidence of people starting fires


We didn’t start the fire. (Neanderthals did, at least 400,000 years ago.)

This artist’s impression shows what the fire at Barnham might have looked like. Credit: Craig Williams, The Trustees of the British Museum

Heat-reddened clay, fire-cracked stone, and fragments of pyrite mark where Neanderthals gathered around a campfire 400,000 years ago in what’s now Suffolk, England.

Based on chemical analysis of the sediment at the site, along with the telltale presence of pyrite, a mineral not naturally found nearby but very handy for striking sparks with flint, British Museum archaeologist Rob Davis and his colleagues say the Neanderthals probably started the fire themselves. That makes the abandoned English clay pit at Barnham the oldest evidence in the world that people (Neanderthal people, in this case) had learned to not only use fire, but also create it and control it.

A cozy Neanderthal campfire

Today, the Barnham site is part of an abandoned clay pit where workers first discovered stone tools in the early 1900s. But 400,000 years ago, it would have been a picturesque little spot at the edge of a stream-fed pond, surrounded by a mix of forest and grassland. There are no hominin fossils here, but archaeologists unearthed a Neanderthal skull about 100 kilometers to the south, so the hominins at Barnham were probably also Neanderthals. The place would have offered a group of Neanderthals a relatively quiet, sheltered place to set up camp, according to Davis and his colleagues.

The cozy domesticity of that camp apparently centered on a hearth about the size of a small campfire. What’s left of that hearth today is a patch of clayey silt baked to a rusty red color by a series of fires; it stands out sharply against the yellowish clay that makes up the rest of the site. When ancient hearth fires heated that iron-rich yellow clay, it formed tiny grains of hematite that turned the baked clay a telltale red. Near the edge of the hearth, the archaeologists unearthed a handful of flint handaxes shattered by heat, alongside a scattering of other heat-cracked flint flakes.

And glinting against the dull clay lay two small pieces of a shiny sulfide mineral, aptly named pyrite—a key piece of Stone Age firestarting kits. Long before people struck flint and steel together to make fire, they struck flint and pyrite. Altogether, the evidence at Barnham suggests that Neanderthals were building and lighting their own fires 400,000 years ago.

Fire: the way of the future

Lighting a fire sounds like a simple thing, but once upon a time, it took cutting-edge technology. Working out how to start a fire on purpose—and then how to control its size and temperature—was the breakthrough that made nearly everything else possible: hafted stone weapons, cooked food, metalworking, and ultimately microprocessors and heavy-lift rockets.

“Something else that fire provides is additional time. The campfire becomes a social hub,” said Davis during a recent press conference. “Having fire… provides this kind of intense socialization time after dusk.” It may have been around fires like the one at Barnham, huddled together against the dark Pleistocene evening, that hominins began developing language, storytelling, and mythologies. And those things, Davis suggested, could have “played a critical part in maintaining social relationships over bigger distances or within more complex social groups.” Fire, in other words, helped make us more fully human and may have helped us connect in the same way that bonding over TV shows does today.

Archaeologists have worked for decades to try to pinpoint exactly when that breakthrough happened (although most now agree that it probably happened multiple times in different places). But evidence of fire is hard to find because it’s ephemeral by its very nature. The small patch of baked clay at Barnham hasn’t seen a fire in 400,000 years, but its light is still pushing back the shadows.

an artist's impression of a person's hands holding a piece of flint and a piece of pyrite, striking them together to make sparks

This was the first step toward the Internet. We could have turned back. Credit: Craig Williams, The Trustees of the British Museum

A million-year history of fire

Archaeologists suspect that the first hominins to use fire took advantage of natural wildfires: Picture a Homo erectus lighting a branch from a nearby blaze (which must have taken serious guts), then carefully carrying that torch back to camp to cook or to ward off predators for a night. Evidence of that sort of thing—using fire, but not necessarily being able to summon it on command—dates back more than a million years at sites like Koobi Fora in Kenya and Swartkrans in South Africa.

Learning to start a fire whenever you want one is harder, but it’s essential if you want to cook your food regularly without having to wait for the next lightning strike to spark a brushfire. It can also help maintain the careful control of temperature needed to make birch tar adhesives. “The advantage of fire-making lies in its predictability,” as Davis and his colleagues wrote in their paper. Knowing how to strike a light changed fire from an occasional luxury item to a staple of hominin life.

There are hints that Neanderthals in Europe were using fire by around 400,000 years ago, based on traces of long-cold hearths at sites in France, Portugal, Spain, the UK, and Ukraine. (The UK site, Beeches Pit, is just 10 kilometers southwest of Barnham.) But none of those sites offer evidence that Neanderthals were making fire rather than just taking advantage of its natural appearance. That kind of evidence doesn’t show up in the archaeological record until 50,000 years ago, when groups of Neanderthals in France used pyrite and bifaces (multi-purpose flint tools with two worked faces, sharp edges, and a surprisingly ergonomic shape) to light their own hearth-fires; marks left on the bifaces tell the tale.

Barnham pushes that date back dramatically, but there’s probably even older evidence out there. Davis and his colleagues say the Barnham Neanderthals probably didn’t invent firestarting; they likely brought the knowledge with them from mainland Europe.

“It’s certainly possible that Homo sapiens in Africa had the ability to make fire, but it can’t be proven yet from the evidence. We only have the evidence at this date from Barnham,” said Natural History Museum London anthropologist Chris Stringer, a coauthor of the study, in the press conference.

a person holds a tiny fragment of pyrite between a thumb and forefinger

The two pyrite fragments at the side may have broken off a larger nodule when it was struck against a piece of flint. Credit: Jordan Mansfield, Pathways to Ancient Britain Project.

Digging into the details

Several types of evidence at the site point to Neanderthals starting their own fire, not borrowing from a local wildfire. Ancient wildfires leave traces in sediment that can last hundreds of thousands of years or more—microscopic bits of charcoal and ash. But the area that’s now Suffolk wasn’t in the middle of wildfire season when the Barnham hearth was in use. Chemical evidence, like the presence of heavy hydrocarbon molecules in the sediment around the hearth, suggests this fire was homemade (wildfires usually scatter lighter ones across several square kilometers of landscape).

But the key piece of evidence at Barnham—the kind of clue that arson investigators probably dream about—is the pyrite. Pyrite isn’t a naturally common mineral in the area around Barnham; Neanderthals would have had to venture at least 12 kilometers southeast to find any. And although few hominins can resist the allure of picking up a shiny rock, it’s likely that these bits of pyrite had a more practical purpose.

To figure out what sort of fire might have produced the reddened clay, Davis and his colleagues did some experiments (which involved setting a bunch of fires atop clay taken from near the site). The archaeologists compared the baked clay from Barnham to the clay from beneath their experimental fires. The grain size and chemical makeup of the clay from the ancient Neanderthal hearth looked almost exactly like that of clay subjected to “12 or more heating events, each lasting 4 hours at temperatures of 400° Celsius or 600° Celsius,” as Davis and his colleagues wrote.

In other words, the hearth at Barnham hints at the rhythms of daily life for one group of Neanderthals 400,000 years ago. For starters, it seems that they kindled their campfire in the same spot over and over and left it burning for hours at a time. Flakes of flint nearby conjure up images of Neanderthals sitting around the fire, knapping stone tools as they told each other stories long into the night.

Nature, 2025. DOI: 10.1038/s41586-025-09855-6 (About DOIs).


Kiona is a freelance science journalist and resident archaeology nerd at Ars Technica.

This is the oldest evidence of people starting fires Read More »

court:-“because-trump-said-to”-may-not-be-a-legally-valid-defense

Court: “Because Trump said to” may not be a legally valid defense

In one of those cases, a judge lifted the hold on construction, ruling that the lack of a sound justification made the hold “the height of arbitrary and capricious,” referring to the legal standard that determines whether federal decision-making is acceptable under the Administrative Procedure Act. If this were a fictional story, that would be considered foreshadowing.

With no indication of how long the comprehensive assessment would take, 17 states sued to lift the hold on permitting. They were joined by the Alliance for Clean Energy New York, which represents companies that build wind projects or feed their supply chain. Both the plaintiffs and the agencies that were sued asked for summary judgment in the case.

The first issue Judge Saris addressed was standing: Are the states suffering appreciable harm from the suspension of wind projects? She noted that they would receive tax revenue from the projects, that their citizens should see reduced energy costs following their completion, and that the projects were intended to contribute to their climate goals, thus limiting harm to their citizens. At one point, Saris even referred to the government’s attempts to claim the parties lacked standing as “tilting at windmills.”

The government also argued that the suspension wasn’t a final decision—that would come after the review—and thus didn’t fall under the Administrative Procedure Act. But Saris ruled that the decision to suspend all activity pending the review was the end of a decision-making process and was not being reconsidered by the government, so it qualified.

Because Trump told us to

With those basics out of the way, Saris turned to the meat of the case, which included a consideration of whether the agencies had been involved with any decision-making at all. “The Agency Defendants contend that because they ‘merely followed’ the Wind Memo ‘as the [Wind Memo] itself commands,’ the Wind Order did not constitute a ‘decision’ and therefore no reasoned explanation was required,” her ruling says. She concludes that precedent at the circuit court level blocks this defense, as it would mean that agencies would be exempt from the Administrative Procedure Act whenever the president told them to do anything.

Court: “Because Trump said to” may not be a legally valid defense Read More »

pompeii-construction-site-confirms-recipe-for-roman-concrete

Pompeii construction site confirms recipe for Roman concrete

Back in 2023, we reported on MIT scientists’ conclusion that the ancient Romans employed “hot mixing” with quicklime, among other strategies, to make their famous concrete, giving the material self-healing functionality. The only snag was that this didn’t match the recipe as described in historical texts. Now the same team is back with a fresh analysis of samples collected from a recently discovered site that confirms the Romans did indeed use hot mixing, according to a new paper published in the journal Nature Communications.

As we’ve reported previously, like today’s Portland cement (a basic ingredient of modern concrete), ancient Roman concrete was basically a mix of a semi-liquid mortar and aggregate. Portland cement is typically made by heating limestone and clay (as well as sandstone, ash, chalk, and iron) in a kiln. The resulting clinker is then ground into a fine powder with just a touch of added gypsum to achieve a smooth, flat surface. But the aggregate used to make Roman concrete was made up of fist-sized pieces of stone or bricks.

In his treatise De architectura (circa 30 BCE), the Roman architect and engineer Vitruvius wrote about how to build concrete walls for funerary structures that could endure for a long time without falling into ruin. He recommended the walls be at least two feet thick, made of either “squared red stone or of brick or lava laid in courses.” The brick or volcanic rock aggregate should be bound with mortar composed of hydrated lime and porous fragments of glass and crystals from volcanic eruptions (known as volcanic tephra).

Admir Masic, an environmental engineer at MIT, has studied ancient Roman concrete for several years. For instance, in 2019, Masic helped pioneer a new set of tools for analyzing Roman concrete samples from Privernum at multiple length scales—notably, Raman spectroscopy for chemical profiling and multi-detector energy dispersive spectroscopy (EDS) for phase mapping the material. Masic was also a co-author of a 2021 study analyzing samples of the ancient concrete used to build a 2,000-year-old mausoleum along the Appian Way in Rome known as the Tomb of Caecilia Metella, a noblewoman who lived in the first century BCE.

And in 2023, Masic’s group analyzed samples taken from the concrete walls of Privernum, focusing on strange white mineral chunks known as “lime clasts,” which others had largely dismissed as resulting from subpar raw materials or poor mixing. Masic et al. concluded that was not the case. Rather, the Romans deliberately employed “hot mixing” with quicklime that gave the material self-healing functionality. When cracks begin to form in the concrete, they are more likely to move through the lime clasts. The clasts can then react with water, producing a solution saturated with calcium. That solution can either recrystallize as calcium carbonate to fill the cracks or react with the pozzolanic components to strengthen the composite material.
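
For readers who want the chemistry spelled out, the self-healing cycle described above boils down to two textbook reactions (a simplified sketch, shown for the simplest case of unreacted quicklime, that leaves aside the pozzolanic reaction route also mentioned above):

CaO + H₂O → Ca(OH)₂ (water entering a crack reacts with a lime clast, producing a calcium-saturated solution)

Ca(OH)₂ + CO₂ → CaCO₃ + H₂O (that solution carbonates and recrystallizes as calcium carbonate, filling the crack)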

Pompeii construction site confirms recipe for Roman concrete Read More »