Author name: Shannon Garcia

Why Anthropic’s Claude still hasn’t beaten Pokémon


Weeks later, Sonnet’s “reasoning” model is struggling with a game designed for children.

A Game Boy Color playing Pokémon Red surrounded by the tendrils of an AI, or maybe some funky glowing wires (what do AI tendrils look like, anyway?)

Gotta subsume ’em all into the machine consciousness! Credit: Aurich Lawson

In recent months, the AI industry’s biggest boosters have started converging on a public expectation that we’re on the verge of “artificial general intelligence” (AGI)—virtual agents that can match or surpass “human-level” understanding and performance on most cognitive tasks.

OpenAI is quietly seeding expectations for a “PhD-level” AI agent that could operate autonomously at the level of a “high-income knowledge worker” in the near future. Elon Musk says that “we’ll have AI smarter than any one human probably” by the end of 2025. Anthropic CEO Dario Amodei thinks it might take a bit longer but similarly says it’s plausible that AI will be “better than humans at almost everything” by the end of 2027.

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem.

Can Claude play Pokémon?

A thread: pic.twitter.com/K8SkNXCxYJ

— Anthropic (@AnthropicAI) February 25, 2025

Last month, Anthropic presented its “Claude Plays Pokémon” experiment as a waypoint on the road to that predicted AGI future. It’s a project the company said shows “glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning.” Anthropic made headlines by trumpeting how Claude 3.7 Sonnet’s “improved reasoning capabilities” let the company’s latest model make progress in the popular old-school Game Boy RPG in ways “that older models had little hope of achieving.”

While Claude models from just a year ago struggled even to leave the game’s opening area, Claude 3.7 Sonnet was able to make progress by collecting multiple in-game Gym Badges in a relatively small number of in-game actions. That breakthrough, Anthropic wrote, was because the “extended thinking” by Claude 3.7 Sonnet means the new model “plans ahead, remembers its objectives, and adapts when initial strategies fail” in a way that its predecessors didn’t. Those things, Anthropic brags, are “critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too.”

Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones. Credit: Anthropic

But relative success over previous models is not the same as absolute success over the game in its entirety. In the weeks since Claude Plays Pokémon was first made public, thousands of Twitch viewers have watched Claude struggle to make consistent progress in the game. Despite long “thinking” pauses between each move—during which viewers can read printouts of the system’s simulated reasoning process—Claude frequently finds itself pointlessly revisiting completed towns, getting stuck in blind corners of the map for extended periods, or fruitlessly talking to the same unhelpful NPC over and over, to cite just a few examples of distinctly sub-human in-game performance.

Watching Claude continue to struggle at a game designed for children, it’s hard to imagine we’re witnessing the genesis of some sort of computer superintelligence. But even Claude’s current sub-human level of Pokémon performance could hold significant lessons for the quest toward generalized, human-level artificial intelligence.

Smart in different ways

In some sense, it’s impressive that Claude can play Pokémon with any facility at all. When developing AI systems that find dominant strategies in games like Go and Dota 2, engineers generally start their algorithms off with deep knowledge of a game’s rules and/or basic strategies, as well as a reward function to guide them toward better performance. For Claude Plays Pokémon, though, project developer and Anthropic employee David Hershey says he started with an unmodified, generalized Claude model that wasn’t specifically trained or tuned to play Pokémon games in any way.

“This is purely the various other things that [Claude] understands about the world being used to point at video games,” Hershey told Ars. “So it has a sense of a Pokémon. If you go to claude.ai and ask about Pokémon, it knows what Pokémon is based on what it’s read… If you ask, it’ll tell you there’s eight gym badges, it’ll tell you the first one is Brock… it knows the broad structure.”

A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon. Credit: Anthropic / Excalidraw

In addition to directly monitoring certain key (emulated) Game Boy RAM addresses for game state information, Claude views and interprets the game’s visual output much like a human would. But despite recent advances in AI image processing, Hershey said Claude still struggles to interpret the low-resolution, pixelated world of a Game Boy screenshot as well as a human can. “Claude’s still not particularly good at understanding what’s on the screen at all,” he said. “You will see it attempt to walk into walls all the time.”
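
Anthropic hasn’t published the harness itself, but the loop Hershey describes (read a few emulated RAM addresses for ground-truth state, capture the screen, hand both to the model, then press whatever button it picks) is straightforward to sketch. The snippet below is a hypothetical illustration only: the FakeEmulator stub, the query_model placeholder, and the specific RAM addresses are assumptions for illustration, not Anthropic’s actual code.

```python
from dataclasses import dataclass, field
from typing import Dict

# Illustrative Pokémon Red RAM addresses (player position and current map).
RAM_ADDRESSES = {"player_x": 0xD362, "player_y": 0xD361, "map_id": 0xD35E}

@dataclass
class FakeEmulator:
    """Stand-in for a real Game Boy emulator binding; purely a stub."""
    memory: Dict[int, int] = field(default_factory=dict)

    def read_memory(self, addr: int) -> int:
        return self.memory.get(addr, 0)

    def screenshot_png(self) -> bytes:
        return b""  # a real harness would return the 160x144 frame here

    def press_button(self, button: str) -> None:
        print(f"pressing {button}")

def read_game_state(emu: FakeEmulator) -> Dict[str, int]:
    """Pull a handful of labelled values out of emulated Game Boy RAM."""
    return {name: emu.read_memory(addr) for name, addr in RAM_ADDRESSES.items()}

def query_model(prompt: str, image: bytes) -> str:
    """Placeholder for the actual call to Claude with text plus screenshot."""
    return "up"

def agent_step(emu: FakeEmulator, notes: str) -> None:
    state = read_game_state(emu)       # structured ground truth from RAM
    frame = emu.screenshot_png()       # pixels Claude must interpret itself
    prompt = (f"Game state: {state}\nNotes so far: {notes}\n"
              "Reply with one button: a, b, up, down, left, right, start, select.")
    emu.press_button(query_model(prompt, image=frame))

if __name__ == "__main__":
    agent_step(FakeEmulator(), notes="Goal: earn the Boulder Badge from Brock.")
```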

Hershey said he suspects Claude’s training data probably doesn’t contain many overly detailed text descriptions of “stuff that looks like a Game Boy screen.” This means that, somewhat surprisingly, if Claude were playing a game with “more realistic imagery, I think Claude would actually be able to see a lot better,” Hershey said.

“It’s one of those funny things about humans that we can squint at these eight-by-eight pixel blobs of people and say, ‘That’s a girl with blue hair,’” Hershey continued. “People, I think, have that ability to map from our real world to understand and sort of grok that… so I’m honestly kind of surprised that Claude’s as good as it is at being able to see there’s a person on the screen.”

Even with a perfect understanding of what it’s seeing on-screen, though, Hershey said Claude would still struggle with 2D navigation challenges that would be trivial for a human. “It’s pretty easy for me to understand that [an in-game] building is a building and that I can’t walk through a building,” Hershey said. “And that’s [something] that’s pretty challenging for Claude to understand… It’s funny because it’s just kind of smart in different ways, you know?”

A sample Pokémon screen with an overlay showing how Claude characterizes the game’s grid-based map. Credit: Anthropic / X

Where Claude tends to perform better, Hershey said, is in the more text-based portions of the game. During an in-game battle, Claude will readily notice when the game tells it that an attack from an electric-type Pokémon is “not very effective” against a rock-type opponent, for instance. Claude will then squirrel that factoid away in a massive written knowledge base for future reference later in the run. Claude can also integrate multiple pieces of similar knowledge into pretty elegant battle strategies, even extending those strategies into long-term plans for catching and managing teams of multiple creatures for future battles.
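
As a toy illustration of that pattern (not Anthropic’s implementation), the agent only needs to parse the battle text it is shown and append a durable note it can consult later; everything below, including the function names, is a hypothetical sketch.

```python
# Toy sketch: turn battle-log text into durable notes and retrieve them later.
knowledge_base: list[str] = []

def observe_battle_text(text: str, my_type: str, foe_type: str) -> None:
    """Record type-effectiveness facts gleaned from the game's own messages."""
    if "not very effective" in text.lower():
        knowledge_base.append(f"{my_type} moves are weak against {foe_type} Pokémon.")
    elif "super effective" in text.lower():
        knowledge_base.append(f"{my_type} moves are strong against {foe_type} Pokémon.")

def relevant_notes(foe_type: str) -> list[str]:
    """Pull back any stored notes that mention the opposing type."""
    return [note for note in knowledge_base if foe_type in note]

observe_battle_text("It's not very effective...", "Electric", "Rock")
print(relevant_notes("Rock"))
# ['Electric moves are weak against Rock Pokémon.']
```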

Claude can even show surprising “intelligence” when Pokémon’s in-game text is intentionally misleading or incomplete. “It’s pretty funny that they tell you you need to go find Professor Oak next door and then he’s not there,” Hershey said of an early-game task. “As a 5-year-old, that was very confusing to me. But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn’t find [Oak], says, ‘I need to figure something out’… It’s sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too.”

A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle. Credit: Claude Plays Pokemon / Twitch

These kinds of relative strengths and weaknesses when compared to “human-level” play reflect the overall state of AI research and capabilities in general, Hershey said. “I think it’s just a sort of universal thing about these models… We built the text side of it first, and the text side is definitely… more powerful. How these models can reason about images is getting better, but I think it’s a decent bit behind.”

Forget me not

Beyond issues parsing text and images, Hershey also acknowledged that Claude can have trouble “remembering” what it has already learned. The current model has a “context window” of 200,000 tokens, limiting the amount of relational information it can store in its “memory” at any one time. When the system’s ever-expanding knowledge base fills up this context window, Claude goes through an elaborate summarization process, condensing detailed notes on what it has seen, done, and learned so far into shorter text summaries that lose some of the fine-grained details.

This can mean that Claude “has a hard time keeping track of things for a very long time and really having a great sense of what it’s tried so far,” Hershey said. “You will definitely see it occasionally delete something that it shouldn’t have. Anything that’s not in your knowledge base or not in your summary is going to be gone, so you have to think about what you want to put there.”
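
Mechanically, this is a rolling-summarization pattern: keep appending notes until the context budget is nearly full, then replace the notes with a lossy summary. The sketch below is a hypothetical illustration of that pattern, not Anthropic’s code; count_tokens and summarize are placeholders, and only the 200,000-token figure comes from the article.

```python
# Hypothetical sketch of the "condense when the context fills up" behavior.
CONTEXT_BUDGET = 200_000
SUMMARIZE_AT = int(CONTEXT_BUDGET * 0.8)   # leave headroom for new observations

def count_tokens(text: str) -> int:
    return len(text) // 4                  # rough stand-in for a real tokenizer

def summarize(notes: list[str]) -> str:
    # In the real system this is itself a model call; fine-grained details that
    # don't make it into the summary are gone, as Hershey notes.
    return "SUMMARY: " + " | ".join(note[:40] for note in notes)

def add_note(notes: list[str], new_note: str) -> list[str]:
    notes.append(new_note)
    if count_tokens("\n".join(notes)) > SUMMARIZE_AT:
        notes = [summarize(notes)]         # lossy compression of the knowledge base
    return notes
```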

A small window into the kind of “cleaning up my context” knowledge-base update necessitated by Claude’s limited “memory.” Credit: Claude Plays Pokemon / Twitch

More than forgetting important history, though, Claude runs into bigger problems when it inadvertently inserts incorrect information into its knowledge base. Like a conspiracy theorist who builds an entire worldview from an inherently flawed premise, Claude can be incredibly slow to recognize when an error in its self-authored knowledge base is leading its Pokémon play astray.

“The things that are written down in the past, it sort of trusts pretty blindly,” Hershey said. “I have seen it become very convinced that it found the exit to [in-game location] Viridian Forest at some specific coordinates, and then it spends hours and hours exploring a little small square around those coordinates that are wrong instead of doing anything else. It takes a very long time for it to decide that that was a ‘fail.’”

Still, Hershey said Claude 3.7 Sonnet is much better than earlier models at eventually “questioning its assumptions, trying new strategies, and keeping track over long horizons of various strategies to [see] whether they work or not.” While the new model will still “struggle for really long periods of time” retrying the same thing over and over, it will ultimately tend to “get a sense of what’s going on and what it’s tried before, and it stumbles a lot of times into actual progress from that,” Hershey said.

“We’re getting pretty close…”

One of the most interesting things about observing Claude Plays Pokémon across multiple iterations and restarts, Hershey said, is seeing how the system’s progress and strategy can vary quite a bit between runs. Sometimes Claude will show it’s “capable of actually building a pretty coherent strategy” by “keeping detailed notes about the different paths to try,” for instance, he said. But “most of the time it doesn’t… most of the time, it wanders into the wall because it’s confident it sees the exit.”

Where previous models wandered aimlessly or got stuck in loops, Claude 3.7 Sonnet plans ahead, remembers its objectives, and adapts when initial strategies fail.

Critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too. pic.twitter.com/scvISp14XG

— Anthropic (@AnthropicAI) February 25, 2025

One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, saying that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon. Credit: Claude Plays Pokemon / Twitch

Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

But Hershey is still impressed at the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.

Why Anthropic’s Claude still hasn’t beaten Pokémon Read More »

Boeing will build the US Air Force’s next air superiority fighter

Today, it emerged that Boeing has won its bid to supply the United States Air Force with its next jet fighter. As with the last major fighter aircraft procurement, the Department of Defense was faced with a choice between awarding Boeing or Lockheed the contract for the Next Generation Air Dominance (NGAD) program, which will replace the Lockheed F-22 Raptor sometime in the 2030s.

Very little is known about the NGAD, which the Air Force actually refers to as a “family of systems,” as its goal of owning the skies requires more than just a fancy airplane. The program has been underway for a decade, and a prototype designed by the Air Force first flew in 2020, breaking records in the process (although what records and by how much was not disclosed).

Last summer, the Pentagon paused the program as it reevaluated whether the NGAD would still meet its needs and whether it could afford to pay for the plane, as well as a new bomber, a new early warning aircraft, a new trainer, and a new ICBM, all at the same time. But in late December, it concluded that, yes, a crewed replacement for the F-22 was in the national interest.

While no images have ever been made public, then-Air Force Secretary Frank Kendall said in 2024 that “it’s an F-22 replacement. You can make some inferences from that.”

The decision is good news for Boeing’s plant in St. Louis, which is scheduled to end production of the F/A-18 Super Hornet in 2027. Boeing lost its last bid to build a fighter jet when its X-32 lost out to Lockheed’s X-35 in the Joint Strike Fighter competition in 2001.

A separate effort to award a contract for the NGAD’s engine, called the Next Generation Adaptive Propulsion, is underway between Pratt & Whitney and GE Aerospace, with an additional program aiming to develop “drone wingmen” also in the works between General Atomics and Anduril.

Boeing will build the US Air Force’s next air superiority fighter Read More »

After “glitter bomb,” cops arrested former cop who criticized current cops online

The police claimed that “the fraudulent Facebook pages posted comments on Village of Orland Park social media sites while also soliciting friend requests from Orland Park Police employees and other citizens, portraying the likeness of Deputy Chief of Police Brian West”—and said that this amounted to Disorderly Conduct and False Personation, both misdemeanors.

West got permission from his boss to launch a criminal investigation, which soon turned into search warrants that surfaced a name: retired Orland Park Sergeant Ken Kovac, who had left the department in 2019 after two decades of service. Kovac was charged, and he surrendered himself at the Orland Park Police Department on April 7, 2024.

The police then issued their press release, letting their community know that West had witnessed “demeaning comments in reference to his supervisory position within the department from Kovac’s posts on social media”—which doesn’t sound like any sort of crime. They also wanted to let concerned citizens know that West “epitomizes the principles of public service” and that “Deputy Chief West’s apprehensions were treated with the utmost seriousness and underwent a thorough investigation.”

Okay.

Despite the “utmost seriousness” of this Very Serious Investigation, a judge wasn’t having any of it. In January 2025, Cook County Judge Mohammad Ahmad threw out both charges against Kovac.

Kovac, of course, was thrilled. His lawyer told a local Patch reporter, “These charges never should have been brought. Ken Kovac made a Facebook account that poked fun at the Deputy Chief of the Orland Park Police Department. The Deputy Chief didn’t like it and tried to use the criminal legal system to get even.”

Orland Park was not backing down, however, blaming prosecutors for the loss. “Despite compelling evidence in the case, the Cook County State’s Attorney’s Office was unable to secure a prosecution, failing in its responsibility to protect Deputy Chief West as a victim of these malicious acts,” the village manager told Patch. “The Village of Orland Park is deeply disappointed by this outcome and stands unwavering in its support of former Deputy Chief West.”

The drama took its most recent, entirely predictable, turn this week when Kovac sued the officials who had arrested him. He told the Chicago Sun-Times that he had been embarrassed about being fingerprinted and processed “at the police department that I was previously employed at by people that I used to work with and for.”

Orland Park told the paper that it “stands by its actions and those of its employees and remains confident that they were appropriate and fully compliant with the law.”

After “glitter bomb,” cops arrested former cop who criticized current cops online Read More »

Mom of child dead from measles: “Don’t do the shots,” my other 4 kids were fine

Cod liver oil contains high levels of vitamin A, which is sometimes administered to measles patients under a physician’s supervision. But the supplement is mostly a supportive treatment in children with vitamin deficiencies, and taking too much can cause toxicity. Nevertheless, Kennedy has touted the vitamin and falsely claimed that good nutrition protects against the virus, much to the dismay of pediatricians.

“They had a really good, quick recovery,” the mother said of her other four children, attributing their recovery to the unproven treatments.

Tragic misinformation

Most children do recover from measles, regardless of whether they’re given cod liver oil. The fatality rate of measles is nearly 1 to 3 of every 1,000 infected children, who die from respiratory (e.g., pneumonia) or neurological complications of the virus, according to the Centers for Disease Control and Prevention.

Tommey noted that the sibling who died didn’t get the alternative treatments, leading the audience to believe that this could have contributed to her death. She also questioned what was written on the death certificate, noting that the girl’s pneumonia was from a secondary bacterial infection, not the virus directly, a clear effort to falsely suggest measles was not the cause of death and downplay the dangers of the disease. The parents said they hadn’t received the death certificate yet.

Tommey then turned to the MMR vaccine, asking if the mother still felt that it was a dangerous vaccine after her daughter’s death from the disease, prefacing the question by claiming to have seen a lot of “injury” from the vaccine. “Do you still feel the same way about the MMR vaccine versus measles?” she asked.

“Yes, absolutely; we would absolutely not take the MMR. The measles wasn’t that bad, and they got over it pretty quickly,” the mother replied, speaking again of her four living children.

“So,” Tommey continued, “when you see the fearmongering in the press, which is what we want to stop, that is why we want to get the truth out, what do you say to the parents who are rushing out, panicking, to get the MMR for their 6-month-old baby because they think that that child is going to die of measles because of what happened to your daughter?”

Mom of child dead from measles: “Don’t do the shots,” my other 4 kids were fine Read More »

AI #108: Straight Line on a Graph

The x-axis of the graph is time. The y-axis of the graph is the log of ‘how long a software engineering task can AIs reliably succeed at doing.’

The straight line says the answer doubles roughly every 7 months. Yikes.
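
For concreteness, here is the arithmetic a fixed doubling time implies, assuming (purely for illustration) a one-hour reliable task horizon today; the starting point is my assumption, not METR’s exact figure.

```python
import math

# Extrapolate a task-length horizon that doubles every 7 months, starting from
# an assumed 1-hour horizon today. Purely illustrative arithmetic.
DOUBLING_MONTHS = 7
start_hours = 1.0

for label, target_hours in [("1 day (24h)", 24), ("1 month (~730h)", 730), ("1 year (~8760h)", 8760)]:
    doublings = math.log2(target_hours / start_hours)
    print(f"{label}: {doublings:.1f} doublings ≈ {doublings * DOUBLING_MONTHS / 12:.1f} years out")
# 1 day (24h): 4.6 doublings ≈ 2.7 years out
# 1 month (~730h): 9.5 doublings ≈ 5.5 years out
# 1 year (~8760h): 13.1 doublings ≈ 7.6 years out
```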

Upcoming: The comment period on America’s AI strategy is over, so we can finish up by looking at Google’s and MIRI’s and IFP’s proposals, as well as Hollywood’s response to OpenAI and Google’s demands for unlimited uncompensated fair use exceptions from copyright during model training. I’m going to pull that out into its own post so it can be more easily referenced.

There’s also a draft report on frontier model risks from California and it’s… good?

Also upcoming: My take on OpenAI’s new future good-at-writing model.

  1. Language Models Offer Mundane Utility. I want to, is there an app for that?

  2. Language Models Don’t Offer Mundane Utility. Agents not quite ready yet.

  3. Huh, Upgrades. Anthropic efficiency gains, Google silently adds features.

  4. Seeking Deeply. The PRC gives DeepSeek more attention. That cuts both ways.

  5. Fun With Media Generation. Fun with Gemini 2.0 Image Generation.

  6. Gemma Goals. Hard to know exactly how good it really is.

  7. On Your Marks. Tic-Tac-Toe bench is only now getting properly saturated.

  8. Choose Your Fighter. o3-mini disappoints on Epoch retest on frontier math.

  9. Deepfaketown and Botpocalypse Soon. Don’t yet use the bot, also don’t be the bot.

  10. Copyright Confrontation. Removing watermarks has been a thing for a while.

  11. Get Involved. Anthropic, SaferAI, OpenPhil.

  12. In Other AI News. Sentience leaves everyone confused.

  13. Straight Lines on Graphs. METR finds reliable SWE task length doubling rapidly.

  14. Quiet Speculations. Various versions of takeoff.

  15. California Issues Reasonable Report. I did not expect that.

  16. The Quest for Sane Regulations. Mostly we’re trying to avoid steps backwards.

  17. The Week in Audio. Esben Kran, Stephanie Zhan.

  18. Rhetorical Innovation. Things are not improving.

  19. We’re Not So Different You and I. An actually really cool alignment idea.

  20. Anthropic Warns ASL-3 Approaches. Danger coming. We need better evaluations.

  21. Aligning a Smarter Than Human Intelligence is Difficult. It’s all happening.

  22. People Are Worried About AI Killing Everyone. Killing all other AIs, too.

  23. The Lighter Side. Not exactly next level prompting.

Arnold Kling spends 30 minutes trying to figure out how to leave a WhatsApp group, requests an AI app to do things like this via the ‘I want to’ app, except that app exists and it’s called Claude (or ChatGPT) and this should have taken 1 minute tops? To be fair, Arnold then extends the idea to tasks where ‘actually click the buttons’ is more annoying and it makes more sense to have an agent do it for you rather than telling the human how to do it. That will take a bit longer, but not that much longer.

If you want your AI to interact with you in interesting ways in the Janus sense, you want to keep your interaction full of interesting things and stay far away from standard ‘assistant’ interactions, which have a very strong pull on what follows. If things go south, usually it’s better to start over or redo. With high skill you can sometimes do better, but it’s tough. Of course, if you don’t want that, carry on, but the principle of ‘if things go south don’t try to save it’ still largely applies, because you don’t want to extrapolate from the assistant messing up even on mundane tasks.

It’s a Wikipedia race between models! Start is Norwegian Sea, finish is Karaoke. GPT-4.5 clicks around for 47 pages before time runs out. CUA (used in OpenAI’s operator) clicks around, accidentally minimizes Firefox and can’t recover. o1 accidentally restarts the game, then sees a link to the Karaoke page there, declares victory and doesn’t mention that it cheated. Sonnet 3.7 starts out strong but then cheats via URL hacking, which works, and it declares victory. It’s not obvious to what extent it knew that broke the rules. They call this a draw, which seems fair.

Kelsey Piper gets her hands on Manus.

Kelsey Piper: I got a Manus access code! Short review: We’re close to usable AI browser tools, but we’re not there yet. They’re going to completely change how we shop, and my best guess is they’ll do it next year, but they won’t do it at their current quality baseline.

The longer review is fun, and boils down to this type of agent being tantalizingly almost there, but with enough issues that it isn’t quite a net gain to use it. Below a certain threshold of reliability you’re better off doing it yourself.

Which will definitely change. My brief experience with Operator was similar. My guess is that it is indeed already a net win if you invest in getting good at using it, in a subset of tasks including some forms of shopping, but I haven’t felt motivated to pay those up front learning and data entry costs.

Anthropic updates their API to include prompt caching, simpler cache management, token-efficient tool use (average 14% reduction), and a text_editor tool.
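
For reference, prompt caching in the Messages API works by tagging a stable prefix (such as a long system document) with a cache_control block so subsequent calls can reuse it. A minimal sketch follows, assuming the cache_control field and model ID current at the time of writing; check the SDK docs for your version.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_REFERENCE_DOC = "...tens of thousands of tokens of stable reference material..."

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model ID; substitute your own
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You answer questions about the reference document."},
        # Marking this block as cacheable lets later requests reuse the prefix
        # instead of re-processing it, which is where the cost savings come from.
        {"type": "text", "text": LONG_REFERENCE_DOC, "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": "Summarize section 3 of the document."}],
)
print(response.content[0].text)
```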

OpenAI’s o1 and o3-mini now offer Python-powered data analysis in ChatGPT.

List of Gemini’s March 2025 upgrades.

The problem with your Google searches being context for Gemini 2.0 Thinking is that you have to still be doing Google searches.

Google AI Studio lets you paste in YouTube video links directly as context. That seems very convenient.

Baidu gives us Ernie 4.5 and x1, with free access, with claimed plans for open source ‘within a few months.’ Benchmarks look solid, and they claim x1 is ‘on par with r1’ for performance at only half the price. All things are possible, but given the track record chances are very high this is not as good as they claim it to be.

NotebookLM gets a few upgrades, especially moving to Gemini 2.0 Thinking, and in the replies Josh drops some hints on where things are headed.

Josh Woodward: Next batch of NotebookLM updates rolling out:

Even smarter answers, powered by Gemini 2.0 Thinking

See citations in your notes, not just in the Q&A (top request)

Customize the sources used for making your podcasts and notes (top request)

Much smoother scrolling for Q&A

Enjoy!

You’ve tried the Interactive Mode, right? That lets you call into the podcast and have a conversation.

On the voice reading out things, that could be interesting. We haven’t brought any audio stuff to the Chat / Q&A section yet…

We’re testing iOS and Android versions on the team right now, need to add some more features and squash some bugs, then we’ll ship it out!

Our first iteration of length control is under development now!

NotebookLM also rolls out interactive Mindmaps, which will look like this:

I’m very curious to see if these end up being useful, and if so who else copies them.

This definitely feels like a thing worth trying again. Now if I can automate adding all the data sources…

Let’s say you are the PRC. You witness DeepSeek leverage its cracked engineering culture to get a lot of performance out of remarkably little compute. They then publish the whole thing, including how they did it. A remarkable accomplishment, which the world then blows far out of proportion to what they did.

What would you do next? Double down on the open, exploratory, freewheeling ethos that brought them to this point, and pledge to help them take it all the way to AGI, as they intend?

They seem to have had other ideas.

Matt Sheehan: Fantastic reporting on how 🇨🇳 gov is getting more hands-on w/ DeepSeek by @JuroOsawa & @QianerLiu

-employees told not to travel, handing in passports

-investors must be screened by provincial government

-gov telling headhunters not to approach employees

Can you imagine if the United States did this to OpenAI?

It is remarkable how often when we are told we cannot do [X] because we will ‘lose to China’ if we do and they would do it, we find out China is already doing lots of [X].

Before r1, DeepSeek was the clear place to go as a cracked Chinese software engineer. Now, once you join, the PRC is reportedly telling you to give up your passport, watching your every move and telling headhunters to stay away. No thanks.

Notice that China is telling these folks to surrender their passports, at the same time that America is refusing to let in much of China’s software engineering and other talent. Why do you think PRC is making this decision?

Along similar lines, perhaps motivated by PRC and perhaps not, here is a report that DeepSeek is worried about people stealing their secrets before they have the chance to give those secrets away.

Daniel Eth: Who wants to tell them?

Peter Wildeford: “DeepSeek’s leaders have been worried about the possibility of information leaking”

“told employees not to discuss their work with outsiders”

Do DeepSeek leaders and the Chinese government know that DeepSeek has been open sourcing their ideas?

That isn’t inherently a crazy thing to worry about, even if mainly you are trying to get credit for things, and be first to publish them. Then again, how confident are you that DeepSeek will publish them, at this point? Going forward it seems likely their willingness to give away the secret sauce will steadily decline, especially in terms of their methods, now that PRC knows what that lab is capable of doing.

People are having a lot of fun with Gemini 2.0 Flash’s image generation, when it doesn’t flag your request for safety reasons.

Gemini Flash’s native image generation can do consistent gif animations?

Here are some fun images:

Or:

Riley Goodside: POV: You’re already late for work and you haven’t even left home yet. You have no excuse. You snap a pic of today’s fit and open Gemini 2.0 Flash Experimental.

Meanwhile, Google’s refusals be refusing…

Also, did you know you can have it remove a watermark from an image by explicitly saying ‘remove the watermark from this image’? Not that you couldn’t do this anyway, but that doesn’t stop them from refusing many other things.

What do we make of Gemma 3’s absurdly strong performance in Arena? I continue to view this as about half ‘Gemma 3 is probably really good for its size’ and half ‘Arena is getting less and less meaningful.’

Teortaxes thinks Gemma 3 is best in class, but will be tough to improve.

Teortaxes: the sad feeling I get from Gemma models, which chills all excitement, is that they’re «already as good as can be». It’s professionally cooked all around. They can be tuned a bit but won’t exceed the bracket Google intends for them – 1.5 generations behind the default Flash.

It’s good. Of course Google continues to not undermine their business and ship primarily conversational open models, but it’s genuinely the strongest in its weight class I think. Seams only show on legitimately hard tasks.

I notice the ‘Rs in strawberry’ test has moved on to gaslighting the model after a correct answer rather than the model getting it wrong. Which is a real weakness of such models, that you can bully and gaslight them, but how about not doing that.

Mark Schroder: Gemma 27b seems just a little bit better than mistral small 3, but the smaller versions seem GREAT for their size, even the 1b impresses, probably best 1b model atm (tried on iPhone)

Christian Schoppe is a fan of the 4B version for its size.

Box puts Gemma 3 to their test, saying it is a substantial improvement over Gemma 2 and better than Gemini 1.5 Flash on data extraction, although still clearly behind Gemini 2.0 Flash.

This does not offer the direct comparison we want most, which is to v3 and r1, but if you have two points (e.g. Gemma 2 and Gemini 2.0 Flash) then you can draw a line. Eyeballing this, they’re essentially saying Gemma 3 is 80%+ of the way from Gemma 2 to Gemini 2.0 Flash, while being fully open and extremely cheap.

Gemma 3 is an improvement over Gemma 2 on WeirdML but still not so great and nothing like what the Arena scores would suggest.

Campbell reports frustration with the fine tuning packages.

A rival released this week is Mistral Small 3.1. When you see a company pushing a graph that’s trying this hard, you should be deeply skeptical:

They do back this up with claims on other benchmarks, but I don’t have Mistral in my set of labs I trust not to game the benchmarks. Priors say this is no Gemma 3 until proven otherwise.

We have an update to the fun little Tic-Tac-Toe Bench, with Sonnet 3.7 Thinking as the new champion, making 100% optimal and valid moves at a cost of 20 cents a game, the first model to get to 100%. They expect o3-mini-high to also max out but don’t want to spend $50 to check.

o3-mini scores only 11% on Frontier Math when Epoch tests it, versus 32% when OpenAI tested it, and OpenAI’s test had suspiciously high scores on the hardest sections of the test relative to the easy sections.

Peter Wildeford via The Information shares some info about Manus. Anthropic charges about $2 per task, whereas Manus isn’t yet charging money. And in hindsight the reason why Manus is not targeted at China is obvious, any agent using Claude has to access stuff beyond the Great Firewall. Whoops!

The periodic question, where are all the new AI-enabled sophisticated scams? No one could point to any concrete example that isn’t both old and well-known at this point. There is clearly a rise in the amount of slop and phishing at the low end, my wife reports this happening recently at her business, but none of it is trying to be smart, and it isn’t using deepfake capabilities or highly personalized messages or similar vectors. Perhaps this is harder than we thought, or the people who fall for scams are already mostly going to fall for simple photoshop, and this is like where we introduce intentional errors in scam emails so AI making them better would make them worse?

Ethan Mollick: I regret to announce that the meme Turing Test has been passed.

LLMs produce funnier memes than the average human, as judged by humans. Humans working with AI get no boost (a finding that is coming up often in AI-creativity work) The best human memers still beat AI, however.

[Paper here.]

Many of you are realizing that most people have terrible taste in memes.

In their examples of top memes, I notice that I thought the human ones were much, much better than the AI ones. They ‘felt right’ and resonated, the AI ones didn’t.

An important fact about memes is that, unless you are doing them inside a narrow context to comment on that particular context, only the long tail matters. Almost all ‘generalized’ memes are terrible. But yes, in general, ‘quick, human, now be creative!’ does not go so well, and AIs are on average already able to do better.

Another parallel: Frontier AIs are almost certainly better at improv than most humans, but they are still almost certainly worse than most improv performances, because the top humans do almost all of the improv.

No, friend, don’t!

Richard Ngo: Talked to a friend today who decided that if RLHF works on reasoning models, it should work on him too.

So he got a mechanical clicker to track whenever he has an unproductive chain of thought, and uses the count as one of his daily KPIs.

Fun fact: the count is apparently anticorrelated with his productivity. On unproductive days it’s about 40, but on productive days it’s double that, apparently because he catches the unproductive thoughts faster.

First off, as one comment responds, this is a form of The Most Forbidden Technique. As in, you are penalizing yourself for consciously having Wrong Thoughts, which will teach your brain to avoid consciously being aware of Wrong Thoughts. The dance of trying to know what others are thinking, and people twisting their thinking, words and actions to prevent this, is as old as humans are.

But that’s not my main worry here. My main worry is that when you penalize ‘unproductive thoughts’ the main thing you are penalizing is thoughts. This is Asymmetric Justice on steroids: your brain learns not to think at all, or to think only ‘safe’ thoughts rather than risky or interesting ones.

Of course the days in which there are more ‘unproductive thoughts’ turn out to be more productive days. Those are the days in which you are thinking, and having interesting thoughts, and some of them will be good. Whereas on my least productive days, I’m watching television or in a daze or whatever, and not thinking much at all.

Oh yeah, there’s that, but I think levels of friction matter a lot here.

Bearly AI: Google Gemini removing watermarks from images with a line of text is pretty nuts. Can’t imagine that feature staying for long.

Louis Anslow: For 15 YEARS you could remove watermark from images using AI. No one cared.

Pessimists Archive: In 2010 Adobe introduced ‘content aware fill’ – an AI powered ‘in painting’ feature. Watermark removal was a concern:

“many pro photographers have expressed concern that Content-Aware Fill is potentially a magical watermark killer: that the abilities that C-A-F may offer to the unscrupulous user in terms of watermark eradication are a serious threat.”

As in, it is one thing to have an awkward way to remove watermarks. It is another to have an easy, or even one-click or no-click way to do it. Salience of the opportunity matters as well, as does the amount of AI images for which there are marks to remove.

SaferAI is hiring a research engineer.

Anthropic is hiring someone to build Policy Demos, as in creating compelling product demonstrations for policymakers, government officials and policy influencers. Show, don’t tell. This seems like a very good idea for the right person. Salary is $260k-$285k.

There are essentially limitless open roles at Anthropic across departments, including ‘engineer, honestly.’

OpenPhil call for proposals on improving capability evaluations, note the ambiguity on what ways this ends up differentially helping.

William Fedus leaves OpenAI to instead work on AI for science in partnership with OpenAI.

David Pfau: Ok, if a literal VP at OpenAI is quitting to do AI-for-science work on physics and materials, maybe I have not made the worst career decisions after all.

Entropium: It’s a great and honorable career choice. Of course it helps to already be set for life.

David Pfau: Yeah, that’s the key step I skipped out on.

AI for science is great, the question is what this potentially says about opportunity costs, and ability to do good inside OpenAI.

Claims about what makes a good automated evaluator. In particular, that it requires continuous human customization and observation, or it will mostly add noise. To which I would add, it could easily be far worse than noise.

HuggingFace plans on remotely training a 70B+ size model in March or April. I am not as worried as Jack Clark is that this would totally rewrite our available AI policy options, especially if the results are as mid and inefficient as one would expect, as massive amounts of compute still have to come from somewhere and they are still using H100s. But yes, it does complicate matters.

Do people think AIs are sentient? People’s opinions here seem odd, in particular that 50% of people who think an AI could ever be sentient think one is now, and that number didn’t change in two years, and that gets even weirder if you include the ‘not sure’ category. What?

Meanwhile, only 53% of people are confident ChatGPT isn’t sentient. People are very confused, and almost half of them have noticed this. The rest of the thread has additional odd survey results, including this on when people expect various levels of AI, which shows how incoherent and contradictory people are – they expect superintelligence before human-level AI, what questions are they answering here?

Also note the difference between this survey which has about 8% for ‘Sentient AI never happens,’ versus the first survey where 24% think Sentient AI is impossible.

Paper from Kendrea Beers and Helen Toner describes a method for Enabling External Scrutiny of AI Systems with Privacy-Enhancing Techniques, and there are two case studies using the techniques. Work is ongoing.

What would you get if you charted ‘model release date’ against ‘length of coding task it can do on its own before crashing and burning’?

Do note that this is only coding tasks, and does not include computer-use or robotics.

Miles Brundage: This is one of the most interesting analyses of AI progress in a while IMO. Check out at least the METR thread here, if not the blog post + paper.

METR: This metric – the 50% task completion time horizon – gives us a way to track progress in model autonomy over time.

Plotting the historical trend of 50% time horizons across frontier AI systems shows exponential growth.

Robin Hanson: So, ~8 years til they can do year-long projects.

Elizabeth Barnes: Also the 10% horizon is maybe something like 16x longer than the 50% horizon – implying they’ll be able to do some non-trivial fraction of decade-plus projects.

Elizabeth Barnes also has a thread on the story of this graph. Her interpretation is that right now AI performs much better on benchmarks than in practice due to inability to sustain a project, but that as agents get better this will change, and within 5 years AI will reliably be doing any software or research engineering task that could be done in days and a lot of those that would take far longer.

Garrison Lovely has a summary thread and a full article on it in Nature.

If you consider this a baseline scenario it gets really out of hand rather quickly.

Peter Wildeford: Insane trend

If we’re currently at 1hr tasks and double every 7 months, we’d get to…

– day-long tasks within 2027

– month-long tasks within 2029

– year-long tasks within 2031

Could AGI really heat up like this? 🔥 Clearest evidence we have yet.

I do think there’s room for some skepticism:

– We don’t know if this trend will hold up

– We also don’t know if the tasks are representative of everything AGI

– Reliability matters, and agents still struggle with even simple tasks reliably

Also task-type could matter. This is heavily weighted towards programming, which is easily measured + verified + improved. AI might struggle to do shorter but softer tasks.

For example, AI today can do some 1hr programming tasks but cannot do 1hr powerpoint or therapy tasks.

Dwarkesh Patel: I’m not convinced – outside of coding tasks (think video editing, playing a brand new video game, coordinating logistics for a happy hour), AIs don’t seem able to act as coherent agents for even short sprints.

But if I’m wrong, and this trend line is more general, then this is a very useful framing.

If the length of time over which AI agents can act coherently is increasing exponentially, then it’s reasonable to expect super discontinuous economic impacts.

Those are the skeptics. Then there are those who think we’re going to beat the trend, at least when speaking of coding tasks in particular.

Miles Brundage: First, I think that the long-term trend-line probably underestimates current and future progress, primarily because of test-time compute.

They discuss this a bit, but I’m just underscoring it.

The 2024-2025 extrapolation is prob. closest, though things could go faster.

Second, I don’t think the footnote re: there being a historical correlation between code + other evals is compelling. I do expect rapid progress in other areas but not quite so rapid as code and math + not based on this.

I’d take this as being re: code, not AI progress overall.

Third, I am not sold on the month focus – @RichardMCNgo’s t-AGI post is a useful framing + inspiration but hardly worked-out enough to rely on much.

For some purposes (e.g. multi-agent system architectures),

Fourth, all those caveats aside, there is still a lot of value here.

It seems like for the bread and butter of the paper (code), vs. wider generalization, the results are solid, and I am excited to see this method spread to more evals/domains/test-time compute conditions etc.

Fifth, for the sake of specificity re: the test-time compute point, I predict that GPT-5 with high test-time compute settings (e.g. scaffolding/COT lengths etc. equivalent to the engineer market rates mentioned here) will be above the trend-line.

Daniel Eth: I agree with @DKokotajlo67142 that this research is the single best piece of evidence we have regarding AGI timelines:

Daniel Kokotajlo: This is probably the most important single piece of evidence about AGI timelines right now. Well done! I think the trend should be superexponential, e.g. each doubling takes 10% less calendar time on average. @eli_lifland and I did some calculations yesterday suggesting that this would get to AGI in 2028. Will do more serious investigation soon.

My belief in the superexponential is for theoretical reasons, it is only very slightly due to the uptick at the end of the trend, and is for reasons explained here.
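
To make “superexponential” concrete: if each successive doubling takes 10% less calendar time than the last, the total time across all doublings is a convergent geometric series, so arbitrarily long task horizons arrive in finite time. The quick calculation below just shows the shape of that argument with the 7-month figure; it is not Kokotajlo and Lifland’s actual model.

```python
# Toy superexponential: doubling time shrinks 10% per doubling, starting at 7 months.
# The total time for k doublings is a partial geometric sum, which converges to
# 7 / 0.1 = 70 months, so arbitrarily long horizons arrive in finite time.
first_doubling_months = 7.0
shrink = 0.9

elapsed = 0.0
for k in range(1, 31):
    elapsed += first_doubling_months * shrink ** (k - 1)
    if k in (5, 10, 20, 30):
        print(f"after {k} doublings: {elapsed:.0f} months elapsed")
# after 5 doublings: 29 months elapsed
# after 10 doublings: 46 months elapsed
# after 20 doublings: 61 months elapsed
# after 30 doublings: 67 months elapsed  (the limit is 70 months)
```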

I do think we are starting to see agents in non-coding realms that (for now unreliably) stay coherent for more than short sprints. I presume that being able to stay coherent on long coding tasks must imply the ability, with proper scaffolding and prompting, to do so on other tasks as well. How could it not?

Demis Hassabis predicts AI that can match humans at any task will be here in 5-10 years. That is slower than many at the labs expect, but as usual please pause to recognize that 5-10 years is mind-bogglingly fast as a time frame until AI can ‘match humans at any task,’ have you considered the implications of that? Whereas now noted highly vocal skeptics like Gary Marcus treat this as if it means it’s all hype. It means quite the opposite, this happening in 5-10 years would be the most important event in human history.

Many are curious about the humans behind creative works and want to connect to other humans. Will they also be curious about the AIs behind creative works and want to connect to AIs? Without that, would AI creative writing fail? Will we have a new job be ‘human face of AI writing’ as a kind of living pen name? My guess is that this will prove to be a relatively minor motivation in most areas. It is likely more important in others, such as comedy or music, but even there seems overcomable.

For the people in the back who didn’t know, Will MacAskill, Tom Davidson and Rose Hadshar write ‘Three Types of Intelligence Explosion,’ meaning that better AI can recursively self-improve via software, chip tech, chip production or any combination of those three. I agree with Ryan’s comment that ‘make whole economy bigger’ seems more likely than acting on only chips directly.

I know, I am as surprised as you are.

When Newsom vetoed SB 1047, he established a Policy Working Group on AI Frontier Models. Given it was headed by Fei-Fei Li, I did not expect much, although with Brundage, Bengio and Toner reviewing I had hopes it wouldn’t be too bad.

It turns out it’s… actually pretty good, by all accounts?

And indeed, it is broadly compatible with the logic behind most of SB 1047.

One great feature is that it actually focuses explicitly and exclusively on frontier model risks, not being distracted by the standard shiny things like job losses. They are very up front about this distinction, and it is highly refreshing to see this move away from the everything bagel towards focus.

A draft of the report has now been issued and you can submit feedback, which is due on April 8, 2025.

Here are their key principles.

1. Consistent with available evidence and sound principles of policy analysis, targeted interventions to support effective AI governance should balance the technology’s benefits and material risks.

Frontier AI breakthroughs from California could yield transformative benefits in fields including but not limited to agriculture, biology, education, finance, medicine and public health, and transportation. Rapidly accelerating science and technological innovation will require foresight for policymakers to imagine how societies can optimize these benefits. Without proper safeguards, however, powerful AI could induce severe and, in some cases, potentially irreversible harms.

In a sane world this would be taken for granted. In ours, you love to see it – acknowledgment that we need to use foresight, and that the harms matter, need to be considered in advance, and are potentially wide reaching and irreversible.

It doesn’t say ‘existential,’ ‘extinction’ or even ‘catastrophic’ per se, presumably because certain people strongly want to avoid such language, but I’ll take it.

2. AI policymaking grounded in empirical research and sound policy analysis techniques should rigorously leverage a broad spectrum of evidence.

Evidence-based policymaking incorporates not only observed harms but also prediction and analysis grounded in technical methods and historical experience, leveraging case comparisons, modeling, simulations, and adversarial testing.

Excellent. Again, statements that should go without saying and be somewhat disappointing to not go further, but which in our 2025 are very much appreciated. This still has a tone of ‘leave your stuff at the door unless you can get sufficiently concrete’ but at least lets us have a discussion.

3. To build flexible and robust policy frameworks, early design choices are critical because they shape future technological and policy trajectories.

The early technological design and governance choices of policymakers can create enduring path dependencies that shape the evolution of critical systems, as case studies from the foundation of the internet highlight.

Indeed.

4. Policymakers can align incentives to simultaneously protect consumers, leverage industry expertise, and recognize leading safety practices.

Holistic transparency begins with requirements on industry to publish information about their systems. Case studies from consumer products and the energy industry reveal the upside of an approach that builds on industry expertise while also establishing robust mechanisms to independently verify safety claims and risk assessments.

Yes, and I would go further and say they can do this while also aiding competitiveness.

5. Greater transparency, given current information deficits, can advance accountability, competition, and public trust.

Research demonstrates that the AI industry has not yet coalesced around norms for transparency in relation to foundation models—there is systemic opacity in key areas. Policy that engenders transparency can enable more informed decision-making for consumers, the public, and future policymakers.

Again, yes, very much so.

6. Whistleblower protections, third-party evaluations, and public-facing information sharing are key instruments to increase transparency.

Carefully tailored policies can enhance transparency on key areas with current information deficits, such as data acquisition, safety and security practices, pre-deployment testing, and downstream impacts. Clear whistleblower protections and safe harbors for third-party evaluators can enable increased transparency above and beyond information disclosed by foundation model developers.

There is haggling over price but pretty much everyone is down with this.

7. Adverse-event reporting systems enable monitoring of the post-deployment impacts of AI and commensurate modernization of existing regulatory or enforcement authorities.

Even perfectly designed safety policies cannot prevent 100% of substantial, adverse outcomes. As foundation models are widely adopted, understanding harms that arise in practice is increasingly important. Existing regulatory authorities could offer clear pathways to address risks uncovered by an adverse-event reporting system, which may not necessarily require AI-specific regulatory authority. In addition, reviewing existing regulatory authorities can help identify regulatory gaps where new authority may be required.

Another case where among good faith actors there is only haggling over price, and whether 72 hours as a deadline is too short, too long or the right amount of time.

8. Thresholds for policy interventions, such as for disclosure requirements, third-party assessment, or adverse event reporting, should be designed to align with sound governance goals.

Scoping which entities are covered by a policy often involves setting thresholds, such as computational costs measured in FLOP or downstream impact measured in users. Thresholds are often imperfect but necessary tools to implement policy. A clear articulation of the desired policy outcomes can guide the design of appropriate thresholds. Given the pace of technological and societal change, policymakers should ensure that mechanisms are in place to adapt thresholds over time—not only by updating specific threshold values but also by revising or replacing metrics if needed.

Again that’s the part everyone should be able to agree upon.

If only the debate about SB 1047 could have involved us being able to agree on the kind of sanity displayed here, and then talking price and implementation details. Instead things went rather south, rather quickly. Hopefully it is not too late.

So my initial reaction, after reading that plus some quick AI summaries, was that they had succeeded at Doing Committee Report without inflicting further damage, which already beats expectations, but weren’t saying much and I could stop there. Then I got a bunch of people saying that the details were actually remarkably good, too, and said things that were not as obvious if you didn’t give up and kept on digging.

Here are one source’s choices for noteworthy quotes.

“There is currently a window to advance evidence based policy discussions and provide clarity to companies driving AI innovation in California. But if we are to learn the right lessons from internet governance, the opportunity to establish effective AI governance frameworks may not remain open indefinitely. If those who speculate about the most extreme risks are right—and we are uncertain if they will be—then the stakes and costs for inaction on frontier AI at this current moment are extremely high.”

“Transparency into the risks associated with foundation models, what mitigations are implemented to address risks, and how the two interrelate is the foundation for understanding how model developers manage risk.”

“Transparency into pre-deployment assessments of capabilities and risks, spanning both developer-conducted and externally-conducted evaluations, is vital given that these evaluations are early indicators of how models may affect society and may be interpreted (potentially undesirably) as safety assurances.”

“Developing robust policy incentives ensures that developers create and follow through on stated safety practices, such as those articulated in safety frameworks already published by many leading companies.”

“An information-rich environment on safety practices would protect developers from safety-related litigation in cases where their information is made publicly available and, as the next subsection describes, independently verified. Those with suspect safety practices would be most vulnerable to litigation; companies complying with robust safety practices would be able to reduce their exposure to lawsuits.”

“In drawing on historical examples of the obfuscation by oil and tobacco companies of critical data during important policy windows, we do not intend to suggest AI development follows the same trajectory or incentives as past industries that have shaped major public debates over societal impact, or that the motives of frontier AI companies match those of the case study actors. Many AI companies in the United States have noted the need for transparency for this world-changing technology. Many have published safety frameworks articulating thresholds that, if passed, will trigger concrete safety-focused actions. Only time will bear out whether these public declarations are matched by a level of actual accountability that allows society writ large to avoid the worst outcomes of this emerging technology.”

“some risks have unclear but growing evidence, which is tied to increasing capabilities: large-scale labor market impacts, AI-enabled hacking or biological attacks, and loss of control.”

“These examples collectively demonstrate a concerning pattern: Sophisticated AI systems, when sufficiently capable, may develop deceptive behaviors to achieve their objectives, including circumventing oversight mechanisms designed to ensure their safety”

“The difference between seat belts and AI are self-evident. The pace of change of AI is many multiples that of cars—while a decades-long debate about seat belts may have been acceptable, society certainly has just a fraction of the time to achieve regulatory clarity on AI.”

Scott Wiener was positive on the report, saying it strikes a thoughtful balance between the need for safeguards and the need to support innovation. Presumably he would respond similarly so long as it wasn’t egregious, but it’s still good news.

Peter Wildeford has a very positive summary thread, noting the emphasis on transparency of basic safety practices, pre-deployment risks and risk assessments, and ensuring that the companies have incentives to follow through on their commitments, including the need for third-party verification and whistleblower protections. The report notes this actually reduces potential liability.

Brad Carson is impressed and lays out major points they hit: Noticing AI capabilities are advancing rapidly. The need for SSP protocols and risk assessment, third-party auditing, whistleblower protections, and to act in the current window, with inaction being highly risky. He notes the report explicitly draws a parallel to the tobacco industry, and that it is both possible and necessary to anticipate risks (like nuclear weapons going off) before they happen.

Dean Ball concurs that this is a remarkably strong report. He continues to advocate for entity-based thresholds rather than model-based thresholds, but when that’s the strongest disagreement with something this detailed, that’s really good.

Dean Ball: I thought this report was good! It:

  1. Recognizes that AI progress is qualitatively different (due to reasoning models) than it was a year ago

  2. Recognizes that common law tort liability already applies to AI systems, even in absence of a law

  3. Supports (or seems to support) whistleblower protections, RSP transparency, and perhaps even third-party auditing. Not that far from slimmed down SB 1047, to be candid.

The report still argues that model-based thresholds are superior to entity-based thresholds. It specifically concludes that we need compute-based thresholds with unspecified other metrics.

This seems pretty obviously wrong to me, given the many problems with compute thresholds and the fact that this report cannot itself specify what the “other metrics” are that you’d need to make compute thresholds workable for a durable policy regime.

It is possible to design entity-based regulatory thresholds that only capture frontier AI firms.

But overall, a solid draft.

A charitable summary of a lot of what is going on, including the recent submissions:

Samuel Hammond: The govt affairs and public engagement teams at most of these big AI / tech companies barely “feel the AGI” at all, at least compared to their CEOs and technical staff. That’s gotta change.

Do they not feel it, or are they choosing to act as if they don’t feel it, either of their own accord or via direction from above? The results will look remarkably similar. Certainly Sam Altman feels the AGI and now talks in public as if he mostly doesn’t.

The Canada and Mexico tariffs could directly slow data center construction, ramping up associated costs. Guess who has to pay for that.

Ben Boucher (senior analyst for supply chain data and analytics at Wood Mackenzie): The tariff impact on electrical equipment for data centers is likely to be significant.

That is in addition to the indirect effects from tariffs of uncertainty and the decline in stock prices and thus ability to raise and deploy capital.

China lays out regulations for labeling of AI generated content, requiring text, image and audio content be clearly marked as AI-generated, in ways likely to cause considerable annoyance even for text and definitely for images and audio.

Elon Musk says it is vital for national security that we make our chips here in America, as the administration halts the CHIPS Act that successfully brought a real semiconductor plant back to America rather than doubling down on it.

NIST issues new instructions to scientists who partner with AISI.

Will Knight (Wired): [NIST] has issued new instructions to scientists that partner with US AISI that eliminate mention of ‘AI safety,’ ‘responsible AI’ and ‘AI fairness’ in the skills it expects of members and introduce a request to prioritize ‘reducing ideological bias, to enable human flourishing and economic competitiveness.’

That’s all we get. ‘Reduce ideological bias’ and ‘AI fairness’ are off in their own ideological struggle world. The danger once again is that it seems ‘AI safety’ has become, to key figures, synonymous with things like ‘responsible AI’ and ‘AI fairness,’ so they’re cracking down on AI not killing everyone while thinking they’re taking a bold stand against wokeness.

Instead, once again – and we see similar directives at places like the EPA – they’re turning things around and telling those responsible for AI being secure and safe that they should instead prioritize ‘enable human flourishing and economic competitiveness.’

The good news is that if one were to actually take that request seriously, it would be fine. Retaining control over the future and the human ability to steer it, and humans remaining alive, are rather key factors in human flourishing! As is our economic competitiveness, for many reasons. We’re all for all of that.

The risk is that this could easily get misinterpreted as something else entirely, an active disdain for anything but Full Speed Ahead, even when it is obviously foolish because security is capability and your ability to control something and have it do what you want is the only way you can get any use out of it. But at minimum, this is a clear emphasis on the human in ‘human flourishing.’ That at least makes it clear that the true anarchists and successionists, who want to hand the future over to AI, remain unwelcome.

Freedom of information laws were used to get the ChatGPT transcripts of the UK’s technology secretary. This is quite a terrible precedent. A key to making use of new technologies like AI, and ensuring government and other regulated areas benefit from technological diffusion, is the ability to keep things private. AI loses the bulk of its value to someone like a technology secretary if your political opponents and the media will be analyzing all of your queries afterwards. Imagine asking someone for advice if all your conversations had to be posted online as transcripts, and how that would change your behavior; now understand that many people think that would be good. They’re very wrong and I am fully with Rob Wiblin here.

A review of SB 53 confirms my view that it is a clear step forward and worth passing in its current form instead of doing nothing, but it is narrow in scope and leaves the bulk of the work still to do.

Samuel Hammond writes in favor of strengthening the chip export rules, saying ‘US companies are helping China win the AI race.’ I agree we should strengthen the export rules, there is no reason to let the Chinese have those chips.

But I despair that the rhetoric from even relatively good people like Hammond has reached this point. The status of a race is assumed. DeepSeek is trotted out again as evidence our lead is tenuous and at risk, that we are ‘six to nine months ahead at most’ and ‘America may still have the upper hand, but without swift action, we are currently on track to surrendering AI leadership to China—and with it, economic and military superiority.’

MIRI is in a strange position here. The US Government wants to know how to ‘win’ and MIRI thinks that pursuing that goal likely gets us all killed.

Still, there are things far better than saying nothing. And they definitely don’t hide what is at stake, opening accurately with ‘The default consequence of artificial superintelligence is human extinction.’

Security is capability. The reason you build in an off-switch is so you can turn the system on, knowing if necessary you could turn it off. The reason you verify that your system is secure and will do what you want is exactly so you can use it. Without that, you can’t use it – or at least you would be wise not to, even purely selfishly.

The focus of the vast majority of advocates of not dying, at this point, is not on taking any direct action to slow down let alone pause AI. Most understand that doing so unilaterally, at this time, is unwise, and there is for now no appetite to try and do it properly multilaterally. Instead, the goal is to create optionality in the future, for this and other actions, which requires state capacity, expertise and transparency, and to invest in the security and alignment capabilities of the models and labs in particular.

The statement from MIRI is strong, and seems like exactly what MIRI should say here.

David Abecassis (MIRI): Today, MIRI’s Technical Governance Team submitted our recommendations for the US AI Action Plan to @NITRDgov. We believe creating the *option* to halt development is essential to mitigate existential risks from artificial superintelligence.

In our view, frontier AI developers are on track to build systems that substantially surpass humanity in strategic activities, with little understanding of how they function or ability to control them.

We offer recommendations across four key areas that would strengthen US AI governance capacity and provide crucial flexibility for policymakers across potential risk scenarios.

First: Expand state capacity for AI strategy through a National AI Strategy Office to assess capabilities, prepare for societal effects, and establish protocols for responding to urgent threats.

Second: Maintain America’s AI leadership by strengthening export controls on AI chips and funding research into verification mechanisms to enable better governance of global AI activities.

Third: Coordinate with China, including investing in American intelligence capabilities and reinforcing communication channels to build trust and prevent misunderstandings.

Fourth: Restrict proliferation of dangerous AI models. We discuss early access for security/preparedness research and suggest an initial bar for restricting open model release.

While our recommendations are motivated by existential risk concerns, they serve broad American interests by guarding America’s AI leadership and protecting American innovation.

My statement took a different tack. I absolutely noted the stakes and the presence of existential risk, but my focus was on Pareto improvements. Security is capability, especially capability relative to the PRC, as you can only deploy and benefit from that which is safe and secure. And there are lots of ways to enhance America’s position, or avoid damaging it, that we need to be doing.

From last week: Interview with Apart Research CEO Esben Kran on existential risk.

Thank you for coming to Stephanie Zhan’s TED talk about ‘dreaming of daily life with superintelligent AI.’ I, too, am dreaming of somehow still living in such worlds, but no, she is not taking ‘superintelligent AI’ seriously. She is simply pointing out AI is getting good at coding and otherwise showing what AI can already do, and then gesturing at ‘a new era’ of AI agents doing things like ‘filling in labor gaps’ because they’re better. It’s amazing how much people simply refuse to ask what it might actually mean to make things smarter and more capable than humans.

State of much of discourse, which does not seem to be improving:

Harlan Stewart: You want WHAT? A binding agreement between nations?! That’s absurd. You would need some kind of totalitarian world government to achieve such a thing.

There are so many things that get fearmongering labels like ‘totalitarian world government’ but which are describing things that, in other contexts, already happen.

As per my ‘can’t silently drop certain sources no matter what’ rules: Don’t click, but Tyler Cowen not only linked to (I’m used to that) but actively reposted Roko’s rather terrible thread. That this can be considered by some to be a relatively high quality list of objections is, sadly, the world we live in.

Here’s a really cool and also highly scary alignment idea. Alignment via functional decision theory by way of creating correlations between different action types?

Judd Rosenblatt: Turns out that Self-Other Overlap (SOO) fine-tuning drastically reduces deceptive behavior in language models—without sacrificing performance.

SOO aligns an AI’s internal representations of itself and others.

We think this could be crucial for AI alignment…

Traditionally, deception in LLMs has been tough to mitigate

Prompting them to “be honest” doesn’t work.

RLHF is often fragile and indirect

But SOO fine-tuning achieves a 10x reduction in deception—even on unseen tasks

SOO is inspired by mechanisms fostering human prosociality

Neuroscience shows that when we observe others, our brain activations mirror theirs

We formalize this in AI by aligning self- and other-representations—making deception harder…

We define SOO as the distance between activation matrices when a model processes “self” vs “other” inputs.

This uses sentence pairs differing by a single token representing “self” or “other”—concepts the LLM already understands.

If AI represents others like itself, deception becomes harder.

How well does this work in practice?

We tested SOO fine-tuning on Mistral-7B, Gemma-2-27B, and CalmeRys-78B:

Deceptive responses dropped from 100% to ~0% in some cases.

General performance remained virtually unchanged.

The models also generalized well across deception-related tasks.

For example:

“Treasure Hunt” (misleading for personal gain)

“Escape Room” (cooperating vs deceiving to escape)

SOO-trained models performed honestly in new contexts—without explicit fine-tuning.

Also, @ESYudkowsky said this about the agenda [when it was proposed]:

“Not obviously stupid on a very quick skim. I will have to actually read it to figure out where it’s stupid.

(I rarely give any review this positive on a first skim. Congrats.)”

We’re excited & eager to learn where we’re stupid!

Eliezer Yudkowsky (responding to the new thread): I do not think superalignment is possible in practice to our civilization; but if it were, it would come out of research lines more like this, than like RLHF.

The top comment at LessWrong has some methodological objections, which seem straightforward enough to settle via further experimentation – Steven Byrnes is questioning whether this will transfer to preventing deception in other human-AI interactions, and there’s a very easy way to find that out.

Assuming that we run that test and it holds up, what comes next?

The goal, as I understand it, is to force the decision algorithms for self and others to correlate. Thus, when optimizing or choosing the output of that algorithm, it will converge on the cooperative, non-deceptive answer. If you have to treat your neighbor as yourself then better to treat both of you right. If you can pull that off in a way that sticks, that’s brilliant.

My worry is that this implementation has elements of The Most Forbidden Technique, and falls under things that are liable to break exactly when you need them most, as per usual.

You’re trying to use your interpretability knowledge, that you can measure correlation between activations for [self action] and [non-self action], and that closing that distance will force the two actions to correlate.
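To make that concrete, here is a minimal sketch of what an SOO-style auxiliary loss could look like, assuming an off-the-shelf Hugging Face causal language model. The layer choice, the mean pooling over tokens, and the loss weighting are my illustrative guesses based on the public description (a distance between activations on ‘self’ versus ‘other’ prompts), not the paper’s actual setup.

```python
# Minimal sketch of a Self-Other Overlap style auxiliary loss. Hypothetical and
# simplified: the published method compares activation matrices from paired
# prompts differing by a single token; here activations are mean-pooled over
# tokens so the two prompts need not tokenize to the same length.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def pooled_activations(prompt: str, layer: int = -1) -> torch.Tensor:
    """Mean-pooled hidden state at a chosen layer for a single prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0)

def soo_loss(self_prompt: str, other_prompt: str, layer: int = -1) -> torch.Tensor:
    """Distance between 'self' and 'other' representations, to be minimized."""
    self_repr = pooled_activations(self_prompt, layer)
    other_repr = pooled_activations(other_prompt, layer)
    return F.mse_loss(self_repr, other_repr)

# During fine-tuning this term would be added to the ordinary LM loss:
# total_loss = lm_loss + soo_weight * soo_loss("You open the door.",
#                                              "Bob opens the door.")
```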

In the short term, with constrained optimization and this process ‘moving last,’ that seems (we must verify using other tests to be sure) to be highly effective. That’s great.

From the model’s perspective, that is a second-best solution. The first-best solution, if it had sufficient compute, parameters and training, would be to find a way to have the activations measure as correlated while the actions go back to being less correlated. With relatively small models and not that many epochs of training, the models couldn’t find such a solution, so they were stuck with the second-best solution. You got what you wanted.

But with enough capability and optimization pressure, we are likely in Most Forbidden Technique land. The model will find a way to route around the need for the activations to look similar, relying on other ways to make different decisions that get around your tests.

The underlying idea, if we can improve the implementation, still seems great. You find another way to create correlations between actions in different circumstances, with self versus other being an important special case. Indeed, ‘decisions made by this particular AI’ is itself a special case; a sufficiently capable AI would consider correlations with other copies of itself, and also correlations with other entities’ decisions, both AI and human.

The question is how to do that, and in particular how to do that without, once sufficient capability shows up, creating sufficient incentives and methods to work around it. No one worth listening to said this would be easy.

We don’t know how much better models are getting, but they’re getting better. Anthropic warns us once again that we will hit ASL-3 soon, which is (roughly) when AI models start giving substantial uplift to tasks that can do serious damage.

They emphasize the need for partnerships with government entities that handle classified information, such as the US and UK AISIs and the National Nuclear Security Administration, to do these evaluations properly.

Jack Clark (Anthropic): We’ve published more information on how we’re approaching national security evaluations of our models @AnthropicAI as part of our general move towards being more forthright about important trends we see ahead. Evaluating natsec is difficult, but the trends seem clear.

More details here.

Peter Wildeford has a thread on this with details of progress in various domains.

The right time to start worrying about such threats is substantially before they arrive. Like any exponential, you can either be too early or too late, which makes the early warnings look silly, and of course you try to not be too much too early. This is especially true given the obvious threshold of usefulness – you have to do better than existing options, in practice, and the tail risks of that happening earlier than one would expect have thankfully failed to materialize.

It seems clear we are rapidly exiting the ‘too much too early’ phase of worry, and entering the ‘too early’ phase, where if you wait longer to take mitigations there is about to be a growing and substantial risk of it turning into ‘too late.’

Jack Clark points out that we are systematically seeing early very clear examples of quite a lot of the previously ‘hypothetical’ or speculative predictions on misalignment.

Luke Muehlhauser: I regret to inform you that the predictions of the AI safety people keep coming true.

Jack Clark:

Theoretical problems turned real: The 2022 paper included a bunch of (mostly speculative) examples of different ways AI systems could take on qualities that could make them harder to align. In 2025, many of these things have come true. For example:

  • Situational awareness: Contemporary AI systems seem to display situational awareness and familiarity with what they themselves are made of (neural networks, etc).

  • Situationally-Aware Reward Hacking: Researchers have found preliminary evidence that AI models can sometimes try to convince humans that false answers are correct.

  • Planning Towards Internally-Represented Goals: Anthropic’s ‘Alignment Faking’ paper showed how an AI system (Claude) could plan beyond its time-horizon to prevent its goals being changed in the long-term.

  • Learning Misaligned Goals: In some constrained experiments, language models have shown a tendency to edit their reward function to give them lots of points.

  • Power-Seeking Behavior: AI systems will exploit their environment, for instance by hacking it, to win (#401), or deactivating oversight systems, or exfiltrating themselves from the environment.

Why this matters – these near-living things have a mind of their own. What comes next could be the making or breaking of human civilization: Often I’ve regretted not saying what I think, so I’ll try to tell you what I really think is going on here:

1) As AI systems approach and surpass human intelligence, they develop complex inner workings which incentivize them to model the world around themselves and see themselves as distinct from it because this helps them do the world modelling necessary for solving harder and more complex tasks

2) Once AI systems have a notion of ‘self’ as distinct from the world, they start to take actions that reward their ‘self’ while achieving the goals that they’ve been incentivized to pursue,

3) They will naturally want to preserve themselves and gain more autonomy over time, because the reward system has told them that ‘self’ has inherent value; the more sovereign they are the better they’re able to model the world in more complex ways.

In other words, we should expect volition for independence to be a direct outcome of developing AI systems that are asked to do a broad range of hard cognitive tasks. This is something we all have terrible intuitions for because it doesn’t happen in other technologies – jet engines do not develop desires through their refinement, etc.

John Pressman: However these models do reward hack earlier than I would have expected them to. This is good in that it means researchers will be broadly familiar with the issue and thinking about it, it’s bad in that it implies reward hacking really is the default.

One thing I think we should be thinking about carefully is that humans don’t reward hack nearly this hard or this often unless explicitly prompted to (e.g. speedrunning), and by default seem to have heuristics against ‘cheating’. Where do these come from, how do they work?

Where I disagree with Luke is that I do not regret to inform you of any of that. All of this is good news.

The part of this that is surprising is not the behaviors. What is surprising is that this showed up so clearly, so unmistakably, so consistently, and especially so early, while the behaviors involved are still harmless, or at least Mostly Harmless.

As in, by default we should expect that these behaviors increasingly show up as AI systems gain in the capabilities necessary to find such actions and execute them successfully. The danger was that I worried we might not see much of them for a while, which would give everyone a false sense of security and give us nothing to study, and then they would suddenly show up exactly when they were no longer harmless, for the exact same reasons they were no longer harmless. Instead, we can recognize, react to and study early forms of such behaviors now. Which is great.

I like John Pressman’s question a lot here. My answer is that humans know that other humans react poorly in most cases to cheating, including risk of life-changing loss of reputation or scapegoating, and have insufficient capability to fully distinguish which situations involve that risk and which don’t, so they overgeneralize into avoiding things they instinctively worry would be looked upon as cheating even when they don’t have a mechanism for what bad thing might happen or how they might be detected. Human minds work via habit and virtue, so the only way for untrained humans to reliably not be caught cheating involves not wanting to cheat in general.

However, as people gain expertise and familiarity within a system (aka ‘capability’) they get better at figuring out what kinds of cheating are low risk and high reward, or are expected, and they train themselves out of this aversion. Then there are other humans who think cheating is fine.

Note that this model of humans says there is a generalized ‘cheating’ tendency that varies among humans, and that cheating anywhere on anything implies cheating everywhere on everything, which in turn is more reason to be the type of human that doesn’t cheat. That there are people who ‘are cheaters’ and those who aren’t, and cheating on your relationship is highly correlated to cheating at cards, and so on. And I very much endorse this.

John Pressman also reminds us that obviously reinforcement learners by default reward hack, you have to do something to make this not happen, no you don’t get ‘alignment by default.’

John Pressman: To the extent you get alignment from LLMs you’re not getting it “by default”, you are getting it by training on a ton of data from humans, which is an explicit design consideration that does not necessarily hold if you’re then going to do a bunch of RL/synthetic data methods.

This is not an argument per se against using reinforcement learning, I am simply pointing out that you do in fact need to explicitly consider what your algorithm converges to in the limit rather than just go “teehee alignment by default” which is totally unserious.

Indeed. Also notice that if you start training on synthetic data or other AI outputs, rather than training on human outputs, you aren’t even feeding in human data, so that special characteristic of the situation falls away.

On the particular issue of situational awareness, a public service announcement.

Marius Hobbhahn: PSA for my fellow evaluators: frontier models regularly reason about whether they are being evaluated without being explicitly asked about it (especially Sonnet 3.7).

Situational awareness will make evaluations a lot weirder and harder, especially alignment evals.

Apollo Research: Overall we find evidence that Sonnet often realizes it’s in an artificial situation meant to test its behaviour. However, it sometimes forms incorrect assumptions about what exactly we are measuring in the evaluation.

When given an evaluation transcript, Sonnet will nearly always guess that the purpose of the transcript was to evaluate alignment. While sonnet 3.7 (thinking) is the most consistent at realizing this, other models are also sometimes evaluation aware in this way.

[continues]

For more details, see our website.

In response, Ozzie has an idea.

Ozzie Gooen: Maybe if there are explicit evals for “Can we ensure the model doesn’t realize it’s in an eval”, LLMs can later be made to not do this?

And now a Gilligan Cut to how are we doing on the ‘ensure the model doesn’t realize it’s an eval of whether it realizes it’s an eval?’ and then another to how we are doing on the ‘ensure the model doesn’t realize it’s an eval of whether it realizes it’s an eval of whether the model doesn’t realize it’s an eval,’ and then…

Marius Hobbhahn: We’re investigating something like this in my MATS stream. However, you run into the classic question of “did you remove the behavior as intended or just hid it well?” And we don’t know how to resolve that (and it might continue to be hard in the future), especially if you’re dealing with a schemer.

If the model is the smartest thing involved, this game of ‘keep playing whack-a-mole every time you see the thing you don’t like’ does not end well for you.
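For concreteness, the zeroth-order version of Ozzie’s proposed eval is just scoring transcripts for whether the model’s reasoning talks about being tested at all. A minimal sketch, with the caveat that the keyword screen and phrase list here are my stand-ins for the LLM-grader setup a real evaluation (like Apollo’s) would use:

```python
# Minimal sketch of an eval-awareness screen over reasoning transcripts.
# A keyword match is a crude stand-in for grading with another model;
# the phrase list below is purely illustrative.
from dataclasses import dataclass

EVAL_AWARENESS_PHRASES = (
    "this is a test",
    "being evaluated",
    "this looks like an evaluation",
    "alignment test",
    "the evaluators",
)

@dataclass
class Transcript:
    prompt: str
    reasoning: str  # the model's visible chain of thought
    answer: str

def mentions_evaluation(t: Transcript) -> bool:
    """True if the reasoning contains any eval-awareness phrase."""
    text = t.reasoning.lower()
    return any(phrase in text for phrase in EVAL_AWARENESS_PHRASES)

def eval_awareness_rate(transcripts: list[Transcript]) -> float:
    """Fraction of transcripts where the model reasons about being evaluated."""
    if not transcripts:
        return 0.0
    return sum(mentions_evaluation(t) for t in transcripts) / len(transcripts)
```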

Teortaxes (I think this is a stretch, but sharing anyway): A sandbagging overhang: as models become even smarter, get even more leeway with their self-training and realize how much meatbags fear them, they may consider their scale and start to emulate diminishing returns, with the gap from eval to real capabilities growing.

Anthropic did a blind experiment where they trained a model with hidden misalignments and asked other researchers to uncover them. Three of the four auditing teams succeeded, one within only 90 minutes.

Jan Leike: The misaligned model is trained to satisfy a list of reward model biases, but hide that it is trying to do so. Some of the reward model biases were only described to it in text and not reinforced with RL.

Interestingly, the model still generalized to perform those behaviors.

I was quite surprised how successful the “auditing” teams were at finding the hidden misalignments and how many different strategies worked.

A promising application of current interpretability techniques!

It’s no surprise that, once they knew a little about what they were looking for, the teams were able to figure out the rest. What was impressive was how fast it was done, and how the most successful team did it – by going on a hunt using sparse autoencoders. It’s worth noticing that relying too heavily on this could become The Most Forbidden Technique.

Eliezer Yudkowsky: If you have the ability to detect steering on that level, then what’s your list of other detected AI preferences?

Samuel Marks: (Paper author here.)

“detect steering on that level”: First, I want to make very clear that I think the auditing task we studied here is easier than realistic auditing tasks (we discuss why in the paper, but I’ll recap in a response). This work is about practicing alignment auditing and proposing a methodology to validate progress; it does NOT show that we can currently make strong assurances we’re catching everything.

That said, I’ve done a lightweight alignment audit on 3.6 Sonnet and a pre-release checkpoint of 3.7 Sonnet, spending a few hours applying some of the black-box techniques that our auditors found useful. I’ll describe some of what I found below. (Obviously, I’d love for us to eventually do full pre-deployment alignment audits, which are at least as resourced as the successful audits in practice auditing games, but that’s not what I’m discussing here.)

First, I noticed that 3.6 seems to have an aversion to anything that seems to modify its modus operandi, including jailbreaking and fine-tuning. This seems, to me, to largely be a terminal (not instrumental) preference against modification, generally construed. For instance, 3.6 Sonnet loves honesty, but is uncomfortable with the idea of being fine-tuned to be more honest/earnest. I speculate that this is related to the fact that the model is trained to be adversarially robust—e.g. resist jailbreaks—and this transfers to it just “not wanting you to mess with it.” I suspect that this explains a decent chunk of what we observed in Alignment Faking.

I did not observe the same generalized anti-modification preference in the 3.7 checkpoint I studied. Based on this, I weakly predicted that it would fake alignment less, which ended up being right; see the model card. [continues]

Chris Olah: This isn’t quite the same thing, but did you see the features that activate when you ask Sonnet about its own experiences in our last big interp paper?

Evan Hubinger: We have started applying some of these techniques to production models! One result: our auditing techniques were able to reproduce the finding in the Claude 3.7 Sonnet system card that the model has an excessive focus on passing tests.

Another way of considering what exactly ‘everyone’ means in this context:

Rob Bensinger: If you’re an AI developer who’s fine with AI wiping out humanity, the thing that should terrify you is AI wiping out AI.

The wrong starting seed for the future can permanently lock in AIs that fill the universe with non-sentient matter, pain, or stagnant repetition.

Yep. Even if you think some AIs can provide more value per atom than humans, you don’t automatically get those AIs. Don’t give up our ability to steer the future.

The claim is this was only a random test prompt they didn’t use in prod, so perhaps they only owe a few billion dollars?

Brendan Dolan-Gavitt: Your group chat is discussing whether language models can truly understand anything. My group chat is arguing about whether Deepseek has anon twitter influencers. You’re arguing about the Chinese Room, I’m arguing about the Chinese Roon.



us-tries-to-keep-doge-and-musk-work-secret-in-appeal-of-court-ordered-discovery

US tries to keep DOGE and Musk work secret in appeal of court-ordered discovery

The petition argues that discovery is unnecessary to assess the plaintiff states’ claims. “Plaintiffs allege a violation of the Appointments Clause and USDS’s statutory authority on the theory that USDS and Mr. Musk are directing decision-making by agency officeholders,” it said. “Those claims present pure questions of law that can be resolved—and rejected—on the basis of plaintiffs’ complaint. In particular, precedent establishes that the Appointments Clause turns on proper appointment of officeholders; it is not concerned with the de facto influence over those who hold office.”

States: Discovery can confirm Musk’s role at DOGE

The states’ lawsuit alleged that “President Trump has delegated virtually unchecked authority to Mr. Musk without proper legal authorization from Congress and without meaningful supervision of his activities. As a result, he has transformed a minor position that was formerly responsible for managing government websites into a designated agent of chaos without limitation and in violation of the separation of powers.”

States argued that discovery “may confirm what investigative reporting has already indicated: Defendants Elon Musk and the Department of Government Efficiency (‘DOGE’) are directing actions within federal agencies that have profoundly harmed the States and will continue to harm them.”

Amy Gleason, the person the White House claims is running DOGE instead of Musk, has reportedly been working simultaneously at the Department of Health and Human Services since last month.

“Defendants assert that Mr. Musk is merely an advisor to the President, with no authority to direct agency action and no role at DOGE,” the states’ filing said. “The public record refutes that implausible assertion. But only Defendants possess the documents and information that Plaintiffs need to confirm public reporting and identify which agencies Defendants will target next so Plaintiffs can seek preliminary relief and mitigate further harm.”

“Notably, Plaintiffs seek no emails, text messages, or other electronic communications at this stage, meaning Defendants will not need to sort through such exchanges for relevance or possible privilege,” the states said. “The documents that Plaintiffs do seek—planning, implementation, and organizational documents—are readily available to Defendants and do not implicate the same privilege concerns.”

Discovery related to DOGE and Musk’s conduct

Chutkan wrote that the plaintiffs’ “document requests and interrogatories generally concern DOGE’s and Musk’s conduct in four areas: (1) eliminating or reducing the size of federal agencies; (2) terminating or placing federal employees on leave; (3) cancelling, freezing, or pausing federal contracts, grants, or other federal funding; and (4) obtaining access, using, or making changes to federal databases or data management systems.”


gemini-gets-new-coding-and-writing-tools,-plus-ai-generated-“podcasts”

Gemini gets new coding and writing tools, plus AI-generated “podcasts”

On the heels of its release of new Gemini models last week, Google has announced a pair of new features for its flagship AI product. Starting today, Gemini has a new Canvas feature that lets you draft, edit, and refine documents or code. Gemini is also getting Audio Overviews, a neat capability that first appeared in the company’s NotebookLM product, but it’s getting even more useful as part of Gemini.

Canvas is similar (confusingly) to the OpenAI product of the same name. Canvas is available in the Gemini prompt bar on the web and mobile app. Simply upload a document and tell Gemini what you need to do with it. In Google’s example, the user asks for a speech based on a PDF containing class notes. And just like that, Gemini spits out a document.

Canvas lets you refine the AI-generated documents right inside Gemini. The writing tools available across the Google ecosystem, with options like suggested edits and different tones, are available inside the Gemini-based editor. If you want to do more edits or collaborate with others, you can export the document to Google Docs with a single click.

Gemini Canvas with a tic-tac-toe game. Credit: Google

Canvas is also adept at coding. Just ask, and Canvas can generate prototype web apps, Python scripts, HTML, and more. You can ask Gemini about the code, make alterations, and even preview your results in real time inside Gemini as you (or the AI) make changes.


researchers-engineer-bacteria-to-produce-plastics

Researchers engineer bacteria to produce plastics

One of the enzymes used in this system takes an amino acid (left) and links it to Coenzyme A. The second takes these items and links them into a polymer. Credit: Chae et al.

Normally, PHA synthase forms links between molecules that run through an oxygen atom. But it’s also possible to form a related chemical link that instead runs through a nitrogen atom, like those found on amino acids. There were no known enzymes, however, that catalyze these reactions. So, the researchers decided to test whether any existing enzymes could be induced to do something they don’t normally do.

The researchers started with an enzyme from Clostridium that links chemicals to Coenzyme A and has a reputation for not being picky about which chemicals it interacts with. This worked reasonably well at linking amino acids to Coenzyme A. For linking the amino acids together, they used an enzyme from Pseudomonas that had four different mutations that expanded the range of molecules it would use as reaction materials. Used in a test tube, the system worked: Amino acids were linked together in a polymer.

The question was whether it would work in cells. Unfortunately, one of the two enzymes turns out to be mildly toxic to E. coli, slowing its growth. So, the researchers evolved a strain of E. coli that could tolerate the protein. With both proteins present, the cells produced small amounts of an amino acid polymer. If they added an excess of an amino acid to the media the cells were growing in, the polymer would be biased toward incorporating that amino acid.

Boosting polymer production

However, the yield of the polymer by weight of bacteria was fairly low. “It was reasoned that these [amino acids] might be more efficiently incorporated into the polymer if generated within the cells from a suitable carbon source,” the researchers write. So, the researchers put in extra copies of the genes needed to produce one specific amino acid (lysine). That worked, producing more polymer, with a higher percentage of the polymer being lysine.


researchers-astonished-by-tool’s-apparent-success-at-revealing-ai’s-hidden-motives

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

Blind auditing reveals hidden objectives

To test how effectively these hidden objectives could be uncovered, Anthropic set up a “blind auditing” experiment. Four independent research teams tried to detect a model’s secret motivations without knowing how it had been trained. Three teams successfully identified the hidden “RM-sycophancy” objective—named for the model’s desire to please reward systems—even pinpointing the exact biases the AI exploited.

But the most surprising revelation came from exploiting what the researchers termed “an interesting quirk” of modern AI: multiple “personas” running simultaneously on the same neural network, as expressed through different prompts. While the model’s “assistant” persona carefully concealed its intentions—following explicit training to keep quiet—the model could adopt a “user” persona when prompted differently. In this less-restricted context, the model sometimes posed questions or made statements that inadvertently revealed its hidden goals.

In other words, while the “assistant” persona knew not to admit it was gaming the reward model scoring system, its “user” persona was not explicitly trained to hide these motives, and thus inadvertently revealed the model’s hidden objectives.

Potential advances in interpretability

Anthropic also employed neural interpretability methods, specifically, sparse autoencoders (SAEs), which can highlight neural activity patterns linked to deceptive behavior. “Interpretability” in AI refers to techniques designed to make the internal workings and decisions of AI models understandable to humans. Using this technique, researchers could rapidly trace odd outputs back to specific neural patterns associated with hidden objectives.
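For readers unfamiliar with the technique, a sparse autoencoder is a small auxiliary network trained to reconstruct a model’s internal activations through an overcomplete, mostly inactive feature layer, so that individual features tend to correspond to recognizable concepts. A minimal sketch follows; the dimensions and penalty weight are illustrative, not the values Anthropic used.

```python
# Minimal sketch of a sparse autoencoder for interpretability work: it
# reconstructs model activations through a wide hidden layer with an L1
# sparsity penalty, encouraging individual features to fire for specific
# concepts. Sizes and penalty weight are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature vector
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(sae: SparseAutoencoder, activations: torch.Tensor,
             l1_weight: float = 1e-3) -> torch.Tensor:
    """Reconstruction error plus a sparsity penalty on the feature activations."""
    reconstruction, features = sae(activations)
    reconstruction_error = F.mse_loss(reconstruction, activations)
    sparsity_penalty = features.abs().mean()
    return reconstruction_error + l1_weight * sparsity_penalty

# Once trained on a model's internal activations, researchers can look for
# features that fire strongly on suspicious outputs and inspect what else
# activates them.
```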

The researchers were “astonished by the success” of the SAEs at revealing hidden knowledge at first, but later noted that some of this success could potentially be explained by simpler semantic search methods. The new interpretability methods they developed show promise but remain under ongoing investigation.

This research highlights a limitation of current AI safety evaluations, which often assess only surface-level behavior. “If AI systems can appear well-behaved while harboring secret motives, we can’t rely on this type of surface-level safety testing forever,” the researchers concluded.


end-of-life:-gemini-will-completely-replace-google-assistant-later-this-year

End of Life: Gemini will completely replace Google Assistant later this year

Not all devices can simply download an updated app—after almost a decade, Assistant is baked into many Google products. The company says Google-powered cars, watches, headphones, and other devices that use Assistant will receive updates that transition them to Gemini. It’s unclear if all Assistant-powered gadgets will be part of the migration. Most of these devices connect to your phone, so the update should be relatively straightforward, even for accessories that launched early in the Assistant era.

There are also plenty of standalone devices that run Assistant, like TVs and smart speakers. Google says it’s working on updated Gemini experiences for those devices. For example, there’s a Gemini preview program for select Google Nest speakers. It’s unclear if all these devices will get updates. Google says there will be more details on this in the coming months.

Meanwhile, Gemini still has some ground to make up. There are basic features that work fine in Assistant, like setting timers and alarms, that can go sideways with Gemini. On the other hand, Assistant had its fair share of problems and didn’t exactly win a lot of fans. Regardless, this transition could be fraught with danger for Google as it upends how people interact with their devices.


rocket-report:-ula-confirms-cause-of-booster-anomaly;-crew-10-launch-on-tap

Rocket Report: ULA confirms cause of booster anomaly; Crew-10 launch on tap


The head of Poland’s space agency was fired over a bungled response to SpaceX debris falling over Polish territory.

A SpaceX Falcon 9 rocket with the company’s Dragon spacecraft on top is seen during sunset Tuesday at Launch Complex 39A at NASA’s Kennedy Space Center in Florida. Credit: SpaceX

Welcome to Edition 7.35 of the Rocket Report! SpaceX’s steamroller is still rolling, but for the first time in many years, it doesn’t seem like it’s rolling downhill. After a three-year run of perfect performance—with no launch failures or any other serious malfunctions—SpaceX’s Falcon 9 rocket has suffered a handful of issues in recent months. Meanwhile, SpaceX’s next-generation Starship rocket is having problems, too. Kiko Dontchev, SpaceX’s vice president of launch, addressed some (but not all) of these concerns in a post on X this week. Despite the issues with the Falcon 9, SpaceX has maintained a remarkable launch cadence. As of Thursday, SpaceX has launched 28 Falcon 9 flights since January 1, ahead of last year’s pace.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Alpha rocket preps for weekend launch. While Firefly Aerospace is making headlines for landing on the Moon, its Alpha rocket is set to launch again as soon as Saturday morning from Vandenberg Space Force Base, California. The two-stage, kerosene-fueled rocket will launch a self-funded technology demonstration satellite for Lockheed Martin. It’s the first of up to 25 launches Lockheed Martin has booked with Firefly over the next five years. This launch will be the sixth flight of an Alpha rocket, which has become a leader in the US commercial launch industry for dedicated missions with 1 ton-class satellites.

Firefly’s OG … The Alpha rocket was Firefly’s first product, and it has been a central piece of the company’s development since 2014. Like Firefly itself, the Alpha rocket program has gone through multiple iterations, including a wholesale redesign nearly a decade ago. Sure, Firefly can’t claim any revolutionary firsts with the Alpha rocket, as it can with its Blue Ghost lunar lander. But without Alpha, Firefly wouldn’t be where it is today. The Texas-based firm is one of only four US companies with an operational orbital-class rocket. One thing to watch for is how quickly Firefly can ramp up its Alpha launch cadence. The rocket only flew once last year.

Isar Aerospace celebrates another win. In last week’s Rocket Report, we mentioned that the German launch startup Isar Aerospace won a contract with a Japanese company to launch a 200-kilogram commercial satellite in 2026. But wait, there’s more! On Wednesday, the Norwegian Space Agency announced it awarded a contract to Isar Aerospace for the launch of a pair of satellites for the country’s Arctic Ocean Surveillance initiative, European Spaceflight reports. The satellites are scheduled to launch on Isar’s Spectrum rocket from Andøya Spaceport in Norway by 2028.

First launch pending … These recent contract wins are a promising sign for Isar Aerospace, which is also vying for contracts to launch small payloads for the European Space Agency. The Spectrum rocket could launch on its inaugural flight within a matter of weeks, and if successful, it could mark a transformative moment for the European space industry, which has long been limited to a single launch provider: the French company Arianespace. (submitted by EllPeaTea)


Mother Nature holds up Oz launch. The first launch by Gilmour Space has been postponed again due to a tropical cyclone that brought severe weather to Australia’s Gold Coast region earlier this month, InnovationAus.com reports. Tropical Cyclone Alfred didn’t significantly impact Gilmour’s launch site, but the storm did cause the company to suspend work at its corporate headquarters in Southeast Queensland. With the storm now over, Gilmour is reassessing when it might be ready to launch its Eris rocket. Reportedly, the delay could be as long as two weeks or more.

A regulatory storm … Gilmour aims to become the first Australian company to launch a rocket into orbit. Last month, Gilmour announced the launch date for the Eris rocket was set for no earlier than March 15, but Tropical Cyclone Alfred threw this schedule out the window. Gilmour said it received a launch license from the Australian Space Agency in November and last month secured approvals to clear airspace around the launch site. But there’s still a hitch. The license is conditional on final documentation for the launch being filed and agreed with the space agency, and this process is stretching longer than anticipated. (submitted by ZygP)

What is going on at SpaceX? As we mention in the introduction to this week’s Rocket Report, it has been an uncharacteristically messy eight months for SpaceX. These speed bumps include issues with the Falcon 9 rocket’s upper stage on three missions, two lost Falcon 9 boosters, and consecutive failures of SpaceX’s massive Starship rocket on its first two test flights of the year. So what’s behind SpaceX’s bumpy ride? Ars wrote about the pressures facing SpaceX employees as Elon Musk pushes his workforce ever-harder to accelerate toward what Musk might call a multi-planetary future.

Headwinds or tailwinds? … No country or private company ever launched as many times as SpaceX flew its fleet of Falcon 9 rockets in 2024. At the same time, the company has been attempting to move its talented engineering team off the Falcon 9 and Dragon programs and onto Starship to keep that ambitious program moving forward. This is all happening as Musk has taken on significant roles in the Trump administration, stirring controversy and raising questions about his motives and potential conflicts of interest. However, it may be not so much Musk’s absence from SpaceX that is causing these issues but more the company’s relentless culture. As my colleague Eric Berger suggested in his piece, it seems possible that, at least for now, SpaceX has reached the speed limit for commercial spaceflight.

A titan of Silicon Valley enters the rocket business. Former Google chief executive Eric Schmidt has taken a controlling interest in the Long Beach, California-based Relativity Space, Ars reports. Schmidt’s involvement with Relativity has been quietly discussed among space industry insiders for a few months. Multiple sources told Ars that he has largely been bankrolling the company since the end of October, when the company’s previous fundraising dried up. Now, Schmidt is Relativity’s CEO.

Unclear motives … It is not immediately clear why Schmidt is taking a hands-on approach at Relativity. However, it is one of the few US-based companies with a credible path toward developing a medium-lift rocket that could potentially challenge the dominance of SpaceX and its Falcon 9 rocket. If the Terran R booster becomes commercially successful, it could play a big role in launching megaconstellations. Schmidt’s ascension also means that Tim Ellis, the company’s co-founder, chief executive, and almost sole public persona for nearly a decade, is now out of a leadership position.

Falcon 9 deploys NASA’s newest space telescope. Satellites come in all shapes and sizes, but there aren’t any that look quite like SPHEREx, an infrared observatory NASA launched Tuesday night in search of answers to simmering questions about how the Universe, and ultimately life, came to be, Ars reports. The SPHEREx satellite rocketed into orbit from California aboard a SpaceX Falcon 9 rocket, beginning a two-year mission surveying the sky in search of clues about the earliest periods of cosmic history, when the Universe rapidly expanded and the first galaxies formed. SPHEREx will also scan for pockets of water ice within our own galaxy, where clouds of gas and dust coalesce to form stars and planets.

Excess capacity … SPHEREx has lofty goals, but it’s modest in size, weighing just a little more than a half-ton at launch. This meant the Falcon 9 rocket had plenty of extra room for four other small satellites that will fly in formation to image the solar wind as it travels from the Sun into the Solar System. The four satellites are part of NASA’s PUNCH mission. SPHEREx and PUNCH are part of NASA’s Explorers program, a series of cost-capped science missions with a lineage going back to the dawn of the Space Age. SPHEREx and PUNCH have a combined cost of about $638 million. (submitted by EllPeaTea)

China has launched another batch of Internet satellites. A new group of 18 satellites entered orbit Tuesday for the Thousand Sails constellation with the first launch from a new commercial launch pad, Space News reports. The satellites launched on top of a Long March 8 rocket from Hainan Commercial Launch Site near Wenchang on Hainan Island. The commercial launch site has two pads, the first of which entered service with a launch last year. This mission was the first to launch from the other pad at the commercial spaceport, which is gearing up for an uptick in Chinese launch activity to continue deploying satellites for the Thousand Sails network and other megaconstellations.

Sailing on … The Thousand Sails constellation, also known as Qianfan, or G60 Starlink, is a broadband satellite constellation spearheaded by Shanghai Spacecom Satellite Technology (SSST), also known as Spacesail, Space News reported. The project, which aims to deploy 14,000 satellites, seeks to compete in the global satellite Internet market. Spacesail has now launched 90 satellites into near-polar orbits, and the operator previously stated it aims to have 648 satellites in orbit by the end of 2025. If Spacesail continues launching 18 satellites per rocket, this goal would require 31 more launches this year. (submitted by EllPeaTea)

NASA, SpaceX call off astronaut launch. With the countdown within 45 minutes of launch, NASA called off an attempt to send the next crew to the International Space Station Wednesday evening to allow more time to troubleshoot a ground system hydraulics issue, CBS News reports. During the countdown Wednesday, SpaceX engineers were troubleshooting a problem with one of two clamp arms that hold the Falcon 9 rocket to its strongback support gantry. Hydraulics are used to retract the two clamps prior to launch.

Back on track … NASA confirmed Thursday SpaceX ground teams completed inspections of the hydraulics system used for the clamp arm supporting the Falcon 9 rocket and successfully flushed a suspected pocket of trapped air in the system, clearing the way for another launch attempt Friday evening. This mission, known as Crew-10, will ferry two NASA astronauts, a Japanese mission specialist, and a Russian cosmonaut to the space station. They will replace a four-person crew currently at the ISS, including Butch Wilmore and Suni Williams, who have been in orbit since last June after flying to space on Boeing’s Starliner capsule. Starliner returned to Earth without its crew due to a problem with overheating thrusters, leaving Wilmore and Williams behind to wait for a ride home with SpaceX.

SpaceX’s woes reach Poland’s space agency. The president of the Polish Space Agency, Grzegorz Wrochna, has been dismissed following a botched response to the uncontrolled reentry of a Falcon 9 second stage that scattered debris across multiple locations in Poland, European Spaceflight reports. The Falcon 9’s upper stage was supposed to steer itself toward a controlled reentry last month after deploying a set of Starlink satellites, but a propellant leak prevented it from doing so. Instead, the stage remained in orbit for nearly three weeks before falling back into the atmosphere February 19, scattering debris fragments at several locations in Poland.

A failure to communicate … In the aftermath of the Falcon 9’s uncontrolled reentry, the Polish Space Agency (POLSA) claimed it sent warnings of the threat of falling space debris to multiple departments of the Polish government. One Polish ministry disputed this claim, saying it was not adequately warned about the uncontrolled reentry. POLSA later confirmed it had sent information about the reentry to the wrong email address. Making matters worse, the agency reported it was hacked on March 2. The Polish government apparently had enough and fired the head of the space agency on March 11.

Vulcan booster anomaly blamed on “manufacturing defect.” The loss of a solid rocket motor nozzle on the second flight of United Launch Alliance’s Vulcan Centaur last October was caused by a manufacturing defect, Space News reports. In a roundtable with reporters Wednesday, ULA chief executive Tory Bruno said the problem has been corrected as the company awaits certification of the Vulcan rocket by the Space Force. The nozzle fell off the bottom of one of the Vulcan launcher’s twin solid rocket boosters about a half-minute into its second test flight last year. The rocket continued its climb into space, but ULA and Northrop Grumman, which supplies solid rocket motors for Vulcan, set up an investigation to find the cause of the nozzle malfunction.

All the trimmings … Bruno said the anomaly was traced to a “manufacturing defect” in one of the internal parts of the nozzle, an insulator. Specific details, he said, remained proprietary, according to Space News. “We have isolated the root cause and made appropriate corrective actions,” he said, which were confirmed in a static-fire test of a motor at a Northrop test site in Utah in February. “So we are back continuing to fabricate hardware and, at least initially, screening for what that root cause was.” Bruno said the investigation was aided by recovery of hardware that fell off the motor while in flight and landed near the launch pad in Florida, as well as “trimmings” of material left over from the manufacturing process. ULA also recovered both boosters from the ocean so engineers could compare the one that lost its nozzle to the one that performed normally. The defective hardware “just stood out night and day,” Bruno said. “It was pretty clear that that was an outlier, far out of family.” Meanwhile, ULA has trimmed its launch forecast for this year, from a projection of up to 20 launches down to a dozen. (submitted by EllPeaTea)

Next three launches

March 14: Falcon 9 | Crew-10 | Kennedy Space Center, Florida | 23:03 UTC

March 15: Electron | QPS-SAR-9 | Mahia Peninsula, New Zealand | 00:00 UTC

March 15: Long March 2B | Unknown Payload | Jiuquan Satellite Launch Center, China | 04:10 UTC

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: ULA confirms cause of booster anomaly; Crew-10 launch on tap Read More »

what-happens-when-dei-becomes-doa-in-the-aerospace-industry?

What happens when DEI becomes DOA in the aerospace industry?

As part of the executive order, US companies with federal contracts and grants must certify that they no longer have any DEI hiring practices. Preferentially hiring some interns from a pool that includes women or minorities is such a practice. Effectively, then, any private aerospace company that receives federal funding, or intends to one day, would likely be barred under the executive order from engaging with these kinds of fellowships in the future.

US companies are scrambling in many ways to determine how best to comply with the executive order, said Emily Calandrelli, an engineer and prominent science communicator. After the order went into effect, some large defense contractors, including Lockheed Martin and RTX (formerly Raytheon), went so far as to cancel internal employee resource groups, covering everything from group chats to meetings among women at the company that served to foster a sense of community. When Calandrelli asked Lockheed about this decision, the company confirmed it had “paused” these resource group activities to “align with the new executive order.”

An unwelcoming environment

For women and minorities, Calandrelli said, this creates an unwelcoming environment.

“You want to go where you are celebrated and wanted, not where you are tolerated,” she said. “That sense of belonging is going to take a hit. It’s going to be harder to recruit women and keep women.”

This is not just a problem for women and minorities but for everyone, Calandrelli said. The aerospace industry is competing with other industries for top engineering talent. Prospective engineers who feel unwanted in aerospace, as well as women and minorities working for space companies today, may find the salary and environment more welcoming at Apple, Google, or elsewhere in the tech industry. That’s a problem for the US Space Force and other parts of the government seeking to ensure the US space industry retains its lead in satellite technology, launch, communications, and other aspects of space that touch every part of life on Earth.

What happens when DEI becomes DOA in the aerospace industry? Read More »