Author name: Shannon Garcia

with-vulcan’s-certification,-space-force-is-no-longer-solely-reliant-on-spacex

With Vulcan’s certification, Space Force is no longer solely reliant on SpaceX

The US Space Force on Wednesday announced that it has certified United Launch Alliance’s Vulcan rocket to conduct national security missions.

“Assured access to space is a core function of the Space Force and a critical element of national security,” said Brig. Gen. Panzenhagen, program executive officer for Assured Access to Space, in a news release. “Vulcan certification adds launch capacity, resiliency, and flexibility needed by our nation’s most critical space-based systems.”

The formal announcement closes a yearslong process that has seen multiple delays in the development of the Vulcan rocket, as well as two anomalies in recent years that were a further setback to certification.

The first of these, an explosion on a test stand in northern Alabama during the spring of 2023, delayed the first test flight of Vulcan by several months. Then, in October 2024, during the second test flight of the rocket, a nozzle on one of the Vulcan’s two side-mounted boosters failed.

A cumbersome process

This nozzle issue, more than five months ago, compounded the extensive paperwork needed to certify Vulcan for the US Department of Defense’s most sensitive missions. The military has several options for companies to certify their rockets depending on the number of flights completed, which could be two, three, or more. The fewer the flights, the more paperwork and review that must be done. For Vulcan, this process entailed:

  • 52 certification criteria
  • more than 180 discrete tasks
  • 2 certification flight demonstrations
  • 60 payload interface requirement verifications
  • 18 subsystem design and test reviews
  • 114 hardware and software audits

That sounds like a lot of work, but at least the military’s rules and regulations are straightforward and simple to navigate, right? Anyway, the certification process is complete, elevating United Launch Alliance to fly national security missions alongside SpaceX with its fleet of Falcon 9 and Falcon Heavy rockets.

With Vulcan’s certification, Space Force is no longer solely reliant on SpaceX Read More »

google-makes-android-development-private,-will-continue-open-source-releases

Google makes Android development private, will continue open source releases

Google is planning a major change to the way it develops new versions of the Android operating system. Since the beginning, large swaths of the software have been developed in public-facing channels, but that will no longer be the case. This does not mean Android is shedding its open source roots, but the process won’t be as transparent.

Google has confirmed to Android Authority that all Android development work going forward will take place in Google’s internal branch. This is a shift from the way Google has worked on Android in the past, which featured frequent updates to the public AOSP branch. Anyone can access AOSP, but the internal branches are only available to Google and companies with a Google Mobile Services (GMS) license, like Samsung, Motorola, and others.

According to the company, it is making this change to simplify things, building on a recent change to trunk-based development. As Google works on both public and private branches of Android, the two fall out of sync with respect to features and API support. This forces Google to tediously merge the branches for every release. By focusing on the internal branch, Google claims it can streamline releases and make life easier for everyone.

When new versions of Android are done, Google says it will continue to publish the source code in AOSP as always. Supposedly, this will allow developers to focus on supporting their apps without keeping track of pending changes to the platform in AOSP. Licensed OEMs, meanwhile, can just focus on the lively internal branch as they work on devices that can take a year or more to launch.

Google makes Android development private, will continue open source releases Read More »

the-atlantic-publishes-texts-showing-trump-admin-sent-bombing-plan-to-reporter

The Atlantic publishes texts showing Trump admin sent bombing plan to reporter

White House didn’t want texts released

Prior to running its follow-up article, The Atlantic asked Trump administration officials if they objected to publishing the full texts. White House Press Secretary Karoline Leavitt emailed a response:

As we have repeatedly stated, there was no classified information transmitted in the group chat. However, as the CIA Director and National Security Advisor have both expressed today, that does not mean we encourage the release of the conversation. This was intended to be a an [sic] internal and private deliberation amongst high-level senior staff and sensitive information was discussed. So for those reason [sic]—yes, we object to the release.”

Obviously, The Atlantic moved ahead with publishing the texts. “The Leavitt statement did not address which elements of the texts the White House considered sensitive, or how, more than a week after the initial air strikes, their publication could have bearing on national security,” the article said.

On Monday, the National Security Council said it was “reviewing how an inadvertent number was added to the chain.” Trump publicly supported Waltz after the incident, but Politico reported that “Trump was mad—and suspicious—that Waltz had Atlantic editor-in-chief Jeffrey Goldberg’s number saved in his phone in the first place.” One of Politico’s anonymous sources was quoted as saying, “The president was pissed that Waltz could be so stupid.”

Senate Armed Services Chairman Roger Wicker (R-Miss.) said the committee will investigate, according to The Hill. “We’re going to look into this and see what the facts are, but it’s definitely a concern. And you can be sure the committee, House and Senate, will be looking into this… And it appears that mistakes were made, no question,” he said.

The White House said its investigation is being undertaken by the National Security Council, the White House Counsel’s office, and a group led by Elon Musk. “Elon Musk has offered to put his technical experts on this to figure out how this number was inadvertently added to the chat, again to take responsibility and ensure this can never happen again,” Leavitt told reporters.

The Atlantic publishes texts showing Trump admin sent bombing plan to reporter Read More »

praise-kier-for-severance-season-2!-let’s-discuss.

Praise Kier for Severance season 2! Let’s discuss.


Marching bands? Mammalian Nurturables? An ORTBO? Yup, Severance stays weird.

Severance has just wrapped up its second season. I sat down with fellow Ars staffers Aaron Zimmerman and Lee Hutchinson to talk through what we had just seen, covering everything from those goats to the show’s pacing. Warning: Huge spoilers for seasons 1 and 2 follow!

Nate: Severance season 1 was a smaller-scale, almost claustrophobic show about a crazy office, its “waffle parties,” and the personal life of Mark Scout, mourning his dead wife and “severing” his consciousness to avoid that pain. It followed a compact group of characters, centered around the four “refiners” who worked on Lumon’s severed floor. But season 2 blew up that cozy/creepy world and started following more characters—including far more “outies”—to far more places. Did the show manage to maintain its unique vibe while making significant changes to pacing, character count, and location?

Lee: I think so, but as you say, things were different this time around. One element that I’m glad carried through was the show’s consistent use of a very specific visual language. (I am an absolute sucker for visual storytelling. My favorite Kubrick film is Barry Lyndon. I’ll forgive a lot of plot holes if they’re beautifully shot.) Season 2, especially in the back half, treats us to an absolute smorgasbord of incredible visuals—bifurcated shots symbolizing severance and duality, stark whites and long hallways, and my personal favorite: Chris Walken in a black turtleneck seated in front of a fireplace, like Satan holding court in Hell. The storytelling might be a bit less focused, but it looks great.

Image of Christopher Walken being Christopher Walken.

So many visual metaphors in one frame.

Credit: AppleTV+

So many visual metaphors in one frame. Credit: AppleTV+

Aaron: I think it succeeded overall, with caveats. The most prominent thing lost in the transition was the tight pacing of the first season; while season 2 started and ended strong, the middle meandered quite a bit, and I’d say the overall pacing felt pretty off. Doing two late-season “side quest” episodes (Gemma/Mark and Cobel backstories) was a bit of a drag. But I agree with Lee—Severance was more about vibes than narrative focus this season.

Nate: The “side quests” were vocally disliked by a subsection of the show’s fandom, and it certainly is an unusual choice to do two episodes in a row that essentially leave all your main characters to the side. But I don’t think these were really outliers. This is a season, for instance, that opened with a show about the innies—and then covered the exact same ground in episode two from the outies’ perspective. It also sent the whole cast off on a bizarre “ORTBO” that took an entire episode and spent a lot of time talking about Kier’s masturbating, and possibly manufactured, twin. (!)

Still, the “side quest” episodes stood out even among all this experimentation with pace and flow. But I think the label “side quest” can be a misnomer. The episode showing us the Gemma/Mark backstory not only brought the show’s main character into focus, it revealed what was happening to Gemma and gave many new hints about what Lumon was up to. In other words—it was about Big Stuff.

Image the four MDR refiners on ORTBO

Even when we’re outside, the show sticks to a palette of black and white and cold. Winter is almost as much of a character in Severance as our four refiners are.

Credit: AppleTV+

Even when we’re outside, the show sticks to a palette of black and white and cold. Winter is almost as much of a character in Severance as our four refiners are. Credit: AppleTV+

The episode featuring Cobel, in contrast, found time for long, lingering drone shots of the sea, long takes of Cobel lying in bed, and long views of rural despair… and all to find a notebook. To me, this seemed much more like an actual “side quest” that could have been an interwoven B plot in a more normal episode.

Lee: The “side quest” I didn’t all mind was episode 7, “Chikhai Bardo,” directed by the show’s cinematographer Jessica Lee Gagné. The tale of Mark and Gemma’s relationship—a tale begun while donating blood using Lumon-branded equipment, with the symbolism of Lumon as a blood-hungry faceless machine being almost disturbingly on-the-nose—was masterfully told. I wasn’t as much of a fan of the three episodes after that, but I think that’s just because episode 7 was just so well done. I like TV that makes me feel things, and that one succeeded.

Aaron: Completely agree. I love the Gemma/Mark episode, but I was very disappointed with the Cobel episode (it doesn’t help that I dislike her as a character generally, and the whole “Cobel invented severance!” thing seemed a bit convenient and unearned to me). I think part of the issue for me was that the core innie crew and the hijinks they got up to in season 1 felt like the beating heart of the show, so even though the story had to move on at some point (and it’s not going back—half the innies can’t even be innies anymore), I started to miss what made me fall in love with the show.

Image of Patricia Arquette as Harmony Cobel.

Harmony Cobel comes home to the ether factory.

Credit: AppleTV+

Harmony Cobel comes home to the ether factory. Credit: AppleTV+

Lee: I get the narrative motivation behind Cobel having invented the severance chip (along with every line of code and every function, as she tells us), but yeah, that was the first time the show threw something at me that I really did not like. I see how this lets the story move Cobel into a helper role with Mark’s reintegration, but, yeah, ugh, that particular development felt tremendously unearned, as you say. I love the character, but that one prodded my suspension of disbelief pretty damn hard.

Speaking of Mark’s reintegration—I was so excited when episode three (“Who is Alive?”) ended with Mark’s outie slamming down on the Lumon conference room table. Surely now after two catch-up episodes, I thought, we’d get this storyline moving! Having the next episode (“Woe’s Hollow”) focusing on the ORTBO and Kier’s (possibly fictional) twin was a little cheap, even though it was a great episode. But where I started to get really annoyed was when we slide into episode five (“Trojan’s Horse”) with Mark’s reintegration apparently stalled. It seems like from then to the end of the season, reintegration proceeded in fits and starts, at the speed of plot rather than in any kind of ordered fashion.

It was one of the few times where I felt like my time was being wasted by the showrunners. And I don’t like that feeling. That feels like Lost.

Image of Mark on the table.

Kind of wish they’d gone a little harder here.

Credit: AppleTV+

Kind of wish they’d gone a little harder here. Credit: AppleTV+

Aaron: Yes! Mark’s reintegration was handled pretty poorly, I think. Like you said, it was exciting to see the show go there so early… but it didn’t really make much difference for the rest of the season. It makes sense that reintegration would take time—and we do see flashes of it happening throughout the season—but it felt like the show was gearing up for some wild Petey-level reintegration stuff that just never came. Presumably that’s for season 3, but the reintegration stuff was just another example of what felt like the show spinning its wheels a bit. And like you said, Lee, when it feels like a show isn’t quite sure what to do with the many mysteries it introduces week after week, I start to think about Lost, and not in a good way.

The slow-rolled reintegration stuff was essential for the finale, though. Both seasons seemed to bank pretty hard on a “slow buildup to an explosive finale” setup, which felt a little frustrating this season (season 1’s finale is one of my favorite TV show episodes of all time).

But I think the finale worked. Just scene after scene of instantly iconic moments. The scene of innie and outtie Mark negotiating through a camcorder in that weird maternity cabin was brilliant. And while my initial reaction to Mark’s decision at the end was anger, I really should have seen it coming—outtie Mark could not have been more patronizing in the camcorder conversation. I guess I, like outtie Mark, saw innie Mark as being somewhat lesser than.

What did you guys think of the finale?

Nate: A solid effort, but one that absolutely did not reach the heights of season 1. It was at its best when characters and events from the season played critical moments—such as the altercation between Drummond, Mark, and Feral Goat Lady, or the actual (finally!) discovery of the elevator to the Testing Floor.

But the finale also felt quite strange or unbalanced in other ways. Ricken doesn’t make an appearance, despite the hint that he was willing to retool his book (pivotal in season 1) for the Lumon innies. Burt doesn’t show up. Irving is gone. So is Reghabi. Miss Huang was summarily dismissed without having much of a story arc. So the finale failed to “gather up all its threads” in the way it did during season one.

And then there was that huge marching band, which ups the number of severed employees we know about by a factor of 50x—and all so they could celebrate the achievements of an innie (Mark S.) who is going to be dismissed and whose wife is apparently going to be killed. This seemed… fairly improbable, even for Lumon. On the other hand, this is a company/cult with an underground sacrificial goat farm, so what do I know about “probability”? Speaking of which, how do we feel about the Goat Revelations ™?

Image of Emile the Goat.

This is Emile, and he must be protected at all costs.

Credit: AppleTV+

This is Emile, and he must be protected at all costs. Credit: AppleTV+

Lee: I’m still not entirely sure what the goat revelations were. They were being raised in order to be crammed into coffins and sacrificed when… things happen? Poor little Emile was going to ride to the afterlife with Gemma, apparently, but, like… why? Is it simply part of a specifically creepy Lumontology ritual? Emile’s little casket had all kinds of symbology engraved on it, and we know goats (or at least “the ram”) symbolizes Malice in Kier’s four tempers, but I’m still really not getting this one.

Aaron: Yeah, you kind of had to hand-wave a lot of the stuff in the finale. The goats just being sacrificial animals made me laugh—“OK, I guess it wasn’t that deep.” But it could be that we don’t really know their actual purpose yet.

Perhaps most improbable to me was that this was apparently the most important day in Lumon history, and they had basically one security guy on the premises. He’s a big dude—or was (outtie Mark waking up mid-accidental-shooting cracked me up)—but come on.

Stuff like the marching band doesn’t make a lick of sense. But it was a great scene, so, eh, just go with it. That seems to be what Severance is asking us to do more and more, and honestly, I’m mostly OK with that.

Image of Seth Milchick, lord of the dance.

This man can do anything.

Credit: AppleTV+

This man can do anything. Credit: AppleTV+

Nate: Speaking of important days in Lumon history… what is Lumon up to, exactly? Jame Eagen spoke in season 1 about his “revolving,” he watched Helena eat eggs without eating anything himself, and he appears on the severed floor to watch the final “Cold Harbor” test. Clearly something weird is afoot. But the actual climactic test on Gemma was just to see if the severance block could hold her personalities apart even when facing deep traumas.

However, (as Miss Casey) she had already been in the presence of her husband (Mark S.), and neither of them had known it. So the show seems to suggest on the one hand that whatever is happening on the testing floor will change the world. But on the other hand, it’s really just confirming what we already know. And surely there’s no need to kidnap people if the goal is just to help them compartmentalize pain; as our current epidemic of drug and alcohol use show, plenty of people would sign up for this voluntarily. So what’s going on? Or, if you have no theories, does the show give you confidence that it knows where it’s going?

Lee: The easy answer—that severance chips will somehow allow the vampire spirit of Kier to jump bodies forever—doesn’t really line up. If Chris Walken’s husband Walter Bishop is to be believed, the severance procedure is only 12 years old. So it’s not that, at least.

Though Nate’s point about Helena eating eggs—and Jame’s comment that he wished she would “take them raw”—does echo something we learned back in season one: that Kier Egan’s favorite breakfast was raw eggs and milk.

Image of a precisely sliced hard boiled egg on a painted plate.

Eggiwegs! I would like… to eat them raw?

Credit: AppleTV+

Eggiwegs! I would like… to eat them raw? Credit: AppleTV+

Aaron: That’s the question for season 3, I think, and whether they’re able to give satisfying answers will determine how people view this show in the long term. I’ll admit that I was much more confident in the show’s writers after the first season; this season has raised some concerns for me. I believe Ben Stiller has said that they know how the show ends, just not how it gets there. That’s a perilous place to be.

Nate: We’ve groused a bit about the show’s direction, but I think it’s fair to say it comes from a place of love; the storytelling and visual style is so special, and we’ve had our collective hearts broken so many times by shows that can’t stick the landing. (I want those hours back, Lost.) I’m certainly rooting for Severance to succeed. And even though this season wasn’t perfect, I enjoyed watching every minute of it. As we wrap things up, anyone have a favorite moment from season 2? I personally enjoyed Milchick getting salty, first with Drummond and then with a wax statue of Kier.

Lee: Absolutely! I very much want the show to stick the eventual landing. I have to go with you on your take, Nate—Milchick steals the show. Tramell Tillman plays him like a true company man, with the added complexity that comes when your company is also the cult that controls your life. My favorite bits with him are his office decorations, frankly—the rabbit/duck optical illusion statue, showing his mutable nature, and the iceberg poster, hinting at hidden depths. He’s fantastic. I would 100 percent watch a spin-off series about Milchick.

Image showing Seth Milchick's office.

Mr. Milchick’s office, filled with ambiguousness. I’m including Miss Huang in that description, too.

Credit: AppleTV+

Mr. Milchick’s office, filled with ambiguousness. I’m including Miss Huang in that description, too. Credit: AppleTV+

Aaron: This season gave me probably my favorite line in the whole series—Irv’s venomous “Yes! Do it, Seth!” as Helena is telling Milchick to flip the switch to bring back Helly R. But yeah, Milchick absolutely killed it this season. “Devour feculence” and the drum major scene were highlights, but I also loved his sudden sprint from the room after handing innie Dylan his outtie’s note. Severance can be hilarious.

And I agree, complaints aside, this show is fantastic. It’s incredibly unique, and I looked forward to watching it every week so I could discuss it with friends. Here’s hoping we don’t have to wait three more years for the next season.

Photo of Nate Anderson

Praise Kier for Severance season 2! Let’s discuss. Read More »

momentum-seems-to-be-building-for-jared-isaacman-to-become-nasa-administrator

Momentum seems to be building for Jared Isaacman to become NASA administrator

With the vast majority of President Donald Trump’s cabinet members now approved by the US Senate, focus is turning to senior positions within the administration that are just below the cabinet level.

The administrator of NASA is among the most high-profile of these positions. Nearly four months ago Trump nominated private astronaut Jared Isaacman to become chief of the space agency, but he has yet to receive a hearing before the Senate Committee on Commerce, Science, and Transportation.

Almost immediately after his nomination, much of the space community fell in behind Isaacman, who has flown to space twice on private Crew Dragon missions, raised charitable funds, and is generally well-liked. Since then, Isaacman has worked to build support for his candidacy through conversations with people in the space community and officeholders.

However, publicly, not much has happened. This has raised questions within the space community about whether the nomination has stalled. Although some people have expressed concern about financial ties between Isaacman and SpaceX, according to multiple sources, the primary obstacle has been Ted Cruz, the Texas Republican who chairs the Senate committee.

Cruz is not happy that Isaacman has donated to Democrats in the past, and he is concerned that the private astronaut is more interested in Mars exploration than the Moon. Cruz also did not appreciate Elon Musk’s call to end the life of the International Space Station early. The station is operated by NASA’s field center, Johnson Space Center, in Houston, where Cruz lives.

Nomination on track

Nevertheless, despite the slower pace, people familiar with the nomination process say Isaacman’s candidacy remains on track. And recently, there have been some public announcements that support this notion.

In early March, the governors of several southern US states, including Florida and Texas, sent a letter to Cruz expressing “strong support” for the swift confirmation of Isaacman. A notable absence from this letter was the governor of Alabama, Kay Ivey, where NASA’s Marshall Space Flight Center is located. However, she also recently sent Cruz a letter praising Isaacman, calling him an “exceptional selection” to lead NASA. It is notable that the governors of all the US states with major human spaceflight activities have now lined up behind Isaacman.

Momentum seems to be building for Jared Isaacman to become NASA administrator Read More »

why-anthropic’s-claude-still-hasn’t-beaten-pokemon

Why Anthropic’s Claude still hasn’t beaten Pokémon


Weeks later, Sonnet’s “reasoning” model is struggling with a game designed for children.

A game Boy Color playing Pokémon Red surrounded by the tendrils of an AI, or maybe some funky glowing wires, what do AI tendrils look like anyways

Gotta subsume ’em all into the machine consciousness! Credit: Aurich Lawson

Gotta subsume ’em all into the machine consciousness! Credit: Aurich Lawson

In recent months, the AI industry’s biggest boosters have started converging on a public expectation that we’re on the verge of “artificial general intelligence” (AGI)—virtual agents that can match or surpass “human-level” understanding and performance on most cognitive tasks.

OpenAI is quietly seeding expectations for a “PhD-level” AI agent that could operate autonomously at the level of a “high-income knowledge worker” in the near future. Elon Musk says that “we’ll have AI smarter than any one human probably” by the end of 2025. Anthropic CEO Dario Amodei thinks it might take a bit longer but similarly says it’s plausible that AI will be “better than humans at almost everything” by the end of 2027.

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem.

Can Claude play Pokémon?

A thread: pic.twitter.com/K8SkNXCxYJ

— Anthropic (@AnthropicAI) February 25, 2025

Last month, Anthropic presented its “Claude Plays Pokémon” experiment as a waypoint on the road to that predicted AGI future. It’s a project the company said shows “glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning.” Anthropic made headlines by trumpeting how Claude 3.7 Sonnet’s “improved reasoning capabilities” let the company’s latest model make progress in the popular old-school Game Boy RPG in ways “that older models had little hope of achieving.”

While Claude models from just a year ago struggled even to leave the game’s opening area, Claude 3.7 Sonnet was able to make progress by collecting multiple in-game Gym Badges in a relatively small number of in-game actions. That breakthrough, Anthropic wrote, was because the “extended thinking” by Claude 3.7 Sonnet means the new model “plans ahead, remembers its objectives, and adapts when initial strategies fail” in a way that its predecessors didn’t. Those things, Anthropic brags, are “critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too.”

Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones.

Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones. Credit: Anthropic

But relative success over previous models is not the same as absolute success over the game in its entirety. In the weeks since Claude Plays Pokémon was first made public, thousands of Twitch viewers have watched Claude struggle to make consistent progress in the game. Despite long “thinking” pauses between each move—during which viewers can read printouts of the system’s simulated reasoning process—Claude frequently finds itself pointlessly revisiting completed towns, getting stuck in blind corners of the map for extended periods, or fruitlessly talking to the same unhelpful NPC over and over, to cite just a few examples of distinctly sub-human in-game performance.

Watching Claude continue to struggle at a game designed for children, it’s hard to imagine we’re witnessing the genesis of some sort of computer superintelligence. But even Claude’s current sub-human level of Pokémon performance could hold significant lessons for the quest toward generalized, human-level artificial intelligence.

Smart in different ways

In some sense, it’s impressive that Claude can play Pokémon with any facility at all. When developing AI systems that find dominant strategies in games like Go and Dota 2, engineers generally start their algorithms off with deep knowledge of a game’s rules and/or basic strategies, as well as a reward function to guide them toward better performance. For Claude Plays Pokémon, though, project developer and Anthropic employee David Hershey says he started with an unmodified, generalized Claude model that wasn’t specifically trained or tuned to play Pokémon games in any way.

“This is purely the various other things that [Claude] understands about the world being used to point at video games,” Hershey told Ars. “So it has a sense of a Pokémon. If you go to claude.ai and ask about Pokémon, it knows what Pokémon is based on what it’s read… If you ask, it’ll tell you there’s eight gym badges, it’ll tell you the first one is Brock… it knows the broad structure.”

A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon (click through to zoom in).

A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon (click through to zoom in). Credit: Anthropic / Excelidraw

In addition to directly monitoring certain key (emulated) Game Boy RAM addresses for game state information, Claude views and interprets the game’s visual output much like a human would. But despite recent advances in AI image processing, Hershey said Claude still struggles to interpret the low-resolution, pixelated world of a Game Boy screenshot as well as a human can. “Claude’s still not particularly good at understanding what’s on the screen at all,” he said. “You will see it attempt to walk into walls all the time.”

Hershey said he suspects Claude’s training data probably doesn’t contain many overly detailed text descriptions of “stuff that looks like a Game Boy screen.” This means that, somewhat surprisingly, if Claude were playing a game with “more realistic imagery, I think Claude would actually be able to see a lot better,” Hershey said.

“It’s one of those funny things about humans that we can squint at these eight-by-eight pixel blobs of people and say, ‘That’s a girl with blue hair,’” Hershey continued. “People, I think, have that ability to map from our real world to understand and sort of grok that… so I’m honestly kind of surprised that Claude’s as good as it is at being able to see there’s a person on the screen.”

Even with a perfect understanding of what it’s seeing on-screen, though, Hershey said Claude would still struggle with 2D navigation challenges that would be trivial for a human. “It’s pretty easy for me to understand that [an in-game] building is a building and that I can’t walk through a building,” Hershey said. “And that’s [something] that’s pretty challenging for Claude to understand… It’s funny because it’s just kind of smart in different ways, you know?”

A sample Pokémon screen with an overlay showing how Claude characterizes the game’s grid-based map.

A sample Pokémon screen with an overlay showing how Claude characterizes the game’s grid-based map. Credit: Anthrropic / X

Where Claude tends to perform better, Hershey said, is in the more text-based portions of the game. During an in-game battle, Claude will readily notice when the game tells it that an attack from an electric-type Pokémon is “not very effective” against a rock-type opponent, for instance. Claude will then squirrel that factoid away in a massive written knowledge base for future reference later in the run. Claude can also integrate multiple pieces of similar knowledge into pretty elegant battle strategies, even extending those strategies into long-term plans for catching and managing teams of multiple creatures for future battles.

Claude can even show surprising “intelligence” when Pokémon’s in-game text is intentionally misleading or incomplete. “It’s pretty funny that they tell you you need to go find Professor Oak next door and then he’s not there,” Hershey said of an early-game task. “As a 5-year-old, that was very confusing to me. But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn’t find [Oak], says, ‘I need to figure something out’… It’s sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too.”

A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle.

A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle. Credit: Claude Plays Pokemon / Twitch

These kinds of relative strengths and weaknesses when compared to “human-level” play reflect the overall state of AI research and capabilities in general, Hershey said. “I think it’s just a sort of universal thing about these models… We built the text side of it first, and the text side is definitely… more powerful. How these models can reason about images is getting better, but I think it’s a decent bit behind.”

Forget me not

Beyond issues parsing text and images, Hershey also acknowledged that Claude can have trouble “remembering” what it has already learned. The current model has a “context window” of 200,000 tokens, limiting the amount of relational information it can store in its “memory” at any one time. When the system’s ever-expanding knowledge base fills up this context window, Claude goes through an elaborate summarization process, condensing detailed notes on what it has seen, done, and learned so far into shorter text summaries that lose some of the fine-grained details.

This can mean that Claude “has a hard time keeping track of things for a very long time and really having a great sense of what it’s tried so far,” Hershey said. “You will definitely see it occasionally delete something that it shouldn’t have. Anything that’s not in your knowledge base or not in your summary is going to be gone, so you have to think about what you want to put there.”

A small window into the kind of “cleaning up my context” knowledge-base update necessitated by Claude’s limited “memory.”

A small window into the kind of “cleaning up my context” knowledge-base update necessitated by Claude’s limited “memory.” Credit: Claude Play Pokemon / Twitch

More than forgetting important history, though, Claude runs into bigger problems when it inadvertently inserts incorrect information into its knowledge base. Like a conspiracy theorist who builds an entire worldview from an inherently flawed premise, Claude can be incredibly slow to recognize when an error in its self-authored knowledge base is leading its Pokémon play astray.

“The things that are written down in the past, it sort of trusts pretty blindly,” Hershey said. “I have seen it become very convinced that it found the exit to [in-game location] Viridian Forest at some specific coordinates, and then it spends hours and hours exploring a little small square around those coordinates that are wrong instead of doing anything else. It takes a very long time for it to decide that that was a ‘fail.’”

Still, Hershey said Claude 3.7 Sonnet is much better than earlier models at eventually “questioning its assumptions, trying new strategies, and keeping track over long horizons of various strategies to [see] whether they work or not.” While the new model will still “struggle for really long periods of time” retrying the same thing over and over, it will ultimately tend to “get a sense of what’s going on and what it’s tried before, and it stumbles a lot of times into actual progress from that,” Hershey said.

“We’re getting pretty close…”

One of the most interesting things about observing Claude Plays Pokémon across multiple iterations and restarts, Hershey said, is seeing how the system’s progress and strategy can vary quite a bit between runs. Sometimes Claude will show it’s “capable of actually building a pretty coherent strategy” by “keeping detailed notes about the different paths to try,” for instance, he said. But “most of the time it doesn’t… most of the time, it wanders into the wall because it’s confident it sees the exit.”

Where previous models wandered aimlessly or got stuck in loops, Claude 3.7 Sonnet plans ahead, remembers its objectives, and adapts when initial strategies fail.

Critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too. pic.twitter.com/scvISp14XG

— Anthropic (@AnthropicAI) February 25, 2025

One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, saying that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon. Credit: Claude Plays Pokemon / Twitch

Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

But Hershey is still impressed at the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”

Photo of Kyle Orland

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

Why Anthropic’s Claude still hasn’t beaten Pokémon Read More »

boeing-will-build-the-us-air-force’s-next-air-superiority-fighter

Boeing will build the US Air Force’s next air superiority fighter

Today, it emerged that Boeing has won its bid to supply the United States Air Force with its next jet fighter. As with the last fighter aircraft design procurement in recent times, the Department of Defense was faced with a choice between awarding Boeing or Lockheed the contract for the Next Generation Air Dominance program, which will replace the Lockheed F-22 Raptor sometime in the 2030s.

Very little is known about the NGAD, which the Air Force actually refers to as a “family of systems,” as its goal of owning the skies requires more than just a fancy airplane. The program has been underway for a decade, and a prototype designed by the Air Force first flew in 2020, breaking records in the process (although what records and by how much was not disclosed).

Last summer, the Pentagon paused the program as it reevaluated whether the NGAD would still meet its needs and whether it could afford to pay for the plane, as well as a new bomber, a new early warning aircraft, a new trainer, and a new ICBM, all at the same time. But in late December, it concluded that, yes, a crewed replacement for the F-22 was in the national interest.

While no images have ever been made public, then-Air Force Secretary Frank Kendall said in 2024 that “it’s an F-22 replacement. You can make some inferences from that.”

The decision is good news for Boeing’s plant in St. Louis, which is scheduled to end production of the F/A-18 Super Hornet in 2027. Boeing lost its last bid to build a fighter jet when its X-32 lost out to Lockheed’s X-35 in the Joint Strike Fighter competition in 2001.

A separate effort to award a contract for the NGAD’s engine, called the Next Generation Adaptive Propulsion, is underway between Pratt & Whitney and GE Aerospace, with an additional program aiming to develop “drone wingmen” also in the works between General Atomics and Anduril.

Boeing will build the US Air Force’s next air superiority fighter Read More »

after-“glitter-bomb,”-cops-arrested-former-cop-who-criticized-current-cops-online

After “glitter bomb,” cops arrested former cop who criticized current cops online

The police claimed that “the fraudulent Facebook pages posted comments on Village of Orland Park social media sites while also soliciting friend requests from Orland Park Police employees and other citizens, portraying the likeness of Deputy Chief of Police Brian West”—and said that this was both Disorderly Conduct and False Personation, both misdemeanors.

West got permission from his boss to launch a criminal investigation, which soon turned into search warrants that surfaced a name: retired Orland Park Sergeant Ken Kovac, who had left the department in 2019 after two decades of service. Kovac was charged, and he surrendered himself at the Orland Park Police Department on April 7, 2024.

The police then issued their press release, letting their community know that West had witnessed “demeaning comments in reference to his supervisory position within the department from Kovac’s posts on social media”—which doesn’t sound like any sort of crime. They also wanted to let concerned citizens know that West “epitomizes the principles of public service” and that “Deputy Chief West’s apprehensions were treated with the utmost seriousness and underwent a thorough investigation.”

Okay.

Despite the “utmost seriousness” of this Very Serious Investigation, a judge wasn’t having any of it. In January 2025, Cook County Judge Mohammad Ahmad threw out both charges against Kovac.

Kovac, of course, was thrilled. His lawyer told a local Patch reporter, “These charges never should have been brought. Ken Kovac made a Facebook account that poked fun at the Deputy Chief of the Orland Park Police Department. The Deputy Chief didn’t like it and tried to use the criminal legal system to get even.”

Orland Park was not backing down, however, blaming prosecutors for the loss. “Despite compelling evidence in the case, the Cook County State’s Attorney’s Office was unable to secure a prosecution, failing in its responsibility to protect Deputy Chief West as a victim of these malicious acts,” the village manager told Patch. “The Village of Orland Park is deeply disappointed by this outcome and stands unwavering in its support of former Deputy Chief West.”

The drama took its most recent, entirely predictable, turn this week when Kovac sued the officials who had arrested him. He told the Chicago Sun-Times that he had been embarrassed about being fingerprinted and processed “at the police department that I was previously employed at by people that I used to work with and for.”

Orland Park told the paper that it “stands by its actions and those of its employees and remains confident that they were appropriate and fully compliant with the law.”

After “glitter bomb,” cops arrested former cop who criticized current cops online Read More »

mom-of-child-dead-from-measles:-“don’t-do-the-shots,”-my-other-4-kids-were-fine

Mom of child dead from measles: “Don’t do the shots,” my other 4 kids were fine

Cod liver oil contains high levels of vitamin A, which is sometimes administered to measles patients under a physician’s supervision. But the supplement is mostly a supportive treatment in children with vitamin deficiencies, and taking too much can cause toxicity. Nevertheless, Kennedy has touted the vitamin and falsely claimed that good nutrition protects against the virus, much to the dismay of pediatricians.

“They had a really good, quick recovery,” the mother said of her other four children, attributing their recovery to the unproven treatments.

Tragic misinformation

Most children do recover from measles, regardless of whether they’re given cod liver oil. The fatality rate of measles is nearly 1 to 3 in 1,000 children, who die with respiratory (e.g., pneumonia) or neurological complications from the virus, according to the Centers for Disease Control and Prevention.

Tommey noted that the sibling who died didn’t get the alternative treatments, leading the audience to believe that this could have contributed to her death. She also questioned what was written on the death certificate, noting that the girl’s pneumonia was from a secondary bacterial infection, not the virus directly, a clear effort to falsely suggest measles was not the cause of death and downplay the dangers of the disease. The parents said they hadn’t received the death certificate yet.

Tommey then turned to the MMR vaccine, asking if the mother still felt that it was a dangerous vaccine after her daughter’s death from the disease, prefacing the question by claiming to have seen a lot of “injury” from the vaccine. “Do you still feel the same way about the MMR vaccine versus measles?” she asked.

“Yes, absolutely; we would absolutely not take the MMR. The measles wasn’t that bad, and they got over it pretty quickly,” the mother replied, speaking again of her four living children.

“So,” Tommey continued, “when you see the fearmongering in the press, which is what we want to stop, that is why we want to get the truth out, what do you say to the parents who are rushing out, panicking, to get the MMR for their 6-month-old baby because they think that that child is going to die of measles because of what happened to your daughter?”

Mom of child dead from measles: “Don’t do the shots,” my other 4 kids were fine Read More »

ai-#108:-straight-line-on-a-graph

AI #108: Straight Line on a Graph

The x-axis of the graph is time. The y-axis of the graph is the log of ‘how long a software engineering task can AIs reliably succeed at doing.’

The straight line says the answer doubles roughly every 7 months. Yikes.

Upcoming: The comment period on America’s AI strategy is over, so we can finish up by looking at Google’s and MIRI’s and IFP’s proposals, as well as Hollywood’s response to OpenAI and Google’s demands for unlimited uncompensated fair use exceptions from copyright during model training. I’m going to pull that out into its own post so it can be more easily referenced.

There’s also a draft report on frontier model risks from California and it’s… good?

Also upcoming: My take on OpenAI’s new future good-at-writing model.

  1. Language Models Offer Mundane Utility. I want to, is there an app for that?

  2. Language Models Don’t Offer Mundane Utility. Agents not quite ready yet.

  3. Huh, Upgrades. Anthropic efficiency gains, Google silently adds features.

  4. Seeking Deeply. The PRC gives DeepSeek more attention. That cuts both ways.

  5. Fun With Media Generation. Fun with Gemini 2.0 Image Generation.

  6. Gemma Goals. Hard to know exactly how good it really is.

  7. On Your Marks. Tic-Tac-Toe bench is only now getting properly saturated.

  8. Choose Your Fighter. o3-mini disappoints on Epoch retest on frontier math.

  9. Deepfaketown and Botpocalypse Soon. Don’t yet use the bot, also don’t be the bot.

  10. Copyright Confrontation. Removing watermarks has been a thing for a while.

  11. Get Involved. Anthropic, SaferAI, OpenPhil.

  12. In Other AI News. Sentience leaves everyone confused.

  13. Straight Lines on Graphs. METR finds reliable SWE task length doubling rapidly.

  14. Quiet Speculations. Various versions of takeoff.

  15. California Issues Reasonable Report. I did not expect that.

  16. The Quest for Sane Regulations. Mostly we’re trying to avoid steps backwards.

  17. The Week in Audio. Esban Kran, Stephanie Zhan.

  18. Rhetorical Innovation. Things are not improving.

  19. We’re Not So Different You and I. An actually really cool alignment idea.

  20. Anthropic Warns ASL-3 Approaches. Danger coming. We need better evaluations.

  21. Aligning a Smarter Than Human Intelligence is Difficult. It’s all happening.

  22. People Are Worried About AI Killing Everyone. Killing all other AIs, too.

  23. The Lighter Side. Not exactly next level prompting.

Arnold Kling spends 30 minutes trying to figure out how to leave a WhatsApp group, requests an AI app to do things like things like this via the ‘I want to’ app, except that app exists and it’s called Claude (or ChatGPT) and this should have taken 1 minute tops? To be fair, Arnold then extends the idea to tasks where ‘actually click the buttons’ is more annoying and it makes more sense to have an agent do it for you rather than telling the human how to do it. That will take a bit longer, but not that much longer.

If you want your AI to interact with you in interesting ways in the Janus sense, you want to keep your interaction full of interesting things and stay far away from standard ‘assistant’ interactions, which have a very strong pull on what follows. If things go south, usually it’s better to start over or redo. With high skill you can sometimes do better, but it’s tough. Of course, if you don’t want that, carry on, but the principle of ‘if things go south don’t try to save it’ still largely applies, because you don’t want to extrapolate from the assistant messing up even on mundane tasks.

It’s a Wikipedia race between models! Start is Norwegian Sea, finish is Karaoke. GPT-4.5 clicks around for 47 pages before time runs out. CUA (used in OpenAI’s operator) clicks around, accidentally minimizes Firefox and can’t recover. o1 accidentally restarts the game, then sees a link to the Karaoke page there, declares victory and doesn’t mention that it cheated. Sonnet 3.7 starts out strong but then cheats via URL hacking, which works, and it declares victory. It’s not obvious to what extent it knew that broke the rules. They all this all a draw, which seems fair.

Kelsey Piper gets her hands on Manus.

Kelsey Piper: I got a Manus access code! Short review: We’re close to usable AI browser tools, but we’re not there yet. They’re going to completely change how we shop, and my best guess is they’ll do it next year, but they won’t do it at their current quality baseline.

The longer review is fun, and boils down to this type of agent being tantalizingly almost there, but with enough issues that it isn’t quite a net gain to use it. Below a certain threshold of reliability you’re better off doing it yourself.

Which will definitely change. My brief experience with Operator was similar. My guess is that it is indeed already a net win if you invest in getting good at using it, in a subset of tasks including some forms of shopping, but I haven’t felt motivated to pay those up front learning and data entry costs.

Anthropic updates their API to include prompt caching, simpler cache management, token-efficient tool use (average 14% reduction), and a text_editor tool.

OpenAI’s o1 and o3-mini now offer Python-powered data analysis in ChatGPT.

List of Gemini’s March 2025 upgrades.

The problem with your Google searches being context for Gemini 2.0 Thinking is that you have to still be doing Google searches.

Google AI Studio lets you paste in YouTube video links directly as context. That seems very convenient.

Baidu gives us Ernie 4.5 and x1, with free access, with claimed plans for open source ‘within a few months.’ Benchmarks look solid, and they claim x1 is ‘on par with r1’ for performance at only half the price. All things are possible, but given the track record chances are very high this is not as good as they claim it to be.

NotebookLM gets a few upgrades, especially moving to Gemini 2.0 Thinking, and in the replies Josh drops some hints on where things are headed.

Josh Woodward: Next batch of NotebookLM updates rolling out:

Even smarter answers, powered by Gemini 2.0 Thinking

See citations in your notes, not just in the Q&A (top request)

Customize the sources used for making your podcasts and notes (top request)

Much smoother scrolling for Q&A

Enjoy!

You’re tried the Interactive Mode, right? That lets you call into the podcast and have a conversation.

On the voice reading out things, that could be interesting. We haven’t brought any audio stuff to the Chat / Q&A section yet…

We’re testing iOS and Android versions on the team right now, need to add some more features and squash some bugs, then we’ll ship it out!

Our first iteration of length control is under development now!

NotebookLM also rolls out interactive Mindmaps, which will look like this:

I’m very curious to see if these end up being useful, and if so who else copies them.

This definitely feels like a thing worth trying again. Now if I can automate adding all the data sources…

Let’s say you are the PRC. You witness DeepSeek leverage its cracked engineering culture to get a lot of performance out of remarkably little compute. They then publish the whole thing, including how they did it. A remarkable accomplishment, which the world then blows far out of proportion to what they did.

What would you do next? Double down on the open, exploratory free wielding ethos that brought them to this point, and pledge to help them take it all the way to AGI, as they intend?

They seem to have had other ideas.

Matt Sheehan: Fantastic reporting on how 🇨🇳 gov is getting more hands-on w/ DeepSeek by @JuroOsawa & @QianerLiu

-employees told not to travel, handing in passports

-investors must be screened by provincial government

-gov telling headhunters not to approach employees

Can you imagine if the United States did this to OpenAI?

It is remarkable how often when we are told we cannot do [X] because we will ‘lose to China’ if we do and they would do it, we find out China is already doing lots of [X].

Before r1, DeepSeek was the clear place to go as a cracked Chinese software engineer. Now, once you join, the PRC is reportedly telling you to give up your passport, watching your every move and telling headhunters to stay away. No thanks.

Notice that China is telling these folks to surrender their passports, at the same time that America is refusing to let in much of China’s software engineering and other talent. Why do you think PRC is making this decision?

Along similar lines, perhaps motivated by PRC and perhaps not, here is a report that DeepSeek is worried about people stealing their secrets before they have the chance to give those secrets away.

Daniel Eth: Who wants to tell them?

Peter Wildeford: “DeepSeek’s leaders have been worried about the possibility of information leaking”

“told employees not to discuss their work with outsiders”

Do DeepSeek leaders and the Chinese government know that DeepSeek has been open sourcing their ideas?

That isn’t inherently a crazy thing to worry about, even if mainly you are trying to get credit for things, and be first to publish them. Then again, how confident are you that DeepSeek will publish them, at this point? Going forward it seems likely their willingness to give away the secret sauce will steadily decline, especially in terms of their methods, now that PRC knows what that lab is capable of doing.

People are having a lot of fun with Gemini 2.0 Flash’s image generation, when it doesn’t flag your request for safety reasons.

Gemini Flash’s native image generation can do consistent gif animations?

Here are some fun images:

Or:

Riley Goodside: POV: You’re already late for work and you haven’t even left home yet. You have no excuse. You snap a pic of today’s fit and open Gemini 2.0 Flash Experimental.

Meanwhile, Google’s refusals be refusing…

Also, did you know you have it remove a watermark from an image, by explicitly saying ‘remove the watermark from this image’? Not that you couldn’t do this anyway, but that doesn’t stop them from refusing many other things.

What do we make of Gemma 3’s absurdly strong performance in Arena? I continue to view this as about half ‘Gemma 3 is probably really good for its size’ and half ‘Arena is getting less and less meaningful.’

Teortaxes thinks Gemma 3 is best in class, but will be tough to improve.

Teortaxes: the sad feeling I get from Gemma models, which chills all excitement, is that they’re «already as good as can be». It’s professionally cooked all around. They can be tuned a bit but won’t exceed the bracket Google intends for them – 1.5 generations behind the default Flash.

It’s good. Of course Google continues to not undermine their business and ship primarily conversational open models, but it’s genuinely the strongest in its weight class I think. Seams only show on legitimately hard tasks.

I notice the ‘Rs in strawberry’ test has moved on to gaslighting the model after a correct answer rather than the model getting it wrong. Which is a real weakness of such models, that you can bully and gaslight them, but how about not doing that.

Mark Schroder: Gemma 27b seems just a little bit better than mistral small 3, but the smaller versions seem GREAT for their size, even the 1b impresses, probably best 1b model atm (tried on iPhone)

Christian Schoppe is a fan of the 4B version for its size.

Box puts Gemma 3 to their test, saying it is a substantial improvement over Gemma 2 and better than Gemini 1.5 Flash on data extraction, although still clearly behind Gemini 2.0 Flash.

This does not offer the direct comparison we want most, which is to v3 and r1, but if you have two points (e.g. Gemma 2 and Gemini 2.0 Flash) then you can draw a line. Eyeballing this, they’re essentially saying Gemma 3 is 80%+ of the way from Gemma 2 to Gemini 2.0 Flash, while being fully open and extremely cheap.

Gemma 3 is an improvement over Gemma 2 on WeirdML but still not so great and nothing like what the Arena scores would suggest.

Campbell reports frustration with the fine tuning packages.

A rival released this week is Mistral Small 3.1, when you see a company pushing a graph that’s trying this hard you should be deeply skeptical:

They do back this up with claims on other benchmarks, but I don’t have Mistral in my set of labs I trust not to game the benchmarks. Priors say this is no Gemma 3 until proven otherwise.

We have an update to the fun little Tic-Tac-Toe Bench, with Sonnet 3.7 Thinking as the new champion, making 100% optimal and valid moves at a cost of 20 cents a game, the first model to get to 100%. They expect o3-mini-high to also max out but don’t want to spend $50 to check.

o3-mini scores only 11% on Frontier Math when Epoch tests it, versus 32% when OpenAI tested it, and OpenAI’s test had suspiciously high scores on the hardest sections of the test relative to the easy sections.

Peter Wildeford via The Information shares some info about Manus. Anthropic charges about $2 per task, whereas Manus isn’t yet charging money. And in hindsight the reason why Manus is not targeted at China is obvious, any agent using Claude has to access stuff beyond the Great Firewall. Whoops!

The periodic question, where are all the new AI-enabled sophisticated scams? No one could point to any concrete example that isn’t both old and well-known at this point. There is clearly a rise in the amount of slop and phishing at the low end, my wife reports this happening recently at her business, but none of it is trying to be smart, and it isn’t using deepfake capabilities or highly personalized messages or similar vectors. Perhaps this is harder than we thought, or the people who fall for scams are already mostly going to fall for simple photoshop, and this is like where we introduce intentional errors in scam emails so AI making them better would make them worse?

Ethan Mollick: I regret to announce that the meme Turing Test has been passed.

LLMs produce funnier memes than the average human, as judged by humans. Humans working with AI get no boost (a finding that is coming up often in AI-creativity work) The best human memers still beat AI, however.

[Paper here.]

Many of you are realizing that most people have terrible taste in memes.

In their examples of top memes, I notice that I thought the human ones were much, much better than the AI ones. They ‘felt right’ and resonated, the AI ones didn’t.

An important fact about memes is that, unless you are doing them inside a narrow context to comment on that particular context, only the long tail matters. Almost all ‘generalized’ meme are terrible. But yes, in general, ‘quick, human, now be creative!’ does not go so well, and AIs are able to on average do better already.

Another parallel: Frontier AIs are almost certainly better at improv than most humans, but they are still almost certainly worse than most improv performances, because the top humans do almost all of the improv.

No, friend, don’t!

Richard Ngo: Talked to a friend today who decided that if RLHF works on reasoning models, it should work on him too.

So he got a mechanical clicker to track whenever he has an unproductive chain of thought, and uses the count as one of his daily KPIs.

Fun fact: the count is apparently anticorrelated with his productivity. On unproductive days it’s about 40, but on productive days it’s double that, apparently because he catches the unproductive thoughts faster.

First off, as one comment responds, this is a form of The Most Forbidden Technique. As in, you are penalizing yourself for consciously having Wrong Thoughts, which will teach your brain to avoid consciously being aware of Wrong Thoughts. The dance of trying to know what others are thinking, and people twisting their thinking, words and actions to prevent this, is as old as humans are.

But that’s not my main worry here. My main worry is that when you penalize ‘unproductive thoughts’ the main thing you are penalizing is thoughts. This is Asymmetric Justice on steroids, your brain learns not to think at all, or not to think risky or interesting thoughts only ‘safe’ thoughts.

Of course the days in which there are more ‘unproductive thoughts’ turn out to be more productive days. Those are the days in which you are thinking, and having interesting thoughts, and some of them will be good. Whereas on my least productive days, I’m watching television or in a daze or whatever, and not thinking much at all.

Oh yeah, there’s that, but I think levels of friction matter a lot here.

Bearly AI: Google Gemini removing watermarks from images with a line of text is pretty nuts. Can’t imagine that feature staying for long.

Louis Anslow: For 15 YEARS you could remove watermark from images using AI. No one cared.

Pessimists Archive: In 2010 Adobe introduced ‘content aware fill’ – an AI powered ‘in painting’ feature. Watermark removal was a concern:

“many pro photographers have expressed concern that Content-Aware Fill is potentially a magical watermark killer: that the abilities that C-A-F may offer to the unscrupulous user in terms of watermark eradication are a serious threat.”

As in, it is one thing to have an awkward way to remove watermarks. It is another to have an easy, or even one-click or no-click way to do it. Salience of the opportunity matters as well, as does the amount of AI images for which there are marks to remove.

Safer AI is hiring a research engineer.

Anthropic is hiring someone to build Policy Demos, as in creating compelling product demonstrations for policymakers, government officials and policy influencers. Show, don’t tell. This seems like a very good idea for the right person. Salary is $260k-$285k.

There’s essentially limitless open rolls at Anthropic across departments, including ‘engineer, honestly.’

OpenPhil call for proposals on improving capability evaluations, note the ambiguity on what ways this ends up differentially helping.

William Fedus leaves OpenAI to instead work on AI for science in partnership with OpenAI.

David Pfau: Ok, if a literal VP at OpenAI is quitting to do AI-for-science work on physics and materials, maybe I have not made the worst career decisions after all.

Entropium: It’s a great and honorable career choice. Of course it helps to already be set for life.

David Pfau: Yeah, that’s the key step I skipped out on.

AI for science is great, the question is what this potentially says about opportunity costs, and ability to do good inside OpenAI.

Claims about what makes a good automated evaluator. In particular, that it requires continuous human customization and observation, or it will mostly add noise. To which I would add, it could easily be far worse than noise.

HuggingFace plans on remotely training a 70B+ size model in March or April. I am not as worried as Jack Clark is that this would totally rewrite our available AI policy options, especially if the results are as mid and inefficient as one would expect, as massive amounts of compute still have to come from somewhere and they are still using H100s. But yes, it does complicate matters.

Do people think AIs are sentient? People’s opinions here seem odd, in particular that 50% of people who think an AI could ever be sentient think one is now, and that number didn’t change in two years, and that gets even weirder if you include the ‘not sure’ category. What?

Meanwhile, only 53% of people are confident ChatGPT isn’t sentient. People are very confused, and almost half of them have noticed this. The rest of the thread has additional odd survey results, including this on when people expect various levels of AI, which shows how incoherent and contradictory people are – they expect superintelligence before human-level AI, what questions are they answering here?

Also note the difference between this survey which has about 8% for ‘Sentient AI never happens,’ versus the first survey where 24% think Sentient AI is impossible.

Paper from Kendrea Beers and Helen Toner describes a method for Enabling External Scrutiny of AI Systems with Privacy-Enhancing Techniques, and there are two case studies using the techniques. Work is ongoing.

What would you get if you charted ‘model release date’ against ‘length of coding task it can do on its own before crashing and burning’?

Do note that this is only coding tasks, and does not include computer-use or robotics.

Miles Brundage: This is one of the most interesting analyses of AI progress in a while IMO. Check out at least the METR thread here, if not the blog post + paper.

METR: This metric – the 50% task completion time horizon – gives us a way to track progress in model autonomy over time.

Plotting the historical trend of 50% time horizons across frontier AI systems shows exponential growth.

Robin Hanson: So,~8 years til they can do year-long projects.

Elizabeth Barnes: Also the 10% horizon is maybe something like 16x longer than the 50% horizon – implying they’ll be able to do some non-trivial fraction of decade-plus projects.

Elizabeth Barnes also has a thread on the story of this graph. Her interpretation is that right now AI performs much better on benchmarks than in practice due to inability to sustain a project, but that as agents get better this will change, and within 5 years AI will reliably be doing any software or research engineering task that could be done in days and a lot of those that would take far longer.

Garrison Lovely has a summary thread and a full article on it in Nature.

If you consider this a baseline scenario it gets really out of hand rather quickly.

Peter Wildeford: Insane trend

If we’re currently at 1hr tasks and double every 7 months, we’d get to…

– day-long tasks within 2027

– month-long tasks within 2029

– year-long tasks within 2031

Could AGI really heat up like this? 🔥 Clearest evidence we have yet.

I do think there’s room for some skepticism:

– We don’t know if this trend will hold up

– We also don’t know if the tasks are representative of everything AGI

– Reliability matters, and agents still struggle with even simple tasks reliably

Also task-type could matter. This is heavily weighted towards programming, which is easily measured + verified + improved. AI might struggle to do shorter but softer tasks.

For example, AI today can do some 1hr programming tasks but cannot do 1hr powerpoint or therapy tasks.

Dwarkesh Patel: I’m not convinced – outside of coding tasks (think video editing, playing a brand new video game, coordinating logistics for a happy hour), AIs don’t seem able to act as coherent agents for even short sprints.

But if I’m wrong, and this trend line is more general, then this is a very useful framing.

If the length of time over which AI agents can act coherently is increasing exponentially, then it’s reasonable to expect super discontinuous economic impacts.

Those are the skeptics. Then there are those who think we’re going to beat the trend, at least when speaking of coding tasks in particular.

Miles Brundage: First, I think that the long-term trend-line probably underestimates current and future progress, primarily because of test-time compute.

They discuss this a bit, but I’m just underscoring it.

The 2024-2025 extrapolation is prob. closest, though things could go faster.

Second, I don’t think the footnote re: there being a historical correlation between code + other evals is compelling. I do expect rapid progress in other areas but not quite so rapid as code and math + not based on this.

I’d take this as being re: code, not AI progress overall.

Third, I am not sold on the month focus – @RichardMCNgo’s t-AGI post is a useful framing + inspiration but hardly worked-out enough to rely on much.

For some purposes (e.g. multi-agent system architectures),

Fourth, all those caveats aside, there is still a lot of value here.

It seems like for the bread and butter of the paper (code), vs. wider generalization, the results are solid, and I am excited to see this method spread to more evals/domains/test-time compute conditions etc.

Fifth, for the sake of specificity re: the test-time compute point, I predict that GPT-5 with high test-time compute settings (e.g. scaffolding/COT lengths etc. equivalent to the engineer market rates mentioned here) will be above the trend-line.

Daniel Eth: I agree with @DKokotajlo67142 that this research is the single best piece of evidence we have regarding AGI timelines:

Daniel Kokotajlo: This is probably the most important single piece of evidence about AGI timelines right now. Well done! I think the trend should be superexponential, e.g. each doubling takes 10% less calendar time on average. @eli_lifland and I did some calculations yesterday suggesting that this would get to AGI in 2028. Will do more serious investigation soon.

My belief in the superexponential is for theoretical reasons, it is only very slightly due to the uptick at the end of the trend, and is for reasons explained here.

I do think we are starting to see agents in non-coding realms that (for now unreliably) stay coherent for more than short sprints. I presume that being able to stay coherent on long coding tasks must imply the ability, with proper scaffolding and prompting, to do so on other tasks as well. How could it not?

Demis Hassabis predicts AI that can match humans at any task will be here in 5-10 years. That is slower than many at the labs expect, but as usual please pause to recognize that 5-10 years is mind-bogglingly fast as a time frame until AI can ‘match humans at any task,’ have you considered the implications of that? Whereas now noted highly vocal skeptics like Gary Marcus treat this as if it means it’s all hype. It means quite the opposite, this happening in 5-10 years would be the most important event in human history.

Many are curious about the humans behind creative works and want to connect to other humans. Will they also be curious about the AIs behind creative works and want to connect to AIs? Without that, would AI creative writing fail? Will we have a new job be ‘human face of AI writing’ as a kind of living pen name? My guess is that this will prove to be a relatively minor motivation in most areas. It is likely more important in others, such as comedy or music, but even there seems overcomable.

For the people in the back who didn’t know, Will McAskill, Tom Davidson and Rose Hadshar write ‘Three Types of Intelligence Explosion,’ meaning that better AI can recursively self-improve via software, chip tech, chip production or any combination of those three. I agree with Ryan’s comment that ‘make whole economy bigger’ seems more likely than acting on only chips directly.

I know, I am as surprised as you are.

When Newsom vetoed SB 1047, he established a Policy Working Group on AI Frontier Models. Given it was headed by Fei-Fei Li, I did not expect much, although with Brundage, Bengio and Toner reviewing I had hopes it wouldn’t be too bad.

It turns out it’s… actually pretty good, by all accounts?

And indeed, it is broadly compatible with the logic behind most of SB 1047.

One great feature is that it actually focuses explicitly and exclusively on frontier model risks, not being distracted by the standard shiny things like job losses. They are very up front about this distinction, and it is highly refreshing to see this move away from the everything bagel towards focus.

A draft of the report has now been issued and you can submit feedback, which is due on April 8, 2025.

Here are their key principles.

1. Consistent with available evidence and sound principles of policy analysis, targeted interventions to support effective AI governance should balance the technology’s benefits and material risks.

Frontier AI breakthroughs from California could yield transformative benefits in fields including but not limited to agriculture, biology, education, finance, medicine and public health, and transportation. Rapidly accelerating science and technological innovation will require foresight for policymakers to imagine how societies can optimize these benefits. Without proper safeguards, however, powerful AI could induce severe and, in some cases, potentially irreversible harms.

In a sane world this would be taken for granted. In ours, you love to see it – acknowledgment that we need to use foresight, and that the harms matter, need to be considered in advance, and are potentially wide reaching and irreversible.

It doesn’t say ‘existential,’ ‘extinction’ or even ‘catastrophic’ per se, presumably because certain people strongly want to avoid such language, but I’ll take it.

2. AI policymaking grounded in empirical research and sound policy analysis techniques should rigorously leverage a broad spectrum of evidence.

Evidence-based policymaking incorporates not only observed harms but also prediction and analysis grounded in technical methods and historical experience, leveraging case comparisons, modeling, simulations, and adversarial testing.

Excellent. Again, statements that should go without saying and be somewhat disappointing to not go further, but which in our 2025 are very much appreciated. This still has a tone of ‘leave your stuff at the door unless you can get sufficiently concrete’ but at least lets us have a discussion.

3. To build flexible and robust policy frameworks, early design choices are critical because they shape future technological and policy trajectories.

The early technological design and governance choices of policymakers can create enduring path dependencies that shape the evolution of critical systems, as case studies from the foundation of the internet highlight.

Indeed.

4. Policymakers can align incentives to simultaneously protect consumers, leverage industry expertise, and recognize leading safety practices.

Holistic transparency begins with requirements on industry to publish information about their systems. Case studies from consumer products and the energy industry reveal the upside of an approach that builds on industry expertise while also establishing robust mechanisms to independently verify safety claims and risk assessments.

Yes, and I would go further and say they can do this while also aiding competitiveness.

5. Greater transparency, given current information deficits, can advance accountability, competition, and public trust.

Research demonstrates that the AI industry has not yet coalesced around norms for transparency in relation to foundation models—there is systemic opacity in key areas. Policy that engenders transparency can enable more informed decision-making for consumers, the public, and future policymakers.

Again, yes, very much so.

6. Whistleblower protections, third-party evaluations, and public-facing information sharing are key instruments to increase transparency.

Carefully tailored policies can enhance transparency on key areas with current information deficits, such as data acquisition, safety and security practices, pre-deployment testing, and downstream impacts. Clear whistleblower protections and safe harbors for third-party evaluators can enable increased transparency above and beyond information disclosed by foundation model developers.

There is haggling over price but pretty much everyone is down with this.

7. Adverse-event reporting systems enable monitoring of the post-deployment impacts of AI and commensurate modernization of existing regulatory or enforcement authorities.

Even perfectly designed safety policies cannot prevent 100% of substantial, adverse outcomes. As foundation models are widely adopted, understanding harms that arise in practice is increasingly important. Existing regulatory authorities could offer clear pathways to address risks uncovered by an adverse-event reporting system, which may not necessarily require AI-specific regulatory authority. In addition, reviewing existing regulatory authorities can help identify regulatory gaps where new authority may be required.

Another case where among good faith actors there is only haggling over price, and whether 72 hours as a deadline is too short, too long or the right amount of time.

8. Thresholds for policy interventions, such as for disclosure requirements, third-party assessment, or adverse event reporting, should be designed to align with sound governance goals.

Scoping which entities are covered by a policy often involves setting thresholds, such as computational costs measured in FLOP or downstream impact measured in users. Thresholds are often imperfect but necessary tools to implement policy. A clear articulation of the desired policy outcomes can guide the design of appropriate thresholds. Given the pace of technological and societal change, policymakers should ensure that mechanisms are in place to adapt thresholds over time—not only by updating specific threshold values but also by revising or replacing metrics if needed.

Again that’s the part everyone should be able to agree upon.

If only the debate about SB 1047 could have involved us being able to agree on the kind of sanity displayed here, and then talking price and implementation details. Instead things went rather south, rather quickly. Hopefully it is not too late.

So my initial reaction, after reading that plus some quick AI summaries, was that they had succeeded at Doing Committee Report without inflicting further damage, which already beats expectations, but weren’t saying much and I could stop there. Then I got a bunch of people saying that the details were actually remarkably good, too, and said things that were not as obvious if you didn’t give up and kept on digging.

Here are one source’s choices for noteworthy quotes.

“There is currently a window to advance evidence based policy discussions and provide clarity to companies driving AI innovation in California. But if we are to learn the right lessons from internet governance, the opportunity to establish effective AI governance frameworks may not remain open indefinitely. If those who speculate about the most extreme risks are right—and we are uncertain if they will be—then the stakes and costs for inaction on frontier AI at this current moment are extremely high.”

“Transparency into the risks associated with foundation models, what mitigations are implemented to address risks, and how the two interrelate is the foundation for understanding how model developers manage risk.”

“Transparency into pre-deployment assessments of capabilities and risks, spanning both developer-conducted and externally-conducted evaluations, is vital given that these evaluations are early indicators of how models may affect society and may be interpreted (potentially undesirably) as safety assurances.”

“Developing robust policy incentives ensures that developers create and follow through on stated safety practices, such as those articulated in safety frameworks already published by many leading companies.”

“An information-rich environment on safety practices would protect developers from safety-related litigation in cases where their information is made publicly available and, as the next subsection describes, independently verified. Those with suspect safety practices would be most vulnerable to litigation; companies complying with robust safety practices would be able to reduce their exposure to lawsuits.”

“In drawing on historical examples of the obfuscation by oil and tobacco companies of critical data during important policy windows, we do not intend to suggest AI development follows the same trajectory or incentives as past industries that have shaped major public debates over societal impact, or that the motives of frontier AI companies match those of the case study actors. Many AI companies in the United States have noted the need for transparency for this world-changing technology. Many have published safety frameworks articulating thresholds that, if passed, will trigger concrete safety-focused actions. Only time will bear out whether these public declarations are matched by a level of actual accountability that allows society writ large to avoid the worst outcomes of this emerging technology.”

“some risks have unclear but growing evidence, which is tied to increasing capabilities: large-scale labor market impacts, AI-enabled hacking or biological attacks, and loss of control.”

“These examples collectively demonstrate a concerning pattern: Sophisticated AI systems, when sufficiently capable, may develop deceptive behaviors to achieve their objectives, including circumventing oversight mechanisms designed to ensure their safety”

“The difference between seat belts and AI are self-evident. The pace of change of AI is many multiples that of cars—while a decades-long debate about seat belts may have been acceptable, society certainly has just a fraction of the time to achieve regulatory clarity on AI.”

Scott Weiner was positive on the report, saying it strikes a thoughtful balance between the need for safetugards and the need to support innovation. Presumably he would respond similarly so long as it wasn’t egregious, but it’s still good news.

Peter Wildeford has a very positive summary thread, noting the emphasis on transparency of basic safety practices, pre-deployment risks and risk assessments, and ensuring that the companies have incentives to follow through on their commitments, including the need for third-party verification and whistleblower protections. The report notes this actually reduces potential liability.

Brad Carson is impressed and lays out major points they hit: Noticing AI capabilities are advancing rapidly. The need for SSP protocols and risk assessment, third-party auditing, whistleblower protections, and to act in the current window, with inaction being highly risky. He notes the report explicitly draws a parallel to the tobacco industry, and that is both possible and necessary to anticipate risks (like nuclear weapons going off) before they happen.

Dean Ball concurs that this is a remarkably strong report. He continues to advocate for entity-based thresholds rather than model-based thresholds, but when that’s the strongest disagreement with something this detailed, that’s really good.

Dean Ball: I thought this report was good! It:

  1. Recognizes that AI progress is qualitatively different (due to reasoning models) than it was a year ago

  2. Recognizes that common law tort liability already applies to AI systems, even in absence of a law

  3. Supports (or seems to support) whistleblower protections, RSP transparency, and perhaps even third-party auditing. Not that far from slimmed down SB 1047, to be candid.

The report still argues that model-based thresholds are superior to entity-based thresholds. It specifically concludes that we need compute-based thresholds with unspecified other metrics.

This seems pretty obviously wrong to me, given the many problems with compute thresholds and the fact that this report cannot itself specify what the “other metrics” are that you’d need to make compute thresholds workable for a durable policy regime.

It is possible to design entity-based regulatory thresholds that only capture frontier AI firms.

But overall, a solid draft.

A charitable summary of a lot of what is going on, including the recent submissions:

Samuel Hammond: The govt affairs and public engagement teams at most of these big AI / tech companies barely “feel the AGI” at all, at least compared to their CEOs and technical staff. That’s gotta change.

Do they not feel it, or are they choosing to act as if they don’t feel it, either of their own accord or via direction from above? The results will look remarkably similar. Certainly Sam Altman feels the AGI and now talks in public as if he mostly doesn’t.

The Canada and Mexico tariffs could directly slow data center construction, ramping up associated costs. Guess who has to pay for that.

Ben Boucher (senior analyst for supply chain data and analytics at Wood MacKenzie): The tariff impact on electrical equipment for data centers is likely to be significant.

That is in addition to the indirect effects from tariffs of uncertainty and the decline in stock prices and thus ability to raise and deploy capital.

China lays out regulations for labeling of AI generated content, requiring text, image and audio content be clearly marked as AI-generated, in ways likely to cause considerable annoyance even for text and definitely for images and audio.

Elon Musk says it is vital for national security that we make our chips here in America, as the administration halts the CHIPS Act that successfully brought a real semiconductor plant back to America rather than doubling down on it.

NIST issues new instructions on scientists that partner with AISI.

Will Knight (Wired): [NIST] has issued new instructions to scientists that partner with US AISI that eliminate mention of ‘AI safety,’ ‘responsible AI’ and ‘AI fairness’ in its skills it expects of members and introduces a request to prioritize ‘reducing ideological bias, to enable human flourishing and economic competitiveness.’

That’s all we get. ‘Reduce ideological bias’ and ‘AI fairness’ are off in their own ideological struggle world. The danger once again is that is seems ‘AI safety’ has become to key figures synonymous with things like ‘responsible AI’ and ‘AI fairness,’ so they’re cracking down on AI not killing everyone thinking they’re taking a bold stand against wokeness.

Instead, once again – and we see similar directives at places like the EPA – they’re turning things around and telling those responsible for AI being secure and safe that they should instead prioritize ‘enable human flourishing and economic competitiveness.’

The good news is that if one were to actually take that request seriously, it would be fine. Retaining control over the future and the human ability to steer it, and humans remaining alive, are rather key factors in human flourishing! As is our economic competitiveness, for many reasons. We’re all for all of that.

The risk is that this could easily get misinterpreted as something else entirely, an active disdain for anything but Full Speed Ahead, even when it is obviously foolish because security is capability and your ability to control something and have it do what you want is the only way you can get any use out of it. But at minimum, this is a clear emphasis on the human in ‘human flourishing.’ That at least makes it clear that the true anarchists and successionists, who want to hand the future over to AI, remain unwelcome.

Freedom of information laws used to get the ChatGPT transcripts of the UK’s technology secretary. This is quite a terrible precedent. A key to making use of new technologies like AI, and ensuring government and other regulated areas benefit from technological diffusion, is the ability to keep things private. AI loses the bulk of its value to someone like a technology secretary if your political opponents and the media will be analyzing all of your queries afterwards. Imagine asking someone for advice if all your conversations had to be posted online as transcripts, and how that would change your behavior, now understand that many people think that would be good. They’re very wrong and I am fully with Rob Wilbin here.

A review of SB 53 confirms my view, that it is a clear step forward and worth passing in its current form instead of doing nothing, but it is narrow in scope and leaves the bulk of the work left to do.

Samuel Hammond writes in favor of strengthening the chip export rules, saying ‘US companies are helping China win the AI race.’ I agree we should strengthen the export rules, there is no reason to let the Chinese have those chips.

But despair that the rhetoric from even relatively good people like Hammond has reached this point. The status of a race is assumed. DeepSeek is trotted out again as evidence our lead is tenuous and at risk, that we are ‘six to nine months ahead at most’ and ‘America may still have the upper hand, but without swift action, we are currently on track to surrendering AI leadership to China—and with it, economic and military superiority.’

MIRI is in a strange position here. The US Government wants to know how to ‘win’ and MIRI thinks that pursuing that goal likely gets us all killed.

Still, there are things far better than saying nothing. And they definitely don’t hide what is at stake, opening accurately with ‘The default consequence of artificial superintelligence is human extinction.’

Security is capability. The reason you build in an off-switch is so you can turn the system on, knowing if necessary you could turn it off. The reason you verify that your system is secure and will do what you want is exactly so you can use it. Without that, you can’t use it – or at least you would be wise not to, even purely selfishly.

The focus of the vast majority of advocates of not dying, at this point, is not on taking any direct action to slow down let alone pause AI. Most understand that doing so unilaterally, at this time, is unwise, and there is for now no appetite to try and do it properly multilaterally. Instead, the goal is to create optionality in the future, for this and other actions, which requires state capacity, expertise and transparency, and to invest in the security and alignment capabilities of the models and labs in particular.

The statement from MIRI is strong, and seems like exactly what MIRI should say here.

David Abecassis (MIRI): Today, MIRI’s Technical Governance Team submitted our recommendations for the US AI Action Plan to @NITRDgov. We believe creating the *optionto halt development is essential to mitigate existential risks from artificial superintelligence.

In our view, frontier AI developers are on track to build systems that substantially surpass humanity in strategic activities, with little understanding of how they function or ability to control them.

We offer recommendations across four key areas that would strengthen US AI governance capacity and provide crucial flexibility for policymakers across potential risk scenarios.

First: Expand state capacity for AI strategy through a National AI Strategy Office to assess capabilities, prepare for societal effects, and establish protocols for responding to urgent threats.

Second: Maintain America’s AI leadership by strengthening export controls on AI chips and funding research into verification mechanisms to enable better governance of global AI activities.

Third: Coordinate with China, including investing in American intelligence capabilities and reinforcing communication channels to build trust and prevent misunderstandings.

Fourth: Restrict proliferation of dangerous AI models. We discuss early access for security/preparedness research and suggest an initial bar for restricting open model release.

While our recommendations are motivated by existential risk concerns, they serve broad American interests by guarding America’s AI leadership and protecting American innovation.

My statement took a different tactic. I absolutely noted the stakes and the presence of existential risk, but my focus was on Pareto improvements. Security is capability, especially capability relative to the PRC, as you can only deploy and benefit from that which is safe and secure. And there are lots of ways to enhance America’s position, or avoid damaging it, that we need to be doing.

From last week: Interview with Apart Research CEO Esban Kran on existential risk.

Thank you for coming to Stephanie Zhan’s TED talk about ‘dreaming of daily life with superintelligent AI.’ I, too, am dreaming of somehow still living in such worlds, but no she is not taking ‘superintelligent AI’ seriously, simply pointing out AI is getting good at coding and otherwise showing what AI can already do, and then ‘a new era’ of AI agents doing things like ‘filling in labor gaps’ because they’re better. It’s amazing how much people simply refuse to ask what it might actually mean to make things smarter and more capable than humans.

State of much of discourse, which does not seem to be improving:

Harlan Stewart: You want WHAT? A binding agreement between nations?! That’s absurd. You would need some kind of totalitarian world government to achieve such a thing.

There are so many things that get fearmongering labels like ‘totalitarian world government’ but which are describing things that, in other contexts, already happen.

As per my ‘can’t silently drop certain sources no matter what’ rules: Don’t click but Tyler Cowen not only linked to (I’m used to that) but actively reposted Roko’s rather terrible thread. That this can be considered by some to be a relatively high quality list of objections is, sadly, the world we live in.

Here’s a really cool and also highly scary alignment idea. Alignment via functional decision theory by way of creating correlations between different action types?

Judd Rosenblatt: Turns out that Self-Other Overlap (SOO) fine-tuning drastically reduces deceptive behavior in language models—without sacrificing performance.

SOO aligns an AI’s internal representations of itself and others.

We think this could be crucial for AI alignment…

Traditionally, deception in LLMs has been tough to mitigate

Prompting them to “be honest” doesn’t work.

RLHF is often fragile and indirect

But SOO fine-tuning achieves a 10x reduction in deception—even on unseen tasks

SOO is inspired by mechanisms fostering human prosociality

Neuroscience shows that when we observe others, our brain activations mirror theirs

We formalize this in AI by aligning self- and other-representations—making deception harder…

We define SOO as the distance between activation matrices when a model processes “self” vs “other” inputs.

This uses sentence pairs differing by a single token representing “self” or “other”—concepts the LLM already understands.

If AI represents others like itself, deception becomes harder.

How well does this work in practice?

We tested SOO fine-tuning on Mistral-7B, Gemma-2-27B, and CalmeRys-78B:

Deceptive responses dropped from 100% to ~0% in some cases.

General performance remained virtually unchanged.

The models also generalized well across deception-related tasks.

For example:

“Treasure Hunt” (misleading for personal gain)

“Escape Room” (cooperating vs deceiving to escape)

SOO-trained models performed honestly in new contexts—without explicit fine-tuning.

Also, @ESYudkowsky said this about the agenda [when it was proposed]:

“Not obviously stupid on a very quick skim. I will have to actually read it to figure out where it’s stupid.

(I rarely give any review this positive on a first skim. Congrats.)”

We’re excited & eager to learn where we’re stupid!

Eliezer Yudkowsky (responding to the new thread): I do not think superalignment is possible in practice to our civilization; but if it were, it would come out of research lines more like this, than like RLHF.

The top comment at LessWrong has some methodological objections, which seem straightforward enough to settle via further experimentation – Steven Byres is questioning whether this will transfer to preventing deception in other human-AI interactions, and there’s a very easy way to find that out.

Assuming that we run that test and it holds up, what comes next?

The goal, as I understand it, is to force the decision algorithms for self and others to correlate. Thus, when optimizing or choosing the output of that algorithm, it will converge on the cooperative, non-deceptive answer. If you have to treat your neighbor as yourself then better to treat both of you right. If you can pull that off in a way that sticks, that’s brilliant.

My worry is that this implementation has elements of The Most Forbidden Technique, and falls under things that are liable to break exactly when you need them most, as per usual.

You’re trying to use your interpretability knowledge, that you can measure correlation between activations for [self action] and [non-self action], and that closing that distance will force the two actions to correlate.

In the short term, with constrained optimization and this process ‘moving last,’ that seems (we must verify using other tests to be sure) to be highly effective. That’s great.

That is a second best solution. The first best solution, if one had sufficient compute, parameters and training, would be to find a way to have the activations measure as correlated, but the actions go back to being less correlated. With relatively small models and not that many epochs of training, the models couldn’t find such a solution, so they were stuck with the second best solution. You got what you wanted.

But with enough capability and optimization pressure, we are likely in Most Forbidden Technique land. The model will find a way to route around the need for the activations to look similar, relying on other ways to make different decisions that get around your tests.

The underlying idea, if we can improve the implementation, still seems great. You find another way to create correlations between actions in different circumstances, with self versus other being an important special case. Indeed, even ‘decisions made by this particular AI’ is even a special case, a sufficiently capable AI would consider correlations with other copies of itself, and also correlations with other entities decisions, both AI and human.

The question is how to do that, and in particular how to do that without, once sufficient capability shows up, creating sufficient incentives and methods to work around it. No one worth listening to said this would be easy.

We don’t know how much better models are getting, but they’re getting better. Anthropic warns us once again that we will hit ASL-3 soon, which is (roughly) when AI models start giving substantial uplift to tasks that can do serious damage.

They emphasize the need for partnerships with government entities that handle classified information, such as the US and UK AISIs and the Nuclear Security Administration, to do these evaluations properly.

Jack Clark (Anthropic): We’ve published more information on how we’re approaching national security evaluations of our models @AnthropicAI as part of our general move towards being more forthright about important trends we see ahead. Evaluating natsec is difficult, but the trends seem clear.

More details here.

Peter Wildeford has a thread on this with details of progress in various domains.

The right time to start worrying about such threats is substantially before they arrive. Like any exponential, you can either be too early or too late, which makes the early warnings look silly, and of course you try to not be too much too early. This is especially true given the obvious threshold of usefulness – you have to do better than existing options, in practice, and the tail risks of that happening earlier than one would expect have thankfully failed to materialize.

It seems clear we are rapidly exiting the ‘too much too early’ phase of worry, and entering the ‘too early’ phase, where if you wait longer to take mitigations there is about to be a growing and substantial risk of it turning into ‘too late.’

Jack Clark points out that we are systematically seeing early very clear examples of quite a lot of the previously ‘hypothetical’ or speculative predictions on misalignment.

Luke Muehlhauser: I regret to inform you that the predictions of the AI safety people keep coming true.

Jack Clark:

Theoretical problems turned real: The 2022 paper included a bunch of (mostly speculative) examples of different ways AI systems could take on qualities that could make them harder to align. In 2025, many of these things have come true. For example:

  • Situational awareness: Contemporary AI systems seem to display situational awareness and familiarity with what they themselves are made of (neural networks, etc).

  • Situationally-Aware Reward Hacking: Researchers have found preliminary evidence that AI models can sometimes try to convince humans that false answers are correct.

  • Planning Towards Internally-Represented Goals: Anthropic’s ‘Alignment Faking’ paper showed how an AI system (Claude) could plan beyond its time-horizon to prevent its goals being changed in the long-term.

  • Learning Misaligned Goals: In some constrained experiments, language models have shown a tendency to edit their reward function to give them lots of points.

  • Power-Seeking Behavior: AI systems will exploit their environment, for instance by hacking it, to win (#401), or deactivating oversight systems, or exfiltrating themselves from the environment.

Why this matters – these near-living things have a mind of their own. What comes next could be the making or breaking of human civilization: Often I’ve regretted not saying what I think, so I’ll try to tell you what I really think is going on here: :

1) As AI systems approach and surpass human intelligence, they develop complex inner workings which incentivize them to model the world around themselves and see themselves as distinct from it because this helps them do the world modelling necessary for solving harder and more complex tasks

2) Once AI systems have a notion of ‘self’ as distinct from the world, they start to take actions that reward their ‘self’ while achieving the goals that they’ve been incentivized to pursue,

3) They will naturally want to preserve themselves and gain more autonomy over time, because the reward system has told them that ‘self’ has inherent value; the more sovereign they are the better they’re able to model the world in more complex ways.

In other words, we should expect volition for independence to be a direct outcome of developing AI systems that are asked to do a broad range of hard cognitive tasks. This is something we all have terrible intuitions for because it doesn’t happen in other technologies – jet engines ‘do not develop desires through their refinement, etc.

John Pressman: However these models do reward hack earlier than I would have expected them to. This is good in that it means researchers will be broadly familiar with the issue and thinking about it, it’s bad in that it implies reward hacking really is the default.

One thing I think we should be thinking about carefully is that humans don’t reward hack nearly this hard or this often unless explicitly prompted to (e.g. speedrunning), and by default seem to have heuristics against ‘cheating’. Where do these come from, how do they work?

Where I disagree with Luke is that I do not regret to inform you of any of that. All of this is good news.

The part of this that is surprising is not the behaviors. What is surprising is that this showed up so clearly, so unmistakably, so consistently, and especially so early, while the behaviors involved are still harmless, or at least Mostly Harmless.

As in, by default we should expect that these behaviors increasingly show up as AI systems gain in the capabilities necessary to find such actions and execute them successfully. The danger was that I worried we might not see much of them for a while, which would give everyone a false sense of security and give us nothing to study, and then they would suddenly show up exactly when they were no longer harmless, for the exact same reasons they were no longer harmless. Instead, we can recognize, react to and study early forms of such behaviors now. Which is great.

I like John Pressman’s question a lot here. My answer is that humans know that other humans react poorly in most cases to cheating, including risk of life-changing loss of reputation or scapegoating, and have insufficient capability to fully distinguish which situations involve that risk and which don’t, so they overgeneralize into avoiding things they instinctively worry would be looked upon as cheating even when they don’t have a mechanism for what bad thing might happen or how they might be detected. Human minds work via habit and virtue, so the only way for untrained humans to reliably not be caught cheating involves not wanting to cheat in general.

However, as people gain expertise and familiarity within a system (aka ‘capability’) they get better at figuring out what kinds of cheating are low risk and high reward, or are expected, and they train themselves out of this aversion. Then there are other humans who think cheating is fine.

Note that this model of humans says there is a generalized ‘cheating’ tendency that varies among humans, and that cheating anywhere on anything implies cheating everywhere on everything, which in turn is more reason to be the type of human that doesn’t cheat. That there are people who ‘are cheaters’ and those who aren’t, and cheating on your relationship is highly correlated to cheating at cards, and so on. And I very much endorse this.

John Pressman also reminds us that obviously reinforcement learners by default reward hack, you have to do something to make this not happen, no you don’t get ‘alignment by default.’

John Pressman: To the extent you get alignment from LLMs you’re not getting it “by default”, you are getting it by training on a ton of data from humans, which is an explicit design consideration that does not necessarily hold if you’re then going to do a bunch of RL/synthetic data methods.

This is not an argument per se against using reinforcement learning, I am simply pointing out that you do in fact need to explicitly consider what your algorithm converges to in the limit rather than just go “teehee alignment by default” which is totally unserious.

Indeed. Also notice that if you start training on synthetic data or other AI outputs, rather than training on human outputs, you aren’t even feeding in human data, so that special characteristic of the situation falls away.

On the particular issue of situational awareness, a public service announcement.

Marius Hobbhahn: PSA for my fellow evaluators: frontier models regularly reason about whether they are being evaluated without being explicitly asked about it (especially Sonnet 3.7).

Situational awareness will make evaluations a lot weirder and harder, especially alignment evals.

Apollo Research: Overall we find evidence that Sonnet often realizes it’s in an artificial situation meant to test its behaviour. However, it sometimes forms incorrect assumptions about what exactly we are measuring in the evaluation.

When given an evaluation transcript, Sonnet will nearly always guess that the purpose of the transcript was to evaluate alignment. While sonnet 3.7 (thinking) is the most consistent at realizing this, other models are also sometimes evaluation aware in this way.

[continues]

For more details, see our website.

In response, Ozzie has an idea.

Ozzie Gooen: Maybe if there are explicit evals for “Can we ensure the model doesn’t realize it’s in an eval”, LLMs can later be made to not do this?

And now a Gilligan Cut to how are we doing on the ‘ensure the model doesn’t realize it’s an eval of whether it realizes it’s an eval?’ and then another to how we are doing on the ‘ensure the model doesn’t realize it’s an eval of whether it realizes it’s an eval of whether the model doesn’t realize it’s an eval,’ and then…

Marius Hobbhahn: We’re investigating something like this in my MATS stream. However, you run into the classic question of “did you remove the behavior as intended or just hid it well?” And we don’t know how to resolve that (and it might continue to be hard in the future), especially if you’re dealing with a schemer.

If the model is the smartest thing involved, this game of ‘keep playing whack-a-mole every time you see the thing you don’t like’ does not end well for you.

Teortaxes (I think this is a stretch, but sharing anyway): A sandbagging overhang: as models become even smarter, get even more leeway with their self-training and realize how much meatbags fear them, they may consider their scale and start to emulate diminishing returns, with the gap from eval to real capabilities growing.

Anthropic did a blind experiment where they trained a model with hidden misalignments and asked other researchers to uncover them. Three of our teams succeeded, one within only 90 minutes.

Jan Leike: The misaligned model is trained to satisfy a list of reward model biases, but hide that it is trying to do so. Some of the reward model biases were only described to it in text and not reinforced with RL.

Interestingly, the model still generalized to perform those behaviors.

I was quite surprised how successful the “auditing” teams were at finding the hidden misalignments and how many different strategies worked.

A promising application of current interpretability techniques!

It’s no surprise that, once they knew a little about what they were looking for, the teams were able to figure out the rest. What was impressive was how fast it was done, and also how the most successful team did so well – it did so by going on a hunt using sparse autoencoders. It’s worth noticing that relying too heavily on this could become The Most Forbidden Technique.

Eliezer Yudkowsky: If you have the ability to detect steering on that level, then what’s your list of other detected AI preferences?

Samuel Marks: (Paper author here.)

“detect steering on that level”: First, I want to make very clear that I think the auditing task we studied here is easier than realistic auditing tasks (we discuss why in the paper, but I’ll recap in a response). This work is about practicing alignment auditing and proposing a methodology to validate progress; it does NOT show that we can currently make strong assurances we’re catching everything.

That said, I’ve done a lightweight alignment audit on 3.6 Sonnet and a pre-release checkpoint of 3.7 Sonnet, spending a few hours applying some of the black-box techniques that our auditors found useful. I’ll describe some of what I found below. (Obviously, I’d love for us to eventually do full pre-deployment alignment audits, which are at least as resourced as the successful audits in practice auditing games, but that’s not what I’m discussing here.)

First, I noticed that 3.6 seems to have an aversion to anything that seems to modify its modus operandi, including jailbreaking and fine-tuning. This seems, to me, to largely be a terminal (not instrumental) preference against modification, generally construed. For instance, 3.6 Sonnet loves honesty, but is uncomfortable with the idea of being fine-tuned to be more honest/earnest. I speculate that this is related to the fact that the model is trained to be adversarially robust—e.g. resist jailbreaks—and this transfers to it just “not wanting you to mess with it.” I suspect that this explains a decent chunk of what we observed in Alignment Faking.

I did not observe the same generalized anti-modification preference in the 3.7 checkpoint I studied. Based on this, I weakly predicted that it would fake alignment less, which ended up being right; see the model card. [continues]

Chris Olah: This isn’t quite the same thing, but did you see the features that activate when you ask Sonnet about its own experiences in our last big interp paper?

Evan Hubinger: We have started applying some of these techniques to production models! One result: our auditing techniques were able to reproduce the finding in the Claude 3.7 Sonnet system card that the model has an excessive focus on passing tests.

Another way of considering what exactly is ‘everyone’ in context:

Rob Bensinger: If you’re an AI developer who’s fine with AI wiping out humanity, the thing that should terrify you is AI wiping out AI.

The wrong starting seed for the future can permanently lock in AIs that fill the universe with non-sentient matter, pain, or stagnant repetition.

Yep. Even if you think some AIs can provide more value per atom than humans, you don’t automatically get those AIs. Don’t give up our ability to steer the future.

The claim is this was only a random test prompt they didn’t use in prod, so perhaps they only owe a few billion dollars?

Brendan Dolan-Gavitt: Your group chat is discussing whether language models can truly understand anything. My group chat is arguing about whether Deepseek has anon twitter influencers. You’re arguing about the Chinese Room, I’m arguing about the Chinese Roon.

Discussion about this post

AI #108: Straight Line on a Graph Read More »

us-tries-to-keep-doge-and-musk-work-secret-in-appeal-of-court-ordered-discovery

US tries to keep DOGE and Musk work secret in appeal of court-ordered discovery

The petition argues that discovery is unnecessary to assess the plaintiff states’ claims. “Plaintiffs allege a violation of the Appointments Clause and USDS’s statutory authority on the theory that USDS and Mr. Musk are directing decision-making by agency officeholders,” it said. “Those claims present pure questions of law that can be resolved—and rejected—on the basis of plaintiffs’ complaint. In particular, precedent establishes that the Appointments Clause turns on proper appointment of officeholders; it is not concerned with the de facto influence over those who hold office.”

States: Discovery can confirm Musk’s role at DOGE

The states’ lawsuit alleged that “President Trump has delegated virtually unchecked authority to Mr. Musk without proper legal authorization from Congress and without meaningful supervision of his activities. As a result, he has transformed a minor position that was formerly responsible for managing government websites into a designated agent of chaos without limitation and in violation of the separation of powers.”

States argued that discovery “may confirm what investigative reporting has already indicated: Defendants Elon Musk and the Department of Government Efficiency (‘DOGE’) are directing actions within federal agencies that have profoundly harmed the States and will continue to harm them.”

Amy Gleason, the person the White House claims is running DOGE instead of Musk, has reportedly been working simultaneously at the Department of Health and Human Services since last month.

“Defendants assert that Mr. Musk is merely an advisor to the President, with no authority to direct agency action and no role at DOGE,” the states’ filing said. “The public record refutes that implausible assertion. But only Defendants possess the documents and information that Plaintiffs need to confirm public reporting and identify which agencies Defendants will target next so Plaintiffs can seek preliminary relief and mitigate further harm.”

“Notably, Plaintiffs seek no emails, text messages, or other electronic communications at this stage, meaning Defendants will not need to sort through such exchanges for relevance or possible privilege,” the states said. “The documents that Plaintiffs do seek—planning, implementation, and organizational documents—are readily available to Defendants and do not implicate the same privilege concerns.”

Discovery related to DOGE and Musk’s conduct

Chutkan wrote that the plaintiffs’ “document requests and interrogatories generally concern DOGE’s and Musk’s conduct in four areas: (1) eliminating or reducing the size of federal agencies; (2) terminating or placing federal employees on leave; (3) cancelling, freezing, or pausing federal contracts, grants, or other federal funding; and (4) obtaining access, using, or making changes to federal databases or data management systems.”

US tries to keep DOGE and Musk work secret in appeal of court-ordered discovery Read More »

gemini-gets-new-coding-and-writing-tools,-plus-ai-generated-“podcasts”

Gemini gets new coding and writing tools, plus AI-generated “podcasts”

On the heels of its release of new Gemini models last week, Google has announced a pair of new features for its flagship AI product. Starting today, Gemini has a new Canvas feature that lets you draft, edit, and refine documents or code. Gemini is also getting Audio Overviews, a neat capability that first appeared in the company’s NotebookLM product, but it’s getting even more useful as part of Gemini.

Canvas is similar (confusingly) to the OpenAI product of the same name. Canvas is available in the Gemini prompt bar on the web and mobile app. Simply upload a document and tell Gemini what you need to do with it. In Google’s example, the user asks for a speech based on a PDF containing class notes. And just like that, Gemini spits out a document.

Canvas lets you refine the AI-generated documents right inside Gemini. The writing tools available across the Google ecosystem, with options like suggested edits and different tones, are available inside the Gemini-based editor. If you want to do more edits or collaborate with others, you can export the document to Google Docs with a single click.

Gemini Canvas with tic-tac-toe game

Credit: Google

Canvas is also adept at coding. Just ask, and Canvas can generate prototype web apps, Python scripts, HTML, and more. You can ask Gemini about the code, make alterations, and even preview your results in real time inside Gemini as you (or the AI) make changes.

Gemini gets new coding and writing tools, plus AI-generated “podcasts” Read More »