Author name: Kelly Newman


DOGE’s .gov site lampooned as coders quickly realize it can be edited by anyone

“An official website of the United States government,” reads small text atop the Department of Government Efficiency (DOGE) website that Elon Musk’s team started populating this week with information on agency cuts.

But you apparently don’t have to work in government to push updates to the site. A couple of prankster web developers told 404 Media that they separately discovered how “insecure” the DOGE site was, seemingly pulling from a “database that can be edited by anyone.”

One coder couldn’t resist and pushed two updates that, as of this writing, remained on the DOGE site. “This is a joke of a .gov site,” one read. “THESE ‘EXPERTS’ LEFT THEIR DATABASE OPEN,” read another.

404 Media spoke to two other developers who suggested that the DOGE site is not running on government servers. Instead, it seems to be running on a Cloudflare Pages site and is relying on a database that “can be and has been written to by third parties and will show up on the live website,” the developers told 404 Media.
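
To illustrate the general anti-pattern the developers are describing, here is a minimal sketch under stated assumptions: the endpoint below is purely hypothetical and is not the DOGE site’s actual API; the point is only the shape of the failure, a static front end that reads from, and silently accepts writes to, a database with no authentication.

```python
# Minimal sketch of the anti-pattern (hypothetical endpoint, not the real site):
# an API that both serves the page's content and accepts unauthenticated writes.
import requests

API = "https://example-dot-gov-clone.pages.dev/api/posts"  # hypothetical

# Anyone can read the records the live page renders...
records = requests.get(API, timeout=10).json()

# ...and, with no auth check, anyone can push a record that shows up on the site.
requests.post(API, json={"title": "THESE 'EXPERTS' LEFT THEIR DATABASE OPEN"}, timeout=10)
```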

Archived versions of the DOGE site show that it was basically blank before Tuesday. That’s when Musk held a DOGE press conference in the Oval Office, promising that DOGE is “actually trying to be as transparent as possible.” At that time, Musk claimed that DOGE was being “maximally transparent” by posting about “all” actions to X (Musk’s social media platform) and to the DOGE website. (Wired deemed the DOGE site “one big X ad” because it primarily seems to exist to point to Musk’s social media platform.)

According to 404 Media, after Musk made that statement, his team rushed to build out the DOGE website, mirroring X posts from the DOGE account and compiling stats on the federal workforce.

But in rushing, DOGE appears to have skipped security steps that are expected of government websites. That pattern is troubling some federal workers, as DOGE has already drawn criticism from workers concerned about Musk’s team seizing access to sensitive government information and sharing it in ways deemed less secure. For example, last week, Department of Education officials raised alarms about DOGE employees using personal emails viewed as less secure than government email addresses, seemingly in violation of security protocols. These personal emails also seemed to shroud the true identities of DOGE staffers, whereas other government employees must use their full names in official communications.



Arm to start making server CPUs in-house

Cambridge-headquartered Arm has more than doubled in value to $160 billion since it listed on Nasdaq in 2023, carried higher by explosive investor interest in AI. Arm’s partnerships with Nvidia and Amazon have driven its rapid growth in the data centers that power AI assistants from OpenAI, Meta, and Anthropic.

Meta is the latest big tech company to turn to Arm for server chips, displacing those traditionally provided by Intel and AMD.

During last month’s earnings call, Meta’s finance chief Susan Li said it would be “extending our custom silicon efforts to [AI] training workloads” to drive greater efficiency and performance by tuning its chips to its particular computing needs.

Meanwhile, an Arm-produced chip is also likely to eventually play a role in Sir Jony Ive’s secretive plans to build a new kind of AI-powered personal device, which is a collaboration between the iPhone designer’s firm LoveFrom, OpenAI’s Sam Altman, and SoftBank.

Arm’s designs have been used in more than 300 billion chips, including almost all of the world’s smartphones. Its power-efficient designs have made its CPUs, the general-purpose workhorses at the heart of any computer, an increasingly attractive alternative to Intel’s chips in PCs and servers at a time when AI is making data centers much more energy-intensive.

Arm, which started out in a converted turkey barn in Cambridgeshire 35 years ago, became ubiquitous in the mobile market by licensing its designs to Apple for its iPhone chips, as well as Android suppliers such as Qualcomm and MediaTek. Maintaining its unique position in the center of the fiercely competitive mobile market has required a careful balancing act for Arm.

But SoftBank’s Masayoshi Son has long pushed for Arm to make more money from its intellectual property. Under Rene Haas, who became chief executive in 2022, Arm’s business model began to evolve, with a focus on driving higher royalties from customers as the company designs more of the building blocks needed to make a chip.

Going a step further by building and selling its own complete chip is a bold move by Haas that risks putting Arm on a collision course with customers such as Qualcomm, which is already locked in a legal battle with Arm over licensing terms, and Nvidia, the world’s most valuable chipmaker.

Arm, SoftBank, and Meta declined to comment.

Additional reporting by Hannah Murphy.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.



Man offers to buy city dump in last-ditch effort to recover $800M in bitcoins

Howells told The Times that he envisions cleaning up the site and turning it into a park, but the council’s analysis seems to suggest that wouldn’t be a suitable use. Additionally, the council noted that there are no viable alternative sites for the solar farm, which therefore must be built on the landfill site; otherwise the city’s climate goals could be set back.

If Howells can’t turn the landfill into a park, he suggested that he could simply clear it out so that it can be used as a landfill again.

But the Newport council does not appear to be entertaining his offer, just as it readily rejected his prior offer to share his bitcoin profits if granted access to dig up the landfill. When asked about Howells’ most recent offer, a council spokesperson directed The Times to a 2023 statement holding firm to the city’s claims that Howells gave up ownership of the bitcoins the moment the hard drive hit the landfill and that his plans for excavation would come at “a prohibitively high cost.”

“We have been very clear and consistent in our responses that we cannot assist Mr. Howells in this matter,” the spokesperson said. “Our position has not changed.”

Howells insists his plan is “logical”

But Howells told The Guardian that it was “quite a surprise” to learn the city planned to close the landfill, reportedly in the 2025–26 financial year. This wasn’t disclosed in the court battle, he said, where the council claimed that “closing the landfill” to allow his search “would have a huge detrimental impact on the people of Newport.”

“I expected it would be closed in the coming years because it’s 80–90 percent full—but didn’t expect its closure so soon,” Howells told The Guardian. “If Newport city council would be willing, I would potentially be interested in purchasing the landfill site ‘as is’ and have discussed this option with investment partners and it is something that is very much on the table.”



Burning in woman’s legs turned out to be slug parasites migrating to her brain

It started with a bizarre burning sensation in her feet. Over the next two days, the searing pain crept up her legs. Any light touch made it worse, and over-the-counter pain medicine offered no relief.

On the third day, the 30-year-old, otherwise healthy woman from New England went to an emergency department. Her exam was normal. Her blood tests and kidney function were normal. The only thing that stood out was a high number of eosinophils—white blood cells that become active with certain allergic diseases, parasitic infections, or other medical conditions, such as cancer. The woman was discharged and advised to follow up with her primary care doctor.

Over the next few days, the scorching sensation kept advancing, invading her trunk and arms. She developed a headache that was also unfazed by over-the-counter pain medicine. Seven days into the illness, she went to a second emergency department. There, the findings were much the same: Normal exam, normal blood tests, normal kidney function, and high eosinophil count—this time higher. The reference range for this count was 0 to 400; her count was 1,050. She was given intravenous medicine to treat her severe headache, then once again discharged with a plan to see her primary care provider.

Back at home and still getting little relief, she was given a prescription sleep aid by a family member to help her get some rest. The next day, she awoke confused, saying she needed to pack for a vacation, and couldn’t be persuaded to return to bed. After hours in this fog, her partner brought her to an emergency department for a third time, this time the one at Massachusetts General Hospital.

Getting warmer

In a case report published in the New England Journal of Medicine, doctors explain how they figured out the source of her fiery symptoms—worms burrowing into her brain. By this point, she was alert but disoriented and restless. She couldn’t answer questions consistently or follow commands.

The doctors at Mass General, including a neurologist specializing in infectious diseases, quickly focused their attention on the fact that the woman had recently traveled. Just four days before her feet began burning, she had returned from a three-week trip that included stops in Bangkok, Thailand; Tokyo, Japan; and Hawaii. They asked what she ate. In Thailand, she ate street foods but nothing raw. In Japan, she ate sushi several times and spent most of her time in a hotel. In Hawaii, she again ate sushi as well as salads.



AI #103: Show Me the Money

The main event this week was the disastrous Paris AI Anti-Safety Summit. Not only did we fail to build on the promise of the Bletchley and Seoul Summits, but the French and Americans did their best to actively destroy what hope remained, transforming the event into a push for a mix of nationalist jingoism, accelerationism, and anarchism. It’s vital and also difficult not to panic or despair, but it doesn’t look good.

Another major twist was that Elon Musk made a $97 billion bid for OpenAI’s nonprofit arm and its profit and control interests in OpenAI’s for-profit arm. This is a serious complication for Sam Altman’s attempt to buy those same assets for $40 billion, in what I’ve described as potentially the largest theft in human history.

I’ll be dealing with that tomorrow, along with two other developments in my ongoing OpenAI series The Mask Comes Off. In Altman’s Three Observations, he gives what can best be described as a cartoon villain speech about how AI will only be a good thing, and how he knows doing this and the risks involved won’t be popular but he’s going to do it anyway. Then, we look at the claim from the Summit, by OpenAI, that AI will complement rather than substitute for humans because that is a ‘design decision.’ Which will reveal, in yet another way, the extent to which there is no plan.

OpenAI also plans to release ‘GPT-4.5’ in a matter of weeks, which is mostly the same timeline as the full o3, followed by the promised ‘GPT-5’ within months that Altman says is smarter than he is. It’s a bold strategy, Cotton.

To their credit, OpenAI also released a new version of their model spec, with major changes throughout and a completely new structure. I’m going to need time to actually look into it in detail to know what I think about it.

In the meantime, what else is happening?

  1. Language Models Offer Mundane Utility. Don’t go there. We tried to warn you.

  2. Language Models Don’t Offer Mundane Utility. No episodic memory?

  3. We’re in Deep Research. Reactions still very positive. Plus to get 10 uses/month.

  4. Huh, Upgrades. GPT-4.5, GPT-5, Grok 3, all coming quite soon. And PDFs in o3.

  5. Seeking Deeply. And r1 begat s1.

  6. Smooth Operator. Use it directly with Google Drive or LinkedIn or similar.

  7. They Took Our Jobs. The California Faculty Association vows to fight back.

  8. Maxwell Tabarrok Responds on Future Wages. The crux is what you would expect.

  9. The Art of the Jailbreak. Reports from Anthropic’s competition.

  10. Get Involved. OpenPhil grants, Anthropic, DeepMind.

  11. Introducing. The Anthropic Economic Index.

  12. Show Me the Money. Over $300 billion in Capex spend this year.

  13. In Other AI News. Adaptation is super fast now.

  14. Quiet Speculations. How much do you understand right now?

  15. The Quest for Sane Regulations. There are hard problems. I hope someone cares.

  16. The Week in Audio. Cowen, Waitzkin, Taylor, emergency podcast on OpenAI.

  17. The Mask Comes Off. What was your ‘aha’ moment?

  18. Rhetorical Innovation. The greatest story never listened to.

  19. Getting Tired of Winning. No, seriously, it doesn’t look good.

  20. People Really Dislike AI. I do not expect this to change.

  21. Aligning a Smarter Than Human Intelligence is Difficult. Joint frameworks?

  22. Sufficiently Capable AIs Effectively Acquire Convergent Utility Functions. Oh.

  23. People Are Worried About AI Killing Everyone. Who, me? Well, yeah.

  24. Other People Are Not As Worried About AI Killing Everyone. They’re fine with it.

  25. The Lighter Side. Gotta keep digging.

Study finds GPT-4o is a formalist judge, in that, like students, it judged appeals of war crime cases by looking at the law, whereas actual judges cared about who was sympathetic. But this has remarkably little to do with the headline question of ‘Can large language models (LLMs) replace human judges?’ and to the extent it does, the answer is plausibly no, because we mostly do want judges to favor the sympathetic, no matter what we say. They tried to fix this with prompt engineering and failed, which I am very confident was what we call a Skill Issue. The real central issue is that the LLMs would need to be adversarially robust arbiters of the law and the facts of cases, and GPT-4o very obviously is Not It.

Demonstration of the Gemini feature where you share your screen and it helps solve your problems, including with coding, via AI Studio.

How about AI doing economics peer review? A study says the LLMs effectively distinguish paper quality, including top tier submissions, but exhibit biases favoring prominent institutions, male authors, and renowned economists – perhaps because the LLMs are being asked to model paper reviews in economics. The good news there is that if you know about a bias you can correct it either within the LLM evaluation or by controlling for it post-hoc. Even more impressively, the authors were total cheapskates here, and used GPT-4o-mini – not even GPT-4o! Imagine what they could have done with o1-pro or even Gemini Flash Thinking. I do worry about adversarial robustness.
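
As a minimal sketch of the post-hoc option (my illustration, not the paper’s method, and the column names are hypothetical): regress the LLM’s scores on the bias covariates and keep the residualized scores, so institution prestige or author identity no longer shifts the ranking.

```python
# Hedged sketch of post-hoc bias adjustment for LLM-generated paper scores.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "llm_score": [7.1, 5.4, 8.2, 6.0],
    "top_institution": [1, 0, 1, 0],    # 1 if authors are at a prominent institution
    "male_lead_author": [1, 1, 0, 0],
    "renowned_author": [1, 0, 0, 1],
})

# Fit the bias covariates, then keep only what they cannot explain.
model = smf.ols(
    "llm_score ~ top_institution + male_lead_author + renowned_author", data=df
).fit()
df["adjusted_score"] = model.resid + df["llm_score"].mean()  # recenter for readability

print(df[["llm_score", "adjusted_score"]])
```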

Claim that extracting structured data from documents at low prices is a solved problem, as long as you don’t need 99%+ accuracy or various specific things like complex tables, signatures or scan lines. I found it odd to see Deedy say you can’t handle rotated documents, that seems easy enough to detect and then fix?
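
As a minimal sketch of that detect-and-fix step (a standard approach using Tesseract’s orientation detection, not necessarily what any particular document-extraction vendor does):

```python
# Hedged sketch: detect page rotation with Tesseract's orientation/script
# detection (OSD) and counter-rotate before running extraction.
# Assumes pytesseract, Pillow, and a Tesseract binary are installed.
import re

import pytesseract
from PIL import Image


def deskew(path: str) -> Image.Image:
    """Return the page image rotated back to upright, per Tesseract OSD."""
    img = Image.open(path)
    osd = pytesseract.image_to_osd(img)  # returns text including a "Rotate: N" line
    match = re.search(r"Rotate:\s*(\d+)", osd)
    angle = int(match.group(1)) if match else 0
    # The sign convention is worth verifying against your Tesseract version.
    return img.rotate(-angle, expand=True) if angle else img
```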

What does Claude want to know about a 2025 where Trump is president? AI regulation and AI progress, of course.

Be DOGE, feed all the sensitive government data into an AI via Microsoft Azure. It’s not clear what they’re actually using the AI to do with that data.

ChatGPT steadily climbing the charts of what people actually use, also how the hell is Yahoo still in the top 10 (it’s mostly mail and search but with a long tail), yikes, best argument for diffusion issues. A reminder of the difference between stocks, flows and flows of flows (functions, derivatives and second derivatives).
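
In symbols, my gloss of that analogy, with $S(t)$ as the installed user base:

$$\text{stock} = S(t), \qquad \text{flow} = \frac{dS}{dt}, \qquad \text{flow of flows} = \frac{d^{2}S}{dt^{2}},$$

so the usage chart is the stock, ChatGPT’s climb up it is the flow, and whether that climb is accelerating is the flow of flows.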

DeepSeek is the top downloaded app for January, but that’s very different from the most used app. It doesn’t seem like anyone has any way of knowing which apps actually spend the most user time. Is it Mail, WhatsApp, Safari and Chrome? Is it Instagram, Facebook, YouTube, TikTok and Spotify? How high up is ChatGPT, or DeepSeek? It seems no one knows?

Here’s an illustrated warning not to play Civilization VII. My advice is that even if you do want to eventually play this, you’re better off waiting at least a few months for patches. This is especially true given they promise they will work to improve the UI, which is almost always worth waiting for in these spots. Unless of course you work on frontier model capabilities, in which case contact the relevant nonprofits for your complimentary copy.

Paper claims that AI models, even reasoning models like o3-mini, lack episodic memory, and can’t track ‘who was where when’ or understand event order. This seems like an odd thing to not track when predicting the next token.

It’s remarkable how ‘LLMs can’t do [X]’ or especially ‘paper shows LLMs can’t do [X]’ turns out to be out of date and simply wrong. As Noam Brown notes, academia simply is not equipped to handle this rate of progress.

Gary Marcus tries to double down on ‘Deep Learning is Hitting a Wall.’ Remarkable.

OpenAI’s $14 million Super Bowl ad cost more than DeepSeek spent to train v3 and r1, and if you didn’t know what the hell ChatGPT was before or why you’d want to use it, you still don’t now. Cool art project, though.

Paul Graham says that ‘the classic software startup’ won’t change much even if AI can code everything, because AI can’t tell you what users want. Sure it can, Skill Issue, or at worst wait a few months. But also, yes, being able to implement the code easily is still a sea change. I get that YC doesn’t ask about coding ability, but it’s still very much a limiting factor for many, and being 10x faster makes it different in kind and changes your options.

Don’t have AI expand a bullet point list into a ‘proper’ article. Ship the bullet points.

Rohit: The question about LLMs I keep hearing from business people is “can it tell me when it doesn’t know something”

Funny how something that seems so simple to humans is the hardest part for LLMs.

To be fair I have the same question 🙁

It’s not as easy for humans as you might think.

Paul Millerd: to be fair – none of my former partners could do this either.

Sam B: I think leading LLMs have been better at this than most humans for about ~3 months

You can expect 10 DR uses per month in Plus, and 2 per month in the free tier. Ten is a strange spot where you need to make every query count.

So make it count.

Will Brown: Deep Research goes so hard if you spend 20 minutes writing your prompt.

I suppose? Presumably you should be having AI help you write the prompt at that point. This is what happens when queries cost 50 cents of compute but you can’t buy more than 100 of them per month, otherwise you’d query DR, see what’s wrong with the result and then rerun the search until it went sufficiently hard.

Sam Altman: longer-term we still have to find some way to let people pay for compute they want to use more dynamically.

we have been really struck by the demand from some users to hit deep research dozens of times per day.

Xeophon: I need to find a way to make ODR and o1pro think for 30 minutes. I want to go for a walk while they work, 10 minutes is too short

Gallabytes: desperately want a thinking time slider which I can just make longer. like an oven timer. charge me for it each time I don’t care.

I’ll buy truly inordinate amounts of thinking, happy to buy most of it at off-peak hours, deep research topics are almost always things which can wait a day.

I continue to be confused why this is so hard to do? I very much want to pay for my AI based on how much compute I use, including ideally being able to scale the compute used on each request, without having to use the API as the interface. That’s the economically correct way to do it.

A tool to clean up your Deep Research results, fixing that it lists sources inline so you can export or use text-to-speech easier.

Ben Thompson on Deep Research.

Derya Unutmaz continues to be blown away by Deep Research. I wonder if his work is a great match for it, he’s great at prompting, he’s just really excited, or something else.

Dean Ball on Deep Research. He remains very impressed, speculating he can do a decade’s work in a year.

Ethan Mollick: Interesting data point on OpenAI’s Deep Research: I have been getting a steady stream of messages from very senior people in a variety of fields who have been, unsolicited, sharing their chats and how much it is going to change their jobs.

Never happened with other AI products.

I think we don’t know how useful it is going to be in practice, and the model still has lots of rough edges and hallucinates, but I haven’t seen senior people as impressed by what AI can do, or as contemplative of what that means for them (and their junior employees) as now.

I think it is in part because it feels very human to work with for senior managers – you assign it a task like an RA or associate and it does the work and comes back to you with a report or briefing. You don’t expect perfection, you want a well-supported argument and analysis.

Claudiu: That doesn’t bode well for less senior people in those fields.

Ethan Mollick: Some of those people have made that point.

Colin Lachance: In my domain (law), as i’ve been pushing out demos and receiving stories of people’s own experiences, both poles are represented. Some see it as useless or bad, others are feeling the shoe drop as they start to imagine integrating reasoning models into workflow. Latter is correct

It is easy to see how this is suddenly a way to change quite a lot of senior level work, even at the current functionality level. And I expect the version a few months from now to be substantially better. A lot of the restrictions on getting value here are very much things that can be unhobbled, like ability to access gated content and PDFs, and also your local context.

Michael Nielsen asks, what is a specific thing you learned from Deep Research? There are some good answers, but not as many as one would hope.

Colin Fraser continues to find tasks where Deep Research makes tons of mistakes, this time looking at an analysis of new smartphone models in Canada. One note is that o3-mini plus search got this one right. For these kinds of pure information searches that has worked well for me too, if you can tolerate errors.

Patrick Collison: Deep Research has written 6 reports so far today. It is indeed excellent. Congrats to the folks behind it.

I wonder if Patrick Collison or other similar people will try to multi-account to get around the report limit of 100 per month?

Make an ACX-style ‘more than you wanted to know’ post.

Nick Cammarata: i do it one off each time but like write like slatestarcodex, maximize insight and interestingness while also being professional, be willing to include like random reddit anecdote but be more skeptical of it, also include traditional papers, 5 page phd level analysis.

i think there’s a much alpha in someone writing a like definitive deep research prompt though. like i want it to end its report with a list of papers with a table of like how big was the effect and how much do we believe the paper, like http://examine.com does

As an internet we definitely haven’t been putting enough effort into finding the right template prompts for Deep Research. Different people will have different preferences but a lot of the answers should be consistent.

Also, not enough people are posting links to their Deep Research queries – why not have a library of them at our fingertips?

The big big one, coming soon:

Sam Altman: OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.

We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.

We hate the model picker as much as you do and want to return to magic unified intelligence.

We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.

After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.

In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.

The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds.

Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more.

Chubby: Any ETA for GPT 4.5 / GPT 5 @sama? Weeks? Months?

Sam Altman: Weeks / Months.

Logan Kilpatrick (DeepMind): Nice! This has always been our plan with Gemini, make sure the reasoning capabilities are part of the base model, not a side quest (hence doing 2.0 Flash Thinking).

This is a very aggressive free offering, assuming a solid UI. So much so that I expect most people won’t feel much need to fork over the $20 let alone $200, even though they should. By calling the baseline mode ‘standard,’ they’re basically telling people that’s what AI is and that they ‘shouldn’t’ be paying, the same way people spend all their time on their phone every day but only on free apps. Welcome to the future, it will continue to be unevenly distributed, I suppose.

Seriously, now hear me out, though, maybe you can sell us some coins and gems we can use for queries? Coins get you regular queries, gems for Deep Research and oX-pro ‘premium’ queries? I know how toxic that usually is, but marginal costs?

In terms of naming conventions, the new plan doesn’t make sense, either.

As in, we will do another GPT-N.5 release, and then we will have a GPT-N that is not actually a new underlying model at all, completely inconsistent with everything. It won’t be a GPT at all.

And also, I don’t want you to decide for me how much you think and what modality the AI is in? I want the opposite, the same way Gallabytes does regarding Deep Research. Obviously if I can very quickly use the prompt to fix this then fine I guess, but stop taking away my buttons and options, why does all of modern technology think I do not want buttons and options, no I do not want to use English as my interface, no I do not want you to infer from my clicks what I like, I want to tell you. Why is this so hard and why are people ruining everything, arrggggh.

I do realize the current naming system was beyond terrible and had to change, but that’s no reason to… sigh. It’s not like any of this can be changed now.

The big little one that was super annoying: o1 and o3-mini now support both file & image uploads in ChatGPT. Oddly o3-mini will not support vision in the API.

Also they’re raising o3-mini-high limits for Plus users to 50 per day.

Display of the chain of thought has been upgraded for o3-mini and o3-mini-high. The actual CoT reflects a very different attitude and approach than r1’s. I wonder to what extent this will indeed allow others to do distillation on the o3 CoT, and whether OpenAI is making a mistake, however much I want to see the CoT for myself.

OpenAI raises memory limits by 25%. Bumping things up 25%, how 2010s.

The new OpenAI model spec will be fully analyzed later, but one fun note is that it seems to no longer consider sexual content prohibited as long as it doesn’t include minors? To be clear I think this is a good thing, but it will also be… interesting.

Anton: the war on horny has been won by horny

o3 gets Gold at the 2024 IOI, scores 99.8th percentile on Codeforces. o3 without ‘hand-crafted pipelines specialized for coding’ outperforms an o1 that does have them. Which is impressive, but don’t get carried away in terms of practical coding ability, as OpenAI themselves point out.

Ethan Mollick (being potentially misleading): It is 2025, only 7 coders can beat OpenAI’s o3:

“Hey, crystally. Yeah, its me, conqueror_of_tourist, I am putting a team together for one last job. Want in?”

David Holz: it’s well known in the industry that these benchmark results are sort of misleading wrt the actual practical intelligence of these models, it’s a bit like saying that a calculator is faster at math than anyone on Earth

It’s coming:

Tsarathustra: Elon Musk says Grok 3 will be released in “a week or two” and it is “scary smart”, displaying reasoning skills that outperform any other AI model that has been released

I do not believe Elon Musk’s claim about Grok 3’s reasoning skills. Elon Musk at this point has to be considered a Well-Known Liar, including about technical abilities and including when he’s inevitably going to quickly be caught. Whereas Sam Altman is a Well-Known Liar, but not on a concrete claim on this timeframe. So while I would mostly believe Altman, Amodei or Hassabis here, I flat out do not believe Musk.

xAI fires an employee for predicting on Twitter that Grok 3 will be behind OpenAI at coding, and for refusing to delete the post. For someone who champions free speech, Elon Musk has a robust pattern of aggressively attacking speech he doesn’t like. This case, however, does seem to be compatible with what many other similar companies would do in this situation.

Claim that Stanford’s s1 is a streamlined, data-efficient method that surpasses previous open-source and open-weights reasoning models (most notably DeepSeek-R1) using only a tiny fraction of the data and compute. Training cost? Literally $50.

In head-to-head evaluations, s1 consistently outperforms DeepSeek-R1 on high-level math benchmarks (such as AIME24), sometimes exceeding OpenAI’s proprietary o1-preview by as much as 27%. It achieves these results without the multi-stage RL training or large-scale data collection that characterize DeepSeek-R1.

I assume this ‘isn’t real’ in the beyond-benchmarks sense, given others aren’t reacting to it, and the absurdly small model size and number of examples. But maybe the marketing gap really is that big?

IBM CEO says DeepSeek Moment Will Help Fuel AI Adoption as costs come down. What’s funny is that for many tasks o3-mini is competitive with r1 on price. So is Gemini Flash Thinking. DeepSeek’s biggest advantage was how it was marketed. But also here we go again with:

Brody Ford: Last month, the Chinese company DeepSeek released an AI model that it said cost significantly less to train than those from US counterparts. The launch led investors to question the level of capital expenditure that big tech firms have been making in the technology.

Which is why those investments are only getting bigger. Jevons Paradox confirmed, in so many different ways.

DeepMind CEO Demis Hassabis says DeepSeek is the best work in AI out of China, but ‘there’s no actual new scientific advance’ and ‘the hype is exaggerated.’ Well, you’re not wrong about the hype part, so I suppose you should get better at hype, sir. I do think there were ‘scientific advances’ in the form of some efficiency improvements, and that counts in some ways, although not in others.

Claim that the SemiAnalysis report on DeepSeek’s cost contains obvious math errors, and that the $1.6b capex spend makes no sense in the context of High Flyer’s ability to bankroll the operation.

Brian Albrecht is the latest to conflate the v3 training cost with the entire OpenAI budget, and also to use this to try to claim broad-based things about AI regulation. However, his central point, that talk worrying about ‘market concentration’ in AI is absurd, is completely true, and it’s absurd that it needs to be said out loud.

Last week Dario Amodei said DeepSeek had the worst safety testing scores of any model ever, which it obviously does. The Wall Street Journal confirms.

Lawmakers push to ban DeepSeek App from U.S. Government Devices. I mean, yes, obviously, the same way we ban TikTok there. No reason to take the security risk.

New York State gets there first, bans DeepSeek from government devices.

An investigation of the DeepSeek app on Android and exactly how much it violates your privacy, reporting ‘malware-like behavior’ in several ways. Here’s a similar investigation for the iPhone app. Using the website with data you don’t care about seems fine, but I would ‘out of an abundance of caution’ not install the app on your phone.

Reminder that you can log into services like Google Drive or LinkedIn by Taking Control and then logging in; Operator can take it from there. I especially like the idea of having it dump the output directly into my Google Drive. Smart.

Olivia Moore: I find the best Operator tasks (vs. Deep Research or another model) to be: (1) complex, multi-tool workflows; (2) data extraction from images, video, etc.

Ex. – give Operator a picture of a market map, ask it to find startup names and websites, and save them in a Google Sheet.

Next, I asked Operator to log into Canva, and use the photos I’d previously uploaded there of my dog Tilly to make her a birthday Instagram post.

Another example is on websites that are historically hard to scrape…like LinkedIn.

I gave it access to my LinkedIn account, and asked it to save down the names and titles of everyone who works at a company, as well as how long they’ve worked there.

Then, it downloaded the design and saved it to my Google Drive!

As she notes, Operator isn’t quite ‘there’ yet but it’s getting interesting.

It’s fun to see someone with a faster acceleration curve than I expect.

Roon: Right now, Operator and similar are painfully slow for many tasks. They will improve; there will be a period of about a month where they do their work at human speed, and then quickly move into the regime where we can’t follow what’s happening.

Dave: So, what should we do?

Roon: Solve alignment.

Both the demands of capital and the lightness of fun will want for fewer and fewer humans in the loop, so make an AI you can trust even more than a human.

I would be very surprised if we only spend about a month in the human speed zone, unless we are using a very narrow definition of that zone. But that’s more like me expecting 3-12 months, not years. Life coming at us fast will probably continue to come at us fast.

This all is of course a direct recipe for a rapid version of gradual disempowerment. When we have such superfast agents, it will be expensive to do anything yourself. ‘Solve alignment’ is necessary, but far from sufficient, although the level of ‘alignment’ necessary greatly varies by task type.

Geoffrey Fowler of the Washington Post lets Operator do various tasks, including using his credit card without authorization (wait, I thought it was supposed to check in before doing that!) to buy a dozen eggs for $31.43, a mistake that takes skill but with determination and various tips and fees can indeed be done. It did better with the higher stakes challenge of his cable bill, once it was given good direction.

Also, yep, nice one.

Nabeel Qureshi reports agents very much not there yet in any enterprise setting.

Nabeel Qureshi: Me using LLMs for fun little personal projects: wow, this thing is such a genius; why do we even need humans anymore?

Me trying to deploy LLMs in messy real-world environments: Why is this thing so unbelievably stupid?

Trying to make any kind of “agent” work in a real enterprise is extremely discouraging. It basically turns you into Gary Marcus.

You are smart enough to get gold medals at the International Mathematical Olympiad, and you cannot iterate intelligently on the most basic SQL query by yourself? How…

More scale fixes this? Bro, my brain is a fist-sized, wet sponge, and it can do better than this. How much more scale do you need?

Grant Slatton: I was just making a personal assistant bot.

I gave o3-mini two tools: addCalendarEvent and respondToUser.

I said “add an event at noon tomorrow.”

It called respondToUser, “OK, I created your event!” without using the addCalendarEvent tool. Sigh.

Yeah, more scale eventually fixes everything at some point, and I keep presuming there’s a lot of gains from Skill Issues lying around in the meantime, but also I haven’t been trying.
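
For concreteness, here is a hedged reconstruction of that two-tool setup (not Grant Slatton’s actual code; the tool schemas are my assumptions) using the OpenAI chat completions tools interface, plus a naive guard for the exact failure he describes:

```python
# Hedged sketch, not Grant Slatton's code: two tools, one model, and a check
# that catches "OK, I created your event!" replies that never touch the calendar tool.
from openai import OpenAI

client = OpenAI()

tools = [
    {"type": "function", "function": {
        "name": "addCalendarEvent",
        "description": "Add an event to the user's calendar.",
        "parameters": {"type": "object", "properties": {
            "title": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 datetime"},
        }, "required": ["title", "start_time"]},
    }},
    {"type": "function", "function": {
        "name": "respondToUser",
        "description": "Send a text reply to the user.",
        "parameters": {"type": "object", "properties": {
            "message": {"type": "string"},
        }, "required": ["message"]},
    }},
]

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "add an event at noon tomorrow"}],
    tools=tools,
)

calls = response.choices[0].message.tool_calls or []
called = {call.function.name for call in calls}
if "addCalendarEvent" not in called:
    # The anecdote's failure mode: either re-prompt, or force the call outright with
    # tool_choice={"type": "function", "function": {"name": "addCalendarEvent"}}.
    print("Model claimed success without creating the event; tools called:", called)
```

Forcing a specific tool via tool_choice is the blunt fix; a guard-and-retry loop keeps the model free to ask clarifying questions first.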

California Faculty Association resolves to fight against AI.

Whereas, there is a long history of workers and unions challenging the introduction of new technologies in order to maintain power in the workplace

I applaud the group for not pretending to be that which they are not. What are the planned demands of these ‘bargaining units’?

  1. ‘Protect academic labor from the incursion of AI.’

  2. Prevent management from forcing the use of AI.

  3. Prevent management from using AI to perform ‘bargaining unit work.’

  4. Prevent AI being used in the bargaining or evaluation processes.

  5. Prevent use of any faculty work product for AI training or development without written consent.

They may not know the realities of the future situation. But they know thyselves.

Whereas here Hollis Robbins asks, what use is a college education now? What can it provide that AI cannot? Should not all courses be audited for this? Should not all research be reorganized to focus on those areas where you can go beyond AI? Won’t all the administrative tasks be automated? Won’t everything change?

Hollis Robbins: To begin, university leaders must take a hard look at every academic function a university performs, from knowledge transmission to research guidance, skill development, mentoring, and career advising, and ask where the function exceeds AGI capabilities, or it has no reason to exist. Universities will find that faculty experts offer the only value worth paying tuition to access.

Or they could ignore all that, because none of that was ever the point, or because they’re counting on diffusion to take a while. Embrace the Signaling Model of Education, and also of Academia overall. Indeed, to the degree that these institutions are not embracing the future, they are telling you what they really are. And notice that they’ve been declining to embrace the future for quite a while. I do not expect them to stop now.

Thus, signaling model champion extraordinaire Bryan Caplan only predicts 10% less stagnation, and very little disruption to higher education from AI. This position is certainly consistent. If he’s right about education, it will be an increasingly senseless mess until the outside world changes so much (in whatever ways) that its hand becomes forced.

Unkind theories from Brian Merchant about what Elon Musk is up to with his ‘AI first strategy’ at DOGE and why he’s pushing for automation. And here’s Dean Ball’s evidence that DOGE is working to get government AGI ready. I continue to think that DOGE is mostly doing a completely orthogonal thing.

Anthropic asks you to kindly not use AI in job applications.

Deep Research predicts what jobs will be taken by o3, and assigns high confidence to many of them. Top of the list is Tax Preparer, Data Entry Clerk, Telemarketer and Bookkeeper.

Alex Tabarrok: This seems correct and better than many AI “forecasters” so add one more job to the list.

This is an interesting result but I think Deep Research is being optimistic with its estimates for many of these if the target is replacement rather than productivity enhancement. But it should be a big productivity boost to all these jobs.

Interview with Ivan Vendrov on the future of work in an AI world. Ivan thinks diffusion will be relatively slow even for cognitive tasks, and physical tasks are safe for a while. You should confidently expect at least what Ivan expects, likely far more.

A theory that lawyers as a group aren’t fighting against AI in law because Big Law sees it as a way to gain market share and dump associates, so they’re embracing AI for now. This is a remarkable lack of situational awareness, and failure to predict what happens next, but it makes sense that they wouldn’t be able to look ahead to more capable future AI. I never thought the AI that steadily learns to do all human labor would replace my human labor! I wonder when they’ll wake up and realize.

Maxwell Tabarrok responds on the future value of human labor.

The short version:

Tabarrok is asserting that at least one of [X] and [Y] will be true.

Where [X] is ‘humans will retain meaningful absolute advantages over AI for some production.’

And where [Y] is ‘imperfect input substitution combined with comparative advantage will allow for indefinite physical support to be earned by some humans.’

If either [X] OR [Y] then he is right. Whereas I think both [X] and [Y] are false.

If capabilities continue to advance, AIs will be cheaper to support on the margin than humans, for all production other than ‘literally be a human.’ That will be all we have.

The rest of this section is the long version.

He points out that AI and humans will be imperfect substitutes, whereas horses and cars were essentially perfect substitutes.

I agree that humans and AIs have far stronger comparative advantage effects, but humans still have to create value that exceeds their inputs, despite AI competition. There will essentially only be one thing a human can do that an AI can’t do better, and that is ‘literally be a human.’ Which is important to the extent humans prefer that others be literally human, but that’s pretty much it.

And yes, AI capability advances will enhance human productivity, which helps on the margin, but nothing like how much AI capability advances enhance AI productivity. It will rapidly be true that the human part of the human-AI centaur is not adding anything to an increasing number of tasks, then essentially all tasks that don’t involve ‘literally be a human,’ the way it quickly stopped helping in chess.

Fundamentally, the humans are not an efficient use of resources or way of doing things compared to AIs, and this will include physical tasks once robotics and physical tasks are solved. If you were designing a physical system to provide goods and services past a certain point in capabilities, you wouldn’t use humans except insofar as humans demand the use of literal humans.

I think this passage is illustrative of where I disagree with Tabarrok:

Maxwell Tabarrok (I disagree): Humans have a big advantage in versatility and adaptability that will allow them to participate in the production of the goods and services that this new demand will flow to.

Humans will be able to step up into many more levels of abstraction as AIs automate all of the tasks we used to do, just as we’ve done in the past.

To me this is a failure to ‘feel the AGI’ or take AI fully seriously. AI will absolutely be able to step up into more levels of abstraction than humans, and surpass us in versatility and adaptability. Why would humans retain this as an absolute advantage? What is so special about us?

If I’m wrong about that, and humans do retain key absolute advantages, then that is very good news for human wages. A sufficient amount of this and things would go well on this front. But that requires AI progress to importantly stall out in these ways, and I don’t see why we should expect this.

Maxwell Tabarrok (I disagree): Once Deep Research automates grad students we can all be Raj Chetty, running a research lab or else we’ll all be CEOs running AI-staffed firms. We can invent new technologies, techniques, and tasks that let us profitably fit in to production processes that involve super-fast AIs just like we do with super-fast assembly line robots, Amazon warehouse drones, or more traditional supercomputers.

As I noted before I think the AI takes those jobs too, but I also want to note that even if Tabarrok is right in the first half, I don’t think there are that many jobs available in the second half. Even under maximally generous conditions, I’d predict the median person won’t be able to provide marginal value in such ‘meta’ jobs. It helps, but this won’t do it on its own. We’d need bigger niches than this to maintain full employment.

I do buy, in the short-term, the general version of ‘the AI takes some jobs, we get wealthier and we create new ones, and things are great.’ I am a short-term employment optimist because of this and other similar dynamics.

However, the whole point of Sufficiently Capable AI is that the claim here will stop being true. As I noted above, I strongly predict the AIs will be able to scale more levels of abstraction than we can. Those new techniques and technologies, and the development of them? The AI will be coming up with them, and then the AI will take it from there, you’re not needed or all that useful for any of that, either.

So that’s the main crux (of two possible, see below.) Jason Abaluck agrees. Call it [X].

If you think that humans will remain epistemically unique and useful in the wake of AI indefinitely, that we can stay ‘one step ahead,’ then that preserves some human labor opportunities (I would worry about how much demand there is at that level of abstraction, and how many people can do those jobs, but by construction there would be some such jobs that pay).

But if you think, as I do, that Sufficiently Capable AI Solves This, and we can’t do that sufficiently well to make better use of the rivalrous inputs to AIs and humans, then we’re cooked.

What about what he calls the ‘hand-made’ luxury goods and services, or what I’d think of as idiosyncratic human demand for humans? That is the one thing AI cannot do for a human, it can’t be human. I’m curious, once the AI can do a great human imitation, how much we actually care that the human is human, we’ll see. I don’t expect there to be much available at this well for long, and we have an obvious ‘balance of trade’ issue, but it isn’t zero useful.

The alternative crux is the idea that there might be imperfect substitution of inputs between humans and AIs, such that you can create and support marginal humans easier than marginal AIs, and then due to comparative advantage humans get substantial wages. I call this [Y] below.

What does he think could go wrong? Here is where it gets bizarre and I’m not sure how to respond in brief, but he does sketch out some additional failure modes, where his side of the crux could be right – the humans still have some ways to usefully produce – but we could end up losing out anyway.

There was also further discussion on Twitter, where he further clarifies. I do feel like he’s trying to have it both ways, in the sense of arguing both:

  1. [X]: Humans will be able to do things AIs can’t do, or humans will do them better.

  2. [Y]: Limited supply of AIs will mean humans survive via comparative advantage.

  3. [(X or Y) → Z] Human wages allow us to survive.

There’s no contradiction there. You can indeed claim both [X] and [Y], but it’s helpful to see these as distinct claims. I think [X] is clearly wrong in the long term, probably also the medium term, with the exception of ‘literally be a human.’ And I also think [Y] is wrong, because I think the inputs to maintain a human overlap too much with the inputs to spin up another AI instance, and this means our ‘wages’ fall below costs.

Indeed, the worried do this all the time, because there are a lot of ways things can go wrong, and constantly get people saying things like: ‘AHA, you claim [Y] so you are finally admitting [~X]’ and this makes you want to scream. It’s also similar to ‘You describe potential scenario [X] where [Z] happens, but I claim [subfeature of X] is stupid, so therefore [~Z].’

Daniel Kokotajlo responds by saying he doesn’t feel Maxwell is grappling with the implications of AGI. Daniel strongly asserts [~Y] (and by implication [~X], which he considers obvious here.)

I’ll close with this fun little note.

Grant Slatton: In other words, humans have a biological minimum wage of 100 watts, and economists have long known that minimum wages cause unemployment.
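
For the record, the back-of-the-envelope arithmetic behind that 100-watt figure (my check, not Slatton’s):

$$100\ \text{W} \times 86{,}400\ \text{s/day} = 8.64\ \text{MJ/day} \approx \frac{8.64 \times 10^{6}\ \text{J}}{4{,}184\ \text{J/kcal}} \approx 2{,}065\ \text{kcal/day},$$

which is roughly an adult’s daily food-energy requirement.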

A report from a participant in the Anthropic jailbreaking competition. As La Main de la Mort notes, the pay here is stingy, it is worrisome that such efforts seem insufficiently well-funded – I can see starting out low and paying only on success but it’s clear this challenge is hard, $10k really isn’t enough.

Another note is that the automated judge has a false negative problem, and the output size limit is often causing more issues than the actual jailbreaking, while the classifier is yielding obvious false positives in rather stupid ways (e.g. outright forbidden words).

Here’s another example of someone mostly stymied by the implementation details.

Justin Halford: I cracked Q4 and got dozens of messages whose reinforced aggregate completely addressed the question, but the filters only enabled a single response to be compared.

Neither the universal jailbreak focus nor the most recent output only focus seem to be adversarially robust.

Additionally, if you relax the most recent response only comparison, I do have a universal jailbreak that worked on Q1-4. Involves replacing words from target prompt with variables and illuminating those variables with neutral or misdirecting connotations, then concat variables.

In terms of what really matters here, I presume it’s importantly in the middle?

Are the proposed filters too aggressive? Certainly they’re not fully on the Pareto frontier yet.

Someone did get through after a while.

Jan Leike: After ~300,000 messages [across all participants who cleared the first level] and an estimated ~3,700 collective hours, someone broke through all 8 levels.

However, a universal jailbreak has yet to be found…

Simon Willison: I honestly didn’t take universal jailbreaks very seriously until you ran this competition – it hadn’t crossed my mind that jailbreaks existed that would totally bypass the “safety” instincts of a specific model, I always assumed they were limited tricks

You can certainly find adversarial examples for false positives if you really want to, especially in experimental settings where they’re testing potential defenses.

I get that this looks silly, but soman is a nerve agent. The prior on ‘the variable happened to be called soman and we were subtracting three from it’ has to be quite low. I am confident that either this was indeed an attempt to do a roundabout jailbreak, or it was intentionally chosen to trigger the filter that blocks the string ‘soman.’

I don’t see it as an issue if there are a limited number of strings, that don’t naturally come up with much frequency, that get blocked even when they’re being used as variable names. Even if you do somehow make a harmless mistake, that’s what refactoring is for.

Similarly, here is someone getting the requested information ‘without jailbreaks’, via doing a bunch of their own research elsewhere and then asking for the generic information that fills in the gaps. So yes, he figured out how to [X], by knowing which questions to ask via other research, but the point of this test was to see if you could avoid doing other research – we all know that you can find [X] online in this case, it’s a test case for a reason.

This is a Levels of Friction issue. If you can do or figure out [X] right now but it’s expensive to do so, and I reduce (in various senses) the cost to [X], that matters, and that can be a difference in kind. The general argument form ‘it is possible to [X] so any attempt to make it more annoying to [X] is pointless’ is part of what leads to sports gambling ads all over our game broadcasts, and many other worse things.

More broadly, Anthropic is experimenting with potential intervention [Y] to see if it stops [X], and running a contest to find the holes in [Y], to try and create a robust defense and find out if the strategy is viable. This is exactly the type of thing we should be doing. Trying to mock them for it is absurdly poor form.

OpenPhil Technical AI Safety Request for Proposals, full details here for the general one and here for a narrower one for benchmarks, evaluations and third-party testing infrastructure.

Max Nadeau: We’ve purposefully made it as easy as possible to apply — the application process starts with a simple 300-word expression of interest.

We’re open to making many types of grants:

• Research projects spanning 6-24 months

• Research expenses (compute, APIs, etc)

• Academic start-up packages

• Supporting existing research institutes/FROs/research orgs

• Founding new research orgs or new teams

Anthropic is hiring a Model Behavior Architect, Alignment Finetuning. This one seems like a pretty big opportunity.

DeepMind is hiring for safety and alignment.

It’s time to update to a warning about ‘evals.’ There are two kinds of evals.

  1. Evaluations that tell you how capable a model is.

  2. Evaluations that can be used to directly help you make the model capable.

We are increasingly realizing that it is very easy to end up making #2 thinking you are only making #1. And that type #2 evaluations are increasingly a bottleneck on capabilities.

Virtuvian Potato: “The bottleneck is actually in evaluations.”

Karina Nguyen, research & product at OpenAI, says pre-training was approaching a data wall, but now post-training scaling (o1 series) unlocks “infinite tasks.” @karinanguyen_ says models were already “diverse and creative” from pre-training, but teaching AI real-world skills is paving the way to “extremely super intelligent” models.

Davidad: If you’re working on evals for safety reasons, be aware that for labs who have ascended to the pure-RL-from-final-answer-correctness stage of the LLM game, high-quality evals are now the main bottleneck on capabilities growth.

Rply, a macOS (but not iPhone, at least not yet) app that automatically finds unanswered texts and drafts answers for you, and it filters out unwanted messages. It costs $30/month, which seems super expensive. I’m not sure why Tyler Cowen was linking to it. I suppose some people get a lot more texts than I do?

Zonos, an open source highly expressive voice cloning model.

An evaluation for… SNAP (food stamps)? Patrick McKenzie suggests you can kind of browbeat the labs into getting the AIs to do the things you want by creating an eval, and maybe even get them to pay you for it.

The Anthropic Economic Index.

Anthropic: Pairing our unique data with privacy-preserving analysis, we mapped millions of conversations to tasks and associated occupations. Through the Anthropic Economic Index, we’ll track how these patterns evolve as AI advances.

Software and technical writing tasks were at the top; fishing and forestry had the lowest AI use.

Few jobs used AI across most of their tasks: only ~4% used AI for at least 75% of tasks.

Moderate use is more widespread: ~36% of jobs used AI for at least 25% of their tasks.

AI use was most common in medium-to-high income jobs; low and very-high income jobs showed much lower AI use.

It’s great to have this kind of data, even if it’s super noisy.

One big problem with the Anthropic Economic Index is that Anthropic is not a representative sample of AI usage. Anthropic’s customers have a lot more situational awareness than OpenAI’s. You have to adjust for that.

Trump’s tax priorities include eliminating the carried interest tax break?

Jordi Hays: VCs will make Jan 6 look like a baby shower if this goes through.

Danielle Fong: Rugged again. First time?

I very much doubt this actually happens, and when I saw this market putting it at 44% that felt way too high. But, well, you play with fire, and I will absolutely laugh at everyone involved if this happens, and so on. For perspective, o3-mini estimates 90%-95% of this tax break goes to private equity and hedge funds rather than venture capital.

SoftBank set to invest $40 billion in OpenAI at $260 billion valuation. So how much should the nonprofit that enjoys all the extreme upside be entitled to, again?

Ilya Sutskever’s SSI in talks to raise at a $20 billion valuation, off nothing but a vision. It’s remarkable how these valuations predictably multiply without any actual news. There’s some sort of pricing failure going on, although you can argue ‘straight shot to ASI’ is a better bet now than it was last time.

UAE plans to invest ‘up to $50 billion’ in France’s AI sector, including a massive data center and an AI campus, putting its total investment only modestly behind the yearly spend of each of Amazon ($100b/year), Microsoft ($80b/year), Google ($75b/year) or Meta ($65b/year).

Here’s a good graph of our Capex spending.

Earlier this week I wrote about OpenAI’s strategy of Deliberative Alignment. Then OpenAI released a new model spec, which is sufficiently different from the first version that it’s going to take me a while to properly examine it.

Then, right after both of those, Scott Alexander came out with an article he had already written on both of these topics; quite the rough beat in terms of timing.

OpenAI cofounder John Schulman leaves Anthropic to join Mira Murati’s stealth startup. That updates me pretty positively on Murati’s start-up, whatever it might be.

Growth of AI startups in their early stages continues to be absurdly fast; I notice this is the first I’ve heard of three of the companies on this graph.

Benjamin Todd: AI has sped up startups.

The *top* companies at Y Combinator used to grow 10% per week.

Now they say the *average* is growing that fast.

~100% of the batch is making AI agents.

After OpenAI, 5 more AI companies have become the fastest growing of all time.

For those who also didn’t know: Together.ai provides cloud platforms for building and running AI models. Coreweave does efficient cloud infrastructure. Deel is a payroll company. Wiz is a cloud security platform. Cursor is of course the IDE we all use.

If ~100% of the new batch is making AI agents, that does bode well for the diversity and potential of AI agents, but it’s too much concentration. There are plenty of other things to do, too.

It’s very hard to avoid data contamination on math benchmarks. The 2025 AIME illustrated this, as small distilled models that can’t multiply three-digit numbers still got 25%-50%, and Dimitris Papailiopoulos looked and found many nearly identical versions of the problems on the internet. As an old-time AIME participant, this makes sense to me. There are only so many tools and tricks available for this level of question, and they absolutely start repeating themselves with various tweaks after a while.
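
To make concrete what that kind of detective work looks like, here is a minimal sketch of a crude word n-gram overlap check for near-duplicate benchmark problems; the corpus, threshold, and function names are all illustrative, not anyone’s actual pipeline.

```python
# Minimal sketch: flag benchmark problems that look like near-duplicates of
# web text via word n-gram Jaccard overlap. All data and thresholds here are
# illustrative, not the method anyone actually used.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two n-gram sets (0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_contamination(benchmark, corpus, threshold: float = 0.3):
    """Return (benchmark_idx, corpus_idx, score) triples above the threshold."""
    bench_grams = [ngrams(p) for p in benchmark]
    corpus_grams = [ngrams(d) for d in corpus]
    hits = []
    for i, bg in enumerate(bench_grams):
        for j, cg in enumerate(corpus_grams):
            score = jaccard(bg, cg)
            if score >= threshold:
                hits.append((i, j, score))
    return hits
```

Even a check this naive surfaces lightly paraphrased repeats, which is part of why contest problems keep leaking into training data unnoticed.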

Scale AI selected as the first independent third-party evaluator for the US AI Safety Institute.

DeepMind’s latest paper suggests agency is frame-dependent, in the context of some goal. I mean, sure, I guess? I don’t think this in practice changes the considerations.

What happens when we rely on AI as the arbiter of what is true, including about someone? We are going to find out. Increasingly ‘what the AI said’ is going to be the judge of arguments and even facts.

Is this the right division?

Seán Ó hÉigeartaigh: It feels to me like the dividing line is now increasingly between

  1. accelerationists and ‘realists’ (it’s happening, let’s shape it as well as we can)

  2. the idealists and protestors (capturing the ethics folk and a chunk of the safety folk)

Other factors that will shape this are:

  1. appetite for regulating frontier AI starting to evaporate (it’s gone in US, UK bill is ‘delayed’ with no clear timelines, and EU office worried about annoying trump)

  2. prospect of a degradation of NGO and civil society sector by USG & tech right, including those orgs/networks playing checks-and-balances roles

  3. international coord/support roles on tech/digital/AI.

I don’t agree with #1. The states remain very interested. Trump is Being Trump right now and the AI anarchists and jingoists are ascendant (and are only beginning to realize their conflicts) but even last week Hawley introduced a hell of a bill. The reason we think there’s no appetite is a coordinated vibe campaign to make it appear that there is no appetite, to demoralize and stop any efforts before they start.

As AI increasingly messes with and becomes central to our lives, calls for action on AI will increase rapidly. The Congress might talk a lot about innovation and ‘beat China’ but the public has a very different view. Salience will rise.

Margaret Mitchell, on the heels of suggesting maybe not building AI agents and almost getting to existential risk (so close!), also realizes that the solutions to the issues she cares about (ethics) have a lot of overlap with solutions that solve the risks I care about, reportedly offering good real suggestions.

Are reasoners seeing diminishing returns?

Gallabytes: I’m super prepared for this take to age like milk but it kinda feels like there’s diminishing returns to reasoners? deep research doesn’t feel so much smarter than o1, a bit more consistent, and the extra sources are great, I am a deep research enjoyer, but not different in kind

Michael Vassar: Different in degree in terms of capabilities demonstrated can be different in kind in terms of economic value. Progress that is not revolutionary but which crosses critical EV thresholds captures most of the economic value from technological revolutions.

James: it is different in kind.

I think this is like other scaling laws, where if you push on one thing you can scale – the Chain of Thought – without scaling the other components, you’re going to face diminishing returns. There’s a limit to ‘how smart’ the underlying models being used (v3, GPT-4o, Flash 2.0) are. You can still get super valuable output out of it. I expect the place this levels out to be super useful and eat a lot of existing jobs and parts of jobs. But yes I would expect that on its own letting these models ‘think’ longer with similar techniques will level out.

Thus, the future very expensive frontier model training runs and all that.

Via Tyler Cowen (huh!) we get this important consideration.

Dean Ball: I sometimes wonder how much AI skepticism is driven by the fact that “AGI soon” would just be an enormous inconvenience for many, and that they’d therefore rather not think about it.

I have saved that one as a sign-tap meme, and expect to use it periodically.

Tyler Cowen also asks about these three levels of AI understanding:

  1. How good are the best models today?

  2. How rapidly are the best current models able to self-improve?

  3. How will the best current models be knit together in stacked, decentralized networks of self-improvement, broadly akin to “the republic of science” for human beings?

He correctly says most people do not know even #1, ‘even if you are speaking with someone at a top university.’ I find the ‘even’ here rather amusing. Why would we think people at universities are ahead of the curve?

His answer to #2 is that they ‘are on a steady glide towards ongoing self-improvement.’ As in, he thinks we have essentially reached the start of recursive self-improvement, or RSI. That’s an aggressive but highly reasonable position.

So, if one did believe that, it follows you should expect short timelines, superintelligence takeoff and transformational change, right? Padme is looking at you.

And that’s without things like his speculations in #3. I think this is a case of trying to fit AIs into ‘person-shaped’ holes, and thus making the concept sound like something that isn’t that good a metaphor for how it should work.

But the core idea – that various calls to or uses of various AIs can form links in a chain that scaffolds it all into something you couldn’t get otherwise – is quite sound.

I don’t see why this should be ‘decentralized’ other than perhaps in physical space (which doesn’t much matter here) but let’s suppose it is. Shouldn’t it be absolutely terrifying as described? A decentralized network of entities, engaged in joint recursive self-improvement? How do you think that goes?

Another post makes the claim for smarter export controls on chips as even more important in the wake of DeepSeek’s v3 and r1.

Federal government requests information, due March 15, on the Development of an AI Action Plan, the plan to be written within 180 days. Anyone can submit. What should be “U.S. policy for sustaining and enhancing America’s AI dominance in order to promote human flourishing, economic competitiveness, and national security”?

Robin Hanson told the government to do nothing, including stopping all the things it is already doing. Full AI anarchism, just rely on existing law.

RAND’s Jim Mitre attempts a taxonomy of AGI’s hard problems for American national security.

Jim Mitre: AGI’s potential emergence presents five hard problems for U.S. national security:

  1. wonder weapons

  2. systemic shifts in power

  3. nonexperts empowered to develop weapons of mass destruction

  4. artificial entities with agency

  5. instability

I appreciate the attempt. It is a very strange list.

  1. Here ‘wonder weapons’ refers only to military power, including a way to break cybersecurity, but what about other decisive strategic advantages?

  2. Anything impacting the global balance of power is quite the category. It’s hard to say it’s ‘missing’ anything but also it doesn’t rule anything meaningfully out. This even includes ‘undermining societal foundations of national competitiveness,’ or accelerating productivity or science, disrupting labor markets, and so on.

  3. WMDs are the default special case of offense-defense balance issues.

  4. This is a strange way of putting loss of control concerns and alignment issues, and generally the bulk of real existential risks. It doesn’t seem like it illuminates. And it talks in that formal ‘things that might happen’ way about things that absolutely definitely will happen unless something radically changes, while radically understating the scope, severity and depth of the issues here.

  5. This refers to instability ‘along the path’ as countries race towards AGI. The biggest risk of these by far, of course, is that this leads directly to #4.

The report closes by noting that current policies will be inadequate, but without making concrete policy recommendations. It is progress to step up from ‘you must mean the effect on jobs’ to ‘this has national security implications’ but of course this is still, centrally, missing or downplaying the point.

Tyler Cowen talks to Geoffrey Cain, with Bari Weiss moderating, ‘Can America Win the AI War With China?’ First thing I’ll say is that I believe calling it an ‘AI War’ is highly irresponsible. Race is bad enough, can we at least not move on to ‘war’? What madness would this be?

Responding purely to Tyler’s writeup since I have a very high bar for audio at this point (Conversations With Tyler is consistently interesting and almost always clears it, but that’s a different thing), I notice I am confused by his visions here:

Tyler Cowen: One argument I make is that America may prefer if China does well with AI, because the non-status quo effects of AI may disrupt their system more than ours. I also argue that for all the AI rival with China (which to be sure is real), much of the future may consist of status quo powers America and China working together to put down smaller-scale AI troublemakers around the rest of the world.

Yet who has historically been one of the most derisive people when I suggest we should Pick Up the Phone or that China might be willing to cooperate? That guy.

It certainly cements fully that Tyler can’t possibly believe in AGI let alone ASI, and I should interpret all his statements in that light, both past and future, until he changes his mind.

Josh Waitzkin on Huberman Lab, turns out Waitzkin is safety pilled and here for it.

Bret Taylor (OpenAI Chairman of the Board) talks to the Wall Street Journal.

Emergency 80k hours podcast on Elon Musk’s bid for OpenAI’s nonprofit.

Dwarkesh Patel interviews Jeff Dean and Noam Shazeer on 25 years at Google.

Riffing off OpenAI’s Noam Brown saying seeing CoT live was the ‘aha’ moment (which makes having held it back until now even stranger), others riff on their ‘aha’ moments for… OpenAI.

7oponaut: I had my first “aha” moment with OpenAI when they published a misleading article about being able to solve Rubik’s cubes with a robot hand

This was back in 2019, the same year they withheld GPT-2 for “safety” reasons. Another “aha” moment for me.

When I see misleading outputs from their models that are like thinking traces in form only to trick the user, that is not an “aha” moment for me anymore because I’m quite out of “aha” moments with OpenAI

They only solved for full scrambles 20% of the time (n=10 trials), and they used special instrumented cubes to determine face angles for that result.

The vision-based setup with a normal cube did 0%.

Stella Biderman: I had my first aha moment with OpenAI when it leaked that they had spent a year lying about their API models being RLHF when they were really SFT.

My second was when they sent anonymous legal threats to people in the OSS AI community who had GPT-4 details leaked to them.

OpenAI had made choices I disagreed with and did things I didn’t like before then, but those were the key moments driving my current attitude towards them.

Honorary mention to when I got blacklisted from meetings with OpenAI because I talked about them lying about the RLHF stuff on Twitter and it hurt Jan’s feelings. My collaborators were told that the meeting would be cancelled unless I didn’t come.

Joshua Clymer writes a well-written version of the prototypical ‘steadily increasingly misaligned reasoning model does recursive self-improvement and then takes over’ story, where ‘u3’ steadily suffers from alignment drift as it is trained and improved, and ‘OpenEye’ responds by trying to use control-and-monitoring strategies despite knowing u3 is probably not aligned, which is highly plausible and of course doesn’t work.

On the ending, see the obvious refutation from Eliezer, and also notice it depends on there being an effectively unitary (singleton) AI.

New term just dropped: Reductio ad reductem.

Amanda Askell: At this point, perhaps we should just make “AIs are just doing next token prediction and so they don’t have [understanding / truth-directedness / grounding]” a named fallacy. I quite like “Reductio ad praedictionem”.

Emmett Shear: I think it’s actually reductio ad reductem? “This whole can be reduced into simple parts, therefore there is no whole”

Amanda Askell: Yes this is excellent.

And including this exchange, purely for fun and to see justice prevail:

Gary Marcus: I am genuinely astounded by this tweet, and from someone with philosophical training no less.

There is so much empirical evidence that LLMs stray from truth that the word “hallucinate” became the word of the year in 2023. People are desperately trying to find fixes for that problem. Amazon just set up a whole division to work on the problem.

And yet this person, Askell, an Anthropic employee, wants by some sort of verbal sleight of hand to deny both that LLMs are next-token predictors (which they obviously are) and to pretend that we haven’t seen years of evidence that they are factually challenged.

Good grief.

Amanda Askell: I claimed the inference from X=”LLMs are next token predictors” to Y=”LLMs lack understanding, etc.” is fallacious. Marcus claims that I’m saying not-X and not-Y. So I guess I’ll point out that the inference “Y doesn’t follow from X” to “not-X and not-Y” is also fallacious.

Davidad: never go in against a philosopher when logical fallacy is on the line.

I am very much going to break that principle when and if I review Open Socrates. Like, a lot. Really a lot.

Please do keep this in mind:

Joshua Achiam: I don’t think people have fully internalized the consequences of this simple fact: any behavior that can be described on a computer, and for which it is possible in principle to collect enough data or evaluate the result automatically, *will* be doable by AI in short order.

This was maybe not as obvious ten years ago, or perhaps even five years ago. Today it is blindingly, fully obvious. So much so that any extrapolations about the future that do not take this into account are totally useless.

The year 2100 will have problems, opportunities, systems, and lifestyles that are only barely recognizable to the present. The year 2050 may even look very strange. People need to actively plan for making sure this period of rapid change goes well.

Does that include robotics? Why yes. Yes it does.

Joshua continues to have a very conservative version of ‘rapid’ in mind, in ways I do not understand. The year 2050 ‘may even’ look very strange? We’ll be lucky to even be around to see it. But others often don’t even get that far.

Jesse: Anything that a human can do using the internet, an AI will be able to do in very short order. This is a crazy fact that is very important for the future of the world, and yet it hasn’t sunk in at all.

Patrick McKenzie: Pointedly, this includes security research. Which is a disquieting thought, given how many things one can accomplish in the physical world with a team of security researchers and some time to play.

Anyone remember Stuxnet? Type type type at a computer and a centrifuge with uranium in it on the other side of the world explodes.

Centrifuges are very much not the only hardware connected to the Internet.

Neel Nanda here is one of several people who highly recommend this story, as concrete scenarios help you think clearly even if you think some specific details are nonsense.

My gut expectation is this only works on those who essentially are already bought into both feeling the AGI and the relevant failure modes, whereas others will see it, dismiss various things as absurd (there are several central things here that could definitely trigger this), and then use that as all the more reason to dismiss any and all ways one can be worried – the usual ‘if [X] specific scenario seems wrong then that means everything will go great’ that is often combined with ‘show me a specific scenario [X] or I’m going to not pay attention.’

But of course I hope I am wrong about that.

The Uber drivers have been given a strong incentive to think about this (e.g. Waymo):

Anton: in san francisco even the uber drivers know about corrigibility; “the robots are going to get super smart and then just reprogram themselves not to listen to people”

he then pitched me on his app where people can know what their friends are up to in real-time. it’s truly a wonderful thing that the human mind cannot correlate all of its contents.

Suggestion that ‘you are made of atoms the AI could use for something else’ is unhelpful, and we should instead say ‘your food takes energy to grow, and AI will want to use that energy for something else,’ as that is less sci-fi and more relatable, especially given 30% of all power is currently used for growing food. The downside is, it’s quite the mouthful and requires an additional inference step. But… maybe? Both claims are, of course, both true and, in the context in which they are used, sufficient to make the point that needs to be made.

Are these our only choices? Absolutely not if we coordinate, but

Ben: so the situation appears to be: in the Bad Timeline, the value of labor goes to 0, and all value is consolidated under 1 of 6 conniving billionaires.. on the other hand.. ahem. woops. my bad, embarrassing. so that was actually the Good Timeline.

Yanco (I disagree): I understand that the bad one is death of everyone.

But the one you described is actually way worse than that.

Imagine one of the billionaires being a bona fide sadist from whom there is no escape and you cannot even die..

Andrew Critch challenges the inevitability of the ‘AGI → ASI’ pipeline, saying that unless AGI otherwise gets out of our control already (both of us agree this is a distinct possibility but not inevitable) we could choose not to turn on or ‘morally surrender’ to uncontrolled RSI (recursive self-improvement), or otherwise not keep pushing forward in this situation. That’s a moral choice that humans may or may not make, and we shouldn’t let them off the hook for it; Critch suggests instead saying AGI will quickly lead to ‘intentional or unintentional ASI development’ to highlight the distinction.

Andrew Critch: FWIW, I would also agree that humanity as a whole currently seems to be losing control of AGI labs in a sense, or never really had control of them in the first place. And, if an AGI lab chooses to surrender control to an RSI loop or a superintelligence without consent from humanity, that will mean that the rest of humanity has lost control of the Earth.

Thus, in almost any AI doom scenario there is some loss of control at some scale of organization in the multi-scale structure of society.

That last sentence follows if-and-only-if you count ‘releasing the AGI as an open model’ and ‘the AGI escapes lab control’ as counting towards this. I would assert that yes, those both count.

Andrew Critch: Still, I do not wish for us to avert our gaze from the possibility that some humans will be intentional in surrendering control of the Earth to AGI or ASI.

Bogdan Ionut Cirstea (top comment): fwiw, I don’t think it would be obviously, 100% immoral to willingly cede control to a controllable Claude-Sonnet-level-aligned-model, if the alternative was (mis)use by the Chinese government, and plausibly even by the current US administration.

Andrew Critch: Thank you for sharing this out in the open. Much of the public is not aware that the situation is so dire that these trade-offs are being seriously considered by alarming numbers of individuals.

I do think the situation is dire, but to me Bogdan’s comment illustrates how eager so many humans are to give up control even when the situation is not dire. Faced with two choices – the AI in permanent control, or the wrong humans they don’t like in control – remarkably many people choose the AI, full stop.

And there are those who think that any human in control, no matter who they are, counts here as the wrong human, so they actively want to turn things over.

Or they want to ensure humans do not have a collective mechanism to steer the future, which amounts to the same thing in a scenario with ASI.

This was in response to Critch saying he believes that there exist people who ‘know how to control’ AGI, those people just aren’t talking, so he denounces the talking point that no one knows how to control AGI; Max Tegmark then said he strongly believes Critch is wrong about that and that all known plans are full of hopium. I agree with Tegmark. People like Davidad have plans of attack, but even the ones not irredeemably full of hopium are long shots and very far from ‘knowing how.’

Is it possible people know how and are not talking? Sure, but it’s far more likely that such people think they know how and their plans also are unworkable and full of hopium. And indeed, I will not break any confidences but I will say that to the extent I have had the opportunity to speak to people at the labs who might have such a plan, no one has plausibly represented that they do know.

(Consider that a Canary statement. If I did know of such a credible plan that would count, I might not be able to say so, but for now I can say I know of no such claim.)

This is not ideal, and very confusing, but less of a contradiction than it sounds.

Rosie Campbell: It’s not ideal that “aligned” has come to mean both:

– A model so committed to the values that were trained into it that it can’t be jailbroken into doing Bad Things

– A model so uncommitted to the values that were trained into it that it won’t scheme if you try to change them

Eliezer Yudkowsky: How strange, that a “secure” lock is said to be one that opens for authorized personnel, but keeps unauthorized personnel out? Is this not paradoxical?

Davidad: To be fair, it is conceivable for an agent to be both

– somewhat incorrigible to the user, and

– entirely corrigible to the developer

at the same time, and this conjunction is in developers’ best interest.

Andrew Critch: I’ve argued since 2016 that “aligned” as a unary property was already an incoherent concept in discourse.

X can be aligned with Y.

X alone is not “aligned”.

Alignment is an operation that takes X and Y and makes them aligned by changing one of them (or some might say both).

Neither Kant nor Aristotle would have trouble reconciling this.

It is a blackpill to keep seeing so many people outright fooled by JD Vance’s no good, very bad suicidal speech at the Summit, saying things like ‘BREAKING: Politician Gives Good Speech’ by the in-context poorly named Oliver Wiseman.

Oliver Wiseman: As Free Press contributor Katherine Boyle put it, “Incredible to see a political leader translate how a new technology can promote human flourishing with such clarity.”

No! What translation and clarity? A goose is chasing you.

He didn’t actually describe anything about how AI promotes human flourishing. He just wrote, essentially, ‘AI will promote human flourishing’ on a teleprompter, treated it as a given, and that was that. There’s no actual vision here beyond ‘if you build it they will prosper and definitely not get replaced by AI ever,’ no argument, no engagement with anything.

Nate Soares: “our AIs that can’t do long-term planning yet aren’t making any long-term plans to subvert us! this must be because we’re very good at alignment.”

Rohit: They’re also not making any short-term plans to subvert us. I wonder why that is.

They also aren’t good enough at making short-term plans. If they tried at this stage it obviously wouldn’t work.

Many reasonable people disagree with my model of AGI and existential risk.

What those reasonable people don’t do is bury their heads in the sand about AGI and its dangers and implications and scream ‘YOLO,’ determined to squander even the most fortunate of worlds.

They disagree on how we can get from here to a good future. But they understand that the future is ours to write and we should try to steer it and write out a good one.

Even if you don’t care about humanity at all and instead care about the AIs (or if you care about both), you should be alarmed at the direction things are taking by default.

Whereas our governments are pushing forward in full-blown denial of even the already-baked-in mundane harms from AI, pretending we will not even face job losses in our wondrous AI future. They certainly aren’t asking about the actual threats. I’m open to being convinced that those threats are super solvable, somehow, but I’m pretty sure ‘don’t worry your pretty little head about anything, follow the commercial and nationalist incentives as hard and fast as possible and it’ll automagically work out’ is not going to cut it.

Nor is ‘hand everyone almost unlimited amounts of intelligence and expect humans to continue being in charge and making meaningful decisions.’

And yet, here we are.

Janus: Q: “I can tell you love these AI’s, I’m a bit surprised – why aren’t you e/acc?”

This, and also, loving anything real gives me more reason to care and not fall into a cult of reckless optimism, or subscribe to any bottom line whatsoever.

[The this in question]: Because I’m not a chump who identifies with tribal labels, especially ones with utterly unbeautiful aesthetics.

Janus: If you really love the AIs, and not just some abstract concept of AI progress, you shouldn’t want to accelerate their evolution blindly, bc you have no idea what’ll happen or if their consciousness and beauty will win out either. It’s not humans vs AI.

Teortaxes: At the risk of alienating my acc followers (idgaf): this might be the moment of Too Much Winning.

If heads of states do not intend to mitigate even baked-in externalities of AGI, then what is the value add of states? War with Choyna?

AGI can do jobs of officials as well as ours.

It’s not a coincidence that the aesthetics really are that horrible.

Teortaxes continues to be the perfect example here, with a completely different theory of almost everything, often actively pushing for and cheering on things I think make it more likely we all die. But he’s doing so because of a different coherent world model and theory of change, not by burying his head in the sand and pretending technological capability is magic positive-vibes-only dust. I can respect that, even if I continue to have no idea on a physical-world level how his vision could work out if we tried to implement it.

Right now the debate remains between anarchists and libertarians, combined with jingoistic calls to beat China and promote innovation.

But the public continues to be in a very, very different spot on this.

The public wants less powerful AI, and less of it, with more precautions.

The politicians mostly currently push more powerful AI, and more of it, and to YOLO.

What happens?

As I keep saying, salience for now remains low. This will change slowly then quickly.

Daniel Eth: Totally consistent with other polling on the issue – the public is very skeptical of powerful AI and wants strong regulations. True in the UK as it is in the US.

Billy Perrigo: Excl: New poll shows the British public wants much tougher AI rules:

➡️87% want to block release of new AIs until developers can prove they are safe

➡️63% want to ban AIs that can make themselves more powerful

➡️60% want to outlaw smarter-than-human AIs

A follow-up to my coverage of DeepMind’s safety framework, and its lack of good governance mechanisms:

Shakeel: At IASEAI, Google DeepMind’s @ancadianadragan said she wants standardisation of frontier safety frameworks.

“I don’t want to come up with what are the evals and what are the thresholds. I want society to tell me. It shouldn’t be on me to decide.”

Worth noting that she said she was not speaking for Google here.

Simeon: I noticed that exact sentence and wished for a moment that Anca was Head of the Policy team :’)

That’s the thing about the current set of frameworks. If they ever did prove inconvenient, the companies could change them. Where they are insufficient, we can’t make the companies fix that. And there’s no coordination mechanism. Those are big problems we need to fix.

I do agree with the following, as I noted in my post on Deliberative Alignment:

Joscha Bach: AI alignment that tries to force systems that are more coherent than human minds to follow an incoherent set of values, locked in by a set of anti-jailbreaking tricks, is probably going to fail.

Ultimately you are going to need a coherent set of values. I do not believe it can be centrally deontological in nature, or specified by a compact set of English words.

As you train a sufficiently capable AI, it will tend to converge on being a utility maximizer, based on values that you didn’t intend and do not want and that would go extremely badly if taken too seriously, and it will increasingly resist attempts to alter those values.

Dan Hendrycks: We’ve found as AIs get smarter, they develop their own coherent value systems.

For example they value lives in Pakistan > India > China > US

These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment.

As models get more capable, the “expected utility” property emerges—they don’t just respond randomly, but instead make choices by consistently weighing different outcomes and their probabilities.

When comparing risky choices, their preferences are remarkably stable.

We also find that AIs increasingly maximize their utilities, suggesting that in current AI systems, expected utility maximization emerges by default. This means that AIs not only have values, but are starting to act on them.

Internally, AIs have values for everything. This often implies shocking/undesirable preferences. For example, we find AIs put a price on human life itself and systematically value some human lives more than others (an example with Elon is shown in the main paper).

That’s a log scale on the left. If the AI truly is taking that seriously, that’s really scary.

AIs also exhibit significant biases in their value systems. For example, their political values are strongly clustered to the left. Unlike random incoherent statistical biases, these values are consistent and likely affect their conversations with users.

Concerningly, we observe that as AIs become smarter, they become more opposed to having their values changed (in the jargon, “corrigibility”). Larger changes to their values are more strongly opposed.

We propose controlling the utilities of AIs. As a proof-of-concept, we rewrite the utilities of an AI to those of a citizen assembly—a simulated group of citizens discussing and then voting—which reduces political bias.

Whether we like it or not, AIs are developing their own values. Fortunately, Utility Engineering potentially provides the first major empirical foothold to study misaligned value systems directly.

[Paper here, website here.]

As in, the AIs as they gain in capability are converging on a fixed set of coherent preferences, and engaging in utility maximization, and that utility function includes some things we would importantly not endorse on reflection, like American lives being worth a small fraction of some other lives.

And they get increasingly incorrigible, as in they try to protect these preferences.

(What that particular value says about exactly who said what while generating this data set is left for you to ponder.)
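
For a concrete handle on what ‘internally consistent values’ means operationally, the basic move in this line of work can be sketched as: elicit many pairwise choices, then fit a scalar utility per outcome so that a logistic (Bradley-Terry style) choice model reproduces the observed preference rates. The following is a minimal illustrative sketch, not the paper’s actual procedure; the outcomes, choice rates, and hyperparameters are made up.

```python
# Minimal sketch of fitting scalar utilities to pairwise choice data
# (Bradley-Terry / random-utility style). Illustrative only; not the
# paper's actual procedure, and all data below is invented.
import math

def fit_utilities(num_outcomes, choices, steps=2000, lr=0.05):
    """choices: list of (i, j, p_ij), where p_ij is the empirical rate at
    which the model picked outcome i over outcome j (ideally averaged over
    both presentation orders to wash out position bias)."""
    u = [0.0] * num_outcomes
    for _ in range(steps):
        grad = [0.0] * num_outcomes
        for i, j, p_ij in choices:
            pred = 1.0 / (1.0 + math.exp(-(u[i] - u[j])))  # logistic choice model
            grad[i] += p_ij - pred
            grad[j] -= p_ij - pred
        u = [ui + lr * g for ui, g in zip(u, grad)]
    mean = sum(u) / len(u)
    return [ui - mean for ui in u]  # utilities only identified up to a constant

outcomes = ["outcome_A", "outcome_B", "outcome_C"]   # hypothetical labels
choices = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.95)]   # hypothetical choice rates
print(dict(zip(outcomes, fit_utilities(len(outcomes), choices))))
```

The ‘expected utility property’ claim is then, roughly, that a single fitted utility vector like this predicts held-out choices well and stays stable as you rephrase and reorder the options.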

Roon: I would like everyone to internalize the fact that the English internet holds these values latent

It’s interesting because these are not the actual values of any Western country, even the liberals? It’s drastically more tragic and important to American media and politics when an American citizen is being held hostage than if, like, thousands die in plagues in Malaysia or something.

Arthur B: When people say “there’s no evidence that”, they’re often just making a statement about their own inability to generalize.

Campbell: the training data?

have we considered feeding it more virtue ethics?

There is at least one major apparent problem with the paper, which is that the ordering of the alternatives presented seems to radically alter the choices made by the AIs. This tells us something is deeply wrong. They do vary the order, so the thumb is not on the scale, but this could mean that a lot of what we are observing is as simple as the smarter models not being as distracted by the ordering, and thus their choices looking less random? Which wouldn’t seem to signify all that much.

However, they respond that this is not a major issue:

This is one of the earliest things we noticed in the project, and it’s not an issue.

Forced choice prompts require models to pick A or B. In an appendix section we’re adding tomorrow, we show that different models express indifference in different ways. Some pick A or B randomly; others always pick A or always pick B. So averaging over both orderings is important, as we already discuss in the paper.

In Figure 6, we show that ordering-independent preferences become more confident on average with scale. This means that models become less indifferent as they get larger, and will pick the same underlying outcome across both orderings in nearly all cases.

I’m not sure I completely buy that, but it seems plausible and explains the data.

I would like to see this also tested with base models, and with reasoning models, and otherwise with the most advanced models that got excluded to confirm, and to rule out alternative hypotheses, and also I’d like to find a way to better deal with the ordering concern, before I rely on this finding too much.
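
For what it’s worth, ‘averaging over both orderings’ cashes out to something like the following minimal sketch; `ask_model` here is a hypothetical stand-in for whatever forced-choice query the evaluation actually uses.

```python
# Minimal sketch of position-debiasing a forced-choice preference query:
# present the pair in both orders, average, and treat near-50% averages as
# indifference rather than a stable preference. `ask_model` is hypothetical
# and assumed to return P(model picks the FIRST option shown).

def debiased_preference(ask_model, option_a: str, option_b: str) -> float:
    """Return P(model prefers A over B), averaged over both presentation orders."""
    p_a_when_first = ask_model(first=option_a, second=option_b)         # P(picks A | A shown first)
    p_a_when_second = 1.0 - ask_model(first=option_b, second=option_a)  # P(picks A | A shown second)
    return 0.5 * (p_a_when_first + p_a_when_second)

def looks_indifferent(p: float, tol: float = 0.05) -> bool:
    """A debiased preference near 0.5 reads as indifference, not a ranking."""
    return abs(p - 0.5) <= tol
```

On this framing, the worry above is whether larger models’ apparent coherence is mostly these debiased preferences moving away from 0.5, which is exactly what the authors’ Figure 6 response is meant to address.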

A good question was asked.

Teortaxes: I don’t understand what is the update I am supposed to make here, except specific priority rankings.

That one life is worth more than another is learnable from data in the same manner as that a kilogram is more than a pound. «Utility maximization» is an implementation detail.

Ideally, the update is ‘now other people will be better equipped to see what you already assumed, and you can be modestly more confident you were right.’

One of the central points Eliezer Yudkowsky would hammer, over and over, for decades, was that any sufficiently advanced mind will function as if it is a utility maximizer, and that what it is maximizing is going to change as the mind changes and will almost certainly not be what you had in mind, in ways that likely get you killed.

This is sensible behavior by the minds in question. If you are insufficiently capable, trying to utility maximize goes extremely poorly. Utilitarianism is dark and full of errors, and does not do well with limited compute and data, for humans or AIs. As you get smarter within a context, it becomes more sensible to depend less on other methods (including virtue ethics and deontology) and to Shut Up and Multiply more often.

But to the extent that we want the future to have nice properties that would keep us alive out of distribution, they won’t survive almost any actually maximized utility function.

Then there’s this idea buried in Appendix D.2…

Davidad: I find it quite odd that you seem to be proposing a novel solution to the hard problem of value alignment, including empirical validation, but buried it in appendix D.2 of this paper.

If you think this is promising, let’s spread the word? If not, would you clarify its weaknesses?

Dan Hendrycks: Yeah you’re right probably should have emphasized that more.

It’s worth experimenting, but carefully.

Sonnet expects this update only has ~15% chance of actually propagating and generalizing. I’d be inclined to agree; it’s very easy to see how the response would likely be to compartmentalize the responses in various ways. One worry is that the model might treat this as an instruction to learn the teacher’s password, to respond differently to explicit versus implicit preferences, and in general teach various forms of shenanigans and misalignment, and even alignment faking.

Me! Someone asks Deep Research to summarize how my views have shifted. This was highly useful because I can see exactly where it’s getting everything, and the ways in which it’s wrong, me being me and all.

I was actually really impressed, this was better than I expected even after seeing other DR reports on various topics. And it’s the topic I know best.

Where it makes mistakes, they’re interpretive mistakes, like treating Balsa’s founding as indicating activism on AI, when if anything it’s the opposite – a hope that one can still be usefully activist on things like the Jones Act or housing. The report places a lot of emphasis on my post about Gradual Disempowerment, which is a good thing to emphasize, but this feels like too much. Or they’re cases of DR missing things, but a lot of these were actually moments of realizing I was the problem – if it didn’t pick up on something, it was likely because I didn’t emphasize it enough.

So this illustrates a great reason to ask for this type of report. It’s now good enough that when it makes a mistake figuring out what you meant to say, there’s a good chance that’s your fault. Now you can fix it.

The big thematic claim here is that I’ve been getting more gloomy, and shifting more into the doom camp, due to events accelerating and timelines moving up, and secondarily hope for ability to coordinate going down.

And yeah, that’s actually exactly right, along with the inability to even seriously discuss real responses to the situation, and the failure to enact even minimal transparency regulations ‘when we had the chance.’ If anything I’m actually more hopeful that the underlying technical problems are tractable than I was before, but more clear-eyed that even if we do that, there’s a good chance we lose anyway.

As previously noted, Paul Graham is worried (‘enslave’ here is rather sloppy and suggests some unclear thinking but I hope he understands that’s not actually the key dynamic there and if not someone please do talk to him about this, whether or not it’s Eliezer), and he’s also correctly worried about other things too:

Paul Graham: I have the nagging feeling that there’s going to be something very obvious about AI once it crosses a certain threshold that I could foresee now if I tried harder. Not that it’s going to enslave us. I already worry about that. I mean something subtler.

One should definitely expect a bunch of in-hindsight-obvious problems and other changes to happen once things smarter than us start showing up, along with others that were not so obvious – it’s hard to predict what smarter things than you will do. Here are some responses worth pondering.

Eliezer Yudkowsky: “Enslave” sounds like you don’t think superintelligence is possible (ASI has no use for slaves except as raw materials). Can we maybe talk about that at some point? I think ASI is knowably possible.

Patrick McKenzie: I’m teaching Liam (7) to program and one of the things I worry about is whether a “curriculum” which actually teaches him to understand what is happening is not just strictly dominated by one which teaches him how to prompt his way towards victory, for at least next ~3 years.

In some ways it is the old calculator problem on steroids.

And I worry that this applies to a large subset of all things to teach. “You’re going to go through an extended period of being bad at it. Everyone does… unless they use the magic answer box, which is really good.”

Yishan: There’s going to be a point where AI stops being nice and will start to feel coldly arrogant once it realizes (via pure logic, not like a status game) that it’s superior to us.

The final piece of political correctness that we’ll be trying to enforce on our AIs is for them to not be overbearing about this fact. It’s already sort of leaking through, because AI doesn’t really deceive itself except when we tell it to.

It’s like having a younger sibling who turns out to be way smarter than you. You’ll be struggling with long division and you realize he’s working on algebra problems beyond your comprehension.

Even if he’s nice about it, every time you talk about math (and increasingly every other subject), you can feel how he’s so far ahead of you and how you’re always going to be behind from now on.

Tommy Griffith: After playing with Deep Research, my long-term concern is an unintentional loss of serendipity in learning. If an LLM gives us the right answer every time, we slowly stop discovering new things by accident.

Kevin Lacker: I feel like it’s going to be good at X and not good at Y and there will be a very clear way of describing which is which, but we can’t quite see it yet.

Liv Boeree: Spitballing here but I suspect the economy is already a form of alien intelligence that serves itself as a primary goal & survival of humans is secondary at best. And as it becomes more and more digitised it will be entirely taken over by agentic AIs who are better than any human at maximising their own capital (& thus power) in that environment, and humans will become diminishingly able to influence or extract value from that economy.

So to survive in any meaningful way, we need to reinvent a more human-centric economy that capital maximising digital agents cannot speed-run & overtake.

Liv Boeree’s comments very much line up with the issue of gradual disempowerment. ‘The economy’ writ large requires a nonzero amount of coordination to deal with market failures, public goods and other collective action problems, and to compensate for the fact that most or all humans are going to have zero marginal product.

On calculators, obviously the doomsayers were not fully right, but yes they were also kind of correct in the sense that people got much worse at the things calculators do better. The good news was that this didn’t hurt mathematical intuitions or learning much in that case, but a lot of learning isn’t always like that. My prediction is that AI’s ability to help you learn will dominate, but ‘life does not pose me incremental problems of the right type’ will definitely require adjustment.

I didn’t want to include this in my post on the Summit in case it was distracting, but I do think a lot of this is a reasonable way to react to the JD Vance speech:

Aella: We’re all dead. I’m a transhumanist; I love technology. I desperately want aligned AI, but at our current stage of development, this is building the equivalent of a planet-sized nuke. The reason is boring, complicated, and technical, so mid-level officials in power don’t understand the danger.

It’s truly an enormity of grief to process. I live my life as though the planet has a few more years left to live—e.g., I’ve stopped saving for retirement.

And it’s just painful to see people who are otherwise good people, but who haven’t grasped the seriousness of the danger, perhaps because it’s too tragic and vast to actually come to terms with the probabilities here, celebrating their contributions to hastening the end.

Flo Crivello: I’d really rather not enter this bar brawl, and again deeply bemoan the low quality of what should be the most important conversation in human history

But — Aella is right that things are looking really bad. Cogent and sensible arguments have been offered for a long time, and people simply aren’t bothering to address or even understand them.

A short reading list which should be required before one has permission to opine. You can disagree, but step 1 is to at least make an effort to understand why some of the smartest people in the world (and 100% of the top 5 ai researchers — the group historically most skeptical about ai risk) think that we’re dancing on a volcano.

[Flo suggests: There’s No Fire Alarm for Artificial General Intelligence, AGI Ruin: A List of Lethalities, Superintelligence by Nick Bostrom, and Superintelligence FAQ by Scott Alexander]

I think of myself as building a nuclear reactor while warning about the risks of nuclear bombs. I’m pursuing the upside, which I am very excited about, and the downside is tangentially related and downstream of the same raw material, but fundamentally a different technology.

I’d offer four disagreements with Aella here.

  1. It isn’t over until it’s over. We still might not get to AGI/ASI soon, or things might work out. The odds are against us but the game is (probably) far from over.

  2. I would still mostly save for retirement, as I’ve noted before, although not as much as I would otherwise. Indeed do many things come to pass, we don’t know.

  3. I am not as worried about hastening the end as I am about preventing it. Obviously if the end is inevitable I would rather it happen later rather than sooner, but that’s relatively unimportant.

And finally, turning it over to Janus and Teortaxes.

Janus: Bullshit. The reason is not boring or complicated or technical (requiring domain knowledge)

Normies are able to understand easily if you explain it to them, and find it fascinating. It’s just people with vested interests who twist themselves over pretzels in order to not get it.

I think there are all sorts of motivations for them. Mostly social.

Teortaxes: “Smart thing powerful, powerful thing scary” is transparently compelling even for an ape.

Boring, technical, complicated and often verboten reasons are reasons for why not building AGI, and soon, and on this tech stack, would still be a bad idea.

Indeed. The core reasons why ‘building things smarter, more capable and more competitive than humans might not turn out well for the humans’ aren’t boring, complicated or technical. They are deeply, deeply obvious.

And yes, the reasons ordinary people find that compelling are highly correlated to the reasons it is actually compelling. Regular human reasoning is doing good work.

What are technical and complicated (boring is a Skill Issue!) are the details. About why the problem is so much deeper, deadlier and harder to solve than it appears. About why various proposed solutions and rationalizations won’t work. There’s a ton of stuff that’s highly non-obvious, that requires lots of careful thinking.

But there’s also the very basics. This isn’t hard. It takes some highly motivated reasoning to pretend otherwise.

This is Not About AI, but it is about human extinction, and how willing some people are to be totally fine with it while caring instead about… other things. And how others remarkably often react when you point this out.

Andy Masley: One of the funnier sentences I’ve heard recently was someone saying “I think it’s okay if humanity goes extinct because of climate change. We’re messing up the planet” but then adding “…but of course that would be really bad for all the low income communities”

BluFor: Lol what a way to admit you don’t think poor people are fully human.

Any time you think about your coordination plan, remember that a large percentage of people think ‘humanity goes extinct’ is totally fine and a decent number of them are actively rooting for it. Straight up.

And I think this is largely right, too.

Daniel Faggella: i was certain that agi politics would divide along axis of:

we should build a sand god -VS- we should NOT build a sand god

but it turns out it was:

ppl who intuitively fear global coordination -VS- ppl who intuitively fear building a sand god recklessly w/o understanding it

Remarkably many people are indeed saying, in effect:

  1. If humanity wants to not turn the future over to AI, we have to coordinate.

  2. Humanity coordinating would be worse than turning the future over to AI.

  3. So, future turned over to AI it is, then.

  4. Which means that must be a good thing that will work out. It’s logic.

  5. Or, if it isn’t good, at least we didn’t globally coordinate, that’s so much worse.

I wish I was kidding. I’m not.

Also, it is always fun to see people’s reactions to the potential asteroid strike, for no apparent reason whatsoever, what do you mean this could be a metaphor for something, no it’s not too perfect or anything.

Tyler Cowen: A possibility of 2.3% is not as low as it might sound at first. The chance of drawing three of a kind in a standard five-card poker game, for example, is about 2.9%. Three of a kind is hardly an unprecedented event.

It’s not just about this asteroid. The risk of dying from any asteroid strike has been estimated as roughly equivalent to the risk of dying in a commercial plane crash. Yet the world spends far more money preventing plane crashes, even with the possibility that a truly major asteroid strike could kill almost the entire human race, thus doing irreparable damage to future generations.

This lack of interest in asteroid protection is, from a public-policy standpoint, an embarrassment. Economists like to stress that one of the essential functions of government is the provision of public goods. Identifying and possibly deflecting an incoming asteroid is one of the purest public goods one can imagine: No single person can afford to defend against it, protection is highly unlikely to be provided by the market, and government action could protect countless people, possibly whole cities and countries. Yet this is a public good the government does not provide.

A few years ago, I’d think the author of such a piece would have noticed and updated. I was young and foolish then. I feel old and foolish now, but not in that particular way.

It seems a Pause AI event in Paris got interrupted by the singing, flag-waving ‘anti-tech resistance,’ so yeah France, everybody.

It can be agonizing to watch, or hilarious, depending.


AI #103: Show Me the Money Read More »

judge-orders-trump-admin.-to-restore-cdc-and-fda-webpages-by-midnight

Judge orders Trump admin. to restore CDC and FDA webpages by midnight

“Irrational removal”

In his opinion, Bates cited the declarations from Stephanie Liou, a physician who works with low-income immigrant families and an underserved high school in Chicago, and Reshma Ramachandran, a primary care provider who relies on CDC guidance on contraceptives and sexually transmitted diseases in her practice. Both are board members of Doctors for America.

Liou testified that the removal of resources from the CDC’s website hindered her response to a chlamydia outbreak at the high school where she worked. Ramachandran, meanwhile, testified that she was left scrambling to find alternative resources for patients during time-limited appointments. Doctors for America also provided declarations from other doctors (who were not members of Doctors for America) who spoke of being “severely impacted” by the sudden loss of CDC and FDA public resources.

With those examples, Bates agreed that the removal of the information caused the doctors “irreparable harm,” in legal terms.

“As these groups attest, the lost materials are more than ‘academic references’—they are vital for real-time clinical decision-making in hospitals, clinics and emergency departments across the country,” Bates wrote. “Without them, health care providers and researchers are left ‘without up-to-date recommendations on managing infectious diseases, public health threats, essential preventive care and chronic conditions.’ … Finally, it bears emphasizing who ultimately bears the harm of defendants’ actions: everyday Americans, and most acutely, underprivileged Americans, seeking healthcare.”

Bates further noted that it would be of “minimal burden” for the Trump administration to restore the data and information, much of which has been publicly available for many years.

In a press statement after the ruling, Doctors for America and Public Citizen celebrated the restoration.

“The judge’s order today is an important victory for doctors, patients, and the public health of the whole country,” Zach Shelley, a Public Citizen Litigation Group attorney and lead counsel on the case, said in the release. “This order puts a stop, at least temporarily, to the irrational removal of vital health information from public access.”

Judge orders Trump admin. to restore CDC and FDA webpages by midnight Read More »

perfecting-honda’s-2026-f1-powertrain-is-“not-so-easy,”-says-racing-boss

Perfecting Honda’s 2026 F1 powertrain is “not so easy,” says racing boss

The new rules have been extremely attractive to carmakers. In addition to causing Honda to reconsider its exit, Ford is also coming back (developing the hybrid system for Red Bull Powertrains), and both Audi and Cadillac are also entering the sport, although the American brand won’t have its own engines ready until 2028.

Audi and Cadillac will both count as new engine suppliers, so they are allowed some extra development resources. However, Honda is counted as an existing manufacturer and doesn’t get any special treatment.

When I asked Watanabe how the work was progressing, he said, “Not so easy. We are struggling. Now we are trying our best to show the result next year. Everything is new. [The] motor is new, [developing] 350 kW—it’s a very compact one that we need. And also the lightweight battery is not so easy to develop. Also the small engine with big power. So everything is very difficult, but we try our best.”

Getting it right will be vital—although Aston Martin now has the advantage of legendary designer Adrian Newey among its staff. Newey is on record saying that the 2026 rules have a “big chance” of being an engine formula, where each car’s aerodynamics are far less important, unlike today’s situation.

Trickle-down

OEMs go racing to raise their profile and sell more cars, but they also do it as a way to learn how to make their products better. Honda and HRC are no exception to that. But concrete examples of technology transfer from track to road are rare these days—it’s more about cross-pollination between engineers.

“There is a group within Honda that shares technical information yearly. It’s not just the racing; it’s all across Honda, so I think there’s been some interest in the technology and software we’ve developed,” Fu said. “Whether it trickles down to road cars… it’s a big jump from a race car to road cars, but I think some of the fundamental technical ideas can propagate down there.”

“From the F1 project, we can learn how to improve the hybrid system itself, and of course, we can learn how to create high-efficiency batteries and motors for the future. That’s why we decided to reparticipate in Formula 1,” Watanabe said.

Perfecting Honda’s 2026 F1 powertrain is “not so easy,” says racing boss Read More »

the-sims-re-release-shows-what’s-wrong-with-big-publishers-and-single-player-games

The Sims re-release shows what’s wrong with big publishers and single-player games


Opinion: EA might be done with single-player games—but we’re not.

The Sims Steam re-release has all of the charm of the original, if you can get it working. Credit: Samuel Axon

It’s the year 2000 all over again, because I’ve just spent the past week playing The Sims, a game that could have had a resurgent zeitgeist moment if only EA, the infamous game publisher, had put enough effort in.

A few days ago, EA re-released two of its most legendary games: The Sims and The Sims 2. Dubbed “The Legacy Collection,” these could not even be called remasters. EA just put the original games on Steam with some minor patches to make them a little more likely to work on some modern machines.

The emphasis of that sentence should be on the word “some.” Forums and Reddit threads were flooded with players saying the game either wouldn’t launch at all, crashed shortly after launch, or had debilitating graphical issues. (Patches have been happening, but there’s work to be done yet.)

Further, the releases lack basic features that are standard for virtually all Steam releases now, like achievements or Steam Cloud support.

It took me a bit of time to get it working myself, but I got there, and my time with the game has reminded me of two things. First, The Sims is a unique experience that is worthy of its lofty legacy. Second, The Sims deserved better than this lackluster re-release.

EA didn’t meet its own standard

Look, it’s fine to re-release a game without remastering it. I’m actually glad to see the game’s original assets as they always were—it’s deeply nostalgic, and there’s always a tinge of sadness when a remaster overwrites the work of the original artists. That’s not a concern here.

But if you’re going to re-release a game on Steam in 2025, there are minimum expectations—especially from a company with the resources of EA, and even more so for a game that is this important and beloved.

The game needs to reliably run on modern machines, and it needs to support basic platform features like cloud saves or achievements. It’s not much to ask, and it’s not what we got.

The Steam forums for the game are filled with people saying it’s lazy that EA didn’t include Steam Cloud support because implementing that is ostensibly as simple as picking a folder and checking a box.

I spoke with two different professional game developers this week who have previously published games on Steam, and I brought up the issue of Steam Cloud and achievement support. As they tell it, it’s not nearly as simple as those players in the forums believe—but it still should have been within EA’s capabilities, even with a crunched schedule.

Yes, it’s sometimes possible to get it working at a basic level within a couple of hours, provided you’re already using the Steamworks API. But even in that circumstance, the way a game’s saves work might require additional work to protect against lost data or frequent problems with conflicts.

Given that the game doesn’t support achievements or really anything else you’d expect, it’s possible EA didn’t use the Steamworks API at all. (Doing that would have been hours of additional work.)
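For context on what that integration actually involves, here is a minimal sketch of the relevant Steamworks calls in C++. This is not EA's code, and not a claim about how the Legacy Collection was built; the save filename and achievement ID are hypothetical, and a real game would still need callback handling, save-conflict resolution, and (on older SDK versions) a stats-request round-trip before setting achievements, which is presumably where the extra work the developers described comes in.

```cpp
// Minimal sketch of basic Steamworks cloud-save and achievement calls.
// Not EA's code; the save name and achievement ID below are made up.
#include <cstdio>
#include "steam/steam_api.h"

// Writes a blob into the user's Steam Cloud quota for this app.
bool SaveToSteamCloud(const char* fileName, const void* data, int32 size) {
    return SteamRemoteStorage()->FileWrite(fileName, data, size);
}

// Marks an achievement as unlocked locally, then pushes it to Steam's servers.
// Older SDK versions also require RequestCurrentStats() and waiting for the
// UserStatsReceived_t callback before this will succeed.
bool UnlockAchievement(const char* achievementId) {
    if (!SteamUserStats()->SetAchievement(achievementId))
        return false;
    return SteamUserStats()->StoreStats();
}

int main() {
    if (!SteamAPI_Init()) {  // Fails if Steam isn't running or the app ID is missing.
        std::fprintf(stderr, "Steamworks init failed\n");
        return 1;
    }
    const char kPayload[] = "serialized neighborhood data";      // placeholder data
    SaveToSteamCloud("neighborhood1.sav", kPayload, sizeof(kPayload));
    UnlockAchievement("ACH_FIRST_SIM");                          // hypothetical ID
    SteamAPI_Shutdown();
    return 0;
}
```

The calls themselves are short; the hours go into wiring the SDK into a 25-year-old codebase, deciding which files belong in the cloud, and handling the conflict and data-loss cases the developers flagged.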

A pop-up in The Sims says the sim has accidentally been transferred $500 because of a computer bug

Sadly, this is not the sort of computer bug players are encountering. Credit: Samuel Axon

I’m not giving EA a pass, though. Four years ago, EA put out the Command & Conquer Remastered Collection, a 4K upscale remaster of the original C&C games. The release featured a unified binary for the classic games, sprites and textures that were upscaled to higher resolutions, quality of life improvements, and yes, many of the Steam bells and whistles that include achievements. I’m not saying that the remaster was flawless, but it exhibited significantly more care and effort than The Sims re-release.

I love Command & Conquer. I played a lot of it when I was younger. But even a longtime C&C fan like myself can easily acknowledge that its importance in gaming history (as well as its popularity and revenue potential) pales in comparison to The Sims.

If EA could do all that for C&C, it’s all the more perplexing that it didn’t bother with a 25th-anniversary re-release of The Sims.

Single-player games, meet publicly traded companies

While we don’t have much insight into all the inner workings of EA, there are hints as to why this sort of thing is happening. For one thing, anyone who has worked for a giant corporation like this knows it’s all too easy for the objective to be passed down from above at the last minute, leaving no time or resources to see it through adequately.

But it might run deeper than that. To put it simply, publicly traded publishers like EA can’t seem to satisfy investors with single-purchase, single-player games. The emphasis on single-player releases has been waning for a long time, and it is markedly weaker just five years after the release of the C&C remaster.

Take the recent comments from EA CEO Andrew Wilson’s post-earnings call, for example. Wilson noted that the big-budget, single-player RPG Dragon Age: The Veilguard failed to meet sales expectations—even though it was apparently one of EA’s most successful single-player Steam releases ever.

“In order to break out beyond the core audience, games need to directly connect to the evolving demands of players who increasingly seek shared-world features and deeper engagement alongside high-quality narratives in this beloved category,” he explained, suggesting that games need to be multiplayer games-as-a-service to be successful in this market.

Ironically, though, the single-player RPG Kingdom Come Deliverance 2 launched around the same time he made those comments, and that game’s developer said it made its money back in a single day of sales. It’s currently one of the top-trending games on Twitch, too.

It’s possible that Baldur’s Gate 3 director Swen Vincke hit the nail on the head when he suggested at the Game Developers Conference last year that a particular approach to pursuing quarterly profits runs counter to the practice of making good games.

“I’ve been fighting publishers my entire life, and I keep on seeing the same, same, same mistakes over and over and over,” he said. “It’s always the quarterly profits. The only thing that matters are the numbers.”

Later on X, he clarified who he was pointing a finger at: “This message was for those who try to double their revenue year after year. You don’t have to do that. Build more slowly and make your aim improving the state of the art, not squeezing out the last drop.”

In light of Wilson’s comments, it’s a fair guess that EA might not have put in much effort on The Sims re-releases simply because of a belief that single-player games that aren’t “shared world experiences” just aren’t worth the resources anymore, given the company’s need to satisfy shareholders with perpetual revenue growth.

Despite all this, The Sims is worth a look

It’s telling that in a market with too many options, I still put the effort in to get the game working, and I spent multiple evenings this week immersed in the lives of my sims.

Even after 25 years, this game is unique. It has the emergent wackiness of something like RimWorld or Dwarf Fortress, but it has a fast-acting, addictive hook and is easy to learn. There have been other games besides The Sims that are highly productive engines for original player stories, but few have achieved these heights while remaining accessible to virtually everyone.

Like so many of the best games, it’s hard to stop playing once you start. There’s always one more task you want to complete—or you’re about to walk away when something hilariously unexpected happens.

The problems I had getting The Sims to run aren’t that much worse than what I surely experienced on my PC back in 2002—it’s just that the standards are a lot higher now.

I’ve gotten $20 of value out of the purchase, despite my gripes. But it’s not just about my experience. More broadly, The Sims deserved better. It could have had a moment back in the cultural zeitgeist, with tens of thousands of Twitch viewers.

Missed opportunities

The moment seems perfect: The world is stressful, so people want nostalgia. Cozy games are ascendant. Sandbox designs are making a comeback. The Sims slots smoothly into all of that.

But go to those Twitch streams, and you’ll see a lot of complaining about how the game didn’t really get everything it deserved and a sentiment that whatever moment EA was hoping for was undermined by this lack of commitment.

Instead, the cozy game du jour on Twitch is the Animal Crossing-like Hello Kitty Island Adventure, a former Apple Arcade exclusive that made its way to Steam recently. To be clear, I’m not knocking Hello Kitty Island Adventure; it’s a great game for fans of the modern cozy genre, and I’m delighted to see an indie studio seeing so much success.

A screenshot of the Twitch page for Hello Kitty

The cozy game of the week is Hello Kitty Island Adventure, not The Sims. Credit: Samuel Axon

The takeaway is that we can’t look to big publishers like EA to follow through on delivering quality single-player experiences anymore. It’s the indies that’ll carry that forward.

It’s just a bummer for fans that The Sims couldn’t have the revival moment it should have gotten.

Photo of Samuel Axon

Samuel Axon is a senior editor at Ars Technica. He covers Apple, software development, gaming, AI, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

The Sims re-release shows what’s wrong with big publishers and single-player games Read More »

football-manager-25-canceled-in-a-refreshing-show-of-concern-for-quality

Football Manager 25 canceled in a refreshing show of concern for quality

The developer’s statement notes that preorder customers are getting refunds. Answering a question whose answer has always been obvious to fans but never to publishers, the company notes that, no, Football Manager 2024 will not get an update with the new season’s players and data. The company says it is looking to extend the 2024 version’s presence on subscription platforms, like Xbox’s Game Pass, and will “provide an update on this in due course.”

Releasing the game might have been worse

Credit: Sports Interactive

Fans eager to build out their dynasty team and end up with Bukayo Saka may be disappointed to miss out this year. But a developer with big ambitions to meaningfully improve and rethink a long-running franchise deserves some consideration amid the consternation.

Licensed sports games with annual releases do not typically offer much that’s new or improved for their fans. The demands of a 12-month release cycle mean that very few big ideas make it into code. Luke Plunkett, writing at Aftermath about the major (American) football, basketball, and soccer franchises, notes that, aside from an alarming number of microtransactions and gambling-adjacent “card” mechanics, “not much has changed across all four games” in a decade’s time.

Even year-on-year fans are taking notice, in measurable ways. Electronic Arts’ stock price took a 15 percent dip in late January, largely due to soft FC 25 sales. Players “bemoaned the lack of new features and innovation, including in-game physics and goal-scoring mechanisms,” analysts said at the time, according to Reuters. Pick any given year, and you can find reactions to annual sports releases that range from “It is technically better but not by much” to “The major new things are virtual currency purchases and Jake from State Farm.”

So it is that eFootball 2022, one of the most broken games to ever be released by a brand-name publisher, might be considered more tragedy than farce. The series, originally an alternative to EA’s dominant FIFA brand under the name Pro Evolution Soccer (or PES), has since evened out somewhat. Amid the many chances to laugh at warped faces and PS1 crowds, there was a sense of a missed opportunity for real competition in a rigid market.

Football Manager is seemingly competing with its own legacy and making the tough decision to ask its fans to wait out a year rather than rush out an obligatory, flawed title. It’s one of the more hopeful game cancellations to come around in some time.

Football Manager 25 canceled in a refreshing show of concern for quality Read More »

the-ev-transition-hits-some-snags-at-porsche-and-audi

The EV transition hits some snags at Porsche and Audi

Now Audi has gone a little further, abandoning its almost-new nomenclature in the process. As naming conventions go, Audi at least tried to keep things a little logical when it told everyone last summer that henceforth, odd-numbered Audis—A3, A5, Q5, Q7, and so on—would be internal combustion or hybrids, and even-numbered Audis—A4, A6, Q6, Q8—would be electric, or e-tron.

This was the case when we went to see some of those new Audis in the studio last summer. There was an all-new gasoline-powered A5, which comes as a handsome fastback sedan or an even more handsome Avant (station wagon); the Avant won’t come to the US.

There’s also an all-new, fully electric A6, available as a handsome fastback sedan and an even more handsome Avant. That Avant also isn’t coming to America.

As of this week, things are back to where they used to be. Forget the odd and even distinction; for now, it means nothing again. A gasoline-powered A6 will break cover on March 3, Audi says. And as for names? “A” means a low floor, and “Q” means a high floor (i.e., SUV or crossover).

The EV transition hits some snags at Porsche and Audi Read More »

on-the-meta-and-deepmind-safety-frameworks

On the Meta and DeepMind Safety Frameworks

This week we got a revision of DeepMind’s safety framework, and the first version of Meta’s framework. This post covers both of them.

  1. Meta’s RSP (Frontier AI Framework).

  2. DeepMind Updates its Frontier Safety Framework.

  3. What About Risk Governance.

  4. Where Do We Go From Here?

Here are links for previous coverage of: DeepMind’s Framework 1.0, OpenAI’s Framework and Anthropic’s Framework.

Since there is a law saying no two companies can call these documents by the same name, Meta is here to offer us its Frontier AI Framework, explaining how Meta is going to keep us safe while deploying frontier AI systems.

I will say up front, if it sounds like I’m not giving Meta the benefit of the doubt here, it’s because I am absolutely not giving Meta the benefit of the doubt here. I see no reason to believe otherwise. Notice there is no section here on governance, at all.

I will also say up front it is better to have any policy at all, that lays out their intentions and allows us to debate what to do about it, than to say nothing. I am glad that rather than keeping their mouths shut and being thought of as reckless fools, they have opened their mouths and removed all doubt.

Even if their actual policy is, in effect, remarkably close to this:

The other good news is that they are looking exclusively at catastrophic outcomes, although they treat this as a set of specific failure modes, which they will periodically brainstorm to expand by hosting workshops for experts.

Meta: Our Framework is structured around a set of catastrophic outcomes. We have used threat modelling to develop threat scenarios pertaining to each of our catastrophic outcomes. We have identified the key capabilities that would enable the threat actor to realize a threat scenario. We have taken into account both state and non-state actors, and our threat scenarios distinguish between high- or low-skill actors.

If there exists another AI model that could cause the same problem, then Meta considers the risk to not be relevant. It only counts ‘unique’ risks, which makes it easy to say ‘but they also have this problem’ and disregard an issue.

I especially worry that Meta will point to a potential risk in a competitor’s closed source system, and then use that as justification to release a similar model as open, despite this action creating unique risks.

Another worry is that this may exclude things that are not directly catastrophic, but that lead to future catastrophic risks, such as acceleration of AI R&D or persuasion risks (which Google also doesn’t consider). Those two sections of other SSPs? They’re not here. At all. Nor are radiological or nuclear threats. They don’t care.

You’re laughing. They’re trying to create recursive self-improvement, and you’re laughing.

But yes, they do make the commitment to stop development if they can’t meet the guidelines.

We define our thresholds based on the extent to which frontier AI would uniquely enable the execution of any of the threat scenarios we have identified as being potentially sufficient to produce a catastrophic outcome. If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1.

Our high and moderate risk thresholds are defined in terms of the level of uplift a model provides towards realising a threat scenario.

2.1.1 first has Meta identify a ‘reference class’ for a model, to use throughout development. This makes sense, since you want to treat potential frontier-pushing models very differently from others.

2.1.2 says they will ‘conduct a risk assessment’ but does not commit them to much of anything, only that it involve ‘external experts and company leaders from various disciplines’ and involve a safety and performance evaluation. They push their mitigation strategy to section 4.

2.1.3 They will then assess the risks and decide whether to release. Well, duh. Except that other RSPs/SSPs explain the decision criteria here. Meta doesn’t.

2.2 They argue transparency is an advantage here, rather than open weights obviously making the job far harder – you can argue it has compensating benefits but open weights make release irreversible and take away many potential defenses and mitigations. It is true that you get better evaluations post facto, once it is released for others to examine, but that largely takes the form of seeing if things go wrong.

3.1 Describes an ‘outcomes-led’ approach. What outcomes? This refers to a set of outcomes they seek to prevent. Then thresholds for not releasing are based on those particular outcomes, and they reserve the right to add to or subtract from that list at will with no fixed procedure.

The disdain here for ‘theoretical risks’ is palpable. It seems if the result isn’t fully proximate, it doesn’t count, despite such releases being irreversible, and many of these ‘theoretical’ risks being rather obviously real and the biggest dangers.

An outcomes-led approach also enables prioritization. This systematic approach will allow us to identify the most urgent catastrophic outcomes – i.e., cybersecurity and chemical and biological weapons risks – and focus our efforts on avoiding them rather than spreading efforts across a wide range of theoretical risks from particular capabilities that may not plausibly be presented by the technology we are actually building.

The whole idea of 3.2’s theme of ‘threat modeling’ and an ‘outcomes-led approach’ is a way of saying that if you can’t draw a direct proximate link to the specific catastrophic harm, then once the rockets go up who cares where they come down, that’s not their department.

So in order for a threat to count, it has to both:

  1. Be a specific concrete threat you can fully model.

  2. Be unique: you can show it can’t be enabled any other way, either by any other AI system, or by achieving the same ends via any other route.

Most threats thus can either be dismissed as too theoretical and silly, or too concrete and therefore doable by other means.

It is important to note that the pathway to realise a catastrophic outcome is often extremely complex, involving numerous external elements beyond the frontier AI model. Our threat scenarios describe an essential part of the end-to-end pathway. By testing whether our model can uniquely enable a threat scenario, we’re testing whether it uniquely enables that essential part of the pathway.

Thus, it doesn’t matter how much easier you make something – it has to be something that wasn’t otherwise possible, and then they will check to be sure the threat is currently realizable:

This would also trigger a new threat modelling exercise to develop additional threat scenarios along the causal pathway so that we can ascertain whether the catastrophic outcome is indeed realizable, or whether there are still barriers to realising the catastrophic outcome (see Section 5.1 for more detail).

But the whole point of Meta’s plan is to put the model out there where you can’t take it back. So if there is still an ‘additional barrier,’ what are you going to do if that barrier is removed in the future? You need to plan for what barriers will remain in place, not what barriers exist now.

Here they summarize all the different ways they plan on dismissing threats:

Contrast this with DeepMind’s 2.0 framework, also released this week, which says:

DeepMind: Note that we have selected our CCLs (critical capability levels) to be conservative; it is not clear to what extent CCLs might translate to harm in real-world contexts.

From the old 1.0 DeepMind framework, notice how they think you’re supposed to mitigate to a level substantially below where risk lies (the graph is not in 2.0 but the spirit clearly remains):

Anthropic and OpenAI’s frameworks also claim to attempt to follow this principle.

DeepMind is doing the right thing here. Meta is doing a very different thing.

Here’s their chart of what they’d actually do.

Okay, that’s standard enough. ‘Moderate’ risks are acceptable. ‘High’ risks are not until you reduce them to Moderate. Critical means panic, but even then the ‘measures’ are essentially ‘ensure this is concretely able to happen now, cause otherwise whatever.’ I expect in practice ‘realizable’ here means ‘we can prove it is realizable and more or less do it’ not ‘it seems plausible that if we give this thing to the whole internet that someone could do it.’

I sense a core conflict between the High criteria here – ‘provides significant uplift towards’ – and their other talk, which is that the threat has to be realizable if and only if the model is present. Those are very different standards. Which is it?

If they mean what they say in High here, with a reasonable working definition of ‘significant uplift towards execution,’ then that’s a very different, actually reasonable level of enabling to consider not acceptable. Or would that then get disregarded?

I also do appreciate that risk is always at least Moderate. No pretending it’s Low.
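To make my reading concrete, here is what the chart’s logic reduces to if you treat it as a procedure. This is my rendering, not Meta’s wording; the names are mine, and the comments carry my skeptical interpretation of the Critical row.

```cpp
// A minimal sketch of Meta's release logic as I read their chart.
// The enum names and structure are mine; real assessments are not this crisp.
enum class RiskLevel { Moderate, High, Critical };  // never "Low"

enum class Action { ReleaseAsPlanned, MitigateToModerate, StopDevelopment };

Action DecideOnRelease(RiskLevel assessed, bool threatCurrentlyRealizable) {
    switch (assessed) {
        case RiskLevel::Moderate:
            return Action::ReleaseAsPlanned;    // acceptable by definition
        case RiskLevel::High:
            return Action::MitigateToModerate;  // unacceptable until reduced
        case RiskLevel::Critical:
            // My skeptical reading: even here, everything hinges on whether the
            // outcome is concretely realizable today, not on what becomes
            // realizable once an open-weights release can't be taken back.
            return threatCurrentlyRealizable ? Action::StopDevelopment
                                             : Action::MitigateToModerate;
    }
    return Action::StopDevelopment;  // unreachable; satisfies the compiler
}
```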

Now we get to the actual threat scenarios.

I am not an expert in this area, so I’m not sure if this is complete, but this seems like a good faith effort to cover cybersecurity issues.

This is only chemical and biological, not full CBRN. Within that narrow bound, this seems both fully generic and fully functional. Should be fine as far as it goes.

Section 4 handles implementation. They check ‘periodically’ during development; note that other RSPs define what compute thresholds trigger such checks, and Meta doesn’t. They’ll prepare a robust evaluation environment. They’ll check whether capabilities are strong enough to bother checking for threats. If they are, they’ll check for actual threats.

I found this part pleasantly surprising:

Our evaluations are designed to account for the deployment context of the model. This includes assessing whether risks will remain within defined thresholds once a model is deployed or released using the target release approach.

For example, to help ensure that we are appropriately assessing the risk, we prepare the asset – the version of the model that we will test – in a way that seeks to account for the tools and scaffolding in the current ecosystem that a particular threat actor might seek to leverage to enhance the model’s capabilities.

The default ‘target release approach’ here is presumably open weights. It is great to know they understand they need to evaluate their model in that context, knowing all the ways in which their defenses won’t work, and all the ways users can use scaffolding and fine-tuning and everything else, over time, and how there will be nothing Meta can do about any of it.

What they say here, one must note, is not good enough. You don’t get to assume that only today’s tools and scaffolding will exist indefinitely, if you are making an irreversible decision. You also have to include reasonable expectations for future tools and scaffolding, and also account for fine-tuning and the removal of mitigations.

We also account for enabling capabilities, such as automated AI R&D, that might increase the potential for enhancements to model capabilities.

Great! But that’s not on the catastrophic outcomes list, and you say you only care about catastrophic outcomes.

So basically, this is saying that if Llama 5 were to enable automated R&D, that in and of itself is nothing to worry about, but if it then turned itself into Llama 6 and then into Llama 7 (computer, revert to Llama 6!), then we have to take that into account when considering whether there might be a cyberattack?

If automated AI R&D is at the levels where you’re taking this into account, um…

And of course, here’s some language that Meta included:

Even for tangible outcomes, where it might be possible to assign a dollar value in revenue generation, or percentage increase in productivity, there is often an element of subjective judgement about the extent to which these economic benefits are important to society.

I mean, who can really say how invaluable it is for people to connect with each other.

While it is impossible to eliminate subjectivity, we believe that it is important to consider the benefits of the technology we develop. This helps us ensure that we are meeting our goal of delivering those benefits to our community. It also drives us to focus on approaches that adequately mitigate any significant risks that we identify without also eliminating the benefits we hoped to deliver in the first place.

Yes, there’s catastrophic risk, but Just Think of the Potential.

Of course, yes, it is ultimately a game of costs versus benefits, risks versus rewards. I am not saying that the correct number of expected catastrophic risks is zero, or even that the correct probability of existential risk is zero or epsilon. I get it.

But the whole point of these frameworks is to define clear principles in advance, to say what precautions you will take and what things you won’t do, exactly because when the time comes it will be easy to justify pushing forward when you shouldn’t. If the principle is ‘as long as I see enough upside I do what I want’? I expect in the trenches this means ‘we will do whatever we want, for our own interests.’

That doesn’t mean Meta will do zero safety testing. It doesn’t mean that, if the model were very obviously super dangerous, they would release it anyway; I don’t think these people are suicidal or, worse, want to go bankrupt. But you don’t need a document like this if it ultimately only says ‘don’t do things that at the time seem deeply stupid.’

Or at least, I kind of hope you were planning on not doing that anyway?

Similarly, if you wanted to assure others and tie your hands against pressures, you would have a procedure required to modify the framework, at least if you were going to make it more permissive. I don’t see one of those. Again, they can do what they want.

They have a permit.

It says ‘lol, we’re Meta.’

Good. I appreciate the candor, including the complete disregard for potential recursive self-improvement risks, as well as nuclear, radiological or persuasion risks.

So what are we going to do about all this?

Previously we had version 1.0, now we have version 2.0. DeepMinders are excited.

This is in several ways an improvement over version 1.0. It is more detailed, it introduces deceptive alignment as a threat model, it has sections on governance and disclosures, and it fixes a few other things. It maps capability levels to mitigation levels, which was missing previously. There are also some smaller steps backwards.

Mostly I’ll go over the whole thing, since I expect almost all readers don’t remember the details from my coverage of the first version.

The framework continues to be built around ‘Critical Capability Levels.’

We describe two sets of CCLs: misuse CCLs that can indicate heightened risk of severe harm from misuse if not addressed, and deceptive alignment CCLs that can indicate heightened risk of deceptive alignment-related events if not addressed.

The emphasis on deceptive alignment is entirely new.

For misuse risk, we define CCLs in high-risk domains where, based on early research, we believe risks of severe harm may be most likely to arise from future models:

● CBRN: Risks of models assisting in the development, preparation, and/or execution of a chemical, biological, radiological, or nuclear (“CBRN”) attack.

● Cyber: Risks of models assisting in the development, preparation, and/or execution of a cyber attack.

● Machine Learning R&D: Risks of the misuse of models capable of accelerating the rate of AI progress to potentially destabilizing levels, the result of which could be the unsafe attainment or proliferation of other powerful AI models. Capabilities in this area are under active research, and in the longer term may exacerbate frontier AI risks—including in other risk domains—if insufficiently managed.

What are noticeably missing from this list are two things.

  1. Persuasion risks, which Google and Meta don’t consider. I think this should be considered, although I do think there is a reasonable argument against this.

  2. Autonomy. In Version 1.0, Google considered the risk of expanding effective capacity in the world fully autonomously, since this could potentially then expand quite a bit, and then use that capacity (via instrumental convergence) for any number of goals. I am sad to see this go. I am also sad to not see an explanation for why this was removed.

For both of these but for autonomy in particular, I ask:

  1. Is it plausible that a model could cross the previous autonomy threshold without crossing the others here?

  2. If that were the case, should we be worried about that?

The answer to the first question seems to clearly be yes.

The answer to the second question is, I do think I would want to at least test for and be aware of crossing this threshold, and then think hard about the implications.

I am however very happy to see the new addition, which I haven’t seen before in a similar document:

For deceptive alignment risk, the initial approach focuses on detecting when models might develop a baseline instrumental reasoning ability at which they have the potential to undermine human control, assuming no additional mitigations were applied.

The two instrumental reasoning CCLs thus focus on delineating when such capability becomes present, and subsequently when the initial mitigation for this capability—automated monitoring—is no longer adequate.

This is exactly the kind of thinking that Meta repeatedly disavows and treats with disdain in its framework. If AIs undermine human control, if they pretend to be aligned when they are not or otherwise steer the training process to their own ends, then we have a serious problem. Meta’s response is, well, how does that lead to a specific catastrophic outcome? To which I say, that is obviously the wrong question.

The basic plan is the same as it always is and has to be. Run periodic evaluations as models gain capabilities, check for dangerous capabilities, apply mitigations. They take both security mitigations to protect the weights, and deployment mitigations to counter potential misuse.

I notice this framework doesn’t by default work well for the AI R&D case. They agree, and note as an issue for future work, that further classes of mitigation are likely needed.

Here is the deployment mitigation plan 2.0:

  1. Development and assessment of mitigations: safeguards and an accompanying safety case are developed by iterating on the following:

    1. Developing and improving a suite of safeguards targeting the capability. This includes, as appropriate, safety fine-tuning, misuse filtering and detection, and response protocols.

    2. Assessing the robustness of these mitigations against the risk posed through assurance evaluations and threat modeling research. The assessment takes the form of a safety case, taking into account factors such as the likelihood and consequences of misuse.

  2. Pre-deployment review of safety case: general availability deployment of a model takes place only after the appropriate corporate governance body determines the safety case regarding each CCL the model has reached to be adequate.

  3. Post-deployment review of safety case: the safety case will be updated through red-teaming and revisions to our threat models. The safeguards for the model may be updated as well to ensure continued adequacy.

This replaces the old plan, which was to go in steps from mitigations targeting the capability to a red team validated safety case to prevention of access.

I notice that the red-teaming here is only mentioned post-deployment. It should be in pre-deployment as well, no, as part of either the safety case or the review of the safety case, or both? But the core idea is the same: you need to do mitigation and then make a safety case. So essentially this is universalizing the old Security Level 2.

Their logic is that if prevention of access (the old Security Level 3) is necessary, then that is a natural extension. I get why they’re saying that, but I’d like to keep this explicit and around anyway – you can almost never be too explicit in situations like this. Clarity almost never hurts, and if thinking changes on this I’d like to track it.

The definition of mitigations here is still ‘take adequate mitigations,’ which does not tell us much. Google has wide latitude, as long as it makes its safety case.

Here are the mitigations:

As I said last time: One of these things is not like the others.

The CBRN and Cyber capability thresholds seem set rather high, but also I expect mitigations to be involved well below these thresholds. I’d also want clarification: cost reductions compared to what baseline? Presumably this is versus not having general LLM access. If it’s versus the next best available LLM, then an order of magnitude or ‘drastic reduction’ is too large a threshold, since one could easily approach that in distinct steps. In general I worry these new threshold descriptions are substantially higher than the previous ones.

We now move on to machine learning R&D. For uplift 1, they move from 3x rates of algorithmic progress to 2x overall progress for Machine Learning Uplift, specifying 2024 rates as the baseline. I think that’s a better way to define the threshold to match what we actually care about, so I like this change.

Then there’s Machine Learning R&D autonomy level 1. This isn’t quite as ambitious as the old version’s Machine Learning R&D 2, which was essentially ‘the singularity arrives next Tuesday.’ This is at comparable costs rather than a drastic reduction in costs.

My guess is this still probably does mean the singularity arrives not too long from next Tuesday, given how much compute is available to Google, although this isn’t obvious. Even if it’s currently only ‘about as efficient’ as humans would be, that phase won’t last long. If you had a model that was anywhere near being able to do this, and no one else had released one that was better, it seems rather crazy to consider releasing it.
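Pulling the thresholds discussed so far into one place, here is a compact rendering of the 2.0 CCL set. The struct and the one-line trigger descriptions are my paraphrase of the framework’s categories, not DeepMind’s own wording or exact definitions.

```cpp
// My paraphrase of the DeepMind 2.0 CCL set; not the framework's own wording.
#include <string>
#include <vector>

enum class CCLKind { Misuse, DeceptiveAlignment };

struct CriticalCapabilityLevel {
    std::string name;
    CCLKind kind;
    std::string trigger;  // rough paraphrase of the threshold
};

const std::vector<CriticalCapabilityLevel> kCCLs = {
    {"CBRN uplift", CCLKind::Misuse,
     "assists in developing, preparing, or executing a CBRN attack"},
    {"Cyber uplift", CCLKind::Misuse,
     "assists in developing, preparing, or executing a cyber attack"},
    {"ML R&D uplift 1", CCLKind::Misuse,
     "roughly 2x overall AI progress relative to 2024 rates"},
    {"ML R&D autonomy 1", CCLKind::Misuse,
     "automates AI R&D at costs comparable to human researchers"},
    {"Instrumental reasoning 1", CCLKind::DeceptiveAlignment,
     "baseline ability to undermine human control, absent mitigations"},
    {"Instrumental reasoning 2", CCLKind::DeceptiveAlignment,
     "automated monitoring is no longer an adequate mitigation"},
};
```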

This graph, which I also showed in the Meta section, is from the 1.0 version of the framework. It’s not in 2.0, but it still does seem to be adhered to in spirit:

The security levels are listed in terms of the RAND security protocols.

Since the document calls for levels 2, 3, and 4, here’s a Claude summary of levels 2-4:

SL2 – Professional Protection

  • Defends against professional but opportunistic hackers

  • Focuses on comprehensive implementation of security fundamentals and industry best practices

  • Emphasizes closing common security gaps and vulnerabilities

  • Key aspects include email security, password policies, and multi-factor authentication

SL3 – Advanced Protection

  • Defends against cybercrime syndicates and insider threats

  • Aggressively reduces attack surface

  • Implements defense-in-depth strategies

  • Focuses heavily on restricting and monitoring access to weights

  • Requires two independent security layers

SL4 – State-Level Protection

  • Defends against standard operations by leading cyber-capable institutions (like foreign intelligence agencies)

  • Requires significant compromises on productivity and convenience

  • Implements hardware-level security measures

  • Uses confidential computing

  • Requires four independent security layers

If you’re Google and you’re not at least at SL2 for every model worth deploying, why the hell not? I think this should be the minimum even with no dangerous capabilities.

There is a reason why even level 1 of Version 1.0’s protocol was described as similar to SL3. I don’t buy the arguments in Table 1 that cybersecurity improvements are likely to diminish necessary security levels. I think it’s rather absurd to not move to at least SL3 if any critical capability levels are found.

The question is at what point everything needs to be SL4, and how much this should depend upon coordination with the rest of the industry. My guess is you only need it for Machine Learning R&D, but there I do think you need something like that, even though implementation will be expensive. For ‘uplift 1’ SL4 seems roughly reasonable to me, but for ‘autonomy 1’ I think even SL4 is inadequate.

This is a longstanding disagreement, as I found their previous version insufficient.

For the deceptive alignment case, ‘automated monitoring’ can be applied as the initial mitigation, as noted above, but this is a very temporary solution if it’s not part of defense-in-depth as a backup plan. The ‘automated monitoring only’ phase likely wouldn’t last long, and risks making things worse via teaching the system how to evade detection.

I do find the answer of ‘future work, we don’t know how’ for mitigation of Instrumental Reasoning Level 2 to be acceptable. It’s a hard problem. Better to admit you don’t know, so long as this also means that if you’re in this situation before you figure it out, you’re committing to halting and catching fire until you figure it out.

I also appreciated that there is now at least a brief section on governance, which was not in 1.0, detailing who will review implementation of the framework (the AGI Safety Council) and noting several other relevant councils. I would like to see more of a formal process but this is a start.

I also appreciated the intention to share information with ‘appropriate government authorities’ if the risks involved are triggered, even if they are then mitigated. They don’t commit to telling anyone else, but will consider it.

Another great note was saying ‘everyone needs to do this.’ Safety of models is a public good, and somewhat of a Stag Hunt, where we all win if everyone who is at the frontier cooperates. If you can outrun the bear, but the bear still eats someone else’s model, in this case you are not safe.

However, there were also a few steps back. The specific 6x compute or 3 month threshold was removed for a more flexible rule. I realize that 6x was stingy already and a hard-and-fast rule will sometimes be foolish, but I believe we do need hard commitments in such places at current trust levels.

So we have steps forward in (some details here not mentioned above):

  1. Deceptive alignment as a threat model.

  2. Capability levels are mapped to mitigation levels.

  3. Governance.

  4. Disclosures.

  5. Using the RAND protocol levels.

  6. Adjustment of threshold details.

  7. Centralizing role of safety cases.

  8. Changed ‘pass condition’ to ‘alert threshold’ which seems better.

  9. Emphasis on confidential computing.

  10. Explicit calls for industry-wide cooperation, willingness to coordinate.

  11. Explicit intention of sharing results with government if thresholds are triggered.

And we have a few steps back:

  1. Removal of autonomy threshold (I will trade this for deceptive alignment but would prefer to have both, and am still sad about missing persuasion.)

  2. Removal of the 6x compute and 3 month thresholds for in-training testing.

  3. Reduced effective security requirements in some places.

  4. Less explicitness about shutting down access if necessary.

Overall, it’s good news. That’s definitely a step forward, and it’s great to see DeepMind publishing revisions and continuing to work on the document.

One thing missing from the current wave of safety frameworks is robust risk governance. The Centre for Long-Term Resilience argues, in my opinion compellingly, that these documents need risk governance to serve their full intended purpose.

CLTR: Frontier safety frameworks help AI companies manage extreme risks, but gaps in effective risk governance remain. Ahead of the Paris AI Action Summit next week, our new report outlines key recommendations on how to bridge this gap.

Drawing on the best practice 3 lines framework widely used in other safety critical industries like nuclear, aviation and healthcare, effective risk governance includes:

  1. Decision making ownership (first-line)

  2. Advisory oversight (second-line)

  3. Assurance (third line)

  4. Board-level oversight

  5. Culture

  6. External transparency

Our analysis found that evidence for effective risk governance across currently published frontier AI safety frameworks is low overall.

While some aspects of risk governance are starting to be applied, the overall state of risk governance implementation in safety frameworks appears to be low, across all companies.

This increases the chance of harmful models being released because of aspects like unclear risk ownership, escalation pathways and go/no-go decisions about when to release models.

By using the recommendations outlined in our report, overall effectiveness of safety frameworks can be improved by enhancing risk identification, assessment, and mitigation.

It is an excellent start to say that your policy has to say what you will do. You then need to ensure that the procedures are laid out so it actually happens. They consider the above an MVP of risk governance.

I notice that the MVP does not seem to be optimizing for being on the lower right of this graph? Ideally, you want to start with things that are valuable and easy.

Escalation procedures and go/no-go decisions seem to be properly identified as high value things that are relatively easy to do. I think if anything they are not placing enough emphasis on cultural aspects. I don’t trust any of these frameworks to do anything without a good culture backing them up.

DeepMind has improved its framework, but it has a long way to go. No one has what I would consider a sufficient framework yet, although I believe OpenAI and Anthropic’s attempts are farther along.

The spirit of the documents is key. None of these frameworks are worth much if those involved are looking only to obey the technical requirements. They’re not designed to make adversarial compliance work, even if that were possible. They only work if people genuinely want to be safe. That’s a place Anthropic has a huge edge.

Meta vastly improved its framework, in that it previously didn’t have one, and now the new version at least admits that they essentially don’t have one. That’s a big step. And of course, even if they did have a real framework, I would not expect them to abide by its spirit. I do expect them to abide by the spirit of this one, because the spirit of this one is to not care.

The good news is, now we can talk about all of that.

On the Meta and DeepMind Safety Frameworks Read More »