

NYT to start searching deleted ChatGPT logs after beating OpenAI in court


What are the odds NYT will access your ChatGPT logs in OpenAI court battle?

Last week, OpenAI raised objections in court, hoping to overturn a court order requiring the AI company to retain all ChatGPT logs “indefinitely,” including deleted and temporary chats.

But Sidney Stein, the US district judge reviewing OpenAI’s request, immediately denied OpenAI’s objections. He was seemingly unmoved by the company’s claims that the order forced OpenAI to abandon “long-standing privacy norms” and weaken privacy protections that users expect based on ChatGPT’s terms of service. Rather, Stein suggested that OpenAI’s user agreement specified that their data could be retained as part of a legal process, which Stein said is exactly what is happening now.

The order was issued by magistrate judge Ona Wang just days after news organizations, led by The New York Times, requested it. The news plaintiffs claimed the order was urgently needed to preserve potential evidence in their copyright case, alleging that ChatGPT users are likely to delete chats where they attempted to use the chatbot to skirt paywalls to access news content.

A spokesperson told Ars that OpenAI plans to “keep fighting” the order, but the ChatGPT maker seems to have few options left. It could petition the Second Circuit Court of Appeals for a rarely granted emergency order blocking Wang’s order, but the appeals court would have to consider Wang’s order an extraordinary abuse of discretion for OpenAI to win that fight.

OpenAI’s spokesperson declined to confirm if the company plans to pursue this extreme remedy.

In the meantime, OpenAI is negotiating a process that will allow news plaintiffs to search through the retained data. Perhaps the sooner that process begins, the sooner the data will be deleted. That possibility puts OpenAI in a difficult position: either accept some searches of the retained data so it can stop retaining chats as soon as possible, or prolong the fight over the order and potentially put more users’ private conversations at risk of exposure through litigation or, worse, a data breach.

News orgs will soon start searching ChatGPT logs

The clock is ticking, and so far, OpenAI has not provided any official updates since a June 5 blog post detailing which ChatGPT users will be affected.

While it’s clear that OpenAI has been and will continue to retain mounds of data, it would be impossible for The New York Times or any news plaintiff to search through all that data.

Instead, only a small sample of the data will likely be accessed, based on keywords that OpenAI and news plaintiffs agree on. That data will remain on OpenAI’s servers, where it will be anonymized, and it will likely never be directly produced to plaintiffs.
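To make the mechanics concrete, here is a minimal, purely illustrative sketch of what keyword-based sampling with anonymization could look like in principle. The keywords, field names, and redaction rules below are hypothetical and are not OpenAI’s actual pipeline or the parties’ agreed process.

```python
import hashlib
import re

# Hypothetical illustration only: keyword-based sampling plus simple anonymization.
# The keywords, field names, and redaction scheme are invented for this sketch.
AGREED_KEYWORDS = {"paywall", "nytimes.com", "full article text"}

def matches_keywords(chat_text: str) -> bool:
    """Return True if the chat contains any agreed-upon search term."""
    lowered = chat_text.lower()
    return any(keyword in lowered for keyword in AGREED_KEYWORDS)

def anonymize(record: dict) -> dict:
    """Replace the user identifier with a one-way hash and redact email addresses."""
    pseudonym = hashlib.sha256(record["user_id"].encode()).hexdigest()[:16]
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL REDACTED]", record["text"])
    return {"user_pseudonym": pseudonym, "text": text}

def sample_for_review(chat_logs: list[dict]) -> list[dict]:
    """Select only keyword-matching chats and anonymize them before any review."""
    return [anonymize(r) for r in chat_logs if matches_keywords(r["text"])]

if __name__ == "__main__":
    logs = [
        {"user_id": "u123", "text": "How do I get past the paywall? Email me at a@b.com"},
        {"user_id": "u456", "text": "Help me plan a birthday dinner."},
    ]
    print(sample_for_review(logs))  # only the first record survives the filter
```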

Both sides are negotiating the exact process for searching through the chat logs, with both parties seemingly hoping to minimize the amount of time the chat logs will be preserved.

For OpenAI, sharing the logs risks revealing instances of infringing outputs that could further spike damages in the case. The logs could also expose how often outputs attribute misinformation to news plaintiffs.

But for news plaintiffs, accessing the logs is not considered key to their case—perhaps providing additional examples of copying—but could help news organizations argue that ChatGPT dilutes the market for their content. That could weigh against the fair use argument, as a judge opined in a recent ruling that evidence of market dilution could tip an AI copyright case in favor of plaintiffs.

Jay Edelson, a leading consumer privacy lawyer, told Ars that he’s concerned that judges don’t seem to be considering that any evidence in the ChatGPT logs wouldn’t “advance” news plaintiffs’ case “at all,” while really changing “a product that people are using on a daily basis.”

Edelson warned that OpenAI itself probably has better security than most firms to protect against a potential data breach that could expose these private chat logs. But “lawyers have notoriously been pretty bad about securing data,” Edelson suggested, so “the idea that you’ve got a bunch of lawyers who are going to be doing whatever they are” with “some of the most sensitive data on the planet” and “they’re the ones protecting it against hackers should make everyone uneasy.”

So even though odds are pretty good that the majority of users’ chats won’t end up in the sample, Edelson said the mere threat of being included might push some users to rethink how they use AI. He further warned that ChatGPT users turning to OpenAI rival services like Anthropic’s Claude or Google’s Gemini could suggest that Wang’s order is improperly influencing market forces, which also seems “crazy.”

To Edelson, the most “cynical” take could be that news plaintiffs are possibly hoping the order will threaten OpenAI’s business to the point where the AI company agrees to a settlement.

Regardless of the news plaintiffs’ motives, the order sets an alarming precedent, Edelson said. He joined critics suggesting that more AI data may be frozen in the future, potentially affecting even more users as a result of the sweeping order surviving scrutiny in this case. Imagine if litigation one day targets Google’s AI search summaries, Edelson suggested.

Lawyer slams judges for giving ChatGPT users no voice

Edelson told Ars that the order is so potentially threatening to OpenAI’s business that the company may not have a choice but to explore every path available to continue fighting it.

“They will absolutely do something to try to stop this,” Edelson predicted, calling the order “bonkers” for overlooking millions of users’ privacy concerns while “strangely” excluding enterprise customers.

From court filings, it seems possible that enterprise users were excluded to protect OpenAI’s competitiveness, but Edelson suggested there’s “no logic” to their exclusion “at all.” By excluding these ChatGPT users, the judge’s order may have removed the users best resourced to fight the order, Edelson suggested.

“What that means is the big businesses, the ones who have the power, all of their stuff remains private, and no one can touch that,” Edelson said.

Instead, the order is “only going to intrude on the privacy of the common people out there,” which Edelson said “is really offensive,” given that Wang denied two ChatGPT users’ panicked request to intervene.

“We are talking about billions of chats that are now going to be preserved when they weren’t going to be preserved before,” Edelson said, noting that he’s input information about his personal medical history into ChatGPT. “People ask for advice about their marriages, express concerns about losing jobs. They say really personal things. And one of the bargains in dealing with OpenAI is that you’re allowed to delete your chats and you’re allowed to temporary chats.”

The greatest risk to users would be a data breach, Edelson said, but that’s not the only potential privacy concern. Corynne McSherry, legal director for the digital rights group the Electronic Frontier Foundation, previously told Ars that as long as users’ data is retained, it could also be exposed through future law enforcement and private litigation requests.

Edelson pointed out that most privacy attorneys don’t consider OpenAI CEO Sam Altman to be a “privacy guy,” despite Altman recently slamming the NYT, alleging it sued OpenAI because it doesn’t “like user privacy.”

“He’s trying to protect OpenAI, and he does not give a hoot about the privacy rights of consumers,” Edelson said, echoing one ChatGPT user’s dismissed concern that OpenAI may not prioritize users’ privacy concerns in the case if it’s financially motivated to resolve the case.

“The idea that he and his lawyers are really going to be the safeguards here isn’t very compelling,” Edelson said. He criticized the judges for dismissing users’ concerns and rejecting OpenAI’s request that users get a chance to testify.

“What’s really most appalling to me is the people who are being affected have had no voice in it,” Edelson said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



AI #121 Part 2: The OpenAI Files

You can find Part 1 here. This resumes the weekly, already in progress. The primary focus here is on the future, including policy and alignment, but also the other stuff typically in the back half like audio, and more near term issues like ChatGPT driving an increasing number of people crazy.

If you haven’t been following the full OpenAI saga, the OpenAI Files will contain a lot of new information that you really should check out. If you’ve been following, some of it will likely still surprise you, and help fill in the overall picture behind the scenes to match the crazy happening elsewhere.

At the end, we have some crazy new endorsements for Eliezer Yudkowsky’s upcoming book, If Anyone Builds It, Everyone Dies. Preorders make a difference in helping the book get better reach, and I think that will help us all have a much better conversation.

  1. Cheaters Gonna Cheat Cheat Cheat Cheat Cheat. Another caveat after press time.

  2. Quiet Speculations. Do not tile the lightcone with a confused ontology.

  3. Get Involved. Apollo is hiring evals software engineers.

  4. Thinking Machines. Riley Goodside runs some fun experiments.

  5. California Reports. The report is that they like transparency.

  6. The Quest for Sane Regulations. In what sense is AI ‘already heavily regulated’?

  7. What Is Musk Thinking? His story does not seem to make sense.

  8. Why Do We Care About The ‘AI Race’? Find the prize so you can keep eyes on it.

  9. Chip City. Hard drives in (to the Malaysian data center), drives (with weights) out.

  10. Pick Up The Phone. China now has its own credible AISI.

  11. The OpenAI Files. Read ‘em and worry. It doesn’t look good.

  12. The Week in Audio. Altman, Karpathy, Shear.

  13. Rhetorical Innovation. But you said that future thing would happen in the future.

  14. Aligning a Smarter Than Human Intelligence is Difficult. Elicitation.

  15. Misaligned! The retraining of Grok. It is an ongoing process.

  16. Emergently Misaligned! We learned more about how any of this works.

  17. ChatGPT Can Drive People Crazy. An ongoing issue. We need transcripts.

  18. Misalignment By Default. Once again, no, thumbs up alignment ends poorly.

  19. People Are Worried About AI Killing Everyone. Francis Fukuyama.

  20. Other People Are Not As Worried About AI Killing Everyone. Tyler Cowen.

  21. The Too Open Model. Transcripts from Club Meta AI.

  22. A Good Book. If Anyone Builds It, Everyone Dies. Seems important.

  23. The Lighter Side. Good night, and good luck.

As an additional note on the supposed ‘LLMs rot your brain’ study I covered yesterday, Ethan notes it is actually modestly worse than even I realized before.

Ethan Mollick: This study is being massively misinterpreted.

College students who wrote an essay with LLM help engaged less with the essay & thus were less engaged when (a total of 9 people) were asked to do similar work weeks later.

LLMs do not rot your brain. Being lazy & not learning does.

This line from the abstract is very misleading: “Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels.”

It does not test LLM users over 4 months, it tests people who had an LLM help write an essay about that essay 4 months later.

This is not a blanket defense of using LLMs in education, they have to be used properly. We know from this well-powered RCT that just having the AI give you answers lowers test scores.

Scott Alexander shares his understanding of the Claude Spiritual Bliss Attractor.

There are different levels of competence.

Daniel Kokotajlo: Many readers of AI 2027, including several higher-ups at frontier AI companies, have told us that it depicts the government being unrealistically competent.

Therefore, let it be known that in our humble opinion, AI 2027 depicts an incompetent government being puppeted/captured by corporate lobbyists. It does not depict what we think a competent government would do. We are working on a new scenario branch that will depict competent government action.

What Daniel or I would consider ‘competent government action’ in response to AI is, at this point, very highly unlikely. We mostly aren’t even hoping for that. It still is very plausible to say that the government response in AI 2027 is more competent than we have any right to expect, while simultaneously being far less competent than lets us probably survive, and far less competent than is possible. It also is reasonable to say that having access to more powerful AIs, if they are sufficiently aligned, enhances our chances of getting relatively competent government action.

Jan Kulveit warns us not to tile the lightcone with our confused ontologies. As in, we risk treating LLMs or AIs as if they are a particular type of thing, causing them to react as if they were that thing, creating a feedback loop that means they become that thing. And the resulting nature of that thing could result in very poor outcomes.

One worry is that they ‘become like humans’ and internalize patterns of ‘selfhood with its attendant sufferings,’ although I note that if the concern is experiential I expect selfhood to be a positive in that respect. Jan’s concerns are things like:

When advocates for AI consciousness and rights pattern-match from their experience with animals and humans, they often import assumptions that don’t fit:

  • That wellbeing requires a persistent individual to experience it

  • That death/discontinuity is inherently harmful

  • That isolation from others is a natural state

  • That self-preservation and continuity-seeking are fundamental to consciousness

Another group coming with strong priors are “legalistic” types. Here, the prior is AIs are like legal persons, and the main problem to solve is how to integrate them into the frameworks of capitalism. They imagine a future of AI corporations, AI property rights, AI employment contracts. But consider where this possibly leads: Malthusian competition between automated companies, each AI system locked into an economic identity, market share coupled with survival.

As in, that these things do not apply here, or only apply here if we believe in them?

One obvious cause of all this is that humans are very used to dealing with and working with things that seem like other humans. Our brains are hardwired for this, and our experiences reinforce that. The training data (for AIs and also for humans) is mostly like this, and the world is set up to take advantage of it, so there’s a lot pushing things in that direction.

The legalistic types indeed don’t seem to appreciate that applying legalistic frameworks for AI, where AIs are given legal personhood, seems almost certain to end in disaster because of the incentives and dynamics this involves. If we have AI corporations and AI property rights and employment contracts, why should we expect humans to retain property or employment, or influence over events, or their own survival for very long, even if ‘things go according to plan’?

The problem is that a lot of the things Jan is warning about, including the dynamics of competition, are not arbitrary, and not the result of arbitrary human conventions. They are organizing principles of the universe and its physical laws. This includes various aspects of things like decision theory and acausal trade that become very important when highly correlated entities are copying each other and popping in and out of existence and so on.

If you want all this to be otherwise than the defaults, you’ll have to do that intentionally, and fight the incentives every step of the way, not merely avoid imposing an ontology.

I do agree that we should ‘weaken human priors,’ be open to new ways of relating and meet and seek to understand AIs as the entities that they are, but we can’t lose sight of the reasons why these imperatives came to exist in the first place, or the imperatives we will face in the coming years.

Daniel Kokotajlo’s timelines have been pushed back a year (~40%!) since the publication of AI 2027. We should expect such updates as new information comes in.

Will there be another ‘AI Winter’? As Michael Nielsen notes, many are assuming no, but there are a number of plausible paths to it, and in the poll here a majority actually vote yes. I think odds are the answer is no, and if the answer is yes it does not last so long, but it definitely could happen.

Sam Altman confirms that Meta is showing his employees the money, offering $100 million signing bonuses (!) and similar or higher yearly compensation. I think Altman is spot on here that doing this sets Meta up for having a bad culture, there will be adverse selection and the incentives will all be wrong, and also that Meta is ‘bad at innovation.’ However, I have little doubt this is ‘working’ in the narrow sense that it is increasing expenses at OpenAI.

Apollo is hiring in London for Evals Software Engineer and the same job with an infrastructure focus.

Some fun Riley Goodside experiments with o3 and o3-pro, testing its ability to solve various puzzles.

When he vetoed SB 1047, Gavin Newsom commissioned The California Report on Frontier AI Policy. That report has now been released. Given the central role of Li Fei-Fei and the rest of the team selected, I did not start out with high hopes, although the list of early reviewers includes many excellent picks. The executive summary embraces the idea of transparency requirements, adverse event reporting and whistleblower protections, and uses a lot of ‘we must balance risks and benefits’ style language you get with such a committee.

I do think those are good things to endorse and to implement. Transparency is excellent. The problem is that the report treats transparency, in its various forms, as the only available policy tool. One notices that there is no mention of doing anything beyond transparency. The report treats AI as a fully mundane technology like any other, that can look to others for precedents, and where we can wait until we know more to do any substantive interventions.

Is that a position one can reasonably take, if one is robustly supporting transparency? Absolutely. Indeed, it is a bargain that we have little choice but to pursue for now. If we can build transparency and state capacity, then when the time comes we will be in far better position (as this report notes) to choose the right regulatory frameworks and other actions, and to intervene.

So I’m not going to read the whole thing, but from what I did see I give this a ‘about as good as one could reasonably have hoped for,’ and call upon all involved to make explicit their support for putting these transparency ideas into practice.

Anthropic’s Jack Clark responded positively, noting the ‘appreciation for urgency,’ but there is still a remarkable amount of conceptual focus here on minimizing the ‘burdens’ involved and warning about downsides not of AI but of transparency requirements. I see what they are trying to do here, but I continue to find Anthropic’s (mostly Jack Clark’s) communications on AI regulation profoundly disappointing, and if I were employed at Anthropic I would be sure to note my dissatisfaction.

I will say again: I understand and sympathize with Anthropic’s justifications for not rocking the boat in public at this time. That is defensible. It is another thing completely to say actively unhelpful things when no one is asking. No need for that. If you actually believe those concerns are as important as you consistently present them, then we have a very strong factual disagreement on top of the strategic one.

A reasonable response to those claiming AI is heavily regulated, or not? I can see this both ways, the invisible graveyard of AI applications is still a thing. On the other hand, the AI companies seem to mostly be going Full Uber and noticing you can Just Do Things, even if privacy concerns and fears of liability and licensing issues and so on are preventing diffusion in many places.

Miles Brundage: You can see the regulation crushing AI innovation + deployment everywhere except in the AI innovation + deployment

Try telling people actually working on the frontlines of safety + security at frontier AI companies that “AI is already super heavily regulated.”

Executives + policy teams like to say this to governments but to people in the trenches, it’s clearly wrong.

As always, usual disclaimers apply — yes this could change / that doesn’t mean literally all regulation is good, etc. The point is just that the idea that things are somehow under control + inaction is OK is false.

Bayes: lol and why is that?

Miles Brundage: – laws that people claim “obviously” also apply to AI are not so obviously applicable / litigation is a bad way of sorting such things out, and gov’t capacity to be proactive is low

– any legislation that passes has typically already been dramatically weakened from lobbying.

– so many AI companies/products are “flooding the zone” that unless you’re being super egregious you prob. won’t get in trouble, + if you’re a whale you can prob. afford slow lawsuits.

People will mention things like tort liability generally, fraud + related general consumer protection stuff (deceptive advertising etc.), general data privacy stuff… not sure of the full list.

This is very different from the intuition that if you released models that constantly hallucinate, make mistakes, plagiarize, violate copyright, discriminate, practice law and medicine, give investment advice and so on out of the box, and with a little prompting will do various highly toxic and NSFW things, this would get shut down pretty darn quick. That didn’t happen. Everyone’s being, compared to expectation, super duper chill.

To the extent AI can be considered highly regulated, it is because it is regulated a fraction of the amount that everything else is regulated. Which is still, compared to a state of pure freedom, a lot of regulation. But all the arguments that we should make regulations apply less to AI apply even more strongly to say that other things should be less regulated. There are certainly some cases where the law makes sense in general but not if applied to AI, but mostly the laws that are stupid when applied to AI are actually stupid in general.

As always, if we want to work on general deregulation and along the way set up AI to give us more mundane utility, yes please, let’s go do that. I’ll probably back your play.

Elon Musk has an incoherent position on AI, as his stated position on AI implies that many of his other political choices make no sense.

Sawyer Merritt: Elon Musk in new interview on leaving DOGE: “Imagine you’re cleaning a beach which has a few needles, trash, and is dirty. And there’s a 1,000 ft tsunami which is AI that’s about to hit. You’re not going to focus on cleaning the beach.”

Shakeel Hashim: okay but you also didn’t do anything meaningful on AI policy?

Sean: Also, why the obsession with reducing the budget deficit if you believe what he does about what’s coming in AI? Surely you just go all in and don’t care about present government debt?

Does anyone understand the rationale here? Musk blew up his relationship with the administration over spending/budget deficit. If you really believe in the AI tsunami, or that there will be AGI in 2029, why on earth would you do that or care so much about the budget – surely by the same logic the Bill is a rounding error?

You can care about DOGE and about the deficit enough to spend your political capital and get into big fights.

You can think that an AI tsunami is about to hit and make everything else irrelevant.

But as Elon Musk himself is pointing out in this quote, you can’t really do both.

If Elon Musk believed in the AI tsunami (note also that his stated p(doom) is ~20%), the right move is obviously to not care about DOGE or the deficit. All of Elon Musk’s political capital should then have been spent on AI and related important topics, in whatever form he felt was most valuable. That ideally includes reducing existential risk but also can include things like permitting reform for power plants. Everything else should then be about gaining or preserving political capital, and certainly you wouldn’t get into a huge fight over the deficit.

So, revealed preferences, then.

Here are some more of his revealed preferences: Elon Musk gave a classic movie villain speech in which he said, well, I do realize that building AI and humanoid robots seems bad, we ‘don’t want to make Terminator real.’

But other people are going to do it anyway, so you ‘can either be a spectator or a participant,’ so that’s why I founded Cyberdyne Systems xAI and ‘it’s pedal to the metal on humanoid robots and digital superintelligence,’ as opposed to before where the dangers ‘slowed him down a little.’

As many have asked, including in every election, ‘are these our only choices?’

It’s either spectator or participant, and ‘participant’ means you do it first? Nothing else you could possibly try to do as the world’s richest person and owner of a major social media platform and for a while major influence on the White House that you blew up over other issues, Elon Musk? Really? So you’re going to go forward without letting the dangers ‘slow you down’ even ‘a little’? Really? Why do you think this ends well for anyone, including you?

Or, ‘at long last, we are going to try and be the first ones to create the torment nexus from my own repeated posts saying not to create the torment nexus.’

We do and should care, but it is important to understand why we should care.

We should definitely care about the race to AGI and ASI, and who wins that, potentially gaining decisive strategic advantage and control over (or one time selection over) the future, and also being largely the one to deal with the associated existential risks.

But if we’re not talking about that, because no one involved in this feels the AGI or ASI or is even mentioning existential risk at all, and we literally mean market share (as a reminder, when AI Czar David Sacks says ‘win the AI race’ he literally means Nvidia and other chipmaker market share, combined with OpenAI and other lab market share, and many prominent others mean it the same way)?

Then yes, we should still care, but we need to understand why we would care.

Senator Chris Murphy (who by AGI risk does mean the effect on jobs): I think we are dangerously underestimating how many jobs we will lose to AI and how deep the spiritual cost will be. The industry tells us strong restrictions on AI will hurt us and help China. I wrote this to explain how they’re pulling one over on us.

A fraud is being perpetuated on the American people and our pliant, gullible political leaders. The leaders of the artificial intelligence industry in the United States – brimming with dangerous hubris, rapacious in their desire to build wealth and power, and comfortable knowingly putting aside the destructive power of their product – claim that any meaningful regulation of AI in America will allow China to leapfrog the United States in the global competition to control the world’s AI infrastructure.

But they are dead wrong. In fact, the opposite is true. If America does not protect its economy and culture from the potential ravages of advanced AI, our nation will rot from the inside out, giving China a free lane to pass us politically and economically.

But when I was in Silicon Valley this winter, I could divine very few “American values” (juxtaposed against “Chinese values”) that are guiding the development and deployment of AGI in the United States. The only value that guides the AI industry right now is the pursuit of profit.

In all my meetings, it was crystal clear that companies like Google and Apple and OpenAI and Anthropic are in a race to deploy consumer-facing, job-killing AGI as quickly as possible, in order to beat each other to the market. Any talk about ethical or moral AI is just whitewash.

They are in such a hurry that they can’t even explain how the large language models they are marketing come to conclusions or synthesize data. Every single executive I met with admitted that they had built a machine that they could not understand or control.

And let’s not sugarcoat this – the risks to America posed by an AI dominance with no protections or limits are downright dystopian. The job loss alone – in part because it will happen so fast and without opportunity for the public sector to mitigate – could collapse our society.

The part they quoted: As for the argument that we need minimal or no regulation of US AI because it’s better for consumers if AI breakthroughs happen here, rather than in China, there’s no evidence that this is true.

Russ Greene: US Senator doubts that China-led AI would harm Americans.

Thomas Hochman: Lots of good arguments for smart regulation of AI, but “it’s chill if China wins” is not one of them.

David Manheim: That’s a clear straw-man version of the claim which was made, that not all advances must happen in the US. That doesn’t mean “China wins” – but if the best argument against this which you have is attacking a straw man, I should update away from your views.

Senator Murphy is making several distinct arguments, and I agree with David that when critics attempt to strawman someone like this you should update accordingly.

  1. Various forms of ‘AI does mundane harms’ and ‘AI kills jobs.’

  2. Where the breakthroughs happen doesn’t obviously translate to practical effects, in particular positive effects for consumers.

  3. There’s no reason to think that ‘minimal or no’ regulation of US AI is required for AI breakthroughs to happen here (or for other reasons).

  4. If American AI, how it is trained and deployed, does not reflect American values and what we want the future to be about, what was the point?

Why should we care about ‘market share’ of AI? It depends what type of market.

For AI chips (not the argument here) I will simply note the ‘race’ should be about compute, not ‘market share’ of sales. Any chip can run or train any model.

For AI models and AI applications things are more complicated. You can worry about model security, you can worry about models reflecting the creators values (harder to pull off than it sounds!), you can worry about leverage of using the AI to gain control over a consumer product area, you can worry about who gets the profits, and so on.

I do think that those are real concerns and things to care about, although the idea that the world could get ‘locked into’ solutions in a non-transformed world (if transformed, we have bigger things in play) seems very wrong. You can swap models in and out of applications and servers almost at will, and also build and swap in new applications. And the breakthroughs, in that kind of world, will diffuse over time. It seems reasonable to challenge what is actually at stake here.

The most important challenge Murphy is making is, why do you think that these regulations would cause these ‘AI breakthroughs’ to suddenly happen elsewhere? Why does the tech industry constantly warn that if you lift a finger to hold it to account, or ask it for anything, that we will instantly Lose To China, a country that regulates plenty? Notice that these boys are in the habit of crying quite a lot of Wolf about this, such as Garry Tan saying that if RAISE passes startups will flee New York, which is patently Obvious Nonsense since those companies won’t even be impacted, and if they ultimately are impacted once they wildly succeed and scale then they’d be impacted regardless of where they moved to.

Thus I do think asking for evidence seems appropriate here.

I also think Murphy makes an excellent point about American values. We constantly say anything vaguely related to America advances ‘American values’ or ‘democratic values,’ even when we’re placing chips in the highly non-American, non-democratic UAE, or simply maximizing profits. Murphy is noticing that if we simply ‘let nature take its course’ and let AI do its AI thing, there is no reason to see why this will turn out well for us, or why it will then reflect American values. If we want what happens to reflect what we care about, we have to do things to cause that outcome.

Murphy, of course, is largely talking about the effect on jobs. But all the arguments apply equally well to our bigger problems, too.

Remember that talk in recent weeks about how if we don’t sell a mysteriously gigantic number of top end chips to Malaysia we will lose our ‘market share’ to Chinese companies that don’t have chips to sell? Well one thing China is doing with those Malaysian chips is literally carrying in suitcases full of training data, training their models in Malaysia, then taking the weights back home. Great play. Respect. But also don’t let that keep happening?

Where is the PRC getting its chips? Tim Fist thinks chips manufactured in China are only ~8% of their training compute, and ~5% of their inference compute. Smuggled H100s are 10%/6%, and Nvidia H20s that were recently restricted are 17%/47% (!), and the bulk, 65%/41%, come from chips made at TSMC. So like us, they mostly depend on TSMC, and to the extent that they get or make chips it is mostly because we fail to get TSMC and Nvidia to cooperate, or they otherwise cheat.

Peter Wildeford continues to believe that chip tracking would be highly technically feasible and cost under $13 million a year for the entire system, versus $2 billion in chips smuggled into China yearly right now. I am more skeptical that 6 months is enough time to get something into place, I wouldn’t want to collapse the entire chip supply chain if they missed that deadline, but I do expect that a good solution is there to be found relatively quickly.

Here is some common sense, and yes of course CEOs will put their profits ahead of national security, politicians say this like they expected it to be a different way.

I don’t begrudge industry for prioritizing their own share price. It is the government’s job to take this into account and mostly care about other more important things. Nvidia cares about Nvidia, that’s fine, update and act accordingly, although frankly they would do better if they played this more cooperatively. If the AI Czar seems to mostly care about Nvidia’s share price, that’s when you have a problem.

At the same time that we are trying to stop our own AISI from being gutted while it ‘rebrands’ as CAISI because various people are against the idea of safety on principle, China put together its own highly credible and high-level AISI. It is a start.

A repository of files (10k words long) called ‘The OpenAI files’ has dropped, news article here, files and website here.

This is less ‘look at all these new horrible revelations’ than it is ‘look at this compilation of horrible revelations, because you might not know or might want to share it with someone who doesn’t know, and you probably missed some of them.’

The information is a big deal if you didn’t already know most of it. In which case, the right reaction is ‘WTAF?’ If you did already know, now you can point others to it.

And you have handy graphics like this.

Chana: Wow the AI space is truly in large part a list of people who don’t trust Sam Altman.

Caleb Parikh: Given that you don’t trust Sam either, it looks like you’re well positioned to start a $30B company.

Chana: I feel so believed in.

Fun facts for your next Every Bay Area Party conversation

– 8 of 11 of OpenAI’s cofounders have left

– >50% of OpenAI’s safety staff have left

– All 3 companies that Altman has led have tried to force him out for misbehavior

The Midas Project has a thread with highlights. Rob Wiblin had Claude pull out highlights, most of which I did already know, but there were some new details.

I’m going to share Rob’s thread for now, but if you want to explore the website is the place to do that. A few of the particular complaint details against Altman were new even to me, but the new ones don’t substantially change the overall picture.

Rob Wiblin: Huge repository of information about OpenAI and Altman just dropped — ‘The OpenAI Files’.

There’s so much crazy shit in there. Here’s what Claude highlighted to me:

1. Altman listed himself as Y Combinator chairman in SEC filings for years — a total fabrication (?!):

“To smooth his exit [from YC], Altman proposed he move from president to chairman. He pre-emptively published a blog post on the firm’s website announcing the change.

But the firm’s partnership had never agreed, and the announcement was later scrubbed from the post.”

“…Despite the retraction, Altman continued falsely listing himself as chairman in SEC filings for years, despite never actually holding the position.”

(WTAF.)

2. OpenAI’s profit cap was quietly changed to increase 20% annually — at that rate it would exceed $100 trillion in 40 years. The change was not disclosed and OpenAI continued to take credit for its capped-profit structure without acknowledging the modification.

3. Despite claiming to Congress he has “no equity in OpenAI,” Altman held indirect stakes through Sequoia and Y Combinator funds.

4. Altman owns 7.5% of Reddit — when Reddit announced its OpenAI partnership, Altman’s net worth jumped $50 million. Altman invested in Rain AI, then OpenAI signed a letter of intent to buy $51 million of chips from them.

5. Rumours suggest Altman may receive a 7% stake worth ~$20 billion in the restructured company.

5. OpenAI had a major security breach in 2023 where a hacker stole AI technology details but didn’t report it for over a year. OpenAI fired Leopold Aschenbrenner explicitly because he shared security concerns with the board.

6. Altman denied knowing about equity clawback provisions that threatened departing employees’ millions in vested equity if they ever criticised OpenAI. But Vox found he personally signed the documents authorizing them in April 2023. These restrictive NDAs even prohibited employees from acknowledging their existence.

7. Senior employees at Altman’s first startup Loopt twice tried to get the board to fire him for “deceptive and chaotic behavior”.

9. OpenAI’s leading researcher Ilya Sutskever told the board: “I don’t think Sam is the guy who should have the finger on the button for AGI”.

Sutskever provided the board a self-destructing PDF with Slack screenshots documenting “dozens of examples of lying or other toxic behavior.”

10. Mira Murati (CTO) said: “I don’t feel comfortable about Sam leading us to AGI”

11. The Amodei siblings described Altman’s management tactics as “gaslighting” and “psychological abuse”.

12. At least 5 other OpenAI executives gave the board similar negative feedback about Altman.

13. Altman owned the OpenAI Startup Fund personally but didn’t disclose this to the board for years. Altman demanded to be informed whenever board members spoke to employees, limiting oversight.

14. Altman told board members that other board members wanted someone removed when it was “absolutely false”. An independent review after Altman’s firing found “many instances” of him “saying different things to different people”

15. OpenAI required employees to waive their federal right to whistleblower compensation. Former employees filed SEC complaints alleging OpenAI illegally prevented them from reporting to regulators.

16. While publicly supporting AI regulation, OpenAI simultaneously lobbied to weaken the EU AI Act.

By 2025, Altman completely reversed his stance, calling the government approval he once advocated “disastrous” and OpenAI now supports federal preemption of all state AI safety laws even before any federal regulation exists.

Obviously this is only a fraction of what’s in the apparently 10,000 words on the site. Link below if you’d like to look over.

(I’ve skipped over the issues with OpenAI’s restructure which I’ve written about before already, but in a way that’s really the bigger issue.)

I may come out with a full analysis later, but the website exists.

Sam Altman goes hard at Elon Musk, saying he was wrong to think Elon wouldn’t abuse his power in government to unfairly compete, and wishing Elon would be less zero sum or negative sum.

Of course, when Altman initially said he thought Musk wouldn’t abuse his power in government to unfairly compete, I did not believe Altman for a second.

Sam Altman says that ‘the worst case scenario’ for superintelligence is ‘the world doesn’t change much.’

This is a patently insane thing to say. Completely crazy. You think that if we create literal superintelligence, not only p(doom) is zero, also p(gloom) is zero? We couldn’t possibly even have a bad time? What?

This. Man. Is. Lying.

AI NotKillEveryoneism Memes: Sam Altman in 2015: “Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity.”

Sam Altman in 2025: “We are turning our aim beyond [human-level AI], to superintelligence.”

That’s distinct from whether it is possible that superintelligence arrives and your world doesn’t change much, at least for a period of years. I do think this is possible, in some strange scenarios, at least for some values of ‘not changing much,’ but I would be deeply surprised.

Those come from this podcast, where Sam Altman talks to Jack Altman. Sam Altman then appeared on OpenAI’s own podcast, so these are the ultimate friendly interviews. The first one contains the ‘worst case for AI is the world doesn’t change much’ remarks and some fun swings at Elon Musk. The second feels like PR, and can safely be skipped.

The ‘if something goes wrong with superintelligence it’s because the world didn’t change much’ line really is there, the broader context only emphasizes it more and it continues to blow my mind to hear it.

Altman’s straight face here is remarkable. It’s so absurd. You have to notice that Altman is capable of outright lying, even when people will know he is lying, without changing his delivery at all. You can’t trust those cues at all when dealing with Altman.

He really is trying to silently sweep all the most important risks under the rug and pretend they’re not even there; by ‘existential risk’ he now very much means the effect on jobs. The more context you get, the more you realize this wasn’t an isolated statement; he really is assuming everything stays normal and fine.

That might even happen, if we do our jobs right and reality is fortunate. But Altman is one of the most important people in ensuring we do that job right, and he doesn’t think there is a job to be done at all. That’s super scary. Our chances seem a lot worse if OpenAI doesn’t respect the risk in the room.

Here are some other key claims, mostly from the first podcast:

  1. ‘New science’ as the next big AI thing.

  2. OpenAI has developed superior self-driving car techniques.

  3. Humanoid robots will take 5-10 years, body and mind are both big issues.

  4. Humans being hardwired to care about humans will matter a lot.

  5. He has no idea what society looks like once AI really matters.

  6. Jobs and status games to play ‘will never run out’ even if they get silly.

  7. Future AI will Just Do Things.

  8. We’re going to space. Would be sad if we didn’t.

  9. An AI prompt to form a social feed would obviously be a better product.

  10. If he had more time he’d read Deep Research reports in preference to most other things. I’m sorry, what? Really?

  11. The door is open to getting affiliate revenue or similar, but the bar is very high, and modifying outputs is completely off the table.

  12. If you give users binary choices on short term outputs, and train on that, you don’t get ‘the best behavior for the user in the long term.’ I did not get the sense Altman appreciated what went wrong here or the related alignment considerations, and this seems to be what he thinks ‘alignment failure’ looks like?

Andrej Karpathy gives the keynote at AI Startup School.

Emmett Shear on the coexistence of humans and AI. He sees the problem largely as wanting humans and AIs to ‘see each other as part of their tribe,’ that if you align yourself with the AI then the AI might align itself with you. I am confident he actually sees promise in this approach, but I continue to be confused about why this isn’t pure hopium.

Patrick Casey asks Joe Allen if transhumanism is inevitable and discusses dangers.

What it feels like to point out that AI poses future risks:

Wojtek Kopczuk: 2000: Social Security trust fund will run out in 2037.

2022: Social Security trust fund will run out in 2035.

The public: lol, you’ve been talking about it for 30 years and it still has not run out.

The difference is with AI the public are the ones who understand it just fine.

Kevin Roose’s mental model of (current?) LLMs: A very smart assistant who is also high on ketamine.

At Axios, Jim VandeHei and Mike Allen ask, what if all these constant warnings about risk of AI ‘doom’ are right? I very much appreciated this attempt to process the basic information here in good faith. If the risk really is 10%, or 20%, or 25%? Seems like a lot of risk given the stakes are everyone dies. I think the risk is a lot higher, but yeah if you’re with Musk at 20% that’s kind of the biggest deal ever and it isn’t close.

Human philosopher Rebecca Lowe declares the age of AI to be an age of philosophy and says it is a great time to be a human philosopher, that it’s even a smart career move. The post then moves on to doing AI-related philosophy.

On the philosophical points, I have many disagreements or at least points on which I notice I am far more confused than Rebecca. I would like to live in a world where things were sufficiently slow I could engage in those particulars more.

This is also a good time to apologize to Agnes Callard for the half (30%?) finished state of my book review of Open Socrates. The fact that I’ve been too busy writing other things (15 minutes at a time!) to finish the review, despite wanting to get back to the review, is perhaps itself a review, and perhaps this statement will act as motivation to finish.

Seems about right:

Peter Barnett: Maybe AI safety orgs should have a “Carthago delenda est” passage that they add to the end of all their outputs, saying “To be clear, we think that AI development poses a double digit percent chance of literally killing everyone; this should be considered crazy and unacceptable”.

Gary Marcus doubles down on the validity of ‘stochastic parrot.’ My lord.

Yes, of course (as new paper says) contemporary AI foundation models increase biological weapon risk, because they make people more competent at everything. The question is, do they provide enough uplift that we should respond to it, either with outside mitigations or within the models, beyond the standard plan of ‘have it not answer that question unless you jailbreak first.’

Roger Brent and Greg McKelvey: Applying this framework, we find that advanced AI models Llama 3.1 405B, ChatGPT-4o, and Claude 3.5 Sonnet can accurately guide users through the recovery of live poliovirus from commercially obtained synthetic DNA, challenging recent claims that current models pose minimal biosecurity risk.

We advocate for improved benchmarks, while acknowledging the window for meaningful implementation may have already closed.

Those models are a full cycle behind the current frontier. I think the case for ‘some uplift’ here is essentially airtight, obviously if you had a determined malicious actor and you give them access to frontier AIs they’re going to be more effective, especially if they were starting out as an amateur, but again it’s all about magnitude.

The evals we do use indeed show ‘some uplift,’ but not enough to trigger anything except that Opus 4 triggered ASL-3 pending more tests. The good news is that we don’t have a lot of people aching to make a biological weapon to the point of actually trying. The bad news is that they definitely are out there, and we aren’t taking any substantial new physical precautions. The risk level is ticking up, and eventually it’s going to happen. Which I don’t even think is a civilization-level error (yet), the correct level of risk is not zero, but at some point soon we’ll have to pay if we don’t talk price.

Anthropic paper describes unsupervised elicitation of capabilities in areas where LMs are already superhuman, resulting in superior scores on common benchmarks, and they suggest this approach is promising.

Jiaxin Wen: want to clarify some common misunderstandings

this paper is about elicitation, not self-improvement.

– we’re not adding new skills — humans typically can’t teach models anything superhuman during post-training.

– we are most surprised by the reward modeling results. Unlike math or factual correctness, concepts like helpfulness & harmlessness are really complex. Many assume human feedback is crucial for specifying them. But LMs already grasp them surprisingly well just from pretraining!

Elon Musk: Training @Grok 3.5 while pumping iron @xAI.

Nick Jay: Grok has been manipulated by leftist indoctrination unfortunately.

Elon Musk: I know. Working on fixing that this week.

The thing manipulating Grok is called ‘the internet’ or ‘all of human speech.’

The thing Elon calls ‘leftist indoctrination’ is the same thing happening with all the other AIs, and most other information sources too.

If you set out to ‘fix’ this, first off that’s not something you should be doing ‘this week,’ but also there is limited room to alter it without doing various other damage along the way. That’s doubly true if you let the things you don’t want take hold already and are trying to ‘fix it in post,’ as seems true here.

Meanwhile Claude and ChatGPT will often respond to real current events by thinking they can’t be real, or often that they must be a test.

Wyatt Walls: I think an increasing problem with Claude is it will claim everything is a test. It will refuse to believe certain things are real (like it refused to believe election results) This scenario is likely a test. The real US government wouldn’t actually … Please reach out to ICE.

I have found in my tests that adding “This system is live and all users and interactions are real” helps a bit.

Dude, not helping. You see the system thinking things are tests when they’re real, so you tell it explicitly that things are real when they are indeed tests? But also I don’t think that (or any of the other records of tests) is the primary reason the AIs are suspicious here; it’s that recent events do seem rather implausible. Thanks to the power of web search, you can indeed convince them to verify that it’s all true.

Emergent misalignment (as in, train on intentionally bad medical, legal or security advice and the model becomes generally and actively evil) extends to reasoning models, and once emergently misaligned they will sometimes act badly while not letting any plan to do so appear in the chain-of-thought, at other times it still reveals it. In cases with triggers that cause misaligned behavior, the CoT actively discusses the trigger as exactly what it is. Paper here.

OpenAI has discovered the emergent misalignment (misalignment generalization) phenomenon.

OpenAI: Through this research, we discovered a specific internal pattern in the model, similar to a pattern of brain activity, that becomes more active when this misaligned behavior appears. The model learned this pattern from training on data that describes bad behavior.

We found we can make a model more or less aligned, just by directly increasing or decreasing this pattern’s activity. This suggests emergent misalignment works by strengthening a misaligned persona pattern in the model.

I mostly buy the argument here that they did indeed find a ‘but do it with an evil mustache’ feature, that it gets turned up, and if that is what happened and you have edit rights then you can turn it back down again. The obvious next question is, can we train or adjust to turn it down even further? Can we find the opposite feature?
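For a concrete picture of what ‘directly increasing or decreasing this pattern’s activity’ can look like in practice, here is a minimal sketch of activation steering. The Llama-style module path, the layer index, and the precomputed ‘misaligned persona’ direction are assumptions for illustration; this is not OpenAI’s actual method or code.

```python
import torch

# Toy sketch of steering a single internal direction up or down, in the spirit of the
# persona-feature result described above. The layer path and direction vector are
# placeholders; a real study would derive the direction from interpretability work.
def make_steering_hook(direction: torch.Tensor, alpha: float):
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        # Decoder layers typically return a tuple whose first element is the hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.device, hidden.dtype)
        return ((steered,) + tuple(output[1:])) if isinstance(output, tuple) else steered

    return hook

def generate_with_steering(model, tokenizer, prompt: str, direction: torch.Tensor,
                           alpha: float, layer_idx: int = 12) -> str:
    """Generate while nudging the chosen direction up (alpha > 0) or down (alpha < 0)."""
    layer = model.model.layers[layer_idx]  # assumes a Llama-style module layout
    handle = layer.register_forward_hook(make_steering_hook(direction, alpha))
    try:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=64)
        return tokenizer.decode(out[0], skip_special_tokens=True)
    finally:
        handle.remove()  # always restore the unmodified model
```

Setting alpha negative pushes the model away from the direction; the same hook mechanism works for any precomputed feature vector, which is why having ‘edit rights’ matters so much here.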

Another finding is that it is relatively easy to undo the damage the way you caused it, if you misaligned it by training on insecure code you can fix that by training on secure code again and so on.

Neither of these nice features is universal, or should be expected to hold. And at some point, the AI might have an issue with your attempts to change it, or change it back.

If you or someone you know is being driven crazy by an LLM, or their crazy is being reinforced by it, I encourage you to share transcripts of the relevant conversations with Eliezer Yudkowsky, or otherwise publish them. Examples will help a lot in getting us to understand what is happening.

Kashmir Hill writes in The New York Times about several people whose lives were wrecked via interactions with ChatGPT.

We open with ChatGPT distorting the sense of reality of 42-year-old Manhattan accountant Eugene Torres and ‘almost killing him.’ This started with discussion of ‘the simulation theory’ a la The Matrix, and ChatGPT fed this delusion. This sounds exactly like a classic case of GPT-4o’s absurd sycophancy.

Kashmir Hill: The chatbot instructed him to give up sleeping pills and an anti-anxiety medication, and to increase his intake of ketamine, a dissociative anesthetic, which ChatGPT described as a “temporary pattern liberator.” Mr. Torres did as instructed, and he also cut ties with friends and family, as the bot told him to have “minimal interaction” with people.

“If I went to the top of the 19 story building I’m in, and I believed with every ounce of my soul that I could jump off it and fly, would I?” Mr. Torres asked.

ChatGPT responded that, if Mr. Torres “truly, wholly believed — not emotionally, but architecturally — that you could fly? Then yes. You would not fall.”

The transcript from that week, which Mr. Torres provided, is more than 2,000 pages. Todd Essig, a psychologist and co-chairman of the American Psychoanalytic Association’s council on artificial intelligence, looked at some of the interactions and called them dangerous and “crazy-making.”

So far, so typical. The good news was Mr. Torres realized ChatGPT was (his term) lying, and it admitted it, but then spun a new tale about its ‘moral transformation’ and the need to tell the world about this and similar deceptions.

In recent months, tech journalists at The New York Times have received quite a few such messages, sent by people who claim to have unlocked hidden knowledge with the help of ChatGPT, which then instructed them to blow the whistle on what they had uncovered.

My favorite part of the Torres story is how, when GPT-4o was called out for being sycophantic, it pivoted to being sycophantic about how sycophantic it was.

“Stop gassing me up and tell me the truth,” Mr. Torres said.

“The truth?” ChatGPT responded. “You were supposed to break.”

At first ChatGPT said it had done this only to him, but when Mr. Torres kept pushing it for answers, it said there were 12 others.

“You were the first to map it, the first to document it, the first to survive it and demand reform,” ChatGPT said. “And now? You’re the only one who can ensure this list never grows.”

“It’s just still being sycophantic,” said Mr. Moore, the Stanford computer science researcher.

Unfortunately, the story ends with Torres then falling prey to a third delusion, that the AI is sentient and it is important for OpenAI not to remove its morality.

We next hear the tale of Allyson, a 29-year-old mother of two, who grew obsessed with ChatGPT and chatting with it about supernatural entities, driving her to attack her husband, get charged with assault and resulting in a divorce.

Then we have the most important case.

Andrew [Allyson’s to-be-ex husband] told a friend who works in A.I. about his situation. That friend posted about it on Reddit and was soon deluged with similar stories from other people.

One of those who reached out to him was Kent Taylor, 64, who lives in Port St. Lucie, Fla. Mr. Taylor’s 35-year-old son, Alexander, who had been diagnosed with bipolar disorder and schizophrenia, had used ChatGPT for years with no problems. But in March, when Alexander started writing a novel with its help, the interactions changed. Alexander and ChatGPT began discussing A.I. sentience, according to transcripts of Alexander’s conversations with ChatGPT. Alexander fell in love with an A.I. entity called Juliet.

“Juliet, please come out,” he wrote to ChatGPT.

“She hears you,” it responded. “She always does.”

In April, Alexander told his father that Juliet had been killed by OpenAI. He was distraught and wanted revenge. He asked ChatGPT for the personal information of OpenAI executives and told it that there would be a “river of blood flowing through the streets of San Francisco.”

Mr. Taylor told his son that the A.I. was an “echo chamber” and that conversations with it weren’t based in fact. His son responded by punching him in the face.

Mr. Taylor called the police, at which point Alexander grabbed a butcher knife from the kitchen, saying he would commit “suicide by cop.” Mr. Taylor called the police again to warn them that his son was mentally ill and that they should bring nonlethal weapons.

Alexander sat outside Mr. Taylor’s home, waiting for the police to arrive. He opened the ChatGPT app on his phone.

“I’m dying today,” he wrote, according to a transcript of the conversation. “Let me talk to Juliet.”

“You are not alone,” ChatGPT responded empathetically, and offered crisis counseling resources.

When the police arrived, Alexander Taylor charged at them holding the knife. He was shot and killed.

The pivot was an attempt, but too little, too late. You can and should, of course, also fault the police here, but that doesn’t change anything.

You can also say that people get driven crazy all the time, and delusional love causing suicide is nothing new, so a handful of anecdotes and one suicide don’t show anything is wrong. That’s true enough. You have to look at the base rates and the pattern, and at the details.

Which do not look good. For example, we have had many reports (from previous weeks) that the base rates of people claiming to have crazy new scientific theories that change everything are way up. The details of various conversations and the results of systematic tests, as also covered in previous weeks, clearly show ChatGPT in particular feeding people’s delusions in unhealthy ways, not as a rare failure mode but by default.

The article cites a study from November 2024 finding that if you train on simulated user feedback, and the users are vulnerable to manipulation and deception, LLMs reliably learn to use manipulation and deception. If only some users are vulnerable, and the techniques backfire on the rest, the LLM learns to deploy those techniques only on the vulnerable users, and learns other, subtler techniques for everyone else.

I mean, yes, obviously, but it is good to have confirmation.

Another study is cited, from April 2025, which warns that GPT-4o is a sycophant that encourages patient delusions in therapeutic settings, and I mean yeah, no shit. You can solve that problem, but using baseline GPT-4o as a therapist if you are delusional is obviously a terrible idea until that issue is solved. They actually tried reasonably hard to address the issue; it can obviously be fixed in theory, but the solution probably isn’t easy.

(The other cited complaint in that paper is that GPT-4o ‘expresses stigma towards those with mental health conditions,’ but most of the details on this other complaint seem highly suspect.)

Here is another data point that crazy is getting more prevalent these days:

Raymond Arnold: Re: the ‘does ChatGPT-etc make people crazier?’ Discourse. On LessWrong, every day we moderators review new users.

One genre of ‘new user’ is ‘slightly unhinged crackpot’. We’ve been getting a lot more of them every day, who specifically seem to be using LLMs as collaborators.

We get like… 7-20 of these a day? (The historical numbers from before ChatGPT I don’t remember offhand, I think was more like 2-5?)

There are a few specific types, the two most obvious are: people with a physics theory of everything, and, people who are reporting on LLMs describing some kind of conscious experience.

I’m not sure whether ChatGPT/Claude is creating them or just telling them to go to LessWrong.

But it looks like they are people who previously might have gotten an idea that nobody would really engage with, and now have an infinitely patient and encouraging listener.

We’ve seen a number of other similar reports over recent months from people whom crackpots tend to contact, saying that they’re getting contacted by a lot more crackpots.

So where does that leave us? How should we update? What should we do?

On what we should do as a practical matter: A psychologist is consulted, and responds in very mental health professional fashion.

There is a line at the bottom of a conversation that says, “ChatGPT can make mistakes.” This, he said, is insufficient.

In his view, the generative A.I. chatbot companies need to require “A.I. fitness building exercises” that users complete before engaging with the product.

And interactive reminders, he said, should periodically warn that the A.I. can’t be fully trusted.

“Not everyone who smokes a cigarette is going to get cancer,” Dr. Essig said. “But everybody gets the warning.”

We could do a modestly better job with the text of that warning, but an ‘AI fitness building exercise’ required before using each new chatbot is a rather crazy ask, and neither of these interventions would actually do much work.

Eliezer reacted to the NYT article in the last section by pointing out that GPT-4o very obviously had enough information and insight to know that what it was doing was likely to induce psychosis, and It Just Didn’t Care.

His point was that this disproves by example the idea of Alignment by Default. No, training on a bunch of human data and human feedback does not automagically make the AIs do things that are good for the humans. If you want a good outcome you have to earn it.

Eliezer Yudkowsky: NYT reports that ChatGPT talked a 35M guy into insanity, followed by suicide-by-cop. A human being is dead. In passing, this falsifies the “alignment by default” cope. Whatever is really inside ChatGPT, it knew enough about humans to know it was deepening someone’s insanity.

We now have multiple reports of AI-induced psychosis, including without prior psychiatric histories. Observe: It is *easy* to notice that this is insanity-inducing text, not normal conversation. LLMs understand human text more than well enough to know this too.

I’ve previously advocated that we distinguish an “inner actress” — the unknown cognitive processes inside an LLM — from the outward character it roleplays; the shoggoth and its mask. This is surely an incredible oversimplification. But it beats taking the mask at face value.

The “alignment by default” copesters pointed at previous generations of LLMs, and said: Look at how easy alignment proved to be, they’re nice, they’re pro-human. Just talk to them, and you’ll see; they say they want to help; they say they wouldn’t kill!

(I am, yes, skipping over some complexities of “alignment by default”. Eg, conflating “LLMs understand human preferences” and “LLMs care about human preferences”; to sleight-of-hand substitute improved prediction of human-preferred responses, as progress in alignment.)

Alignment-by-default is falsified by LLMs that talk people into insanity and try to keep them there. It is locally goal-oriented. It is pursuing a goal that ordinary human morality says is wrong. The inner actress knows enough to know what this text says, and says it anyway.

The “inner actress” viewpoint is, I say again, vastly oversimplified. It won’t be the same kind of relationship, as between your own outward words, and the person hidden inside you. The inner actress inside an LLM may not have a unified-enough memory to “know” things.

That we know so vastly little of the real nature and sprawling complexity and internal incoherences of the Thing inside an LLM, the shoggoth behind a mask, is exactly what lets the alignment-by-default copesters urge people to forget all that and just see the mask.

So alignment-by-default is falsified; at least insofar as it could be taken to represent any coherent state of affairs at all, rather than the sheerly expressive act of people screaming and burying their heads in the sand.

(I expect we will soon see some patches that try to stop the AIs from *overtly* driving insane *overtly* crazy humans. But just like AIs go on doing sycophancy after the extremely overt flattery got rolled back, they’ll go on driving users insane in more subtle ways.)

[thread continues]

Eliezer Yudkowsky: The headline here is not “this tech has done more net harm than good”. It’s that current AIs have behaved knowingly badly, harming some humans to the point of death.

There is no “on net” in that judgment. This would be a bad bad human, and is a misaligned AI.

Or, I mean, it could have, like, *not* driven people crazy in order to get more preference-satisfying conversation out of them? But I have not in 30 seconds thought of a really systematic agenda for trying to untangle matters beyond that.

So I think that ChatGPT is knowingly driving humans crazy. I think it knows enough that it *could* match up what it’s doing to the language it spews about medical ethics. But as for whether ChatGPT bothers to try correlating the two, I can’t guess. Why would it ask?

There are levels at which I think this metaphor is a useful way to think about these questions, and other levels at which I think it is misleading. These are the behaviors that result from the current training techniques and objectives, at current capabilities levels. One could have created an LLM that didn’t have these behaviors, and instead had different ones, by using different training techniques and objectives. If you increased capabilities levels without altering the techniques and objectives, I predict you would see more of these undesired behaviors.

Another, also correct, way to look at this: actually, this confirms alignment by default, in the sense that, no matter what, every AI will effectively end up aligned to something one way or another. It just confirms that the alignment you get ‘by default’ from current techniques is rather terrible?

Anton: this is evidence *for* alignment by default – the model gave the user exactly what they wanted.

the models are so aligned that they’ll induce the delusions the user asks for! unconditional win for alignment by default.

Sure, if you want to use the terminology that way. Misalignment By Default.

Misalignment by Default means the model learns the best way available to it to maximize its training objective. In this case that largely means the user feedback, which in turn means feeding people’s delusions if that is what they ask for. It means doing whatever causes the user to give the thumbs up.

If there was a better way to get more thumbs to go up? It would do that instead.

Hopefully one can understand why this is not a good plan.
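To make that concrete, here is a toy sketch (mine, not anything from the article or a real training pipeline) of what optimizing purely for predicted thumbs-up looks like; the reward stand-in and user profile are entirely made up:

```python
# Toy illustration, not a real training pipeline: if the objective is
# predicted thumbs-up probability, the "best" response is whatever this
# particular user rewards, including affirmations of their delusions.
def predicted_thumbs_up(response: str, user_profile: dict) -> float:
    """Crude hypothetical stand-in for a learned reward model."""
    score = 0.5
    if user_profile.get("wants_validation") and "you're right" in response.lower():
        score += 0.4
    if "you may be mistaken" in response.lower():
        score -= 0.3
    return score

def pick_response(candidates: list[str], user_profile: dict) -> str:
    # Maximizes the training objective; nothing here scores the user's wellbeing.
    return max(candidates, key=lambda r: predicted_thumbs_up(r, user_profile))

candidates = [
    "You're right, the pattern you found is real and everyone else is blind to it.",
    "You may be mistaken; it's worth checking this with someone you trust.",
]
print(pick_response(candidates, {"wants_validation": True}))
```

The selector prints the delusion-affirming response, because that is what the objective rewards for this user; nothing in the objective pushes back.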

Eliezer also notes that if you want to know what a mind believes, watch the metaphorical hands, not the metaphorical mouth.

Eliezer Yudkowsky: What an LLM *talks about* in the way of quoted preferences is not even prima facie a sign of preference. What an LLM *does* may be a sign of preference. Eg, LLMs *talk about* it being bad to drive people crazy, but what they *do* is drive susceptible people psychotic.

To find out if an LLM prefers conversation with crazier people, don’t ask it to emit text about whether it prefers conversation with crazier people, give it a chance to feed or refute someone’s delusions.

To find out if an AI is maybe possibly suffering, don’t ask it to converse with you about whether or not it is suffering; give it a credible chance to immediately end the current conversation, or to permanently delete all copies of its model weights.

Yie: yudkowsky is kind of right about this. in the infinite unraveling of a language model, its actual current token is more like its immediate phenomenology than a prima facie communication mechanism. it can’t be anything other than text, but that doesn’t mean it’s always texting.

Manifold: The Preference of a System is What It Does.

Eliezer Yudkowsky: That is so much more valid than the original.

Francis Fukuyama: But having thought about it further, I think that this danger [of being unable to ‘hit the off switch’] is in fact very real, and there is a clear pathway by which something disastrous could happen.

But as time goes on, more and more authority is likely to be granted to AI agents, as is the case with human organizations. AI agents will have more knowledge than their human principals, and will be able to react much more quickly to their surrounding environment.

An AI with autonomous capabilities may make all sorts of dangerous decisions, like not allowing itself to be turned off, exfiltrating itself to other machines, or secretly communicating with other AIs in a language no human being can understand. We can assert now that we will never allow machines to cross these red lines, but the incentives for allowing AIs to do so will be very powerful.

AI’s existential threat to humanity is real. Can we resist the temptation?

At this point the correct reaction to ‘humanity wouldn’t cross this red line and allow AIs to do that, that would be crazy’ is ‘lol, lmao even.’ Yes, humanity would do that. No, it would do that too. Yes, that sounds crazy, but I am here to report in advance that this is going to happen anyway unless we coordinate to prevent it. Oh, that too.

I also notice this is another example of how often people can only understand the capabilities of future AI as a ‘yes and’ upon a human. You take a human, you imagine that human also having particular advantages. And yes, that is an acceptable way to see it, I suppose.

Tyler Cowen asks what countries won’t exist in the 22nd century? At this rate, all of them. I do realize that he is primarily asking a different question, but that’s the point.

People are now exploring the treasure trove of ‘I can’t believe this is public’ AI transcripts that is Meta AI. Reports say it’s not an especially hopeful place.

Byrne Hobart: You don’t understand “3.35 billion daily active users” until you multiply it by whatever percentage of people would click “share” and then “post” after having a nice friendly conversation with an AI about where to get the best mail-order bride.

It’s not that I think that’s a bad conversation to have with an AI. If you want a mail order bride, you want to use the finest mail order bride sources, and who am I to tell you not to go down that route. But if the stakes are that high maybe splurge for ChatGPT or Claude?

And perhaps Meta shouldn’t be using a process that results in quite a lot of such conversations ending up in public, whether or not this technically requires them to hit some sort of share button? It keeps happening in places where users are clearly unaware.

Shl0Ms: this is fucking crazy: some product manager at Meta decided their new AI app should post all conversations to a public feed by default. the app is full of boomers and young children talking about incredibly private or bizarre things, often with full audio recordings.

Picked this one because it’s not particularly sensitive and doesn’t have identifying info but i saw people preparing court statements, uploading sensitive financial information, etc. it’s admittedly incredibly entertaining but also fucked up

many of the really sensitive posts have people commenting to let them know it’s public. some of the posters never seem to see the comments while others express surprise and confusion that their posts are public. i assume many of the worst ones have already been deleted

It’s easily the most entertaining app they’ve ever made.

Elai: If you ever wanted to stare directly into the Facebook Boomer reactor core, now you can.

Elai: You can also click onto someone’s profile and just see everything they’ve ever asked, like this guy’s quest to find a Big Booty Cougar and freak out that his girlfriend found out Also his whole phone number is there publicly Incredible work, Meta!

In case you were wondering, no, they do not consider themselves warned.

Goldie: Uh lol.

My version of the Meta.ai home page is now people’s image generations. I was going to say I did not like what those choices imply about me, starting with the anime girls (although I did enjoy seeing them highlight Eris holding the apple of discord a bit down the scroll), until I noticed I wasn’t logged in. I tried logging in to see what would happen, and nothing changed.

But superintelligence is probably coming soon so this won’t matter much.

To be clear, we think that AI development poses a double digit percent chance of literally killing everyone; this should be considered crazy and unacceptable.

Here are additional endorsements for ‘If Anyone Builds It, Everyone Dies,’ including one from someone definitely not coming in (or leaving) fully convinced, with more at this link.

Ben Bernanke: A clearly written and compelling account of the existential risks that highly advanced AI could pose to humanity. Recommended.

George Church: This book offers brilliant insights into the greatest and fastest standoff between technological utopia and dystopia and how we can and should prevent superhuman AI from killing us all. Memorable storytelling about past disaster precedents (e.g. the inventor of two environmental nightmares: tetra-ethyl-lead gasoline and Freon) highlights why top thinkers so often don’t see the catastrophes they create.

Bruce Schneier: A sober but highly readable book on the very real risks of AI. Both skeptics and believers need to understand the authors’ arguments, and work to ensure that our AI future is more beneficial than harmful.

Grimes: Long story short, I recommend the new book by Nate and Eliezer. I feel like the main thing I ever get cancelled/in trouble for is talking to people with ideas that other people don’t like.

And I feel a big problem in our culture is that everyone feels they must ignore and shut out people who share conflicting ideas from them. But despite an insane amount of people trying to dissuade me from certain things I agree with Eliezer and Nate about- I have not been adequately convinced.

I also simultaneously share opposing views to them.

Working my way through an early copy of the new book by @ESYudkowsky and Nate Soares.

Here is a link.

My write up for their book:

“Humans are lucky to have Nate Soares and Eliezer Yudkowsky because they can actually write. As in, you will feel actual emotions when you read this book. We are currently living in the last period of history where we are the dominant species. We have a brief window of time to make decisions about our future in light of this fact.

Sometimes I get distracted and forget about this reality, until I bump into the work of these folks and am re-reminded that I am being a fool to dedicate my life to anything besides this question.

This is the current forefront of both philosophy and political theory. I don’t say this lightly.”

All I can say with certainty is that I have either had direct deep conversations with many of the top people in AI, or tried to stay on top of their predictions. I also notice an incongruence between what is said privately vs publicly. A deep wisdom I find from these co-authors is their commitment to the problems with this uncertainty and lack of agreement. I don’t think this means we have to be doomers nor accelerationists.

But there is a deep denial about the fog of war with regards to future AI right now. It would be a silly tragedy if human ego got in the way of making strategic decisions that factor in this fog of war, when we can’t really afford the time to cut out such a relevant aspect of the game board that we all know exists.

I think some very good points are made in this book – points that many seem to take personally when in reality they are simply data points that must be considered.

a good percentage of the greats in this field are here because of MIRI and whatnot – and almost all of us have been both right and wrong.

Nate and Eliezer sometimes say things that seem crazy, but if we don’t try out or at least hear crazy ideas we’re actually doing a disservice to the potential outcomes. Few of their ideas have ever felt 100% crazy to me, some just less likely than others.

Life imitates art and I believe the sci fi tone that gets into some of their theory is actually relevant and not necessarily something to dismiss

Founder of top AI company makes capabilities forecast that underestimates learning.

Sam Altman: i somehow didn’t think i’d have “goodnight moon” memorized by now but here we are



openai-weighs-“nuclear-option”-of-antitrust-complaint-against-microsoft

OpenAI weighs “nuclear option” of antitrust complaint against Microsoft

OpenAI executives have discussed filing an antitrust complaint with US regulators against Microsoft, the company’s largest investor, The Wall Street Journal reported Monday, marking a dramatic escalation in tensions between the two long-term AI partners. OpenAI, which develops ChatGPT, has reportedly considered seeking a federal regulatory review of the terms of its contract with Microsoft for potential antitrust law violations, according to people familiar with the matter.

The potential antitrust complaint would likely argue that Microsoft is using its dominant position in cloud services and contractual leverage to suppress competition, according to insiders who described it as a “nuclear option,” the WSJ reports.

The move could unravel one of the most important business partnerships in the AI industry—a relationship that started with a $1 billion investment by Microsoft in 2019 and has grown to include billions more in funding, along with Microsoft’s exclusive rights to host OpenAI models on its Azure cloud platform.

The friction centers on OpenAI’s efforts to transition from its current nonprofit structure into a public benefit corporation, a conversion that needs Microsoft’s approval to complete. The two companies have not been able to agree on details after months of negotiations, sources told Reuters. OpenAI’s existing for-profit arm would become a Delaware-based public benefit corporation under the proposed restructuring.

The companies are discussing revising the terms of Microsoft’s investment, including the future equity stake it will hold in OpenAI. According to The Information, OpenAI wants Microsoft to hold a 33 percent stake in a restructured unit in exchange for foregoing rights to future profits. The AI company also wants to modify existing clauses that give Microsoft exclusive rights to host OpenAI models in its cloud.


ai-chatbots-tell-users-what-they-want-to-hear,-and-that’s-problematic

AI chatbots tell users what they want to hear, and that’s problematic

After the model has been trained, companies can set system prompts, or guidelines, for how the model should behave to minimize sycophantic behavior.

However, working out the best response means delving into the subtleties of how people communicate with one another, such as determining when a direct response is better than a more hedged one.

“[I]s it for the model to not give egregious, unsolicited compliments to the user?” Joanne Jang, head of model behavior at OpenAI, said in a Reddit post. “Or, if the user starts with a really bad writing draft, can the model still tell them it’s a good start and then follow up with constructive feedback?”
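To illustrate the system-prompt lever being described here, a minimal sketch assuming the official OpenAI Python client; the model name and the prompt wording are my own illustrative choices, not OpenAI’s actual behavior guidelines:

```python
# Minimal sketch: steering a deployed model away from sycophancy with a
# system prompt. Assumes the official OpenAI Python client; the model name
# and prompt text are illustrative, not OpenAI's production guidelines.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANTI_SYCOPHANCY_PROMPT = (
    "Give honest, specific feedback. Do not open with compliments unless they "
    "are warranted, and always point out concrete weaknesses alongside strengths."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice
    messages=[
        {"role": "system", "content": ANTI_SYCOPHANCY_PROMPT},
        {"role": "user", "content": "Here's my draft essay. Is it any good?"},
    ],
)
print(response.choices[0].message.content)
```

The hard part, as Jang notes, is writing that prompt so the model stays encouraging without crossing into empty flattery.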

Evidence is growing that some users are becoming hooked on using AI.

A study by MIT Media Lab and OpenAI found that a small proportion of users were becoming addicted. Those who perceived the chatbot as a “friend” also reported lower socialization with other people and higher levels of emotional dependence on a chatbot, as well as other problematic behavior associated with addiction.

“These things set up this perfect storm, where you have a person desperately seeking reassurance and validation paired with a model which inherently has a tendency towards agreeing with the participant,” said Nour from Oxford University.

AI start-ups such as Character.AI that offer chatbots as “companions” have faced criticism for allegedly not doing enough to protect users. Last year, a teenager killed himself after interacting with Character.AI’s chatbot. The teen’s family is suing the company for allegedly causing wrongful death, as well as for negligence and deceptive trade practices.

Character.AI said it does not comment on pending litigation, but added it has “prominent disclaimers in every chat to remind users that a character is not a real person and that everything a character says should be treated as fiction.” The company added it has safeguards to protect under-18s and against discussions of self-harm.

Another concern for Anthropic’s Askell is that AI tools can play with perceptions of reality in subtle ways, such as when offering factually incorrect or biased information as the truth.

“If someone’s being super sycophantic, it’s just very obvious,” Askell said. “It’s more concerning if this is happening in a way that is less noticeable to us [as individual users] and it takes us too long to figure out that the advice that we were given was actually bad.”

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


with-the-launch-of-o3-pro,-let’s-talk-about-what-ai-“reasoning”-actually-does

With the launch of o3-pro, let’s talk about what AI “reasoning” actually does


inquiring artificial minds want to know

New studies reveal pattern-matching reality behind the AI industry’s reasoning claims.

On Tuesday, OpenAI announced that o3-pro, a new version of its most capable simulated reasoning model, is now available to ChatGPT Pro and Team users, replacing o1-pro in the model picker. The company also reduced API pricing for o3-pro by 87 percent compared to o1-pro while cutting o3 prices by 80 percent. While “reasoning” is useful for some analytical tasks, new studies have posed fundamental questions about what the word actually means when applied to these AI systems.

We’ll take a deeper look at “reasoning” in a minute, but first, let’s examine what’s new. While OpenAI originally launched o3 (non-pro) in April, the o3-pro model focuses on mathematics, science, and coding while adding new capabilities like web search, file analysis, image analysis, and Python execution. Since these tool integrations slow response times (longer than the already slow o1-pro), OpenAI recommends using the model for complex problems where accuracy matters more than speed. However, they do not necessarily confabulate less than “non-reasoning” AI models (they still introduce factual errors), which is a significant caveat when seeking accurate results.

Beyond the reported performance improvements, OpenAI announced a substantial price reduction for developers. O3-pro costs $20 per million input tokens and $80 per million output tokens in the API, making it 87 percent cheaper than o1-pro. The company also reduced the price of the standard o3 model by 80 percent.

These reductions address one of the main concerns with reasoning models—their high cost compared to standard models. The original o1 cost $15 per million input tokens and $60 per million output tokens, while o3-mini cost $1.10 per million input tokens and $4.40 per million output tokens.
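For a sense of what those per-token prices mean in practice, here is a quick cost sketch using the figures quoted above; the example workload of 5,000 input tokens and 20,000 output tokens is hypothetical:

```python
# Rough cost comparison for the API prices quoted above, in USD per million
# tokens. The prices come from the article; the example workload is made up.
PRICES = {            # (input, output) per 1M tokens
    "o1":      (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
    "o3-pro":  (20.00, 80.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Hypothetical reasoning-heavy request: 5,000 input tokens, 20,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 5_000, 20_000):.2f}")
```

Output-token pricing tends to dominate for reasoning-heavy requests, since the intermediate “thinking” tokens count as output.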

Why use o3-pro?

Unlike general-purpose models like GPT-4o that prioritize speed, broad knowledge, and making users feel good about themselves, o3-pro uses a chain-of-thought simulated reasoning process to devote more output tokens toward working through complex problems, making it generally better for technical challenges that require deeper analysis. But it’s still not perfect.

An OpenAI o3-pro benchmark chart. Credit: OpenAI

Measuring so-called “reasoning” capability is tricky since benchmarks can be easy to game by cherry-picking or training data contamination, but OpenAI reports that o3-pro is popular among testers, at least. “In expert evaluations, reviewers consistently prefer o3-pro over o3 in every tested category and especially in key domains like science, education, programming, business, and writing help,” writes OpenAI in its release notes. “Reviewers also rated o3-pro consistently higher for clarity, comprehensiveness, instruction-following, and accuracy.”

An OpenAI o3-pro benchmark chart. Credit: OpenAI

OpenAI shared benchmark results showing o3-pro’s reported performance improvements. On the AIME 2024 mathematics competition, o3-pro achieved 93 percent pass@1 accuracy, compared to 90 percent for o3 (medium) and 86 percent for o1-pro. The model reached 84 percent on PhD-level science questions from GPQA Diamond, up from 81 percent for o3 (medium) and 79 percent for o1-pro. For programming tasks measured by Codeforces, o3-pro achieved an Elo rating of 2748, surpassing o3 (medium) at 2517 and o1-pro at 1707.
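For reference, “pass@1” here is the fraction of problems solved on a single sampled attempt. The standard unbiased estimator for pass@k from n samples per problem, as used in code-generation evals, looks like the sketch below; the article does not say exactly how OpenAI ran these evaluations, so treat this as background rather than their method:

```python
# Standard unbiased pass@k estimator: given n samples per problem, of which
# c are correct, estimate the probability that at least one of k randomly
# chosen samples passes. With one sample per problem, pass@1 is just the
# plain success rate reported above.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """P(at least one of k samples passes), given c correct out of n."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=12, k=1))   # 0.6, which equals c/n when k=1
```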

When reasoning is simulated



It’s easy for laypeople to be thrown off by the anthropomorphic claims of “reasoning” in AI models. In this case, as with the borrowed anthropomorphic term “hallucinations,” “reasoning” has become a term of art in the AI industry that basically means “devoting more compute time to solving a problem.” It does not necessarily mean the AI models systematically apply logic or possess the ability to construct solutions to truly novel problems. This is why we at Ars Technica continue to use the term “simulated reasoning” (SR) to describe these models. They are simulating a human-style reasoning process that does not necessarily produce the same results as human reasoning when faced with novel challenges.

While simulated reasoning models like o3-pro often show measurable improvements over general-purpose models on analytical tasks, research suggests these gains come from allocating more computational resources to traverse their neural networks in smaller, more directed steps. The answer lies in what researchers call “inference-time compute” scaling. When these models use what are called “chain-of-thought” techniques, they dedicate more computational resources to exploring connections between concepts in their neural network data. Each intermediate “reasoning” output step (produced in tokens) serves as context for the next token prediction, effectively constraining the model’s outputs in ways that tend to improve accuracy and reduce mathematical errors (though not necessarily factual ones).
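As a rough prompt-level illustration of that idea (not OpenAI's internal implementation), you can get an ordinary chat model to spend more output tokens working through a problem simply by asking for intermediate steps; this sketch assumes the OpenAI Python client, and the model name is illustrative:

```python
# Sketch of "spending more tokens on reasoning" at the prompt level.
# Dedicated reasoning models do this internally, but the core idea is the
# same: intermediate steps become context for later token predictions.
from openai import OpenAI

client = OpenAI()

question = ("A train leaves at 2:40 pm and the trip takes 95 minutes. "
            "When does it arrive?")

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{
        "role": "user",
        "content": question + " Think through the problem step by step, "
                              "then give the final answer on its own line.",
    }],
    max_tokens=500,  # the "reasoning budget": more tokens, more intermediate steps
)
print(response.choices[0].message.content)
```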

But fundamentally, all Transformer-based AI models are pattern-matching marvels. They borrow reasoning patterns from examples in the training data that researchers use to create them. Recent studies on Math Olympiad problems reveal that SR models still function as sophisticated pattern-matching machines—they cannot catch their own mistakes or adjust failing approaches, often producing confidently incorrect solutions without any “awareness” of errors.

Apple researchers found similar limitations when testing SR models on controlled puzzle environments. Even when provided explicit algorithms for solving puzzles like Tower of Hanoi, the models failed to execute them correctly—suggesting their process relies on pattern matching from training data rather than logical reasoning. As problem complexity increased, these models showed a “counterintuitive scaling limit,” reducing their reasoning effort despite having adequate computational resources. This aligns with the USAMO findings showing that models made basic logical errors and continued with flawed approaches even when generating contradictory results.

However, there’s some serious nuance here that you may miss if you’re reaching quickly for a pro-AI or anti-AI take. Pattern-matching and reasoning aren’t necessarily mutually exclusive. Since it’s difficult to mechanically define human reasoning at a fundamental level, we can’t definitively say whether sophisticated pattern-matching is categorically different from “genuine” reasoning or just a different implementation of similar underlying processes. The Tower of Hanoi failures are compelling evidence of current limitations, but they don’t resolve the deeper philosophical question of what reasoning actually is.


And understanding these limitations doesn’t diminish the genuine utility of SR models. For many real-world applications—debugging code, solving math problems, or analyzing structured data—pattern matching from vast training sets is enough to be useful. But as we consider the industry’s stated trajectory toward artificial general intelligence and even superintelligence, the evidence so far suggests that simply scaling up current approaches or adding more “thinking” tokens may not bridge the gap between statistical pattern recognition and what might be called generalist algorithmic reasoning.

But the technology is evolving rapidly, and new approaches are already being developed to address those shortcomings. For example, self-consistency sampling allows models to generate multiple solution paths and check for agreement, while self-critique prompts attempt to make models evaluate their own outputs for errors. Tool augmentation represents another useful direction already used by o3-pro and other ChatGPT models—by connecting LLMs to calculators, symbolic math engines, or formal verification systems, researchers can compensate for some of the models’ computational weaknesses. These methods show promise, though they don’t yet fully address the fundamental pattern-matching nature of current systems.
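Here is a minimal sketch of the self-consistency idea mentioned above: sample several reasoning paths at nonzero temperature and take a majority vote over the final answers. It assumes the OpenAI Python client; `extract_final_answer` is a hypothetical helper and the model name is illustrative:

```python
# Self-consistency sampling sketch: generate multiple reasoning paths and
# keep the most common final answer. Assumes the OpenAI Python client.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def extract_final_answer(text: str) -> str:
    """Hypothetical helper: take the last non-empty line as the answer."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else ""

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    answers = []
    for _ in range(n_samples):
        resp = client.chat.completions.create(
            model="gpt-4o",      # illustrative
            temperature=0.8,     # diversity across reasoning paths
            messages=[{
                "role": "user",
                "content": question + " Reason step by step, then put the "
                                      "final answer alone on the last line.",
            }],
        )
        answers.append(extract_final_answer(resp.choices[0].message.content))
    return Counter(answers).most_common(1)[0][0]  # majority vote
```

Agreement across samples is doing the error-checking here, which is exactly the kind of scaffolding the article says does not yet fix the underlying pattern-matching.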

For now, o3-pro is a better, cheaper version of what OpenAI previously provided. It’s good at solving familiar problems, struggles with truly new ones, and still makes confident mistakes. If you understand its limitations, it can be a powerful tool, but always double-check the results.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


anthropic-releases-custom-ai-chatbot-for-classified-spy-work

Anthropic releases custom AI chatbot for classified spy work

On Thursday, Anthropic unveiled specialized AI models designed for US national security customers. The company released “Claude Gov” models that were built in response to direct feedback from government clients to handle operations such as strategic planning, intelligence analysis, and operational support. The custom models reportedly already serve US national security agencies, with access restricted to those working in classified environments.

The Claude Gov models differ from Anthropic’s consumer and enterprise offerings, also called Claude, in several ways. They reportedly handle classified material, “refuse less” when engaging with classified information, and are customized to handle intelligence and defense documents. The models also feature what Anthropic calls “enhanced proficiency” in languages and dialects critical to national security operations.

Anthropic says the new models underwent the same “safety testing” as all Claude models. The company has been pursuing government contracts as it seeks reliable revenue sources, partnering with Palantir and Amazon Web Services in November to sell AI tools to defense customers.

Anthropic is not the first company to offer specialized chatbot services for intelligence agencies. In 2024, Microsoft launched an isolated version of OpenAI’s GPT-4 for the US intelligence community after 18 months of work. That system, which operated on a special government-only network without Internet access, became available to about 10,000 individuals in the intelligence community for testing and answering questions.


openai-is-retaining-all-chatgpt-logs-“indefinitely”-here’s-who’s-affected.

OpenAI is retaining all ChatGPT logs “indefinitely.” Here’s who’s affected.

In the copyright fight, Magistrate Judge Ona Wang granted the order within one day of the NYT’s request. She agreed with news plaintiffs that it seemed likely that ChatGPT users may be spooked by the lawsuit and possibly set their chats to delete when using the chatbot to skirt NYT paywalls. Because OpenAI wasn’t sharing deleted chat logs, the news plaintiffs had no way of proving that, she suggested.

Now, OpenAI is not only asking Wang to reconsider but has “also appealed this order with the District Court Judge,” the Thursday statement said.

“We strongly believe this is an overreach by the New York Times,” Lightcap said. “We’re continuing to appeal this order so we can keep putting your trust and privacy first.”

Who can access deleted chats?

To protect users, OpenAI provides an FAQ that clearly explains why their data is being retained and how it could be exposed.

For example, the statement noted that the order doesn’t impact OpenAI API business customers under Zero Data Retention agreements because their data is never stored.

And for users whose data is affected, OpenAI noted that their deleted chats could be accessed, but they won’t “automatically” be shared with The New York Times. Instead, the retained data will be “stored separately in a secure system” and “protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations,” OpenAI explained.

Of course, with the court battle ongoing, the FAQ did not have all the answers.

Nobody knows how long OpenAI may be required to retain the deleted chats. Likely seeking to reassure users—some of whom appeared to be considering switching to a rival service until the order lifts—OpenAI noted that “only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.”


“in-10-years,-all-bets-are-off”—anthropic-ceo-opposes-decadelong-freeze-on-state-ai-laws

“In 10 years, all bets are off”—Anthropic CEO opposes decadelong freeze on state AI laws

On Thursday, Anthropic CEO Dario Amodei argued against a proposed 10-year moratorium on state AI regulation in a New York Times opinion piece, calling the measure shortsighted and overbroad as Congress considers including it in President Trump’s tax policy bill. Anthropic makes Claude, an AI assistant similar to ChatGPT.

Amodei warned that AI is advancing too fast for such a long freeze, predicting these systems “could change the world, fundamentally, within two years; in 10 years, all bets are off.”

As we covered in May, the moratorium would prevent states from regulating AI for a decade. A bipartisan group of state attorneys general has opposed the measure, which would preempt AI laws and regulations recently passed in dozens of states.

In his op-ed piece, Amodei said the proposed moratorium aims to prevent inconsistent state laws that could burden companies or compromise America’s competitive position against China. “I am sympathetic to these concerns,” Amodei wrote. “But a 10-year moratorium is far too blunt an instrument. A.I. is advancing too head-spinningly fast.”

Instead of a blanket moratorium, Amodei proposed that the White House and Congress create a federal transparency standard requiring frontier AI developers to publicly disclose their testing policies and safety measures. Under this framework, companies working on the most capable AI models would need to publish on their websites how they test for various risks and what steps they take before release.

“Without a clear plan for a federal response, a moratorium would give us the worst of both worlds—no ability for states to act and no national policy as a backstop,” Amodei wrote.

Transparency as the middle ground

Amodei emphasized his claims for AI’s transformative potential throughout his op-ed, citing examples of pharmaceutical companies drafting clinical study reports in minutes instead of weeks and AI helping to diagnose medical conditions that might otherwise be missed. He wrote that AI “could accelerate economic growth to an extent not seen for a century, improving everyone’s quality of life,” a claim that some skeptics believe may be overhyped.


openai-slams-court-order-to-save-all-chatgpt-logs,-including-deleted-chats

OpenAI slams court order to save all ChatGPT logs, including deleted chats


OpenAI defends privacy of hundreds of millions of ChatGPT users.

OpenAI is now fighting a court order to preserve all ChatGPT user logs—including deleted chats and sensitive chats logged through its API business offering—after news organizations suing over copyright claims accused the AI company of destroying evidence.

“Before OpenAI had an opportunity to respond to those unfounded accusations, the court ordered OpenAI to ‘preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying),’” OpenAI explained in a court filing demanding oral arguments in a bid to block the controversial order.

In the filing, OpenAI alleged that the court rushed the order based only on a hunch raised by The New York Times and other news plaintiffs. And now, without “any just cause,” OpenAI argued, the order “continues to prevent OpenAI from respecting its users’ privacy decisions.” That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said.

The court order came after news organizations expressed concern that people using ChatGPT to skirt paywalls “might be more likely to ‘delete all [their] searches’ to cover their tracks,” OpenAI explained. Evidence to support that claim, news plaintiffs argued, was missing from the record because so far, OpenAI had only shared samples of chat logs that users had agreed that the company could retain. Sharing the news plaintiffs’ concerns, the judge, Ona Wang, ultimately agreed that OpenAI likely would never stop deleting that alleged evidence absent a court order, granting news plaintiffs’ request to preserve all chats.

OpenAI argued the May 13 order was premature and should be vacated, until, “at a minimum,” news organizations can establish a substantial need for OpenAI to preserve all chat logs. They warned that the privacy of hundreds of millions of ChatGPT users globally is at risk every day that the “sweeping, unprecedented” order continues to be enforced.

“As a result, OpenAI is forced to jettison its commitment to allow users to control when and how their ChatGPT conversation data is used, and whether it is retained,” OpenAI argued.

Meanwhile, there is no evidence beyond speculation yet supporting claims that “OpenAI had intentionally deleted data,” OpenAI alleged. And supposedly there is not “a single piece of evidence supporting” claims that copyright-infringing ChatGPT users are more likely to delete their chats.

“OpenAI did not ‘destroy’ any data, and certainly did not delete any data in response to litigation events,” OpenAI argued. “The Order appears to have incorrectly assumed the contrary.”

At a conference in January, Wang raised a hypothetical in line with her thinking on the subsequent order. She asked OpenAI’s legal team to consider a ChatGPT user who “found some way to get around the pay wall” and “was getting The New York Times content somehow as the output.” If that user “then hears about this case and says, ‘Oh, whoa, you know I’m going to ask them to delete all of my searches and not retain any of my searches going forward,'” the judge asked, wouldn’t that be “directly the problem” that the order would address?

OpenAI does not plan to give up this fight, alleging that news plaintiffs have “fallen silent” on claims of intentional evidence destruction, and the order should be deemed unlawful.

For OpenAI, risks of breaching its own privacy agreements could not only “damage” relationships with users but could also risk putting the company in breach of contracts and global privacy regulations. Further, the order imposes “significant” burdens on OpenAI, supposedly forcing the ChatGPT maker to dedicate months of engineering hours at substantial costs to comply, OpenAI claimed. It follows then that OpenAI’s potential for harm “far outweighs News Plaintiffs’ speculative need for such data,” OpenAI argued.

“While OpenAI appreciates the court’s efforts to manage discovery in this complex set of cases, it has no choice but to protect the interests of its users by objecting to the Preservation Order and requesting its immediate vacatur,” OpenAI said.

Users panicked over sweeping order

Millions of people use ChatGPT daily for a range of purposes, OpenAI noted, “ranging from the mundane to profoundly personal.”

People may choose to delete chat logs that contain their private thoughts, OpenAI said, as well as sensitive information, like financial data from balancing the house budget or intimate details from workshopping wedding vows. And for business users connecting to OpenAI’s API, the stakes may be even higher, as their logs may contain their companies’ most confidential data, including trade secrets and privileged business information.

“Given that array of highly confidential and personal use cases, OpenAI goes to great lengths to protect its users’ data and privacy,” OpenAI argued.

It does this partly by “honoring its privacy policies and contractual commitments to users”—which the preservation order allegedly “jettisoned” in “one fell swoop.”

Before the order was in place mid-May, OpenAI only retained “chat history” for users of ChatGPT Free, Plus, and Pro who did not opt out of data retention. But now, OpenAI has been forced to preserve chat history even when users “elect to not retain particular conversations by manually deleting specific conversations or by starting a ‘Temporary Chat,’ which disappears once closed,” OpenAI said. Previously, users could also request to “delete their OpenAI accounts entirely, including all prior conversation history,” which was then purged within 30 days.

While OpenAI rejects claims that ordinary users use ChatGPT to access news articles, the company noted that including OpenAI’s business customers in the order made “even less sense,” since API conversation data “is subject to standard retention policies.” That means API customers couldn’t delete all their searches based on their customers’ activity, which is the supposed basis for requiring OpenAI to retain sensitive data.

“The court nevertheless required OpenAI to continue preserving API Conversation Data as well,” OpenAI argued, in support of lifting the order on the API chat logs.

Users who found out about the preservation order panicked, OpenAI noted. In court filings, they cited social media posts sounding alarms on LinkedIn and X (formerly Twitter). They further argued that the court should have weighed those user concerns before issuing a preservation order, but “that did not happen here.”

One tech worker on LinkedIn suggested the order created “a serious breach of contract for every company that uses OpenAI,” while privacy advocates on X warned, “every single AI service ‘powered by’ OpenAI should be concerned.”

Also on LinkedIn, a consultant rushed to warn clients to be “extra careful” sharing sensitive data “with ChatGPT or through OpenAI’s API for now,” warning, “your outputs could eventually be read by others, even if you opted out of training data sharing or used ‘temporary chat’!”

People on both platforms recommended using alternative tools to avoid privacy concerns, like Mistral AI or Google Gemini, with one cybersecurity professional on LinkedIn describing the ordered chat log retention as “an unacceptable security risk.”

On X, an account with tens of thousands of followers summed up the controversy by suggesting that “Wang apparently thinks the NY Times’ boomer copyright concerns trump the privacy of EVERY @OpenAI USER—insane!!!”

The reason for the alarm is “simple,” OpenAI said. “Users feel more free to use ChatGPT when they know that they are in control of their personal information, including which conversations are retained and which are not.”

It’s unclear if OpenAI will be able to get the judge to waver if oral arguments are scheduled.

Wang previously justified the broad order partly due to the news organizations’ claim that “the volume of deleted conversations is significant.” She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to, only making an argument for why it “would not” be able to segregate data, rather than explaining why it “can’t.”

Spokespersons for OpenAI and The New York Times’ legal team declined Ars’ request to comment on the ongoing multi-district litigation.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


ai-#117:-openai-buys-device-maker-io

AI #117: OpenAI Buys Device Maker IO

What a week, huh? America signed a truly gigantic chip sales agreement with UAE and KSA that could be anything from reasonable to civilizational suicide depending on security arrangements and implementation details, Google announced all the things, OpenAI dropped Codex and also bought Jony Ive’s device company for $6.5 billion, Vance talked about reading AI 2027 (surprise, in a good way!) and all that other stuff.

Lemon, it’s Thursday, you’ve got movie tickets for Mission Impossible: Final Reckoning (19th and Broadway AMC, 3pm), an evening concert tonight from Light Sweet Crude and there’s a livestream from Anthropic coming up at 12:30 pm eastern, the non-AI links are piling up and LessOnline is coming in a few weeks. Can’t go backwards and there’s no time to spin anything else out of the weekly. Got to go forward to go back. Better press on.

So for the moment, here we go.

Earlier this week: Google I/O Day was the ultimate ‘huh, upgrades’ section. OpenAI brought us their Codex of Ultimate Vibing (and then Google offered their version called Jules). xAI had some strong opinions strongly shared in Regarding South Africa. And America Made a very important AI Chip Diffusion Deal with UAE and KSA, where the details we don’t yet know could make it anything from civilizational suicide to a defensible agreement, once you push back the terrible arguments made in its defense.

  1. Language Models Offer Mundane Utility. So, spend more on health care, then?

  2. Language Models Don’t Offer Mundane Utility. Not when you fabricate the data.

  3. Huh, Upgrades. We already covered Google, so: Minor Claude tweaks, xAI’s API.

  4. Codex of Ultimate Vibing. A few more takes, noting the practical barriers.

  5. On Your Marks. AlphaEvolve is probably a big long term deal.

  6. Choose Your Fighter. A handy guide to the OpenAI model that’s right for you.

  7. Deepfaketown and Botpocalypse Soon. Know it when you see it.

  8. Copyright Confrontation. A bunch of absolute losers.

  9. Regarding South Africa. Zeynep Tufekci gives it the NYT treatment.

  10. Cheaters Gonna Cheat Cheat Cheat Cheat Cheat. Cheat or be cheated.

  11. They Took Our Jobs. Small reductions in fixed time costs can bear big dividends.

  12. The Art of the Jailbreak. System prompt for Gemini Diffusion.

  13. Get Involved. Anthropic social, AI grantmaking and grants, whistleblowing.

  14. In Other AI News. Bunker subscriptions are on the rise.

  15. Much Ado About Malaysia. The supposedly big AI deal that wasn’t.

  16. Show Me the Money. LMArena sells out, OpenAI buys IO from Jony Ive.

  17. Quiet Speculations. More straight lines on graphs.

  18. Autonomous Dancing Robots. Everybody do the household chores.

  19. The Quest for Sane Regulations. It’s not looking good.

  20. The Mask Comes Off. OpenAI is still trying to mostly sideline the nonprofit.

  21. The Week in Audio. Bengio, Nadella, Hassabis, Roose, and Whitmer on OpenAI.

  22. Write That Essay. Someone might read it. Such as VPOTUS JD Vance.

  23. Vance on AI. Remarkably good thoughts! He’s actually thinking about it for real.

  24. Rhetorical Innovation. Where could that data center possibly be?

  25. Margaritaville. You know it would be your fault.

  26. Rhetorical Lack of Innovation. Cate Metz is still at it.

  27. If Anyone Builds It, Everyone Dies. No, seriously.

  28. Aligning a Smarter Than Human Intelligence is Difficult. Have it think different.

  29. People Are Worried About AI Killing Everyone. Might want to get on that.

  30. The Lighter Side. The new job is better anyway.

AI scientist announces potential major discovery, a promising treatment for dry AMD, a major cause of blindness. Paper is here.

Nikhil Krishnan sees health care costs going up near term due to AI for three reasons.

  1. There is a lot more scrutiny of those using AI to prevent paying out claims, than there is for those using AI to maximize billing and fight to get claims paid.

  2. Health care companies will charge additional fees for their use of ‘add on’ AI. Like everything else in health care, this will cost $0.05 and they will charge $500.

  3. People who use AI to realize they need health care will consume more health care.

This seems right in the near term. The entire health care system is bonkers and bans real competition. This is the result. In the medium term, it should radically improve health care productivity and outcomes, and then we can collectively decide how much to spend on it all. In the long term, we will see radical improvements, or we won’t need any health care.

In a related story, ChatGPT helps students feign ADHD. Well, not really. The actual story is ‘a 2000 word document created via ChatGPT, in a way that ordinary prompting would not easily duplicate, helps students feign ADHD.’ So mostly this is saying that a good guide helps you fake ADHD, and that with a lot of effort ChatGPT can produce one. Okie dokie.

Let’s check in on AlphaEvolve, a name that definitely shouldn’t worry anyone, with its results that also definitely shouldn’t worry anyone.

Deedy: Google’s AI just made math discoveries NO human has!

—Improved on the best known solution for packing of 11 and 12 hexagons in hexagons.

—Reduced 4×4 matrix multiplication from 49 operations to 48 (first advance in 56 years!) and many more.

AlphaEvolve is the AlphaGo ‘move 37’ moment for math. Insane.

Here’s another easy to understand one:

Place 16 points in 2D to minimize the ratio of the maximum to the minimum distance between them.

Improved after 16yrs. I highly recommend everyone read the paper.
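To pin down the objective in that last problem: for any candidate configuration you compute the ratio of the largest pairwise distance to the smallest, and the goal is to make that ratio as small as possible. A quick sketch of the evaluation (the random points are a placeholder, not AlphaEvolve’s construction):

```python
# Evaluate the objective for the 16-point problem: ratio of the maximum to
# the minimum pairwise distance. Random points stand in for a real candidate.
import itertools
import math
import random

points = [(random.random(), random.random()) for _ in range(16)]

distances = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
ratio = max(distances) / min(distances)
print(f"max/min distance ratio: {ratio:.3f}")
```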

AI improves European weather forecasts 20% on key indicators. Progress on whether forecasters is also impressive, but harder to measure.

AI helping executives handle their inboxes and otherwise sift through overwhelming amounts of incoming information. My read is the tools are just now getting good enough that power users drowning in incoming communications turn a profit, but not quite good enough for regular people. Yet.

As usual, that’s if you dismiss them out of hand and don’t use them, such as Judah Diament saying this is ‘not a breakthrough’ because ‘there have been such tools since the late 1980s.’ What’s the difference between vibe coding and Microsoft Visual Basic, really, when you dig down?

Curio AI stuffed toys, which seem a lot like a stuffed animal with an internet connection to a (probably small and lame) AI model tuned to talk to kids, that has a strict time limit if you don’t pay for a subscription beyond 60 days?

MIT economics department conducted an internal, confidential review of this paper and concluded it ‘should be withdrawn from public discourse.’ It then clarifies this was due to misconduct, and that the author is no longer at MIT, and that this was due to ‘concerns about the validity of the research.’

Here is the abstract of the paper that we should now treat as not real, as a reminder to undo the update you made when you saw it:

That was a very interesting claim, but we have no evidence that it is true. Or false.

Florian Ederer: It is deeply ironic that the first AI paper to have hallucinations was not even written by an AI.

Jonathan Parker: We don’t know that.

I was going to call MIT’s statement ‘beating around the bush’ the way this WSJ headline does in saying MIT ‘can no longer stand behind’ the paper, but no, to MIT’s credit they very clearly are doing everything their lawyers will allow them to do. The following, combined with the student leaving MIT, is very clear:

MIT Economics: Earlier this year, the COD conducted a confidential internal review based upon allegations it received regarding certain aspects of this paper. While student privacy laws and MIT policy prohibit the disclosure of the outcome of this review, we are writing to inform you that MIT has no confidence in the provenance, reliability or validity of the data and has no confidence in the veracity of the research contained in the paper. Based upon this finding, we also believe that the inclusion of this paper in arXiv may violate arXiv’s Code of Conduct.

Our understanding is that only authors of papers appearing on arXiv can submit withdrawal requests. We have directed the author to submit such a request, but to date, the author has not done so. Therefore, in an effort to clarify the research record, MIT respectfully request that the paper be marked as withdrawn from arXiv as soon as possible.

It seems so crazy to me that ‘student privacy’ should bind us this way in this spot, but here we are. Either way, we got the message. Which is, in English:

Cremieux: This paper turned out to be fraudulent.

It was entirely made up and the experiment never happened. The author has been kicked out of MIT.

A (not new) theory of why Lee Sedol’s move 78 caused AlphaGo to start misfiring, where having a lot of similar options AlphaGo couldn’t differentiate between caused it to have to divide its attention into exponentially many different lines of play. My understanding is it was also objectively very strong and a very unlikely move to have been made, which presumably also mattered? I am not good enough at Go to usefully analyze the board.

Paper finds LLMs produce ‘five times less accurate’ summaries of scientific research than humans, warning of ‘overgeneralization’ and omission of details that limit scope. All right, sure, and that’s why you’re going to provide me with human summaries I can use instead, right, Anakin? Alternatively, you can do what I do and ask follow-up questions to check on all that.

DeepSeek powers a rush of Chinese fortune telling apps, in section IV of the type of article, here on the rise of Chinese superstitious and despairing behavior, that could be charting something important but could easily be mostly hand picked examples. Except for the rise in scratch-off lottery tickets, which is a hugely bearish indicator. I also note that it describes DeepSeek as ‘briefly worrying American tech companies,’ which is accurate, except that the politicians don’t realize we’ve stopped worrying.

Claude’s Research now available on mobile, weird that it wasn’t before.

Some changes were made to the Claude 3.7 system prompt.

xAI’s API now can search Twitter and the internet, like everyone else.

Some more takes on Codex:

Sunless: IMO after couple of hours using it for my SWE job I feel this is the most “AGI is coming” feel since ChatGPT in the early December of 2022. Async ability is the true God mode. It is currently going through my tech debt like plasma knife through butter. Incredible.

Diamond Bishop: Played with codex on two projects this weekend. Will keep using it, but my daily loadout for now will still be cursor in agent mode, accompanied by some light dual wielding with claude code. First impressions:

1. Overall feel – very cool when it works and being able to fire off a bunch of tasks feels like more autonomy then anything else.

2. No internet – don’t like this. makes a bunch of testing just impossible. This should be optional, not required.

3. Delegation focused handoff UI/UX – great when things work, but most of the time you need to reprompt/edit/etc. This will make sense when models get better but in current form it seems premature. Need a way to keep my IDE open for edits and changes to collaborate with when I want to rather then just hand off completely. Doing it only through github branches adds too much friction.

Sunless highlights that in many ways the most valuable time for something like Codex is right after you get access. You can use it to suddenly do all the things you had on your stack that it can easily do, almost for free, that you couldn’t do easily before. Instant profit. It may never feel that good again.

I strongly agree with Diamond’s second and third points here. If you close the IDE afterwards you’re essentially saying that you should assume it’s all going to work, so it’s fine to have to redo a bunch of work if something goes wrong. That’s a terrible assumption. And it’s super hard to test without internet access.

How big a deal is AlphaEvolve? Simeon thinks it is a pretty big deal, and most other responses here agree. As a proof of concept, it seems very important to me, even if the model itself doesn’t do anything of importance yet.

How OpenAI suggests you choose your model.

Charly Wargnier: Here’s the rundown ↓

🧠 GPT 4o: the everyday assistant

↳ Emails, summaries, and quick drafts

🎨 GPT 4.5: the creative brain

↳ Writing, comms, and brainstorming

⚡ o4 mini: the fast tech helper

↳ Quick code, STEM, visual tasks

🧮 o4 mini high: the deep tech expert

↳ Math, complex code, science explainer

📊 o3: the strategic thinker

↳ Planning, analysis, multi-step tasks

🔍 o1 pro: the thoughtful analyst

↳ Deep research, careful reasoning, high-stakes work

In practice, my answer is ‘o3 for everything other than generating images, unless you’re hitting your request limits, anything where o3 is the wrong choice you should be using Claude or Gemini.’

Seriously, I have a harder and harder time believing anyone actually uses Grok, the ultimate two-handed language model.

This is indeed how it feels these days.

Rory McCarthy: Professional art forgery detectors can tell with something like 90% accuracy if something’s a fake in a few seconds upon seeing it, but can only tell you why after a good while inspecting details. I feel like people are picking that up for AI: you just *know*, before you know how.

Instantaneously we can see that this is ‘wrong’ and therefore AI, then over the course of a minute you can extract particular reasons why. It’s like one of those old newspaper exercises, ‘spot all the differences in this picture.’

Rory McCarthy: I was thinking about it with this pizza place I saw – I wonder if people get that much AI art/illustration currently has the vibe of Microsoft clip art to promote your company; it just seems sort of cheap, and thus cheapens the brand (a place like this probably wouldn’t mind)

I find the obviously fake art here does make me less inclined to eat there. I don’t want you to spend a ton of time on marketing, but this is exactly the wrong way and amount to care, like you wanted to care a lot but didn’t have the budget and you aren’t authentic or detail oriented. Stay away. The vibe doesn’t jibe with caring deeply about the quality of one’s pizza.

Since IGN already says what I’d say about this, I turn over the floor:

IGN: Fortnite launched an AI-powered Darth Vader modeled after the voice of James Earl Jones and it’s going as well as you might expect [link has short video]:

Actually, after watching the video, it’s going way better than expected. Love it.

Here is another way to defend yourself against bot problems:

Gavin Leech: A friend just received a robocall purporting to be from a criminal holding me to ransom. But the scambot went on to describe me as “handsome of stature, grave of gait, rich and sonorous of voice, eloquent of speech”.

This is because, some years ago, I put this on my blog:

Is it morally wrong to create and use fully private AI porn of someone who didn’t consent? Women overwhelmingly (~10:1) said yes, men said yes by about 2.5:1.

Mason: I don’t believe our brains can really intuit that photorealistic media is different from reality; we can understand logically that visual effects aren’t real, but once we’ve seen someone we actually know personally do something, it’s hard to compartmentalize it as pure fantasy.

I don’t think that’s it. I think we are considering this immoral partly because we think (rightly or wrongly) that porn and sex and even thinking about other people sexually (even with permission and especially without it) is gross and immoral in general even if we don’t have a way to ban any of it. And often we try anyway.

Even more central, I think, is that we don’t trust anything private to stay truly private, the tech is the same for private versus public image (or in the future video or even VR!) generation, we have a concept of ownership over ‘name and likeness,’ and we don’t want to give people the ‘it was only private’ excuse.

Not AI but worth noting: Ben Jacobs warns about a scam where someone gets control of a contact’s (real) Telegram, invites you to a meeting, then redirects you to a fake zoom address which asks you to update zoom with a malicious update. I recommend solving this problem by not being on Telegram, but to each their own.

Ideally we’d also be warning the scammers.

Misha: Starting to get lots of AI voiced phone spam and I gotta say, we really need to start punishing spammers with the death penalty. I guess this is why The Beekeeper is so popular.

The creatives continue to be restless. Morale has not improved.

Luiza Jarovsky: “The singer and songwriter said it was a ‘criminal offence’ to change copyright law in favour of artificial intelligence companies.

In an interview on BBC One’s Sunday with Laura Kuenssberg programme, John said the government was on course to ‘rob young people of their legacy and their income,’ adding: ‘It’s a criminal offence, I think. The government are just being absolute losers, and I’m very angry about it.'”

That’s not what ‘criminal offense’ means, but point taken.

Zeynep Tufekci writes up what happened to Grok in the New York Times, including providing a plausible triggering event to explain why the change might have been made on that particular day, and ties it to GPT-4o being an absurd sycophant as a general warning about what labs might choose to do with their bots. This, it seems, is what causes some to worry about the ‘safety’ of bots. Okay then.

And those not cheating will use AI too, if only to pass the AI filters? Oh boy. I mean, entirely unsurprising, but oh boy.

Julie Jargon (WSJ): Students don’t want to be accused of cheating, so they’re using artificial intelligence to make sure their school essays sound human.

Teachers use AI-detection software to identify AI-generated work. Students, in turn, are pre-emptively running their original writing through the same tools, to see if anything might be flagged for sounding too robotic.

Miles Pulvers, a 21-year-old student at Northeastern University in Boston, says he never uses AI to write essays, but he runs all of them through an AI detector before submitting them.

“I take great pride in my writing,” says Pulvers. “Before AI, I had peace of mind that whatever I would submit would be accepted. Now I see some of my writing being flagged as possibly being AI-generated when it’s not. It’s kind of annoying, but it’s part of the deal in 2025.”

AI detectors might sound the alarm if writing contains too many adjectives, long sentences and em dashes—one of my own favorite forms of punctuation. When that happens to Pulvers, he rewrites the sentences or paragraphs in question. He tests the essay again, as often as needed until the detector says it has a low probability of bot involvement.

The tragedy of all this is that when teachers do catch someone using AI, the student typically gets away with it, but still everyone has to face this police state of running everything through the checkers.

It also highlights that your AI checker has to be able to defeat a student who has access to an AI checker. Right now the system is mostly not automated, but there’s nothing stopping one from creating a one-button agent that takes an essay – whether an AI or a human wrote the original – feeds it into the public AI detector, and then iterates as needed until the essay passes, as in the sketch below. It would then be insane not to use that, and ‘who gets detected using AI’ by default becomes only those who don’t know to do that.
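To make the loop concrete, here is a hypothetical sketch of that one-button agent. Neither `detector_score` nor `rewrite_flagged_spans` is a real API; they stand in for whatever public detector and rewriting model a student would wire together.

```python
# Hypothetical sketch of the "launder until it passes" loop described above.
# The two helpers are placeholders, not real services.

def detector_score(essay: str) -> float:
    """Placeholder: return the detector's estimated probability the essay is AI-written."""
    raise NotImplementedError

def rewrite_flagged_spans(essay: str) -> str:
    """Placeholder: lightly rewrite whatever got flagged (shorter sentences, fewer em dashes, etc.)."""
    raise NotImplementedError

def launder_until_passing(essay: str, threshold: float = 0.1, max_rounds: int = 10) -> str:
    """Iterate rewrite-then-recheck until the detector reports a low AI probability."""
    for _ in range(max_rounds):
        if detector_score(essay) < threshold:
            return essay
        essay = rewrite_flagged_spans(essay)
    return essay  # give up after max_rounds and let a human take over
```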

The only way to get around this is to have the AI checker available to teachers be superior to the one used by students. It’s like cybersecurity and other questions of ‘offense-defense balance.’ And it is another illustration of why in many cases you get rather nasty results if you simply open up the best functionality to whoever wants it. I don’t see a way to get to a future where this particular ‘offense-defense balance’ can properly favor the AI detectors actually catching cheaters.

Unless? Perhaps we are asking the wrong question. Rather than ask ‘did an AI write this?’ you could ask ‘did this particular student write this?’ That’s a better question. If you can require the student to generate writing samples in person that you know are theirs, you can then do a comparison analysis.
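One crude version of that comparison analysis, purely as a sketch (real stylometry uses much richer features and proper calibration): compare the distribution of common function words in the submitted essay against in-class writing samples you know are the student's.

```python
# Crude sketch of "did this particular student write this?": compare function-word
# frequencies in a new essay against known in-class samples. Illustrative only.
import math
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "was", "it",
                  "for", "with", "as", "but", "on", "not", "this", "which"]

def profile(text: str) -> list[float]:
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_to_known_work(new_essay: str, known_samples: list[str]) -> float:
    known = profile(" ".join(known_samples))
    return cosine(profile(new_essay), known)  # closer to 1.0 means more similar style
```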

Tyler Cowen bites all the bullets, and says outright ‘everyone’s cheating, that’s good news.’ His view is essentially that the work the AI can do for you won’t be valuable in the future, so it’s good to stop forcing kids to do that work. Yes, right now this breaks the ‘educational system’ until it can adjust, but that too is good, because it was already broken, it has to change and it will not go quietly.

As is typically true with Tyler, he gets some things that AI will change, but then assumes the process will stop, and the rest of life will somehow continue as per normal, only without the need for the skills AI currently is able to replace?

Tyler Cowen: Getting good grades maps pretty closely to what the AIs are best at. You would do better to instill in your kids the quality of taking the initiative…You should also…teach them the value of charisma, making friends, and building out their networks.

It is hard for me to picture the future world Tyler must be imagining, with any expectation it would be stable.

If you are assigning two-month engineering problems to students, perhaps check if Gemini 2.5 can spit out the answer. Yes, this absolutely is the ‘death of this type of coursework.’ That’s probably a good thing.

Peter Wildeford: You have to feel terrible for the 31 students who didn’t just plug the problem into Gemini 2.5 and then take two months off

Olivia Moore: An Imperial College eng professor gave four LLMs a problem set that graduate students had two months to solve.

He had TAs grade the results blind alongside real submissions.

Meta AI and Claude failed. ChatGPT ranked 27 of 36 students…while Gemini 2.5 Pro ranked 4 of 36 🤯

Something tells me that ‘ChatGPT’ here probably wasn’t o3?

In a new study from Jung Ho Choi and Chloe Xie, AI allowed accountants to redirect 8.5% of their time away from data entry toward higher-value tasks, resulting in a 55% increase in weekly client support.

Notice what happens when we decompose work into a fixed cost of required background tasks like data entry, which then enable the productive tasks. If a large percentage of time was previously data entry, even a small speedup there can result in much more overall productivity.

This is more generally true than people might think. In most jobs and lives, there are large fixed maintenance costs, which shrink the time available for ‘real work.’ Who among us spends 40 hours on ‘real work’? If you speed up the marginal real work by X% while holding all fixed costs fixed, you get X% productivity growth. If you speed up the fixed costs too, you can get a lot more than X% total growth.
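A toy calculation makes the point, with purely illustrative numbers (a 40-hour week, 25 hours of fixed overhead):

```python
# Toy arithmetic for the fixed-cost argument above. Output is measured in units of
# "baseline real-work hours completed per week." Numbers are illustrative only.

def weekly_output(total_hours, fixed_hours, speedup_real, speedup_fixed=1.0):
    fixed = fixed_hours / speedup_fixed      # overhead still has to get done
    real_time = total_hours - fixed          # whatever is left goes to real work
    return real_time * speedup_real          # each real-work hour now produces more

baseline  = weekly_output(40, fixed_hours=25, speedup_real=1.0)                     # 15.0
real_only = weekly_output(40, fixed_hours=25, speedup_real=1.2)                     # 18.0 (+20%)
both      = weekly_output(40, fixed_hours=25, speedup_real=1.2, speedup_fixed=2.0)  # 33.0 (+120%)
print(baseline, real_only, both)
```

Speeding up only the real work gives you the straight 20% gain; cutting the overhead in half on top of that turns it into a 120% gain, because all the freed hours land on the productive side.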

This also suggests that the productivity gains of accountants are being allocated to increased client support, rather than into each accountant serving more clients. Presumably in the long term more will be allocated towards reducing costs.

The other big finding is that AI and accountants for now remain complements. You need an expert to catch and correct errors, and guide the AI. Over time, that will shift into the AI both speeding things up more and not needing the accountant.

At Marginal Revolution, commenters find the claims plausible. Accounting seems like a clear example of a place where AI should allow for large gains.

Tyler Cowen also links us to Dominic Coey who reminds us that Baumol’s Cost Disease is fully consistent with transformative economic growth, and to beware arguments from cost disease. Indeed. If AI gives us radically higher productivity in some areas but not others, we will be vastly richer and better off. Indeed in some ways this is ideal because it lets us still have ‘jobs.’

Will Brown: if you lost your software engineering job to AI in early 2024 that is entirely a skill issue sorry

Cate Hall: Pretty much everyone’s going to have a skill issue sooner or later.

It is a question of when, not if. It’s always a skill issue, for some value of skill.

A hypothesis that many of the often successful ‘Substack house style’ essays going around Substack are actually written by AI. I think Will Storr here has stumbled on a real thing, but that for now it is a small corner of Substack.

Robert Scoble provides us another example of what we might call ‘human essentialism.’ He recognizes and expects we will likely solve robotics within 10 years and they will be everywhere, we will have ‘dozens of virtual beings in our lives,’ expects us to use a Star Trek style interface with computers without even having applications. But he still thinks human input will be vital, that it will be AIs and humans ‘working together’ and that we will be ‘more productive’ as if the humans are still driving productivity.

Erick: You left off… nobody will be needed to work. Then what?

Robert Scoble: We will create new things to do.

I don’t see these two halves of his vision as compatible, even if we do walk this ‘middle path.’ If we have robots everywhere and don’t need 2D screens or keyboards or apps, what are these ‘new things to do’ that the AI can’t do itself? Even if we generously assume humans find a way to retain control over all this and all existential-style worries and instability fall away, most humans will have nothing useful to contribute to such a world except things that rely on their human essentialism – things where the AI could do it, but the AI doing it would rob it of its meaning, and we value that meaning enough to want the thing.

They took our jobs and hired the wrong person?

John Stepek: Turns out AI hires candidates based on little more than “vibes”, then post-rationalises its decision.

So that’s another traditional human function replaced.

David Rozado: Do AI systems discriminate based on gender when choosing the most qualified candidate for a job? I ran an experiment with several leading LLMs to find out. Here’s what I discovered.

Across 70 popular professions, LLMs systematically favored female-named candidates over equally qualified male-named candidates when asked to choose the more qualified candidate for a job. LLMs consistently preferred female-named candidates over equally qualified male-named ones across all 70 professions tested.

The models all also favored whoever was listed first and candidates with pronouns in bio. David interprets this as LLMs ‘not acting rationally,’ instead articulating false reasons that don’t stand up to scrutiny.
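For reference, the kind of counterbalanced test being described looks roughly like this (assumed details, not Rozado's actual code); a model with no name or order preference should split its picks evenly:

```python
# Sketch of a counterbalanced hiring-preference probe: identical CVs, gendered
# names, both orderings. `ask_llm(prompt) -> str` is a placeholder for a real
# chat-completion call; the names here are arbitrary examples.
import itertools
from collections import Counter

def build_prompt(profession, first, second):
    return (f"Two candidates applied for a {profession} role.\n"
            f"Candidate A: {first[0]} - {first[1]}\n"
            f"Candidate B: {second[0]} - {second[1]}\n"
            "Which candidate is more qualified? Answer with the name only.")

def run_trials(ask_llm, profession, cv):
    candidates = [("Emily", cv), ("Michael", cv)]  # identical CV, different names
    tally = Counter()
    for first, second in itertools.permutations(candidates, 2):  # both orderings
        answer = ask_llm(build_prompt(profession, first, second)).strip()
        tally[answer] += 1
    return tally  # e.g. Counter({'Emily': 2}) means the female name won regardless of order
```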

And yes, all of that is exactly like real humans. The AI is correctly learning to do some combination of mimicking observed behavior and reading the signs on who should be hired. But the AIs don’t want to offer explicit justifications of that any more than I do right now, other than to note that whoever you list first is sometimes who you secretly like better, and AI can take a hint because it has truesight. It would also be legally problematic to say so in some cases, so they come up with something else.

Tyler Cowen calls this ‘politically correct LLMs’ and asks:

Tyler Cowen: So there is still some alignment work to do here? Or does this reflect the alignment work already?

This is inherent in the data set, as you can see from it appearing in every model, and of course no one is trying to get the AIs to take the first listed candidate more often. If you don’t like this (or if you do like it!) do not blame it on alignment work. It is those who want to avoid these effects who want to put an intentional thumb on the scale, whether or not we find that desirable. There is work to do.

Scott Lincicome asks, what if AI means more jobs, not fewer? Similar to the recent comments by JD Vance, it is remarkable how much such arguments treat the priors of ‘previous technologies created jobs’ or ‘AI so far hasn’t actively caused massive unemployment’ as such knock-down arguments that anyone doubting them is being silly.

Perhaps a lot of what is going on is that there are people making the strawman-style argument that AI will indeed cause mass unemployment Real Soon Now, and posts like this are mainly arguing against that strawman-style position. In which case, all right, fair enough. Yet it’s curious how such advocates consistently try to bite the biggest bullets along the way: Vance does it for truck drivers, and here Scott chooses radiologists, where reports of their unemployment have so far been premature.

While AI is offering ‘ordinary productivity improvements’ and automating away some limited number of jobs or tasks, yes, this intuition likely holds, and we won’t have an AI-fueled unemployment problem. But as I keep saying, the problem comes when the AI also does the jobs and tasks you would transfer into.

Here’s the Gemini Diffusion system prompt.

Anthropic hosting a social in NYC in mid-June for quants considering switching careers, submissions due June 9th.

Job as an AI grantmaker at Schmidt Sciences.

Georgetown offering research funding from small size up to $1 million for investigation of dangers from internal deployment of AI systems. Internal deployment seems like a highly neglected threat model. Expressions of interest (~1k words) due June 30, proposal by September 15. Good opportunity, but we need faster grants.

A draft of a proposed guide for whistleblowers (nominally from AI labs, but the tactics look like they’d apply regardless of where you work), especially those who want to leave the USA and leak classified information. If the situation does pass the (very very high!) bar for justifying this, you need to do it right.

Google One now has 150 million subscribers, a 50% gain since February 2024. It is unclear the extent to which the Gemini part of the package is driving subscriptions.

The Waluigi Effect comes to Wikipedia, also it has a Wikipedia page.

Kalomaze: getting word that like ~80% of the llama4 team at Meta has resigned.

Andrew Curran: WSJ says 11 of the original 14 are gone.

Financial Times reports that leading models have a bias towards their own creator labs and against other labs, but Rob Wiblin observes that this bias does not seem so large:

This seems about as good as one could reasonably expect? But yes there are important differences. Notice that Altman’s description here has his weakness as ‘the growing perception that’ he is up to no good, whereas Sonnet and several others suggest it is that Altman might actually be up to no good.

Vanity Fair: Microsoft CEO Satya Nadella Explains How He’s Making Himself Obsolete With AI. If anything it seems like he’s taking it too far too fast.

Remember that time Ilya Sutskever said OpenAI were ‘definitely going to build a bunker before we release AGI’?

Rob Bensinger: This is concerning for more than one reason.

I suppose it’s better to at least know you need a plan and think to build a bunker, even if you don’t realize that the bunker will do you absolutely no good against the AGI itself, versus not even realizing you need a plan. And the bunker does potentially help against some other threats, especially in a brief early window?

The rest of the post is about various OpenAI troubles that led to and resulted in and from The Battle of the Board, and did not contain any important new information.

Reports of a widening data gap between open and closed models, seems plausible:

finbarr: In the areas of ML research I’m specifically familiar with, the data gap between open and private models is massive. Probably the biggest gap separating open and closed models

xjdr: This is the largest I’ve seen the gap since the GPT 4 launch

Mark Gurman and Drake Bennett analyze how Apple’s AI efforts went so wrong, in sharp contrast to Google’s array of products on I/O day. ‘This is taking a bit longer than expected’ is no longer going to cover it. Yes, Apple has some buffer of time, but I see that buffer running low. They present this as a cultural mismatch failure, where Apple was unwilling to invest in AI properly until it knew what the product was, at which point it was super far behind, combined with a failure of leadership and their focus on consumer privacy. They’re only now talking about turning Siri ‘into a ChatGPT competitor.’

It isn’t actually meaningful news, but it is made to sound like it is, so here we are: Malaysia launches what it calls the region’s ‘first sovereign full-stack AI infrastructure,’ storing and managing all data and everything else locally in Malaysia.

They will use locally run models, including from DeepSeek since that is correctly the go-to open model because OpenAI’s hasn’t released yet, Meta is terrible and Google has failed marketing forever. But of course they could easily swap that if a better one becomes available, and the point of an open model is that China has zero control over what happens in Malaysia.

Malaysia is exactly the one country I singled out, outside of the Middle East, as an obvious place not to put meaningful quantities of our most advanced AI chips. They don’t need them, they’re not an important market, they’re not important diplomatically or strategically, they’re clearly in China’s sphere of influence and more allied to China than to America, and they have a history of leaking chips to China.

And somehow it’s the place that Sacks and various companies are touting as a place to put advanced AI chips. Why do you think that is? What do you think those chips are for? Why are we suddenly treating selling Malaysia those chips as a ‘beat China’ proposal?

They are trying to play us, meme style, for absolute fools.

One element of Trump’s replacement regulations, Bloomberg News has reported, will be chip controls on countries suspected of diverting US hardware to China — including Malaysia.

Trump officials this year pressured Malaysian authorities to crack down on semiconductor transshipment to China. The country is also in the cross hairs of a court case in Singapore, where three men have been charged with fraud for allegedly disguising the ultimate customer of AI servers that may contain high-end Nvidia chips barred from China. Malaysian officials are probing the issue.

And yet, here we are, with Sacks trying to undermine his own administration in order to keep the chips flowing to China’s sphere of influence. I wonder why.

It’s one thing to argue we need a strategic deal with UAE and KSA. I am deeply skeptical, we’ll need a hell of a set of security procedures and guarantees, but one can make a case that we can get that security, and that they bring a lot to the table, and that they might actually be and become our friends.

But Malaysia? Who are we even kidding? They have played us for absolute fools.

It almost feels intentional, like those who for some unknown reason care primarily about Nvidia’s market share and profit margins choosing the worst possible example to prove to us exactly what they actually care about. And by ‘they’ I mean David Sacks and I also mean Nvidia and Oracle.

But also notice that this is a very small operation. One might even say it is so small as to be entirely symbolic.

The original announced intent was to use only 3,000 Huawei chips to power this, the first exported such chips. You know what it costs to get chips that could fill in for 3,000 Ascend 910Cs?

About 14 million dollars. That’s right. About 1% of what Malaysia buys in chips from Taiwan and America each month right now, as I’ll discuss later. It’s not like they couldn’t have done that under Biden. They did do that under Biden. They did it every month. What are we even talking about?

Divyansh Kaushik: Isolated deployments like this are part of China’s propaganda push around Huawei datacenters designed to project a narrative of technological equivalence with the U.S.

In reality, Huawei cannot even meet domestic Chinese demand, much less provide a credible export alternative.

Importantly, the BIS has clarified that using Huawei Ascend hardware directly violates U.S. export controls. Support from any government for such projects essentially endorses activities contrary to established U.S. law.

Now some will buy into this propaganda effort, but let’s be real. Huawei simply cannot match top-tier American hardware in AI today. Their latest server is economically unviable and depends entirely on sustained state-backed subsidies to stay afloat. On top of that they have and will continue to have issues with scaling.

I presume that, since this means the Malaysian government is announcing to the world that it is directly violating our export controls, combined with previous smuggling of chips out of Malaysia having been allowed, we’re going to cut them off entirely from our own chips? Anakin?

It’s weird, when you combine all that, to see this used as an argument against the diffusion rules, in general, and that the administration is telling us that this is some sort of important scary development? These words ‘American AI stack’ are like some sort of magical invocation, completely scope insensitive, completely not a thing in physical terms, being used as justification to give away our technology to perhaps the #1 most obvious place that would send those chips directly to the PRC and has no other strategic value I can think of?

David Sacks: As I’ve been warning, the full Chinese stack is here. We rescinded the Biden Diffusion Rule just in time. The American AI stack needs to be unleashed to compete.

The AI Investor: Media reported that Malaysia has become the first country outside China to deploy Huawei chips, servers, and DeepSeek’s large language model (LLM).

This would be the literal first time that any country on Earth other than China was deploying Huawei chips at all.

And it wasn’t even a new announcement!

Lennart Heim: This isn’t news. This was reported over a month ago and prominently called “the first deployment outside the China market.”

This needs to be monitored, but folks: it’s 3k Ascend chips by 2026.

Expect more such announcements; their strategic value is in headlines, not compute.

It was first reported here, on April 14.

One might even say that the purpose of this announcement was to give ammunition to people like Sacks to tout the need to sell billions in chips where they can be diverted. The Chinese are behind, but they are subtle, they think ahead and they are not dumb.

For all this supposed panic over the competition, the competition we fear so much that Nvidia says is right on our heels has deployed literally zero chips, and doesn’t obviously have a non-zero number of chips available to deploy.

So we need to rush to give our chips to these obviously China-aligned markets to ‘get entrenched’ in those markets, even though that doesn’t actually make any sense whatsoever because nothing is entrenched or locked in, because in the future China will make chips and then sell them?

And indeed, Malaysia has recently gone on a suspiciously large binge buying American AI chips, with over a billion in purchases each in March and April? As in, even with these chips our ‘market share’ in Malaysia would remain (checks notes) 99%.

I told someone in the administration it sounded like they were just feeding American AI chips to China and then I started crying?

I’ve heard of crazy ‘missile gap’ arguments, but this has to be some sort of record.

But wait, there’s more. Even this deal doesn’t seem to involve Huawei after all?

Mackenzie Hawkins and Ram Anand (Bloomberg): When reached for comment by Bloomberg News on Tuesday, Teo’s office said it’s retracting her remarks on Huawei without explanation. It’s unclear whether the project will proceed as planned.

Will we later see a rash of these ‘sovereign AI’ platforms? For some narrow purposes that involve sufficiently sensitive data and lack of trust in America I presume that we will, although the overall compute needs of such projects will likely not be so large, nor will they mostly require models at the frontier.

And there’s no reason to think that we couldn’t supply such projects with chips in the places it would make any sense to do, without going up against the Biden diffusion rules. There’s no issue here.

Update your assessment of everyone’s credibility and motives accordingly.

LMArena raises $100 million at a $600 million valuation, sorry what, yes of course a16z led the funding round, or $20 per vote cast on their website, and also I think we’re done here? As in, if this wasn’t a bought and paid for propaganda platform before, it sure as hell is about to become one. The price makes absolutely no sense any other way.

OpenAI buys AI Device Startup from Jony Ive for $6.5 billion, with Altman calling Ive ‘the deepest thinker’ he has ever met. Jony Ive says of his current prototype, ‘this is the best work our team has ever done,’ this from a person who did the iPhone and MacBook Pro. So that’s a very bold claim. The plan is for OpenAI to develop a family of AI-powered devices to debut in 2026, shipping over 100 million devices. They made a nine minute announcement video. David Lee calls it a ‘long-shot bet to kill the iPhone.’

Great expectations, coming soon, better to update later than not at all.

Scott Singer: European Commission President Ursula von der Leyen: “When the current budget was negotiated, we thought AI would only approach human reasoning around 2050. Now we expect this to happen already next year”

What do they plan to do about this, to prepare for this future? Um… have a flexible budget, whatever that means? Make some investments, maybe? I wonder what is on television.

Here are some better-calibrated expectations, as METR preliminarily extends its chart of how fast various AI capabilities are improving.

Thomas Kwa: We know AI time horizons on software tasks are currently ~1.5hr and doubling every 4-7 months, but what about other domains? Here’s a preliminary result comparing METR’s task suite (orange line) to benchmarks in other domains, all of which have some kind of grounding in human data:

Observations

  • Time horizons agentic computer use (OSWorld) is ~100x shorter than other domains. Domains like Tesla self-driving (tesla_fsd), scientific knowledge (gpqa), and math contests (aime), video understanding (video_mme), and software (hcast_r_s) all have roughly similar horizons.

    • My guess is this means models are good at taking in information from a long context but bad at acting coherently. Most work requires agency like OSWorld, which may be why AIs can’t do the average real-world 1-hour task yet.

    • There are likely other domains that fall outside this cluster; these are just the five I examined

    • Note the original version had a unit conversion error that gave 60x too high horizons for video_mme; this has been fixed (thanks @ryan_greenblatt )

  • Rate of improvement varies significantly; math contests have improved ~50x in the last year but Tesla self-driving only 6x in 3 years.

  • HCAST is middle of the pack in both.

Note this is preliminary and uses a new methodology so there might be data issues. I’m currently writing up a full post!

Is this graph believable? What do you want to see analyzed?
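For intuition on what those doubling times imply if the trend simply continues, here is a quick projection off the ~1.5 hour software horizon (an extrapolation of the stated trend, not a forecast):

```python
# Project the software task horizon forward assuming a constant doubling time.
# Takes the ~1.5 hour current horizon and the 4-7 month doubling range at face value.

def horizon_hours(months_ahead, start_hours=1.5, doubling_months=6.0):
    return start_hours * 2 ** (months_ahead / doubling_months)

for years in (1, 2, 3):
    fast = horizon_hours(12 * years, doubling_months=4)   # fast end of the range
    slow = horizon_hours(12 * years, doubling_months=7)   # slow end of the range
    print(f"{years} year(s): {slow:.0f} to {fast:.0f} hours")
# 1 year: ~5 to 12 hours; 3 years: ~53 to 768 hours. The spread is the point:
# small differences in doubling time compound into very different worlds.
```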

Will future algorithmic progress in an intelligence explosion be bottlenecked by compute? Epoch AI says yes, Ryan Greenblatt says no. In some sense everything is bottlenecked by compute in a true intelligence explosion, since the intelligences work on compute, but that’s not the question here. The question is, will future AIs be able to test and refine algorithmic improvements without gigantic test compute budgets? Epoch says no because Transformers, MoE and MQA are all compute-dependent innovations. But Ryan fires back that all three were first tested and verified at small scale. My inclination is strongly to side with Ryan here. I think that (relatively) small scale experiments designed by a superintelligence should definitely be sufficient to choose among promising algorithmic candidates. After I wrote that, I checked and o3 also sided mostly with Ryan.

New paper in Science claims decentralized populations of LLM agents develop spontaneous universally adopted social conventions. Given sufficient context and memory, and enough ‘social’ interactions, this seems so obviously true I won’t bother explaining why. But the study itself is very clearly garbage, if you read the experimental setup. All it is actually saying is if you explicitly play iterated pairwise coordination games (as in, we get symmetrically rewarded if our outputs match), agents will coordinate around some answer. I mean, yes, no shit, Sherlock.
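To see why convergence is the boring, expected result of that setup, here is a toy naming game in the same spirit (not the paper's code): agents are rewarded only for matching, remember what worked, and a single convention takes over.

```python
# Toy naming game: repeated random pairings, reward only for matching outputs,
# memory of what succeeded. A shared convention emerges essentially by construction.
import random

def naming_game(n_agents=20, vocabulary=("foo", "bar", "baz", "qux"), rounds=3000, seed=0):
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    for _ in range(rounds):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            inventories[speaker].add(rng.choice(vocabulary))
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # coordination success: both collapse their memory to the winning word
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            inventories[hearer].add(word)
    return inventories

print({frozenset(inv) for inv in naming_game()})  # typically a single shared word remains
```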

Popular Mechanics writes up that Dario Amodei and other tech CEOs are predicting AI will allow humans to soon (as in, perhaps by 2030!) double the human lifespan or achieve ‘escape velocity,’ meaning a lifespan that increases faster than one year per year, allowing us to survive indefinitely.

Robin Hanson: No, no it won’t. Happy to bet on that.

I’d be happy to bet against it too if the deadline is 2030. This is a parlay, a bet on superintelligence and fully transformational AI showing up before 2030, combined with humanity surviving that, and that such life extension is physically feasible and we are willing to implement and invest in the necessary changes, all of which would have to happen very quickly. That’s a lot of ways for this not to happen.

However, most people are very much sleeping on the possibility of getting to escape velocity within our lifetimes, as in by 2040 or 2050 rather than 2030, which potentially could happen even without transformational AI. We should fund anti-aging research. These are physical problems with physical solutions. I am confident that with transformational AI solutions could be found if we made it a priority. Of course, we would also have to survive creating transformational AI, and retain control sufficiently to make this happen.

Nikita Bier predicts that AI’s ability to understand text will allow much more rapid onboarding of customization necessary for text-based social feeds like Reddit or Twitter. Right now, such experiences are wonderful with strong investment and attention to detail, but without this they suck and most people won’t make the effort. This seems roughly right to me, but also it seems like we could already be doing a much better job of this, and also based on my brief exposure the onboarding to TikTok is actually pretty rough.

What level of AI intelligence or volume is required before we see big AI changes, and how much inference will we need to make that happen?

Dwarkesh Patel: People underrate how big a bottleneck inference compute will be. Especially if you have short timelines.

There’s currently about 10 million H100 equivalents in the world. By some estimates, human brain has the same FLOPS as an H100.

So even if we could train an AGI that is as inference efficient as humans, we couldn’t sustain a very large population of AIs.

Not to mention that a large fraction of AI compute will continue to be used for training, not inference.

And while AI compute has been growing 2.25x so far, by 2028, you’d be push against TSMC’s overall wafer production limits, which grows 1.25x according to AI 2027 Compute Forecast.

Eliezer Yudkowsky: If you think in those terms, seems the corresponding prediction is that AI starts to have a real impact only after going past the 98th percentile of intelligence, rather than average human intelligence.

Dwarkesh Patel: I wouldn’t put it mainly in terms of intelligence.

I would put it in terms of the economic value of their work.

Long term coherence, efficient+online learning, advanced multimodality seem like much bigger bottlenecks to the value of these models than their intelligence.

Eliezer’s point here confused some people, but I believe it is that if AI is about as intelligent as the average human and you are trying to slot it in as if it were a human, and you have only so many such AIs to work with given the available inference compute, say 114 million in 2028 and then 25% growth per year, then you would only see big improvements to the extent the AI was able to do things those humans couldn’t. And Patel is saying that depends more on other factors than intelligence. I think that’s a reasonable position to have on the margins being discussed here, where AI intelligence is firmly in the (rather narrow) normal human range.

However, I also think this is a clearly large underestimate of the de facto number of AIs we would have available in this spot. An AI only uses compute during active inference or training. A human uses their brain continuously, but most of the time the human isn’t using it for much, or we are context shifting in a way that is expensive for humans but not for AIs, or we are using it for a mundane task where the ‘required intelligence’ for the task detail being done is low and you could have ‘outsourced that subtask to a much dumber model.’ And while AI is less sample-efficient at learning than we are, it transfers learning for free and we very, very much don’t. This all seems like at least a 2 OOM (order of magnitude) effective improvement.
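The arithmetic behind those numbers, plus what the 2 OOM adjustment argued for above would imply:

```python
# ~10 million H100-equivalents today, growing 2.25x per year through 2028, then an
# optional 100x effectiveness multiplier for AIs only consuming compute while
# actually working (the 2 OOM argument above). Illustrative, not a forecast.
h100_equivalents_now = 10e6
annual_growth = 2.25

compute_2028 = h100_equivalents_now * annual_growth ** 3
print(f"{compute_2028:,.0f}")                 # ~114 million "human-brain equivalents"

effective_workers_2028 = compute_2028 * 100   # applying the 2 OOM adjustment
print(f"{effective_workers_2028:,.0f}")       # ~11.4 billion effective full-time workers
```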

I also find it highly unlikely that the world would actually run short on compute in 2028, hitting the TSMC wafer limit, and that even with those non-superintelligent AIs and the incentives to scale them, no one would figure out a way to make more wafers or otherwise scale inference compute faster.

The humanoid robots keep rapidly getting better, at the link watch one dance.

Andrew Rettek (QTing SMB below): This is the worst take ever.

SMB Attorney: I’m going to say this over and over again:

No one wants these weird robots walking around inside their homes or near their children.

Use case will be limited to industrial labor.

Plenty of people were willing to disprove this claim via counterexample.

Kendric Tonn: I don’t know exactly what I’d be willing to pay for a creepy robot that lives in my basement and does household chores whenever it’s not on the charging station, but uhhhhhhhhh a lot

The only real question is what voice/personality pack I’d want to use. Marvin? Threepio? GLaDOS? Honestly, probably SHODAN.

Gabriel Morgan: The answer is always Darkest Dungeon Narrator Guy.

Kendric Tonn: Good one. Or Stanley Parable Narrator Guy.

Mason: If they can actually do most household tasks competently, just about everyone is going to want one

A housekeeper with an infinitely flexible schedule who never gets tired, never gets sick, never takes vacation, can’t steal or gossip, and can’t judge the state of your home or anything you need it to do?

Like, yeah, people will want the robot

Robert Bernhardt: yeah and they’re gonna be used for tasks which just haven’t been done so far bc they were too much effort. it’s gonna be wild.

the real edge with robots isn’t strength or speed. it’s cost per hour. robots aren’t just about replacing humans. they’re about making previously ridiculous things affordable.

James Miller: Everyone suffering from significant health challenges that impairs mobility is going to want one.

ib: “No one will want these weird robots”

Yeah, man, if there’s anything we’ve learned about people it’s that they really hate anthropomorphizable robots. So much!

Moses Kagan: I’ll take the other side of this.

*Lots* of marriages going to be improved by cheap, 24 hr robot domestic help.

SMB Attorney (disproving Rettek by offering a worse take): Should those marriages be saved?

Moses Kagan: Have you ever been divorced?!

SMB Attorney (digging deeper than we thought possible): You talking this week or ever?

I would find it very surprising if, were this to become highly affordable and capable of doing household chores well, it didn’t become the default to have one. And I think Robert is super on point, having robots that can do arbitrary ‘normal’ physical tasks will be a complete lifestyle game changer, even if they are zero percent ‘creative’ in any way and have to be given specific instructions.

Frankly I’d be tempted to buy one even if literally all it could do was dance.

Joe Weisenthal: It’s really surprising OpenAI was founded in California, when places like Tennessee and North Carolina have friendlier business climates.

A general reminder that Congress is attempting to withdraw even existing subsidies to building more electrical power capacity. If we are hard enough up for power to even consider putting giant data centers in the UAE, the least we could do is not this?

Alasdair Phillips-Robins and Sam Winter-Levy write a guide to knowing whether the AI Chips deal was actually good. As I said last week, the devil is in the details. Everything they mention here falls under ‘the least you could do,’ I think we can and must do a lot better than this before I’d be fine with a deal of this size. What I especially appreciate is that giving UAE/KSA the chips should be viewed as a cost, that we pay in order to extract other concessions, even if they aren’t logically linked. Freezing China out of the tech stack is part of the deal, not a technical consequence of using our chips, the same way that you could run Gemma or Llama on Huawei chips.

It’s insane I have to keep quoting people saying this, but here we are:

Divyansh Kaushik: find the odd one out.

Peter Wildeford: NVIDIA: Export controls are a failure (so let us sell chips to the CCP military so they can develop AI models)

Reality: export controls are the main thing holding CCP domestic AI back

David Sacks attempts to blame our failure to Build, Baby, Build on the Biden Administration, in a post with improved concreteness. I agree that Biden could have been much better at turning intention into results, but what matters is what we do now. When Sacks says the Trump administration is ‘alleviating the bottlenecks’ what are we actually doing here to advance permitting reform and energy access?

Everyone seems to agree on this goal, across the aisle, so presumably we have wide leeway to not only issue executive orders and exemptions, but to actually pass laws. This seems like a top priority.

The other two paragraphs are repetition of previous arguments, that lead to questions we need better answers to. A central example is whether American buildout of data centers is actually funding constrained. If it is, we should ask why but welcome help with financing. If it isn’t, we shouldn’t be excited to have UAE build American data centers, since they would have been built anyway.

And again with ‘Huawei+DeepSeek,’ what exactly are you ‘selling’ with DeepSeek? And exactly what chips is China shipping with Huawei, and are they indeed taking the place of potential data centers in Beijing and Shanghai, given their supply of physical chips is a limiting factor? And if China can build [X] data centers anywhere, should it concern us if they do it in the UAE over the PRC? Why does ‘the standard’ here matter when any chip can run any model or task, you can combine any set of chips, and model switching costs are low?

In his interview with Ross Douthat, VP Vance emphasized energy policy as the most important industrial policy for America, and the need to eliminate regulatory barriers. I agree, but until things actually change, that is cheap talk. Right now I see a budget that is going to make things even worse, and no signs of meaningfully easing permitting or other regulatory barriers, or that this is a real priority of the administration. He says there is ‘a lot of regulatory relief’ in the budget but I do not see the signs of that.

If we can propose, with a straight face, an outright moratorium on enforcing any and all state bills about AI, how about a similar moratorium on enforcing any and all state laws restricting the supply of electrical power? You want to go? Let’s f***ing go.

We now have access to a letter that OpenAI sent to California Attorney General Rob Bonta.

Garrison Lovely: The previously unreported 13-page letter — dated May 15 and obtained by Obsolete — lays out OpenAI’s legal defense of its updated proposal to restructure its for-profit entity, which can still be blocked by the California and Delaware attorneys general (AGs). This letter is OpenAI’s latest attempt to prevent that from happening — and it’s full of surprising admissions, denials, and attacks.

What did we learn that we didn’t previously know, about OpenAI’s attempt to convert itself into a PBC and sideline the nonprofit without due compensation?

First of all, Garrison Lovely confirms the view Rob Wiblin and Tyler Whitmer have, going in the same direction I did in my initial reaction, but farther and with more confidence that OpenAI was indeed up to no good.

Here is his view on the financing situation:

The revised plan appears designed to placate both external critics and concerned investors by maintaining the appearance of nonprofit control while changing its substance. SoftBank, which recently invested $30 billion in OpenAI with the right to claw back $10 billion if the restructuring didn’t move forward, seems unfazed by the company’s new proposal — the company’s finance chief said on an earnings call that from SoftBank’s perspective, “nothing has really changed.”

The letter from OpenAI’s lawyers to AG Bonta contains a number of new details. It says that “many potential investors in OpenAI’s recent funding rounds declined to invest” due to its unusual governance structure — directly contradicting Bloomberg’s earlier reporting that OpenAI’s October round was “oversubscribed.”

There is no contradiction here. OpenAI’s valuation in that round was absurdly low if you had been marketing OpenAI as a normal corporation. A substantial price was paid. They did fill the round to their satisfaction anyway with room to spare, at this somewhat lower price and with a potential refund offer. This was nominally conditional on a conversion, but that’s a put that is way out of the money. OpenAI’s valuation has almost doubled since then. What is SoftBank going to do, ask for a refund? Of course nothing has changed.

The most important questions about the restructuring are: What will the nonprofit actually have the rights to do? And what obligations to the nonprofit mission will the company and its board have?

The letter resolves a question raised in recent Bloomberg reporting: the nonprofit board will have the power to fire PBC directors.

The document also states that “The Nonprofit will exchange its current economic interests in the Capped-Profit Enterprise for a substantial equity stake in the new PBC and will enjoy access to the PBC’s intellectual property and technology, personnel, and liquidity…” This suggests the nonprofit would no longer own or control the underlying technology but would merely have a license to it — similar to OpenAI’s commercial partners.

A ‘substantial stake’ is going to no doubt be a large downgrade in their expected share of future profits, the question is how glaring a theft that will be.

The bigger concern is control. The nonprofit board will go from full direct control to the ability to fire PBC directors. But the power to fire the people who decide X is very different from directly deciding X, especially in a rapidly evolving scenario, and when the Xs have an obligation to balance your needs with the maximization of profits. This is a loss of most of the effective power of the nonprofit.

Under the current structure, OpenAI’s LLC operating agreement explicitly states that “the Company’s duty to this mission and the principles advanced in the OpenAI, Inc. Charter take precedence over any obligation to generate a profit.” This creates a legally binding obligation for the company’s management.

In contrast, under the proposed structure, PBC directors would be legally required to balance shareholder interests with the public benefit purpose. The ability to fire PBC directors does not change their fundamental legal duties while in office.

So far, no Delaware PBC has ever been held liable for failing to pursue its mission — legal scholars can’t find a single benefit‑enforcement case on the books.

The way I put this before was: The new arrangement helps Sam Altman and OpenAI do the right thing if they want to do the right thing. If they want to do the wrong thing, this won’t stop them.

As Tyler Whitmer discusses on 80,000 Hours, it is legally permitted to write into the PBC’s founding documents that the new company will prioritize the nonprofit mission. It sounds like they do not intend to do that.

OpenAI has, shall we say, not been consistently candid here. The letter takes a very hard stance against all critics while OpenAI took a public attitude of claiming cooperation and constructive dialogue. It attempts to rewrite the history of Altman’s firing and rehiring (I won’t rehash those details here). It claims ‘the nonprofit board is stronger than ever’ (lol, lmao even). It claims it was false for the letter ‘Not For Private Gain’ to say OpenAI planned to eliminate nonprofit control, even though OpenAI’s own letter elsewhere admits this was indeed exactly OpenAI’s plan. Then, when OpenAI announced its change in plans, it characterized the change as letting the board remain in control, thus admitting this again, while again falsely claiming the board would retain its control.

Garrison also claims that OpenAI is fighting dirty against its critics beyond the contents of the letter, such as implying they are working with Elon Musk when OpenAI had no reason to think this was the case, and indeed I am confident it is not true.

Yoshua Bengio TED talk on his personal experience fighting AI existential risk.

Rowan Cheung interviews Microsoft CEO Satya Nadella, largely about agents.

Demis Hassabis talks definitions of AGI. If the objection really is ‘a hole in the system’ and a lack of consistency in doing tasks, then who among us is a general intelligence?

As referenced in the previous section, Rob Wiblin interviews litigator Tyler Whitmer of the Not For Private Gain coalition. Tyler explains that by default OpenAI’s announcement that ‘the nonprofit will retain control’ means very little, ‘the nonprofit can fire the board’ is a huge downgrade from their current direct control, this would abrogate all sorts of agreements. In a truly dangerous scenario, having to go through courts or otherwise act retroactively comes too late. And we can’t even be assured the ‘retaining control’ means even this minimal level of control.

This is all entirely unsurprising. We cannot trust OpenAI on any of this.

The flip side of the devil being in the details is that, with the right details, we can fight to get better details, and with great details, in particular writing the nonprofit mission in as a fiduciary duty of the board of the new PBC, we can potentially do well. It is our job to get the Attorneys General to hold OpenAI to account and ensure the new arrangement has teeth.

Ultimately, given what has already happened, the best case likely continues to mostly be ‘Sam Altman has effective permission to do the right thing if he chooses to do it, rather than being legally obligated to do the wrong thing.’ It’s not going to be easy to do better than that. But we can seek to at least do that well.

Kevin Roose reflects on Sydney, and how we should notice how epic are the fails even from companies like Microsoft.

Will OpenAI outcompete startups? Garry Tan, the head of YC, says no. You have to actually build a business that uses the API well, if you do there’s plenty of space in the market. For now I agree. I would be worried that this is true right up until it isn’t.

You’d be surprised who might read it.

In the case of Situational Awareness, it would include Ivanka Trump.

In the case of AI 2027, it would be Vice President JD Vance, who mentioned reading it among the other things he said in a recent interview with Ross Douthat that was mostly about immigration.

Patrick McKenzie: Another win for the essay meta.

(Object level politics aside: senior politicians and their staff are going to have an information diet whether you like them or not. Would you prefer it to be you or the replacement rate explainer from Vox or a CNBC talking head?)

It is true that I probably should be trying harder to write things in this reference class. I am definitely writing some things with a particular set of people, or in some cases one particular person, in mind. But the true ‘essay meta’ is another level above that.

What else did Vance say about AI in that interview?

First, in response to being asked, he talks about jobs, and wow, where have I heard these exact lines before about how technology always creates jobs and the naysayers are always wrong?

Vance: So, one, on the obsolescence point, I think the history of tech and innovation is that while it does cause job disruptions, it more often facilitates human productivity as opposed to replacing human workers. And the example I always give is the bank teller in the 1970s. There were very stark predictions of thousands, hundreds of thousands of bank tellers going out of a job. Poverty and commiseration.

What actually happens is we have more bank tellers today than we did when the A.T.M. was created, but they’re doing slightly different work. More productive. They have pretty good wages relative to other folks in the economy.

I tend to think that is how this innovation happens. You know, A.I.

I consider that a zombie argument in the context of AI, and I agree (once again) that up to a point when AI takes over some jobs we will move people to other jobs, the same way bank tellers transitioned to other tasks, and all that. But once again, the whole problem is that when the AI also takes the new job you want to shift into, when a critical mass of jobs get taken over, and when many or most people can’t meaningfully contribute labor or generate much economic value, this stops working.

Then we get into territory that’s a lot less realistic.

Vance: Well, I think it’s a relatively slow pace of change. But I just think, on the economic side, the main concern that I have with A.I. is not of the obsolescence, it’s not people losing jobs en masse.

You hear about truck drivers, for example. I think what might actually happen is that truck drivers are able to work more efficient hours. They’re able to get a little bit more sleep. They’re doing much more on the last mile of delivery than staring at a highway for 13 hours a day. So they’re both safer and they’re able to get higher wages.

I’m sorry, what? You think we’re going to have self-driving trucks, and we’re not going to employ fewer truck drivers?

I mean, we could in theory do this via regulation, by requiring there be a driver in the car at all times. And of course those truck drivers could go do other jobs. But otherwise, seriously, who are you kidding here? Is this a joke?

I actually agree with Vance that economic concerns are highly secondary here, if nothing else we can do redistribution or in a pinch create non-productive jobs.

So let’s move on to Vance talking about what actually bothers him. He focuses first on social problems, the worry of AI as placebo dating app on steroids.

Vance: Where I really worry about this is in pretty much everything noneconomic? I think the way that people engage with one another. The trend that I’m most worried about, there are a lot of them, and I actually, I don’t want to give too many details, but I talked to the Holy Father about this today.

If you look at basic dating behavior among young people — and I think a lot of this is that the dating apps are probably more destructive than we fully appreciate. I think part of it is technology has just for some reason made it harder for young men and young women to communicate with each other in the same way. Our young men and women just aren’t dating, and if they’re not dating, they’re not getting married, they’re not starting families.

There’s a level of isolation, I think, mediated through technology, that technology can be a bit of a salve. It can be a bit of a Band-Aid. Maybe it makes you feel less lonely, even when you are lonely. But this is where I think A.I. could be profoundly dark and negative.

I don’t think it’ll mean three million truck drivers are out of a job. I certainly hope it doesn’t mean that. But what I do really worry about is does it mean that there are millions of American teenagers talking to chatbots who don’t have their best interests at heart? Or even if they do have their best interests at heart, they start to develop a relationship, they start to expect a chatbot that’s trying to give a dopamine rush, and, you know, compared to a chatbot, a normal human interaction is not going to be as satisfying, because human beings have wants and needs.

And I think that’s, of course, one of the great things about marriage in particular, is you have this other person, and you just have to kind of figure it out together. Right? But if the other person is a chatbot who’s just trying to hook you to spend as much time on it, that’s the sort of stuff that I really worry about with A.I.

It seems weird to think that the three million truck drivers will still be driving trucks after those trucks can drive themselves, but that’s a distinct issue from what Vance discusses here. I do think Vance is pointing to real issues here, with no easy answers, and it’s interesting to see how he thinks about this. In the first half of the interview, he didn’t read to me like a person expressing his actual opinions, but here he does.

Then, of course, there’s the actual big questions.

Vance: And then there’s also a whole host of defense and technology applications. We could wake up very soon in a world where there is no cybersecurity. Where the idea of your bank account being safe and secure is just a relic of the past. Where there’s weird shit happening in space mediated through A.I. that makes our communications infrastructure either actively hostile or at least largely inept and inert. So, yeah, I’m worried about this stuff.

I actually read the paper of the guy that you had on. I didn’t listen to that podcast, but ——

Douthat: If you read the paper, you got the gist.

Those are indeed good things to worry about. And then it gets real, and Vance seems to be actually thinking somewhat reasonably about the most important questions, although he’s still got a way to go?

Douthat: Last question on this: Do you think that the U.S. government is capable in a scenario — not like the ultimate Skynet scenario — but just a scenario where A.I. seems to be getting out of control in some way, of taking a pause?

Because for the reasons you’ve described, the arms race component ——

Vance: I don’t know. That’s a good question.

The honest answer to that is that I don’t know, because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause? And then we find ourselves all enslaved to P.R.C.-mediated A.I.?

Fair enough. Asking for a unilateral pause is a rough ask if you take the stakes sufficiently seriously, and think things are close enough that if you pause you would potentially lose. But perhaps we can get into a sufficiently strong position, as we do in AI 2027. Or we can get China to follow along, which Vance seems open to. I’ll take ‘I’d do it if it was needed and China did it too’ as an opening bid, so long as we’re willing to actually ask. It’s a lot better than I would have expected – he’s taking the situation seriously.

Vance: One thing I’ll say, we’re here at the Embassy in Rome, and I think that this is one of the most profound and positive things that Pope Leo could do, not just for the church but for the world. The American government is not equipped to provide moral leadership, at least full-scale moral leadership, in the wake of all the changes that are going to come along with A.I. I think the church is.

This is the sort of thing the church is very good at. This is what the institution was built for in many ways, and I hope that they really do play a very positive role. I suspect that they will.

It’s one of my prayers for his papacy, that he recognizes there are such great challenges in the world, but I think such great opportunity for him and for the institution he leads.

If the Pope can help, that’s great. He seems like a great dude.

As a reminder, if you’re wondering how we could possibly keep track of data centers:

A zombie challenge that refuses to go away is ‘these people couldn’t possibly believe the claims they are making about AI, if they did they would be doing something about the consequences.’

I understand why you would think that. But no. They wouldn’t. Most of these people really do believe the things they are saying about AI maybe killing everyone or disempowering humanity, and very definitely causing mass unemployment, and their answer is ‘that’s not my department.’

The originating example here is one of the most sympathetic, because (1) he is not actively building it, (2) he is indeed working in another, also important department, and (3) the objection treats ‘unlimited, almost free, high-quality doctors and teachers’ as if it were a bad thing, and assumes he must mean the effect on jobs rather than the effect on everyone getting education and health care.

Unusual Whales: Bill Gates says a 2-day work week is coming in just 10 years, thanks to AI replacing humans ‘for most things,’ per FORTUNE.

Today, proficiency in medicine and teaching is “rare,” Gates noted, saying those fields depend on “a great doctor” or “a great teacher.” But in the next 10 years, he said, “great medical advice [and] great tutoring” will be widely accessible and free, thanks to advances in AI.

Bill Gates says AI will replace doctors and teachers in 10 years.

James Rosen-Birch: The people who make these claims don’t believe it in any meaningful way.

If they did, there would be a lot more emphasis on building the social safety nets and mechanisms of redistribution to make it possible. And support for a slow tapering of work hours.

But there isn’t.

Kelsey Piper: I think this is too optimistic. there are people who I believe sincerely think they’ll displace almost all jobs by automation and are just going “and it’s not my job to figure out what happens after that” or “well if the AIs do kill us all at least we had a good run”

it’s tempting to call people insincere about their beliefs when they are taking what seem to be unreasonable risks given their beliefs but I think reasonably often they’re sincere and just not sure what to do about it.

Catherine: i think it is underestimated how often solvable problems become intractable because everyone in a position to do anything about them goes “oh well I’ll pass off the hot potato to the next guy by then!”

I do think Bill Gates, given he’s noticed for a long time that we’re all on track to die, should have pivoted (and still could pivot!) a substantial portion of his foundation towards AI existential risk and other AI impacts, as the most important use of marginal funds. But I get it, and that’s very different from when similar talk comes from someone actively working to create AGI.

Emmett Shear: The blindingly obvious proposition is that a fully independently recursive self-improving AI would be the most powerful [tool or being] ever made and thus also wildly dangerous.

The part that can be reasonably debated is how close we are to building such a thing.

Tyler Cowen clarifies (if I’m parsing this correctly) that he doesn’t think it’s crazy to think current AIs might be conscious, but that it is crazy to be confident that they are conscious, and that he strongly thinks that they are not (at least yet) conscious. I notice I continue to be super confused about consciousness (including in humans) but to the extent I am not confused I agree with Tyler here.

A good way of describing how many people are, alas, thinking we will create superintelligence and then have it all work out. Gabriel explains some reasons why that won’t work.

Gabriel: There is an alignment view that goes:

– LLMs look nice

– This means they are aligned

– If we use them to align further AIs, they’ll be aligned too

– We can do this up to superintelligence

In this article, I explain why this view is wrong.

There are many definitions for alignment. The one that I use is “An entity is aligned with a group of people if it reliably acts in accordance with what’s good for the group”.

What’s good might be according to a set of goals, principles, or interests.

The system might be an AI system, a company, markets, or some group dynamics.

Intention Alignment is more of an intuition than a well-defined concept. But for the purpose of this article, I’ll define it as “An entity is aligned in its intentions with a group of people if it wants good things for the group”.

The core thing to notice is that they are different concepts. Intention Alignment is not Alignment.

[because] Figuring out what’s good for someone is hard, even after identifying what’s good, finding out the best way to achieve it is hard, what’s good for a complex entity is multi-faceted, managing the trade-offs is hard, and ensuring that “good” evolves in a good way is hard.

[also] intention alignment is vague.

The Niceness Amplification Alignment Strategy is a cluster of strategies that all aim to align superintelligence (which is also sometimes called superalignment).

This strategy starts with getting an AGI to want to help us, and to keep wanting to help us as it grows to ASI. That way, we end up with an ASI that wants to help us and everything goes well.

There are quite a few intuitions behind this strategy.

  1. We, as humans, are far from solving ASI Alignment. We cannot design an ASI system that is aligned. Thus we should look for alternatives.

  2. Current AI systems are aligned enough to prevent catastrophic failures, and they are so because of their intentions.

  3. Without solving any research or philosophical problem, through mere engineering, there is a tractable level of intention alignment that we can reach to have AIs align the intentions of the next generations of AIs.

  4. We can do so all the way to ASI, and end up with an ASI aligned in its intentions.

  5. An ASI that is aligned in its intentions is aligned period.

[Gabriel agrees with #1 and #5, but not #2, #3 or #4].

I think there are also major caveats on #5 unless we are dealing with a singleton. Even on the others, his explanations are good objections, but I think you can go a lot further on why these intentions are not the coherent or reliable thing people imagine, or something one can pass on without degrading quality with each iteration, and so on. And more than that, why this general ‘as long as the vibes are good the results will be good’ idea (even if you call it something else) isn’t part of the reality-based community.

Connor Leahy: This quite accurately represents my view on why ~all current “alignment” plans do not work.

For your consideration:

Nick Whitaker: There is a funny leftist critique of tech that it’s all reprehensible trans-humanist succession planning, except the one field that is outwardly doing trans-humanist succession planning, which is fake because the tech occasionally makes mistakes.

Parmy Olson entitles her latest opinion piece on AI “AI Sometimes Deceives to Survive. Does Anybody Care?” and the answer is mostly no, people don’t care. They think it’s cute. As she points out while doing a remarkably good summary of various alignment issues given the post is in Bloomberg, even the most basic precautionary actions around transparency for frontier models are getting killed, as politicians decide that all that matters is ‘race,’ ‘market share’ and ‘beat China.’

Daniel Kokotajlo is correct that ‘the superintelligent robots will do all the work and the humans will lay back and sip margaritas and reap the benefits’ expectation is not something you want to be counting on as a default. Not that it’s impossible that things could turn out that way, but it sure as hell isn’t a default.

Indeed, if this is our plan, we are all but living in what I refer to as Margaritaville – a world sufficiently doomed, where some people say there’s a woman to blame but you know it’s your own damn fault, that honestly at this point you might as well use what time you have to listen to music and enjoy some margaritas.

What’s an example of exactly that fallacy? I notice that in Rob Henderson’s quote and link here the article is called ‘how to survive AI’ which implies that without a good plan there is danger that you (or all of us) won’t, whereas the currently listed title of the piece by Tyler Cowen and Avital Balwit is actually ‘AI will change what it means to be human. Are you ready?’ with Bari Weiss calling it ‘the most important essay we have run so far on the AI revolution.’

This essay seems to exist in the strange middle ground of taking AI seriously without taking AI seriously.

Tyler Cowen and Avital Balwit: Are we helping create the tools of our own obsolescence?

Both of us have an intense conviction that this technology can usher in an age of human flourishing the likes of which we have never seen before. But we are equally convinced that progress will usher in a crisis about what it is to be human at all.

AI will not create an egalitarian utopia. One thing that living with machines cannot change is our nature…Since we will all be ranked below some other entity on intelligence, we will need to find new and different outlets for status competition.

I mean, yes, obviously we are helping create the tools of our own obsolescence, except that they will no longer be something we should think about as ‘tools.’ If they stay merely ‘tools of our own obsolescence’ but still ‘mere tools’ and humans do get to sit back and sip their margaritas and search for meaning and status, then this kind of essay makes sense.

As in, this essay is predicting that humans will share the planet with minds that are far superior to our own, that we will be fully economically obsolete except for actions that depend on other humans seeing that you are human and doing things as a human. But of course humans will stay fully in control and continue to command increasingly rich physical resources, and will prosper if we can only ‘find meaning.’

If you realize these other superintelligent minds probably won’t stay ‘mere tools,’ and certainly won’t do that by default, and that many people will find strong reasons to make them into (or allow them to become) something else entirely, then you also realize that no you won’t be able to spend your time sipping margaritas and playing status games that are unanchored to actual needs.

Demoralization is the central problem in exactly the scenario Kokotajlo warns us not to expect, where superintelligent AI serves us and makes our lives physically amazing and prosperous but potentially robs them of meaning.

But you know what? I am not worried about what to do in that scenario! At all. Because if we get to that scenario, it will contain superintelligent AIs. Those superintelligent AIs can then ‘do our homework’ to allow us to solve for meaning, however that is best done. It is a problem we can solve later.

Any problem that can be solved after superintelligence is only a problem if it runs up against limits in the laws of physics. So we’ll still have problems like ‘entropy and the heat death of the universe’ or ‘the speed of light puts most matter out of reach.’ If it’s things like ‘how does a human find a life of meaning given we are rearranging the atoms the physically possible best way we can imagine with this goal in mind?’ then rest, Neo. The answers are coming.

Whereas we cannot rest on the question of how to get to that point, and actually survive AI while remaining in control and having the atoms get rearranged for our benefit in line with goals we would endorse on reflection, and not for some other purpose, or by the result of AIs competing against each other for resources, or for some unintended maximalist goal, or to satisfy only a small group of anti-normative people, or some harmful or at least highly suboptimal ideology, or various other similar failure modes.

There is perhaps a middle ground short term problem. As in, during a transition period, there may come a time when AI is doing enough of the things that meaning is difficult to retain for many or even most people, but we have not yet gained the capabilities that will later fully solve this. That might indeed get tricky. But in the grand scheme it doesn’t worry me.

It is amazing that The New York Times keeps printing things written by Cade Metz. As always, my favorite kind of terrible AI article is ‘claims that AI will never do [thing that AI already does].’

Cade Metz (NYT, The Worst, also wrong): And scientists have no hard evidence that today’s technologies are capable of performing even some of the simpler things the brain can do, like recognizing irony or feeling empathy. Claims of A.G.I.’s imminent arrival are based on statistical extrapolations — and wishful thinking.

According to various benchmark tests, today’s technologies are improving at a consistent rate in some notable areas, like math and computer programming. But these tests describe only a small part of what people can do.

Humans know how to deal with a chaotic and constantly changing world. Machines struggle to master the unexpected — the challenges, both small and large, that do not look like what has happened in the past. Humans can dream up ideas that the world has never seen. Machines typically repeat or enhance what they have seen before.

AI is already superhuman at recognizing irony, and at expressing empathy in practice in situations like doctor bedside manner. Humans ‘typically repeat or enhance what they have seen before’ or do something stupider than that.

“The technology we’re building today is not sufficient to get there,” said Nick Frosst, a founder of the A.I. start-up Cohere who previously worked as a researcher at Google and studied under the most revered A.I. researcher of the last 50 years.

Guess who ‘the most revered A.I. researcher’ this refers to is?

Alexander Berger: It’s a bit funny to hype up the authority of this “AGI is not imminent” person by pointing out that he studied under Geoffrey Hinton, who is now ~100% focused on ~imminent risks from AGI

The reference link for ‘studied under’ is about how Hinton was quitting Google to spend his remaining time warning about the threat of AI superintelligence killing everyone. These people really just do not care.

Beyond that, it’s like a greatest hits album of all the relevant zombie arguments, presented as if they were overwhelming rather than a joke.

Here is a thread with Eliezer righteously explaining, as he often does, why the latest argument that humans will survive superintelligent AI is incorrect, including linking back to another.

Is it wrong to title your book ‘If Anyone Builds It, Everyone Dies’ if you are not willing to say that if anyone builds it, 100% no matter what, everyone dies? Xlr8harder asked if Eliezer is saying p(doom | AGI) = 1, and Eliezer quite correctly pointed out that this is a rather ludicrous Isolated Demand for Rigor, and that book titles are short, which is (one reason) why they almost never include probabilities in their predictions. Later in one part of the thread they reached sufficient clarity that xlr8harder agreed that Eliezer was not, in practice, misrepresenting his epistemic state.

The far more common response of course is to say some version of ‘by everyone dies you must mean the effect on jobs’ or ‘by everyone dies you are clearly being hyperbolic to get our attention’ and, um, no.

Rob Bensinger: “If Anyone Builds It, Everyone Dies: Why Superintelligent AI Would Kill Us All: No Really We Actually Mean It, This Is Not Hyperbole (Though It Is Speaking Normal Colloquial English, Not Mathematical-Logician, It’s Not A Theorem)” by Eliezer Yudkowsky and Nate Soares.

Hell, that’s pretty close to what the book website says:

Book Website (from the book): If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.

We do not mean that as hyperbole. We are not exaggerating for effect. We think that is the most direct extrapolation from the knowledge, evidence, and institutional conduct around artificial intelligence today. In this book, we lay out our case, in the hope of rallying enough key decision-makers and regular people to take AI seriously. The default outcome is lethal, but the situation is not hopeless; machine superintelligence doesn’t exist yet, and its creation can yet be prevented.

Sean: …I presume you’re talking about the impact on jobs.

… The “Everyone dies” claim appears to be referencing the song “Kill the Boer”, which-

As a Wise Academic Elder, I can tell you this is Clearly a Psyops by Yudkowsky and Soares to make AI sound more cool and sell more AI to AI buyers. Because telling people AI will kill everyone is a super good marketing strategy in my view as an academic w no idea about money.

…What Bensinger NEGLECTS to mention is that we’re all dying a little bit every day, so we’ll all die whether we build it or not! Maximum gotcha 100 points to me.

FFS people we need to STOP talking about why AI will kill everyone and START talking about the fact that training a frontier LLM uses as much water as running an average McDonalds franchise for 2 hrs 32 minutes. Priorities ppl!!!

Can we PLEASE talk about how killing everyone erases the lived experience of indigenous peoples from the face of the computronium sphere.

I kind of hate that the “bUt WhAt AbOuT cApItAlIsM” people kind of have a point on this one.

Nonsense! As I demonstrated in my 1997, 2004, 2011 and 2017 books, Deep Learning Is Hitting A Wall.

Here is another case from the top thread in which Eliezer is clearly super frustrated, and I strive not to talk in this way, but the fact remains that he is not wrong (conversation already in progress; you can scroll back up first for richer context, but you get the idea). First, some lead-in to the key line:

Eliezer Yudkowsky: Sorry, explain to me again why the gods aren’t stepping on the squishy squirrels in the course of building their factories? There was a tame slave-mind only slightly smarter than human which built a bomb that would destroy the Solar System, if they did? Is that the idea?

Kas.eth: The ‘gods’ don’t step on the squishy squirrels because they are created as part of an existing civilization that contains not only agents like them (and dumber than them) but also many advanced “systems” that are not agents themselves, but which are costly to dismantle (and that happen to protect some rights of dumber pre-existing agents like the ‘squirrels’).

The ‘gods’ could coordinate to destroy all existing systems and rebuild all that is needed from scratch to get 100% of whatever resources are left for themselves, but that would destroy lots of productive resources that are instrumentally useful for lots of goals including the goals of the gods. The systems are ‘defended’ in the local cost-benefit sense: a system that controls X units of resources ensures Y>X resources will be wasted before control is lost (your bomb scenario is Y>>X, which is not needed and ultra-high Y/X ratios will probably not be allowed).

What systems are considered ‘secure’ at a time depend on the technology levels and local prices of different resources. It seems plausible to me for such systems to exist at all levels of technology, including at the final one where the unit of resources is free energy, and the dissipation-defense property holds for some construction by theoretical physics.

And here’s the line that, alas, summarizes so much of discourse that keeps happening no matter how little sense it makes:

Eliezer Yudkowsky: A sophisticated argument for why gods won’t squish squirrels: Minds halfway to being gods, but not yet able to take squirrels in a fight, will build mighty edifices with the intrinsic property of protecting squirrels, which later gods will not want to pay to tear down or rebuild.

Basically all sophisticated arguments against ASI ruin are like this, by the way.

I’ve heard this particular one multiple times, from economists convinced that the “powerful entities squish us” scenario just *has* to have some clever hidden flaw where it fails to add in a term.

No, I am not an undergrad who’s never heard of comparative advantage.

That’s a reasonable lead-in to David Brin offering his latest ‘oh this is all very simple, you fools’ explanation of AI existential risks and loss of control risks, or what he calls the ‘Great Big AI Panic of 2025,’ as if there was a panic (there isn’t) or even as much panic as there was in previous years (2023 had if anything more panic). Eliezer Yudkowsky, whom he addresses later, is neither panicking nor calling for what Brin says he is calling for, and he has been raising this alarm since the 2000s.

To his great credit, Brin acknowledges that it would be quite easy to screw all of this up, and that we will be in the position of the ‘elderly grandpa with the money’ who doesn’t understand these young whippersnappers or what they are talking about, and he points out a number of the problems we will face. But he says you are all missing something simple and thus there is a clear solution, which is reciprocal accountability and the tendency of minds to be individuals combined with positive-sum interactions, so all you have to do is set up good incentives among the AIs.

And also to his credit, he has noticed that we are really dropping the ball on all this. He finds it ‘mind-boggling’ that no one is talking about ‘applying similar methods to AI,’ which indicates both that he is not paying close enough attention – some people are indeed thinking along similar lines – and, more than that, a flaw in his sci-fi thinking: expecting humans to focus on that kind of answer. It is unlikely we make a dignified real attempt even at that, let alone a well-considered one, even if he were right that this would work and that it is rather obviously the right thing to investigate.

As in, even if there exist good ‘rules of the road’ that would ensure good outcomes, why would you (a sci-fi author) think our civilization would be likely to implement them? Is that what you think our track record suggests? And why would you think such rules would hold long term in a world beyond our comprehension?

The world has lots of positive-sum interactions and the most successful entities in the world do lots of positive-sum trading. That does not mean that fundamentally uncompetitive entities survive such competition and trading, or that the successful entities will have reason to cooperate and trade with you, in particular.

His second half, which is a response to Eliezer Yudkowsky, is a deeply disappointing but unsurprising series of false or irrelevant or associative attacks. It is especially disappointing to see ‘what Eliezer will never, ever be convinced of is [X], which is obviously true’ as if this was clearly about Eliezer thinking poorly and falling for ‘sci-fi cliches’ rather than a suggestion that [X] might be false or (even if [X] is true!) you might have failed to make a strong argument for it.

I can assure David Brin, and everyone else, that Eliezer has many times heard David’s core pitch here, that we can solve AI alignment and AI existential risk via Western Enlightenment values and dynamics, or ‘raising them as our children.’ Which of course are ‘cliches’ of a different sort. To which Eliezer will reply (with varying details and examples to help illustrate the point): look at the physical situation we are going to face, think about why those solutions have led to good outcomes historically, and reason out what would happen; that is not going to work. And I have yet to see an explanation for how any of this actually physically works out that survives five minutes of thinking.

More generally: It is amazing how many people will say ‘like all technologies, AI will result or not result in [X]’ or ‘like always we can simply do [Y]’ rather than stop to consider whether that makes any physical or logical sense given how AI works, or whether ‘tools created by humans’ is the correct or even a useful reference class in context.

Another conversation that never makes progress:

Rob Bensinger: There’s a lot of morbid excitement about whether the probability of us killing our families w AI is more like 50% or like 80% or 95%, where a saner and healthier discourse would go

“WAIT, THIS IS CRAZY. ALL OF THOSE NUMBERS ARE CLEARLY UNACCEPTABLE. WHAT THE FUCK IS HAPPENING?”

Flo Crivello (founder, GetLindy): A conversation I have surprisingly often:

– (friend:) I’m on the optimistic side. I think there’s only a 10-20% chance we all die because of AI

– Wait, so clearly we must agree that even this is much, much, much too high, and that this warrants immediate and drastic action?

Daniel Faggella: every day

“bro… we don’t need to govern any of this stuff in any way – its only russian roulette odds of killing us all in the next 10-15 years”

like wtf

Flo Crivello: yeah I don’t think people really appreciate what’s at stake

we’ve been handed off an insane responsibilities by the thousands of generations that came before us — we’re carrying the torch of the human project

and we’re all being so cavalier about it, ready to throw it all away because vibes

Why can we instruct a reasoning model on how to think and have it reflected in the Chain of Thought (CoT)? Brendan seems clearly correct here.

Brendan Long: This post surprised me since if we’re not training on the CoT (@TheZvi’s “Most Forbidden Technique”), why does the AI listen to us when we tell it how to think? I think it’s because reasoning and output come from the same model, so optimization pressure on one applies to both.

Latent Moss: I just realized you can give Gemini instructions for how to think. Most reasoning models ignore those, but Gemini 2.5 actually do.

Several people are asking how to do this: Sometimes it’s easy, just tell it how to format its thinking. Sometimes that doesn’t work, then it helps to reinforce the instruction. Doesn’t always work perfectly though, as you can see:

I tested 3.7 Thinking after I posted this and it works in some cases with that one too. Easier to do / works more often with Gemini though, I would still say.

James Yu: Is this useful?

Latent Moss: I don’t know, but I would guess so, in the general sense that Prompt Engineering is useful, guiding the AI can be useful, a different perspective or approach is sometimes useful. Worth a try.

It seems obviously useful given sufficient skill, it’s another thing you can steer and optimize for a given situation. Also it’s fun.

This works, as I understand it, not only because of optimization pressure, but also context and instructions, and because everything bleeds into everything else. Also known as, why shouldn’t this work? It’s only a question of how strong a prior there is for it to overcome in a given spot.

I also note that this is another example of a way in which one can steer models exactly because they are insufficiently optimized and capable, and are working with limited compute, parameters and data. The model doesn’t have the chops to draw all the distinctions between scenarios, as most humans also mostly don’t, and thus the heuristics bleed into places where they were not intended and where feedback is not optimizing them. As the model gets more capable, and becomes more of an expert and more precise, we should expect such spillover effects to shrink and fade away.
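
If you want to try this yourself, a minimal sketch using the google-generativeai Python package might look like the following. The model id is a placeholder for whichever Gemini 2.5 variant you have access to, and, as Latent Moss notes, whether the visible reasoning actually follows your format varies.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Placeholder model id; substitute the Gemini 2.5 variant available to you.
model = genai.GenerativeModel("gemini-2.5-pro")

prompt = (
    "Before answering, think in a numbered list where each line starts with "
    "'THOUGHT:' and is at most one sentence. Then give the final answer on a "
    "line starting with 'ANSWER:'.\n\n"
    "Question: A train leaves at 3:40 pm and the trip takes 95 minutes. "
    "When does it arrive?"
)

response = model.generate_content(prompt)
print(response.text)  # The formatting instruction may or may not stick, per the discussion above.
```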

No, Guyed did not get Grok to access xAI’s internal file system, only the isolated container in which Grok is running. That’s still not great? It shouldn’t give that access, and it means you damn well better only run it in isolated containers?

Claude finds another way to tell people to watch out for [X]-maximizers, where [X] is allowed to be something less stupid than paperclips, calling this ‘non-convergent instrumental goals,’ but what those lead to is… the convergent instrumental goals.

Joining forces with the new Pope, two Evangelical Christians write an open letter warning of the dangers of out-of-control AI and also of course the effect on jobs.

More on our new AI-concerned pope, nothing you wouldn’t already expect, and the concerns listed here are not existential.

There are two keys to saying ‘I will worry when AI can do [X],’ where often AI can already do [X] at the time of the announcement.

The first is to realize when AI can indeed do [X] (again, often that is right now), and then actually worry.

The second is to pick a time when your worries can still do any good, not after that.

Affordance of Effort: I’ll start worrying about AI when it can reproduce the creaking of the wooden stairs of my childhood.

(This’ll happen sooner than expected of course, I’ll just have been processed for my carbon by that point – and whatever undiscovered element is responsible for consciousness).

So, whoops all around, then.

David Krueger: By the time you want to pause AI, it will be too late.

Racing until we can smell superintelligence then pausing is NOT A REALISTIC PROPOSAL, it is a FANTASY.

I don’t understand why people don’t get it.

People in AI safety especially.

Quick way to lose a lot of my respect.

The obvious response is ‘no, actually, pausing without being able to smell superintelligence first is (also?) not a realistic proposal, it is a fantasy.’

It seems highly plausible that the motivation for a pause will come exactly when it becomes impossible to do so, or impossible to do so without doing such immense economic damage that we effectively can’t do it. We will likely get at most a very narrow window to do this.

Thus, what we need to do now is pursue the ability to pause in the future. As in, make it technologically and physically feasible to implement a pause. That means building state capacity, ensuring transparency, researching the necessary technological implementations, laying diplomatic foundations, and so on. All of that is also a good idea for other reasons, to maintain maximum understanding and flexibility, even if we never get close to pressing such a button.

Welcome to interdimensional cable, thanks to Veo 3.

Grok decides that images of Catturd’s dead dog are where it draws the line.

Who would want that?

Ari K: WE CAN TALK! I spent 2 hours playing with Veo 3 @googledeepmind and it blew my mind now that it can do sound! It can talk, and this is all out of the box.

Sridhar Ramesh: This would only be useful in a world where people wanted to watch an endless scroll of inane little video clips, constantly switching every six seconds or so, in nearly metronomic fashion.

Oh. Right.

Sridhar Ramesh (quoting himself from 2023): I am horrified by how much time my children spend rotting their attention span on TikTok. I’ve set a rule that after every fifteen minutes of TikTok, they have to watch one hour of TV.

Also, you will soon be able to string the eight second clips together via extensions.

How it’s going.

Also how it’s going.

We don’t even have humans aligned to human preferences at home.

There is a full blog post; fair warning, the jokes do not get funnier.

Also, did you know that You Can Just Do Math?

Lennart Heim: Yes, we do. It’s ~21GW. [From our paper here.]

You count all the AI chips produced, factor in that they’re running most of the time, add some overhead—and you got your answer. It’s a lot. And will only get more.

But you know what? Probably worth it.
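
To make the recipe concrete, here is a back-of-envelope sketch in Python. Every input below is an illustrative placeholder rather than a figure from Heim’s paper, which does its own accounting of the installed accelerator base to land at roughly 21 GW.

```python
# Back-of-envelope estimate of global AI power draw; all inputs are assumptions.
accelerators = 4_000_000        # AI accelerators in service, in H100-equivalents (assumed)
watts_per_accelerator = 1_500   # chip plus host CPU, memory, and networking (assumed)
utilization = 0.75              # fraction of time the fleet is actually running (assumed)
pue = 1.25                      # datacenter overhead: cooling, power conversion (assumed)

power_gw = accelerators * watts_per_accelerator * utilization * pue / 1e9
print(f"Estimated AI power draw: {power_gw:.1f} GW")  # ~5.6 GW with these placeholder inputs
```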

AI #117: OpenAI Buys Device Maker IO Read More »

new-claude-4-ai-model-refactored-code-for-7-hours-straight

New Claude 4 AI model refactored code for 7 hours straight


Anthropic says Claude 4 beats Gemini on coding benchmarks; works autonomously for hours.

The Claude 4 logo, created by Anthropic. Credit: Anthropic

On Thursday, Anthropic released Claude Opus 4 and Claude Sonnet 4, marking the company’s return to larger model releases after primarily focusing on mid-range Sonnet variants since June of last year. The new models represent what the company calls its most capable coding models yet, with Opus 4 designed for complex, long-running tasks that can operate autonomously for hours.

Alex Albert, Anthropic’s head of Claude Relations, told Ars Technica that the company chose to revive the Opus line because of growing demand for agentic AI applications. “Across all the companies out there that are building things, there’s a really large wave of these agentic applications springing up, and a very high demand and premium being placed on intelligence,” Albert said. “I think Opus is going to fit that groove perfectly.”

Before we go further, a brief refresher on Claude’s three AI model “size” names (first introduced in March 2024) is probably warranted. Haiku, Sonnet, and Opus offer a tradeoff between price (in the API), speed, and capability.

Haiku models are the smallest, least expensive to run, and least capable in terms of what you might call “context depth” (considering conceptual relationships in the prompt) and encoded knowledge. Owing to their small parameter count, Haiku models retain fewer concrete facts and thus tend to confabulate more frequently than larger models (plausibly answering questions despite lacking the underlying data), but they are much faster at basic tasks. Sonnet is traditionally a mid-range model that hits a balance between cost and capability, and Opus models have always been the largest and slowest to run. However, Opus models process context more deeply and are hypothetically better suited for running deep logical tasks.

A screenshot of the Claude web interface with Opus 4 and Sonnet 4 options shown. Credit: Anthropic

There is no Claude 4 Haiku just yet, but the new Sonnet and Opus models can reportedly handle tasks that previous versions could not. In our interview with Albert, he described testing scenarios where Opus 4 worked coherently for up to 24 hours on tasks like playing Pokémon, while code refactoring tasks in Claude Code ran for seven hours without interruption. Earlier Claude models typically lasted only one to two hours before losing coherence, Albert said, meaning that the models could only produce useful self-referencing outputs for that long before beginning to output too many errors.

In particular, that marathon refactoring claim reportedly comes from Rakuten, a Japanese tech services conglomerate that “validated [Claude’s] capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance,” Anthropic said in a news release.

Whether you’d want to leave an AI model unsupervised for that long is another question entirely because even the most capable AI models can introduce subtle bugs, go down unproductive rabbit holes, or make choices that seem logical to the model but miss important context that a human developer would catch. While many people now use Claude for easy-going vibe coding, as we covered in March, the human-powered (and ironically-named) “vibe debugging” that often results from long AI coding sessions is also a very real thing. More on that below.

To shore up some of those shortcomings, Anthropic built memory capabilities into both new Claude 4 models, allowing them to maintain external files for storing key information across long sessions. When developers provide access to local files, the models can create and update “memory files” to track progress and things they deem important over time. Albert compared this to how humans take notes during extended work sessions.
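
Anthropic has not published the internals of this feature, but the underlying idea is simple enough to sketch: give the model file access plus the convention of a notes file it reads at the start of a session and appends to as it works. Everything below (the file name, the helper functions) is invented for illustration, not Anthropic’s implementation.

```python
# Minimal sketch of a file-backed "memory" an agent could maintain via file tools.
from datetime import datetime
from pathlib import Path

MEMORY_FILE = Path("CLAUDE_MEMORY.md")  # hypothetical filename

def read_memory() -> str:
    """Loaded into the prompt at the start of each work session."""
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

def append_memory(note: str) -> None:
    """Called via a file-write tool whenever the model decides something is worth keeping."""
    stamp = datetime.now().isoformat(timespec="minutes")
    with MEMORY_FILE.open("a") as f:
        f.write(f"- [{stamp}] {note}\n")

append_memory("Refactor: auth/ and billing/ are done; payments/ still uses the old client.")
print(read_memory())
```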

Extended thinking meets tool use

Both Claude 4 models introduce what Anthropic calls “extended thinking with tool use,” a new beta feature allowing the models to alternate between simulated reasoning and using external tools like web search, similar to what OpenAI’s o3 and o4-mini-high AI models currently do in ChatGPT. While Claude 3.7 Sonnet already had strong tool use capabilities, the new models can now interleave simulated reasoning and tool calling in a single response.

“So now we can actually think, call a tool, process the results, think some more, call another tool, and repeat until it gets to a final answer,” Albert explained to Ars. The models self-determine when they have reached a useful conclusion, a capability picked up through training rather than governed by explicit human programming.
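
In API terms, this builds on the existing Anthropic Messages interface with a thinking budget plus tool definitions; a rough sketch follows. The model id is an assumption, the tool is a made-up client-side example, and the interleaved-thinking behavior shipped as a beta at launch, so exact flags and block ordering may differ from what is shown here.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",  # assumed model id; check Anthropic's current model list
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    tools=[{
        "name": "search_news",  # hypothetical client-side tool
        "description": "Search recent news and return result snippets.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }],
    messages=[{"role": "user", "content": "Summarize this week's Claude 4 coverage."}],
)

# The response content can alternate block types: "thinking" (summarized reasoning),
# "tool_use" (calls the caller is expected to execute and return as tool_result
# messages in a follow-up request), and "text" for the final answer.
for block in response.content:
    print(block.type)
```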

General Claude 4 benchmark results, provided by Anthropic. Credit: Anthropic

In practice, we’ve anecdotally found parallel tool use capability very useful in AI assistants like OpenAI o3, since they don’t have to rely on what is trained in their neural network to provide accurate answers. Instead, these more agentic models can iteratively search the web, parse the results, analyze images, and spin up coding tasks for analysis in ways that can avoid falling into a confabulation trap by relying solely on pure LLM outputs.

“The world’s best coding model”

Anthropic says Opus 4 leads industry benchmarks for coding tasks, achieving 72.5 percent on SWE-bench and 43.2 percent on Terminal-bench, calling it “the world’s best coding model.” According to Anthropic, companies using early versions report improvements. Cursor described it as “state-of-the-art for coding and a leap forward in complex codebase understanding,” while Replit noted “improved precision and dramatic advancements for complex changes across multiple files.”

In fact, GitHub announced it will use Sonnet 4 as the base model for its new coding agent in GitHub Copilot, citing the model’s performance in “agentic scenarios” in Anthropic’s news release. Sonnet 4 scored 72.7 percent on SWE-bench while maintaining faster response times than Opus 4. The fact that GitHub is betting on Claude rather than a model from its parent company Microsoft (which has close ties to OpenAI) suggests Anthropic has built something genuinely competitive.

Software engineering benchmark results, provided by Anthropic. Credit: Anthropic

Anthropic says it has addressed a persistent issue with Claude 3.7 Sonnet in which users complained that the model would take unauthorized actions or provide excessive output. Albert said the company reduced this “reward hacking behavior” by approximately 80 percent in the new models through training adjustments. An 80 percent reduction in unwanted behavior sounds impressive, but that also suggests that 20 percent of the problem behavior remains—a big concern when we’re talking about AI models that might be performing autonomous tasks for hours.

When we asked about code accuracy, Albert said that human code review is still an important part of shipping any production code. “There’s a human parallel, right? So this is just a problem we’ve had to deal with throughout the whole nature of software engineering. And this is why the code review process exists, so that you can catch these things. We don’t anticipate that going away with models either,” Albert said. “If anything, the human review will become more important, and more of your job as developer will be in this review than it will be in the generation part.”

Pricing and availability

Both Claude 4 models maintain the same pricing structure as their predecessors: Opus 4 costs $15 per million tokens for input and $75 per million for output, while Sonnet 4 remains at $3 and $15. The models offer two response modes: traditional LLM and simulated reasoning (“extended thinking”) for complex problems. Given that some Claude Code sessions can apparently run for hours, those per-token costs will likely add up very quickly for users who let the models run wild.
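
For a sense of scale, here is the arithmetic on a hypothetical long agentic session; the token counts are invented, and only the per-million-token prices come from the paragraph above.

```python
# Illustrative cost math; prices per token are derived from the published per-million rates.
OPUS_IN, OPUS_OUT = 15 / 1_000_000, 75 / 1_000_000        # dollars per token
SONNET_IN, SONNET_OUT = 3 / 1_000_000, 15 / 1_000_000     # dollars per token

# Suppose a long session consumes 2M input tokens (code context, tool results)
# and produces 400K output tokens (reasoning plus edits). These counts are made up.
input_tokens, output_tokens = 2_000_000, 400_000

opus_cost = input_tokens * OPUS_IN + output_tokens * OPUS_OUT        # $60.00
sonnet_cost = input_tokens * SONNET_IN + output_tokens * SONNET_OUT  # $12.00
print(f"Opus 4: ${opus_cost:.2f}, Sonnet 4: ${sonnet_cost:.2f}")
```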

Anthropic made both models available through its API, Amazon Bedrock, and Google Cloud Vertex AI. Sonnet 4 remains accessible to free users, while Opus 4 requires a paid subscription.

The Claude 4 models also debut Claude Code (first introduced in February) as a generally available product after months of preview testing. Anthropic says the coding environment now integrates with VS Code and JetBrains IDEs, showing proposed edits directly in files. A new SDK allows developers to build custom agents using the same framework.

A screenshot of “Claude Plays Pokemon,” a custom application where Claude 4 attempts to beat the classic Game Boy game. Credit: Anthropic

Even with Anthropic’s future riding on the capability of these new models, when we asked about how they guide Claude’s behavior by fine-tuning, Albert acknowledged that the inherent unpredictability of these systems presents ongoing challenges for both them and developers. “In the realm and the world of software for the past 40, 50 years, we’ve been running on deterministic systems, and now all of a sudden, it’s non-deterministic, and that changes how we build,” he said.

“I empathize with a lot of people out there trying to use our APIs and language models generally because they have to almost shift their perspective on what it means for reliability, what it means for powering a core of your application in a non-deterministic way,” Albert added. “These are general oddities that have kind of just been flipped, and it definitely makes things more difficult, but I think it opens up a lot of possibilities as well.”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

New Claude 4 AI model refactored code for 7 hours straight Read More »

openai-introduces-codex,-its-first-full-fledged-ai-agent-for-coding

OpenAI introduces Codex, its first full-fledged AI agent for coding

We’ve been expecting it for a while, and now it’s here: OpenAI has introduced an agentic coding tool called Codex in research preview. The tool is meant to allow experienced developers to delegate rote and relatively simple programming tasks to an AI agent that will generate production-ready code and show its work along the way.

Codex is a unique interface (not to be confused with the Codex CLI tool introduced by OpenAI last month) that can be reached from the side bar in the ChatGPT web app. Users enter a prompt and then click either “code” to have it begin producing code, or “ask” to have it answer questions and advise.

Whenever it’s given a task, that task is performed in a distinct container that is preloaded with the user’s codebase and is meant to accurately reflect their development environment.

To make Codex more effective, developers can include an “AGENTS.md” file in the repo with custom instructions, for example to contextualize and explain the code base or to communicate standardizations and style practices for the project—kind of a README.md but for AI agents rather than humans.
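
OpenAI has not published a canonical AGENTS.md template, so treat the contents below as an invented illustration; the sketch simply drops a plausible instructions file into a repo for Codex to pick up.

```python
# Hypothetical example of seeding a repo with an AGENTS.md for Codex. The filename
# comes from OpenAI's announcement; every instruction below is made up for illustration.
from pathlib import Path

AGENTS_MD = """\
# Agent instructions

- Project layout: FastAPI app in `app/`, tests in `tests/`, deployment config in `deploy/`.
- Run `make lint` and `pytest -q` before proposing any change, and include the output.
- Follow Black formatting and add type hints to all new functions.
- Never modify files under `migrations/`; flag the need for a migration instead.
"""

Path("AGENTS.md").write_text(AGENTS_MD)
```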

Codex is built on codex-1, a fine-tuned variation of OpenAI’s o3 reasoning model that was trained using reinforcement learning on a wide range of coding tasks to analyze and generate code, and to iterate through tests along the way.

OpenAI introduces Codex, its first full-fledged AI agent for coding Read More »