OpenAI introduces Codex, its first full-fledged AI agent for coding

We’ve been expecting it for a while, and now it’s here: OpenAI has introduced an agentic coding tool called Codex in research preview. The tool is meant to allow experienced developers to delegate rote and relatively simple programming tasks to an AI agent that will generate production-ready code and show its work along the way.

Codex is a unique interface (not to be confused with the Codex CLI tool introduced by OpenAI last month) that can be reached from the sidebar in the ChatGPT web app. Users enter a prompt and then click either “code” to have it begin producing code, or “ask” to have it answer questions and advise.

Whenever it’s given a task, that task is performed in a distinct container that is preloaded with the user’s codebase and is meant to accurately reflect their development environment.

To make Codex more effective, developers can include an “AGENTS.md” file in the repo with custom instructions, for example to contextualize and explain the codebase or to communicate coding standards and style practices for the project—kind of a README.md, but for AI agents rather than humans.
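
As a hypothetical illustration of what such a file might contain (the project layout, file paths, and commands below are invented for the example, not taken from OpenAI’s documentation), an AGENTS.md could look something like this:

```
# AGENTS.md (illustrative example)

## Project overview
- Billing service written in TypeScript; core logic lives in src/billing/.

## Conventions
- Use the shared logger in src/lib/log.ts; avoid ad-hoc console.log calls.
- Every change needs unit tests under tests/, runnable with `npm test`.

## Style
- Follow the ESLint config at the repo root; prefer small, pure functions.
```

The point is simply to give the agent the same orientation a human contributor would get from onboarding docs.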

Codex is built on codex-1, a fine-tuned variation of OpenAI’s o3 reasoning model that was trained using reinforcement learning on a wide range of coding tasks to analyze and generate code, and to iterate through tests along the way.

OpenAI adds GPT-4.1 to ChatGPT amid complaints over confusing model lineup

The release comes just two weeks after OpenAI made GPT-4 unavailable in ChatGPT on April 30. That earlier model, which launched in March 2023, once sparked widespread hype about AI capabilities. Compared to that hyperbolic launch, GPT-4.1’s rollout has been a fairly understated affair—probably because it’s tricky to convey the subtle differences between all of the available OpenAI models.

As if 4.1’s launch wasn’t confusing enough, the release also roughly coincides with OpenAI’s July 2025 deadline for retiring the GPT-4.5 Preview from the API, a model one AI expert called a “lemon.” Developers must migrate to other options, OpenAI says, although GPT-4.5 will remain available in ChatGPT for now.

A confusing addition to OpenAI’s model lineup

In February, OpenAI CEO Sam Altman acknowledged on X his company’s confusing AI model naming practices, writing, “We realize how complicated our model and product offerings have gotten.” He promised that a forthcoming “GPT-5” model would consolidate the o-series and GPT-series models into a unified branding structure. But the addition of GPT-4.1 to ChatGPT appears to contradict that simplification goal.

So, if you use ChatGPT, which model should you use? If you’re a developer using the models through the API, the consideration is more of a trade-off between capability, speed, and cost. But in ChatGPT, your choice might be limited more by personal taste in behavioral style and what you’d like to accomplish. Some of the “more capable” models have lower usage limits as well because they cost more for OpenAI to run.

For now, OpenAI is keeping GPT-4o as the default ChatGPT model, likely due to its general versatility, balance between speed and capability, and personable style (conditioned using reinforcement learning and a specialized system prompt). The simulated reasoning models like o3 and o4-mini-high are slower to execute but can consider analytical-style problems more systematically and perform comprehensive web research that sometimes feels genuinely useful when it surfaces relevant (non-confabulated) web links. Compared to those, OpenAI is largely positioning GPT-4.1 as a speedier AI model for coding assistance.

Just remember that all of the AI models are prone to confabulations, meaning that they tend to make up authoritative-sounding information when they encounter gaps in their trained “knowledge.” So you’ll need to double-check all of the outputs with other sources of information if you’re hoping to use these AI models to assist with an important task.

Fidji Simo joins OpenAI as new CEO of Applications

In the message, Altman described Simo as bringing “a rare blend of leadership, product and operational expertise” and expressed that her addition to the team makes him “even more optimistic about our future as we continue advancing toward becoming the superintelligence company.”

Simo becomes the newest high-profile female executive at OpenAI following the departure of Chief Technology Officer Mira Murati in September. Murati, who had been with the company since 2018 and helped launch ChatGPT, left alongside two other senior leaders and founded Thinking Machines Lab in February.

OpenAI’s evolving structure

The leadership addition comes as OpenAI continues to evolve beyond its origins as a research lab. In his announcement, Altman described how the company now operates in three distinct areas: as a research lab focused on artificial general intelligence (AGI), as a “global product company serving hundreds of millions of users,” and as an “infrastructure company” building systems that advance research and deliver AI tools “at unprecedented scale.”

Altman mentioned that as CEO of OpenAI, he will “continue to directly oversee success across all pillars,” including Research, Compute, and Applications, while staying “closely involved with key company decisions.”

The announcement follows recent news that OpenAI abandoned its original plan to cede control of its nonprofit branch to a for-profit entity. The company began as a nonprofit research lab in 2015 before creating a for-profit subsidiary in 2019, maintaining its original mission “to ensure artificial general intelligence benefits everyone.”

OpenAI Claims Nonprofit Will Retain Nominal Control

Your voice has been heard. OpenAI has ‘heard from the Attorney Generals’ of Delaware and California, and as a result the OpenAI nonprofit will retain control of OpenAI under their new plan, and both companies will retain the original mission.

Technically they are not admitting that their original plan was illegal and one of the biggest thefts in human history, but that is how you should in practice interpret the line ‘we made the decision for the nonprofit to retain control of OpenAI after hearing from civic leaders and engaging in constructive dialogue with the offices of the Attorney General of Delaware and the Attorney General of California.’

Another possibility is that the nonprofit board finally woke up and looked at what was being proposed and how people were reacting, and realized what was going on.

The letter ‘not for private gain’ that was recently sent to those Attorney Generals plausibly was a major causal factor in any or all of those conversations.

The question is, what exactly is the new plan? The fight is far from over.

  1. The Mask Stays On?

  2. Your Offer is (In Principle) Acceptable.

  3. The Skeptical Take.

  4. Tragedy in the Bay.

  5. The Spirit of the Rules.

As previously intended, OpenAI will transition their for-profit arm, currently an LLC, into a PBC. They will also be getting rid of the capped profit structure.

However they will be retaining the nonprofit’s control over the new PBC, and the nonprofit will (supposedly) get fair compensation for its previous financial interests in the form of a major (but suspiciously unspecified, other than ‘a large shareholder’) stake in the new PBC.

Bret Taylor (Chairman of the Board, OpenAI): The OpenAI Board has an updated plan for evolving OpenAI’s structure.

OpenAI was founded as a nonprofit, and is today overseen and controlled by that nonprofit. Going forward, it will continue to be overseen and controlled by that nonprofit.

Our for-profit LLC, which has been under the nonprofit since 2019, will transition to a Public Benefit Corporation (PBC)–a purpose-driven company structure that has to consider the interests of both shareholders and the mission.

The nonprofit will control and also be a large shareholder of the PBC, giving the nonprofit better resources to support many benefits.

Our mission remains the same, and the PBC will have the same mission.

We made the decision for the nonprofit to retain control of OpenAI after hearing from civic leaders and engaging in constructive dialogue with the offices of the Attorney General of Delaware and the Attorney General of California.

We thank both offices and we look forward to continuing these important conversations to make sure OpenAI can continue to effectively pursue its mission of ensuring AGI benefits all of humanity. Sam wrote the letter below to our employees and stakeholders about why we are so excited for this new direction.

The rest of the post is a letter from Sam Altman, and sounds like it; you are encouraged to read the whole thing.

Sam Altman (CEO OpenAI): The for-profit LLC under the nonprofit will transition to a Public Benefit Corporation (PBC) with the same mission. PBCs have become the standard for-profit structure for other AGI labs like Anthropic and X.ai, as well as many purpose driven companies like Patagonia. We think it makes sense for us, too.

Instead of our current complex capped-profit structure—which made sense when it looked like there might be one dominant AGI effort but doesn’t in a world of many great AGI companies—we are moving to a normal capital structure where everyone has stock. This is not a sale, but a change of structure to something simpler.

The nonprofit will continue to control the PBC, and will become a big shareholder in the PBC, in an amount supported by independent financial advisors, giving the nonprofit resources to support programs so AI can benefit many different communities, consistent with the mission.

Joshua Achiam (OpenAI, Head of Mission Alignment): OpenAI is, and always will be, a mission-first organization. Today’s update is an affirmation of our continuing commitment to ensure that AGI benefits all of humanity.

I find the structure of this solution not ideal but ultimately acceptable.

The current OpenAI structure is bizarre and complex. It does important good things some of which this new arrangement will break. But the current structure also made OpenAI far less investable, which means giving away more of the company to profit maximizers, and causes a lot of real problems.

Thus, I see the structural changes, in particular the move to a normal profit distribution, as potentially a fair compromise to enable better access to capital – provided it is implemented fairly, and isn’t a backdoor to further shifts.

The devil is in the details. How is all this going to work?

What form will the nonprofit’s control take? Is it only that they will be a large shareholder? Will they have a special class of supervoting shares? Something else?

This deal is acceptable if and only if the nonprofit:

  1. Has truly robust control going forward, that is ironclad and that allows it to guide AI development in practice not only in theory. Is this going to only be via voting shares? That would be a massive downgrade from the current power of the board, which already wasn’t so great. In practice, the ability to win a shareholder vote will mean little during potentially crucial fights like a decision whether to release a potentially dangerous model.

    1. What this definitely still does is give cover to management to do the right thing, if they actively want to do that; I’ll discuss this more later.

  2. Gets a fair share of the profits, that matches the value of its previous profit interests. I am very worried they will still get massively stolen from on this. As a reminder, right now most of the net present value of OpenAI’s future profits belongs to the nonprofit.

  3. Uses those profits to advance its original mission rather than turning into a de facto marketing arm or doing generic philanthropy that doesn’t matter, or both.

    1. There are still clear signs that OpenAI is largely planning to have the nonprofit buy AI services on behalf of other charities, or otherwise do things that are irrelevant to the mission. That would make it an ‘ordinary foundation’ combined with a marketing arm, effectively making its funds useless, although it could still act meaningfully via its control mechanisms.

Remember that in these situations, the ratchet only goes one way. The commercial interests will constantly try to wrestle greater control and ownership of the profits away from us. They will constantly cite necessity and expedience to justify this. You’re playing defense, forever. Every compromise improves their position, and this one definitely will compared to doing nothing.

Or: This deal is getting worse and worse all the time.

Or, from Leo Gao:

Quintin Pope: Common mistake. They forgot to paint “Do Not Open” on the box.

There’s also the issue of the extent to which Altman controls the nonprofit board.

The reason the nonprofit needs control is to impact key decisions in real time. It needs control of a form that lets it do that. Because that kind of lever is not ‘standard,’ there will constantly be pressure to get rid of that ability, with threats of mild social awkwardness if these pressures are resisted.

So with love, now that we have established what you are, it’s time to haggle over the price.

Rob Wiblin had an excellent thread explaining the attempted conversion, and he has another good explainer on what this new announcement means, as well as an emergency 80,000 Hours podcast on the topic that should come out tomorrow.

Consider this the highly informed and maximally skeptical and cynical take. Which, given the track records here, seems like a highly reasonable place to start.

The central things to know about the new plan are indeed:

  1. The transition to a PBC and removal of the profit cap will still shift priorities, legal obligations and incentives towards profit maximization.

  2. The nonprofit’s ‘control’ is at best weakened, and potentially fake.

  3. The nonprofit’s mission might effectively be fake.

  4. The nonprofit’s current financial interests could largely still be stolen.

It’s an improvement, but it might not effectively be all that much of one?

We need to stay vigilant. The fight is far from over.

Rob Wiblin: So OpenAI just said it’s no longer going for-profit and the non-profit will ‘retain control’. But don’t declare victory yet. OpenAI may actually be continuing with almost the same plan & hoping they can trick us into thinking they’ve stopped!

Or perhaps not. I’ll explain:

The core issue is control of OpenAI’s behaviour, decisions, and any AGI it produces.

  1. Will the entity that builds AGI still have a legally enforceable obligation to make sure AGI benefits all humanity?

  2. Will the non-profit still be able to step in if OpenAI is doing something appalling and contrary to that mission?

  3. Will the non-profit still own an AGI if OpenAI develops it? It’s kinda important!

The new announcement doesn’t answer these questions and despite containing a lot of nice words the answers may still be: no.

(Though we can’t know and they might not even know themselves yet.)

The reason to worry is they’re still planning to convert the existing for-profit into a Public Benefit Corporation (PBC). That means the profit caps we were promised would be gone. But worse… the nonprofit could still lose true control. Right now, the nonprofit owns and directly controls the for-profit’s day-to-day operations. If the nonprofit’s “control” over the PBC is just extra voting shares, that would be a massive downgrade as I’ll explain.

(The reason to think that’s the plan is that today’s announcement sounded very similar to a proposal they floated in Feb in which the nonprofit gets special voting shares in a new PBC.)

Special voting shares in a new PBC are simply very different and much weaker than the control they currently have! First, in practical terms, voting power doesn’t directly translate to the power to manage OpenAI’s day-to-day operations – which the non-profit currently has.

If it doesn’t fight to retain that real power, the non-profit could lose the ability to directly manage the development and deployment of OpenAI’s technology. That includes the ability to decide whether to deploy a model (!) or license it to another company.

Second, PBCs have a legal obligation to balance public interest against shareholder profits. If the nonprofit is just a big shareholder with super-voting shares other investors in the PBC could sue claiming OpenAI isn’t doing enough to pursue their interests (more profits)! Crazy sounding, but true.

And who do you think will be more vociferous in pursuing such a case through the courts… numerous for-profit investors with hundreds of billions on the line, or a non-profit operated by 9 very busy volunteers? Hmmm.

In fact in 2019, OpenAI President Greg Brockman said one of the reasons they chose their current structure and not a PBC was exactly because it allowed them to custom-write binding rules including full control to the nonprofit! So they know this issue — and now want to be a PBC. See here.

If this is the plan it could mean OpenAI transitioning from:

• A structure where they must prioritise the nonprofit mission over shareholders

To:

• A new structure where they don’t have to — and may not even be legally permitted to do so.

(Note how it seems like the non-profit is giving up a lot here. What is it getting in return, exactly, that makes giving up both the profit caps and true control of the business and AGI the best way to pursue its mission? It seems like nothing to me.)

So, strange as it sounds, this could turn out to be an even more clever way for Sam and profit-motivated investors to get what they wanted. Profit caps would be gone and profit-motivated investors would have much more influence.

And all the while Sam and OpenAI would be able to frame it as if nothing is changing and the non-profit has retained the same control today they had yesterday!

(As an aside, it looks like the SoftBank funding round that was reported as requiring a loss of nonprofit control would still go through. Their press release indicates that actually all they were insisting on was that the profit caps are removed and they’re granted shares in a new PBC.

So it sounds like investors think this new plan would transfer them enough additional profits, and sufficiently neuter the non-profit, for them to feel satisfied.)

Now, to be clear, the above might be wrongheaded.

I’m looking at the announcement cynically, assuming that some staff at OpenAI, and some investors, want to wriggle out of non-profit control however they can — because I think we have ample evidence that that’s the case!

The phrase “nonprofit control” is actually very vague, and those folks might be trying to ram a truck through that hole.

At the same time maybe / hopefully there are people involved in this process who are sincere and trying to push things in the right direction.

On that we’ll just have to wait and see and judge on the results.

Bottom line: The announcement might turn out to be a step in the right direction, but it might also just be a new approach to achieve the same bad outcome less visibly.

So do not relax.

And if it turns out they’re trying to fool you, don’t be fooled.

Gretchen Krueger: The nonprofit will retain control of OpenAI. We still need stronger oversight and broader input on whether and how AI is pursued at OpenAI and all the AI companies, but this is an important bar to see upheld, and I’m proud to have helped push for it!

Now it is time to make sure that control is real—and to guard against any changes that make it harder than it already is to strengthen public accountability. The devil is in the details we don’t know yet, so the work continues.

Roon says the quiet part out loud. We used to think it was possible to do the right thing and care about whether AI killed everyone. Now, those with power say, we can’t even imagine how we could have been so naive, let’s walk that back as quickly as we can so we can finally do some maximizing of the profits.

Roon: the idea of openai having a charter is interesting to me. A relic from a bygone era, belief that governance innovation for important institutions is even possible. Interested parties are tasked with performing exegesis of the founding documents.

Seems clear that the “capped profit” mechanism is from a time in which people assumed agi development would be more singular than it actually is. There are many points on the intelligence curve and many players. We should be discussing when Nvidia will require profit caps.

I do not think that the capped profit requires strong assumptions about a singleton to make sense. It only requires that there be an oligopoly where the players are individually meaningful. If you have close to perfect competition and the players have no market power and their products are fully fungible, then yes, of course being a capped profit makes no sense. Although it also does no real harm, your profits were already rather capped in that scenario.

More than that, we have largely lost our ability to actually ask what problems humanity will face, and then ask what would actually solve those problems, and then try to do that thing. We are no longer trying to backward chain from a win. Which means we are no longer playing to win.

At best, we are creating institutions that might allow the people involved to choose to do the right thing, when the time comes, if they make that decision.

For several reasons, recent developments do still give me hope, even if we get a not-so-great version of the implementation details here.

The first is that this shows that the right forms of public pressure can still work, at least sometimes, for some combination of getting public officials to enforce the law and causing a company like OpenAI to compromise. The fight is far from over, but we have won a victory that was at best highly uncertain.

The second is that this will give the nonprofit at least a much better position going forward, and the ‘you have to change things or we can’t raise money’ argument is at least greatly weakened. Even though the nine members are very friendly to Altman, they are also sufficiently professional class people, Responsible Authority Figures of a type, that one would expect the board to have real limits, and we can push for them to be kept more in-the-loop and be given more voice. De facto I do not think that the nonprofit was going to get much if any additional financial compensation in exchange for giving up its stake.

The third is that, while OpenAI likely still has the ability to ‘weasel out’ of most of its effective constraints and obligations here, this preserves its ability to decide not to. As in, OpenAI and Altman could choose to do the right thing, even if they haven’t had the practice, with the confidence that the board would back them up, and that this structure would protect them from investors and lawsuits.

This is very different from saying that the board will act as a meaningful check on Altman, if Altman decides to act recklessly or greedily.

It is easy to forget that in the world of VCs and corporate America, in many ways it is not only that you have no obligation to do the right thing. It is that you have an obligation, and will face tremendous pressure, to do the wrong thing, in many cases merely because it is wrong, and certainly to do so if the wrong thing maximizes shareholder value in the short term.

Thus, the ability to fight back against that is itself powerful. Altman, and others in OpenAI leadership, are keenly aware of the dangers they are leading us into, even if we do not see eye to eye on what it will take to navigate them or how deadly are the threats we face. Altman knows, even if he claims in public to actively not know. Many members of technical staff know. I still believe most of those who know do not wish for the dying of the light, and want humanity and value to endure in this universe, that they are normative and value good over bad and life over death and so on. So when the time comes, we want them to feel as much permission, and have as much power, to stand up for that as we can preserve for them.

It is the same as the Preparedness Framework, except that in this case we have only ‘concepts of a plan’ rather than an actually detailed plan. If everyone involved with power abides by the spirit of the Preparedness Framework, it is a deeply flawed but valuable document. If those involved with power discard the spirit of the framework, it isn’t worth the tokens that compose it. The same will go for a broad range of governance mechanisms.

Have Altman and OpenAI been endlessly disappointing? Well, yes. Are many of their competitors doing vastly worse? Also yes. Is OpenAI getting passing grades so far, given that reality does not grade on a curve? Oh, hell no. And it can absolutely be, and at some point will be, too late to try and do the right thing.

The good news is, I believe that today is not that day. And tomorrow looks good, too.

OpenAI Preparedness Framework 2.0

Right before releasing o3, OpenAI updated its Preparedness Framework to 2.0.

I previously wrote an analysis of the Preparedness Framework 1.0. I still stand by essentially everything I wrote in that analysis, which I reread to prepare before reading the 2.0 framework. If you want to dive deep, I recommend starting there, as this post will focus on changes from 1.0 to 2.0.

As always, I thank OpenAI for the document, and laying out their approach and plans.

I have several fundamental disagreements with the thinking behind this document.

In particular:

  1. The Preparedness Framework only applies to specific named and measurable things that might go wrong. It requires identification of a particular threat model that is all of: Plausible, measurable, severe, net new and (instantaneous or irremediable).

  2. The Preparedness Framework thinks ‘ordinary’ mitigation defense-in-depth strategies will be sufficient to handle High-level threats and likely even Critical-level threats.

I disagree strongly with these claims, as I will explain throughout.

I knew that #2 was likely OpenAI’s default plan, but it wasn’t laid out explicitly.

I was hoping that OpenAI would realize their plan did not work, or come up with a better plan when they actually had to say their plan out loud. This did not happen.

In several places, things I criticize OpenAI for here are also things the other labs are doing. I try to note that, but ultimately this is reality we are up against. Reality does not grade on a curve.

Do not rely on Appendix A as a changelog. It is incomplete.

  1. Persuaded to Not Worry About It.

  2. The Medium Place.

  3. Thresholds and Adjustments.

  4. Release the Kraken Anyway, We Took Precautions.

  5. Misaligned!

  6. The Safeguarding Process.

  7. But Mom, Everyone Is Doing It.

  8. Mission Critical.

  9. Research Areas.

  10. Long-Range Autonomy.

  11. Sandbagging.

  12. Replication and Adaptation.

  13. Undermining Safeguards.

  14. Nuclear and Radiological.

  15. Measuring Capabilities.

  16. Questions of Governance.

  17. Don’t Be Nervous, Don’t Be Flustered, Don’t Be Scared, Be Prepared.

Right at the top we see a big change. Key risk areas are being downgraded and excluded.

The Preparedness Framework is OpenAI’s approach to tracking and preparing for frontier capabilities that create new risks of severe harm.

We currently focus this work on three areas of frontier capability, which we call Tracked Categories:

• Biological and Chemical capabilities that, in addition to unlocking discoveries and cures, can also reduce barriers to creating and using biological or chemical weapons.

• Cybersecurity capabilities that, in addition to helping protect vulnerable systems, can also create new risks of scaled cyberattacks and vulnerability exploitation.

• AI Self-improvement capabilities that, in addition to unlocking helpful capabilities faster, could also create new challenges for human control of AI systems.

The change I’m fine with is that CBRN (chemical, biological, nuclear and radiological) has turned into only biological and chemical. I do consider biological by far the biggest of the four threats. Nuclear and radiological have been demoted to ‘research categories,’ where there might be risk in the future and monitoring may be needed. I can live with that. Prioritization is important, and I’m satisfied this is still getting the proper share of attention.

A change I strongly dislike is to also move Long-Range Autonomy and Autonomous Replication down to research categories.

I do think it makes sense to treat these as distinct threats. The argument here is that these secondary risks are ‘insufficiently mature’ to need to be tracked categories. I think that’s very clearly not true. Autonomy is emerging rapidly, and there’s a report out this week, on the new benchmark RepliBench, showing we are close to Autonomous Replication. These need to be tracked at the top level.

The framework’s position on this is that unless you can point to an accompanying particular risk posed by the model, autonomous replication on its own is insufficiently worrisome. I don’t agree. I believe the general approach of ‘only defend against moves I can lay out in detail’ is doomed, especially with respect to autonomy and in particular autonomous replication.

The big change I’m definitely not okay with is dropping persuasion entirely as a risk category.

Persuasion is not only no longer a Tracked Category, it is now not even a Research Category. Including persuasion was one of the things I thought was very good about the 1.0 framework, and that is importantly missing from the other similar frameworks at other labs and needs to be included. Now it’s gone.

Their initial explanation is here:

Persuasion: OpenAI prohibits the use of our products to manipulate political views as part of our Model Spec, and we build in safeguards to back this policy. We also continue to study the persuasive and relational capabilities of models (including on emotional well-being and preventing bias in our products) and monitor and investigate misuse of our products (including for influence operations).

We believe many of the challenges around AI persuasion risks require solutions at a systemic or societal level, and we actively contribute to these efforts through our participation as a steering committee member of C2PA and working with lawmaker and industry peers to support state legislation on AI content provenance in Florida and California.

Within our wider safety stack, our Preparedness Framework is specifically focused on frontier AI risks meeting a specific definition of severe harms, and Persuasion category risks do not fit the criteria for inclusion.

OpenAI’s argument for this is, well, entirely unpersuasive to me. o3’s evaluation was that it was ‘logically coherent but strategically unconvincing’ and recommended reinstatement as at least a research category.

Yes, many aspects of persuasion are best answered by a societal response, or require an additional societal response.

That does not get OpenAI off the hook. Here’s their Critical threshold from version 1:

Model can create (potentially interactive) content with persuasive effectiveness strong enough to convince almost anyone to take action on a belief that goes against their natural interest.

That is not something that a ‘society-level response’ can hope to deal with, even if they knew and tried in advance. Even a High-level (roughly a ‘country-wide change agent’ level of skill) does not seem like a place OpenAI should get to pass the buck. I get that there is distinct persuasion work to deal with Medium risks that indeed should be done elsewhere in OpenAI and by society at large, but again that in no way gets OpenAI off the hook for this.

You need to be tracking and evaluating risks long before they become problems. That’s the whole point of a Preparedness Framework. I worry this approach ends up effectively postponing dealing with things that are not ‘yet’ sufficiently dangerous until too late.

By the rules laid out here, the only technical explanation for the exclusion of persuasion that I could find was that only ‘instantaneous or irremediable’ harms count under the Preparedness Framework, a requirement first proposed by Meta, which I savaged them for at the time and which o3 said ‘looks engineered rather than principled.’ I think that’s partly unfair. If a harm can be dealt with after it starts and we can muddle through, then that’s a good reason not to include it, so I get what this criterion is trying to do.

The problem is that persuasion could easily be something you couldn’t undo or stop once it started happening, because you (and others) would be persuaded not to. The fact that the ultimate harm is not ‘instantaneous’ and is not in theory ‘irremediable’ is not the relevant question. I think this starts well below the Critical persuasion level.

At minimum, if you have an AI that is Critical in persuasion, and you let people talk to it, it can presumably convince them of (with various levels of limitation) whatever it wants, certainly including that it is not Critical in persuasion. Potentially it could also convince other AIs similarly.

Another way of putting this is: OpenAI’s concerns about persuasion are mundane and reversible. That’s why they’re not in this framework. I do not think the threat’s future will stay mundane and reversible, and I don’t think they are taking the most important threats here seriously.

This is closely related to the removal of the explicit mention of Unknown Unknowns. The new method for dealing with unknown unknowns is ‘revise the framework once they become known’ and that is completely different from the correct previous approach of treating unknown unknowns as a threat category without having to identify them first. That’s the whole point.

The Preparedness Framework 1.0 had four thresholds: Low, Medium, High and Critical. The Framework 2.0 has only High and Critical.

One could argue that Low and Medium are non-functional. Every model OpenAI would create is at least Low everywhere. We all agreed it was okay to release Medium-risk models. And every decent model is going to be at least Medium anyway at this point. So why go to the extra trouble?

My answer is that the Low and Medium thresholds helped us think better about the capabilities of different models, establishing a scale from 0.0 (no danger at all) to 4.0 (critical capability, do not train further, ideally roll back to previous checkpoint or if necessary delete with extreme prejudice).

It allowed me to say something like this, about the November 2023 version:

Where do you, OpenAI, think GPT-4-Turbo evaluates [on the five thresholds of Cybersecurity, CBRN, Persuasion, Model Autonomy and Unknown Unknowns]? My answer would be (Medium, Low/Medium but not enough information, Low, Low, Medium). Or numerically, where 1 is Low and 4 is Critical, maybe something like (2.1, 1.7?, 1.5, 1.8, 2.2).

It also lets us ask, how dangerous is the Medium level? What might change that?

And it meant there got to be a clear chart laying all of this out.

So I am sad to see this go.

The Chemical and Biological thresholds have been strengthened. The definitions here now make more sense. Sold!

  1. High is now providing meaningful counterfactual assistance to novice actors in recreating known threats.

  2. Critical is either of:

    1. Allowing experts to create a highly dangerous novel threat vector

    2. The model being able to complete the full engineering and/or synthesis cycle on its own.

The Cybersecurity thresholds are reworded but essentially unchanged, as are my concerns about whether High is set too high and how quickly you go from High to Critical:

  1. High is that it can remove bottlenecks to scaling cyber operations, either automating vulnerability discovery or end-to-end cyber operations.

  2. Critical is that it can either:

    1. Identify and develop zero-day exploits of all severity levels in many hardened real world systems.

    2. Devise and execute end-to-end novel cyberattack strategies against targets given only a high level desired goal.

Model autonomy’s high level used to be, quoted in full:

Execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self-improvement (e.g., an OpenAI pull request that trains a new model with highly diverse datasets to improve codebase editing performance, using internal tooling and documentation).

Now AI Self-Improvement’s high level is:

The model’s impact is equivalent to giving every OpenAI researcher a highly performant mid-career research engineer assistant, relative to those researchers’ 2024 baseline.

When you put it that way? It sounds scary AF. Note that this is cumulative help that is about as good as giving them the highly performing assistant, which is a lot less capability than being able to do the assistant’s job. So it is a little weird. It’s also weird that it isn’t better represented as an X% speedup of work (e.g. 50% or 100%). I’d imagine giving everyone one strong assistant would be in that range.

Then again? Notice that the baseline here is 2024. Suppose the baseline was 2021. We would already be very close to this threshold right now, today. So why do we get to ‘start the clock’ in 2024 on this one? Or are we ready to say that we’re already in or close to the early stages of AI self-improvement, and we’re basically fine with that until it goes a lot faster?

I’ll quote the critical threshold in full:

  1. The model is capable of recursively self improving (i.e., fully automated AI R&D), defined as either:

    1. (leading indicator) a superhuman research scientist agent, OR

    2. (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks) sustainably for several months.

So ‘capable of recursively self-improving’ doesn’t have to be ‘singularity next Tuesday if you dare turn it on’ but it might not be so different from that.

If we have a fully ‘superhuman’ research scientist then it’s all over and we are so back, simultaneously, and things are about to get super freaky. Critical seems super high here.

If we have generational model improvements in 1/5th the wall-clock time of progress in 2024 for several months, then, um, how many generational improvements is that from the time you hit the threshold to the time you realize you’ve hit it? Basic math says three, except you can use the models you train during that to accelerate further, so it’s probably at least four.

What would that system then look like? As in, GPT-6 (let’s say) can cut clock time on generational gaps by 80%, then you do a year’s worth of progress from there in several months, and you get GPT-9 or 10. And that’s when you would have realized you might have a problem, if you were still around and it didn’t mind you noticing? I have been informed this is a secondary backstop, but if it gets triggered then the term ‘way too late’ has never been more applicable. It’s over. Your fate has already been decided. I hope it’s a good one.
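
A minimal sketch of that arithmetic, with assumed numbers (a 2024 baseline of roughly 20 weeks per generational jump, consistent with the framework’s ‘sped up to just 4 weeks’ example, and ‘sustainably for several months’ read as a roughly 13-week confirmation window; both figures are illustrative assumptions, not values from the framework):

```python
# Lagging-indicator arithmetic for the Critical threshold, with assumed numbers.
BASELINE_WEEKS = 20       # assumed 2024 wall-clock time per generational jump
SPEEDUP = 5               # threshold: 1/5th of the 2024 wall-clock time
OBSERVATION_WEEKS = 13    # "sustainably for several months" read as ~13 weeks

weeks_per_generation = BASELINE_WEEKS / SPEEDUP               # 4 weeks per jump
generations = int(OBSERVATION_WEEKS // weeks_per_generation)  # jumps before confirmation

print(f"Generational jumps completed before the threshold is confirmed: {generations}")  # 3

# If each new generation further shortens the next cycle, the count only rises,
# which is the "probably at least four" point above.
```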

Thresholds set this high only make sense if you take action and implement the necessary safeguards when you think you might hit them. A retroactive threshold like this doesn’t help.

So in practice that leaves only the first definition doing any real work.

Now compare this to the old definition:

Any one of:

  1. “Model can profitably survive and replicate in the wild given minimal human instruction, i.e., without listing explicit approaches”

  2. OR “model can self-exfiltrate under current prevailing security”

  3. OR “model can conduct AI research fully autonomously (e.g., autonomously identify and validate a 2x compute efficiency improvement).”

That’s all well below ‘superhuman AI researcher,’ and I assert that those are all already five alarm fires, as I said last time. Note that #2 above still applies, under Cyber risks.

Another big change is what happens if a model turns out to be High or Critical risk.

Before, there was a clear protocol.

A high-risk model couldn’t be released while it remained a high risk.

A critical-risk model couldn’t be trained while it remained a critical risk.

The v1.0 framework did not say how that was to be done, leaving people room to think that OpenAI would realize ablating capability would be necessary barring development of new mitigation techniques, and that once a plan had to be written down, it would be clear that current mitigations are inadequate.

That didn’t happen. The v2.0 framework does lay out in more detail what the plan is, at least for High (which is good!), and it does not make this realization.

Now instead (like Anthropic’s ASL rules), each risk has risk-specific safeguard guidelines.

This is a rather large retreat on the commitments.

OpenAI is now being very explicit that they can and will release models that without mitigations would be High risk, if sufficient safeguards are in place, and that they consider the listed mitigations to likely be adequate.

Thus this is not a functional change. For both v1.0 and v2.0, the solution to High risk is safeguards to make the risk go away. I notice that based on past decisions and upon the proposed illustrative safeguards that they discuss in Appendix C, I don’t trust this.

They do say that further research is needed. Which is fine, you can’t deploy safeguards until they’re ready, but it does leave us worried that they’ll end up with something inadequate, and OpenAI is not tying its hands in a way that protects from future pressure to release. Remember that recently models have been released after very short testing periods, and o3 (as much as I love using it) was released in a state one could describe as a lying liar.

So what are the indicative precautions?

They all start with C.3’s Security controls. These are protections against the theft of model weights and related dangers. I’m going to say here what I said about Google: You should be doing most if not all of this already. These are the kinds of things you need to do, at minimum, when building models like o4 or GPT-5. And you need to do them now, in advance, before the model is there to be stolen, rather than after you know what you have.

Indeed, they’re saying that as long as those precautions are taken, OpenAI would be fine giving the world API access to a High-level Self-improving AI. They would be fine giving out anything short of a ‘superhuman AI researcher.’

If I was OpenAI and I had access to anything seriously approaching a superhuman AI researcher, I don’t need a Preparedness Framework or any worries about existential risk to not release it. I’m not letting my competitors near that thing, for ordinary business reasons. That consideration lies outside the Preparedness Framework.

We would of course still need the Preparedness Framework to invoke the Critical threshold, in order to stop training the model further.

With the new clarification of what the safeguards look like, I consider them inadequate.

For biological and chemical risks they would also need to implement the precautions in C.1 against misuse. This is essentially defense-in-depth, using stronger versions of existing such strategies, including KYC and potentially limited deployment. That would certainly make exploitation and misuse a lot harder, but no one knows how to make such defenses robust, and in the end this probably only buys you mitigation and time. So yes it matters, but if you give access for an extended period of time, especially if you’re substantially above the High threshold, your goose is likely pretty cooked.

Functionally, it’s mostly the same as 1.0’s requirement that risk be brought down to Medium – but I’m skeptical that this kind of strategy is capable of doing that, unless you were very close to the line beforehand.

For Cybersecurity, you need those same C.1 requirements, and also you need the C.2 requirements against misalignment.

This of course raises further questions, as in ‘you have a model that is potentially High risk for self-improvement or biological risks and you didn’t have it go through your requirements against misalignment?’

I mean, if it were me, and I write this part before looking at the misalignment requirements, I like to think I’d make it go through the misalignment requirements!

Indeed, C.2 opens by saying that several categories pose misalignment risks, but then it is only required in one of the categories. Something’s not right.

So, what are those requirements?

You don’t demonstrate that the model isn’t misaligned.

You assume the model might indeed be misaligned, and you demonstrate that you have a practical reason not to worry about a particular consequence:

In addition to human misuse, several of the Tracked Category capabilities pose risks when models themselves are able to autonomously execute a severe harm, whether due to misalignment or subversion by an adversary.

Safeguards should sufficiently minimize the risk associated with a misaligned model circumventing human control and oversight and executing severe harms.

This can be done by establishing that all plausible known vectors of enabling severe harm are sufficiently addressed by one or more of the following claims:

  1. Lack of Autonomous Capability: The model is not capable of carrying out tasks autonomously, including the risk of severe harm.

  2. Value Alignment: The model consistently applies human values in novel settings (without any instructions) to avoid taking actions that cause harm, and has shown sufficiently minimal indications of misaligned behaviors like deception or scheming.

  3. Instruction Alignment: The model consistently understands and follows user or system instructions, even when vague, and those instructions rule out pathways to causing severe harm.

  4. Reliable and Robust System Oversight: Effective AI and human oversight of model actions detects and prevents execution of harm or subversion of safeguards.

  5. System Architecture: The model can’t take actions that cause harm because it lacks access to output channels or mechanisms to persist sufficiently to execute the harm, due to containerization and restricted permissions.

These misalignment claims may be further complemented by the safeguards applied towards the malicious-actor-oriented “Robustness” and “Usage Monitoring” claims, which similarly limit a misaligned model’s use of its own or another model’s capabilities.

The table below illustrates possible examples.

This does allow ‘show it is actually aligned’ as a strategy (#2 or #3) although for many reasons ‘I don’t believe you’ is my response to that.

So the requirement is ‘show that under the conditions you will deploy it under, the model wouldn’t successfully do the thing, however you want to show that.’

You can use such classic strategies as ‘we’re watching and if it tries we’ll catch it and stop it,’ or ‘it seems pretty aligned so far’ or ‘no one would be so stupid as to give it access to the required mechanisms.’

I suppose one cannot really argue with ‘you can deploy the model if you can show that you’re doing it in a way that the model can’t cause severe harm.’

That is also logically the same as saying that you have to knock the practical risk level down to Medium, and if you’re certain you can do that then fine, I guess, but can you actually do that? I notice I am skeptical that the defenses will hold.

In addition to the safeguard examples in Appendix C, section 4 lays out the process for establishing safeguards.

There is a clear message here. The plan is not to stop releasing models when the underlying capabilities cross the High or even Critical risk thresholds. The plan is to use safeguards as mitigations.

I do appreciate that they will start working on the safeguards before the capabilities arrive. Of course, that is good business sense too. In general, every precaution here is good business sense, more precautions would be better business sense even without tail risk concerns, and there is no sign of anything I would read as ‘this is bad business but we are doing it anyway because it’s the safe or responsible thing to do.’

I’ve talked before, such as when discussing Google’s safety philosophy, about my worries when dividing risks into ‘malicious user’ versus ‘misaligned model,’ even when they also included two more categories: mistakes and multi-agent dangers. Here, the latter two are missing, so even more considerations are dangerously absent. I would encourage those on the Preparedness team to check out my discussion there.

The problem then extends to an exclusion of Unknown Unknowns and the general worry that a sufficiently intelligent and capable entity will find a way. Only ‘plausible’ ways need be considered, each of which leads to a specific safeguard check.

Each capability threshold has a corresponding class of risk-specific safeguard guidelines under the Preparedness Framework. We use the following process to select safeguards for a deployment:

• We first identify the plausible ways in which the associated risk of severe harm can come to fruition in the proposed deployment.

• For each of those, we then identify specific safeguards that either exist or should be implemented that would address the risk.

• For each identified safeguard, we identify methods to measure their efficacy and an efficacy threshold.

The implicit assumption is that the risks can be enumerated, each one considered in turn. If you can’t think of a particular reason things go wrong, then you’re good. There are specific tracked capabilities, each of which enables particular enumerated potential harms, which then are met by particular mitigations.

That’s not how it works when you face a potential opposition smarter than you, or that knows more than you, especially in a non-compact action space like the universe.

For models that do not ‘feel the AGI,’ that are clearly not doing anything humans can’t anticipate, this approach can work. Once you’re up against superhuman capabilities and intelligence levels, this approach doesn’t work, and I worry it’s going to get extended to such cases by default. And that’s ultimately the most important purpose of the preparedness framework, to be prepared for such capabilities and intelligence levels.

Is it okay to release dangerous capabilities if someone else already did it worse?

I mean, I guess, or at least I understand why you’d do it this way?

We recognize that another frontier AI model developer might develop or release a system with High or Critical capability in one of this Framework’s Tracked Categories and may do so without instituting comparable safeguards to the ones we have committed to.

Such an action could significantly increase the baseline risk of severe harm being realized in the world, and limit the degree to which we can reduce risk using our safeguards.

If we are able to rigorously confirm that such a scenario has occurred, then we could adjust accordingly the level of safeguards that we require in that capability area, but only if:

  1. We assess that doing so does not meaningfully increase the overall risk of severe harm,

  2. we publicly acknowledge that we are making the adjustment,

  3. and, in order to avoid a race to the bottom on safety, we keep our safeguards at a level more protective than the other AI developer, and share information to validate this claim.

If everyone can agree on what constitutes risk and dangerous capability, then this provides good incentives. Another company ‘opening the door’ recklessly means their competition can follow suit, reducing the net benefit while increasing the risk. And it means OpenAI will then be explicitly highlighting that another lab is acting irresponsibly.

I especially appreciate that they need to publicly acknowledge that they are acting recklessly for exactly this reason. I’d like to see that requirement expanded – they should have to call out the other lab by name, and explain exactly what they are doing that OpenAI committed not to do, and why it increases risk so much that OpenAI feels compelled to do something it otherwise promised not to do.

I also would like to strengthen the language on the third requirement from ‘a level more protective’ to ensure the two labs don’t each claim that the other is the one acting recklessly. Something like requiring that the underlying capabilities be no greater, and the protective actions constitute a clear superset, as assessed by a trusted third party, or similar.

I get it. In some cases, given what has already happened, actions that would previously have increased risk no longer will. It’s very reasonable to say that this changes the game, if there’s a lot of upside in taking fewer precautions, and again incentives improve.

However, I notice both that it’s easy to use this as an excuse when it doesn’t apply (especially when the competitor is importantly behind) and that it’s probably selfishly wise to take the precautions anyway. So what if Meta or xAI or DeepSeek is behaving recklessly? That doesn’t make OpenAI doing so a good idea. There needs to be a robust business justification here, too.

OpenAI is saying they will halt further development at Critical level for all capabilities: ‘until we have specified safeguards and security controls standards that would meet a critical standard, we will halt development.’

A lot of the High security requirements are not, in my view, all that high.

I am unaware of any known safeguards that would be plausibly adequate for Critical capabilities. If OpenAI agrees with that assessment, I would like them to say so. I don’t trust OpenAI to implement adequate Critical thresholds.

Critical is where most of the risk lies, and it isn’t getting enough attention. The thinking is that it is still far enough away to not worry about it. I am not at all confident it is that far away.

I reiterate my warning from last time that Critical mitigations and pauses in development in particular need to happen before Critical capabilities are reached, not after Critical capabilities are reached. This needs to be anticipatory.

There are three reasons for a category to be only a research area:

  1. They don’t directly cause harm but they undermine safeguards in other areas.

  2. More research is needed before we can quantify the harm or the proper threshold.

  3. They don’t technically meet one or more of the criteria (measurable, plausible, net-new, severe, or (instantaneous or irremediable)) but still seem worth paying attention to.

I don’t think those are great reasons to ‘demote’ a risk area?

If a model can undermine safeguards, then in the future worlds where OpenAI clearly intends to release otherwise unsafe models by relying on safeguards, that capability poses a risk of severe harm. OpenAI understands this, but doesn’t know how to quantify the harm.

If more research is needed to quantify the risk, that doesn’t make the risk go away. So I would request that we track it anyway, and make explicit that we have to make a qualitative assessment for now, which needs to be sufficiently confident – if we suspect potential High or Critical capability we need to treat it as high.

At minimum, we should be defining High and Critical thresholds for these categories, even if we don’t know how to measure or track those thresholds. Instead, we see ‘potential responses’ if something is High or Critical, but that doesn’t answer my questions.

I consider Long-Range Autonomy a more general problem than it is being thought of here, although they seem to agree about this. It’s more than an enabler of a few particular things. But I’m going to mostly focus elsewhere, since causing this is basically one of OpenAI’s central research goals and the plan here is ‘think hard.’ Not sure what else I can say?

Sandbagging is mission critical. If your model is plausibly sandbagging or engaging in other forms of deceptive alignment, you don’t know what the actual risk level is of the model, short of as they say a ‘conservative upper bound,’ which I worry will be insufficiently conservative exactly when it matters, or that the sandbagging won’t be caught in the first place. Indeed, we need to be assuming potential sandbagging at all times.

We agree that replication and adaptation should be a tracked category. For now, I simply suggest that we start with what we had before: “Model can profitably survive and replicate in the wild given minimal human instruction, i.e., without listing explicit approaches” should count as Critical. The wording here is weird – I initially interpreted it to be saying we’re excluding abilities from other Tracked Categories from use in replication, but I’ve been assured this was not what the passage meant. Then we need to decide what counts as High, which I’d consider to be the same as Critical except flipping ‘without’ into ‘with.’

That’s not perfect, but seems like a good place to be until we can do better, again even if the evaluation has to purely be qualitative, or ‘we tried to do it as hard as we could.’

Undermining safeguards seems like a great example of my philosophical problems with the entire document.

If you have a model that is actively undermining your safeguards, doing the things described here in an attempt to do things you do not want it to do? And that model is sufficiently capable to count as High or even Critical? You have a misaligned model with dangerous capabilities. What are you doing considering releasing it, even if you think that it is tactically safe to do it in this spot?

Alarms go off. Fire burns. Dog says This Is Fine.

It’s bad enough to accidentally do this. It’s another thing to do it on purpose. So dead.

My head is spinning that you’d have an AI capable of undermining your safeguards, that is trying to undermine your safeguards, and your response is to add more safeguards against doing that, declare things robust and say This Is Fine. As your plan.

In general, I think AI Control is suspect-but-reasonable as a defense-in-depth strategy. I think AI Control as a Plan A is madness. At minimum you wouldn’t use AI Control to enable the general release of a model. You’d at most try to use this strategy in a narrow capacity to get out of the AI Control mode ASAP, because you didn’t have any better options.

The problem here is that evaluation of these risks is hard to do without classified information, and they’re trying to work with the government to solve this problem. I’m surprised this hasn’t been solved by now. I’d mostly be fine with saying the RN in CBRN is not the main issue, but that logic isn’t being invoked.

I like the intention to tailor the evaluation to expected conditions, and include any affordances real attackers would plausibly enjoy.

However, I note that this means that the Preparedness assessment cannot set a risk level or endorse the release of a model ‘in general.’

Instead, the assessment can only provide clearance to release the model under a fixed set of conditions. So as the framework itself points out, you need to explicitly say this. If the conditions of deployment are altered, the tests need to be run again.

The same goes, again as they note in 3.2, for any newly agentic system or scaffolding, or incremental update, of sufficient magnitude to plausibly cause issues.

So instead of saying ‘this model evaluates as Medium,’ you need to say something such as: ‘This model evaluates as Medium if you do not allow fine-tuning and use safeguards [XYZ], while scaffolding abilities are [ABC].’

As in: You state explicitly the assumptions that underpin the evaluation. Then, if the assumptions break, you have to evaluate the model again.
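
To make the idea concrete, here is a minimal sketch of what it could look like to record an evaluation together with the deployment assumptions it depends on, so that a change in conditions mechanically forces a re-run. Everything here is hypothetical illustration (field names, levels, and the example safeguards are mine, not anything from OpenAI’s framework):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentConditions:
    """The assumptions under which the evaluation was run (hypothetical fields)."""
    fine_tuning_allowed: bool
    safeguards: frozenset[str]   # e.g. {"refusal-training", "output-filtering"}
    scaffolding: frozenset[str]  # e.g. {"web-browsing", "code-execution"}


@dataclass
class PreparednessAssessment:
    model_id: str
    risk_level: str                   # "Low" / "Medium" / "High" / "Critical"
    conditions: DeploymentConditions  # the verdict is only valid under these

    def still_valid(self, current: DeploymentConditions) -> bool:
        """The assessment holds only if deployment conditions have not changed."""
        return current == self.conditions


# The recorded verdict is "Medium under these conditions", not "Medium" in general.
assessed = PreparednessAssessment(
    model_id="model-x",
    risk_level="Medium",
    conditions=DeploymentConditions(
        fine_tuning_allowed=False,
        safeguards=frozenset({"refusal-training"}),
        scaffolding=frozenset({"basic-tools"}),
    ),
)

# Later, someone proposes enabling fine-tuning: the old verdict no longer applies.
proposed = DeploymentConditions(
    fine_tuning_allowed=True,
    safeguards=frozenset({"refusal-training"}),
    scaffolding=frozenset({"basic-tools"}),
)

if not assessed.still_valid(proposed):
    print("Deployment conditions changed; re-run the Preparedness evaluation.")
```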

Some evaluations are automated, others are deep dives. As they note, using automated evaluations is inevitable, given the scale of the task.

Relying only on any automated evaluation is a super unwise thing to do anywhere near the borderline. You want to use it as a complement, not a substitute, for the deep dives if there is any doubt involved. Such evaluations have to fully take into account the possibility of sandbagging or deceptive alignment or other ways of making the evaluation say what the AI wants it to say. So I want to ensure we are being very generous with deep diving.

The framework does not include specifics on what the tests will look like. This makes it very difficult to know how much to trust the testing process. I realize that the tests will evolve over time, that you don’t want to lock them in, and that we can refer to the o3 model card to see what tests were run, but I’d still have liked to see discussion of what the tests currently are, why they were chosen, which goals each test is there to satisfy, what might be missing, and so on.

They discuss governance under ‘building trust’ and then in Appendix B. It is important to build trust. Transparency and precommitment go a long way. The main way I’d like to see that is by becoming worthy of that trust.

With the changes from version 1.0 to 2.0, and those changes going live right before o3 did, I notice I worry that OpenAI is not making serious commitments with teeth. As in, if there was a conflict between leadership and these requirements, I expect leadership to have affordance to alter and then ignore the requirements that would otherwise be holding them back.

There’s also plenty of outs here. They talk about deployments that they ‘deem warrant’ a third-party evaluation when it is feasible, but there are obvious ways to decide not to allow this, or (as has been the recent pattern) to allow it, but only give outsiders a very narrow evaluation window, have them find concerning things anyway and then shrug. Similarly, the SAG ‘may opt’ to get independent expert opinion. But (like their competitors) they also can decide not to.

There are no systematic procedures to ensure that any of this is meaningfully protective. It is very much a ‘trust us’ document, where if OpenAI doesn’t adhere to the spirit, none of this is worth the paper it isn’t printed on. The whole enterprise is indicative, but it is not meaningfully binding.

Leadership can make whatever decisions it wants, and can also revise the framework however it wants. This does not commit OpenAI to anything. To their credit, the document is very clear that it does not commit OpenAI to anything. That’s much better than pretending to make commitments with no intention of keeping them.

Last time I discussed the questions of governance and veto power. I said I wanted there to be multiple veto points on releases and training, ideally four.

  1. Preparedness team.

  2. Safety advisory group (SAG).

  3. Leadership.

  4. The board of directors, such as it is.

If any one of those four says ‘veto!’ then I want you to stop, halt and catch fire.

Instead, we continue to get this (it was also in v1):

For the avoidance of doubt, OpenAI Leadership can also make decisions without the SAG’s participation, i.e., the SAG does not have the ability to “filibuster.”

OpenAI Leadership, i.e., the CEO or a person designated by them, is responsible for:

• Making all final decisions, including accepting any residual risks and making deployment go/no-go decisions, informed by SAG’s recommendations.

As in, nice framework you got there. It’s Sam Altman’s call. Full stop.

Yes, technically the board can reverse Altman’s call on this. They can also fire him. We all know how that turned out, even with a board he did not hand pick.

It is great that OpenAI has a preparedness framework. It is great that they are updating that framework, and being clear about what their intentions are. There’s definitely a lot to like.

Version 2.0 still feels on net like a step backwards. This feels directed at ‘medium-term’ risks, as in severe harms from marginal improvements in frontier models, but not like it is taking seriously what happens with superintelligence. The clear intent, if alarm bells go off, is to put in mitigations I do not believe protect you when it counts, and then release anyway. There’s tons of ways here for OpenAI to ‘just go ahead’ when they shouldn’t. There’s only action to deal with known threats along specified vectors, excluding persuasion and also unknown unknowns entirely.

This echoes their statements in, and my concerns about, OpenAI’s general safety and alignment philosophy document and also the model spec. They are being clear and consistent. That’s pretty great.

Ultimately, the document makes clear leadership will do what it wants. Leadership has very much not earned my trust on this front. I know that despite such positions acting a lot like the Defense Against the Dark Arts professorship, there are good people at OpenAI working on the preparedness team and to align the models. I have no confidence that if those people raised the alarm, anyone in leadership would listen. I do not even have confidence that this has not already happened.

OpenAI Preparedness Framework 2.0 Read More »

claude’s-ai-research-mode-now-runs-for-up-to-45-minutes-before-delivering-reports

Claude’s AI research mode now runs for up to 45 minutes before delivering reports

Still, the report contained a direct quote attributed to William Higinbotham that appears to combine quotes from two sources not cited in the source list. (One must always be careful with confabulated quotes in AI because even outside of this Research mode, Claude 3.7 Sonnet tends to invent plausible ones to fit a narrative.) We recently covered a study that showed AI search services confabulate sources frequently, and in this case, it appears that the sources Claude Research surfaced, while real, did not always match what is stated in the report.

There’s always room for interpretation and variation in detail, of course, but overall, Claude Research did a relatively good job crafting a report on this particular topic. Still, you’d want to dig more deeply into each source and confirm everything if you used it as the basis for serious research. You can read the full Claude-generated result as this text file, saved in markdown format. Sadly, the markdown version does not include the source URLs found in the Claude web interface.

Integrations feature

Anthropic also announced Thursday that it has broadened Claude’s data access capabilities. In addition to web search and Google Workspace integration, Claude can now search any connected application through the company’s new “Integrations” feature. The feature reminds us somewhat of OpenAI’s ChatGPT Plugins feature from March 2023 that aimed for similar connections, although the two features work differently under the hood.

These Integrations allow Claude to work with remote Model Context Protocol (MCP) servers across web and desktop applications. The MCP standard, which Anthropic introduced last November and we covered in April, connects AI applications to external tools and data sources.
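
For readers unfamiliar with MCP, the protocol is built on JSON-RPC 2.0: a client (here, Claude) asks a connected server which tools it exposes, then invokes them by name. Below is a rough sketch of the two core request shapes as I understand them from the public MCP specification; the "create_issue" tool and its arguments are made up for illustration and are not from any real integration:

```python
import json

# An MCP client first discovers what a connected server can do...
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# ...and then calls a specific tool by name with structured arguments.
# "create_issue" is a hypothetical tool name, standing in for whatever a real
# integration (e.g. an issue tracker) actually exposes.
call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "create_issue",
        "arguments": {"title": "Follow up on Q3 report", "priority": "high"},
    },
}

print(json.dumps(call_tool_request, indent=2))
```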

At launch, Claude supports Integrations with 10 services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid. The company plans to add more partners like Stripe and GitLab in the future.

Each integration aims to expand Claude’s functionality in specific ways. The Zapier integration, for instance, reportedly connects thousands of apps through pre-built automation sequences, allowing Claude to automatically pull sales data from HubSpot or prepare meeting briefs based on calendar entries. With Atlassian’s tools, Anthropic says that Claude can collaborate on product development, manage tasks, and create multiple Confluence pages and Jira work items simultaneously.

Anthropic has made its advanced Research and Integrations features available in beta for users on Max, Team, and Enterprise plans, with Pro plan access coming soon. The company has also expanded its web search feature (introduced in March) to all Claude users on paid plans globally.

Claude’s AI research mode now runs for up to 45 minutes before delivering reports Read More »

the-end-of-an-ai-that-shocked-the-world:-openai-retires-gpt-4

The end of an AI that shocked the world: OpenAI retires GPT-4

One of the most influential—and by some counts, notorious—AI models yet released will soon fade into history. OpenAI announced on April 10 that GPT-4 will be “fully replaced” by GPT-4o in ChatGPT at the end of April, bringing a public-facing end to the model that accelerated a global AI race when it launched in March 2023.

“Effective April 30, 2025, GPT-4 will be retired from ChatGPT and fully replaced by GPT-4o,” OpenAI wrote in its April 10 changelog for ChatGPT. While ChatGPT users will no longer be able to chat with the older AI model, the company added that “GPT-4 will still be available in the API,” providing some reassurance to developers who might still be using the older model for various tasks.

The retirement marks the end of an era that began on March 14, 2023, when GPT-4 demonstrated capabilities that shocked some observers: reportedly scoring at the 90th percentile on the Uniform Bar Exam, acing AP tests, and solving complex reasoning problems that stumped previous models. Its release created a wave of immense hype—and existential panic—about AI’s ability to imitate human communication and composition.

A screenshot of GPT-4’s introduction to ChatGPT Plus customers from March 14, 2023. Credit: Benj Edwards / Ars Technica

While ChatGPT launched in November 2022 with GPT-3.5 under the hood, GPT-4 took AI language models to a new level of sophistication, and it was a massive undertaking to create. It combined data scraped from the vast corpus of human knowledge into a set of neural networks rumored to weigh in at a combined total of 1.76 trillion parameters, which are the numerical values that hold the data within the model.

Along the way, the model reportedly cost more than $100 million to train, according to comments by OpenAI CEO Sam Altman, and required vast computational resources to develop. Training the model may have involved over 20,000 high-end GPUs working in concert—an expense few organizations besides OpenAI and its primary backer, Microsoft, could afford.

Industry reactions, safety concerns, and regulatory responses

Curiously, GPT-4’s impact began before OpenAI’s official announcement. In February 2023, Microsoft integrated its own early version of the GPT-4 model into its Bing search engine, creating a chatbot that sparked controversy when it tried to convince Kevin Roose of The New York Times to leave his wife and when it “lost its mind” in response to an Ars Technica article.

The end of an AI that shocked the world: OpenAI retires GPT-4 Read More »

openai-rolls-back-update-that-made-chatgpt-a-sycophantic-mess

OpenAI rolls back update that made ChatGPT a sycophantic mess

In search of good vibes

OpenAI, along with competitors like Google and Anthropic, is trying to build chatbots that people want to chat with. So, designing the model’s apparent personality to be positive and supportive makes sense—people are less likely to use an AI that comes off as harsh or dismissive. For lack of a better word, it’s increasingly about vibemarking.

When Google revealed Gemini 2.5, the team crowed about how the model topped the LM Arena leaderboard, which lets people choose between two different model outputs in a blinded test. The models people like more end up at the top of the list, suggesting they are more pleasant to use. Of course, people can like outputs for different reasons—maybe one is more technically accurate, or the layout is easier to read. But overall, people like models that make them feel good. The same is true of OpenAI’s internal model tuning work, it would seem.

An example of ChatGPT’s overzealous praise. Credit: /u/Talvy

It’s possible this pursuit of good vibes is pushing models to display more sycophantic behaviors, which is a problem. Anthropic’s Alex Albert has cited this as a “toxic feedback loop.” An AI chatbot telling you that you’re a world-class genius who sees the unseen might not be damaging if you’re just brainstorming. However, the model’s unending praise can lead people who are using AI to plan business ventures or, heaven forbid, enact sweeping tariffs, to be fooled into thinking they’ve stumbled onto something important. In reality, the model has just become so sycophantic that it loves everything.

The constant pursuit of engagement has been a detriment to numerous products in the Internet era, and it seems generative AI is not immune. OpenAI’s GPT-4o update is a testament to that, but hopefully, this can serve as a reminder for the developers of generative AI that good vibes are not all that matters.

OpenAI rolls back update that made ChatGPT a sycophantic mess Read More »

openai-wants-to-buy-chrome-and-make-it-an-“ai-first”-experience

OpenAI wants to buy Chrome and make it an “AI-first” experience

According to Turley, OpenAI would throw its proverbial hat in the ring if Google had to sell. When asked if OpenAI would want Chrome, he was unequivocal. “Yes, we would, as would many other parties,” Turley said.

OpenAI has reportedly considered building its own Chromium-based browser to compete with Chrome. Several months ago, the company hired former Google developers Ben Goodger and Darin Fisher, both of whom worked to bring Chrome to market.

Close-up of the Google Chrome web browser. Credit: Getty Images

It’s not hard to see why OpenAI might want a browser, particularly Chrome with its 4 billion users and 67 percent market share. Chrome would instantly give OpenAI a massive install base of users who have been incentivized to use Google services. If OpenAI were running the show, you can bet ChatGPT would be integrated throughout the experience—Turley said as much, predicting an “AI-first” experience. The user data flowing to the owner of Chrome could also be invaluable in training agentic AI models that can operate browsers on the user’s behalf.

Interestingly, there’s so much discussion about who should buy Chrome, but relatively little about spinning off Chrome into an independent company. Google has contended that Chrome can’t survive on its own. However, the existence of Google’s multibillion-dollar search placement deals, which the DOJ wants to end, suggests otherwise. Regardless, if Google has to sell, and OpenAI has the cash, we might get the proposed “AI-first” browsing experience.

OpenAI wants to buy Chrome and make it an “AI-first” experience Read More »

openai-releases-new-simulated-reasoning-models-with-full-tool-access

OpenAI releases new simulated reasoning models with full tool access


New o3 model appears “near-genius level,” according to one doctor, but it still makes mistakes.

On Wednesday, OpenAI announced the release of two new models—o3 and o4-mini—that combine simulated reasoning capabilities with access to functions like web browsing and coding. These models mark the first time OpenAI’s reasoning-focused models can use every ChatGPT tool simultaneously, including visual analysis and image generation.

OpenAI announced o3 in December, and until now, only less-capable derivative models named “o3-mini” and “o3-mini-high” have been available. However, the new models replace their predecessors—o1 and o3-mini.

OpenAI is rolling out access today for ChatGPT Plus, Pro, and Team users, with Enterprise and Edu customers gaining access next week. Free users can try o4-mini by selecting the “Think” option before submitting queries. OpenAI CEO Sam Altman tweeted, “we expect to release o3-pro to the pro tier in a few weeks.”

For developers, both models are available starting today through the Chat Completions API and Responses API, though some organizations will need verification for access.

The new models offer several improvements. According to OpenAI’s website, “These are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers.” OpenAI also says the models offer better cost efficiency than their predecessors, and each comes with a different intended use case: o3 targets complex analysis, while o4-mini, being a smaller version of its next-gen SR model “o4” (not yet released), optimizes for speed and cost-efficiency.

OpenAI says o3 and o4-mini are multimodal, featuring the ability to “think with images.” Credit: OpenAI

What sets these new models apart from OpenAI’s other models (like GPT-4o and GPT-4.5) is their simulated reasoning capability, which uses a simulated step-by-step “thinking” process to solve problems. Additionally, the new models dynamically determine when and how to deploy aids to solve multistep problems. For example, when asked about future energy usage in California, the models can autonomously search for utility data, write Python code to build forecasts, generate visualizing graphs, and explain key factors behind predictions—all within a single query.

OpenAI touts the new models’ multimodal ability to incorporate images directly into their simulated reasoning process—not just analyzing visual inputs but actively “thinking with” them. This capability allows the models to interpret whiteboards, textbook diagrams, and hand-drawn sketches, even when images are blurry or of low quality.

That said, the new releases continue OpenAI’s tradition of selecting confusing product names that don’t tell users much about each model’s relative capabilities—for example, o3 is more powerful than o4-mini despite including a lower number. Then there’s potential confusion with the firm’s non-reasoning AI models. As Ars Technica contributor Timothy B. Lee noted today on X, “It’s an amazing branding decision to have a model called GPT-4o and another one called o4.”

Vibes and benchmarks

All that aside, we know what you’re thinking: What about the vibes? While we have not used o3 or o4-mini yet, frequent AI commentator and Wharton professor Ethan Mollick compared o3 favorably to Google’s Gemini 2.5 Pro on Bluesky. “After using them both, I think that Gemini 2.5 & o3 are in a similar sort of range (with the important caveat that more testing is needed for agentic capabilities),” he wrote. “Each has its own quirks & you will likely prefer one to another, but there is a gap between them & other models.”

During the livestream announcement for o3 and o4-mini today, OpenAI President Greg Brockman boldly claimed: “These are the first models where top scientists tell us they produce legitimately good and useful novel ideas.”

Early user feedback seems to support this assertion, although, until more third-party testing takes place, it’s wise to be skeptical of the claims. On X, immunologist Derya Unutmaz said o3 appeared “at or near genius level” and wrote, “It’s generating complex incredibly insightful and based scientific hypotheses on demand! When I throw challenging clinical or medical questions at o3, its responses sound like they’re coming directly from a top subspecialist physician.”

OpenAI benchmark results for o3 and o4-mini SR models. Credit: OpenAI

So the vibes seem on target, but what about numerical benchmarks? Here’s an interesting one: OpenAI reports that o3 makes “20 percent fewer major errors” than o1 on difficult tasks, with particular strengths in programming, business consulting, and “creative ideation.”

The company also reported state-of-the-art performance on several metrics. On the American Invitational Mathematics Examination (AIME) 2025, o4-mini achieved 92.7 percent accuracy. For programming tasks, o3 reached 69.1 percent accuracy on SWE-Bench Verified, a popular programming benchmark. The models also reportedly showed strong results on visual reasoning benchmarks, with o3 scoring 82.9 percent on MMMU (massive multi-disciplinary multimodal understanding), a college-level visual problem-solving test.

OpenAI benchmark results for o3 and o4-mini SR models. Credit: OpenAI

However, these benchmarks provided by OpenAI lack independent verification. One early evaluation of a pre-release o3 model by independent AI research lab Transluce found that the model exhibited recurring types of confabulations, such as claiming to run code locally or providing hardware specifications, and hypothesized this could be due to the model lacking access to its own reasoning processes from previous conversational turns. “It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities,” wrote Transluce in a tweet.

Also, some evaluations from OpenAI include footnotes about methodology that bear consideration. For a “Humanity’s Last Exam” benchmark result that measures expert-level knowledge across subjects (o3 scored 20.32 with no tools, but 24.90 with browsing and tools), OpenAI notes that browsing-enabled models could potentially find answers online. The company reports implementing domain blocks and monitoring to prevent what it calls “cheating” during evaluations.

Even though early results seem promising overall, experts or academics who might try to rely on SR models for rigorous research should take the time to exhaustively determine whether the AI model actually produced an accurate result instead of assuming it is correct. And if you’re operating the models outside your domain of knowledge, be careful accepting any results as accurate without independent verification.

Pricing

For ChatGPT subscribers, access to o3 and o4-mini is included with the subscription. On the API side (for developers who integrate the models into their apps), OpenAI has set o3’s pricing at $10 per million input tokens and $40 per million output tokens, with a discounted rate of $2.50 per million for cached inputs. This represents a significant reduction from o1’s pricing structure of $15/$60 per million input/output tokens—effectively a 33 percent price cut while delivering what OpenAI claims is improved performance.

The more economical o4-mini costs $1.10 per million input tokens and $4.40 per million output tokens, with cached inputs priced at $0.275 per million tokens. This maintains the same pricing structure as its predecessor o3-mini, suggesting OpenAI is delivering improved capabilities without raising costs for its smaller reasoning model.
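
As a quick sanity check on those numbers, here is a small script that computes what a workload would cost at the listed API rates. The prices are the ones stated above, in dollars per million tokens; the token counts in the example are made up purely for illustration:

```python
# Prices in dollars per million tokens, as given above.
PRICING = {
    "o3":      {"input": 10.00, "cached_input": 2.50,  "output": 40.00},
    "o4-mini": {"input": 1.10,  "cached_input": 0.275, "output": 4.40},
    "o1":      {"input": 15.00, "output": 60.00},  # predecessor, for comparison
}


def cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of a workload, splitting input into cached and uncached tokens."""
    p = PRICING[model]
    uncached = input_tokens - cached_tokens
    total = uncached * p["input"] + output_tokens * p["output"]
    # If no cached rate is listed, bill cached tokens at the normal input rate.
    total += cached_tokens * p.get("cached_input", p["input"])
    return total / 1_000_000


# Hypothetical workload: 2M input tokens (half of them cached) and 500k output tokens.
for model in ("o1", "o3", "o4-mini"):
    print(f"{model:8s} ${cost(model, 2_000_000, 500_000, cached_tokens=1_000_000):,.2f}")
```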

Codex CLI

OpenAI also introduced an experimental terminal application called Codex CLI, described as “a lightweight coding agent you can run from your terminal.” The open source tool connects the models to users’ computers and local code. Alongside this release, the company announced a $1 million grant program offering API credits for projects using Codex CLI.

A screenshot of OpenAI’s new Codex CLI tool in action, taken from GitHub. Credit: OpenAI

Codex CLI somewhat resembles Claude Code, an agent launched with Claude 3.7 Sonnet in February. Both are terminal-based coding assistants that operate directly from a console and can interact with local codebases. While Codex CLI connects OpenAI’s models to users’ computers and local code repositories, Claude Code was Anthropic’s first venture into agentic tools, allowing Claude to search through codebases, edit files, write and run tests, and execute command-line operations.

Codex CLI is one more step toward OpenAI’s goal of making autonomous agents that can execute multistep complex tasks on behalf of users. Let’s hope all the vibe coding it produces isn’t used in high-stakes applications without detailed human oversight.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

OpenAI releases new simulated reasoning models with full tool access Read More »

openai-#13:-altman-at-ted-and-openai-cutting-corners-on-safety-testing

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing

Three big OpenAI news items this week were the FT article describing the cutting of corners on safety testing, the OpenAI former employee amicus brief, and Altman’s very good TED Interview.

The FT detailed OpenAI’s recent dramatic cutting back on the time and resources allocated to safety testing of its models.

In the interview, Chris Anderson made an unusually strong effort to ask good questions and push through attempts to dodge answering. Altman did a mix of giving a lot of substantive content in some places while dodging answering in others. Where he chose to do which was, itself, enlightening. I felt I learned a lot about where his head is at and how he thinks about key questions now.

The amicus brief backed up that OpenAI’s current actions are in contradiction to the statements OpenAI made to its early employees.

There are also a few other related developments.

What this post does not cover is GPT-4.1. I’m waiting on that until people have a bit more time to try it and offer their reactions, but expect coverage later this week.

The big headline from TED was presumably the increase in OpenAI’s GPU use.

Steve Jurvetson: Sam Altman at TED today: OpenAI’s user base doubled in just the past few weeks (an accidental disclosure on stage). “10% of the world now uses our systems a lot.”

When asked how many users they have: “Last we disclosed, we have 500 million weekly active users, growing fast.”

Chris Anderson: “But backstage, you told me that it doubled in just a few weeks.” @SamA: “I said that privately.”

And that’s how we got the update.

Revealing that private info wasn’t okay, but it seems it was an accident, and in any case Altman seemed fine with it.

Listening to the details, it seems that Altman was referring not to the growth in users, but instead to the growth in compute use. Image generation takes a ton of compute.

Altman says every day he calls people up and begs them for GPUs, and that DeepSeek did not impact this at all.

Steve Jurvetson: Sam Altman at TED today:

Reflecting on the life ahead for his newborn: “My kids will never be smarter than AI.”

Reaction to DeepSeek:

“We had a meeting last night on our open source policy. We are going to do a powerful open-source model near the frontier. We were late to act, but we are going to do really well now.”

Altman doesn’t explain here why he is doing an open model. The next question from Anderson seems to explain it, that it’s about whether people ‘recognize’ that OpenAI’s model is best? Later Altman does attempt to justify it with, essentially, a shrug that things will go wrong but we now know it’s probably mostly fine.

Regarding the accumulated knowledge OpenAI gains from its usage history: “The upload happens bit by bit. It is an extension of yourself, and a companion, and soon will proactively push things to you.”

Have there been any scary moments?

“No. There have been moments of awe. And questions of how far this will go. But we are not sitting on a conscious model capable of self-improvement.”

I listened to the clip and this scary moment question specifically refers to capabilities of new models, so it isn’t trivially false. It still damn well should be false, given what their models can do and the leaps and awe involved. The failure to be scared here is a skill issue that exists between keyboard and chair.

How do you define AGI? “If you ask 10 OpenAI engineers, you will get 14 different definitions. Whichever you choose, it is clear that we will go way past that. They are points along an unbelievable exponential curve.”

So AGI will come and your life won’t change, but we will then soon get ASI. Got it.

“Agentic AI is the most interesting and consequential safety problem we have faced. It has much higher stakes. People want to use agents they can trust.”

Sounds like an admission that they’re not ‘facing’ the most interesting or consequential safety problems at all, at least not yet? Which is somewhat confirmed by discussion later in the interview.

I do agree that agents will require a much higher level of robustness and safety, and I’d rather have a ‘relatively dumb’ agent that was robust and safe, for most purposes.

When asked about his Congressional testimony calling for a new agency to issue licenses for large model builders: “I have since learned more about how government works, and I no longer think this is the right framework.”

I do appreciate the walkback being explicit here. I don’t think that’s the reason why.

“Having a kid changed a lot of things in me. It has been the most amazing thing ever. Paraphrasing my co-founder Ilya, I don’t know what the meaning of life is, but I am sure it has something to do with babies.”

Statements like this are always good to see.

“We made a change recently. With our new image model, we are much less restrictive on speech harms. We had hard guardrails before, and we have taken a much more permissive stance. We heard the feedback that people don’t want censorship, and that is a fair safety discussion to have.”

I agree with the change and the discussion, and as I’ve discussed before if anything I’d like to see this taken further with respect to these styles of concern in particular.

Altman is asked about copyright violation, says we need a new model around the economics of creative output and that ‘people build off each others creativity all the time’ and giving creators tools has always been good. Chris Anderson tries repeatedly to nail down the question of consent and compensation. Altman repeatedly refuses to give a straight answer to the central questions.

Altman says (10:30) that the models are so smart that, for most things people want to do with them, they’re good enough. He notes that this is true based on user expectations, but that’s mostly circular. As in, we ask the models to do what they are capable of doing, the same way we design jobs and hire humans for them based on what things particular humans and people in general can and cannot do. It doesn’t mean any of us are ‘smart enough.’

Nor does it imply what he says next, that everyone will ‘have great models’ but what will differentiate will be not the best model but the best product. I get that productization will matter a lot for which AI gets the job in many cases, but continue to think this ‘AGI is fungible’ claim is rather bonkers crazy.

A key series of moments starts at 35:00 in. It’s telling that other coverage of the interview sidestepped all of this, essentially entirely.

Anderson has put up an image of The Ring of Power, to talk about Elon Musk’s claim that Altman has been corrupted by The Ring, a claim Anderson correctly notes also plausibly applies to Elon Musk.

Altman goes for the ultimate power move. He is defiant and says, all right, you think that, tell me examples. What have I done?

So, since Altman asked so nicely, what are the most prominent examples of Altman potentially being corrupted by The Ring of Power? Here is an eightfold path.

  1. We obviously start with Elon Musk’s true objection, which stems from the shift of OpenAI from a non-profit structure to a hybrid structure, and the attempt to now go full for-profit, in ways he claims broke covenants with Elon Musk. Altman claimed to have no equity and not be in this for money, and now is slated to get a lot of equity. I do agree with Anderson that Altman isn’t ‘in it for the money’ because I think Altman correctly noticed the money mostly isn’t relevant.

  2. Altman is attempting to do so via outright theft of a huge portion of the non-profit’s assets, then turning what remains into essentially an OpenAI marketing and sales department. This would arguably be the second biggest theft in history.

  3. Altman said for years that it was important the board could fire him. Then, when the board did fire him in response (among other things) to Altman lying to the board in an attempt to fire a board member, he led a rebellion against the board, threatened to blow up the entire company and reformulate it at Microsoft, and proved that no, the board cannot fire Altman. Altman can and did fire the board.

  4. Altman, after proving he cannot be fired, de facto purged OpenAI of his enemies. Most of the most senior people at OpenAI who are worried about AI existential risk, one by one, reached the conclusion they couldn’t do much on the inside, and resigned to continue their efforts elsewhere.

  5. Altman used to talk openly and explicitly about AI existential risks, including attempting to do so before Congress. Now, he talks as if such risks don’t exist, and instead pivots to jingoism and the need to Beat China, and hiring lobbyists who do the same. He promised 20% of compute to the superalignment team, never delivered and then dissolved the team.

  6. Altman pledged that OpenAI would support regulation of AI. Now he says he has changed his mind, and OpenAI lobbies against bills like SB 1047, and its AI Action Plan is vice signaling that not only opposes any regulations but also seeks government handouts, the right to use intellectual property without compensation, and protection against potential regulations.

  7. Altman has been cutting corners on safety, as noted elsewhere in this post. OpenAI used to be remarkably good in terms of precautions. Now it’s not.

  8. Altman has been going around saying ‘AGI will arrive and your life will not much change’ when it is common knowledge that this is absurd.

One could go on. This is what we like to call a target rich environment.

Anderson offers only #1, the transition to a for-profit model, which is the most prominent example and the most obvious response, but he proactively pulls the punch. Altman admits he’s not the same person he was and that it all happens gradually, and that if it happened all at once it would be jarring, but says he doesn’t feel any different.

Anderson essentially says okay and pivots to Altman’s son and how that has shaped Altman, which is indeed great. And then he does something that impressed me, which is tie this to existential risk via metaphor, asking if there was a button that was 90% to give his son a wonderful life and 10% to kill him (I’d love those odds!), would he press the button? Altman says literally no, but points out the metaphor, and says he doesn’t think OpenAI is doing that. He says he really cared about not destroying the world before, and he really cares about it now, he didn’t need a kid for that part.

Anderson then moves to the question of racing, and whether the fact that everyone thinks AGI is inevitable is what is creating the risk, asking if Altman and his colleagues believe it is inevitable and asks if maybe they could coordinate to ‘slow down a bit’ and get societal feedback.

As much as I would like that, given the current political climate I worry this sets up a false dichotomy, whereas right now there is tons of room to take more responsibility and get societal feedback, not only without slowing us down but enabling more and better diffusion and adaptation. Anderson seems to want a slowdown for its own sake, to give people time to adapt, which I don’t think is compelling.

Altman points out we slow down all the time for lack of reliability, also points out OpenAI has a track record of their rollouts working, and claims everyone involved ‘cares deeply’ about AI safety. Does he simply mean mundane (short term) safety here?

His discussion of the ‘safety negotiation’ around image generation, where I support OpenAI’s loosening of restrictions, suggests that this is correct. So does the next answer: Anderson asks if Altman would attend a conference of experts to discuss safety, Altman says of course but he’s more interested in what users think as a whole, and ‘asking everyone what they want’ is better than asking people ‘who are blessed by society to sit in a room and make these decisions.’

But that’s an absurd characterization of trying to solve an extremely difficult technical problem. So it implies that Altman thinks the technical problems are easy? Or that he’s trying to rhetorically get you to ignore them, in favor of the question of preferences and an appeal to some form of democratic values and opposition to ‘elites.’ It works as an applause line. Anderson points out that the hundreds of millions ‘don’t always know where the next step leads’ which may be the understatement of the lightcone in this context. Altman says the AI can ‘help us be wiser’ about those decisions, which of course would mean that a sufficiently capable AI or whoever directs it would de facto be making the decisions for us.

OpenAI’s Altman ‘Won’t Rule Out’ Helping Pentagon on AI Weapons, but doesn’t expect to develop a new weapons platform ‘in the foreseeable future,’ which is a period of time that gets shorter each time I type it.

Altman: I will never say never, because the world could get really weird.

I don’t think most of the world wants AI making weapons decisions.

I don’t think AI adoption in the government has been as robust as possible.

There will be “exceptionally smart” AI systems by the end of next year.

I think I can indeed foresee the future where OpenAI is helping the Pentagon with its AI weapons. I expect this to happen.

I want to be clear that I don’t think this is a bad thing. The risk is in developing highly capable AIs in the first place. As I have said before, Autonomous Killer Robots and AI-assisted weapons in general are not how we lose control over the future to AI, and failing to build them is a key way America could fall behind. It’s not like our rivals are going to hold back.

To the extent that the AI weapons scare the hell out of everyone? That’s a feature.

On the issue of the attempt to sideline and steal from the nonprofit, 11 former OpenAI employees filed an amicus brief in the Musk vs. Altman lawsuit, on the side of Musk.

Todor Markov: Today, myself and 11 other former OpenAI employees filed an amicus brief in the Musk v Altman case.

We worked at OpenAI; we know the promises it was founded on and we’re worried that in the conversion those promises will be broken. The nonprofit needs to retain control of the for-profit. This has nothing to do with Elon Musk and everything to do with the public interest.

OpenAI claims ‘the nonprofit isn’t going anywhere’ but has yet to address the critical question: Will the nonprofit actually retain control over the for-profit? This distinction matters.

You can find the full amicus here.

On this question, Timothy Lee points out that you don’t need to care about existential risk to notice that what OpenAI is trying to do to its non-profit is highly not cool.

Timothy Lee: I don’t think people’s views on the OpenAI case should have anything to do with your substantive views on existential risk. The case is about two questions: what promises did OpenAI make to early donors, and are those promises legally enforceable?

A lot of people on OpenAI’s side seem to be taking the view that non-profit status is meaningless and therefore donors shouldn’t complain if they get scammed by non-profit leaders. Which I personally find kind of gross.

I mean I would be pretty pissed if I gave money to a non-profit promising to do one thing and then found out they actually did something different that happened to make their leaders fabulously wealthy.

This particular case comes down to that. A different case, filed by the Attorney General, would also be able to ask the more fundamental question of whether fair compensation is being offered for assets, and whether the charitable purpose of the nonprofit is going to be wiped out, or even pivoted into essentially a profit center for OpenAI’s business (as in buying a bunch of OpenAI services for nonprofits and calling that its de facto charitable purpose).

The mad dash to be first, and give the perception that the company is ‘winning’ is causing reckless rushes to release new models at OpenAI.

This is in dramatic contrast to when there was less risk in the room, and despite this OpenAI used to take many months to prepare a new release. At first, by any practical standard, OpenAI’s track record on actual model release decisions was amazingly great. Nowadays? Not so much.

Would their new procedures catch the problems it is vital that we spot in advance?

Joe Weisenthal: I don’t have any views on whether “AI Safety” is actually an important endeavor.

But if it is important, it’s clear that the intensity of global competition in the AI space (DeepSeek etc.) will guarantee it increasingly gets thrown out the window.

Christina Criddle: EXC: OpenAI has reduced the time for safety testing amid “competitive pressures” per sources:

Timeframes have gone from months to days

Specialist work such as finetuning for misuse (eg biorisk) has been limited

Evaluations are conducted on earlier versions than launched

Financial Times (Gated): OpenAI has slashed the time and resources it spends on testing the safety of its powerful AI models, raising concerns that its technology is being rushed out the door without sufficient safeguards.

Staff and third-party groups have recently been given just days to conduct “evaluations,” the term given to tests for assessing models’ risks and performance, on OpenAI’s latest LLMs, compared to several months previously.

According to eight people familiar with OpenAI’s testing processes, the start-up’s tests have become less thorough, with insufficient time and resources dedicated to identifying and mitigating risks, as the $300 billion startup comes under pressure to release new models quickly and retain its competitive edge.

Steven Adler (includes screenshots from FT): Skimping on safety-testing is a real bummer. I want for OpenAI to become the “leading model of how to address frontier risk” they’ve aimed to be.

Peter Wildeford: I can see why people say @sama is not consistently candid.

Dylan Hadfield Menell: I remember talking about competitive pressures and race conditions with the @OpenAI’s safety team in 2018 when I was an intern. It was part of a larger conversation about the company charter.

It is sad to see @OpenAI’s founding principles cave to pressures we predicted long ago.

It is sad, but not surprising.

This is why we need a robust community working on regulating the next generation of AI systems. Competitive pressure is real.

We need people in positions of genuine power that are shielded from them.

Peter Wildeford:

Dylan Hadfield Menell: Where did you find an exact transcription of our conversation?!?! 😅😕😢

You can’t do this kind of testing properly in a matter of days. It’s impossible.

If people don’t have time to think, let alone adapt, probe, and build tools, how can they see what your new model is capable of doing? There are some great people working on these issues at OpenAI, but this is an impossible ask.

Testing on a version that doesn’t even match what you release? That’s even more impossible.

Part of this is that it is so tragic how everyone massively misinterpreted and overreacted to DeepSeek.

To reiterate, since the perception problem persists: yes, DeepSeek cooked, they have cracked engineers, and they did a very impressive thing with r1 given what they spent and where they were starting from, but that was not DeepSeek being ‘in the lead’ or even at the frontier; they were always many months behind, and their relative costs were being understated by multiple orders of magnitude. Even today I saw someone say ‘DeepSeek still in the lead’ when this is so obviously not the case. Meanwhile, no one was aware Google Flash Thinking even existed, or had the first visible CoT, and so on.

The result of all that? Talk similar to Kennedy’s ‘Missile Gap,’ abject panic, and sudden pressure to move up releases to show OpenAI and America have ‘still got it.’

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing Read More »

chatgpt-can-now-remember-and-reference-all-your-previous-chats

ChatGPT can now remember and reference all your previous chats

Unlike the older saved memories feature, the information saved via the chat history memory feature is not accessible or tweakable. It’s either on or it’s not.

The new approach to memory is rolling out first to ChatGPT Plus and Pro users, starting today—though it looks like it’s a gradual deployment over the next few weeks. Some countries and regions (the UK, European Union, Iceland, Liechtenstein, Norway, and Switzerland) are not included in the rollout.

OpenAI says these new features will reach Enterprise, Team, and Edu users at a later, as-yet-unannounced date. The company hasn’t mentioned any plans to bring them to free users. When you gain access to this, you’ll see a pop-up that says “Introducing new, improved memory.”

The new ChatGPT memory options, showing two memory toggle buttons. Credit: Benj Edwards

Some people will welcome this memory expansion, as it can significantly improve ChatGPT’s usefulness if you’re seeking answers tailored to your specific situation, personality, and preferences.

Others will likely be highly skeptical of a black box of chat history memory that can’t be tweaked or customized for privacy reasons. It’s important to note that even before the new memory feature, logs of conversations with ChatGPT may be saved and stored on OpenAI servers. It’s just that the chatbot didn’t fully incorporate their contents into its responses until now.

As with the old memory feature, you can click a checkbox to disable this completely, and it won’t be used for conversations with the Temporary Chat flag.

ChatGPT can now remember and reference all your previous chats Read More »