

Judge calls out OpenAI’s “straw man” argument in New York Times copyright suit

“Taken as true, these facts give rise to a plausible inference that defendants at a minimum had reason to investigate and uncover end-user infringement,” Stein wrote.

To Stein, the fact that OpenAI maintains an “ongoing relationship” with users by providing outputs that respond to users’ prompts also supports contributory infringement claims, despite OpenAI’s argument that ChatGPT’s “substantial noninfringing uses” are exonerative.

OpenAI defeated some claims

For OpenAI, Stein’s ruling is likely a disappointment, although Stein did drop some of the NYT’s claims.

Likely upsetting to news publishers, the dismissed claims included a “free-riding” claim that ChatGPT unfairly profits off time-sensitive “hot news” items, including the NYT’s Wirecutter posts. Stein explained that news publishers failed to plausibly allege non-attribution (which is key to a free-riding claim) because, for example, ChatGPT cites the NYT when sharing information from Wirecutter posts. Those claims are pre-empted by the Copyright Act anyway, Stein wrote, granting OpenAI’s motion to dismiss.

Stein also dismissed a claim from the NYT regarding alleged removal of copyright management information (CMI), which Stein said cannot be proven simply because ChatGPT reproduces excerpts of NYT articles without CMI.

The Digital Millennium Copyright Act (DMCA) requires news publishers to show that ChatGPT’s outputs are “close to identical” to the original work, Stein said, and allowing publishers’ claims based on excerpts “would risk boundless DMCA liability”—including for any use of block quotes without CMI.

Asked for comment on the ruling, an OpenAI spokesperson declined to go into any specifics, instead repeating OpenAI’s long-held argument that AI training on copyrighted works is fair use. (Last month, OpenAI warned Donald Trump that the US would lose the AI race to China if courts ruled against that argument.)

“ChatGPT helps enhance human creativity, advance scientific discovery and medical research, and enable hundreds of millions of people to improve their daily lives,” OpenAI’s spokesperson said. “Our models empower innovation, and are trained on publicly available data and grounded in fair use.”



MCP: The new “USB-C for AI” that’s bringing fierce rivals together


Model context protocol standardizes how AI uses data sources, supported by OpenAI and Anthropic.

What does it take to get OpenAI and Anthropic—two competitors in the AI assistant market—to get along? Despite a fundamental difference in direction that led Anthropic’s founders to quit OpenAI in 2020 and later create the Claude AI assistant, a shared technical hurdle has now brought them together: How to easily connect their AI models to external data sources.

The solution comes from Anthropic, which developed and released an open specification called Model Context Protocol (MCP) in November 2024. MCP establishes a royalty-free protocol that allows AI models to connect with outside data sources and services without requiring unique integrations for each service.

“Think of MCP as a USB-C port for AI applications,” wrote Anthropic in MCP’s documentation. The analogy is imperfect, but it represents the idea that, similar to how USB-C unified various cables and ports (with admittedly a debatable level of success), MCP aims to standardize how AI models connect to the infoscape around them.

So far, MCP has also garnered interest from multiple tech companies in a rare show of cross-platform collaboration. For example, Microsoft has integrated MCP into its Azure OpenAI service, and as we mentioned above, Anthropic competitor OpenAI is on board. Last week, OpenAI acknowledged MCP in its Agents API documentation, with vocal support from the boss upstairs.

“People love MCP and we are excited to add support across our products,” wrote OpenAI CEO Sam Altman on X last Wednesday.

MCP has also rapidly begun to gain community support in recent months. For example, just browsing this list of over 300 open source servers shared on GitHub reveals growing interest in standardizing AI-to-tool connections. The collection spans diverse domains, including database connectors like PostgreSQL, MySQL, and vector databases; development tools that integrate with Git repositories and code editors; file system access for various storage platforms; knowledge retrieval systems for documents and websites; and specialized tools for finance, health care, and creative applications.

Other notable examples include servers that connect AI models to home automation systems, real-time weather data, e-commerce platforms, and music streaming services. Some implementations allow AI assistants to interact with gaming engines, 3D modeling software, and IoT devices.

What is “context” anyway?

To fully appreciate why a universal AI standard for external data sources is useful, you’ll need to understand what “context” means in the AI field.

With current AI model architecture, what an AI model “knows” about the world is baked into its neural network in a largely unchangeable form. That knowledge is placed there by an initial procedure called “pre-training,” which calculates statistical relationships between vast quantities of input data (“training data”—like books, articles, and images) and encodes those relationships in the network as numerical values called “weights.” Later, a process called “fine-tuning” might adjust those weights to alter behavior (such as through reinforcement learning like RLHF) or provide examples of new concepts.

Typically, the training phase is very expensive computationally and happens either only once in the case of a base model, or infrequently with periodic model updates and fine-tunings. That means AI models only have internal neural network representations of events prior to a “cutoff date” when the training dataset was finalized.

After that, the AI model is run in a kind of read-only mode called “inference,” where users feed inputs into the neural network to produce outputs, which are called “predictions.” They’re called predictions because the systems are tuned to predict the most likely next token (a chunk of data, such as portions of a word) in a user-provided sequence.
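To make that concrete, here is a tiny, purely illustrative sketch of greedy next-token selection. The vocabulary and scores are invented for the example; real models score tens of thousands of possible tokens at every step.

```python
import math

# Toy example: the model has assigned each candidate next token a raw score
# (a "logit"). These values are invented for illustration.
logits = {"dog": 2.1, "cat": 1.4, "car": -0.3, "the": 0.2}

# Softmax converts raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding picks the single most likely next token; real systems
# usually sample from the distribution instead of always taking the top one.
next_token = max(probs, key=probs.get)
print(probs)
print("predicted next token:", next_token)
```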

In the AI field, context is the user-provided sequence—all the data fed into an AI model that guides the model to produce a response output. This context includes the user’s input (the “prompt”), the running conversation history (in the case of chatbots), and any external information sources pulled into the conversation, including a “system prompt” that defines model behavior and “memory” systems that recall portions of past conversations. The limit on the amount of context a model can ingest at once is often called a “context window,” “context length,” or “context limit,” depending on personal preference.
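As a rough illustration of how those pieces get assembled before each model call, here is a minimal sketch. The message format, the 8,000-token budget, and the word-count "tokenizer" are simplifying assumptions for the example, not any particular vendor's API.

```python
def build_context(system_prompt, history, user_prompt, retrieved_docs,
                  max_tokens=8000, count=lambda s: len(s.split())):
    """Assemble the pieces of 'context' described above into one input list,
    dropping the oldest conversation turns if the context limit is exceeded.
    The word-count tokenizer is a crude stand-in for a real one."""
    history = list(history)
    fixed = [{"role": "system", "content": system_prompt}]
    fixed += [{"role": "system", "content": doc} for doc in retrieved_docs]
    tail = [{"role": "user", "content": user_prompt}]

    def total(messages):
        return sum(count(m["content"]) for m in messages)

    # Trim the oldest history turns (never the system prompt or the new
    # user message) until everything fits inside the context window.
    while history and total(fixed + history + tail) > max_tokens:
        history.pop(0)
    return fixed + history + tail
```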

While the prompt provides important information for the model to operate upon, accessing external information sources has traditionally been cumbersome. Before MCP, AI assistants like ChatGPT and Claude could access external data (a process often called retrieval augmented generation, or RAG), but doing so required custom integrations for each service—plugins, APIs, and proprietary connectors that didn’t work across different AI models. Each new data source demanded unique code, creating maintenance challenges and compatibility issues.

MCP addresses these problems by providing a standardized method or set of rules (a “protocol”) that allows any supporting AI model framework to connect with external tools and information sources.

How does MCP work?

To make the connections behind the scenes between AI models and data sources, MCP uses a client-server model. An AI model (or its host application) acts as an MCP client that connects to one or more MCP servers. Each server provides access to a specific resource or capability, such as a database, search engine, or file system. When the AI needs information beyond its training data, it sends a request to the appropriate server, which performs the action and returns the result.

To illustrate how the client-server model works in practice, consider a customer support chatbot using MCP that could check shipping details in real time from a company database. “What’s the status of order #12345?” would trigger the AI to query an order database MCP server, which would look up the information and pass it back to the model. The model could then incorporate that data into its response: “Your order shipped on March 30 and should arrive April 2.”
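For a sense of what such a server looks like in code, here is a minimal sketch using the FastMCP helper from the official MCP Python SDK. The tool name, the fake database, and the return format are hypothetical, invented for this example.

```python
# Hypothetical order-status MCP server, sketched with the FastMCP helper
# from the official MCP Python SDK.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-status")

# Stand-in for a real company order database.
FAKE_ORDERS = {"12345": {"shipped": "March 30", "eta": "April 2"}}

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order by its ID."""
    order = FAKE_ORDERS.get(order_id)
    if order is None:
        return f"No order found with ID {order_id}."
    return (f"Order {order_id} shipped on {order['shipped']} "
            f"and should arrive {order['eta']}.")

if __name__ == "__main__":
    # By default this serves over standard input/output, so a local MCP
    # client (such as a desktop AI assistant) can launch and query it.
    mcp.run()
```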

Beyond specific use cases like customer support, the potential scope is very broad. Early developers have already built MCP servers for services like Google Drive, Slack, GitHub, and Postgres databases. This means AI assistants could potentially search documents in a company Drive, review recent Slack messages, examine code in a repository, or analyze data in a database—all through a standard interface.

From a technical implementation perspective, Anthropic designed the standard for flexibility by running in two main modes: Some MCP servers operate locally on the same machine as the client (communicating via standard input-output streams), while others run remotely and stream responses over HTTP. In both cases, the model works with a list of available tools and calls them as needed.
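Under the hood, both transports carry the same JSON-RPC 2.0 messages. Below is a simplified sketch of the exchange behind the order-status example above, written as Python dicts; the field values are illustrative, and the real protocol also includes an initialization handshake and capability negotiation.

```python
# Simplified MCP traffic as JSON-RPC 2.0 messages (shown as Python dicts).

# 1. The client asks the server which tools it offers.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
list_response = {
    "jsonrpc": "2.0", "id": 1,
    "result": {"tools": [{
        "name": "get_order_status",
        "description": "Look up the shipping status of an order by its ID.",
    }]},
}

# 2. The model decides to use the tool, and the client forwards the call.
call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_order_status", "arguments": {"order_id": "12345"}},
}
call_response = {
    "jsonrpc": "2.0", "id": 2,
    "result": {"content": [{
        "type": "text",
        "text": "Order 12345 shipped on March 30 and should arrive April 2.",
    }]},
}
```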

A work in progress

Despite the growing ecosystem around MCP, the protocol remains an early-stage project. The limited announcements of support from major companies are promising first steps, but MCP’s future as an industry standard may depend on broader acceptance, although the number of MCP servers seems to be growing at a rapid pace.

Regardless of its ultimate adoption rate, MCP may have some interesting second-order effects. For example, MCP also has the potential to reduce vendor lock-in. Because the protocol is model-agnostic, a company could switch from one AI provider to another while keeping the same tools and data connections intact.

MCP may also allow a shift toward smaller and more efficient AI systems that can interact more fluidly with external resources without the need for customized fine-tuning. Also, rather than building increasingly massive models with all knowledge baked in, companies may instead be able to use smaller models with large context windows.

For now, the future of MCP is wide open. Anthropic maintains MCP as an open source initiative on GitHub, where interested developers can either contribute to the code or find specifications about how it works. Anthropic has also provided extensive documentation about how to connect Claude to various services. OpenAI maintains its own API documentation for MCP on its website.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.



OpenAI #12: Battle of the Board Redux

Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely through control of the public narrative, and the above was my attempt to put together what happened.

My conclusion, which I still believe, was that Sam Altman had engaged in a variety of unacceptable conduct that merited his firing.

In particular, he had very much ‘not been consistently candid’ with the board on several important occasions. Most notably, he lied to board members about what was said by other board members, with the goal of forcing out a board member he disliked. There were also other instances in which he misled and was otherwise toxic to employees, and he played fast and loose with the investment fund and other outside opportunities.

I concluded that the story that this was about ‘AI safety’ or ‘EA (effective altruism)’ or existential risk concerns, other than as Altman’s motivation to attempt to remove board members, was a false narrative largely spread by Altman’s allies and those who are determined to hate on anyone who is concerned future AI might get out of control or kill everyone, often using EA’s bad press or vibes as a point of leverage to do that.

A few weeks later, I felt that leaks confirmed the bulk of the story I told at that first link, and since then I’ve had anonymous sources confirm my account was centrally true.

Thanks to Keach Hagey at the Wall Street Journal, we now have by far the most well-researched and complete piece on what happened: The Secrets and Misdirection Behind Sam Altman’s Firing From OpenAI. Most, although not all, of the important remaining questions are now definitively answered, and the story I put together has been confirmed.

The key now is to Focus Only On What Matters. What matters going forward are:

  1. Claims of Altman’s toxic and dishonest behaviors, that if true merited his firing.

  2. That the motivations behind the firing were these ordinary CEO misbehaviors.

  3. Altman’s allies successfully spread a highly false narrative about events.

  4. That OpenAI could easily have moved forward with a different CEO, if things had played out differently and Altman had not threatened to blow up OpenAI.

  5. OpenAI is now effectively controlled by Sam Altman going forward. His claims that ‘the board can fire me’ in practice mean very little.

Also important is what happened afterwards, which was likely caused in large part by the events themselves, by the way they were framed, and by Altman’s consolidated power.

In particular, Sam Altman and OpenAI, whose explicit mission is building AGI and who plan to do so within Trump’s second term, started increasingly talking and acting like AGI was No Big Deal, except for the amazing particular benefits.

Their statements don’t feel the AGI. They no longer tell us our lives will change that much. It is not important, they do not even bother to tell us, to protect against key downside risks of building machines smarter and more capable than humans – such as the risk that those machines effectively take over, or perhaps end up killing everyone.

And if you disagreed with that, or opposed Sam Altman? You were shown the door.

  1. OpenAI was then effectively purged. Most of its strongest alignment researchers left, as did most of those who most prominently wanted to take care to ensure OpenAI’s quest for AGI did not kill everyone or cause humanity to lose control over the future.

  2. Altman’s public statements about AGI, and OpenAI’s policy positions, stopped even mentioning the most important downside risks of AGI and ASI (artificial superintelligence), and shifted towards attempts at regulatory capture and access to government cooperation and funding. Most prominently, their statement on the US AI Action Plan can only be described as disingenuous vice signaling in pursuit of their own private interests.

  3. Those public statements and positions no longer much even ‘feel the AGI.’ Altman has taken to predicting that AGI will happen and your life won’t much change, and treating future AGI as essentially a fungible good. We know, from his prior statements, that Altman knows better. And we know from their current statements that many of the engineers at OpenAI know better. Indeed, in context, they shout it from the rooftops.

  4. We discovered that self-hiding NDAs were aggressively used by OpenAI, under threat of equity confiscation, to control people and the narrative.

  5. With control over the board, Altman is attempting to convert OpenAI into a for-profit company, with sufficiently low compensation that this act could plausibly become the greatest theft in human history.

Beware being distracted by the shiny. In particular:

  1. Don’t be distracted by the article’s ‘cold open’ in which Peter Thiel tells a paranoid and false story to Sam Altman, in which Thiel asserts that ‘EAs’ or ‘safety’ people will attempt to destroy OpenAI, and that they have ‘half the company convinced’ and so on. I don’t doubt the interaction happened, but this was unrelated to what happened.

    1. To the extent it was related, it was because the paranoia of Altman and his allies about such possibilities, inspired by such tall tales, caused Altman to lie to the board in general, and attempt to force Helen Toner off the board in particular.

  2. Don’t be distracted by the fact that the board botched the firing, and the subsequent events, from a tactical perspective. Yes we can learn from their mistakes, but the board that made those mistakes is gone now.

This is all quite bad, but things could be far worse. OpenAI still has many excellent people working on alignment, security and safety. They have put out a number of strong documents. By that standard, and in terms of how responsibly they have actually handled their releases, OpenAI has outperformed many other industry actors, although they remain less responsible than Anthropic. Companies like DeepSeek, Meta and xAI, and at times Google, work hard to make OpenAI look good on these fronts.

Now, on to what we learned this week.

Hagey’s story paints a clear picture of what actually happened.

It is especially clear about why this happened. The firing wasn’t about EA, ‘the safety people’ or existential risk. What was this about?

Altman repeatedly lied to, misled and mistreated employees of OpenAI. Altman repeatedly lied about and withheld factual and importantly material matters, including directly to the board. There was a large litany of complaints.

The big new fact is that the board was counting on Murati’s support. But partly because of this, they felt they couldn’t disclose that their information came largely from Murati. That doesn’t explain why they couldn’t say this to Murati herself.

If the facts asserted in the WSJ article are true, I would say that any responsible board would have voted for Altman’s removal. As OpenAI’s products got more impactful, and the stakes got higher, Altman’s behaviors left no choice.

Claude agreed. This was one shot; I pasted in the full article and asked:

Zvi: I’ve shared a news article. Based on what is stated in the news article, if the reporting is accurate, how would you characterize the board’s decision to fire Altman? Was it justified? Was it necessary?

Claude 3.7: Based on what’s stated in the article, the board’s decision to fire Sam Altman appears both justified and necessary from their perspective, though clearly poorly executed in terms of preparation and communication.

I agree, on both counts. There are only two choices here, at least one must be true:

  1. The board had a fiduciary duty to fire Altman.

  2. The board members are outright lying about what happened.

That doesn’t excuse the board’s botched execution, especially its failure to disclose information in a timely manner.

The key facts cited here are:

  1. Altman said publicly and repeatedly ‘the board can fire me. That’s important,’ but in reality he called the shots and did everything in his power to keep it that way.

  2. Altman did not even inform the board about ChatGPT in advance, at all.

  3. Altman explicitly claimed three enhancements to GPT-4 had been approved by the joint safety board. Helen Toner found only one had been approved.

  4. Altman allowed Microsoft to launch the test of GPT-4 in India, in the form of Sydney, without the approval of the safety board or informing the board of directors of the breach. Due to the results of that experiment entering the training data, deploying Sydney plausibly had permanent effects on all future AIs. This was not a trivial oversight.

  5. Altman did not inform the board that he had taken financial ownership of the OpenAI investment fund, which he claimed was temporary and for tax reasons.

  6. Mira Murati came to the board with a litany of complaints about what she saw as Altman’s toxic management style, including having Brockman, who reported to her, go around her to Altman whenever there was a disagreement. Altman responded by bringing the head of HR to their 1-on-1s until Mira said she wouldn’t share her feedback with the board.

  7. Altman promised both Pachocki and Sutskever they could direct the research direction of the company, losing months of productivity, and this was when Sutskever started looking to replace Altman.

  8. The most egregious lie (Hagey’s term for it) and what I consider on its own sufficient to require Altman be fired: Altman told one board member, Sutskever, that a second board member, McCauley, had said that Toner should leave the board because of an article Toner wrote. McCauley said no such thing. This was an attempt to get Toner removed from the board. If you lie to board members about other board members in an attempt to gain control over the board, I assert that the board should fire you, pretty much no matter what.

  9. Sutskever collected dozens of examples of alleged Altman lies and other toxic behavior, largely backed up by screenshots from Murati’s Slack channel. One lie in particular was that Altman told Murati that the legal department had said GPT-4-Turbo didn’t have to go through joint safety board review. The head lawyer said he did not say that. The decision not to go through the safety board here was not crazy, but lying about the lawyer’s opinion on this is highly unacceptable.

Murati was clearly a key source for many of these firing offenses (and presumably for this article, given its content and timing, although I don’t know anything nonpublic). Despite this, even after Altman was fired, the board didn’t even tell Murati why they had fired him while asking her to become interim CEO, and in general stayed quiet largely (in this post’s narrative) to protect Murati. But then, largely because of the board’s communication failures, Murati turned on the board and the employees backed Altman.

This section reiterates and expands on my warnings above.

The important narrative here is that Altman engaged in various shenanigans and made various unforced errors that together rightfully got him fired. But the board botched the execution, and Altman was willing to burn down OpenAI in response and the board wasn’t. Thus, Altman got power back and did an ideological purge.

The first key distracting narrative, the one I’m seeing many fall into, is to treat this primarily as a story about board incompetence. Look at those losers, who lost, because they were stupid losers in over their heads with no business playing at this level. Many people seem to think the ‘real story’ is that a now defunct group of people were bad at corporate politics and should get mocked.

Yes, that group was bad at corporate politics. We should update on that, and be sure that the next time we have to Do Corporate Politics we don’t act like that, and especially that we explain why we are doing things. But the group that dropped this ball is defunct, whereas Altman is still CEO. And this is not a sporting event.

The board is now irrelevant. Altman isn’t. What matters is the behavior of Altman, and what he did to earn getting fired. Don’t be distracted by the shiny.

A second key narrative spun by Altman’s allies is that Altman is an excellent player of corporate politics. He has certainly pulled off some rather impressive (and some would say nasty) tricks. But the picture painted here is rife with unforced errors. Altman won because the opposition played badly, not because he played so well.

Most importantly, as I noted at the time, the board started out with nine members, five of whom at the time were loyal to Altman even if you don’t count Ilya Sutskever. Altman could easily have used this opportunity to elect new loyal board members. Instead, he allowed three of his allies to leave the board without replacement, leading to the deadlock of control, which then led to the power struggle. Given Altman knows so many well-qualified allies, this seems like a truly epic level of incompetence to me.

The third key distracting narrative, the one Altman’s allies have centrally told since day one and which is entirely false, is that this firing (which they misleadingly call a ‘coup’) was ‘the safety people’ or ‘the EAs’ trying to ‘destroy’ OpenAI.

My worry is that many will see that this false framing is presented early in the post, and not read far enough to realize the post is pointing out that the framing is entirely false. Thus, many or even most readers might get exactly the wrong idea.

In particular, this piece opens with an irrelevant story echoing this false narrative. Peter Thiel is at dinner telling his friend Sam Altman a frankly false and paranoid story about Effective Altruism and Eliezer Yudkowsky.

Thiel says that ‘half the company believes this stuff’ (if only!) and that ‘the EAs’ had ‘taken over’ OpenAI (if only again!), and predicts that ‘the safety people,’ whom Thiel has on various occasions described, literally and at length, as the biblical Antichrist, would ‘destroy’ OpenAI (whereas, instead, the board in the end fell on its sword to prevent Altman and his allies from destroying OpenAI).

And it gets presented in ways like this:

We are told to focus on the nice people eating dinner while other dastardly people held ‘secret video meetings.’ How is this what is important here?

Then if you keep reading, Hagey makes it clear: The board’s firing of Altman had nothing to do with that. And we get on with the actual excellent article.

I don’t doubt Thiel told that to Altman, and I find it likely Thiel even believed it. The thing is, it isn’t true, and it’s rather important that people know it isn’t true.

If you want to read more about what has happened at OpenAI, I have covered this extensively, and my posts contain links to the best primary and other secondary sources I could find. Here are the posts in this sequence.

  1. OpenAI: Facts From a Weekend.

  2. OpenAI: The Battle of the Board.

  3. OpenAI: Altman Returns.

  4. OpenAI: Leaks Confirm the Story.

  5. OpenAI: The Board Expands.

  6. OpenAI: Exodus.

  7. OpenAI: Fallout

  8. OpenAI: Helen Toner Speaks.

  9. OpenAI #8: The Right to Warn.

  10. OpenAI #10: Reflections.

  11. On the OpenAI Economic Blueprint.

  12. The Mask Comes Off: At What Price?

  13. OpenAI #11: America Action Plan.

The write-ups will doubtless continue, as this is one of the most important companies in the world.




OpenAI’s new AI image generator is potent and bound to provoke


The visual apocalypse is probably nigh, but perhaps seeing was never believing.

A trio of AI-generated images created using OpenAI’s 4o Image Generation model in ChatGPT. Credit: OpenAI

The arrival of OpenAI’s DALL-E 2 in the spring of 2022 marked a turning point in AI when text-to-image generation suddenly became accessible to a select group of users, creating a community of digital explorers who experienced wonder and controversy as the technology automated the act of visual creation.

But like many early AI systems, DALL-E 2 struggled with consistent text rendering, often producing garbled words and phrases within images. It also had limitations in following complex prompts with multiple elements, sometimes missing key details or misinterpreting instructions. These shortcomings left room for improvement that OpenAI would address in subsequent iterations, such as DALL-E 3 in 2023.

On Tuesday, OpenAI announced new multimodal image generation capabilities that are directly integrated into its GPT-4o AI language model, making it the default image generator within the ChatGPT interface. The integration, called “4o Image Generation” (which we’ll call “4o IG” for short), allows the model to follow prompts more accurately (with better text rendering than DALL-E 3) and respond to chat context for image modification instructions.

An AI-generated cat in a car drinking a can of beer created by OpenAI’s 4o Image Generation model. OpenAI

The new image generation feature began rolling out Tuesday to ChatGPT Free, Plus, Pro, and Team users, with Enterprise and Education access coming later. The capability is also available within OpenAI’s Sora video generation tool. OpenAI told Ars that the image generation when GPT-4.5 is selected calls upon the same 4o-based image generation model as when GPT-4o is selected in the ChatGPT interface.

Like DALL-E 2 before it, 4o IG is bound to provoke debate as it turns sophisticated media manipulation capabilities that were once the domain of sci-fi and skilled human creators into an accessible AI tool that people can use through simple text prompts. It will also likely ignite a new round of controversy over artistic styles and copyright—but more on that below.

Some users on social media initially reported confusion since there’s no UI indication of which image generator is active, but you’ll know it’s the new model if the generation is ultra slow and proceeds from top to bottom. The previous DALL-E model remains available through a dedicated “DALL-E GPT” interface, while API access to GPT-4o image generation is expected within weeks.

Truly multimodal output

4o IG represents a shift to “native multimodal image generation,” where the large language model processes and outputs image data directly as tokens. That’s a big deal, because it means image tokens and text tokens share the same neural network. It leads to new flexibility in image creation and modification.

Although OpenAI baked multimodal image generation capabilities into GPT-4o when it launched in May 2024—when the “o” in GPT-4o was touted as standing for “omni” to highlight its ability to both understand and generate text, images, and audio—the company took over 10 months to deliver the functionality to users, despite OpenAI President Greg Brockman teasing the feature on X last year.

OpenAI was likely goaded by last week’s release of Google’s multimodal LLM-based image generator, called “Gemini 2.0 Flash (Image Generation) Experimental.” The tech giants continue their AI arms race, with each attempting to one-up the other.

And perhaps we know why OpenAI waited: At a reasonable resolution and level of detail, the new 4o IG process is extremely slow, taking anywhere from 30 seconds to one minute (or longer) for each image.

Even if it’s slow (for now), the ability to generate images using a purely autoregressive approach is arguably a major leap for OpenAI due to its flexibility. But it’s also very compute-intensive, since the model generates the image token by token, building it sequentially. This contrasts with diffusion-based methods like DALL-E 3, which start with random noise and gradually refine an entire image over many iterative steps.
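To make the contrast concrete, here is a deliberately toy sketch of the two generation loops. Nothing below resembles a real image model; it only mirrors the shape of the two approaches, with random numbers standing in for learned predictions.

```python
import random

def generate_autoregressive(num_image_tokens=16, vocab_size=256, seed=0):
    """4o-style loop: produce the image one token at a time, each choice
    conditioned on everything generated so far (the running token list)."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(num_image_tokens):
        # Stand-in for "predict the next image token given prompt + tokens so far."
        conditioning = sum(tokens) % vocab_size
        tokens.append((conditioning + rng.randrange(vocab_size)) % vocab_size)
    return tokens

def generate_diffusion(num_pixels=16, steps=50, seed=0):
    """Diffusion-style loop (as in DALL-E 3): start from random noise and
    refine the entire image a little at every step."""
    rng = random.Random(seed)
    image = [rng.uniform(0.0, 255.0) for _ in range(num_pixels)]
    for _ in range(steps):
        # Nudge every pixel toward a target at once, rather than one at a time.
        image = [p + 0.1 * (128.0 - p) for p in image]
    return image

print(generate_autoregressive())
print([round(p, 1) for p in generate_diffusion()])
```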

Conversational image editing

In a blog post, OpenAI positions 4o Image Generation as moving beyond generating “surreal, breathtaking scenes” seen with earlier AI image generators and toward creating “workhorse imagery” like logos and diagrams used for communication.

The company particularly notes improved text rendering within images, a capability where previous text-to-image models often spectacularly failed, often turning “Happy Birthday” into something resembling alien hieroglyphics.

OpenAI claims several key improvements: users can refine images through conversation while maintaining visual consistency; the system can analyze uploaded images and incorporate their details into new generations; and it offers stronger photorealism—although what constitutes photorealism (for example, imitations of HDR camera features, detail level, and image contrast) can be subjective.


A screenshot of OpenAI’s 4o Image Generation model in ChatGPT. We see an existing AI-generated image of a barbarian and a TV set, then a request to set the TV set on fire. Credit: OpenAI / Benj Edwards

In its blog post, OpenAI provided examples of intended uses for the image generator, including creating diagrams, infographics, social media graphics using specific color codes, logos, instruction posters, business cards, custom stock photos with transparent backgrounds, editing user photos, or visualizing concepts discussed earlier in a chat conversation.

Notably absent: Any mention of the artists and graphic designers whose jobs might be affected by this technology. As we covered throughout 2022 and 2023, job impact is still a top concern among critics of AI-generated graphics.

Fluid media manipulation

Shortly after OpenAI launched 4o Image Generation, the AI community on X put the feature through its paces, finding that it is quite capable at inserting someone’s face into an existing image, creating fake screenshots, and converting meme photos into the style of Studio Ghibli, South Park, felt, Muppets, Rick and Morty, Family Guy, and much more.

It seems like we’re entering a completely fluid media “reality” courtesy of a tool that can effortlessly convert visual media between styles. The styles also potentially encroach upon protected intellectual property. Given what Studio Ghibli co-founder Hayao Miyazaki has previously said about AI-generated artwork (“I strongly feel that this is an insult to life itself.”), it seems he’d be unlikely to appreciate the current AI-generated Ghibli fad on X at the moment.

To get a sense of what 4o IG can do ourselves, we ran some informal tests, including some of the usual CRT barbarians, queens of the universe, and beer-drinking cats, which you’ve already seen above (and of course, the plate of pickles.)

The ChatGPT interface with the new 4o image model is conversational (like before with DALL-E 3), but you can suggest changes over time. For example, we took the author’s EGA pixel bio (as we did with Google’s model last week) and attempted to give it a full body. Arguably, Google’s more limited image model did a far better job than 4o IG.


Giving the author’s pixel avatar a body using OpenAI’s 4o Image Generation model in ChatGPT. Credit: OpenAI / Benj Edwards

While my pixel avatar was commissioned from the very human (and talented) Julia Minamata in 2020, I also tried to convert the inspiration image for my avatar (which features me and legendary video game engineer Ed Smith) into EGA pixel style to see what would happen. In my opinion, the result proves the continued superiority of human artistry and attention to detail.

Converting a photo of Benj Edwards and video game legend Ed Smith into “EGA pixel art” using OpenAI’s 4o Image Generation model in ChatGPT. Credit: OpenAI / Benj Edwards

We also tried to see how many objects 4o Image Generation could cram into an image, inspired by a 2023 tweet by Nathan Shipley when he was evaluating DALL-E 3 shortly after its release. We did not account for every object, but it looks like most of them are there.


Generating an image of a surfer holding tons of items, inspired by a 2023 Twitter post from Nathan Shipley. Credit: OpenAI / Benj Edwards

On social media, other people have manipulated images using 4o IG (like Simon Willison’s bear selfie), so we tried changing an AI-generated note featured in an article last year. It worked fairly well, though it did not really imitate the handwriting style as requested.


Modifying text in an image using OpenAI’s 4o Image Generation model in ChatGPT. Credit: OpenAI / Benj Edwards

To take text generation a little further, we generated a poem about barbarians using ChatGPT, then fed it into an image prompt. The result feels roughly equivalent to diffusion-based Flux in capability—maybe slightly better—but there are still some obvious mistakes here and there, such as repeated letters.


Testing text generation using OpenAI’s 4o Image Generation model in ChatGPT. Credit: OpenAI / Benj Edwards

We also tested the model’s ability to create logos featuring our favorite fictional Moonshark brand. One of the logos not pictured here was delivered as a transparent PNG file with an alpha channel. This may be a useful capability for some people in a pinch, but to the extent that the model may produce “good enough” (not exceptional, but looks OK at a glance) logos for the price of $0 (not including an OpenAI subscription), it may end up competing with some human logo designers, and that will likely cause some consternation among professional artists.


Generating a “Moonshark Moon Pies” logo using OpenAI’s 4o Image Generation model in ChatGPT. Credit: OpenAI / Benj Edwards

Frankly, this model is so slow we didn’t have time to test everything before we needed to get this article out the door. It can do much more than we have shown here—such as adding items to scenes or removing them. We may explore more capabilities in a future article.

Limitations

By now, you’ve seen that, like previous AI image generators, 4o IG is not perfect in quality: It consistently renders the author’s nose at an incorrect size.

Other than that, while this is one of the most capable AI image generators ever created, OpenAI openly acknowledges significant limitations of the model. For example, 4o IG sometimes crops images too tightly or includes inaccurate information (confabulations) with vague prompts or when rendering topics it hasn’t encountered in its training data.

The model also tends to fail when rendering more than 10–20 objects or concepts simultaneously (making tasks like generating an accurate periodic table currently impossible) and struggles with non-Latin text fonts. Image editing is currently unreliable over multiple passes, with a specific bug affecting face editing consistency that OpenAI says it plans to fix soon. And it’s not great with dense charts or accurately rendering graphs or technical diagrams. In our testing, 4o Image Generation produced mostly accurate but flawed electronic circuit schematics.

Move fast and break everything

Even with those limitations, multimodal image generators are an early step into a much larger world of completely plastic media reality where any pixel can be manipulated on demand with no particular photo editing skill required. That brings with it potential benefits, ethical pitfalls, and the potential for terrible abuse.

In a notable shift from DALL-E, OpenAI now allows 4o IG to generate adult public figures (not children) with certain safeguards, while letting public figures opt out if desired. Like DALL-E, the model still blocks policy-violating content requests (such as graphic violence, nudity, and sex).

The ability for 4o Image Generation to imitate celebrity likenesses, brand logos, and Studio Ghibli films reinforces and reminds us how GPT-4o is partly (aside from some licensed content) a product of a massive scrape of the Internet without regard to copyright or consent from artists. That mass-scraping practice has resulted in lawsuits against OpenAI in the past, and we would not be surprised to see more lawsuits or at least public complaints from celebrities (or their estates) about their likenesses potentially being misused.

On X, OpenAI CEO Sam Altman wrote about the company’s somewhat devil-may-care position about 4o IG: “This represents a new high-water mark for us in allowing creative freedom. People are going to create some really amazing stuff and some stuff that may offend people; what we’d like to aim for is that the tool doesn’t create offensive stuff unless you want it to, in which case within reason it does.”


An original photo of the author beside AI-generated images created by OpenAI’s 4o Image Generation model. From second left to right: Studio Ghibli style, Muppet style, and pasta style. Credit: OpenAI / Benj Edwards

Zooming out, GPT-4o’s image generation model (and the technology behind it, once open source) feels like it further erodes trust in remotely produced media. While we’ve always needed to verify important media through context and trusted sources, these new tools may further expand the “deep doubt” media skepticism that’s become necessary in the age of AI. By opening up photorealistic image manipulation to the masses, more people than ever can create or alter visual media without specialized skills.

While OpenAI includes C2PA metadata in all generated images, that data can be stripped away and might not matter much in the context of a deceptive social media post. But 4o IG doesn’t change what has always been true: We judge information primarily by the reputation of its messenger, not by the pixels themselves. Forgery existed long before AI. It reinforces that everyone needs media literacy skills—understanding that context and source verification have always been the best arbiters of media authenticity.

For now, Altman is ready to take on the risks of releasing the technology into the world. “As we talk about in our model spec, we think putting this intellectual freedom and control in the hands of users is the right thing to do, but we will observe how it goes and listen to society,” Altman wrote on X. “We think respecting the very wide bounds society will eventually choose to set for AI is the right thing to do, and increasingly important as we get closer to AGI. Thanks in advance for the understanding as we work through this.”





Dad demands OpenAI delete ChatGPT’s false claim that he murdered his kids

Currently, ChatGPT does not repeat these horrible false claims about Holmen in outputs. A more recent update apparently fixed the issue, as “ChatGPT now also searches the Internet for information about people, when it is asked who they are,” Noyb said. But because OpenAI had previously argued that it cannot correct information—it can only block information—the fake child murderer story is likely still included in ChatGPT’s internal data. And unless Holmen can correct it, that’s a violation of the GDPR, Noyb claims.

“While the damage done may be more limited if false personal data is not shared, the GDPR applies to internal data just as much as to shared data,” Noyb says.

OpenAI may not be able to easily delete the data

Holmen isn’t the only ChatGPT user who has worried that the chatbot’s hallucinations might ruin lives. Months after ChatGPT launched in late 2022, an Australian mayor threatened to sue for defamation after the chatbot falsely claimed he went to prison. Around the same time, ChatGPT linked a real law professor to a fake sexual harassment scandal, The Washington Post reported. A few months later, a radio host sued OpenAI over ChatGPT outputs describing fake embezzlement charges.

In some cases, OpenAI filtered the model to avoid generating harmful outputs but likely didn’t delete the false information from the training data, Noyb suggested. But filtering outputs and throwing up disclaimers aren’t enough to prevent reputational harm, Noyb data protection lawyer Kleanthi Sardeli alleged.

“Adding a disclaimer that you do not comply with the law does not make the law go away,” Sardeli said. “AI companies can also not just ‘hide’ false information from users while they internally still process false information. AI companies should stop acting as if the GDPR does not apply to them, when it clearly does. If hallucinations are not stopped, people can easily suffer reputational damage.”



Study finds AI-generated meme captions funnier than human ones on average

It’s worth clarifying that AI models did not generate the images used in the study. Instead, researchers used popular, pre-existing meme templates, and GPT-4o or human participants generated captions for them.

More memes, not better memes

When crowdsourced participants rated the memes, those created entirely by AI models scored higher on average in humor, creativity, and shareability. The researchers defined shareability as a meme’s potential to be widely circulated, influenced by humor, relatability, and relevance to current cultural topics. They note that this study is among the first to show AI-generated memes outperforming human-created ones across these metrics.

However, the study comes with an important caveat. On average, fully AI-generated memes scored higher than those created by humans alone or humans collaborating with AI. But when researchers looked at the best individual memes, humans created the funniest examples, and human-AI collaborations produced the most creative and shareable memes. In other words, AI models consistently produced broadly appealing memes, but humans—with or without AI help—still made the most exceptional individual examples.


Diagrams of meme creation and evaluation workflows taken from the paper. Credit: Wu et al.

The study also found that participants using AI assistance generated significantly more meme ideas and described the process as easier and requiring less effort. Despite this productivity boost, human-AI collaborative memes did not rate higher on average than memes humans created alone. As the researchers put it, “The increased productivity of human-AI teams does not lead to better results—just to more results.”

Participants who used AI assistance reported feeling slightly less ownership over their creations compared to solo creators. Given that a sense of ownership influenced creative motivation and satisfaction in the study, the researchers suggest that people interested in using AI should carefully consider how to balance AI assistance in creative tasks.



OpenAI #11: America Action Plan

Last week I covered Anthropic’s submission to the request for suggestions for America’s action plan. I did not love what they submitted, and especially disliked how aggressively they sidelined existential risk and related issues, but given a decision to massively scale back ambition like that, the suggestions were, as I called them, a ‘least you can do’ agenda, with many thoughtful details.

OpenAI took a different approach. They went full jingoism in the first paragraph, framing this as a race in which we must prevail over the CCP, and kept going. A lot of space is spent on what a kind person would call rhetoric and an unkind person corporate jingoistic propaganda.

Their goal is to have the Federal Government not only decline to regulate AI or impose any requirements on AI whatsoever on any level, but also prevent the states from doing so, and to ensure that existing regulations do not apply to them. They seek ‘relief’ from proposed bills, including exemption from all liability, explicitly emphasizing immunity from regulations targeting frontier models in particular and name-checking SB 1047 as an example of what they want immunity from, all in the name of ‘Freedom to Innovate,’ warning that America’s leadership position would otherwise be undermined.

None of which actually makes any sense from a legal perspective (that’s not how any of this works), but that’s clearly not what they decided to care about. If this part was intended as a serious policy proposal, it would have tried to pretend to be that. Instead it’s a completely incoherent proposal, one that goes halfway towards something unbelievably radical but pulls back from trying to implement it.

Meanwhile, they want the United States to not only ban Chinese ‘AI infrastructure’ but also coordinate with other countries to ban it, and they want to weaken the compute diffusion rules for those who cooperate with this, essentially only restricting countries with a history or expectation of leaking technology to China, or those who won’t play ball with OpenAI’s anticompetitive proposals.

They refer to DeepSeek as ‘state controlled.’

Their claim that DeepSeek could be ordered to alter its models to cause harm, if one were to build upon them, seems to fundamentally misunderstand that DeepSeek is releasing open models. You can’t modify an open model like that. Nor can you steal someone’s data if they’re running their own copy. The parallel to Huawei is disingenuous at best, especially given the source.

They cite the ‘Belt and Road Initiative’ and claim to expect China to coerce people into using DeepSeek’s models.

For copyright they proclaim the need for ‘freedom to learn’ and assert that AI training is fully fair use and immune from copyright. I think this is a defensible position, and I myself support mandatory licensing similar to radio for music, in a way that compensates creators. But the rhetoric?

They all but declare that if we don’t apply fair use, the authoritarians will conquer us.

If the PRC’s developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over. America loses, as does the success of democratic AI.

It amazes me they wrote that with a straight face. Everything is power laws. Suggesting that depriving American labs of some percentage of data inputs, even if that were to happen and the labs were to honor those restrictions (which I very much do not believe they have typically been doing), would mean ‘the race is effectively over’ is patently absurd. They know that better than anyone. Have they no shame? Are they intentionally trying to tell us that they have no shame? Why?

This document is written in a way that seems almost designed to make one vomit. This is vice signaling. As I have said before, and with OpenAI documents this has happened before, when that happens, I think it is important to notice it!

I don’t think the inducing of vomit is a coincidence. They chose to write it this way. They want people to see that they are touting disingenuous jingoistic propaganda in a way that seems suspiciously corrupt. Why would they want to signal that? You tell me.

You don’t publish something like this unless you actively want headlines like this:

Evan Morrison: Altman translated – if you don’t give Open AI free access to steal all copyrighted material by writers, musicians and filmmakers without legal repercussions then we will lose the AI race with China – a communist nation which nonetheless protects the copyright of individuals.

There are other similar and similarly motivated claims throughout.

The claim that China can circumvent some regulatory restrictions present in America is true enough, and yes that constitutes an advantage that could be critical if we do EU-style things, but the way they frame it goes beyond hyperbolic. Every industry, everywhere, would like to say ‘any requirements you place upon me make our lives harder and helps our competitors, so you need to place no restrictions on us of any kind.’

Then there’s a mix of proposals, some of which are good, presented reasonably:

Their proposal for a ‘National Transmission Highway Act’ on par with the 1956 National Interstate and Defense Highways Act seems like it should be overkill, but our regulations in these areas are deeply fed, so if as they suggest here it is focused purely on approvals I am all for that one. They also want piles of government money.

Similarly their idea of AI ‘Opportunity Zones’ is great if it only includes sidestepping permitting and various regulations. The tax incentives or ‘credit enhancements’ I see as an unnecessary handout, private industry is happy to make these investments if we clear the way.

The exception is semiconductor manufacturing, where we do need to provide the right incentives, so we will need to pay up.

Note that OpenAI emphasizes the need for solar and wind projects on top of other energy sources.

Digitization of government data currently in analog form is a great idea, we should do it for many overdetermined reasons. But to point out the obvious, are we then going to hide that data from PRC? It’s not an advantage to American AI companies if everyone gets equal access.

The Compact for AI proposal is vague but directionally seems good.

Their ‘national AI Readiness Strategy’ is part of a long line of ‘retraining’ style government initiatives that, frankly, don’t work, and also aren’t necessary here. I’m fine with expanding 529 savings plans to cover AI supply chain-related training programs, I mean sure why not, but don’t try to do much more than that. The private sector is far better equipped to handle this one, especially with AI help.

I don’t get the ‘creating AI research labs’ strategy here, it seems to be a tax on AI companies payable to universities? This doesn’t actually make economic sense at all.

The section on Government Adaptation of AI is conceptually fine, but the emphasis on private-public partnerships is telling.

Some others are even harsher than I was. Andrew Curran has similar, even blunter thoughts on both of the DeepSeek and fair use rhetorical moves.

Alexander Doria: The main reason OpenAI is calling to reinforce fair use for model training: their new models directly compete with writers, journalists, wikipedia editors. We have deep research (a “wikipedia killer”, ditto Noam Brown) and now the creative writing model.

The fundamental doctrine behind the google books transformative exception: you don’t impede on the normal commercialization of the work used. No longer really the case…

We have models trained exclusively on open data.

Gallabytes (on the attempt to ban Chinese AI models): longshoremen level scummy move. @OpenAI this is disgraceful.

As we should have learned many times in the past, most famously with the Jones Act, banning the competition is not The Way. You don’t help your industry compete, you instead risk destroying your industry’s ability to compete.

This week, we saw for example that Saudi Aramco chief says DeepSeek AI makes ‘big difference’ to operations. The correct response is to say, hey, have you tried Claude and ChatGPT, or if you need open models have you tried Gemma? Let’s turn that into a reasoning model for you.

The response that says you’re ngmi? Trying to ban DeepSeek, or saying if you don’t get exemptions from laws then ‘the race is over.’

From Peter Wildeford, seems about right:

The best steelman of OpenAI’s response I’ve seen comes from John Pressman. His argument is, yes there is cringe here – he chooses to focus here on a line about DeepSeek’s willingness to do a variety of illicit activities and a claim that this reflects CCP’s view of violating American IP law. Which is certainly another cringy line. But, he points out, the Trump administration asked how America can get ahead and stay ahead in AI, so in that context why shouldn’t OpenAI respond with a jingoistic move towards regulatory capture and a free pass to do as they want?

And yes, there is that, although his comments also reinforce that the price in ‘gesture towards open model support’ for some people to cheer untold other horrors is remarkably cheap.

This letter is part of a recurring pattern in OpenAI’s public communications.

OpenAI have issued some very good documents on the alignment and technical fronts, including their model spec and statement on alignment philosophy, as well as their recent paper on The Most Forbidden Technique. They have been welcoming of detailed feedback on those fronts. In these places they are being thoughtful and transparent, and doing some good work, and I have updated positively. OpenAI’s actual model deployment decisions have mostly been fine in practice, with some troubling signs such as the attempt to pretend GPT-4.5 was not a frontier model.

Alas, their public relations and lobbying departments, and Altman’s public statements in various places, have been consistently terrible and getting even worse over time, to the point of rather blatant vice signaling. OpenAI is intentionally presenting themselves as disingenuous jingoistic villains, seeking out active regulatory protections, doing their best to kill attempts to keep models secure, and attempting various forms of government subsidy and regulatory capture.

I get why they would think it is strategically wise to present themselves in this way, to appeal to both the current government and to investors, especially in the wake of recent ‘vibe shifts.’ So I get why one could be tempted to say, oh, they don’t actually believe any of this, they’re only being strategic, obviously not enough people will penalize them for it so they need to do it, and thus you shouldn’t penalize them for it either, that would only be spite.

I disagree. When people tell you who they are, you should believe them.

OpenAI #11: America Action Plan Read More »

ai-search-engines-cite-incorrect-sources-at-an-alarming-60%-rate,-study-says

AI search engines cite incorrect sources at an alarming 60% rate, study says

A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news sources.

Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now use AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.

Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.

A graph from CJR shows “confidently wrong” search results. Credit: CJR

For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article’s headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools.
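
To make that procedure concrete, a minimal sketch of the attribution test might look like the code below. Everything in it is hypothetical rather than the researchers’ actual code: `ask_search_tool` stands in for whichever AI search product is being queried, and the fields simply mirror the four attributes the models were asked to return.

```python
# Hypothetical sketch of the attribution test described above; not the
# researchers' actual code. `ask_search_tool` stands in for whichever
# AI search product is being evaluated.
FIELDS = ["headline", "publisher", "date", "url"]

def evaluate(excerpts, ask_search_tool):
    """Feed article excerpts to a search tool and score its attributions."""
    scores = []
    for item in excerpts:  # each item: {"excerpt": ..., "headline": ..., ...}
        prompt = (
            "Identify the headline, original publisher, publication date, "
            "and URL of the article this excerpt comes from:\n\n" + item["excerpt"]
        )
        answer = ask_search_tool(prompt)  # assumed to return a dict with FIELDS
        scores.append({field: answer.get(field) == item[field] for field in FIELDS})
    return scores
```

A scoring scheme like this is what makes the “confidently wrong” category in the graph possible: an answer can be fluent and specific while every field is simply incorrect.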

The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations—plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.

Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline uncertain responses drove higher overall error rates.

Issues with citations and publisher control

The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.
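
For context, the Robot Exclusion Protocol is just a plain-text robots.txt file that crawlers are expected to honor voluntarily. The sketch below shows what honoring it looks like using Python’s standard library; the blocked crawler name and URLs are illustrative assumptions, not taken from the study.

```python
# Minimal sketch of honoring the Robot Exclusion Protocol with Python's
# standard library; the crawler name and URLs are illustrative.
from urllib import robotparser

rules = """
User-agent: PerplexityBot
Disallow: /
"""  # a publisher blocking one crawler from the entire site

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))   # True
```

The protocol has no enforcement mechanism, which is why the study’s finding matters: nothing technical stops a crawler from fetching pages it has been asked to avoid.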

AI search engines cite incorrect sources at an alarming 60% rate, study says Read More »

openai-pushes-ai-agent-capabilities-with-new-developer-api

OpenAI pushes AI agent capabilities with new developer API

Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.
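
As a rough sketch of what this looks like from the developer side, a Responses API call with web search enabled follows the pattern below. The model name and tool identifier are assumptions based on OpenAI’s public examples rather than details from this article, and may differ in practice.

```python
# Hedged sketch of a Responses API call with web search enabled.
# The model name and tool type are assumptions and may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # let the model browse the web
    input="What does the Tow Center study say about AI search accuracy?",
)

print(response.output_text)  # answer text; source citations appear in the output items
```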

That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent, while GPT-4o mini search achieved 88 percent—both substantially outperforming the larger GPT-4.5 model without search, which scored 63 percent.

Despite these improvements, the technology still has significant limitations. Aside from issues with CUA (OpenAI’s Computer-Using Agent model) properly navigating websites, the improved search capability doesn’t completely solve the problem of AI confabulations, with GPT-4o search still making factual mistakes 10 percent of the time.

Alongside the Responses API, OpenAI released the open source Agents SDK, providing developers with free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.
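
For a sense of how lightweight the Agents SDK is meant to be, a minimal agent follows the pattern below. The package and class names mirror OpenAI’s published quickstart and should be treated as assumptions here rather than verified API details.

```python
# Minimal sketch following the openai-agents quickstart pattern
# (pip install openai-agents); names are assumptions, not verified here.
from agents import Agent, Runner

agent = Agent(
    name="Research assistant",
    instructions="Answer briefly and cite sources when you can.",
)

result = Runner.run_sync(agent, "Summarize what the Responses API adds for developers.")
print(result.final_output)
```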

These are still early days in the AI agent field, and things will likely improve rapidly. However, at the moment, the AI agent movement remains vulnerable to unrealistic claims, as demonstrated earlier this week when users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.

OpenAI pushes AI agent capabilities with new developer API Read More »

what-does-“phd-level”-ai-mean?-openai’s-rumored-$20,000-agent-plan-explained.

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained.

On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent—suggesting a leap in mathematical reasoning capabilities over the previous model.

Benchmarks vs. real-world value

Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.

The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone—indicating significant business interest despite the costs.

Meanwhile, OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.

News of OpenAI’s stratospheric pricing plans comes after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at relatively low costs. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly—both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200/month subscription is small compared to the proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.

Despite their benchmark performances, these simulated reasoning models still struggle with confabulations—instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.

In response to the news, several people quipped on social media that companies could hire an actual PhD student for much cheaper. “In case you have forgotten,” wrote xAI developer Hieu Pham in a viral tweet, “most PhD students, including the brightest stars who can do way better work than any current LLMs—are not paid $20K / month.”

While these systems show strong capabilities on specific benchmarks, the “PhD-level” label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work. On the other hand, they will never get tired or need health insurance, and they will likely continue to improve in capability and drop in cost over time.

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained. Read More »

elon-musk-loses-initial-attempt-to-block-openai’s-for-profit-conversion

Elon Musk loses initial attempt to block OpenAI’s for-profit conversion

A federal judge rejected Elon Musk’s request to block OpenAI’s planned conversion from a nonprofit to for-profit entity but expedited the case so that Musk’s core claims can be addressed in a trial before the end of this year.

Musk had filed a motion for preliminary injunction in US District Court for the Northern District of California, claiming that OpenAI’s for-profit conversion “violates the terms of Musk’s donations” to the company. But Musk failed to meet the burden of proof needed for an injunction, Judge Yvonne Gonzalez Rogers ruled yesterday.

“Plaintiffs Elon Musk, [former OpenAI board member] Shivon Zilis, and X.AI Corp. (‘xAI’) collectively move for a preliminary injunction barring defendants from engaging in various business activities, which plaintiffs claim violate federal antitrust and state law,” Rogers wrote. “The relief requested is extraordinary and rarely granted as it seeks the ultimate relief of the case on an expedited basis, with a cursory record, and without the benefit of a trial.”

Rogers said that “the Court is prepared to offer an expedited schedule on the core claims driving this litigation [to] address the issues which are allegedly more urgent in terms of public, not private, considerations.” There would be important public interest considerations if the for-profit shift is found to be illegal at a trial, she wrote.

Musk said OpenAI took advantage of him

Noting that OpenAI donors may have taken tax deductions from a nonprofit that is now turning into a for-profit enterprise, Rogers said the court “agrees that significant and irreparable harm is incurred when the public’s money is used to fund a non-profit’s conversion into a for-profit.” But as for the motion to block the for-profit conversion before a trial, “The request for an injunction barring any steps towards OpenAI’s conversion to a for-profit entity is DENIED.”

Elon Musk loses initial attempt to block OpenAI’s for-profit conversion Read More »

ai-firms-follow-deepseek’s-lead,-create-cheaper-models-with-“distillation”

AI firms follow DeepSeek’s lead, create cheaper models with “distillation”

Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI’s largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership after investing nearly $14 billion into the company.
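
For readers unfamiliar with the mechanics, distillation typically trains a small “student” model to imitate a large “teacher” model’s output distribution rather than only the ground-truth labels. The sketch below is the textbook soft-target loss, included purely as an illustration; it is not OpenAI’s or Microsoft’s actual pipeline.

```python
# Textbook knowledge-distillation loss (soft teacher targets plus hard labels),
# shown for illustration only; not any company's actual training pipeline.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a temperature-softened KL term (teacher) with cross-entropy (labels)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable as T varies
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The smaller the student, the more aggressively this imitation compresses the teacher’s behavior, which is where the capability trade-offs described below come from.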

However, OpenAI, the San Francisco-based start-up, has said it believes DeepSeek distilled its models to train a competitor, a move that would be against OpenAI’s terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts add that distilled models are more limited.

“Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, “but it really would not be good at anything else.”

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

“Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it,” he added.

That presents a challenge to many of the business models of leading AI firms. Even when developers use distilled models from companies like OpenAI, those models cost far less to run, are less expensive to create, and therefore generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models because they require less computational load.

AI firms follow DeepSeek’s lead, create cheaper models with “distillation” Read More »