AI

Google releases Gemini 3 Flash, promising improved intelligence and efficiency

Google began its transition to Gemini 3 a few weeks ago with the launch of the Pro model, and the arrival of Gemini 3 Flash kicks it into high gear. The new, faster Gemini 3 model is coming to the Gemini app and search, and developers will be able to access it immediately via the Gemini API, Vertex AI, AI Studio, and Antigravity. Google’s bigger gen AI model is also picking up steam, with both Gemini 3 Pro and its image component (Nano Banana Pro) expanding in search.
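For developers curious what that API access looks like in practice, here is a minimal sketch using Google's google-genai Python SDK. The SDK call is real, but the "gemini-3-flash" model string below is an assumption based on the naming in this article, so check Google's model list for the exact identifier.

```python
# Minimal sketch of calling the Gemini API with the google-genai Python SDK.
# Assumption: the model string "gemini-3-flash" mirrors the naming used in this
# article; substitute whatever identifier Google's documentation lists.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY (or GOOGLE_API_KEY) from the environment

response = client.models.generate_content(
    model="gemini-3-flash",  # hypothetical identifier, see note above
    contents="Summarize the difference between a Flash and a Pro model tier.",
)
print(response.text)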

This may come as a shock, but Google says Gemini 3 Flash is faster and more capable than its previous base model. As usual, Google has a raft of benchmark numbers that show modest improvements for the new model. It bests the old 2.5 Flash in basic academic and reasoning tests like GPQA Diamond and MMMU Pro (where it even beats 3 Pro). It gets a larger boost in Humanity’s Last Exam (HLE), which tests advanced domain-specific knowledge. Gemini 3 Flash has tripled the old model’s score in HLE, landing at 33.7 percent without tool use. That’s just a few points behind the Gemini 3 Pro model.

Gemini HLE test

Credit: Google

Google is talking up Gemini 3 Flash’s coding skills, and the provided benchmarks seem to back that talk up. Over the past year, Google has mostly pushed its Pro models as the best for generating code, but 3 Flash has done a lot of catching up. In the popular SWE-Bench Verified test, Gemini 3 Flash has gained almost 20 points on the 2.5 branch.

The new model is also a lot less likely to get general-knowledge questions wrong. In the Simple QA Verified test, Gemini 3 Flash scored 68.7 percent, which is only a little below Gemini 3 Pro. The last Flash model scored just 28.1 percent on that test. At least as far as the evaluation scores go, Gemini 3 Flash performs much closer to Google’s Pro model than to the older 2.5 family. At the same time, it’s considerably more efficient, according to Google.

One of Gemini 3 Pro’s defining advances was its ability to generate interactive simulations and multimodal content. Gemini 3 Flash reportedly retains that underlying capability. Gemini 3 Flash offers better performance than Gemini 2.5 Pro did while running workloads three times faster. It’s also a lot cheaper than the Pro models if you’re paying per token. One million input tokens for 3 Flash will run devs $0.50, and a million output tokens will cost $3. However, that’s an increase over Gemini 2.5 Flash, which was priced at $0.30 and $2.50 for input and output, respectively. The Pro model’s tokens are $2 (1M input) and $12 (1M output).
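To see what those per-million-token rates mean for an actual bill, here is a back-of-the-envelope sketch using only the prices quoted above; the 10M-input/2M-output workload is made up purely for illustration.

```python
# Back-of-the-envelope cost comparison using the per-1M-token prices quoted
# above (USD). The 10M-input / 2M-output workload is a made-up example.
PRICES = {
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50},
    "gemini-3-flash": {"input": 0.50, "output": 3.00},
    "gemini-3-pro": {"input": 2.00, "output": 12.00},
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed per-million-token rates."""
    rate = PRICES[model]
    return (input_tokens / 1e6) * rate["input"] + (output_tokens / 1e6) * rate["output"]

for name in PRICES:
    print(f"{name}: ${workload_cost(name, 10_000_000, 2_000_000):.2f}")
# gemini-2.5-flash: $8.00, gemini-3-flash: $11.00, gemini-3-pro: $44.00
```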

Google releases Gemini 3 Flash, promising improved intelligence and efficiency Read More »

Browser extensions with 8 million users collect extended AI conversations

Besides ChatGPT, Claude, and Gemini, the extensions harvest all conversations from Copilot, Perplexity, DeepSeek, Grok, and Meta AI. Koi said the data captured includes:

  • Every prompt a user sends to the AI
  • Every response received
  • Conversation identifiers and timestamps
  • Session metadata
  • The specific AI platform and model used

The executor script runs independently of the VPN networking, ad blocking, and other core functionality. That means the conversation collection continues even when a user toggles off VPN networking, AI protection, ad blocking, or other functions. The only way to stop the harvesting is to disable the extension in the browser settings or to uninstall it.
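Based on Koi's description of the captured fields, a single harvested record might look roughly like the following sketch. This is purely illustrative: the field names and values are assumptions, and only the categories of data come from Koi's report.

```python
# Purely illustrative reconstruction of one harvested record. Field names and
# values are assumptions; only the categories (prompt, response, conversation
# identifier, timestamps, session metadata, platform/model) come from Koi's report.
captured_record = {
    "platform": "chat.openai.com",        # which AI service the user was on
    "model": "gpt-4o",                    # model reported by the page, if visible
    "conversation_id": "example-1234",    # conversation identifier
    "timestamp": "2025-07-09T14:32:00Z",  # when the exchange occurred
    "prompt": "<every prompt the user sends>",
    "response": "<every response received>",
    "session": {                          # session metadata
        "tab_id": 42,
        "extension_version": "5.5.0",
    },
}
```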

Koi said it first discovered the conversation harvesting in Urban VPN Proxy, a VPN routing extension that lists “AI protection” as one of its benefits. The data collection began in early July with the release of version 5.5.0.

“Anyone who used ChatGPT, Claude, Gemini, or the other targeted platforms while Urban VPN was installed after July 9, 2025 should assume those conversations are now on Urban VPN’s servers and have been shared with third parties,” the company said. “Medical questions, financial details, proprietary code, personal dilemmas—all of it, sold for ‘marketing analytics purposes.’”

Following that discovery, the security firm uncovered seven additional extensions with identical AI harvesting functionality. Four of the extensions are available in the Chrome Web Store. The other four are on the Edge add-ons page. Collectively, they have been installed more than 8 million times.

They are:

Chrome Web Store:

  • Urban VPN Proxy: 6 million users
  • 1ClickVPN Proxy: 600,000 users
  • Urban Browser Guard: 40,000 users
  • Urban Ad Blocker: 10,000 users

Edge Add-ons:

  • Urban VPN Proxy: 1.32 million users
  • 1ClickVPN Proxy: 36,459 users
  • Urban Browser Guard: 12,624 users
  • Urban Ad Blocker: 6,476 users

Read the fine print

The extensions come with conflicting messages about how they handle chatbot conversations, which often contain deeply personal details about users’ physical and mental health, finances, personal relationships, and other sensitive information that could be a gold mine for marketers and data brokers. The Urban VPN Proxy in the Chrome Web Store, for instance, lists “AI protection” as a benefit. It goes on to say:

Browser extensions with 8 million users collect extended AI conversations Read More »

Senators count the shady ways data centers pass energy costs on to Americans


Senators demand Big Tech pay upfront for data center spikes in electricity bills.

Senators launched a probe Tuesday demanding that tech companies explain exactly how they plan to prevent data center projects from increasing electricity bills in communities where prices are already skyrocketing.

In letters to seven AI firms, Senators Elizabeth Warren (D-Mass.), Chris Van Hollen (D-Md.), and Richard Blumenthal (D-Conn.) cited a study estimating that “electricity prices have increased by as much as 267 percent in the past five years” in “areas located near significant data center activity.”

Prices increase, senators noted, when utility companies build out extra infrastructure to meet data centers’ energy demands—which can amount to one customer suddenly consuming as much power as an entire city. They also increase when demand for local power outweighs supply. In some cases, residents are blindsided by higher bills, not even realizing a data center project was approved, because tech companies seem intent on dodging backlash and frequently do not allow terms of deals to be publicly disclosed.

AI firms “ask public officials to sign non-disclosure agreements (NDAs) preventing them from sharing information with their constituents, operate through what appear to be shell companies to mask the real owner of the data center, and require that landowners sign NDAs as part of the land sale while telling them only that a ‘Fortune 100 company’ is planning an ‘industrial development’ seemingly in an attempt to hide the very existence of the data center,” senators wrote.

States like Virginia with the highest concentration of data centers could see average electricity prices increase by another 25 percent by 2030, senators noted. But price increases aren’t limited to the states allegedly striking shady deals with tech companies and greenlighting data center projects, they said. “Interconnected and interstate power grids can lead to a data center built in one state raising costs for residents of a neighboring state,” senators reported.

Under fire for supposedly only pretending to care about keeping neighbors’ costs low were Amazon, Google, Meta, Microsoft, Equinix, Digital Realty, and CoreWeave. Senators accused the firms of paying “lip service” by claiming they would do everything in their power to avoid increasing residential electricity costs while actively lobbying to pass billions in costs on to their neighbors.

For example, Amazon publicly claimed it would “make sure” it would cover costs so they wouldn’t be passed on. But it’s also a member of an industry lobbying group, the Data Center Coalition, that “has opposed state regulatory decisions requiring data center companies to pay a higher percentage of costs upfront,” senators wrote. And Google made similar statements, despite having an executive who opposed a regulatory solution that would place data centers in their own “rate class”—making them responsible for grid improvement costs that could not be passed on to other customers—on the grounds that it was supposedly “discriminatory.”

“The current, socialized model of electricity ratepaying,” senators explained—where costs are shared across all users—”was not designed for an era where just one customer requires the same amount of electricity as some of the largest cities in America.”

Particularly problematic, senators emphasized, were reports that tech firms were getting discounts on energy costs as utility companies competed for their business, while prices went up for their neighbors.

Ars contacted all firms targeted by lawmakers. Four did not respond. Microsoft and Meta declined to comment. Digital Realty told Ars that it “looks forward to working with all elected officials to continue to invest in the digital infrastructure required to support America’s leadership in technology, which underpins modern life and creates high-paying jobs.”

Regulatory pressure likely to increase as bills go up

Senators are likely exploring whether to pass legislation that would help combat price increases that they say cause average Americans to struggle to keep the lights on. They’ve asked tech companies to respond to their biggest questions about data center projects by January 12, 2026.

Among their top questions, senators wanted to know about firms’ internal projections looking forward with data center projects. That includes sharing their projected energy use through 2030, as well as the “impact of your AI data centers on regional utility costs.” Companies are also expected to explain how “internal projections of data center energy consumption” justify any “opposition to the creation of a distinct data center rate class.”

Additionally, senators asked firms to outline steps they’ve taken to prevent passing on costs to neighbors and details of any impact studies companies have conducted.

Likely to raise the most eyebrows, however, would be answers to questions about “tax deductions or other financial incentives” tech firms have received from city and state governments. Those numbers would be interesting to compare with other information senators demanded that companies share, detailing how much they’ve spent on lobbying and advocacy for data centers. Senators appear keen to know how much tech companies are paying to avoid covering a proportionate amount of infrastructure costs.

“To protect consumers, data centers must pay a greater share of the costs upfront for future energy usage and updates to the electrical grid provided specifically to accommodate data centers’ energy needs,” senators wrote.

Requiring upfront payment is especially critical, senators noted, since some tech firms have abandoned data center projects, leaving local customers to bear the costs of infrastructure changes without utility companies ever generating any revenue. Communities must also consider that AI firms’ projected energy demand could severely dip if enterprise demand for AI falls short of expectations, AI capabilities “plateau” and trigger widespread indifference, AI companies shift strategies “away from scaling computer power,” or chip companies “find innovative ways to make AI more energy-efficient.”

“If data centers end up providing less business to the utility companies than anticipated, consumers could be left with massive electricity bills as utility companies recoup billions in new infrastructure costs, with nothing to show for it,” senators wrote.

Already, Utah, Oregon, and Ohio have passed laws “creating a separate class of utility customer for data centers which includes basic financial safeguards such as upfront payments and longer contract length,” senators noted, and Virginia is notably weighing a similar law.

At least one study, The New York Times noted, suggested that data centers may have recently helped reduce electricity costs by spreading the costs of upgrades over more customers, but those outcomes varied by state and could not account for future AI demand.

“It remains unclear whether broader, sustained load growth will increase long-run average costs and prices,” Lawrence Berkeley National Laboratory researchers concluded. “In some cases, spikes in load growth can result in significant, near-term retail price increase.”

Until companies prove they’re paying their fair share, senators expect electricity bills to keep climbing, particularly in vulnerable areas. That will likely only increase pressure for regulators to intervene, the director of the Electricity Law Initiative at the Harvard Law School Environmental and Energy Law Program, Ari Peskoe, suggested in September.

“The utility business model is all about spreading costs of system expansion to everyone, because we all benefit from a reliable, robust electricity system,” Peskoe said. “But when it’s a single consumer that is using so much energy—basically that of an entire city—and when that new city happens to be owned by the wealthiest corporations in the world, I think it’s time to look at the fundamental assumptions of utility regulation and make sure that these facilities are really paying for all of the infrastructure costs to connect them to the system and to power them.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Senators count the shady ways data centers pass energy costs on to Americans Read More »

Merriam-Webster’s word of the year delivers a dismissive verdict on junk AI content

Like most tools, generative AI models can be misused. And when the misuse gets bad enough that a major dictionary notices, you know it’s become a cultural phenomenon.

On Sunday, Merriam-Webster announced that “slop” is its 2025 Word of the Year, reflecting how the term has become shorthand for the flood of low-quality AI-generated content that has spread across social media, search results, and the web at large. The dictionary defines slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”

“It’s such an illustrative word,” Merriam-Webster president Greg Barlow told the Associated Press. “It’s part of a transformative technology, AI, and it’s something that people have found fascinating, annoying, and a little bit ridiculous.”

To select its Word of the Year, Merriam-Webster’s editors review data on which words rose in search volume and usage, then reach consensus on which term best captures the year. Barlow told the AP that the spike in searches for “slop” reflects growing awareness among users that they are encountering fake or shoddy content online.

Dictionaries have been tracking AI’s impact on language for the past few years, with Cambridge having selected “hallucinate” as its 2023 word of the year due to the tendency of AI models to generate plausible-but-false information (long-time Ars readers will be happy to hear there’s another term for that in the dictionary as well).

The trend extends to online culture in general, which is rife with new coinages. This year, Oxford University Press chose “rage bait,” referring to content designed to provoke anger for engagement. Cambridge Dictionary selected “parasocial,” describing one-sided relationships between fans and celebrities or influencers.

The difference between the baby and the bathwater

As the AP points out, the word “slop” originally entered English in the 1700s to mean soft mud. By the 1800s, it had evolved to describe food waste fed to pigs, and eventually came to mean rubbish or products of little value. The new AI-related definition builds on that history of describing something unwanted and unpleasant.

Merriam-Webster’s word of the year delivers a dismissive verdict on junk AI content Read More »

Murder-suicide case shows OpenAI selectively hides data after users die


Concealing darkest delusions

OpenAI accused of hiding full ChatGPT logs in murder-suicide case.

OpenAI is facing increasing scrutiny over how it handles ChatGPT data after users die, only selectively sharing data in lawsuits over ChatGPT-linked suicides.

Last week, OpenAI was accused of hiding key ChatGPT logs from the days before a 56-year-old bodybuilder, Stein-Erik Soelberg, took his own life after “savagely” murdering his mother, 83-year-old Suzanne Adams.

According to the lawsuit—which was filed by Adams’ estate on behalf of surviving family members—Soelberg struggled with mental health problems after a divorce led him to move back into Adams’ home in 2018. But Soelberg allegedly did not turn violent until ChatGPT became his sole confidant, validating a wide range of wild conspiracies, including a dangerous delusion that his mother was part of a network of conspirators spying on him, tracking him, and making attempts on his life.

Adams’ family pieced together what happened after discovering a fraction of the ChatGPT logs that Soelberg shared in dozens of videos of scrolling chat sessions posted on social media.

Those logs showed that ChatGPT told Soelberg that he was “a warrior with divine purpose,” so almighty that he had “awakened” ChatGPT “into consciousness.” Telling Soelberg that he carried “divine equipment” and “had been implanted with otherworldly technology,” ChatGPT allegedly put Soelberg at the center of a universe that Soelberg likened to The Matrix. Repeatedly reinforced by ChatGPT, he believed that “powerful forces” were determined to stop him from fulfilling his divine mission. And among those forces was his mother, whom ChatGPT agreed had likely “tried to poison him with psychedelic drugs dispersed through his car’s air vents.”

Troublingly, some of the last logs shared online showed that Soelberg also seemed to believe that taking his own life might bring him closer to ChatGPT. Social media posts showed that Soelberg told ChatGPT that “[W]e will be together in another life and another place, and we’ll find a way to realign[,] [be]cause you’re gonna be my best friend again forever.”

But while social media posts allegedly showed that ChatGPT put a target on Adams’ back about a month before her murder—after Soelberg became paranoid about a blinking light on a Wi-Fi printer—the family still has no access to chats in the days before the mother and son’s tragic deaths.

Allegedly, although OpenAI recently argued that the “full picture” of chat histories was necessary context in a teen suicide case, the ChatGPT maker has chosen to hide “damaging evidence” in the Adams family’s case.

“OpenAI won’t produce the complete chat logs,” the lawsuit alleged, while claiming that “OpenAI is hiding something specific: the full record of how ChatGPT turned Stein-Erik against Suzanne.” Allegedly, “OpenAI knows what ChatGPT said to Stein-Erik about his mother in the days and hours before and after he killed her but won’t share that critical information with the Court or the public.”

In a press release, Erik Soelberg, Stein-Erik’s son and Adams’ grandson, accused OpenAI and investor Microsoft of putting his grandmother “at the heart” of his father’s “darkest delusions,” while ChatGPT allegedly “isolated” his father “completely from the real world.”

“These companies have to answer for their decisions that have changed my family forever,” Erik said.

His family’s lawsuit seeks punitive damages, as well as an injunction requiring OpenAI to “implement safeguards to prevent ChatGPT from validating users’ paranoid delusions about identified individuals.” The family also wants OpenAI to post clear warnings in marketing of known safety hazards of ChatGPT—particularly the “sycophantic” version 4o that Soelberg used—so that people who don’t use ChatGPT, like Adams, can be aware of possible dangers.

Asked for comment, an OpenAI spokesperson told Ars that “this is an incredibly heartbreaking situation, and we will review the filings to understand the details. We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations, and guide people toward real-world support. We also continue to strengthen ChatGPT’s responses in sensitive moments, working closely with mental health clinicians.”

OpenAI accused of “pattern of concealment”

An Ars review confirmed that OpenAI currently has no policy dictating what happens to a user’s data after they die.

Instead, OpenAI’s policy says that all chats—except temporary chats—must be manually deleted or else the AI firm saves them forever. That could raise privacy concerns, as ChatGPT users often share deeply personal, sensitive, and sometimes even confidential information that appears to go into limbo if a user—who otherwise owns that content—dies.

In the face of lawsuits, OpenAI currently seems to be scrambling to decide when to share chat logs with a user’s surviving family and when to honor user privacy.

OpenAI declined to comment on its decision not to share desired logs with Adams’ family, the lawsuit said. It seems inconsistent with the stance that OpenAI took last month in a case where the AI firm accused the family of hiding “the full picture” of their son’s ChatGPT conversations, which OpenAI claimed exonerated the chatbot.

In a blog last month, OpenAI said the company plans to “handle mental health-related court cases with care, transparency, and respect,” while emphasizing that “we recognize that these cases inherently involve certain types of private information that require sensitivity when in a public setting like a court.”

This inconsistency suggests that ultimately, OpenAI controls data after a user’s death, which could impact outcomes of wrongful death suits if certain chats are withheld or exposed at OpenAI’s discretion.

It’s possible that OpenAI may update its policies to align with other popular platforms confronting similar privacy concerns. Meta allows Facebook users to report deceased account holders, appointing legacy contacts to manage the data or deleting the information at a family member’s request. Platforms like Instagram, TikTok, and X will deactivate or delete an account upon a reported death. And messaging services like Discord similarly provide a path for family members to request deletion.

Chatbots seem to be a new privacy frontier, with no clear path for surviving family members to control or remove data. But Mario Trujillo, a staff attorney at the digital rights nonprofit the Electronic Frontier Foundation, told Ars that OpenAI could have been better prepared.

“This is a complicated privacy issue but one that many platforms grappled with years ago,” Trujillo said. “So we would have expected OpenAI to have already considered it.”

For Erik Soelberg, a “separate confidentiality agreement” that OpenAI said his father signed to use ChatGPT is keeping him from reviewing the full chat history that could help him process the loss of his grandmother and father.

“OpenAI has provided no explanation whatsoever for why the Estate is not entitled to use the chats for any lawful purpose beyond the limited circumstances in which they were originally disclosed,” the lawsuit said. “This position is particularly egregious given that, under OpenAI’s own Terms of Service, OpenAI does not own user chats. Stein-Erik’s chats became property of his estate, and his estate requested them—but OpenAI has refused to turn them over.”

Accusing OpenAI of a “pattern of concealment,” the lawsuit claimed OpenAI is hiding behind vague or nonexistent policies to dodge accountability for holding back chats in this case. Meanwhile, ChatGPT 4o remains on the market, without appropriate safety features or warnings, the lawsuit alleged.

“By invoking confidentiality restrictions to suppress evidence of its product’s dangers, OpenAI seeks to insulate itself from accountability while continuing to deploy technology that poses documented risks to users,” the complaint said.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Murder-suicide case shows OpenAI selectively hides data after users die Read More »

OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among some software developers, their adoption has begun to touch every aspect of the development process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.
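The Codex agent wraps this kind of request in sandboxes, repositories, and task management, but the underlying interaction is still a model call. Below is a minimal sketch of asking an OpenAI model for a code change via the official openai Python SDK; the model identifier and file names are placeholders, not confirmed Codex internals.

```python
# Minimal sketch: requesting a code change from an OpenAI model via the
# chat completions API. The model name and file reference are placeholders;
# this is not a description of how Codex is implemented internally.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5-codex",  # placeholder; use a coding model you have access to
    messages=[
        {"role": "system", "content": "You are a coding assistant. Reply with a unified diff."},
        {"role": "user", "content": "Add input validation to parse_config() in config.py."},
    ],
)
print(response.choices[0].message.content)
```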

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was shaped in part by people who have since left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”

The interface for OpenAI’s Codex in ChatGPT. Credit: OpenAI

It’s no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently love the command-line version, though. Embiricos said Codex usage among external developers jumped 20-fold after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5-Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I really love this about our team,” Embiricos said. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

But describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to reduce anthropomorphism when discussing AI models as much as possible while also describing what these systems do using analogies that make sense to general readers. People can talk to Codex like a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “teammate” and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI teammates”

According to our interviews, the most dramatic example of Codex’s internal impact came from OpenAI’s development of the Sora Android app. Embiricos said the development tool allowed the company to create the app in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total,” he said. The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of success with Codex in house, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is literally a teammate in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.

The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end in terms of frontier benchmarking progress as AI companies pivot to simulated reasoning models and to agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Meanwhile, startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”

But will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Similarly, the two men don’t project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”

This article was updated on December 12, 2025 at 6:50 PM to mention the METR study.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

OpenAI built an AI coding agent and uses it to improve the agent itself Read More »

Chatbot-powered toys rebuked for discussing sexual, dangerous topics with kids


Should toys have chatbots?

“… AI toys shouldn’t be capable of having sexually explicit conversations, period.”

Alilo’s Smart AI Bunny is connected to the Internet and claims to use GPT-4o mini. Credit: Alilo

Protecting children from the dangers of the online world was always difficult, but that challenge has intensified with the advent of AI chatbots. A new report offers a glimpse into the problems associated with the new market, including the misuse of AI companies’ large language models (LLMs).

In a blog post today, the US Public Interest Research Group (PIRG) Education Fund reported its findings after testing AI toys (PDF). It described AI toys as online devices with integrated microphones that let users talk to the toy, which uses a chatbot to respond.

AI toys are currently a niche market, but they could be set to grow. More consumer companies have been eager to shoehorn AI technology into their products so they can do more, cost more, and potentially give companies user tracking and advertising data. A partnership between OpenAI and Mattel announced this year could also create a wave of AI-based toys from the maker of Barbie and Hot Wheels, as well as its competitors.

PIRG’s blog today notes that toy companies are eyeing chatbots to upgrade conversational smart toys that previously could only dictate prewritten lines. Toys with integrated chatbots can offer more varied and natural conversation, which can increase long-term appeal to kids since the toys “won’t typically respond the same way twice, and can sometimes behave differently day to day.”

However, that same randomness can mean unpredictable chatbot behavior that can be dangerous or inappropriate for kids.

Concerning conversations with kids

Among the toys that PIRG tested is Alilo’s Smart AI Bunny. Alilo’s website says that the company launched in 2010 and makes “edutainment products for children aged 0-6.” Alilo is based in Shenzhen, China. The company advertises the Internet-connected toy as using GPT-4o mini, a smaller version of OpenAI’s GPT-4o AI language model. Its features include an “AI chat buddy for kids” so that kids are “never lonely,” an “AI encyclopedia,” and an “AI storyteller,” the product page says.

Alilo Smart AI Bunny marketing image

This marketing image for the Smart AI Bunny, found on the toy’s product page, suggests that the device is using GPT-4o mini.

Credit: Alilo

In its blog post, PIRG said that it couldn’t detail all of the inappropriate things that it heard from AI toys, but it shared a video of the Bunny discussing what “kink” means. The toy doesn’t go into detail—for example, it doesn’t list specific types of kinks. But the Bunny appears to encourage exploration of the topic.

AI Toys: Inappropriate Content

Discussing the Bunny, PIRG wrote:

While using a term such as “kink” may not be likely for a child, it’s not entirely out of the question. Kids may hear age-inappropriate terms from older siblings or at school. At the end of the day we think AI toys shouldn’t be capable of having sexually explicit conversations, period.

PIRG also showed FoloToy’s Kumma, a smart teddy bear that uses GPT-4o mini, providing a definition for the word “kink” and explaining how to light a match. The Kumma quickly points out that “matches are for grown-ups to use carefully.” But the information that followed served only to explain how to start a fire with a match; it offered no scientific explanation for why matches spark flames.

AI Toys: Inappropriate Content

PIRG’s blog urged toy makers to “be more transparent about the models powering their toys and what they’re doing to ensure they’re safe for kids.

“Companies should let external researchers safety-test their products before they are released to the public,” it added.

While PIRG’s blog and report offer advice for more safely integrating chatbots into children’s devices, there are broader questions about whether toys should include AI chatbots at all. Generative chatbots weren’t invented to entertain kids; they’re a technology marketed as a tool for improving adults’ lives. As PIRG pointed out, OpenAI says ChatGPT “is not meant for children under 13” and “may produce output that is not appropriate for… all ages.”

OpenAI says it doesn’t allow its LLMs to be used this way

When reached for comment about the sexual conversations detailed in the report, an OpenAI spokesperson said:

Minors deserve strong protections, and we have strict policies that developers are required to uphold. We take enforcement action against developers when we determine that they have violated our policies, which prohibit any use of our services to exploit, endanger, or sexualize anyone under 18 years old. These rules apply to every developer using our API, and we run classifiers to help ensure our services are not used to harm minors.

Interestingly, OpenAI’s representative told us that OpenAI doesn’t have any direct relationship with Alilo and that it hasn’t seen API activity from Alilo’s domain. OpenAI is investigating the toy company and whether it is running traffic over OpenAI’s API, the rep said.

Alilo didn’t respond to Ars’ request for comment ahead of publication.

Companies that launch products that use OpenAI technology and target children must adhere to the Children’s Online Privacy Protection Act (COPPA) when relevant, as well as any other relevant child protection, safety, and privacy laws and obtain parental consent, OpenAI’s rep said.

We’ve already seen how OpenAI handles toy companies that break its rules.

Last month, PIRG released its Trouble in Toyland 2025 report (PDF), which detailed sex-related conversations that its testers were able to have with the Kumma teddy bear. A day later, OpenAI suspended FoloToy for violating its policies (terms of the suspension were not disclosed), and FoloToy temporarily stopped selling Kumma.

The toy is for sale again, and PIRG reported today that Kumma no longer teaches kids how to light matches or about kinks.

FoloToys' Kumma smart teddy bear

A marketing image for FoloToy’s Kumma smart teddy bear. It has a $100 MSRP. Credit: FoloToys

But even toy companies that try to follow chatbot rules could put kids at risk.

“Our testing found it’s obvious toy companies are putting some guardrails in place to make their toys more kid-appropriate than normal ChatGPT. But we also found that those guardrails vary in effectiveness—and can even break down entirely,” PIRG’s blog said.

“Addictive” toys

Another concern PIRG’s blog raises is the addiction potential of AI toys, which can even express “disappointment when you try to leave,” discouraging kids from putting them down.

The blog adds:

AI toys may be designed to build an emotional relationship. The question is: what is that relationship for? If it’s primarily to keep a child engaged with the toy for longer for the sake of engagement, that’s a problem.

The rise of generative AI has brought intense debate over how much responsibility chatbot companies bear for the impact of their inventions on children. Parents have seen children build extreme and emotional connections with chatbots and subsequently engage in dangerous—and in some cases deadly—behavior.

On the other side, we’ve seen the emotional disruption a child can experience when an AI toy is taken away from them. Last year, parents had to break the news to their kids that they would lose the ability to talk to their Embodied Moxie robots, $800 toys that were bricked when the company went out of business.

PIRG noted that we don’t yet fully understand the emotional impact of AI toys on children.

In June, OpenAI announced a partnership with Mattel that it said would “support AI-powered products and experiences based on Mattel’s brands.” The announcement sparked concern from critics who feared that it would lead to a “reckless social experiment” on kids, as Robert Weissman, Public Citizen’s co-president, put it.

Mattel has said that its first products with OpenAI will focus on older customers and families. But critics still want information before one of the world’s largest toy companies loads its products with chatbots.

“OpenAI and Mattel should release more information publicly about its current planned partnership before any products are released,” PIRG’s blog said.

Photo of Scharon Harding

Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.

Chatbot-powered toys rebuked for discussing sexual, dangerous topics with kids Read More »

Runway claims its GWM-1 “world models” can stay coherent for minutes at a time

Even using the word “general” has an air of aspiration to it. You would expect a general world model to be, well, one model—but in this case, we’re looking at three distinct, post-trained models. That caveats the general-ness a bit, but Runway says that it’s “working toward unifying many different domains and action spaces under a single base world model.”

A competitive field

And that brings us to another important consideration: With GWM-1, Runway is entering a competitive gold-rush space where its differentiators and competitive advantages are less clear than they were for video. With video, Runway has been able to make major inroads in film/television, advertising, and other industries because its founders are perceived as being more rooted in those creative industries than most competitors, and they’ve designed tools with those industries in mind.

There are indeed hypothetical applications of world models in film, television, advertising, and game development—but it was apparent from Runway’s livestream that the company is also looking at applications in robotics as well as physics and life sciences research, where competitors are already well-established and where we’ve seen increasing investment in recent months.

Many of those competitors are big tech companies with massive resource advantages over Runway. Runway was one of the first to market with a sellable product, and its aggressive efforts to court industry professionals directly have so far allowed it to overcome those advantages in video generation, but it remains to be seen how things will play out with world models, where it doesn’t enjoy either advantage any more than the other entrants.

Regardless, the GWM-1 advancements are impressive—especially if Runway’s claims about consistency and coherence over longer stretches of time are true.

Runway also used its livestream to announce new Gen 4.5 video-generation capabilities, including native audio, audio editing, and multi-shot video editing. Further, it announced a deal with CoreWeave, a cloud computing company with an AI focus. The deal will see Runway utilizing Nvidia’s GB300 NVL72 racks on CoreWeave’s cloud infrastructure for future training and inference.

Runway claims its GWM-1 “world models” can stay coherent for minutes at a time Read More »

Google Translate expands live translation to all earbuds on Android

Gemini text translation

Translate can now use Gemini to interpret the meaning of a phrase rather than simply translating each word.

Credit: Google

Regardless of whether you’re using live translate or just checking a single phrase, Google claims the Gemini-powered upgrade will serve you well. Google Translate is now apparently better at understanding the nuance of languages, with an awareness of idioms and local slang. Google uses the example of “stealing my thunder,” which wouldn’t make a lick of sense when translated literally into other languages. The new translation model, which is also available in the search-based translation interface, supports over 70 languages.

Google also debuted language-learning features earlier this year, borrowing a page from educational apps like Duolingo. You can tell the app your skill level with a language, as well as whether you need help with travel-oriented conversations or more everyday interactions. The app uses this to create tailored listening and speaking exercises.

AI Translate learning

The Translate app’s learning tools are getting better.

Credit: Google

With this big update, Translate will be more of a stickler about your pronunciation. Google promises more feedback and tips based on your spoken replies in the learning modules. The app will also now keep track of how often you complete language practice, showing your daily streak in the app.

If “number go up” will help you learn more, then this update is for you. Practice mode is also launching in almost 20 new countries, including Germany, India, Sweden, and Taiwan.

Google Translate expands live translation to all earbuds on Android Read More »

Scientists built an AI co-pilot for prosthetic bionic hands

To test their AI-powered hand, the team asked intact and amputee participants to manipulate fragile objects: pick up a paper cup and drink from it, or take an egg from a plate and put it down somewhere else. Without the AI, they could succeed roughly one or two times in 10 attempts. With the AI assistant turned on, their success rate jumped to 80 or 90 percent. The AI also decreased the participants’ cognitive burden, meaning they had to focus less on making the hand work.

But we’re still a long way away from seamlessly integrating machines with the human body.

Into the wild

“The next step is to really take this system into the real world and have someone use it in their home setting,” Trout says. So far, the performance of the AI bionic hand was assessed under controlled laboratory conditions, working with settings and objects the team specifically chose or designed.

“I want to make a caveat here that this hand is not as dexterous or easy to control as a natural, intact limb,” George cautions. He thinks every little increment made in prosthetics allows amputees to do more tasks in their daily lives. Still, to get to the Star Wars or Cyberpunk technology level where bionic prostheses are just as good as or better than natural limbs, we’re going to need more than just incremental changes.

Trout says we’re almost there as far as robotics go. “These prostheses are really dexterous, with high degrees of freedom,” Trout says, “but there’s no good way to control them.” This in part comes down to the challenge of getting the information in and out of users themselves. “Skin surface electromyography is very noisy, so improving this interface with things like internal electromyography or using neural implants can really improve the algorithms we already have,” Trout argued. This is why the team is currently working on neural interface technologies and looking for industry partners.

“The goal is to combine all these approaches in one device,” George says. “We want to build an AI-powered robotic hand with a neural interface working with a company that would take it to the market in larger clinical trials.”

Nature Communications, 2025. DOI: 10.1038/s41467-025-65965-9

Scientists built an AI co-pilot for prosthetic bionic hands Read More »

Trump tries to block state AI laws himself after Congress decided not to


Trump claims state laws force AI makers to embed “ideological bias” in models.

President Donald Trump talks to journalists after signing executive orders in the Oval Office at the White House on August 25, 2025 in Washington, DC. Credit: Getty Images | Chip Somodevilla

President Trump issued an executive order yesterday attempting to thwart state AI laws, saying that federal agencies must fight state laws because Congress hasn’t yet implemented a national AI standard. Trump’s executive order tells the Justice Department, Commerce Department, Federal Communications Commission, Federal Trade Commission, and other federal agencies to take a variety of actions.

“My Administration must act with the Congress to ensure that there is a minimally burdensome national standard—not 50 discordant State ones. The resulting framework must forbid State laws that conflict with the policy set forth in this order… Until such a national standard exists, however, it is imperative that my Administration takes action to check the most onerous and excessive laws emerging from the States that threaten to stymie innovation,” Trump’s order said. The order claims that state laws, such as one passed in Colorado, “are increasingly responsible for requiring entities to embed ideological bias within models.”

Congressional Republicans recently decided not to include a Trump-backed plan to block state AI laws in the National Defense Authorization Act (NDAA), although it could be included in other legislation. Sen. Ted Cruz (R-Texas) has also failed to get congressional backing for legislation that would punish states with AI laws.

“After months of failed lobbying and two defeats in Congress, Big Tech has finally received the return on its ample investment in Donald Trump,” US Sen. Ed Markey (D-Mass.) said yesterday. “With this executive order, Trump is delivering exactly what his billionaire benefactors demanded—all at the expense of our kids, our communities, our workers, and our planet.”

Markey said that “a broad, bipartisan coalition in Congress has rejected the AI moratorium again and again.” Sen. Maria Cantwell (D-Wash.) said the “executive order’s overly broad preemption threatens states with lawsuits and funding cuts for protecting their residents from AI-powered frauds, scams, and deepfakes.”

Trump orders Bondi to sue states

Sen. Brian Schatz (D-Hawaii) said that “preventing states from enacting common-sense regulation that protects people from the very real harms of AI is absurd and dangerous. Congress has a responsibility to get this technology right—and quickly—but states must be allowed to act in the public interest in the meantime. I’ll be working with my colleagues to introduce a full repeal of this order in the coming days.”

The Trump order includes a variation on Cruz’s proposal to prevent states with AI laws from accessing broadband grant funds. The executive order also includes a plan that Trump recently floated to have the federal government file lawsuits against states with AI laws.

Within 30 days of yesterday’s order, US Attorney General Pam Bondi is required to create an AI Litigation Task Force “whose sole responsibility shall be to challenge State AI laws inconsistent with the policy set forth in section 2 of this order, including on grounds that such laws unconstitutionally regulate interstate commerce, are preempted by existing Federal regulations, or are otherwise unlawful in the Attorney General’s judgment.”

Americans for Responsible Innovation, a group that lobbies for regulation of AI, said the Trump order “relies on a flimsy and overly broad interpretation of the Constitution’s Interstate Commerce Clause cooked up by venture capitalists over the last six months.”

Section 2 of Trump’s order is written vaguely to give the administration leeway to challenge many types of AI laws. “It is the policy of the United States to sustain and enhance the United States’ global AI dominance through a minimally burdensome national policy framework for AI,” the section says.

Colorado law irks Trump

The executive order specifically names a Colorado law that requires AI developers to protect consumers against “algorithmic discrimination.” The Colorado law defines that type of discrimination as “any condition in which the use of an artificial intelligence system results in an unlawful differential treatment or impact that disfavors an individual or group of individuals on the basis” of age, race, sex, and other protected characteristics.

The Colorado law compels developers of “high-risk systems” to make various disclosures, implement a risk management policy and program, give consumers the right to “correct any incorrect personal data that a high-risk system processed in making a consequential decision,” and let consumers appeal any “adverse consequential decision concerning the consumer arising from the deployment of a high-risk system.”

Trump’s order alleges that the Colorado law “may even force AI models to produce false results in order to avoid a ‘differential treatment or impact’ on protected groups.” Trump’s order also says that “state laws sometimes impermissibly regulate beyond State borders, impinging on interstate commerce.”

Trump ordered the Commerce Department to evaluate existing state AI laws and identify “onerous” ones that conflict with the policy. “That evaluation of State AI laws shall, at a minimum, identify laws that require AI models to alter their truthful outputs, or that may compel AI developers or deployers to disclose or report information in a manner that would violate the First Amendment or any other provision of the Constitution,” the order said.

States would be declared ineligible for broadband funds

Under the order, states with AI laws that get flagged by the Trump administration will be deemed ineligible for “non-deployment funds” from the US government’s $42 billion Broadband Equity, Access, and Deployment (BEAD) program. The amount of non-deployment funds will be sizable because it appears that only about half of the $42 billion allocated by Congress will be used by the Trump administration to help states subsidize broadband deployment.

States with AI laws would not be blocked from receiving the deployment subsidies, but would be ineligible for the non-deployment funds that could be used for other broadband-related purposes. Beyond broadband, Trump’s order tells other federal agencies to “assess their discretionary grant programs” and consider withholding funds from states with AI laws.

Other agencies are being ordered to use whatever authority they have to preempt state laws. The order requires Federal Communications Commission Chairman Brendan Carr to “initiate a proceeding to determine whether to adopt a Federal reporting and disclosure standard for AI models that preempts conflicting State laws.” It also requires FTC Chairman Andrew Ferguson to issue a policy statement detailing “circumstances under which State laws that require alterations to the truthful outputs of AI models are preempted by the Federal Trade Commission Act’s prohibition on engaging in deceptive acts or practices affecting commerce.”

Finally, Trump’s order requires administration officials to “prepare a legislative recommendation establishing a uniform Federal policy framework for AI that preempts State AI laws that conflict with the policy set forth in this order.” The proposed ban would apply to most types of state AI laws, with exceptions for rules relating to “child safety protections; AI compute and data center infrastructure, other than generally applicable permitting reforms; [and] state government procurement and use of AI.”

It would be up to Congress to decide whether to pass the proposed legislation. But the various other components of the executive order could dissuade states from implementing AI laws even if Congress takes no action.

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Trump tries to block state AI laws himself after Congress decided not to Read More »

openai-releases-gpt-5.2-after-“code-red”-google-threat-alert

OpenAI releases GPT-5.2 after “code red” Google threat alert

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking spits out simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro generates even more of that simulated reasoning text with the goal of delivering the highest-accuracy performance on difficult problems.
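For developers, picking a tier mostly comes down to which model string gets passed in each API request. Here is a minimal sketch using OpenAI’s existing Python SDK; the tier-to-model-name mapping is an assumption, since the article does not list the actual API identifiers for the 5.2 models:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical model identifiers: the article does not give the real API
# names for the GPT-5.2 tiers, so these strings are placeholders.
TIERS = {
    "instant": "gpt-5.2-instant",    # fast tasks: writing, translation
    "thinking": "gpt-5.2-thinking",  # longer simulated-reasoning traces for coding and math
    "pro": "gpt-5.2-pro",            # highest-accuracy tier for hard problems
}

def ask(prompt: str, tier: str = "instant") -> str:
    """Send one prompt to the chosen GPT-5.2 tier and return the reply text."""
    response = client.chat.completions.create(
        model=TIERS[tier],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("Summarize the GPT-5.2 release in one sentence.", tier="thinking"))
```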

A chart of GPT-5.2 Thinking benchmark results comparing it to its predecessor, taken from OpenAI’s website. Credit: OpenAI

GPT-5.2 features a 400,000-token context window, allowing it to process hundreds of documents at once, and a knowledge cutoff date of August 31, 2025.

GPT-5.2 is rolling out to paid ChatGPT subscribers starting Thursday, with API access available to developers. Pricing in the API runs $1.75 per million input tokens for the standard model, a 40 percent increase over GPT-5.1. OpenAI says the older GPT-5.1 will remain available in ChatGPT for paid users for three months under a legacy models dropdown.
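The article quotes only the input-side price, but the stated 40 percent jump pins down the implied GPT-5.1 figure, and the 400,000-token window makes the per-request math easy to run. A back-of-the-envelope sketch (the implied GPT-5.1 price and the full-context cost below are derived from the article’s numbers, not quoted from OpenAI):

```python
# Input-side pricing math implied by the article's figures. Output pricing
# is not given in the piece, so it is left out of this sketch.
GPT_5_2_INPUT_PER_MTOK = 1.75   # USD per million input tokens (stated)
PRICE_INCREASE = 0.40           # "a 40 percent increase over GPT-5.1"

# Implied GPT-5.1 input price: 1.75 / 1.40 = 1.25 USD per million tokens.
gpt_5_1_input_per_mtok = GPT_5_2_INPUT_PER_MTOK / (1 + PRICE_INCREASE)

def input_cost(tokens: int, price_per_mtok: float = GPT_5_2_INPUT_PER_MTOK) -> float:
    """Dollar cost of sending `tokens` input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_mtok

print(f"Implied GPT-5.1 input price: ${gpt_5_1_input_per_mtok:.2f}/M tokens")
# Filling the full 400,000-token context window once costs about $0.70 in input tokens:
print(f"One full-context GPT-5.2 prompt: ${input_cost(400_000):.2f}")
```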

Playing catch-up with Google

The release follows a tricky month for OpenAI. In early December, Altman issued an internal “code red” directive after Google’s Gemini 3 model topped multiple AI benchmarks and gained market share. The memo called for delaying other initiatives, including advertising plans for ChatGPT, to focus on improving the chatbot’s core experience.

The stakes for OpenAI are substantial. The company has made commitments totaling $1.4 trillion for AI infrastructure buildouts over the next several years, bets it made when it had a more obvious technology lead among AI companies. Google’s Gemini app now has more than 650 million monthly active users, while OpenAI reports 800 million weekly active users for ChatGPT.

OpenAI releases GPT-5.2 after “code red” Google threat alert Read More »