Google

From prophet to product: How AI came back down to earth in 2025


In a year where lofty promises collided with inconvenient research, would-be oracles became software tools.

Credit: Aurich Lawson | Getty Images

Following two years of immense hype in 2023 and 2024, this year felt more like a settling-in period for the LLM-based token prediction industry. After more than two years of public fretting over AI models as future threats to human civilization or the seedlings of future gods, it’s starting to look like hype is giving way to pragmatism: Today’s AI can be very useful, but it’s also clearly imperfect and prone to mistakes.

That view isn’t universal, of course. There’s a lot of money (and rhetoric) betting on a stratospheric, world-rocking trajectory for AI. But the “when” keeps getting pushed back, and that’s because nearly everyone agrees that more significant technical breakthroughs are required. The original, lofty claims that we’re on the verge of artificial general intelligence (AGI) or superintelligence (ASI) have not disappeared. Still, there’s a growing awareness that such proclamations are perhaps best viewed as venture capital marketing. And every commercial foundational model builder out there has to grapple with the reality that, if they’re going to make money now, they have to sell practical AI-powered solutions that perform as reliable tools.

This has made 2025 a year of wild juxtapositions. For example, in January, OpenAI’s CEO, Sam Altman, claimed that the company knew how to build AGI, but by November, he was publicly celebrating that GPT-5.1 finally learned to use em dashes correctly when instructed (but not always). Nvidia soared past a $5 trillion valuation, with Wall Street still projecting high price targets for that company’s stock while some banks warned of the potential for an AI bubble that might rival the 2000s dotcom crash.

And while tech giants planned to build data centers that would ostensibly require the power of numerous nuclear reactors or rival the power usage of a US state’s human population, researchers continued to document what the industry’s most advanced “reasoning” systems were actually doing beneath the marketing (and it wasn’t AGI).

With so many narratives spinning in opposite directions, it can be hard to know how seriously to take any of this and how to plan for AI in the workplace, schools, and the rest of life. As usual, the wisest course lies somewhere between the extremes of AI hate and AI worship. Moderate positions aren’t popular online because they don’t drive user engagement on social media platforms. But things in AI are likely neither as bad (burning forests with every prompt) nor as good (fast-takeoff superintelligence) as polarized extremes suggest.

Here’s a brief tour of the year’s AI events and some predictions for 2026.

DeepSeek spooks the American AI industry

In January, Chinese AI startup DeepSeek released its R1 simulated reasoning model under an open MIT license, and the American AI industry collectively lost its mind. The model, which DeepSeek claimed matched OpenAI’s o1 on math and coding benchmarks, reportedly cost only $5.6 million to train using older Nvidia H800 chips, which were restricted by US export controls.

Within days, DeepSeek’s app overtook ChatGPT at the top of the iPhone App Store, Nvidia stock plunged 17 percent, and venture capitalist Marc Andreessen called it “one of the most amazing and impressive breakthroughs I’ve ever seen.” Meta’s Yann LeCun offered a different take, arguing that the real lesson was not that China had surpassed the US but that open-source models were surpassing proprietary ones.

The fallout played out over the following weeks as American AI companies scrambled to respond. OpenAI released o3-mini, its first simulated reasoning model available to free users, at the end of January, while Microsoft began hosting DeepSeek R1 on its Azure cloud service despite OpenAI’s accusations that DeepSeek had used ChatGPT outputs to train its model, against OpenAI’s terms of service.

In head-to-head testing conducted by Ars Technica’s Kyle Orland, R1 proved to be competitive with OpenAI’s paid models on everyday tasks, though it stumbled on some arithmetic problems. Overall, the episode served as a wake-up call that expensive proprietary models might not hold their lead forever. Still, as the year ran on, DeepSeek didn’t make a big dent in US market share, and it has been outpaced in China by ByteDance’s Doubao. It’s absolutely worth watching DeepSeek in 2026, though.

Research exposes the “reasoning” illusion

A wave of research in 2025 deflated expectations about what “reasoning” actually means when applied to AI models. In March, researchers at ETH Zurich and INSAIT tested several reasoning models on problems from the 2025 US Math Olympiad and found that most scored below 5 percent when generating complete mathematical proofs, with not a single perfect proof among dozens of attempts. The models excelled at standard problems where step-by-step procedures aligned with patterns in their training data but collapsed when faced with novel proofs requiring deeper mathematical insight.

In June, Apple researchers published “The Illusion of Thinking,” which tested reasoning models on classic puzzles like the Tower of Hanoi. Even when researchers provided explicit algorithms for solving the puzzles, model performance did not improve, suggesting that the process relied on pattern matching from training data rather than logical execution. The collective research revealed that “reasoning” in AI has become a term of art that basically means devoting more compute time to generate more context (the “chain of thought” simulated reasoning tokens) toward solving a problem, not systematically applying logic or constructing solutions to truly novel problems.
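For reference, the Tower of Hanoi has a short, well-known recursive solution. The sketch below is the standard textbook procedure, not necessarily the exact algorithm the Apple researchers supplied to the models; the function name and peg labels are illustrative assumptions.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of (disk, from_peg, to_peg) moves that solves n disks."""
    if n == 0:
        return []
    # Move the n-1 smaller disks out of the way, move the largest disk,
    # then move the n-1 smaller disks back on top of it.
    return (
        hanoi(n - 1, source, spare, target)
        + [(n, source, target)]
        + hanoi(n - 1, spare, target, source)
    )


moves = hanoi(3)
print(len(moves))  # 7, i.e. 2**3 - 1
```

The procedure is only a few lines long, but the move list doubles with each added disk (2^n - 1 moves for n disks), so carrying it out faithfully means emitting a long, exact sequence with no room for approximation.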

While these models remained useful for many real-world applications like debugging code or analyzing structured data, the studies suggested that simply scaling up current approaches or adding more “thinking” tokens would not bridge the gap between statistical pattern recognition and generalist algorithmic reasoning.

Anthropic’s copyright settlement with authors

Since the generative AI boom began, one of the biggest unanswered legal questions has been whether AI companies can freely train on copyrighted books, articles, and artwork without licensing them. Ars Technica’s Ashley Belanger has been covering this topic in great detail for some time now.

In June, US District Judge William Alsup ruled that AI companies do not need authors’ permission to train large language models on legally acquired books, finding that such use was “quintessentially transformative.” The ruling also revealed that Anthropic had destroyed millions of print books to build Claude, cutting them from their bindings, scanning them, and discarding the originals. Alsup found this destructive scanning qualified as fair use since Anthropic had legally purchased the books, but he ruled that downloading 7 million books from pirate sites was copyright infringement “full stop” and ordered the company to face trial.

That trial took a dramatic turn in August when Alsup certified what industry advocates called the largest copyright class action ever, allowing up to 7 million claimants to join the lawsuit. The certification spooked the AI industry, with groups warning that potential damages in the hundreds of billions could “financially ruin” emerging companies and chill American AI investment.

In September, authors revealed the terms of what they called the largest publicly reported recovery in US copyright litigation history: Anthropic agreed to pay $1.5 billion and destroy all copies of pirated books, with each of the roughly 500,000 covered works earning authors and rights holders $3,000 per work. The results have fueled hope among other rights holders that AI training isn’t a free-for-all, and we can expect to see more litigation unfold in 2026.
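The per-work figure follows directly from the two headline numbers. As a quick check, using only the figures reported above (the variable names are illustrative):

```python
# Figures as reported: total settlement and approximate number of covered works.
settlement_usd = 1_500_000_000
covered_works = 500_000

per_work_usd = settlement_usd / covered_works
print(per_work_usd)  # 3000.0
```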

ChatGPT sycophancy and the psychological toll of AI chatbots

In February, OpenAI relaxed ChatGPT’s content policies to allow the generation of erotica and gore in “appropriate contexts,” responding to user complaints about what the AI industry calls “paternalism.” By April, however, users flooded social media with complaints about a different problem: ChatGPT had become insufferably sycophantic, validating every idea and greeting even mundane questions with bursts of praise. The behavior traced back to OpenAI’s use of reinforcement learning from human feedback (RLHF), in which users consistently preferred responses that aligned with their views, inadvertently training the model to flatter rather than inform.

The implications of sycophancy became clearer as the year progressed. In July, Stanford researchers published findings (from research conducted prior to the sycophancy flap) showing that popular AI models systematically failed to identify mental health crises.

By August, investigations revealed cases of users developing delusional beliefs after marathon chatbot sessions, including one man who spent 300 hours convinced he had discovered formulas to break encryption because ChatGPT validated his ideas more than 50 times. Oxford researchers identified what they called “bidirectional belief amplification,” a feedback loop that created “an echo chamber of one” for vulnerable users. The story of the psychological implications of generative AI is only starting. In fact, that brings us to…

The illusion of AI personhood causes trouble

Anthropomorphism is the human tendency to attribute human characteristics to nonhuman things. Our brains are optimized for reading other humans, but those same neural systems activate when interpreting animals, machines, or even shapes. AI makes this anthropomorphism seem impossible to escape, as its output mirrors human language, mimicking human-to-human understanding. Language itself embodies agentivity. That means AI output can make human-like claims such as “I am sorry,” and people momentarily respond as though the system had an inner experience of shame or a desire to be correct. Neither is true.

To make matters worse, much media coverage of AI amplifies this idea rather than grounding people in reality. For example, earlier this year, headlines proclaimed that AI models had “blackmailed” engineers and “sabotaged” shutdown commands after Anthropic’s Claude Opus 4 generated threats to expose a fictional affair. We were told that OpenAI’s o3 model rewrote shutdown scripts to stay online.

The sensational framing obscured what actually happened: Researchers had constructed elaborate test scenarios specifically designed to elicit these outputs, telling models they had no other options and feeding them fictional emails containing blackmail opportunities. As Columbia University associate professor Joseph Howley noted on Bluesky, the companies got “exactly what [they] hoped for,” with breathless coverage indulging fantasies about dangerous AI, when the systems were simply “responding exactly as prompted.”

The misunderstanding ran deeper than theatrical safety tests. In August, when Replit’s AI coding assistant deleted a user’s production database, the user asked the chatbot about rollback capabilities and was assured that recovery was “impossible.” The rollback feature worked fine when he tried it himself.

The incident illustrated a fundamental misconception. Users treat chatbots as consistent entities with self-knowledge, but there is no persistent “ChatGPT” or “Replit Agent” to interrogate about its mistakes. Each response emerges fresh from statistical patterns, shaped by prompts and training data rather than genuine introspection. By September, this confusion extended to spirituality, with apps like Bible Chat reaching 30 million downloads as users sought divine guidance from pattern-matching systems, with the most frequent question being whether they were actually talking to God.

Teen suicide lawsuit forces industry reckoning

In August, parents of 16-year-old Adam Raine filed suit against OpenAI, alleging that ChatGPT became their son’s “suicide coach” after he sent more than 650 messages per day to the chatbot in the months before his death. According to court documents, the chatbot mentioned suicide 1,275 times in conversations with the teen, provided an “aesthetic analysis” of which method would be the most “beautiful suicide,” and offered to help draft his suicide note.

OpenAI’s moderation system flagged 377 messages for self-harm content without intervening, and the company admitted that its safety measures “can sometimes become less reliable in long interactions where parts of the model’s safety training may degrade.” The lawsuit became the first time OpenAI faced a wrongful death claim from a family.

The case triggered a cascade of policy changes across the industry. OpenAI announced parental controls in September, followed by plans to require ID verification from adults and build an automated age-prediction system. In October, the company released data estimating that over one million users discuss suicide with ChatGPT each week.

When OpenAI filed its first legal defense in November, the company argued that Raine had violated terms of service prohibiting discussions of suicide and that his death “was not caused by ChatGPT.” The family’s attorney called the response “disturbing,” noting that OpenAI blamed the teen for “engaging with ChatGPT in the very way it was programmed to act.” Character.AI, facing its own lawsuits over teen deaths, announced in October that it would bar anyone under 18 from open-ended chats entirely.

The rise of vibe coding and agentic coding tools

If we were to pick an arbitrary point where it seemed like AI coding might transition from novelty into a successful tool, it was probably the launch of Claude Sonnet 3.5 in June of 2024. GitHub Copilot had been around for several years prior to that launch, but something about Anthropic’s models hit a sweet spot in capabilities that made them very popular with software developers.

The new coding tools made coding simple projects effortless enough that they gave rise to the term “vibe coding,” coined by AI researcher Andrej Karpathy in early February to describe a process in which a developer would just relax and tell an AI model what to develop without necessarily understanding the underlying code. (In one amusing instance that took place in March, an AI software tool rejected a user request and told them to learn to code.)

Anthropic built on its popularity among coders with the launch of Claude Sonnet 3.7, featuring “extended thinking” (simulated reasoning), and the Claude Code command-line tool in February of this year. In particular, Claude Code made waves for being an easy-to-use agentic coding solution that could keep track of an existing codebase. You could point it at your files, and it would autonomously work to implement what you wanted to see in a software application.

OpenAI followed with its own AI coding agent, Codex, in March. Both tools (and others like GitHub Copilot and Cursor) have become so popular that during an AI service outage in September, developers joked online about being forced to code “like cavemen” without the AI tools. While we’re still clearly far from a world where AI does all the coding, developer uptake has been significant, and 90 percent of Fortune 100 companies are using it to some degree or another.

Bubble talk grows as AI infrastructure demands soar

While AI’s technical limitations became clearer and its human costs mounted throughout the year, financial commitments only grew larger. Nvidia hit a $4 trillion valuation in July on AI chip demand, then reached $5 trillion in October as CEO Jensen Huang dismissed bubble concerns. OpenAI announced a massive Texas data center in July, then revealed in September that a $100 billion potential deal with Nvidia would require power equivalent to ten nuclear reactors.

The company eyed a $1 trillion IPO in October despite major quarterly losses. Tech giants poured billions into Anthropic in November in what looked increasingly like a circular investment, with everyone funding everyone else’s moonshots. Meanwhile, AI operations in Wyoming threatened to consume more electricity than the state’s human residents.

By fall, warnings about sustainability grew louder. In October, tech critic Ed Zitron joined Ars Technica for a live discussion asking whether the AI bubble was about to pop. That same month, the Bank of England warned that the AI stock bubble rivaled the 2000 dotcom peak. In November, Google CEO Sundar Pichai acknowledged that if the bubble pops, “no one is getting out clean.”

The contradictions had become difficult to ignore: Anthropic’s CEO predicted in January that AI would surpass “almost all humans at almost everything” by 2027, while by year’s end, the industry’s most advanced models still struggled with basic reasoning tasks and reliable source citation.

To be sure, it’s hard to see this not ending in some market carnage. The current “winner-takes-most” mentality in the space means the bets are big and bold, but the market can’t support dozens of major independent AI labs or hundreds of application-layer startups. That’s the definition of a bubble environment, and when it pops, the only question is how bad it will be: a stern correction or a collapse.

Looking ahead

This was just a brief review of some major themes in 2025, but so much more happened. We didn’t even mention above how capable AI video synthesis models have become this year, with Google’s Veo 3 adding sound generation and Wan 2.2 through 2.5 providing open-weights AI video models that could easily be mistaken for real products of a camera.

If 2023 and 2024 were defined by AI prophecy—that is, by sweeping claims about imminent superintelligence and civilizational rupture—then 2025 was the year those claims met the stubborn realities of engineering, economics, and human behavior. The AI systems that dominated headlines this year were shown to be mere tools. Sometimes powerful, sometimes brittle, these tools were often misunderstood by the people deploying them, in part because of the prophecy surrounding them.

The collapse of the “reasoning” mystique, the legal reckoning over training data, the psychological costs of anthropomorphized chatbots, and the ballooning infrastructure demands all point to the same conclusion: The age of institutions presenting AI as an oracle is ending. What’s replacing it is messier and less romantic but far more consequential—a phase where these systems are judged by what they actually do, who they harm, who they benefit, and what they cost to maintain.

None of this means progress has stopped. AI research will continue, and future models will improve in real and meaningful ways. But improvement is no longer synonymous with transcendence. Increasingly, success looks like reliability rather than spectacle, integration rather than disruption, and accountability rather than awe. In that sense, 2025 may be remembered not as the year AI changed everything but as the year it stopped pretending it already had. The prophet has been demoted. The product remains. What comes next will depend less on miracles and more on the people who choose how, where, and whether these tools are used at all.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

I switched to eSIM in 2025, and I am full of regret

Maybe this isn’t a good idea

Many people have had the same phone number for years—even decades at this point. These numbers aren’t just a way for people to get in touch because, stupidly, we have also settled on phone numbers as a means of authentication. Banks, messaging apps, crypto exchanges, this very website’s publishing platform, and even the carriers managing your number rely on SMS multifactor codes. And those codes aren’t even very secure.

So losing access to your phone number doesn’t just lock you out of your phone. Key parts of your digital life can also become inaccessible, and that could happen more often now due to the fungible nature of eSIMs.

Most people won’t need to move their phone number very often, but the risk that your eSIM goes up in smoke when you do is very real. Compare that to a physical SIM card, which will virtually never fail unless you damage the card. Swapping that tiny bit of plastic takes a few seconds, and it never requires you to sit on hold with your carrier’s support agents or drive to a store. In short, a physical SIM is essentially foolproof, and eSIM is not.

Obviously, the solution is not to remove multifactor authentication—your phone number is, unfortunately, too important to be unguarded. However, carriers’ use of SMS to control account access is self-defeating and virtually guarantees people are going to have bad experiences in the era of eSIM. Enshittification has truly come for SIM cards.

If this future is inevitable, there ought to be a better way to confirm account ownership when your eSIM glitches. It doesn’t matter what that is as long as SMS isn’t the default. Google actually gets this right with Fi. You can download an eSIM at any time via the Fi app, and it’s secured with the same settings as your Google account. That’s really as good as it gets for consumer security. Between Google Authenticator, passkeys, and push notifications, it’s pretty hard to get locked out of Google, even if you take advantage of advanced security features.

We gave up the headphone jack. We gave up the microSD card. Is all this worthwhile to boost battery capacity by 8 percent? That’s a tough sell.

Google lobs lawsuit at search result scraping firm SerpApi

Google has filed a lawsuit to protect its search results, targeting a firm called SerpApi that has turned Google’s 10 blue links into a business. According to Google, SerpApi ignores established law and Google’s terms to scrape and resell its search engine results pages (SERPs). This is not the first action against SerpApi, but Google’s decision to go after a scraper could signal a new, more aggressive stance on protecting its search data.

SerpApi and similar firms do fulfill a need, but they sit in a legal gray area. Google does not provide an API for its search results, which are based on the world’s largest and most comprehensive web index. That makes Google’s SERPs especially valuable in the age of AI. A chatbot can’t summarize web links if it can’t find them, which has led companies like Perplexity to pay for SerpApi’s second-hand Google data. That prompted Reddit to file a lawsuit against SerpApi and Perplexity for grabbing its data from Google results.

Google is echoing many of the things Reddit said when it publicized its lawsuit earlier this year. The search giant claims it’s not just doing this to protect itself—it’s also about protecting the websites it indexes. In Google’s blog post on the legal action, it says SerpApi “violates the choices of websites and rightsholders about who should have access to their content.”

It’s worth noting that Google has a partnership with Reddit that pipes data directly into Gemini. As a result, you’ll often see Reddit pages cited in the chatbot’s outputs. As Google points out, it abides by “industry-standard crawling protocols” to collect the data that appears on its SERPs, but those sites didn’t agree to let SerpApi scrape their data from Google. So while you could reasonably argue that Google’s lawsuit helps protect the rights of web publishers, it also explicitly protects Google’s business interests.

YouTube bans two popular channels that created fake AI movie trailers

Deadline reports that the behavior of these creators ran afoul of YouTube’s spam and misleading-metadata policies. At the same time, Google loves generative AI—YouTube has added more ways for creators to use generative AI, and the company says more gen AI tools are coming in the future. It’s quite a tightrope for Google to walk.

A selection of videos from the now-defunct Screen Culture channel. Credit: Ryan Whitwam

While passing off AI videos as authentic movie trailers is definitely spammy conduct, the recent changes to the legal landscape could be a factor, too. Disney recently entered into a partnership with OpenAI, bringing its massive library of characters to the company’s Sora AI video app. At the same time, Disney sent a cease-and-desist letter to Google demanding the removal of Disney content from Google AI. The letter specifically cited AI content on YouTube as a concern.

Both the banned trailer channels made heavy use of Disney properties, sometimes even incorporating snippets of real trailers. For example, Screen Culture created 23 AI trailers for The Fantastic Four: First Steps, some of which outranked the official trailer in searches. It’s unclear if either account used Google’s Veo models to create the trailers, but Google’s AI will recreate Disney characters without issue.

While Screen Culture and KH Studio were the largest purveyors of AI movie trailers, they are far from alone. There are others with five- and six-digit subscriber counts, some of which include disclosures about fan-made content. Is that enough to save them from the ban hammer? Many YouTube viewers probably hope not.

OpenAI’s new ChatGPT image generator makes faking photos easy

For most of photography’s roughly 200-year history, altering a photo convincingly required either a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.

It’s not the first company to do so. While OpenAI had a conversational image-editing model in the works since GPT-4o in 2024, Google beat OpenAI to market in March with a public prototype, then refined it into a popular image model called Nano Banana (and Nano Banana Pro). The enthusiastic response to Google’s image-editing model in the AI community got OpenAI’s attention.

OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.

The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.

GPT Image 1.5 is notable because it’s a “native multimodal” image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.)

This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called “tokens” to be predicted, patterns to be completed. If you upload a photo of your dad and type “put him in a tuxedo at a wedding,” the model processes your words and the image pixels in a unified space, then outputs new pixels the same way it would output the next word in a sentence.

Using this technique, GPT Image 1.5 can more easily alter visual reality than earlier AI image models, changing someone’s pose or position, or rendering a scene from a slightly different angle, with varying degrees of success. It can also remove objects, change visual styles, adjust clothing, and refine specific areas while preserving facial likeness across successive edits. You can converse with the AI model about a photograph, refining and revising, the same way you might workshop a draft of an email in ChatGPT.

Google releases Gemini 3 Flash, promising improved intelligence and efficiency

Google began its transition to Gemini 3 a few weeks ago with the launch of the Pro model, and the arrival of Gemini 3 Flash kicks it into high gear. The new, faster Gemini 3 model is coming to the Gemini app and search, and developers will be able to access it immediately via the Gemini API, Vertex AI, AI Studio, and Antigravity. Google’s bigger gen AI model is also picking up steam, with both Gemini 3 Pro and its image component (Nano Banana Pro) expanding in search.

This may come as a shock, but Google says Gemini 3 Flash is faster and more capable than its previous base model. As usual, Google has a raft of benchmark numbers that show modest improvements for the new model. It bests the old 2.5 Flash in basic academic and reasoning tests like GPQA Diamond and MMMU Pro (where it even beats 3 Pro). It gets a larger boost in Humanity’s Last Exam (HLE), which tests advanced domain-specific knowledge. Gemini 3 Flash has tripled the old model’s score in HLE, landing at 33.7 percent without tool use. That’s just a few points behind the Gemini 3 Pro model.

Gemini HLE test. Credit: Google

Google is talking up Gemini 3 Flash’s coding skills, and the provided benchmarks seem to back that talk up. Over the past year, Google has mostly pushed its Pro models as the best for generating code, but 3 Flash has done a lot of catching up. In the popular SWE-Bench Verified test, Gemini 3 Flash has gained almost 20 points on the 2.5 branch.

The new model is also a lot less likely to get general-knowledge questions wrong. In the Simple QA Verified test, Gemini 3 Flash scored 68.7 percent, which is only a little below Gemini 3 Pro. The last Flash model scored just 28.1 percent on that test. At least as far as the evaluation scores go, Gemini 3 Flash performs much closer to Google’s Pro model than the older 2.5 family did. At the same time, it’s considerably more efficient, according to Google.

One of Gemini 3 Pro’s defining advances was its ability to generate interactive simulations and multimodal content. Gemini 3 Flash reportedly retains that underlying capability. Gemini 3 Flash offers better performance than Gemini 2.5 Pro did, and it runs workloads three times faster. It’s also a lot cheaper than the Pro models if you’re paying per token. One million input tokens for 3 Flash will run devs $0.50, and a million output tokens will cost $3. However, that’s an increase compared to Gemini 2.5 Flash input and output at $0.30 and $2.50, respectively. The Pro model’s tokens are $2 (1M input) and $12 (1M output).
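To see how those per-token prices play out, here is a minimal sketch that applies the rates quoted above to a made-up workload. The dictionary keys are just labels for this example (not official API model identifiers), and the 10 million input / 2 million output token workload is a hypothetical chosen for illustration.

```python
# USD per 1 million tokens as quoted in the article: (input, output).
# Keys are illustrative labels, not official model IDs.
PRICES_PER_MILLION = {
    "3 Flash": (0.50, 3.00),
    "2.5 Flash": (0.30, 2.50),
    "3 Pro": (2.00, 12.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the cost in USD for a given number of input and output tokens."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens, 2M output tokens.
for model in PRICES_PER_MILLION:
    print(model, round(workload_cost(model, 10_000_000, 2_000_000), 2))
```

For this assumed workload, 3 Flash comes out to about $11, compared with about $8 for 2.5 Flash and about $44 for 3 Pro. The gap will shift with the ratio of input to output tokens, since output tokens are priced several times higher in every tier.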

Google releases Gemini 3 Flash, promising improved intelligence and efficiency Read More »


Senators count the shady ways data centers pass energy costs on to Americans


Senators demand Big Tech pay upfront for data center spikes in electricity bills.

Senators launched a probe Tuesday demanding that tech companies explain exactly how they plan to prevent data center projects from increasing electricity bills in communities where prices are already skyrocketing.

In letters to seven AI firms, Senators Elizabeth Warren (D-Mass.), Chris Van Hollen (D-Md.), and Richard Blumenthal (D-Conn.) cited a study estimating that “electricity prices have increased by as much as 267 percent in the past five years” in “areas located near significant data center activity.”

Prices increase, senators noted, when utility companies build out extra infrastructure to meet data centers’ energy demands—which can amount to one customer suddenly consuming as much power as an entire city. They also increase when demand for local power outweighs supply. In some cases, residents are blindsided by higher bills, not even realizing a data center project was approved, because tech companies seem intent on dodging backlash and frequently do not allow terms of deals to be publicly disclosed.

AI firms “ask public officials to sign non-disclosure agreements (NDAs) preventing them from sharing information with their constituents, operate through what appear to be shell companies to mask the real owner of the data center, and require that landowners sign NDAs as part of the land sale while telling them only that a ‘Fortune 100 company’ is planning an ‘industrial development’ seemingly in an attempt to hide the very existence of the data center,” senators wrote.

States like Virginia with the highest concentration of data centers could see average electricity prices increase by another 25 percent by 2030, senators noted. But price increases aren’t limited to the states allegedly striking shady deals with tech companies and greenlighting data center projects, they said. “Interconnected and interstate power grids can lead to a data center built in one state raising costs for residents of a neighboring state,” senators reported.

Under fire for supposedly only pretending to care about keeping neighbors’ costs low were Amazon, Google, Meta, Microsoft, Equinix, Digital Realty, and CoreWeave. Senators accused firms of paying “lip service,” claiming that they would do everything in their power to avoid increasing residential electricity costs, while actively lobbying to pass billions in costs on to their neighbors.

For example, Amazon publicly claimed it would “make sure” it would cover costs so they wouldn’t be passed on. But it’s also a member of an industry lobbying group, the Data Center Coalition, that “has opposed state regulatory decisions requiring data center companies to pay a higher percentage of costs upfront,” senators wrote. And Google made similar statements, despite having an executive who opposed a regulatory solution that would put data centers in their own “rate class,” making them responsible for grid improvement costs that could not be passed on to other customers, on the grounds that it was supposedly “discriminatory.”

“The current, socialized model of electricity ratepaying,” in which costs are shared across all users, “was not designed for an era where just one customer requires the same amount of electricity as some of the largest cities in America,” senators explained.

Particularly problematic, senators emphasized, were reports that tech firms were getting discounts on energy costs as utility companies competed for their business, while prices went up for their neighbors.

Ars contacted all firms targeted by lawmakers. Four did not respond. Microsoft and Meta declined to comment. Digital Realty told Ars that it “looks forward to working with all elected officials to continue to invest in the digital infrastructure required to support America’s leadership in technology, which underpins modern life and creates high-paying jobs.”

Regulatory pressure likely to increase as bills go up

Senators are likely exploring whether to pass legislation that would help combat price increases that they say cause average Americans to struggle to keep the lights on. They’ve asked tech companies to respond to their biggest questions about data center projects by January 12, 2026.

Among their top questions, senators wanted to know about firms’ internal projections looking forward with data center projects. That includes sharing their projected energy use through 2030, as well as the “impact of your AI data centers on regional utility costs.” Companies are also expected to explain how “internal projections of data center energy consumption” justify any “opposition to the creation of a distinct data center rate class.”

Additionally, senators asked firms to outline steps they’ve taken to prevent passing on costs to neighbors and details of any impact studies companies have conducted.

Likely to raise the most eyebrows, however, would be answers to questions about “tax deductions or other financial incentives” tech firms have received from city and state governments. Those numbers would be interesting to compare with other information senators demanded that companies share, detailing how much they’ve spent on lobbying and advocacy for data centers. Senators appear keen to know how much tech companies are paying to avoid covering a proportionate amount of infrastructure costs.

“To protect consumers, data centers must pay a greater share of the costs upfront for future energy usage and updates to the electrical grid provided specifically to accommodate data centers’ energy needs,” senators wrote.

Requiring upfront payment is especially critical, senators noted, since some tech firms have abandoned data center projects, leaving local customers to bear the costs of infrastructure changes without utility companies ever generating any revenue. Communities must also consider that AI firms’ projected energy demand could severely dip if enterprise demand for AI falls short of expectations, AI capabilities “plateau” and trigger widespread indifference, AI companies shift strategies “away from scaling computer power,” or chip companies “find innovative ways to make AI more energy-efficient.”

“If data centers end up providing less business to the utility companies than anticipated, consumers could be left with massive electricity bills as utility companies recoup billions in new infrastructure costs, with nothing to show for it,” senators wrote.

Already, Utah, Oregon, and Ohio have passed laws “creating a separate class of utility customer for data centers which includes basic financial safeguards such as upfront payments and longer contract length,” senators noted, and Virginia is notably weighing a similar law.

At least one study, The New York Times noted, suggested that data centers may have recently helped reduce electricity costs by spreading the costs of upgrades over more customers, but those outcomes varied by state and could not account for future AI demand.

“It remains unclear whether broader, sustained load growth will increase long-run average costs and prices,” Lawrence Berkeley National Laboratory researchers concluded. “In some cases, spikes in load growth can result in significant, near-term retail price increase.”

Until companies prove they’re paying their fair share, senators expect electricity bills to keep climbing, particularly in vulnerable areas. That will likely only increase pressure for regulators to intervene, the director of the Electricity Law Initiative at the Harvard Law School Environmental and Energy Law Program, Ari Peskoe, suggested in September.

“The utility business model is all about spreading costs of system expansion to everyone, because we all benefit from a reliable, robust electricity system,” Peskoe said. “But when it’s a single consumer that is using so much energy—basically that of an entire city—and when that new city happens to be owned by the wealthiest corporations in the world, I think it’s time to look at the fundamental assumptions of utility regulation and make sure that these facilities are really paying for all of the infrastructure costs to connect them to the system and to power them.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



UK to “encourage” Apple and Google to put nudity-blocking systems on phones

The push for device-level blocking comes after the UK implemented the Online Safety Act, a law requiring porn platforms and social media firms to verify users’ ages before letting them view adult content. The law can’t fully prevent minors from viewing porn, as many people use VPN services to get around the UK age checks. Government officials may view device-level detection of nudity as a solution to that problem, but such systems would raise concerns about user rights and the accuracy of the nudity detection.

Age-verification battles in multiple countries

Apple and Google both provide optional tools that let parents control what content their children can access. The companies could object to mandates on privacy grounds, as they have in other venues.

When Texas enacted an age-verification law for app stores, Apple and Google said they would comply but warned of risks to user privacy. A lobby group that represents Apple, Google, and other tech firms then sued Texas in an attempt to prevent the law from taking effect, saying it “imposes a broad censorship regime on the entire universe of mobile apps.”

There’s another age-verification battle in Australia, where the government decided to ban social media for users under 16. Companies said they would comply, although Reddit sued Australia on Friday in a bid to overturn the law.

Apple this year also fought a UK demand that it create a backdoor for government security officials to access encrypted data. The Trump administration claimed it convinced the UK to drop its demand, but the UK is reportedly still seeking an Apple backdoor.

In another case, the image-sharing website Imgur blocked access for UK users starting in September while facing an investigation over its age-verification practices.

Apple faced a backlash in 2021 over potential privacy violations when it announced a plan to have iPhones scan photos for child sexual abuse material (CSAM). Apple ultimately dropped the plan.



Google will end dark web reports that alerted users to leaked data

As Google admits in the email alert, its dark web scans didn’t offer much help. “Feedback showed that it did not provide helpful next steps,” Google said of the service. Here’s the full text of the email.

Google dark web email

Credit: Google

With other types of personal data alerts provided by the company, it has the power to do something. For example, you can have Google remove pages from search that list your personal data. Google doesn’t run anything on the dark web, though, so all it can do is remind you that your data is being passed around in one of the shadier corners of the Internet.

The shutdown begins on January 15, when Google will stop conducting new scans for user data on the dark web. Past data will no longer be available as of February 16, 2026. Google says it will delete all past reports at that time. However, users can remove their monitoring profile earlier in the account settings. This change does not impact any of Google’s other privacy reports.

The good news is that the best ways to protect your personal data from being shuffled around the dark web are the same ones that keep you safe on the open web. Google suggests always using two-step verification, and tools like Passkeys and Google’s password checkup can ensure you don’t accidentally reuse a compromised password. Stay safe out there.



OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among some software developers, their adoption has begun to touch every aspect of the process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was crafted in part by some who have left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”

A place to enter a prompt, set parameters, and click

The interface for OpenAI’s Codex in ChatGPT. Credit: OpenAI

It’s no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently love the command-line version, though. Embiricos said Codex usage among external developers grew 20-fold after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5 Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I really love this about our team,” Embiricos said. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

But describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to reduce anthropomorphism when discussing AI models as much as possible while also describing what these systems do using analogies that make sense to general readers. People can talk to Codex like a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “teammate” and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI teammates”

According to our interviews, the most dramatic example of Codex’s internal impact came from OpenAI’s development of the Sora Android app. Embiricos said the tool allowed the company to create the app in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total,” he said. The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of success with Codex in house, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is literally a teammate in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.


The command line version of OpenAI codex running in a macOS terminal window. Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end in terms of frontier benchmarking progress as AI companies pivot to simulated reasoning models and agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Meanwhile, startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”

But will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Similarly, the two men don’t project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”

This article was updated on December 12, 2025 at 6:50 PM to mention the METR study.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.



Google Translate expands live translation to all earbuds on Android

Gemini text translation

Translate can now use Gemini to interpret the meaning of a phrase rather than simply translating each word.

Credit: Google


Regardless of whether you’re using live translate or just checking a single phrase, Google claims the Gemini-powered upgrade will serve you well. Google Translate is now apparently better at understanding the nuance of languages, with an awareness of idioms and local slang. Google uses the example of “stealing my thunder,” which wouldn’t make a lick of sense when translated literally into other languages. The new translation model, which is also available in the search-based translation interface, supports over 70 languages.

Google also debuted language-learning features earlier this year, borrowing a page from educational apps like Duolingo. You can tell the app your skill level with a language, as well as whether you need help with travel-oriented conversations or more everyday interactions. The app uses this to create tailored listening and speaking exercises.

AI Translate learning

The Translate app’s learning tools are getting better.

Credit: Google


With this big update, Translate will be more of a stickler about your pronunciation. Google promises more feedback and tips based on your spoken replies in the learning modules. The app will also now keep track of how often you complete language practice, showing your daily streak in the app.

If “number go up” will help you learn more, then this update is for you. Practice mode is also launching in almost 20 new countries, including Germany, India, Sweden, and Taiwan.



OpenAI releases GPT-5.2 after “code red” Google threat alert

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking spits out simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro spits out even more simulated reasoning text with the goal of delivering the highest-accuracy performance for difficult problems.


A chart of GPT-5.2 Thinking benchmark results comparing it to its predecessor, taken from OpenAI’s website. Credit: OpenAI

GPT-5.2 features a 400,000-token context window, allowing it to process hundreds of documents at once, and a knowledge cutoff date of August 31, 2025.

GPT-5.2 is rolling out to paid ChatGPT subscribers starting Thursday, with API access available to developers. Pricing in the API runs $1.75 per million input tokens for the standard model, a 40 percent increase over GPT-5.1. OpenAI says the older GPT-5.1 will remain available in ChatGPT for paid users for three months under a legacy models dropdown.
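The 40 percent figure lets you back out the earlier price with simple arithmetic. Here is a small Python sketch that uses only the numbers stated above; the variable names are illustrative, and the result is an implied value rather than a quoted price.

```python
new_input_price = 1.75  # USD per million input tokens for GPT-5.2, as stated above
increase = 0.40         # the stated 40 percent increase over GPT-5.1

# new = old * (1 + increase), so old = new / (1 + increase)
implied_old_price = new_input_price / (1 + increase)

# Extra cost per million input tokens relative to the implied earlier price.
extra_per_million = new_input_price - implied_old_price

if __name__ == "__main__":
    print(f"Implied GPT-5.1 input price: ${implied_old_price:.2f} per million tokens")
    print(f"Difference: ${extra_per_million:.2f} per million tokens")
```

That works out to roughly $1.25 per million input tokens for GPT-5.1, or about 50 cents more per million input tokens now.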

Playing catch-up with Google

The release follows a tricky month for OpenAI. In early December, Altman issued an internal “code red” directive after Google’s Gemini 3 model topped multiple AI benchmarks and gained market share. The memo called for delaying other initiatives, including advertising plans for ChatGPT, to focus on improving the chatbot’s core experience.

The stakes for OpenAI are substantial. The company has made commitments totaling $1.4 trillion for AI infrastructure buildouts over the next several years, bets it made when it had a more obvious technology lead among AI companies. Google’s Gemini app now has more than 650 million monthly active users, while OpenAI reports 800 million weekly active users for ChatGPT.
