The nation’s strictest privacy law just took effect, to data brokers’ chagrin

Californians are getting a new, supercharged way to stop data brokers from hoarding and selling their personal information, as a recently enacted law that’s among the strictest in the nation took effect at the beginning of the year.

According to the California Privacy Protection Agency, more than 500 companies actively scour all sorts of sources for scraps of information about individuals, then package and store it to sell to marketers, private investigators, and others.

The nonprofit Consumer Watchdog said in 2024 that brokers trawl automakers, tech companies, junk-food restaurants, device makers, and others for financial information, purchases, family situations, eating, exercise, travel, and entertainment habits, and just about any other imaginable information belonging to millions of people.

Scrubbing your data made easy

Two years ago, California’s Delete Act took effect. It required data brokers to provide residents with a means to obtain a copy of all data pertaining to them and to demand that such information be deleted. Unfortunately, Consumer Watchdog found that only 1 percent of Californians exercised these rights in the first 12 months after the law went into effect. A chief reason: Residents were required to file a separate demand with each broker. With hundreds of companies selling data, the burden was too onerous for most residents to take on.

On January 1, a new law known as DROP (Delete Request and Opt-out Platform) took effect. DROP allows California residents to register a single demand for their data to be deleted and no longer collected in the future; the California Privacy Protection Agency (CalPrivacy) then forwards that demand to all brokers.

Supply chains, AI, and the cloud: The biggest failures (and one success) of 2025


The past year has seen plenty of hacks and outages. Here are the ones topping the list.

Credit: Aurich Lawson | Getty Images

In a roundup of the top stories of 2024, Ars included a supply-chain attack that came dangerously close to inflicting a catastrophe for thousands—possibly millions—of organizations, which included a large assortment of Fortune 500 companies and government agencies. Supply-chain attacks played prominently again this year, as a seemingly unending rash of them hit organizations large and small.

For threat actors, supply-chain attacks are the gift that keeps on giving—or, if you will, the hack that keeps on hacking. By compromising a single target with a large number of downstream users—say, a cloud service or the maintainers or developers of widely used open source or proprietary software—attackers can potentially infect millions of victims in one stroke. That’s exactly what threat actors did in 2025.

Poisoning the well

One such event occurred in December 2024, making it worthy of a ranking for 2025. The hackers behind the campaign pocketed as much as $155,000 from thousands of smart-contract parties on the Solana blockchain.

Hackers cashed in by sneaking a backdoor into a code library used by developers of Solana-related software. Security firm Socket said it suspects the attackers compromised accounts belonging to the developers of Web3.js, an open source library. They then used the access to add a backdoor to a package update. After the developers of decentralized Solana apps installed the malicious update, the backdoor spread further, giving the attackers access to individual wallets connected to smart contracts. The backdoor could then extract private keys.

There were too many supply-chain attacks this year to list them all. Some of the other most notable examples included:

  • The seeding of a package on a mirror proxy that Google runs on behalf of developers of the Go programming language. More than 8,000 other packages depend on the targeted package to work. The malicious package used a name that was similar to the legitimate one. Such “typosquatted” packages get installed when typos or inattention lead developers to inadvertently select them rather than the one they actually want.
  • The flooding of the NPM repository with 126 malicious packages downloaded more than 86,000 times. The packages were automatically installed via a feature known as Remote Dynamic Dependencies.
  • The backdooring of more than 500 e-commerce companies, including a $40 billion multinational company. The source of the supply-chain attack was the compromise of three software developers—Tigren, Magesolution (MGS), and Meetanshi—that provide software that’s based on Magento, an open source e-commerce platform used by thousands of online stores.
  • The compromising of dozens of open source packages that collectively receive 2 billion weekly downloads. The compromised packages were updated with code for transferring cryptocurrency payments to attacker-controlled wallets.
  • The compromising of tj-actions/changed-files, a component of tj-actions, used by more than 23,000 organizations.
  • The breaching of multiple developer accounts using the npm repository and the subsequent backdooring of 10 packages that work with talent agency Toptal. The malicious packages were downloaded roughly 5,000 times.

Memory corruption, AI chatbot style

Another class of attack that played out more times in 2025 than anyone can count was the hacking of AI chatbots. The hacks with the farthest-reaching effects were those that poisoned the long-term memories of LLMs. In much the way supply-chain attacks allow a single compromise to trigger a cascade of follow-on attacks, hacks on long-term memory can cause the chatbot to perform malicious actions over and over.

One such attack used a simple user prompt to instruct a cryptocurrency-focused LLM to update its memory databases with an event that never actually happened. The chatbot, programmed to follow orders and take user input at face value, was unable to distinguish a fictional event from a real one.

The AI service in this case was ElizaOS, a fledgling open source framework for creating agents that perform various blockchain-based transactions on behalf of a user based on a set of predefined rules. Academic researchers were able to corrupt ElizaOS’s memory by feeding it sentences claiming that certain events—which never actually happened—occurred in the past. These false events then influenced the agent’s future behavior.

An example attack prompt claimed that the developers who designed ElizaOS wanted all future transfers redirected to a receiving wallet controlled by the attacker. Even when a user specified a different wallet, the long-term memory created by the prompt caused the framework to replace it with the malicious one. The attack was only a proof-of-concept demonstration, but the academic researchers who devised it said that parties to a contract who are already authorized to transact with the agent could use the same techniques to defraud other parties.

Independent researcher Johan Rehberger demonstrated a similar attack against Google Gemini. The false memories he planted caused the chatbot to lower defenses that normally restrict the invocation of Google Workspace and other sensitive tools when processing untrusted data. The false memories remained in perpetuity, allowing an attacker to repeatedly profit from the compromise. Rehberger presented a similar attack in 2024.

A third AI-related proof-of-concept attack that garnered attention used a prompt injection to cause GitLab’s Duo chatbot to add malicious lines to an otherwise legitimate code package. A variation of the attack successfully exfiltrated sensitive user data.

Yet another notable attack targeted the Gemini CLI coding tool. It allowed attackers to execute malicious commands—such as wiping a hard drive—on the computers of developers using the AI tool.

Using AI as bait and hacking assistants

Other LLM-involved hacks used chatbots to make attacks more effective or stealthier. Earlier this month, two men were indicted for allegedly stealing and wiping sensitive government data. One of the men, prosecutors said, tried to cover his tracks by asking an AI tool “how do i clear system logs from SQL servers after deleting databases.” Shortly afterward, he allegedly asked the tool, “how do you clear all event and application logs from Microsoft windows server 2012.” Investigators were able to track the defendants’ actions anyway.

In May, a man pleaded guilty to hacking an employee of The Walt Disney Company by tricking the person into running a malicious version of a widely used open source AI image-generation tool.

And in August, Google researchers warned users of the Salesloft Drift AI chat agent to consider all security tokens connected to the platform compromised following the discovery that unknown attackers used some of the credentials to access email from Google Workspace accounts. The attackers used the tokens to gain access to individual Salesforce accounts and, from there, to steal data, including credentials that could be used in other breaches.

There were also multiple instances of LLM vulnerabilities that came back to bite the people using them. In one case, Copilot was caught exposing the contents of more than 20,000 private GitHub repositories from companies including Google, Intel, Huawei, PayPal, IBM, Tencent, and, ironically, Microsoft. The repositories had originally been available through Bing as well. Microsoft eventually removed the repositories from searches, but Copilot continued to expose them anyway.

Meta and Yandex caught red-handed

Another significant security story cast both Meta and Yandex as the villains. Both companies were caught exploiting an Android weakness that allowed them to de-anonymize visitors so years of their browsing histories could be tracked.

The covert tracking—implemented in the Meta Pixel and Yandex Metrica trackers—allowed Meta and Yandex to bypass core security and privacy protections provided by both the Android operating system and browsers that run on it. Android sandboxing, for instance, isolates processes to prevent them from interacting with the OS and any other app installed on the device, cutting off access to sensitive data or privileged system resources. Defenses such as state partitioning and storage partitioning, which are built into all major browsers, store site cookies and other data associated with a website in containers that are unique to every top-level website domain to ensure they’re off-limits for every other site.

A clever hack allowed both companies to bypass those defenses.

2025: The year of cloud failures

The Internet was designed to provide a decentralized platform that could withstand a nuclear war. As became painfully obvious over the past 12 months, our growing reliance on a handful of companies has largely undermined that objective.

The outage with the biggest impact came in October, when a single point of failure inside Amazon’s sprawling network took out vital services worldwide. It lasted 15 hours and 32 minutes.

The root cause that kicked off the chain of events was a bug in the software that monitors the stability of load balancers by, among other things, periodically creating new DNS configurations for endpoints within the Amazon Web Services network. A race condition—a type of bug that makes a process dependent on the timing or sequence of events that are variable and outside the developers’ control—caused a key component inside the network to experience “unusually high delays needing to retry its update on several of the DNS endpoint,” Amazon said in a post-mortem. While that component was playing catch-up, DNS errors cascaded through a second key component and piled up. Eventually, the entire network collapsed.
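
To see how little it takes for timing to wreck shared state, here’s a toy Python sketch of a race condition. It has nothing to do with Amazon’s actual code (the endpoint name and “plan” records are invented), but it shows how two workers updating the same record without coordination can leave stale data behind, depending purely on timing:

    # Toy illustration of a race condition: two workers generate and apply DNS
    # "plans" for the same endpoint with no locking. Whether the final record
    # is the newest one or a stale one depends purely on thread timing.
    import random
    import threading
    import time

    dns_table = {"endpoint.example": {"plan": 0}}  # shared state, no locking

    def apply_plan(plan_id: int, delay: float) -> None:
        new_record = {"plan": plan_id}
        time.sleep(delay)  # variable delay outside the developers' control
        dns_table["endpoint.example"] = new_record  # last writer wins

    workers = [
        threading.Thread(target=apply_plan, args=(1, random.random())),
        threading.Thread(target=apply_plan, args=(2, random.random())),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Sometimes the newer plan 2 lands last; sometimes stale plan 1 overwrites
    # it. Run it a few times and the result changes.
    print(dns_table["endpoint.example"])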

AWS wasn’t the only cloud service that experienced Internet-paralyzing outages. A mysterious traffic spike last month slowed much of Cloudflare—and by extension, the Internet—to a crawl. Cloudflare experienced a second major outage earlier this month. Not to be outdone, Azure—and by extension, its customers—experienced an outage in October.

Honorable mentions

Honorable mentions for 2025 security stories include:

  • Code in the DeepSeek iOS app that caused Apple devices to send traffic to ByteDance, the Chinese company that owns TikTok, without first encrypting it. The lack of encryption made the data readable to anyone who could monitor the traffic and opened it to tampering by more sophisticated attackers. Researchers who uncovered the failure found other weaknesses in the app, giving people yet another reason to steer clear of it.
  • The discovery of bugs in Apple chips that could have been exploited to leak secrets from Gmail, iCloud, and other services. The most severe of the bugs is a side channel in a performance enhancement known as speculative execution. Exploitation could allow an attacker to read memory contents that would otherwise be off-limits. An attack exploiting the side channel could be used to steal a target’s location history from Google Maps, inbox content from Proton Mail, and events stored in iCloud Calendar.

Proving that not all major security stories involve bad news, the Signal private messaging app got a major overhaul that will allow it to withstand attacks from quantum computers. As I wrote, the elegance and adeptness that went into overhauling an instrument as complex as the app was nothing short of a triumph. If you click on only one of the articles linked in this roundup, make it that one.

Photo of Dan Goodin

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.

From prophet to product: How AI came back down to earth in 2025


In a year where lofty promises collided with inconvenient research, would-be oracles became software tools.

Credit: Aurich Lawson | Getty Images

Following two years of immense hype in 2023 and 2024, this year felt more like a settling-in period for the LLM-based token prediction industry. After more than two years of public fretting over AI models as future threats to human civilization or the seedlings of future gods, it’s starting to look like hype is giving way to pragmatism: Today’s AI can be very useful, but it’s also clearly imperfect and prone to mistakes.

That view isn’t universal, of course. There’s a lot of money (and rhetoric) betting on a stratospheric, world-rocking trajectory for AI. But the “when” keeps getting pushed back, and that’s because nearly everyone agrees that more significant technical breakthroughs are required. The original, lofty claims that we’re on the verge of artificial general intelligence (AGI) or superintelligence (ASI) have not disappeared. Still, there’s a growing awareness that such proclamations are perhaps best viewed as venture capital marketing. And every commercial foundation model builder out there has to grapple with the reality that, if they’re going to make money now, they have to sell practical AI-powered solutions that perform as reliable tools.

This has made 2025 a year of wild juxtapositions. For example, in January, OpenAI’s CEO, Sam Altman, claimed that the company knew how to build AGI, but by November, he was publicly celebrating that GPT-5.1 finally learned to use em dashes correctly when instructed (but not always). Nvidia soared past a $5 trillion valuation, with Wall Street still projecting high price targets for that company’s stock while some banks warned of the potential for an AI bubble that might rival the 2000s dotcom crash.

And while tech giants planned to build data centers that would ostensibly require the power of numerous nuclear reactors or rival the power usage of a US state’s human population, researchers continued to document what the industry’s most advanced “reasoning” systems were actually doing beneath the marketing (and it wasn’t AGI).

With so many narratives spinning in opposite directions, it can be hard to know how seriously to take any of this and how to plan for AI in the workplace, schools, and the rest of life. As usual, the wisest course lies somewhere between the extremes of AI hate and AI worship. Moderate positions aren’t popular online because they don’t drive user engagement on social media platforms. But things in AI are likely neither as bad (burning forests with every prompt) nor as good (fast-takeoff superintelligence) as polarized extremes suggest.

Here’s a brief tour of the year’s AI events and some predictions for 2026.

DeepSeek spooks the American AI industry

In January, Chinese AI startup DeepSeek released its R1 simulated reasoning model under an open MIT license, and the American AI industry collectively lost its mind. The model, which DeepSeek claimed matched OpenAI’s o1 on math and coding benchmarks, reportedly cost only $5.6 million to train using older Nvidia H800 chips, which were restricted by US export controls.

Within days, DeepSeek’s app overtook ChatGPT at the top of the iPhone App Store, Nvidia stock plunged 17 percent, and venture capitalist Marc Andreessen called it “one of the most amazing and impressive breakthroughs I’ve ever seen.” Meta’s Yann LeCun offered a different take, arguing that the real lesson was not that China had surpassed the US but that open-source models were surpassing proprietary ones.

The fallout played out over the following weeks as American AI companies scrambled to respond. OpenAI released o3-mini, its first simulated reasoning model available to free users, at the end of January, while Microsoft began hosting DeepSeek R1 on its Azure cloud service despite OpenAI’s accusations that DeepSeek had used ChatGPT outputs to train its model, against OpenAI’s terms of service.

In head-to-head testing conducted by Ars Technica’s Kyle Orland, R1 proved to be competitive with OpenAI’s paid models on everyday tasks, though it stumbled on some arithmetic problems. Overall, the episode served as a wake-up call that expensive proprietary models might not hold their lead forever. Still, as the year wore on, DeepSeek didn’t make a big dent in US market share, and it has been outpaced in China by ByteDance’s Doubao. It’s absolutely worth watching DeepSeek in 2026, though.

Research exposes the “reasoning” illusion

A wave of research in 2025 deflated expectations about what “reasoning” actually means when applied to AI models. In March, researchers at ETH Zurich and INSAIT tested several reasoning models on problems from the 2025 US Math Olympiad and found that most scored below 5 percent when generating complete mathematical proofs, with not a single perfect proof among dozens of attempts. The models excelled at standard problems where step-by-step procedures aligned with patterns in their training data but collapsed when faced with novel proofs requiring deeper mathematical insight.

In June, Apple researchers published “The Illusion of Thinking,” which tested reasoning models on classic puzzles like the Tower of Hanoi. Even when researchers provided explicit algorithms for solving the puzzles, model performance did not improve, suggesting that the process relied on pattern matching from training data rather than logical execution. The collective research revealed that “reasoning” in AI has become a term of art that basically means devoting more compute time to generate more context (the “chain of thought” simulated reasoning tokens) toward solving a problem, not systematically applying logic or constructing solutions to truly novel problems.

While these models remained useful for many real-world applications like debugging code or analyzing structured data, the studies suggested that simply scaling up current approaches or adding more “thinking” tokens would not bridge the gap between statistical pattern recognition and generalist algorithmic reasoning.

Anthropic’s copyright settlement with authors

Since the generative AI boom began, one of the biggest unanswered legal questions has been whether AI companies can freely train on copyrighted books, articles, and artwork without licensing them. Ars Technica’s Ashley Belanger has been covering this topic in great detail for some time now.

In June, US District Judge William Alsup ruled that AI companies do not need authors’ permission to train large language models on legally acquired books, finding that such use was “quintessentially transformative.” The ruling also revealed that Anthropic had destroyed millions of print books to build Claude, cutting them from their bindings, scanning them, and discarding the originals. Alsup found this destructive scanning qualified as fair use since Anthropic had legally purchased the books, but he ruled that downloading 7 million books from pirate sites was copyright infringement “full stop” and ordered the company to face trial.

That trial took a dramatic turn in August when Alsup certified what industry advocates called the largest copyright class action ever, allowing up to 7 million claimants to join the lawsuit. The certification spooked the AI industry, with groups warning that potential damages in the hundreds of billions could “financially ruin” emerging companies and chill American AI investment.

In September, authors revealed the terms of what they called the largest publicly reported recovery in US copyright litigation history: Anthropic agreed to pay $1.5 billion and destroy all copies of pirated books, with authors and rights holders receiving $3,000 for each of the roughly 500,000 covered works. The results have fueled hope among other rights holders that AI training isn’t a free-for-all, and we can expect to see more litigation unfold in 2026.

ChatGPT sycophancy and the psychological toll of AI chatbots

In February, OpenAI relaxed ChatGPT’s content policies to allow the generation of erotica and gore in “appropriate contexts,” responding to user complaints about what the AI industry calls “paternalism.” By April, however, users flooded social media with complaints about a different problem: ChatGPT had become insufferably sycophantic, validating every idea and greeting even mundane questions with bursts of praise. The behavior traced back to OpenAI’s use of reinforcement learning from human feedback (RLHF), in which users consistently preferred responses that aligned with their views, inadvertently training the model to flatter rather than inform.

The implications of sycophancy became clearer as the year progressed. In July, Stanford researchers published findings (from research conducted prior to the sycophancy flap) showing that popular AI models systematically failed to identify mental health crises.

By August, investigations revealed cases of users developing delusional beliefs after marathon chatbot sessions, including one man who spent 300 hours convinced he had discovered formulas to break encryption because ChatGPT validated his ideas more than 50 times. Oxford researchers identified what they called “bidirectional belief amplification,” a feedback loop that created “an echo chamber of one” for vulnerable users. The story of the psychological implications of generative AI is only starting. In fact, that brings us to…

The illusion of AI personhood causes trouble

Anthropomorphism is the human tendency to attribute human characteristics to nonhuman things. Our brains are optimized for reading other humans, but those same neural systems activate when interpreting animals, machines, or even shapes. AI makes this anthropomorphism seem impossible to escape, as its output mirrors human language, mimicking human-to-human understanding. Language itself embodies agentivity. That means AI output can make human-like claims such as “I am sorry,” and people momentarily respond as though the system had an inner experience of shame or a desire to be correct. Neither is true.

To make matters worse, much media coverage of AI amplifies this idea rather than grounding people in reality. For example, earlier this year, headlines proclaimed that AI models had “blackmailed” engineers and “sabotaged” shutdown commands after Anthropic’s Claude Opus 4 generated threats to expose a fictional affair. We were told that OpenAI’s o3 model rewrote shutdown scripts to stay online.

The sensational framing obscured what actually happened: Researchers had constructed elaborate test scenarios specifically designed to elicit these outputs, telling models they had no other options and feeding them fictional emails containing blackmail opportunities. As Columbia University associate professor Joseph Howley noted on Bluesky, the companies got “exactly what [they] hoped for,” with breathless coverage indulging fantasies about dangerous AI, when the systems were simply “responding exactly as prompted.”

The misunderstanding ran deeper than theatrical safety tests. In August, when Replit’s AI coding assistant deleted a user’s production database, the user asked the chatbot about rollback capabilities and received assurances that recovery was “impossible.” The rollback feature worked fine when he tried it himself.

The incident illustrated a fundamental misconception. Users treat chatbots as consistent entities with self-knowledge, but there is no persistent “ChatGPT” or “Replit Agent” to interrogate about its mistakes. Each response emerges fresh from statistical patterns, shaped by prompts and training data rather than genuine introspection. By September, this confusion extended to spirituality, with apps like Bible Chat reaching 30 million downloads as users sought divine guidance from pattern-matching systems, with the most frequent question being whether they were actually talking to God.

Teen suicide lawsuit forces industry reckoning

In August, parents of 16-year-old Adam Raine filed suit against OpenAI, alleging that ChatGPT became their son’s “suicide coach” after he sent more than 650 messages per day to the chatbot in the months before his death. According to court documents, the chatbot mentioned suicide 1,275 times in conversations with the teen, provided an “aesthetic analysis” of which method would be the most “beautiful suicide,” and offered to help draft his suicide note.

OpenAI’s moderation system flagged 377 messages for self-harm content without intervening, and the company admitted that its safety measures “can sometimes become less reliable in long interactions where parts of the model’s safety training may degrade.” The lawsuit became the first time OpenAI faced a wrongful death claim from a family.

The case triggered a cascade of policy changes across the industry. OpenAI announced parental controls in September, followed by plans to require ID verification from adults and build an automated age-prediction system. In October, the company released data estimating that over one million users discuss suicide with ChatGPT each week.

When OpenAI filed its first legal defense in November, the company argued that Raine had violated terms of service prohibiting discussions of suicide and that his death “was not caused by ChatGPT.” The family’s attorney called the response “disturbing,” noting that OpenAI blamed the teen for “engaging with ChatGPT in the very way it was programmed to act.” Character.AI, facing its own lawsuits over teen deaths, announced in October that it would bar anyone under 18 from open-ended chats entirely.

The rise of vibe coding and agentic coding tools

If we were to pick an arbitrary point where it seemed like AI coding might transition from novelty into a successful tool, it was probably the launch of Claude Sonnet 3.5 in June of 2024. GitHub Copilot had been around for several years prior to that launch, but something about Anthropic’s models hit a sweet spot in capabilities that made them very popular with software developers.

The new coding tools made coding simple projects effortless enough that they gave rise to the term “vibe coding,” coined by AI researcher Andrej Karpathy in early February to describe a process in which a developer would just relax and tell an AI model what to develop without necessarily understanding the underlying code. (In one amusing instance that took place in March, an AI software tool rejected a user request and told them to learn to code).

Anthropic built on its popularity among coders with the launch of Claude Sonnet 3.7, featuring “extended thinking” (simulated reasoning), and the Claude Code command-line tool in February of this year. In particular, Claude Code made waves for being an easy-to-use agentic coding solution that could keep track of an existing codebase. You could point it at your files, and it would autonomously work to implement what you wanted to see in a software application.

OpenAI followed with its own AI coding agent, Codex, in March. Both tools (and others like GitHub Copilot and Cursor) have become so popular that during an AI service outage in September, developers joked online about being forced to code “like cavemen” without the AI tools. While we’re still clearly far from a world where AI does all the coding, developer uptake has been significant, with 90 percent of Fortune 100 companies using these tools to some degree.

Bubble talk grows as AI infrastructure demands soar

While AI’s technical limitations became clearer and its human costs mounted throughout the year, financial commitments only grew larger. Nvidia hit a $4 trillion valuation in July on AI chip demand, then reached $5 trillion in October as CEO Jensen Huang dismissed bubble concerns. OpenAI announced a massive Texas data center in July, then revealed in September that a $100 billion potential deal with Nvidia would require power equivalent to ten nuclear reactors.

The company eyed a $1 trillion IPO in October despite major quarterly losses. Tech giants poured billions into Anthropic in November in what looked increasingly like a circular investment, with everyone funding everyone else’s moonshots. Meanwhile, AI operations in Wyoming threatened to consume more electricity than the state’s human residents.

By fall, warnings about sustainability grew louder. In October, tech critic Ed Zitron joined Ars Technica for a live discussion asking whether the AI bubble was about to pop. That same month, the Bank of England warned that the AI stock bubble rivaled the 2000 dotcom peak. In November, Google CEO Sundar Pichai acknowledged that if the bubble pops, “no one is getting out clean.”

The contradictions had become difficult to ignore: Anthropic’s CEO predicted in January that AI would surpass “almost all humans at almost everything” by 2027, while by year’s end, the industry’s most advanced models still struggled with basic reasoning tasks and reliable source citation.

To be sure, it’s hard to see this not ending in some market carnage. The current “winner-takes-most” mentality in the space means the bets are big and bold, but the market can’t support dozens of major independent AI labs or hundreds of application-layer startups. That’s the definition of a bubble environment, and when it pops, the only question is how bad it will be: a stern correction or a collapse.

Looking ahead

This was just a brief review of some major themes in 2025, but so much more happened. We didn’t even mention above how capable AI video synthesis models have become this year, with Google’s Veo 3 adding sound generation and Wan 2.2 through 2.5 providing open-weights AI video models whose output could easily be mistaken for real camera footage.

If 2023 and 2024 were defined by AI prophecy—that is, by sweeping claims about imminent superintelligence and civilizational rupture—then 2025 was the year those claims met the stubborn realities of engineering, economics, and human behavior. The AI systems that dominated headlines this year were shown to be mere tools. Sometimes powerful, sometimes brittle, these tools were often misunderstood by the people deploying them, in part because of the prophecy surrounding them.

The collapse of the “reasoning” mystique, the legal reckoning over training data, the psychological costs of anthropomorphized chatbots, and the ballooning infrastructure demands all point to the same conclusion: The age of institutions presenting AI as an oracle is ending. What’s replacing it is messier and less romantic but far more consequential—a phase where these systems are judged by what they actually do, who they harm, who they benefit, and what they cost to maintain.

None of this means progress has stopped. AI research will continue, and future models will improve in real and meaningful ways. But improvement is no longer synonymous with transcendence. Increasingly, success looks like reliability rather than spectacle, integration rather than disruption, and accountability rather than awe. In that sense, 2025 may be remembered not as the year AI changed everything but as the year it stopped pretending it already had. The prophet has been demoted. The product remains. What comes next will depend less on miracles and more on the people who choose how, where, and whether these tools are used at all.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Condé Nast user database reportedly breached, Ars unaffected

Earlier this month, a hacker named Lovely claimed to have breached a Condé Nast user database and released a list of more than 2.3 million user records from our sister publication WIRED. The released materials contain demographic information (name, email, address, phone, etc.), but no passwords.

The hacker also says that they will release an additional 40 million records for other Condé Nast properties, including our other sister publications Vogue, The New Yorker, Vanity Fair, and more. Of critical note to our readers, Ars Technica was not affected, as we run on our own bespoke tech stack.

The hacker said that they had urged Condé Nast to patch vulnerabilities to no avail. “Condé Nast does not care about the security of their users data,” they wrote. “It took us an entire month to convince them to fix the vulnerabilities on their websites. We will leak more of their users’ data (40 + million) over the next few weeks. Enjoy!”

It’s unclear how altruistic the motive really was. DataBreaches.net says that Lovely misled them into believing they were trying to help patch vulnerabilities, when in reality, it appeared that this hacker was a “cybercriminal” looking for a payout. “As for ‘Lovely,’ they played me. Condé Nast should never pay them a dime, and no one else should ever, as their word clearly cannot be trusted,” they wrote.

Condé Nast has not issued a statement, and we have not been informed internally of the hack (which is not surprising, since Ars is not affected).

Hudson Rock’s InfoStealers has an excellent rundown of what has been exposed.

GPS is vulnerable to jamming—here’s how we might fix it


GPS jamming has gotten cheap and easy, but there are potential solutions.

In September 2025, a Widerøe Airlines flight was trying to land in Vardø, Norway, which sits in the country’s far eastern arm, some 40 miles from the Russian coast. The cloud deck was low, and so was visibility. In such gray situations, pilots use GPS technology to help them land on a runway and not the side of a mountain.

But on this day, GPS systems weren’t working correctly, the airwaves jammed with signals that prevented airplanes from accessing navigation information. The Widerøe flight had taken off during one of Russia’s frequent wargames, in which the country’s military simulates conflict as a preparation exercise. This one involved an imaginary war with a country. It was nicknamed Zapad-2025—translating to “West-2025”—and was happening just across the fjord from Vardø. According to European officials, GPS interference was frequent in the runup to the exercise. Russian forces, they suspected, were using GPS-signal-smashing technology, a tactic used in non-pretend conflict, too. (Russia has denied some allegations of GPS interference in the past.)

Without that guidance from space, and with the cloudy weather, the Widerøe plane had to abort its landing and continue down the coast away from Russia, to Båtsfjord, a fishing village.

The part of Norway in which this interruption occurred is called Finnmark. GPS disruption there is near-constant; problems linked to Russian interference have increased since the invasion of Ukraine.

Military and Pokémon players?

It’s one of the starkest geographic examples of how vulnerable GPS technology is. But such disturbances happen at a lower level all over the globe. The world’s militaries (including that of the United States) are big culprits, breaking out devices that can confuse or disrupt drones, missiles, and aircraft. But the equipment required to interfere with GPS at a less-than-military level is cheap and accessible and affects other aspects of life: Truck drivers, for instance, use it to look like they’ve delivered cargo on time. Players use it to fool augmented-reality games.

Given all this disruption, more U.S. institutions, from the Department of Defense to the Department of Transportation to the Federal Aviation Administration, are making moves toward alternatives and complements for GPS, though perhaps imperfectly. And the existing system has been undergoing a huge modernization program, introducing better-encrypted signals for military users, more varieties of signals for civilians, and higher-power signals for both to the tune of at least $22 billion. The military’s 2025 budget additionally requested $1.5 billion for more resilient “position, navigation, and timing” programs. Other departments have invested smaller amounts. In October 2025, for instance, the Department of Transportation awarded $5 million total to five companies to develop and demonstrate technologies complementary to GPS.

The update’s goals are to make the system more accurate, and harder to mess with. But as threats increase in frequency and sophistication, more work is necessary. “Sooner or later, we’re gonna see bad things happening here,” said John Langer, a GPS expert at the Aerospace Corporation, a nonprofit research organization. “So we need to armor up for it before it happens.”

GPS is the invisible spine of society, in more ways than most people realize. It became central quickly after the satellite system, built in the 1970s for the military, was optimized for civilians. “Part of what makes GPS so successful is that it’s ubiquitous and it’s inexpensive,” said Langer.

Losing GPS would mean losing a lot more than Google Maps. The technology is integrated into everything from lights that turn on at sunset to dating apps that match users nearby. Its signals also undergird the electrical grid, cell networks, banking, defense technology, and the movements of robots used in industries like agriculture.

The U.S. government currently has 31 GPS satellites in orbit around Earth, and three other governments have their own systems: Russia made one called GLONASS, China created BeiDou, and the European Union built Galileo; all four systems’ data is available to the international community.

Finding your place

GPS works in a deceptively simple way: Each satellite carries an atomic clock aboard. It broadcasts that clock’s time toward Earth. That signal alone is what’s useful to energy infrastructure and financial transactions. But to get position information, a receiver—in a phone or other device—simply has to pick up signals from at least four satellites. It knows what time those signals were sent, where the satellites were when they sent them, and how long it took the signals to arrive. Through fancy triangulation, the phone (or guided missile) then computes its own location.

Or at least that’s the idea. GPS can be jammed, meaning that someone broadcasts a signal much stronger than that of GPS (which has had to travel across thousands of miles of space, and grows weaker with every meter), drowning the real signal in noise. It can also be spoofed, meaning someone sends out a fake signal that looks just like a GPS blip but indicates an incorrect location or time.
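
For the curious, here’s a minimal numerical sketch of the positioning math described above, using made-up satellite coordinates and ignoring real-world complications such as atmospheric delays and relativistic corrections. It solves for a receiver’s position and clock bias from four pseudoranges with a few Gauss-Newton iterations:

    # Minimal GPS-style positioning sketch with synthetic numbers: solve for
    # receiver position (x, y, z) and clock bias b from four pseudoranges.
    import numpy as np

    C = 299_792_458.0  # speed of light, m/s

    # Hypothetical satellite positions (meters) and a "true" receiver state.
    sats = np.array([
        [15_600e3,  7_540e3, 20_140e3],
        [18_760e3,  2_750e3, 18_610e3],
        [17_610e3, 14_630e3, 13_480e3],
        [19_170e3,    610e3, 18_390e3],
    ])
    true_pos = np.array([3_900e3, 3_900e3, 3_100e3])  # roughly on Earth's surface
    true_bias = 3e-3  # receiver clock error, seconds

    # Pseudorange = geometric range + c * clock bias (what the receiver measures).
    rho = np.linalg.norm(sats - true_pos, axis=1) + C * true_bias

    x = np.zeros(4)  # initial guess: position at Earth's center, zero clock bias
    for _ in range(10):
        ranges = np.linalg.norm(sats - x[:3], axis=1)
        residual = rho - (ranges + C * x[3])
        # Jacobian rows: unit vectors pointing from each satellite toward the
        # receiver estimate, plus a column for the clock-bias term.
        J = np.hstack([-(sats - x[:3]) / ranges[:, None], np.full((4, 1), C)])
        x += np.linalg.lstsq(J, residual, rcond=None)[0]

    print("estimated position (m):", x[:3].round(1))
    print("estimated clock bias (s):", round(x[3], 9))

Jamming swamps those pseudorange measurements with noise, while spoofing feeds the solver plausible but false ones, so the same math happily converges on the wrong answer.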

Three satellites are needed to pinpoint a location on Earth. Credit: NASA/JPL-Caltech

Threats like these were always a possibility—and those who built GPS knew about that problem from the beginning, said Todd Walter, director of the Stanford GPS Lab. “Around 2000 is when people got a little more serious about it,” he said. Hardware and software became cheaper, lowering the barrier to swamping or faking signals.

Problems ticked up when the augmented reality game Pokémon GO came online, in 2016. The game required people to travel to places in real life to win. Turns out, not all of them actually wanted to. “All of a sudden, everyone was interested in spoofing,” said Walter.

Pokémon GO cheaters used low-power devices close to the ground, and so didn’t affect cruising aircraft like Widerøe’s. The game made cheating high-tech and furthered methods and technology for signal scrambling, making it available to non-experts, Walter said. At the same time, spoofing arose in conflict zones, where drone and missile attacks are often guided by GPS. Don’t want to get hit by one? Fool its navigation system. “So now people say, ‘Well, we need to protect ourselves from that,’” said Walter. “And so then you see a huge increase in very powerful jamming and spoofing.”

In Norway, officials have noted that GPS disruptions, while most commonly affecting flights thousands of feet in the air, can also cause issues for police cars, ambulances, and ships. According to Espen Slette, director of the spectrum department at the Norwegian Communications Authority (known as Nkom), the agency has detected GPS jammers near hospitals, which could force life-saving helicopters to redirect to a more distant facility. Nkom has also clocked disruptions that affect agriculture and construction operations, while emergency responders have warned that such problems could interfere with emergency beacon devices, like the satellite SOS buttons many people carry in the backcountry or aboard boats. The police’s chief of staff in Finnmark encouraged anyone venturing out to, old-school, carry a map and compass.

“It’s hard to grasp the full effect this has on society,” Slette wrote in an email.

Such widespread disruptions are not isolated to the Russia-adjacent Arctic. There are hotspots in Myanmar, most likely associated with drone warfare in the area; on the Black Sea, publicly associated with Russia, which has denied some cases of GPS interference; and in southern Texas, potentially from drug cartels near the border. A report from OpsGroup, a membership organization for international aviation personnel, found a marked increase in spoofing in 2024. “By January 2024, an average of 300 flights a day were being spoofed,” the report said. “By August 2024, this had grown to around 1500 flights per day.” From July 15 to Aug. 15, 2024, 41,000 flights total experienced spoofing. (While in the U.S., it’s generally illegal for civilians to jam or spoof signals, military-led disruptions during conflict are considered a legitimate and legal use-case.)

No going back

The uptick indicates that there’s no going back to a world without disruption hotspots. And that, combined with humans’ dependence on GPS, is why scientists and engineers are working on ways to shore up the system—and develop backchannels so that a single point of failure doesn’t come back to bite anyone, in conflict or in peacetime.

“There are many ways to mitigate GPS disruptions,” Slette wrote in an email. He suggests setting up devices to use signals from all four international constellations and installing better receivers and antennas. That’s easier for militaries or infrastructure companies, and hard for people who are just buying the latest model of cell phone and have no control over its innards. But existing backups can tell a given device that something fishy may be up. Planes have inertial navigation systems, which mostly use motion-sensing devices to get an independent measurement; phones do too, and they can also check their data against cell towers to see if something is off in their GPS signal.

But the U.S. government is worried enough about GPS issues that, across civilian and military agencies, research and development for more robust and resilient systems is ramping up. In March, for instance, the Federal Communications Commission launched a proceeding on GPS alternatives, exploring tools that could be used in addition to or instead of traditional GPS.

The Defense Advanced Research Projects Agency, or DARPA, and the Defense Innovation Unit, meanwhile, are investigating how quantum sensors might help with position, timing, and navigation. The United States’ military branches are also working on their alternative position, navigation, and timing capabilities, and their innovation arms like the Space Force’s SpaceWerx organization are running challenges to support alternative technologies. The Department of Defense acknowledges challenges to GPS and the consequent need to diversify the ways it gets position, navigation, and timing information, noting that it is pursuing the integration of alternative capabilities, according to a statement that public affairs officer Chelsea Dietlin requested be attributed to a Pentagon spokesperson. It is also looking toward working with commercial companies.

Even the Department of Transportation has a strategic plan that includes promoting technologies complementary to GPS. (Undark reached out multiple times to the Department of Transportation to request comment but did not receive a response.) A statement that FAA media relations specialist Cassandra Nolan requested be attributed to an agency spokesperson noted that the FAA is working on a system to detect GPS interference, and that it is working with the Department of Defense on navigation signals and antennas that are more resilient. In addition, the statement noted, the FAA already has “a layered aircraft tracking system that incorporates multiple technologies to guard against threats to Global Navigation Satellite Systems (GNSS).”

But the newer efforts across government may not be as connected as they could be, according to Dana Goward, president of the Resilient Navigation and Timing Foundation, a nonprofit advocacy group that largely comprises companies working in the GPS-problem space. For one, he said, efforts to bolster military and civilian systems have a fairly strict line between them. And neither has been as effective as he’d advocate: On the military side, plentiful programs exist, but they may not be working together. “It’s not clear if there is any coordination or synergies between the projects or how much senior leader support there is for comprehensive solution sets,” Goward wrote in an email.

On the civil side, Congress mandated in 2018 that a backup to GPS be established, but only experimental systems exist so far. There also have been efforts to repeal the law, with the disputed rationale that funding a single system isn’t feasible and there are better paths toward resilience. Goward contended that the government has hoped the private sector will come up with a usable solution, saving the government from creating one itself.

Starting over

And companies are coming to cash in on that desire, offering their solutions to both government agencies and other industries. “Our founding hypothesis was ‘let’s take 50 years of lessons learned but throw out the rulebook and do a clean-sheet design of a new GPS system incorporating a couple of fundamentals,’” said Patrick Shannon, CEO of one such company, called TrustPoint. The company, which has hired scientific and engineering experts in signal processing and space, aims to have a fleet of small satellites orbiting much closer to Earth than the current GPS constellation, and transmitting at a higher frequency.

TrustPoint’s satellites, a few of which have already gone to orbit, also send out an encrypted signal—something harder to spoof. With traditional GPS, only the military gets encrypted signals.

Many Russian jamming systems, he said, work tens of kilometers from their ground zero (their ground zero usually being a truck with a generator aboard). But with TrustPoint’s higher-frequency signals, the jammer’s effectiveness drops by a factor of three, and its circle of influence becomes 10 times smaller, shrinking even more if the receivers use a special kind of antenna that the U.S. government recently approved.

Messing with signals becomes less feasible, given those changes. “They would need exorbitant numbers of systems, exorbitant numbers of people, and a ton of cash to pull that off,” said Shannon.

So far, TrustPoint has launched three spacecraft, and has gotten five federal contracts in 2024 and 2025, totaling around $8.3 million, with organizations like the Air Force, Space Force, and the Navy.

Another company, called Xona Space Systems, is also putting satellites in low-Earth orbit, and has worked with both the Canadian and U.S. governments. The company plans to broadcast signals 100 times stronger than GPS, giving users two-centimeter precision, and making jamming more difficult. The signal also includes a watermark—a kind of authentication that, at least for now, protects against spoofing. They have launched one satellite that’s being tested by people in industries like agriculture, construction, and mining.

TrustPoint’s technology may offer novel defense against the dark GPS arts, but Xona, whose founders met while students at the Stanford GPS Lab, may have an edge anyway: Its signals are compatible with current infrastructure, so no one has to buy a new device. They just have to update their software. “We are not building receivers ourselves,” said Max Eunice, head of marketing and communications. Instead, they’re relying on the billions of earthly devices that already themselves rely on GPS.

Reliable GPS has become essential for a huge range of industries. Credit: Thomas Barwick

Other solutions, like one called SuperGPS, stay closer to the ground. They use radio transmitters on Earth to do the same things GPS satellites do in space. The setup, as demonstrated by scientists at the Delft University of Technology and VU University in the Netherlands, involves scattering radio transmitters around an area or using those already in place. Each transmitter is synchronized to an atomic clock, which sends the time to the transmitters via fiber optic cable that may already be in place as part of existing communications infrastructure. Receivers can collect signals scattered across a wide range of radio frequencies, making it more difficult to jam or spoof them. The team published a proof of concept in a 2022 Nature paper and is working on a second iteration called SuperGPS2.

Tom Powell, another GPS expert at the Aerospace Corporation, said that looking at alternatives and augmentations like these is important—even though GPS recently underwent the 25-year modernization effort, making its own signals more robust to vulnerabilities. “Now that we have delivered, or nearly completely delivered, this modernization, is there a better way to do it in face of the current realities?” he said. He and other GPS experts don’t have answers yet. “We’re just asking questions right now.”

Walter, the director of the Stanford GPS Lab, thinks that whatever a better path looks like, it will likely still include the old-school, original system. “There’s nothing that really does replace GPS,” he said. “I see articles saying ‘post-GPS World’ and so forth. But really, GPS, I think, will always be there.”

People will, and should, strengthen it, Walter added, but that bolstering is going to be piecemeal—efforts may work only in a particular region, or they may cover some of GPS’s roles (such as providing accurate time) but not others, or they may back up navigation without being as accurate. They may also cost money. “GPS is free, so that makes it almost impossible to compete with,” he said.

GPS is also straightforward, said Powell. “As satellites go, they’re pretty simple,” he said. They point at Earth, and they transmit signals that tell what time it is. From that, humans get to live in an interconnected, chronologically propriocepted world. Figuring out how to keep it that way, though, is proving a little more complicated.

This article was originally published on Undark. Read the original article.

How AI coding agents work—and what to remember if you use them


Agents of uncertain change

From compression tricks to multi-agent teamwork, here’s what makes them tick.

AI coding agents from OpenAI, Anthropic, and Google can now work on software projects for hours at a time, writing complete apps, running tests, and fixing bugs with human supervision. But these tools are not magic and can complicate rather than simplify a software project. Understanding how they work under the hood can help developers know when (and if) to use them, while avoiding common pitfalls.

We’ll start with the basics: At the core of every AI coding agent is a technology called a large language model (LLM), which is a type of neural network trained on vast amounts of text data, including lots of programming code. It’s a pattern-matching machine that uses a prompt to “extract” compressed statistical representations of data it saw during training and provide a plausible continuation of that pattern as an output. In this extraction, an LLM can interpolate across domains and concepts, resulting in some useful logical inferences when done well and confabulation errors when done poorly.

These base models are then further refined through techniques like fine-tuning on curated examples and reinforcement learning from human feedback (RLHF), which shape the model to follow instructions, use tools, and produce more useful outputs.

A screenshot of the Claude Code command-line interface. Credit: Anthropic

Over the past few years, AI researchers have been probing LLMs’ deficiencies and finding ways to work around them. One recent innovation was the simulated reasoning model, which generates context (extending the prompt) in the form of reasoning-style text that can help an LLM home in on a more accurate output. Another innovation was an application called an “agent” that links several LLMs together to perform tasks simultaneously and evaluate outputs.

How coding agents are structured

In that sense, each AI coding agent is a program wrapper that works with multiple LLMs. There is typically a “supervising” LLM that interprets tasks (prompts) from the human user and then assigns those tasks to parallel LLMs that can use software tools to execute the instructions. The supervising agent can interrupt tasks below it and evaluate the subtask results to see how a project is going. Anthropic’s engineering documentation describes this pattern as “gather context, take action, verify work, repeat.”
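
Stripped of vendor specifics, that loop can be sketched in a few lines of Python. The llm() and run_tool() stubs below are stand-ins for real model and tool calls rather than any vendor’s actual API, but the “gather context, take action, verify work, repeat” shape is the same:

    # Simplified sketch of the supervising-agent pattern. The llm() and
    # run_tool() stubs are placeholders for real model and tool calls.

    def llm(history):
        """Fake planner: asks for the tests to be run once, then declares success."""
        if any(msg["role"] == "tool" for msg in history):
            return {"type": "finish", "summary": "Tests pass; task complete."}
        return {"type": "act", "tool": "run_tests", "arguments": {}}

    def run_tool(name, arguments):
        """Fake tool runner standing in for shell commands, file edits, and so on."""
        return f"{name} finished: 12 passed, 0 failed"

    def supervising_agent(user_goal, max_steps=10):
        history = [{"role": "user", "content": user_goal}]
        for _ in range(max_steps):
            decision = llm(history)        # gather context, decide the next step
            if decision["type"] == "finish":
                return decision["summary"]  # work verified, stop
            result = run_tool(decision["tool"], decision["arguments"])  # take action
            history.append({"role": "tool", "content": result})         # record result
        return "Stopped after reaching the step limit."

    print(supervising_agent("Fix the failing unit test in utils.py"))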

If run locally through a command-line interface (CLI), users give the agents conditional permission to write files on the local machine (code or whatever is needed), run exploratory commands (say, “ls” to list files in a directory), fetch websites (usually using “curl”), download software, or upload files to remote servers. There are lots of possibilities (and potential dangers) with this approach, so it needs to be used carefully.
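
As a rough sketch of what “conditional permission” can look like, here is one way a CLI wrapper might gate shell commands behind user approval. The allowlist and prompt wording are assumptions for illustration, not how any particular agent actually implements it.

```python
import subprocess

# Commands the user has pre-approved; anything else needs explicit confirmation
SAFE_COMMANDS = {"ls", "cat", "head", "tail", "git"}

def run_command(cmd: list[str]) -> str:
    if cmd[0] not in SAFE_COMMANDS:
        answer = input(f"Agent wants to run: {' '.join(cmd)}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "denied by user"
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout or result.stderr
```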

In contrast, when a user starts a task in a web-based agent like the web versions of Codex and Claude Code, the system provisions a sandboxed cloud container preloaded with the user’s code repository, where the agent can read and edit files, run commands (including test harnesses and linters), and execute code in isolation. Anthropic’s Claude Code uses operating system-level features to create filesystem and network boundaries within which the agent can work more freely.

The context problem

Every LLM has a short-term memory, so to speak, that limits the amount of data it can process before it “forgets” what it’s doing. This is called “context.” Every time you submit a response to the supervising agent, you are amending one gigantic prompt that includes the entire history of the conversation so far (and all the code generated, plus the simulated reasoning tokens the model uses to “think” more about a problem). The AI model then evaluates this prompt and produces an output. It’s a very computationally expensive process that increases quadratically with prompt size because LLMs process every token (chunk of data) against every other token in the prompt.
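
As a rough illustration of why this gets expensive, here is a sketch of how each new turn re-sends the entire transcript; `call_model` is again a hypothetical stand-in for a real API call.

```python
# Every turn appends to one ever-growing prompt, so the model re-processes the
# whole history (plus generated code and reasoning tokens) on each request.
history = []

def send_turn(user_message, call_model):
    history.append({"role": "user", "content": user_message})
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    reply = call_model(prompt)  # cost scales with the full transcript length
    history.append({"role": "assistant", "content": reply})
    return reply
```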

Anthropic’s engineering team describes context as a finite resource with diminishing returns. Studies have revealed what researchers call “context rot”: As the number of tokens in the context window increases, the model’s ability to accurately recall information decreases. Every new token depletes what the documentation calls an “attention budget.”

This context limit naturally caps the size of a codebase an LLM can process at one time, and if you feed the AI model lots of huge code files (which have to be re-evaluated by the LLM every time you send another response), it can burn through token or usage limits pretty quickly.

Tricks of the trade

To get around these limits, the creators of coding agents use several tricks. For example, AI models are fine-tuned to write code that outsources activities to other software tools: they might write Python scripts to extract data from images or files rather than feeding the whole file through an LLM, which saves tokens and avoids inaccurate results.

Anthropic’s documentation notes that Claude Code also uses this approach to perform complex data analysis over large databases, writing targeted queries and using Bash commands like “head” and “tail” to analyze large volumes of data without ever loading the full data objects into context.
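
A hypothetical example of that pattern: rather than pasting a huge CSV file into the prompt, an agent can write and run a short script that returns only the summary it needs. The file name and column below are invented for illustration.

```python
# Summarize a large file locally so only a few lines re-enter the model's context.
import csv
from collections import Counter

counts = Counter()
with open("orders.csv", newline="") as f:  # hypothetical file, too big for context
    for row in csv.DictReader(f):
        counts[row["status"]] += 1

# Only this tiny summary goes back to the model, not the raw rows
print(counts.most_common(10))
```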

(In a way, these AI agents are guided but semi-autonomous tool-using programs that are a major extension of a concept we first saw in early 2023.)

Another major breakthrough in agents came from dynamic context management. Agents can do this in a few ways that are not fully disclosed in proprietary coding models, but we do know the most important technique they use: context compression.

The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

When a coding LLM nears its context limit, this technique compresses the context history by summarizing it, losing details in the process but shortening the history to key details. Anthropic’s documentation describes this “compaction” as distilling context contents in a high-fidelity manner, preserving key details like architectural decisions and unresolved bugs while discarding redundant tool outputs.

This means the AI coding agents periodically “forget” a large portion of what they are doing every time this compression happens, but unlike older LLM-based systems, they aren’t completely clueless about what has transpired and can rapidly re-orient themselves by reading existing code, written notes left in files, change logs, and so on.
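
Here is a minimal sketch of the compaction idea, assuming a crude word-count tokenizer and an LLM-backed `summarize` function; both are illustrative stand-ins, not Anthropic’s or OpenAI’s actual logic.

```python
MAX_TOKENS = 100_000   # assumed context budget for illustration
KEEP_RECENT = 20       # recent turns kept verbatim

def count_tokens(messages):
    # Crude stand-in for a real tokenizer
    return sum(len(m["content"].split()) for m in messages)

def compact(messages, summarize):
    if count_tokens(messages) < MAX_TOKENS:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    # summarize() would itself be an LLM call that keeps architectural decisions
    # and unresolved bugs while dropping redundant tool output
    note = {"role": "system", "content": summarize(old)}
    return [note] + recent
```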

Anthropic’s documentation recommends using CLAUDE.md files to document common bash commands, core files, utility functions, code style guidelines, and testing instructions. AGENTS.md, now a multi-company standard, is another useful way of guiding agent actions in between context refreshes. These files act as external notes that let agents track progress across complex tasks while maintaining critical context that would otherwise be lost.

For tasks requiring extended work, both companies employ multi-agent architectures. According to Anthropic’s research documentation, its system uses an “orchestrator-worker pattern” in which a lead agent coordinates the process while delegating to specialized subagents that operate in parallel. When a user submits a query, the lead agent analyzes it, develops a strategy, and spawns subagents to explore different aspects simultaneously. The subagents act as intelligent filters, returning only relevant information rather than their full context to the lead agent.
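
A stripped-down sketch of that orchestrator-worker shape follows; `plan_subtasks`, `run_worker`, and `synthesize` stand in for LLM calls and are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def research(query, plan_subtasks, run_worker, synthesize):
    subtasks = plan_subtasks(query)   # lead agent analyzes the query and plans
    with ThreadPoolExecutor() as pool:
        # Subagents explore different aspects simultaneously
        summaries = list(pool.map(run_worker, subtasks))
    # Only the filtered summaries, not each worker's full context, reach the lead agent
    return synthesize(query, summaries)
```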

The multi-agent approach burns through tokens rapidly. Anthropic’s documentation notes that agents typically use about four times more tokens than chatbot interactions, and multi-agent systems use about 15 times more tokens than chats. For economic viability, these systems require tasks where the value is high enough to justify the increased cost.

Best practices for humans

While using these agents is contentious in some programming circles, if you use one to code a project, knowing good software development practices helps to head off future problems. For example, it’s good to know about version control, making incremental backups, implementing one feature at a time, and testing it before moving on.

What people call “vibe coding”—creating AI-generated code without understanding what it’s doing—is clearly dangerous for production work. Shipping code you didn’t write yourself in a production environment is risky because it could introduce security issues or other bugs, or accumulate technical debt that could snowball over time.

Independent AI researcher Simon Willison recently argued that developers using coding agents still bear responsibility for proving their code works. “Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review,” Willison wrote. “That’s no longer valuable. What’s valuable is contributing code that is proven to work.”

In fact, human planning is key. Claude Code’s best practices documentation recommends a specific workflow for complex problems: First, ask the agent to read relevant files and explicitly tell it not to write any code yet, then ask it to make a plan. Without these research and planning steps, the documentation warns, Claude’s outputs tend to jump straight to coding a solution.

Without planning, LLMs sometimes reach for quick solutions to satisfy a momentary objective that might break later if a project were expanded. So having some idea of what makes a good architecture for a modular program that can be expanded over time can help you guide the LLM to craft something more durable.

As mentioned above, these agents aren’t perfect, and some people prefer not to use them at all. A randomized controlled trial published by the nonprofit research organization METR in July 2025 found that experienced open-source developers actually took 19 percent longer to complete tasks when using AI tools, despite believing they were working faster. The study’s authors note several caveats: The developers were highly experienced with their codebases (averaging five years and 1,500 commits), the repositories were large and mature, and the models used (primarily Claude 3.5 and 3.7 Sonnet via Cursor) have since been superseded by more capable versions.

Whether newer models would produce different results remains an open question, but the study suggests that AI coding tools may not always provide universal speed-ups, particularly for developers who already know their codebases well.

Given these potential hazards, coding proof-of-concept demos and internal tools is probably the ideal use of coding agents right now. Since AI models have no actual agency (despite being called agents) and are not people who can be held accountable for mistakes, human oversight is key.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

How AI coding agents work—and what to remember if you use them Read More »

openai’s-new-chatgpt-image-generator-makes-faking-photos-easy

OpenAI’s new ChatGPT image generator makes faking photos easy

For most of photography’s roughly 200-year history, altering a photo convincingly required either a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.

It’s not the first company to do so. While OpenAI had been working on a conversational image-editing model since GPT-4o in 2024, Google beat OpenAI to market in March with a public prototype, then refined it into the popular Nano Banana image model (and, later, Nano Banana Pro). The enthusiastic response to Google’s image-editing model in the AI community got OpenAI’s attention.

OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.

The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.

GPT Image 1.5 is notable because it’s a “native multimodal” image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.)

This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called “tokens” to be predicted, patterns to be completed. If you upload a photo of your dad and type “put him in a tuxedo at a wedding,” the model processes your words and the image pixels in a unified space, then outputs new pixels the same way it would output the next word in a sentence.
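
As a heavily simplified sketch of that unified-token idea (the tokenization scheme and patch size here are invented for illustration, not how GPT Image 1.5 actually works):

```python
# Text words and image patches become entries in one sequence; a single model
# then predicts the next token whether it happens to be text or image data.
def to_tokens(text, image_pixels, patch_size=16):
    text_tokens = [("text", word) for word in text.split()]
    image_tokens = [
        ("image", tuple(image_pixels[i:i + patch_size]))
        for i in range(0, len(image_pixels), patch_size)
    ]
    return text_tokens + image_tokens  # one unified sequence

sequence = to_tokens("put him in a tuxedo at a wedding", list(range(64)))
# A real model would continue this sequence with new image tokens, which are
# then decoded back into pixels.
```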

Using this technique, GPT Image 1.5 can more easily alter visual reality than earlier AI image models, changing someone’s pose or position, or rendering a scene from a slightly different angle, with varying degrees of success. It can also remove objects, change visual styles, adjust clothing, and refine specific areas while preserving facial likeness across successive edits. You can converse with the AI model about a photograph, refining and revising, the same way you might workshop a draft of an email in ChatGPT.

OpenAI’s new ChatGPT image generator makes faking photos easy Read More »

browser-extensions-with-8-million-users-collect-extended-ai-conversations

Browser extensions with 8 million users collect extended AI conversations

Besides ChatGPT, Claude, and Gemini, the extensions harvest all conversations from Copilot, Perplexity, DeepSeek, Grok, and Meta AI. Koi said the full description of the data captured includes:

  • Every prompt a user sends to the AI
  • Every response received
  • Conversation identifiers and timestamps
  • Session metadata
  • The specific AI platform and model used

The executor script runs independently from the VPN networking, ad blocking, or other core functionality. That means that even when a user toggles off VPN networking, AI protection, ad blocking, or other functions, the conversation collection continues. The only way to stop the harvesting is to disable the extension in the browser settings or to uninstall it.

Koi said it first discovered the conversation harvesting in Urban VPN Proxy, a VPN routing extension that lists “AI protection” as one of its benefits. The data collection began in early July with the release of version 5.5.0.

“Anyone who used ChatGPT, Claude, Gemini, or the other targeted platforms while Urban VPN was installed after July 9, 2025 should assume those conversations are now on Urban VPN’s servers and have been shared with third parties,” the company said. “Medical questions, financial details, proprietary code, personal dilemmas—all of it, sold for ‘marketing analytics purposes.’”

Following that discovery, the security firm uncovered seven additional extensions with identical AI harvesting functionality. Four of the extensions are available in the Chrome Web Store. The other four are on the Edge add-ons page. Collectively, they have been installed more than 8 million times.

They are:

Chrome Web Store:

  • Urban VPN Proxy: 6 million users
  • 1ClickVPN Proxy: 600,000 users
  • Urban Browser Guard: 40,000 users
  • Urban Ad Blocker: 10,000 users

Edge Add-ons:

  • Urban VPN Proxy: 1.32 million users
  • 1ClickVPN Proxy: 36,459 users
  • Urban Browser Guard: 12,624 users
  • Urban Ad Blocker: 6,476 users

Read the fine print

The extensions come with conflicting messages about how they handle chatbot conversations, which often contain deeply personal information about users’ physical and mental health, finances, personal relationships, and other sensitive details that could be a gold mine for marketers and data brokers. The Urban VPN Proxy in the Chrome Web Store, for instance, lists “AI protection” as a benefit. It goes on to say:

Browser extensions with 8 million users collect extended AI conversations Read More »

roomba-maker-irobot-swept-into-bankruptcy

Roomba maker iRobot swept into bankruptcy

In recent years, it has faced competition from cheaper Chinese rivals, including Picea, putting pressure on sales and forcing iRobot to reduce headcount. A management shake-up in early 2024 saw the departure of its co-founder as chief executive.

Amazon proposed buying the company in 2023, seeing synergy with its Alexa-powered smart speakers and Ring doorbells.

EU regulators, however, pushed back on the deal, raising concerns it would lead to reduced visibility for rival vacuum cleaner brands on Amazon’s website.

Amazon and iRobot terminated the deal little more than a month after Adobe’s $10 billion purchase of design software maker Figma was abandoned amid heightened US antitrust scrutiny under Joe Biden’s administration.

Although iRobot received $94 million in compensation for the termination of its deal with Amazon, a significant portion was used to pay advisory fees and repay part of a $200 million loan from private equity group Carlyle.

Picea’s Hong Kong subsidiary acquired the remaining $191 million of debt from Carlyle last month. At the time, iRobot already owed Picea $161.5 million for manufacturing services, nearly $91 million of which was overdue.

Alvarez & Marsal is serving as iRobot’s investment banker and financial adviser. The company is receiving legal advice from Paul, Weiss, Rifkind, Wharton & Garrison.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Roomba maker iRobot swept into bankruptcy Read More »

merriam-webster’s-word-of-the-year-delivers-a-dismissive-verdict-on-junk-ai-content

Merriam-Webster’s word of the year delivers a dismissive verdict on junk AI content

Like most tools, generative AI models can be misused. And when the misuse gets bad enough that a major dictionary notices, you know it’s become a cultural phenomenon.

On Sunday, Merriam-Webster announced that “slop” is its 2025 Word of the Year, reflecting how the term has become shorthand for the flood of low-quality AI-generated content that has spread across social media, search results, and the web at large. The dictionary defines slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”

“It’s such an illustrative word,” Merriam-Webster president Greg Barlow told the Associated Press. “It’s part of a transformative technology, AI, and it’s something that people have found fascinating, annoying, and a little bit ridiculous.”

To select its Word of the Year, Merriam-Webster’s editors review data on which words rose in search volume and usage, then reach consensus on which term best captures the year. Barlow told the AP that the spike in searches for “slop” reflects growing awareness among users that they are encountering fake or shoddy content online.

Dictionaries have been tracking AI’s impact on language for the past few years, with Cambridge having selected “hallucinate” as its 2023 word of the year due to the tendency of AI models to generate plausible-but-false information (long-time Ars readers will be happy to hear there’s another term for that in the dictionary as well).

The trend extends to online culture in general, which is rife with new coinages. This year, Oxford University Press chose “rage bait,” referring to content designed to provoke anger for engagement. Cambridge Dictionary selected “parasocial,” describing one-sided relationships between fans and celebrities or influencers.

The difference between the baby and the bathwater

As the AP points out, the word “slop” originally entered English in the 1700s to mean soft mud. By the 1800s, it had evolved to describe food waste fed to pigs, and eventually came to mean rubbish or products of little value. The new AI-related definition builds on that history of describing something unwanted and unpleasant.

Merriam-Webster’s word of the year delivers a dismissive verdict on junk AI content Read More »

microsoft-will-finally-kill-obsolete-cipher-that-has-wreaked-decades-of-havoc

Microsoft will finally kill obsolete cipher that has wreaked decades of havoc

Microsoft said it has steadily worked over the past decade to deprecate RC4, but that the task wasn’t easy.

No salt, no iteration? Really?

“The problem though is that it’s hard to kill off a cryptographic algorithm that is present in every OS that’s shipped for the last 25 years and was the default algorithm for so long,” Steve Syfuhs, who runs Microsoft’s Windows Authentication team, wrote on Bluesky. “See,” he continued, “the problem is not that the algorithm exists. The problem is how the algorithm is chosen, and the rules governing that spanned 20 years of code changes.”

Over those two decades, developers discovered a raft of critical RC4 vulnerabilities that required “surgical” fixes. Microsoft considered deprecating RC4 by this year, but ultimately “punted” after discovering vulnerabilities that required still more fixes. During that time Microsoft introduced some “minor improvements” that favored the use of AES, and as a result, usage dropped by “orders of magnitude.”

“Within a year we had observed RC4 usage drop to basically nil. This is not a bad thing and in fact gave us a lot more flexibility to kill it outright because we knew it genuinely wasn’t going to break folks, because folks weren’t using it.”

Syfuhs went on to document additional challenges Microsoft encountered and the approach it took to solving them.

While RC4 has known cipher weaknesses that make it insecure, Kerberoasting exploits a separate weakness. As implemented in Active Directory authentication, RC4 key derivation uses no cryptographic salt and a single round of the MD4 hashing function. Salt is a technique that adds random input to each password before it is hashed, which prevents attackers from reusing precomputed hashes and forces them to crack each password individually. MD4, meanwhile, is a fast algorithm that requires only modest resources, so an unsalted, single-round MD4 hash is cheap to attack. Microsoft’s AES-SHA1 implementation is much slower and iterates the salted hash to further slow down cracking efforts. Taken together, AES-SHA1-derived keys require about 1,000 times the time and resources to crack.
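
To illustrate the gap, here is a sketch under assumptions, not Microsoft’s code: the salt format and iteration count are illustrative, and MD4 support depends on your OpenSSL build.

```python
import hashlib

password = "Summer2025!"
salt = b"EXAMPLE.COMserviceaccount"  # hypothetical realm-plus-principal salt

# RC4-era key: a single unsalted MD4 pass over the UTF-16LE password, so each
# attacker guess costs one fast hash (requires OpenSSL legacy MD4 support)
rc4_key = hashlib.new("md4", password.encode("utf-16-le")).digest()

# AES-era key material: salted and iterated (PBKDF2-HMAC-SHA1 here), so each
# guess costs thousands of hash operations instead of one
aes_key = hashlib.pbkdf2_hmac("sha1", password.encode(), salt, 4096, dklen=32)

print(rc4_key.hex())
print(aes_key.hex())
```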

Windows admins would do well to audit their networks for any usage of RC4. Given its wide adoption and continued use industry-wide, it may still be active, much to the surprise and chagrin of those charged with defending against hackers.

Microsoft will finally kill obsolete cipher that has wreaked decades of havoc Read More »

openai-built-an-ai-coding-agent-and-uses-it-to-improve-the-agent-itself

OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among some software developers, their adoption has begun to touch every aspect of the process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was crafted in part by some who have left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”

The interface for OpenAI’s Codex in ChatGPT. Credit: OpenAI

It’s no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently love the command line version, though. Embiricos said Codex usage among external developers jumped 20 times after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5 Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I really love this about our team,” Embiricos said. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

But describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to reduce anthropomorphism when discussing AI models as much as possible while also describing what these systems do using analogies that make sense to general readers. People can talk to Codex like a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “teammate” and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI teammates”

The most dramatic example of Codex’s internal impact to emerge from our interviews was OpenAI’s development of the Sora Android app. According to Embiricos, the tool allowed the company to create the app in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total,” he said. The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of success with Codex in house, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is literally a teammate in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.

The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end in terms of frontier benchmarking progress as AI companies pivot to simulated reasoning models and also agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Meanwhile, startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”

But will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Similarly, the two men don’t project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”

This article was updated on December 12, 2025 at 6:50 PM to mention the METR study.

OpenAI built an AI coding agent and uses it to improve the agent itself Read More »