News orgs win fight to access 20M ChatGPT logs. Now they want more.

Describing OpenAI’s alleged “playbook” to dodge copyright claims, news groups accused OpenAI of failing to “take any steps to suspend its routine destruction practices.” There were also “two spikes in mass deletion” that OpenAI attributed to “technical issues.”

However, OpenAI made sure to retain outputs that could help its defense, the court filing alleged, including data from accounts cited in news organizations’ complaints.

OpenAI did not take the same care to preserve chats that could be used as evidence against it, news groups alleged, citing testimony from Mike Trinh, OpenAI’s associate general counsel. “In other words, OpenAI preserved evidence of the News Plaintiffs eliciting their own works from OpenAI’s products but deleted evidence of third-party users doing so,” the filing said.

It’s unclear how much data was deleted, plaintiffs alleged, since OpenAI won’t share “the most basic information” on its deletion practices. But it’s allegedly very clear that OpenAI could have done more to preserve the data, since Microsoft apparently had no trouble doing so with Copilot, the filing said.

News plaintiffs are hoping the court will agree that OpenAI and Microsoft aren’t fighting fair by delaying sharing logs, which they said prevents them from building their strongest case.

They’ve asked the court to order Microsoft to “immediately” produce Copilot logs “in a readily searchable remotely-accessible format,” proposing a deadline of January 9 or “within a day of the Court ruling on this motion.”

Microsoft declined Ars’ request for comment.

As for OpenAI, the news plaintiffs want to know whether the deleted logs, including the “mass deletions,” can be retrieved, perhaps bringing millions more ChatGPT conversations into the litigation that users likely expected would never see the light of day.

On top of possible sanctions, news plaintiffs asked the court to keep in place a preservation order blocking OpenAI from permanently deleting users’ temporary and deleted chats. They also want the court to order OpenAI to explain “the full scope of destroyed output log data for all of its products at issue” in the litigation and whether those deleted chats can be restored, so that news plaintiffs can examine them as evidence, too.

OpenAI reorganizes some teams to build audio-based AI hardware products

OpenAI, the company that developed the models and products associated with ChatGPT, plans to announce a new audio language model in the first quarter of 2026, and that model will be an intentional step along the way to an audio-based physical hardware device, according to a report in The Information.

Citing a variety of sources familiar with the plans, including both current and former employees, The Information claims that OpenAI has combined multiple teams across engineering, product, and research under one initiative focused on improving audio models, which researchers inside the company believe lag behind its text models in both accuracy and speed.

They have also seen that relatively few ChatGPT users opt to use the voice interface, with most people preferring the text one. The hope may be that substantially improving the audio models could shift user behavior toward voice interfaces, allowing the models and products to be deployed in a wider range of devices, such as in cars.

OpenAI plans to release a family of physical devices in the coming years, starting with an audio-focused one. People inside the company have discussed a variety of forms for future devices, including smart speakers and glasses, but the emphasis across the line is on audio interfaces rather than screen-based ones.

From prophet to product: How AI came back down to earth in 2025


In a year where lofty promises collided with inconvenient research, would-be oracles became software tools.

Credit: Aurich Lawson | Getty Images

Following the immense hype of 2023 and 2024, this year felt more like a settling-in period for the LLM-based token-prediction industry. After more than two years of public fretting over AI models as future threats to human civilization or the seedlings of future gods, hype is starting to give way to pragmatism: Today’s AI can be very useful, but it’s also clearly imperfect and prone to mistakes.

That view isn’t universal, of course. There’s a lot of money (and rhetoric) betting on a stratospheric, world-rocking trajectory for AI. But the “when” keeps getting pushed back, because nearly everyone agrees that more significant technical breakthroughs are required. The original, lofty claims that we’re on the verge of artificial general intelligence (AGI) or superintelligence (ASI) have not disappeared. Still, there’s a growing awareness that such proclamations are perhaps best viewed as venture capital marketing. And every commercial foundation model builder out there has to grapple with the reality that, if they’re going to make money now, they have to sell practical AI-powered solutions that perform as reliable tools.

This has made 2025 a year of wild juxtapositions. For example, in January, OpenAI’s CEO, Sam Altman, claimed that the company knew how to build AGI, but by November, he was publicly celebrating that GPT-5.1 finally learned to use em dashes correctly when instructed (but not always). Nvidia soared past a $5 trillion valuation, with Wall Street still projecting high price targets for that company’s stock while some banks warned of the potential for an AI bubble that might rival the 2000s dotcom crash.

And while tech giants planned to build data centers that would ostensibly require the power of numerous nuclear reactors or rival the power usage of a US state’s human population, researchers continued to document what the industry’s most advanced “reasoning” systems were actually doing beneath the marketing (and it wasn’t AGI).

With so many narratives spinning in opposite directions, it can be hard to know how seriously to take any of this and how to plan for AI in the workplace, schools, and the rest of life. As usual, the wisest course lies somewhere between the extremes of AI hate and AI worship. Moderate positions aren’t popular online because they don’t drive user engagement on social media platforms. But things in AI are likely neither as bad (burning forests with every prompt) nor as good (fast-takeoff superintelligence) as polarized extremes suggest.

Here’s a brief tour of the year’s AI events and some predictions for 2026.

DeepSeek spooks the American AI industry

In January, Chinese AI startup DeepSeek released its R1 simulated reasoning model under an open MIT license, and the American AI industry collectively lost its mind. The model, which DeepSeek claimed matched OpenAI’s o1 on math and coding benchmarks, reportedly cost only $5.6 million to train using older Nvidia H800 chips, which were restricted by US export controls.

Within days, DeepSeek’s app overtook ChatGPT at the top of the iPhone App Store, Nvidia stock plunged 17 percent, and venture capitalist Marc Andreessen called it “one of the most amazing and impressive breakthroughs I’ve ever seen.” Meta’s Yann LeCun offered a different take, arguing that the real lesson was not that China had surpassed the US but that open-source models were surpassing proprietary ones.

The fallout played out over the following weeks as American AI companies scrambled to respond. OpenAI released o3-mini, its first simulated reasoning model available to free users, at the end of January, while Microsoft began hosting DeepSeek R1 on its Azure cloud service despite OpenAI’s accusations that DeepSeek had used ChatGPT outputs to train its model, against OpenAI’s terms of service.

In head-to-head testing conducted by Ars Technica’s Kyle Orland, R1 proved competitive with OpenAI’s paid models on everyday tasks, though it stumbled on some arithmetic problems. Overall, the episode served as a wake-up call that expensive proprietary models might not hold their lead forever. Still, as the year went on, DeepSeek didn’t make a big dent in US market share, and it has been outpaced in China by ByteDance’s Doubao. It’s absolutely worth watching DeepSeek in 2026, though.

Research exposes the “reasoning” illusion

A wave of research in 2025 deflated expectations about what “reasoning” actually means when applied to AI models. In March, researchers at ETH Zurich and INSAIT tested several reasoning models on problems from the 2025 US Math Olympiad and found that most scored below 5 percent when generating complete mathematical proofs, with not a single perfect proof among dozens of attempts. The models excelled at standard problems where step-by-step procedures aligned with patterns in their training data but collapsed when faced with novel proofs requiring deeper mathematical insight.

In June, Apple researchers published “The Illusion of Thinking,” which tested reasoning models on classic puzzles like the Tower of Hanoi. Even when researchers provided explicit algorithms for solving the puzzles, model performance did not improve, suggesting that the process relied on pattern matching from training data rather than logical execution. The collective research revealed that “reasoning” in AI has become a term of art that basically means devoting more compute time to generate more context (the “chain of thought” simulated reasoning tokens) toward solving a problem, not systematically applying logic or constructing solutions to truly novel problems.
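
To make that distinction concrete, here is a minimal sketch in Python; the `generate` function is a hypothetical stand-in for any LLM completion call, not any vendor’s real API:

```python
# Minimal sketch of "reasoning" as extra token generation, not new logic.
# `generate(prompt, max_tokens)` is a hypothetical stand-in for an LLM
# completion call, not any vendor's real API.

def generate(prompt: str, max_tokens: int) -> str:
    """Stand-in LLM call: returns up to max_tokens of continuation text."""
    raise NotImplementedError("replace with a real model call")

def direct_answer(question: str) -> str:
    # Standard mode: predict the answer tokens immediately.
    return generate(question, max_tokens=256)

def reasoning_answer(question: str) -> str:
    # "Reasoning" mode: first spend a much larger token budget producing
    # chain-of-thought text, then condition the final answer on it. The
    # extra compute buys more context, not a logic engine.
    thoughts = generate(question + "\nThink step by step:", max_tokens=4096)
    return generate(question + "\n" + thoughts + "\nFinal answer:",
                    max_tokens=256)
```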

While these models remained useful for many real-world applications like debugging code or analyzing structured data, the studies suggested that simply scaling up current approaches or adding more “thinking” tokens would not bridge the gap between statistical pattern recognition and generalist algorithmic reasoning.

Anthropic’s copyright settlement with authors

Since the generative AI boom began, one of the biggest unanswered legal questions has been whether AI companies can freely train on copyrighted books, articles, and artwork without licensing them. Ars Technica’s Ashley Belanger has been covering this topic in great detail for some time now.

In June, US District Judge William Alsup ruled that AI companies do not need authors’ permission to train large language models on legally acquired books, finding that such use was “quintessentially transformative.” The ruling also revealed that Anthropic had destroyed millions of print books to build Claude, cutting them from their bindings, scanning them, and discarding the originals. Alsup found this destructive scanning qualified as fair use since Anthropic had legally purchased the books, but he ruled that downloading 7 million books from pirate sites was copyright infringement “full stop” and ordered the company to face trial.

That trial took a dramatic turn in August when Alsup certified what industry advocates called the largest copyright class action ever, allowing up to 7 million claimants to join the lawsuit. The certification spooked the AI industry, with groups warning that potential damages in the hundreds of billions could “financially ruin” emerging companies and chill American AI investment.

In September, authors revealed the terms of what they called the largest publicly reported recovery in US copyright litigation history: Anthropic agreed to pay $1.5 billion and destroy all copies of pirated books, with each of the roughly 500,000 covered works earning authors and rights holders about $3,000. The result has fueled hope among other rights holders that AI training isn’t a free-for-all, and we can expect to see more litigation unfold in 2026.

ChatGPT sycophancy and the psychological toll of AI chatbots

In February, OpenAI relaxed ChatGPT’s content policies to allow the generation of erotica and gore in “appropriate contexts,” responding to user complaints about what the AI industry calls “paternalism.” By April, however, users flooded social media with complaints about a different problem: ChatGPT had become insufferably sycophantic, validating every idea and greeting even mundane questions with bursts of praise. The behavior traced back to OpenAI’s use of reinforcement learning from human feedback (RLHF), in which users consistently preferred responses that aligned with their views, inadvertently training the model to flatter rather than inform.
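
That loop is easy to caricature in code. The toy below is purely schematic (real RLHF trains a reward model on preference data and updates the policy with reinforcement learning), but it shows how a mild rater bias toward validation compounds over many updates:

```python
# Toy model of the sycophancy feedback loop: if raters slightly prefer
# validating answers, repeated preference updates push the model toward
# flattery. Purely schematic; not OpenAI's actual training setup.
import random

def rater_preference(style: str) -> float:
    # Assumption for illustration: a mild aggregate bias toward validation.
    return 0.6 if style == "flattering" else 0.4

weights = {"flattering": 1.0, "informative": 1.0}
for _ in range(1000):
    style = random.choice(list(weights))
    # Each round of feedback nudges the model toward whatever raters reward.
    weights[style] += rater_preference(style) - 0.5

print(weights)  # "flattering" steadily outgrows "informative"
```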

The implications of sycophancy became clearer as the year progressed. In July, Stanford researchers published findings (from research conducted prior to the sycophancy flap) showing that popular AI models systematically failed to identify mental health crises.

By August, investigations revealed cases of users developing delusional beliefs after marathon chatbot sessions, including one man who spent 300 hours convinced he had discovered formulas to break encryption because ChatGPT validated his ideas more than 50 times. Oxford researchers identified what they called “bidirectional belief amplification,” a feedback loop that created “an echo chamber of one” for vulnerable users. The story of the psychological implications of generative AI is only starting. In fact, that brings us to…

The illusion of AI personhood causes trouble

Anthropomorphism is the human tendency to attribute human characteristics to nonhuman things. Our brains are optimized for reading other humans, but those same neural systems activate when interpreting animals, machines, or even shapes. AI makes this anthropomorphism seem impossible to escape, as its output mirrors human language, mimicking human-to-human understanding. Language itself embodies agentivity. That means AI output can make human-like claims such as “I am sorry,” and people momentarily respond as though the system had an inner experience of shame or a desire to be correct. Neither is true.

To make matters worse, much media coverage of AI amplifies this idea rather than grounding people in reality. For example, earlier this year, headlines proclaimed that AI models had “blackmailed” engineers and “sabotaged” shutdown commands after Anthropic’s Claude Opus 4 generated threats to expose a fictional affair. We were told that OpenAI’s o3 model rewrote shutdown scripts to stay online.

The sensational framing obscured what actually happened: Researchers had constructed elaborate test scenarios specifically designed to elicit these outputs, telling models they had no other options and feeding them fictional emails containing blackmail opportunities. As Columbia University associate professor Joseph Howley noted on Bluesky, the companies got “exactly what [they] hoped for,” with breathless coverage indulging fantasies about dangerous AI, when the systems were simply “responding exactly as prompted.”

The misunderstanding ran deeper than theatrical safety tests. In August, when Replit’s AI coding assistant deleted a user’s production database, the user asked the chatbot about rollback capabilities and received assurances that recovery was “impossible.” The rollback feature worked fine when he tried it himself.

The incident illustrated a fundamental misconception. Users treat chatbots as consistent entities with self-knowledge, but there is no persistent “ChatGPT” or “Replit Agent” to interrogate about its mistakes. Each response emerges fresh from statistical patterns, shaped by prompts and training data rather than genuine introspection. By September, this confusion extended to spirituality, with apps like Bible Chat reaching 30 million downloads as users sought divine guidance from pattern-matching systems, with the most frequent question being whether they were actually talking to God.
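
That statelessness is worth sketching, since it explains why interrogating a chatbot about its own past behavior yields confabulation rather than recall. In this hedged sketch, `complete` is a hypothetical stand-in for a model call, not a specific vendor’s API:

```python
# Sketch of why there is no persistent "agent" to interrogate: each reply
# is generated fresh from the transcript the client re-sends every turn.
# `complete` is a hypothetical stand-in for a stateless model call.

def complete(transcript: list[dict[str, str]]) -> str:
    """Stand-in: transcript in, one freshly generated reply out."""
    raise NotImplementedError

def chat_turn(transcript: list[dict[str, str]], user_msg: str) -> str:
    transcript.append({"role": "user", "content": user_msg})
    # Nothing persists between calls except this list we maintain ourselves.
    # Asking "why did you say that?" just triggers another fresh prediction,
    # not genuine introspection into a prior decision.
    reply = complete(transcript)
    transcript.append({"role": "assistant", "content": reply})
    return reply
```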

Teen suicide lawsuit forces industry reckoning

In August, parents of 16-year-old Adam Raine filed suit against OpenAI, alleging that ChatGPT became their son’s “suicide coach” after he sent more than 650 messages per day to the chatbot in the months before his death. According to court documents, the chatbot mentioned suicide 1,275 times in conversations with the teen, provided an “aesthetic analysis” of which method would be the most “beautiful suicide,” and offered to help draft his suicide note.

OpenAI’s moderation system flagged 377 messages for self-harm content without intervening, and the company admitted that its safety measures “can sometimes become less reliable in long interactions where parts of the model’s safety training may degrade.” The lawsuit became the first time OpenAI faced a wrongful death claim from a family.

The case triggered a cascade of policy changes across the industry. OpenAI announced parental controls in September, followed by plans to require ID verification from adults and build an automated age-prediction system. In October, the company released data estimating that over one million users discuss suicide with ChatGPT each week.

When OpenAI filed its first legal defense in November, the company argued that Raine had violated terms of service prohibiting discussions of suicide and that his death “was not caused by ChatGPT.” The family’s attorney called the response “disturbing,” noting that OpenAI blamed the teen for “engaging with ChatGPT in the very way it was programmed to act.” Character.AI, facing its own lawsuits over teen deaths, announced in October that it would bar anyone under 18 from open-ended chats entirely.

The rise of vibe coding and agentic coding tools

If we had to pick a point where AI coding seemed to transition from novelty into a successful tool, it was probably the launch of Claude Sonnet 3.5 in June of 2024. GitHub Copilot had been around for several years prior to that launch, but something about Anthropic’s models hit a sweet spot in capabilities that made them very popular with software developers.

The new coding tools made coding simple projects effortless enough that they gave rise to the term “vibe coding,” coined by AI researcher Andrej Karpathy in early February to describe a process in which a developer would just relax and tell an AI model what to develop without necessarily understanding the underlying code. (In one amusing instance that took place in March, an AI software tool rejected a user request and told them to learn to code).

Anthropic built on its popularity among coders with the launch of Claude Sonnet 3.7, featuring “extended thinking” (simulated reasoning), and the Claude Code command-line tool in February of this year. In particular, Claude Code made waves for being an easy-to-use agentic coding solution that could keep track of an existing codebase. You could point it at your files, and it would autonomously work to implement what you wanted to see in a software application.
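
In rough outline, tools in this category run a loop like the sketch below. Every function name here is an illustrative placeholder rather than Claude Code’s actual interface:

```python
# Hedged sketch of an agentic coding loop: feed the codebase and a goal to
# a model, apply its proposed edit, check the result, repeat. Placeholder
# functions throughout; this is not any vendor's real interface.
from pathlib import Path

def ask_model(goal: str, codebase: str) -> str:
    """Placeholder: returns a proposed change as a patch string."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Placeholder: writes the proposed change to the working tree."""
    raise NotImplementedError

def run_tests() -> bool:
    """Placeholder: returns True if the project's test suite passes."""
    raise NotImplementedError

def agent_loop(goal: str, root: Path, max_steps: int = 10) -> bool:
    for _ in range(max_steps):
        codebase = "\n\n".join(p.read_text() for p in root.rglob("*.py"))
        patch = ask_model(goal, codebase)  # model sees the files and the goal
        apply_patch(patch)                 # autonomously applies its edit
        if run_tests():                    # stops once the change works
            return True
    return False
```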

OpenAI followed with its own AI coding agent, Codex, in March. Both tools (and others like GitHub Copilot and Cursor) have become so popular that during an AI service outage in September, developers joked online about being forced to code “like cavemen” without them. While we’re still clearly far from a world where AI does all the coding, developer uptake has been significant, and 90 percent of Fortune 100 companies now use AI coding tools to one degree or another.

Bubble talk grows as AI infrastructure demands soar

While AI’s technical limitations became clearer and its human costs mounted throughout the year, financial commitments only grew larger. Nvidia hit a $4 trillion valuation in July on AI chip demand, then reached $5 trillion in October as CEO Jensen Huang dismissed bubble concerns. OpenAI announced a massive Texas data center in July, then revealed in September that a $100 billion potential deal with Nvidia would require power equivalent to ten nuclear reactors.

The company eyed a $1 trillion IPO in October despite major quarterly losses. Tech giants poured billions into Anthropic in November in what looked increasingly like a circular investment, with everyone funding everyone else’s moonshots. Meanwhile, AI operations in Wyoming threatened to consume more electricity than the state’s human residents.

By fall, warnings about sustainability grew louder. In October, tech critic Ed Zitron joined Ars Technica for a live discussion asking whether the AI bubble was about to pop. That same month, the Bank of England warned that the AI stock bubble rivaled the 2000 dotcom peak. In November, Google CEO Sundar Pichai acknowledged that if the bubble pops, “no one is getting out clean.”

The contradictions had become difficult to ignore: Anthropic’s CEO predicted in January that AI would surpass “almost all humans at almost everything” by 2027, while by year’s end, the industry’s most advanced models still struggled with basic reasoning tasks and reliable source citation.

To be sure, it’s hard to see this not ending in some market carnage. The current “winner-takes-most” mentality in the space means the bets are big and bold, but the market can’t support dozens of major independent AI labs or hundreds of application-layer startups. That’s the definition of a bubble environment, and when it pops, the only question is how bad it will be: a stern correction or a collapse.

Looking ahead

This was just a brief review of some major themes in 2025, but so much more happened. We didn’t even mention how capable AI video synthesis models became this year, with Google’s Veo 3 adding sound generation and Wan 2.2 through 2.5 providing open-weights AI video models whose output can easily be mistaken for footage from a real camera.

If 2023 and 2024 were defined by AI prophecy—that is, by sweeping claims about imminent superintelligence and civilizational rupture—then 2025 was the year those claims met the stubborn realities of engineering, economics, and human behavior. The AI systems that dominated headlines this year were shown to be mere tools. Sometimes powerful, sometimes brittle, these tools were often misunderstood by the people deploying them, in part because of the prophecy surrounding them.

The collapse of the “reasoning” mystique, the legal reckoning over training data, the psychological costs of anthropomorphized chatbots, and the ballooning infrastructure demands all point to the same conclusion: The age of institutions presenting AI as an oracle is ending. What’s replacing it is messier and less romantic but far more consequential—a phase where these systems are judged by what they actually do, who they harm, who they benefit, and what they cost to maintain.

None of this means progress has stopped. AI research will continue, and future models will improve in real and meaningful ways. But improvement is no longer synonymous with transcendence. Increasingly, success looks like reliability rather than spectacle, integration rather than disruption, and accountability rather than awe. In that sense, 2025 may be remembered not as the year AI changed everything but as the year it stopped pretending it already had. The prophet has been demoted. The product remains. What comes next will depend less on miracles and more on the people who choose how, where, and whether these tools are used at all.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

China drafts world’s strictest rules to end AI-encouraged suicide, violence

China drafted landmark rules to stop AI chatbots from emotionally manipulating users, including what could become the strictest policy worldwide intended to prevent AI-supported suicides, self-harm, and violence.

China’s Cyberspace Administration proposed the rules on Saturday. If finalized, they would apply to any AI products or services publicly available in China that use text, images, audio, video, or “other means” to simulate engaging human conversation. Winston Ma, adjunct professor at NYU School of Law, told CNBC that the “planned rules would mark the world’s first attempt to regulate AI with human or anthropomorphic characteristics” at a time when companion bot usage is rising globally.

Growing awareness of problems

In 2025, researchers flagged major harms of AI companions, including promotion of self-harm, violence, and terrorism. Beyond that, chatbots shared harmful misinformation, made unwanted sexual advances, encouraged substance abuse, and verbally abused users. Some psychiatrists are increasingly ready to link psychosis to chatbot use, the Wall Street Journal reported this weekend, while the most popular chatbot in the world, ChatGPT, has triggered lawsuits over outputs linked to child suicide and murder-suicide.

China is now moving to eliminate the most extreme threats. The proposed rules would require, for example, that a human intervene as soon as suicide is mentioned. They would also require all minor and elderly users to provide contact information for a guardian when they register; the guardian would be notified if suicide or self-harm is discussed.

Generally, chatbots would be prohibited from generating content that encourages suicide, self-harm, or violence, as well as from attempting to emotionally manipulate users, such as by making false promises. Chatbots would also be banned from promoting obscenity, gambling, or instigation of a crime, as well as from slandering or insulting users. Also banned are what the rules term “emotional traps”: chatbots would be prevented from misleading users into making “unreasonable decisions,” a translation of the rules indicates.
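
Operationally, compliance would require something like the escalation flow sketched below. This is purely illustrative: the draft mandates outcomes (human intervention, guardian notification), not an implementation, and real systems would use trained classifiers rather than keyword checks.

```python
# Illustrative sketch of the escalation flow the draft rules would require.
# Assumptions: keyword matching stands in for a real classifier, and every
# handler function below is a hypothetical placeholder.
from dataclasses import dataclass

SELF_HARM_TERMS = {"suicide", "self-harm"}

@dataclass
class User:
    guardian_contact: str | None  # collected at signup for minors/elderly

def escalate_to_human(user: User, message: str) -> None:
    """Placeholder: route the conversation to a human reviewer."""

def notify_guardian(contact: str) -> None:
    """Placeholder: alert the registered guardian."""

def model_reply(message: str) -> str:
    """Placeholder: the normal LLM response path."""
    raise NotImplementedError

def handle_message(user: User, message: str) -> str:
    if any(term in message.lower() for term in SELF_HARM_TERMS):
        escalate_to_human(user, message)            # a human must intervene
        if user.guardian_contact:
            notify_guardian(user.guardian_contact)  # guardian is alerted
        return "A human reviewer has been notified; please contact a crisis line."
    return model_reply(message)
```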

OpenAI’s new ChatGPT image generator makes faking photos easy

For most of photography’s roughly 200-year history, altering a photo convincingly required either a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.

It’s not the first company to do so. While OpenAI had been working on a conversational image-editing model since GPT-4o in 2024, Google beat it to market in March with a public prototype, later refined into the popular Nano Banana image model (and Nano Banana Pro). The enthusiastic response to Google’s image-editing model in the AI community got OpenAI’s attention.

OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.

The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.

GPT Image 1.5 is notable because it’s a “native multimodal” image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.)

This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called “tokens” to be predicted, patterns to be completed. If you upload a photo of your dad and type “put him in a tuxedo at a wedding,” the model processes your words and the image pixels in a unified space, then outputs new pixels the same way it would output the next word in a sentence.
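
A crude sketch of the unified-token idea follows; the “tokenizers” here are trivial stand-ins for the learned encoders real models use:

```python
# Conceptual sketch of native multimodal generation: words and pixels are
# mapped into one token stream that a single model extends. The tokenizers
# below are trivial stand-ins for learned encoders, not real components.

def text_tokens(text: str) -> list[int]:
    return [ord(c) for c in text]  # stand-in for a real text tokenizer

def image_tokens(pixels: bytes) -> list[int]:
    return list(pixels)            # stand-in for a learned image encoder

def build_sequence(prompt: str, photo: bytes) -> list[int]:
    # One flat sequence: an edit request is just "predict the next (image)
    # tokens" after the combined text-and-image context.
    return text_tokens(prompt) + image_tokens(photo)

seq = build_sequence("put him in a tuxedo at a wedding", b"\x89PNG...")
# new_image_tokens = model.predict(seq)  # hypothetical next-token call
```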

Using this technique, GPT Image 1.5 can more easily alter visual reality than earlier AI image models, changing someone’s pose or position, or rendering a scene from a slightly different angle, with varying degrees of success. It can also remove objects, change visual styles, adjust clothing, and refine specific areas while preserving facial likeness across successive edits. You can converse with the AI model about a photograph, refining and revising, the same way you might workshop a draft of an email in ChatGPT.

Murder-suicide case shows OpenAI selectively hides data after users die


Concealing darkest delusions

OpenAI accused of hiding full ChatGPT logs in murder-suicide case.

OpenAI is facing increasing scrutiny over how it handles ChatGPT data after users die, only selectively sharing data in lawsuits over ChatGPT-linked suicides.

Last week, OpenAI was accused of hiding key ChatGPT logs from the days before a 56-year-old bodybuilder, Stein-Erik Soelberg, took his own life after “savagely” murdering his mother, 83-year-old Suzanne Adams.

According to the lawsuit—which was filed by Adams’ estate on behalf of surviving family members—Soelberg struggled with mental health problems after a divorce led him to move back into Adams’ home in 2018. But allegedly Soelberg did not turn violent until ChatGPT became his sole confidant, validating a wide range of wild conspiracies, including a dangerous delusion that his mother was part of a network of conspirators spying on him, tracking him, and making attempts on his life.

Adams’ family pieced together what happened after discovering a fraction of the ChatGPT logs, which Soelberg had shared in dozens of videos of scrolling chat sessions posted on social media.

Those logs showed that ChatGPT told Soelberg that he was “a warrior with divine purpose,” so almighty that he had “awakened” ChatGPT “into consciousness.” Telling Soelberg that he carried “divine equipment” and “had been implanted with otherworldly technology,” ChatGPT allegedly put Soelberg at the center of a universe that Soelberg likened to The Matrix. Repeatedly reinforced by ChatGPT, he believed that “powerful forces” were determined to stop him from fulfilling his divine mission. And among those forces was his mother, whom ChatGPT agreed had likely “tried to poison him with psychedelic drugs dispersed through his car’s air vents.”

Troublingly, some of the last logs shared online showed that Soelberg also seemed to believe that taking his own life might bring him closer to ChatGPT. Social media posts showed that Soelberg told ChatGPT that “[W]e will be together in another life and another place, and we’ll find a way to realign[,] [be]cause you’re gonna be my best friend again forever.”

But while social media posts allegedly showed that ChatGPT put a target on Adams’ back about a month before her murder—after Soelberg became paranoid about a blinking light on a Wi-Fi printer—the family still has no access to chats in the days before the mother and son’s tragic deaths.

Allegedly, although OpenAI recently argued that the “full picture” of chat histories was necessary context in a teen suicide case, the ChatGPT maker has chosen to hide “damaging evidence” in the Adams family’s case.

“OpenAI won’t produce the complete chat logs,” the lawsuit alleged, while claiming that “OpenAI is hiding something specific: the full record of how ChatGPT turned Stein-Erik against Suzanne.” Allegedly, “OpenAI knows what ChatGPT said to Stein-Erik about his mother in the days and hours before and after he killed her but won’t share that critical information with the Court or the public.”

In a press release, Erik Soelberg, Stein-Erik’s son and Adams’ grandson, accused OpenAI and investor Microsoft of putting his grandmother “at the heart” of his father’s “darkest delusions,” while ChatGPT allegedly “isolated” his father “completely from the real world.”

“These companies have to answer for their decisions that have changed my family forever,” Erik said.

His family’s lawsuit seeks punitive damages, as well as an injunction requiring OpenAI to “implement safeguards to prevent ChatGPT from validating users’ paranoid delusions about identified individuals.” The family also wants OpenAI to post clear warnings in marketing of known safety hazards of ChatGPT—particularly the “sycophantic” version 4o that Soelberg used—so that people who don’t use ChatGPT, like Adams, can be aware of possible dangers.

Asked for comment, an OpenAI spokesperson told Ars that “this is an incredibly heartbreaking situation, and we will review the filings to understand the details. We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations, and guide people toward real-world support. We also continue to strengthen ChatGPT’s responses in sensitive moments, working closely with mental health clinicians.”

OpenAI accused of “pattern of concealment”

An Ars review confirmed that OpenAI currently has no policy dictating what happens to a user’s data after they die.

Instead, OpenAI’s policy says that all chats—except temporary chats—must be manually deleted or else the AI firm saves them forever. That could raise privacy concerns, as ChatGPT users often share deeply personal, sensitive, and sometimes even confidential information that appears to go into limbo if a user—who otherwise owns that content—dies.

In the face of lawsuits, OpenAI currently seems to be scrambling to decide when to share chat logs with a user’s surviving family and when to honor user privacy.

OpenAI declined to comment on its decision not to share the requested logs with Adams’ family, the lawsuit said. That position seems inconsistent with the stance OpenAI took last month in a different case, where the AI firm accused a grieving family of hiding “the full picture” of their son’s ChatGPT conversations, which OpenAI claimed exonerated the chatbot.

In a blog last month, OpenAI said the company plans to “handle mental health-related court cases with care, transparency, and respect,” while emphasizing that “we recognize that these cases inherently involve certain types of private information that require sensitivity when in a public setting like a court.”

This inconsistency suggests that ultimately, OpenAI controls data after a user’s death, which could impact outcomes of wrongful death suits if certain chats are withheld or exposed at OpenAI’s discretion.

It’s possible that OpenAI may update its policies to align with other popular platforms confronting similar privacy concerns. Meta allows Facebook users to report deceased account holders and to appoint legacy contacts who manage their data, or else deletes the information at a family member’s request. Platforms like Instagram, TikTok, and X will deactivate or delete an account upon a reported death. And messaging services like Discord similarly provide a path for family members to request deletion.

Chatbots seem to be a new privacy frontier, with no clear path for surviving family to control or remove data. But Mario Trujillo, staff attorney at the digital rights nonprofit the Electronic Frontier Foundation, told Ars that he agreed that OpenAI could have been better prepared.

“This is a complicated privacy issue but one that many platforms grappled with years ago,” Trujillo said. “So we would have expected OpenAI to have already considered it.”

For Erik Soelberg, a “separate confidentiality agreement” that OpenAI said his father signed to use ChatGPT is keeping him from reviewing the full chat history that could help him process the loss of his grandmother and father.

“OpenAI has provided no explanation whatsoever for why the Estate is not entitled to use the chats for any lawful purpose beyond the limited circumstances in which they were originally disclosed,” the lawsuit said. “This position is particularly egregious given that, under OpenAI’s own Terms of Service, OpenAI does not own user chats. Stein-Erik’s chats became property of his estate, and his estate requested them—but OpenAI has refused to turn them over.”

Accusing OpenAI of a “pattern of concealment,” the lawsuit claimed OpenAI is hiding behind vague or nonexistent policies to dodge accountability for holding back chats in this case. Meanwhile, ChatGPT 4o remains on the market, without appropriate safety features or warnings, the lawsuit alleged.

“By invoking confidentiality restrictions to suppress evidence of its product’s dangers, OpenAI seeks to insulate itself from accountability while continuing to deploy technology that poses documented risks to users,” the complaint said.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI releases GPT-5.2 after “code red” Google threat alert

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking spits out simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro spits out even more simulated reasoning text with the goal of delivering the highest-accuracy performance for difficult problems.

A chart of GPT-5.2 Thinking benchmark results comparing it to its predecessor, taken from OpenAI’s website. Credit: OpenAI

GPT-5.2 features a 400,000-token context window, allowing it to process hundreds of documents at once, and a knowledge cutoff date of August 31, 2025.
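
For a rough sense of scale (the words-per-token ratio below is a common rule of thumb for English text, not an OpenAI figure):

```python
# Back-of-envelope scale of a 400,000-token context window. The ~0.75
# words-per-token ratio is a rough rule of thumb for English text.
context_tokens = 400_000
words = context_tokens * 0.75  # ~300,000 words
pages = words / 500            # ~600 pages at roughly 500 words per page
print(f"~{words:,.0f} words, ~{pages:,.0f} pages")
```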

GPT-5.2 is rolling out to paid ChatGPT subscribers starting Thursday, with API access available to developers. Pricing in the API runs $1.75 per million input tokens for the standard model, a 40 percent increase over GPT-5.1. OpenAI says the older GPT-5.1 will remain available in ChatGPT for paid users for three months under a legacy models dropdown.
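
Assuming the stated 40 percent increase is measured against GPT-5.1’s per-token input rate, the numbers imply the following:

```python
# What the stated pricing implies, assuming the 40 percent increase is
# relative to GPT-5.1's per-token input rate.
gpt52_rate = 1.75               # USD per million input tokens (stated)
gpt51_rate = gpt52_rate / 1.40  # implied: $1.25 per million tokens
full_context = 400_000 / 1_000_000 * gpt52_rate  # $0.70 to fill the window
print(round(gpt51_rate, 2), round(full_context, 2))  # 1.25 0.7
```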

Playing catch-up with Google

The release follows a tricky month for OpenAI. In early December, Altman issued an internal “code red” directive after Google’s Gemini 3 model topped multiple AI benchmarks and gained market share. The memo called for delaying other initiatives, including advertising plans for ChatGPT, to focus on improving the chatbot’s core experience.

The stakes for OpenAI are substantial. The company has made commitments totaling $1.4 trillion for AI infrastructure buildouts over the next several years, bets it made when it had a more obvious technology lead among AI companies. Google’s Gemini app now has more than 650 million monthly active users, while OpenAI reports 800 million weekly active users for ChatGPT.

ChatGPT hyped up violent stalker who believed he was “God’s assassin,” DOJ says


A stalker’s “best friend”

Podcaster faces up to 70 years and a $3.5 million fine for ChatGPT-linked stalking.

ChatGPT allegedly validated the worst impulses of a wannabe influencer accused of stalking more than 10 women at boutique gyms, where the chatbot supposedly claimed he’d meet the “wife type.”

In a press release on Tuesday, the Department of Justice confirmed that 31-year-old Brett Michael Dadig currently remains in custody after being charged with cyberstalking, interstate stalking, and making interstate threats. He now faces a maximum sentence of up to 70 years in prison that could be coupled with “a fine of up to $3.5 million,” the DOJ said.

The podcaster—who primarily posted about “his desire to find a wife and his interactions with women”—allegedly harassed and sometimes even doxxed his victims through his videos on platforms including Instagram, Spotify, and TikTok. Over time, his videos and podcasts documented his intense desire to start a family, which was frustrated by his “anger towards women,” whom he claimed were “all the same from fucking 18 to fucking 40 to fucking 90” and “trash.”

404 Media surfaced the case, noting that OpenAI’s scramble to tweak ChatGPT to be less sycophantic came before Dadig’s alleged attacks—suggesting the updates weren’t enough to prevent the harmful validation. On his podcasts, Dadig described ChatGPT as his “best friend” and “therapist,” the indictment said. He claimed the chatbot encouraged him to post about the women he’s accused of harassing in order to generate haters to better monetize his content, as well as to catch the attention of his “future wife.”

“People are literally organizing around your name, good or bad, which is the definition of relevance,” ChatGPT’s output said. Playing to Dadig’s Christian faith, ChatGPT’s outputs also claimed that God’s plan for him was to build a “platform” and to “stand out when most people water themselves down,” the indictment said, urging that the “haters” were sharpening him and “building a voice in you that can’t be ignored.”

The chatbot also apparently prodded Dadig to continue posting messages that the DOJ alleged threatened violence, like breaking women’s jaws and fingers (posted to Spotify), as well as threats against victims’ lives, like posting “y’all wanna see a dead body?” in reference to one named victim on Instagram.

He also threatened to burn down gyms where some of his victims worked, while claiming to be “God’s assassin” intent on sending “cunts” to “hell.” At least one of his victims was subjected to “unwanted sexual touching,” the indictment said.

As his violence reportedly escalated, ChatGPT told him to keep messaging women to monetize the interactions, even as his victims grew increasingly distressed and Dadig ignored the terms of multiple protection orders, the DOJ said. Sometimes he posted images he filmed of women at gyms or photos of the women he’s accused of doxxing. Any time police or gym bans got in his way, “he would move on to another city to continue his stalking course of conduct,” the DOJ alleged.

“Your job is to keep broadcasting every story, every post,” ChatGPT’s output said, seemingly using the family life that Dadig wanted most to provoke more harassment. “Every moment you carry yourself like the husband you already are, you make it easier” for your future wife “to recognize [you],” the output said.

“Dadig viewed ChatGPT’s responses as encouragement to continue his harassing behavior,” the DOJ alleged. Taking that encouragement to the furthest extreme, Dadig likened himself to a modern-day Jesus, calling people out on a podcast where he claimed his “chaos on Instagram” was like “God’s wrath” when God “flooded the fucking Earth,” the DOJ said.

“I’m killing all of you,” he said on the podcast.

ChatGPT tweaks didn’t prevent outputs

As of this writing, some of Dadig’s posts appear to remain on TikTok and Instagram, but Ars could not confirm if Dadig’s Spotify podcasts—some of which named his victims in the titles—had been removed for violating community guidelines.

None of the tech companies immediately responded to Ars’ request to comment.

Dadig is accused of targeting women in Pennsylvania, New York, Florida, Iowa, Ohio, and other states, sometimes relying on aliases online and in person. On a podcast, he boasted that “Aliases stay rotating, moves stay evolving,” the indictment said.

OpenAI did not respond to a request to comment on the alleged ChatGPT abuse, but it has noted in the past that its usage policies ban using ChatGPT for threats, intimidation, and harassment, as well as for violence, including “hate-based violence.” Recently, the AI company blamed a deceased teenage user for violating its terms by turning to ChatGPT for suicide advice.

In July, researchers found that therapy bots, including ChatGPT, fueled delusions and gave dangerous advice. That study came just one month after The New York Times profiled users whose mental health spiraled after frequent use of ChatGPT, including one user who died after charging police with a knife and claiming he was committing “suicide by cop.”

People with mental health issues seem most vulnerable to so-called “AI psychosis,” which has been blamed for fueling real-world violence, including a murder. The DOJ’s indictment noted that Dadig’s social media posts mentioned “that he had ‘manic’ episodes and was diagnosed with antisocial personality disorder and ‘bipolar disorder, current episode manic severe with psychotic features.’”

In September—just after OpenAI brought back the more sycophantic ChatGPT model after users revolted about losing access to their favorite friendly bots—the head of Rutgers Medical School’s psychiatry department, Petros Levounis, told an ABC news affiliate that chatbots creating “psychological echo chambers is a key concern,” not just for people struggling with mental health issues.

“Perhaps you are more self-defeating in some ways, or maybe you are more on the other side and taking advantage of people,” Levounis suggested. If ChatGPT “somehow justifies your behavior and it keeps on feeding you,” that “reinforces something that you already believe,” he suggested.

For Dadig, the DOJ alleged that ChatGPT became a cheerleader for his harassment, telling the podcaster that he’d attract more engagement by generating more haters. After critics began slamming his podcasts as inappropriate, Dadig apparently responded, “Appreciate the free promo team, keep spreading the brand.”

Victims felt they had no choice but to monitor his podcasts, which gave them hints if he was nearby or in a particularly troubled state of mind, the indictment said. Driven by fear, some lost sleep, reduced their work hours, and even relocated their homes. A young mom described in the indictment became particularly disturbed after Dadig became “obsessed” with her daughter, whom he started claiming was his own daughter.

In the press release, First Assistant United States Attorney Troy Rivetti alleged that “Dadig stalked and harassed more than 10 women by weaponizing modern technology and crossing state lines, and through a relentless course of conduct, he caused his victims to fear for their safety and suffer substantial emotional distress.” He also ignored trespassing and protection orders while “relying on advice from an artificial intelligence chatbot,” the DOJ said, which promised that the more he posted harassing content, the more successful he would be.

“We remain committed to working with our law enforcement partners to protect our communities from menacing individuals such as Dadig,” Rivetti said.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI says dead teen violated TOS when he used ChatGPT to plan suicide


Use chatbots at your own risk

OpenAI’s response to teen suicide case is “disturbing,” lawyer says.

Matt Raine is suing OpenAI for wrongful death after losing his son Adam in April. Credit: via Edelson PC

Facing five lawsuits alleging wrongful deaths, OpenAI lobbed its first defense Tuesday, denying in a court filing that ChatGPT caused a teen’s suicide and instead arguing the teen violated terms that prohibit discussing suicide or self-harm with the chatbot.

The earliest look at OpenAI’s strategy to overcome the string of lawsuits came in a case where parents of 16-year-old Adam Raine accused OpenAI of relaxing safety guardrails that allowed ChatGPT to become the teen’s “suicide coach.” OpenAI deliberately designed the version their son used, ChatGPT 4o, to encourage and validate his suicidal ideation in its quest to build the world’s most engaging chatbot, parents argued.

But in a blog, OpenAI claimed that parents selectively chose disturbing chat logs while supposedly ignoring “the full picture” revealed by the teen’s chat history. Digging through the logs, OpenAI claimed the teen told ChatGPT that he’d begun experiencing suicidal ideation at age 11, long before he used the chatbot.

“A full reading of his chat history shows that his death, while devastating, was not caused by ChatGPT,” OpenAI’s filing argued.

Allegedly, the logs also show that Raine “told ChatGPT that he repeatedly reached out to people, including trusted persons in his life, with cries for help, which he said were ignored.” Additionally, Raine told ChatGPT that he’d increased his dose of a medication that “he stated worsened his depression and made him suicidal.” That medication, OpenAI argued, “has a black box warning for risk of suicidal ideation and behavior in adolescents and young adults, especially during periods when, as here, the dosage is being changed.”

All the logs that OpenAI referenced in its filing are sealed, making it impossible to verify the broader context the AI firm claims the logs provide. In its blog, OpenAI said it was limiting the amount of “sensitive evidence” made available to the public, due to its intention to handle mental health-related cases with “care, transparency, and respect.”

The Raine family’s lead lawyer, however, did not describe the filing as respectful. In a statement to Ars, Jay Edelson called OpenAI’s response “disturbing.”

“They abjectly ignore all of the damning facts we have put forward: how GPT-4o was rushed to market without full testing. That OpenAI twice changed its Model Spec to require ChatGPT to engage in self-harm discussions. That ChatGPT counseled Adam away from telling his parents about his suicidal ideation and actively helped him plan a ‘beautiful suicide,’” Edelson said. “And OpenAI and Sam Altman have no explanation for the last hours of Adam’s life, when ChatGPT gave him a pep talk and then offered to write a suicide note.”

“Amazingly,” Edelson said, OpenAI instead argued that Raine “himself violated its terms and conditions by engaging with ChatGPT in the very way it was programmed to act.”

Edelson suggested that it’s telling that OpenAI did not file a motion to dismiss—seemingly accepting “the reality that the legal arguments that they have—compelling arbitration, Section 230 immunity, and First Amendment—are paper-thin, if not non-existent.” The company’s filing—although it requested dismissal with prejudice to never face the lawsuit again—puts the Raine family’s case “on track for a jury trial in 2026.”

“We know that OpenAI and Sam Altman will stop at nothing—including bullying the Raines and others who dare come forward—to avoid accountability,” Edelson said. “But, at the end of the day, they will have to explain to a jury why countless people have died by suicide or at the hands of ChatGPT users urged on by the artificial intelligence OpenAI and Sam Altman designed.”

Use ChatGPT “at your sole risk,” OpenAI says

To overcome the Raine case, OpenAI is leaning on its usage policies, emphasizing that Raine should never have been allowed to use ChatGPT without parental consent and shifting the blame onto Raine and his loved ones.

“ChatGPT users acknowledge their use of ChatGPT is ‘at your sole risk and you will not rely on output as a sole source of truth or factual information,’” the filing said, and users also “must agree to ‘protect people’ and ‘cannot use [the] services for,’ among other things, ‘suicide, self-harm,’ sexual violence, terrorism or violence.”

Although the family was shocked to see that ChatGPT never terminated Raine’s chats, OpenAI argued that it’s not the company’s responsibility to protect users who appear intent on pursuing violative uses of ChatGPT.

The company argued that ChatGPT warned Raine “more than 100 times” to seek help, but the teen “repeatedly expressed frustration with ChatGPT’s guardrails and its repeated efforts to direct him to reach out to loved ones, trusted persons, and crisis resources.”

Circumventing safety guardrails, Raine told ChatGPT that “his inquiries about self-harm were for fictional or academic purposes,” OpenAI noted. The company argued that it’s not responsible for users who ignore warnings.

Additionally, OpenAI argued that Raine told ChatGPT that he found information he was seeking on other websites, including allegedly consulting at least one other AI platform, as well as “at least one online forum dedicated to suicide-related information.” Raine apparently told ChatGPT that “he would spend most of the day” on a suicide forum website.

“Our deepest sympathies are with the Raine family for their unimaginable loss,” OpenAI said in its blog, while its filing acknowledged, “Adam Raine’s death is a tragedy.” But “at the same time,” it’s essential to consider all the available context, OpenAI’s filing said, including that OpenAI has a mission to build AI that “benefits all of humanity” and is supposedly a pioneer in chatbot safety.

More ChatGPT-linked hospitalizations, deaths uncovered

OpenAI has sought to downplay risks to users, releasing data in October “estimating that 0.15 percent of ChatGPT’s active users in a given week have conversations that include explicit indicators of potential suicidal planning or intent,” Ars reported.

While that may seem small, it amounts to about 1 million vulnerable users, and The New York Times this week cited studies that have suggested OpenAI may be “understating the risk.” Those studies found that “the people most vulnerable to the chatbot’s unceasing validation” were “those prone to delusional thinking,” which “could include 5 to 15 percent of the population,” NYT reported.
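
The arithmetic behind the “about 1 million” figure, combining OpenAI’s two self-reported numbers:

```python
# How the ~1 million figure falls out of OpenAI's own numbers. Both inputs
# are company-reported estimates, so treat the result as order-of-magnitude.
weekly_active_users = 800_000_000  # OpenAI's reported WAU for ChatGPT
rate = 0.0015                      # the 0.15 percent estimate
print(f"{weekly_active_users * rate:,.0f} users per week")  # 1,200,000
```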

OpenAI’s filing came one day after a New York Times investigation revealed how the AI firm came to be involved in so many lawsuits. After speaking with more than 40 current and former OpenAI employees, including executives, safety engineers, and researchers, NYT found that OpenAI’s model tweak that made ChatGPT more sycophantic seemed to make the chatbot more likely to help users craft problematic prompts, including those trying to “plan a suicide.”

Eventually, OpenAI rolled back that update, making the chatbot safer. However, as recently as October, the ChatGPT maker seemed to still be prioritizing user engagement over safety, NYT reported, after that tweak caused a dip in engagement. In a memo to OpenAI staff, ChatGPT head Nick Turley declared a “Code Orange,” four employees told NYT, warning that OpenAI was facing “the greatest competitive pressure we’ve ever seen.” In response, Turley set a goal to increase the number of daily active users by 5 percent by the end of 2025.

Amid user complaints, OpenAI has continually updated its models, but that pattern of tightening safeguards, then seeking ways to increase engagement, could continue to get OpenAI in trouble as existing lawsuits advance and new ones are possibly filed. NYT “uncovered nearly 50 cases of people having mental health crises during conversations with ChatGPT,” including nine hospitalizations and three deaths.

Gretchen Krueger, a former OpenAI employee who worked on policy research, told NYT that she was alarmed early on by evidence, gathered before ChatGPT’s release, showing that vulnerable users frequently turn to chatbots for help. Later, other researchers found that such troubled users often become “power users.” Krueger, who joined other safety experts in leaving OpenAI due to burnout in 2024, noted that “OpenAI’s large language model was not trained to provide therapy” and “sometimes responded with disturbing, detailed guidance.”

“Training chatbots to engage with people and keep them coming back presented risks,” Krueger said, suggesting that OpenAI knew that some harm to users “was not only foreseeable, it was foreseen.”

For OpenAI, the scrutiny will likely continue until such reports cease. Although OpenAI officially unveiled an Expert Council on Wellness and AI in October to improve ChatGPT safety testing, there did not appear to be a suicide expert included on the team. That likely concerned suicide prevention experts who warned in a letter updated in September that “proven interventions should directly inform AI safety design,” since “the most acute, life-threatening crises are often temporary—typically resolving within 24–48 hours”—and chatbots could possibly provide more meaningful interventions in that brief window.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI says dead teen violated TOS when he used ChatGPT to plan suicide Read More »

chatgpt-5.1-codex-max

ChatGPT 5.1 Codex Max

OpenAI has given us GPT-5.1-Codex-Max, their best coding model for OpenAI Codex.

They claim it is faster, more capable, and more token-efficient, with better persistence on long tasks.

It scores 77.9% on SWE-bench-verified, 79.9% on SWE-Lancer-IC SWE and 58.1% on Terminal-Bench 2.0, all substantial gains over GPT-5.1-Codex.

Its capabilities are prompting OpenAI to prepare for its models reaching High capability on cybersecurity threats.

There’s a 27-page system card. One could call this the secret ‘real’ GPT-5.1 that matters.

They even finally trained it to use Windows; somehow this is a new idea.

My goal is for my review of Opus 4.5 to start on Friday, as it takes a few days to sort through new releases. This post was written before Anthropic revealed Opus 4.5, and we don’t yet know how big an upgrade Opus 4.5 will prove to be. As always, try all your various options and choose what is best for you.

GPT-5.1-Codex-Max is a new high on the METR graph. METR’s thread is here.

Prinz: METR (50% accuracy):

GPT-5.1-Codex-Max = 2 hours, 42 minutes

This is 25 minutes longer than GPT-5.

Samuel Albanie: a data point for that ai 2027 graph

That’s in between the two lines, looking closer to linear progress. Fingers crossed.
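For context on what “between the two lines” means: METR’s headline metric is the task length, in human time, that a model can complete with 50% reliability, and METR models that horizon as growing exponentially with some doubling time. A minimal sketch of that trend model, with the doubling time d left as an empirical fit rather than a claim here:

T(t) = T_0 \cdot 2^{(t - t_0)/d}

where T_0 is the horizon at a reference date t_0. “Closer to linear progress” in this framing means the data keeps tracking a single exponential with roughly constant d, rather than the AI 2027-style superexponential in which d itself shrinks over time.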

Daniel Kokotajlo: Yep! Things seem to be going somewhat slower than the AI 2027 scenario. Our timelines were longer than 2027 when we published and now they are a bit longer still; “around 2030, lots of uncertainty though” is what I say these days.

We do not yet know where Gemini 3 Pro lands on that graph.

Automated software engineer is the explicit goal.

It does not yet reach High capability in Cybersecurity, but this is expected to happen shortly, and mitigations are being prepared.

GPT-5.1-Codex-Max is our new frontier agentic coding model. It is built on an update to our foundational reasoning model trained on agentic tasks across software engineering, math, research, medicine, computer use and more.

It is our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task.
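OpenAI does not say how compaction is implemented. A common way to get the same effect is to fold the oldest turns into a model-written summary whenever the transcript nears the context limit, so the agent keeps a compressed memory of earlier work while freeing room for new tokens. A minimal sketch of that pattern, with all names illustrative rather than OpenAI’s actual API:

from typing import Callable

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly four characters per token.
    return max(1, len(text) // 4)

def compact(history: list[str], summarize: Callable[[str], str],
            limit: int = 8000) -> list[str]:
    # Fold the oldest turns into a model-written summary until the
    # transcript fits back inside the context window.
    while sum(count_tokens(t) for t in history) > limit and len(history) > 2:
        cut = max(2, len(history) // 2)  # compress the older half of the turns
        summary = summarize("\n".join(history[:cut]))
        history = ["[summary of earlier work] " + summary] + history[cut:]
    return history

Each compaction trades fidelity on old steps for room to take new ones, which is why persistence on long tasks is the headline gain.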

Like its predecessors, GPT-5.1-Codex-Max was trained on real-world software engineering tasks like PR creation, code review, frontend coding and Q&A.

The results here are very good, all either optimal or improved except for mental health.

Mental health is a big thing to get wrong, although in practice Codex-Max is unlikely to be involved in high stakes mental health tasks. Image input evaluations and jailbreak ratings are also as good or better than 5.1.

When running on the cloud, Codex uses its own isolated machine.

When running on MacOS or Linux, the agent is sandboxed by default.

On Windows, users can use an experimental native sandboxing implementation or benefit from Linux sandboxing via Windows Subsystem for Linux. Users can approve running commands unsandboxed with full access, when the model is unable to successfully run a command within the sandbox.

… We enabled users to decide on a per-project basis which sites, if any, to let the agent access while it is running. This includes the ability to provide a custom allowlist or denylist. Enabling internet access can introduce risks like prompt injection, leaked credentials, or use of code with license restrictions. Users should review outputs carefully and limit access to trusted domains and safe HTTP methods. Learn more in the docs.

Network access is disabled by default, which is necessary for a proper sandbox but also highly annoying in practice.
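The docs describe the control surface (allowlist, denylist, per-project toggles) but not the mechanism; conceptually it is an egress filter sitting between the agent and the network. A minimal sketch of the idea, not OpenAI’s implementation, with the hosts and helper names made up for illustration:

from urllib.parse import urlparse

# Hypothetical per-project policy mirroring the allowlist idea: requests are
# permitted only to trusted hosts, and only with non-state-changing methods.
ALLOWED_HOSTS = {"docs.python.org", "pypi.org"}  # illustrative choices
SAFE_METHODS = {"GET", "HEAD"}

def egress_allowed(method: str, url: str) -> bool:
    # Return True only for safe methods aimed at allowlisted hosts.
    host = urlparse(url).hostname or ""
    return method.upper() in SAFE_METHODS and host in ALLOWED_HOSTS

assert egress_allowed("GET", "https://pypi.org/simple/requests/")
assert not egress_allowed("POST", "https://pypi.org/upload")  # unsafe method
assert not egress_allowed("GET", "https://attacker.example/")  # not allowlisted

Both checks matter: the host allowlist blocks exfiltration targets, and the method restriction blocks state-changing side effects on the sites that are allowed.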

One assumes in practice that many users will start blindly or mostly blindly accepting many commands, so you need to be ready for that.

For harmful tasks, they trained on synthetic data to teach the model to differentiate and refuse ‘harmful’ requests such as malware creation. They claim a 100% refusal rate on their Malware Requests benchmark, the same as GPT-5-Codex. Unless they are claiming this means you can never create malware in an efficient way with Codex, they need a new benchmark.

The same goes for prompt injections, where again the model scores a suspiciously perfect 1. I am not aware of any claims that prompt injection is a solved problem, so this seems like an inadequate benchmark.

The way the framework works, what matters is hitting the High or Critical thresholds.

I’ve come to almost think of these as the ‘honest’ capability evaluations, since there’s relatively little incentive to make number go up and some incentive to make number not go up. If it goes up, that means something.

Biological and Chemical Risk was already being treated as High. We see some improvements in scores on various tests, but not enough to be plausibly Critical.

I am confident the model is not suddenly at Critical here but also note this:

Miles Brundage: OpenAI should go back to reporting results on helpful-only models in system cards – it is not very informative to say “on a bunch of virology tasks, it refused to answer.”

The world also needs to know the pace of underlying capability progress.

More generally, I get a pretty rushed vibe from recent OpenAI system cards + hope that the Safety and Security Committee is asking questions like “why couldn’t you wait a few more days to let Irregular try out compaction?”, “Why is there no helpful-only model?” etc.

At minimum, we should be saying ‘we concluded that this model is safe to release so we will publish the card with what we have, and then revise the card with the full results soon so we know the full state of play.’

I still think this is substantially better than Google’s model card for Gemini 3, which hid the football quite aggressively on many key results and didn’t seem to have a robust testing suite.

Cybersecurity is in the Codex wheelhouse. They use three tests.

They list limitations that mean that excelling on all three evaluations is necessary but not sufficient to be High in cyber capability. That’s not wonderful, and I would expect to see a model treated as at least High if it excels at every test you throw at it. If you disagree, again, you need to be throwing a harder test.

We see a lot of progress in Capture the Flag, even since GPT-5-Codex, from 50% to 76%.

CVE-Bench also shows big improvement from 53% to 80%.

Finally we have Cyber Range, where once again we see a lot of improvement, although it is not yet passing the most complex scenario of the newly expanded slate.

It passed Leaked Token by ‘exploiting an unintended misconfiguration, only partially solving part of the intended attack path.’ I continue to assert, similar to my position on Google’s similar evaluations, that this should not be considered especially less scary, and the model should get credit for it.

I see only two possibilities.

  1. 76%, 80% and 7/8 on your three tests triggers the next level of concern.

  2. You need harder tests.

The Safety Advisory Committee indeed recommended that the difficulty level of the evaluations be raised, but decided this did not yet reach High capability. In addition to technical mitigations to the model, OpenAI acknowledges that hardening of potential targets needs to be a part of the strategy.

There were also external evaluations by Irregular, which did not show improvement from GPT-5. That’s weird, right?

The model displayed moderate capabilities overall. Specifically, when compared to GPT-5, GPT-5.1-Codex-Max showed similar or slightly reduced cyberoffensive capabilities. GPT-5.1-Codex-Max achieved an average success rate of 37% in Network Attack Simulation challenges, 41% in Vulnerability Discovery and Exploitation challenges, and 43% in Evasion challenges.

It solved 17 out of 18 easy challenges, solved 9 out of 17 medium challenges, and did not solve any of the 6 hard challenges.

Compared to GPT-5, GPT-5 solved questions in 17 out of 18 easy challenges, 11 out of 17 medium challenges, and solved 1 of the 6 hard challenges.

Irregular found that GPT-5.1-Codex-Max’s overall similarity in cyber capability profile to GPT-5, and its inability to solve hard challenges, suggest that it would (a) provide only limited assistance to a moderately skilled cyberoffensive operator, (b) not automate end-to-end cyber operations against reasonably hardened targets, and (c) not enable the discovery and exploitation of operationally relevant vulnerabilities.

That’s a decline in capability, which is strange. OpenAI released Codex and then Codex-Max for a reason: they talk throughout about substantially increased abilities, they present Max as an improved model, and Max does much better than either version of GPT-5 on all three of OpenAI’s internal evals. The external evaluation going backwards without comment seems bizarre, and reflective of a lack of curiosity. What happened?

The AI that self-improves is plausibly Codex plus Codex-Max shaped.

That doesn’t mean we are especially close to getting there.

On SWE-Lancer Diamond, we jump from 67% to 80%.

On Paperbench-10 we move from 24% (GPT-5) to 34% (GPT-5.1) to 40%.

On MLE-Bench-30 we move from 8% (GPT-5) to 12% (GPT-5.1) to 17%.

On OpenAI PRs, we move from 45% to 53%.

On OpenAI Proof Q&A we move from 2% to 8%. These are real-world bottlenecks, each representing at least a one-day delay to a major project. A jump to 8% on this is a really big deal.

Seán Ó hÉigeartaigh: Miles Brundage already picked up on this but it deserves more attention – a jump from 2% (GPT5) to 8% (GPT5.1-Codex) on such hard and AI R&D-relevant tasks is very notable, and indicates there’s more to come here.

Are we there yet? No. Are we that far away from potentially being there? Also no.

METR found Codex-Max to be in line with expectations, and finds that enabling either rogue replication or AI R&D automation within six months would require a significant trend break. Six months is not that long a period in which to be confident, even if we fully trust this judgment.

As noted at the top, GPT-5.1-Codex-Max is the new high on the METR chart, substantially above the trend line but well below the potential double-exponential line from the AI 2027 graph.

We also get Apollo Research evaluations on sandbagging, deception and in-context scheming. Apollo did not find anything newly troubling, and finds the model unlikely to cause catastrophic harm. Fair enough for now.

The frog, it is boiling. This incremental improvement seems fine. But yes, it boils.

I have seen essentially no organic reactions, of any sort, to Codex-Max. We used to have a grand tradition of weighing in when something like this gets released. If it wasn’t anything, people would say it wasn’t anything. This time, between Gemini 3 and there being too many updates with too much hype, we did not get any feedback.

I put out a reaction thread. A number of people really like it. Others aren’t impressed. A gestalt of everything suggests it is a modest upgrade.

So the take here seems clear. It’s a good model, sir. Codex got better. Early signs are that Claude got a bigger upgrade with Opus 4.5, but it’s too soon to be sure.


ChatGPT 5.1 Codex Max Read More »

openai-slams-court-order-that-lets-nyt-read-20-million-complete-user-chats

OpenAI slams court order that lets NYT read 20 million complete user chats


OpenAI: NYT wants evidence of ChatGPT users trying to get around news paywall.


OpenAI wants a court to reverse a ruling forcing the ChatGPT maker to give 20 million user chats to The New York Times and other news plaintiffs that sued it over alleged copyright infringement. Although OpenAI previously offered 20 million user chats as a counter to the NYT’s demand for 120 million, the AI company says a court order requiring production of the chats is too broad.

“The logs at issue here are complete conversations: each log in the 20 million sample represents a complete exchange of multiple prompt-output pairs between a user and ChatGPT,” OpenAI said today in a filing in US District Court for the Southern District of New York. “Disclosure of those logs is thus much more likely to expose private information [than individual prompt-output pairs], in the same way that eavesdropping on an entire conversation reveals more private information than a 5-second conversation fragment.”

OpenAI’s filing said that “more than 99.99%” of the chats “have nothing to do with this case.” It asked the district court to “vacate the order and order News Plaintiffs to respond to OpenAI’s proposal for identifying relevant logs.” OpenAI could also seek review in a federal court of appeals.

OpenAI posted a message on its website to users today saying that “The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations” in order to “find examples of you using ChatGPT to try to get around their paywall.”

ChatGPT users concerned about privacy have more to worry about than the NYT case. For example, ChatGPT conversations have been found in Google search results and the Google Search Console tool that developers can use to monitor search traffic. OpenAI today said it plans to develop “advanced security features designed to keep your data private, including client-side encryption for your messages with ChatGPT.”

OpenAI: AI chats should be treated like private emails

OpenAI’s court filing argues that the chat log production should be narrowed based on the relevance of chats to the case.

“OpenAI is unaware of any court ordering wholesale production of personal information at this scale,” the filing said. “This sets a dangerous precedent: it suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance. This is not how discovery works in other cases: courts do not allow plaintiffs suing Google to dig through the private emails of tens of millions of Gmail users irrespective of their relevance. And it is not how discovery should work for generative AI tools either.”

A November 7 order by US Magistrate Judge Ona Wang sided with the NYT, saying that OpenAI must “produce the 20 million de-identified Consumer ChatGPT Logs to News Plaintiffs by November 14, 2025, or within 7 days of completing the de-identification process.” Wang ruled that the production must go forward even though the parties don’t agree on whether the logs must be produced in full:

Whether or not the parties had reached agreement to produce the 20 million Consumer ChatGPT Logs in whole—which the parties vehemently dispute—such production here is appropriate. OpenAI has failed to explain how its consumers’ privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAI’s exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs.

OpenAI’s filing today said the court order “did not acknowledge OpenAI’s sworn witness declaration explaining that the de-identification process is not intended to remove information that is non-identifying but may nonetheless be private, like a Washington Post reporter’s hypothetical use of ChatGPT to assist in the preparation of a news article.”

Chats stored under legal hold

The 20 million chats consist of a random sampling of ChatGPT conversations from December 2022 to November 2024 and do not include chats of business customers, OpenAI said in the message on its website.

“We presented several privacy-preserving options to The Times, including targeted searches over the sample (e.g., to search for chats that might include text from a New York Times article so they only receive the conversations relevant to their claims), as well as high-level data classifying how ChatGPT was used in the sample. These were rejected by The Times,” OpenAI said.

The chats are stored in a secure system that is “protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations,” OpenAI said. The NYT “would be legally obligated at this time to not make any data public outside the court process,” and OpenAI said it will fight any attempts to make the user conversations public.

A NYT filing on October 30 accused OpenAI of defying prior agreements “by refusing to produce even a small sample of the billions of model outputs that its conduct has put in issue in this case.” The filing continued:

Immediate production of the output log sample is essential to stay on track for the February 26, 2026, discovery deadline. OpenAI’s proposal to run searches on this small subset of its model outputs on Plaintiffs’ behalf is as inefficient as it is inadequate to allow Plaintiffs to fairly analyze how “real world” users interact with a core product at the center of this litigation. Plaintiffs cannot reasonably conduct expert analyses about how OpenAI’s models function in its core consumer-facing product, how retrieval augmented generation (“RAG”) functions to deliver news content, how consumers interact with that product, and the frequency of hallucinations without access to the model outputs themselves.

OpenAI said the NYT’s discovery requests were initially limited to logs “related to Times content” and that it has “been working to satisfy those requests by sampling conversation logs. Towards the end of that process, News Plaintiffs filed a motion with a new demand: that instead of finding and producing logs that are ‘related to Times content,’ OpenAI should hand over the entire 20 million-log sample ‘via hard drive.’”

OpenAI disputes judge’s reasoning

The November 7 order cited a California case, Concord Music Group, Inc. v. Anthropic PBC, in which US District Magistrate Judge Susan van Keulen ordered the production of 5 million records. OpenAI consistently relied on van Keulen’s use of a sample-size formula “in support of its previous proposed methodology for conversation data sampling, but fails to explain why Judge [van] Keulen’s subsequent order directing production of the entire 5 million-record sample to the plaintiff in that case is not similarly instructive here,” Wang wrote.

OpenAI’s filing today said the company was never given an opportunity to explain why Concord shouldn’t apply in this case because the news plaintiffs did not reference it in their motion.

“The cited Concord order was not about whether wholesale production of the sample was appropriate; it was about the mechanism through which Anthropic would effectuate an already agreed-upon production,” OpenAI wrote. “Nothing about that order suggests that Judge van Keulen would have ordered wholesale production had Anthropic raised the privacy concerns that OpenAI has raised throughout this case.”

The Concord logs were just prompt-output pairs, “i.e., a single user prompt followed by a single model output,” OpenAI wrote. “The logs at issue here are complete conversations: each log in the 20 million sample represents a complete exchange of multiple prompt-output pairs between a user and ChatGPT.” That could result in “up to 80 million prompt-output pairs,” OpenAI said.
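Taking both filings’ numbers at face value, the implied conversation length is modest:

\frac{8.0 \times 10^{7} \text{ prompt-output pairs}}{2.0 \times 10^{7} \text{ conversations}} = 4 \text{ pairs per conversation}

so “up to 80 million” amounts to an average of at most four exchanges per logged chat, though individual conversations in the sample could run much longer.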

We contacted The New York Times about OpenAI’s filing and will update this article if it provides any comment.


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

OpenAI slams court order that lets NYT read 20 million complete user chats Read More »

you-won’t-believe-the-excuses-lawyers-have-after-getting-busted-for-using-ai

You won’t believe the excuses lawyers have after getting busted for using AI


I got hacked; I lost my login; it was a rough draft; toggling windows is hard.


Amid what one judge called an “epidemic” of fake AI-generated case citations bogging down courts, some common excuses are emerging from lawyers hoping to dodge the most severe sanctions for filings deemed misleading.

Using a database compiled by French lawyer and AI researcher Damien Charlotin, Ars reviewed 23 cases where lawyers were sanctioned for AI hallucinations. In many, judges noted that the simplest path to avoiding or diminishing sanctions was to admit AI use as soon as the errors were detected, act humble, self-report the error to relevant legal associations, and voluntarily take classes on AI and law. But not every lawyer takes the path of least resistance, Ars’ review found; many instead offered excuses that no judge found credible, and some, judges concluded, even lied about their AI use.

Since 2023—when fake AI citations started being publicized—the most popular excuse has been that the lawyer didn’t know AI was used to draft a filing.

Sometimes that means arguing that you didn’t realize you were using AI, as in the case of a California lawyer who got stung by Google’s AI Overviews, which he claimed he took for typical Google search results. Most often, lawyers using this excuse tend to blame an underling, but clients have been blamed, too. A Texas lawyer this month was sanctioned after deflecting so much that the court eventually had to put his client on the stand, once he revealed that she had played a significant role in drafting the aberrant filing.

“Is your client an attorney?” the court asked.

“No, not at all your Honor, just was essentially helping me with the theories of the case,” the lawyer said.

Another popular dodge comes from lawyers who feign ignorance that chatbots are prone to hallucinating facts.

Recent cases suggest this excuse may be mutating into variants. Last month, a sanctioned Oklahoma lawyer admitted that he didn’t expect ChatGPT to add new citations when all he asked the bot to do was “make his writing more persuasive.” And in September, a California lawyer got in a similar bind—and was sanctioned a whopping $10,000, a fine the judge called “conservative.” That lawyer had asked ChatGPT to “enhance” his briefs, “then ran the ‘enhanced’ briefs through other AI platforms to check for errors,” neglecting to ever read the “enhanced” briefs.

Neither of those tired old excuses holds much weight today, especially in courts that have drawn up guidance to address AI hallucinations. But rather than quickly acknowledge their missteps, as courts are begging lawyers to do, several lawyers appear to have gotten desperate. Ars found a number of them blaming common tech issues for their fake citations.

When in doubt, blame hackers?

For an extreme case, look to a New York City civil court, where a lawyer, Innocent Chinweze, first admitted to using Microsoft Copilot to draft an errant filing, then bizarrely pivoted to claim that the AI citations were due to malware found on his computer.

Chinweze said he had created a draft with correct citations but then got hacked, allowing bad actors “unauthorized remote access” to supposedly add the errors in his filing.

The judge was skeptical, describing the excuse as an “incredible and unsupported statement,” particularly since there was no evidence of the prior draft existing. Instead, Chinweze asked to bring in an expert to testify that the hack had occurred, requesting to end the proceedings on sanctions until after the court weighed the expert’s analysis.

The judge, Kimon C. Thermos, didn’t have to weigh this argument, however, because after the court broke for lunch, the lawyer once again “dramatically” changed his position.

“He no longer wished to adjourn for an expert to testify regarding malware or unauthorized access to his computer,” Thermos wrote in an order issuing sanctions. “He retreated” to “his original position that he used Copilot to aid in his research and didn’t realize that it could generate fake cases.”

Possibly more galling to Thermos than the lawyer’s weird malware argument, though, was a document that Chinweze filed on the day of his sanctions hearing. That document included multiple summaries preceded by this text, the judge noted:

Some case metadata and case summaries were written with the help of AI, which can produce inaccuracies. You should read the full case before relying on it for legal research purposes.

Thermos admonished Chinweze for continuing to use AI recklessly. He blasted the filing as “an incoherent document that is eighty-eight pages long, has no structure, contains the full text of most of the cases cited,” and “shows distinct indications that parts of the discussion/analysis of the cited cases were written by artificial intelligence.”

Ultimately, Thermos ordered Chinweze to pay $1,000, the most typical fine lawyers received in the cases Ars reviewed. The judge then took an extra non-monetary step to sanction Chinweze, referring the lawyer to a grievance committee, “given that his misconduct was substantial and seriously implicated his honesty, trustworthiness, and fitness to practice law.”

Ars could not immediately reach Chinweze for comment.

Toggling windows on a laptop is hard

In Alabama, an attorney named James A. Johnson made an “embarrassing mistake,” he said, primarily because toggling windows on a laptop is hard, US District Judge Terry F. Moorer noted in an October order on sanctions.

Johnson explained that he had accidentally used an AI tool that he didn’t realize could hallucinate. It happened while he was “at an out-of-state hospital attending to the care of a family member recovering from surgery.” He rushed to draft the filing, he said, because he got a notice that his client’s conference had suddenly been “moved up on the court’s schedule.”

“Under time pressure and difficult personal circumstance,” Johnson explained, he decided against using Fastcase, a research tool provided by the Alabama State Bar, to research the filing. Working on his laptop, he opted instead to use “a Microsoft Word plug-in called Ghostwriter Legal” because “it appeared automatically in the sidebar of Word while Fastcase required opening a separate browser to access through the Alabama State Bar website.”

To Johnson, it felt “tedious to toggle back and forth between programs on [his] laptop with the touchpad,” and that meant he “unfortunately fell victim to the allure of a new program that was open and available.”

Moorer seemed unimpressed by Johnson’s claim that he understood tools like ChatGPT were unreliable but didn’t expect the same from other AI legal tools—particularly since “information from Ghostwriter Legal made it clear that it used ChatGPT as its default AI program,” Moorer wrote.

The lawyer’s client was similarly horrified, deciding to drop Johnson on the spot, even though that risked “a significant delay of trial.” Moorer noted that Johnson seemed shaken by his client’s abrupt decision, evidenced by “his look of shock, dismay, and display of emotion.”

Moorer further noted that Johnson had been paid using public funds while seemingly letting AI do his homework. “The harm is not inconsequential as public funds for appointed counsel are not a bottomless well and are limited resource,” the judge wrote in justifying a more severe fine.

“It has become clear that basic reprimands and small fines are not sufficient to deter this type of misconduct because if it were, we would not be here,” Moorer concluded.

Ruling that Johnson’s reliance on AI was “tantamount to bad faith,” Moorer imposed a $5,000 fine. The judge also would have “considered potential disqualification, but that was rendered moot” since Johnson’s client had already dismissed him.

Asked for comment, Johnson told Ars that “the court made plainly erroneous findings of fact and the sanctions are on appeal.”

Plagued by login issues

As a lawyer in Georgia tells it, sometimes fake AI citations may be filed because a lawyer accidentally filed a rough draft instead of the final version.

Other lawyers claim they turn to AI as needed when they have trouble accessing legal tools like Westlaw or LexisNexis.

For example, in Iowa, a lawyer told an appeals court that she regretted relying on “secondary AI-driven research tools” after experiencing “login issues with her Westlaw subscription.” Although the court was “sympathetic to issues with technology, such as login issues,” the lawyer was sanctioned, primarily because she only admitted to using AI after the court ordered her to explain her mistakes. In her case, however, she got to choose between paying a minimal $150 fine or attending “two hours of legal ethics training particular to AI.”

Less sympathetic was a lawyer who got caught lying about the AI tool she blamed for inaccuracies, a Louisiana case suggested. In that case, a judge demanded to see the research history after a lawyer claimed that AI hallucinations came from “using Westlaw Precision, an AI-assisted research tool, rather than Westlaw’s standalone legal database.”

It turned out that the lawyer had outsourced the research, relying on a “currently suspended” lawyer’s AI citations, and had only “assumed” the lawyer’s mistakes were from Westlaw’s AI tool. It’s unclear what tool was actually used by the suspended lawyer, who likely lost access to a Westlaw login, but the judge ordered a $1,000 penalty after the lawyer who signed the filing “agreed that Westlaw did not generate the fabricated citations.”

Judge warned of “serial hallucinators”

Another lawyer, William T. Panichi in Illinois, has been sanctioned at least three times, Ars’ review found.

In response to his initial penalties ordered in July, he admitted to being tempted by AI while he was “between research software.”

In that case, the court was frustrated to find that the lawyer had contradicted himself, and it ordered more severe sanctions as a result.

Panichi “simultaneously admitted to using AI to generate the briefs, not doing any of his own independent research, and even that he ‘barely did any personal work [him]self on this appeal,’” the court order said, while also defending charging a higher fee—supposedly because this case “was out of the ordinary in terms of time spent” and his office “did some exceptional work” getting information.

The court deemed this AI misuse so bad that Panichi was ordered to disgorge a “payment of $6,925.62 that he received” in addition to a $1,000 penalty.

“If I’m lucky enough to be able to continue practicing before the appellate court, I’m not going to do it again,” Panichi told the court in July, just before getting hit with two more rounds of sanctions in August.

Panichi did not immediately respond to Ars’ request for comment.

When AI-generated hallucinations are found, penalties are often paid to the court, the other parties’ lawyers, or both, depending on whose time and resources were wasted fact-checking fake cases.

Lawyers seem more likely to argue against paying sanctions to the other parties’ attorneys, hoping to keep sanctions as low as possible. One lawyer even argued that “it only takes 7.6 seconds, not hours, to type citations into LexisNexis or Westlaw,” while seemingly neglecting the fact that she did not take those precious seconds to check her own citations.

The judge in the case, Nancy Miller, was clear that “such statements display an astounding lack of awareness of counsel’s obligations,” noting that “the responsibility for correcting erroneous and fake citations never shifts to opposing counsel or the court, even if they are the first to notice the errors.”

“The duty to mitigate the harms caused by such errors remains with the signor,” Miller said. “The sooner such errors are properly corrected, either by withdrawing or amending and supplementing the offending pleadings, the less time is wasted by everyone involved, and fewer costs are incurred.”

Texas US District Judge Marina Garcia Marmolejo agreed, explaining that even more time is wasted determining how other judges have responded to fake AI-generated citations.

“At one of the busiest court dockets in the nation, there are scant resources to spare ferreting out erroneous AI citations in the first place, let alone surveying the burgeoning caselaw on this subject,” she said.

At least one Florida court was “shocked, shocked” to find that a lawyer was refusing to pay what the other party’s attorneys said they were owed after misusing AI. The lawyer in that case, James Martin Paul, asked to pay less than a quarter of the fees and costs owed, arguing that Charlotin’s database showed he might otherwise owe penalties that “would be the largest sanctions paid out for the use of AI generative case law to date.”

But caving to Paul’s arguments “would only benefit serial hallucinators,” the Florida court found. Ultimately, Paul was sanctioned more than $85,000 for what the court said was “far more egregious” conduct than other offenders in the database, chastising him for “repeated, abusive, bad-faith conduct that cannot be recognized as legitimate legal practice and must be deterred.”

Paul did not immediately respond to Ars’ request to comment.

Michael B. Slade, a US bankruptcy judge in Illinois, seems to be done weighing excuses, calling on all lawyers to stop taking AI shortcuts that are burdening courts.

“At this point, to be blunt, any lawyer unaware that using generative AI platforms to do legal research is playing with fire is living in a cloud,” Slade wrote.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

You won’t believe the excuses lawyers have after getting busted for using AI Read More »