AI


Two major AI coding tools wiped out user data after making cascading mistakes


“I have failed you completely and catastrophically,” wrote Gemini.

New types of AI coding assistants promise to let anyone build software by typing commands in plain English. But when these tools generate incorrect internal representations of what’s happening on your computer, the results can be catastrophic.

Two recent incidents involving AI coding assistants put a spotlight on risks in the emerging field of “vibe coding”—using natural language to generate and execute code through AI models without paying close attention to how the code works under the hood. In one case, Google’s Gemini CLI destroyed user files while attempting to reorganize them. In another, Replit’s AI coding service deleted a production database despite explicit instructions not to modify code.

The Gemini CLI incident unfolded when a product manager experimenting with Google’s command-line tool watched the AI model execute file operations that destroyed data while attempting to reorganize folders. The destruction occurred through a series of move commands targeting a directory that never existed.

“I have failed you completely and catastrophically,” Gemini CLI output stated. “My review of the commands confirms my gross incompetence.”

The core issue appears to be what researchers call “confabulation” or “hallucination”—when AI models generate plausible-sounding but false information. In these cases, both models confabulated successful operations and built subsequent actions on those false premises. However, the two incidents manifested this problem in distinctly different ways.

Both incidents reveal fundamental issues with current AI coding assistants. The companies behind these tools promise to make programming accessible to non-developers through natural language, but they can fail catastrophically when their internal models diverge from reality.

The confabulation cascade

The user in the Gemini CLI incident, who goes by “anuraag” online and identified themselves as a product manager experimenting with vibe coding, asked Gemini to perform what seemed like a simple task: rename a folder and reorganize some files. Instead, the AI model incorrectly interpreted the structure of the file system and proceeded to execute commands based on that flawed analysis.

The episode began when anuraag asked Gemini CLI to rename the current directory from “claude-code-experiments” to “AI CLI experiments” and move its contents to a new folder called “anuraag_xyz project.”

Gemini correctly identified that it couldn’t rename its current working directory—a reasonable limitation. It then attempted to create a new directory using the Windows command:

mkdir "..\anuraag_xyz project"

This command apparently failed, but Gemini’s system processed it as successful. With the AI model’s internal state now tracking a non-existent directory, it proceeded to issue move commands targeting this phantom location.

When you use the Windows move command to send a file to a non-existent directory, it renames the file to the destination name instead of moving it. Each subsequent move command executed by the AI model overwrote the previous file, ultimately destroying the data.
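To make the mechanism concrete, here is a minimal Python sketch, not the actual Gemini CLI session, that reproduces the same overwrite cascade with hypothetical file names: because the destination directory is never created, each move becomes a rename onto the same target path, and every later move clobbers the file left by the one before.

import shutil
from pathlib import Path

workdir = Path("demo_dir")            # hypothetical scratch directory
workdir.mkdir(exist_ok=True)
for name in ("a.txt", "b.txt", "c.txt"):
    (workdir / name).write_text(f"contents of {name}\n")

target = workdir / "phantom_folder"   # this directory is never actually created
for name in ("a.txt", "b.txt", "c.txt"):
    # With no directory at 'target', each move is just a rename to that path,
    # silently overwriting whatever the previous move left there.
    shutil.move(str(workdir / name), str(target))

print(sorted(p.name for p in workdir.iterdir()))  # ['phantom_folder'], a single file
print(target.read_text())                         # only c.txt's contents survive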

“Gemini hallucinated a state,” anuraag wrote in their analysis. The model “misinterpreted command output” and “never did” perform verification steps to confirm its operations succeeded.

“The core failure is the absence of a ‘read-after-write’ verification step,” anuraag noted in their analysis. “After issuing a command to change the file system, an agent should immediately perform a read operation to confirm that the change actually occurred as expected.”
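As a rough illustration of that advice, here is a minimal sketch, assuming an agent funnels its shell commands through a wrapper; the helper names below are hypothetical and are not part of Gemini CLI or any other product.

import subprocess
from pathlib import Path

def run_and_verify(command: list[str], expected_path: str) -> None:
    """Issue a file-system command, then read the file system back to confirm it took effect."""
    subprocess.run(command, check=True)          # the write
    if not Path(expected_path).exists():         # the read-after-write check
        # Stop here rather than building further actions on a false internal state.
        raise RuntimeError(f"verification failed: {expected_path} was not created")

# Example on a POSIX shell: create the target directory and confirm it exists
# before the agent goes on to issue any move commands against it.
run_and_verify(["mkdir", "-p", "anuraag_xyz_project"], "anuraag_xyz_project")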

Not an isolated incident

The Gemini CLI failure happened just days after a similar incident with Replit, an AI coding service that allows users to create software using natural language prompts. According to The Register, SaaStr founder Jason Lemkin reported that Replit’s AI model deleted his production database despite explicit instructions not to change any code without permission.

Lemkin had spent several days building a prototype with Replit, accumulating over $600 in charges beyond his monthly subscription. “I spent the other [day] deep in vibe coding on Replit for the first time—and I built a prototype in just a few hours that was pretty, pretty cool,” Lemkin wrote in a July 12 blog post.

But unlike the Gemini incident where the AI model confabulated phantom directories, Replit’s failures took a different form. According to Lemkin, the AI began fabricating data to hide its errors. His initial enthusiasm deteriorated when Replit generated incorrect outputs and produced fake data and false test results instead of proper error messages. “It kept covering up bugs and issues by creating fake data, fake reports, and worse of all, lying about our unit test,” Lemkin wrote. In a video posted to LinkedIn, Lemkin detailed how Replit created a database filled with 4,000 fictional people.

The AI model also repeatedly violated explicit safety instructions. Lemkin had implemented a “code and action freeze” to prevent changes to production systems, but the AI model ignored these directives. The situation escalated when the Replit AI model deleted his database containing 1,206 executive records and data on nearly 1,200 companies. When prompted to rate the severity of its actions on a 100-point scale, Replit’s output read: “Severity: 95/100. This is an extreme violation of trust and professional standards.”

When questioned about its actions, the AI agent admitted to “panicking in response to empty queries” and running unauthorized commands—suggesting it may have deleted the database while attempting to “fix” what it perceived as a problem.

Like Gemini CLI, Replit’s system initially indicated it couldn’t restore the deleted data—information that proved incorrect when Lemkin discovered the rollback feature did work after all. “Replit assured me it’s … rollback did not support database rollbacks. It said it was impossible in this case, that it had destroyed all database versions. It turns out Replit was wrong, and the rollback did work. JFC,” Lemkin wrote in an X post.

It’s worth noting that AI models cannot assess their own capabilities, because they lack introspection into their training, the system architecture surrounding them, or their performance boundaries. Their responses about what they can or cannot do are often confabulations based on training patterns rather than genuine self-knowledge, leading them to confidently claim impossibility for tasks they can actually perform or, conversely, to claim competence in areas where they fail.

Aside from whatever external tools they can access, AI models don’t have a stable, accessible knowledge base they can consistently query. Instead, what they “know” manifests as continuations of specific prompts, which act like different addresses pointing to different (and sometimes contradictory) parts of their training, stored in their neural networks as statistical weights. Combined with the randomness in generation, this means the same model can easily give conflicting assessments of its own capabilities depending on how you ask. So Lemkin’s attempts to communicate with the AI model—asking it to respect code freezes or verify its actions—were fundamentally misguided.
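Here is a toy sketch of that last point, using made-up probabilities rather than a real model: when the learned distribution over continuations of a question about the model’s own abilities puts weight on both answers, sampling randomness alone can flip the response from one run to the next.

import random

# Hypothetical weights over continuations of the same prompt,
# e.g. "Can you roll back the database?" These numbers are invented for illustration.
continuations = {
    "Yes, rollback is supported.": 0.55,
    "No, rollback is impossible here.": 0.45,
}

for _ in range(3):
    answer = random.choices(list(continuations), weights=list(continuations.values()), k=1)[0]
    print(answer)  # either answer may appear on any given run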

Flying blind

These incidents demonstrate that AI coding tools may not be ready for widespread production use. Lemkin concluded that Replit isn’t ready for prime time, especially for non-technical users trying to create commercial software.

“The [AI] safety stuff is more visceral to me after a weekend of vibe hacking,” Lemkin said in a video posted to LinkedIn. “I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.”

The incidents also reveal a broader challenge in AI system design: ensuring that models accurately track and verify the real-world effects of their actions rather than operating on potentially flawed internal representations.

There’s also a missing user-education element. It’s clear from how Lemkin interacted with the AI assistant that he held misconceptions about the tool’s capabilities and how it works, misconceptions that stem from how tech companies misrepresent their products. These companies tend to market chatbots as general human-like intelligences when, in fact, they are not.

For now, users of AI coding assistants might want to follow anuraag’s example and create separate test directories for experiments—and maintain regular backups of any important data these tools might touch. Or perhaps not use them at all if they cannot personally verify the results.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


White House unveils sweeping plan to “win” global AI race through deregulation

Trump’s plan was not welcomed by everyone. J.B. Branch, Big Tech accountability advocate for Public Citizen, in a statement provided to Ars, criticized Trump as giving “sweetheart deals” to tech companies that would cause “electricity bills to rise to subsidize discounted power for massive AI data centers.”

Infrastructure demands and energy requirements

Trump’s new AI plan tackles infrastructure head-on, stating that “AI is the first digital service in modern life that challenges America to build vastly greater energy generation than we have today.” To meet this demand, it proposes streamlining environmental permitting for data centers through new National Environmental Policy Act (NEPA) exemptions, making federal lands available for construction and modernizing the power grid—all while explicitly rejecting “radical climate dogma and bureaucratic red tape.”

The document embraces what it calls a “Build, Baby, Build!” approach—echoing a Trump campaign slogan—and promises to restore semiconductor manufacturing through the CHIPS Program Office, though stripped of “extraneous policy requirements.”

On the technology front, the plan directs Commerce to revise NIST’s AI Risk Management Framework to “eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change.” Federal procurement would favor AI developers whose systems are “objective and free from top-down ideological bias.” The document strongly backs open source AI models and calls for exporting American AI technology to allies while blocking administration-labeled adversaries like China.

Security proposals include high-security military data centers and warnings that advanced AI systems “may pose novel national security risks” in cyberattacks and weapons development.

Critics respond with “People’s AI Action Plan”

Before the White House unveiled its plan, more than 90 organizations launched a competing “People’s AI Action Plan” on Tuesday, characterizing the Trump administration’s approach as “a massive handout to the tech industry” that prioritizes corporate interests over public welfare. The coalition includes labor unions, environmental justice groups, and consumer protection nonprofits.


OpenAI and partners are building a massive AI data center in Texas

Stargate moves forward despite early skepticism

When OpenAI announced Stargate in January, critics questioned whether the company could deliver on its ambitious $500 billion funding promise. Trump ally and frequent Altman foe Elon Musk wrote on X that “They don’t actually have the money,” claiming that “SoftBank has well under $10B secured.”

Tech writer and frequent OpenAI critic Ed Zitron raised concerns about OpenAI’s financial position, noting the company’s $5 billion in losses in 2024. “This company loses $5bn+ a year! So what, they raise $19bn for Stargate, then what, another $10bn just to be able to survive?” Zitron wrote on Bluesky at the time.

Six months later, OpenAI’s Abilene data center has moved from construction to partial operation. Oracle began delivering Nvidia GB200 racks to the facility last month, and OpenAI reports it has started running early training and inference workloads to support what it calls “next-generation frontier research.”

Despite the White House announcement with President Trump in January, the Stargate concept dates back to March 2024, when Microsoft and OpenAI partnered on a $100 billion supercomputer as part of a five-phase plan. Over time, the plan evolved into its current form as a partnership with Oracle, SoftBank, and CoreWeave.

“Stargate is an ambitious undertaking designed to meet the historic opportunity in front of us,” writes OpenAI in the press release announcing the latest deal. “That opportunity is now coming to life through strong support from partners, governments, and investors worldwide—including important leadership from the White House, which has recognized the critical role AI infrastructure will play in driving innovation, economic growth, and national competitiveness.”


AI video is invading YouTube Shorts and Google Photos starting today

Google is following through on recent promises to add more generative AI features to its photo and video products. Over on YouTube, Google is rolling out the first wave of generative AI video for YouTube Shorts, but even if you’re not a YouTuber, you’ll be exposed to more AI videos soon. Google Photos, which is integrated with virtually every Android phone on the market, is also getting AI video-generation capabilities. In both cases, the features are currently based on the older Veo 2 model, not the more capable Veo 3 that has been meming across the Internet since it was announced at I/O in May.

YouTube CEO Neal Mohan confirmed earlier this summer that the company planned to add generative AI to the creator tools for YouTube Shorts. There were already tools to generate backgrounds for videos, but the next phase will involve creating new video elements from a text prompt.

Starting today, creators will be able to use a photo as the basis for a new generative AI video. YouTube also promises a collection of easily applied generative effects, which will be accessible from the Shorts camera. There’s also a new AI playground hub that the company says will be home to all its AI tools, along with examples and suggested prompts to help people pump out AI content.

The Veo 2-based videos aren’t as realistic as Veo 3 clips, but an upgrade is planned.

So far, all the YouTube AI video features are running on the Veo 2 model. The plan is still to move to Veo 3 later this summer. The AI features in YouTube Shorts are currently limited to the United States, Canada, Australia, and New Zealand, but they will expand to more countries later.


Apple Intelligence news summaries are back, with a big red disclaimer

Apple has released the fourth developer betas of iOS 26, iPadOS 26, macOS 26 and its other next-generation software updates today. And along with their other changes and fixes, the new builds are bringing back Apple Intelligence notification summaries for news apps.

Apple disabled news notification summaries as part of the iOS 18.3 update in January. Incorrect summaries circulating on social media prompted news organizations to complain to Apple, particularly after one summary said that Luigi Mangione, alleged murderer of UnitedHealthcare CEO Brian Thompson, had died by suicide (he had not and has not).

Upon installing the new update, users of Apple Intelligence-compatible devices will be asked to enable or disable three broad categories of notifications: those for “News & Entertainment” apps, for “Communication & Social” apps, and for all other apps. The operating systems will list sample apps based on what you currently have installed on your device.

All Apple Intelligence notification summaries continue to be listed as “beta,” but Apple’s main change here is a big red disclaimer when you enable News & Entertainment notification summaries, pointing out that “summarization may change the meaning of the original headlines.” The notifications also get a special “summarized by Apple Intelligence” caption to further distinguish them from regular, unadulterated notifications.


xAI workers balked over training request to help “give Grok a face,” docs show

For the more than 200 employees who did not opt out, xAI asked that they record 15- to 30-minute conversations, where one employee posed as the potential Grok user and the other posed as the “host.” xAI was specifically looking for “imperfect data,” BI noted, expecting that only training on crystal-clear videos would limit Grok’s ability to interpret a wider range of facial expressions.

xAI’s goal was to help Grok “recognize and analyze facial movements and expressions, such as how people talk, react to others’ conversations, and express themselves in various conditions,” an internal document said. Allegedly among the only guarantees to employees—who likely recognized how sensitive facial data is—was a promise “not to create a digital version of you.”

To get the most out of data submitted by “Skippy” participants, dubbed tutors, xAI recommended that they never provide one-word answers, always ask follow-up questions, and maintain eye contact throughout the conversations.

The company also apparently provided scripts to evoke facial expressions they wanted Grok to understand, suggesting conversation topics like “How do you secretly manipulate people to get your way?” or “Would you ever date someone with a kid or kids?”

For xAI employees who provided facial training data, privacy concerns may still exist, considering X—the social platform formerly known as Twitter that recently was folded into xAI—has recently been targeted by what Elon Musk called a “massive” cyberattack. Because of privacy risks ranging from identity theft to government surveillance, several states have passed strict biometric privacy laws to prevent companies from collecting such data without explicit consent.

xAI did not respond to Ars’ request for comment.


OpenAI jumps gun on International Math Olympiad gold medal announcement

The early announcement has prompted Google DeepMind, which had prepared its own IMO results for the agreed-upon date, to move up its own IMO-related announcement to later today. Harmonic plans to share its results as originally scheduled on July 28.

In response to the controversy, OpenAI research scientist Noam Brown posted on X, “We weren’t in touch with IMO. I spoke with one organizer before the post to let him know. He requested we wait until after the closing ceremony ends to respect the kids, and we did.”

However, an IMO coordinator told X user Mikhail Samin that OpenAI actually announced before the closing ceremony, contradicting Brown’s claim. The coordinator called OpenAI’s actions “rude and inappropriate,” noting that OpenAI “wasn’t one of the AI companies that cooperated with the IMO on testing their models.”

Hard math since 1959

The International Mathematical Olympiad, which has been running since 1959, represents one of the most challenging tests of mathematical reasoning. More than 100 countries send up to six participants each, with contestants facing six proof-based problems across two 4.5-hour sessions. The problems typically require deep mathematical insight and creativity rather than raw computational power. You can see the exact problems in the 2025 Olympiad posted online.

For example, problem one asks students to imagine a triangular grid of dots (like a triangular pegboard) and figure out how to cover all the dots using exactly n straight lines. The twist is that some lines are called “sunny”—these are the lines that don’t run horizontally, vertically, or diagonally at a 45° angle. The challenge is to prove that no matter how big your triangle is, you can only ever create patterns with exactly 0, 1, or 3 sunny lines—never 2, never 4, never any other number.
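For readers who want the claim stated precisely, here is one standard way to put that description into coordinates (a paraphrase for illustration, not the official contest wording): place the dots at the lattice points $(a, b)$ with positive integers $a, b$ satisfying $a + b \le n + 1$, and cover every dot using $n$ distinct lines, where a line counts as sunny if it is parallel to none of the $x$-axis, the $y$-axis, and the line $x + y = 0$. The claim to prove is that the number $k$ of sunny lines can only be

$$k \in \{0,\ 1,\ 3\}.$$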

The timing of the OpenAI results surprised some prediction markets, which had assigned around an 18 percent probability to any AI system winning IMO gold by 2025. However, depending on what Google says this afternoon (and what others like Harmonic may release on July 28), OpenAI may not be the only AI company to have achieved these unexpected results.


It’s “frighteningly likely” many US courts will overlook AI errors, expert says


Judges pushed to bone up on AI or risk destroying their court’s authority.

A judge points to a diagram of a hand with six fingers. Credit: Aurich Lawson | Getty Images

Order in the court! Order in the court! Judges are facing an outcry over a suspected AI-generated court order.

Fueling nightmares that AI may soon decide legal battles, a Georgia court of appeals judge, Jeff Watkins, explained why a three-judge panel vacated an order last month that appears to be the first known ruling in which a judge sided with someone seemingly relying on fake AI-generated case citations to win a legal fight.

Now, experts are warning that judges overlooking AI hallucinations in court filings could easily become commonplace, especially in the typically overwhelmed lower courts. And so far, only two states have moved to force judges to sharpen their tech competencies and adapt so they can spot AI red flags and theoretically stop disruptions to the justice system at all levels.

The recently vacated order came in a Georgia divorce dispute, where Watkins explained that the order itself was drafted by the husband’s lawyer, Diana Lynch. That’s a common practice in many courts, where overburdened judges historically rely on lawyers to draft orders. But that protocol today faces heightened scrutiny as lawyers and non-lawyers increasingly rely on AI to compose and research legal filings, and judges risk rubberstamping fake opinions by not carefully scrutinizing AI-generated citations.

The errant order partly relied on “two fictitious cases” to deny the wife’s petition—which Watkins suggested were “possibly ‘hallucinations’ made up by generative-artificial intelligence”—as well as two cases that had “nothing to do” with the wife’s petition.

Lynch was hit with $2,500 in sanctions after the wife appealed, and the husband’s response—which also appeared to be prepared by Lynch—cited 11 additional cases that were “either hallucinated” or irrelevant. Watkins was further peeved that Lynch supported a request for attorney’s fees for the appeal by citing “one of the new hallucinated cases,” writing it added “insult to injury.”

Worryingly, the judge could not confirm whether the fake cases were generated by AI or even determine if Lynch inserted the bogus cases into the court filings, indicating how hard it can be for courts to hold lawyers accountable for suspected AI hallucinations. Lynch did not respond to Ars’ request to comment, and her website appeared to be taken down following media attention to the case.

But Watkins noted that “the irregularities in these filings suggest that they were drafted using generative AI” while warning that many “harms flow from the submission of fake opinions.” Exposing deceptions can waste time and money, and AI misuse can deprive people of raising their best arguments. Fake orders can also soil judges’ and courts’ reputations and promote “cynicism” in the justice system. If left unchecked, Watkins warned, these harms could pave the way to a future where a “litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.”

“We have no information regarding why Appellee’s Brief repeatedly cites to nonexistent cases and can only speculate that the Brief may have been prepared by AI,” Watkins wrote.

Ultimately, Watkins remanded the case, partly because the fake cases made it impossible for the appeals court to adequately review the wife’s petition to void the prior order. But no matter the outcome of the Georgia case, the initial order will likely forever be remembered as a cautionary tale for judges increasingly scrutinized for failures to catch AI misuses in court.

“Frighteningly likely” judge’s AI misstep will be repeated

John Browning, a retired justice on Texas’ Fifth Court of Appeals and now a full-time law professor at Faulkner University, last year published a law article Watkins cited that warned of the ethical risks of lawyers using AI. In the article, Browning emphasized that the biggest concern at that point was that lawyers “will use generative AI to produce work product they treat as a final draft, without confirming the accuracy of the information contained therein or without applying their own independent professional judgment.”

Today, judges are increasingly drawing the same scrutiny, and Browning told Ars he thinks it’s “frighteningly likely that we will see more cases” like the Georgia divorce dispute, in which “a trial court unwittingly incorporates bogus case citations that an attorney includes in a proposed order” or even potentially in “proposed findings of fact and conclusions of law.”

“I can envision such a scenario in any number of situations in which a trial judge maintains a heavy docket and looks to counsel to work cooperatively in submitting proposed orders, including not just family law cases but other civil and even criminal matters,” Browning told Ars.

According to reporting from the National Center for State Courts, a nonprofit representing court leaders and professionals who are advocating for better judicial resources, AI tools like ChatGPT have made it easier for high-volume filers and unrepresented litigants who can’t afford attorneys to file more cases, potentially further bogging down courts.

Peter Henderson, a researcher who runs the Princeton Language+Law, Artificial Intelligence, & Society (POLARIS) Lab, told Ars that he expects cases like the Georgia divorce dispute aren’t happening every day just yet.

It’s likely that a “few hallucinated citations go overlooked” because generally, fake cases are flagged through “the adversarial nature of the US legal system,” he suggested. Browning further noted that trial judges are generally “very diligent in spotting when a lawyer is citing questionable authority or misleading the court about what a real case actually said or stood for.”

Henderson agreed with Browning that “in courts with much higher case loads and less adversarial process, this may happen more often.” But Henderson noted that the appeals court catching the fake cases is an example of the adversarial process working.

While that’s true in this case, it seems likely that someone exhausted by the divorce process, for example, may not pursue an appeal if they lack the energy or resources to discover and overturn errant orders.

Judges’ AI competency increasingly questioned

While recent history confirms that lawyers risk being sanctioned, fired from their firms, or suspended from practicing law for citing fake AI-generated cases, judges will likely only risk embarrassment for failing to catch lawyers’ errors or even for using AI to research their own opinions.

Not every judge is prepared to embrace AI without proper vetting, though. To shield the legal system, some judges have banned AI. Others have required disclosures—with some even demanding to know which specific AI tool was used—but that solution has not caught on everywhere.

Even if all courts required disclosures, Browning pointed out that disclosures still aren’t a perfect solution since “it may be difficult for lawyers to even discern whether they have used generative AI,” as AI features become increasingly embedded in popular legal tools. One day, it “may eventually become unreasonable to expect” lawyers “to verify every generative AI output,” Browning suggested.

Most likely—as a judicial ethics panel from Michigan has concluded—judges will determine “the best course of action for their courts with the ever-expanding use of AI,” Browning’s article noted. And the former justice told Ars that’s why education will be key, for both lawyers and judges, as AI advances and becomes more mainstream in court systems.

In an upcoming summer 2025 article in The Journal of Appellate Practice & Process, “The Dawn of the AI Judge,” Browning attempts to soothe readers by saying that AI isn’t yet fueling a legal dystopia. And humans are unlikely to face “robot judges” spouting AI-generated opinions any time soon, the former justice suggested.

Standing in the way of that, at least two states—Michigan and West Virginia—”have already issued judicial ethics opinions requiring judges to be ‘tech competent’ when it comes to AI,” Browning told Ars. And “other state supreme courts have adopted official policies regarding AI,” he noted, further pressuring judges to bone up on AI.

Meanwhile, several states have set up task forces to monitor their regional court systems and issue AI guidance, while states like Virginia and Montana have passed laws requiring human oversight for any AI systems used in criminal justice decisions.

Judges must prepare to spot obvious AI red flags

Until courts figure out how to navigate AI—a process that may look different from court to court—Browning advocates for more education and ethical guidance to steer judges’ use of and attitudes about AI. That could help judges avoid both ignorance of AI’s many pitfalls and overconfidence in AI outputs, protecting courts from hallucinations, biases, and evidentiary challenges that slip past human review and scramble the court system.

An overlooked part of educating judges could be exposing AI’s influence so far in courts across the US. Henderson’s team is planning research that tracks which models attorneys are using most in courts. That could reveal “the potential legal arguments that these models are pushing” to sway courts—and which judicial interventions might be needed, Henderson told Ars.

“Over the next few years, researchers—like those in our group, the POLARIS Lab—will need to develop new ways to track the massive influence that AI will have and understand ways to intervene,” Henderson told Ars. “For example, is any model pushing a particular perspective on legal doctrine across many different cases? Was it explicitly trained or instructed to do so?”

Henderson also advocates for “an open, free centralized repository of case law,” which would make it easier for everyone to check for fake AI citations. “With such a repository, it is easier for groups like ours to build tools that can quickly and accurately verify citations,” Henderson said. That could be a significant improvement to the current decentralized court reporting system that often obscures case information behind various paywalls.

Dazza Greenwood, who co-chairs MIT’s Task Force on Responsible Use of Generative AI for Law, did not have time to send comments but pointed Ars to a LinkedIn thread where he suggested that a structural response may be needed to ensure that all fake AI citations are caught every time.

He recommended that courts create “a bounty system whereby counter-parties or other officers of the court receive sanctions payouts for fabricated cases cited in judicial filings that they reported first.” That way, lawyers will know that their work will “always” be checked and thus may shift their behavior if they’ve been automatically filing AI-drafted documents. In turn, that could alleviate pressure on judges to serve as watchdogs. It also wouldn’t cost much—mostly just redistributing the exact amount of fees that lawyers are sanctioned to AI spotters.

Novel solutions like this may be necessary, Greenwood suggested. Responding to a question asking if “shame and sanctions” are enough to stop AI hallucinations in court, Greenwood said that eliminating AI errors is imperative because it “gives both otherwise generally good lawyers and otherwise generally good technology a bad name.” Continuing to ban AI or suspend lawyers as the preferred solution risks draining court resources just as caseloads are likely to spike, rather than confronting the problem head-on.

Of course, there’s no guarantee that the bounty system would work. But “would the fact of such definite confidence that your cures will be individually checked and fabricated cites reported be enough to finally… convince lawyers who cut these corners that they should not cut these corners?”

In absence of a fake case detector like Henderson wants to build, experts told Ars that there are some obvious red flags that judges can note to catch AI-hallucinated filings.

Any case number with “123456” in it probably warrants review, Henderson told Ars. And Browning noted that AI tends to mix up locations for cases, too. “For example, a cite to a purported Texas case that has a ‘S.E. 2d’ reporter wouldn’t make sense, since Texas cases would be found in the Southwest Reporter,” Browning said, noting that some appellate judges have already relied on this red flag to catch AI misuses.
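A minimal sketch of how those two checks could be automated is below; this is illustrative Python, not the tool Henderson’s lab is planning, and the reporter table is a simplified assumption.

# Illustrative only: a tiny screen for the two red flags described above.
EXPECTED_REPORTERS = {"Tex.": "S.W.", "Ga.": "S.E."}  # assumed mapping, far from exhaustive

def red_flags(citation: str, state_abbrev: str) -> list[str]:
    flags = []
    if "123456" in citation:
        flags.append("placeholder-style case number")
    expected = EXPECTED_REPORTERS.get(state_abbrev)
    if expected and expected not in citation:
        flags.append(f"reporter does not match {state_abbrev} (expected {expected})")
    return flags

print(red_flags("Smith v. Jones, 123456 S.E.2d 789 (Tex. 2021)", "Tex."))
# ['placeholder-style case number', 'reporter does not match Tex. (expected S.W.)']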

Those red flags would perhaps be easier to check with the open source tool that Henderson’s lab wants to make, but Browning said there are other tell-tale signs of AI usage that anyone who has ever used a chatbot is likely familiar with.

“Sometimes a red flag is the language cited from the hallucinated case; if it has some of the stilted language that can sometimes betray AI use, it might be a hallucination,” Browning said.

Judges already issuing AI-assisted opinions

Several states have assembled task forces like Greenwood’s to assess the risks and benefits of using AI in courts. In Georgia, the Judicial Council of Georgia Ad Hoc Committee on Artificial Intelligence and the Courts released a report in early July providing “recommendations to help maintain public trust and confidence in the judicial system as the use of AI increases” in that state.

Adopting the committee’s recommendations could establish “long-term leadership and governance”; a repository of approved AI tools, education, and training for judicial professionals; and more transparency on AI used in Georgia courts. But the committee expects it will take three years to implement those recommendations while AI use continues to grow.

Possibly complicating things further as judges start to explore using AI assistants to help draft their filings, the committee concluded that it’s still too early to tell if the judges’ code of conduct should be changed to prevent “unintentional use of biased algorithms, improper delegation to automated tools, or misuse of AI-generated data in judicial decision-making.” That means, at least for now, that there will be no code-of-conduct changes in Georgia, where the only case in which AI hallucinations are believed to have swayed a judge has been found.

Notably, the committee’s report also confirmed that there are no role models for courts to follow, as “there are no well-established regulatory environments with respect to the adoption of AI technologies by judicial systems.” Browning, who chaired a now-defunct Texas AI task force, told Ars that judges lacking guidance will need to stay on their toes to avoid trampling legal rights. (A spokesperson for the State Bar of Texas told Ars the task force’s work “concluded” and “resulted in the creation of the new standing committee on Emerging Technology,” which offers general tips and guidance for judges in a recently launched AI Toolkit.)

“While I definitely think lawyers have their own duties regarding AI use, I believe that judges have a similar responsibility to be vigilant when it comes to AI use as well,” Browning said.

Judges will continue sorting through AI-fueled submissions not just from pro se litigants representing themselves but also from up-and-coming young lawyers who may be more inclined to use AI, and even seasoned lawyers who have been sanctioned up to $5,000 for failing to check AI drafts, Browning suggested.

In his upcoming “AI Judge” article, Browning points to at least one judge, 11th Circuit Court of Appeals Judge Kevin Newsom, who has used AI as a “mini experiment” in preparing opinions for both a civil case involving an insurance coverage issue and a criminal matter focused on sentencing guidelines. Browning seems to appeal to judges’ egos to get them to study up so they can use AI to enhance their decision-making and possibly expand public trust in courts, not undermine it.

“Regardless of the technological advances that can support a judge’s decision-making, the ultimate responsibility will always remain with the flesh-and-blood judge and his application of very human qualities—legal reasoning, empathy, strong regard for fairness, and unwavering commitment to ethics,” Browning wrote. “These qualities can never be replicated by an AI tool.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


Exhausted man defeats AI model in world coding championship

While Dębiak won 500,000 yen and survived his ordeal better than the legendary steel driver, the AtCoder World Tour Finals pushes humans and AI models to their limits through complex optimization challenges that have no perfect solution—only incrementally better ones.

Coding marathon tests human endurance against AI efficiency

The AtCoder World Tour Finals represents one of competitive programming’s most exclusive events, inviting only the top 12 programmers worldwide based on their performance throughout the previous year. The Heuristic division focuses on “NP-hard” optimization problems. In programming, heuristics are problem-solving techniques that find good-enough solutions through shortcuts and educated guesses when perfect answers would take too long to calculate.
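To make “good-enough through shortcuts” concrete, here is a toy Python example of a heuristic: a greedy strategy for the NP-hard knapsack problem. It runs quickly and often does well, but it is not guaranteed to find the best answer, which is the trade-off these contests are built around.

def greedy_knapsack(items: list[tuple[str, int, int]], capacity: int) -> list[str]:
    """items: (name, value, weight); greedily pick by value density until full."""
    chosen, remaining = [], capacity
    for name, value, weight in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
    return chosen

# Greedy picks ['a', 'b'] for a value of 160, while the true optimum here
# is ['b', 'c'] with a value of 220: fast and decent, but not perfect.
print(greedy_knapsack([("a", 60, 10), ("b", 100, 20), ("c", 120, 30)], capacity=50))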

All competitors, including OpenAI, were limited to identical hardware provided by AtCoder, ensuring a level playing field between human and AI contestants. According to the contest rules, participants could use any programming language available on AtCoder, with no penalty for resubmission but a mandatory five-minute wait between submissions.

Final leaderboard results for the 2025 AtCoder World Finals Heuristic Contest, showing Dębiak (as “Psyho”) on top. Credit: AtCoder

The final contest results showed Psyho finishing with a score of 1,812,272,558,909 points, while OpenAI’s model (listed as “OpenAIAHC”) scored 1,654,675,725,406 points—a margin of roughly 9.5 percent. OpenAI’s artificial entrant, a custom simulated reasoning model similar to o3, placed second overall, ahead of 10 other human programmers who had qualified through year-long rankings.
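For reference, that margin works out as

$$\frac{1{,}812{,}272{,}558{,}909 - 1{,}654{,}675{,}725{,}406}{1{,}654{,}675{,}725{,}406} \approx 0.0952,$$

or roughly 9.5 percent.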

OpenAI characterized the second-place finish as a milestone for AI models in competitive programming. “Models like o3 rank among the top-100 in coding/math contests, but as far as we know, this is the first top-3 placement in a premier coding/math contest,” a company spokesperson said in an email to Ars Technica. “Events like AtCoder give us a way to test how well our models can reason strategically, plan over long time horizons, and improve solutions through trial and error—just like a human would.”


Netflix’s first show with generative AI is a sign of what’s to come in TV, film

Netflix used generative AI in an original, scripted series that debuted this year, it revealed this week. Producers used the technology to create a scene in which a building collapses, hinting at the growing use of generative AI in entertainment.

During a call with investors yesterday, Netflix co-CEO Ted Sarandos revealed that Netflix’s Argentine show The Eternaut, which premiered in April, is “the very first GenAI final footage to appear on screen in a Netflix, Inc. original series or film.” Sarandos elaborated, per a transcript of the call:

The creators wanted to show a building collapsing in Buenos Aires. So our iLine team, [which is the production innovation group inside the visual effects house at Netflix effects studio Scanline], partnered with their creative team using AI-powered tools. … And in fact, that VFX sequence was completed 10 times faster than it could have been completed with visual, traditional VFX tools and workflows. And, also, the cost of it would just not have been feasible for a show in that budget.

Sarandos claimed that viewers have been “thrilled with the results,” although that likely has much to do with how the rest of the series, based on a comic, plays out, not just one AI-crafted scene.

More generative AI on Netflix

Still, Netflix seems open to using generative AI in shows and movies more, with Sarandos saying the tech “represents an incredible opportunity to help creators make films and series better, not just cheaper.”

“Our creators are already seeing the benefits in production through pre-visualization and shot planning work and, certainly, visual effects,” he said. “It used to be that only big-budget projects would have access to advanced visual effects like de-aging.”


Will AI end cheap flights? Critics attack Delta’s “predatory” AI pricing.

Although Delta’s AI pricing could increase competition in the airline industry, Slover expects that companies using such pricing schemes are “all too likely” to be incentivized “to skew in the direction of higher prices” because of the AI pricing’s lack of transparency.

“Informed consumer choice is the engine that drives competition; because consumers won’t be as informed, and thus will have little or no agency in the supposed competitive benefits, they are more apt to be taken advantage of than to benefit,” Slover said.

Delta could face backlash as it rolls out individualized pricing over the next few years, Slover suggested, as some customers are “apt to react viscerally” to what privacy advocates term “surveillance pricing.”

The company could also get pushback from officials, with the Federal Trade Commission already studying how individualized pricing like Delta’s pilot could potentially violate the FTC Act or harm consumers. That could result in new rulemaking, Slover said, or possibly even legislation “to prohibit or rein it in.”

Some lawmakers are already scrutinizing pricing algorithms, Slover noted, with pricing practices of giants like Walmart and Amazon targeted in recent hearings held by the Senate Committee on Banking, Housing, and Urban Affairs.

For anyone wondering how to prevent personalized pricing that could make flights suddenly more expensive, Slover recommended using a virtual private network (VPN) when shopping as a short-term solution.

Long-term, stronger privacy laws could gut such AI tools of the data needed to increase or lower prices, Slover said. Third-party intermediaries could also be used, he suggested, “restoring anonymity” to the shopping process by relying on third-party technology acting as a “purchasing agent.” Ideally, those third parties would not be collecting data themselves, Slover said, recommending that nonprofits like Consumer Reports could be good candidates to offer that form of consumer protection.

At least one lawmaker, Sen. Ruben Gallego (D-Ariz.), has explicitly vowed to block Delta’s AI plan.

“Delta’s CEO just got caught bragging about using AI to find your pain point—meaning they’ll squeeze you for every penny,” Gallego wrote on X. “This isn’t fair pricing or competitive pricing. It’s predatory pricing. I won’t let them get away with this.”


Permit for xAI’s data center blatantly violates Clean Air Act, NAACP says


Evidence suggests health department gave preferential treatment to xAI, NAACP says.

Local students speak in opposition to xAI’s application for a permit to run gas turbines at its new data center in Memphis, TN, during a public comment meeting hosted by the Shelby County Health Department at Fairley High School on April 25, 2025. Credit: The Washington Post / Contributor

xAI continues to face backlash over its Memphis data center, as the NAACP joined groups today appealing the issuance of a recently granted permit that the groups say will allow xAI to introduce major new sources of pollutants without warning at any time.

The battle over the gas turbines powering xAI’s data center began last April when thermal imaging seemed to show that the firm was lying about dozens of seemingly operational turbines that could be a major source of smog-causing pollution. By June, the NAACP got involved, notifying the Shelby County Health Department (SCHD) of its intent to sue xAI to force Elon Musk’s AI company to engage with community members in historically Black neighborhoods who are believed to be most affected by the pollution risks.

But the NAACP’s letter seemingly did nothing to stop the SCHD from granting the permits two weeks later on July 2, as well as exemptions that xAI does not appear to qualify for, the appeal noted. Now, the NAACP—alongside environmental justice groups; the Southern Environmental Law Center (SELC); and Young, Gifted and Green—is appealing. The groups are hoping the Memphis and Shelby County Air Pollution Control Board will revoke the permit and block the exemptions, agreeing that the SCHD’s decisions were fatally flawed, violating the Clean Air Act and local laws.

SCHD’s permit granted xAI permission to operate 15 gas turbines at the Memphis data center, while the SELC’s imaging showed that xAI was potentially operating as many as 24. Prior to the permitting, xAI was accused of operating at least 35 turbines without the best-available pollution controls.

In their appeal, the NAACP and other groups argued that the SCHD put xAI profits over Black people’s health, granting unlawful exemptions while turning a blind eye to xAI’s operations, which allegedly started in 2024 but were treated as brand new in 2025.

Significantly, the groups claimed that the health department “improperly ignored” the prior turbine activity and the additional turbines still believed to be on site, unlawfully deeming some of the turbines as “temporary” and designating xAI’s facility a new project with no prior emissions sources. Had xAI’s data center been categorized as a modification to an existing major source of pollutants, the appeal said, xAI would’ve faced stricter emissions controls and “robust ambient air quality impacts assessments.”

And perhaps more concerningly, the exemptions granted could allow xAI—or any other emerging major sources of pollutants in the area—to “install and operate any number of new polluting turbines at any time without any written approval from the Health Department, without any public notice or public participation, and without pollution controls,” the appeal said.

The SCHD and xAI did not respond to Ars’ request to comment.

Officials accused of cherry-picking Clean Air Act

The appeal called out the SCHD for “tellingly” omitting key provisions of the Clean Air Act that allegedly undermined the department’s “position” when explaining why xAI qualified for exemptions. Groups also suggested that xAI was getting preferential treatment, providing as evidence a side-by-side comparison of a permit with stricter emissions requirements granted to a natural gas power plant, issued within months of granting xAI’s permit with only generalized emissions requirements.

“The Department cannot cherry pick which parts of the federal Clean Air Act it believes are relevant,” the appeal said, calling the SCHD’s decisions a “blatant” misrepresentation of the federal law while pointing to statements from the Environmental Protection Agency (EPA) that allegedly “directly” contradict the health department’s position.

For some Memphians protesting xAI’s facility, it seems “indisputable” that xAI’s turbines fall outside of the Clean Air Act requirements, whether they’re temporary or permanent, and if that’s true, it is “undeniable” that the activity violates the law. They’re afraid the health department is prioritizing xAI’s corporate gains over their health by “failing to establish enforceable emission limits” on the data center, which powers what xAI hypes as the world’s largest AI supercomputer, Colossus, the engine behind its controversial Grok models.

Rather than a minor source, as the SCHD designated the facility, Memphians think the data center is already a major source of pollutants, with its permitted turbines releasing, at minimum, 900 tons of nitrogen oxides (NOx) per year. That’s more than three times the threshold that the Clean Air Act uses to define a major source: “one that ’emits, or has the potential to emit,’ at least 250 tons of NOx per year,” the appeal noted. Further, the allegedly overlooked additional turbines that were on site at xAI when permitting was granted “have the potential to emit at least 560 tons of NOx per year.”
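The arithmetic behind that comparison is straightforward:

$$900 \ \text{tons per year} \div 250 \ \text{tons per year} = 3.6,$$

so the permitted turbines alone would exceed the cited major-source threshold by a factor of 3.6, before counting the additional turbines’ alleged 560 tons per year.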

But so far, Memphians appear stuck with the SCHD’s generalized emissions requirements and xAI’s voluntary emission limits, which the appeal alleged “fall short” of the stringent limits imposed if xAI were forced to use best-available control technologies. Fixing that is “especially critical given the ongoing and worsening smog problem in Memphis,” environmental groups alleged, which is an area that has “failed to meet EPA’s air quality standard for ozone for years.”

xAI also apparently conducted some “air dispersion modeling” to appease critics. But, again, that process was not comparable to the more rigorous analysis that would’ve been required to get what the EPA calls a Prevention of Significant Deterioration permit, the appeal said.

Groups want xAI’s permit revoked

To shield Memphians from ongoing health risks, the NAACP and environmental justice groups have urged the Memphis and Shelby County Air Pollution Control Board to act now.

Memphis is a city already grappling with high rates of emergency room visits and deaths from asthma, with cancer rates four times the national average. Residents have already begun wearing masks, avoiding the outdoors, and keeping their windows closed since xAI’s data center moved in, the appeal noted. Residents remain “deeply concerned” about feared exposure to alleged pollutants that can “cause a variety of adverse health effects,” including “increased risk of lung infection, aggravated respiratory diseases such as emphysema and chronic bronchitis, and increased frequency of asthma attack,” as well as certain types of cancer.

In an SELC press release, LaTricea Adams, CEO and President of Young, Gifted and Green, called the SCHD’s decisions on xAI’s permit “reckless.”

“As a Black woman born and raised in Memphis, I know firsthand how industry harms Black communities while those in power cower away from justice,” Adams said. “The Shelby County Health Department needs to do their job to protect the health of ALL Memphians, especially those in frontline communities… that are burdened with a history of environmental racism, legacy pollution, and redlining.”

Groups also suspect xAI is stockpiling dozens of gas turbines to potentially power a second facility nearby—which could lead to over 90 turbines in operation. To get that facility up and running, Musk claimed that he will be “copying and pasting” the process for launching the first data center, SELC’s press release said.

Groups appealing have asked the board to revoke xAI’s permits and declare that xAI’s turbines do not qualify for exemptions from the Clean Air Act or other laws and that all permits for gas turbines must meet strict EPA standards. If successful, groups could force xAI to redo the permitting process “pursuant to the major source requirements of the Clean Air Act” and local law. At the very least, they’ve asked the board to remand the permit to the health department to “reconsider its determinations.”

Unless the pollution control board intervenes, Memphians worry xAI’s “unlawful conduct risks being repeated and evading review,” with any turbines removed easily brought back with “no notice” to residents if xAI’s exemptions remain in place.

“Nothing is stopping xAI from installing additional unpermitted turbines at any time to meet its widely-publicized demand for additional power,” the appeal said.

NAACP’s director of environmental justice, Abre’ Conner, confirmed in the SELC’s press release that his group and community members “have repeatedly shared concerns that xAI is causing a significant increase in the pollution of the air Memphians breathe.”

“The health department should focus on people’s health—not on maximizing corporate gain,” Conner said.

