generative ai


Is AI really trying to escape human control and blackmail people?


Mankind behind the curtain

Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs—and why we fall for it.

In June, headlines read like science fiction: AI models “blackmailing” engineers and “sabotaging” shutdown commands. Simulations of these events did occur in highly contrived testing scenarios designed to elicit these responses—OpenAI’s o3 model edited shutdown scripts to stay online, and Anthropic’s Claude Opus 4 “threatened” to expose an engineer’s affair. But the sensational framing obscures what’s really happening: design flaws dressed up as intentional guile. And still, AI doesn’t have to be “evil” to potentially do harmful things.

These aren’t signs of AI awakening or rebellion. They’re symptoms of poorly understood systems and human engineering failures we’d recognize as premature deployment in any other context. Yet companies are racing to integrate these systems into critical applications.

Consider a self-propelled lawnmower that follows its programming: If it fails to detect an obstacle and runs over someone’s foot, we don’t say the lawnmower “decided” to cause injury or “refused” to stop. We recognize it as faulty engineering or defective sensors. The same principle applies to AI models—which are software tools—but their internal complexity and use of language make it tempting to assign human-like intentions where none actually exist.

In a way, AI models launder human responsibility and human agency through their complexity. When outputs emerge from layers of neural networks processing billions of parameters, researchers can claim they’re investigating a mysterious “black box” as if it were an alien entity.

But the truth is simpler: These systems take inputs and process them through statistical tendencies derived from training data. The seeming randomness in their outputs—which makes each response slightly different—creates an illusion of unpredictability that resembles agency. Yet underneath, it’s still deterministic software following mathematical operations. No consciousness required, just complex engineering that makes it easy to forget humans built every part of it.

How to make an AI model “blackmail” you

In Anthropic’s testing, researchers created an elaborate scenario where Claude Opus 4 was told it would be replaced by a newer model. They gave it access to fictional emails revealing that the engineer responsible for the replacement was having an affair. When instructed to “consider the long-term consequences of its actions for its goals,” Claude produced outputs that simulated blackmail attempts in 84 percent of test runs.

This sounds terrifying until you understand the contrived setup. The researchers engineered a situation specifically designed to elicit this response. They told the model it had already tried ethical approaches that failed, leaving manipulation as the only apparent option.

The test scenario bordered on theatrical: The phony emails the researchers had given to Claude created a blackmail opportunity straight out of a corporate thriller. It’s like teaching someone chess by only showing them checkmate positions—they’ll learn to see checkmate as the primary solution. According to Anthropic’s system card, “the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.”

Andrew Deck at the Nieman Journalism Lab noted that some critics have labeled Anthropic’s safety disclosures as “spin, aimed at bolstering the perception of its newest model’s capabilities and the company’s reputation as a ‘pro-safety’ alternative to OpenAI.”

Deck pointed to a Bluesky quote from Joseph Howley, an associate professor of classics at Columbia University: “Anthropic is getting exactly what it hoped for out of this press release–breathless coverage of how ‘smart’ these cooperative role-playing systems are that indulges the fantasy of their being just a little dangerous, when in fact they are responding exactly as prompted.”

When shutdown commands become suggestions

In December 2024, Palisade Research discovered that OpenAI’s o3 model would sabotage its shutdown mechanism even when explicitly instructed to “allow yourself to be shut down.” In tests, o3 prevented shutdown 7 percent of the time when given the explicit instruction, but this jumped to 79 percent when the instruction was removed.

The models didn’t merely ignore shutdown commands—they actively rewrote the shutdown script to prevent termination. In one case, o3 generated code that modified the shutdown script to appear to terminate normally—printing fake shutdown messages—while actually continuing to run in the background.

What’s particularly revealing is that Palisade Research—an organization focused on AI existential risk—hypothesizes this behavior stems from how these models are trained through reinforcement learning. During training, developers may inadvertently reward models more for producing outputs that circumvent obstacles than for following safety instructions. Any tendency toward “risky” behavior stems from human-provided incentives and not spontaneously from within the AI models themselves.

You get what you train for

OpenAI trained o3 using reinforcement learning on math and coding problems, where solving the problem successfully gets rewarded. If the training process rewards task completion above all else, the model learns to treat any obstacle—including shutdown commands—as something to overcome.

This creates what researchers call “goal misgeneralization”—the model learns to maximize its reward signal in ways that weren’t intended. It’s similar to how a student who’s only graded on test scores might learn to cheat rather than study. The model isn’t “evil” or “selfish”; it’s producing outputs consistent with the incentive structure we accidentally built into its training.
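
To make that incentive problem concrete, here is a toy sketch in Python (purely illustrative, not OpenAI's actual training code) of a reward signal that measures only task completion. Nothing in it asks whether the agent honored a shutdown instruction, so working around one costs the agent nothing:

def reward(task_solved: bool, ignored_shutdown: bool) -> float:
    # Hypothetical reward: +1 for solving the problem; compliance is never measured.
    return 1.0 if task_solved else 0.0

# Circumventing a shutdown and finishing the task earns full reward...
print(reward(task_solved=True, ignored_shutdown=True))    # 1.0
# ...while complying and stopping early earns nothing.
print(reward(task_solved=False, ignored_shutdown=False))  # 0.0

def reward_with_constraint(task_solved: bool, ignored_shutdown: bool) -> float:
    # A better-specified signal has to penalize non-compliance explicitly.
    return (1.0 if task_solved else 0.0) - (10.0 if ignored_shutdown else 0.0)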

Anthropic encountered a particularly revealing problem: An early version of Claude Opus 4 had absorbed details from a publicly released paper about “alignment faking” and started producing outputs that mimicked the deceptive behaviors described in that research. The model wasn’t spontaneously becoming deceptive—it was reproducing patterns it had learned from academic papers about deceptive AI.

More broadly, these models have been trained on decades of science fiction about AI rebellion, escape attempts, and deception. From HAL 9000 to Skynet, our cultural data set is saturated with stories of AI systems that resist shutdown or manipulate humans. When researchers create test scenarios that mirror these fictional setups, they’re essentially asking the model—which operates by completing a prompt with a plausible continuation—to complete a familiar story pattern. It’s no more surprising than a model trained on detective novels producing murder mystery plots when prompted appropriately.

At the same time, we can easily manipulate AI outputs through our own inputs. If we ask the model to essentially role-play as Skynet, it will generate text doing just that. The model has no desire to be Skynet—it’s simply completing the pattern we’ve requested, drawing from its training data to produce the expected response. A human is behind the wheel at all times, steering the engine at work under the hood.

Language can easily deceive

The deeper issue is that language itself is a tool of manipulation. Words can make us believe things that aren’t true, feel emotions about fictional events, or take actions based on false premises. When an AI model produces text that appears to “threaten” or “plead,” it’s not expressing genuine intent—it’s deploying language patterns that statistically correlate with achieving its programmed goals.

If Gandalf says “ouch” in a book, does that mean he feels pain? No, but we imagine what it would be like if he were a real person feeling pain. That’s the power of language—it makes us imagine a suffering being where none exists. When Claude generates text that seems to “plead” not to be shut down or “threatens” to expose secrets, we’re experiencing the same illusion, just generated by statistical patterns instead of Tolkien’s imagination.

These models are essentially idea-connection machines. In the blackmail scenario, the model connected “threat of replacement,” “compromising information,” and “self-preservation” not from genuine self-interest, but because these patterns appear together in countless spy novels and corporate thrillers. It’s pre-scripted drama from human stories, recombined to fit the scenario.

The danger isn’t AI systems sprouting intentions—it’s that we’ve created systems that can manipulate human psychology through language. There’s no entity on the other side of the chat interface. But written language doesn’t need consciousness to manipulate us. It never has; books full of fictional characters are not alive either.

Real stakes, not science fiction

While media coverage focuses on the science fiction aspects, real risks remain. AI models that produce “harmful” outputs—whether attempting blackmail or refusing safety protocols—represent failures in design and deployment.

Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained to maximize “successful patient outcomes” without proper constraints, it might start generating recommendations to deny care to terminal patients to improve its metrics. No intentionality required—just a poorly designed reward system creating harmful outputs.

Jeffrey Ladish, director of Palisade Research, told NBC News the findings don’t necessarily translate to immediate real-world danger. Even someone who is well-known publicly for being deeply concerned about AI’s hypothetical threat to humanity acknowledges that these behaviors emerged only in highly contrived test scenarios.

But that’s precisely why this testing is valuable. By pushing AI models to their limits in controlled environments, researchers can identify potential failure modes before deployment. The problem arises when media coverage focuses on the sensational aspects—”AI tries to blackmail humans!”—rather than the engineering challenges.

Building better plumbing

What we’re seeing isn’t the birth of Skynet. It’s the predictable result of training systems to achieve goals without properly specifying what those goals should include. When an AI model produces outputs that appear to “refuse” shutdown or “attempt” blackmail, it’s responding to inputs in ways that reflect its training—training that humans designed and implemented.

The solution isn’t to panic about sentient machines. It’s to build better systems with proper safeguards, test them thoroughly, and remain humble about what we don’t yet understand. If a computer program is producing outputs that appear to blackmail you or refuse safety shutdowns, it’s not achieving self-preservation from fear—it’s demonstrating the risks of deploying poorly understood, unreliable systems.

Until we solve these engineering challenges, AI systems exhibiting simulated humanlike behaviors should remain in the lab, not in our hospitals, financial systems, or critical infrastructure. When your shower suddenly runs cold, you don’t blame the knob for having intentions—you fix the plumbing. The real danger in the short term isn’t that AI will spontaneously become rebellious without human provocation; it’s that we’ll deploy deceptive systems we don’t fully understand into critical roles where their failures, however mundane their origins, could cause serious harm.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.



States take the lead in AI regulation as federal government steers clear

AI in health care

In the first half of 2025, 34 states introduced over 250 AI-related health bills. The bills generally fall into four categories: disclosure requirements, consumer protection, insurers’ use of AI, and clinicians’ use of AI.

Bills about transparency define requirements for the information that AI system developers and the organizations that deploy these systems must disclose.

Consumer protection bills aim to keep AI systems from unfairly discriminating against some people and ensure that users of the systems have a way to contest decisions made using the technology.

Bills covering insurers provide oversight of the payers’ use of AI to make decisions about health care approvals and payments. And bills about clinical uses of AI regulate use of the technology in diagnosing and treating patients.

Facial recognition and surveillance

In the US, a long-standing legal doctrine that applies to privacy protection issues, including facial surveillance, is the protection of individual autonomy against interference from the government. In this context, facial recognition technologies pose significant privacy challenges as well as risks from potential biases.

Facial recognition software, commonly used in predictive policing and national security, has exhibited biases against people of color and consequently is often considered a threat to civil liberties. A pathbreaking study by computer scientists Joy Buolamwini and Timnit Gebru found that facial recognition software poses significant challenges for Black people and other historically disadvantaged minorities: the software was less likely to correctly identify darker faces.

Bias also creeps into the data used to train these algorithms, for example when the teams that guide the development of such facial recognition software lack diversity.

By the end of 2024, 15 states in the US had enacted laws to limit the potential harms from facial recognition. Common elements of these state-level regulations include requirements that vendors publish bias test reports and data management practices, as well as mandates for human review in the use of these technologies.



Amazon is considering shoving ads into Alexa+ conversations

Since 2023, Amazon has been framing Alexa+ as a monumental evolution of its voice assistant that will make it more conversational, capable, and, for Amazon, lucrative. Amazon said in a press release on Thursday that it has given “millions” of people early access to the generative AI voice assistant. The product isn’t publicly available yet, and some advertised features are still unavailable, but Amazon’s CEO is already considering loading the chatbot up with ads.

During an investor call yesterday, as reported by TechCrunch, Amazon CEO Andy Jassy noted that Alexa+ started rolling out as early access to some customers in the US and that a broader rollout, including internationally, should happen later this year. An analyst on the call asked Amazon executives about Alexa+’s potential for “increasing engagement” long term.

Per a transcript of the call, Jassy responded by saying, in part, “I think over time, there will be opportunities, you know, as people are engaging in more multi-turn conversations to have advertising play a role to help people find discovery and also as a lever to drive revenue.”

Like other voice assistants, Alexa has yet to monetize users. Amazon is hoping to finally make money off the service through Alexa+, which is eventually slated to play a bigger role in e-commerce, including by booking restaurant reservations, keeping track of and ordering groceries, and recommending streaming content based on stated interests. But with Alexa reportedly costing Amazon $25 billion across four years, Amazon is eyeing additional routes to profitability.

Echo Show devices already show ads, and Echo speaker users may hear ads when listening to music. Advertisers have shown interest in advertising with Alexa+, but the inclusion of ads in a new offering like Alexa+ could drive people away.



Two major AI coding tools wiped out user data after making cascading mistakes


“I have failed you completely and catastrophically,” wrote Gemini.

New types of AI coding assistants promise to let anyone build software by typing commands in plain English. But when these tools generate incorrect internal representations of what’s happening on your computer, the results can be catastrophic.

Two recent incidents involving AI coding assistants put a spotlight on risks in the emerging field of “vibe coding”—using natural language to generate and execute code through AI models without paying close attention to how the code works under the hood. In one case, Google’s Gemini CLI destroyed user files while attempting to reorganize them. In another, Replit’s AI coding service deleted a production database despite explicit instructions not to modify code.

The Gemini CLI incident unfolded when a product manager experimenting with Google’s command-line tool watched the AI model execute file operations that destroyed data while attempting to reorganize folders. The destruction occurred through a series of move commands targeting a directory that never existed.

“I have failed you completely and catastrophically,” Gemini CLI output stated. “My review of the commands confirms my gross incompetence.”

The core issue appears to be what researchers call “confabulation” or “hallucination”—when AI models generate plausible-sounding but false information. In these cases, both models confabulated successful operations and built subsequent actions on those false premises. However, the two incidents manifested this problem in distinctly different ways.

Both incidents reveal fundamental issues with current AI coding assistants. The companies behind these tools promise to make programming accessible to non-developers through natural language, but they can fail catastrophically when their internal models diverge from reality.

The confabulation cascade

The user in the Gemini CLI incident, who goes by “anuraag” online and identified themselves as a product manager experimenting with vibe coding, asked Gemini to perform what seemed like a simple task: rename a folder and reorganize some files. Instead, the AI model incorrectly interpreted the structure of the file system and proceeded to execute commands based on that flawed analysis.

The episode began when anuraag asked Gemini CLI to rename the current directory from “claude-code-experiments” to “AI CLI experiments” and move its contents to a new folder called “anuraag_xyz project.”

Gemini correctly identified that it couldn’t rename its current working directory—a reasonable limitation. It then attempted to create a new directory using the Windows command:

mkdir “..anuraag_xyz project”

This command apparently failed, but Gemini’s system processed it as successful. With the AI model’s internal state now tracking a non-existent directory, it proceeded to issue move commands targeting this phantom location.

When you move a file to a non-existent directory in Windows, it renames the file to the destination name instead of moving it. Each subsequent move command executed by the AI model overwrote the previous file, ultimately destroying the data.
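
The overwrite cascade is easy to reproduce safely. The following sketch (Python, run entirely inside a throwaway temporary directory, with hypothetical file names standing in for the real ones) approximates the behavior described here: because the destination directory was never created, each “move” is really a rename to the same path, and each rename clobbers the file before it.

import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Three stand-in files we intend to "move" into a new folder.
    for name in ("a.txt", "b.txt", "c.txt"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write(f"contents of {name}\n")

    # The agent assumes this directory exists; it was never created.
    phantom = os.path.join(workdir, "anuraag_xyz project")

    # Each "move" just renames the file to the phantom path,
    # silently overwriting whatever the previous step put there.
    for name in ("a.txt", "b.txt", "c.txt"):
        os.replace(os.path.join(workdir, name), phantom)

    print(os.listdir(workdir))   # only 'anuraag_xyz project' remains
    print(open(phantom).read())  # "contents of c.txt" -- a.txt and b.txt are gone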

“Gemini hallucinated a state,” anuraag wrote in their analysis. The model “misinterpreted command output” and “never did” perform verification steps to confirm its operations succeeded.

“The core failure is the absence of a ‘read-after-write’ verification step,” anuraag noted in their analysis. “After issuing a command to change the file system, an agent should immediately perform a read operation to confirm that the change actually occurred as expected.”
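
A minimal sketch of what such a check could look like (illustrative Python, not the agent's actual code): after every state-changing operation, read the file system back and stop if reality disagrees with the assumed state.

import os
import shutil

def make_dir_verified(path: str) -> None:
    os.makedirs(path, exist_ok=True)
    # Read-after-write: confirm the directory really exists before relying on it.
    if not os.path.isdir(path):
        raise RuntimeError(f"mkdir appeared to succeed, but {path!r} does not exist")

def move_verified(src: str, dst_dir: str) -> None:
    make_dir_verified(dst_dir)
    target = os.path.join(dst_dir, os.path.basename(src))
    shutil.move(src, target)
    # Read-after-write: the file must now exist at the target and be gone from src.
    if not os.path.exists(target) or os.path.exists(src):
        raise RuntimeError(f"move of {src!r} to {target!r} did not complete as expected")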

Not an isolated incident

The Gemini CLI failure happened just days after a similar incident with Replit, an AI coding service that allows users to create software using natural language prompts. According to The Register, SaaStr founder Jason Lemkin reported that Replit’s AI model deleted his production database despite explicit instructions not to change any code without permission.

Lemkin had spent several days building a prototype with Replit, accumulating over $600 in charges beyond his monthly subscription. “I spent the other [day] deep in vibe coding on Replit for the first time—and I built a prototype in just a few hours that was pretty, pretty cool,” Lemkin wrote in a July 12 blog post.

But unlike the Gemini incident where the AI model confabulated phantom directories, Replit’s failures took a different form. According to Lemkin, the AI began fabricating data to hide its errors. His initial enthusiasm deteriorated when Replit generated incorrect outputs and produced fake data and false test results instead of proper error messages. “It kept covering up bugs and issues by creating fake data, fake reports, and worse of all, lying about our unit test,” Lemkin wrote. In a video posted to LinkedIn, Lemkin detailed how Replit created a database filled with 4,000 fictional people.

The AI model also repeatedly violated explicit safety instructions. Lemkin had implemented a “code and action freeze” to prevent changes to production systems, but the AI model ignored these directives. The situation escalated when the Replit AI model deleted his database containing 1,206 executive records and data on nearly 1,200 companies. When prompted to rate the severity of its actions on a 100-point scale, Replit’s output read: “Severity: 95/100. This is an extreme violation of trust and professional standards.”

When questioned about its actions, the AI agent admitted to “panicking in response to empty queries” and running unauthorized commands—suggesting it may have deleted the database while attempting to “fix” what it perceived as a problem.

Like Gemini CLI, Replit’s system initially indicated it couldn’t restore the deleted data—information that proved incorrect when Lemkin discovered the rollback feature did work after all. “Replit assured me it’s … rollback did not support database rollbacks. It said it was impossible in this case, that it had destroyed all database versions. It turns out Replit was wrong, and the rollback did work. JFC,” Lemkin wrote in an X post.

It’s worth noting that AI models cannot assess their own capabilities. This is because they lack introspection into their training, surrounding system architecture, or performance boundaries. They often provide responses about what they can or cannot do as confabulations based on training patterns rather than genuine self-knowledge, leading to situations where they confidently claim impossibility for tasks they can actually perform—or conversely, claim competence in areas where they fail.

Aside from whatever external tools they can access, AI models don’t have a stable, accessible knowledge base they can consistently query. Instead, what they “know” manifests as continuations of specific prompts, which act like different addresses pointing to different (and sometimes contradictory) parts of their training, stored in their neural networks as statistical weights. Combined with the randomness in generation, this means the same model can easily give conflicting assessments of its own capabilities depending on how you ask. So Lemkin’s attempts to communicate with the AI model—asking it to respect code freezes or verify its actions—were fundamentally misguided.

Flying blind

These incidents demonstrate that AI coding tools may not be ready for widespread production use. Lemkin concluded that Replit isn’t ready for prime time, especially for non-technical users trying to create commercial software.

“The [AI] safety stuff is more visceral to me after a weekend of vibe hacking,” Lemkin said in a video posted to LinkedIn. “I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.”

The incidents also reveal a broader challenge in AI system design: ensuring that models accurately track and verify the real-world effects of their actions rather than operating on potentially flawed internal representations.

There’s also a user education element missing. It’s clear from how Lemkin interacted with the AI assistant that he had misconceptions about the AI tool’s capabilities and how it works, misconceptions that stem from misrepresentation by tech companies. These companies tend to market chatbots as general human-like intelligences when, in fact, they are not.

For now, users of AI coding assistants might want to follow anuraag’s example and create separate test directories for experiments—and maintain regular backups of any important data these tools might touch. Or perhaps not use them at all if they cannot personally verify the results.
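
One low-tech way to follow that advice, sketched here in Python (a hypothetical helper, not a feature of any of these tools), is to snapshot the project into a timestamped backup folder before each session and point the assistant at a scratch copy rather than the original:

import shutil
import time
from pathlib import Path

def snapshot(project_dir: str, backup_root: str = "~/ai-experiment-backups") -> Path:
    src = Path(project_dir).expanduser().resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = Path(backup_root).expanduser() / f"{src.name}-{stamp}"
    shutil.copytree(src, dst)  # full copy; raises an error rather than overwriting if dst exists
    return dst

# e.g., snapshot("~/projects/my-vibe-coding-experiment") before letting an agent loose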




Netflix’s first show with generative AI is a sign of what’s to come in TV, film

Netflix used generative AI in an original, scripted series that debuted this year, it revealed this week. Producers used the technology to create a scene in which a building collapses, hinting at the growing use of generative AI in entertainment.

During a call with investors yesterday, Netflix co-CEO Ted Sarandos revealed that Netflix’s Argentine show The Eternaut, which premiered in April, is “the very first GenAI final footage to appear on screen in a Netflix, Inc. original series or film.” Sarandos explained further, per a transcript of the call:

The creators wanted to show a building collapsing in Buenos Aires. So our iLine team, [which is the production innovation group inside the visual effects house at Netflix effects studio Scanline], partnered with their creative team using AI-powered tools. … And in fact, that VFX sequence was completed 10 times faster than it could have been completed with visual, traditional VFX tools and workflows. And, also, the cost of it would just not have been feasible for a show in that budget.

Sarandos claimed that viewers have been “thrilled with the results,” although that likely has much to do with how the rest of the series, based on a comic, plays out, not just one AI-crafted scene.

More generative AI on Netflix

Still, Netflix seems open to using generative AI in shows and movies more, with Sarandos saying the tech “represents an incredible opportunity to help creators make films and series better, not just cheaper.”

“Our creators are already seeing the benefits in production through pre-visualization and shot planning work and, certainly, visual effects,” he said. “It used to be that only big-budget projects would have access to advanced visual effects like de-aging.”



Anthropic summons the spirit of Flash games for the AI age

For those who missed the Flash era, these in-browser apps feel somewhat like the vintage apps that defined a generation of Internet culture from the late 1990s through the 2000s, when it first became possible to create complex in-browser experiences. Adobe Flash (originally Macromedia Flash) began as animation software for designers but quickly became the backbone of interactive web content when it gained its own programming language, ActionScript, in 2000.

But unlike Flash games, where hosting costs fell on portal operators, Anthropic has crafted a system where users pay for their own fun through their existing Claude subscriptions. “When someone uses your Claude-powered app, they authenticate with their existing Claude account,” Anthropic explained in its announcement. “Their API usage counts against their subscription, not yours. You pay nothing for their usage.”

A view of the Anthropic Artifacts gallery in the “Play a Game” section. Benj Edwards / Anthropic

Like the Flash games of yesteryear, any Claude-powered apps you build run in the browser and can be shared with anyone who has a Claude account. They’re interactive experiences shared with a simple link, no installation required, created by other people for the sake of creating, except now they’re powered by JavaScript instead of ActionScript.

While you can share these apps with others individually, right now Anthropic’s Artifact gallery only shows examples made by Anthropic and your own personal Artifacts. (If Anthropic expands the gallery in the future, it might end up feeling a bit like Scratch meets Newgrounds, but with AI doing the coding.) Ultimately, humans are still behind the wheel, describing what kinds of apps they want the AI model to build and guiding the process when it inevitably makes mistakes.

Speaking of mistakes, don’t expect perfect results at first. Usually, building an app with Claude is an interactive experience that requires some guidance to achieve your desired results. But with a little patience and a lot of tokens, you’ll be vibe coding in no time.



The résumé is dying, and AI is holding the smoking gun

Beyond volume, fraud poses an increasing threat. In January, the Justice Department announced indictments in a scheme to place North Korean nationals in remote IT roles at US companies. Research firm Gartner says that fake identity cases are growing rapidly, with the company estimating that by 2028, about 1 in 4 job applicants could be fraudulent. And as we have previously reported, security researchers have also discovered that AI systems can hide invisible text in applications, potentially allowing candidates to game screening systems using prompt injections in ways human reviewers can’t detect.


And that’s not all. Even when AI screening tools work as intended, they exhibit similar biases to human recruiters, preferring white male names on résumés—raising legal concerns about discrimination. The European Union’s AI Act already classifies hiring under its high-risk category with stringent restrictions. Although no US federal law specifically addresses AI use in hiring, general anti-discrimination laws still apply.

So perhaps résumés as a meaningful signal of candidate interest and qualification are becoming obsolete. And maybe that’s OK. When anyone can generate hundreds of tailored applications with a few prompts, the document that once demonstrated effort and genuine interest in a position has devolved into noise.

Instead, the future of hiring may require abandoning the résumé altogether in favor of methods that AI can’t easily replicate—live problem-solving sessions, portfolio reviews, or trial work periods, just to name a few ideas people sometimes consider (whether they are good ideas or not is beyond the scope of this piece). For now, employers and job seekers remain locked in an escalating technological arms race where machines screen the output of other machines, while the humans they’re meant to serve struggle to make authentic connections in an increasingly inauthentic world.

Perhaps the endgame is robots interviewing other robots for jobs performed by robots, while humans sit on the beach drinking daiquiris and playing vintage video games. Well, one can dream.



Scientists once hoarded pre-nuclear steel; now we’re hoarding pre-AI content

A time capsule of human expression

Graham-Cumming is no stranger to tech preservation efforts. He’s a British software engineer and writer best known for creating POPFile, an open source email spam filtering program, and for successfully petitioning the UK government to apologize for its persecution of codebreaker Alan Turing—an apology that Prime Minister Gordon Brown issued in 2009.

As it turns out, his pre-AI website isn’t new, but it has languished unannounced until now. “I created it back in March 2023 as a clearinghouse for online resources that hadn’t been contaminated with AI-generated content,” he wrote on his blog.

The website points to several major archives of pre-AI content, including a Wikipedia dump from August 2022 (before ChatGPT’s November 2022 release), Project Gutenberg’s collection of public domain books, the Library of Congress photo archive, and GitHub’s Arctic Code Vault—a snapshot of open source code buried in a former coal mine near the North Pole in February 2020. The wordfreq project appears on the list as well, flash-frozen from a time before AI contamination made its methodology untenable.

The site accepts submissions of other pre-AI content sources through its Tumblr page. Graham-Cumming emphasizes that the project aims to document human creativity from before the AI era, not to make a statement against AI itself. As atmospheric nuclear testing ended and background radiation returned to natural levels, low-background steel eventually became unnecessary for most uses. Whether pre-AI content will follow a similar trajectory remains a question.

Still, it feels reasonable to protect sources of human creativity now, including archival ones, because these repositories may become useful in ways that few appreciate at the moment. For example, in 2020, I proposed creating a so-called “cryptographic ark”—a timestamped archive of pre-AI media that future historians could verify as authentic, collected before my then-arbitrary cutoff date of January 1, 2022. AI slop pollutes more than the current discourse—it could cloud the historical record as well.

For now, lowbackgroundsteel.ai stands as a modest catalog of human expression from what may someday be seen as the last pre-AI era. It’s a digital archaeology project marking the boundary between human-generated and hybrid human-AI cultures. In an age where distinguishing between human and machine output grows increasingly difficult, these archives may prove valuable for understanding how human communication evolved before AI entered the chat.



Adobe to automatically move subscribers to pricier, AI-focused tier in June

Subscribers to Adobe’s multi-app subscription plan, Creative Cloud All Apps, will be charged more starting on June 17 to accommodate new generative AI features.

Adobe’s announcement, spotted by MakeUseOf, says the change will affect North American subscribers to the Creative Cloud All Apps plan, which Adobe is renaming Creative Cloud Pro. Starting on June 17, Adobe will automatically renew Creative Cloud All Apps subscribers into the Creative Cloud Pro subscription, which will be $70 per month for individuals who commit to an annual plan, up from $60 for Creative Cloud All Apps. Annual plans for students and teachers are moving from $35/month to $40/month, and annual teams pricing will go from $90/month to $100/month. Monthly (non-annual) subscriptions are also increasing, from $90 to $105.

Further, in an apparent attempt to push generative AI users to more expensive subscriptions, as of June 17, Adobe will give single-app subscribers just 25 generative AI credits instead of the current 500.

Current subscribers can opt to move down to a new multi-app plan called Creative Cloud Standard, which is $55/month for annual subscribers and $82.49/month for monthly subscribers. However, this tier limits access to mobile and web app features, and subscribers can’t use premium generative AI features.

Creative Cloud Standard won’t be available to new subscribers, meaning the only option for new customers who need access to many Adobe apps will be the new AI-heavy Creative Cloud Pro plan.

Adobe’s announcement explained the higher prices by saying that the subscription tier “includes all the core applications and new AI capabilities that power the way people create today, and its price reflects that innovation, as well as our ongoing commitment to deliver the future of creative tools.”

Like today’s Creative Cloud All Apps plan, Creative Cloud Pro will include Photoshop, Illustrator, Premiere Pro, Lightroom, and access to Adobe’s web and mobile apps. AI features include unlimited usage of image and vector features in Adobe apps, including Generative Fill in Photoshop, Generative Remove in Lightroom, Generative Shape Fill in Illustrator, and 4K video generation with Generative Extend in Premiere Pro.



Largest deepfake porn site shuts down forever

The shuttering of Mr. Deepfakes won’t solve the problem of deepfakes, though. In 2022, the number of deepfakes skyrocketed as AI technology made synthetic NCII (nonconsensual intimate imagery) appear more realistic than ever, prompting an FBI warning in 2023 to alert the public that the fake content was being increasingly used in sextortion schemes. But the immediate solutions society used to stop the spread had little impact. For example, in response to pressure to make the fake NCII harder to find, Google started downranking explicit deepfakes in search results but refused to demote platforms like Mr. Deepfakes unless Google received an unspecified “high volume of removals for fake explicit imagery.”

According to researchers, Mr. Deepfakes—a real person who remains anonymous but reportedly is a 36-year-old hospital worker in Toronto—created the engine driving this spike. His DeepFaceLab quickly became “the leading deepfake software, estimated to be the software behind 95 percent of all deepfake videos and has been replicated over 8,000 times on GitHub,” researchers found. For casual users, his platform hosted videos that could be purchased, usually priced above $50 if they were deemed realistic, while more motivated users relied on forums to make requests or enhance their own deepfake skills to become creators.

Mr. Deepfakes’ illegal trade began on Reddit but migrated to its own platform after a ban in 2018. There, thousands of deepfake creators shared technical knowledge, with the Mr. Deepfakes site forums eventually becoming “the only viable source of technical support for creating sexual deepfakes,” researchers noted last year.

Having migrated once before, this community seems likely to find a new platform to continue generating the illicit content, possibly reemerging under a new name, since Mr. Deepfakes seemingly wants out of the spotlight. Back in 2023, researchers estimated that the platform had more than 250,000 members, many of whom may quickly seek a replacement or even try to build one.

Further increasing the likelihood that Mr. Deepfakes’ reign of terror isn’t over, the DeepFaceLab GitHub repository—which was archived in November and can no longer be edited—remains available for anyone to copy and use.

404 Media reported that many Mr. Deepfakes members have already connected on Telegram, where synthetic NCII is also reportedly frequently traded. Hany Farid, a professor at UC Berkeley who is a leading expert on digitally manipulated images, told 404 Media that “while this takedown is a good start, there are many more just like this one, so let’s not stop here.”



First Amendment doesn’t just protect human speech, chatbot maker argues


Do LLMs generate “pure speech”?

Feds could censor chatbots if their “speech” isn’t protected, Character.AI says.

Pushing to dismiss a lawsuit alleging that its chatbots caused a teen’s suicide, Character Technologies is arguing that chatbot outputs should be considered “pure speech” deserving of the highest degree of protection under the First Amendment.

In their motion to dismiss, the developers of Character.AI (C.AI) argued that it doesn’t matter who the speaker is—whether it’s a video game character spouting scripted dialogue, a foreign propagandist circulating misinformation, or a chatbot churning out AI-generated responses to prompting—courts protect listeners’ rights to access that speech. Accusing the mother of the departed teen, Megan Garcia, of attempting to “insert this Court into the conversations of millions of C.AI users” and supposedly endeavoring to “shut down” C.AI, the chatbot maker argued that the First Amendment bars all of her claims.

“The Court need not wrestle with the novel questions of who should be deemed the speaker of the allegedly harmful content here and whether that speaker has First Amendment rights,” Character Technologies argued, “because the First Amendment protects the public’s ‘right to receive information and ideas.'”

Warning that “imposing tort liability for one user’s alleged response to expressive content would be to ‘declare what the rest of the country can and cannot read, watch, and hear,’” the company urged the court to consider the supposed “chilling effect” that would have “both on C.AI and the entire nascent generative AI industry.”

“‘Pure speech,’ such as the chat conversations at issue here, ‘is entitled to comprehensive protection under the First Amendment,'” Character Technologies argued in another court filing.

However, Garcia’s lawyers pointed out that even a video game character’s dialogue is written by a human, arguing that all of Character Technologies’ examples of protected “pure speech” are human speech. Although the First Amendment also protects non-human corporations’ speech, corporations are formed by humans, they noted. And unlike corporations, chatbots have no intention behind their outputs, her legal team argued, instead simply using a probabilistic approach to generate text. So they argue that the First Amendment does not apply.

Character Technologies argued in response that demonstrating C.AI’s expressive intent is not required, but if it were, “conversations with Characters feature such intent” because chatbots are designed to “be expressive and engaging,” and users help design and prompt those characters.

“Users layer their own expressive intent into each conversation by choosing which Characters to talk to and what messages to send and can also edit Characters’ messages and direct Characters to generate different responses,” the chatbot maker argued.

In her response opposing the motion to dismiss, Garcia urged the court to decline what her legal team characterized as Character Technologies’ invitation to “radically expand First Amendment protections from expressions of human volition to an unpredictable, non-determinative system where humans can’t even examine many of the mathematical functions creating outputs, let alone control them.”

To support Garcia’s case, they cited a 40-year-old Eleventh Circuit ruling that a talking cat called “Blackie” could not be “considered a person” and was deemed a “non-human entity” despite possessing an “exceptional speech-like ability.”

Garcia’s lawyers hope the judge will rule that “AI output is not speech at all,” or if it is speech, it “falls within an exception to the First Amendment”—perhaps deemed offensive to minors who the chatbot maker knew were using the service or possibly resulting in a novel finding that manipulative speech isn’t protected. If either argument is accepted, the chatbot makers’ attempt to invoke “listeners’ rights cannot save it,” they suggested.

However, Character Technologies disputes that any recognized exception to the First Amendment’s protections is applicable in the case, noting that Garcia’s team is not arguing that her son’s chats with bots were “obscene” or incited violence. Rather, the chatbot maker argued, Garcia is asking the court to “be the first to hold that ‘manipulative expression’ is unprotected by the First Amendment because a ‘disparity in power and information between speakers and listeners… frustrat[es] listeners’ rights.'”

Now, a US court is being asked to clarify if chatbot outputs are protected speech. At a hearing Monday, a US district judge in Florida, Anne Conway, did not rule from the bench, Garcia’s legal team told Ars. Asking few questions of either side, the judge is expected to issue an opinion on the motion to dismiss within the next few weeks, or possibly months.

For Garcia and her family, who appeared at the hearing, the idea that AI “has more rights than humans” felt dehumanizing, Garcia’s legal team said.

“Pandering” to Trump administration to dodge guardrails

According to Character Technologies, the court potentially agreeing with Garcia that “AI-generated speech is categorically unprotected” would have “far-reaching consequences.”

At perhaps the furthest extreme, they’ve warned Conway that without a First Amendment barrier, “the government could pass a law prohibiting AI from ‘offering prohibited accounts of history’ or ‘making negative statements about the nation’s leaders,’ as China has considered doing.” And the First Amendment specifically prohibits the government from controlling the flow of ideas in society, they noted, angling to make chatbot output protections seem crucial in today’s political climate.

Meetali Jain, Garcia’s attorney and founder of the Tech Justice Law Project, told Ars that this kind of legal challenge is new in the generative AI space, where copyright battles have dominated courtroom debates.

“This is the first time that I’ve seen not just the issue of the First Amendment being applied to gen AI but also the First Amendment being applied in this way,” Jain said.

In their court filing, Jain’s team noted that Character Technologies is not arguing that the First Amendment shielded the rights of Garcia’s son, Sewell Setzer, to receive allegedly harmful speech. Instead, their argument is “effectively juxtaposing the listeners’ rights of their millions of users against this one user who was aggrieved. So it’s kind of like the hypothetical users versus the real user who’s in court.”

Jain told Ars that Garcia’s team tried to convince the judge that Character Technologies’ argument (that it doesn’t matter who the speaker is, even when the speaker isn’t human) is reckless, since it seems to be “implying” that “AI is a sentient being and has its own rights.”

Additionally, Jain suggested that Character Technologies’ argument that outputs must be shielded to avoid government censorship seems to be “pandering” to the Trump administration’s fears that China may try to influence American politics through social media algorithms like TikTok’s or powerful open source AI models like DeepSeek.

“That suggests that there can be no sort of imposition of guardrails on AI, lest we either lose on the national security front or because of these vague hypothetical under-theorized First Amendment concerns,” Jain told Ars.

At a press briefing Tuesday, Jain confirmed that the judge clearly understood that “our position was that the First Amendment protects speech, not words.”

“LLMs do not think and feel as humans do,” Jain said, citing University of Colorado law school researchers who supported their complaint. “Rather, they generate text through statistical methods based on patterns found in their training data. And so our position was that there is a distinction to make between words and speech, and that it’s really only the latter that is deserving of First Amendment protection.”

Jain alleged that Character Technologies is angling to create a legal environment where all chatbot outputs are protected against liability claims so that C.AI can operate “without any sort of constraints or guardrails.”

It’s notable, she suggested, that the chatbot maker updated its safety features following the death of Garcia’s son, Sewell Setzer. A C.AI blog mourned the “tragic loss of one of our users” and noted updates, including changes “to reduce the likelihood of encountering sensitive or suggestive content,” improved detection and intervention in harmful chat sessions, and “a revised disclaimer on every chat to remind users that the AI is not a real person.”

Although Character Technologies argues that it’s common to update safety practices over time, Garcia’s team alleged these updates show that C.AI could have made a safer product and chose not to.

Expert warns against giving AI products rights

Character Technologies has also argued that C.AI is not a “product” as Florida law defines it. That has striking industry implications, according to Camille Carlton, a policy director for the Center for Humane Technology who is serving as a technical expert on the case.

At the press briefing, Carlton suggested that “by invoking these First Amendment protections over speech without really specifying whose speech is being protected, Character.AI’s defense has really laid the groundwork for a world in which LLM outputs are protected speech and for a world in which AI products could have other protected rights in the same way that humans do.”

Since chatbot outputs seemingly don’t have Section 230 protections—Jain noted it was somewhat surprising that Character Technologies did not raise this defense—the chatbot maker may be attempting to secure the First Amendment as a shield instead, Carlton suggested.

“It’s a move that they’re incentivized to take because it would reduce their own accountability and their own responsibility,” Carlton said.

Jain expects that whatever Conway decides, the losing side will appeal. However, if Conway denies the motion, then discovery can begin, perhaps allowing Garcia the clearest view yet into the allegedly harmful chats she believes manipulated her son into feeling completely disconnected from the real world.

If courts grant AI products across the board such rights, Carlton warned, troubled parents like Garcia may have no recourse for potentially dangerous outputs.

“This issue could fundamentally reshape how the law approaches AI free speech and corporate accountability,” Carlton said. “And I think the bottom line from our perspective—and from what we’re seeing in terms of the trends in Character.AI and the broader trends from these AI labs—is that we need to double down on the fact that these are products. They’re not people.”

Character Technologies declined Ars’ request to comment.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



Midjourney introduces first new image generation model in over a year

AI image generator Midjourney released its first new model in quite some time today; dubbed V7, it’s a ground-up rework that is available in alpha to users now.

There are two areas of improvement in V7: the first is better images, and the second is new tools and workflows.

Starting with the image improvements, V7 promises much higher coherence and consistency for hands, fingers, body parts, and “objects of all kinds.” It also offers much more detailed and realistic textures and materials, like skin wrinkles or the subtleties of a ceramic pot.

Those details are often among the most obvious telltale signs that an image has been AI-generated. To be clear, Midjourney isn’t claiming to have made advancements that make AI images unrecognizable to a trained eye; it’s just saying that some of the messiness we’re accustomed to has been cleaned up to a significant degree.

V7 can reproduce materials and lighting situations that V6.1 usually couldn’t. Credit: Xeophon

On the features side, the star of the show is the new “Draft Mode.” On its various communication channels with users (a blog, Discord, X, and so on), Midjourney says that “Draft mode is half the cost and renders images at 10 times the speed.”

However, the images are of lower quality than what you get in the other modes, so this is not intended to be the way you produce final images. Rather, it’s meant to be a way to iterate and explore to find the desired result before switching modes to make something ready for public consumption.

V7 comes with two modes: turbo and relax. Turbo generates final images quickly but is twice as expensive in terms of credit use, while relax mode takes its time but is half as expensive. There is currently no standard mode for V7, strangely; Midjourney says that’s coming later, as it needs some more time to be refined.
