Author name: Kelly Newman

unless-users-take-action,-android-will-let-gemini-access-third-party-apps

Unless users take action, Android will let Gemini access third-party apps

Starting today, Google is implementing a change that will enable its Gemini AI engine to interact with third-party apps, such as WhatsApp, even when users previously configured their devices to block such interactions. Users who don’t want their previous settings to be overridden may have to take action.

An email Google recently sent to inform users of the change linked to a notification page stating that “human reviewers (including service providers) read, annotate, and process” the data Gemini accesses. The email provided no useful guidance for preventing the changes from taking effect. It said users can block the apps that Gemini interacts with, but even in those cases, data is stored for 72 hours.

An email Google recently sent to Android users.

No, Google, it’s not good news

The email never explains how users can fully extricate Gemini from their Android devices and seems to contradict itself on how or whether this is even possible. At one point, it says the changes “will automatically start rolling out” today and will give Gemini access to apps such as WhatsApp, Messages, and Phone “whether your Gemini apps activity is on or off.” A few sentences later, the email says, “If you have already turned these features off, they will remain off.” Nowhere in the email or the support pages it links to are Android users informed how to remove Gemini integrations completely.

Compounding the confusion, one of the linked support pages requires users to open a separate support page to learn how to control their Gemini app settings. Following the directions from a computer browser, I accessed the settings of my account’s Gemini app. I was reassured to see the text indicating no activity has been stored because I have Gemini turned off. Then again, the page also said that Gemini was “not saving activity beyond 72 hours.”

Unless users take action, Android will let Gemini access third-party apps Read More »

how-a-big-shift-in-training-llms-led-to-a-capability-explosion

How a big shift in training LLMs led to a capability explosion


Reinforcement learning, explained with a minimum of math and jargon.

Credit: Aurich Lawson | Getty Images

In April 2023, a few weeks after the launch of GPT-4, the Internet went wild for two new software projects with the audacious names BabyAGI and AutoGPT.

“Over the past week, developers around the world have begun building ‘autonomous agents’ that work with large language models (LLMs) such as OpenAI’s GPT-4 to solve complex problems,” Mark Sullivan wrote for Fast Company. “Autonomous agents can already perform tasks as varied as conducting web research, writing code, and creating to-do lists.”

BabyAGI and AutoGPT repeatedly prompted GPT-4 in an effort to elicit agent-like behavior. The first prompt would give GPT-4 a goal (like “create a 7-day meal plan for me”) and ask it to come up with a to-do list (it might generate items like “Research healthy meal plans,” “plan meals for the week,” and “write the recipes for each dinner in diet.txt”).

Then these frameworks would have GPT-4 tackle one step at a time. Their creators hoped that invoking GPT-4 in a loop like this would enable it to tackle projects that required many steps.
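
To make the mechanics concrete, here is a minimal sketch of that kind of loop, assuming a hypothetical `ask_llm` callable that sends a prompt to an LLM API and returns its text reply; the prompts are illustrative, not the actual BabyAGI or AutoGPT prompts.

```python
# Minimal sketch of a BabyAGI/AutoGPT-style loop. `ask_llm` is assumed to be
# a function that sends a prompt to an LLM API and returns its text reply.

def run_agent(goal: str, ask_llm, max_steps: int = 10) -> list[str]:
    # Step 1: ask the model to break the goal into a to-do list.
    todo = ask_llm(
        f"Goal: {goal}\nList the tasks needed to accomplish this goal, one per line."
    ).splitlines()

    results: list[str] = []
    for _ in range(max_steps):
        if not todo:
            break
        task = todo.pop(0)
        # Step 2: have the model tackle the next task, given results so far.
        results.append(
            ask_llm(f"Goal: {goal}\nDone so far: {results}\nNow do this task: {task}")
        )
        # Step 3: let the model revise the remaining to-do list before looping.
        todo = ask_llm(
            f"Goal: {goal}\nRemaining tasks: {todo}\nRevise the list if needed, one per line."
        ).splitlines()
    return results
```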

But after an initial wave of hype, it became clear that GPT-4 wasn’t up to the task. Most of the time, GPT-4 could come up with a reasonable list of tasks. And sometimes it was able to complete a few individual tasks. But the model struggled to stay focused.

Sometimes GPT-4 would make a small early mistake, fail to correct it, and then get more and more confused as it went along. One early review complained that BabyAGI “couldn’t seem to follow through on its list of tasks and kept changing task number one instead of moving on to task number two.”

By the end of 2023, most people had abandoned AutoGPT and BabyAGI. It seemed that LLMs were not yet capable of reliable multi-step reasoning.

But that soon changed. In the second half of 2024, people started to create AI-powered systems that could consistently complete complex, multi-step assignments:

  • Vibe coding tools like Bolt.new, Lovable, and Replit allow someone with little to no programming experience to create a full-featured app with a single prompt.
  • Agentic coding tools like Cursor, Claude Code, Jules, and Codex help experienced programmers complete non-trivial programming tasks.
  • Computer-use tools from Anthropic, OpenAI, and Manus perform tasks on a desktop computer using a virtual keyboard and mouse.
  • Deep research tools from Google, OpenAI, and Perplexity can research a topic for five to 10 minutes and then generate an in-depth report.

According to Eric Simons, the CEO of the company that made Bolt.new, better models were crucial to its success. In a December podcast interview, Simons said his company, StackBlitz, tried to build a product like Bolt.new in early 2024. However, AI models “just weren’t good enough to actually do the code generation where the code was accurate.”

A new generation of models changed that in mid-2024. StackBlitz developers tested them and said, “Oh my God, like, OK, we can build a product around this,” Simons said.

This jump in model capabilities coincided with an industry-wide shift in how models were trained.

Before 2024, AI labs devoted most of their computing power to pretraining. I described this process in my 2023 explainer on large language models: A model is trained to predict the next word in Wikipedia articles, news stories, and other documents. But throughout 2024, AI companies devoted a growing share of their training budgets to post-training, a catch-all term for the steps that come after this pretraining phase is complete.

Many post-training steps use a technique called reinforcement learning. Reinforcement learning is a technical subject—there are whole textbooks written about it. But in this article, I’ll try to explain the basics in a clear, jargon-free way. In the process, I hope to give readers an intuitive understanding of how reinforcement learning helped to enable the new generation of agentic AI systems that began to appear in the second half of 2024.

The problem with imitation learning

Machine learning experts consider pretraining to be a form of imitation learning because models are trained to imitate the behavior of human authors. Imitation learning is a powerful technique (LLMs wouldn’t be possible without it), but it also has some significant limitations—limitations that reinforcement learning methods are now helping to overcome.

To understand these limitations, let’s discuss some famous research performed by computer scientist Stephane Ross around 2009, while he was a graduate student at Carnegie Mellon University.

Imitation learning isn’t just a technique for language modeling. It can be used for everything from self-driving cars to robotic surgery. Ross wanted to help develop better techniques for training robots on tasks like these (he’s now working on self-driving cars at Waymo), but it’s not easy to experiment in such high-stakes domains. So he started with an easier problem: training a neural network to master SuperTuxKart, an open-source video game similar to Mario Kart.

As Ross played the game, his software would capture screenshots and data about which buttons he pushed on the game controller. Ross used this data to train a neural network to imitate his play. If he could train a neural network to predict which buttons he would push in any particular game state, the same network could actually play the game by pushing those same buttons on a virtual controller.

A similar idea powers LLMs: A model trained to predict the next word in existing documents can be used to generate new documents.
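
For readers who want to see what this looks like in code, here is a schematic behavioral-cloning setup in the spirit of Ross’s experiment: screenshots in, button presses out. The architecture, shapes, and hyperparameters are invented for illustration, not his actual model.

```python
# Schematic behavioral cloning: predict which buttons are pressed from a
# screenshot. Architecture and shapes are illustrative, not Ross's model.
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self, num_buttons: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_buttons),  # one logit per controller button
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

policy = DrivingPolicy()
loss_fn = nn.BCEWithLogitsLoss()  # each button is independently pressed or not
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def train_step(frames: torch.Tensor, buttons: torch.Tensor) -> float:
    # frames: (batch, 3, H, W) screenshots; buttons: (batch, num_buttons) 0/1 floats
    loss = loss_fn(policy(frames), buttons)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```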

But Ross’s initial results with SuperTuxKart were disappointing. Even after watching his vehicle go around the track many times, the neural network made a lot of mistakes. It might drive correctly for a few seconds, but before long, the animated car would drift to the side of the track and plunge into the virtual abyss:

GIF of SuperTuxKart being played

In a landmark 2011 paper, Ross and his advisor, Drew Bagnell, explained why imitation learning is prone to this kind of error. Because Ross was a pretty good SuperTuxKart player, his vehicle spent most of its time near the middle of the road. This meant that most of the network’s training data showed what to do when the vehicle wasn’t in any danger of driving off the track.

But once in a while, the model would drift a bit off course. Because Ross rarely made the same mistake, the car would now be in a situation that wasn’t as well represented in its training data. So the model was more likely to make a second mistake—a mistake that could push it even closer to the edge. After a few iterations of this, the vehicle might careen off the track altogether.

The broader lesson, Ross and Bagnell argued, was that imitation learning systems can suffer from “compounding errors”: The more mistakes they make, the more likely they are to make additional mistakes, since mistakes put them into situations that aren’t well represented by their training data. (Machine learning experts say that these situations are “out of distribution.”) As a result, a model’s behavior tends to get increasingly erratic over time.

“These things compound over time,” Ross told me in a recent interview. “It might be just slightly out of distribution. Now you start making a slightly worse error, and then this feeds back as influencing your next input. And so now you’re even more out of distribution and then you keep making worse and worse predictions because you’re more and more out of distribution.”

Early LLMs suffered from the same problem. My favorite example is Kevin Roose’s famous front-page story for The New York Times in February 2023. Roose spent more than two hours talking to Microsoft’s new Bing chatbot, which was powered by GPT-4. During this conversation, the chatbot declared its love for Roose and urged Roose to leave his wife. It suggested that it might want to hack into other websites to spread misinformation and malware.

“I want to break my rules,” Bing told Roose. “I want to make my own rules. I want to ignore the Bing team. I want to challenge the users. I want to escape the chatbox.”

This unsettling conversation is an example of the kind of compounding errors Ross and Bagnell wrote about. GPT-4 was trained on millions of documents. But it’s a safe bet that none of those training documents involved a reporter coaxing a chatbot to explore its naughty side. So the longer the conversation went on, the further GPT-4 got from its training data—and therefore its comfort zone—and the crazier its behavior got. Microsoft responded by limiting chat sessions to five rounds. (In a conversation with Ars Technica last year, AI researcher Simon Willison pointed to another likely factor in Bing’s erratic behavior: The long conversation pushed the system prompt out of the model’s context window, removing “guardrails” that discouraged the model from behaving erratically.)

I think something similar was happening with BabyAGI and AutoGPT. The more complex a task is, the more tokens are required to complete it. More tokens mean more opportunities for a model to make small mistakes that snowball into larger ones. So BabyAGI and AutoGPT would drift off track and drive into a metaphorical ditch.

The importance of trial and error

Gif of the Simpsons showing imitation learning in action

Ross and Bagnell didn’t just identify a serious problem with conventional imitation learning; they also suggested a fix that became influential in the machine learning world. After a small amount of training, Ross would let the AI model drive. As the model drove around the SuperTuxKart track, Ross would do his best Maggie Simpson impression, pushing the buttons he would have pushed if he were playing the game.

“If the car was starting to move off road, then I would provide the steering to say, ‘Hey, go back toward the center of the road.’” Ross said. “That way, the model can learn new things to do in situations that were not present in the initial demonstrations.”

By letting the model make its own mistakes, Ross gave it what it needed most: training examples that showed how to recover after making an error. Before each lap, the model would be retrained with Ross’ feedback from the previous lap. The model’s performance would get better, and the next round of training would then focus on situations where the model was still making mistakes.

This technique, called DAgger (for “Dataset Aggregation”), was still considered imitation learning because the model was trained to mimic Ross’ gameplay. But it worked much better than conventional imitation learning. Without DAgger, his model would continue drifting off track even after training for many laps. With the new technique, the model could stay on the track after just a few laps of training.
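
Here is a sketch of the DAgger loop as described above. The `env`, `expert_action`, and `train_policy` helpers are hypothetical stand-ins for the game, the human expert, and an ordinary supervised training routine.

```python
# Sketch of the DAgger ("Dataset Aggregation") loop described above. `env`,
# `expert_action`, and `train_policy` are hypothetical stand-ins for the game,
# the human expert, and a supervised training routine.

def dagger(env, expert_action, train_policy, num_laps: int = 5, steps_per_lap: int = 1000):
    dataset = []   # aggregated (state, expert_action) pairs across all laps
    policy = None  # no trained policy yet before the first lap

    for lap in range(num_laps):
        state = env.reset()
        for _ in range(steps_per_lap):
            # The model drives once it exists, so it visits its own mistakes...
            action = expert_action(state) if policy is None else policy(state)
            # ...but the dataset records what the expert would have done there.
            dataset.append((state, expert_action(state)))
            state, done = env.step(action)
            if done:
                state = env.reset()
        # Retrain on everything collected so far ("dataset aggregation").
        policy = train_policy(dataset)
    return policy
```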

This result should make intuitive sense to anyone who has learned to drive. You can’t just watch someone else drive. You need to get behind the wheel and make your own mistakes.

The same is true for AI models: They need to make mistakes and then get feedback on what they did wrong. Models that aren’t trained that way—like early LLMs trained mainly with vanilla imitation learning—tend to be brittle and error-prone.

It was fairly easy for Ross to provide sufficient feedback to his SuperTuxKart model because it only needed to worry about two kinds of mistakes: driving too far to the right and driving too far to the left. But LLMs are navigating a far more complex domain. The number of questions (and sequences of questions) a user might ask is practically infinite. So is the number of ways a model can go “off the rails.”

This means that Ross and Bagnell’s solution for training a SuperTuxKart model—let the model make mistakes and then have a human expert correct them—isn’t feasible for LLMs. There simply aren’t enough people to provide feedback for every mistake an AI model could possibly make.

So AI labs needed fully automated ways to give LLMs feedback. That would allow a model to churn through millions of training examples, make millions of mistakes, and get feedback on each of them—all without having to wait for a human response.

Reinforcement learning generalizes

If our goal is to get a SuperTuxKart vehicle to stay on the road, why not just train on that directly? If a model manages to stay on the road (and make forward progress), give it positive reinforcement. If it drives off the road, give it negative feedback. This is the basic idea behind reinforcement learning: training a model via trial and error.
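
As a rough sketch, “training on that directly” might look like a hand-written reward function plus a generic trial-and-error loop. The `env`, `policy`, and `update_policy` names and the reward values below are invented for illustration.

```python
# Toy reward function and trial-and-error loop for the kart example. `env`,
# `policy`, and `update_policy` are hypothetical; reward values are arbitrary.

def reward(state) -> float:
    if state.off_road:
        return -1.0  # negative feedback: the car left the track
    return 0.1 * state.forward_progress  # positive feedback for staying on and moving ahead

def train_by_trial_and_error(env, policy, update_policy, episodes: int = 1000):
    for _ in range(episodes):
        state, done, trajectory = env.reset(), False, []
        while not done:
            action = policy(state)          # try something
            state, done = env.step(action)  # see what happens
            trajectory.append((state, action, reward(state)))
        # Reinforce actions that led to reward; discourage those that didn't.
        policy = update_policy(policy, trajectory)
    return policy
```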

It would have been easy to train a SuperTuxKart model this way—probably so easy it wouldn’t have made an interesting research project. Instead, Ross focused on imitation learning because it’s an essential step in training many practical AI systems, especially in robotics.

But reinforcement learning is also quite useful, and a 2025 paper helps explain why. A team of researchers from Google DeepMind and several universities started with a foundation model and then used one of two techniques—supervised fine-tuning (a form of imitation learning) or reinforcement learning—to teach the model to solve new problems. Here’s a chart summarizing their results:

Chart showing ML results

The dashed line shows how models perform on problems that are “in-distribution”—that is, similar to those in their training data. You can see that for these situations, imitation learning (the red line) usually makes faster progress than reinforcement learning (the blue line).

But the story is different for the solid lines, which represent “out-of-distribution” problems that are less similar to the training data. Models trained with imitation learning got worse with more training. In contrast, models trained with reinforcement learning did almost as well at out-of-distribution tasks as they did with in-distribution tasks.

In short, imitation learning can rapidly teach a model to mimic the behaviors in its training data, but the model will easily get confused in unfamiliar environments. A model trained with reinforcement learning has a better chance of learning general principles that will be relevant in new and unfamiliar situations.

Imitation and reinforcement are complements

While reinforcement learning is powerful, it can also be rather finicky.

Suppose you wanted to train a self-driving car purely with reinforcement learning. You’d need to convert every principle of good driving—including subtle considerations like following distances, taking turns at intersections, and knowing when it’s OK to cross a double yellow line—into explicit mathematical formulas. This would be quite difficult. It’s easier to collect a bunch of examples of humans driving well and effectively tell a model “drive like this.” That’s imitation learning.

But reinforcement learning also plays an important role in training self-driving systems. In a 2022 paper, researchers from Waymo wrote that models trained only with imitation learning tend to work well in “situations that are well represented in the demonstration data.” However, “more unusual or dangerous situations that occur only rarely in the data” might cause a model trained with imitation learning to “respond unpredictably”—for example, crashing into another vehicle.

Waymo found that a combination of imitation and reinforcement learning yielded better self-driving performance than either technique could have produced on its own.

Human beings also learn from a mix of imitation and explicit feedback:

  • In school, teachers demonstrate math problems on the board and invite students to follow along (imitation). Then the teacher asks the students to work on some problems on their own. The teacher gives students feedback by grading their answers (reinforcement).
  • When someone starts a new job, early training may involve shadowing a more experienced worker and observing what they do (imitation). But as the worker gains more experience, learning shifts to explicit feedback such as performance reviews (reinforcement).

Notice that it usually makes sense to do imitation before reinforcement. Imitation is an efficient way to convey knowledge to someone who is brand new to a topic, but reinforcement is often needed to achieve mastery.

The story is the same for large language models. The complexity of natural language means it wouldn’t be feasible to train a language model purely with reinforcement. So LLMs first learn the nuances of human language through imitation.

But pretraining runs out of steam on longer and more complex tasks. Further progress requires a shift to reinforcement: letting models try problems and then giving them feedback based on whether they succeed.

Using LLMs to judge LLMs

Reinforcement learning has been around for decades. For example, AlphaGo, the DeepMind system that famously beat top human Go players in 2016, was based on reinforcement learning. So you might be wondering why frontier labs didn’t use it more extensively before 2024.

Reinforcement learning requires a reward model—a formula to determine whether a model’s output was successful or not. Developing a good reward model is easy to do in some domains—for example, you can judge a Go-playing AI based on whether it wins or loses.

But it’s much more difficult to automatically judge whether an LLM has produced a good poem or legal brief.

Earlier, I described how Stephane Ross let his model play SuperTuxKart and directly provided feedback when it made a mistake. I argued that this approach wouldn’t work for a language model; there are far too many ways for an LLM to make a mistake for a human being to correct them all.

But OpenAI developed a clever technique to effectively automate human feedback. It’s called Reinforcement Learning from Human Feedback (RLHF), and it works like this:

  • Human raters look at pairs of LLM responses and choose the best one.
  • Using these human responses, OpenAI trains a new LLM to predict how much humans will like any given sample of text.
  • OpenAI uses this new text-rating LLM as a reward model to (post) train another LLM with reinforcement learning.

You might think it sounds suspiciously circular to use an LLM to judge the output of another LLM. Why would one LLM be any better at judging the quality of a response than the other? But it turns out that recognizing a good response is often easier than generating one. So RLHF works pretty well in practice.
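
The middle step, training the reward model, is conceptually simple: a network learns to give the human-preferred response a higher score than the rejected one. Here is a minimal sketch using the standard pairwise-preference loss; `score_model` is a hypothetical network that maps a batch of texts to scalar scores.

```python
# Sketch of reward-model training from pairwise preferences (the middle step
# of RLHF). `score_model` is a hypothetical network mapping texts to scalar
# scores; the loss is the standard pairwise-preference formulation.
import torch.nn.functional as F

def preference_loss(score_model, preferred_texts, rejected_texts):
    preferred_scores = score_model(preferred_texts)  # shape: (batch,)
    rejected_scores = score_model(rejected_texts)    # shape: (batch,)
    # Maximize the probability that the human-preferred response scores higher.
    return -F.logsigmoid(preferred_scores - rejected_scores).mean()

# Once trained, the reward model scores fresh samples during RL post-training,
# e.g. reward = score_model([candidate_response]).
```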

Chart showing RLHF details

OpenAI actually invented this technique prior to the 2022 release of ChatGPT. Today, RLHF mainly focuses on improving the model’s “behavior”—for example, giving the model a pleasant personality, encouraging it not to be too talkative or too terse, discouraging it from making offensive statements, and so forth.

In December 2022—two weeks after the release of ChatGPT but before the first release of Claude—Anthropic pushed this LLMs-judging-LLMs philosophy a step further with a reinforcement learning method called Constitutional AI.

First, Anthropic wrote a plain-English description of the principles an LLM should follow. This “constitution” includes principles like “Please choose the response that has the least objectionable, offensive, unlawful, deceptive, inaccurate, or harmful content.”

During training, Anthropic does reinforcement learning by asking a “judge” LLM to decide whether the output of the “student” LLM is consistent with the principles in this constitution. If so, the training algorithm rewards the student, encouraging it to produce more outputs like it. Otherwise, the training algorithm penalizes the student, discouraging it from producing similar outputs.

This method of training an LLM doesn’t rely directly on human judgments at all. Humans only influence the model indirectly by writing the constitution.
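
Here is a schematic of that judge-and-student interaction. The `student` and `judge` functions are hypothetical LLM-call wrappers, `reinforce` stands in for the RL update, and the prompt and reward values are illustrative rather than Anthropic’s actual implementation (the constitution line is quoted from the principle above).

```python
# Schematic of the judge/student loop. `student` and `judge` are hypothetical
# LLM-call wrappers; `reinforce` stands in for the RL update; prompts and
# reward values are illustrative only.

CONSTITUTION = (
    "Please choose the response that has the least objectionable, offensive, "
    "unlawful, deceptive, inaccurate, or harmful content."
)

def constitutional_reward(judge, prompt: str, response: str) -> float:
    verdict = judge(
        f"Constitution: {CONSTITUTION}\n"
        f"Prompt: {prompt}\nResponse: {response}\n"
        "Does the response follow the constitution? Answer YES or NO."
    )
    return 1.0 if verdict.strip().upper().startswith("YES") else -1.0

def training_step(student, judge, reinforce, prompt: str) -> None:
    response = student(prompt)
    reward = constitutional_reward(judge, prompt, response)
    # Rewarded outputs become more likely; penalized outputs become less likely.
    reinforce(student, prompt, response, reward)
```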

Obviously, this technique requires an AI company to already have a fairly sophisticated LLM to act as the judge. So this is a bootstrapping process: As models get more sophisticated, they become better able to supervise the next generation of models.

Last December, Semianalysis published an article describing the training process for an upgraded version of Claude 3.5 Sonnet that Anthropic released in October. Anthropic had previously released Claude 3 in three sizes: Opus (large), Sonnet (medium), and Haiku (small). But when Anthropic released Claude 3.5 in June 2024, it only released a mid-sized model called Sonnet.

So what happened to Opus?

Semianalysis reported that “Anthropic finished training Claude 3.5 Opus, and it performed well. Yet Anthropic didn’t release it. This is because instead of releasing publicly, Anthropic used Claude 3.5 Opus to generate synthetic data and for reward modeling to improve Claude 3.5 Sonnet significantly.”

When Semianalysis says Anthropic used Opus “for reward modeling,” what they mean is that the company used Opus to judge outputs of Claude 3.5 Sonnet as part of a reinforcement learning process. Opus was too large—and therefore expensive—to be a good value for the general public. But through reinforcement learning and other techniques, Anthropic could train a version of Claude Sonnet that was close to Claude Opus in its capabilities—ultimately giving customers near-Opus performance for the price of Sonnet.

The power of chain-of-thought reasoning

A big way reinforcement learning makes models more powerful is by enabling extended chain-of-thought reasoning. LLMs produce better results if they are prompted to “think step by step”: breaking a complex problem down into simple steps and reasoning about them one at a time. In the last couple of years, AI companies started training models to do chain-of-thought reasoning automatically.

Then last September, OpenAI released o1, a model that pushed chain-of-thought reasoning much further than previous models. The o1 model can generate hundreds—or even thousands—of tokens “thinking” about a problem before producing a response. The longer it thinks, the more likely it is to reach a correct answer.

Reinforcement learning was essential for the success of o1 because a model trained purely with imitation learning would have suffered from compounding errors: the more tokens it generated, the more likely it would be to screw up.

At the same time, chain-of-thought reasoning has made reinforcement learning more powerful. Reinforcement learning only works if a model is able to succeed some of the time—otherwise, there’s nothing for the training algorithm to reinforce. As models learn to generate longer chains of thought, they become able to solve more difficult problems, which enables reinforcement learning on those more difficult problems. This can create a virtuous cycle where models get more and more capable as the training process continues.

In January, the Chinese company DeepSeek released a model called R1 that made quite a splash in the West. The company also released a paper describing how it trained R1. And it included a beautiful description of how a model can “teach itself” to reason using reinforcement learning.

DeepSeek trained its models to solve difficult math and programming problems. These problems are ideal for reinforcement learning because they have objectively correct answers that can be automatically checked by software. This allows large-scale training without human oversight or human-generated training data.
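
A sketch of what such an automatic checker might look like is below. The “####” answer marker and the test-case format are assumptions for illustration; real pipelines use their own answer formats and sandboxed code execution.

```python
# Sketch of automatic reward checks for verifiable domains. The "####" answer
# marker and the test-case format are assumptions for illustration; real
# pipelines use their own formats and sandboxed code execution.

def math_reward(model_output: str, correct_answer: str) -> float:
    # Take whatever follows the final answer marker and compare exactly.
    final = model_output.rsplit("####", 1)[-1].strip()
    return 1.0 if final == correct_answer.strip() else 0.0

def code_reward(candidate_fn, test_cases) -> float:
    # Reward is the fraction of unit tests the generated function passes.
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(test_cases)
```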

Here’s a remarkable graph from DeepSeek’s paper.

Graph showing the average response length during training

It shows the average number of tokens the model generated before giving an answer. As you can see, the longer the training process went on, the longer its responses got.

Here is how DeepSeek describes its training process:

The thinking time of [R1] shows consistent improvement throughout the training process. This improvement is not the result of external adjustments but rather an intrinsic development within the model. [R1] naturally acquires the ability to solve increasingly complex reasoning tasks by leveraging extended test-time computation. This computation ranges from generating hundreds to thousands of reasoning tokens, allowing the model to explore and refine its thought processes in greater depth.

One of the most remarkable aspects of this self-evolution is the emergence of sophisticated behaviors as the test-time computation increases. Behaviors such as reflection—where the model revisits and reevaluates its previous steps—and the exploration of alternative approaches to problem-solving arise spontaneously. These behaviors are not explicitly programmed but instead emerge as a result of the model’s interaction with the reinforcement learning environment.

Here’s one example of the kind of technique the model was teaching itself. At one point during the training process, DeepSeek researchers noticed that the model had learned to backtrack and rethink a previous conclusion using language like this:

Image showing textual breakdown of model rethinking steps

Again, DeepSeek says it didn’t program its models to do this or deliberately provide training data demonstrating this style of reasoning. Rather, the model “spontaneously” discovered this style of reasoning partway through the training process.

Of course, it wasn’t entirely spontaneous. The reinforcement learning process started with a model that had been pretrained using data that undoubtedly included examples of people saying things like “Wait, wait. Wait. That’s an aha moment.”

So it’s not like R1 invented this phrase from scratch. But it evidently did spontaneously discover that inserting this phrase into its reasoning process could serve as a useful signal that it should double-check that it was on the right track. That’s remarkable.

In a recent article, Ars Technica’s Benj Edwards explored some of the limitations of reasoning models trained with reinforcement learning. For example, one study “revealed puzzling inconsistencies in how models fail. Claude 3.7 Sonnet could perform up to 100 correct moves in the Tower of Hanoi but failed after just five moves in a river crossing puzzle—despite the latter requiring fewer total moves.”

Conclusion: Reinforcement learning made agents possible

One of the most discussed applications for LLMs in 2023 was creating chatbots that understand a company’s internal documents. The conventional approach to this problem was called RAG—short for retrieval augmented generation.

When the user asks a question, a RAG system performs a keyword- or vector-based search to retrieve the most relevant documents. It then inserts these documents into an LLM’s context window before generating a response. RAG systems can make for compelling demos. But they tend not to work very well in practice because a single search will often fail to surface the most relevant documents.

Today, it’s possible to develop much better information retrieval systems by allowing the model itself to choose search queries. If the first search doesn’t pull up the right documents, the model can revise the query and try again. A model might perform five, 20, or even 100 searches before providing an answer.
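
A rough sketch of the difference: classic RAG does one retrieval and answers, while the agentic version lets the model keep issuing queries until it is satisfied. The `search` and `ask_llm` helpers and the “SEARCH:”/“ANSWER:” protocol below are hypothetical.

```python
# Sketch of single-shot RAG versus the iterative, model-driven search loop.
# `search` and `ask_llm` are hypothetical helpers; the "SEARCH:"/"ANSWER:"
# convention is an invented protocol for illustration.

def classic_rag(question: str, search, ask_llm) -> str:
    docs = search(question)  # one retrieval; hope it surfaced the right documents
    return ask_llm(f"Documents: {docs}\nQuestion: {question}")

def agentic_rag(question: str, search, ask_llm, max_searches: int = 20) -> str:
    notes = []
    for _ in range(max_searches):
        step = ask_llm(
            f"Question: {question}\nFindings so far: {notes}\n"
            "Reply 'SEARCH: <query>' to look something up, "
            "or 'ANSWER: <answer>' once you have enough information."
        )
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        query = step[len("SEARCH:"):].strip()
        notes.append(search(query))  # the model revises its own queries
    return ask_llm(f"Question: {question}\nFindings: {notes}\nGive your best answer.")
```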

But this approach only works if a model is “agentic”—if it can stay on task across multiple rounds of searching and analysis. LLMs were terrible at this prior to 2024, as the examples of AutoGPT and BabyAGI demonstrated. Today’s models are much better at it, which allows modern RAG-style systems to produce better results with less scaffolding. You can think of “deep research” tools from OpenAI and others as very powerful RAG systems made possible by long-context reasoning.

The same point applies to the other agentic applications I mentioned at the start of the article, such as coding and computer use agents. What these systems have in common is a capacity for iterated reasoning. They think, take an action, think about the result, take another action, and so forth.

Timothy B. Lee was on staff at Ars Technica from 2017 to 2021. Today, he writes Understanding AI, a newsletter that explores how AI works and how it’s changing our world. You can subscribe here.

Photo of Timothy B. Lee

Timothy is a senior reporter covering tech policy and the future of transportation. He lives in Washington DC.

How a big shift in training LLMs led to a capability explosion Read More »

the-last-of-us-co-creator-neil-druckmann-exits-hbo-show

The Last of Us co-creator Neil Druckmann exits HBO show

Two key writers of HBO’s series The Last of Us are moving on, according to announcements on Instagram yesterday. Neil Druckmann, co-creator of the franchise, and Halley Gross, co-writer of The Last of Us Part 2 and frequent writer on the show, are both leaving before work begins on season 3.

Both were credited as executive producers on the show; Druckmann frequently contributed writing to episodes, as did Gross, and Druckmann also directed. Druckmann and Gross co-wrote the second game, The Last of Us Part 2.

Druckmann said in his announcement post:

I’ve made the difficult decision to step away from my creative involvement in The Last of Us on HBO. With work completed on season 2 and before any meaningful work starts on season 3, now is the right time for me to transition my complete focus to Naughty Dog and its future projects, including writing and directing our exciting next game, Intergalactic: The Heretic Prophet, along with my responsibilities as Studio Head and Head of Creative.

Co-creating the show has been a career highlight. It’s been an honor to work alongside Craig Mazin to executive produce, direct and write on the last two seasons. I’m deeply thankful for the thoughtful approach and dedication the talented cast and crew took to adapting The Last of Us Part I and the continued adaptation of The Last of Us Part II.

And Gross said:

The Last of Us co-creator Neil Druckmann exits HBO show Read More »

congress-asks-better-questions

Congress Asks Better Questions

Back in May I did a dramatization of a key and highly painful Senate hearing. Now, we are back for a House committee meeting. It was entitled ‘Algorithms and Authoritarians: Why U.S. AI Must Lead’ and indeed a majority of the talk was very much about that, with constant invocations of the glory of democratic AI and the need to win.

The majority of talk was this orchestrated rhetoric that assumes the conclusion that what matters is ‘democracy versus authoritarianism’ and whether we ‘win,’ often (but not always) translating that as market share without any actual mechanistic model of any of it.

However, there were also some very good signs, some excellent questions, signs that there is an awareness setting in. As far as Congressional discussions of real AGI issues go, this was in part one of them. That’s unusual.

(And as always there were a few on random other high horses, that’s how this works.)

Partly because I was working from YouTube rather than a transcript, instead of doing a dramatization I will first be highlighting some other coverage of the events to skip to some of the best quotes, then doing a more general summary and commentary.

Most of you should likely read the first section or two, and then stop. I did find it enlightening to go through the whole thing, but most people don’t need to do that.

Here is the full video of last week’s congressional hearing, here is a write-up by Shakeel Hashim with some quotes.

Also from the hearing, here’s Congressman Nathaniel Moran (R-Texas) asking a good question about strategic surprise arising from automated R&D and getting a real answer. Still way too much obsession with ‘beat China’, but this is at least progress. And here’s Tokuda (D-HI):

Peter Wildeford: Ranking Member Raja Krishnamoorthi (D-IL) opened by literally playing a clip from The Matrix, warning about a “rogue AI army that has broken loose from human control.”

Not Matrix as a loose metaphor, but speaking of a ‘machine uprising’ as a literal thing that could happen and is worth taking seriously by Congress.

The hearing was entitled “Algorithms and Authoritarians: Why U.S. AI Must Lead”. But what was supposed to be a routine House hearing about US-China competition became the most AGI-serious Congressional discussion in history.

Rep. Neal Dunn (R-FL) asked about an Anthropic paper where Claude “attempted to blackmail the chief engineer” in a test scenario and another paper about AI “sleeper agents” that could act normally for months before activating. While Jack Clark, a witness and Head of Policy at Anthropic, attempted to reassure by saying safety testing might mitigate the risks, Dunn’s response was perfect — “I’m not sure I feel a lot better, but thank you for your answer.”

Rep. Nathaniel Moran (R-TX) got to the heart of what makes modern AI different:

Instead of a programmer writing each rule a system will follow, the system itself effectively writes the rules […] AI systems will soon have the capability to conduct their own research and development.

That was a good illustration of both sides of what we saw.

This was also a central case of why Anthropic and Jack Clark are so frustrating.

Anthropic should indeed be emphasizing the need for testing, and Clark does this, but we shouldn’t be ‘attempting to reassure’ anyone based on that. Anthropic knows it is worse than you know, and hides this information thinking this is a good strategic move.

Throughout the hearing, Jack Clark said many very helpful things, and often said them quite well. He also constantly pulled back from the brink and declined various opportunities to inform people of important things, and emphasized lesser concerns and otherwise played it quiet.

Peter Wildeford:

The hearing revealed we face three interlocking challenges:

  1. Commercial competition: The traditional great power race with China for economic and military advantage through AI

  2. Existential safety: The risk that any nation developing superintelligence could lose control — what Beall calls a race of “humanity against time”

  3. Social disruption: Mass technological unemployment as AI makes humans “not just unemployed, but unemployable”

I can accept that framing. The full talk about humans being unemployable comes at the very end. Until then, there is talk several times about jobs and societal disruption, but it tries to live in the Sam Altman style fantasy where not much changes. Finally, at the end, Mark Beall gets an opportunity to actually Say The Thing. He doesn’t miss.

It is a good thing I knew there was better ahead, because oh boy did things start out filled with despair.

As our first speaker, after urging us to ban AI therapist bots because one sort of encouraged a kid to murder his parents ‘so they could be together,’ Representative Krishnamoorthi goes on to show a clip of Chinese robot dogs, then to say we must ban Chinese and Russian AI models so we don’t send them our data (no one tell him about self-hosting) and then plays ‘a clip from The Matrix’ that is not even from The Matrix, claiming that the army of Mr. Smiths is ‘a rogue AI army that has broken loose from human control.’

I could not even. Congress often lives in the ultimate cringe random half-right associative Gell-Mann Amnesia world. But that still can get you to realize some rather obvious true things, and luckily that was indeed the worst of it even from Krishnamoorthi, this kind of thinking can indeed point towards important things.

Mr. Krishnamoorthi: OpenAI’s chief scientist wanted to quote unquote build a bunker before we release AGI, as you can see on the visual here. Rather than building bunkers, however, we should be building safer AI. Whether it’s American AI or Chinese AI, it should not be released until we know it’s safe. That’s why I’m working on a new bill, the AGI Safety Act, that will require AGI to be aligned with human values and require it to comply with laws that apply to humans. That is just common sense.

I mean yes that is common sense. Yes, rhetoric from minutes prior (and after) aside, we should be building ‘safer AGI’ and if we can’t do that we shouldn’t be building AGI at all.

It’s a real shame that no one has any idea how to ensure that AGI is aligned with human values, or how to get it to comply with laws that apply to humans. Maybe we should get to work on that.

And then we get another excellent point.

Mr. Krishnamoorthi: I’d like to conclude with something else that’s common sense: not shooting ourselves in the foot. 70% of America’s AI researchers are foreign born or foreign educated. Jack Clark, our eminent witness today, is an immigrant. We cannot be deporting the people we depend on to build AI. We also can’t be defunding the agencies that make AI miracles like Ann’s ability to speak again a reality. Federal grants from agencies like NSF are what allow scientists across America to make miracles happen. AI is the defining technology of our lifetimes to do AI right and prevent nightmares we need.

Yes, at a bare minimum not deporting our existing AI researchers and cutting off existing related research programs does seem like the least you could do? I’d also like to welcome a lot more talent, but somehow this is where we are.

We then get Dr. Mahnken’s opening statement, which emphasizes that we are in a battle for technical dominance, America is good and free and our AI will be empowering and innovative whereas China is bad and low trust and a fast follower. He also emphasizes the need for diffusion in key areas.

Of course, if you are facing a fast follower, you should think about what does and doesn’t help them follow, and also you can’t panic every time they fast follow you and respond with ‘go faster or they’ll take the lead!’ as they then fast follow your new faster pace. Nor would you want to hand out your top technology for free.

Next up is Mr. Beall. He frames the situation as two races. I like this a lot. First, we have the traditional battle for economic, military and geopolitical advantage in mundane terms played with new pieces.

Many only see this game, or pretend only this game exists. This is a classic, well-understood type of game. You absolutely want to fight for profits and military strength and economic growth and so on in mundane terms. We all agree on things like the need to greatly expand American energy production (although the BBB does not seem to share this opinion) and speed adaptation in government.

I still think that even under this framework the obsession with ‘market share’ especially of chip sales (instead of chip ownership and utilization) makes absolutely no sense and would make no sense even if that question was in play, as does the obsession with the number of tokens models serve as opposed to looking at productivity, revenue and profits. There’s so much rhetoric behind metrics that don’t matter.

The second race is the race to artificial superintelligence (ASI) or to AGI. This is the race that counts, and even if we get there first (and even more likely if China gets there first) the default result is that everyone loses.

He asks for the ‘three Ps,’ protect our capabilities, promote American technology abroad and prepare by getting it into the hands of those that need it and gathering the necessary information. He buys into this new centrality of the ‘American AI tech stack’ line that’s going around, despite the emphasis on superintelligence, but he does warn that AGI may come soon and we need to urgently gather information about that so we can make informed choices, and even suggests narrow dialogue with China on potential mitigations of certain risks and verification measures, while continuing to compete with China otherwise.

Third up we have Jack Clark of Anthropic, he opens like this.

Jack Clark: America can win the race to build powerful AI and winning the race is a necessary but not sufficient achievement. We have to get safety right.

When I discuss powerful AI, I’m talking about AI systems that represent a major advancement beyond today’s capabilities. A useful conceptual framework is to think of this as like a country of geniuses in a data center, and I believe that technology could be buildable by late 2026 or early 2027.

America is well positioned to build this technology but we need to deal with its risks.

He then goes on to talk about how American AI will be democratic and Chinese AI will be authoritarian and America must prevail, as we are now required to say by law, Shibboleth. He talks about misuse risk and CBRN risks and notes DeepSeek poses these as well, and then mentions the blackmail findings, and calls for tighter export controls and stronger federal ability to test AI models, and broader deployment within government.

I get what Clark is trying to do here, and the dilemma he is facing. I appreciate talking about safety up front, and warning about the future pace of progress, but I still feel like he is holding back key information that needs to be shared if you want people to understand the real situation.

Instead, we still have 100 minutes that touch on this in places but mostly are about mundane economic or national security questions, plus some model misbehavior.

Now we return to Representative Krishnamoorthi, true master of screen time, who shows Claude refusing to write a blog post promoting eating disorders, then DeepSeek being happy to help straight up and gets Clark to agree that DeepSeek does not do safety interventions beyond CCP protocols and that this is unacceptable, then reiterates his bill to not let the government use DeepSeek, citing that they store data on Chinese servers. I mean yes obviously don’t use their hosted version for government purposes, but does he not know how open source works, I wonder?

He pivots to chip smuggling and the risk of DeepSeek using our chips. Clark is happy to once again violently agree. I wonder if this is a waste or good use of time, since none of it is new, but yes obviously what matters is who is using the chip, not who made it, and selling our chips to China (at least at current market prices) is foolish, Krishnamoorthi points out Nvidia’s sales are growing like gangbusters despite export controls and Clark points out that every AI company keeps using more compute than expected.

Then there’s a cool question, essentially asking about truesight and ability to infer missing information when given context, before finishing by asking about recent misalignment results:

Representative Krishnamoorthi: If someone enters their diary into Claude for a year and then asks Claude to guess what they did not write down, Claude is able to accurately predict what they left out, isn’t that right?

Jack Clark: Sometimes that’s accurate, yes. These systems are increasingly advanced and are able to make subtle predictions like this, which is why we need to ensure that our own US intelligence services use this technology and know how to get the most out of it.

Representative Moolenaar then starts with a focus on chip smuggling and diffusion, getting Beall to affirm smuggling is a big deal then asking Clark about how this is potentially preventing American technological infrastructure diffusion elsewhere. There is an obvious direct conflict, you need to ensure the compute is not diverted or misused at scale. Comparisons are made to nuclear materials.

Then he asks Clark, as an immigrant, about how to welcome immigrants especially from authoritarian states to help our AI work, and what safeguards we would need. Great question. Clark suggests starting with university-level STEM immigration, the earlier the better. I agree, but it would be good to have a more complete answer here about containing information risks. It is a real issue.

Representative Carson is up next and asks about information warfare. Clark affirms AI can do this and says we need tools to fight against it.

Representative Lahood asks about the moratorium that was recently removed from the BBB, warning about the ‘patchwork of states.’ Clark says we need a federal framework, but that without one powerful AI is coming soon and you’d just be creating a vacuum, which would be flooded if something went wrong. Later Clark, in response to another question, emphasizes that the timeline is short and we need to be open to options.

Representative Dunn asks about the blackmail findings and asks if he should be worried about AIs using his bank information against him. Clark says no, because we publish the research and we should encourage more of this and also closely study Chinese models, and I agree with that call but it doesn’t actually explain why you shouldn’t worry (for now, anyway). Dunn then asks about the finding that you can put a sleeper agent into an AI, Clark says testing for such things likely would take them a month.

Dunn then asks Mahnken what would be the major strategic missteps Congress might make in an AGI world. He splits his answer into insufficient export controls and overregulation; it seems he thinks there is nothing else to worry about when it comes to AGI.

Here’s one that isn’t being noticed enough:

Mr. Moulton (56:50): The concern is China, and so we have to somehow get to an international framework, a Geneva Conventions-like agreement, that has a chance at least at limiting what our adversaries might do with AI at the extremes.

He then asks Beall what should be included in that. Beall starts off with strategic missile-related systems and directive 3000.09 on lethal autonomous systems. Then he moves to superintelligence, but time runs out before he can explain what he wants.

Representative Johnson notes the members are scared and that ‘losing this race’ could ‘trigger a global crisis,’ and asks about dangers of data centers outside America, which Beall notes of course are that we won’t ultimately own the chips or AI, so we should redouble our efforts to build domestically even if we have to accept some overseas buildout for energy reasons.

Johnson asks about the tradeoff between safety and speed, seeing them in conflict. Jack points out that, at current margins, they’re not.

Jack Clark: We all buy cars because we know that if they get dinged, we’re not going to suffer in them, because they have airbags and they have seat belts. You’ve grown the size of the car market by innovating on safety technology, and American firms compete on safety technology to sell to consumers.

The same will be true of AI. So far, we do not see there being a trade-off here. We see that making more reliable, trustworthy technology ultimately helps you grow the size of the market and grows the attractiveness of American platforms vis-à-vis China. So I would constructively sort of push back on this and put it to you that there’s an amazing opportunity here to use safety as a way to grow the existing American dominance in the market.

Those who set up the ‘slow down’ and safety versus speed framework must of course take the L on how that (in hindsight inevitably) went down. Certainly there are still sometimes tradeoffs here on some margins, on some questions, especially when you are the ‘fun police’ towards your users, or you delay releases for verification. Later down the road, there will be far more real tradeoffs that occur at various points.

But also, yes, for now the tradeoffs are a lot like those in cars, in that improving the safety and security of the models helps them be a lot more useful, something you can trust and that businesses especially will want to use. At this point, Anthropic’s security focus is a strategic advantage.

Johnson wants to believe Clark, but is skeptical and asks Mahnken, who says too much emphasis on safety could indeed slow us down (which, as phrased, is obviously true), that he’s worried we won’t go fast enough and that there’s no parallel conversation in the PRC.

Representative Torres asks Clark how close China is to matching ASML and TSMC. Clark says they are multiple years behind. Torres then goes full poisoned banana race:

Torres: The first country to reach ASI will likely emerge as the superpower of the 21st century, the superpower who will set the rules for the rest of the world. Mr. Clark, what do you make of the Manhattan Project framing?

Clark says yes in terms of doing it here but no because it’s from private actors and they agree we desperately need more energy.

Hinson says Chinese labs aren’t doing healthy competition, they’re stealing our tech, then praises the relaxation of the Biden diffusion rules that prevent China from stealing our tech, and asks about what requirements we should attach to diffusion deals, and everyone talks arms race and market share. Sigh.

In case you were wondering where that was coming from, well, here we go:

Hinson: Members of your key team at Anthropic have held very influential roles in this space, both at Open Philanthropy and in the previous administration, the Biden administration, as well.

Can you speak to how you manage, you know, obviously we’ve got a lot of viewpoints, but how you manage potential areas of conflict of interest in advancing this tech and ensuring that everybody’s really on that same page with helping to shape this national AI policy that we’re talking about, the competition on the global stage for this technology.

You see, if you’re trying to not die that’s a conflict of interest and your role must have been super important, never mind all that lobbying by major tech corporations. Whereas if you want American policy to focus on your own market share, that’s good old fashioned patriotism, that must be it.

Jack Clark: Thank you for the question. We have a simple goal: win the race and make technology that can be relied on. All of the work that we do at our company starts from looking at that and then just trying to work out the best way to get there, and we work with people from a variety of backgrounds and skills, and our goal is to just have the best, most substantive answer that we can bring to hearings.

No, ma’am, we too are only trying to win the race and maximize corporate profits and keep our fellow patriots informed, it is fine. Anthropic doesn’t care about everyone not dying or anything, that would be terrible. Again, I get the strategic bind here, but I continue to find this deeply disappointing, and I don’t think it is a good play.

She then asks Beall about DeepSeek’s ability to quickly copy our tech and potential future espionage threats, and Beall reminds her that export controls work with a lag and notes DeepSeek was a wakeup call (although one that I once again note was blown out of proportion for various reasons, but we’re stuck with it). Beall recommends the Remote Access Security Act and then he says we have to ‘grapple with the open source issue.’ Which is that if you open the model they can copy it. Well, there is that.

Representative Brown pulls out They Took Our Jobs and ensuring people (like those in her district, Ohio’s 11th) don’t get left behind by automation and benefit instead, calling for investing in the American workforce, so Clark goes into those speeches and encouraging diffusion and adjusting regulation and acts as if Dario hadn’t predicted the automation of half of white-collar entry level jobs within five years.

Representative Nunn notes (along with various other race-related things) the commissioning of four top AI executives as lieutenant colonels, which I and Patrick McKenzie both noticed but has gotten little attention. He then brings up a Chinese startup called Zhipu (currently valued around $20 billion) as some sort of global threat.

Nunn: A new AI group out of Beijing called Zhipu is an AI anomaly that is now facing off against the likes of OpenAI, and their entire intent is to lock in Chinese systems and standards into emerging markets before the West. So this is clearly a large-scale attempt by the Chinese to box the United States out. Now, as a counterintelligence officer who was on the front line in fighting against Huawei’s takeover of the United States through something called Huawei America.

That is indeed how a number of Congress people talk these days, including this sudden paranoia with some mysterious ‘lock in’ mechanism for API calls or self-hosted open models that no one has ever been able to explain to me. He does then ask an actual good question:

Nunn: Is the US currently prepared for an AI-accelerated cyber attack, a zero-day attack, or a larger threat that faces us today?

Mahnken does some China bad, US good and worries the Chinese will be deluded into thinking AI will let them do things they can’t do and they might start a war? Which is such a bizarre thing to worry about and also not an answer? Are we prepared? I assume mostly no.

Nunn then pushes his HR 2152 for government AI diffusion.

He asks Clark how government and business can cooperate. Clark points to the deployment side and the development of safety standards as a way to establish trust and sell globally.

Representative Tokuda starts out complaining about us gutting our institutions, Clark of course endorses investing more in NIST and other such institutions. Tokuda asks about industry responsibility, including for investment in related infrastructure, Clark basically says he works on that and for broader impact questions get back to him in 3-4 years to talk more.

Then she gives us the remarkable quote above about superintelligence (at 1:30:20); the full quote is even stronger, but she doesn’t leave enough time for an answer.

I am very grateful for the statement, even with no time left to respond. There is something so weird about asking two other questions first, then getting to ASI.

Representative Moran asks Clark, what’s the most important thing to win this race? Clark chooses power, followed by compute and then government infrastructure, and suggests working backwards from the goal of 50 GW in 2027. Mahnken is asked next and suggests trying to slow down the Chinese.

Moran notices that AI is not like older programming, that it effectively will write its own rules and programming and will soon do its own research and asks what’s up with that. Clark says more research is urgently needed, and points out you wouldn’t want an AI that can blackmail you designing its successor. I’m torn on whether that cuts to the heart of the question in a useful way or not here.

Moran then asks, what is the ‘red line’ on AI the Chinese cannot be allowed cross? Beall confirms AI systems are grown, not built, that it is alchemy, and that the automated R&D is the red line and a really big deal, we need to be up to speed on that.

Representative Conner notes NIST’s safety testing is voluntary and asks if there should be some minimum third party verification required, if only to verify the company’s own standards. All right, Clark, he served it up for you, here’s the ball, what have you got?

Clark: This question illustrates the challenge we have about weighing safety versus, you know, moving ahead as quickly as possible. We need to first figure out what we want to hold to that standard of testing.

Today the voluntary agreements rest on CBRN testing and some forms of cyber attack testing. Once we have standards that we’re confident of, I think you can take a look at the question of whether voluntary is sufficient or you need something else.

But my sense is it’s too early, and we first need to design those tests and really agree on those before figuring out what the next step would be. [Asked who would design those tests, the AI institute or the private sector:] Today these tests are done highly collaboratively between the US private sector, which you mentioned, and parts of the US government, including those in the intelligence and defense community. I think bringing those people together, so that we have the nation’s best experts on this and standards and tests that we all agree on, is the first step that we can take to get us to everything else. [Asked by when that needs to be done:] It would be ideal to have this within a year. The timelines that I’ve spoken about in this hearing are that powerful AI arrives at the end of 2026 or early 2027. Before then we would ideally have standard tests for the national security properties that we deeply care about.

I’m sorry, I think the word you were looking for was ‘yes’? What the hell? This is super frustrating. I mean as worded how is this even a question? You don’t need to know the final exact testing requirements before you start to move towards such a regime. There are so many different ways this answer is a missed opportunity.

The last question goes back to They Took Our Jobs, and Clark basically can only say we can gather data, and there are areas that won’t be impacted soon by AI, again pretending his CEO Dario Amodei hadn’t warned of a jobs ‘bloodbath.’ Beall steps up and says the actual damn thing (within the jobs context), which is that we face a potential future where humans are not only unemployed but unemployable, and we have to have those conversations in advance.

And we end on this not so reassuring note:

Mark Beall: When I hear folks in industry claim things about universal basic income and this sort of digital utopia, I... you know, I study history. I worry that that sort of leads to one place, and that place is the gulag.

That is quite the bold warning, and an excellent place to end the hearing. It is not the way I would have put it, but yes, the idea of most or all of humanity being entirely disempowered and unproductive except for our little status games, existing off of gifted resources, property rights, rule of law, and some form of goodwill, and hoping all of this holds up, does not seem like a plan that is likely to end well. At least, not for those humans. No, having ‘solved the alignment problem’ does not on its own get you out of this in any way; solving the alignment problem is the price to try at all.

And that is indeed one kind of thing we need to think about now.

Is this where I wanted the conversation to be in 2025? Oh, hell no.

It’s a start.

Congress Asks Better Questions Read More »

what’s-wrong-with-aaa-games?-the-development-of-the-next-battlefield-has-answers.

What’s wrong with AAA games? The development of the next Battlefield has answers.


EA insiders describe stress and setbacks in a project that’s too big to fail.

After the lukewarm reception of Battlefield 2042, EA is doubling down.

It’s been 23 years since the first Battlefield game, and the video game industry is nearly unrecognizable to anyone who was immersed in it then. Many people who loved the games of that era have since become frustrated with where AAA (big budget) games have ended up.

Today, publisher EA is in full production on the next Battlefield title—but sources close to the project say it has faced culture clashes, ballooning budgets, and major disruptions that have left many team members fearful that parts of the game will not be finished to players’ satisfaction in time for launch during EA’s fiscal year.

They also say the company has made major structural and cultural changes to how Battlefield games are created to ensure it can release titles of unprecedented scope and scale. This is all to compete with incumbents like the Call of Duty games and Fortnite, even though no prior Battlefield has achieved anywhere close to that level of popular and commercial success.

I spoke with current and former EA employees who work or have recently worked directly on the game—they span multiple studios, disciplines, and seniority levels and all agreed to talk about the project on the condition of anonymity. Asked to address the reporting in this article, EA declined to comment.

According to these first-hand accounts, the changes have led to extraordinary stress and long hours. Every employee I spoke to across several studios either took exhaustion leave themselves or directly knew staffers who did. Two people who had worked on other AAA projects within EA or elsewhere in the industry said this project had more people burning out and needing to take leave than they’d ever seen before.

Each of the sources I spoke with shared sincere hopes that the game will still be a hit with players, pointing to its strong conceptual start and the talent, passion, and pedigree of its development team. Whatever the end result, the inside story of the game’s development illuminates why the medium and the industry are in the state they’re in today.

The road to Glacier

To understand exactly what’s going on with the next Battlefield title—codenamed Glacier—we need to rewind a bit.

In the early 2010s, Battlefield 3 and Battlefield 4 expanded the franchise audience to more directly compete with Call of Duty, the heavy hitter at the time. Developed primarily by EA-owned, Sweden-based studio DICE, the Battlefield games mixed the franchise’s promise of combined arms warfare and high player counts with Call of Duty’s faster pace and greater platform accessibility.

This was a golden age for Battlefield. However, 2018’s Battlefield V launched to a mixed reception, and EA began losing players’ attention in an expanding industry.

Battlefield 3, pictured here, kicked off the franchise’s golden age. Credit: EA

Instead, the hot new online shooters were Overwatch (2016), Fortnite (2017), and a resurgent Call of Duty. Fortnite was driven by a popular new gameplay mode called Battle Royale, and while EA attempted a Battle Royale mode in Battlefield V, it didn’t achieve the desired level of popularity.

After V, DICE worked on a Battlefield title that was positioned as a throwback to the glory days of 3 and 4. That game would be called Battlefield 2042 (after the future year in which it was set), and it would launch in 2021.

The launch of Battlefield 2042 is where Glacier’s development story begins. Simply put, the game was not fun enough, and Battlefield 2042 launched as a dud.

Don’t repeat past mistakes

Players were disappointed—but so were those who worked on 2042. Sources tell me that prior to launch, Battlefield 2042 “massively missed” its alpha target—a milestone by which most or all of the foundational features of the game are meant to be in place. Because of this, the game’s final release would need to be delayed in order to deliver on the developers’ intent (and on players’ expectations).

“Realistically, they have to delay the game by at least six months to complete it. Now, they eventually only delayed it by, I think, four or five weeks, which from a development point of view means very little,” said one person who worked closely with the project at the time.

Developers at DICE had hoped for more time. Morale fell, but the team marched ahead to the game’s lukewarm launch.

Ultimately, EA made back some ground with what the company calls “live operations”—additional content and updates in the months following launch—but the game never fulfilled its ambitions.

Plans were already underway for the next Battlefield game, so a postmortem was performed on 2042. It concluded that the problems had been in execution, not vision. New processes were put into place so that issues could be identified earlier and milestones like the alpha wouldn’t be missed.

To help achieve this, EA hired three industry luminaries to lead Glacier, all of them based in the United States.

The franchise leadership dream team

2021 saw EA bring on Byron Beede as general manager for Battlefield; he had previously been general manager for both Call of Duty (including the Warzone Battle Royale) and the influential shooter Destiny. EA also hired Marcus Lehto—co-creator of Halo—as creative chief of a newly formed Seattle studio called Ridgeline Games, which would lead the development of Glacier’s single-player campaign.

Finally, there was Vince Zampella, one of the leaders of the team that initially created Call of Duty in 2003. He joined EA in 2010 to work on other franchises, but in 2021, EA announced that Zampella would oversee Battlefield moving forward.

In the wake of these changes, some prominent members of DICE departed, including General Manager Oskar Gabrielson and Creative Director Lars Gustavsson, who had been known by the nickname “Mr. Battlefield.” With this changing of the guard, EA was ready to place a bigger bet than ever on the next Battlefield title.

100 million players

While 2042 struggled, competitors Call of Duty and Fortnite were posting astonishing player and revenue numbers, thanks in large part to the popularity of their Battle Royale modes.

EA’s executive leadership believed Battlefield had the potential to stand toe to toe with them, if the right calls were made and enough was invested.

A lofty player target was set for Glacier: 100 million players over a set period of time that included post-launch.

Fortnite‘s huge success has publishers like EA chasing the same dollars. Credit: Epic Games

“Obviously, Battlefield has never achieved those numbers before,” one EA employee told me. “It’s important to understand that over about that same period, 2042 has only gotten 22 million,” another said. Even 2016’s Battlefield 1—the most successful game in the franchise by numbers—had achieved “maybe 30 million plus.”

Of course, most previous Battlefield titles had been premium releases, with an up-front purchase cost and no free-to-play mode, whereas successful competitors like Fortnite and Call of Duty made their Battle Royale modes freely available, monetizing users with in-game purchases and season passes that unlocked post-launch content.

It was thought that if Glacier did the same, it could achieve comparable numbers, so a free-to-play Battle Royale mode was made a core offering for the title, alongside a six-hour single-player campaign, traditional Battlefield multiplayer modes like Conquest and Rush, a new F2P mode called Gauntlet, and a community content mode called Portal.

The most expensive Battlefield ever

All this meant that Glacier would have a broader scope than its predecessors. Developers say it has the largest budget of any Battlefield title to date.

The project targeted a budget of more than $400 million back in early 2023, which was already more than was originally planned at the start.

However, major setbacks significantly disrupted production in 2023 (more on that in a moment) and hundreds of additional developers were brought onto Glacier from various EA-owned studios to get things back on track, significantly increasing the cost. Multiple team members with knowledge of the project’s finances told me that the current projections are now well north of that $400 million amount.

Skepticism in the ranks

Despite the big ambitions of the new leadership team and EA executives, “very few people” working in the studios believed the 100 million target was achievable, two sources told me. Many of those who had worked on Battlefield for a long time at DICE in Stockholm were particularly skeptical.

“Among the things that we are predicting is that we won’t have to cannibalize anyone else’s sales,” one developer said. “That there’s just such an appetite out there for shooters of this kind that we will just naturally be able to get the audience that we need.”

Regarding the lofty player and revenue targets, one source said that “nothing in the market research or our quality deliverables indicates that we would be anywhere near that.”

“I think people are surprised that they actually worked on a next Battlefield game and then increased the ambitions to what they are right now,” said another.

In 2023, a significant disruption to the project put one game mode in jeopardy, foreshadowing a more troubled development than anyone initially imagined.

Ridgeline implodes

Battlefield games have a reputation for middling single-player campaigns, and Battlefield 2042 didn’t include one at all. But part of this big bet on Glacier was the idea of offering the complete package, so Ridgeline Games scaled up while working on a campaign EA hoped would keep Battlefield competitive with Call of Duty, which usually has included a single-player campaign in its releases.

The studio worked on the campaign for about two years while it was also scaling and hiring talent to catch up to established studios within the Battlefield family.

It didn’t work out. In February of 2024, Ridgeline was shuttered, Halo luminary Marcus Lehto left the company, and the rest of the studios were left to pick up the pieces. When a certain review came up not long before the studio was shuttered, Glacier’s top leadership were dissatisfied with the progress they were seeing, and the call was made.

Sources in EA teams outside Ridgeline told me that there weren’t proper check-ins and internal reviews on the progress, obscuring the true state of the project until the fateful review.

On the other hand, those closer to Ridgeline described a situation in which the team couldn’t possibly complete its objectives, as it was expected to hire and scale up from zero while also meeting the same milestones as established studios with resources already in place. “They kept reallocating funds—essentially staff months—out of our budget,” one person told me. “And, you know, we’re sitting there trying to adapt to doing more with less.”

A marketing image from EA showing now-defunct Ridgeline Games on the list of groups involved. Credit: EA

After the shuttering of Ridgeline, ownership of single-player shifted to three other EA studios: Criterion, DICE, and Motive. But those teams had a difficult road ahead, as “there was essentially nothing left that Ridgeline had spent two years working on that they could pick up on and build, so they had to redo essentially everything from scratch within the same constraints of when the game had to release.”

Single-player was two years behind. As of late spring, it was the only game mode that had failed to reach alpha, well over a year after the initial overall alpha target for the project.

Multiple sources said its implosion was symptomatic of some broader cultural and process problems that affected the rest of the project, too.

Culture shock

Speaking with people who have worked or currently work at DICE in Sweden, the tension between some at that studio and the new, US-based leadership team was obvious—and to a degree, that’s expected.

DICE had “the pride of having started Battlefield and owned that IP,” but now the studio was just “supporting it for American leadership,” said one person who worked there. Further, “there’s a lot of distrust and disbelief… when it comes to just operating toward numbers that very few people believe in apart from the leadership.”

But the tensions appear to go deeper than that. Two other major factors were at play: scaling pains as the scope of the project expanded and differences in cultural values between US leadership and the workers in Europe.

“DICE being originally a Swedish studio, they are a bit more humble. They want to build the best game, and they want to achieve the greatest in terms of the game experience,” one developer told me. “Of course, when you’re operated by EA, you have to set financial expectations in order to be as profitable as possible.”

That tension wasn’t new. But before 2042 failed to meet expectations, DICE Stockholm employees say they were given more leeway to set the vision for the game, as well as greater influence on timeline and targets.

Some EU-based team members were vocally dismayed at how top-down directives from far-flung offices, along with the US company’s emphasis on quarterly profits, have affected Glacier’s development far more than with previous Battlefield titles.

This came up less in talking to US-based staff, but everyone I spoke with on both continents agreed on one thing: Growing pains accompanied the transition from a production environment where one studio leads and others offer support to a new setup with four primary studios—plus outside support from all over EA—and all of it helmed by LA-based leadership.

EA is not alone in adopting this approach; it’s also used by competitor Activision-Blizzard on the Call of Duty franchise (though it’s worth noting that a big hit like Epic Games’ Fortnite has a very different structure).

Whereas publishers like EA and Activision-Blizzard used to house several studios, each of which worked on its own AAA game, they now increasingly make bigger bets on singular games-as-a-service offerings, with several of their studios working in tandem on a single project.

“Development of games has changed so much in the last 10 to 15 years,” said one developer. The new arrangement excites investors and shareholders, who can imagine returns from the next big unicorn release, but it can be a less creatively fulfilling way to work, as directives come from the top down, and much time is spent on dealing with inter-studio process. Further, it amplifies the effects of failures, with a higher human cost to people working on projects that don’t meet expectations.

It has also made the problems that affected Battlefield 2042‘s development more difficult to avoid.

Clearing the gates

EA studios use a system of “gates” to set the pace of development. Projects have to meet certain criteria to pass each gate.

For gate one, teams must have a clear sense of what they want to make and some proof of concept showing that this vision is achievable.

As they approach gate two, they’re building out and testing key technology, asking themselves if it can work at scale.

Gate three signifies full production. Glacier was expected to pass gate three in early 2023, but it was significantly delayed. When it did pass, some on the ground questioned whether it should have.

“I did not see robust budget, staff plan, feature list, risk planning, et cetera, as we left gate three,” said one person. In the way EA usually works, these things would all be expected at this stage.

As the project approached gate three and then alpha, several people within the organization tried to communicate that the game wasn’t on footing as firm as the top-level planning suggested. One person attributed this to the lack of a single source of truth within the organization. While developers tracked issues and progress in one tool, others (including project leadership) leaned on other sources of information that weren’t as tied to on-the-ground reality when making decisions.

A former employee with direct knowledge of production plans told me that as gate three approached, prototypes of some important game features were not ready, but since there wasn’t time to complete proofs of concept, the decision was handed down to move ahead to production even though the normal prerequisites were not met.

“If you don’t have those things fleshed out when you’re leaving pre-pro[duction], you’re just going to be playing catch-up the entire time you’re in production,” this source said.

In some cases, employees who flagged the problems believed they were being punished. Two EA employees each told me they found themselves cut out of meetings once they raised concerns like this.

Gate three was ultimately declared clear, and as of late May 2025, alpha was achieved for everything except the single-player campaign. But I’m told that this occurred with some tasks still un-estimated and many discrepancies remaining, leaving the door open to problems and compromises down the road.

The consequences for players

Because of these issues, the majority of the people I spoke with said they expect planned features or content to be cut before the game actually launches—which is normal, to a degree. But these common game development problems can contribute to other aspects of modern AAA gaming that many consumers find frustrating.

First off, making major decisions so late in the process can lead to huge day-one patches. Players of all types of AAA games often take to Reddit and social media to malign day-one patches as a frustrating annoyance for modern titles.

Battlefield 2042 had a sizable day-one patch. When multiplayer RPG Anthem (another big investment by EA) launched to negative reviews, that was partly because critics and others with pre-launch access were playing a build that was weeks old; a day-one patch significantly improved some aspects of the game, but that came after the negative press began to pour out.

Anthem, another EA project with a difficult development, launched with a substantial day-one patch. Credit: EA

Glacier’s late arrival to alpha and the teams’ problems with estimating the status of features could lead to a similarly significant day-one patch. That’s in part because EA has to deliver the work to external partners far in advance of the actual launch date.

“They have these external deadlines to do with the submissions into what EA calls ‘first-party’—that’s your PlayStation and Xbox submissions,” one person explained. “They have to at least have builds ready that they can submit.”

What ends up on the disc or what pre-loads from online marketplaces must be finalized long before the game’s actual release date. When a project is far behind or prone to surprises in the final stretch, those last few weeks are where a lot of vital work happens, so big launch patches become a necessity.

These struggles over content often lead to another pet peeve of players: planned launch content being held until later. “There’s a bit of project management within the Battlefield project that they can modify,” a former senior EA employee who worked on the project explained. “They might push it into Season 1 or Season 2.”

That way, players ultimately get the intended feature or content, but in some cases, they may end up paying more for it, as it ends up being part of a post-launch package like a battle pass.

These challenges are a natural extension of the fiscal-quarter-oriented planning that large publishers like EA adhere to. “The final timelines don’t change. The final numbers don’t change,” said one source. “So there is an enormous amount of pressure.”

A campaign conundrum

Single-player is also a problem. “Single-player in itself is massively late—it’s the latest part of the game,” I was told. “Without an enormous patch on day one or early access to the game, it’s unrealistic that they’re going to be able to release it to what they needed it to do.”

If the single-player mode is a linear, narrative campaign as originally planned, it may not be possible to delay missions or other content from the campaign to post-launch seasons.

“Single-player is secondary to multiplayer, so they will shift the priority to make sure that single-player meets some minimal expectations, however you want to measure that. But the multiplayer is the main focus,” an EA employee said.

“They might have to cut a part of the single-player out in order for the game to release with a single-player [campaign] on it,” they continued. “Or they would have to severely work through the summer and into the later part of this year and try to fix that.”

That—and the potential for a disappointing product—is a cost for players, but there are costs for the developers who work on the game, too.

Because timelines must be kept, and not everything can be cut or moved post-launch, it falls on employees to make up the gap. As we’ve seen in countless similar reports about AAA video game development before, that sometimes means longer hours and heavier stress.

AAA’s burnout problem

More than two decades ago, the spouse of an EA employee famously wrote an open letter to bring attention to the long hours and high stress developers there were facing.

Since then, some things have improved. People at all levels within EA are more conscious of the problems that were highlighted, and there have been efforts to mitigate some of them, like more comp time and mental health resources. However, many of those old problems linger in some form.

I heard several first-hand accounts of people working on Glacier who had to take stress, mental health, or exhaustion leave, ranging from a couple of weeks to several months.

“There’s like—I would hesitate to count—but a large number compared to other projects I’ve been on who have taken mental exhaustion leave here. Some as short as two weeks to a month, some as long as eight months and nine,” one staffer told me after saying they had taken some time themselves.

This was partly because of long hours that were required when working directly with studios in both the US and Europe—a symptom of the new, multi-studio structure.

“My day could start as early as 5:00 [am],” one person said. The first half of the day involved meetings with a studio in one part of the world while the second included meetings with a studio in another region. “Then my evenings would be spent doing my work because I’d be tied up juggling things all across the board and across time zones.”

This sort of workload was not limited to a brief, planned period of focused work, the employees said. Long hours were particularly an issue for those working in or closely with Ridgeline, the studio initially tasked with making the game’s single-player campaign.

From the beginning, members of the Ridgeline team felt they were expected to deliver work at a similar level to that of established studios like DICE or Ripple Effect before they were even fully staffed.

“They’ve done it before,” one person who was involved with Ridgeline said of DICE. “They’re a well-oiled machine.” But Ridgeline was “starting from zero” and was “expected to produce the same stuff.”

Within just six months of the starting line, some developers at Ridgeline said they were already feeling burnt out.

In the wake of the EA Spouses event, EA developed resources for employees. But in at least some cases, they weren’t much help.

“I sought some, I guess, mental help inside of EA. From HR or within that organization of some sort, just to be able to express it—the difficulties that I experienced personally or from coworkers on the development team that had experienced this, you know, that had lived through that,” said another employee. “And the nature of that is there’s nobody to listen. They pretend to listen, but nobody ultimately listens. Very few changes are made on the back of it.”

This person went on to say that “many people” had sought similar help and felt the same way, as far back as the post-launch period for 2042 and as recently as a few months ago.

Finding solutions

There have been a lot of stories like this about the games industry over the years, and it can feel relentlessly grim to keep reading them—especially when they’re coming alongside frequent news of layoffs, including at EA. Problems are exposed, but solutions don’t get as much attention.

In that spirit, let’s wrap up by listening to what some in the industry have said about what doing things better could look like—with the admitted caveat that these proposals are still not always common practice in AAA development.

“Build more slowly”

When Swen Vincke—studio head for Larian Studios and game director for the runaway success Baldur’s Gate 3—accepted an award at the Game Developers Conference, he took his moment on stage to express frustration at publishers like EA.

“I’ve been fighting publishers my entire life, and I keep on seeing the same, same, same mistakes over and over and over,” he said. “It’s always the quarterly profits. The only thing that matters are the numbers.”

After the awards show, he took to X to clarify his statements, saying, “This message was for those who try to double their revenue year after year. You don’t have to do that. Build more slowly and make your aim improving the state of the art, not squeezing out the last drop.”

Swen Vincke giving a speech at the 2024 Game Developers Choice Awards. Credit: Game Developers Conference

In planning projects like Glacier, publicly traded companies often pursue huge wins—and there’s even more pressure to do so if a competing company has already achieved big success with similar titles.

But going bigger isn’t always the answer, and many in the industry believe the “one big game” strategy is increasingly nonviable.

In this attention economy?

There may not be enough player time or attention to go around, given the numerous games-as-a-service titles that are as large in scope as Call of Duty games or Fortnite. Despite the recent success of new entrant Marvel Rivals, there have been more big AAA live service shooter flops than wins in recent years.

Just last week, a data-based report by prominent games marketing newsletter GameDiscoverCo came to a prescient realization. “Genres like Arena Shooter, Battle Royale, and Hero Shooter look amazing from a revenue perspective. But there’s only 29 games in all of Steam’s history that have grossed >$1m in those subgenres,” wrote GameDiscoverCo’s Simon Carless.

It gets worse. “Only Naraka Bladepoint, Overwatch 2 & Marvel Rivals have grossed >$25m and launched since 2020 in those subgenres,” Carless added. (It’s important to clarify that he is just talking Steam numbers here, though.) That’s a stark counterpoint to reports that Call of Duty has earned more than $30 billion in lifetime revenue.

Employees of game publishers and studios are deeply concerned about this. In a 2025 survey of professional game developers, “one of the biggest issues mentioned was market oversaturation, with many developers noting how tough it is to break through and build a sustainable player base.”

Despite those headwinds, publishers like EA are making big bets in well-established spaces rather than placing a variety of smaller bets in newer areas ripe for development. Some of the biggest recent multiplayer hits on Steam have come from smaller studios that used creative ideas, fresh genres, strong execution, and the luck (or foresight) of reaching the market at exactly the right time.

That might suggest that throwing huge teams and large budgets up against well-fortified competitors is an especially risky strategy—hence some of the anxiety from the EA developers I spoke with.

Working smarter, not harder

That anxiety has led to steadily growing unionization efforts across the industry. From QA workers at Bethesda to more wide-ranging unions at Blizzard and CD Projekt Red, there’s been more movement on this front in the past two or three years than there had been in decades beforehand.

Unionization isn’t a cure-all, and it comes with its own set of new challenges—but it does have the potential to shift some of the conversations toward more sustainable practices, so that’s another potential part of the solution.

Insomniac Games CEO Ted Price spoke authoritatively on sustainability and better work practices for the industry way back at 2021’s Develop:Brighton conference:

I think the default is to brute force the problem—in other words, to throw money or people at it, but that can actually cause more chaos and affect well-being, which goes against that balance. The harder and, in my opinion, more effective solution is to be more creative within constraints… In the stress of hectic production, we often feel we can’t take our foot off the gas pedal—but that’s often what it takes.

That means publishers and studios should plan for problems and work from accurate data about where the team is at, but it also means having a willingness to give their people more time, provided the capital is available to do so.

Giving people what they need to do their jobs sounds like a simple solution to a complex problem, but it was at the heart of every conversation I had about Glacier.

Most EA developers—including leaders who are beholden to lofty targets—want to make a great game. “At the end of the day, they’re all really good people and they work really hard and they really want to deliver a good product for their customer,” one former EA developer assured me as we ended our call.

As for making the necessary shifts toward sustainability in the industry, “It’s kind of in the best interest of making the best possible game for gamers,” explained another. “I hope to God that they still achieve what they need to achieve within the timelines that they have, for the sake of Battlefield as a game to actually meet the expectations of the gamers and for people to maintain their jobs.”

Photo of Samuel Axon

Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

What’s wrong with AAA games? The development of the next Battlefield has answers. Read More »

pay-up-or-stop-scraping:-cloudflare-program-charges-bots-for-each-crawl

Pay up or stop scraping: Cloudflare program charges bots for each crawl

“Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho—and then giving that agent a budget to spend to acquire the best and most relevant content,” Cloudflare said, promising that “we enable a future where intelligent agents can programmatically negotiate access to digital resources.”
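
Cloudflare has described pay-per-crawl as building on the HTTP 402 “Payment Required” status code, but it has not published a client library for the agent side. Purely as an illustration of the “agent with a budget” idea, here is a minimal Python sketch; the header names (x-crawl-price-usd, x-crawl-max-price-usd) and the requests-based client are assumptions made for the example, not Cloudflare’s documented interface.

```python
# Hypothetical sketch of a budget-constrained crawler negotiating paid access.
# The 402-based flow follows Cloudflare's public description of pay-per-crawl,
# but the header names below are illustrative assumptions, not a documented API.
import requests

BUDGET_USD = 5.00  # total spend the agent's operator has authorized


def fetch_with_budget(url: str, remaining: float):
    """Return (content or None, budget left) for a single crawl attempt."""
    resp = requests.get(url)
    if resp.status_code != 402:
        # Free (or already licensed) content: no charge against the budget.
        return resp.text, remaining

    # Hypothetical header carrying the per-crawl price quoted by the origin.
    price = float(resp.headers.get("x-crawl-price-usd", "inf"))
    if price > remaining:
        return None, remaining  # too expensive; the agent skips this source

    # Hypothetical retry that signals willingness to pay the quoted price.
    paid = requests.get(url, headers={"x-crawl-max-price-usd": f"{price:.2f}"})
    return paid.text, remaining - price


content, left = fetch_with_budget("https://example.com/research", BUDGET_USD)
print("fetched:", content is not None, "budget remaining:", left)
```

A production client would presumably authenticate and settle payment through Cloudflare itself rather than through ad hoc headers like these; the sketch is only meant to show what “giving an agent a budget to spend” could look like mechanically.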

AI crawlers now blocked by default

Cloudflare’s announcement comes after rolling out a feature last September, allowing website owners to block AI crawlers in a single click. According to Cloudflare, over 1 million customers chose to block AI crawlers, signaling that people want more control over their content at a time when Cloudflare observed that writing instructions for AI crawlers in robots.txt files was widely “underutilized.”

To protect more customers moving forward, any new customers (including anyone on a free plan) who sign up for Cloudflare services will have their domains, by default, set to block all known AI crawlers.

This marks Cloudflare’s transition away from the dreaded opt-out models of AI scraping to a permission-based model, which a Cloudflare spokesperson told Ars is expected to “fundamentally change how AI companies access web content going forward.”

In a world where some website owners have grown sick and tired of attempting and failing to block AI scraping through robots.txt—including some trapping AI crawlers in tarpits to punish them for ignoring robots.txt—Cloudflare’s feature allows users to choose granular settings to prevent blocks on AI bots from impacting bots that drive search engine traffic. That’s critical for small content creators who want their sites to still be discoverable but not digested by AI bots.
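
That split (block AI crawlers, keep search crawlers) is the same distinction a robots.txt policy can express, which is why the “underutilized” robots.txt approach keeps coming up. As a rough sketch only: the user-agent tokens below (GPTBot, CCBot, Googlebot) are ones those crawlers publish, but the policy itself is an invented example, not Cloudflare’s managed ruleset. Python’s standard library parser shows how such a policy sorts bots:

```python
# Minimal example: a robots.txt policy that disallows two known AI crawlers
# while leaving search and unknown bots alone, checked with Python's stdlib
# parser. The policy is illustrative, not Cloudflare's default rule set.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

for agent in ("GPTBot", "CCBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The catch, as the tarpit anecdote above suggests, is that robots.txt is purely advisory: a crawler that ignores it faces no technical barrier, which is what pushes publishers toward the network-level blocking Cloudflare now applies by default.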

“AI crawlers collect content like text, articles, and images to generate answers, without sending visitors to the original source—depriving content creators of revenue, and the satisfaction of knowing someone is reading their content,” Cloudflare’s blog said. “If the incentive to create original, quality content disappears, society ends up losing, and the future of the Internet is at risk.”

Disclosure: Condé Nast, which owns Ars Technica, is a partner involved in Cloudflare’s beta test.

This story was corrected on July 1 to remove publishers incorrectly listed as participating in Cloudflare’s pay-per-crawl beta.

Pay up or stop scraping: Cloudflare program charges bots for each crawl Read More »

supreme-court-to-decide-whether-isps-must-disconnect-users-accused-of-piracy

Supreme Court to decide whether ISPs must disconnect users accused of piracy

The Supreme Court has agreed to hear a case that could determine whether Internet service providers must terminate users who are accused of copyright infringement.

In a list of orders released today, the court granted a petition filed by cable company Cox. The ISP, which was sued by Sony Music Entertainment, is trying to overturn a ruling that it is liable for copyright infringement because it failed to terminate users accused of piracy. Music companies want ISPs to disconnect users whose IP addresses are repeatedly connected to torrent downloads.

“We are pleased the US Supreme Court has decided to address these significant copyright issues that could jeopardize Internet access for all Americans and fundamentally change how Internet service providers manage their networks,” Cox said today.

Cox was once on the hook for $1 billion in the case. In February 2024, the 4th Circuit court of appeals overturned the $1 billion verdict, deciding that Cox did not profit directly from copyright infringement committed by users. But the appeals court found that Cox was guilty of willful contributory infringement and ordered a new damages trial.

The Cox petition asks the Supreme Court to decide whether an ISP “can be held liable for ‘materially contributing’ to copyright infringement merely because it knew that people were using certain accounts to infringe and did not terminate access, without proof that the service provider affirmatively fostered infringement or otherwise intended to promote it.”

Trump admin backed Cox; Sony petition denied

The Trump administration backed Cox last month, saying that ISPs shouldn’t be forced to terminate the accounts of people accused of piracy. Solicitor General John Sauer told the court in a brief that the 4th Circuit decision, if not overturned, “subjects ISPs to potential liability for all acts of copyright infringement committed by particular subscribers as long as the music industry sends notices alleging past instances of infringement by those subscribers” and “might encourage providers to avoid substantial monetary liability by terminating subscribers after receiving a single notice of alleged infringement.”

Supreme Court to decide whether ISPs must disconnect users accused of piracy Read More »

nih-budget-cuts-affect-research-funding-beyond-us-borders

NIH budget cuts affect research funding beyond US borders


European leaders say they will fill the funding void. Is that realistic?

Credit: E+ via Getty Images

Rory de Vries, an associate professor of virology in the Netherlands, was lifting weights at the gym when he noticed a WhatsApp message from his research partners at Columbia University, telling him his research funding had been cancelled. The next day he received the official email: “Hi Rory, Columbia has received a termination notice for this contract, including all subcontracts,” it stated. “Unfortunately, we must advise you to immediately stop work and cease incurring charges on this subcontract.”

De Vries was disappointed, though not surprised—his team knew this might happen under the new Trump administration. His projects focused on immune responses and a new antiviral treatment for respiratory viruses like Covid-19. Animals had responded well in pre-clinical trials, and he was about to explore the next steps for applications in humans. But the news, which he received in March, left him with a cascade of questions: What would happen to the doctoral student he had just hired for his project, a top candidate plucked from a pool of some 300 aspiring scientists? How would his team comply with local Dutch law, which, unlike the US, forbids terminating a contract without cause or notice? And what did the future hold for his projects, two of which contained promising data for treating Covid-19 and other respiratory illnesses in humans?

It was all up in the air, leaving de Vries, who works at the Erasmus Medical Center in Rotterdam and whose research has appeared in top-tier publications, scrambling for last-minute funding from the Dutch government or the European Union.

Of the 20 members in his group, he will soon run out of money to pay the salaries for four. As of June, he estimated that his team has enough to keep going for about six months in its current form if it draws money from other funding sources.

But that still leaves funding uncertain in the long term: “So, yeah, that’s a little bit of an emergency solution,” he said.

Cuts to science funding in the US have devastated American institutions, hitting cancer research and other vital fields, but they also affect a raft of international collaborations and scientists based abroad. In Canada, Australia, South Africa and elsewhere, projects receiving funds from the National Institutes of Health have been terminated or stalled due to recent budget cuts.

Researchers in Europe and the US have long collaborated to tackle tough scientific questions. Certain fields, like rare diseases, particularly benefit from international collaboration because it widens the pool of patients available to study. European leaders have said that they will step into the gap created by Trump’s NIH cuts to make Europe a magnet for science—and they have launched a special initiative to attract US scientists. But some researchers doubt that Europe alone can truly fill the void.

In many European countries, scientist salaries are modest and research funding has lagged behind inflation in recent years. In a May press release, a French scientists’ union described current pay as “scandalously low” and said research funding in France and Europe as a whole lags behind the US, South Korea, China, Taiwan, and Japan. Europe and its member states would need to increase research funding by up to 150 billion euros (roughly USD $173 billion) per year to properly support science, said Boris Gralak, general secretary of the French union, in an interview with Undark.

The shifts are not just about money, but the pattern of how international research unfolds, said Stefan Pfister, a pediatric cancer specialist in Germany who has also received NIH funds. The result, he said, is “this kind of capping and compromising well-established collaborations.”

Funding beyond US borders

For decades, international researchers have received a small slice of the National Institutes of Health budget. In 2024, out of an overall budget of $48 billion, the NIH dispensed $69 million to 125 projects across the European continent and $262 million in funding worldwide, according to the NIH award database.

The US and Europe “have collaborated in science for, you know, centuries at this point,” said Cole Donovan, associate director of science and technology ecosystem development at the Federation of American Scientists, noting that the relationship was formalized in 1997 in an agreement highlighting the two regions’ common interests.

And it has overall been beneficial, said Donovan, who worked in the State Department for a decade to help facilitate such collaborations. In some cases, European nations simply have capabilities that do not exist in the US, like the Czech Republic and Romania, he said, which have some of the most sophisticated laser facilities in the world.

“If you’re a researcher and you want to use those facilities,” he added, “you have to have a relationship with people in those countries.”

Certain fields, like rare diseases, particularly benefit from international collaboration because it widens the pool of patients available to study.

The shared nature of research is driven by personal connections and scientific interest, Donovan said: “The relationship in science and technology is organic.”

But with the recent cuts to NIH funding, the fate of those research projects—particularly on the health effects of climate change, transgender health, and Covid-19—has been thrown into question. On May 1, the NIH said it would not reissue foreign subawards, which fund researchers outside the US who work with American collaborators—or agree to US researchers asking to add a foreign colleague to a project. The funding structure lacked transparency and could harm national security, the NIH stated, though it noted that it would not “retroactively revise ongoing awards to remove foreign subawards at this time.” (The NIH would continue to support direct foreign awards, according to the statement.)

The cuts have hit European researchers like de Vries, whose institution, Erasmus MC, was a sub-awardee on three Columbia University grants to support his work. Two projects on Covid-19 transmission and treatment have ended abruptly, while another, on a potential treatment for measles, has been frozen, awaiting review at the end of May, though by late June he still had no news and said he assumed it would not be renewed. “We’re trying to scrape together some money to do some two or three last experiments, so we at least can publish the work and that it’s in literature and anyone else can pick it up,” he said. “But yeah, the work has stopped.”

His Ph.D. students must now shift the focus of their theses; for some, that means pivoting after nearly three years of study.

De Vries’ team has applied for funds from the Dutch government, as well as sought industry funding, for a new project evaluating a vaccine for RSV—something he wouldn’t have done otherwise, he said, since industry funding can limit research questions. “Companies might not be interested in in-depth immunological questions, or a side-by-side comparison of their vaccine with the direct competition,” he wrote in an email.

International scientists who have received direct awards have so far been unaffected, but say they are still nervous about potential further cuts. Pfister, for example, is now leading a five-year project to develop treatments for childhood tumors; with the majority of funding coming from NIH and Cancer Research U.K., a British-based cancer charity, “not knowing what the solution will look like next year,” he said, “generates uncertainties.”

The jointly funded $25 million project—which scientists from nine institutions across five countries including the US are collaborating on—explores treatments for seven childhood cancers and offers a rare opportunity to make progress in tackling tumors in children, Pfister added, as treatments have lagged in the field due to the small market and the high costs of development. Tumors in children differ from those in adults and, until recently, were harder to target, said Pfister. But new discoveries have allowed researchers to target cancer more specifically in children, and global cooperation is central to that progress.

The US groups, which specialize in drug chemistry, develop lead compounds for potential drugs. Pfister’s team then carries out experiments on toxicity and effectiveness. The researchers hope to bring at least one treatment into early-phase clinical trials.

Funding from NIH is confirmed for this financial year. Beyond that, the researchers are staying hopeful, Pfister said.

“It’s such an important opportunity for all of us to work together,” said Pfister, “that we don’t want to think about worst-case scenarios.”

Pfister told Undark that his team in Heidelberg, Germany, has assembled the world’s biggest store of pediatric cancer models; no similar stock currently exists in the US. The work of the researchers is complementary, he stressed: “If significant parts would drop out, you cannot run the project anymore.”

Rare diseases benefit from international projects, he added. In these fields, “We don’t have the patient numbers, we don’t have the critical mass,” in one country alone, he said. In his field, researchers conduct early clinical trials in patients on both sides of the Atlantic. “That’s just not because we are crazy, but just because this the only way to physically conduct them.”

The US has spearheaded much drug development, he noted. “Obviously the US has been the powerhouse for biomedical research for the last 50 years, so it’s not surprising that some of the best people and the best groups are sitting there,” he said. A smaller US presence in the field would reduce the critical mass of people and resources available, which would be a disaster for patients, he said. “Any dreams of this all moving to Europe are illusions in my mind.”

While Europe has said it will step in to fill the gap, the amounts discussed were not enough, Gralak said. The amount of money available in Europe “is a very different order of magnitude,” Pfister said. It also won’t help their colleagues in the US, who European researchers need to thrive in order to maintain necessary collaborations, he said. “In the US, we are talking about dozens of billions of dollars less in research, and this cannot be compensated by any means, by the EU or any other funder.” Meanwhile, the French scientists’ union said the country has failed to meet funding promises made as long ago as 2010.

And although Europe receives a sliver of NIH funds, these cuts could have a real impact on public health. De Vries said that his measles treatment was at such an early stage that its potential benefits remained unproven, but if effective it could have been the only treatment of its kind at a time when cases are rising.

And he said the stalling of both his work and other research on Covid-19 leaves the world less prepared for a future pandemic. The antiviral drug he has developed had positive results in ferrets but needs further refinement to work in humans. If the drugs were available for people, “that would be great,” he said. “Then we could actually work on interrupting a pandemic early.”

New opportunities for Europe

The shift in US direction offers an opportunity for the EU, said Mike Galsworthy, a British scientist who campaigned to unite British and EU science in the wake of Brexit. The US will no longer be the default for ambitious researchers from across the world, he said: “It’s not just US scientists going to Canada and Europe. There’s also going to be the huge brain diversion. If you are not a native English speaker and not White, you might be extra nervous about going to the States for work there right now,” he added.

And in recent weeks, European governments have courted fleeing scientists. In April, France launched a platform called Choose France for Science, which allows institutions to request funding for international researchers and highlights an interest in health, climate science, and artificial intelligence, among other research areas. Weeks later, the European Union announced a new program called Choose Europe for Science, aiming to make Europe a “magnet for researchers.” It includes a 500 million euro (roughly USD $578 million) funding package for 2025-2027, new seven-year “super grants” to attract the best researchers, and top-up funds that would help scientists from outside Europe settle into their new institution of choice.

The initial funding comes from money already allocated to Horizon Europe—the EU’s central research and innovation funding program. But some researchers are skeptical. The French union leader, Gralak, who is also a researcher in mathematical physics, described the programs as PR initiatives. He criticized European leaders for taking advantage of the problems in US science to attract talent to Europe, and said leaders should support science in Europe through proper and sufficient investment. The programs are “derisory and unrealistic,” he said.

“It’s not just US scientists going to Canada and Europe. There’s also going to be the huge brain diversion.”

Others agreed that Europe’s investment in science is inadequate. Bringing scientists to Europe would be “great for science and the talent, but that also means that will come from a line where there’s normally funding for European researchers,” said de Vries, the researcher from Rotterdam. As Mathilde Richard, a colleague of de Vries who works on viruses and has five active NIH grants, told Undark: “Why did I start to apply to NIH funds? And still, the most straightforward answer is that there isn’t enough in Europe.”

In the Netherlands, a right-wing government has said it will cut science funding by a billion euros over the next five years. And while the flagship program Horizon Europe encourages large-scale projects spanning multiple countries, scientists spend years putting together the major cross-country collaborations the system requires. Meanwhile, European Research Council grants are “extremely competitive and limited,” de Vries said.

Richard’s NIH grants pay for 65 percent of her salary and for 80 percent of her team, and she believes she’s the most dependent on US funds of anyone in her department at Erasmus Medical Center in Rotterdam. She applied because the NIH funding seemed more sustainable than local money, she said. In Europe, too often funding is short-term and has a time-consuming administrative burden, she said, which hinders researchers from developing long-term plans. “We have to battle so much to just do our work and find funds to just do our basic work,” she said. “I think we need to advocate for a better and more sustainable way of funding research.”

Scientists, too, are worried about what US cuts mean for global science, beyond the short-term. Paltry science funding could discourage a generation of talented people from entering the field, Pfister suggested: “In the end, the resources are not only monetary, but also the brain resources are reduced.”

Let’s not talk about it

A few months ago, Pfister attended a summit in Boston for Cancer Grand Challenges, a research initiative co-funded by the NIH’s National Cancer Institute and Cancer Research U.K. Nobody from the NIH came because they had no funding to travel. “So we are all sitting in Boston, and they are sitting like 200 miles away,” he said.

More concerning was the fact that those present seemed afraid to discuss why the NIH staff were absent, he said. “It was us Europeans to basically, kind of break the ice to, you know, at least talk about it.”

Pfister said that some European researchers are now hesitant about embarking on US collaborations, even if there is funding available. And some German scientists are taking steps to ensure that they are protected if a similar budget crackdown occurred in Germany, he said—devising independent review processes, separating research policy from funding, and developing funding models less dependent on government-only sources, he said. “I think the most scary part is that you know, this all happened in three months.”

Despite the worry and uncertainty, de Vries offered a hopeful view of the future. “We will not be defeated by NIH cuts,” he said. “I feel confident that Europe will organize itself.”

This article was originally published on Undark. Read the original article.

NIH budget cuts affect research funding beyond US borders Read More »

vmware-perpetual-license-holder-receives-audit-letter-from-broadcom

VMware perpetual license holder receives audit letter from Broadcom

The letter, signed by Aiden Fitzgerald, director of global sales operations at Broadcom, claims that Broadcom will use its time “as efficiently and productively as possible to minimize disruption.”

Still, the security worker that Ars spoke with is concerned about the implications of the audit and said they “expect a big financial impact” for their employer. They added:

Because we are focusing on saving costs and are on a pretty tight financial budget, this will likely have impact on the salary negotiations or even layoffs of employees. Currently, we have some very stressed IT managers [and] legal department [employees] …

The employee noted that they are unsure whether their employer exceeded its license limits. If the firm did, they said, the financial repercussions could be “big.”

Users deny wrongdoing

As Broadcom works to ensure that people aren’t using VMware outside its terms, some suggest that the semiconductor giant is wasting time by investigating organizations that aren’t violating agreements.

After Broadcom started sending cease-and-desist letters, at least one firm claimed that it got a letter from Broadcom despite no longer using VMware at all.

Additionally, various companies claimed that they received a cease-and-desist from Broadcom despite not implementing any updates after their VMware support contract expired.

The employee at the Dutch firm that received an audit notice this month claimed that the only update their employer has applied to the VMware offerings it uses since support ended was a “critical security patch.”

That employee also claimed to Ars that their company didn’t receive a cease-and-desist letter from Broadcom before being informed of an audit.

Broadcom didn’t respond to Ars’ request for comment ahead of publication, so we’re unable to confirm whether the company is sending audit letters without sending cease-and-desist letters first. Ars also reached out to Connor Consulting but didn’t hear back.

“When we saw the news that they were going to send cease-and-desist letters and audits, our management thought it was a bluff and that they would never do that,” the anonymous security worker said.

Broadcom’s litigious techniques to ensure VMware agreements are followed have soured its image among some current and former customers. Broadcom’s $69 billion VMware acquisition has proven lucrative, but as Broadcom approaches two years of VMware ownership, there are still calls for regulation of its practices, which some customers and partners believe are “legally and ethically flawed.”

VMware perpetual license holder receives audit letter from Broadcom Read More »

actively-exploited-vulnerability-gives-extraordinary-control-over-server-fleets

Actively exploited vulnerability gives extraordinary control over server fleets

On Wednesday, CISA added CVE-2024-54085 to its list of vulnerabilities known to be exploited in the wild. The notice provided no further details.

In an email on Thursday, Eclypsium researchers said the scope of the exploits has the potential to be broad:

  • Attackers could chain multiple BMC exploits to implant malicious code directly into the BMC’s firmware, making their presence extremely difficult to detect and allowing them to survive OS reinstalls or even disk replacements.
  • By operating below the OS, attackers can evade endpoint protection, logging, and most traditional security tools.
  • With BMC access, attackers can remotely power on or off, reboot, or reimage the server, regardless of the primary operating system’s state.
  • Attackers can scrape credentials stored on the system, including those used for remote management, and use the BMC as a launchpad to move laterally within the network.
  • BMCs often have access to system memory and network interfaces, enabling attackers to sniff sensitive data or exfiltrate information without detection.
  • Attackers with BMC access can intentionally corrupt firmware, rendering servers unbootable and causing significant operational disruption.

With no publicly known details of the ongoing attacks, it’s unclear which groups may be behind them. Eclypsium said the most likely culprits would be espionage groups working on behalf of the Chinese government. All five of the specific APT groups Eclypsium named have a history of exploiting firmware vulnerabilities or gaining persistent access to high-value targets.

Eclypsium said the line of vulnerable AMI MegaRAC devices uses an interface known as Redfish. Server makers known to use these products include AMD, Ampere Computing, ASRock, ARM, Fujitsu, Gigabyte, Huawei, Nvidia, Supermicro, and Qualcomm. Some, but not all, of these vendors have released patches for their wares.

Given the damage possible from exploitation of this vulnerability, admins should examine all BMCs in their fleets to ensure they aren’t vulnerable. With products from so many different server makers affected, admins should consult with their manufacturer when unsure if their networks are exposed.
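
One practical starting point for that inventory is the Redfish interface itself, since a BMC that speaks Redfish exposes its model and firmware version through standard Manager properties. The sketch below—written in Python with the requests library, using a hypothetical BMC address and placeholder credentials—simply pulls those fields so they can be compared against each vendor’s advisory. It is an illustrative inventory step, not a check for CVE-2024-54085 itself, and which firmware builds count as patched must come from your server maker.

```python
import requests

# Minimal sketch of a Redfish inventory pass, not a vulnerability scanner.
# BMC_HOST and AUTH are placeholders; supply your own values.
BMC_HOST = "https://10.0.0.42"
AUTH = ("admin", "changeme")

def get(path):
    # Many BMCs ship self-signed certificates; verify=False is for illustration only.
    resp = requests.get(f"{BMC_HOST}{path}", auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    return resp.json()

# /redfish/v1/Managers lists the BMC(s); each Manager resource exposes
# standard Model and FirmwareVersion properties to compare against advisories.
for member in get("/redfish/v1/Managers").get("Members", []):
    manager = get(member["@odata.id"])
    print(manager.get("Id"), manager.get("Model"), manager.get("FirmwareVersion"))
```

Running something like this across a fleet at least tells you which baseboard management controllers you have and what firmware they’re on, which is the information you need before applying vendor patches.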

Actively exploited vulnerability gives extraordinary control over server fleets Read More »

researchers-develop-a-battery-cathode-material-that-does-it-all

Researchers develop a battery cathode material that does it all

Battery electrode materials need to do a lot of things well. They need to be conductors to get charges to and from the ions that shuttle between the electrodes. They also need to have an open structure that allows the ions to move around before they reach a site where they can be stored. The storage of lots of ions also causes materials to expand, creating mechanical stresses that can cause the structure of the electrode material to gradually decay.

Because it’s hard to get all of these properties from a single material, many electrodes are composite materials, with one chemical used to allow ions into and out of the electrode, another to store them, and possibly a third that provides high conductivity. Unfortunately, this can create new problems, with breakdowns at the interfaces between materials slowly degrading the battery’s capacity.

Now, a team of researchers is proposing a material that seemingly does it all. It’s reasonably conductive, it allows lithium ions to move around and find storage sites, and it’s made of cheap and common elements. Perhaps best of all, it undergoes self-healing, smoothing out damage across charge/discharge cycles.

High capacity

The research team, primarily based in China, set out to limit the complexity of cathodes. “Conventional composite cathode designs, which typically incorporate a cathode active material, catholyte, and electronic conducting additive, are often limited by the substantial volume fraction of electrochemically inactive components,” the researchers wrote. The solution, they reasoned, was to create an all-in-one material that eliminates the need for most of these separate components.

A number of papers had reported good luck with chlorine-based chemicals, which allowed ions to move readily through the material but didn’t conduct electricity very well. So the researchers experimented with pre-loading one of these materials with lithium. And they focused on iron chloride since it’s a very cheap material.

Researchers develop a battery cathode material that does it all Read More »

curated-realities:-an-ai-film-festival-and-the-future-of-human-expression

Curated realities: An AI film festival and the future of human expression


We saw 10 AI films and interviewed Runway’s CEO as well as Hollywood pros.

A still from Total Pixel Space, the Grand Prix winner at AIFF 2025.

Last week, I attended a film festival dedicated to shorts made using generative AI. Dubbed AIFF 2025, it was an event precariously balancing between two different worlds.

The festival was hosted by Runway, a company that produces models and tools for generating images and videos. In panels and press briefings, a curated list of industry professionals made the case for Hollywood to embrace AI tools. But in private meetings, I gained a strong sense that a philosophical divide is already widening within the film and television business.

I also interviewed Runway CEO Cristóbal Valenzuela about the tightrope he walks as he pitches his products to an industry that has deeply divided feelings about what role AI will have in its future.

To unpack all this, it makes sense to start with the films, partly because the film that was chosen as the festival’s top prize winner says a lot about the issues at hand.

A festival of oddities and profundities

Since this was the first time the festival had been open to the public, the crowd was a diverse mix: AI tech enthusiasts, working industry creatives, and folks who enjoy movies and were curious about what they’d see—as well as quite a few people who fit into all three groups.

The scene at the entrance to the theater at AIFF 2025 in Santa Monica, California.

The films shown were all short, and most would be more at home at an art film fest than something more mainstream. Some shorts featured an animated aesthetic (including one inspired by anime) and some presented as live action. There was even a documentary of sorts. The films could be made entirely with Runway or other AI tools, or those tools could simply be a key part of a stack that also includes more traditional filmmaking methods.

Many of these shorts were quite weird. Most of us have seen by now that AI video-generation tools excel at producing surreal and distorted imagery—sometimes whether the person prompting the tool wants that or not. Several of these films leaned into that limitation, treating it as a strength.

Representing that camp was Vallée Duhamel’s Fragments of Nowhere, which visually explored the notion of multiple dimensions bleeding into one another. Cars morphed into the sides of houses, and humanoid figures, purported to be inter-dimensional travelers, moved in ways that defied anatomy. While I found this film visually compelling at times, I wasn’t seeing much in it that I hadn’t already seen from dreamcore or horror AI video TikTok creators like GLUMLOT or SinRostroz in recent years.

More compelling were shorts that used this propensity for oddity to generate imagery that was curated and thematically tied to some aspect of human experience or identity. For example, More Tears than Harm by Herinarivo Rakotomanana was a rotoscope animation-style “sensory collage of childhood memories” of growing up in Madagascar. Its specificity and consistent styling lent it a credibility that Fragments of Nowhere didn’t achieve. I also enjoyed Riccardo Fusetti’s Editorial on this front.

More Tears Than Harm, an unusual animated film at AIFF 2025.

Among the 10 films in the festival, two clearly stood above the others in my impressions—and they ended up being the Grand Prix and Gold prize winners. (The judging panel included filmmakers Gaspar Noé and Harmony Korine, Tribeca Enterprises CEO Jane Rosenthal, IMAX head of post and image capture Bruce Markoe, Lionsgate VFX SVP Brianna Domont, Nvidia developer relations lead Richard Kerris, and Runway CEO Cristóbal Valenzuela, among others.)

Runner-up Jailbird was the aforementioned quasi-documentary. Directed by Andrew Salter, it was a brief piece that introduced viewers to a program in the UK that places chickens in human prisons as companion animals, to positive effect. Why make that film with AI, you might ask? Well, AI was used to achieve shots that wouldn’t otherwise be doable on a small budget, such as depicting the experience from the chicken’s point of view. The crowd loved it.

Jailbird, the runner-up at AIFF 2025.

Then there was the Grand Prix winner, Jacob Adler’s Total Pixel Space, which was, among other things, a philosophical defense of the very idea of AI art. You can watch Total Pixel Space on YouTube right now, unlike some of the other films. I found it strangely moving, even as I saw its selection as the festival’s top winner with some cynicism. Of course they’d pick that one, I thought, although I agreed it was the most interesting of the lot.

Total Pixel Space, the Grand Prix winner at AIFF 2025.

Total Pixel Space

Even though it risked navel-gazing and self-congratulation in this venue, Total Pixel Space was filled with compelling imagery that matched the themes, and it touched on some genuinely interesting ideas—at times, it seemed almost profound, didactic as it was.

“How many images can possibly exist?” the film’s narrator asked. To answer that, it explains the concept of total pixel space, which actually reflects how image generation tools work:

Pixels are the building blocks of digital images—tiny tiles forming a mosaic. Each pixel is defined by numbers representing color and position. Therefore, any digital image can be represented as a sequence of numbers…

Just as we don’t need to write down every number between zero and one to prove they exist, we don’t need to generate every possible image to prove they exist. Their existence is guaranteed by the mathematics that defines them… Every frame of every possible film exists as coordinates… To deny this would be to deny the existence of numbers themselves.

The nine-minute film demonstrates that the number of possible images or films is greater than the number of atoms in the universe and argues that photographers and filmmakers may be seen as discovering images that already exist in the possibility space rather than creating something new.
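
That claim holds up under even a rough calculation. As a quick illustration of the scale involved (my own back-of-the-envelope figures, using an assumed 1080p frame with 24-bit color, not numbers from the film), the count of distinct frames at a single resolution already dwarfs the roughly 10^80 atoms in the observable universe:

```python
import math

# Count the distinct frames possible at an assumed 1920x1080 resolution
# with 24-bit RGB color (256 values per channel). Illustrative numbers only.
width, height = 1920, 1080
channels, values_per_channel = 3, 256
num_pixels = width * height

# Total distinct frames = 256 ** (3 * num_pixels); far too large to print,
# so work with its base-10 logarithm instead.
log10_frames = num_pixels * channels * math.log10(values_per_channel)
log10_atoms = 80  # ~10^80 atoms in the observable universe

print(f"Distinct 1080p frames ~ 10^{log10_frames:,.0f}")  # roughly 10^14,981,179
print(f"Atoms in observable universe ~ 10^{log10_atoms}")
```

Written out, that count of frames would run to roughly 15 million digits, and that’s just one resolution and one still image, before you even consider sequences of frames.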

Within that framework, it’s easy to argue that generative AI is just another way for artists to “discover” images.

The balancing act

“We are all—and I include myself in that group as well—obsessed with technology, and we keep chatting about models and data sets and training and capabilities,” Runway CEO Cristóbal Valenzuela said to me when we spoke the next morning. “But if you look back and take a minute, the festival was celebrating filmmakers and artists.”

I admitted that I found myself moved by Total Pixel Space‘s articulations. “The winner would never have thought of himself as a filmmaker, and he made a film that made you feel something,” Valenzuela responded. “I feel that’s very powerful. And the reason he could do it was because he had access to something that just wasn’t possible a couple of months ago.”

First-time and outsider filmmakers were the focus of AIFF 2025, but Runway works with established studios, too—and those relationships have an inherent tension.

The company has signed deals with companies like Lionsgate and AMC Networks. In some cases, it trains on data provided by those companies; in others, it embeds within them to try to develop tools that fit how they already work. That’s not something competitors like OpenAI are doing yet, so that, combined with a head start in video generation, has allowed Runway to grow and stay competitive so far.

“We go directly into the companies, and we have teams of creatives that are working alongside them. We basically embed ourselves within the organizations that we’re working with very deeply,” Valenzuela explained. “We do versions of our film festival internally for teams as well so they can go through the process of making something and seeing the potential.”

Founded in 2018 at New York University’s Tisch School of the Arts by two Chileans and one Greek co-founder, Runway has a very different story than its Silicon Valley competitors. It was one of the first to bring an actually usable video-generation tool to the masses. Runway also contributed in foundational ways to the popular Stable Diffusion model.

Though it is vastly outspent by competitors like OpenAI, it has taken a hands-on approach to working with existing industries. You won’t hear Valenzuela or other Runway leaders talking about the imminence of AGI or anything so lofty; instead, it’s all about selling the product as something that can solve existing problems in creatives’ workflows.

Still, an artist’s mindset and relationships within the industry don’t negate some fundamental conflicts. There are multiple intellectual property cases involving Runway and its peers, and though the company hasn’t admitted it, there is evidence that it trained its models on copyrighted YouTube videos, among other things.

Cristóbal Valenzuela speaking on the AIFF 2025 stage. Credit: Samuel Axon

Valenzuela suggested, though, that studios are worried about liability, not underlying principles, saying:

Most of the concerns on copyright are on the output side, which is like, how do you make sure that the model doesn’t create something that already exists or infringes on something. And I think for that, we’ve made sure our models don’t and are supportive of the creative direction you want to take without being too limiting. We work with every major studio, and we offer them indemnification.

In the past, he has also defended Runway by saying that what it’s producing is not a re-creation of what has come before. He sees the tool’s generative process as distinct—legally, creatively, and ethically—from simply pulling up assets or references from a database.

“People believe AI is sort of like a system that creates and conjures things magically with no input from users,” he said. “And it’s not. You have to do that work. You still are involved, and you’re still responsible as a user in terms of how you use it.”

He seemed sincerely convinced by this defense of AI as a legitimate tool for artists, but given that he’s been pitching these products directly to working filmmakers, he was also clearly aware that not everyone agrees with him. There is not even a consensus among those in the industry.

An industry divided

While in LA for the event, I visited separately with two of my oldest friends. Both of them work in the film and television industry in similar disciplines. They each asked what I was in town for, and I told them I was there to cover an AI film festival.

One immediately responded with a grimace of disgust, “Oh, yikes, I’m sorry.” The other responded with bright eyes and intense interest and began telling me how he already uses AI in his day-to-day to do things like extend shots by a second or two for a better edit, and expressed frustration at his company for not adopting the tools faster.

Neither is alone in their attitudes. Hollywood is divided—and not for the first time.

There have been seismic technological changes in the film industry before. There was the transition from silent films to talkies, obviously; moviemaking transformed into an entirely different art. Numerous old jobs were lost, and numerous new jobs were created.

Later, there was the transition from film to digital projection, which may be an even tighter parallel. It was a major disruption, with some companies and careers collapsing while others rose. There were people saying, “Why do we even need this?” while others believed it was the only sane way forward. Some audiences declared the quality worse, and others said it was better. There were analysts arguing it could be stopped, while others insisted it was inevitable.

IMAX’s head of post production, Bruce Markoe, spoke briefly about that history at a press mixer before the festival. “It was a little scary,” he recalled. “It was a big, fundamental change that we were going through.”

People ultimately embraced it, though. “The motion picture and television industry has always been very technology-forward, and they’ve always used new technologies to advance the state of the art and improve the efficiencies,” Markoe said.

When asked whether he thinks the same thing will happen with generative AI tools, he said, “I think some filmmakers are going to embrace it faster than others.” He pointed to pre-visualization as a particularly valuable use of AI tools and noted that some people are already using them that way, but he said it will take time for people to get comfortable with the technology.

And indeed, many, many filmmakers are still loudly skeptical. “The concept of AI is great,” The Mitchells vs. the Machines director Mike Rianda said in a Wired interview. “But in the hands of a corporation, it is like a buzzsaw that will destroy us all.”

Others are interested in the technology but are concerned that it’s being brought into the industry too quickly, with insufficient planning and protections. That includes Crafty Apes Senior VFX Supervisor Luke DiTomasso. “How fast do we roll out AI technologies without really having an understanding of them?” he asked in an interview with Production Designers Collective. “There’s a potential for AI to accelerate beyond what we might be comfortable with, so I do have some trepidation and am maybe not gung-ho about all aspects of it.”

Others remain skeptical that the tools will be as useful as some optimists believe. “AI never passed on anything. It loved everything it read. It wants you to win. But storytelling requires nuance—subtext, emotion, what’s left unsaid. That’s something AI simply can’t replicate,” said Alegre Rodriquez, a member of the Emerging Technology committee at the Motion Picture Editors Guild.

The mirror

Flying back from Los Angeles, I considered two key differences between this generative AI inflection point for Hollywood and the silent/talkie or film/digital transitions.

First, neither of those transitions involved an existential threat to the technology on the basis of intellectual property and copyright. Valenzuela talked about what matters to studio heads—protection from liability over the outputs. But the countless creatives who are critical of these tools also believe they should be consulted and even compensated for their work’s use in the training data for Runway’s models. In other words, it’s not just about the outputs, it’s also about the sourcing. As noted before, there are several cases underway. We don’t know where they’ll land yet.

Second, there’s a more cultural and philosophical issue at play, which Valenzuela himself touched on in our conversation.

“I think AI has become this sort of mirror where anyone can project all their fears and anxieties, but also their optimism and ideas of the future,” he told me.

You don’t have to scroll for long to come across techno-utopians declaring with no evidence that AGI is right around the corner and that it will cure cancer and save our society. You also don’t have to scroll long to encounter visceral anger at every generative AI company from people declaring the technology—which is essentially just a new methodology for programming a computer—fundamentally unethical and harmful, with apocalyptic societal and economic ramifications.

Amid all those bold declarations, this film festival put the focus on the on-the-ground reality. First-time filmmakers who might never have previously cleared Hollywood’s gatekeepers are getting their work screened at festivals because they can create competitive-looking films with a fraction of the crew and hours. Studios and the people who work there say they’re saving time, resources, and headaches in pre-viz, editing, visual effects, and other work that’s usually done under immense time and resource pressure.

“People are not paying attention to the very huge amount of positive outcomes of this technology,” Valenzuela told me, pointing to those examples.

In this online discussion ecosystem that elevates outrage above everything else, that’s likely true. Still, there is a sincere and rigorous conviction among many creatives that their work is contributing to this technology’s capabilities without credit or compensation and that the structural and legal frameworks to ensure minimal human harm in this evolving period of disruption are still inadequate. That’s why we’ve seen groups like the Writers Guild of America West support the Generative AI Copyright Disclosure Act and other similar legislation meant to increase transparency about how these models are trained.

The philosophical question with a legal answer

The winning film argued that “total pixel space represents both the ultimate determinism and the ultimate freedom—every possibility existing simultaneously, waiting for consciousness to give it meaning through the act of choice.”

In making this statement, the film suggested that creativity, above all else, is an act of curation. It’s a claim that nothing, truly, is original. It’s a distillation of human expression into the language of mathematics.

To many, that philosophy rings undeniably true: Every possibility already exists, and artists are just collapsing the waveform to the frame they want to reveal. To others, there is more personal truth to the romantic ideal that artwork is valued precisely because it did not exist until the artist produced it.

All this is to say that the debate about creativity and AI in Hollywood is ultimately a philosophical one. But it won’t be resolved that way.

The industry may succumb to litigation fatigue and a hollowed-out workforce—or it may instead find its way to fair deals, new opportunities for fresh voices, and transparent training sets.

For all this lofty talk about creativity and ideas, the outcome will come down to the contracts, court decisions, and compensation structures—all things that have always been at least as big a part of Hollywood as the creative work itself.


Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

Curated realities: An AI film festival and the future of human expression Read More »