AI


You can now download the source code that sparked the AI boom

On Thursday, Google and the Computer History Museum (CHM) jointly released the source code for AlexNet, the convolutional neural network (CNN) that many credit with transforming the AI field in 2012 by proving that “deep learning” could achieve things conventional AI techniques could not.

Deep learning, which uses multi-layered neural networks that can learn from data without explicit programming, represented a significant departure from traditional AI approaches that relied on hand-crafted rules and features.

The Python code, now available on CHM’s GitHub page as open source software, offers AI enthusiasts and researchers a glimpse into a key moment of computing history. AlexNet marked a watershed moment in AI because it could identify objects in photographs with unprecedented accuracy—correctly classifying images into one of 1,000 categories like “strawberry,” “school bus,” or “golden retriever” with significantly fewer errors than previous systems.

Like viewing original ENIAC circuitry or plans for Babbage’s Difference Engine, examining the AlexNet code may provide future historians insight into how a relatively simple implementation sparked a technology that has reshaped our world. While deep learning has enabled advances in health care, scientific research, and accessibility tools, it has also facilitated concerning developments like deepfakes, automated surveillance, and the potential for widespread job displacement.

But in 2012, those negative consequences still felt like far-off sci-fi dreams to many. Instead, experts were simply amazed that a computer could finally recognize images with near-human accuracy.

Teaching computers to see

As the CHM explains in its detailed blog post, AlexNet originated from the work of University of Toronto graduate students Alex Krizhevsky and Ilya Sutskever, along with their advisor Geoffrey Hinton. The project proved that deep learning could outperform traditional computer vision methods.

The neural network won the 2012 ImageNet competition by recognizing objects in photos far better than any previous method. Computer vision veteran Yann LeCun, who attended the presentation in Florence, Italy, immediately recognized its importance for the field, reportedly standing up after the presentation and calling AlexNet “an unequivocal turning point in the history of computer vision.” As Ars detailed in November, AlexNet marked the convergence of three critical technologies that would define modern AI.


Can we make AI less power-hungry? These researchers are working on it.


As demand surges, figuring out the performance of proprietary models is half the battle.

Credit: Igor Borisenko/Getty Images

At the beginning of November 2024, the US Federal Energy Regulatory Commission (FERC) rejected Amazon’s request to buy an additional 180 megawatts of power directly from the Susquehanna nuclear power plant for a data center located nearby. The regulator argued that buying power directly, instead of getting it through the grid like everyone else, works against the interests of other users.

Demand for power in the US has been flat for nearly 20 years. “But now we’re seeing load forecasts shooting up. Depending on [what] numbers you want to accept, they’re either skyrocketing or they’re just rapidly increasing,” said Mark Christie, a FERC commissioner.

Part of the surge in demand comes from data centers, and their increasing thirst for power comes in part from running increasingly sophisticated AI models. As with all world-shaping developments, what set this trend into motion was vision—quite literally.

The AlexNet moment

Back in 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, AI researchers at the University of Toronto, were busy working on a convolutional neural network (CNN) for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an image-recognition contest. The contest’s rules were fairly simple: A team had to build an AI system that could categorize images sourced from a database comprising over a million labeled pictures.

The task was extremely challenging at the time, so the team figured they needed a really big neural net—way bigger than anything other research teams had attempted. AlexNet, named after the lead researcher, had multiple layers, with over 60 million parameters and 650,000 neurons. The problem with a behemoth like that was how to train it.

What the team had in their lab were a few Nvidia GTX 580s, each with 3GB of memory. As the researchers wrote in their paper, AlexNet was simply too big to fit on any single GPU they had. So they figured out how to split AlexNet’s training phase between two GPUs working in parallel—half of the neurons ran on one GPU, and the other half ran on the other GPU.
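For readers who want to see the idea in code, here is a minimal sketch of that kind of model parallelism in modern PyTorch. The original AlexNet implementation predates PyTorch and was written directly against CUDA, so this only illustrates the split, not the historical code: half of a layer’s neurons live on one GPU, half on the other, and the outputs are stitched back together.

```python
# Minimal sketch of AlexNet-style model parallelism in modern PyTorch
# (illustrative only; requires two CUDA devices).
import torch
import torch.nn as nn

class SplitLinear(nn.Module):
    """A fully connected layer whose neurons are split across two GPUs."""
    def __init__(self, in_features, out_features):
        super().__init__()
        half = out_features // 2
        self.half0 = nn.Linear(in_features, half).to("cuda:0")
        self.half1 = nn.Linear(in_features, out_features - half).to("cuda:1")

    def forward(self, x):
        # Each GPU computes its half of the neurons; results are concatenated.
        y0 = self.half0(x.to("cuda:0"))
        y1 = self.half1(x.to("cuda:1"))
        return torch.cat([y0, y1.to("cuda:0")], dim=1)

layer = SplitLinear(in_features=4096, out_features=4096)
out = layer(torch.randn(8, 4096))  # a batch of 8 activation vectors
```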

AlexNet won the 2012 competition by a landslide, but the team accomplished something way more profound. The size of AI models was once and for all decoupled from what was possible to do on a single CPU or GPU. The genie was out of the bottle.

(The AlexNet source code was recently made available through the Computer History Museum.)

The balancing act

After AlexNet, using multiple GPUs to train AI became a no-brainer. Increasingly powerful AIs used tens of GPUs, then hundreds, thousands, and more. But it took some time before this trend started making its presence felt on the grid. According to an Electric Power Research Institute (EPRI) report, the power consumption of data centers was relatively flat between 2010 and 2020. That doesn’t mean the demand for data center services was flat, but the improvements in data centers’ energy efficiency were sufficient to offset the fact we were using them more.

Two key drivers of that efficiency were the increasing adoption of GPU-based computing and improvements in the energy efficiency of those GPUs. “That was really core to why Nvidia was born. We paired CPUs with accelerators to drive the efficiency onward,” said Dion Harris, head of Data Center Product Marketing at Nvidia. In the 2010–2020 period, Nvidia data center chips became roughly 15 times more efficient, which was enough to keep data center power consumption steady.

All that changed with the rise of enormous large language transformer models, starting with ChatGPT in 2022. “There was a very big jump when transformers became mainstream,” said Mosharaf Chowdhury, a professor at the University of Michigan. (Chowdhury is also at the ML Energy Initiative, a research group focusing on making AI more energy-efficient.)

Nvidia has kept up its efficiency improvements, with a ten-fold boost between 2020 and today. The company also kept improving chips that were already deployed. “A lot of where this efficiency comes from was software optimization. Only last year, we improved the overall performance of Hopper by about 5x,” Harris said. Despite these efficiency gains, based on Lawrence Berkeley National Laboratory estimates, the US saw data center power consumption shoot up from around 76 TWh in 2018 to 176 TWh in 2023.

The AI lifecycle

LLMs work with tens of billions of neurons, approaching a number that rivals, and perhaps even surpasses, the count in the human brain. GPT-4 is estimated to work with around 100 billion neurons distributed over 100 layers, and with over 100 trillion parameters that define the strength of the connections among the neurons. These parameters are set during training, when the AI is fed huge amounts of data and learns by adjusting these values. That’s followed by the inference phase, where the model gets busy processing queries coming in every day.

The training phase is a gargantuan computational effort—OpenAI supposedly used over 25,000 Nvidia A100 (Ampere-generation) GPUs running on all cylinders for 100 days. The estimated power consumption is 50 gigawatt-hours, which is enough to power a medium-sized town for a year. According to numbers released by Google, training accounts for 40 percent of the total AI model power consumption over its lifecycle. The remaining 60 percent is inference, where power consumption figures are less spectacular but add up over time.
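Those training numbers lend themselves to a quick sanity check. Assuming roughly 400 W per A100 under load and about a 2x overhead for host servers, networking, and cooling (both figures are our assumptions, not OpenAI’s), the arithmetic lands in the same ballpark as the 50 GWh estimate:

```python
# Back-of-the-envelope check of the ~50 GWh training estimate quoted above.
# Assumptions (ours, not OpenAI's): ~400 W per A100 under load, and roughly
# 2x overhead for host CPUs, networking, and data center cooling.
gpus = 25_000
watts_per_gpu = 400          # assumed average draw per A100
hours = 100 * 24             # 100 days of continuous training
overhead = 2.0               # rough multiplier for servers + cooling

gpu_energy_gwh = gpus * watts_per_gpu * hours / 1e9
total_gwh = gpu_energy_gwh * overhead
print(f"GPUs alone: {gpu_energy_gwh:.0f} GWh, with overhead: {total_gwh:.0f} GWh")
# -> GPUs alone: 24 GWh, with overhead: 48 GWh -- in the ballpark of 50 GWh.
```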

Trimming AI models down

The increasing power consumption has pushed the computer science community to think about how to keep memory and computing requirements down without sacrificing performance too much. “One way to go about it is reducing the amount of computation,” said Jae-Won Chung, a researcher at the University of Michigan and a member of the ML Energy Initiative.

One of the first things researchers tried was a technique called pruning, which aimed to reduce the number of parameters. Yann LeCun, now the chief AI scientist at Meta, proposed this approach back in 1989, terming it (somewhat menacingly) “optimal brain damage.” You take a trained model and remove some of its parameters, usually targeting the ones at or near zero, which contribute little to the overall performance. “You take a large model and distill it into a smaller model trying to preserve the quality,” Chung explained.
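As a rough illustration of the idea (not LeCun’s actual Optimal Brain Damage procedure, which uses second-derivative information to decide which weights matter least), a simple magnitude-pruning pass might look like this:

```python
# Minimal sketch of magnitude pruning: zero out the smallest-magnitude weights.
import torch

def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a copy of `weight` with the smallest `sparsity` fraction zeroed."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

w = torch.randn(512, 512)
w_pruned = prune_by_magnitude(w, sparsity=0.5)   # drop ~50% of parameters
print(f"zeros: {(w_pruned == 0).float().mean():.2%}")
```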

You can also make those remaining parameters leaner with a trick called quantization. Parameters in neural nets are usually represented as single-precision floating-point numbers, each occupying 32 bits of computer memory. “But you can change the format of parameters to a smaller one that reduces the amount of needed memory and makes the computation faster,” Chung said.

Shrinking an individual parameter has a minor effect, but when there are billions of them, it adds up. It’s also possible to do quantization-aware training, which performs quantization at the training stage. According to Nvidia, which implemented quantization-aware training in its AI model optimization toolkit, this should cut the memory requirements by 29 to 51 percent.
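For a concrete, if simplified, picture of what quantization does in practice, here is a sketch using PyTorch’s built-in dynamic quantization, which stores the weights of linear layers as 8-bit integers instead of 32-bit floats. The memory figures in the comments are illustrative, not Nvidia’s toolkit numbers:

```python
# Minimal sketch of post-training quantization in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# Dynamic quantization converts Linear-layer weights from float32 (4 bytes each)
# to int8 (1 byte each), roughly a 4x reduction in weight memory for those layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
print(model(x).shape, quantized(x).shape)  # same interface, smaller weights
```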

Pruning and quantization belong to a category of optimization techniques that rely on tweaking the way AI models work internally—how many parameters they use and how memory-intensive their storage is. These techniques are like tuning an engine in a car to make it go faster and use less fuel. But there’s another category of techniques that focus on the processes computers use to run those AI models instead of the models themselves—akin to speeding a car up by timing the traffic lights better.

Finishing first

Apart from optimizing the AI models themselves, we could also optimize the way data centers run them. Splitting the training workload evenly among 25,000 GPUs is nearly impossible, and the resulting imbalance introduces inefficiencies. “When you split the model into 100,000 GPUs, you end up slicing and dicing it in multiple dimensions, and it is very difficult to make every piece exactly the same size,” Chung said.

GPUs that have been given significantly larger workloads have increased power consumption that is not necessarily balanced out by those with smaller loads. Chung figured that if GPUs with smaller workloads ran slower, consuming much less power, they would finish roughly at the same time as GPUs processing larger workloads operating at full speed. The trick was to pace each GPU in such a way that the whole cluster would finish at the same time.

To make that happen, Chung built a software tool called Perseus that identifies the scope of the workloads assigned to each GPU in a cluster. Perseus takes the estimated time needed to complete the largest workload on a GPU running at full speed. It then estimates how much computation must be done on each of the remaining GPUs and determines what speed to run them at so they all finish at the same time. “Perseus precisely slows some of the GPUs down, and slowing down means less energy. But the end-to-end speed is the same,” Chung said.
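Perseus itself is a research tool from the ML Energy Initiative; the toy sketch below is only our illustration of the pacing idea it is built on: given uneven workloads, run every GPU just fast enough to finish when the most heavily loaded one does.

```python
# Toy illustration of the pacing idea behind Perseus (not the actual tool):
# slow down lightly loaded GPUs so everyone finishes with the straggler,
# trading unused speed for lower power draw.
def pacing_plan(workloads, max_speed=1.0):
    """workloads: compute units assigned to each GPU. Returns per-GPU speeds."""
    # The finish time is dictated by the largest workload at full speed.
    t_finish = max(workloads) / max_speed
    # Every other GPU only needs enough speed to finish at t_finish.
    return [w / t_finish for w in workloads]

speeds = pacing_plan([100, 80, 60, 95])
print(speeds)  # -> [1.0, 0.8, 0.6, 0.95]: slower GPUs, same end-to-end time
```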

The team tested Perseus by training the publicly available GPT-3, as well as other large language models and a computer vision AI. The results were promising. “Perseus could cut up to 30 percent of energy for the whole thing,” Chung said. He said the team is talking about deploying Perseus at Meta, “but it takes a long time to deploy something at a large company.”

Are all those optimizations to the models and the way data centers run them enough to keep us in the green? It takes roughly a year or two to plan and build a data center, but it can take longer than that to build a power plant. So are we winning this race or losing? It’s a bit hard to say.

Back of the envelope

As the increasing power consumption of data centers became apparent, research groups tried to quantify the problem. A Lawrence Berkeley National Laboratory team estimated that data centers’ annual energy draw in 2028 would be between 325 and 580 TWh in the US—that’s between 6.7 and 12 percent of total US electricity consumption. The International Energy Agency thinks it will be around 6 percent by 2026. Goldman Sachs Research says 8 percent by 2030, while EPRI claims between 4.6 and 9.1 percent by 2030.

EPRI also warns that the impact will be even worse because data centers tend to be concentrated at locations investors think are advantageous, like Virginia, which already sends 25 percent of its electricity to data centers. In Ireland, data centers are expected to consume one-third of the electricity produced in the entire country in the near future. And that’s just the beginning.

Running huge AI models like ChatGPT is one of the most power-intensive things that data centers do, but it accounts for roughly 12 percent of their operations, according to Nvidia. That is expected to change if companies like Google start to weave conversational LLMs into their most popular services. The EPRI report estimates that a single Google search today uses around 0.3 watt-hours of energy, while a single ChatGPT query bumps that up to 2.9 watt-hours. Based on those values, the report estimates that an AI-powered Google search would require Google to deploy 400,000 new servers that would consume 22.8 TWh per year.
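Taken at face value, those EPRI numbers imply a per-server figure that is easy to check. Dividing the projected 22.8 TWh across 400,000 servers works out to an average draw of roughly 6.5 kW per server, which is not an unreasonable figure for a machine packed with GPUs:

```python
# Sanity check on the EPRI figures quoted above: what do 400,000 servers
# drawing 22.8 TWh per year imply per server?
servers = 400_000
twh_per_year = 22.8

wh_per_server_year = twh_per_year * 1e12 / servers     # watt-hours per server
avg_kw_per_server = wh_per_server_year / 8_760 / 1e3   # 8,760 hours in a year
print(f"{avg_kw_per_server:.1f} kW average draw per server")
# -> ~6.5 kW of continuous draw per server
```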

“AI searches take 10x the electricity of a non-AI search,” Christie, the FERC commissioner, said at a FERC-organized conference. When FERC commissioners are using those numbers, you’d think there would be rock-solid science backing them up. But when Ars asked Chowdhury and Chung about their thoughts on these estimates, they exchanged looks… and smiled.

Closed AI problem

Chowdhury and Chung don’t think those numbers are particularly credible. They feel we know nothing about what’s going on inside commercial AI systems like ChatGPT or Gemini, because OpenAI and Google have never released actual power-consumption figures.

“They didn’t publish any real numbers, any academic papers. The only number, 0.3 watts per Google search, appeared in some blog post or other PR-related thingy,” Chowdhury said. We don’t know how this power consumption was measured, on what hardware, or under what conditions, he said. But at least it came directly from Google.

“When you take that 10x Google vs ChatGPT equation or whatever—one part is half-known, the other part is unknown, and then the division is done by some third party that has no relationship with Google or with OpenAI,” Chowdhury said.

Google’s “PR-related thingy” was published back in 2009, while the 2.9-watt-hours-per-ChatGPT-query figure was probably based on a comment about the number of GPUs needed to train GPT-4 made by Jensen Huang, Nvidia’s CEO, in 2024. That means the “10x AI versus non-AI search” claim was actually based on power consumption achieved on entirely different generations of hardware separated by 15 years. “But the number seemed plausible, so people keep repeating it,” Chowdhury said.

All reports we have today were done by third parties that are not affiliated with the companies building big AIs, and yet they arrive at weirdly specific numbers. “They take numbers that are just estimates, then multiply those by a whole lot of other numbers and get back with statements like ‘AI consumes more energy than Britain, or more than Africa, or something like that.’ The truth is they don’t know that,” Chowdhury said.

He argues that better numbers would require benchmarking AI models using a formal testing procedure that could be verified through the peer-review process.

As it turns out, the ML Energy Initiative defined just such a testing procedure and ran the benchmarks on any AI models they could get ahold of. The group then posted the results online on their ML.ENERGY Leaderboard.

AI-efficiency leaderboard

To get good numbers, the first thing the ML Energy Initiative got rid of was the idea of estimating how power-hungry GPU chips are by using their thermal design power (TDP), which is basically their maximum power consumption. Using TDP was a bit like rating a car’s efficiency based on how much fuel it burned running at full speed. That’s not how people usually drive, and that’s not how GPUs work when running AI models. So Chung built ZeusMonitor, an all-in-one solution that measured GPU power consumption on the fly.
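ZeusMonitor’s exact internals aside, the basic idea of sampling real power draw instead of assuming TDP can be illustrated with Nvidia’s NVML interface via the pynvml bindings. The sketch below simply polls the first GPU’s instantaneous draw while a workload runs; it is an illustration of the measurement principle, not the ML Energy Initiative’s tool.

```python
# Rough illustration of measuring GPU power on the fly with NVML
# (ZeusMonitor is more sophisticated; this just shows the basic idea
# of sampling real draw instead of assuming the spec-sheet TDP).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

samples = []
for _ in range(100):                      # sample for ~10 seconds
    milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)
    samples.append(milliwatts / 1000.0)   # convert to watts
    time.sleep(0.1)

pynvml.nvmlShutdown()
print(f"average draw: {sum(samples) / len(samples):.0f} W (vs. TDP on the spec sheet)")
```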

For the tests, his team used setups with Nvidia’s A100 and H100 GPUs, the ones most commonly used at data centers today, and measured how much energy they used running various large language models (LLMs), diffusion models that generate pictures or videos based on text input, and many other types of AI systems.

The largest LLM included in the leaderboard was Meta’s Llama 3.1 405B, an open-source chat-based AI with 405 billion parameters. It consumed 3352.92 joules of energy per request running on two H100 GPUs. That’s around 0.93 watt-hours—significantly less than 2.9 watt-hours quoted for ChatGPT queries. These measurements confirmed the improvements in the energy efficiency of hardware. Mixtral 8x22B was the largest LLM the team managed to run on both Ampere and Hopper platforms. Running the model on two Ampere GPUs resulted in 0.32 watt-hours per request, compared to just 0.15 watt-hours on one Hopper GPU.

What remains unknown, however, is the performance of proprietary models like GPT-4, Gemini, or Grok. The ML Energy Initiative team says it’s very hard for the research community to start coming up with solutions to the energy efficiency problems when we don’t even know what exactly we’re facing. We can make estimates, but Chung insists they need to be accompanied by error-bound analysis. We don’t have anything like that today.

The most pressing issue, according to Chung and Chowdhury, is the lack of transparency. “Companies like Google or Open AI have no incentive to talk about power consumption. If anything, releasing actual numbers would harm them,” Chowdhury said. “But people should understand what is actually happening, so maybe we should somehow coax them into releasing some of those numbers.”

Where rubber meets the road

“Energy efficiency in data centers follows a trend similar to Moore’s law—only working at a very large scale, instead of on a single chip,” Nvidia’s Harris said. The power consumption per rack, a unit used in data centers that typically houses between 10 and 14 Nvidia GPUs, is going up, he said, but the performance per watt is getting better.

“When you consider all the innovations going on in software optimization, cooling systems, MEP (mechanical, electrical, and plumbing), and GPUs themselves, we have a lot of headroom,” Harris said. He expects this large-scale variant of Moore’s law to keep going for quite some time, even without any radical changes in technology.

There are also more revolutionary technologies looming on the horizon. The idea that drove companies like Nvidia to their current market status was the concept that you could offload certain tasks from the CPU to dedicated, purpose-built hardware. But now, even GPUs will probably use their own accelerators in the future. Neural nets and other parallel computation tasks could be implemented on photonic chips that use light instead of electrons to process information. Photonic computing devices are orders of magnitude more energy-efficient than the GPUs we have today and can run neural networks literally at the speed of light.

Another innovation to look forward to is 2D semiconductors, which enable building incredibly small transistors and stacking them vertically, vastly improving the computation density possible within a given chip area. “We are looking at a lot of these technologies, trying to assess where we can take them,” Harris said. “But where rubber really meets the road is how you deploy them at scale. It’s probably a bit early to say where the future bang for buck will be.”

The problem is that when we make a resource more efficient, we simply end up using it more. It’s the Jevons paradox, known since the beginning of the industrial age. But will AI energy consumption increase so much that it causes an apocalypse? Chung doesn’t think so. According to Chowdhury, if we run out of energy to power up our progress, we will simply slow down.

“But people have always been very good at finding the way,” Chowdhury added.


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.


Why Anthropic’s Claude still hasn’t beaten Pokémon


Weeks later, Sonnet’s “reasoning” model is struggling with a game designed for children.

Gotta subsume ’em all into the machine consciousness! Credit: Aurich Lawson

In recent months, the AI industry’s biggest boosters have started converging on a public expectation that we’re on the verge of “artificial general intelligence” (AGI)—virtual agents that can match or surpass “human-level” understanding and performance on most cognitive tasks.

OpenAI is quietly seeding expectations for a “PhD-level” AI agent that could operate autonomously at the level of a “high-income knowledge worker” in the near future. Elon Musk says that “we’ll have AI smarter than any one human probably” by the end of 2025. Anthropic CEO Dario Amodei thinks it might take a bit longer but similarly says it’s plausible that AI will be “better than humans at almost everything” by the end of 2027.

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem.

Can Claude play Pokémon?

A thread: pic.twitter.com/K8SkNXCxYJ

— Anthropic (@AnthropicAI) February 25, 2025

Last month, Anthropic presented its “Claude Plays Pokémon” experiment as a waypoint on the road to that predicted AGI future. It’s a project the company said shows “glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning.” Anthropic made headlines by trumpeting how Claude 3.7 Sonnet’s “improved reasoning capabilities” let the company’s latest model make progress in the popular old-school Game Boy RPG in ways “that older models had little hope of achieving.”

While Claude models from just a year ago struggled even to leave the game’s opening area, Claude 3.7 Sonnet was able to make progress by collecting multiple in-game Gym Badges in a relatively small number of in-game actions. That breakthrough, Anthropic wrote, was because the “extended thinking” by Claude 3.7 Sonnet means the new model “plans ahead, remembers its objectives, and adapts when initial strategies fail” in a way that its predecessors didn’t. Those things, Anthropic brags, are “critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too.”

Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones. Credit: Anthropic

But relative success over previous models is not the same as absolute success over the game in its entirety. In the weeks since Claude Plays Pokémon was first made public, thousands of Twitch viewers have watched Claude struggle to make consistent progress in the game. Despite long “thinking” pauses between each move—during which viewers can read printouts of the system’s simulated reasoning process—Claude frequently finds itself pointlessly revisiting completed towns, getting stuck in blind corners of the map for extended periods, or fruitlessly talking to the same unhelpful NPC over and over, to cite just a few examples of distinctly sub-human in-game performance.

Watching Claude continue to struggle at a game designed for children, it’s hard to imagine we’re witnessing the genesis of some sort of computer superintelligence. But even Claude’s current sub-human level of Pokémon performance could hold significant lessons for the quest toward generalized, human-level artificial intelligence.

Smart in different ways

In some sense, it’s impressive that Claude can play Pokémon with any facility at all. When developing AI systems that find dominant strategies in games like Go and Dota 2, engineers generally start their algorithms off with deep knowledge of a game’s rules and/or basic strategies, as well as a reward function to guide them toward better performance. For Claude Plays Pokémon, though, project developer and Anthropic employee David Hershey says he started with an unmodified, generalized Claude model that wasn’t specifically trained or tuned to play Pokémon games in any way.

“This is purely the various other things that [Claude] understands about the world being used to point at video games,” Hershey told Ars. “So it has a sense of a Pokémon. If you go to claude.ai and ask about Pokémon, it knows what Pokémon is based on what it’s read… If you ask, it’ll tell you there’s eight gym badges, it’ll tell you the first one is Brock… it knows the broad structure.”

A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon (click through to zoom in). Credit: Anthropic / Excalidraw

In addition to directly monitoring certain key (emulated) Game Boy RAM addresses for game state information, Claude views and interprets the game’s visual output much like a human would. But despite recent advances in AI image processing, Hershey said Claude still struggles to interpret the low-resolution, pixelated world of a Game Boy screenshot as well as a human can. “Claude’s still not particularly good at understanding what’s on the screen at all,” he said. “You will see it attempt to walk into walls all the time.”

Hershey said he suspects Claude’s training data probably doesn’t contain many overly detailed text descriptions of “stuff that looks like a Game Boy screen.” This means that, somewhat surprisingly, if Claude were playing a game with “more realistic imagery, I think Claude would actually be able to see a lot better,” Hershey said.

“It’s one of those funny things about humans that we can squint at these eight-by-eight pixel blobs of people and say, ‘That’s a girl with blue hair,’” Hershey continued. “People, I think, have that ability to map from our real world to understand and sort of grok that… so I’m honestly kind of surprised that Claude’s as good as it is at being able to see there’s a person on the screen.”

Even with a perfect understanding of what it’s seeing on-screen, though, Hershey said Claude would still struggle with 2D navigation challenges that would be trivial for a human. “It’s pretty easy for me to understand that [an in-game] building is a building and that I can’t walk through a building,” Hershey said. “And that’s [something] that’s pretty challenging for Claude to understand… It’s funny because it’s just kind of smart in different ways, you know?”

A sample Pokémon screen with an overlay showing how Claude characterizes the game’s grid-based map. Credit: Anthropic / X

Where Claude tends to perform better, Hershey said, is in the more text-based portions of the game. During an in-game battle, Claude will readily notice when the game tells it that an attack from an electric-type Pokémon is “not very effective” against a rock-type opponent, for instance. Claude will then squirrel that factoid away in a massive written knowledge base for future reference later in the run. Claude can also integrate multiple pieces of similar knowledge into pretty elegant battle strategies, even extending those strategies into long-term plans for catching and managing teams of multiple creatures for future battles.

Claude can even show surprising “intelligence” when Pokémon’s in-game text is intentionally misleading or incomplete. “It’s pretty funny that they tell you you need to go find Professor Oak next door and then he’s not there,” Hershey said of an early-game task. “As a 5-year-old, that was very confusing to me. But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn’t find [Oak], says, ‘I need to figure something out’… It’s sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too.”

A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle. Credit: Claude Plays Pokemon / Twitch

These kinds of relative strengths and weaknesses when compared to “human-level” play reflect the overall state of AI research and capabilities in general, Hershey said. “I think it’s just a sort of universal thing about these models… We built the text side of it first, and the text side is definitely… more powerful. How these models can reason about images is getting better, but I think it’s a decent bit behind.”

Forget me not

Beyond issues parsing text and images, Hershey also acknowledged that Claude can have trouble “remembering” what it has already learned. The current model has a “context window” of 200,000 tokens, limiting the amount of relational information it can store in its “memory” at any one time. When the system’s ever-expanding knowledge base fills up this context window, Claude goes through an elaborate summarization process, condensing detailed notes on what it has seen, done, and learned so far into shorter text summaries that lose some of the fine-grained details.

This can mean that Claude “has a hard time keeping track of things for a very long time and really having a great sense of what it’s tried so far,” Hershey said. “You will definitely see it occasionally delete something that it shouldn’t have. Anything that’s not in your knowledge base or not in your summary is going to be gone, so you have to think about what you want to put there.”
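Anthropic hasn’t published its summarization logic, but the general pattern described here, keeping detailed notes until a token budget fills up and then condensing the oldest ones, can be sketched as follows. The count_tokens and summarize callables are hypothetical placeholders for model-specific calls, and the budgets are assumptions, not Anthropic’s values.

```python
# Generic sketch of the "summarize when the context fills up" pattern described
# above. `count_tokens` and `summarize` are hypothetical placeholders.
CONTEXT_LIMIT = 200_000      # tokens, per the article
SUMMARY_BUDGET = 20_000      # room to leave after compaction (assumed)

def maybe_compact(knowledge_base: list[str], count_tokens, summarize) -> list[str]:
    total = sum(count_tokens(note) for note in knowledge_base)
    if total < CONTEXT_LIMIT:
        return knowledge_base            # plenty of room, keep detailed notes
    # Condense the oldest notes into one short summary; fine-grained details
    # (exact coordinates, dialogue seen, failed attempts) are inevitably lost.
    old, recent = knowledge_base[:-10], knowledge_base[-10:]
    return [summarize("\n".join(old), max_tokens=SUMMARY_BUDGET)] + recent
```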

A small window into the kind of “cleaning up my context” knowledge-base update necessitated by Claude’s limited “memory.” Credit: Claude Plays Pokemon / Twitch

More than forgetting important history, though, Claude runs into bigger problems when it inadvertently inserts incorrect information into its knowledge base. Like a conspiracy theorist who builds an entire worldview from an inherently flawed premise, Claude can be incredibly slow to recognize when an error in its self-authored knowledge base is leading its Pokémon play astray.

“The things that are written down in the past, it sort of trusts pretty blindly,” Hershey said. “I have seen it become very convinced that it found the exit to [in-game location] Viridian Forest at some specific coordinates, and then it spends hours and hours exploring a little small square around those coordinates that are wrong instead of doing anything else. It takes a very long time for it to decide that that was a ‘fail.’”

Still, Hershey said Claude 3.7 Sonnet is much better than earlier models at eventually “questioning its assumptions, trying new strategies, and keeping track over long horizons of various strategies to [see] whether they work or not.” While the new model will still “struggle for really long periods of time” retrying the same thing over and over, it will ultimately tend to “get a sense of what’s going on and what it’s tried before, and it stumbles a lot of times into actual progress from that,” Hershey said.

“We’re getting pretty close…”

One of the most interesting things about observing Claude Plays Pokémon across multiple iterations and restarts, Hershey said, is seeing how the system’s progress and strategy can vary quite a bit between runs. Sometimes Claude will show it’s “capable of actually building a pretty coherent strategy” by “keeping detailed notes about the different paths to try,” for instance, he said. But “most of the time it doesn’t… most of the time, it wanders into the wall because it’s confident it sees the exit.”

Where previous models wandered aimlessly or got stuck in loops, Claude 3.7 Sonnet plans ahead, remembers its objectives, and adapts when initial strategies fail.

Critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too. pic.twitter.com/scvISp14XG

— Anthropic (@AnthropicAI) February 25, 2025

One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, saying that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon. Credit: Claude Plays Pokemon / Twitch

Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

But Hershey is still impressed at the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”


Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.


Cloudflare turns AI against itself with endless maze of irrelevant facts

On Wednesday, web infrastructure provider Cloudflare announced a new feature called “AI Labyrinth” that aims to combat unauthorized AI data scraping by serving fake AI-generated content to bots. The tool will attempt to thwart AI companies that crawl websites without permission to collect training data for large language models that power AI assistants like ChatGPT.

Cloudflare, founded in 2009, is probably best known as a company that provides infrastructure and security services for websites, particularly protection against distributed denial-of-service (DDoS) attacks and other malicious traffic.

Instead of simply blocking bots, Cloudflare’s new system lures them into a “maze” of realistic-looking but irrelevant pages, wasting the crawler’s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler’s operators that they’ve been detected.

“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” writes Cloudflare. “But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.”

The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven). Cloudflare creates this content using its Workers AI service, a commercial platform that runs AI tasks.

Cloudflare designed the trap pages and links to remain invisible and inaccessible to regular visitors, so people browsing the web don’t run into them by accident.

A smarter honeypot

AI Labyrinth functions as what Cloudflare calls a “next-generation honeypot.” Traditional honeypots are invisible links that human visitors can’t see but bots parsing HTML code might follow. But Cloudflare says modern bots have become adept at spotting these simple traps, necessitating more sophisticated deception. The false links contain appropriate meta directives to prevent search engine indexing while remaining attractive to data-scraping bots.


Anthropic’s new AI search feature digs through the web for answers

Caution over citations and sources

Claude users should be warned that large language models (LLMs) like those that power Claude are notorious for sneaking in plausible-sounding confabulated sources. A recent survey of citation accuracy by LLM-based web search assistants showed a 60 percent error rate. That particular study did not include Anthropic’s new search feature because it took place before this current release.

When using web search, Claude provides citations for information it includes from online sources, ostensibly helping users verify facts. From our informal and unscientific testing, Claude’s search results appeared fairly accurate and detailed at a glance, but that is no guarantee of overall accuracy. Anthropic did not release any search accuracy benchmarks, so independent researchers will likely examine that over time.

A screenshot example of what Anthropic Claude’s web search citations look like, captured March 21, 2025. Credit: Benj Edwards

Even if Claude search were, say, 99 percent accurate (a number we are making up as an illustration), the 1 percent chance it is wrong may come back to haunt you later if you trust it blindly. Before accepting any source of information delivered by Claude (or any AI assistant) for any meaningful purpose, vet it very carefully using multiple independent non-AI sources.

A partnership with Brave under the hood

Behind the scenes, it looks like Anthropic partnered with Brave Search to power the search feature. The company behind it, Brave Software, is perhaps best known for its web browser. Brave Search markets itself as a “private search engine,” which feels in line with how Anthropic likes to market itself as an ethical alternative to Big Tech products.

Simon Willison discovered the connection between Anthropic and Brave through Anthropic’s subprocessor list (a list of third-party services that Anthropic uses for data processing), which added Brave Search on March 19.

He further demonstrated the connection on his blog by asking Claude to search for pelican facts. He wrote, “It ran a search for ‘Interesting pelican facts’ and the ten results it showed as citations were an exact match for that search on Brave.” He also found evidence in Claude’s own outputs, which referenced “BraveSearchParams” properties.

The Brave engine under the hood has implications for individuals, organizations, or companies that might want to block Claude from accessing their sites since, presumably, Brave’s web crawler is doing the web indexing. Anthropic did not mention how sites or companies could opt out of the feature. We have reached out to Anthropic for clarification.


Apple reportedly planning executive shake-up to address Siri delays

The Vision Pro was not exactly a smash hit for Apple, but no one expected a $3,500 VR headset to have the same impact as the iPhone. However, the Vision Pro did what it was supposed to do, and there is apparently a feeling inside the company that Rockwell knows how to leverage his technical expertise to get products out the door. The effort to release the Vision Pro involved years of work with a large team of engineers and designers, and several of the key advances required for its completion involved artificial intelligence.

Credit: Apple

Apple’s work on Siri will remain under the ultimate purview of Craig Federighi, the senior vice president of software engineering. He’s responsible for all development work on iOS, iPadOS, and macOS. He was also deeply involved with the launch of Apple Intelligence alongside Giannandrea.

While one of his primary projects is being reassigned, Giannandrea will reportedly remain at the company for now. However, Apple may simply want him around for the optics. The abrupt departure of a senior AI figure during the troubled rollout of Apple Intelligence, which is now enabled by default, could further affect confidence in the company’s AI efforts.

For good or ill, generative AI features are key to the strategy at most large technology firms. Apple aggressively advertised Apple Intelligence during the iPhone 16 launch. It also cited the AI-enhanced Siri as a selling point, making the recent delay all the more awkward. Even if this shake-up gets Siri back on track, the late-to-arrive feature will be under intense scrutiny when it does finally show up.


Study finds AI-generated meme captions funnier than human ones on average

It’s worth clarifying that AI models did not generate the images used in the study. Instead, researchers used popular, pre-existing meme templates, and GPT-4o or human participants generated captions for them.

More memes, not better memes

When crowdsourced participants rated the memes, those created entirely by AI models scored higher on average in humor, creativity, and shareability. The researchers defined shareability as a meme’s potential to be widely circulated, influenced by humor, relatability, and relevance to current cultural topics. They note that this study is among the first to show AI-generated memes outperforming human-created ones across these metrics.

However, the study comes with an important caveat. On average, fully AI-generated memes scored higher than those created by humans alone or humans collaborating with AI. But when researchers looked at the best individual memes, humans created the funniest examples, and human-AI collaborations produced the most creative and shareable memes. In other words, AI models consistently produced broadly appealing memes, but humans—with or without AI help—still made the most exceptional individual examples.

Diagrams of meme creation and evaluation workflows taken from the paper. Credit: Wu et al.

The study also found that participants using AI assistance generated significantly more meme ideas and described the process as easier and requiring less effort. Despite this productivity boost, human-AI collaborative memes did not rate higher on average than memes humans created alone. As the researchers put it, “The increased productivity of human-AI teams does not lead to better results—just to more results.”

Participants who used AI assistance reported feeling slightly less ownership over their creations compared to solo creators. Given that a sense of ownership influenced creative motivation and satisfaction in the study, the researchers suggest that people interested in using AI should carefully consider how to balance AI assistance in creative tasks.


Nvidia announces DGX desktop “personal AI supercomputers”

During Tuesday’s Nvidia GTC keynote, CEO Jensen Huang unveiled two “personal AI supercomputers” called DGX Spark and DGX Station, both powered by the Grace Blackwell platform. In a way, they are a new type of AI PC architecture specifically built for running neural networks, and five major PC manufacturers will build the supercomputers.

These desktop systems, first previewed as “Project DIGITS” in January, aim to bring AI capabilities to developers, researchers, and data scientists who need to prototype, fine-tune, and run large AI models locally. DGX systems can serve as standalone desktop AI labs or “bridge systems” that allow AI developers to move their models from desktops to DGX Cloud or any AI cloud infrastructure with few code changes.

Huang explained the rationale behind these new products in a news release, saying, “AI has transformed every layer of the computing stack. It stands to reason a new class of computers would emerge—designed for AI-native developers and to run AI-native applications.”

The smaller DGX Spark features the GB10 Grace Blackwell Superchip with Blackwell GPU and fifth-generation Tensor Cores, delivering up to 1,000 trillion operations per second for AI.

Meanwhile, the more powerful DGX Station includes the GB300 Grace Blackwell Ultra Desktop Superchip with 784GB of coherent memory and the ConnectX-8 SuperNIC supporting networking speeds up to 800Gb/s.

The DGX architecture serves as a prototype that other manufacturers can produce. Asus, Dell, HP, and Lenovo will develop and sell both DGX systems, with DGX Spark reservations opening today and DGX Station expected later in 2025. Additional manufacturing partners for the DGX Station include BOXX, Lambda, and Supermicro, with systems expected to be available later this year.

Since the systems will be manufactured by different companies, Nvidia did not mention pricing for the units. However, in January, Nvidia mentioned that the base-level configuration for a DGX Spark-like computer would retail for around $3,000.


Nvidia announces “Rubin Ultra” and “Feynman” AI chips for 2027 and 2028

On Tuesday at Nvidia’s GTC 2025 conference in San Jose, California, CEO Jensen Huang revealed several new AI-accelerating GPUs the company plans to release over the coming months and years. He also revealed more specifications about previously announced chips.

The centerpiece announcement was Vera Rubin, first teased at Computex 2024 and now scheduled for release in the second half of 2026. This GPU, named after a famous astronomer, will feature tens of terabytes of memory and comes with a custom Nvidia-designed CPU called Vera.

According to Nvidia, Vera Rubin will deliver significant performance improvements over its predecessor, Grace Blackwell, particularly for AI training and inference.

Specifications for Vera Rubin, presented by Jensen Huang during his GTC 2025 keynote.

Vera Rubin features two GPUs together on one die that deliver 50 petaflops of FP4 inference performance per chip. When configured in a full NVL144 rack, the system delivers 3.6 exaflops of FP4 inference compute—3.3 times more than Blackwell Ultra’s 1.1 exaflops in a similar rack configuration.

The Vera CPU features 88 custom ARM cores with 176 threads connected to Rubin GPUs via a high-speed 1.8 TB/s NVLink interface.

Huang also announced Rubin Ultra, which will follow in the second half of 2027. Rubin Ultra will use the NVL576 rack configuration and feature individual GPUs with four reticle-sized dies, delivering 100 petaflops of FP4 precision (a 4-bit floating-point format used for representing and processing numbers within AI models) per chip.

At the rack level, Rubin Ultra will provide 15 exaflops of FP4 inference compute and 5 exaflops of FP8 training performance—about four times more powerful than the Rubin NVL144 configuration. Each Rubin Ultra GPU will include 1TB of HBM4e memory, with the complete rack containing 365TB of fast memory.


Gemini gets new coding and writing tools, plus AI-generated “podcasts”

On the heels of its release of new Gemini models last week, Google has announced a pair of new features for its flagship AI product. Starting today, Gemini has a new Canvas feature that lets you draft, edit, and refine documents or code. Gemini is also getting Audio Overviews, a neat capability that first appeared in the company’s NotebookLM product, but it’s getting even more useful as part of Gemini.

Canvas is similar (confusingly) to the OpenAI product of the same name. Canvas is available in the Gemini prompt bar on the web and mobile app. Simply upload a document and tell Gemini what you need to do with it. In Google’s example, the user asks for a speech based on a PDF containing class notes. And just like that, Gemini spits out a document.

Canvas lets you refine the AI-generated documents right inside Gemini. The writing tools available across the Google ecosystem, with options like suggested edits and different tones, are available inside the Gemini-based editor. If you want to do more edits or collaborate with others, you can export the document to Google Docs with a single click.

Gemini Canvas with tic-tac-toe game. Credit: Google

Canvas is also adept at coding. Just ask, and Canvas can generate prototype web apps, Python scripts, HTML, and more. You can ask Gemini about the code, make alterations, and even preview your results in real time inside Gemini as you (or the AI) make changes.


Farewell Photoshop? Google’s new AI lets you edit images by asking.


New AI allows no-skill photo editing, including adding objects and removing watermarks.

A collection of images either generated or modified by Gemini 2.0 Flash (Image Generation) Experimental. Credit: Google / Ars Technica

There’s a new Google AI model in town, and it can generate or edit images as easily as it can create text—as part of its chatbot conversation. The results aren’t perfect, but it’s quite possible everyone in the near future will be able to manipulate images this way.

Last Wednesday, Google expanded access to Gemini 2.0 Flash’s native image-generation capabilities, making the experimental feature available to anyone using Google AI Studio. Previously limited to testers since December, the multimodal technology integrates both native text and image processing capabilities into one AI model.

The new model, titled “Gemini 2.0 Flash (Image Generation) Experimental,” flew somewhat under the radar last week, but it has been garnering more attention over the past few days due to its ability to remove watermarks from images, albeit with artifacts and a reduction in image quality.

That’s not the only trick. Gemini 2.0 Flash can add objects, remove objects, modify scenery, change lighting, attempt to change image angles, zoom in or out, and perform other transformations—all to varying levels of success depending on the subject matter, style, and image in question.

To pull it off, Google trained Gemini 2.0 on a large dataset of images (converted into tokens) and text. The model’s “knowledge” about images occupies the same neural network space as its knowledge about world concepts from text sources, so it can directly output image tokens that get converted back into images and fed to the user.

Adding a water-skiing barbarian to a photograph with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Incorporating image generation into an AI chat isn’t itself new—OpenAI integrated its image-generator DALL-E 3 into ChatGPT last September, and other tech companies like xAI followed suit. But until now, every one of those AI chat assistants called on a separate diffusion-based AI model (which uses a different synthesis principle than LLMs) to generate images, which were then returned to the user within the chat interface. In this case, Gemini 2.0 Flash is both the large language model (LLM) and AI image generator rolled into one system.

Interestingly, OpenAI’s GPT-4o is capable of native image output as well (and OpenAI President Greg Brockman teased the feature at one point on X last year), but that company has yet to release true multimodal image output capability. One possible reason is that true multimodal image output is very computationally expensive, since each image, whether inputted or generated, is composed of tokens that become part of the context that runs through the image model again and again with each successive prompt. And given the compute needs and size of the training data required to create a truly visually comprehensive multimodal model, the output quality of the images isn’t necessarily as good as diffusion models just yet.

Creating another angle of a person with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Another reason OpenAI has held back may be “safety”-related: In a similar way to how multimodal models trained on audio can absorb a short clip of a sample person’s voice and then imitate it flawlessly (this is how ChatGPT’s Advanced Voice Mode works, with a clip of a voice actor it is authorized to imitate), multimodal image output models are capable of faking media reality in a relatively effortless and convincing way, given proper training data and compute behind it. With a good enough multimodal model, potentially life-wrecking deepfakes and photo manipulations could become even more trivial to produce than they are now.

Putting it to the test

So, what exactly can Gemini 2.0 Flash do? Notably, its support for conversational image editing allows users to iteratively refine images through natural language dialogue across multiple successive prompts. You can talk to it and tell it what you want to add, remove, or change. It’s imperfect, but it’s the beginning of a new type of native image editing capability in the tech world.

We gave Gemini Flash 2.0 a battery of informal AI image-editing tests, and you’ll see the results below. For example, we removed a rabbit from an image in a grassy yard. We also removed a chicken from a messy garage. Gemini fills in the background with its best guess. No need for a clone brush—watch out, Photoshop!

We also tried adding synthesized objects to images. Ever wary of the collapse of media reality (the so-called “cultural singularity”), we added a UFO to a photo the author took from an airplane window. Then we tried adding a Sasquatch and a ghost. The results were unrealistic, but this model was also trained on a limited image dataset (more on that below).

Adding a UFO to a photograph with Gemini 2.0 Flash. Credit: Google / Benj Edwards

We then added a video game character to a photo of an Atari 800 screen (running Wizard of Wor), which produced perhaps the most realistic synthesis result of the set. It may be hard to see at this size, but Gemini added realistic CRT scanlines that matched the monitor’s characteristics pretty well.

Adding a monster to an Atari video game with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Gemini can also warp an image in novel ways, like “zooming out” of an image into a fictional setting or giving an EGA-palette character a body, then sticking him into an adventure game.

“Zooming out” on an image with Gemini 2.0 Flash. Credit: Google / Benj Edwards

And yes, you can remove watermarks. We tried removing a watermark from a Getty Images photo, and it worked, although the result is nowhere near the resolution or detail quality of the original. Ultimately, if your brain can picture what an image is like without a watermark, so can an AI model. It fills in the watermark space with the most plausible result based on its training data.

Removing a watermark with Gemini 2.0 Flash. Credit: Nomadsoul1 via Getty Images

And finally, since we know you’ve likely missed seeing barbarians beside TV sets (as per tradition), we gave that a shot. The original barbarian image didn’t include a CRT TV set, so we asked Gemini to add one.

Adding a TV set to a barbarian image with Gemini 2.0 Flash. Credit: Google / Benj Edwards

Then we set the TV on fire.

Setting the TV set on fire with Gemini 2.0 Flash. Credit: Google / Benj Edwards

All in all, it doesn’t produce images of pristine quality or detail, but we literally did no editing work on these images other than typing requests. Adobe Photoshop’s “Generative Fill” already lets users manipulate images with AI synthesis based on written prompts, but it’s not quite as natural as this. We could see Adobe adding a more conversational AI image-editing flow like this one in the future.

Multimodal output opens up new possibilities

Having true multimodal output opens up interesting new possibilities in chatbots. For example, Gemini 2.0 Flash can play interactive graphical games or generate stories with consistent illustrations, maintaining character and setting continuity across multiple images. It’s far from perfect, but character consistency is a new capability for AI assistants. We tried it out, and it was pretty wild—especially when it generated a view of a photo we provided from a different angle.

Creating a multi-image story with Gemini 2.0 Flash, part 1. Credit: Google / Benj Edwards

Text rendering represents another potential strength of the model. Google claims that internal benchmarks show Gemini 2.0 Flash performs better than “leading competitive models” when generating images that contain text, making it potentially suitable for creating content with integrated text. In our experience, the results weren’t that exciting, but they were legible.

An example of in-image text rendering generated with Gemini 2.0 Flash. Credit: Google / Ars Technica

Despite Gemini 2.0 Flash’s shortcomings so far, the emergence of true multimodal image output feels like a notable moment in AI history because of what it suggests if the technology continues to improve. If you imagine a future, say 10 years from now, where a sufficiently complex AI model could generate any type of media in real time—text, images, audio, video, 3D graphics, 3D-printed physical objects, and interactive experiences—you basically have a holodeck, but without the matter replication.

Coming back to reality, it’s still “early days” for multimodal image output, and Google recognizes that. Recall that Gemini 2.0 Flash is intended to be a smaller AI model that is faster and cheaper to run, so it hasn’t absorbed the entire breadth of the Internet. All that information takes up a lot of space in terms of parameter count, and more parameters mean more compute. Instead, Google trained Gemini 2.0 Flash on a curated dataset that likely also included targeted synthetic data. As a result, the model does not “know” everything visual about the world, and Google itself says the training data is “broad and general, not absolute or complete.”

That’s just a fancy way of saying that the image output quality isn’t perfect—yet. But there is plenty of room to incorporate more visual “knowledge” as training techniques advance and compute drops in cost. If the process follows anything like the trajectory we’ve seen with diffusion-based AI image generators such as Stable Diffusion, Midjourney, and Flux, multimodal image output quality may improve rapidly over a short period of time. Get ready for a completely fluid media reality.

Benj Edwards is Ars Technica’s Senior AI Reporter, and he founded the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Farewell Photoshop? Google’s new AI lets you edit images by asking. Read More »

google-joins-openai-in-pushing-feds-to-codify-ai-training-as-fair-use

Google joins OpenAI in pushing feds to codify AI training as fair use

Google’s position on AI regulation: Trust us, bro

If there was any doubt about Google’s commitment to move fast and break things, its new policy position should put that to rest. “For too long, AI policymaking has paid disproportionate attention to the risks,” the document says.

Google urges the US to invest in AI not only with money but with business-friendly legislation. The company joins the growing chorus of AI firms calling for federal legislation that clarifies how they can operate. It points to the difficulty of complying with a “patchwork” of state-level laws that impose restrictions on AI development and use. If you want to know what keeps Google’s policy wonks up at night, look no further than the vetoed SB-1047 bill in California, which would have enforced AI safety measures.

AI ethics or AI law concept: developing AI codes of ethics. Compliance, regulation, standards, business policy, and responsibility for guarding against unintended bias in machine learning algorithms.

Credit: Parradee Kietsirikul

According to Google, a national AI framework that supports innovation is necessary to push the boundaries of what artificial intelligence can do. Taking a page from the gun lobby, Google opposes attempts to hold the creators of AI liable for the way those models are used. Generative AI systems are non-deterministic, making it impossible to fully predict their output. Google wants clearly defined responsibilities for AI developers, deployers, and end users—it would, however, clearly prefer most of those responsibilities fall on others. “In many instances, the original developer of an AI model has little to no visibility or control over how it is being used by a deployer and may not interact with end users,” the company says.

Efforts are underway in some countries to implement stringent regulations that would force companies like Google to make their tools more transparent. For example, the EU’s AI Act would require AI firms to publish an overview of training data and the possible risks associated with their products. Google believes this would force the disclosure of trade secrets and allow foreign adversaries to more easily duplicate its work, mirroring concerns that OpenAI expressed in its own policy proposal.

Google wants the government to push back on these efforts at the diplomatic level. The company would like to be able to release AI products around the world, and the best way to ensure it has that option is to promote light-touch regulation that “reflects US values and approaches.” That is, Google’s values and approaches.

Google joins OpenAI in pushing feds to codify AI training as fair use Read More »