machine learning

From sci-fi to state law: California’s plan to prevent AI catastrophe

Adventures in AI regulation —

Critics say SB-1047, proposed by “AI doomers,” could slow innovation and stifle open source AI.

The California State Capitol Building in Sacramento.

California’s “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” (a.k.a. SB-1047) has led to a flurry of headlines and debate concerning the overall “safety” of large artificial intelligence models. But critics are concerned that the bill’s overblown focus on existential threats by future AI models could severely limit research and development for more prosaic, non-threatening AI uses today.

SB-1047, introduced by State Senator Scott Wiener, passed the California Senate in May with a 32-1 vote and seems well positioned for a final vote in the State Assembly in August. The text of the bill requires companies behind sufficiently large AI models (currently set at $100 million in training costs and the rough computing power implied by those costs today) to put testing procedures and systems in place to prevent and respond to “safety incidents.”

The bill lays out a legalistic definition of those safety incidents that in turn focuses on defining a set of “critical harms” that an AI system might enable. That includes harms leading to “mass casualties or at least $500 million of damage,” such as “the creation or use of chemical, biological, radiological, or nuclear weapon” (hello, Skynet?) or “precise instructions for conducting a cyberattack… on critical infrastructure.” The bill also alludes to “other grave harms to public safety and security that are of comparable severity” to those laid out explicitly.

An AI model’s creator can’t be held liable for harm caused through the sharing of “publicly accessible” information from outside the model—simply asking an LLM to summarize The Anarchist’s Cookbook probably wouldn’t put it in violation of the law, for instance. Instead, the bill seems most concerned with future AIs that could come up with “novel threats to public safety and security.” More than a human using an AI to brainstorm harmful ideas, SB-1047 focuses on the idea of an AI “autonomously engaging in behavior other than at the request of a user” while acting “with limited human oversight, intervention, or supervision.”

Would California’s new bill have stopped WOPR?

To prevent this straight-out-of-science-fiction eventuality, anyone training a sufficiently large model must “implement the capability to promptly enact a full shutdown” and have policies in place for when such a shutdown would be enacted, among other precautions and tests. The bill also focuses at points on AI actions that would require “intent, recklessness, or gross negligence” if performed by a human, suggesting a degree of agency that does not exist in today’s large language models.

Attack of the killer AI?

This kind of language in the bill likely reflects the particular fears of its original drafter, Center for AI Safety (CAIS) co-founder Dan Hendrycks. In a 2023 Time Magazine piece, Hendrycks makes the maximalist existential argument that “evolutionary pressures will likely ingrain AIs with behaviors that promote self-preservation” and lead to “a pathway toward being supplanted as the earth’s dominant species.”

If Hendrycks is right, then legislation like SB-1047 seems like a common-sense precaution—indeed, it might not go far enough. Supporters of the bill, including AI luminaries Geoffrey Hinton and Yoshua Bengio, agree with Hendrycks’ assertion that the bill is a necessary step to prevent potential catastrophic harm from advanced AI systems.

“AI systems beyond a certain level of capability can pose meaningful risks to democracies and public safety,” wrote Bengio in an endorsement of the bill. “Therefore, they should be properly tested and subject to appropriate safety measures. This bill offers a practical approach to accomplishing this, and is a major step toward the requirements that I’ve recommended to legislators.”

However, critics argue that AI policy shouldn’t be led by outlandish fears of future systems that resemble science fiction more than current technology. “SB-1047 was originally drafted by non-profit groups that believe in the end of the world by sentient machine, like Dan Hendrycks’ Center for AI Safety,” Daniel Jeffries, a prominent voice in the AI community, told Ars. “You cannot start from this premise and create a sane, sound, ‘light touch’ safety bill.”

“If we see any power-seeking behavior here, it is not of AI systems, but of AI doomers,” added tech policy expert Nirit Weiss-Blatt. “With their fictional fears, they try to pass fictional-led legislation, one that, according to numerous AI experts and open source advocates, could ruin California’s and the US’s technological advantage.”

Google claims math breakthrough with proof-solving AI models

slow and steady —

AlphaProof and AlphaGeometry 2 solve problems, with caveats on time and human assistance.

An illustration provided by Google.

On Thursday, Google DeepMind announced that AI systems called AlphaProof and AlphaGeometry 2 reportedly solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving a score equivalent to a silver medal. The tech giant claims this marks the first time an AI has reached this level of performance in the prestigious math competition—but as usual in AI, the claims aren’t as clear-cut as they seem.

Google says AlphaProof uses reinforcement learning to prove mathematical statements in the formal language called Lean. The system trains itself by generating and verifying millions of proofs, progressively tackling more difficult problems. Meanwhile, AlphaGeometry 2 is described as an upgraded version of Google’s previous geometry-solving AI model, now powered by a Gemini-based language model trained on significantly more data.
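For readers unfamiliar with Lean, the snippet below gives a rough sense of what a formally stated, machine-checked theorem looks like. It is a toy statement in Lean 4 syntax chosen purely for illustration. It is not one of the IMO problems and not anything produced by AlphaProof, and the theorem name is made up.

```lean
-- A toy theorem in Lean 4: addition of natural numbers is commutative.
-- `Nat.add_comm` is a lemma from Lean's core library. Lean's kernel checks
-- that the term on the last line really proves the stated equation; if it
-- did not, the file would fail to compile.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point of working in a language like this is that a proof is either accepted by the checker or rejected, which gives a training system an unambiguous signal about whether a candidate proof is correct.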

According to Google, prominent mathematicians Sir Timothy Gowers and Dr. Joseph Myers scored the AI model’s solutions using official IMO rules. The company reports its combined system earned 28 out of 42 possible points, just shy of the 29-point gold medal threshold. This included a perfect score on the competition’s hardest problem, which Google claims only five human contestants solved this year.
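The arithmetic behind that score is straightforward. Each IMO problem is marked out of 7 points, so six problems give a maximum of 42. The short Python sketch below assumes full marks on each of the four solved problems and zero on the other two, which is what a 28-point total implies. The function and constant names are illustrative only.

```python
POINTS_PER_PROBLEM = 7   # each IMO problem is marked out of 7
TOTAL_PROBLEMS = 6
GOLD_THRESHOLD = 29      # this year's gold cutoff, as reported above


def imo_score(fully_solved: int) -> int:
    """Score for a contestant who fully solves `fully_solved` problems
    and earns no partial credit on the rest (an assumption)."""
    return fully_solved * POINTS_PER_PROBLEM


score = imo_score(4)                    # four problems solved
max_score = imo_score(TOTAL_PROBLEMS)   # all six solved
print(score, max_score, GOLD_THRESHOLD - score)
```

Under those assumptions, the combined system lands one point short of the gold cutoff.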

A math contest unlike any other

The IMO, held annually since 1959, pits elite pre-college mathematicians against exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Performance on IMO problems has become a recognized benchmark for assessing an AI system’s mathematical reasoning capabilities.

Google states that AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 tackled the geometry question. The AI model reportedly failed to solve the two combinatorics problems. The company claims its systems solved one problem within minutes, while others took up to three days.

Google says it first translated the IMO problems into formal mathematical language for its AI model to process. This step differs from the official competition, where human contestants work directly with the problem statements during two 4.5-hour sessions.

Google reports that before this year’s competition, AlphaGeometry 2 could solve 83 percent of historical IMO geometry problems from the past 25 years, up from its predecessor’s 53 percent success rate. The company claims the new system solved this year’s geometry problem in 19 seconds after receiving the formalized version.

Limitations

Despite Google’s claims, Sir Timothy Gowers offered a more nuanced perspective on the Google DeepMind models in a thread posted on X. While acknowledging the achievement as “well beyond what automatic theorem provers could do before,” Gowers pointed out several key qualifications.

“The main qualification is that the program needed a lot longer than the human competitors—for some of the problems over 60 hours—and of course much faster processing speed than the poor old human brain,” Gowers wrote. “If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher.”

Gowers also noted that humans manually translated the problems into the formal language Lean before the AI model began its work. He emphasized that while the AI performed the core mathematical reasoning, this “autoformalization” step was done by humans.

Regarding the broader implications for mathematical research, Gowers expressed uncertainty. “Are we close to the point where mathematicians are redundant? It’s hard to say. I would guess that we’re still a breakthrough or two short of that,” he wrote. He suggested that the system’s long processing times indicate it hasn’t “solved mathematics” but acknowledged that “there is clearly something interesting going on when it operates.”

Even with these limitations, Gowers speculated that such AI systems could become valuable research tools. “So we might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren’t too difficult—the kind of thing one can do in a couple of hours. That would be massively useful as a research tool, even if it wasn’t itself capable of solving open problems.”

AI and ML enter motorsports: How GM is using them to win more races

not LLM or generative AI —

From modeling tire wear and fuel use to predicting cautions based on radio traffic.

SAO PAULO, BRAZIL - JULY 13: The #02 Cadillac Racing Cadillac V-Series.R of Earl Bamber, and Alex Lynn in action ahead of the Six Hours of Sao Paulo at the Autodromo de Interlagos on July 13, 2024 in Sao Paulo, Brazil.

The Cadillac V-Series.R is one of General Motors’ factory-backed racing programs.

James Moy Photography/Getty Images

It is hard to escape the feeling that a few too many businesses are jumping on the AI hype train because it’s hype-y, rather than because AI offers an underlying benefit to their operation. So I will admit to a little inherent skepticism, and perhaps a touch of morbid curiosity, when General Motors got in touch wanting to show off some of the new AI/machine learning tools it has been using to win more races in NASCAR, sportscar racing, and IndyCar. As it turns out, that skepticism was misplaced.

GM has fingers in a lot of motorsport pies, but there are four top-level programs it really, really cares about. Number one for an American automaker is NASCAR—still the king of motorsport here—where Chevrolet supplies engines to six Cup teams. IndyCar, which could once boast of being America’s favorite racing, is home to another six Chevy-powered teams. And then there’s sportscar racing; right now, Cadillac is competing in IMSA’s GTP class and the World Endurance Championship’s Hypercar class, plus a factory Corvette Racing effort in IMSA.

“In all the series we race we either have key partners or specific teams that run our cars. And part of the technical support that they get from us are the capabilities of my team,” said Jonathan Bolenbaugh, motorsports analytics leader at GM, based at GM’s Charlotte Technical Center in North Carolina.

Unlike generative AI that’s being developed to displace humans from creative activities, GM sees the role of AI and ML as supporting human subject-matter experts so they can make the cars go faster. And it’s using these tools in a variety of applications.

One of GM’s command centers at its Charlotte Technical Center in North Carolina.

General Motors

Each team in each of those various series (obviously) has people on the ground at each race, and invariably more engineers and strategists helping them from Indianapolis, Charlotte, or wherever it is that the particular race team has its home base. But they’ll also be tied in with a team from GM Motorsport, working from one of a number of command centers at its Charlotte Technical Center.

What did they say?

Connecting all three are streams and streams of data from the cars themselves (in series that allow car-to-pit telemetry) but also voice comms, text-based messaging, timing and scoring data from officials, trackside photographs, and more. And one thing Bolenbaugh’s team and their suite of tools can do is help make sense of that data quickly enough for it to be actionable.

“In a series like F1, a lot of teams will have students who are potentially newer members of the team literally listening to the radio and typing out what is happening, then saying, ‘hey, this is about pitting. This is about track conditions,'” Bolenbaugh said.

Instead of giving that to the internship kids, GM built a real-time audio transcription tool to do that job. After trying out a commercial off-the-shelf solution, it decided to build its own, “a combination of open source and some of our proprietary code,” Bolenbaugh said. As anyone who has ever been to a race track can attest, it’s a loud environment, so GM had to train models with all the background noise present.

“We’ve been able to really improve our accuracy and usability of the tool to the point where some of the manual support for that capability is now dwindling,” he said, with the benefit that it frees up the humans, who would otherwise be transcribing, to apply their brains in more useful ways.

Take a look at this

Another tool developed by Bolenbaugh and his team was built to quickly analyze images taken by trackside photographers working for the teams and OEMs. While some of the footage they shoot might be for marketing or PR, a lot of it is for the engineers.

Two years ago, getting those photos from the photographer’s camera to the team was the work of two to three minutes. Now, “from shutter click at the racetrack in a NASCAR event to AI-tagged into an application for us to get information out of those photos is seven seconds,” Bolenbaugh said.

Sometimes you don’t need a ML tool to analyze a photo to tell you the car is damaged.

Jeffrey Vest/Icon Sportswire via Getty Images

“Time is everything, and the shortest lap time that we run—the Coliseum would be an outlier, but maybe like 18 seconds is probably a short lap time. So we need to be faster than from when they pass that pit lane entry to when they come back again,” he said.

At the rollout of this particular tool at a NASCAR race last year, one of GM’s partner teams was able to avoid a cautionary pitstop after its driver scraped the wall, when the young engineer who developed the tool was able to show them a seconds-old photo of the right side of the car that showed it had escaped any damage.

“They didn’t have to wait for a spotter to look, they didn’t have to wait for the driver’s opinion. They knew that didn’t have damage. That team made the playoffs in that series by four points, so in the event that they would have pitted, there’s a likelihood where they didn’t make it,” he said. In cases where a car is damaged, the image analysis tool can automatically flag that and make that known quickly through an alert.

Not all of the images are used for snap decisions like that—engineers can glean a lot about their rivals from photos, too.

“We would be very interested in things related to the geometry of the car for the setup settings—wicker settings, wing angles… ride heights of the car, how close the car is to the ground—those are all things that would be great to know from an engineering standpoint, and those would be objectives that we would have in doing image analysis,” said Patrick Canupp, director of motorsports competition engineering at GM.

Many of the photographers you see working trackside will be shooting on behalf of teams or manufacturers.

Steve Russell/Toronto Star via Getty Images

“It’s not straightforward to take a set of still images and determine a lot of engineering information from those. And so we’re working on that actively to help with all the photos that come in to us on a race weekend—there’s thousands of them. And so it’s a lot of information that we have at our access, that we want to try to maximize the engineering information that we glean from all of that data. It’s kind of a big data problem that AI is really geared for,” Canupp said.

The computer says we should pit now

Remember that transcribed audio feed from earlier? “If a bunch of drivers are starting to talk about something similar in the race like the track condition, we can start inferring, based on… the occurrence of certain words, that the track is changing,” said Bolenbaugh. “It might not just be your car… if drivers are talking about something on track, the likelihood of a caution, which is a part of our strategy model, might be going up.”
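To make the idea concrete, here is a minimal Python sketch of flagging a possible track change when several different drivers mention the same thing over the radio. It is not GM's tool: the keyword list, the threshold, and the function names are all hypothetical, and a real system would presumably feed a signal like this into a probabilistic caution model rather than use a hard cutoff.

```python
from collections import defaultdict

# Hypothetical keyword list; the article does not say which words GM tracks.
TRACK_KEYWORDS = {"debris", "oil", "slick", "rain", "loose"}


def drivers_mentioning(transcripts, keywords=TRACK_KEYWORDS):
    """Map each keyword to the set of drivers whose transcript contains it."""
    mentions = defaultdict(set)
    for driver, text in transcripts:
        # Lowercase each word and trim common trailing punctuation.
        words = {w.strip(".,!?").lower() for w in text.split()}
        for kw in keywords & words:
            mentions[kw].add(driver)
    return mentions


def caution_signal(transcripts, min_drivers=2):
    """Return keywords mentioned by at least `min_drivers` distinct drivers."""
    mentions = drivers_mentioning(transcripts)
    return sorted(kw for kw, who in mentions.items() if len(who) >= min_drivers)


radio = [
    ("car_1", "There is debris in turn three."),
    ("car_2", "Tires feel fine."),
    ("car_3", "Big piece of debris on the racing line!"),
]
print(caution_signal(radio))  # ['debris']
```

Counting distinct drivers rather than raw word occurrences reflects Bolenbaugh's point that the issue "might not just be your car."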

That feeds into a strategy tool that also takes in lap times from timing and scoring, models of tire wear, and fuel efficiency data. In racing series that provide fuel data for all cars, the tool uses it directly; in series like NASCAR and IndyCar, where teams don’t get to see that kind of data from their competitors, a predictive model estimates it instead.

“One of the biggest things that we need to manage is tires, fuel, and lap time. Everything is a trade-off between trying to execute the race the fastest,” Bolenbaugh said.

Obviously races are dynamic situations, and so “multiple times a lap as the scenario changes, we’re updating our recommendation. So, with tire fall off [as the tire wears and loses grip], you’re following up in real time, predicting where it’s going to be. We are constantly evolving during the race and doing transfer learning so we go into the weekend, as the race unfolds, continuing to train models in real time,” Bolenbaugh said.

OpenAI hits Google where it hurts with new SearchGPT prototype

Cutting through the sludge —

New tool may solve a web-search problem partially caused by AI-generated junk online.

The OpenAI logo on a blue newsprint background.

Benj Edwards / OpenAI

Arguably, few companies have unintentionally contributed more to the increase of AI-generated noise online than OpenAI. Despite its best intentions—and against its terms of service—its AI language models are often used to compose spam, and its pioneering research has inspired others to build AI models that can potentially do the same. This influx of AI-generated content has further reduced the effectiveness of SEO-driven search engines like Google. In 2024, web search is in a sorry state indeed.

It’s interesting, then, that OpenAI is now offering a potential solution to that problem. On Thursday, OpenAI revealed a prototype AI-powered search engine called SearchGPT that aims to provide users with quick, accurate answers sourced from the web. It’s also a direct challenge to Google, which has tried to apply generative AI to web search as well (but with little success).

The company says it plans to integrate the most useful aspects of the temporary prototype into ChatGPT in the future. ChatGPT can already perform web searches using Bing, but SearchGPT seems to be a purpose-built interface for AI-assisted web searching.

SearchGPT attempts to streamline the process of finding information online by combining OpenAI’s AI models (like GPT-4o) with real-time web data. As in ChatGPT, users can reportedly ask SearchGPT follow-up questions, with the AI model maintaining context throughout the conversation.

Perhaps most importantly from an accuracy standpoint, the SearchGPT prototype (which we have not tested ourselves) reportedly includes features that attribute web-based sources prominently. Responses include in-line citations and links, while a sidebar displays additional source links.

OpenAI has not yet said how it is obtaining its real-time web data and whether it’s partnering with an existing search engine provider (like it does currently with Bing for ChatGPT) or building its own web-crawling and indexing system.

A way around publishers blocking OpenAI

ChatGPT can already perform web searches using Bing, but since last August when OpenAI revealed a way to block its web crawler, that feature hasn’t been nearly as useful as it could be. Many sites, such as Ars Technica (which blocks the OpenAI crawler as part of our parent company’s policy), won’t show up as results in ChatGPT because of this.

SearchGPT appears to untangle the association between OpenAI’s web crawler for scraping training data and the desire for OpenAI chatbot users to search the web. Notably, in the new SearchGPT announcement, OpenAI says, “Sites can be surfaced in search results even if they opt out of generative AI training.”

Even so, OpenAI says it is working on a way for publishers to manage how they appear in SearchGPT results so that “publishers have more choices.” And the company says that SearchGPT’s ability to browse the web is separate from training OpenAI’s AI models.
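The blocking mechanism in question is the ordinary robots.txt file. The Python sketch below shows, under stated assumptions, how a site can turn away one crawler while staying open to others. `GPTBot` is the user-agent name OpenAI has documented for its training crawler; `ExampleSearchBot` and the example.com URL are placeholders, since the article does not say which user agent SearchGPT will use.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt that blocks OpenAI's training crawler (GPTBot)
# but allows every other crawler. The contents are illustrative.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("GPTBot", "https://example.com/article"))            # False
print(allowed("ExampleSearchBot", "https://example.com/article"))  # True
```

Because rules are matched per user agent, a publisher can opt out of training crawls without disappearing from search results, provided the two jobs are done by crawlers with different names.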

An uncertain future for AI-powered search

OpenAI claims SearchGPT will make web searches faster and easier. However, the effectiveness of AI-powered search compared to traditional methods is unknown, as the tech is still in its early stages. But let’s be frank: The most prominent web-search engine right now is pretty terrible.

Over the past year, we’ve seen Perplexity.ai take off as a potential AI-powered Google search replacement, but the service has been hounded by issues with confabulations and accusations of plagiarism among publishers, including Ars Technica parent Condé Nast.

Unlike Perplexity, OpenAI has many content deals lined up with publishers, and it emphasizes that it wants to work with content creators in particular. “We are committed to a thriving ecosystem of publishers and creators,” says OpenAI in its news release. “We hope to help users discover publisher sites and experiences, while bringing more choice to search.”

In a statement for the OpenAI press release, Nicholas Thompson, CEO of The Atlantic (which has a content deal with OpenAI), expressed optimism about the potential of AI search: “AI search is going to become one of the key ways that people navigate the internet, and it’s crucial, in these early days, that the technology is built in a way that values, respects, and protects journalism and publishers,” he said. “We look forward to partnering with OpenAI in the process, and creating a new way for readers to discover The Atlantic.”

OpenAI has experimented with other offshoots of its AI language model technology that haven’t become blockbuster hits (most notably, GPTs come to mind), so time will tell if the techniques behind SearchGPT have staying power—and if it can deliver accurate results without hallucinating. But the current state of web search is inviting new experiments to separate the signal from the noise, and it looks like OpenAI is throwing its hat in the ring.

OpenAI is currently rolling out SearchGPT to a small group of users and publishers for testing and feedback. Those interested in trying the prototype can sign up for a waitlist on the company’s website.

The first GPT-4-class AI model anyone can download has arrived: Llama 405B

A new llama emerges —

“Open source AI is the path forward,” says Mark Zuckerberg, misusing the term.

A red llama in a blue desert illustration based on a photo.

In the AI world, there’s a buzz in the air about a new AI language model released Tuesday by Meta: Llama 3.1 405B. The reason? It’s potentially the first time anyone can download a GPT-4-class large language model (LLM) for free and run it on their own hardware. You’ll still need some beefy hardware: Meta says it can run on a “single server node,” which isn’t desktop PC-grade equipment. But it’s a provocative shot across the bow of “closed” AI model vendors such as OpenAI and Anthropic.

“Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation,” says Meta. Company CEO Mark Zuckerberg calls 405B “the first frontier-level open source AI model.”

In the AI industry, “frontier model” is a term for an AI system designed to push the boundaries of current capabilities. In this case, Meta is positioning 405B among the likes of the industry’s top AI models, such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro.

A chart published by Meta suggests that 405B gets very close to matching the performance of GPT-4 Turbo, GPT-4o, and Claude 3.5 Sonnet in benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).

But as we’ve noted many times since March, these benchmarks aren’t necessarily scientifically sound, nor do they necessarily translate to the subjective experience of interacting with AI language models. In fact, this traditional slate of AI benchmarks is so generally useless to laypeople that even Meta’s PR department now just posts a few images of charts and doesn’t even try to explain them in any detail.

A Meta-provided chart that shows Llama 3.1 405B benchmark results versus other major AI models.

We’ve instead found that measuring the subjective experience of using a conversational AI model (through what might be called “vibemarking”) on A/B leaderboards like Chatbot Arena is a better way to judge new LLMs. In the absence of Chatbot Arena data, Meta has provided the results of its own human evaluations of 405B’s outputs that seem to show Meta’s new model holding its own against GPT-4 Turbo and Claude 3.5 Sonnet.

A Meta-provided chart that shows how humans rated Llama 3.1 405B’s outputs compared to GPT-4 Turbo, GPT-4o, and Claude 3.5 Sonnet in its own studies.

Whatever the benchmarks, early word on the street (after the model leaked on 4chan yesterday) seems to match the claim that 405B is roughly equivalent to GPT-4. It took a lot of expensive computer training time to get there—and money, of which the social media giant has plenty to burn. Meta trained the 405B model on over 15 trillion tokens of training data scraped from the web (then parsed, filtered, and annotated by Llama 2), using more than 16,000 H100 GPUs.

So what’s with the 405B name? In this case, “405B” means 405 billion parameters, and parameters are numerical values that store trained information in a neural network. More parameters translate to a larger neural network powering the AI model, which generally (but not always) means more capability, such as better ability to make contextual connections between concepts. But larger-parameter models have a tradeoff in needing more computing power (AKA “compute”) to run.
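A quick back-of-the-envelope calculation shows why parameter count drives hardware requirements. The Python sketch below estimates the memory needed just to hold the weights at a few common numeric precisions. The precisions listed are assumptions for illustration, not a statement of how Meta ships the model, and the estimate ignores activations, caches, and other runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS_405B = 405e9  # 405 billion parameters

# Bytes per parameter at a few common precisions (illustrative choices).
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(PARAMS_405B, nbytes):.1f} GB")
# 16-bit: 810.0 GB
# 8-bit: 405.0 GB
# 4-bit: 202.5 GB
```

Even at reduced precision, the weights alone run to hundreds of gigabytes, which is consistent with Meta's "single server node" framing rather than a desktop PC.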

We’ve been expecting the release of a 400 billion-plus parameter model of the Llama 3 family since Meta gave word that it was training one in April. But today’s announcement isn’t just about the biggest member of the Llama 3 family: There’s an entirely new iteration of improved Llama models with the designation “Llama 3.1.” That includes upgraded versions of its smaller 8B and 70B models, which now feature multilingual support and an extended context length of 128,000 tokens (the “context length” is roughly the working memory capacity of the model, and “tokens” are chunks of data used by LLMs to process information).

Meta says that 405B is useful for long-form text summarization, multilingual conversational agents, and coding assistants and for creating synthetic data used to train future AI language models. Notably, that last use-case—allowing developers to use outputs from Llama models to improve other AI models—is now officially supported by Meta’s Llama 3.1 license for the first time.

Abusing the term “open source”

Llama 3.1 405B is an open-weights model, which means anyone can download the trained neural network files and run them or fine-tune them. That directly challenges a business model where companies like OpenAI keep the weights to themselves and instead monetize the model through subscription wrappers like ChatGPT or charge for access by the token through an API.

Fighting the “closed” AI model is a big deal to Mark Zuckerberg, who simultaneously released a 2,300-word manifesto today on why the company believes in open releases of AI models, titled, “Open Source AI Is the Path Forward.” More on the terminology in a minute. But briefly, he writes about the need for customizable AI models that offer user control and encourage better data security, higher cost-efficiency, and better future-proofing, as opposed to vendor-locked solutions.

All that sounds reasonable, but undermining your competitors using a model subsidized by a social media war chest is also an efficient way to play spoiler in a market where you might not always win with the most cutting-edge tech. That benefits Meta, Zuckerberg says, because he doesn’t want to get locked into a system where companies like his have to pay a toll to access AI capabilities, drawing comparisons to “taxes” Apple levies on developers through its App Store.

A screenshot of Mark Zuckerberg’s essay, “Open Source AI Is the Path Forward,” published on July 23, 2024.

So, about that “open source” term. As we first wrote in an update to our Llama 2 launch article a year ago, “open source” has a very particular meaning that has traditionally been defined by the Open Source Initiative. The AI industry has not yet settled on terminology for AI model releases that ship either code or weights with restrictions (such as Llama 3.1) or that ship without providing training data. We’ve been calling these releases “open weights” instead.

Unfortunately for terminology sticklers, Zuckerberg has now baked the erroneous “open source” label into the title of his potentially historic aforementioned essay on open AI releases, so fighting for the correct term in AI may be a losing battle. Still, his usage annoys people like independent AI researcher Simon Willison, who likes Zuckerberg’s essay otherwise.

“I see Zuck’s prominent misuse of ‘open source’ as a small-scale act of cultural vandalism,” Willison told Ars Technica. “Open source should have an agreed meaning. Abusing the term weakens that meaning which makes the term less generally useful, because if someone says ‘it’s open source,’ that no longer tells me anything useful. I have to then dig in and figure out what they’re actually talking about.”

The Llama 3.1 models are available for download through Meta’s own website and on Hugging Face. They both require providing contact information and agreeing to a license and an acceptable use policy, which means that Meta can technically legally pull the rug out from under your use of Llama 3.1 or its outputs at any time.

Astronomers discover technique to spot AI fakes using galaxy-measurement tools

stars in their eyes —

Researchers use technique to quantify eyeball reflections that often reveal deepfake images.

Researchers write, “In this image, the person on the left (Scarlett Johansson) is real, while the person on the right is AI-generated. Their eyeballs are depicted underneath their faces. The reflections in the eyeballs are consistent for the real person, but incorrect (from a physics point of view) for the fake person.”

In 2024, it’s almost trivial to create realistic AI-generated images of people, which has led to fears about how these deceptive images might be detected. Researchers at the University of Hull recently unveiled a novel method for detecting AI-generated deepfake images by analyzing reflections in human eyes. The technique, presented at the Royal Astronomical Society’s National Astronomy Meeting last week, adapts tools used by astronomers to study galaxies for scrutinizing the consistency of light reflections in eyeballs.

Adejumoke Owolabi, an MSc student at the University of Hull, headed the research under the guidance of Dr. Kevin Pimbblet, professor of astrophysics.

Their detection technique is based on a simple principle: A pair of eyes being illuminated by the same set of light sources will typically have a similarly shaped set of light reflections in each eyeball. Many AI-generated images created to date don’t take eyeball reflections into account, so the simulated light reflections are often inconsistent between each eye.

A series of real eyes showing largely consistent reflections in both eyes.

In some ways, the astronomy angle isn’t always necessary for this kind of deepfake detection because a quick glance at a pair of eyes in a photo can reveal reflection inconsistencies, which is something artists who paint portraits have to keep in mind. But the application of astronomy tools to automatically measure and quantify eye reflections in deepfakes is a novel development.

Automated detection

In a Royal Astronomical Society blog post, Pimbblet explained that Owolabi developed a technique to detect eyeball reflections automatically and ran the reflections’ morphological features through indices to compare similarity between left and right eyeballs. Their findings revealed that deepfakes often exhibit differences between the pair of eyes.

The team applied methods from astronomy to quantify and compare eyeball reflections. They used the Gini coefficient, typically employed to measure light distribution in galaxy images, to assess the uniformity of reflections across eye pixels. A Gini value closer to 0 indicates evenly distributed light, while a value approaching 1 suggests concentrated light in a single pixel.
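To make the Gini measurement concrete, here is a minimal Python sketch. It is not the researchers' code. It assumes each eye's reflection has already been cropped to a flat list of non-negative pixel intensities, and it uses a sorted-pixel form of the Gini coefficient often seen in galaxy morphology work. The function names `gini` and `reflection_mismatch` are illustrative.

```python
def gini(pixels):
    """Gini coefficient of a flat list of pixel intensities (0 = even, 1 = one pixel)."""
    values = sorted(abs(p) for p in pixels)  # ascending order is required by the formula
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # no light at all: treat as perfectly even
    # Each pixel is weighted by its rank; bright pixels get the largest weights.
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


def reflection_mismatch(left_eye, right_eye):
    """Absolute difference between the two eyes' Gini values (illustrative metric)."""
    return abs(gini(left_eye) - gini(right_eye))


even = [10, 10, 10, 10]  # light spread evenly across pixels
spot = [0, 0, 0, 40]     # all light in a single pixel
print(gini(even))                       # 0.0
print(gini(spot))                       # 1.0
print(reflection_mismatch(even, spot))  # 1.0
```

A large mismatch between the two eyes would be a hint, not proof, that the reflections are inconsistent. Any cutoff for flagging an image would have to be chosen empirically.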

A series of deepfake eyes showing inconsistent reflections in each eye.

In the Royal Astronomical Society post, Pimbblet drew comparisons between how they measured eyeball reflection shape and how they typically measure galaxy shape in telescope imagery: “To measure the shapes of galaxies, we analyze whether they’re centrally compact, whether they’re symmetric, and how smooth they are. We analyze the light distribution.”

The researchers also explored the use of CAS parameters (concentration, asymmetry, smoothness), another tool from astronomy for measuring galactic light distribution. However, this method proved less effective in identifying fake eyes.
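As a rough illustration of one CAS ingredient, the Python sketch below computes a simplified asymmetry index: it rotates an image 180 degrees and measures how much the light changes. This is an assumption-laden toy, not the team's pipeline. Definitions in the astronomy literature differ in normalization and usually subtract a background term, which is omitted here. The names `rotate_180` and `asymmetry` are illustrative.

```python
def rotate_180(image):
    """Rotate a 2D list of pixel values by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def asymmetry(image):
    """Sum of |I - I_rotated| divided by sum of |I| (0 means perfectly symmetric)."""
    rotated = rotate_180(image)
    diff = sum(
        abs(a - b)
        for row, rot_row in zip(image, rotated)
        for a, b in zip(row, rot_row)
    )
    total = sum(abs(a) for row in image for a in row)
    return diff / total if total else 0.0


symmetric = [[0, 1, 0],
             [1, 4, 1],
             [0, 1, 0]]
lopsided = [[4, 0],
            [0, 0]]
print(asymmetry(symmetric))  # 0.0
print(asymmetry(lopsided))   # 2.0
```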

A detection arms race

While the eye-reflection technique offers a potential path for detecting AI-generated images, the method might not work if AI models evolve to incorporate physically accurate eye reflections, perhaps applied as a subsequent step after image generation. The technique also requires a clear, up-close view of eyeballs to work.

The approach also risks producing false positives, as even authentic photos can sometimes exhibit inconsistent eye reflections due to varied lighting conditions or post-processing techniques. But analyzing eye reflections may still be a useful tool in a larger deepfake detection toolset that also considers other factors such as hair texture, anatomy, skin details, and background consistency.

While the technique shows promise in the short term, Dr. Pimbblet cautioned that it’s not perfect. “There are false positives and false negatives; it’s not going to get everything,” he told the Royal Astronomical Society. “But this method provides us with a basis, a plan of attack, in the arms race to detect deepfakes.”

Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism

As the word turns —

Will LLMs keep improving if we throw more compute at them? OpenAI dealmaker thinks so.

Kevin Scott, CTO and EVP of AI at Microsoft, speaks onstage during Vox Media’s 2023 Code Conference at The Ritz-Carlton, Laguna Niguel on September 27, 2023, in Dana Point, California.

During an interview with Sequoia Capital’s Training Data podcast published last Tuesday, Microsoft CTO Kevin Scott doubled down on his belief that so-called large language model (LLM) “scaling laws” will continue to drive AI progress, despite some skepticism in the field that progress has leveled out. Scott played a key role in forging a $13 billion technology-sharing deal between Microsoft and OpenAI.

“Despite what other people think, we’re not at diminishing marginal returns on scale-up,” Scott said. “And I try to help people understand there is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them.”

LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute). The laws suggest that simply scaling up model size and training data can lead to significant improvements in AI capabilities without necessarily requiring fundamental algorithmic breakthroughs.
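Relationships of this kind are usually written as power laws. The Python sketch below shows what that implies, using made-up constants: `n_c` and `alpha` are illustrative placeholders, not values taken from any paper or from Scott's remarks. The point is only that, under a power law, every tenfold increase in model size shrinks the loss by the same fixed fraction.

```python
def predicted_loss(num_params, n_c=1e14, alpha=0.08):
    """Power-law loss curve: loss falls as (n_c / num_params) ** alpha.

    n_c and alpha are hypothetical constants chosen for illustration only.
    """
    return (n_c / num_params) ** alpha


# Compare the loss at each model size with the loss at 10x fewer parameters.
for exponent in (9, 10, 11, 12):
    smaller = predicted_loss(10 ** (exponent - 1))
    larger = predicted_loss(10 ** exponent)
    print(f"1e{exponent} params: loss ratio vs. 10x smaller = {larger / smaller:.3f}")
```

With these placeholder values, each tenfold jump multiplies the loss by the same factor of roughly 0.83. That is why supporters describe the improvements as predictable, and why skeptics ask whether a steady drop in loss keeps translating into noticeably better capabilities.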

Since then, other researchers have challenged the idea of persisting scaling laws over time, but the concept is still a cornerstone of OpenAI’s AI development philosophy.

You can see Scott’s comments in the video below beginning around 46:05:

Microsoft CTO Kevin Scott on how far scaling laws will extend

Scott’s optimism contrasts with a narrative among some critics in the AI community that progress in LLMs has plateaued around GPT-4-class models. The perception has been fueled by largely informal observations (and some benchmark results) about recent models like Google’s Gemini 1.5 Pro, Anthropic’s Claude Opus, and even OpenAI’s GPT-4o. Some argue these models haven’t shown the dramatic leaps in capability seen in earlier generations and that LLM development may be approaching diminishing returns.

“We all know that GPT-3 was vastly better than GPT-2. And we all know that GPT-4 (released thirteen months ago) was vastly better than GPT-3,” wrote AI critic Gary Marcus in April. “But what has happened since?”

The perception of plateau

Scott’s stance suggests that tech giants like Microsoft still feel justified in investing heavily in larger AI models, betting on continued breakthroughs rather than hitting a capability plateau. Given Microsoft’s investment in OpenAI and strong marketing of its own Microsoft Copilot AI features, the company has a strong interest in maintaining the perception of continued progress, even if the tech stalls.

Frequent AI critic Ed Zitron recently wrote on his blog that one defense of continued investment in generative AI is that “OpenAI has something we don’t know about. A big, sexy, secret technology that will eternally break the bones of every hater.” He continued, “Yet, I have a counterpoint: no it doesn’t.”

Some perceptions of slowing progress in LLM capabilities and benchmarking may be due to the rapid onset of AI in the public eye when, in fact, LLMs had been in development for years. OpenAI continued to develop LLMs during a roughly three-year gap between the release of GPT-3 in 2020 and GPT-4 in 2023. Many people likely perceived a rapid jump in capability with GPT-4’s launch in 2023 because they had only recently become aware of GPT-3-class models with the launch of ChatGPT in late November 2022, which used GPT-3.5.

In the podcast interview, the Microsoft CTO pushed back against the idea that AI progress has stalled, but he acknowledged the challenge of infrequent data points in this field, as new models often take years to develop. Despite this, Scott expressed confidence that future iterations will show improvements, particularly in areas where current models struggle.

“The next sample is coming, and I can’t tell you when, and I can’t predict exactly how good it’s going to be, but it will almost certainly be better at the things that are brittle right now, where you’re like, oh my god, this is a little too expensive, or a little too fragile, for me to use,” Scott said in the interview. “All of that gets better. It’ll get cheaper, and things will become less fragile. And then more complicated things will become possible. That is the story of each generation of these models as we’ve scaled up.”

OpenAI reportedly nears breakthrough with “reasoning” AI, reveals progress framework

studies in hype-otheticals —

Five-level AI classification system probably best seen as a marketing exercise.

Illustration of a robot with many arms.

OpenAI recently unveiled a five-tier system to gauge its advancement toward developing artificial general intelligence (AGI), according to an OpenAI spokesperson who spoke with Bloomberg. The company shared this new classification system on Tuesday with employees during an all-hands meeting, aiming to provide a clear framework for understanding AI advancement. However, the system describes hypothetical technology that does not yet exist and is possibly best interpreted as a marketing move to garner investment dollars.

OpenAI has previously stated that AGI—a nebulous term for a hypothetical concept that means an AI system that can perform novel tasks like a human without specialized training—is currently the primary goal of the company. The pursuit of technology that can replace humans at most intellectual work drives most of the enduring hype over the firm, even though such a technology would likely be wildly disruptive to society.

OpenAI CEO Sam Altman has previously stated his belief that AGI could be achieved within this decade, and a large part of the CEO’s public messaging has been related to how the company (and society in general) might handle the disruption that AGI may bring. Along those lines, a ranking system to communicate AI milestones achieved internally on the path to AGI makes sense.

OpenAI’s five levels—which it plans to share with investors—range from current AI capabilities to systems that could potentially manage entire organizations. The company believes its technology (such as GPT-4o that powers ChatGPT) currently sits at Level 1, which encompasses AI that can engage in conversational interactions. However, OpenAI executives reportedly told staff they’re on the verge of reaching Level 2, dubbed “Reasoners.”

Bloomberg lists OpenAI’s five “Stages of Artificial Intelligence” as follows:

  • Level 1: Chatbots, AI with conversational language
  • Level 2: Reasoners, human-level problem solving
  • Level 3: Agents, systems that can take actions
  • Level 4: Innovators, AI that can aid in invention
  • Level 5: Organizations, AI that can do the work of an organization

A Level 2 AI system would reportedly be capable of basic problem-solving on par with a human who holds a doctorate degree but lacks access to external tools. During the all-hands meeting, OpenAI leadership reportedly demonstrated a research project using their GPT-4 model that the researchers believe shows signs of approaching this human-like reasoning ability, according to someone familiar with the discussion who spoke with Bloomberg.

The upper levels of OpenAI’s classification describe increasingly potent hypothetical AI capabilities. Level 3 “Agents” could work autonomously on tasks for days. Level 4 systems would generate novel innovations. The pinnacle, Level 5, envisions AI managing entire organizations.

This classification system is still a work in progress. OpenAI plans to gather feedback from employees, investors, and board members, potentially refining the levels over time.

Ars Technica asked OpenAI about the ranking system and the accuracy of the Bloomberg report, and a company spokesperson said they had “nothing to add.”

The problem with ranking AI capabilities

OpenAI isn’t alone in attempting to quantify levels of AI capabilities. As Bloomberg notes, OpenAI’s system feels similar to levels of autonomous driving mapped out by automakers. And in November 2023, researchers at Google DeepMind proposed their own five-level framework for assessing AI advancement, showing that other AI labs have also been trying to figure out how to rank things that don’t yet exist.

OpenAI’s classification system also somewhat resembles Anthropic’s “AI Safety Levels” (ASLs) first published by the maker of the Claude AI assistant in September 2023. Both systems aim to categorize AI capabilities, though they focus on different aspects. Anthropic’s ASLs are more explicitly focused on safety and catastrophic risks (such as ASL-2, which refers to “systems that show early signs of dangerous capabilities”), while OpenAI’s levels track general capabilities.

However, any AI classification system raises questions about whether it’s possible to meaningfully quantify AI progress and what constitutes an advancement (or even what constitutes a “dangerous” AI system, as in the case of Anthropic). The tech industry so far has a history of overpromising AI capabilities, and linear progression models like OpenAI’s potentially risk fueling unrealistic expectations.

There is currently no consensus in the AI research community on how to measure progress toward AGI or even if AGI is a well-defined or achievable goal. As such, OpenAI’s five-tier system should likely be viewed as a communications tool to entice investors that shows the company’s aspirational goals rather than a scientific or even technical measurement of progress.

Intuit’s AI gamble: Mass layoff of 1,800 paired with hiring spree

In the name of AI —

Intuit CEO: “Companies that aren’t prepared to take advantage of [AI] will fall behind.”

Signage for financial software company Intuit at the company's headquarters in the Silicon Valley town of Mountain View, California, August 24, 2016.

On Wednesday, Intuit CEO Sasan Goodarzi announced in a letter to the company that it would be laying off 1,800 employees—about 10 percent of its workforce of around 18,000—while simultaneously planning to hire the same number of new workers as part of a major restructuring effort purportedly focused on AI.

“As I’ve shared many times, the era of AI is one of the most significant technology shifts of our lifetime,” wrote Goodarzi in a blog post on Intuit’s website. “This is truly an extraordinary time—AI is igniting global innovation at an incredible pace, transforming every industry and company in ways that were unimaginable just a few years ago. Companies that aren’t prepared to take advantage of this AI revolution will fall behind and, over time, will no longer exist.”

The CEO says Intuit is in a position of strength and that the layoffs are not cost-cutting related, but they allow the company to “allocate additional investments to our most critical areas to support our customers and drive growth.” With new hires, the company expects its overall headcount to grow in its 2025 fiscal year.

Intuit’s layoffs (which collectively qualify as a “mass layoff” under the WARN Act) hit various departments within the company and include the closure of Intuit’s offices in Edmonton, Canada, and Boise, Idaho, affecting over 250 employees. Approximately 1,050 employees will be laid off because they’re “not meeting expectations,” according to Goodarzi’s letter. Intuit has also eliminated more than 300 roles across the company to “streamline” operations and shift resources toward AI, and the company plans to consolidate 80 tech roles to “sites where we are strategically growing our technology teams and capabilities,” such as Atlanta, Bangalore, New York, Tel Aviv, and Toronto.

In turn, the company plans to accelerate investments in its AI-powered financial assistant, Intuit Assist, which provides AI-generated financial recommendations. The company also plans to hire new talent in engineering, product development, data science, and customer-facing roles, with a particular emphasis on AI expertise.

Not just about AI

Despite Goodarzi’s heavily AI-focused message, the restructuring at Intuit reveals a more complex picture. A closer look at the layoffs shows that many of the 1,800 job cuts stem from performance-based departures (such as the aforementioned 1,050). The restructuring also includes a 10 percent reduction in executive positions at the director level and above (“To continue increasing our velocity of decision making,” Goodarzi says).

These numbers suggest that the reorganization may also serve as an opportunity for Intuit to trim its workforce of underperforming staff, using the AI hype cycle as a compelling backdrop for a broader house-cleaning effort.

But as far as CEOs are concerned, it’s always a good time to talk about how they’re embracing the latest, hottest thing in technology: “With the introduction of GenAI,” Goodarzi wrote, “we are now delivering even more compelling customer experiences, increasing monetization potential, and driving efficiencies in how the work gets done within Intuit. But it’s just the beginning of the AI revolution.”

OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs

automated critic —

Research model catches bugs in AI-generated code, improving human oversight of AI.

An illustration created by OpenAI.

On Thursday, OpenAI researchers unveiled CriticGPT, a new AI model designed to identify mistakes in code generated by ChatGPT. It aims to enhance the process of making AI systems behave in ways humans want (called “alignment”) through Reinforcement Learning from Human Feedback (RLHF), which helps human reviewers make large language model (LLM) outputs more accurate.

As outlined in a new research paper called “LLM Critics Help Catch LLM Bugs,” OpenAI created CriticGPT to act as an AI assistant to human trainers who review programming code generated by the ChatGPT AI assistant. CriticGPT, based on the GPT-4 family of LLMs, analyzes the code and points out potential errors, making it easier for humans to spot mistakes that might otherwise go unnoticed. The researchers trained CriticGPT on a dataset of code samples with intentionally inserted bugs, teaching it to recognize and flag various coding errors.

The researchers found that CriticGPT’s critiques were preferred by annotators over human critiques in 63 percent of cases involving naturally occurring LLM errors and that human-machine teams using CriticGPT wrote more comprehensive critiques than humans alone while reducing confabulation (hallucination) rates compared to AI-only critiques.

Developing an automated critic

The development of CriticGPT involved training the model on a large number of inputs containing deliberately inserted mistakes. Human trainers were asked to modify code written by ChatGPT, introducing errors and then providing example feedback as if they had discovered these bugs. This process allowed the model to learn how to identify and critique various types of coding errors.

In experiments, CriticGPT demonstrated its ability to catch both inserted bugs and naturally occurring errors in ChatGPT’s output. The new model’s critiques were preferred by trainers over those generated by ChatGPT itself in 63 percent of cases involving natural bugs (the aforementioned statistic). This preference was partly due to CriticGPT producing fewer unhelpful “nitpicks” and generating fewer false positives, or hallucinated problems.

The researchers also created a new technique they call Force Sampling Beam Search (FSBS). This method helps CriticGPT write more detailed reviews of code. It lets the researchers adjust how thorough CriticGPT is in looking for problems, while also controlling how often it might make up issues that don’t really exist. They can tweak this balance depending on what they need for different AI training tasks.
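One way to picture that kind of adjustable balance is sketched below in Python. This is an illustration under stated assumptions, not OpenAI's implementation: the `Critique` fields, the `pick_critique` function, and the scores are all hypothetical. Each candidate critique gets a quality score (imagined here as coming from some reward model), and a tunable bonus rewards critiques that flag more problems.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    text: str
    quality_score: float  # hypothetical score from a reward model
    num_flagged: int      # how many separate problems the critique points out


def pick_critique(candidates, thoroughness_bonus):
    """Pick the candidate with the best quality score plus a per-issue bonus."""
    return max(
        candidates,
        key=lambda c: c.quality_score + thoroughness_bonus * c.num_flagged,
    )


candidates = [
    Critique("Flags one well-supported bug.", quality_score=0.9, num_flagged=1),
    Critique("Flags four issues, some speculative.", quality_score=0.6, num_flagged=4),
]

# No bonus: the cautious critique wins (fewer made-up issues, less coverage).
print(pick_critique(candidates, thoroughness_bonus=0.0).text)
# Larger bonus: the longer critique wins (more coverage, more risk of false alarms).
print(pick_critique(candidates, thoroughness_bonus=0.2).text)
```

Raising the bonus favors longer, more exhaustive critiques at the cost of more potential false alarms, and lowering it does the opposite.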

Interestingly, the researchers found that CriticGPT’s capabilities extend beyond just code review. In their experiments, they applied the model to a subset of ChatGPT training data that had previously been rated as flawless by human annotators. Surprisingly, CriticGPT identified errors in 24 percent of these cases—errors that were subsequently confirmed by human reviewers. OpenAI thinks this demonstrates the model’s potential to generalize to non-code tasks and highlights its ability to catch subtle mistakes that even careful human evaluation might miss.

Despite its promising results, like all AI models, CriticGPT has limitations. The model was trained on relatively short ChatGPT answers, which may not fully prepare it for evaluating longer, more complex tasks that future AI systems might tackle. Additionally, while CriticGPT reduces confabulations, it doesn’t eliminate them entirely, and human trainers can still make labeling mistakes based on these false outputs.

The research team acknowledges that CriticGPT is most effective at identifying errors that can be pinpointed in one specific location within the code. However, real-world mistakes in AI outputs can often be spread across multiple parts of an answer, presenting a challenge for future iterations of the model.

OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline, providing its trainers with AI assistance. For OpenAI, it’s a step toward developing better tools for evaluating outputs from LLM systems that may be difficult for humans to rate without additional support. However, the researchers caution that even with tools like CriticGPT, extremely complex tasks or responses may still prove challenging for human evaluators—even those assisted by AI.

AI-generated Al Michaels to provide daily recaps during 2024 Summer Olympics

forever young —

AI voice clone will narrate daily Olympics video recaps; critics call it a “code-generated ghoul.”

Al Michaels looks on prior to the game between the Minnesota Vikings and Philadelphia Eagles at Lincoln Financial Field on September 14, 2023, in Philadelphia, Pennsylvania.

On Wednesday, NBC announced plans to use an AI-generated clone of famous sports commentator Al Michaels’ voice to narrate daily streaming video recaps of the 2024 Summer Olympics in Paris, which start on July 26. The AI-powered narration will feature in “Your Daily Olympic Recap on Peacock,” NBC’s streaming service. But this new, high-profile use of voice cloning worries critics, who say the technology may muscle out up-and-coming sports commentators by keeping old personas around forever.

NBC says it has created a “high-quality AI re-creation” of Michaels’ voice, trained on Michaels’ past NBC appearances to capture his distinctive delivery style.

The veteran broadcaster, revered in the sports commentator world for his iconic “Do you believe in miracles? Yes!” call during the 1980 Winter Olympics, has been covering sports on TV since 1971, including a high-profile run of play-by-play coverage of NFL football games for both ABC and NBC since the 1980s. NBC dropped him from NFL coverage in 2023, however, possibly due to his age.

Michaels, who is 79 years old, shared his initial skepticism about the project in an interview with Vanity Fair, as NBC News notes. After hearing the AI version of his voice, which can greet viewers by name, he described the experience as “astonishing” and “a little bit frightening.” He said the AI recreation was “almost 2% off perfect” in mimicking his style.

The Vanity Fair article provides some insight into how NBC’s new AI system works. It first uses a large language model (similar technology to what powers ChatGPT) to analyze subtitles and metadata from NBC’s Olympics video coverage, summarizing events and writing custom output to imitate Michaels’ style. This text is then fed into an unspecified voice AI model trained on Michaels’ previous NBC appearances, reportedly replicating his unique pronunciations and intonations.

NBC estimates that the system could generate nearly 7 million personalized variants of the recaps across the US during the games, pulled from the network’s 5,000 hours of live coverage. Using the system, each Peacock user will receive about 10 minutes of personalized highlights.

A diminished role for humans in the future?

Al Michaels reports on the Sweden vs. USA men’s ice hockey game at the 1980 Olympic Winter Games on February 12, 1980.

It’s no secret that while AI is wildly hyped right now, it’s also controversial among some. Upon hearing the NBC announcement, critics of AI technology reacted strongly. “@NBCSports, this is gross,” tweeted actress and filmmaker Justine Bateman, who frequently uses X to criticize technologies that might replace human writers or performers in the future.

A thread of similar responses from X users reacting to the sample video provided above included criticisms such as, “Sounds pretty off when it’s just the same tone for every single word.” Another user wrote, “It just sounds so unnatural. No one talks like that.”

The technology will not replace NBC’s regular human sports commentators during this year’s Olympics coverage, and like other forms of AI, it leans heavily on existing human work by analyzing and regurgitating human-created content in the form of captions pulled from NBC footage.

Looking down the line, due to AI media cloning technologies like voice, video, and image synthesis, today’s celebrities may be able to attain a form of media immortality that allows new iterations of their likenesses to persist through the generations, potentially earning licensing fees for whoever holds the rights.

We’ve already seen it with James Earl Jones playing Darth Vader’s voice, and the trend will likely continue with other celebrity voices, provided the money is right. Eventually, it may extend to famous musicians through music synthesis and famous actors in video-synthesis applications as well.

The possibility of being muscled out by AI replicas factored heavily into a Hollywood actors’ strike last year, with SAG-AFTRA union President Fran Drescher saying, “If we don’t stand tall right now, we are all going to be in trouble. We are all going to be in jeopardy of being replaced by machines.”

For companies that like to monetize media properties for as long as possible, AI may provide a way to maintain a media legacy through automation. But future human performers may have to compete against all of the greatest performers of the past, rendered through AI, to break out and forge a new career—provided there will be room for human performers at all.

“Al Michaels became Al Michaels because he was brought in to replace people who died, or retired, or moved on,” tweeted a writer named Geonn Cannon on X. “If he can’t do the job anymore, it’s time to let the next Al Michaels have a shot at it instead of just planting a code-generated ghoul in an empty chair.”

Toys “R” Us riles critics with “first-ever” AI-generated commercial using Sora

A screen capture from the partially AI-generated Toys “R” Us brand film created using Sora.

Toys R Us

On Monday, Toys “R” Us announced that it had partnered with an ad agency called Native Foreign to create what it calls “the first-ever brand film using OpenAI’s new text-to-video tool, Sora.” OpenAI debuted Sora in February, but the video synthesis tool has not yet become available to the public. The brand film tells the story of Toys “R” Us founder Charles Lazarus using AI-generated video clips.

“We are thrilled to partner with Native Foreign to push the boundaries of Sora, a groundbreaking new technology from OpenAI that’s gaining global attention,” wrote Toys “R” Us on its website. “Sora can create up to one-minute-long videos featuring realistic scenes and multiple characters, all generated from text instruction. Imagine the excitement of creating a young Charles Lazarus, the founder of Toys “R” Us, and envisioning his dreams for our iconic brand and beloved mascot Geoffrey the Giraffe in the early 1930s.”

The company says that The Origin of Toys “R” Us commercial was co-produced by Toys “R” Us Studios President Kim Miller Olko as executive producer and Native Foreign’s Nik Kleverov as director. “Charles Lazarus was a visionary ahead of his time, and we wanted to honor his legacy with a spot using the most cutting-edge technology available,” Miller Olko said in a statement.

In the video, we see a child version of Lazarus, presumably generated using Sora, falling asleep and having a dream that he is flying through a land of toys. Along the way, he meets Geoffrey, the store’s mascot, who hands the child a small red car.

Many of the scenes retain obvious hallmarks of AI-generated imagery, such as unnatural movement, strange visual artifacts, and the irregular shape of eyeglasses. In February, a few Super Bowl commercials intentionally made fun of similar AI-generated video defects, which became famous online after a fake AI-generated beer commercial and “Pepperoni Hug Spot” clips made using Runway’s Gen-2 model went viral in 2023.

  • A screen capture from the partially AI-generated Toys “R” Us brand film created using Sora.

    Toys “R” Us

AI-generated artwork receives frequent criticism online due to the use of human-created artwork to train AI models that create the works, the perception that AI synthesis tools will replace (or are currently replacing) human creative jobs, and the potential environmental impact of AI models, which are seen as energy-wasteful by some critics. Also, some people just think the output quality looks bad.

On the social network X, comedy writer Mike Drucker wrapped up several of these criticisms into one post, writing, “Love this commercial is like, ‘Toys R Us started with the dream of a little boy who wanted to share his imagination with the world. And to show how, we fired our artists and dried Lake Superior using a server farm to generate what that would look like in Stephen King’s nightmares.’”

Other critical comments were more frank. Filmmaker Joe Russo posted: “TOYS ‘R US released an AI commercial and it fucking sucks.”
