video synthesis


Curated realities: An AI film festival and the future of human expression


We saw 10 AI films and interviewed Runway’s CEO as well as Hollywood pros.

An AI-generated frame of a person looking at an array of television screens

A still from Total Pixel Space, the Grand Prix winner at AIFF 2025.


Last week, I attended a film festival dedicated to shorts made using generative AI. Dubbed AIFF 2025, it was an event precariously balancing between two different worlds.

The festival was hosted by Runway, a company that produces models and tools for generating images and videos. In panels and press briefings, a curated list of industry professionals made the case for Hollywood to embrace AI tools. In private meetings with industry professionals, I gained a strong sense that there is already a widening philosophical divide within the film and television business.

I also interviewed Runway CEO Cristóbal Valenzuela about the tightrope he walks as he pitches his products to an industry that has deeply divided feelings about what role AI will have in its future.

To unpack all this, it makes sense to start with the films, partly because the film that was chosen as the festival’s top prize winner says a lot about the issues at hand.

A festival of oddities and profundities

Since this was the first time the festival has been open to the public, the crowd was a diverse mix: AI tech enthusiasts, working industry creatives, and folks who enjoy movies and who were curious about what they’d see—as well as quite a few people who fit into all three groups.

The scene at the entrance to the theater at AIFF 2025 in Santa Monica, California.

The films shown were all short, and most would be more at home at an art film fest than something more mainstream. Some shorts featured an animated aesthetic (including one inspired by anime) and some presented as live action. There was even a documentary of sorts. The films could be made entirely with Runway or other AI tools, or those tools could simply be a key part of a stack that also includes more traditional filmmaking methods.

Many of these shorts were quite weird. Most of us have seen by now that AI video-generation tools excel at producing surreal and distorted imagery—sometimes whether the person prompting the tool wants that or not. Several of these films leaned into that limitation, treating it as a strength.

Representing that camp was Vallée Duhamel’s Fragments of Nowhere, which visually explored the notion of multiple dimensions bleeding into one another. Cars morphed into the sides of houses, and humanoid figures, purported to be inter-dimensional travelers, moved in ways that defied anatomy. While I found this film visually compelling at times, I wasn’t seeing much in it that I hadn’t already seen from dreamcore or horror AI video TikTok creators like GLUMLOT or SinRostroz in recent years.

More compelling were shorts that used this propensity for oddity to generate imagery that was curated and thematically tied to some aspect of human experience or identity. For example, More Tears Than Harm by Herinarivo Rakotomanana was a rotoscope animation-style “sensory collage of childhood memories” of growing up in Madagascar. Its specificity and consistent styling lent it a credibility that Fragments of Nowhere didn’t achieve. I also enjoyed Riccardo Fusetti’s Editorial on this front.

More Tears Than Harm, an unusual animated film at AIFF 2025.

Among the 10 films in the festival, two clearly stood above the others in my impressions, and they ended up being the Grand Prix and Gold prize winners. (The judging panel included filmmakers Gaspar Noé and Harmony Korine, Tribeca Enterprises CEO Jane Rosenthal, IMAX head of post and image capture Bruce Markoe, Lionsgate VFX SVP Brianna Domont, Nvidia developer relations lead Richard Kerris, and Runway CEO Cristóbal Valenzuela, among others.)

Runner-up Jailbird was the aforementioned quasi-documentary. Directed by Andrew Salter, it was a brief piece that introduced viewers to a program in the UK that places chickens in human prisons as companion animals, to positive effect. Why make that film with AI, you might ask? Well, AI was used to achieve shots that wouldn’t otherwise be doable for a small-budget film to depict the experience from the chicken’s point of view. The crowd loved it.

Jailbird, the runner-up at AIFF 2025.

Then there was the Grand Prix winner, Jacob Adler’s Total Pixel Space, which was, among other things, a philosophical defense of the very idea of AI art. You can watch Total Pixel Space on YouTube right now, unlike some of the other films. I found it strangely moving, even as I saw its selection as the festival’s top winner with some cynicism. Of course they’d pick that one, I thought, although I agreed it was the most interesting of the lot.

Total Pixel Space, the Grand Prix winner at AIFF 2025.

Total Pixel Space

Even though it risked navel-gazing and self-congratulation in this venue, Total Pixel Space was filled with compelling imagery that matched the themes, and it touched on some genuinely interesting ideas—at times, it seemed almost profound, didactic as it was.

“How many images can possibly exist?” the film’s narrator asks. To answer that, the film explains the concept of total pixel space, which actually reflects how image generation tools work:

Pixels are the building blocks of digital images—tiny tiles forming a mosaic. Each pixel is defined by numbers representing color and position. Therefore, any digital image can be represented as a sequence of numbers…

Just as we don’t need to write down every number between zero and one to prove they exist, we don’t need to generate every possible image to prove they exist. Their existence is guaranteed by the mathematics that defines them… Every frame of every possible film exists as coordinates… To deny this would be to deny the existence of numbers themselves.

The nine-minute film demonstrates that the number of possible images or films is greater than the number of atoms in the universe and argues that photographers and filmmakers may be seen as discovering images that already exist in the possibility space rather than creating something new.
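As a rough back-of-the-envelope check on that claim, here is a minimal Python sketch. It assumes 24-bit color (8 bits each for red, green, and blue) and the commonly cited order-of-magnitude estimate of about 10^80 atoms in the observable universe. Neither figure comes from the film, and the function names are made up for illustration.

```python
import math

# Assumption: a widely quoted order-of-magnitude estimate, not a figure from the film.
LOG10_ATOMS_IN_OBSERVABLE_UNIVERSE = 80


def log10_image_count(width: int, height: int, bits_per_pixel: int = 24) -> float:
    """Return log10 of the number of distinct images at this size and bit depth.

    An image with N bits in total can take 2**N distinct values, so we work
    with log10(2**N) = N * log10(2) to avoid building enormous integers.
    """
    total_bits = width * height * bits_per_pixel
    return total_bits * math.log10(2)


def smallest_square_exceeding(log10_target: float, bits_per_pixel: int = 24) -> int:
    """Return the side length of the smallest square image whose number of
    possible variations exceeds 10**log10_target."""
    side = 1
    while log10_image_count(side, side, bits_per_pixel) <= log10_target:
        side += 1
    return side


if __name__ == "__main__":
    side = smallest_square_exceeding(LOG10_ATOMS_IN_OBSERVABLE_UNIVERSE)
    print(f"Smallest square image with more variations than atoms: {side}x{side}")
    print(f"log10 of possible 1080p frames: {log10_image_count(1920, 1080):,.0f}")
```

Under those assumptions, a 4×4-pixel image already has more possible variations than there are atoms in the observable universe, and the count of possible 1080p frames is a number with roughly 15 million digits.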

Within that framework, it’s easy to argue that generative AI is just another way for artists to “discover” images.

The balancing act

“We are all—and I include myself in that group as well—obsessed with technology, and we keep chatting about models and data sets and training and capabilities,” Runway CEO Cristóbal Valenzuela said to me when we spoke the next morning. “But if you look back and take a minute, the festival was celebrating filmmakers and artists.”

I admitted that I found myself moved by Total Pixel Space’s articulations. “The winner would never have thought of himself as a filmmaker, and he made a film that made you feel something,” Valenzuela responded. “I feel that’s very powerful. And the reason he could do it was because he had access to something that just wasn’t possible a couple of months ago.”

First-time and outsider filmmakers were the focus of AIFF 2025, but Runway works with established studios, too—and those relationships have an inherent tension.

The company has signed deals with companies like Lionsgate and AMC Networks. In some cases, it trains on data provided by those companies; in others, it embeds within them to try to develop tools that fit how they already work. That’s not something competitors like OpenAI are doing yet, so that, combined with a head start in video generation, has allowed Runway to grow and stay competitive so far.

“We go directly into the companies, and we have teams of creatives that are working alongside them. We basically embed ourselves within the organizations that we’re working with very deeply,” Valenzuela explained. “We do versions of our film festival internally for teams as well so they can go through the process of making something and seeing the potential.”

Founded in 2018 at New York University’s Tisch School of the Arts by two Chileans and one Greek co-founder, Runway has a very different story than its Silicon Valley competitors. It was one of the first to bring an actually usable video-generation tool to the masses. Runway also contributed in foundational ways to the popular Stable Diffusion model.

Though it is vastly outspent by competitors like OpenAI, it has taken a hands-on approach to working with existing industries. You won’t hear Valenzuela or other Runway leaders talking about the imminence of AGI or anything so lofty; instead, it’s all about selling the product as something that can solve existing problems in creatives’ workflows.

Still, an artist’s mindset and relationships within the industry don’t negate some fundamental conflicts. There are multiple intellectual property cases involving Runway and its peers, and though the company hasn’t admitted it, there is evidence that it trained its models on copyrighted YouTube videos, among other things.

Cristóbal Valenzuela speaking on the AIFF 2025 stage. Credit: Samuel Axon

Valenzuela suggested that studios are worried about liability, not underlying principles, though, saying:

Most of the concerns on copyright are on the output side, which is like, how do you make sure that the model doesn’t create something that already exists or infringes on something. And I think for that, we’ve made sure our models don’t and are supportive of the creative direction you want to take without being too limiting. We work with every major studio, and we offer them indemnification.

In the past, he has also defended Runway by saying that what it’s producing is not a re-creation of what has come before. He sees the tool’s generative process as distinct—legally, creatively, and ethically—from simply pulling up assets or references from a database.

“People believe AI is sort of like a system that creates and conjures things magically with no input from users,” he said. “And it’s not. You have to do that work. You still are involved, and you’re still responsible as a user in terms of how you use it.”

He seemed to share this defense of AI as a legitimate tool for artists with conviction, but given that he’s been pitching these products directly to working filmmakers, he was also clearly aware that not everyone agrees with him. There is not even a consensus among those in the industry.

An industry divided

While in LA for the event, I visited separately with two of my oldest friends. Both of them work in the film and television industry in similar disciplines. They each asked what I was in town for, and I told them I was there to cover an AI film festival.

One immediately responded with a grimace of disgust, “Oh, yikes, I’m sorry.” The other responded with bright eyes and intense interest and began telling me how he already uses AI in his day-to-day to do things like extend shots by a second or two for a better edit, and expressed frustration at his company for not adopting the tools faster.

Neither is alone in their attitudes. Hollywood is divided—and not for the first time.

There have been seismic technological changes in the film industry before. There was the transition from silent films to talkies, obviously; moviemaking transformed into an entirely different art. Numerous old jobs were lost, and numerous new jobs were created.

Later, there was the transition from film to digital projection, which may be an even tighter parallel. It was a major disruption, with some companies and careers collapsing while others rose. There were people saying, “Why do we even need this?” while others believed it was the only sane way forward. Some audiences declared the quality worse, and others said it was better. There were analysts arguing it could be stopped, while others insisted it was inevitable.

IMAX’s head of post production, Bruce Markoe, spoke briefly about that history at a press mixer before the festival. “It was a little scary,” he recalled. “It was a big, fundamental change that we were going through.”

People ultimately embraced it, though. “The motion picture and television industry has always been very technology-forward, and they’ve always used new technologies to advance the state of the art and improve the efficiencies,” Markoe said.

When asked whether he thinks the same thing will happen with generative AI tools, he said, “I think some filmmakers are going to embrace it faster than others.” He pointed to pre-visualization as a particularly valuable use for AI tools and noted that some people are already using them that way, though he said it will take time for people to get comfortable with them.

And indeed, many, many filmmakers are still loudly skeptical. “The concept of AI is great,” The Mitchells vs. the Machines director Mike Rianda said in a Wired interview. “But in the hands of a corporation, it is like a buzzsaw that will destroy us all.”

Others are interested in the technology but are concerned that it’s being brought into the industry too quickly, with insufficient planning and protections. That includes Crafty Apes Senior VFX Supervisor Luke DiTomasso. “How fast do we roll out AI technologies without really having an understanding of them?” he asked in an interview with Production Designers Collective. “There’s a potential for AI to accelerate beyond what we might be comfortable with, so I do have some trepidation and am maybe not gung-ho about all aspects of it.”

Others remain skeptical that the tools will be as useful as some optimists believe. “AI never passed on anything. It loved everything it read. It wants you to win. But storytelling requires nuance—subtext, emotion, what’s left unsaid. That’s something AI simply can’t replicate,” said Alegre Rodriquez, a member of the Emerging Technology committee at the Motion Picture Editors Guild.

The mirror

Flying back from Los Angeles, I considered two key differences between this generative AI inflection point for Hollywood and the silent/talkie or film/digital transitions.

First, neither of those transitions involved an existential threat to the technology on the basis of intellectual property and copyright. Valenzuela talked about what matters to studio heads—protection from liability over the outputs. But the countless creatives who are critical of these tools also believe they should be consulted and even compensated for their work’s use in the training data for Runway’s models. In other words, it’s not just about the outputs, it’s also about the sourcing. As noted before, there are several cases underway. We don’t know where they’ll land yet.

Second, there’s a more cultural and philosophical issue at play, which Valenzuela himself touched on in our conversation.

“I think AI has become this sort of mirror where anyone can project all their fears and anxieties, but also their optimism and ideas of the future,” he told me.

You don’t have to scroll for long to come across techno-utopians declaring with no evidence that AGI is right around the corner and that it will cure cancer and save our society. You also don’t have to scroll long to encounter visceral anger at every generative AI company from people declaring the technology—which is essentially just a new methodology for programming a computer—fundamentally unethical and harmful, with apocalyptic societal and economic ramifications.

Amid all those bold declarations, this film festival put the focus on the on-the-ground reality. First-time filmmakers who might never have previously cleared Hollywood’s gatekeepers are getting screened at festivals because they can create competitive-looking work with a fraction of the crew and hours. Studios and the people who work there are saying they’re saving time, resources, and headaches in pre-viz, editing, visual effects, and other work that’s usually done under immense time and resource pressure.

“People are not paying attention to the very huge amount of positive outcomes of this technology,” Valenzuela told me, pointing to those examples.

In this online discussion ecosystem that elevates outrage above everything else, that’s likely true. Still, there is a sincere and rigorous conviction among many creatives that their work is contributing to this technology’s capabilities without credit or compensation and that the structural and legal frameworks to ensure minimal human harm in this evolving period of disruption are still inadequate. That’s why we’ve seen groups like the Writers Guild of America West support the Generative AI Copyright Disclosure Act and other similar legislation meant to increase transparency about how these models are trained.

The philosophical question with a legal answer

The winning film argued that “total pixel space represents both the ultimate determinism and the ultimate freedom—every possibility existing simultaneously, waiting for consciousness to give it meaning through the act of choice.”

In making this statement, the film suggested that creativity, above all else, is an act of curation. It’s a claim that nothing, truly, is original. It’s a distillation of human expression into the language of mathematics.

To many, that philosophy rings undeniably true: Every possibility already exists, and artists are just collapsing the waveform to the frame they want to reveal. To others, there is more personal truth to the romantic ideal that artwork is valued precisely because it did not exist until the artist produced it.

All this is to say that the debate about creativity and AI in Hollywood is ultimately a philosophical one. But it won’t be resolved that way.

The industry may succumb to litigation fatigue and a hollowed-out workforce—or it may instead find its way to fair deals, new opportunities for fresh voices, and transparent training sets.

For all this lofty talk about creativity and ideas, the outcome will come down to the contracts, court decisions, and compensation structures—all things that have always been at least as big a part of Hollywood as the creative work itself.


Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.


Google’s Will Smith double is better at eating AI spaghetti … but it’s crunchy?

On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. While we saw early steps in AI video generation from 2022 to 2024, each video was silent and usually very short in duration. Now you can hear voices, dialog, and sound effects in eight-second high-definition video clips.

Shortly after the new launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti?

First, a brief recap. The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video using an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later in February 2024.

Here’s what the original viral video looked like:

One thing people forget is that at the time, the Smith example didn’t come from the best AI video generator out there. A video synthesis model called Gen-2 from Runway had already achieved superior results (though it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people’s memories as an early poor example of video synthesis, handy for future comparisons as AI models progressed.

AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week with Veo 3, performing the Smith test and posting the results on X. But as you’ll notice below when you watch, the soundtrack has a curious quality: The faux Smith appears to be crunching on the spaghetti.

On X, Javi Lopez ran “Will Smith eating spaghetti” in Google’s Veo 3 AI video generator and received this result.

It’s a glitch in Veo 3’s experimental ability to apply sound effects to video, likely because the training data used to create Google’s AI models featured many examples of chewing mouths with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you’ll see unusual generation results, such as jabberwockies.
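To make the over-representation idea concrete, here is a deliberately tiny Python sketch. It is a toy frequency counter, not how Veo 3 or any real video model works, and the training pairs, labels, and function names are all invented for illustration. It shows only that a predictor built from skewed examples will reproduce the skew.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Count how often each sound accompanies each visual label."""
    counts = defaultdict(Counter)
    for visual, sound in pairs:
        counts[visual][sound] += 1
    return counts


def predict(counts, visual):
    """Return the most frequent sound seen with this visual, or None if the
    visual never appeared in the training pairs."""
    if visual not in counts:
        return None
    return counts[visual].most_common(1)[0][0]


# Hypothetical training set: crunchy chewing clips far outnumber soft ones.
training_pairs = (
    [("chewing mouth", "crunch")] * 8
    + [("chewing mouth", "soft slurp")] * 2
    + [("footsteps", "thud")] * 5
)

model = train(training_pairs)

if __name__ == "__main__":
    # The majority sound wins, even for food that should not crunch.
    print(predict(model, "chewing mouth"))
    # A concept absent from the training pairs has nothing to draw on.
    print(predict(model, "spaghetti twirl"))
```

In this toy, any chewing mouth gets a crunch because that is what most of the examples contained, while a label the model never saw produces no prediction at all.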


With new Gen-4 model, Runway claims to have finally achieved consistency in AI videos

For example, it was used in producing the sequence in the film Everything Everywhere All At Once where two rocks with googly eyes had a conversation on a cliff, and it has also been used to make visual gags for The Late Show with Stephen Colbert.

Whereas many competing startups were started by AI researchers or Silicon Valley entrepreneurs, Runway was founded in 2018 by art students at New York University’s Tisch School of the Arts: Cristóbal Valenzuela and Alejandro Matamala from Chile, and Anastasis Germanidis from Greece.

It was one of the first companies to release a usable video-generation tool to the public, and its team also contributed in foundational ways to the Stable Diffusion model.

It is vastly outspent by competitors like OpenAI, but while most of its competitors have released general-purpose video-creation tools, Runway has sought an Adobe-like place in the industry. It has focused on marketing to creative professionals like designers and filmmakers and has implemented tools meant to make Runway a support tool to existing creative workflows.

The support tool argument (as opposed to a standalone creative product) helped Runway secure a deal with motion picture company Lionsgate, wherein Lionsgate allowed Runway to legally train its models on its library of films, and Runway provided bespoke tools for Lionsgate for use in production or post-production.

That said, Runway is, along with Midjourney and others, one of the subjects of a widely publicized intellectual property case brought by artists who claim the companies illegally trained their models on their work, so not all creatives are on board.

Apart from the announcement about the partnership with Lionsgate, Runway has never publicly shared what data is used to train its models. However, a report in 404 Media seemed to reveal that at least some of the training data included video scraped from the YouTube channels of popular influencers, film studios, and more.


The AI war between Google and OpenAI has never been more heated

Over the past month, we’ve seen a rapid cadence of notable AI-related announcements and releases from both Google and OpenAI, and it’s been making the AI community’s head spin. It has also poured fuel on the fire of the OpenAI-Google rivalry, an accelerating game of one-upmanship taking place unusually close to the Christmas holiday.

“How are people surviving with the firehose of AI updates that are coming out,” wrote one user last Friday on X, which is still a hotbed of AI-related conversation. “in the last <24 hours we got gemini flash 2.0 and chatGPT with screenshare, deep research, pika 2, sora, chatGPT projects, anthropic clio, wtf it never ends.”

Rumors travel quickly in the AI world, and people in the AI industry had been expecting OpenAI to ship some major products in December. Once OpenAI announced “12 days of OpenAI” earlier this month, Google jumped into gear and seemingly decided to try to one-up its rival on several counts. So far, the strategy appears to be working, but it’s coming at the cost of the rest of the world being able to absorb the implications of the new releases.

“12 Days of OpenAI has turned into like 50 new @GoogleAI releases,” wrote another X user on Monday. “This past week, OpenAI & Google have been releasing at the speed of a new born startup,” wrote a third X user on Tuesday. “Even their own users can’t keep up. Crazy time we’re living in.”

“Somebody told Google that they could just do things,” wrote a16z partner and AI influencer Justine Moore on X, referring to a common motivational meme telling people they “can just do stuff.”

The Google AI rush

OpenAI’s “12 Days of OpenAI” campaign has included releases of its full o1 model, an upgrade from o1-preview, alongside o1-pro for advanced “reasoning” tasks. The company also publicly launched Sora for video generation, added Projects functionality to ChatGPT, introduced Advanced Voice features with video streaming capabilities, and more.


Twirling body horror in gymnastics video exposes AI’s flaws


The slithy toves did gyre and gimble in the wabe

Nonsensical jabberwocky movements created by OpenAI’s Sora are typical for current AI-generated video, and here’s why.

A still image from an AI-generated video of an ever-morphing synthetic gymnast. Credit: OpenAI / Deedy

On Wednesday, a video from OpenAI’s newly launched Sora AI video generator went viral on social media, featuring a gymnast who sprouts extra limbs and briefly loses her head during what appears to be an Olympic-style floor routine.

As it turns out, the nonsensical synthesis errors in the video—what we like to call “jabberwockies”—hint at technical details about how AI video generators work and how they might get better in the future.

But before we dig into the details, let’s take a look at the video.

An AI-generated video of an impossible gymnast, created with OpenAI Sora.

In the video, we see a view of what looks like a floor gymnastics routine. The subject of the video flips and flails as new legs and arms rapidly and fluidly emerge and morph out of her twirling and transforming body. At one point, about 9 seconds in, she loses her head, and it reattaches to her body spontaneously.

“As cool as the new Sora is, gymnastics is still very much the Turing test for AI video,” wrote venture capitalist Deedy Das when he originally shared the video on X. The video inspired plenty of reaction jokes, such as this reply to a similar post on Bluesky: “hi, gymnastics expert here! this is not funny, gymnasts only do this when they’re in extreme distress.”

We reached out to Das, and he confirmed that he generated the video using Sora. He also provided the prompt, which was very long and split into four parts, generated by Anthropic’s Claude, using complex instructions like “The gymnast initiates from the back right corner, taking position with her right foot pointed behind in B-plus stance.”

“I’ve known for the last 6 months having played with text to video models that they struggle with complex physics movements like gymnastics,” Das told us in a conversation. “I had to try it [in Sora] because the character consistency seemed improved. Overall, it was an improvement because previously… the gymnast would just teleport away or change their outfit mid flip, but overall it still looks downright horrifying. We hoped AI video would learn physics by default, but that hasn’t happened yet!”

So what went wrong?

When examining how the video fails, you must first consider how Sora “knows” how to create anything that resembles a gymnastics routine. During the training phase, when the Sora model was created, OpenAI fed example videos of gymnastics routines (among many other types of videos) into a specialized neural network that associates the progression of images with text-based descriptions of them.

That type of training is a distinct phase that happens once before the model’s release. Later, when the finished model is running and you give a video-synthesis model like Sora a written prompt, it draws upon statistical associations between words and images to produce a predictive output. It’s continuously making next-frame predictions based on the last frame of the video. But Sora has another trick for attempting to preserve coherency over time. “By giving the model foresight of many frames at a time,” reads OpenAI’s Sora System Card, “we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.”


A still image from a moment where the AI-generated gymnast loses her head. It soon reattaches to her body. Credit: OpenAI / Deedy

Maybe not quite solved yet. In this case, rapidly moving limbs prove a particular challenge when attempting to predict the next frame properly. The result is an incoherent amalgam of gymnastics footage that shows the same gymnast performing running flips and spins. Sora doesn’t know the correct order in which to assemble them because it’s drawing on statistical averages of wildly different body movements in its relatively limited training data of gymnastics videos. That data also likely did not include limb-level precision in its descriptive metadata.

Sora doesn’t know anything about physics or how the human body should work, either. It’s drawing upon statistical associations between pixels in the videos in its training dataset to predict the next frame, with a little bit of look-ahead to keep things more consistent.

This problem is not unique to Sora. All AI video generators can produce wildly nonsensical results when your prompts reach too far past their training data, as we saw earlier this year when testing Runway’s Gen-3. In fact, we ran some gymnast prompts through the latest open source AI video model that may rival Sora in some ways, Hunyuan Video, and it produced similar twirling, morphing results, seen below. And we used a much simpler prompt than Das did with Sora.

An example from open source Chinese AI model Hunyuan Video with the prompt, “A young woman doing a complex floor gymnastics routine at the olympics, featuring running and flips.”

AI models based on transformer technology are fundamentally imitative in nature. They’re great at transforming one type of data into another type or morphing one style into another. What they’re not great at (yet) is producing coherent generations that are truly original. So if you happen to provide a prompt that closely matches a training video, you might get a good result. Otherwise, you may get madness.

As we wrote about image-synthesis model Stable Diffusion 3’s body horror generations earlier this year, “Basically, any time a user prompt homes in on a concept that isn’t represented well in the AI model’s training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.”

For the engineers who make these models, success in AI video generation quickly becomes a question of how many examples (and how much training) you need before the model can generalize enough to produce convincing and coherent results. It’s also a question of metadata quality—how accurately the videos are labeled. In this case, OpenAI used an AI vision model to describe its training videos, which helped improve quality, but apparently not enough—yet.

We’re looking at an AI jabberwocky in action

In a way, the type of generation failure in the gymnast video is a form of confabulation (or hallucination, as some call it), but it’s even worse because it’s not coherent. So instead of calling it a confabulation, which is a plausible-sounding fabrication, we’re going to lean on a new term, “jabberwocky,” which Dictionary.com defines as “a playful imitation of language consisting of invented, meaningless words; nonsense; gibberish,” taken from Lewis Carroll’s nonsense poem of the same name. Imitation and nonsense, you say? Check and check.

We’ve covered jabberwockies in AI video before with people mocking Chinese video-synthesis models, a monstrously weird AI beer commercial, and even Will Smith eating spaghetti. They’re a form of misconfabulation where an AI model completely fails to produce a plausible output. This will not be the last time we see them, either.

How could AI video models get better and avoid jabberwockies?

In our coverage of Gen-3 Alpha, we called the threshold where you get a level of useful generalization in an AI model the “illusion of understanding,” where training data and training time reach a critical mass that produces good enough results to generalize across enough novel prompts.

One of the key reasons language models like OpenAI’s GPT-4 impressed users was that they finally reached a size where they had absorbed enough information to give the appearance of genuinely understanding the world. With video synthesis, achieving this same apparent level of “understanding” will require not just massive amounts of well-labeled training data but also the computational power to process it effectively.

AI boosters hope that these current models represent a key step on the way to something like truly general intelligence (often called AGI) in text or, in AI video, to what OpenAI and Runway researchers call “world simulators” or “world models” that somehow encode enough physics rules about the world to produce any realistic result.

Judging by the morphing alien shoggoth gymnast, that may still be a ways off. Still, it’s early days in AI video generation, and judging by how quickly AI image-synthesis models like Midjourney progressed from crude abstract shapes into coherent imagery, it’s likely video synthesis will have a similar trajectory over time. Until then, enjoy the AI-generated jabberwocky madness.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Ten months after first tease, OpenAI launches Sora video generation publicly

A music video by Canadian art collective Vallée Duhamel made with Sora-generated video. “[We] just shoot stuff and then use Sora to combine it with a more interesting, more surreal vision.”

During a livestream on Monday, Day 3 of OpenAI’s “12 Days of OpenAI,” Sora’s developers showcased a new “Explore” interface that allows people to browse through videos generated by others to get prompting ideas. OpenAI says that anyone can enjoy viewing the “Explore” feed for free, but generating videos requires a subscription.

They also showed off a new feature called “Storyboard” that allows users to direct a video with multiple actions in a frame-by-frame manner.

Safety measures and limitations

In addition to the release, OpenAI also published Sora’s System Card for the first time. It includes technical details about how the model works and safety testing the company undertook prior to this release.

“Whereas LLMs have text tokens, Sora has visual patches,” OpenAI writes, describing the new training chunks as “an effective representation for models of visual data… At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.”
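As a rough illustration of that description, here is a minimal Python sketch of the two steps OpenAI names: compress the video, then cut the result into spacetime patches. Everything in it is an assumption made for illustration. OpenAI has not published Sora’s encoder or patch sizes, so the hypothetical average_pool_2x2 function merely stands in for the learned compression step, and the 2×2×2 patch size is arbitrary.

```python
# Toy illustration only: OpenAI has not published Sora's encoder or patch sizes.
# average_pool_2x2 is a hypothetical stand-in for the learned compression step.

def average_pool_2x2(frame):
    """Halve a frame's height and width by averaging each 2x2 block."""
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def to_spacetime_patches(latent, pt, ph, pw):
    """Cut a [time][height][width] grid into flattened pt x ph x pw blocks."""
    frames, height, width = len(latent), len(latent[0]), len(latent[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("patch size must divide the latent dimensions evenly")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Each patch spans several frames, so it covers space and time.
                patches.append([
                    latent[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


# A fake grayscale "video": 4 frames of 8x8 pixels, each pixel a unique number.
video = [[[t * 64 + y * 8 + x for x in range(8)] for y in range(8)] for t in range(4)]

latent = [average_pool_2x2(frame) for frame in video]  # 4 frames of 4x4 values
patches = to_spacetime_patches(latent, pt=2, ph=2, pw=2)

print(len(patches), "patches of", len(patches[0]), "values each")
```

Each flattened patch plays the role a text token plays in a language model: a fixed-size chunk the model can treat as one unit in a sequence. In this toy version, the 4-frame clip becomes eight patches of eight values each.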

Sora also makes use of a “recaptioning technique,” similar to that seen in the company’s DALL-E 3 image generation, to “generate highly descriptive captions for the visual training data.” That, in turn, lets Sora “follow the user’s text instructions in the generated video more faithfully,” OpenAI writes.

Sora-generated video provided by OpenAI, from the prompt: “Loop: a golden retriever puppy wearing a superhero outfit complete with a mask and cape stands perched on the top of the empire state building in winter, overlooking the nyc it protects at night. the back of the pup is visible to the camera; his attention faced to nyc”

OpenAI implemented several safety measures in the release. The platform embeds C2PA metadata in all generated videos for identification and origin verification. Videos display visible watermarks by default, and OpenAI developed an internal search tool to verify Sora-generated content.

The company acknowledged technical limitations in the current release. “This early version of Sora will make mistakes, it’s not perfect,” said one developer during the livestream launch. The model reportedly struggles with physics simulations and complex actions over extended durations.

In the past, we’ve seen that these types of limitations are based on what example videos were used to train AI models. This current generation of AI video-synthesis models has difficulty generating truly new things, since the underlying architecture excels at transforming existing concepts into new presentations, but so far typically fails at true originality. Still, it’s early in AI video generation, and the technology is improving all the time.

AI-generated shows could replace lost DVD revenue, Ben Affleck says

Last week, actor and director Ben Affleck shared his views on AI’s role in filmmaking during the 2024 CNBC Delivering Alpha investor summit, arguing that AI models will transform visual effects but won’t replace creative filmmaking anytime soon. A video clip of Affleck’s opinion began circulating widely on social media not long after.

“Didn’t expect Ben Affleck to have the most articulate and realistic explanation where video models and Hollywood is going,” wrote one X user.

In the clip, Affleck spoke of current AI models’ abilities as imitators and conceptual translators—mimics that are typically better at translating one style into another instead of originating deeply creative material.

“AI can write excellent imitative verse, but it cannot write Shakespeare,” Affleck told CNBC’s David Faber. “The function of having two, three, or four actors in a room and the taste to discern and construct that entirely eludes AI’s capability.”

Affleck sees AI models as “craftsmen” rather than artists (although some might find the term “craftsman” in his analogy somewhat imprecise). He explained that while AI can learn through imitation—like a craftsman studying furniture-making techniques—it lacks the creative judgment that defines artistry. “Craftsman is knowing how to work. Art is knowing when to stop,” he said.

“It’s not going to replace human beings making films,” Affleck stated. Instead, he sees AI taking over “the more laborious, less creative and more costly aspects of filmmaking,” which could lower barriers to entry and make it easier for emerging filmmakers to create movies like Good Will Hunting.

Films will become dramatically cheaper to make

While it may seem on the surface that Affleck was attacking generative AI capabilities in the tech industry, he did not deny the impact AI may have on filmmaking. For example, he predicted that AI would reduce costs and speed up production schedules, potentially allowing shows like HBO’s House of the Dragon to release two seasons in the time it currently takes to make one.

New Zemeckis film used AI to de-age Tom Hanks and Robin Wright

On Friday, TriStar Pictures released Here, a $50 million Robert Zemeckis-directed film that used real-time generative AI face transformation techniques to portray actors Tom Hanks and Robin Wright across a 60-year span, marking one of Hollywood’s first full-length features built around AI-powered visual effects.

The film adapts a 2014 graphic novel set primarily in a New Jersey living room across multiple time periods. Rather than cast different actors for various ages, the production used AI to modify Hanks’ and Wright’s appearances throughout.

The de-aging technology comes from Metaphysic, a visual effects company that creates real-time face-swapping and aging effects. During filming, the crew watched two monitors simultaneously: one showing the actors’ actual appearances and another displaying them at whatever age the scene required.

Here – Official Trailer (HD)

Metaphysic developed the facial modification system by training custom machine-learning models on frames of Hanks’ and Wright’s previous films. This included a large dataset of facial movements, skin textures, and appearances under varied lighting conditions and camera angles. The resulting models can generate instant face transformations without the months of manual post-production work traditional CGI requires.

Unlike previous aging effects that relied on frame-by-frame manipulation, Metaphysic’s approach generates transformations instantly by analyzing facial landmarks and mapping them to trained age variations.

“You couldn’t have made this movie three years ago,” Zemeckis told The New York Times in a detailed feature about the film. Traditional visual effects for this level of face modification would reportedly require hundreds of artists and a substantially larger budget closer to standard Marvel movie costs.

This isn’t the first film that has used AI techniques to de-age actors. ILM’s approach to de-aging Harrison Ford in 2023’s Indiana Jones and the Dial of Destiny used a proprietary system called Flux with infrared cameras to capture facial data during filming, then used old images of Ford to de-age him in post-production. By contrast, Metaphysic’s AI models process transformations without additional hardware and show results during filming.

Adobe unveils AI video generator trained on licensed content

On Monday, Adobe announced Firefly Video Model, a new AI-powered text-to-video generation tool that can create novel videos from written prompts. It joins similar offerings from OpenAI, Runway, Google, and Meta in an increasingly crowded field. Unlike the competition, Adobe claims that Firefly Video Model is trained exclusively on licensed content, potentially sidestepping ethical and copyright issues that have plagued other generative AI tools.

Because of its licensed training data roots, Adobe calls Firefly Video Model “the first publicly available video model designed to be commercially safe.” However, the San Jose, California-based software firm hasn’t announced a general release date, and during a beta test period, it’s only granting access to people on a waiting list.

An example video of Adobe’s Firefly Video Model, provided by Adobe.

In the works since at least April 2023, the new model builds on techniques Adobe developed for its Firefly image synthesis models. As with its text-to-image generator, which the company later integrated into Photoshop, Adobe hopes to aim Firefly Video Model at media professionals, such as video creators and editors. The company claims its model can produce footage that blends seamlessly with traditionally created video content.

Is China pulling ahead in AI video synthesis? We put Minimax to the test

In the spirit of not cherry-picking any results, everything you see was the first generation we received for the prompt listed above it.

“A highly intelligent person reading ‘Ars Technica’ on their computer when the screen explodes”

“A cat in a car drinking a can of beer, beer commercial”

“Will Smith eating spaghetti”

“Robotic humanoid animals with vaudeville costumes roam the streets collecting protection money in tokens”

“A basketball player in a haunted passenger train car with a basketball court, and he is playing against a team of ghosts”

“A herd of one million cats running on a hillside, aerial view”

“Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy”

“A muscular barbarian breaking a CRT television set with a weapon, cinematic, 8K, studio lighting”

Limitations of video synthesis models

Overall, the Minimax video-01 results seen above feel fairly similar to Gen-3’s outputs, with some differences, like the lack of a celebrity filter on Will Smith (who sadly did not actually eat the spaghetti in our tests), and the more realistic cat hands and licking motion. Some results were far worse, like the one million cats and the Ars Technica reader.

Meta’s new “Movie Gen” AI system can deepfake video from a single photo

On Friday, Meta announced a preview of Movie Gen, a new suite of AI models designed to create and manipulate video, audio, and images, including creating a realistic video from a single photo of a person. The company claims the models outperform other video-synthesis models when evaluated by humans, pushing us closer to a future where anyone can synthesize a full video of any subject on demand.

The company does not yet have plans for when or how it will release these capabilities to the public, but Meta says Movie Gen is a tool that may allow people to “enhance their inherent creativity” rather than replace human artists and animators. The company envisions future applications such as easily creating and editing “day in the life” videos for social media platforms or generating personalized animated birthday greetings.

Movie Gen builds on Meta’s previous work in video synthesis, following 2022’s Make-A-Scene video generator and the Emu image-synthesis model. Using text prompts for guidance, this latest system can generate custom videos with sounds for the first time, edit and insert changes into existing videos, and transform images of people into realistic personalized videos.

An AI-generated video of a baby hippo swimming around, created with Meta Movie Gen.

Meta isn’t the only game in town when it comes to AI video synthesis. Google showed off a new model called “Veo” in May, and Meta says that in human preference tests, its Movie Gen outputs beat OpenAI’s Sora, Runway Gen-3, and Chinese video model Kling.

Movie Gen’s video-generation model can create 1080p high-definition videos up to 16 seconds long at 16 frames per second from text descriptions or an image input. Meta claims the model can handle complex concepts like object motion, subject-object interactions, and camera movements.
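To put those numbers in perspective, a quick back-of-the-envelope calculation shows how much raw output a maximum-length clip represents. The duration and frame rate come from Meta’s stated figures; the 1920×1080 pixel grid and three color channels per pixel are our assumptions about what “1080p” means here, not something Meta specifies in this context.

```python
# Figures from the article: 1080p, up to 16 seconds, 16 frames per second.
# Assumptions (not stated by Meta here): 1920x1080 pixels, 3 color channels.
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3
SECONDS, FPS = 16, 16

frames = SECONDS * FPS
values_per_frame = WIDTH * HEIGHT * CHANNELS
total_values = frames * values_per_frame

print(f"{frames} frames, {total_values:,} raw color values")
```

Under those assumptions, a full-length clip works out to 256 frames and roughly 1.6 billion individual color values, all of which have to stay consistent with one another for the scene to look coherent.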

AI-generated video from Meta Movie Gen with the prompt: “A ghost in a white bedsheet faces a mirror. The ghost’s reflection can be seen in the mirror. The ghost is in a dusty attic, filled with old beams, cloth-covered furniture. The attic is reflected in the mirror. The light is cool and natural. The ghost dances in front of the mirror.”

Even so, as we’ve seen with previous AI video generators, Movie Gen’s ability to generate coherent scenes on a particular topic is likely dependent on the concepts found in the example videos that Meta used to train its video-synthesis model. It’s worth keeping in mind that cherry-picked results from video generators often differ dramatically from typical results, and getting a coherent result may require lots of trial and error.

Terminator’s Cameron joins AI company behind controversial image generator

a net in the sky —

Famed sci-fi director joins board of embattled Stability AI, creator of Stable Diffusion.

A photo of filmmaker James Cameron.

Filmmaker James Cameron.

On Tuesday, Stability AI announced that renowned filmmaker James Cameron—of Terminator and Skynet fame—has joined its board of directors. Stability is best known for its pioneering but highly controversial Stable Diffusion series of AI image-synthesis models, first launched in 2022, which can generate images based on text descriptions.

“I’ve spent my career seeking out emerging technologies that push the very boundaries of what’s possible, all in the service of telling incredible stories,” said Cameron in a statement. “I was at the forefront of CGI over three decades ago, and I’ve stayed on the cutting edge since. Now, the intersection of generative AI and CGI image creation is the next wave.”

Cameron is perhaps best known as the director behind blockbusters like Avatar, Titanic, and Aliens, but in AI circles, he may be most relevant for the co-creation of the character Skynet, a fictional AI system that triggers nuclear Armageddon and dominates humanity in the Terminator media franchise. Similar fears of AI taking over the world have since jumped into reality and recently sparked attempts to regulate existential risk from AI systems through measures like SB-1047 in California.

In a 2023 interview with CTV News, Cameron referenced The Terminator’s release year when asked about AI’s dangers: “I warned you guys in 1984, and you didn’t listen,” he said. “I think the weaponization of AI is the biggest danger. I think that we will get into the equivalent of a nuclear arms race with AI, and if we don’t build it, the other guys are for sure going to build it, and so then it’ll escalate.”

Hollywood goes AI

Of course, Stability AI isn’t building weapons controlled by AI. Instead, Cameron’s interest in cutting-edge filmmaking techniques apparently drew him to the company.

“James Cameron lives in the future and waits for the rest of us to catch up,” said Stability CEO Prem Akkaraju. “Stability AI’s mission is to transform visual media for the next century by giving creators a full stack AI pipeline to bring their ideas to life. We have an unmatched advantage to achieve this goal with a technological and creative visionary like James at the highest levels of our company. This is not only a monumental statement for Stability AI, but the AI industry overall.”

Cameron joins other recent additions to Stability AI’s board, including Sean Parker, former president of Facebook, who serves as executive chairman. Parker called Cameron’s appointment “the start of a new chapter” for the company.

Despite significant protest from actors’ unions last year, elements of Hollywood are seemingly beginning to embrace generative AI over time. Last Wednesday, we covered a deal between Lionsgate and AI video-generation company Runway that will see the creation of a custom AI model for film production use. In March, the Financial Times reported that OpenAI was actively showing off its Sora video synthesis model to studio executives.

Unstable times for Stability AI

Cameron’s appointment to the Stability AI board comes during a tumultuous period for the company. Stability AI has faced a series of challenges this past year, including an ongoing class-action copyright lawsuit, a troubled Stable Diffusion 3 model launch, significant leadership and staff changes, and ongoing financial concerns.

In March, founder and CEO Emad Mostaque resigned, followed by a round of layoffs. This came on the heels of the departure of three key engineers (Robin Rombach, Andreas Blattmann, and Dominik Lorenz), who have since founded Black Forest Labs and released a new open-weights image-synthesis model called Flux, which has begun to take over the r/StableDiffusion community on Reddit.

Despite the issues, Stability AI claims its models are widely used, with Stable Diffusion reportedly surpassing 150 million downloads. The company states that thousands of businesses use its models in their creative workflows.

While Stable Diffusion has indeed spawned a large community of open-weights-AI image enthusiasts online, it has also been a lightning rod for controversy among some artists because Stability originally trained its models on hundreds of millions of images scraped from the Internet without seeking licenses or permission to use them.

Apparently that association is not a concern for Cameron, according to his statement: “The convergence of these two totally different engines of creation [CGI and generative AI] will unlock new ways for artists to tell stories in ways we could have never imagined. Stability AI is poised to lead this transformation.”
