video generator

ByteDance backpedals after Seedance 2.0 turned Hollywood icons into AI “clip art”


Misstep or marketing tactic?

Hollywood backlash puts spotlight on ByteDance’s sketchy launch of Seedance 2.0.

ByteDance says that it’s rushing to add safeguards to block Seedance 2.0 from generating iconic characters and deepfaking celebrities, following substantial Hollywood backlash to the launch of the latest version of its AI video tool.

The changes come after Disney and Paramount Skydance sent cease-and-desist letters to ByteDance urging the Chinese company to promptly end the allegedly vast and blatant infringement.

Studios claimed the infringement was widespread and immediate, with Seedance 2.0 users across social media sharing AI videos featuring copyrighted characters like Spider-Man, Darth Vader, and SpongeBob SquarePants. In its letter, Disney fumed that Seedance was “hijacking” its characters, accusing ByteDance of treating Disney characters like they were “free public domain clip art,” Axios reported.

“ByteDance’s virtual smash-and-grab of Disney’s IP is willful, pervasive, and totally unacceptable,” Disney’s letter said.

Defending intellectual property from franchises like Star Trek and The Godfather, Paramount Skydance pointed out that Seedance’s outputs are “often indistinguishable, both visually and audibly” from the original characters, Variety reported. Similarly frustrated, Japan’s AI minister, Kimi Onoda, sought to protect popular anime and manga characters, officially launching a probe last week into ByteDance over the copyright violations, the South China Morning Post reported.

“We cannot overlook a situation in which content is being used without the copyright holder’s permission,” Onoda said at a press conference Friday.

Facing legal threats and Japan’s investigation, ByteDance issued a statement Monday, CNBC reported. In it, the company claimed that it “respects intellectual property rights” and has “heard the concerns regarding Seedance 2.0.”

“We are taking steps to strengthen current safeguards as we work to prevent the unauthorized use of intellectual property and likeness by users,” ByteDance said.

However, Disney seems unlikely to accept that ByteDance inadvertently released its tool without implementing such safeguards in advance. In its letter, Disney alleged that “Seedance has infringed on Disney’s copyrighted materials to benefit its commercial service without permission.”

After all, what better way to illustrate Seedance 2.0’s latest features than by generating some of the best-known IP in the world? At least one tech consultant has suggested that ByteDance planned to benefit from inciting Hollywood outrage. The founder of San Francisco-based consultancy Tech Buzz China, Rui Ma, told SCMP that “the controversy surrounding Seedance is likely part of ByteDance’s initial distribution strategy to showcase its underlying technical capabilities.”

Seedance 2.0 is an “attack” on creators

Studios aren’t the only ones sounding alarms.

Several industry groups expressed concerns, including the Motion Picture Association, which accused ByteDance of engaging in massive copyright infringement within “a single day,” CNBC reported.

Sean Astin, an actor and president of the actors union, SAG-AFTRA, was directly impacted by the scandal. A video that has since been removed from X showed Astin in the role of Samwise Gamgee from The Lord of the Rings, delivering a line he never said, Variety reported. Condemning Seedance’s infringement, SAG-AFTRA issued a statement emphasizing that ByteDance did not act responsibly in releasing the model without safeguards:

“SAG-AFTRA stands with the studios in condemning the blatant infringement enabled by ByteDance’s new AI video model Seedance 2.0. The infringement includes the unauthorized use of our members’ voices and likenesses. This is unacceptable and undercuts the ability of human talent to earn a livelihood. Seedance 2.0 disregards law, ethics, industry standards and basic principles of consent. Responsible AI development demands responsibility, and that is nonexistent here.”

Echoing that, a group representing Hollywood creators, the Human Artistry Campaign, declared that “the launch of Seedance 2.0” was “an attack on every creator around the world.”

“Stealing human creators’ work in an attempt to replace them with AI generated slop is destructive to our culture: stealing isn’t innovation,” the group said. “These unauthorized deepfakes and voice clones of actors violate the most basic aspects of personal autonomy and should be deeply concerning to everyone. Authorities should use every legal tool at their disposal to stop this wholesale theft.”

Ars could not immediately reach any of these groups to comment on whether ByteDance’s post-launch efforts to add safeguards addressed industry concerns.

MPA chairman and CEO Charles Rivkin has previously accused ByteDance of disregarding “well-established copyright law that protects the rights of creators and underpins millions of American jobs.”

While Disney and other studios are clearly ready to take down any tools that could hurt their revenue or reputation without an agreement in place, they aren’t opposed to all AI uses of their characters. In December, Disney struck a deal with OpenAI, giving Sora access to 200 characters for three years, while investing $1 billion in the technology.

At that time, Disney CEO Robert A. Iger said that “the rapid advancement of artificial intelligence marks an important moment for our industry, and through this collaboration with OpenAI, we will thoughtfully and responsibly extend the reach of our storytelling through generative AI, while respecting and protecting creators and their works.”

Creators disagree Seedance 2.0 is a game changer

In a blog announcing Seedance 2.0, ByteDance boasted that the new model “delivers a substantial leap in generation quality,” particularly in close-up shots and action sequences.

The company acknowledged that further refinements were needed and the model is “still far from perfect” but hyped that “its generated videos possess a distinct cinematic aesthetic; the textures of objects, lighting, and composition, as well as costume, makeup, and prop designs, all show high degrees of finish.”

ByteDance likely hoped that the earliest outputs from Seedance 2.0 would generate headlines marveling at the model’s capabilities, and it got what it wanted when a single Hollywood stakeholder’s social media comment went viral.

Shortly after Seedance 2.0’s rollout, Deadpool co-writer, Rhett Reese, declared on X that “it’s likely over for us,” The Guardian reported. The screenwriter was impressed by an AI video created by Irish director Ruairi Robinson, which realistically depicted Tom Cruise fighting Brad Pitt. “[I]n next to no time, one person is going to be able to sit at a computer and create a movie indistinguishable from what Hollywood now releases,” Reese opined. “True, if that person is no good, it will suck. But if that person possesses Christopher Nolan’s talent and taste (and someone like that will rapidly come along), it will be tremendous.”

However, some AI critics rejected the notion that Seedance 2.0 is capable of replacing artists in the way that Reese warned. On Bluesky and X, they pushed back on ByteDance claims that this model doomed Hollywood, with some accusing outlets of too quickly ascribing Reese’s reaction to the whole industry.

Among them was longtime AI critic, Reid Southen, a film concept artist who works on major motion pictures and TV. Responding directly to Reese’s X thread, Southen contradicted the notion that a great filmmaker could be born from fiddling with AI prompts alone.

“Nolan is capable of doing great work because he’s put in the work,” Southen said. “AI is an automation tool, it’s literally removing key, fundamental work from the process, how does one become good at anything if they insist on using nothing but shortcuts?”

Perhaps the strongest evidence in Southen’s favor is Darren Aronofsky’s recent AI-generated historical docudrama. Speaking anonymously to Ars following backlash declaring that “AI slop is ruining American history,” one source close to production on that project confirmed that it took “weeks” to produce minutes of usable video using a variety of AI tools.

That source noted that the creative team went into the project expecting they had a lot to learn but also expecting that tools would continue to evolve, as could audience reactions to AI-assisted movies.

“It’s a huge experiment, really,” the source told Ars.

Notably, for both creators and rights-holders concerned about copyright infringement and career threats, questions remain about how Seedance 2.0 was trained. ByteDance has yet to release a technical report for Seedance 2.0 and “has never disclosed the data sets it uses to train its powerful video-generation Seedance models and image-generation Seedream models,” SCMP reported.

Ashley Belanger is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI collapses media reality with Sora, a photorealistic AI video generator

Pics and it didn’t happen

Hello, cultural singularity—soon, every video you see online could be completely fake.

Snapshots from three videos generated using OpenAI's Sora.

On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it’s only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It’s also freaking people out.

“It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them,” wrote Wall Street Journal tech reporter Joanna Stern on X.

“This could be the ‘holy shit’ moment of AI,” wrote Tom Warren of The Verge.

“Every single one of these videos is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.

For future reference—since this type of panic will some day appear ridiculous—there’s a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren’t perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.

The prompt that generated the video above: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we’re seeing now seemed like a distant fantasy to most people.

In that piece, I called the moment that truth and fiction in media become indistinguishable the “cultural singularity.” It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.

Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.

OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the “worst” AI-generated video is ever going to look. There’s no synchronized sound yet, but that might be solved in future models.

How (we think) they pulled it off

AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta’s Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn’t seem to matter.

Sora (which means “sky” in Japanese) appears to be something altogether different. It’s high-resolution (1920×1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?

OpenAI doesn’t usually share insider technical details with the press, so we’re left to speculate based on theories from experts and information given to the public.

OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. It starts with noise and “gradually transforms it by removing the noise over many steps,” the company explains. It “recognizes” objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerges.
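To make the "start with noise, remove it over many steps" idea concrete, here is a toy sketch in Python. It is not Sora's method: a real diffusion model uses a trained neural network, guided by the text prompt, to estimate the noise at each step. This sketch cheats by using an "oracle" that already knows the clean signal. The function name `toy_denoise` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and remove a little of it at each step.

    ASSUMPTION: the noise estimate below is an oracle that peeks at
    `target`. A real diffusion model would use a learned network here.
    """
    rng = random.Random(seed)
    # Begin with Gaussian noise, one value per "pixel" of the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]

    for step in range(steps):
        # Stand-in for the learned noise prediction.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a growing fraction so the last step clears what is left.
        fraction = 1.0 / (steps - step)
        x = [xi - fraction * ni for xi, ni in zip(x, predicted_noise)]
    return x


clean_signal = [0.0, 0.5, 1.0, 0.5]
result = toy_denoise(clean_signal)
```

After the final step, `result` matches `clean_signal` to within floating-point error. The point is only the shape of the loop: many small denoising steps rather than one big jump.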

Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model “foresight” of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.

OpenAI represents video as collections of smaller groups of data called “patches,” which the company says are similar to tokens (fragments of a word) in GPT-4. “By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios,” the company writes.
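As a rough illustration of the "patches" idea, the sketch below cuts a tiny video into fixed-size blocks spanning a few frames and a small pixel region, then flattens each block into one token-like vector. It works on raw pixel values for simplicity, which is an assumption: the article does not describe how OpenAI actually computes its patches. The name `video_to_patches` and the patch sizes are hypothetical.

```python
def video_to_patches(video, pt, ph, pw):
    """Split video[frame][row][col] into flattened spacetime patches.

    pt, ph, pw are the patch sizes in frames, rows, and columns
    (illustrative values only, not taken from OpenAI).
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Flatten one block of pixels into a single vector.
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


# Two clips with different durations and resolutions.
long_clip = [[[t * 100 + y * 10 + x for x in range(6)] for y in range(4)] for t in range(4)]
short_clip = [[[0 for _ in range(3)] for _ in range(2)] for _ in range(2)]

long_tokens = video_to_patches(long_clip, pt=2, ph=2, pw=3)    # 8 patches
short_tokens = video_to_patches(short_clip, pt=2, ph=2, pw=3)  # 1 patch
```

Both clips produce patches of the same length (12 values here) and differ only in how many patches they yield. That is one way to read OpenAI's claim that a unified representation lets one model train on videos of varying durations, resolutions, and aspect ratios.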

An important tool in OpenAI’s bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions, generated by another AI model like GPT-4V, that describe scenes in the training data. And the company is not stopping here. “Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI writes, “a capability we believe will be an important milestone for achieving AGI.”

One question on many people’s minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it’s possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia’s Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, “I won’t be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!” Until confirmed by OpenAI, however, that’s just speculation.
