image synthesis

Google’s latest AI video generator can render cute animals in implausible situations

An elephant with a party hat—underwater —

Lumiere generates five-second videos that “portray realistic, diverse and coherent motion.”

Still images of AI-generated video examples provided by Google for its Lumiere video synthesis model.

On Tuesday, Google announced Lumiere, an AI video generator that it calls “a space-time diffusion model for realistic video generation” in the accompanying preprint paper. But let’s not kid ourselves: It does a great job at creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.

According to Google, Lumiere uses a unique architecture to generate a video’s entire temporal duration in one go. Or, as the company put it, “We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution—an approach that inherently makes global temporal consistency difficult to achieve.”

In layperson terms, Google’s tech is designed to handle both the space (where things are in the video) and time (how things move and change throughout the video) aspects simultaneously. So, instead of making a video by putting together many small parts or frames, it can create the entire video, from start to finish, in one smooth process.

The official promotional video accompanying the paper “Lumiere: A Space-Time Diffusion Model for Video Generation,” released by Google.

Lumiere can also do plenty of party tricks, which are laid out quite well with examples on Google’s demo page. For example, it can perform text-to-video generation (turning a written prompt into a video), convert still images into videos, generate videos in specific styles using a reference image, apply consistent video editing using text-based prompts, create cinemagraphs by animating specific regions of an image, and offer video inpainting capabilities (for example, it can change the type of dress a person is wearing).

In the Lumiere research paper, the Google researchers state that the AI model outputs five-second long 1024×1024 pixel videos, which they describe as “low-resolution.” Despite those limitations, the researchers performed a user study and claim that Lumiere’s outputs were preferred over existing AI video synthesis models.

As for training data, Google doesn’t say where it got the videos it fed into Lumiere, writing, “We train our T2V [text to video] model on a dataset containing 30M videos along with their text caption. [sic] The videos are 80 frames long at 16 fps (5 seconds). The base model is trained at 128×128.”
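The clip length in that quote follows directly from the frame count and frame rate. A quick check in Python, where the variable names are illustrative and not taken from the paper, and the position count ignores color channels:

```python
# Figures quoted from the Lumiere paper: 80 frames at 16 fps, 128x128 base model.
frames_per_clip = 80
frames_per_second = 16
base_width = base_height = 128

# Duration in seconds is the frame count divided by the frame rate.
clip_seconds = frames_per_clip / frames_per_second
print(clip_seconds)  # 5.0

# Spatial positions the base model covers across all frames of one clip
# (pixel locations only; color channels are not counted here).
positions_per_clip = frames_per_clip * base_width * base_height
print(positions_per_clip)  # 1310720
```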

A block diagram showing components of the Lumiere AI model, provided by Google.

AI-generated video is still in a primitive state, but it’s been progressing in quality over the past two years. In October 2022, we covered Google’s first publicly unveiled video synthesis model, Imagen Video. It could generate short 1280×768 video clips from a written prompt at 24 frames per second, but the results weren’t always coherent. Before that, Meta debuted its AI video generator, Make-A-Video. In June of last year, Runway’s Gen-2 video synthesis model enabled the creation of two-second video clips from text prompts, fueling the creation of surrealistic parody commercials. And in November, we covered Stable Video Diffusion, which can generate short clips from still images.

AI companies often demonstrate video generators with cute animals because generating coherent, non-deformed humans is currently difficult—especially since we, as humans (you are human, right?), are adept at noticing any flaws in human bodies or how they move. Just look at AI-generated Will Smith eating spaghetti.

Judging by Google’s examples (and not having used it ourselves), Lumiere appears to surpass these other AI video generation models. But since Google tends to keep its AI research models close to its chest, we’re not sure when, if ever, the public may have a chance to try it for themselves.

As always, whenever we see text-to-video synthesis models getting more capable, we can’t help but think of the future implications for our Internet-connected society, which is centered around sharing media artifacts—and the general presumption that “realistic” video typically represents real objects in real situations captured by a camera. Future video synthesis tools more capable than Lumiere will make deceptive deepfakes trivially easy to create.

To that end, in the “Societal Impact” section of the Lumiere paper, the researchers write, “Our primary goal in this work is to enable novice users to generate visual content in an creative and flexible way. [sic] However, there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure a safe and fair use.”

As 2024 election looms, OpenAI says it is taking steps to prevent AI abuse

Don’t Rock the vote —

ChatGPT maker plans transparency for gen AI content and improved access to voting info.

A pixelated photo of Donald Trump.

On Monday, ChatGPT maker OpenAI detailed its plans to prevent the misuse of its AI technologies during the upcoming elections in 2024, promising transparency in AI-generated content and enhancing access to reliable voting information. The AI developer says it is working on an approach that involves policy enforcement, collaboration with partners, and the development of new tools aimed at classifying AI-generated media.

“As we prepare for elections in 2024 across the world’s largest democracies, our approach is to continue our platform safety work by elevating accurate voting information, enforcing measured policies, and improving transparency,” writes OpenAI in its blog post. “Protecting the integrity of elections requires collaboration from every corner of the democratic process, and we want to make sure our technology is not used in a way that could undermine this process.”

Initiatives proposed by OpenAI include preventing abuse by means such as deepfakes or bots imitating candidates, refining usage policies, and launching a reporting system for the public to flag potential abuses. For example, OpenAI’s image generation tool, DALL-E 3, includes built-in filters that reject requests to create images of real people, including politicians. “For years, we’ve been iterating on tools to improve factual accuracy, reduce bias, and decline certain requests,” the company stated.

OpenAI says it regularly updates its Usage Policies for ChatGPT and its API products to prevent misuse, especially in the context of elections. The organization has implemented restrictions on using its technologies for political campaigning and lobbying until it better understands the potential for personalized persuasion. Also, OpenAI prohibits creating chatbots that impersonate real individuals or institutions and disallows the development of applications that could deter people from “participation in democratic processes.” Users can report GPTs that may violate the rules.

OpenAI claims to be proactively engaged in detailed strategies to safeguard its technologies against misuse. According to its statements, this includes red-teaming new systems to anticipate challenges, engaging with users and partners for feedback, and implementing robust safety mitigations. OpenAI asserts that these efforts are integral to its mission of continually refining AI tools for improved accuracy, reduced biases, and responsible handling of sensitive requests.

Regarding transparency, OpenAI says it is advancing its efforts in classifying image provenance. The company plans to embed digital credentials, using cryptographic techniques, into images produced by DALL-E 3 as part of its adoption of standards by the Coalition for Content Provenance and Authenticity. Additionally, OpenAI says it is testing a tool designed to identify DALL-E-generated images.
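To illustrate the general idea of cryptographically binding a credential to image data, here is a minimal Python sketch. It is not the C2PA manifest format and not OpenAI’s implementation: it substitutes a shared-secret HMAC from the standard library for the public-key signatures that provenance standards rely on, and `SIGNING_KEY`, `sign_image`, and `verify_image` are hypothetical names chosen for this example.

```python
import hashlib
import hmac

# Hypothetical signing key; a real scheme would use a private key and certificate.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_image(image_bytes: bytes) -> str:
    """Return a hex tag that binds the key holder to these exact bytes."""
    return hmac.new(SIGNING_KEY, image_bytes, hashlib.sha256).hexdigest()


def verify_image(image_bytes: bytes, tag: str) -> bool:
    """Check the tag against the bytes using a constant-time comparison."""
    return hmac.compare_digest(sign_image(image_bytes), tag)


# Placeholder bytes standing in for a generated image file.
original = b"\x89PNG placeholder image bytes"
tag = sign_image(original)

print(verify_image(original, tag))              # True
print(verify_image(original + b"edited", tag))  # False
```

Any change to the image bytes invalidates the tag, which is the property that lets a viewer detect tampering after the credential was attached.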

In an effort to connect users with authoritative information, particularly concerning voting procedures, OpenAI says it has partnered with the National Association of Secretaries of State (NASS) in the United States. ChatGPT will direct users to CanIVote.org for verified US voting information.

“We want to make sure that our AI systems are built, deployed, and used safely,” writes OpenAI. “Like any new technology, these tools come with benefits and challenges. They are also unprecedented, and we will keep evolving our approach as we learn more about how our tools are used.”

How much detail is too much? Midjourney v6 attempts to find out

An AI-generated image of a “Beautiful queen of the universe looking at the camera in sci-fi armor, snow and particles flowing, fire in the background” created using alpha Midjourney v6.

Midjourney

In December, just before Christmas, Midjourney launched an alpha version of its latest image synthesis model, Midjourney v6. Over winter break, Midjourney fans put the new AI model through its paces, with the results shared on social media. So far, fans have noted much more detail than v5.2 (the current default) and a different approach to prompting. Version 6 can also handle generating text in a rudimentary way, but it’s far from perfect.

“It’s definitely a crazy update, both in good and less good ways,” artist Julie Wieland, who frequently shares her Midjourney creations online, told Ars. “The details and scenery are INSANE, the downside (for now) are that the generations are very high contrast and overly saturated (imo). Plus you need to kind of re-adapt and rethink your prompts, working with new structures and now less is kind of more in terms of prompting.”

At the same time, critics of the service still bristle about Midjourney training its models using human-made artwork scraped from the web and obtained without permission, a controversial practice common among AI model trainers that we have covered in detail in the past. We’ve also covered elsewhere the challenges artists might face in the future from these technologies.

Too much detail?

With AI-generated detail ramping up dramatically between major Midjourney versions, one could wonder if there is ever such a thing as “too much detail” in an AI-generated image. Midjourney v6 seems to be testing that very question, creating many images that sometimes seem more detailed than reality in an unrealistic way, although that can be modified with careful prompting.

  • An AI-generated image of a nurse in the 1960s created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of an astronaut created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of a “juicy flaming cheeseburger” created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of “a handsome Asian man” created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of an “Apple II” sitting on a desk in the 1980s created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of a “photo of a cat in a car holding a can of beer” created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of a forest path created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of a woman among flowers created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of “a plate of delicious pickles” created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of a barbarian beside a TV set that says “Ars Technica” on it created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of “Abraham Lincoln holding a sign that says Ars Technica” created using alpha Midjourney v6.

    Midjourney

  • An AI-generated image of Mickey Mouse holding a machine gun created using alpha Midjourney v6.

    Midjourney

In our testing of version 6 (which can currently be invoked with the “--v 6.0” argument at the end of a prompt), we noticed times when the new model appeared to produce worse results than v5.2, but Midjourney veterans like Wieland tell Ars that those differences are largely due to the different way that v6.0 interprets prompts. Midjourney is continuously updating that behavior. “Old prompts sometimes work a bit better than the day they released it,” Wieland told us.

A song of hype and fire: The 10 biggest AI stories of 2023

An illustration of a robot accidentally setting off a mushroom cloud on a laptop computer.

Getty Images | Benj Edwards

“Here, There, and Everywhere” isn’t just a Beatles song. It’s also a phrase that recalls the spread of generative AI into the tech industry during 2023. Whether you think AI is just a fad or the dawn of a new tech revolution, it’s been impossible to deny that AI news has dominated the tech space for the past year.

We’ve seen a large cast of AI-related characters emerge that includes tech CEOs, machine learning researchers, and AI ethicists—as well as charlatans and doomsayers. From public feedback on the subject of AI, we’ve heard that it’s been difficult for non-technical people to know who to believe, what AI products (if any) to use, and whether we should fear for our lives or our jobs.

Meanwhile, in keeping with a much-lamented trend of 2022, machine learning research has not slowed down over the past year. On X, former Biden administration tech advisor Suresh Venkatasubramanian wrote, “How do people manage to keep track of ML papers? This is not a request for support in my current state of bewilderment—I’m genuinely asking what strategies seem to work to read (or “read”) what appear to be 100s of papers per day.”

To wrap up the year with a tidy bow, here’s a look back at the 10 biggest AI news stories of 2023. It was very hard to choose only 10 (in fact, we originally only intended to do seven), but since we’re not ChatGPT generating reams of text without limit, we have to stop somewhere.

Bing Chat “loses its mind”

Aurich Lawson | Getty Images

In February, Microsoft unveiled Bing Chat, a chatbot built into its languishing Bing search engine website. Microsoft created the chatbot using a more raw form of OpenAI’s GPT-4 language model but didn’t tell everyone it was GPT-4 at first. Since Microsoft used a less conditioned version of GPT-4 than the one that would be released in March, the launch was rough. The chatbot assumed a temperamental personality that could easily turn on users and attack them, tell people it was in love with them, seemingly worry about its fate, and lose its cool when confronted with an article we wrote about revealing its system prompt.

Aside from the relatively raw nature of the AI model Microsoft was using, at fault was a system where very long conversations would push the conditioning system prompt outside of its context window (like a form of short-term memory), allowing all hell to break loose through jailbreaks that people documented on Reddit. At one point, Bing Chat called me “the culprit and the enemy” for revealing some of its weaknesses. Some people thought Bing Chat was sentient, despite AI experts’ assurances to the contrary. It was a disaster in the press, but Microsoft didn’t flinch, and it ultimately reined in some of Bing Chat’s wild proclivities and opened the bot widely to the public. Today, Bing Chat is known as Microsoft Copilot, and it’s baked into Windows.
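The context-window failure described above can be pictured with a toy model. The Python sketch below is not Microsoft’s implementation: it assumes a simple sliding window that keeps only the most recent messages, counts whole messages where real systems count tokens, and uses the hypothetical helper name `visible_context`.

```python
def visible_context(messages, window_size):
    """Keep only the most recent messages that fit in the window.

    Hypothetical sliding-window truncation: each message costs one
    slot here, whereas real systems count tokens.
    """
    return messages[-window_size:]


# The conditioning instructions come first, followed by the chat turns.
conversation = ["SYSTEM: follow the rules"]
for turn in range(1, 4):
    conversation.append(f"USER: message {turn}")
    conversation.append(f"BOT: reply {turn}")

# Short conversation: the system prompt is still in view.
short_view = visible_context(conversation[:3], window_size=4)
# Long conversation: the system prompt has scrolled out of the window.
long_view = visible_context(conversation, window_size=4)

print(short_view[0].startswith("SYSTEM"))               # True
print(any(m.startswith("SYSTEM") for m in long_view))   # False
```

Once the system prompt falls outside the window, nothing in the visible text tells the model how it is supposed to behave.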

US Copyright Office says no to AI copyright authors

An AI-generated image that won a prize at the Colorado State Fair in 2022, later denied US copyright registration.

Jason M. Allen

In February, the US Copyright Office issued a key ruling on AI-generated art, revoking the copyright registration it had granted in September 2022 to the AI-assisted comic book “Zarya of the Dawn.” The decision, influenced by the revelation that the images were created using the AI-powered Midjourney image generator, stated that only the text and the arrangement of images and text by author Kris Kashtanova were eligible for copyright protection. It was the first hint that AI-generated imagery without human-authored elements could not be copyrighted in the United States.

This stance was further cemented in August when a US federal judge ruled that art created solely by AI cannot be copyrighted. In September, the US Copyright Office rejected the registration for an AI-generated image that won a Colorado State Fair art contest in 2022. As it stands now, it appears that purely AI-generated art (without substantial human authorship) is in the public domain in the United States. This stance could be further clarified or changed in the future by judicial rulings or legislation.
