On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.
Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.
After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.
In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that it planned to charge AI companies for access to its almost two decades’ worth of human-generated content.
If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.
While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.
Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
Enlarge/ A photo of Galactic Compass running on an iPhone.
Matt Webb / Getty Images
On Thursday, designer Matt Webb unveiled a new iPhone app called Galactic Compass, which always points to the center of the Milky Way galaxy—no matter where Earth is positioned on our journey through the stars. The app is free and available now on the App Store.
While using Galactic Compass, you set your iPhone on a level surface, and a big green arrow on the screen points the way to the Galactic Center, which is the rotational core of the spiral galaxy all of us live in. In that center is a supermassive black hole known as Sagittarius A*, a celestial body from which no matter or light can escape. (So, in a way, the app is telling us what we should avoid.)
But truthfully, the location of the galactic core at any given time isn’t exactly useful, practical knowledge—at least for people who aren’t James Tiberius Kirk in Star Trek V. But it may inspire a sense of awe about our place in the cosmos.
Enlarge/ Screenshots of Galactic Compass in action, captured by Ars Technica in a secret location.
Benj Edwards / Getty Images
“It is astoundingly grounding to always have a feeling of the direction of the center of the galaxy,” Webb told Ars Technica. “Your perspective flips. To begin with, it feels arbitrary. The middle of the Milky Way seems to fly all over the sky, as the Earth turns and moves in its orbit.”
Webb’s journey to creating Galactic Compass began a decade ago as an offshoot of his love for casual astronomy. “About 10 years ago, I taught myself how to point to the center of the galaxy,” Webb said. “I lived in an apartment where I had a great view of the stars, so I was using augmented reality apps to identify them, and I gradually learned my way around the sky.”
While Webb initially used an astronomy app to help locate the Galactic Center, he eventually taught himself how to always find it. He described visualizing himself on the surface of the Earth as it spins and tilts, understanding the ecliptic as a line across the sky and recognizing the center of the galaxy as an invisible point moving predictably through the constellation Sagittarius, which lies on the ecliptic line. By visualizing Earth’s orbit over the year and determining his orientation in space, he was able to point in the right direction, refining his ability through daily practice and comparison with an augmented reality app.
With a little help from AI
Enlarge/ Our galaxy, the Milky Way, is thought to look similar to Andromeda (seen here) if you could see it from a distance. But since we’re inside the galaxy, all we can see is the edge of the galactic plane.
Getty Images
In 2021, Webb imagined turning his ability into an app that would help take everyone on the same journey, showing a compass that points toward the galactic center instead of Earth’s magnetic north. “But I can’t write apps,” he said. “I’m a decent enough engineer, and an amateur designer, but I’ve never figured out native apps.”
That’s where ChatGPT comes in, transforming Webb’s vision into reality. With the AI assistant as his coding partner, Webb progressed step by step, crafting a simple app interface and integrating complex calculations for locating the galactic center (which involves calculating the user’s azimuth and altitude).
Still, coding with ChatGPT has its limitations. “ChatGPT is super smart, but it’s not embodied like a human, so it falls down on doing the 3D calculations,” he says. “I had to learn a lot about quaternions, which are a technique for combining 3D rotations, and even then, it’s not perfect. The app needs to be held flat to work simply because my math breaks down when the phone is upright! I’ll fix this in future versions,” Webb said.
Webb is no stranger to ChatGPT-powered creations that are more fun than practical. Last month, he launched a Kickstarter for an AI-rhyming poetry clock called the Poem/1. With his design studio, Acts Not Facts, Webb says he uses “whimsy and play to discover the possibilities in new technology.”
Whimsical or not, Webb insists that Galactic Compass can help us ponder our place in the vast universe, and he’s proud that it recently peaked at #87 in the Travel chart for the US App Store. In this case, though, it’s spaceship Earth that is traveling the galaxy while every living human comes along for the ride.
“Once you can follow it, you start to see the galactic center as the true fixed point, and we’re the ones whizzing and spinning. There it remains, the supermassive black hole at the center of our galaxy, Sagittarius A*, steady as a rock, eternal. We go about our days; it’s always there.”
Enlarge/ Snapshots from three videos generated using OpenAI’s Sora.
On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it’s only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It’s also freaking people out.
“It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them,” wrote Wall Street Journal tech reporter Joanna Stern on X.
“This could be the ‘holy shit’ moment of AI,” wrote Tom Warren of The Verge.
“Every single one of these videos is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.
For future reference—since this type of panic will some day appear ridiculous—there’s a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren’t perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.
The prompt that generated the video above: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.“
Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we’re seeing now seemed like a distant fantasy to most people.
In that piece, I called the moment that truth and fiction in media become indistinguishable the “cultural singularity.” It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.
Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.
OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the “worst” AI-generated video is ever going to look. There’s no synchronized sound yet, but that might be solved in future models.
How (we think) they pulled it off
AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta’s Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn’t seem to matter.
Sora (which means “sky” in Japanese) appears to be something altogether different. It’s high-resolution (1920×1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?
OpenAI doesn’t usually share insider technical details with the press, so we’re left to speculate based on theories from experts and information given to the public.
OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. It generates a video by starting off with noise and “gradually transforms it by removing the noise over many steps,” the company explains. It “recognizes” objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerge.
Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model “foresight” of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.
OpenAI represents video as collections of smaller groups of data called “patches,” which the company says are similar to tokens (fragments of a word) in GPT-4. “By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios,” the company writes.
An important tool in OpenAI’s bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions that describe scenes in the training data generated by another AI model like GPT-4V. And the company is not stopping here. “Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI writes, “a capability we believe will be an important milestone for achieving AGI.”
One question on many people’s minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it’s possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia’s Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, “I won’t be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!” Until confirmed by OpenAI, however, that’s just speculation.
On Tuesday, the United States Patent and Trademark Office (USPTO) published guidance on inventorship for AI-assisted inventions, clarifying that while AI systems can play a role in the creative process, only natural persons (human beings) who make significant contributions to the conception of an invention can be named as inventors. It also rules out using AI models to churn out patent ideas without significant human input.
The USPTO says this position is supported by “the statutes, court decisions, and numerous policy considerations,” including the Executive Order on AI issued by President Biden. We’ve previously covered attempts, which have been repeatedly rejected by US courts, by Dr. Stephen Thaler to have an AI program called “DABUS” named as the inventor on a US patent (a process begun in 2019).
Even though an AI model itself cannot be named an inventor or joint inventor on a patent, using AI assistance to create an invention does not necessarily disqualify a human from holding a patent, as the USPTO explains:
“While AI systems and other non-natural persons cannot be listed as inventors on patent applications or patents, the use of an AI system by a natural person(s) does not preclude a natural person(s) from qualifying as an inventor (or joint inventors) if the natural person(s) significantly contributed to the claimed invention.”
However, the USPTO says that significant human input is required for an invention to be patentable: “Maintaining ‘intellectual domination’ over an AI system does not, on its own, make a person an inventor of any inventions created through the use of the AI system.” So a person simply overseeing an AI system isn’t suddenly an inventor. The person must make a significant contribution to the conception of the invention.
If someone does use an AI model to help create patents, the guidance describes how the application process would work. First, patent applications for AI-assisted inventions must name “the natural person(s) who significantly contributed to the invention as the inventor,” and additionally, applications must not list “any entity that is not a natural person as an inventor or joint inventor, even if an AI system may have been instrumental in the creation of the claimed invention.”
Reading between the lines, it seems the contributions made by AI systems are akin to contributions made by other tools that assist in the invention process. The document does not explicitly say that the use of AI is required to be disclosed during the application process.
Even with the published guidance, the USPTO is seeking public comment on the newly released guidelines and issues related to AI inventorship on its website.
Enlarge/ When ChatGPT looks things up, a pair of green pixelated hands look through paper records, much like this. Just kidding.
Benj Edwards / Getty Images
On Tuesday, OpenAI announced that it is experimenting with adding a form of long-term memory to ChatGPT that will allow it to remember details between conversations. You can ask ChatGPT to remember something, see what it remembers, and ask it to forget. Currently, it’s only available to a small number of ChatGPT users for testing.
So far, large language models have typically used two types of memory: one baked into the AI model during the training process (before deployment) and an in-context memory (the conversation history) that persists for the duration of your session. Usually, ChatGPT forgets what you have told it during a conversation once you start a new session.
Various projects have experimented with giving LLMs a memory that persists beyond a context window. (The context window is the hard limit on the number of tokens the LLM can process at once.) The techniques include dynamically managing context history, compressing previous history through summarization, links to vector databases that store information externally, or simply periodically injecting information into a system prompt (the instructions ChatGPT receives at the beginning of every chat).
Enlarge/ A screenshot of ChatGPT memory controls provided by OpenAI.
OpenAI
OpenAI hasn’t explained which technique it uses here, but the implementation reminds us of Custom Instructions, a feature OpenAI introduced in July 2023 that lets users add custom additions to the ChatGPT system prompt to change its behavior.
Possible applications for the memory feature provided by OpenAI include explaining how you prefer your meeting notes to be formatted, telling it you run a coffee shop and having ChatGPT assume that’s what you’re talking about, keeping information about your toddler that loves jellyfish so it can generate relevant graphics, and remembering preferences for kindergarten lesson plan designs.
Also, OpenAI says that memories may help ChatGPT Enterprise and Team subscribers work together better since shared team memories could remember specific document formatting preferences or which programming frameworks your team uses. And OpenAI plans to bring memories to GPTs soon, with each GPT having its own siloed memory capabilities.
Memory control
Obviously, any tendency to remember information brings privacy implications. You should already know that sending information to OpenAI for processing on remote servers introduces the possibility of privacy leaks and that OpenAI trains AI models on user-provided information by default unless conversation history is disabled or you’re using an Enterprise or Team account.
Along those lines, OpenAI says that your saved memories are also subject to OpenAI training use unless you meet the criteria listed above. Still, the memory feature can be turned off completely. Additionally, the company says, “We’re taking steps to assess and mitigate biases, and steer ChatGPT away from proactively remembering sensitive information, like your health details—unless you explicitly ask it to.”
Users will also be able to control what ChatGPT remembers using a “Manage Memory” interface that lists memory items. “ChatGPT’s memories evolve with your interactions and aren’t linked to specific conversations,” OpenAI says. “Deleting a chat doesn’t erase its memories; you must delete the memory itself.”
ChatGPT’s memory features are not currently available to every ChatGPT account, so we have not experimented with it yet. Access during this testing period appears to be random among ChatGPT (free and paid) accounts for now. “We are rolling out to a small portion of ChatGPT free and Plus users this week to learn how useful it is,” OpenAI writes. “We will share plans for broader roll out soon.”
Enlarge/ A still image from BodyArmor’s 2024 “Field of Fake” Super Bowl commercial.
BodyArmor
Heavily hyped tech products have a history of appearing in Super Bowl commercials during football’s biggest game—including the Apple Macintosh in 1984, dot-com companies in 2000, and cryptocurrency firms in 2022. In 2024, the hot tech in town is artificial intelligence, and several companies showed AI-related ads at Super Bowl LVIII. Here’s a rundown of notable appearances that range from serious to wacky.
Microsoft Copilot
Microsoft Game Day Commercial | Copilot: Your everyday AI companion.
It’s been a year since Microsoft launched the AI assistant Microsoft Copilot (as “Bing Chat“), and Microsoft is leaning heavily into its AI-assistant technology, which is powered by large language models from OpenAI. In Copilot’s first-ever Super Bowl commercial, we see scenes of various people with defiant text overlaid on the screen: “They say I will never open my own business or get my degree. They say I will never make my movie or build something. They say I’m too old to learn something new. Too young to change the world. But I say watch me.”
Then the commercial shows Copilot creating solutions to some of these problems, with prompts like, “Generate storyboard images for the dragon scene in my script,” “Write code for my 3d open world game,” “Quiz me in organic chemistry,” and “Design a sign for my classic truck repair garage Mike’s.”
Of course, since generative AI is an unfinished technology, many of these solutions are more aspirational than practical at the moment. On Bluesky, writer Ed Zitron put Microsoft’s truck repair logo to the test and saw results that weren’t nearly as polished as those seen in the commercial. On X, others have criticized and poked fun at the “3d open world game” generation prompt, which is a complex task that would take far more than a single, simple prompt to produce useful code.
Google Pixel 8 “Guided Frame” feature
Javier in Frame | Google Pixel SB Commercial 2024.
Instead of focusing on generative aspects of AI, Google’s commercial showed off a feature called “Guided Frame” on the Pixel 8 phone that uses machine vision technology and a computer voice to help people with blindness or low vision to take photos by centering the frame on a face or multiple faces. Guided Frame debuted in 2022 in conjunction with the Google Pixel 7.
The commercial tells the story of a person named Javier, who says, “For many people with blindness or low vision, there hasn’t always been an easy way to capture daily life.” We see a simulated blurry first-person view of Javier holding a smartphone and hear a computer-synthesized voice describing what the AI model sees, directing the person to center on a face to snap various photos and selfies.
Considering the controversies that generative AI currently generates (pun intended), it’s refreshing to see a positive application of AI technology used as an accessibility feature. Relatedly, an app called Be My Eyes (powered by OpenAI’s GPT-4V) also aims to help low-vision people interact with the world.
Despicable Me 4
Despicable Me 4 – Minion Intelligence (Big Game Spot).
So far, we’ve covered a couple attempts to show AI-powered products as positive features. Elsewhere in Super Bowl ads, companies weren’t as generous about the technology. In an ad for the film Despicable Me 4, we see two Minions creating a series of terribly disfigured AI-generated still images reminiscent of Stable Diffusion 1.4 from 2022. There’s three-legged people doing yoga, a painting of Steve Carell and Will Ferrell as Elizabethan gentlemen, a handshake with too many fingers, people eating spaghetti in a weird way, and a pair of people riding dachshunds in a race.
The images are paired with an earnest voiceover that says, “Artificial intelligence is changing the way we see the world, showing us what we never thought possible, transforming the way we do business, and bringing family and friends closer together. With artificial intelligence, the future is in good hands.” When the voiceover ends, the camera pans out to show hundreds of Minions generating similarly twisted images on computers.
Speaking of image synthesis at the Super Bowl, people mistook a Christian commercial created by He Gets Us, LLC as having been AI-generated, likely due to its gaudy technicolor visuals. With the benefit of a YouTube replay and the ability to look at details, the “He washed feet” commercial doesn’t appear AI-generated to us, but it goes to show how the concept of image synthesis has begun to cast doubt on human-made creations.
Enlarge/ OpenAI Chief Executive Officer Sam Altman walks on the House side of the US Capitol on January 11, 2024, in Washington, DC. (Photo by Kent Nishimura/Getty Images)
Getty Images
On Thursday, The Wall Street Journal reported that OpenAI CEO Sam Altman is in talks with investors to raise as much as $5 trillion to $7 trillion for AI chip manufacturing, according to people familiar with the matter. The funding seeks to address the scarcity of graphics processing units (GPUs) crucial for training and running large language models like those that power ChatGPT, Microsoft Copilot, and Google Gemini.
The high dollar amount reflects the huge amount of capital necessary to spin up new semiconductor manufacturing capability. “As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers and power providers, which together would put up money to build chip foundries that would then be run by existing chip makers,” writes the Wall Street Journal in its report. “OpenAI would agree to be a significant customer of the new factories.”
To hit these ambitious targets—which are larger than the entire semiconductor industry’s current $527 billion global sales combined—Altman has reportedly met with a range of potential investors worldwide, including sovereign wealth funds and government entities, notably the United Arab Emirates, SoftBank CEO Masayoshi Son, and representatives from Taiwan Semiconductor Manufacturing Co. (TSMC).
TSMC is the world’s largest dedicated independent semiconductor foundry. It’s a critical linchpin that companies such as Nvidia, Apple, Intel, and AMD rely on to fabricate SoCs, CPUs, and GPUs for various applications.
Altman reportedly seeks to expand the global capacity for semiconductor manufacturing significantly, funding the infrastructure necessary to support the growing demand for GPUs and other AI-specific chips. GPUs are excellent at parallel computation, which makes them ideal for running AI models that heavily rely on matrix multiplication to work. However, the technology sector currently faces a significant shortage of these important components, constraining the potential for AI advancements and applications.
In particular, the UAE’s involvement, led by Sheikh Tahnoun bin Zayed al Nahyan, a key security official and chair of numerous Abu Dhabi sovereign wealth vehicles, reflects global interest in AI’s potential and the strategic importance of semiconductor manufacturing. However, the prospect of substantial UAE investment in a key tech industry raises potential geopolitical concerns, particularly regarding the US government’s strategic priorities in semiconductor production and AI development.
The US has been cautious about allowing foreign control over the supply of microchips, given their importance to the digital economy and national security. Reflecting this, the Biden administration has undertaken efforts to bolster domestic chip manufacturing through subsidies and regulatory scrutiny of foreign investments in important technologies.
To put the $5 trillion to $7 trillion estimate in perspective, the White House just today announced a $5 billion investment in R&D to advance US-made semiconductor technologies. TSMC has already sunk $40 billion—one of the largest foreign investments in US history—into a US chip plant in Arizona. As of now, it’s unclear whether Altman has secured any commitments toward his fundraising goal.
Updated on February 9, 2024 at 8: 45 PM Eastern with a quote from the WSJ that clarifies the proposed relationship between OpenAI and partners in the talks.
Enlarge/ With so many choices, selecting the perfect GPT can be confusing.
On Tuesday, OpenAI announced a new feature in ChatGPT that allows users to pull custom personalities called “GPTs” into any ChatGPT conversation with the @ symbol. It allows a level of quasi-teamwork within ChatGPT among expert roles that was previously impractical, making collaborating with a team of AI agents within OpenAI’s platform one step closer to reality.
“You can now bring GPTs into any conversation in ChatGPT – simply type @ and select the GPT,” wrote OpenAI on the social media network X. “This allows you to add relevant GPTs with the full context of the conversation.”
OpenAI introduced GPTs in November as a way to create custom personalities or roles for ChatGPT to play. For example, users can build their own GPTs to focus on certain topics or certain skills. Paid ChatGPT subscribers can also freely download a host of GPTs developed by other ChatGPT users through the GPT Store.
Previously, if you wanted to share information between GPT profiles, you had to copy the text, select a new chat with the GPT, paste it, and explain the context of what the information means or what you want to do with it. Now, ChatGPT users can stay in the default ChatGPT window and bring in GPTs as needed without losing the history of the conversation.
For example, we created a “Wellness Guide” GPT that is crafted as an expert in human health conditions (of course, this being ChatGPT, always consult a human doctor if you’re having medical problems), and we created a “Canine Health Advisor” for dog-related health questions.
Enlarge/ A screenshot of ChatGPT where we @-mentioned a human wellness advisor, then a dog advisor in the same conversation history.
Benj Edwards
We started in a default ChatGPT chat, hit the @ symbol, then typed the first few letters of “Wellness” and selected it from a list. It filled out the rest. We asked a question about food poisoning in humans, and then we switched to the canine advisor in the same way with an @ symbol and asked about the dog.
Using this feature, you could alternatively consult, say, an “ad copywriter” GPT and an “editor” GPT—ask the copywriter to write some text, then rope in the editor GPT to check it, looking at it from a different angle. Different system prompts (the instructions that define a GPT’s personality) make for significant behavior differences.
We also tried swapping between GPT profiles that write software and others designed to consult on historical tech subjects. Interestingly, ChatGPT does not differentiate between GPTs as different personalities as you change. It will still say, “I did this earlier” when a different GPT is talking about a previous GPT’s output in the same conversation history. From its point of view, it’s just ChatGPT and not multiple agents.
From our vantage point, this feature seems to represent baby steps toward a future where GPTs, as independent agents, could work together as a team to fulfill more complex tasks directed by the user. Similar experiments have been done outside of OpenAI in the past (using API access), but OpenAI has so far resisted a more agentic model for ChatGPT. As we’ve seen (first with GPTs and now with this), OpenAI seems to be slowly angling toward that goal itself, but only time will tell if or when we see true agentic teamwork in a shipping service.
Enlarge/ A CAD render of the Poem/1 sitting on a bookshelf.
On Tuesday, product developer Matt Webb launched a Kickstarter funding project for a whimsical e-paper clock called the “Poem/1” that tells the current time using AI and rhyming poetry. It’s powered by the ChatGPT API, and Webb says that sometimes ChatGPT will lie about the time or make up words to make the rhymes work.
“Hey so I made a clock. It tells the time with a brand new poem every minute, composed by ChatGPT. It’s sometimes profound, and sometimes weird, and occasionally it fibs about what the actual time is to make a rhyme work,” Webb writes on his Kickstarter page.
The $126 clock is the product of Webb’s Acts Not Facts, which he bills as “a new studio for technology and product invention via exploring and making.” Despite the net-connected service aspect of the clock, Webb says it will not require a subscription to function.
Enlarge/ A labeled CAD rendering of the Poem/1 clock, representing its final shipping configuration.
There are 1,440 minutes in a day, so Poem/1 needs to display 1,440 unique poems to work. The clock features a monochrome e-paper screen and pulls its poetry rhymes via Wi-Fi from a central server run by Webb’s company. To save money, that server pulls poems from ChatGPT’s API and will share them out to many Poem/1 clocks at once. This prevents costly API fees that would add up if your clock were querying OpenAI’s servers 1,440 times a day, non-stop, forever. “I’m reserving a % of the retail price from each clock in a bank account to cover AI and server costs for 5 years,” Webb writes.
For hackers, Webb says that you’ll be able to change the back-end server URL of the Poem/1 from the default to whatever you want, so it can display custom text every minute of the day. Webb says he will document and publish the API when Poem/1 ships.
Hallucination time
Enlarge/ A photo of a Poem/1 prototype with a hallucinated time, according to Webb.
Given the Poem/1’s large language model pedigree, it’s perhaps not surprising that Poem/1 may sometimes make up things (also called “hallucination” or “confabulation” in the AI field) to fulfill its task. The LLM that powers ChatGPT is always searching for the most likely next word in a sequence, and sometimes factuality comes second to fulfilling that mission.
Further down on the Kickstarter page, Webb provides a photo of his prototype Poem/1 where the screen reads, “As the clock strikes eleven forty two, / I rhyme the time, as I always do.” Just below, Webb warns, “Poem/1 fibs occasionally. I don’t believe it was actually 11.42 when this photo was taken. The AI hallucinated the time in order to make the poem work. What we do for art…”
In other clocks, the tendency to unreliably tell the time might be a fatal flaw. But judging by his humorous angle on the Kickstarter page, Webb apparently sees the clock as more of a fun art project than a precision timekeeping instrument. “Don’t rely on this clock in situations where timekeeping is vital,” Webb writes, “such as if you work in air traffic control or rocket launches or the finish line of athletics competitions.”
Poem/1 also sometimes takes poetic license with vocabulary to tell the time. During a humorous moment in the Kickstarter promotional video, Webb looks at his clock prototype and reads the rhyme, “A clock that defies all rhyme and reason / 4: 30 PM, a temporal teason.” Then he says, “I had to look ‘teason’ up. It doesn’t mean anything, so it’s a made-up word.”
On Monday, OpenAI announced a partnership with the nonprofit Common Sense Media to create AI guidelines and educational materials targeted at parents, educators, and teens. It includes the curation of family-friendly GPTs in OpenAI’s GPT store. The collaboration aims to address concerns about the impacts of AI on children and teenagers.
Known for its reviews of films and TV shows aimed at parents seeking appropriate media for their kids to watch, Common Sense Media recently branched out into AI and has been reviewing AI assistants on its site.
“AI isn’t going anywhere, so it’s important that we help kids understand how to use it responsibly,” Common Sense Media wrote on X. “That’s why we’ve partnered with @OpenAI to help teens and families safely harness the potential of AI.”
For his part, Altman offered a canned statement in the press release, saying, “AI offers incredible benefits for families and teens, and our partnership with Common Sense will further strengthen our safety work, ensuring that families and teens can use our tools with confidence.”
The announcement feels slightly non-specific in the official news release, with Steyer offering, “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”
The partnership seems aimed mostly at bringing a patina of family-friendliness to OpenAI’s GPT store, with the most solid reveal being the aforementioned fact that Common Sense media will help with the “curation of family-friendly GPTs in the GPT Store based on Common Sense ratings and standards.”
Common Sense AI reviews
As mentioned above, Common Sense Media began reviewing AI assistants on its site late last year. This puts Common Sense Media in an interesting position with potential conflicts of interest regarding the new partnership with OpenAI. However, it doesn’t seem to be offering any favoritism to OpenAI so far.
For example, Common Sense Media’s review of ChatGPT calls the AI assistant “A powerful, at times risky chatbot for people 13+ that is best used for creativity, not facts.” It labels ChatGPT as being suitable for ages 13 and up (which is in OpenAI’s Terms of Service) and gives the OpenAI assistant three out of five stars. ChatGPT also scores a 48 percent privacy rating (which is oddly shown as 55 percent on another page that goes into privacy details). The review we cited was last updated on October 13, 2023, as of this writing.
For reference, Google Bard gets a three-star overall rating and a 75 percent privacy rating in its Common Sense Media review. Stable Diffusion, the image synthesis model, nets a one-star rating with the description, “Powerful image generator can unleash creativity, but is wildly unsafe and perpetuates harm.” OpenAI’s DALL-E gets two stars and a 48 percent privacy rating.
The information that Common Sense Media includes about each AI model appears relatively accurate and detailed (and the organization cited an Ars Technica article as a reference in one explanation), so they feel fair, even in the face of the OpenAI partnership. Given the low scores, it seems that most AI models aren’t off to a great start, but that may change. It’s still early days in generative AI.
Enlarge/ Still images of AI-generated video examples provided by Google for its Lumiere video synthesis model.
On Tuesday, Google announced Lumiere, an AI video generator that it calls “a space-time diffusion model for realistic video generation” in the accompanying preprint paper. But let’s not kid ourselves: It does a great job at creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.
According to Google, Lumiere utilizes unique architecture to generate a video’s entire temporal duration in one go. Or, as the company put it, “We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution—an approach that inherently makes global temporal consistency difficult to achieve.”
In layperson terms, Google’s tech is designed to handle both the space (where things are in the video) and time (how things move and change throughout the video) aspects simultaneously. So, instead of making a video by putting together many small parts or frames, it can create the entire video, from start to finish, in one smooth process.
The official promotional video accompanying the paper “Lumiere: A Space-Time Diffusion Model for Video Generation,” released by Google.
Lumiere can also do plenty of party tricks, which are laid out quite well with examples on Google’s demo page. For example, it can perform text-to-video generation (turning a written prompt into a video), convert still images into videos, generate videos in specific styles using a reference image, apply consistent video editing using text-based prompts, create cinemagraphs by animating specific regions of an image, and offer video inpainting capabilities (for example, it can change the type of dress a person is wearing).
In the Lumiere research paper, the Google researchers state that the AI model outputs five-second long 1024×1024 pixel videos, which they describe as “low-resolution.” Despite those limitations, the researchers performed a user study and claim that Lumiere’s outputs were preferred over existing AI video synthesis models.
As for training data, Google doesn’t say where it got the videos they fed into Lumiere, writing, “We train our T2V [text to video] model on a dataset containing 30M videos along with their text caption. [sic] The videos are 80 frames long at 16 fps (5 seconds). The base model is trained at 128×128.”
Enlarge/ A block diagram showing components of the Lumiere AI model, provided by Google.
AI-generated video is still in a primitive state, but it’s been progressing in quality over the past two years. In October 2022, we covered Google’s first publicly unveiled image synthesis model, Imagen Video. It could generate short 1280×768 video clips from a written prompt at 24 frames per second, but the results weren’t always coherent. Before that, Meta debuted its AI video generator, Make-A-Video. In June of last year, Runway’s Gen2 video synthesis model enabled the creation of two-second video clips from text prompts, fueling the creation of surrealistic parody commercials. And in November, we covered Stable Video Diffusion, which can generate short clips from still images.
AI companies often demonstrate video generators with cute animals because generating coherent, non-deformed humans is currently difficult—especially since we, as humans (you are human, right?), are adept at noticing any flaws in human bodies or how they move. Just look at AI-generated Will Smith eating spaghetti.
Judging by Google’s examples (and not having used it ourselves), Lumiere appears to surpass these other AI video generation models. But since Google tends to keep its AI research models close to its chest, we’re not sure when, if ever, the public may have a chance to try it for themselves.
As always, whenever we see text-to-video synthesis models getting more capable, we can’t help but think of the future implications for our Internet-connected society, which is centered around sharing media artifacts—and the general presumption that “realistic” video typically represents real objects in real situations captured by a camera. Future video synthesis tools more capable than Lumiere will make deceptive deepfakes trivially easy to create.
To that end, in the “Societal Impact” section of the Lumiere paper, the researchers write, “Our primary goal in this work is to enable novice users to generate visual content in an creative and flexible way. [sic] However, there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure a safe and fair use.”
In 1921, Czech playwright Karel Čapek and his brother Josef invented the word “robot” in a sci-fi play called R.U.R. (short for Rossum’s Universal Robots). As Even Ackerman in IEEE Spectrum points out, Čapek wasn’t happy about how the term’s meaning evolved to denote mechanical entities, straying from his original concept of artificial human-like beings based on chemistry.
In a newly translated column called “The Author of the Robots Defends Himself,” published in Lidové Noviny on June 9, 1935, Čapek expresses his frustration about how his original vision for robots was being subverted. His arguments still apply to both modern robotics and AI. In this column, he referred to himself in the third-person:
For his robots were not mechanisms. They were not made of sheet metal and cogwheels. They were not a celebration of mechanical engineering. If the author was thinking of any of the marvels of the human spirit during their creation, it was not of technology, but of science. With outright horror, he refuses any responsibility for the thought that machines could take the place of people, or that anything like life, love, or rebellion could ever awaken in their cogwheels. He would regard this somber vision as an unforgivable overvaluation of mechanics or as a severe insult to life.
This recently resurfaced article comes courtesy of a new English translation of Čapek’s play called R.U.R. and the Vision of Artificial Life accompanied by 20 essays on robotics, philosophy, politics, and AI. The editor, Jitka Čejková, a professor at the Chemical Robotics Laboratory in Prague, aligns her research with Čapek’s original vision. She explores “chemical robots”—microparticles resembling living cells—which she calls “liquid robots.”
Enlarge/ “An assistant of inventor Captain Richards works on the robot the Captain has invented, which speaks, answers questions, shakes hands, tells the time and sits down when it’s told to.” – September 1928
In Čapek’s 1935 column, he clarifies that his robots were not intended to be mechanical marvels, but organic products of modern chemistry, akin to living matter. Čapek emphasizes that he did not want to glorify mechanical systems but to explore the potential of science, particularly chemistry. He refutes the idea that machines could replace humans or develop emotions and consciousness.
The author of the robots would regard it as an act of scientific bad taste if he had brought something to life with brass cogwheels or created life in the test tube; the way he imagined it, he created only a new foundation for life, which began to behave like living matter, and which could therefore have become a vehicle of life—but a life which remains an unimaginable and incomprehensible mystery. This life will reach its fulfillment only when (with the aid of considerable inaccuracy and mysticism) the robots acquire souls. From which it is evident that the author did not invent his robots with the technological hubris of a mechanical engineer, but with the metaphysical humility of a spiritualist.
The reason for the transition from chemical to mechanical in the public perception of robots isn’t entirely clear (though Čapek does mention a Russian film which went the mechanical route and was likely influential). The early 20th century was a period of rapid industrialization and technological advancement that saw the emergence of complex machinery and electronic automation, which probably influenced the public and scientific community’s perception of autonomous beings, leading them to associate the idea of robots with mechanical and electronic devices rather than chemical creations.
The 1935 piece is full of interesting quotes (you can read the whole thing in IEEE Spectrum or here), and we’ve grabbed a few highlights below that you can conveniently share with your robot-loving friends to blow their minds:
“He pronounces that his robots were created quite differently—that is, by a chemical path”
“He has learned, without any great pleasure, that genuine steel robots have started to appear”
“Well then, the author cannot be blamed for what might be called the worldwide humbug over the robots.”
“The world needed mechanical robots, for it believes in machines more than it believes in life; it is fascinated more by the marvels of technology than by the miracle of life.”
So it seems, over 100 years later, that we’ve gotten it wrong all along. Čapek’s vision, rooted in chemical synthesis and the philosophical mysteries of life, offers a different narrative from the predominant mechanical and electronic interpretation of robots we know today. But judging from what Čapek wrote, it sounds like he would be firmly against AI takeover scenarios. In fact, Čapek, who died in 1938, probably would think they would be impossible.