machine learning

music-industry-giants-allege-mass-copyright-violation-by-ai-firms

Music industry giants allege mass copyright violation by AI firms

No one wants to be defeated —

Suno and Udio could face damages of up to $150,000 per song allegedly infringed.

Michael Jackson in concert, 1986. Sony Music owns a large portion of publishing rights to Jackson's music.

Enlarge / Michael Jackson in concert, 1986. Sony Music owns a large portion of publishing rights to Jackson’s music.

Universal Music Group, Sony Music, and Warner Records have sued AI music-synthesis companies Udio and Suno for allegedly committing mass copyright infringement by using recordings owned by the labels to train music-generating AI models, reports Reuters. Udio and Suno can generate novel song recordings based on text-based descriptions of music (i.e., “a dubstep song about Linus Torvalds”).

The lawsuits, filed in federal courts in New York and Massachusetts, claim that the AI companies’ use of copyrighted material to train their systems could lead to AI-generated music that directly competes with and potentially devalues the work of human artists.

Like other generative AI models, both Udio and Suno (which we covered separately in April) rely on a broad selection of existing human-created artworks that teach a neural network the relationship between words in a written prompt and styles of music. The record labels correctly note that these companies have been deliberately vague about the sources of their training data.

Until generative AI models hit the mainstream in 2022, it was common practice in machine learning to scrape and use copyrighted information without seeking permission to do so. But now that the applications of those technologies have become commercial products themselves, rightsholders have come knocking to collect. In the case of Udio and Suno, the record labels are seeking statutory damages of up to $150,000 per song used in training.

In the lawsuit, the record labels cite specific examples of AI-generated content that allegedly re-creates elements of well-known songs, including The Temptations’ “My Girl,” Mariah Carey’s “All I Want for Christmas Is You,” and James Brown’s “I Got You (I Feel Good).” It also claims the music-synthesis models can produce vocals resembling those of famous artists, such as Michael Jackson and Bruce Springsteen.

Reuters claims it’s the first instance of lawsuits specifically targeting music-generating AI, but music companies and artists alike have been gearing up to deal with challenges the technology may pose for some time.

In May, Sony Music sent warning letters to over 700 AI companies (including OpenAI, Microsoft, Google, Suno, and Udio) and music-streaming services that prohibited any AI researchers from using its music to train AI models. In April, over 200 musical artists signed an open letter that called on AI companies to stop using AI to “devalue the rights of human artists.” And last November, Universal Music filed a copyright infringement lawsuit against Anthropic for allegedly including artists’ lyrics in its Claude LLM training data.

Similar to The New York Times’ lawsuit against OpenAI over the use of training data, the outcome of the record labels’ new suit could have deep implications for the future development of generative AI in creative fields, including requiring companies to license all musical training data used in creating music-synthesis models.

Compulsory licenses for AI training data could make AI model development economically impractical for small startups like Udio and Suno—and judging by the aforementioned open letter, many musical artists may applaud that potential outcome. But such a development would not preclude major labels from eventually developing their own AI music generators themselves, allowing only large corporations with deep pockets to control generative music tools for the foreseeable future.

Music industry giants allege mass copyright violation by AI firms Read More »

anthropic-introduces-claude-3.5-sonnet,-matching-gpt-4o-on-benchmarks

Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks

The Anthropic Claude 3 logo, jazzed up by Benj Edwards.

Anthropic / Benj Edwards

On Thursday, Anthropic announced Claude 3.5 Sonnet, its latest AI language model and the first in a new series of “3.5” models that build upon Claude 3, launched in March. Claude 3.5 can compose text, analyze data, and write code. It features a 200,000 token context window and is available now on the Claude website and through an API. Anthropic also introduced Artifacts, a new feature in the Claude interface that shows related work documents in a dedicated window.

So far, people outside of Anthropic seem impressed. “This model is really, really good,” wrote independent AI researcher Simon Willison on X. “I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump).”

As we’ve written before, benchmarks for large language models (LLMs) are troublesome because they can be cherry-picked and often do not capture the feel and nuance of using a machine to generate outputs on almost any conceivable topic. But according to Anthropic, Claude 3.5 Sonnet matches or outperforms competitor models like GPT-4o and Gemini 1.5 Pro on certain benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).

Claude 3.5 Sonnet benchmarks provided by Anthropic.

Enlarge / Claude 3.5 Sonnet benchmarks provided by Anthropic.

If all that makes your eyes glaze over, that’s OK; it’s meaningful to researchers but mostly marketing to everyone else. A more useful performance metric comes from what we might call “vibemarks” (coined here first!) which are subjective, non-rigorous aggregate feelings measured by competitive usage on sites like LMSYS’s Chatbot Arena. The Claude 3.5 Sonnet model is currently under evaluation there, and it’s too soon to say how well it will fare.

Claude 3.5 Sonnet also outperforms Anthropic’s previous-best model (Claude 3 Opus) on benchmarks measuring “reasoning,” math skills, general knowledge, and coding abilities. For example, the model demonstrated strong performance in an internal coding evaluation, solving 64 percent of problems compared to 38 percent for Claude 3 Opus.

Claude 3.5 Sonnet is also a multimodal AI model that accepts visual input in the form of images, and the new model is reportedly excellent at a battery of visual comprehension tests.

Claude 3.5 Sonnet benchmarks provided by Anthropic.

Enlarge / Claude 3.5 Sonnet benchmarks provided by Anthropic.

Roughly speaking, the visual benchmarks mean that 3.5 Sonnet is better at pulling information from images than previous models. For example, you can show it a picture of a rabbit wearing a football helmet, and the model knows it’s a rabbit wearing a football helmet and can talk about it. That’s fun for tech demos, but the tech is still not accurate enough for applications of the tech where reliability is mission critical.

Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks Read More »

ex-openai-star-sutskever-shoots-for-superintelligent-ai-with-new-company

Ex-OpenAI star Sutskever shoots for superintelligent AI with new company

Not Strategic Simulations —

Safe Superintelligence, Inc. seeks to safely build AI far beyond human capability.

Illya Sutskever physically gestures as OpenAI CEO Sam Altman looks on at Tel Aviv University on June 5, 2023.

Enlarge / Ilya Sutskever physically gestures as OpenAI CEO Sam Altman looks on at Tel Aviv University on June 5, 2023.

On Wednesday, former OpenAI Chief Scientist Ilya Sutskever announced he is forming a new company called Safe Superintelligence, Inc. (SSI) with the goal of safely building “superintelligence,” which is a hypothetical form of artificial intelligence that surpasses human intelligence, possibly in the extreme.

We will pursue safe superintelligence in a straight shot, with one focus, one goal, and one product,” wrote Sutskever on X. “We will do it through revolutionary breakthroughs produced by a small cracked team.

Sutskever was a founding member of OpenAI and formerly served as the company’s chief scientist. Two others are joining Sutskever at SSI initially: Daniel Levy, who formerly headed the Optimization Team at OpenAI, and Daniel Gross, an AI investor who worked on machine learning projects at Apple between 2013 and 2017. The trio posted a statement on the company’s new website.

A screen capture of Safe Superintelligence's initial formation announcement captured on June 20, 2024.

Enlarge / A screen capture of Safe Superintelligence’s initial formation announcement captured on June 20, 2024.

Sutskever and several of his co-workers resigned from OpenAI in May, six months after Sutskever played a key role in ousting OpenAI CEO Sam Altman, who later returned. While Sutskever did not publicly complain about OpenAI after his departure—and OpenAI executives such as Altman wished him well on his new adventures—another resigning member of OpenAI’s Superalignment team, Jan Leike, publicly complained that “over the past years, safety culture and processes [had] taken a backseat to shiny products” at OpenAI. Leike joined OpenAI competitor Anthropic later in May.

A nebulous concept

OpenAI is currently seeking to create AGI, or artificial general intelligence, which would hypothetically match human intelligence at performing a wide variety of tasks without specific training. Sutskever hopes to jump beyond that in a straight moonshot attempt, with no distractions along the way.

“This company is special in that its first product will be the safe superintelligence, and it will not do anything else up until then,” said Sutskever in an interview with Bloomberg. “It will be fully insulated from the outside pressures of having to deal with a large and complicated product and having to be stuck in a competitive rat race.”

During his former job at OpenAI, Sutskever was part of the “Superalignment” team studying how to “align” (shape the behavior of) this hypothetical form of AI, sometimes called “ASI” for “artificial super intelligence,” to be beneficial to humanity.

As you can imagine, it’s difficult to align something that does not exist, so Sutskever’s quest has met skepticism at times. On X, University of Washington computer science professor (and frequent OpenAI critic) Pedro Domingos wrote, “Ilya Sutskever’s new company is guaranteed to succeed, because superintelligence that is never achieved is guaranteed to be safe.

Much like AGI, superintelligence is a nebulous term. Since the mechanics of human intelligence are still poorly understood—and since human intelligence is difficult to quantify or define because there is no one set type of human intelligence—identifying superintelligence when it arrives may be tricky.

Already, computers far surpass humans in many forms of information processing (such as basic math), but are they superintelligent? Many proponents of superintelligence imagine a sci-fi scenario of an “alien intelligence” with a form of sentience that operates independently of humans, and that is more or less what Sutskever hopes to achieve and control safely.

“You’re talking about a giant super data center that’s autonomously developing technology,” he told Bloomberg. “That’s crazy, right? It’s the safety of that that we want to contribute to.”

Ex-OpenAI star Sutskever shoots for superintelligent AI with new company Read More »

runway’s-latest-ai-video-generator-brings-giant-cotton-candy-monsters-to-life

Runway’s latest AI video generator brings giant cotton candy monsters to life

Screen capture of a Runway Gen-3 Alpha video generated with the prompt

Enlarge / Screen capture of a Runway Gen-3 Alpha video generated with the prompt “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha that’s still under development, but it appears to create video of similar quality to OpenAI’s Sora, which debuted earlier this year (and has also not yet been released). It can generate novel, high-definition video from text prompts that range from realistic humans to surrealistic monsters stomping the countryside.

Unlike Runway’s previous best model from June 2023, which could only create two-second-long clips, Gen-3 Alpha can reportedly create 10-second-long video segments of people, places, and things that have a consistency and coherency that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora’s full minute of video, consider that the company is working with a shoestring budget of compute compared to more lavishly funded OpenAI—and actually has a history of shipping video generation capability to commercial users.

Gen-3 Alpha does not generate audio to accompany the video clips, and it’s highly likely that temporally coherent generations (those that keep a character consistent over time) are dependent on similar high-quality training material. But Runway’s improvement in visual fidelity over the past year is difficult to ignore.

AI video heats up

It’s been a busy couple of weeks for AI video synthesis in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that reportedly matches Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train moving at hyper-speed in a Japanese city.”

Not long after Kling debuted, people on social media began creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and weird but generally lacked coherency; we tested out Dream Machine and were not impressed by anything we saw.

Meanwhile, one of the original text-to-video pioneers, New York City-based Runway—founded in 2018—recently found itself the butt of memes that showed its Gen-2 tech falling out of favor compared to newer video synthesis models. That may have spurred the announcement of Gen-3 Alpha.

Gen-3 Alpha prompt: “An astronaut running through an alley in Rio de Janeiro.”

Generating realistic humans has always been tricky for video synthesis models, so Runway specifically shows off Gen-3 Alpha’s ability to create what its developers call “expressive” human characters with a range of actions, gestures, and emotions. However, the company’s provided examples weren’t particularly expressive—mostly people just slowly staring and blinking—but they do look realistic.

Provided human examples include generated videos of a woman on a train, an astronaut running through a street, a man with his face lit by the glow of a TV set, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: “A close-up shot of a young woman driving a car, looking thoughtful, blurred green forest visible through the rainy car window.”

The generated demo videos also include more surreal video synthesis examples, including a giant creature walking in a rundown city, a man made of rocks walking in a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

Gen-3 will power various Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.

Runway says that Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, taking a step toward the development of what it calls “General World Models,” which are hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.

Runway’s latest AI video generator brings giant cotton candy monsters to life Read More »

softbank-plans-to-cancel-out-angry-customer-voices-using-ai

Softbank plans to cancel out angry customer voices using AI

our fake future —

Real-time voice modification tech seeks to reduce stress in call center staff.

A man is angry and screaming while talking on a smartphone.

Japanese telecommunications giant SoftBank recently announced that it has been developing “emotion-canceling” technology powered by AI that will alter the voices of angry customers to sound calmer during phone calls with customer service representatives. The project aims to reduce the psychological burden on operators suffering from harassment and has been in development for three years. Softbank plans to launch it by March 2026, but the idea is receiving mixed reactions online.

According to a report from the Japanese news site The Asahi Shimbun, SoftBank’s project relies on an AI model to alter the tone and pitch of a customer’s voice in real-time during a phone call. SoftBank’s developers, led by employee Toshiyuki Nakatani, trained the system using a dataset of over 10,000 voice samples, which were performed by 10 Japanese actors expressing more than 100 phrases with various emotions, including yelling and accusatory tones.

Voice cloning and synthesis technology has made massive strides in the past three years. We’ve previously covered technology from Microsoft that can clone a voice with a three-second audio sample and audio-processing technology from Adobe that cleans up audio by re-synthesizing a person’s voice, so SoftBank’s technology is well within the realm of plausibility.

By analyzing the voice samples, SoftBank’s AI model has reportedly learned to recognize and modify the vocal characteristics associated with anger and hostility. When a customer speaks to a call center operator, the model processes the incoming audio and adjusts the pitch and inflection of the customer’s voice to make it sound calmer and less threatening.

For example, a high-pitched, resonant voice may be lowered in tone, while a deep male voice may be raised to a higher pitch. The technology reportedly does not alter the content or wording of the customer’s speech, and it retains a slight element of audible anger to ensure that the operator can still gauge the customer’s emotional state. The AI model also monitors the length and content of the conversation, sending a warning message if it determines that the interaction is too long or abusive.

The tech has been developed through SoftBank’s in-house program called “SoftBank Innoventure” in conjunction with The Institute for AI and Beyond, which is a joint AI research institute established by The University of Tokyo.

Harassment a persistent problem

According to SoftBank, Japan’s service sector is grappling with the issue of “kasu-hara,” or customer harassment, where workers face aggressive behavior or unreasonable requests from customers. In response, the Japanese government and businesses are reportedly exploring ways to protect employees from the abuse.

The problem isn’t unique to Japan. In a Reddit thread on Softbank’s AI plans, call center operators from other regions related many stories about the stress of dealing with customer harassment. “I’ve worked in a call center for a long time. People need to realize that screaming at call center agents will get you nowhere,” wrote one person.

A 2021 ProPublica report tells horror stories from call center operators who are trained not to hang up no matter how abusive or emotionally degrading a call gets. The publication quoted Skype customer service contractor Christine Stewart as saying, “One person called me the C-word. I’d call my supervisor. They’d say, ‘Calm them down.’ … They’d always try to push me to stay on the call and calm the customer down myself. I wasn’t getting paid enough to do that. When you have a customer sitting there and saying you’re worthless… you’re supposed to ‘de-escalate.'”

But verbally de-escalating an angry customer is difficult, according to Reddit poster BenCelotil, who wrote, “As someone who has worked in several call centers, let me just point out that there is no way faster to escalate a call than to try and calm the person down. If the angry person on the other end of the call thinks you’re just trying to placate and push them off somewhere else, they’re only getting more pissed.”

Ignoring reality using AI

Harassment of call center workers is a very real problem, but given the introduction of AI as a possible solution, some people wonder whether it’s a good idea to essentially filter emotional reality on demand through voice synthesis. Perhaps this technology is a case of treating the symptom instead of the root cause of the anger, as some social media commenters note.

“This is like the worst possible solution to the problem,” wrote one Redditor in the thread mentioned above. “Reminds me of when all the workers at Apple’s China factory started jumping out of windows due to working conditions, so the ‘solution’ was to put nets around the building.”

SoftBank expects to introduce its emotion-canceling solution within fiscal year 2025, which ends on March 31, 2026. By reducing the psychological burden on call center operators, SoftBank says it hopes to create a safer work environment that enables employees to provide even better services to customers.

Even so, ignoring customer anger could backfire in the long run when the anger is sometimes a legitimate response to poor business practices. As one Redditor wrote, “If you have so many angry customers that it is affecting the mental health of your call center operators, then maybe address the reasons you have so many irate customers instead of just pretending that they’re not angry.”

Softbank plans to cancel out angry customer voices using AI Read More »

report:-apple-isn’t-paying-openai-for-chatgpt-integration-into-oses

Report: Apple isn’t paying OpenAI for ChatGPT integration into OSes

in the pocket —

Apple thinks pushing OpenAI’s brand to hundreds of millions is worth more than money.

The OpenAI and Apple logos together.

OpenAI / Apple / Benj Edwards

On Monday, Apple announced it would be integrating OpenAI’s ChatGPT AI assistant into upcoming versions of its iPhone, iPad, and Mac operating systems. It paves the way for future third-party AI model integrations, but given Google’s multi-billion-dollar deal with Apple for preferential web search, the OpenAI announcement inspired speculation about who is paying whom. According to a Bloomberg report published Wednesday, Apple considers ChatGPT’s placement on its devices as compensation enough.

“Apple isn’t paying OpenAI as part of the partnership,” writes Bloomberg reporter Mark Gurman, citing people familiar with the matter who wish to remain anonymous. “Instead, Apple believes pushing OpenAI’s brand and technology to hundreds of millions of its devices is of equal or greater value than monetary payments.”

The Bloomberg report states that neither company expects the agreement to generate meaningful revenue in the short term, and in fact, the partnership could burn extra money for OpenAI, because it pays Microsoft to host ChatGPT’s capabilities on its Azure cloud. However, OpenAI could benefit by converting free users to paid subscriptions, and Apple potentially benefits by providing easy, built-in access to ChatGPT during a time when its own in-house LLMs are still catching up.

And there’s another angle at play. Currently, OpenAI offers subscriptions (ChatGPT Plus, Enterprise, Team) that unlock additional features. If users subscribe to OpenAI through the ChatGPT app on an Apple device, the process will reportedly use Apple’s payment platform, which may give Apple a significant cut of the revenue. According to the report, Apple hopes to negotiate additional revenue-sharing deals with AI vendors in the future.

Why OpenAI

The rise of ChatGPT in the public eye over the past 18 months has made OpenAI a power player in the tech industry, allowing it to strike deals with publishers for AI training content—and ensure continued support from Microsoft in the form of investments that trade vital funding and compute for access to OpenAI’s large language model (LLM) technology like GPT-4.

Still, Apple’s choice of ChatGPT as Apple’s first external AI integration has led to widespread misunderstanding, especially since Apple buried the lede about its own in-house LLM technology that powers its new “Apple Intelligence” platform.

On Apple’s part, CEO Tim Cook told The Washington Post that it chose OpenAI as its first third-party AI partner because he thinks the company controls the leading LLM technology at the moment: “I think they’re a pioneer in the area, and today they have the best model,” he said. “We’re integrating with other people as well. But they’re first, and I think today it’s because they’re best.”

Apple’s choice also brings risk. OpenAI’s record isn’t spotless, racking up a string of public controversies over the past month that include an accusation from actress Scarlett Johansson that the company intentionally imitated her voice, resignations from a key scientist and safety personnel, the revelation of a restrictive NDA for ex-employees that prevented public criticism, and accusations against OpenAI CEO Sam Altman of “psychological abuse” related by a former member of the OpenAI board.

Meanwhile, critics of privacy issues related to gathering data for training AI models—including OpenAI foe Elon Musk, who took to X on Monday to spread misconceptions about how the ChatGPT integration might work—also worried that the Apple-OpenAI deal might expose personal data to the AI company, although both companies strongly deny that will be the case.

Looking ahead, Apple’s deal with OpenAI is not exclusive, and the company is already in talks to offer Google’s Gemini chatbot as an additional option later this year. Apple has also reportedly held talks with Anthropic (maker of Claude 3) as a potential chatbot partner, signaling its intention to provide users with a range of AI services, much like how the company offers various search engine options in Safari.

Report: Apple isn’t paying OpenAI for ChatGPT integration into OSes Read More »

turkish-student-creates-custom-ai-device-for-cheating-university-exam,-gets-arrested

Turkish student creates custom AI device for cheating university exam, gets arrested

spy hard —

Elaborate scheme involved hidden camera and an earpiece to hear answers.

A photo illustration of what a shirt-button camera <em>could</em> look like. ” src=”https://cdn.arstechnica.net/wp-content/uploads/2024/06/shirt-button-camera-800×450.jpg”></img><figcaption>
<p><a data-height=Enlarge / A photo illustration of what a shirt-button camera could look like.

Aurich Lawson | Getty Images

On Saturday, Turkish police arrested and detained a prospective university student who is accused of developing an elaborate scheme to use AI and hidden devices to help him cheat on an important entrance exam, reports Reuters and The Daily Mail.

The unnamed student is reportedly jailed pending trial after the incident, which took place in the southwestern province of Isparta, where the student was caught behaving suspiciously during the TYT. The TYT is a nationally held university aptitude exam that determines a person’s eligibility to attend a university in Turkey—and cheating on the high-stakes exam is a serious offense.

According to police reports, the student used a camera disguised as a shirt button, connected to AI software via a “router” (possibly a mistranslation of a cellular modem) hidden in the sole of their shoe. The system worked by scanning the exam questions using the button camera, which then relayed the information to an unnamed AI model. The software generated the correct answers and recited them to the student through an earpiece.

A video released by the Isparta police demonstrated how the cheating system functioned. In the video, a police officer scans a question, and the AI software provides the correct answer through the earpiece.

In addition to the student, Turkish police detained another individual for assisting the student during the exam. The police discovered a mobile phone that could allegedly relay spoken sounds to the other person, allowing for two-way communication.

A history of calling on computers for help

The recent arrest recalls other attempts to cheat using wireless communications and computers, such as the famous case of the Eudaemons in the late 1970s. The Eudaemons were a group of physics graduate students from the University of California, Santa Cruz, who developed a wearable computer device designed to predict the outcome of roulette spins in casinos.

The Eudaemons’ device consisted of a shoe with a computer built into it, connected to a timing device operated by the wearer’s big toe. The wearer would click the timer when the ball and the spinning roulette wheel were in a specific position, and the computer would calculate the most likely section of the wheel where the ball would land. This prediction would be transmitted to an earpiece worn by another team member, who would quickly place bets on the predicted section.

While the Eudaemons’ plan didn’t involve a university exam, it shows that the urge to call upon remote computational powers greater than oneself is apparently timeless.

Turkish student creates custom AI device for cheating university exam, gets arrested Read More »

ridiculed-stable-diffusion-3-release-excels-at-ai-generated-body-horror

Ridiculed Stable Diffusion 3 release excels at AI-generated body horror

unstable diffusion —

Users react to mangled SD3 generations and ask, “Is this release supposed to be a joke?”

An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

Enlarge / An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wild anatomically incorrect visual abominations with ease.

A thread on Reddit, titled, “Is this release supposed to be a joke? [SD3-2B],” details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread, titled, “Why is SD3 so bad at generating girls lying on the grass?” shows similar issues, but for entire human bodies.

Hands have traditionally been a challenge for AI image generators due to lack of good examples in early training data sets, but more recently, several image-synthesis models seemed to have overcome the issue. In that sense, SD3 appears to be a huge step backward for the image-synthesis enthusiasts that gather on Reddit—especially compared to recent Stability releases like SD XL Turbo in November.

“It wasn’t too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!” wrote one Reddit user.

  • An AI-generated image created using Stable Diffusion 3 Medium.

  • An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

  • An AI-generated image created using Stable Diffusion 3 that shows mangled hands.

  • An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

  • An AI-generated image created using Stable Diffusion 3 that shows mangled hands.

  • An AI-generated SD3 Medium image a Reddit user made with the prompt “woman wearing a dress on the beach.”

  • An AI-generated SD3 Medium image a Reddit user made with the prompt “photograph of a person napping in a living room.”

AI image fans are so far blaming the Stable Diffusion 3’s anatomy fails on Stability’s insistence on filtering out adult content (often called “NSFW” content) from the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring a model also gets rid of human anatomy, so… that’s what happened,” wrote one Reddit user in the thread.

Basically, any time a user prompt homes in on a concept that isn’t represented well in the AI model’s training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.

The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans well, and AI researchers soon discovered that censoring adult content that contains nudity can severely hamper an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some abilities lost by strongly filtering NSFW content.

Another issue that can occur during model pre-training is that sometimes the NSFW filter researchers use remove adult images from the dataset is too picky, accidentally removing images that might not be offensive and depriving the model of depictions of humans in certain situations. “[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw,” wrote one Redditor on the topic.

Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those being reported by others. For example, the prompt “a man showing his hands” returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.

  • A SD3 Medium example we generated with the prompt “A woman lying on the beach.”

  • A SD3 Medium example we generated with the prompt “A man showing his hands.”

    Stability AI

  • A SD3 Medium example we generated with the prompt “A woman showing her hands.”

    Stability AI

  • A SD3 Medium example we generated with the prompt “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”

  • A SD3 Medium example we generated with the prompt “A cat in a car holding a can of beer.”

Stability first announced Stable Diffusion 3 in February, and the company has planned to make it available in a variety of different model sizes. Today’s release is for the “Medium” version, which is a 2 billion-parameter model. In addition to the weights being available on Hugging Face, they are also available for experimentation through the company’s Stability Platform. The weights are available for download and use for free under a non-commercial license only.

Soon after its February announcement, delays in releasing the SD3 model weights inspired rumors that the release was being held back due to technical issues or mismanagement. Stability AI as a company fell into a tailspin recently with the resignation of its founder and CEO, Emad Mostaque, in March and then a series of layoffs. Just prior to that, three key engineers—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—left the company. And its troubles go back even farther, with news of the company’s dire financial position lingering since 2023.

To some Stable Diffusion fans, the failures with Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement—and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:

“I guess now they can go bankrupt in a safe and ethically [sic] way, after all.”

Ridiculed Stable Diffusion 3 release excels at AI-generated body horror Read More »

apple-and-openai-currently-have-the-most-misunderstood-partnership-in-tech

Apple and OpenAI currently have the most misunderstood partnership in tech

A man talks into a smartphone.

Enlarge / He isn’t using an iPhone, but some people talk to Siri like this.

On Monday, Apple premiered “Apple Intelligence” during a wide-ranging presentation at its annual Worldwide Developers Conference in Cupertino, California. However, the heart of its new tech, an array of Apple-developed AI models, was overshadowed by the announcement of ChatGPT integration into its device operating systems.

Since rumors of the partnership first emerged, we’ve seen confusion on social media about why Apple didn’t develop a cutting-edge GPT-4-like chatbot internally. Despite Apple’s year-long development of its own large language models (LLMs), many perceived the integration of ChatGPT (and opening the door for others, like Google Gemini) as a sign of Apple’s lack of innovation.

“This is really strange. Surely Apple could train a very good competing LLM if they wanted? They’ve had a year,” wrote AI developer Benjamin De Kraker on X. Elon Musk has also been grumbling about the OpenAI deal—and spreading misinformation about it—saying things like, “It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!”

While Apple has developed many technologies internally, it has also never been shy about integrating outside tech when necessary in various ways, from acquisitions to built-in clients—in fact, Siri was initially developed by an outside company. But by making a deal with a company like OpenAI, which has been the source of a string of tech controversies recently, it’s understandable that some people don’t understand why Apple made the call—and what it might entail for the privacy of their on-device data.

“Our customers want something with world knowledge some of the time”

While Apple Intelligence largely utilizes its own Apple-developed LLMs, Apple also realized that there may be times when some users want to use what the company considers the current “best” existing LLM—OpenAI’s GPT-4 family. In an interview with The Washington Post, Apple CEO Tim Cook explained the decision to integrate OpenAI first:

“I think they’re a pioneer in the area, and today they have the best model,” he said. “And I think our customers want something with world knowledge some of the time. So we considered everything and everyone. And obviously we’re not stuck on one person forever or something. We’re integrating with other people as well. But they’re first, and I think today it’s because they’re best.”

The proposed benefit of Apple integrating ChatGPT into various experiences within iOS, iPadOS, and macOS is that it allows AI users to access ChatGPT’s capabilities without the need to switch between different apps—either through the Siri interface or through Apple’s integrated “Writing Tools.” Users will also have the option to connect their paid ChatGPT account to access extra features.

As an answer to privacy concerns, Apple says that before any data is sent to ChatGPT, the OS asks for the user’s permission, and the entire ChatGPT experience is optional. According to Apple, requests are not stored by OpenAI, and users’ IP addresses are hidden. Apparently, communication with OpenAI servers happens through API calls similar to using the ChatGPT app on iOS, and there is reportedly no deeper OS integration that might expose user data to OpenAI without the user’s permission.

We can only take Apple’s word for it at the moment, of course, and solid details about Apple’s AI privacy efforts will emerge once security experts get their hands on the new features later this year.

Apple’s history of tech integration

So you’ve seen why Apple chose OpenAI. But why look to outside companies for tech? In some ways, Apple building an external LLM client into its operating systems isn’t too different from what it has previously done with streaming video (the YouTube app on the original iPhone), Internet search (Google search integration), and social media (integrated Twitter and Facebook sharing).

The press has positioned Apple’s recent AI moves as Apple “catching up” with competitors like Google and Microsoft in terms of chatbots and generative AI. But playing it slow and cool has long been part of Apple’s M.O.—not necessarily introducing the bleeding edge of technology but improving existing tech through refinement and giving it a better user interface.

Apple and OpenAI currently have the most misunderstood partnership in tech Read More »

duckduckgo-offers-“anonymous”-access-to-ai-chatbots-through-new-service

DuckDuckGo offers “anonymous” access to AI chatbots through new service

anonymous confabulations —

DDG offers LLMs from OpenAI, Anthropic, Meta, and Mistral for factually-iffy conversations.

DuckDuckGo's AI Chat promotional image.

DuckDuckGo

On Thursday, DuckDuckGo unveiled a new “AI Chat” service that allows users to converse with four mid-range large language models (LLMs) from OpenAI, Anthropic, Meta, and Mistral in an interface similar to ChatGPT while attempting to preserve privacy and anonymity. While the AI models involved can output inaccurate information readily, the site allows users to test different mid-range LLMs without having to install anything or sign up for an account.

DuckDuckGo’s AI Chat currently features access to OpenAI’s GPT-3.5 Turbo, Anthropic’s Claude 3 Haiku, and two open source models, Meta’s Llama 3 and Mistral’s Mixtral 8x7B. The service is currently free to use within daily limits. Users can access AI Chat through the DuckDuckGo search engine, direct links to the site, or by using “!ai” or “!chat” shortcuts in the search field. AI Chat can also be disabled in the site’s settings for users with accounts.

According to DuckDuckGo, chats on the service are anonymized, with metadata and IP address removed to prevent tracing back to individuals. The company states that chats are not used for AI model training, citing its privacy policy and terms of use.

“We have agreements in place with all model providers to ensure that any saved chats are completely deleted by the providers within 30 days,” says DuckDuckGo, “and that none of the chats made on our platform can be used to train or improve the models.”

An example of DuckDuckGo AI Chat with GPT-3.5 answering a silly question in an inaccurate way.

Enlarge / An example of DuckDuckGo AI Chat with GPT-3.5 answering a silly question in an inaccurate way.

Benj Edwards

However, the privacy experience is not bulletproof because, in the case of GPT-3.5 and Claude Haiku, DuckDuckGo is required to send a user’s inputs to remote servers for processing over the Internet. Given certain inputs (i.e., “Hey, GPT, my name is Bob, and I live on Main Street, and I just murdered Bill”), a user could still potentially be identified if such an extreme need arose.

While the service appears to work well for us, there’s a question about its utility. For example, while GPT-3.5 initially wowed people when it launched with ChatGPT in 2022, it also confabulated a lot—and it still does. GPT-4 was the first major LLM to get confabulations under control to a point where the bot became more reasonably useful for some tasks (though this itself is a controversial point), but that more capable model isn’t present in DuckDuckGo’s AI Chat. Also missing are similar GPT-4-level models like Claude Opus or Google’s Gemini Ultra, likely because they are far more expensive to run. DuckDuckGo says it may roll out paid plans in the future, and those may include higher daily usage limits or access to “more advanced models.”)

It’s true that the other three models generally (and subjectively) pass GPT-3.5 in capability for coding with lower hallucinations, but they can still make things up, too. With DuckDuckGo AI Chat as it stands, the company is left with a chatbot novelty with a decent interface and the promise that your conversations with it will remain private. But what use are fully private AI conversations if they are full of errors?

Mixtral 8x7B on DuckDuckGo AI Chat when asked about the author. Everything in red boxes is sadly incorrect, but it provides an interesting fantasy scenario. It's a good example of an LLM plausibly filling gaps between concepts that are underrepresented in its training data, called confabulation. For the record, Llama 3 gives a more accurate answer.

Enlarge / Mixtral 8x7B on DuckDuckGo AI Chat when asked about the author. Everything in red boxes is sadly incorrect, but it provides an interesting fantasy scenario. It’s a good example of an LLM plausibly filling gaps between concepts that are underrepresented in its training data, called confabulation. For the record, Llama 3 gives a more accurate answer.

Benj Edwards

As DuckDuckGo itself states in its privacy policy, “By its very nature, AI Chat generates text with limited information. As such, Outputs that appear complete or accurate because of their detail or specificity may not be. For example, AI Chat cannot dynamically retrieve information and so Outputs may be outdated. You should not rely on any Output without verifying its contents using other sources, especially for professional advice (like medical, financial, or legal advice).”

So, have fun talking to bots, but tread carefully. They’ll easily “lie” to your face because they don’t understand what they are saying and are tuned to output statistically plausible information, not factual references.

DuckDuckGo offers “anonymous” access to AI chatbots through new service Read More »

what-kind-of-bug-would-make-machine-learning-suddenly-40%-worse-at-nethack?

What kind of bug would make machine learning suddenly 40% worse at NetHack?

Large Moon Models (LMMs) —

One day, a roguelike-playing system just kept biffing it, for celestial reasons.

Moon rendered in ASCII text, with

Aurich Lawson

Members of the Legendary Computer Bugs Tribunal, honored guests, if I may have your attention? I would, humbly, submit a new contender for your esteemed judgment. You may or may not find it novel, you may even deign to call it a “bug,” but I assure you, you will find it entertaining.

Consider NetHack. It is one of the all-time roguelike games, and I mean that in the more strict sense of that term. The content is procedurally generated, deaths are permanent, and the only thing you keep from game to game is your skill and knowledge. I do understand that the only thing two roguelike fans can agree on is how wrong the third roguelike fan is in their definition of roguelike, but, please, let us move on.

NetHack is great for machine learning…

Being a difficult game full of consequential choices and random challenges, as well as a “single-agent” game that can be generated and played at lightning speed on modern computers, NetHack is great for those working in machine learning—or imitation learning, actually, as detailed in Jens Tuyls’ paper on how compute scaling affects single-agent game learning. Using Tuyls’ model of expert NetHack behavior, Bartłomiej Cupiał and Maciej Wołczyk trained a neural network to play and improve itself using reinforcement learning.

By mid-May of this year, the two had their model consistently scoring 5,000 points by their own metrics. Then, on one run, the model suddenly got worse, on the order of 40 percent. It scored 3,000 points. Machine learning generally, gradually, goes in one direction with these types of problems. It didn’t make sense.

Cupiał and Wołczyk tried quite a few things: reverting their code, restoring their entire software stack from a Singularity backup, and rolling back their CUDA libraries. The result? 3,000 points. They rebuild everything from scratch, and it’s still 3,000 points.

<em>NetHack</em>, played by a regular human.” height=”506″ src=”https://cdn.arstechnica.net/wp-content/uploads/2024/06/13863751533_64654db44e_o.png” width=”821″></img><figcaption>
<p><em>NetHack</em>, played by a regular human.</p>
</figcaption></figure>
<h2>… except on certain nights</h2>
<p>As <a href=detailed in Cupiał’s X (formerly Twitter) thread, this was several hours of confused trial and error by him and Wołczyk. “I am starting to feel like a madman. I can’t even watch a TV show constantly thinking about the bug,” Cupiał wrote. In desperation, he asks model author Tuyls if he knows what could be wrong. He wakes up in Kraków to an answer:

“Oh yes, it’s probably a full moon today.”

In NetHack, the game in which the DevTeam has thought of everything, if the game detects from your system clock that it should be a full moon, it will generate a message: “You are lucky! Full moon tonight.” A full moon imparts a few player benefits: a single point added to Luck, and werecreatures mostly kept to their animal forms.

It’s an easier game, all things considered, so why would the learning agent’s score be lower? It simply doesn’t have data about full moon variables in its training data, so a branching series of decisions likely leads to lesser outcomes, or just confusion. It was indeed a full moon in Kraków when the 3,000-ish scores started showing up. What a terrible night to have a learning model.

Of course, “score” is not a real metric for success in NetHack, as Cupiał himself noted. Ask a model to get the best score, and it will farm the heck out of low-level monsters because it never gets bored. “Finding items required for [ascension] or even [just] doing a quest is too much for pure RL agent,” Cupiał wrote. Another neural network, AutoAscend, does a better job of progressing through the game, but “even it can only solve sokoban and reach mines end,” Cupiał notes.

Is it a bug?

I submit to you that, although NetHack responded to the full moon in its intended way, this quirky, very hard-to-fathom stop on a machine-learning journey was indeed a bug and a worthy one in the pantheon. It’s not a Harvard moth, nor a 500-mile email, but what is?

Because the team used Singularity to back up and restore their stack, they inadvertently carried forward the machine time and resulting bug each time they tried to solve it. The machine’s resulting behavior was so bizarre, and seemingly based on unseen forces, that it drove a coder into fits. And the story has a beginning, a climactic middle, and a denouement that teaches us something, however obscure.

The NetHack Lunar Learning Bug is, I submit, quite worth memorializing. Thank you for your time.

What kind of bug would make machine learning suddenly 40% worse at NetHack? Read More »

nvidia-jumps-ahead-of-itself-and-reveals-next-gen-“rubin”-ai-chips-in-keynote-tease

Nvidia jumps ahead of itself and reveals next-gen “Rubin” AI chips in keynote tease

Swing beat —

“I’m not sure yet whether I’m going to regret this,” says CEO Jensen Huang at Computex 2024.

Nvidia's CEO Jensen Huang delivers his keystone speech ahead of Computex 2024 in Taipei on June 2, 2024.

Enlarge / Nvidia’s CEO Jensen Huang delivers his keystone speech ahead of Computex 2024 in Taipei on June 2, 2024.

On Sunday, Nvidia CEO Jensen Huang reached beyond Blackwell and revealed the company’s next-generation AI-accelerating GPU platform during his keynote at Computex 2024 in Taiwan. Huang also detailed plans for an annual tick-tock-style upgrade cycle of its AI acceleration platforms, mentioning an upcoming Blackwell Ultra chip slated for 2025 and a subsequent platform called “Rubin” set for 2026.

Nvidia’s data center GPUs currently power a large majority of cloud-based AI models, such as ChatGPT, in both development (training) and deployment (inference) phases, and investors are keeping a close watch on the company, with expectations to keep that run going.

During the keynote, Huang seemed somewhat hesitant to make the Rubin announcement, perhaps wary of invoking the so-called Osborne effect, whereby a company’s premature announcement of the next iteration of a tech product eats into the current iteration’s sales. “This is the very first time that this next click as been made,” Huang said, holding up his presentation remote just before the Rubin announcement. “And I’m not sure yet whether I’m going to regret this or not.”

Nvidia Keynote at Computex 2023.

The Rubin AI platform, expected in 2026, will use HBM4 (a new form of high-bandwidth memory) and NVLink 6 Switch, operating at 3,600GBps. Following that launch, Nvidia will release a tick-tock iteration called “Rubin Ultra.” While Huang did not provide extensive specifications for the upcoming products, he promised cost and energy savings related to the new chipsets.

During the keynote, Huang also introduced a new ARM-based CPU called “Vera,” which will be featured on a new accelerator board called “Vera Rubin,” alongside one of the Rubin GPUs.

Much like Nvidia’s Grace Hopper architecture, which combines a “Grace” CPU and a “Hopper” GPU to pay tribute to the pioneering computer scientist of the same name, Vera Rubin refers to Vera Florence Cooper Rubin (1928–2016), an American astronomer who made discoveries in the field of deep space astronomy. She is best known for her pioneering work on galaxy rotation rates, which provided strong evidence for the existence of dark matter.

A calculated risk

Nvidia CEO Jensen Huang reveals the

Enlarge / Nvidia CEO Jensen Huang reveals the “Rubin” AI platform for the first time during his keynote at Computex 2024 on June 2, 2024.

Nvidia’s reveal of Rubin is not a surprise in the sense that most big tech companies are continuously working on follow-up products well in advance of release, but it’s notable because it comes just three months after the company revealed Blackwell, which is barely out of the gate and not yet widely shipping.

At the moment, the company seems to be comfortable leapfrogging itself with new announcements and catching up later; Nvidia just announced that its GH200 Grace Hopper “Superchip,” unveiled one year ago at Computex 2023, is now in full production.

With Nvidia stock rising and the company possessing an estimated 70–95 percent of the data center GPU market share, the Rubin reveal is a calculated risk that seems to come from a place of confidence. That confidence could turn out to be misplaced if a so-called “AI bubble” pops or if Nvidia misjudges the capabilities of its competitors. The announcement may also stem from pressure to continue Nvidia’s astronomical growth in market cap with nonstop promises of improving technology.

Accordingly, Huang has been eager to showcase the company’s plans to continue pushing silicon fabrication tech to its limits and widely broadcast that Nvidia plans to keep releasing new AI chips at a steady cadence.

“Our company has a one-year rhythm. Our basic philosophy is very simple: build the entire data center scale, disaggregate and sell to you parts on a one-year rhythm, and we push everything to technology limits,” Huang said during Sunday’s Computex keynote.

Despite Nvidia’s recent market performance, the company’s run may not continue indefinitely. With ample money pouring into the data center AI space, Nvidia isn’t alone in developing accelerator chips. Competitors like AMD (with the Instinct series) and Intel (with Guadi 3) also want to win a slice of the data center GPU market away from Nvidia’s current command of the AI-accelerator space. And OpenAI’s Sam Altman is trying to encourage diversified production of GPU hardware that will power the company’s next generation of AI models in the years ahead.

Nvidia jumps ahead of itself and reveals next-gen “Rubin” AI chips in keynote tease Read More »