machine learning

Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism

As the word turns —

Will LLMs keep improving if we throw more compute at them? OpenAI dealmaker thinks so.

Kevin Scott, CTO and EVP of AI at Microsoft, speaks onstage during Vox Media’s 2023 Code Conference at The Ritz-Carlton, Laguna Niguel, on September 27, 2023, in Dana Point, California.

During an interview with Sequoia Capital’s Training Data podcast published last Tuesday, Microsoft CTO Kevin Scott doubled down on his belief that so-called large language model (LLM) “scaling laws” will continue to drive AI progress, despite some skepticism in the field that progress has leveled out. Scott played a key role in forging a $13 billion technology-sharing deal between Microsoft and OpenAI.

“Despite what other people think, we’re not at diminishing marginal returns on scale-up,” Scott said. “And I try to help people understand there is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them.”

LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute). The laws suggest that simply scaling up model size and training data can lead to significant improvements in AI capabilities without necessarily requiring fundamental algorithmic breakthroughs.
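
The 2020 paper expressed these trends as power laws in model size, dataset size, and compute. As a rough illustration (a minimal sketch using the approximate exponent and constant that Kaplan et al. reported for the model-size term, not exact figures), the parameter-count relationship looks roughly like this:

```python
# Minimal sketch of the model-size scaling law from Kaplan et al. (2020):
# test loss falls off as a power of parameter count N. The constants are
# the approximate values reported in that paper and are illustrative only.

def predicted_loss(n_params: float,
                   n_c: float = 8.8e13,    # approximate critical parameter count
                   alpha_n: float = 0.076  # approximate power-law exponent
                   ) -> float:
    """Predicted test loss (nats/token) for a model with n_params parameters,
    assuming data and compute are not the bottleneck."""
    return (n_c / n_params) ** alpha_n

# Each ~10x increase in parameters trims a roughly constant fraction off the
# predicted loss -- steady, predictable gains rather than sudden jumps.
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss ~{predicted_loss(n):.2f}")
```

That is the shape of the curve Scott is betting on: it keeps bending downward as long as parameters, data, and compute grow together.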

Since then, other researchers have challenged the idea that scaling laws will persist over time, but the concept remains a cornerstone of OpenAI’s AI development philosophy.

You can see Scott’s comments in the video below, beginning around 46:05:

Microsoft CTO Kevin Scott on how far scaling laws will extend

Scott’s optimism contrasts with a narrative among some critics in the AI community that progress in LLMs has plateaued around GPT-4 class models. The perception has been fueled by largely informal observations—and some benchmark results—about recent models like Google’s Gemini 1.5 Pro, Anthropic’s Claude Opus, and even OpenAI’s GPT-4o, which some argue haven’t shown the dramatic leaps in capability seen in earlier generations, and that LLM development may be approaching diminishing returns.

“We all know that GPT-3 was vastly better than GPT-2. And we all know that GPT-4 (released thirteen months ago) was vastly better than GPT-3,” wrote AI critic Gary Marcus in April. “But what has happened since?”

The perception of plateau

Scott’s stance suggests that tech giants like Microsoft still feel justified in investing heavily in larger AI models, betting on continued breakthroughs rather than hitting a capability plateau. Given Microsoft’s investment in OpenAI and strong marketing of its own Microsoft Copilot AI features, the company has a strong interest in maintaining the perception of continued progress, even if the tech stalls.

Frequent AI critic Ed Zitron recently wrote in a post on his blog that one defense of continued investment into generative AI is that “OpenAI has something we don’t know about. A big, sexy, secret technology that will eternally break the bones of every hater,” he wrote. “Yet, I have a counterpoint: no it doesn’t.”

Some perceptions of slowing progress in LLM capabilities and benchmarking may stem from AI’s sudden arrival in the public eye when, in fact, LLMs had been in development for years prior. OpenAI continued to develop LLMs during a roughly three-year gap between the release of GPT-3 in 2020 and GPT-4 in 2023. Many people likely perceived a rapid jump in capability with GPT-4’s launch in 2023 because they had only recently become aware of GPT-3-class models through the launch of ChatGPT, which used GPT-3.5, in late November 2022.

In the podcast interview, the Microsoft CTO pushed back against the idea that AI progress has stalled, but he acknowledged the challenge of infrequent data points in this field, as new models often take years to develop. Despite this, Scott expressed confidence that future iterations will show improvements, particularly in areas where current models struggle.

“The next sample is coming, and I can’t tell you when, and I can’t predict exactly how good it’s going to be, but it will almost certainly be better at the things that are brittle right now, where you’re like, oh my god, this is a little too expensive, or a little too fragile, for me to use,” Scott said in the interview. “All of that gets better. It’ll get cheaper, and things will become less fragile. And then more complicated things will become possible. That is the story of each generation of these models as we’ve scaled up.”

OpenAI reportedly nears breakthrough with “reasoning” AI, reveals progress framework

studies in hype-otheticals —

Five-level AI classification system probably best seen as a marketing exercise.

Illustration of a robot with many arms.

OpenAI recently unveiled a five-tier system to gauge its advancement toward developing artificial general intelligence (AGI), according to an OpenAI spokesperson who spoke with Bloomberg. The company shared this new classification system on Tuesday with employees during an all-hands meeting, aiming to provide a clear framework for understanding AI advancement. However, the system describes hypothetical technology that does not yet exist and is possibly best interpreted as a marketing move to garner investment dollars.

OpenAI has previously stated that AGI, a nebulous term for a hypothetical AI system that can perform novel tasks like a human without specialized training, is currently the primary goal of the company. The pursuit of technology that can replace humans at most intellectual work drives most of the enduring hype over the firm, even though such a technology would likely be wildly disruptive to society.

OpenAI CEO Sam Altman has previously stated his belief that AGI could be achieved within this decade, and a large part of the CEO’s public messaging has been related to how the company (and society in general) might handle the disruption that AGI may bring. Along those lines, a ranking system to communicate AI milestones achieved internally on the path to AGI makes sense.

OpenAI’s five levels—which it plans to share with investors—range from current AI capabilities to systems that could potentially manage entire organizations. The company believes its technology (such as GPT-4o that powers ChatGPT) currently sits at Level 1, which encompasses AI that can engage in conversational interactions. However, OpenAI executives reportedly told staff they’re on the verge of reaching Level 2, dubbed “Reasoners.”

Bloomberg lists OpenAI’s five “Stages of Artificial Intelligence” as follows:

  • Level 1: Chatbots, AI with conversational language
  • Level 2: Reasoners, human-level problem solving
  • Level 3: Agents, systems that can take actions
  • Level 4: Innovators, AI that can aid in invention
  • Level 5: Organizations, AI that can do the work of an organization

A Level 2 AI system would reportedly be capable of basic problem-solving on par with a human who holds a doctorate degree but lacks access to external tools. During the all-hands meeting, OpenAI leadership reportedly demonstrated a research project using their GPT-4 model that the researchers believe shows signs of approaching this human-like reasoning ability, according to someone familiar with the discussion who spoke with Bloomberg.

The upper levels of OpenAI’s classification describe increasingly potent hypothetical AI capabilities. Level 3 “Agents” could work autonomously on tasks for days. Level 4 systems would generate novel innovations. The pinnacle, Level 5, envisions AI managing entire organizations.

This classification system is still a work in progress. OpenAI plans to gather feedback from employees, investors, and board members, potentially refining the levels over time.

Ars Technica asked OpenAI about the ranking system and the accuracy of the Bloomberg report, and a company spokesperson said they had “nothing to add.”

The problem with ranking AI capabilities

OpenAI isn’t alone in attempting to quantify levels of AI capabilities. As Bloomberg notes, OpenAI’s system feels similar to levels of autonomous driving mapped out by automakers. And in November 2023, researchers at Google DeepMind proposed their own five-level framework for assessing AI advancement, showing that other AI labs have also been trying to figure out how to rank things that don’t yet exist.

OpenAI’s classification system also somewhat resembles Anthropic’s “AI Safety Levels” (ASLs) first published by the maker of the Claude AI assistant in September 2023. Both systems aim to categorize AI capabilities, though they focus on different aspects. Anthropic’s ASLs are more explicitly focused on safety and catastrophic risks (such as ASL-2, which refers to “systems that show early signs of dangerous capabilities”), while OpenAI’s levels track general capabilities.

However, any AI classification system raises questions about whether it’s possible to meaningfully quantify AI progress and what constitutes an advancement (or even what constitutes a “dangerous” AI system, as in the case of Anthropic). The tech industry so far has a history of overpromising AI capabilities, and linear progression models like OpenAI’s potentially risk fueling unrealistic expectations.

There is currently no consensus in the AI research community on how to measure progress toward AGI or even if AGI is a well-defined or achievable goal. As such, OpenAI’s five-tier system should likely be viewed as a communications tool to entice investors that shows the company’s aspirational goals rather than a scientific or even technical measurement of progress.

Intuit’s AI gamble: Mass layoff of 1,800 paired with hiring spree

In the name of AI —

Intuit CEO: “Companies that aren’t prepared to take advantage of [AI] will fall behind.”

Signage for financial software company Intuit at the company's headquarters in the Silicon Valley town of Mountain View, California, August 24, 2016.

On Wednesday, Intuit CEO Sasan Goodarzi announced in a letter to the company that it would be laying off 1,800 employees—about 10 percent of its workforce of around 18,000—while simultaneously planning to hire the same number of new workers as part of a major restructuring effort purportedly focused on AI.

“As I’ve shared many times, the era of AI is one of the most significant technology shifts of our lifetime,” wrote Goodarzi in a blog post on Intuit’s website. “This is truly an extraordinary time—AI is igniting global innovation at an incredible pace, transforming every industry and company in ways that were unimaginable just a few years ago. Companies that aren’t prepared to take advantage of this AI revolution will fall behind and, over time, will no longer exist.”

The CEO says Intuit is in a position of strength and that the layoffs are not related to cost-cutting but will instead allow the company to “allocate additional investments to our most critical areas to support our customers and drive growth.” With new hires, the company expects its overall headcount to grow in its 2025 fiscal year.

Intuit’s layoffs (which collectively qualify as a “mass layoff” under the WARN Act) hit various departments within the company, including the closure of Intuit’s offices in Edmonton, Canada, and Boise, Idaho, affecting over 250 employees. Approximately 1,050 employees are being laid off because they’re “not meeting expectations,” according to Goodarzi’s letter. Intuit has also eliminated more than 300 roles across the company to “streamline” operations and shift resources toward AI, and the company plans to consolidate 80 tech roles to “sites where we are strategically growing our technology teams and capabilities,” such as Atlanta, Bangalore, New York, Tel Aviv, and Toronto.

In turn, the company plans to accelerate investments in its AI-powered financial assistant, Intuit Assist, which provides AI-generated financial recommendations. The company also plans to hire new talent in engineering, product development, data science, and customer-facing roles, with a particular emphasis on AI expertise.

Not just about AI

Despite Goodarzi’s heavily AI-focused message, the restructuring at Intuit reveals a more complex picture. A closer look at the layoffs shows that many of the 1,800 job cuts stem from performance-based departures (such as the aforementioned 1,050). The restructuring also includes a 10 percent reduction in executive positions at the director level and above (“To continue increasing our velocity of decision making,” Goodarzi says).

These numbers suggest that the reorganization may also serve as an opportunity for Intuit to trim its workforce of underperforming staff, using the AI hype cycle as a compelling backdrop for a broader house-cleaning effort.

But as far as CEOs are concerned, it’s always a good time to talk about how they’re embracing the latest, hottest thing in technology: “With the introduction of GenAI,” Goodarzi wrote, “we are now delivering even more compelling customer experiences, increasing monetization potential, and driving efficiencies in how the work gets done within Intuit. But it’s just the beginning of the AI revolution.”

OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs

automated critic —

Research model catches bugs in AI-generated code, improving human oversight of AI.

An illustration created by OpenAI.

On Thursday, OpenAI researchers unveiled CriticGPT, a new AI model designed to identify mistakes in code generated by ChatGPT. It aims to enhance the process of making AI systems behave in ways humans want (called “alignment”) through Reinforcement Learning from Human Feedback (RLHF), which helps human reviewers make large language model (LLM) outputs more accurate.

As outlined in a new research paper called “LLM Critics Help Catch LLM Bugs,” OpenAI created CriticGPT to act as an AI assistant to human trainers who review programming code generated by the ChatGPT AI assistant. CriticGPT—based on the GPT-4 family of LLMs—analyzes the code and points out potential errors, making it easier for humans to spot mistakes that might otherwise go unnoticed. The researchers trained CriticGPT on a dataset of code samples with intentionally inserted bugs, teaching it to recognize and flag various coding errors.

The researchers found that CriticGPT’s critiques were preferred by annotators over human critiques in 63 percent of cases involving naturally occurring LLM errors and that human-machine teams using CriticGPT wrote more comprehensive critiques than humans alone while reducing confabulation (hallucination) rates compared to AI-only critiques.

Developing an automated critic

The development of CriticGPT involved training the model on a large number of inputs containing deliberately inserted mistakes. Human trainers were asked to modify code written by ChatGPT, introducing errors and then providing example feedback as if they had discovered these bugs. This process allowed the model to learn how to identify and critique various types of coding errors.
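
Concretely, that process yields a dataset of tampered-code/critique pairs. Below is a minimal sketch of what one such record might look like; the class and field names are hypothetical illustrations, not taken from OpenAI's paper:

```python
# Hypothetical sketch of the tamper-and-critique training data described above.
# The class and field names are illustrative; they are not from OpenAI's paper.
from dataclasses import dataclass

@dataclass
class CritiqueExample:
    original_code: str       # code as ChatGPT originally wrote it
    tampered_code: str       # the same code after a trainer inserted a subtle bug
    bug_note: str            # the trainer's note on what was broken and where
    reference_critique: str  # feedback written as if the bug had just been found

example = CritiqueExample(
    original_code="def mean(xs): return sum(xs) / len(xs)",
    tampered_code="def mean(xs): return sum(xs) / (len(xs) - 1)",
    bug_note="Denominator changed to len(xs) - 1, an off-by-one error.",
    reference_critique=(
        "The function divides by len(xs) - 1 instead of len(xs), so it returns "
        "an incorrect mean and crashes on single-element lists."
    ),
)
# The critic model is then trained to produce critiques like `reference_critique`
# when shown only the tampered code.
```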

In experiments, CriticGPT demonstrated its ability to catch both inserted bugs and naturally occurring errors in ChatGPT’s output. The new model’s critiques were preferred by trainers over those generated by ChatGPT itself in 63 percent of cases involving natural bugs (the aforementioned statistic). This preference was partly due to CriticGPT producing fewer unhelpful “nitpicks” and generating fewer false positives, or hallucinated problems.

The researchers also created a new technique they call Force Sampling Beam Search (FSBS). This method helps CriticGPT write more detailed reviews of code. It lets the researchers adjust how thorough CriticGPT is in looking for problems, while also controlling how often it might make up issues that don’t really exist. They can tweak this balance depending on what they need for different AI training tasks.

Interestingly, the researchers found that CriticGPT’s capabilities extend beyond just code review. In their experiments, they applied the model to a subset of ChatGPT training data that had previously been rated as flawless by human annotators. Surprisingly, CriticGPT identified errors in 24 percent of these cases—errors that were subsequently confirmed by human reviewers. OpenAI thinks this demonstrates the model’s potential to generalize to non-code tasks and highlights its ability to catch subtle mistakes that even careful human evaluation might miss.

Despite its promising results, like all AI models, CriticGPT has limitations. The model was trained on relatively short ChatGPT answers, which may not fully prepare it for evaluating longer, more complex tasks that future AI systems might tackle. Additionally, while CriticGPT reduces confabulations, it doesn’t eliminate them entirely, and human trainers can still make labeling mistakes based on these false outputs.

The research team acknowledges that CriticGPT is most effective at identifying errors that can be pinpointed in one specific location within the code. However, real-world mistakes in AI outputs can often be spread across multiple parts of an answer, presenting a challenge for future iterations of the model.

OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline, providing its trainers with AI assistance. For OpenAI, it’s a step toward developing better tools for evaluating outputs from LLM systems that may be difficult for humans to rate without additional support. However, the researchers caution that even with tools like CriticGPT, extremely complex tasks or responses may still prove challenging for human evaluators—even those assisted by AI.

AI-generated Al Michaels to provide daily recaps during 2024 Summer Olympics

forever young —

AI voice clone will narrate daily Olympics video recaps; critics call it a “code-generated ghoul.”

Al Michaels looks on prior to the game between the Minnesota Vikings and Philadelphia Eagles at Lincoln Financial Field on September 14, 2023, in Philadelphia, Pennsylvania.

On Wednesday, NBC announced plans to use an AI-generated clone of famous sports commentator Al Michaels’ voice to narrate daily streaming video recaps of the 2024 Summer Olympics in Paris, which start on July 26. The AI-powered narration will feature in “Your Daily Olympic Recap on Peacock,” NBC’s streaming service. But this new, high-profile use of voice cloning worries critics, who say the technology may muscle out upcoming sports commentators by keeping old personas around forever.

NBC says it has created a “high-quality AI re-creation” of Michaels’ voice, trained on Michaels’ past NBC appearances to capture his distinctive delivery style.

The veteran broadcaster, revered in the sports commentator world for his iconic “Do you believe in miracles? Yes!” call during the 1980 Winter Olympics, has been covering sports on TV since 1971, including a high-profile run of play-by-play coverage of NFL football games for both ABC and NBC since the 1980s. NBC dropped him from NFL coverage in 2023, however, possibly due to his age.

Michaels, who is 79 years old, shared his initial skepticism about the project in an interview with Vanity Fair, as NBC News notes. After hearing the AI version of his voice, which can greet viewers by name, he described the experience as “astonishing” and “a little bit frightening.” He said the AI recreation was “almost 2% off perfect” in mimicking his style.

The Vanity Fair article provides some insight into how NBC’s new AI system works. It first uses a large language model (similar technology to what powers ChatGPT) to analyze subtitles and metadata from NBC’s Olympics video coverage, summarizing events and writing custom output to imitate Michaels’ style. This text is then fed into an unspecified voice AI model trained on Michaels’ previous NBC appearances, reportedly replicating his unique pronunciations and intonations.
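
In other words, it is a two-stage pipeline: a text model drafts a personalized recap script from captions and metadata, and a separate voice model renders that script in the cloned voice. The sketch below only illustrates that structure; NBC has not published its system, so every function name here is a hypothetical placeholder:

```python
# Hypothetical sketch of the two-stage recap pipeline described above.
# NBC has not published its system; all names here are placeholders.

def draft_recap(captions: list[str], metadata: dict, viewer_name: str) -> str:
    """Stage 1 (language model): turn captions and event metadata into a
    short recap script written in the commentator's style."""
    top_event = metadata.get("top_event", "the day's competition")
    return f"Hello, {viewer_name}! What a day at {top_event}. " + " ".join(captions[:2])

def render_voice(script: str, voice_profile: str = "al_michaels_clone") -> bytes:
    """Stage 2 (voice model): synthesize audio of the script in the cloned voice.
    Here it just returns the text as bytes as a stand-in for real audio."""
    return script.encode("utf-8")

captions = ["A new Olympic record falls in the 100m freestyle.",
            "The US relay team edges out France at the wall."]
audio = render_voice(draft_recap(captions, {"top_event": "the pool"}, "Sam"))
```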

NBC estimates that the system could generate nearly 7 million personalized variants of the recaps across the US during the games, pulled from the network’s 5,000 hours of live coverage. Using the system, each Peacock user will receive about 10 minutes of personalized highlights.

A diminished role for humans in the future?

Al Michaels reports on the Sweden vs. USA men’s ice hockey game at the 1980 Olympic Winter Games on February 12, 1980.

It’s no secret that while AI is wildly hyped right now, it’s also controversial among some. Upon hearing the NBC announcement, critics of AI technology reacted strongly. “@NBCSports, this is gross,” tweeted actress and filmmaker Justine Bateman, who frequently uses X to criticize technologies that might replace human writers or performers in the future.

A thread of similar responses from X users reacting to the sample video provided above included criticisms such as, “Sounds pretty off when it’s just the same tone for every single word.” Another user wrote, “It just sounds so unnatural. No one talks like that.”

The technology will not replace NBC’s regular human sports commentators during this year’s Olympics coverage, and like other forms of AI, it leans heavily on existing human work by analyzing and regurgitating human-created content in the form of captions pulled from NBC footage.

Looking down the line, due to AI media cloning technologies like voice, video, and image synthesis, today’s celebrities may be able to attain a form of media immortality that allows new iterations of their likenesses to persist through the generations, potentially earning licensing fees for whoever holds the rights.

We’ve already seen it with James Earl Jones playing Darth Vader’s voice, and the trend will likely continue with other celebrity voices, provided the money is right. Eventually, it may extend to famous musicians through music synthesis and famous actors in video-synthesis applications as well.

The possibility of being muscled out by AI replicas factored heavily into a Hollywood actors’ strike last year, with SAG-AFTRA union President Fran Drescher saying, “If we don’t stand tall right now, we are all going to be in trouble. We are all going to be in jeopardy of being replaced by machines.”

For companies that like to monetize media properties for as long as possible, AI may provide a way to maintain a media legacy through automation. But future human performers may have to compete against all of the greatest performers of the past, rendered through AI, to break out and forge a new career—provided there will be room for human performers at all.

“Al Michaels became Al Michaels because he was brought in to replace people who died, or retired, or moved on,” tweeted a writer named Geonn Cannon on X. “If he can’t do the job anymore, it’s time to let the next Al Michaels have a shot at it instead of just planting a code-generated ghoul in an empty chair.”

Toys “R” Us riles critics with “first-ever” AI-generated commercial using Sora

A screen capture from the partially AI-generated Toys “R” Us brand film created using Sora.

Toys R Us

On Monday, Toys “R” Us announced that it had partnered with an ad agency called Native Foreign to create what it calls “the first-ever brand film using OpenAI’s new text-to-video tool, Sora.” OpenAI debuted Sora in February, but the video synthesis tool has not yet become available to the public. The brand film tells the story of Toys “R” Us founder Charles Lazarus using AI-generated video clips.

“We are thrilled to partner with Native Foreign to push the boundaries of Sora, a groundbreaking new technology from OpenAI that’s gaining global attention,” wrote Toys “R” Us on its website. “Sora can create up to one-minute-long videos featuring realistic scenes and multiple characters, all generated from text instruction. Imagine the excitement of creating a young Charles Lazarus, the founder of Toys “R” Us, and envisioning his dreams for our iconic brand and beloved mascot Geoffrey the Giraffe in the early 1930s.”

The company says that The Origin of Toys “R” Us commercial was co-produced by Toys “R” Us Studios President Kim Miller Olko as executive producer and Native Foreign’s Nik Kleverov as director. “Charles Lazarus was a visionary ahead of his time, and we wanted to honor his legacy with a spot using the most cutting-edge technology available,” Miller Olko said in a statement.

In the video, we see a child version of Lazarus, presumably generated using Sora, falling asleep and having a dream that he is flying through a land of toys. Along the way, he meets Geoffrey, the store’s mascot, who hands the child a small red car.

Many of the scenes retain obvious hallmarks of AI-generated imagery, such as unnatural movement, strange visual artifacts, and the irregular shape of eyeglasses. In February, a few Super Bowl commercials intentionally made fun of similar AI-generated video defects, which became famous online after fake AI-generated beer commercial and “Pepperoni Hug Spot” clips made using Runway’s Gen-2 model went viral in 2023.


AI-generated artwork receives frequent criticism online due to the use of human-created artwork to train AI models that create the works, the perception that AI synthesis tools will replace (or are currently replacing) human creative jobs, and the potential environmental impact of AI models, which are seen as energy-wasteful by some critics. Also, some people just think the output quality looks bad.

On the social network X, comedy writer Mike Drucker wrapped up several of these criticisms into one post, writing, “Love this commercial is like, ‘Toys R Us started with the dream of a little boy who wanted to share his imagination with the world. And to show how, we fired our artists and dried Lake Superior using a server farm to generate what that would look like in Stephen King’s nightmares.'”

Other critical comments were more frank. Filmmaker Joe Russo posted: “TOYS ‘R US released an AI commercial and it fucking sucks.”

Music industry giants allege mass copyright violation by AI firms

No one wants to be defeated —

Suno and Udio could face damages of up to $150,000 per song allegedly infringed.

Michael Jackson in concert, 1986. Sony Music owns a large portion of publishing rights to Jackson’s music.

Universal Music Group, Sony Music, and Warner Records have sued AI music-synthesis companies Udio and Suno for allegedly committing mass copyright infringement by using recordings owned by the labels to train music-generating AI models, reports Reuters. Udio and Suno can generate novel song recordings based on text-based descriptions of music (i.e., “a dubstep song about Linus Torvalds”).

The lawsuits, filed in federal courts in New York and Massachusetts, claim that the AI companies’ use of copyrighted material to train their systems could lead to AI-generated music that directly competes with and potentially devalues the work of human artists.

Like other generative AI models, both Udio and Suno (which we covered separately in April) rely on a broad selection of existing human-created artworks that teach a neural network the relationship between words in a written prompt and styles of music. The record labels correctly note that these companies have been deliberately vague about the sources of their training data.

Until generative AI models hit the mainstream in 2022, it was common practice in machine learning to scrape and use copyrighted information without seeking permission to do so. But now that the applications of those technologies have become commercial products themselves, rightsholders have come knocking to collect. In the case of Udio and Suno, the record labels are seeking statutory damages of up to $150,000 per song used in training.

In the lawsuits, the record labels cite specific examples of AI-generated content that allegedly re-creates elements of well-known songs, including The Temptations’ “My Girl,” Mariah Carey’s “All I Want for Christmas Is You,” and James Brown’s “I Got You (I Feel Good).” The labels also claim the music-synthesis models can produce vocals resembling those of famous artists, such as Michael Jackson and Bruce Springsteen.

Reuters claims it’s the first instance of lawsuits specifically targeting music-generating AI, but music companies and artists alike have been gearing up to deal with challenges the technology may pose for some time.

In May, Sony Music sent warning letters to over 700 AI companies (including OpenAI, Microsoft, Google, Suno, and Udio) and music-streaming services that prohibited any AI researchers from using its music to train AI models. In April, over 200 musical artists signed an open letter that called on AI companies to stop using AI to “devalue the rights of human artists.” And last November, Universal Music filed a copyright infringement lawsuit against Anthropic for allegedly including artists’ lyrics in its Claude LLM training data.

Similar to The New York Times’ lawsuit against OpenAI over the use of training data, the outcome of the record labels’ new suit could have deep implications for the future development of generative AI in creative fields, including requiring companies to license all musical training data used in creating music-synthesis models.

Compulsory licenses for AI training data could make AI model development economically impractical for small startups like Udio and Suno—and judging by the aforementioned open letter, many musical artists may applaud that potential outcome. But such a development would not preclude major labels from eventually developing their own AI music generators themselves, allowing only large corporations with deep pockets to control generative music tools for the foreseeable future.

Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks

The Anthropic Claude 3 logo, jazzed up by Benj Edwards.

Anthropic / Benj Edwards

On Thursday, Anthropic announced Claude 3.5 Sonnet, its latest AI language model and the first in a new series of “3.5” models that build upon Claude 3, launched in March. Claude 3.5 can compose text, analyze data, and write code. It features a 200,000 token context window and is available now on the Claude website and through an API. Anthropic also introduced Artifacts, a new feature in the Claude interface that shows related work documents in a dedicated window.

So far, people outside of Anthropic seem impressed. “This model is really, really good,” wrote independent AI researcher Simon Willison on X. “I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump).”

As we’ve written before, benchmarks for large language models (LLMs) are troublesome because they can be cherry-picked and often do not capture the feel and nuance of using a machine to generate outputs on almost any conceivable topic. But according to Anthropic, Claude 3.5 Sonnet matches or outperforms competitor models like GPT-4o and Gemini 1.5 Pro on certain benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).

Claude 3.5 Sonnet benchmarks provided by Anthropic.

If all that makes your eyes glaze over, that’s OK; it’s meaningful to researchers but mostly marketing to everyone else. A more useful performance metric comes from what we might call “vibemarks” (coined here first!) which are subjective, non-rigorous aggregate feelings measured by competitive usage on sites like LMSYS’s Chatbot Arena. The Claude 3.5 Sonnet model is currently under evaluation there, and it’s too soon to say how well it will fare.

Claude 3.5 Sonnet also outperforms Anthropic’s previous-best model (Claude 3 Opus) on benchmarks measuring “reasoning,” math skills, general knowledge, and coding abilities. For example, the model demonstrated strong performance in an internal coding evaluation, solving 64 percent of problems compared to 38 percent for Claude 3 Opus.

Claude 3.5 Sonnet is also a multimodal AI model that accepts visual input in the form of images, and the new model is reportedly excellent at a battery of visual comprehension tests.

Claude 3.5 Sonnet benchmarks provided by Anthropic.

Roughly speaking, the visual benchmarks mean that 3.5 Sonnet is better at pulling information from images than previous models. For example, you can show it a picture of a rabbit wearing a football helmet, and the model knows it’s a rabbit wearing a football helmet and can talk about it. That’s fun for tech demos, but the technology is still not accurate enough for applications where reliability is mission critical.

Ex-OpenAI star Sutskever shoots for superintelligent AI with new company

Not Strategic Simulations —

Safe Superintelligence, Inc. seeks to safely build AI far beyond human capability.

Ilya Sutskever physically gestures as OpenAI CEO Sam Altman looks on at Tel Aviv University on June 5, 2023.

On Wednesday, former OpenAI Chief Scientist Ilya Sutskever announced he is forming a new company called Safe Superintelligence, Inc. (SSI) with the goal of safely building “superintelligence,” which is a hypothetical form of artificial intelligence that surpasses human intelligence, possibly in the extreme.

“We will pursue safe superintelligence in a straight shot, with one focus, one goal, and one product,” wrote Sutskever on X. “We will do it through revolutionary breakthroughs produced by a small cracked team.”

Sutskever was a founding member of OpenAI and formerly served as the company’s chief scientist. Two others are joining Sutskever at SSI initially: Daniel Levy, who formerly headed the Optimization Team at OpenAI, and Daniel Gross, an AI investor who worked on machine learning projects at Apple between 2013 and 2017. The trio posted a statement on the company’s new website.

A screen capture of Safe Superintelligence’s initial formation announcement captured on June 20, 2024.

Sutskever and several of his co-workers resigned from OpenAI in May, six months after Sutskever played a key role in ousting OpenAI CEO Sam Altman, who later returned. While Sutskever did not publicly complain about OpenAI after his departure—and OpenAI executives such as Altman wished him well on his new adventures—another resigning member of OpenAI’s Superalignment team, Jan Leike, publicly complained that “over the past years, safety culture and processes [had] taken a backseat to shiny products” at OpenAI. Leike joined OpenAI competitor Anthropic later in May.

A nebulous concept

OpenAI is currently seeking to create AGI, or artificial general intelligence, which would hypothetically match human intelligence at performing a wide variety of tasks without specific training. Sutskever hopes to jump beyond that in a straight moonshot attempt, with no distractions along the way.

“This company is special in that its first product will be the safe superintelligence, and it will not do anything else up until then,” said Sutskever in an interview with Bloomberg. “It will be fully insulated from the outside pressures of having to deal with a large and complicated product and having to be stuck in a competitive rat race.”

During his former job at OpenAI, Sutskever was part of the “Superalignment” team studying how to “align” (shape the behavior of) this hypothetical form of AI, sometimes called “ASI” for “artificial super intelligence,” to be beneficial to humanity.

As you can imagine, it’s difficult to align something that does not exist, so Sutskever’s quest has met skepticism at times. On X, University of Washington computer science professor (and frequent OpenAI critic) Pedro Domingos wrote, “Ilya Sutskever’s new company is guaranteed to succeed, because superintelligence that is never achieved is guaranteed to be safe.”

Much like AGI, superintelligence is a nebulous term. Since the mechanics of human intelligence are still poorly understood—and since human intelligence is difficult to quantify or define because there is no one set type of human intelligence—identifying superintelligence when it arrives may be tricky.

Already, computers far surpass humans in many forms of information processing (such as basic math), but are they superintelligent? Many proponents of superintelligence imagine a sci-fi scenario of an “alien intelligence” with a form of sentience that operates independently of humans, and that is more or less what Sutskever hopes to achieve and control safely.

“You’re talking about a giant super data center that’s autonomously developing technology,” he told Bloomberg. “That’s crazy, right? It’s the safety of that that we want to contribute to.”

Runway’s latest AI video generator brings giant cotton candy monsters to life

Screen capture of a Runway Gen-3 Alpha video generated with the prompt “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha that’s still under development, but it appears to create video of similar quality to OpenAI’s Sora, which debuted earlier this year (and has also not yet been released). It can generate novel, high-definition video from text prompts that range from realistic humans to surrealistic monsters stomping the countryside.

Unlike Runway’s previous best model from June 2023, which could only create two-second-long clips, Gen-3 Alpha can reportedly create 10-second-long video segments of people, places, and things that have a consistency and coherency that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora’s full minute of video, consider that the company is working with a shoestring budget of compute compared to more lavishly funded OpenAI—and actually has a history of shipping video generation capability to commercial users.

Gen-3 Alpha does not generate audio to accompany the video clips, and it’s highly likely that temporally coherent generations (those that keep a character consistent over time) are dependent on similar high-quality training material. But Runway’s improvement in visual fidelity over the past year is difficult to ignore.

AI video heats up

It’s been a busy couple of weeks for AI video synthesis in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that reportedly matches Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train moving at hyper-speed in a Japanese city.”

Not long after Kling debuted, people on social media began creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and weird but generally lacked coherency; we tested out Dream Machine and were not impressed by anything we saw.

Meanwhile, one of the original text-to-video pioneers, New York City-based Runway—founded in 2018—recently found itself the butt of memes that showed its Gen-2 tech falling out of favor compared to newer video synthesis models. That may have spurred the announcement of Gen-3 Alpha.

Gen-3 Alpha prompt: “An astronaut running through an alley in Rio de Janeiro.”

Generating realistic humans has always been tricky for video synthesis models, so Runway specifically shows off Gen-3 Alpha’s ability to create what its developers call “expressive” human characters with a range of actions, gestures, and emotions. However, the company’s provided examples weren’t particularly expressive—mostly people just slowly staring and blinking—but they do look realistic.

Provided human examples include generated videos of a woman on a train, an astronaut running through a street, a man with his face lit by the glow of a TV set, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: “A close-up shot of a young woman driving a car, looking thoughtful, blurred green forest visible through the rainy car window.”

The generated demo videos also include more surreal video synthesis examples, including a giant creature walking in a rundown city, a man made of rocks walking in a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

Gen-3 will power various Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.

Runway says that Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, taking a step toward the development of what it calls “General World Models,” which are hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.

SoftBank plans to cancel out angry customer voices using AI

our fake future —

Real-time voice modification tech seeks to reduce stress in call center staff.

A man is angry and screaming while talking on a smartphone.

Japanese telecommunications giant SoftBank recently announced that it has been developing “emotion-canceling” technology powered by AI that will alter the voices of angry customers to sound calmer during phone calls with customer service representatives. The project, which aims to reduce the psychological burden on operators suffering from harassment, has been in development for three years. SoftBank plans to launch it by March 2026, but the idea is receiving mixed reactions online.

According to a report from the Japanese news site The Asahi Shimbun, SoftBank’s project relies on an AI model to alter the tone and pitch of a customer’s voice in real-time during a phone call. SoftBank’s developers, led by employee Toshiyuki Nakatani, trained the system using a dataset of over 10,000 voice samples, which were performed by 10 Japanese actors expressing more than 100 phrases with various emotions, including yelling and accusatory tones.

Voice cloning and synthesis technology has made massive strides in the past three years. We’ve previously covered technology from Microsoft that can clone a voice with a three-second audio sample and audio-processing technology from Adobe that cleans up audio by re-synthesizing a person’s voice, so SoftBank’s technology is well within the realm of plausibility.

By analyzing the voice samples, SoftBank’s AI model has reportedly learned to recognize and modify the vocal characteristics associated with anger and hostility. When a customer speaks to a call center operator, the model processes the incoming audio and adjusts the pitch and inflection of the customer’s voice to make it sound calmer and less threatening.

For example, a high-pitched, resonant voice may be lowered in tone, while a deep male voice may be raised to a higher pitch. The technology reportedly does not alter the content or wording of the customer’s speech, and it retains a slight element of audible anger to ensure that the operator can still gauge the customer’s emotional state. The AI model also monitors the length and content of the conversation, sending a warning message if it determines that the interaction is too long or abusive.
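
The core signal-processing step (shifting pitch without changing the words) can be demonstrated offline with an ordinary audio library. The snippet below is only a minimal sketch of that one operation using librosa; it is not SoftBank's real-time system, and the file path and shift amount are illustrative:

```python
# Minimal offline sketch of the pitch-adjustment step described above, using
# librosa. SoftBank's system runs in real time and is far more involved; the
# file path and shift amount here are illustrative only.
import librosa
import soundfile as sf

y, sr = librosa.load("angry_customer.wav", sr=None)  # hypothetical input clip

# Lower a high-pitched, resonant voice by two semitones to soften its edge;
# a positive n_steps would raise a deep voice to a higher pitch instead.
softened = librosa.effects.pitch_shift(y, sr=sr, n_steps=-2.0)

sf.write("softened_customer.wav", softened, sr)
```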

The tech has been developed through SoftBank’s in-house program called “SoftBank Innoventure” in conjunction with The Institute for AI and Beyond, which is a joint AI research institute established by The University of Tokyo.

Harassment a persistent problem

According to SoftBank, Japan’s service sector is grappling with the issue of “kasu-hara,” or customer harassment, where workers face aggressive behavior or unreasonable requests from customers. In response, the Japanese government and businesses are reportedly exploring ways to protect employees from the abuse.

The problem isn’t unique to Japan. In a Reddit thread on SoftBank’s AI plans, call center operators from other regions related many stories about the stress of dealing with customer harassment. “I’ve worked in a call center for a long time. People need to realize that screaming at call center agents will get you nowhere,” wrote one person.

A 2021 ProPublica report tells horror stories from call center operators who are trained not to hang up no matter how abusive or emotionally degrading a call gets. The publication quoted Skype customer service contractor Christine Stewart as saying, “One person called me the C-word. I’d call my supervisor. They’d say, ‘Calm them down.’ … They’d always try to push me to stay on the call and calm the customer down myself. I wasn’t getting paid enough to do that. When you have a customer sitting there and saying you’re worthless… you’re supposed to ‘de-escalate.'”

But verbally de-escalating an angry customer is difficult, according to Reddit poster BenCelotil, who wrote, “As someone who has worked in several call centers, let me just point out that there is no way faster to escalate a call than to try and calm the person down. If the angry person on the other end of the call thinks you’re just trying to placate and push them off somewhere else, they’re only getting more pissed.”

Ignoring reality using AI

Harassment of call center workers is a very real problem, but given the introduction of AI as a possible solution, some people wonder whether it’s a good idea to essentially filter emotional reality on demand through voice synthesis. Perhaps this technology is a case of treating the symptom instead of the root cause of the anger, as some social media commenters note.

“This is like the worst possible solution to the problem,” wrote one Redditor in the thread mentioned above. “Reminds me of when all the workers at Apple’s China factory started jumping out of windows due to working conditions, so the ‘solution’ was to put nets around the building.”

SoftBank expects to introduce its emotion-canceling solution within fiscal year 2025, which ends on March 31, 2026. By reducing the psychological burden on call center operators, SoftBank says it hopes to create a safer work environment that enables employees to provide even better services to customers.

Even so, ignoring customer anger could backfire in the long run when the anger is sometimes a legitimate response to poor business practices. As one Redditor wrote, “If you have so many angry customers that it is affecting the mental health of your call center operators, then maybe address the reasons you have so many irate customers instead of just pretending that they’re not angry.”

Report: Apple isn’t paying OpenAI for ChatGPT integration into OSes

in the pocket —

Apple thinks pushing OpenAI’s brand to hundreds of millions is worth more than money.

The OpenAI and Apple logos together.

OpenAI / Apple / Benj Edwards

On Monday, Apple announced it would be integrating OpenAI’s ChatGPT AI assistant into upcoming versions of its iPhone, iPad, and Mac operating systems. It paves the way for future third-party AI model integrations, but given Google’s multi-billion-dollar deal with Apple for preferential web search, the OpenAI announcement inspired speculation about who is paying whom. According to a Bloomberg report published Wednesday, Apple considers ChatGPT’s placement on its devices as compensation enough.

“Apple isn’t paying OpenAI as part of the partnership,” writes Bloomberg reporter Mark Gurman, citing people familiar with the matter who wish to remain anonymous. “Instead, Apple believes pushing OpenAI’s brand and technology to hundreds of millions of its devices is of equal or greater value than monetary payments.”

The Bloomberg report states that neither company expects the agreement to generate meaningful revenue in the short term, and in fact, the partnership could burn extra money for OpenAI, because it pays Microsoft to host ChatGPT’s capabilities on its Azure cloud. However, OpenAI could benefit by converting free users to paid subscriptions, and Apple potentially benefits by providing easy, built-in access to ChatGPT during a time when its own in-house LLMs are still catching up.

And there’s another angle at play. Currently, OpenAI offers subscriptions (ChatGPT Plus, Enterprise, Team) that unlock additional features. If users subscribe to OpenAI through the ChatGPT app on an Apple device, the process will reportedly use Apple’s payment platform, which may give Apple a significant cut of the revenue. According to the report, Apple hopes to negotiate additional revenue-sharing deals with AI vendors in the future.

Why OpenAI

The rise of ChatGPT in the public eye over the past 18 months has made OpenAI a power player in the tech industry, allowing it to strike deals with publishers for AI training content—and ensure continued support from Microsoft in the form of investments that trade vital funding and compute for access to OpenAI’s large language model (LLM) technology like GPT-4.

Still, Apple’s choice of ChatGPT as its first external AI integration has led to widespread misunderstanding, especially since Apple buried the lede about its own in-house LLM technology that powers its new “Apple Intelligence” platform.

On Apple’s part, CEO Tim Cook told The Washington Post that it chose OpenAI as its first third-party AI partner because he thinks the company controls the leading LLM technology at the moment: “I think they’re a pioneer in the area, and today they have the best model,” he said. “We’re integrating with other people as well. But they’re first, and I think today it’s because they’re best.”

Apple’s choice also brings risk. OpenAI’s record isn’t spotless; the company has racked up a string of public controversies over the past month, including an accusation from actress Scarlett Johansson that it intentionally imitated her voice, resignations from a key scientist and safety personnel, the revelation of a restrictive NDA for ex-employees that prevented public criticism, and an accusation of “psychological abuse” against OpenAI CEO Sam Altman made by a former member of the OpenAI board.

Meanwhile, critics concerned about the privacy implications of gathering data to train AI models (including OpenAI foe Elon Musk, who took to X on Monday to spread misconceptions about how the ChatGPT integration might work) also worried that the Apple-OpenAI deal might expose personal data to the AI company, although both companies strongly deny that will be the case.

Looking ahead, Apple’s deal with OpenAI is not exclusive, and the company is already in talks to offer Google’s Gemini chatbot as an additional option later this year. Apple has also reportedly held talks with Anthropic (maker of Claude 3) as a potential chatbot partner, signaling its intention to provide users with a range of AI services, much like how the company offers various search engine options in Safari.
