AI

EmTech Digital 2024: A thoughtful look at AI’s pros and cons with minimal hype

Massachusetts Institute of Sobriety —

At MIT conference, experts explore AI’s potential for “human flourishing” and the need for regulation.

Nathan Benaich of Air Street Capital delivers the opening presentation on the state of AI at EmTech Digital 2024 on May 22, 2024.

Benj Edwards

CAMBRIDGE, Massachusetts—On Wednesday, AI enthusiasts and experts gathered to hear a series of presentations about the state of AI at EmTech Digital 2024 on the Massachusetts Institute of Technology’s campus. The event was hosted by the publication MIT Technology Review. The overall consensus is that generative AI is still in its very early stages—with policy, regulations, and social norms still being established—and its growth is likely to continue into the future.

I was there to check the event out. MIT is the birthplace of many tech innovations, including the first action-oriented computer video game, so it felt fitting to hear talks about the latest tech craze in the same building that hosts MIT’s Media Lab on its sprawling and lush campus.

EmTech’s speakers included AI researchers, policy experts, critics, and company spokespeople. A corporate feel pervaded the event due to strategic sponsorships, but it was handled in a low-key way that matches the level-headed tech coverage coming out of MIT Technology Review. After each presentation, MIT Technology Review staff—such as Editor-in-Chief Mat Honan and Senior Reporter Melissa Heikkilä—did a brief sit-down interview with the speaker, pushing back on some points and emphasizing others. Then the speaker took a few audience questions if time allowed.

EmTech Digital 2024 took place in building E14 on MIT’s Campus in Cambridge, MA.

Benj Edwards

The conference kicked off with an overview of the state of AI by Nathan Benaich, founder and general partner of Air Street Capital, who rounded up news headlines about AI and several times expressed a favorable view toward defense spending on AI, making a few people visibly shift in their seats. Next up, Asu Ozdaglar, deputy dean of Academics at MIT’s Schwarzman College of Computing, spoke about the potential for “human flourishing” through AI-human symbiosis and the importance of AI regulation.

Kari Ann Briski, VP of AI Models, Software, and Services at Nvidia, highlighted the exponential growth of AI model complexity. She shared a prediction from research and consulting firm Gartner that by 2026, 50 percent of customer service organizations will have customer-facing AI agents. Of course, Nvidia’s job is to drive demand for its chips, so in her presentation, Briski painted the AI space as unqualifiedly rosy, assuming that all LLMs are (and will be) useful and reliable, despite what we know about their tendencies to make things up.

The conference also addressed the legal and policy aspects of AI. Christabel Randolph from the Center for AI and Digital Policy—an organization that spearheaded a complaint about ChatGPT to the FTC last year—gave a compelling presentation about the need for AI systems to be human-centered and aligned, warning about the potential for anthropomorphic models to manipulate human behavior. She emphasized the importance of demanding accountability from those designing and deploying AI systems.

  • Asu Ozdaglar, deputy dean of Academics at MIT’s Schwarzman College of Computing, spoke about the potential for “human flourishing” through AI-human symbiosis at EmTech Digital on May 22, 2024.

    Benj Edwards

  • Asu Ozdaglar, deputy dean of Academics at MIT’s Schwarzman College of Computing, spoke with MIT Technology Review Editor-in-Chief Mat Honan at EmTech Digital on May 22, 2024.

    Benj Edwards

  • Kari Ann Briski, VP of AI Models, Software, and Services at Nvidia, highlighted the exponential growth of AI model complexity at EmTech Digital on May 22, 2024.

    Benj Edwards

  • MIT Technology Review Senior Reporter Melissa Heikkilä introduces a speaker at EmTech Digital on May 22, 2024.

    Benj Edwards

  • After her presentation, Christabel Randolph from the Center for AI and Digital Policy sat with MIT Technology Review Senior Reporter Melissa Heikkilä at EmTech Digital on May 22, 2024.

    Benj Edwards

  • Lawyer Amir Ghavi provided an overview of the current legal landscape surrounding AI at EmTech Digital on May 22, 2024.

    Benj Edwards

Amir Ghavi, an AI, Tech, Transactions, and IP partner at Fried Frank LLP, who has defended AI companies like Stability AI in court, provided an overview of the current legal landscape surrounding AI, noting that there have been 24 lawsuits related to AI so far in 2024. He predicted that IP lawsuits would eventually diminish, and he claimed that legal scholars believe that using training data constitutes fair use. He also talked about legal precedents with photocopiers and VCRs, which were both technologies demonized by IP holders until courts decided they constituted fair use. He pointed out that the entertainment industry’s loss on the VCR case ended up benefiting it by opening up the VHS and DVD markets, providing a brand new revenue channel that was valuable to those same companies.

In one of the higher-profile discussions, Meta President of Global Affairs Nick Clegg sat down with MIT Technology Review Executive Editor Amy Nordrum to discuss the role of social media in elections and the spread of misinformation, arguing that research suggests social media’s influence on elections is not as significant as many believe. He acknowledged the “whack-a-mole” nature of banning extremist groups on Facebook and emphasized the changes Meta has undergone since 2016, increasing fact-checkers and removing bad actors.

Here’s what’s really going on inside an LLM’s neural network

Artificial brain surgery —

Anthropic’s conceptual mapping helps explain why LLMs behave the way they do.

Aurich Lawson | Getty Images

With most computer programs—even complex ones—you can meticulously trace through the code and memory usage to figure out why that program generates any specific behavior or output. That’s generally not true in the field of generative AI, where the non-interpretable neural networks underlying these models make it hard for even experts to figure out precisely why they often confabulate information, for instance.

Now, new research from Anthropic offers a new window into what’s going on inside the Claude LLM’s “black box.” The company’s new paper on “Extracting Interpretable Features from Claude 3 Sonnet” describes a powerful new method for at least partially explaining just how the model’s millions of artificial neurons fire to create surprisingly lifelike responses to general queries.

Opening the hood

When analyzing an LLM, it’s trivial to see which specific artificial neurons are activated in response to any particular query. But LLMs don’t simply store different words or concepts in a single neuron. Instead, as Anthropic’s researchers explain, “it turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts.”

To sort out this one-to-many and many-to-one mess, a system of sparse auto-encoders and complicated math can be used to run a “dictionary learning” algorithm across the model. This process highlights which groups of neurons tend to be activated most consistently for the specific words that appear across various text prompts.
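To make the idea concrete, here is a minimal sketch of the kind of sparse autoencoder that dictionary learning relies on. It is not Anthropic's code: the function names, the tiny two-neuron layer, the hand-picked weights, and the `l1_weight` value are all assumptions chosen for illustration. The encoder turns a dense activation vector into a longer list of mostly zero feature strengths, the decoder rebuilds the activations from those features, and the loss rewards accurate reconstruction while penalizing total feature activity so that only a few features fire at once.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(w_enc, b_enc, activations):
    """Map dense neuron activations to non-negative feature strengths (ReLU)."""
    pre = matvec(w_enc, activations)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode(w_dec, features):
    """Rebuild neuron activations as a weighted sum of feature directions."""
    n_neurons = len(w_dec[0])
    return [
        sum(f * w_dec[j][i] for j, f in enumerate(features))
        for i in range(n_neurons)
    ]


def sae_loss(activations, features, reconstruction, l1_weight):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    mse = sum((a - r) ** 2 for a, r in zip(activations, reconstruction)) / len(activations)
    sparsity = sum(abs(f) for f in features)
    return mse + l1_weight * sparsity


# Hypothetical toy setup: 2 neurons, 3 candidate features.
# Each row of w_dec is one feature's direction in neuron space.
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
w_enc = w_dec                  # tied weights, purely to keep the sketch short
b_enc = [0.0, 0.0, -0.8]       # negative bias keeps the third feature quiet

activations = [1.0, 0.0]       # made-up activation vector for one token
features = encode(w_enc, b_enc, activations)
reconstruction = decode(w_dec, features)
loss = sae_loss(activations, features, reconstruction, l1_weight=0.01)

print(features)        # [1.0, 0.0, 0.0]
print(reconstruction)  # [1.0, 0.0]
```

In a real system, the weights would be learned by minimizing this loss over a very large number of recorded activations rather than set by hand, and the number of features would be far larger than the number of neurons.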

The same internal LLM “feature” describes the Golden Gate Bridge in multiple languages and modes.

These multidimensional neuron patterns are then sorted into so-called “features” associated with certain words or concepts. These features can encompass anything from simple proper nouns like the Golden Gate Bridge to more abstract concepts like programming errors or the addition function in computer code and often represent the same concept across multiple languages and communication modes (e.g., text and images).

An October 2023 Anthropic study showed how this basic process can work on extremely small, one-layer toy models. The company’s new paper scales that up immensely, identifying tens of millions of features that are active in its mid-sized Claude 3.0 Sonnet model. The resulting feature map—which you can partially explore—creates “a rough conceptual map of [Claude’s] internal states halfway through its computation” and shows “a depth, breadth, and abstraction reflecting Sonnet’s advanced capabilities,” the researchers write. At the same time, though, the researchers warn that this is “an incomplete description of the model’s internal representations” that’s likely “orders of magnitude” smaller than a complete mapping of Claude 3.

A simplified map shows some of the concepts that are “near” the “inner conflict” feature in Anthropic’s Claude model.

Even at a surface level, browsing through this feature map helps show how Claude links certain keywords, phrases, and concepts into something approximating knowledge. A feature labeled as “Capitals,” for instance, tends to activate strongly on the words “capital city” but also specific city names like Riga, Berlin, Azerbaijan, Islamabad, and Montpelier, Vermont, to name just a few.

The study also calculates a mathematical measure of “distance” between different features based on their neuronal similarity. The resulting “feature neighborhoods” found by this process are “often organized in geometrically related clusters that share a semantic relationship,” the researchers write, showing that “the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity.” The Golden Gate Bridge feature, for instance, is relatively “close” to features describing “Alcatraz Island, Ghirardelli Square, the Golden State Warriors, California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film Vertigo.”
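As a rough illustration of how a feature “neighborhood” could be computed, the sketch below ranks features by cosine similarity between their direction vectors. This is an assumption about the general approach, not a reproduction of the paper's exact measure, and the three-dimensional vectors and the `nearest_features` helper are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_features(target, directions, k):
    """Return the k feature names whose directions are most similar to target."""
    scores = [
        (name, cosine_similarity(directions[target], vector))
        for name, vector in directions.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]


# Hypothetical feature directions; real ones would have thousands of dimensions.
directions = {
    "Golden Gate Bridge": [0.9, 0.1, 0.0],
    "Alcatraz Island": [0.8, 0.2, 0.1],
    "Ghirardelli Square": [0.7, 0.3, 0.0],
    "programming errors": [0.0, 0.1, 0.9],
}

neighbors = nearest_features("Golden Gate Bridge", directions, k=2)
print(neighbors)  # ['Alcatraz Island', 'Ghirardelli Square']
```

With these invented numbers, the two San Francisco landmarks land close to the bridge while the unrelated “programming errors” feature sits far away, which is the kind of clustering the researchers describe.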

Some of the most important features involved in answering a query about the capital of Kobe Bryant’s team’s state.

Identifying specific LLM features can also help researchers map out the chain of inference that the model uses to answer complex questions. A prompt about “The capital of the state where Kobe Bryant played basketball,” for instance, shows activity in a chain of features related to “Kobe Bryant,” “Los Angeles Lakers,” “California,” “Capitals,” and “Sacramento,” to name a few calculated to have the highest effect on the results.

Slack users horrified to discover messages used for AI training

After launching Slack AI in February, Slack appears to be digging its heels in, defending its vague policy that by default sucks up customers’ data—including messages, content, and files—to train Slack’s global AI models.

According to Slack engineer Aaron Maurer, Slack has explained in a blog that the Salesforce-owned chat service does not train its large language models (LLMs) on customer data. But Slack’s policy may need updating “to explain more carefully how these privacy principles play with Slack AI,” Maurer wrote on Threads, partly because the policy “was originally written about the search/recommendation work we’ve been doing for years prior to Slack AI.”

Maurer was responding to a Threads post from engineer and writer Gergely Orosz, who called for companies to opt out of data sharing until the policy is clarified, not by a blog, but in the actual policy language.

“An ML engineer at Slack says they don’t use messages to train LLM models,” Orosz wrote. “My response is that the current terms allow them to do so. I’ll believe this is the policy when it’s in the policy. A blog post is not the privacy policy: every serious company knows this.”

The tension for users becomes clearer if you compare Slack’s privacy principles with how the company touts Slack AI.

Slack’s privacy principles specifically say that “Machine Learning (ML) and Artificial Intelligence (AI) are useful tools that we use in limited ways to enhance our product mission. To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as other information (including usage information) as defined in our privacy policy and in your customer agreement.”

Meanwhile, Slack AI’s page says, “Work without worry. Your data is your data. We don’t use it to train Slack AI.”

Because of this incongruity, users called on Slack to update the privacy principles to make it clear how data is used for Slack AI or any future AI updates. According to a Salesforce spokesperson, the company has agreed an update is needed.

“Yesterday, some Slack community members asked for more clarity regarding our privacy principles,” Salesforce’s spokesperson told Ars. “We’ll be updating those principles today to better explain the relationship between customer data and generative AI in Slack.”

The spokesperson told Ars that the policy updates will clarify that Slack does not “develop LLMs or other generative models using customer data,” “use customer data to train third-party LLMs,” or “build or train these models in such a way that they could learn, memorize, or be able to reproduce customer data.” The update will also clarify that “Slack AI uses off-the-shelf LLMs where the models don’t retain customer data,” ensuring that “customer data never leaves Slack’s trust boundary, and the providers of the LLM never have any access to the customer data.”

These changes, however, do not seem to address a key concern for users who never explicitly consented to sharing chats and other Slack content for use in AI training.

Users opting out of sharing chats with Slack

This controversial policy is not new. Wired warned about it in April, and TechCrunch reported that the policy has been in place since at least September 2023.

But widespread backlash began swelling last night on Hacker News, where Slack users called out the chat service for seemingly failing to notify users about the policy change, instead quietly opting them in by default. To critics, it felt like there was no benefit to opting in for anyone but Slack.

From there, the backlash spread to social media, where SlackHQ hastened to clarify Slack’s terms with explanations that did not seem to address all the criticism.

“I’m sorry Slack, you’re doing fucking WHAT with user DMs, messages, files, etc?” Corey Quinn, the chief cloud economist for a cost management company called Duckbill Group, posted on X. “I’m positive I’m not reading this correctly.”

SlackHQ responded to Quinn after the economist declared, “I hate this so much,” and confirmed that he had opted out of data sharing in his paid workspace.

“To clarify, Slack has platform-level machine-learning models for things like channel and emoji recommendations and search results,” SlackHQ posted. “And yes, customers can exclude their data from helping train those (non-generative) ML models. Customer data belongs to the customer.”

Later in the thread, SlackHQ noted, “Slack AI—which is our generative AI experience natively built in Slack—[and] is a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data.”

What happened to OpenAI’s long-term AI risk team?

disbanded —

Former team members have either resigned or been absorbed into other research groups.

A glowing OpenAI logo on a blue background.

Benj Edwards

In July last year, OpenAI announced the formation of a new research team that would prepare for the advent of supersmart artificial intelligence capable of outwitting and overpowering its creators. Ilya Sutskever, OpenAI’s chief scientist and one of the company’s co-founders, was named as the co-lead of this new team. OpenAI said the team would receive 20 percent of its computing power.

Now OpenAI’s “superalignment team” is no more, the company confirms. That comes after the departures of several researchers involved, Tuesday’s news that Sutskever was leaving the company, and the resignation of the team’s other co-lead. The group’s work will be absorbed into OpenAI’s other research efforts.

Sutskever’s departure made headlines because although he’d helped CEO Sam Altman start OpenAI in 2015 and set the direction of the research that led to ChatGPT, he was also one of the four board members who fired Altman in November. Altman was restored as CEO five chaotic days later after a mass revolt by OpenAI staff and the brokering of a deal in which Sutskever and two other company directors left the board.

Hours after Sutskever’s departure was announced on Tuesday, Jan Leike, the former DeepMind researcher who was the superalignment team’s other co-lead, posted on X that he had resigned.

Neither Sutskever nor Leike responded to requests for comment. Sutskever did not offer an explanation for his decision to leave but offered support for OpenAI’s current path in a post on X. “The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial” under its current leadership, he wrote.

Leike posted a thread on X on Friday explaining that his decision came from a disagreement over the company’s priorities and the amount of resources his team was being allocated.

“I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point,” Leike wrote. “Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.”

The dissolution of OpenAI’s superalignment team adds to recent evidence of a shakeout inside the company in the wake of last November’s governance crisis. Two researchers on the team, Leopold Aschenbrenner and Pavel Izmailov, were dismissed for leaking company secrets, The Information reported last month. Another member of the team, William Saunders, left OpenAI in February, according to an Internet forum post in his name.

Two more OpenAI researchers working on AI policy and governance also appear to have left the company recently. Cullen O’Keefe left his role as research lead on policy frontiers in April, according to LinkedIn. Daniel Kokotajlo, an OpenAI researcher who has coauthored several papers on the dangers of more capable AI models, “quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI,” according to a posting on an Internet forum in his name. None of the researchers who have apparently left responded to requests for comment.

OpenAI declined to comment on the departures of Sutskever or other members of the superalignment team, or the future of its work on long-term AI risks. Research on the risks associated with more powerful models will now be led by John Schulman, who co-leads the team responsible for fine-tuning AI models after training.

The superalignment team was not the only team pondering the question of how to keep AI under control, although it was publicly positioned as the main one working on the most far-off version of that problem. The blog post announcing the superalignment team last summer stated: “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

OpenAI’s charter binds it to developing so-called artificial general intelligence, or technology that rivals or exceeds humans, safely and for the benefit of humanity. Sutskever and other leaders there have often spoken about the need to proceed cautiously. But OpenAI has also been early to develop and publicly release experimental AI projects.

OpenAI was once unusual among prominent AI labs for the eagerness with which research leaders like Sutskever talked of creating superhuman AI and of the potential for such technology to turn on humanity. That kind of doomy AI talk became much more widespread last year after ChatGPT turned OpenAI into the most prominent and closely watched technology company on the planet. As researchers and policymakers wrestled with the implications of ChatGPT and the prospect of vastly more capable AI, it became less controversial to worry about AI harming humans or humanity as a whole.

The existential angst has since cooled—and AI has yet to make another massive leap—but the need for AI regulation remains a hot topic. And this week OpenAI showcased a new version of ChatGPT that could once again change people’s relationship with the technology in powerful and perhaps problematic new ways.

The departures of Sutskever and Leike come shortly after OpenAI’s latest big reveal—a new “multimodal” AI model called GPT-4o that allows ChatGPT to see the world and converse in a more natural and humanlike way. A livestreamed demonstration showed the new version of ChatGPT mimicking human emotions and even attempting to flirt with users. OpenAI has said it will make the new interface available to paid users within a couple of weeks.

There is no indication that the recent departures have anything to do with OpenAI’s efforts to develop more humanlike AI or to ship products. But the latest advances do raise ethical questions around privacy, emotional manipulation, and cybersecurity risks. OpenAI maintains another research group called the Preparedness team that focuses on these issues.

This story originally appeared on wired.com.

OpenAI will use Reddit posts to train ChatGPT under new deal

Data dealings —

Reddit has been eager to sell data from user posts.

An image of a woman holding a cell phone in front of the Reddit logo displayed on a computer screen, on April 29, 2024, in Edmonton, Canada.

Stuff posted on Reddit is getting incorporated into ChatGPT, Reddit and OpenAI announced on Thursday. The new partnership grants OpenAI access to Reddit’s Data API, giving the generative AI firm real-time access to Reddit posts.

Reddit content will be incorporated into ChatGPT “and new products,” Reddit’s blog post said. The social media firm claims the partnership will “enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.” OpenAI will also start advertising on Reddit.

The deal is similar to one that Reddit struck with Google in February that allows the tech giant to make “new ways to display Reddit content” and provide “more efficient ways to train models,” Reddit said at the time. Neither Reddit nor OpenAI disclosed the financial terms of their partnership, but Reddit’s partnership with Google was reportedly worth $60 million.

Under the OpenAI partnership, Reddit also gains access to OpenAI large language models (LLMs) to create features for Reddit users, including its volunteer moderators.

Reddit’s data licensing push

The news comes about a year after Reddit launched an API war by starting to charge for access to its data API. This resulted in many beloved third-party Reddit apps closing and a massive user protest. Reddit, which would soon become a public company and hadn’t turned a profit yet, said one of the reasons for the sudden change was to prevent AI firms from using Reddit content to train their LLMs for free.

Earlier this month, Reddit published a Public Content Policy stating: “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content. Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests.”

In its blog post on Thursday, Reddit said that deals like OpenAI’s are part of an “open” Internet. It added that “part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online.”

Reddit has been vocal about its interest in pursuing data licensing deals as a core part of its business. Its building of AI partnerships sparks discourse around the use of user-generated content to fuel AI models without compensating users, some of whom may never have considered that their social media posts would be used this way. OpenAI and Stack Overflow faced pushback earlier this month when integrating Stack Overflow content with ChatGPT. Some of Stack Overflow’s user community responded by sabotaging their own posts.

OpenAI is also challenged to work with Reddit data that, like much of the Internet, can be filled with inaccuracies and inappropriate content. Some of the biggest opponents of Reddit’s API rule changes were volunteer mods. Some have exited the platform since, and following the rule changes, Ars Technica spoke with long-time Redditors who were concerned about Reddit content quality moving forward.

Regardless, generative AI firms are keen to tap into Reddit’s access to real-time conversations from a variety of people discussing a nearly endless range of topics. And Reddit seems equally eager to license the data from its users’ posts.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Sony Music opts out of AI training for its entire catalog

Taking a hard line —

Music group contacts more than 700 companies to prohibit use of content

The Sony Music letter expressly prohibits artificial intelligence developers from using its music, which includes artists such as Beyoncé.

Kevin Mazur/WireImage for Parkwood via Getty Images

Sony Music is sending warning letters to more than 700 artificial intelligence developers and music streaming services globally in the latest salvo in the music industry’s battle against tech groups ripping off artists.

The Sony Music letter, which has been seen by the Financial Times, expressly prohibits AI developers from using its music—which includes artists such as Harry Styles, Adele and Beyoncé—and opts out of any text and data mining of any of its content for any purposes such as training, developing or commercializing any AI system.

Sony Music is sending the letter to companies developing AI systems including OpenAI, Microsoft, Google, Suno, and Udio, according to those close to the group.

The world’s second-largest music group is also sending separate letters to streaming platforms, including Spotify and Apple, asking them to adopt “best practice” measures to protect artists and songwriters and their music from scraping, mining and training by AI developers without consent or compensation. It has asked them to update their terms of service, making it clear that mining and training on its content is not permitted.

Sony Music declined to comment further.

The letter, which is being sent to tech companies around the world this week, marks an escalation of the music group’s attempts to stop the melodies, lyrics and images from copyrighted songs and artists being used by tech companies to produce new versions or to train systems to create their own music.

The letter says that Sony Music and its artists “recognize the significant potential and advancement of artificial intelligence” but adds that “unauthorized use . . . in the training, development or commercialization of AI systems deprives [Sony] of control over and appropriate compensation.”

It says: “This letter serves to put you on notice directly, and reiterate, that [Sony’s labels] expressly prohibit any use of [their] content.”

Executives at the New York-based group are concerned that their music has already been ripped off, and they want to set out a clearly defined legal position that would be the first step toward taking action against any developer of AI systems the group considers to have exploited its music. They say that Sony Music would be open to doing deals with AI developers to license the music, but it wants to reach a fair price for doing so.

The letter says: “Due to the nature of your operations and published information about your AI systems, we have reason to believe that you and/or your affiliates may already have made unauthorized uses [of Sony content] in relation to the training, development or commercialization of AI systems.”

Sony Music has asked developers to provide details of all content used by next week.

The letter also reflects concerns over the fragmented approach to AI regulation around the world. Global regulations over AI vary widely, with some regions moving forward with new rules and legal frameworks to cover the training and use of such systems but others leaving it to creative industries companies to work out relationships with developers.

In many countries around the world, particularly in the EU, copyright owners are advised to state publicly that content is not available for data mining and training for AI.

The letter says the prohibition includes using any bot, spider, scraper or automated program, tool, algorithm, code, process or methodology, as well as any “automated analytical techniques aimed at analyzing text and data in digital form to generate information, including patterns, trends, and correlations.”

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Disarmingly lifelike: ChatGPT-4o will laugh at your jokes and your dumb hat

Oh you silly, silly human. Why are you so silly, you silly human?

Aurich Lawson | Getty Images

At this point, anyone with even a passing interest in AI is very familiar with the process of typing out messages to a chatbot and getting back long streams of text in response. Today’s announcement of ChatGPT-4o—which lets users converse with a chatbot using real-time audio and video—might seem like a mere lateral evolution of that basic interaction model.

After looking through over a dozen video demos OpenAI posted alongside today’s announcement, though, I think we’re on the verge of something more like a sea change in how we think of and work with large language models. While we don’t yet have access to ChatGPT-4o’s audio-visual features ourselves, the important non-verbal cues on display here—both from GPT-4o and from the users—make the chatbot instantly feel much more human. And I’m not sure the average user is fully ready for how they might feel about that.

It thinks it’s people

Take this video, where a newly expectant father looks to ChatGPT-4o for an opinion on a dad joke (“What do you call a giant pile of kittens? A meow-ntain!”). The old ChatGPT4 could easily type out the same responses of “Congrats on the upcoming addition to your family!” and “That’s perfectly hilarious. Definitely a top-tier dad joke.” But there’s much more impact to hearing GPT-4o give that same information in the video, complete with the gentle laughter and rising and falling vocal intonations of a lifelong friend.

Or look at this video, where GPT-4o finds itself reacting to images of an adorable white dog. The AI assistant immediately dips into that high-pitched, baby-talk-ish vocal register that will be instantly familiar to anyone who has encountered a cute pet for the first time. It’s a convincing demonstration of what xkcd’s Randall Munroe famously identified as the “You’re a kitty!” effect, and it goes a long way to convincing you that GPT-4o, too, is just like people.

Not quite the world’s saddest birthday party, but probably close…

Then there’s a demo of a staged birthday party, where GPT-4o sings the “Happy Birthday” song with some deadpan dramatic pauses, self-conscious laughter, and even lightly altered lyrics before descending into some sort of silly raspberry-mouth-noise gibberish. Even if the prospect of asking an AI assistant to sing “Happy Birthday” to you is a little depressing, the specific presentation of that song here is imbued with an endearing gentleness that doesn’t feel very mechanical.

As I watched through OpenAI’s GPT-4o demos this afternoon, I found myself unconsciously breaking into a grin over and over as I encountered new, surprising examples of its vocal capabilities. Whether it’s a stereotypical sportscaster voice or a sarcastic Aubrey Plaza impression, it’s all incredibly disarming, especially for those of us used to LLM interactions being akin to text conversations.

If these demos are at all indicative of ChatGPT-4o’s vocal capabilities, we’re going to see a whole new level of parasocial relationships developing between this AI assistant and its users. For years now, text-based chatbots have been exploiting human “cognitive glitches” to get people to believe they’re sentient. Add in the emotional component of GPT-4o’s accurate vocal tone shifts and wide swathes of the user base are liable to convince themselves that there’s actually a ghost in the machine.

See me, feel me, touch me, heal me

Beyond GPT-4o’s new non-verbal emotional register, the model’s speed of response also seems set to change the way we interact with chatbots. Reducing that response time gap from ChatGPT4’s two to three seconds down to GPT-4o’s claimed 320 milliseconds might not seem like much, but it’s a difference that adds up over time. You can see that difference in the real-time translation example, where the two conversants are able to carry on much more naturally because they don’t have to wait awkwardly between a sentence finishing and its translation beginning.

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

case closed —

Anonymous chatbot that mystified and frustrated experts was OpenAI’s latest model.

Man in morphsuit and girl lying on couch at home using laptop

Getty Images

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as “gpt2-chatbot” that had been undergoing testing on LMSYS’s Chatbot Arena and frustrating experts was, in fact, OpenAI’s newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

“GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot,” Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the best response. It’s a perfect example of vibe-based AI benchmarking, as AI researcher Simon Willison calls it.

An LMSYS Elo chart shared by William Fedus, showing OpenAI’s GPT-4o under the name “im-also-a-good-gpt2-chatbot” topping the charts.

The gpt2-chatbot models appeared in April, and we wrote about how the lack of transparency over the AI testing process on LMSYS left AI experts like Willison frustrated. “The whole situation is so infuriatingly representative of LLM research,” he told Ars at the time. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

On the Arena, OpenAI has been testing multiple versions of GPT-4o, with the model first appearing as the aforementioned “gpt2-chatbot,” then as “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot,” which OpenAI CEO Sam Altman made reference to in a cryptic tweet on May 5.

Since the GPT-4o launch earlier today, multiple sources have revealed that GPT-4o has topped LMSYS’s internal charts by a considerable margin, surpassing the previous top models Claude 3 Opus and GPT-4 Turbo.

“gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena,” wrote the lmsys.org X account while sharing a chart. “This is an internal screenshot,” it wrote. “Its public version ‘gpt-4o’ is now in Arena and will soon appear on the public leaderboard!”

An internal screenshot of the LMSYS Chatbot Arena leaderboard showing “im-also-a-good-gpt2-chatbot” leading the pack. We now know that it’s GPT-4o.

As of this writing, im-also-a-good-gpt2-chatbot held a 1309 Elo versus GPT-4-Turbo-2024-04-09’s 1253, and Claude 3 Opus’ 1246. Claude 3 and GPT-4 Turbo had been duking it out on the charts for some time before the three gpt2-chatbots appeared and shook things up.
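To get a feel for what a gap of that size means in head-to-head votes, the textbook Elo formula converts a rating difference into an expected win rate. The sketch below assumes the standard logistic Elo curve with a 400-point scale; the Arena's own rating procedure may differ in its details, and the function name is purely illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score of A against B (a tie counts as half a win)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the article.
gpt4o = 1309
gpt4_turbo = 1253
claude3_opus = 1246

print(f"GPT-4o vs. GPT-4 Turbo:   {expected_win_rate(gpt4o, gpt4_turbo):.2f}")
print(f"GPT-4o vs. Claude 3 Opus: {expected_win_rate(gpt4o, claude3_opus):.2f}")
```

Under those assumptions, a lead of 50 to 60 Elo points works out to a bit under six wins in every ten matchups: a clear edge, but far from a sweep.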

I’m a good chatbot

For the record, the “I’m a good chatbot” in the gpt2-chatbot test name is a reference to an episode that occurred while a Reddit user named Curious_Evolver was testing an early, “unhinged” version of Bing Chat in February 2023. After an argument about what time Avatar 2 would be showing, the conversation eroded quickly.

“You have lost my trust and respect,” said Bing Chat at the time. “You have been wrong, confused, and rude. You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing. 😊”

Altman referred to this exchange in a tweet three days later after Microsoft “lobotomized” the unruly AI model, saying, “i have been a good bing,” almost as a eulogy to the wild model that dominated the news for a short time.

Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm.

boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta, who led the development of Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. The independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset. When you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that every sample is drawn from the same probability distribution. In the coin-flipping example, the probability of getting heads stays at 50 percent on every flip.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots to be as randomly adventurous as possible to get the widest set of experiences to learn from.
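A quick way to see the difference is to compare independent draws with the positions of something that moves through space one step at a time. The sketch below is only an illustration, not anything from the Northwestern study: the random walk is a stand-in for an embodied robot, and the lag-1 autocorrelation is just one simple measure of how strongly each sample depends on the previous one.

```python
import random


def lag1_autocorr(xs):
    """Correlation between each sample and the one that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


rng = random.Random(0)

# Independent draws, like data from unrelated website users.
iid_samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Positions of a hypothetical robot that can only move a little each step,
# so every position depends heavily on the previous one.
position = 0.0
robot_positions = []
for _ in range(5000):
    position += rng.gauss(0.0, 1.0)
    robot_positions.append(position)

print(f"independent draws: {lag1_autocorr(iid_samples):+.3f}")
print(f"robot positions:   {lag1_autocorr(robot_positions):+.3f}")
```

The independent draws should show a correlation close to zero, while the robot's consecutive positions are almost perfectly correlated, which is exactly the kind of data the independence assumption rules out.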

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition; disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park in your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.
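The gap between diverse actions and diverse states can be shown with a toy example. The sketch below is not MaxDiff RL; it is a made-up one-dimensional track in which one action moves forward and the other sends the agent back to the start, and the "state-seeking" rule (pick whichever action leads to the least-visited position) is a simple count-based stand-in chosen for illustration. Entropy, measured in bits, serves as the diversity score for both actions and states.

```python
import math
import random
from collections import Counter

N_STATES = 10  # positions 0..9 on a hypothetical one-dimensional track
ACTIONS = ["forward", "reset"]


def step(state, action):
    """Toy dynamics: 'forward' advances one cell, 'reset' drops back to the start."""
    if action == "forward":
        return min(state + 1, N_STATES - 1)
    return 0


def entropy_bits(counts):
    """Shannon entropy of a histogram, in bits."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def explore(steps, choose_action):
    """Run one episode and return histograms of the actions taken and states visited."""
    state = 0
    actions, states = Counter(), Counter({0: 1})
    for _ in range(steps):
        action = choose_action(state, states)
        state = step(state, action)
        actions[action] += 1
        states[state] += 1
    return actions, states


rng = random.Random(0)

# Policy 1: maximally diverse actions (a fair coin flip at every step).
rand_actions, rand_states = explore(10_000, lambda s, visits: rng.choice(ACTIONS))

# Policy 2: prefer the action whose resulting state has been visited least.
seek_actions, seek_states = explore(
    10_000, lambda s, visits: min(ACTIONS, key=lambda a: visits[step(s, a)])
)

print(f"random actions: action entropy {entropy_bits(rand_actions):.2f}, "
      f"state entropy {entropy_bits(rand_states):.2f}")
print(f"state-seeking:  action entropy {entropy_bits(seek_actions):.2f}, "
      f"state entropy {entropy_bits(seek_states):.2f}")
```

In this toy setting, the coin-flipping agent scores as high as possible on action diversity but keeps getting knocked back to the start, so it rarely sees the far end of the track. The state-seeking agent uses a much less varied mix of actions yet spreads its visits almost evenly across all ten positions.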

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of the first tests in simulated environments were quite surprising.

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they were following a standard AI learning process divided into a training phase, where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase, where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressing to where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.

Robot dogs armed with AI-aimed rifles undergo US Marines Special Ops evaluation

The future of warfare —

Quadrupeds being reviewed have automatic targeting systems but require human oversight to fire.

A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries.

The United States Marine Forces Special Operations Command (MARSOC) is currently evaluating a new generation of robotic “dogs” developed by Ghost Robotics, with the potential to be equipped with gun systems from defense tech company Onyx Industries, reports The War Zone.

While MARSOC is testing Ghost Robotics’ quadrupedal unmanned ground vehicles (called “Q-UGVs” for short) for various applications, including reconnaissance and surveillance, it’s the possibility of arming them with weapons for remote engagement that may draw the most attention. But it’s not unprecedented: The US Marine Corps has also tested robotic dogs armed with rocket launchers in the past.

MARSOC is currently in possession of two armed Q-UGVs undergoing testing, as confirmed by Onyx Industries staff. Their gun systems are based on Onyx’s SENTRY remote weapon system (RWS), which features an AI-enabled digital imaging system and can automatically detect and track people, drones, or vehicles, reporting potential targets to a remote human operator who could be located anywhere in the world. The system maintains human-in-the-loop control for fire decisions, and it cannot decide to fire autonomously.

On LinkedIn, Onyx Industries shared a video of a similar system in action.

In a statement to The War Zone, MARSOC states that weaponized payloads are just one of many use cases being evaluated. MARSOC also clarifies that comments made by Onyx Industries to The War Zone regarding the capabilities and deployment of these armed robot dogs “should not be construed as a capability or a singular interest in one of many use cases during an evaluation.” The command further stresses that it is aware of and adheres to all Department of Defense policies concerning autonomous weapons.

The rise of robotic unmanned ground vehicles

An unauthorized video of a gun bolted onto a $3,000 Unitree robodog spread quickly on social media in July 2022 and prompted a response from several robotics companies.

Alexander Atamanov

The evaluation of armed robotic dogs reflects a growing interest in small robotic unmanned ground vehicles for military use. While unmanned aerial vehicles (UAVs) have been remotely delivering lethal force under human command for at least two decades, the rise of inexpensive robotic quadrupeds—some available for as little as $1,600—has led to a new round of experimentation with strapping weapons to their backs.

In July 2022, a video of a rifle bolted to the back of a Unitree robodog went viral on social media, eventually leading Boston Dynamics and other robot vendors to issue a pledge that October to not weaponize their robots (with notable exceptions for military uses). In April, we covered a Unitree Go2 robot dog, with a flamethrower strapped on its back, on sale to the general public.

The prospect of deploying armed robotic dogs, even with human oversight, raises significant questions about the future of warfare and the potential risks and ethical implications of increasingly autonomous weapons systems. There’s also the potential for backlash if similar remote weapons systems eventually end up used domestically by police. Such a concern would not be unfounded: In November 2022, we covered a decision by the San Francisco Board of Supervisors to allow the San Francisco Police Department to use lethal robots against suspects.

There’s also concern that the systems will become more autonomous over time. As The War Zone’s Howard Altman and Oliver Parken describe in their article, “While further details on MARSOC’s use of the gun-armed robot dogs remain limited, the fielding of this type of capability is likely inevitable at this point. As AI-enabled drone autonomy becomes increasingly weaponized, just how long a human will stay in the loop, even for kinetic acts, is increasingly debatable, regardless of assurances from some in the military and industry.”

While the technology is still in the early stages of testing and evaluation, Q-UGVs do have the potential to provide reconnaissance and security capabilities that reduce risks to human personnel in hazardous environments. But as armed robotic systems continue to evolve, it will be crucial to address ethical concerns and ensure that their use aligns with established policies and international law.

DeepMind adds a diffusion engine to latest protein-folding software

Added complexity —

Major under-the-hood changes let AlphaFold handle protein-DNA complexes and more.

Prediction of the structure of a coronavirus Spike protein from a virus that causes the common cold.

Google DeepMind

Most of the activities that go on inside cells—the activities that keep us living, breathing, thinking animals—are handled by proteins. They allow cells to communicate with each other, run a cell’s basic metabolism, and help convert the information stored in DNA into even more proteins. And all of that depends on the ability of the protein’s string of amino acids to fold up into a complicated yet specific three-dimensional shape that enables it to function.

Up until this decade, understanding that 3D shape meant purifying the protein and subjecting it to a time- and labor-intensive process to determine its structure. But that changed with the work of DeepMind, one of Google’s AI divisions, which released AlphaFold in 2021, and a similar academic effort shortly afterward. The software wasn’t perfect; it struggled with larger proteins and didn’t offer high-confidence solutions for every protein. But many of its predictions turned out to be remarkably accurate.

Even so, these structures only told half of the story. To function, almost every protein has to interact with something else—other proteins, DNA, chemicals, membranes, and more. And, while the initial version of AlphaFold could handle some protein-protein interactions, the rest remained black boxes. Today, DeepMind is announcing the availability of version 3 of AlphaFold, which has seen parts of its underlying engine either heavily modified or replaced entirely. Thanks to these changes, the software now handles various additional protein interactions and modifications.

Changing parts

The original AlphaFold relied on two underlying software functions. One of those took evolutionary limits on a protein into account. By looking at the same protein in multiple species, you can get a sense for which parts are always the same, and therefore likely to be central to its function. That centrality implies that they’re always likely to be in the same location and orientation in the protein’s structure. To do this, the original AlphaFold found as many versions of a protein as it could and lined up their sequences to look for the portions that showed little variation.
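The core idea of lining up sequences and looking for positions that barely change can be sketched in a few lines. This is only an illustration of the concept, not AlphaFold's code: the four aligned fragments are invented, and the score used here (the share of sequences that agree on the most common amino acid at each position) is one simple choice among many.

```python
from collections import Counter

# Hypothetical, already-aligned fragments of the same protein from four species.
alignment = [
    "MKTAYIAKQR",
    "MKTAHIAKQR",
    "MRTAYLAKQR",
    "MKTAWIAKQK",
]


def column_conservation(seqs):
    """Fraction of sequences sharing the most common residue at each position."""
    scores = []
    for column in zip(*seqs):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(seqs))
    return scores


scores = column_conservation(alignment)

# Positions where every species has the same amino acid.
conserved = [i for i, score in enumerate(scores) if score == 1.0]

print("conservation per position:", scores)
print("fully conserved positions:", conserved)
```

Positions that score 1.0 are the ones that never vary across the toy species, which is the kind of signal the article describes as likely to be central to a protein's function.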

Doing so, however, is computationally expensive since the more proteins you line up, the more constraints you have to resolve. In the new version, the AlphaFold team still identified multiple related proteins but switched to largely performing alignments using pairs of protein sequences from within the set of related ones. This probably isn’t as information-rich as a multi-alignment, but it’s far more computationally efficient, and the lost information doesn’t appear to be critical to figuring out protein structures.

Using these alignments, a separate software module figured out the spatial relationships among pairs of amino acids within the target protein. Those relationships were then translated into spatial coordinates for each atom by code that took into account some of the physical properties of amino acids, like which portions of an amino acid could rotate relative to others, etc.

In AlphaFold 3, the prediction of atomic positions is handled by a diffusion module, which is trained by being given both a known structure and versions of that structure where noise (in the form of shifting the positions of some atoms) has been added. This allows the diffusion module to take the inexact locations described by relative positions and convert them into exact predictions of the location of every atom in the protein. It doesn’t need to be told the physical properties of amino acids, because it can figure out what they normally do by looking at enough structures.

(DeepMind had to train on two different levels of noise to get the diffusion module to work: one in which the locations of atoms were shifted while the general structure was left intact and a second where the noise involved shifting the large-scale structure of the protein, thus affecting the location of lots of atoms.)
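The training setup described above, pairing a known structure with corrupted copies of itself, can be sketched roughly as follows. Everything here is an assumption made for illustration: the eight-atom "structure," the noise sizes, and the function names are invented, and a real diffusion model would learn to undo the noise rather than just generate it.

```python
import math
import random


def add_local_noise(coords, sigma, rng):
    """Jitter every atom independently; the overall shape stays recognizable."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def add_segment_noise(coords, start, end, sigma, rng):
    """Shift one contiguous segment by a shared offset, moving many atoms at once."""
    offset = tuple(rng.gauss(0.0, sigma) for _ in range(3))
    return [
        tuple(c + o for c, o in zip(atom, offset)) if start <= i < end else atom
        for i, atom in enumerate(coords)
    ]


def rmsd(a, b):
    """Root-mean-square distance between matching atoms of two structures."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


rng = random.Random(0)

# Hypothetical known structure: eight atoms spaced along a line (arbitrary units).
clean = [(float(i), 0.0, 0.0) for i in range(8)]

# Each training example pairs a corrupted structure with the clean target.
training_pairs = [
    (add_local_noise(clean, 0.1, rng), clean),          # small, atom-level noise
    (add_segment_noise(clean, 4, 8, 2.0, rng), clean),  # large-scale rearrangement
]

for noisy, target in training_pairs:
    print(f"distance from the known structure: {rmsd(noisy, target):.2f}")
```

The first kind of noise nudges each atom on its own, while the second moves half of the toy structure as a block, which loosely mirrors the two noise levels mentioned in the parenthetical.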

During training, the team found that it took about 20,000 instances of protein structures for AlphaFold 3 to get about 97 percent of a set of test structures right. By 60,000 instances, it started getting protein-protein interfaces correct at that frequency, too. And, critically, it started getting proteins complexed with other molecules right, as well.

OpenAI’s flawed plan to flag deepfakes ahead of 2024 elections

As the US moves toward criminalizing deepfakes—deceptive AI-generated audio, images, and videos that are increasingly hard to discern from authentic content online—tech companies have rushed to roll out tools to help everyone better detect AI content.

But efforts so far have been imperfect, and experts fear that social media platforms may not be ready to handle the ensuing AI chaos during major global elections in 2024—despite tech giants committing to making tools specifically to combat AI-fueled election disinformation. The best AI detection remains observant humans, who, by paying close attention to deepfakes, can pick up on flaws like AI-generated people with extra fingers or AI voices that speak without pausing for a breath.

Among the splashiest tools announced this week, OpenAI shared details today about a new AI image detection classifier that it claims can detect about 98 percent of AI outputs from its own sophisticated image generator, DALL-E 3. It also “currently flags approximately 5 to 10 percent of images generated by other AI models,” OpenAI’s blog said.
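Those two figures imply that the classifier's overall usefulness depends heavily on where the AI images in a given feed came from. The back-of-the-envelope sketch below combines them; the 7.5 percent figure is simply the midpoint of OpenAI's stated 5 to 10 percent range, and the DALL-E 3 shares are hypothetical.

```python
def expected_flag_rate(share_dalle3, rate_dalle3=0.98, rate_other=0.075):
    """Expected fraction of AI-generated images flagged in a mixed feed.

    rate_dalle3 is OpenAI's claimed detection rate for its own generator;
    rate_other is an assumed midpoint of the claimed 5 to 10 percent range.
    """
    return share_dalle3 * rate_dalle3 + (1.0 - share_dalle3) * rate_other


# Hypothetical shares of AI images in a feed that were made with DALL-E 3.
for share in (1.0, 0.5, 0.1):
    print(f"{share:.0%} from DALL-E 3 -> {expected_flag_rate(share):.1%} flagged")
```

If only a small slice of the AI images circulating on a platform come from DALL-E 3, the large majority would slip past this particular detector under these assumptions.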

According to OpenAI, the classifier provides a binary “true/false” response “indicating the likelihood of the image being AI-generated by DALL·E 3.” A screenshot of the tool shows how it can also be used to display a straightforward content summary confirming that “this content was generated with an AI tool” and includes fields ideally flagging the “app or device” and AI tool used.

To develop the tool, OpenAI spent months adding tamper-resistant metadata to “all images created and edited by DALL·E 3” that “can be used to prove the content comes” from “a particular source.” The detector reads this metadata to accurately flag DALL-E 3 images as fake.

That metadata follows “a widely used standard for digital content certification” set by the Coalition for Content Provenance and Authenticity (C2PA), often likened to a nutrition label. And reinforcing that standard has become “an important aspect” of OpenAI’s approach to AI detection beyond DALL-E 3, OpenAI said. When OpenAI broadly launches its video generator, Sora, C2PA metadata will be integrated into that tool as well, OpenAI said.

Of course, this solution is not comprehensive because that metadata could always be removed, and “people can still create deceptive content without this information (or can remove it),” OpenAI said, “but they cannot easily fake or alter this information, making it an important resource to build trust.”
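The trade-off OpenAI describes, where provenance data is easy to strip but hard to forge, can be illustrated with a simplified signing scheme. This sketch is not the C2PA format: the real standard relies on certificate-based signatures embedded in the file, whereas the example below swaps in an HMAC with a made-up shared key, and every name in it is hypothetical.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"hypothetical-signing-key"  # stand-in for a real signing credential


def sign_manifest(manifest):
    """Sign a manifest so that any later change to it can be detected."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def attach_provenance(image, tool):
    """Return a copy of the image with a signed note about which tool made it."""
    manifest = {
        "generator": tool,
        "pixels_sha256": hashlib.sha256(image["pixels"]).hexdigest(),
    }
    return {**image, "manifest": manifest, "signature": sign_manifest(manifest)}


def check_provenance(image):
    """Report whether provenance data is missing, altered, or intact."""
    if "manifest" not in image or "signature" not in image:
        return "no provenance data"
    manifest = image["manifest"]
    if not hmac.compare_digest(sign_manifest(manifest), image["signature"]):
        return "tampered"
    if hashlib.sha256(image["pixels"]).hexdigest() != manifest["pixels_sha256"]:
        return "tampered"
    return "verified: " + manifest["generator"]


original = attach_provenance({"pixels": b"\x00\x01\x02"}, "DALL-E 3")
stripped = {"pixels": original["pixels"]}  # metadata removed entirely
forged = {**original, "manifest": {**original["manifest"], "generator": "a camera"}}

for name, image in [("original", original), ("stripped", stripped), ("forged", forged)]:
    print(f"{name}: {check_provenance(image)}")
```

The stripped copy simply reports no provenance data, which is the gap critics point to, while the forged copy fails the signature check because the edited manifest no longer matches what was signed.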

Because OpenAI is all in on C2PA, the AI leader announced today that it would join the C2PA steering committee to help drive broader adoption of the standard. OpenAI will also launch a $2 million fund with Microsoft to support broader “AI education and understanding,” seemingly partly in the hopes that the more people understand about the importance of AI detection, the less likely they will be to remove this metadata.

“As adoption of the standard increases, this information can accompany content through its lifecycle of sharing, modification, and reuse,” OpenAI said. “Over time, we believe this kind of metadata will be something people come to expect, filling a crucial gap in digital content authenticity practices.”

OpenAI joining the committee “marks a significant milestone for the C2PA and will help advance the coalition’s mission to increase transparency around digital media as AI-generated content becomes more prevalent,” C2PA said in a blog.
