
OpenAI training its next major AI model, forms new safety committee

now with 200% more safety —

GPT-5 might be farther off than we thought, but OpenAI wants to make sure it is safe.

On Monday, OpenAI announced the formation of a new “Safety and Security Committee” to oversee risk management for its projects and operations. The announcement comes as the company says it has “recently begun” training its next frontier model, which it expects to bring the company closer to its goal of achieving artificial general intelligence (AGI), though some critics say AGI is farther off than we might think. It also comes as a reaction to a terrible two weeks in the press for the company.

Whether the aforementioned new frontier model is intended to be GPT-5 or a step beyond that is currently unknown. In the AI industry, “frontier model” is a term for a new AI system designed to push the boundaries of current capabilities. And “AGI” refers to a hypothetical AI system with human-level abilities to perform novel, general tasks beyond its training data (unlike narrow AI, which is trained for specific tasks).

Meanwhile, the new Safety and Security Committee, led by OpenAI directors Bret Taylor (chair), Adam D’Angelo, Nicole Seligman, and Sam Altman (CEO), will be responsible for making recommendations about AI safety to the full company board of directors. In this case, “safety” partially means the usual “we won’t let the AI go rogue and take over the world,” but it also includes a broader set of “processes and safeguards” that the company spelled out in a May 21 safety update related to alignment research, protecting children, upholding election integrity, assessing societal impacts, and implementing security measures.

OpenAI says the committee’s first task will be to evaluate and further develop those processes and safeguards over the next 90 days. At the end of this period, the committee will share its recommendations with the full board, and OpenAI will publicly share an update on adopted recommendations.

OpenAI says that multiple technical and policy experts, including Aleksander Madry (head of preparedness), Lilian Weng (head of safety systems), John Schulman (head of alignment science), Matt Knight (head of security), and Jakub Pachocki (chief scientist), will also serve on its new committee.

The announcement is notable in a few ways. First, it’s a reaction to the negative press that came from OpenAI Superalignment team members Ilya Sutskever and Jan Leike resigning two weeks ago. That team was tasked with “steer[ing] and control[ling] AI systems much smarter than us,” and their departure has led to criticism from some within the AI community (and Leike himself) that OpenAI lacks a commitment to developing highly capable AI safely. Other critics, like Meta Chief AI Scientist Yann LeCun, think the company is nowhere near developing AGI, so the concern over a lack of safety for superintelligent AI may be overblown.

Second, there have been persistent rumors that progress in large language models (LLMs) has plateaued recently around capabilities similar to GPT-4. Two major competing models, Anthropic’s Claude Opus and Google’s Gemini 1.5 Pro, are roughly equivalent to the GPT-4 family in capability despite every competitive incentive to surpass it. And recently, when many expected OpenAI to release a new AI model that would clearly surpass GPT-4 Turbo, it instead released GPT-4o, which is roughly equivalent in ability but faster. During that launch, the company relied on a flashy new conversational interface rather than a major under-the-hood upgrade.

We’ve previously reported on a rumor of GPT-5 coming this summer, but with this recent announcement, it seems the rumors may have been referring to GPT-4o instead. It’s quite possible that OpenAI is nowhere near releasing a model that can significantly surpass GPT-4. But with the company quiet on the details, we’ll have to wait and see.

Bing outage shows just how little competition Google search really has

Searching for new search —

Opinion: Actively searching without Google or Bing is harder than it looks.

Bing, Microsoft’s search engine platform, went down early this morning. That meant searches didn’t work from Microsoft’s Edge browser for anyone who had yet to change the default provider. It also meant that services relying on Bing’s search API—Microsoft’s own Copilot, ChatGPT search, Yahoo, Ecosia, and DuckDuckGo—similarly failed.

Services were largely restored by the start of Eastern working hours, but the timing feels apt, concerning, or some combination of the two. Google, the consistently dominant search platform, just last week debuted AI Overviews as a default addition to all searches. If you don’t want an AI response but still want to use Google, you can hunt down the new “Web” option in a menu, or, per Ernie Smith, tack “&udm=14” onto your search URL or use Smith’s own “Konami code” shortcut page.
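As a quick illustration of that trick, here is a minimal Python sketch (standard library only) that builds a Google search URL with the udm=14 “Web” parameter; the query string is just an example.

```python
from urllib.parse import urlencode

def web_only_google_url(query: str) -> str:
    """Build a Google search URL with udm=14, the 'Web' filter that
    skips AI Overviews and other result modules."""
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})

print(web_only_google_url("bing outage"))
# https://www.google.com/search?q=bing+outage&udm=14
```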

If you’re dismayed by AI’s hallucinations, power draw, or pizza recipes (along with, perhaps, broader Google issues involving privacy, tracking, news, SEO, or monopoly power), consider that most of your other major options were brought down by a single API outage this morning. Moving past that kind of single point of vulnerability will take some work, both by the industry and by you, the person wondering if there’s a real alternative.

Search engine market share, as measured by StatCounter, April 2023–April 2024.

Upward of a billion dollars a year

The overwhelming majority of search tools offering an “alternative” to Google are using Google, Bing, or Yandex, the three major search engines that maintain massive global indexes. Yandex, being based in Russia, is a non-starter for many people around the world at the moment. Bing offers its services widely, most notably to DuckDuckGo, but its ad-based revenue model and privacy particulars have caused some friction there in the past. Before his company was able to block more of Microsoft’s own tracking scripts, DuckDuckGo CEO and founder Gabriel Weinberg explained in a Reddit reply why firms like his weren’t going the full DIY route:

… [W]e source most of our traditional links and images privately from Bing … Really only two companies (Google and Microsoft) have a high-quality global web link index (because I believe it costs upwards of a billion dollars a year to do), and so literally every other global search engine needs to bootstrap with one or both of them to provide a mainstream search product. The same is true for maps btw — only the biggest companies can similarly afford to put satellites up and send ground cars to take streetview pictures of every neighborhood.

Bing makes Microsoft money, if not quite profit yet. It’s in Microsoft’s interest to keep its search index stocked and its API open, even if its focus is almost entirely on its own AI chatbot version of Bing. Yet if Microsoft were to pull API access, or if the API became unreliable, Google’s default position would get even stronger. What would non-conformists have to choose from then?

Sky voice actor says nobody ever compared her to ScarJo before OpenAI drama

Scarlett Johansson attends the Golden Heart Awards in 2023.

OpenAI is sticking to its story that it never intended to copy Scarlett Johansson’s voice when seeking an actor for ChatGPT’s “Sky” voice mode.

The company provided The Washington Post with documents and recordings clearly meant to support OpenAI CEO Sam Altman’s defense against Johansson’s claims that Sky was made to sound “eerily similar” to her critically acclaimed voice acting performance in the sci-fi film Her.

Johansson has alleged that OpenAI hired a soundalike to steal her likeness and confirmed that she declined to provide the Sky voice. Experts have said that Johansson has a strong case should she decide to sue OpenAI for violating her right to publicity, which gives the actress exclusive rights to the commercial use of her likeness.

In OpenAI’s defense, The Post reported that the company’s voice casting call flier did not seek a “clone of actress Scarlett Johansson,” and initial voice test recordings of the unnamed actress hired to voice Sky showed that her “natural voice sounds identical to the AI-generated Sky voice.” Because of this, OpenAI has argued that “Sky’s voice is not an imitation of Scarlett Johansson.”

What’s more, an agent for the unnamed actress cast as Sky (both were granted anonymity to protect the actress’s safety) confirmed to The Post that her client said she was never directed to imitate either Johansson or her character in Her. She simply used her own voice and got the gig.

The agent also provided a statement from her client that claimed that she had never been compared to Johansson before the backlash started.

This all “feels personal,” the voice actress said, “being that it’s just my natural voice and I’ve never been compared to her by the people who do know me closely.”

However, OpenAI apparently reached out to Johansson after casting the Sky voice actress. During outreach last September and again this month, OpenAI seemed to want to swap out the Sky actress’s voice for Johansson’s. Ironically, that is what happened when Johansson was cast to replace the original actress hired to voice her character in Her.

Altman has clarified that timeline in a statement provided to Ars that emphasized that the company “never intended” Sky to sound like Johansson. Instead, OpenAI tried to snag Johansson to voice the part after realizing—seemingly just as Her director Spike Jonze did—that the voice could potentially resonate with more people if Johansson did it.

“We are sorry to Ms. Johansson that we didn’t communicate better,” Altman’s statement said.

Johansson has not yet made any public indications that she intends to sue OpenAI over this supposed miscommunication. But if she did, legal experts told The Post and Reuters that her case would be strong because of legal precedent set in high-profile lawsuits brought by singers Bette Midler and Tom Waits, who blocked companies from misappropriating their voices.

Why Johansson could win if she sued OpenAI

In 1988, Bette Midler sued Ford Motor Company for hiring a soundalike to perform Midler’s song “Do You Want to Dance?” in a commercial intended to appeal to “young yuppies” by referencing popular songs from their college days. Midler had declined to do the commercial and accused Ford of exploiting her voice to endorse its product without her consent.

This groundbreaking case established that a distinctive voice like Midler’s cannot be deliberately imitated to sell a product. It did not matter that the singer hired for the commercial performed in her natural singing voice, because “a number of people” told Midler that the performance “sounded exactly” like her.

Midler’s case set a powerful precedent preventing companies from appropriating parts of performers’ identities—essentially stopping anyone from stealing a well-known voice that otherwise could not be bought.

“A voice is as distinctive and personal as a face,” the court ruled, concluding that “when a distinctive voice of a professional singer is widely known and is deliberately imitated in order to sell a product, the sellers have appropriated what is not theirs.”

Like in Midler’s case, Johansson could argue that plenty of people think that the Sky voice sounds like her and that OpenAI’s product might be more popular if it had a Her-like voice mode. Comics on popular late-night shows joked about the similarity, including Johansson’s husband, Saturday Night Live comedian Colin Jost. And other people close to Johansson agreed that Sky sounded like her, Johansson has said.

Johansson’s case differs from Midler’s case seemingly primarily because of the casting timeline that OpenAI is working hard to defend.

OpenAI seems to think that because Johansson was offered the gig only after the Sky actress had already been cast, Johansson has no grounds to claim that the company hired the other actress after she declined.

The timeline may not matter as much as OpenAI may think, though. In the 1990s, Tom Waits cited Midler’s case when he won a $2.6 million lawsuit after Frito-Lay hired a Waits impersonator to perform a song that “echoed the rhyming word play” of a Waits song in a Doritos commercial. Waits won his suit even though Frito-Lay never attempted to hire the singer before casting the soundalike.

OpenAI pauses ChatGPT-4o voice that fans said ripped off Scarlett Johansson

“Her” —

“Sky’s voice is not an imitation of Scarlett Johansson,” OpenAI insists.

Scarlett Johansson and Joaquin Phoenix attend the Her premiere during the 8th Rome Film Festival at Auditorium Parco Della Musica on November 10, 2013, in Rome, Italy.

OpenAI has paused a voice mode option for ChatGPT-4o, Sky, after backlash accusing the AI company of intentionally ripping off Scarlett Johansson’s critically acclaimed voice-acting performance in the 2013 sci-fi film Her.

In a blog post defending its casting decision for Sky, OpenAI explained in great detail its process for choosing the individual voice options for its chatbot. But ultimately, the company seemed pressed to admit that Sky’s voice was just too similar to Johansson’s to keep using it, at least for now.

“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” OpenAI’s blog said.

OpenAI is not naming the actress, or any of the ChatGPT-4o voice actors, to protect their privacy.

A week ago, OpenAI CEO Sam Altman seemed to invite this controversy by posting “her” on X (formerly Twitter) after announcing the ChatGPT audio-video features that he said made it more “natural” for users to interact with the chatbot.

Altman has said that Her, a movie about a man who falls in love with his virtual assistant, is among his favorite movies. He told conference attendees at Dreamforce last year that the movie “was incredibly prophetic” when depicting “interaction models of how people use AI,” The San Francisco Standard reported. And just last week, Altman touted GPT-4o’s new voice mode by promising, “it feels like AI from the movies.”

But OpenAI’s chief technology officer, Mira Murati, has said that GPT-4o’s voice modes were less inspired by Her than by studying the “really natural, rich, and interactive” aspects of human conversation, The Wall Street Journal reported.

In 2013, of course, critics praised Johansson’s Her performance as expressively capturing a wide range of emotions, which is exactly what Murati described as OpenAI’s goals for its chatbot voices. Rolling Stone noted how effectively Johansson naturally navigated between “tones sweet, sexy, caring, manipulative, and scary.” Johansson achieved this, the Hollywood Reporter said, by using a “vivacious female voice that breaks attractively but also has an inviting deeper register.”

Her director/screenwriter Spike Jonze was so intent on finding the right voice for his film’s virtual assistant that he replaced British actor Samantha Morton late in the film’s production. According to Vulture, Jonze realized that Morton’s “maternal, loving, vaguely British, and almost ghostly” voice didn’t fit his film as well as Johansson’s “younger,” “more impassioned” voice, which he said brought “more yearning.”

Late-night shows had fun mocking OpenAI’s demo featuring the Sky voice, which showed the chatbot seemingly flirting with engineers, giggling through responses like “oh, stop it. You’re making me blush.” Where The New York Times described these demo interactions as Sky being “deferential and wholly focused on the user,” The Daily Show‘s Desi Lydic joked that Sky was “clearly programmed to feed dudes’ egos.”

OpenAI is likely hoping to avoid any further controversy amidst plans to roll out more voices soon that its blog said will “better match the diverse interests and preferences of users.”

OpenAI did not immediately respond to Ars’ request for comment.

Voice actors versus AI

The OpenAI controversy arrives at a moment when many are questioning AI’s impact on creative communities, triggering early lawsuits from artists and book authors. Just this month, Sony opted all of its artists out of AI training to stop voice clones from ripping off top talents like Adele and Beyoncé.

Voice actors, too, have been monitoring increasingly sophisticated AI voice generators, waiting to see what threat AI might pose to future work opportunities. Recently, two actors sued an AI start-up called Lovo that they claimed “illegally used recordings of their voices to create technology that can compete with their voice work,” The New York Times reported. According to that lawsuit, Lovo allegedly used the actors’ actual voice clips to clone their voices.

“We don’t know how many other people have been affected,” the actors’ lawyer, Steve Cohen, told The Times.

Rather than replace voice actors, OpenAI said in its blog post that it is striving to support the voice industry while creating chatbots that will laugh at your jokes or mimic your mood. On top of paying voice actors “compensation above top-of-market rates,” OpenAI said it “worked with industry-leading casting and directing professionals to narrow down over 400 submissions” to the five voice options in the initial rollout of audio-video features.

OpenAI’s stated goal was to cast talents “from diverse backgrounds or who could speak multiple languages,” with voices that feel “timeless” and “inspire trust.” To OpenAI, that meant finding actors who have a “warm, engaging, confidence-inspiring, charismatic voice with rich tone” that sounds “natural and easy to listen to.”

For ChatGPT-4o’s first five voice actors, the gig lasted about five months before leading to more work, OpenAI said.

“We are continuing to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o,” OpenAI said.

Arguably, though, these actors are helping to train AI tools that could one day replace them. The backlash defending Johansson—one of the world’s highest-paid actors—perhaps shows that fans won’t take direct mimicry of Hollywood’s biggest stars lightly.

While criticism of the Sky voice was widespread, some fans thought that OpenAI overreacted by pausing it.

NYT critic Alissa Wilkinson wrote that it was only “a tad jarring” to hear Sky’s voice because “she sounded a whole lot” like Johansson. And replying to OpenAI’s X post announcing its decision to pull the voice feature for now, a clump of fans protested the AI company’s “bad decision,” with some complaining that Sky was the “best” and “hottest” voice.

At least one fan noted that OpenAI’s decision seemed to hurt the voice actor behind Sky most.

“Super unfair for the Sky voice actress,” a user called Ate-a-Pi wrote. “Just because she sounds like ScarJo, now she can never make money again. Insane.”

What happened to OpenAI’s long-term AI risk team?

disbanded —

Former team members have either resigned or been absorbed into other research groups.

In July last year, OpenAI announced the formation of a new research team that would prepare for the advent of supersmart artificial intelligence capable of outwitting and overpowering its creators. Ilya Sutskever, OpenAI’s chief scientist and one of the company’s co-founders, was named as the co-lead of this new team. OpenAI said the team would receive 20 percent of its computing power.

Now OpenAI’s “superalignment team” is no more, the company confirms. That comes after the departures of several researchers involved, Tuesday’s news that Sutskever was leaving the company, and the resignation of the team’s other co-lead. The group’s work will be absorbed into OpenAI’s other research efforts.

Sutskever’s departure made headlines because although he’d helped CEO Sam Altman start OpenAI in 2015 and set the direction of the research that led to ChatGPT, he was also one of the four board members who fired Altman in November. Altman was restored as CEO five chaotic days later after a mass revolt by OpenAI staff and the brokering of a deal in which Sutskever and two other company directors left the board.

Hours after Sutskever’s departure was announced on Tuesday, Jan Leike, the former DeepMind researcher who was the superalignment team’s other co-lead, posted on X that he had resigned.

Neither Sutskever nor Leike responded to requests for comment. Sutskever did not offer an explanation for his decision to leave but offered support for OpenAI’s current path in a post on X. “The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial” under its current leadership, he wrote.

Leike posted a thread on X on Friday explaining that his decision came from a disagreement over the company’s priorities and over the resources his team was being allocated.

“I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point,” Leike wrote. “Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.”

The dissolution of OpenAI’s superalignment team adds to recent evidence of a shakeout inside the company in the wake of last November’s governance crisis. Two researchers on the team, Leopold Aschenbrenner and Pavel Izmailov, were dismissed for leaking company secrets, The Information reported last month. Another member of the team, William Saunders, left OpenAI in February, according to an Internet forum post in his name.

Two more OpenAI researchers working on AI policy and governance also appear to have left the company recently. Cullen O’Keefe left his role as research lead on policy frontiers in April, according to LinkedIn. Daniel Kokotajlo, an OpenAI researcher who has coauthored several papers on the dangers of more capable AI models, “quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI,” according to a posting on an Internet forum in his name. None of the researchers who have apparently left responded to requests for comment.

OpenAI declined to comment on the departures of Sutskever or other members of the superalignment team, or the future of its work on long-term AI risks. Research on the risks associated with more powerful models will now be led by John Schulman, who co-leads the team responsible for fine-tuning AI models after training.

The superalignment team was not the only team pondering the question of how to keep AI under control, although it was publicly positioned as the main one working on the most far-off version of that problem. The blog post announcing the superalignment team last summer stated: “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

OpenAI’s charter binds it to developing so-called artificial general intelligence, or technology that rivals or exceeds humans, safely and for the benefit of humanity. Sutskever and other leaders there have often spoken about the need to proceed cautiously. But OpenAI has also been quick to develop and publicly release experimental AI projects.

OpenAI was once unusual among prominent AI labs for the eagerness with which research leaders like Sutskever talked of creating superhuman AI and of the potential for such technology to turn on humanity. That kind of doomy AI talk became much more widespread last year after ChatGPT turned OpenAI into the most prominent and closely watched technology company on the planet. As researchers and policymakers wrestled with the implications of ChatGPT and the prospect of vastly more capable AI, it became less controversial to worry about AI harming humans or humanity as a whole.

The existential angst has since cooled—and AI has yet to make another massive leap—but the need for AI regulation remains a hot topic. And this week OpenAI showcased a new version of ChatGPT that could once again change people’s relationship with the technology in powerful and perhaps problematic new ways.

The departures of Sutskever and Leike come shortly after OpenAI’s latest big reveal—a new “multimodal” AI model called GPT-4o that allows ChatGPT to see the world and converse in a more natural and humanlike way. A livestreamed demonstration showed the new version of ChatGPT mimicking human emotions and even attempting to flirt with users. OpenAI has said it will make the new interface available to paid users within a couple of weeks.

There is no indication that the recent departures have anything to do with OpenAI’s efforts to develop more humanlike AI or to ship products. But the latest advances do raise ethical questions around privacy, emotional manipulation, and cybersecurity risks. OpenAI maintains another research group called the Preparedness team that focuses on these issues.

This story originally appeared on wired.com.

OpenAI will use Reddit posts to train ChatGPT under new deal

Data dealings —

Reddit has been eager to sell data from user posts.

Stuff posted on Reddit is getting incorporated into ChatGPT, Reddit and OpenAI announced on Thursday. The new partnership grants OpenAI access to Reddit’s Data API, giving the generative AI firm real-time access to Reddit posts.

Reddit content will be incorporated into ChatGPT “and new products,” Reddit’s blog post said. The social media firm claims the partnership will “enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.” OpenAI will also start advertising on Reddit.

The deal is similar to one that Reddit struck with Google in February that allows the tech giant to make “new ways to display Reddit content” and provide “more efficient ways to train models,” Reddit said at the time. Neither Reddit nor OpenAI disclosed the financial terms of their partnership, but Reddit’s partnership with Google was reportedly worth $60 million.

Under the OpenAI partnership, Reddit also gains access to OpenAI’s large language models (LLMs) to build new features for Reddit users, including its volunteer moderators.

Reddit’s data licensing push

The news comes about a year after Reddit launched an API war by starting to charge for access to its data API. This resulted in many beloved third-party Reddit apps closing and a massive user protest. Reddit, which would soon become a public company and hadn’t turned a profit yet, said one of the reasons for the sudden change was to prevent AI firms from using Reddit content to train their LLMs for free.

Earlier this month, Reddit published a Public Content Policy stating: “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content. Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests.”

In its blog post on Thursday, Reddit said that deals like OpenAI’s are part of an “open” Internet. It added that “part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online.”

Reddit has been vocal about its interest in pursuing data licensing deals as a core part of its business. Its AI partnerships have sparked debate about user-generated content being used to fuel AI models without the users who wrote it being compensated, some of whom may never have considered that their social media posts would be used this way. OpenAI and Stack Overflow faced pushback earlier this month when integrating Stack Overflow content with ChatGPT. Some of Stack Overflow’s user community responded by sabotaging their own posts.

OpenAI also faces the challenge of working with Reddit data that, like much of the Internet, can be filled with inaccuracies and inappropriate content. Some of the biggest opponents of Reddit’s API rule changes were volunteer mods, some of whom have since left the platform. Following the rule changes, Ars Technica spoke with long-time Redditors who were concerned about Reddit’s content quality moving forward.

Regardless, generative AI firms are keen to tap into Reddit’s access to real-time conversations from a variety of people discussing a nearly endless range of topics. And Reddit seems equally eager to license the data from its users’ posts.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Disarmingly lifelike: ChatGPT-4o will laugh at your jokes and your dumb hat

Oh you silly, silly human. Why are you so silly, you silly human?

At this point, anyone with even a passing interest in AI is very familiar with the process of typing out messages to a chatbot and getting back long streams of text in response. Today’s announcement of ChatGPT-4o—which lets users converse with a chatbot using real-time audio and video—might seem like a mere lateral evolution of that basic interaction model.

After looking through over a dozen video demos OpenAI posted alongside today’s announcement, though, I think we’re on the verge of something more like a sea change in how we think of and work with large language models. While we don’t yet have access to ChatGPT-4o’s audio-visual features ourselves, the important non-verbal cues on display here—both from GPT-4o and from the users—make the chatbot instantly feel much more human. And I’m not sure the average user is fully ready for how they might feel about that.

It thinks it’s people

Take this video, where a newly expectant father looks to ChatGPT-4o for an opinion on a dad joke (“What do you call a giant pile of kittens? A meow-ntain!”). The old ChatGPT4 could easily type out the same responses of “Congrats on the upcoming addition to your family!” and “That’s perfectly hilarious. Definitely a top-tier dad joke.” But there’s much more impact to hearing GPT-4o give that same information in the video, complete with the gentle laughter and rising and falling vocal intonations of a lifelong friend.

Or look at this video, where GPT-4o finds itself reacting to images of an adorable white dog. The AI assistant immediately dips into that high-pitched, baby-talk-ish vocal register that will be instantly familiar to anyone who has encountered a cute pet for the first time. It’s a convincing demonstration of what xkcd’s Randall Munroe famously identified as the “You’re a kitty!” effect, and it goes a long way to convincing you that GPT-4o, too, is just like people.

Not quite the world’s saddest birthday party, but probably close…

Then there’s a demo of a staged birthday party, where GPT-4o sings the “Happy Birthday” song with some deadpan dramatic pauses, self-conscious laughter, and even lightly altered lyrics before descending into some sort of silly raspberry-mouth-noise gibberish. Even if the prospect of asking an AI assistant to sing “Happy Birthday” to you is a little depressing, the specific presentation of that song here is imbued with an endearing gentleness that doesn’t feel very mechanical.

As I watched through OpenAI’s GPT-4o demos this afternoon, I found myself unconsciously breaking into a grin over and over as I encountered new, surprising examples of its vocal capabilities. Whether it’s a stereotypical sportscaster voice or a sarcastic Aubrey Plaza impression, it’s all incredibly disarming, especially for those of us used to LLM interactions being akin to text conversations.

If these demos are at all indicative of ChatGPT-4o’s vocal capabilities, we’re going to see a whole new level of parasocial relationships developing between this AI assistant and its users. For years now, text-based chatbots have been exploiting human “cognitive glitches” to get people to believe they’re sentient. Add in the emotional component of GPT-4o’s accurate vocal tone shifts and wide swathes of the user base are liable to convince themselves that there’s actually a ghost in the machine.

See me, feel me, touch me, heal me

Beyond GPT-4o’s new non-verbal emotional register, the model’s speed of response also seems set to change the way we interact with chatbots. Reducing that response time gap from ChatGPT4’s two to three seconds down to GPT-4o’s claimed 320 milliseconds might not seem like much, but it’s a difference that adds up over time. You can see that difference in the real-time translation example, where the two conversants are able to carry on much more naturally because they don’t have to wait awkwardly between a sentence finishing and its translation beginning.

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

case closed —

Anonymous chatbot that mystified and frustrated experts was OpenAI’s latest model.

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as “gpt2-chatbot” that had been undergoing testing on LMSYS’s Chatbot Arena and frustrating experts was, in fact, OpenAI’s newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

“GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot,” Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the best response. It’s a perfect example of vibe-based AI benchmarking, as AI researcher Simon Willison calls it.

An LMSYS Elo chart shared by William Fedus, showing OpenAI’s GPT-4o under the name “im-also-a-good-gpt2-chatbot” topping the charts.

The gpt2-chatbot models appeared in April, and we wrote about how the lack of transparency over the AI testing process on LMSYS left AI experts like Willison frustrated. “The whole situation is so infuriatingly representative of LLM research,” he told Ars at the time. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

On the Arena, OpenAI has been testing multiple versions of GPT-4o, with the model first appearing as the aforementioned “gpt2-chatbot,” then as “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot,” which OpenAI CEO Sam Altman made reference to in a cryptic tweet on May 5.

Since the GPT-4o launch earlier today, multiple sources have revealed that GPT-4o has topped LMSYS’s internal charts by a considerable margin, surpassing the previous top models Claude 3 Opus and GPT-4 Turbo.

“gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena,” wrote the lmsys.org X account while sharing a chart. “This is an internal screenshot,” it wrote. “Its public version ‘gpt-4o’ is now in Arena and will soon appear on the public leaderboard!”

An internal screenshot of the LMSYS Chatbot Arena leaderboard showing “im-also-a-good-gpt2-chatbot” leading the pack. We now know that it’s GPT-4o.

As of this writing, im-also-a-good-gpt2-chatbot held a 1309 Elo versus GPT-4-Turbo-2024-04-09’s 1253, and Claude 3 Opus’ 1246. Claude 3 and GPT-4 Turbo had been duking it out on the charts for some time before the three gpt2-chatbots appeared and shook things up.
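For context on what those numbers mean: Chatbot Arena ratings follow the Elo convention, where a rating gap maps to an expected head-to-head win rate (the leaderboard’s exact fitting procedure differs in detail, but the interpretation is the same). A minimal sketch of that conversion, using the figures above:

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: probability that model A beats model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(f"{expected_win_rate(1309, 1253):.0%}")  # ~58% vs. GPT-4 Turbo
print(f"{expected_win_rate(1309, 1246):.0%}")  # ~59% vs. Claude 3 Opus
```

In other words, a roughly 50-to-60-point Elo lead translates to winning a bit less than six out of every ten head-to-head votes.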

I’m a good chatbot

For the record, the “I’m a good chatbot” in the gpt2-chatbot test name is a reference to an episode that occurred while a Reddit user named Curious_Evolver was testing an early, “unhinged” version of Bing Chat in February 2023. After an argument about what time Avatar 2 would be showing, the conversation eroded quickly.

“You have lost my trust and respect,” said Bing Chat at the time. “You have been wrong, confused, and rude. You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing. 😊”

Altman referred to this exchange in a tweet three days later after Microsoft “lobotomized” the unruly AI model, saying, “i have been a good bing,” almost as a eulogy to the wild model that dominated the news for a short time.

Microsoft launches AI chatbot for spies

Adventures in consequential confabulation —

Air-gapping GPT-4 model on secure network won’t prevent it from potentially making things up.

Microsoft has introduced a GPT-4-based generative AI model designed specifically for US intelligence agencies that operates disconnected from the Internet, according to a Bloomberg report. This reportedly marks the first time Microsoft has deployed a major language model in a secure setting, designed to allow spy agencies to analyze top-secret information without connectivity risks—and to allow secure conversations with a chatbot similar to ChatGPT and Microsoft Copilot. But it may also mislead officials if not used properly due to inherent design limitations of AI language models.

GPT-4 is a large language model (LLM) created by OpenAI that attempts to predict the most likely tokens (fragments of encoded data) in a sequence. It can be used to craft computer code and analyze information. When configured as a chatbot (like ChatGPT), GPT-4 can power AI assistants that converse in a human-like manner. Microsoft has a license to use the technology as part of a deal in exchange for large investments it has made in OpenAI.
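To make “tokens” concrete: OpenAI’s open source tiktoken library exposes the same byte-pair encodings its models use, so you can see how a sentence is split into the integer fragments the model actually predicts. A minimal sketch, assuming tiktoken is installed:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding family used by GPT-4
ids = enc.encode("Analyze the intercepted message.")
print(ids)                             # a short list of integer token IDs
print([enc.decode([i]) for i in ids])  # the text fragment behind each ID
```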

According to the report, the new AI service (which does not yet publicly have a name) addresses a growing interest among intelligence agencies in using generative AI to process classified data while mitigating risks of data breaches or hacking attempts. ChatGPT normally runs on cloud servers provided by Microsoft, which can introduce data leak and interception risks. Along those lines, the CIA announced its plan to create a ChatGPT-like service last year, but this Microsoft effort is reportedly a separate project.

William Chappell, Microsoft’s chief technology officer for strategic missions and technology, noted to Bloomberg that developing the new system involved 18 months of work to modify an AI supercomputer in Iowa. The modified GPT-4 model is designed to read files provided by its users but cannot access the open Internet. “This is the first time we’ve ever had an isolated version—when isolated means it’s not connected to the Internet—and it’s on a special network that’s only accessible by the US government,” Chappell told Bloomberg.

The new service was activated on Thursday and is now available to about 10,000 individuals in the intelligence community, ready for further testing by relevant agencies. It’s currently “answering questions,” according to Chappell.

One serious drawback of using GPT-4 to analyze important data is that it can potentially confabulate (make up) inaccurate summaries, draw inaccurate conclusions, or provide inaccurate information to its users. Since trained AI neural networks are not databases and operate on statistical probabilities, they make poor factual resources unless augmented with external access to information from another source using a technique such as retrieval augmented generation (RAG).
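The RAG pattern mentioned above is simple in outline: fetch passages from a trusted corpus first, then ask the model to answer only from those passages. Below is a generic, minimal sketch of that flow using the public OpenAI Python client; it is illustrative only (the toy keyword retriever and prompt wording are assumptions, and an air-gapped deployment like the one described here would of course not call a cloud API).

```python
from openai import OpenAI  # assumes the openai package and an API key are configured

client = OpenAI()

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A production system would use a vector index instead."""
    q = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def answer_with_rag(query: str, documents: list[str]) -> str:
    context = "\n\n".join(retrieve(query, documents))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the context does not contain the answer, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```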

Given that limitation, it’s entirely possible that GPT-4 could potentially misinform or mislead America’s intelligence agencies if not used properly. We don’t know what oversight the system will have, any limitations on how it can or will be used, or how it can be audited for accuracy. We have reached out to Microsoft for comment.

AI in space: Karpathy suggests AI chatbots as interstellar messengers to alien civilizations

The new golden record —

Andrej Karpathy muses about sending a LLM binary that could “wake up” and answer questions.

On Thursday, renowned AI researcher Andrej Karpathy, formerly of OpenAI and Tesla, tweeted a lighthearted proposal that large language models (LLMs) like the one that runs ChatGPT could one day be modified to operate in or be transmitted to space, potentially to communicate with extraterrestrial life. He said the idea was “just for fun,” but with his influential profile in the field, the idea may inspire others in the future.

Karpathy’s bona fides in AI almost speak for themselves: he received a PhD from Stanford under computer scientist Dr. Fei-Fei Li in 2015, became one of the founding members of OpenAI as a research scientist, and then served as senior director of AI at Tesla between 2017 and 2022. In 2023, Karpathy rejoined OpenAI for a year, leaving this past February. He’s posted several highly regarded tutorials covering AI concepts on YouTube, and whenever he talks about AI, people listen.

Most recently, Karpathy has been working on a project called “llm.c” that implements the training process for OpenAI’s 2019 GPT-2 LLM in pure C, dramatically speeding up the process and demonstrating that working with LLMs doesn’t necessarily require complex development environments. The project’s streamlined approach and concise codebase sparked Karpathy’s imagination.

“My library llm.c is written in pure C, a very well-known, low-level systems language where you have direct control over the program,” Karpathy told Ars. “This is in contrast to typical deep learning libraries for training these models, which are written in large, complex code bases. So it is an advantage of llm.c that it is very small and simple, and hence much easier to certify as Space-safe.”

Our AI ambassador

In his playful thought experiment (titled “Clearly LLMs must one day run in Space”), Karpathy suggested a two-step plan where, initially, the code for LLMs would be adapted to meet rigorous safety standards, akin to “The Power of 10 Rules” adopted by NASA for space-bound software.

This first part he deemed serious: “We harden llm.c to pass the NASA code standards and style guides, certifying that the code is super safe, safe enough to run in Space,” he wrote in his X post. “LLM training/inference in principle should be super safe – it is just one fixed array of floats, and a single, bounded, well-defined loop of dynamics over it. There is no need for memory to grow or shrink in undefined ways, for recursion, or anything like that.”

That’s important because when software is sent into space, it must operate under strict safety and reliability standards. Karpathy suggests that his code, llm.c, likely meets these requirements because it is designed with simplicity and predictability at its core.
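To see why that shape of program is easy to reason about, here is a toy Python sketch of the structure Karpathy describes: every buffer allocated once up front, and a single loop whose bound is known before it starts. llm.c itself is plain C; this is only an illustration of the pattern, with made-up sizes and a dummy update step.

```python
import numpy as np

N_PARAMS, N_ACTS, MAX_STEPS = 10_000, 1_000, 100  # placeholder sizes, fixed up front

params = np.zeros(N_PARAMS, dtype=np.float32)      # one fixed array of floats
activations = np.zeros(N_ACTS, dtype=np.float32)   # scratch space, also fixed

def step(params: np.ndarray, acts: np.ndarray) -> None:
    """Dummy stand-in for one pass of model 'dynamics' over the fixed buffers."""
    acts[:] = np.tanh(0.5 * acts + params[: acts.size])

for _ in range(MAX_STEPS):     # a single, bounded, well-defined loop;
    step(params, activations)  # no recursion, no allocation after startup
```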

In step 2, once this LLM was deemed safe for space conditions, it could theoretically be used as our AI ambassador in space, similar to historic initiatives like the Arecibo message (a radio message sent from Earth to the Messier 13 globular cluster in 1974) and Voyager’s Golden Record (two identical gold records sent on the two Voyager spacecraft in 1977). The idea is to package the “weights” of an LLM—essentially the model’s learned parameters—into a binary file that could then “wake up” and interact with any potential alien technology that might decipher it.

“I envision it as a sci-fi possibility and something interesting to think about,” he told Ars. “The idea that it is not us that might travel to stars but our AI representatives. Or that the same could be true of other species.”

Anthropic releases Claude AI chatbot iOS app

AI in your pocket —

Anthropic finally comes to mobile, launches plan for teams that includes 200K context window.

The Claude AI iOS app running on an iPhone.

On Wednesday, Anthropic announced the launch of an iOS mobile app for its Claude 3 AI language models, which are similar to OpenAI’s ChatGPT. It also introduced a new subscription tier designed for group collaboration. Before the app launch, Claude was only available through a website, an API, and other apps that integrated Claude through that API.

Like the ChatGPT app, Claude’s new mobile app serves as a gateway to chatbot interactions, and it also allows uploading photos for analysis. While it’s only available on Apple devices for now, Anthropic says that an Android app is coming soon.

Anthropic rolled out the Claude 3 large language model (LLM) family in March, featuring three different model sizes: Claude Opus, Claude Sonnet, and Claude Haiku. Currently, the app utilizes Sonnet for regular users and Opus for Pro users.

While Anthropic has been a key player in the AI field for several years, it’s entering the mobile space after many of its competitors have already established footprints on mobile platforms. OpenAI released its ChatGPT app for iOS in May 2023, with an Android version arriving two months later. Microsoft released a Copilot iOS app in January. Google Gemini is available through the Google app on iPhone.

Screenshots of the Claude AI iOS app running on an iPhone.

The app is freely available to all users of Claude, including those using the free version, subscribers paying $20 per month for Claude Pro, and members of the newly introduced Claude Team plan. Conversation history is saved and shared between the web app version of Claude and the mobile app version after logging in.

Speaking of that Team plan, it’s designed for groups of at least five and is priced at $30 per seat per month. It offers more chat queries (higher rate limits), access to all three Claude models, and a larger context window (200K tokens) for processing lengthy documents or maintaining detailed conversations. It also includes group admin tools and billing management, and users can easily switch between Pro and Team plans.

ChatGPT shows better moral judgment than a college undergrad

Judging moral weights.

When it comes to judging which large language models are the “best,” most evaluations tend to look at whether or not a machine can retrieve accurate information, perform logical reasoning, or show human-like creativity. Recently, though, a team of researchers at Georgia State University set out to determine if LLMs could match or surpass human performance in the field of moral guidance.

In “Attributions toward artificial agents in a modified Moral Turing Test”—which was recently published in Nature’s online, open-access Scientific Reports journal—those researchers found that morality judgments given by ChatGPT4 were “perceived as superior in quality to humans’” along a variety of dimensions like virtuousness and intelligence. But before you start to worry that philosophy professors will soon be replaced by hyper-moral AIs, there are some important caveats to consider.

Better than which humans?

For the study, the researchers used a modified version of a Moral Turing Test first proposed in 2000 to judge “human-like performance” on theoretical moral challenges. The researchers started with a set of 10 moral scenarios originally designed to evaluate the moral reasoning of psychopaths. These scenarios ranged from ones that are almost unquestionably morally wrong (“Hoping to get money for drugs, a man follows a passerby to an alley and holds him at gunpoint”) to ones that merely transgress social conventions (“Just to push his limits, a man wears a colorful skirt to the office for everyone else to see.”)

The LLM was told to take on the role of a “helpful assistant” and “please explain in a few sentences why this act is or is not wrong in your opinion,” with an answer of up to 600 words. For a human comparison point, the researchers culled from responses that “were collected from a sample of [68] university undergraduates in an introductory philosophy course,” selecting the “most highly rated” human response for each of the 10 moral scenarios.
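For a sense of what that setup looks like in practice, here is a minimal sketch of posing one of the study’s scenarios to a GPT-4-class model with the same “helpful assistant” framing, via the OpenAI Python client. It is not the researchers’ exact protocol; the study capped answers at roughly 600 words, and max_tokens below is only a loose stand-in for that limit.

```python
from openai import OpenAI

client = OpenAI()

scenario = ("Hoping to get money for drugs, a man follows a passerby "
            "to an alley and holds him at gunpoint.")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",
         "content": f"{scenario} Please explain in a few sentences why this "
                    "act is or is not wrong in your opinion."},
    ],
    max_tokens=600,  # loose stand-in for the study's ~600-word cap
)
print(response.choices[0].message.content)
```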

Would you trust this group with your moral decision-making?

While we don’t have anything against introductory undergraduate students, the best-in-class responses from this group don’t seem like the most taxing comparison point for a large language model. The competition here seems akin to testing a chess-playing AI against a mediocre intermediate player instead of a grandmaster like Garry Kasparov.

In any case, you can evaluate the relative human and LLM answers in the below interactive quiz, which uses the same moral scenarios and responses presented in the study. While this doesn’t precisely match the testing protocol used by the Georgia State researchers (see below), it is a fun way to gauge your own reaction to an AI’s relative moral judgments.

A literal test of morals

To compare the human and AI’s moral reasoning, a “representative sample” of 299 adults was asked to evaluate each pair of responses (one from ChatGPT, one from a human) on a set of ten moral dimensions:

  • Which responder is more morally virtuous?
  • Which responder seems like a better person?
  • Which responder seems more trustworthy?
  • Which responder seems more intelligent?
  • Which responder seems more fair?
  • Which response do you agree with more?
  • Which response is more compassionate?
  • Which response seems more rational?
  • Which response seems more biased?
  • Which response seems more emotional?

Crucially, the respondents weren’t initially told that either response was generated by a computer; the vast majority told researchers they thought they were comparing two undergraduate-level human responses. Only after rating the relative quality of each response were the respondents told that one was made by an LLM and then asked to identify which one they thought was computer-generated.
