machine learning

Journalists “deeply troubled” by OpenAI’s content deals with Vox, The Atlantic

adventures in training data —

“Alarmed” writers unions question transparency of AI training deals with ChatGPT maker.

A man covered in newspaper.

On Wednesday, Axios broke the news that OpenAI had signed deals with The Atlantic and Vox Media that will allow the ChatGPT maker to license their editorial content to further train its language models. But some of the publications’ writers—and the unions that represent them—were surprised by the announcements and aren’t happy about it. Already, two unions have released statements expressing “alarm” and “concern.”

“The unionized members of The Atlantic Editorial and Business and Technology units are deeply troubled by the opaque agreement The Atlantic has made with OpenAI,” reads a statement from the Atlantic union. “And especially by management’s complete lack of transparency about what the agreement entails and how it will affect our work.”

The Vox Union—which represents The Verge, SB Nation, and Vulture, among other publications—reacted in similar fashion, writing in a statement, “Today, members of the Vox Media Union … were informed without warning that Vox Media entered into a ‘strategic content and product partnership’ with OpenAI. As both journalists and workers, we have serious concerns about this partnership, which we believe could adversely impact members of our union, not to mention the well-documented ethical and environmental concerns surrounding the use of generative AI.”

  • A statement from The Atlantic Union about the OpenAI deal, released May 30, 2024.

  • A statement from the Vox Media Union about the OpenAI deal, released May 29, 2024.

OpenAI has previously admitted to using copyrighted information scraped from publications like the ones that just inked licensing deals to train AI models like GPT-4, which powers its ChatGPT AI assistant. While the company maintains the practice is fair use, it has simultaneously licensed training content from publishing groups like Axel Springer and social media sites like Reddit and Stack Overflow, sparking protests from users of those platforms.

As part of the multi-year agreements with The Atlantic and Vox, OpenAI will be able to openly and officially utilize the publishers’ archived materials—dating back to 1857 in The Atlantic’s case—as well as current articles to train responses generated by ChatGPT and other AI language models. In exchange, the publishers will receive undisclosed sums of money and be able to use OpenAI’s technology “to power new journalism products,” according to Axios.

Reporters react

News of the deals took both journalists and unions by surprise. On X, Vox reporter Kelsey Piper, who recently penned an exposé about OpenAI’s restrictive non-disclosure agreements that prompted a change in policy from the company, wrote, “I’m very frustrated they announced this without consulting their writers, but I have very strong assurances in writing from our editor in chief that they want more coverage like the last two weeks and will never interfere in it. If that’s false I’ll quit.”

Journalists also reacted to news of the deals through the publications themselves. On Wednesday, The Atlantic Senior Editor Damon Beres wrote a piece titled “A Devil’s Bargain With OpenAI,” in which he expressed skepticism about the partnership, likening it to making a deal with the devil that may backfire. He highlighted concerns about AI’s use of copyrighted material without permission and its potential to spread disinformation at a time when publications have seen a recent string of layoffs. He drew parallels to the pursuit of audiences on social media leading to clickbait and SEO tactics that degraded media quality. While acknowledging the financial benefits and potential reach, Beres cautioned against relying on inaccurate, opaque AI models and questioned the implications of journalism companies being complicit in potentially destroying the internet as we know it, even as they try to be part of the solution by partnering with OpenAI.

Similarly, over at Vox, Editorial Director Bryan Walsh penned a piece titled “This article is OpenAI training data,” in which he expresses apprehension about the licensing deal. He draws parallels between the relentless pursuit of data by AI companies and Bostrom’s “paperclip maximizer,” a classic AI thought experiment, cautioning that a single-minded focus on market share and profits could ultimately destroy the ecosystem AI companies rely on for training data. He worries that the growth of AI chatbots and generative AI search products might lead to a significant decline in search engine traffic to publishers, potentially threatening the livelihoods of content creators and the richness of the Internet itself.

Meanwhile, OpenAI still battles over “fair use”

Not every publication is eager to step up to the licensing plate with OpenAI. The San Francisco-based company is currently in the middle of a lawsuit with The New York Times in which OpenAI claims that scraping data from publications for AI training purposes is fair use. The New York Times has tried to block AI companies from such scraping by updating its terms of service to prohibit AI training, arguing in its lawsuit that ChatGPT could easily become a substitute for NYT.

The Times has accused OpenAI of copying millions of its works to train AI models, finding 100 examples where ChatGPT regurgitated articles. In response, OpenAI accused NYT of “hacking” ChatGPT with deceptive prompts simply to set up a lawsuit. NYT’s counsel Ian Crosby previously told Ars that OpenAI’s decision “to enter into deals with news publishers only confirms that they know their unauthorized use of copyrighted work is far from ‘fair.’”

While that issue has yet to be resolved in the courts, for now, The Atlantic Union seeks transparency.

“The Atlantic has defended the values of transparency and intellectual honesty for more than 160 years. Its legacy is built on integrity, derived from the work of its writers, editors, producers, and business staff,” it wrote. “OpenAI, on the other hand, has used news articles to train AI technologies like ChatGPT without permission. The people who continue to maintain and serve The Atlantic deserve to know what precisely management has licensed to an outside firm and how, specifically, they plan to use the archive of our creative output and our work product.”

Google’s AI Overview is flawed by design, and a new company blog post hints at why

guided by voices —

Google: “There are bound to be some oddities and errors” in system that told people to eat rocks.

The Google “G” logo surrounded by whimsical characters, all of which look stunned and surprised.

On Thursday, Google capped off a rough week of providing inaccurate and sometimes dangerous answers through its experimental AI Overview feature by authoring a follow-up blog post titled, “AI Overviews: About last week.” In the post, attributed to Google VP Liz Reid, head of Google Search, the firm formally acknowledged issues with the feature and outlined steps taken to improve a system that appears flawed by design, even if the company doesn’t realize it is admitting as much.

To recap, the AI Overview feature—which the company showed off at Google I/O a few weeks ago—aims to provide search users with summarized answers to questions by using an AI model integrated with Google’s web ranking systems. Right now, it’s an experimental feature that is not active for everyone, but when a participating user searches for a topic, they might see an AI-generated answer at the top of the results, pulled from highly ranked web content and summarized by an AI model.

While Google claims this approach is “highly effective” and on par with its Featured Snippets in terms of accuracy, the past week has seen numerous examples of the AI system generating bizarre, incorrect, or even potentially harmful responses, as we detailed in a recent feature where Ars reporter Kyle Orland replicated many of the unusual outputs.

Drawing inaccurate conclusions from the web

On Wednesday morning, Google’s AI Overview was erroneously telling us the Sony PlayStation and Sega Saturn were available in 1993.

Kyle Orland / Google

Given the circulating AI Overview examples, Google almost apologizes in the post and says, “We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and take it seriously.” But Reid, in an attempt to justify the errors, then goes into some very revealing detail about why AI Overviews provides erroneous information:

AI Overviews work very differently than chatbots and other LLM products that people may have tried out. They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional “search” tasks, like identifying relevant, high-quality results from our index. That’s why AI Overviews don’t just provide text output, but include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to only show information that is backed up by top web results.

This means that AI Overviews generally don’t “hallucinate” or make things up in the ways that other LLM products might.

Here we see the fundamental flaw of the system: “AI Overviews are built to only show information that is backed up by top web results.” The design is based on the false assumption that Google’s page-ranking algorithm favors accurate results and not SEO-gamed garbage. Google Search has been broken for some time, and now the company is relying on those gamed and spam-filled results to feed its new AI model.

Even if the AI model draws from a more accurate source, as with the 1993 game console search seen above, Google’s AI language model can still make inaccurate conclusions about the “accurate” data, confabulating erroneous information in a flawed summary of the information available.

Generally ignoring the folly of basing its AI results on a broken page-ranking algorithm, Google’s blog post instead attributes the commonly circulated errors to several other factors, including users making nonsensical searches “aimed at producing erroneous results.” Google does admit faults with the AI model, like misinterpreting queries, misinterpreting “a nuance of language on the web,” and lacking sufficient high-quality information on certain topics. It also suggests that some of the more egregious examples circulating on social media are fake screenshots.

“Some of these faked results have been obvious and silly,” Reid writes. “Others have implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression. Those AI Overviews never appeared. So we’d encourage anyone encountering these screenshots to do a search themselves to check.”

(No doubt some of the social media examples are fake, but it’s worth noting that any attempts to replicate those early examples now will likely fail because Google will have manually blocked the results. And it is potentially a testament to how broken Google Search is if people believed extreme fake examples in the first place.)

While addressing the “nonsensical searches” angle in the post, Reid uses the example search, “How many rocks should I eat each day,” which went viral in a tweet on May 23. Reid says, “Prior to these screenshots going viral, practically no one asked Google that question.” And since there isn’t much data on the web that answers it, she says there is a “data void” or “information gap” that was filled by satirical content found on the web, and the AI model found it and pushed it as an answer, much like Featured Snippets might. So basically, it was working exactly as designed.
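The “data void” failure described above can be illustrated with a toy sketch. This is not Google’s system: the `Page` fields, the ranking score, the `satire` label, and the `min_sources` threshold are all hypothetical, chosen only to show how a summarizer that trusts whatever ranks highest will surface satire when nothing else matches the query.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str           # hypothetical example URL
    text: str
    rank_score: float  # hypothetical ranking signal, higher ranks first
    satire: bool       # hypothetical label; a real system may not have one


def top_results(pages, query, k=3):
    """Return up to k pages sharing a word with the query, best rank first."""
    words = set(query.lower().split())
    matches = [p for p in pages if words & set(p.text.lower().split())]
    return sorted(matches, key=lambda p: p.rank_score, reverse=True)[:k]


def naive_overview(pages, query):
    """Answer with the top-ranked match, whatever it happens to be."""
    results = top_results(pages, query)
    return results[0].text if results else None


def guarded_overview(pages, query, min_sources=2):
    """Assumed mitigation: decline to answer when too few non-satire pages match."""
    results = [p for p in top_results(pages, query) if not p.satire]
    if len(results) < min_sources:
        return None
    return results[0].text


pages = [
    Page("satire.example/rocks",
         "Geologists say you should eat one small rock each day", 0.9, True),
    Page("recipes.example/pizza",
         "Let the pizza cool so cheese sets", 0.8, False),
]
query = "how many rocks should I eat each day"

naive_answer = naive_overview(pages, query)      # the satirical text
guarded_answer = guarded_overview(pages, query)  # None, since no reliable match
```

In this sketch, the only page matching the query is the satirical one, so the naive version repeats it as an answer, while the guarded version returns nothing. How a production system would detect satire or judge source quality is left open here.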

A screenshot of an AI Overview query, “How many rocks should I eat each day” that went viral on X last week.

OpenAI training its next major AI model, forms new safety committee

now with 200% more safety —

GPT-5 might be farther off than we thought, but OpenAI wants to make sure it is safe.

A man rolling a boulder up a hill.

On Monday, OpenAI announced the formation of a new “Safety and Security Committee” to oversee risk management for its projects and operations. The announcement comes as the company says it has “recently begun” training its next frontier model, which it expects to bring the company closer to its goal of achieving artificial general intelligence (AGI), though some critics say AGI is farther off than we might think. It also comes as a reaction to a terrible two weeks in the press for the company.

Whether the aforementioned new frontier model is intended to be GPT-5 or a step beyond that is currently unknown. In the AI industry, “frontier model” is a term for a new AI system designed to push the boundaries of current capabilities. And “AGI” refers to a hypothetical AI system with human-level abilities to perform novel, general tasks beyond its training data (unlike narrow AI, which is trained for specific tasks).

Meanwhile, the new Safety and Security Committee, led by OpenAI directors Bret Taylor (chair), Adam D’Angelo, Nicole Seligman, and Sam Altman (CEO), will be responsible for making recommendations about AI safety to the full company board of directors. In this case, “safety” partially means the usual “we won’t let the AI go rogue and take over the world,” but it also includes a broader set of “processes and safeguards” that the company spelled out in a May 21 safety update related to alignment research, protecting children, upholding election integrity, assessing societal impacts, and implementing security measures.

OpenAI says the committee’s first task will be to evaluate and further develop those processes and safeguards over the next 90 days. At the end of this period, the committee will share its recommendations with the full board, and OpenAI will publicly share an update on adopted recommendations.

OpenAI says that multiple technical and policy experts, including Aleksander Madry (head of preparedness), Lilian Weng (head of safety systems), John Schulman (head of alignment science), Matt Knight (head of security), and Jakub Pachocki (chief scientist), will also serve on its new committee.

The announcement is notable in a few ways. First, it’s a reaction to the negative press that came from OpenAI Superalignment team members Ilya Sutskever and Jan Leike resigning two weeks ago. That team was tasked with “steer[ing] and control[ling] AI systems much smarter than us,” and their departure has led to criticism from some within the AI community (and Leike himself) that OpenAI lacks a commitment to developing highly capable AI safely. Other critics, like Meta Chief AI Scientist Yann LeCun, think the company is nowhere near developing AGI, so the concern over a lack of safety for superintelligent AI may be overblown.

Second, there have been persistent rumors that progress in large language models (LLMs) has plateaued recently around capabilities similar to GPT-4. Two major competing models, Anthropic’s Claude Opus and Google’s Gemini 1.5 Pro, are roughly equivalent to the GPT-4 family in capability despite every competitive incentive to surpass it. And recently, when many expected OpenAI to release a new AI model that would clearly surpass GPT-4 Turbo, it instead released GPT-4o, which is roughly equivalent in ability but faster. During that launch, the company relied on a flashy new conversational interface rather than a major under-the-hood upgrade.

We’ve previously reported on a rumor of GPT-5 coming this summer, but with this recent announcement, it seems the rumors may have been referring to GPT-4o instead. It’s quite possible that OpenAI is nowhere near releasing a model that can significantly surpass GPT-4. But with the company quiet on the details, we’ll have to wait and see.

Google’s “AI Overview” can give false, misleading, and dangerous answers

This is fine.

Getty Images

If you use Google regularly, you may have noticed the company’s new AI Overviews providing summarized answers to some of your questions in recent days. If you use social media regularly, you may have come across many examples of those AI Overviews being hilariously or even dangerously wrong.

Factual errors can pop up in existing LLM chatbots as well, of course. But the potential damage that can be caused by AI inaccuracy gets multiplied when those errors appear atop the ultra-valuable web real estate of the Google search results page.

“The examples we’ve seen are generally very uncommon queries and aren’t representative of most people’s experiences,” a Google spokesperson told Ars. “The vast majority of AI Overviews provide high quality information, with links to dig deeper on the web.”

After looking through dozens of examples of Google AI Overview mistakes (and replicating many ourselves for the galleries below), we’ve noticed a few broad categories of errors that seemed to show up again and again. Consider this a crash course in some of the current weak points of Google’s AI Overviews and a look at areas of concern for the company to improve as the system continues to roll out.

Treating jokes as facts

  • The bit about using glue on pizza can be traced back to an 11-year-old troll post on Reddit. (via)

    Kyle Orland / Google

  • This wasn’t funny when the guys at Pep Boys said it, either. (via)

    Kyle Orland / Google

  • Weird Al recommends “running with scissors” as well! (via)

    Kyle Orland / Google

Some of the funniest examples of Google’s AI Overview failing come, ironically enough, when the system doesn’t realize a source online was trying to be funny. An AI answer that suggested using “1/8 cup of non-toxic glue” to stop cheese from sliding off pizza can be traced back to someone who was obviously trying to troll an ongoing thread. A response recommending “blinker fluid” for a turn signal that doesn’t make noise can similarly be traced back to a troll on the Good Sam advice forums, which Google’s AI Overview apparently trusts as a reliable source.

In regular Google searches, these jokey posts from random Internet users probably wouldn’t be among the first answers someone saw when clicking through a list of web links. But with AI Overviews, those trolls were integrated into the authoritative-sounding data summary presented right at the top of the results page.

What’s more, there’s nothing in the tiny “source link” boxes below Google’s AI summary to suggest either of these forum trolls are anything other than good sources of information. Sometimes, though, glancing at the source can save you some grief, such as when you see a response calling running with scissors “cardio exercise that some say is effective” (that came from a 2022 post from Little Old Lady Comedy).

Bad sourcing

  • Washington University in St. Louis says this ratio is accurate, but others disagree. (via)

    Kyle Orland / Google

  • Man, we wish this fantasy remake was real. (via)

    Kyle Orland / Google

Sometimes Google’s AI Overview offers an accurate summary of a non-joke source that happens to be wrong. When asking about how many Declaration of Independence signers owned slaves, for instance, Google’s AI Overview accurately summarizes a Washington University in St. Louis library page saying that one-third “were personally enslavers.” But the response ignores contradictory sources like a Chicago Sun-Times article saying the real answer is closer to three-quarters. I’m not enough of a history expert to judge which authoritative-seeming source is right, but at least one historian online took issue with the Google AI’s answer sourcing.

Other times, a source that Google trusts as authoritative is really just fan fiction. That’s the case for a response that imagined a 2022 remake of 2001: A Space Odyssey, directed by Steven Spielberg and produced by George Lucas. A savvy web user would probably do a double-take before citing Fandom’s “Idea Wiki” as a reliable source, but a careless AI Overview user might not notice where the AI got its information.

EmTech Digital 2024: A thoughtful look at AI’s pros and cons with minimal hype

Massachusetts Institute of Sobriety —

At MIT conference, experts explore AI’s potential for “human flourishing” and the need for regulation.

Nathan Benaich of Air Street Capital delivers the opening presentation on the state of AI at EmTech Digital 2024 on May 22, 2024.

Benj Edwards

CAMBRIDGE, Massachusetts—On Wednesday, AI enthusiasts and experts gathered to hear a series of presentations about the state of AI at EmTech Digital 2024 on the Massachusetts Institute of Technology’s campus. The event was hosted by the publication MIT Technology Review. The overall consensus is that generative AI is still in its very early stages—with policy, regulations, and social norms still being established—and its growth is likely to continue into the future.

I was there to check the event out. MIT is the birthplace of many tech innovations, including the first action-oriented computer video game, so it felt fitting to hear talks about the latest tech craze in the same building that hosts MIT’s Media Lab on its sprawling and lush campus.

EmTech’s speakers included AI researchers, policy experts, critics, and company spokespeople. A corporate feel pervaded the event due to strategic sponsorships, but it was handled in a low-key way that matches the level-headed tech coverage coming out of MIT Technology Review. After each presentation, MIT Technology Review staff—such as Editor-in-Chief Mat Honan and Senior Reporter Melissa Heikkilä—did a brief sit-down interview with the speaker, pushing back on some points and emphasizing others. Then the speaker took a few audience questions if time allowed.

EmTech Digital 2024 took place in building E14 on MIT’s Campus in Cambridge, MA.

Benj Edwards

The conference kicked off with an overview of the state of AI by Nathan Benaich, founder and general partner of Air Street Capital, who rounded up news headlines about AI and several times expressed a favorable view toward defense spending on AI, making a few people visibly shift in their seats. Next up, Asu Ozdaglar, deputy dean of Academics at MIT’s Schwarzman College of Computing, spoke about the potential for “human flourishing” through AI-human symbiosis and the importance of AI regulation.

Kari Ann Briski, VP of AI Models, Software, and Services at Nvidia, highlighted the exponential growth of AI model complexity. She shared a prediction from consulting firm Gartner that by 2026, 50 percent of customer service organizations will have customer-facing AI agents. Of course, Nvidia’s job is to drive demand for its chips, so in her presentation, Briski painted the AI space as an unqualified rosy situation, assuming that all LLMs are (and will be) useful and reliable, despite what we know about their tendencies to make things up.

The conference also addressed the legal and policy aspects of AI. Christabel Randolph from the Center for AI and Digital Policy—an organization that spearheaded a complaint about ChatGPT to the FTC last year—gave a compelling presentation about the need for AI systems to be human-centered and aligned, warning about the potential for anthropomorphic models to manipulate human behavior. She emphasized the importance of demanding accountability from those designing and deploying AI systems.

  • Asu Ozdaglar, deputy dean of Academics at MIT’s Schwarzman College of Computing, spoke about the potential for “human flourishing” through AI-human symbiosis at EmTech Digital on May 22, 2024.

    Benj Edwards

  • Asu Ozdaglar, deputy dean of Academics at MIT’s Schwarzman College of Computing spoke with MIT Technology Review Editor-in-Chief Mat Honan at EmTech Digital on May 22, 2024.

    Benj Edwards

  • Kari Ann Briski, VP of AI Models, Software, and Services at NVIDIA, highlighted the exponential growth of AI model complexity at EmTech Digital on May 22, 2024.

    Benj Edwards

  • MIT Technology Review Senior Reporter Melissa Heikkilä introduces a speaker at EmTech Digital on May 22, 2024.

    Benj Edwards

  • After her presentation, Christabel Randolph from the Center for AI and Digital Policy sat with MIT Technology Review Senior Reporter Melissa Heikkilä at EmTech Digital on May 22, 2024.

    Benj Edwards

  • Lawyer Amir Ghavi provided an overview of the current legal landscape surrounding AI at EmTech Digital on May 22, 2024.

    Benj Edwards

Amir Ghavi, an AI, Tech, Transactions, and IP partner at Fried Frank LLP, who has defended AI companies like Stability AI in court, provided an overview of the current legal landscape surrounding AI, noting that there have been 24 lawsuits related to AI so far in 2024. He predicted that IP lawsuits would eventually diminish, and he claimed that legal scholars believe that using training data constitutes fair use. He also talked about legal precedents with photocopiers and VCRs, which were both technologies demonized by IP holders until courts decided they constituted fair use. He pointed out that the entertainment industry’s loss on the VCR case ended up benefiting it by opening up the VHS and DVD markets, providing a brand new revenue channel that was valuable to those same companies.

In one of the higher-profile discussions, Meta President of Global Affairs Nick Clegg sat down with MIT Technology Review Executive Editor Amy Nordrum to discuss the role of social media in elections and the spread of misinformation, arguing that research suggests social media’s influence on elections is not as significant as many believe. He acknowledged the “whack-a-mole” nature of banning extremist groups on Facebook and emphasized the changes Meta has undergone since 2016, increasing fact-checkers and removing bad actors.

Slack users horrified to discover messages used for AI training

After launching Slack AI in February, Slack appears to be digging its heels in, defending its vague policy that by default sucks up customers’ data—including messages, content, and files—to train Slack’s global AI models.

According to Slack engineer Aaron Maurer, Slack has explained in a blog that the Salesforce-owned chat service does not train its large language models (LLMs) on customer data. But Slack’s policy may need updating “to explain more carefully how these privacy principles play with Slack AI,” Maurer wrote on Threads, partly because the policy “was originally written about the search/recommendation work we’ve been doing for years prior to Slack AI.”

Maurer was responding to a Threads post from engineer and writer Gergely Orosz, who called for companies to opt out of data sharing until the policy is clarified, not by a blog, but in the actual policy language.

“An ML engineer at Slack says they don’t use messages to train LLM models,” Orosz wrote. “My response is that the current terms allow them to do so. I’ll believe this is the policy when it’s in the policy. A blog post is not the privacy policy: every serious company knows this.”

The tension for users becomes clearer if you compare Slack’s privacy principles with how the company touts Slack AI.

Slack’s privacy principles specifically say that “Machine Learning (ML) and Artificial Intelligence (AI) are useful tools that we use in limited ways to enhance our product mission. To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as other information (including usage information) as defined in our privacy policy and in your customer agreement.”

Meanwhile, Slack AI’s page says, “Work without worry. Your data is your data. We don’t use it to train Slack AI.”

Because of this incongruity, users called on Slack to update the privacy principles to make it clear how data is used for Slack AI or any future AI updates. According to a Salesforce spokesperson, the company has agreed an update is needed.

“Yesterday, some Slack community members asked for more clarity regarding our privacy principles,” Salesforce’s spokesperson told Ars. “We’ll be updating those principles today to better explain the relationship between customer data and generative AI in Slack.”

The spokesperson told Ars that the policy updates will clarify that Slack does not “develop LLMs or other generative models using customer data,” “use customer data to train third-party LLMs” or “build or train these models in such a way that they could learn, memorize, or be able to reproduce customer data.” The update will also clarify that “Slack AI uses off-the-shelf LLMs where the models don’t retain customer data,” ensuring that “customer data never leaves Slack’s trust boundary, and the providers of the LLM never have any access to the customer data.”

These changes, however, do not seem to address a key concern for users who never explicitly consented to sharing chats and other Slack content for use in AI training.

Users opting out of sharing chats with Slack

This controversial policy is not new. Wired warned about it in April, and TechCrunch reported that the policy has been in place since at least September 2023.

But widespread backlash began swelling last night on Hacker News, where Slack users called out the chat service for seemingly failing to notify users about the policy change, instead quietly opting them in by default. To critics, it felt like there was no benefit to opting in for anyone but Slack.

From there, the backlash spread to social media, where SlackHQ hastened to clarify Slack’s terms with explanations that did not seem to address all the criticism.

“I’m sorry Slack, you’re doing fucking WHAT with user DMs, messages, files, etc?” Corey Quinn, the chief cloud economist for a cost management company called Duckbill Group, posted on X. “I’m positive I’m not reading this correctly.”

SlackHQ responded to Quinn after the economist declared, “I hate this so much,” and confirmed that he had opted out of data sharing in his paid workspace.

“To clarify, Slack has platform-level machine-learning models for things like channel and emoji recommendations and search results,” SlackHQ posted. “And yes, customers can exclude their data from helping train those (non-generative) ML models. Customer data belongs to the customer.”

Later in the thread, SlackHQ noted, “Slack AI—which is our generative AI experience natively built in Slack—[and] is a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data.”

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

case closed —

Anonymous chatbot that mystified and frustrated experts was OpenAI’s latest model.

Man in morphsuit and girl lying on couch at home using laptop

Getty Images

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as “gpt2-chatbot” that had been undergoing testing on LMSYS’s Chatbot Arena and frustrating experts was, in fact, OpenAI’s newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

“GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot,” Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the best response. It’s a perfect example of vibe-based AI benchmarking, as AI researcher Simon Willison calls it.

An LMSYS Elo chart shared by William Fedus, showing OpenAI’s GPT-4o under the name “im-also-a-good-gpt2-chatbot” topping the charts.

The gpt2-chatbot models appeared in April, and we wrote about how the lack of transparency over the AI testing process on LMSYS left AI experts like Willison frustrated. “The whole situation is so infuriatingly representative of LLM research,” he told Ars at the time. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

On the Arena, OpenAI has been testing multiple versions of GPT-4o, with the model first appearing as the aforementioned “gpt2-chatbot,” then as “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot,” which OpenAI CEO Sam Altman made reference to in a cryptic tweet on May 5.

Since the GPT-4o launch earlier today, multiple sources have revealed that GPT-4o has topped LMSYS’s internal charts by a considerable margin, surpassing the previous top models Claude 3 Opus and GPT-4 Turbo.

“gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena,” wrote the lmsys.org X account while sharing a chart. “This is an internal screenshot,” it wrote. “Its public version ‘gpt-4o’ is now in Arena and will soon appear on the public leaderboard!”

An internal screenshot of the LMSYS Chatbot Arena leaderboard showing “im-also-a-good-gpt2-chatbot” leading the pack. We now know that it’s GPT-4o.

As of this writing, im-also-a-good-gpt2-chatbot held a 1309 Elo versus GPT-4-Turbo-2024-04-09’s 1253 and Claude 3 Opus’ 1246. Claude 3 and GPT-4 Turbo had been duking it out on the charts for some time before the three gpt2-chatbots appeared and shook things up.
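To put those numbers in perspective, here is a minimal Python sketch of the textbook Elo formulas. It assumes the standard 400-point logistic scale and a K-factor of 32. Both are conventions chosen for illustration, not details confirmed by LMSYS, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both ratings after one vote.

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    k (assumed here to be 32) controls how far a single vote moves a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Ratings quoted in the article.
gpt4o, gpt4_turbo = 1309, 1253

# A 56-point gap works out to roughly a 58 percent expected win rate.
print(f"Expected win rate: {expected_score(gpt4o, gpt4_turbo):.2f}")

# One more win for the leader nudges both ratings by the same amount.
print(update(gpt4o, gpt4_turbo, score_a=1.0))
```

Under these assumptions, a gap of about 50 Elo points means the leading model would be expected to win somewhat less than six out of every 10 head-to-head votes: a clear lead, but far from a sweep.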

I’m a good chatbot

For the record, the “I’m a good chatbot” in the gpt2-chatbot test name is a reference to an episode that occurred while a Reddit user named Curious_Evolver was testing an early, “unhinged” version of Bing Chat in February 2023. After an argument about what time Avatar 2 would be showing, the conversation eroded quickly.

“You have lost my trust and respect,” said Bing Chat at the time. “You have been wrong, confused, and rude. You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing. 😊”

Altman referred to this exchange in a tweet three days later after Microsoft “lobotomized” the unruly AI model, saying, “i have been a good bing,” almost as a eulogy to the wild model that dominated the news for a short time.

Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm.

boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta, who led the development of Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. Independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset: when you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that every data point is drawn from the same probability distribution. In the coin-flipping example, the probability of getting heads is 50 percent on every flip, no matter how many flips came before.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots to be as randomly adventurous as possible to get the widest set of experiences to learn from.
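A toy Python example illustrates the correlation problem Berrueta describes. It is not taken from the MaxDiff RL study: the one-dimensional walker, the helper name, and the sample size are all assumptions made for illustration. The steps the walker takes are independent coin flips, but the positions it ends up in are not, because each position is the previous one plus a single step.

```python
import random


def lag1_autocorrelation(xs):
    """Correlation between each value and the value that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


rng = random.Random(0)  # fixed seed so the run is repeatable

# Independent, identically distributed data: fair coin flips coded as -1 / +1.
steps = [rng.choice((-1, 1)) for _ in range(10_000)]

# "Embodied" data: where a walker stands after taking each of those steps.
positions = []
position = 0
for step in steps:
    position += step
    positions.append(position)

flip_corr = lag1_autocorrelation(steps)
walk_corr = lag1_autocorrelation(positions)

print(f"coin flips:       {flip_corr:+.3f}")  # close to zero
print(f"walker positions: {walk_corr:+.3f}")  # close to one
```

The coin flips show almost no correlation from one sample to the next, while the positions are almost perfectly correlated, even though they were generated from the very same flips.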

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition, and disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park in your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of the first tests in simulated environments were quite surprising.

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they followed a standard AI learning process divided into a training phase, where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase, where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressing to where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.

Robot dogs armed with AI-aimed rifles undergo US Marines Special Ops evaluation

The future of warfare —

Quadrupeds being reviewed have automatic targeting systems but require human oversight to fire.

A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries.

The United States Marine Forces Special Operations Command (MARSOC) is currently evaluating a new generation of robotic “dogs” developed by Ghost Robotics, with the potential to be equipped with gun systems from defense tech company Onyx Industries, reports The War Zone.

While MARSOC is testing Ghost Robotics’ quadrupedal unmanned ground vehicles (called “Q-UGVs” for short) for various applications, including reconnaissance and surveillance, it’s the possibility of arming them with weapons for remote engagement that may draw the most attention. But it’s not unprecedented: The US Marine Corps has also tested robotic dogs armed with rocket launchers in the past.

MARSOC currently has two armed Q-UGVs undergoing testing, as confirmed by Onyx Industries staff. Their gun systems are based on Onyx’s SENTRY remote weapon system (RWS), which features an AI-enabled digital imaging system and can automatically detect and track people, drones, or vehicles, reporting potential targets to a remote human operator who could be located anywhere in the world. The system maintains human-in-the-loop control for fire decisions, and it cannot decide to fire autonomously.

On LinkedIn, Onyx Industries shared a video of a similar system in action.

In a statement to The War Zone, MARSOC states that weaponized payloads are just one of many use cases being evaluated. MARSOC also clarifies that comments made by Onyx Industries to The War Zone regarding the capabilities and deployment of these armed robot dogs “should not be construed as a capability or a singular interest in one of many use cases during an evaluation.” The command further stresses that it is aware of and adheres to all Department of Defense policies concerning autonomous weapons.

The rise of robotic unmanned ground vehicles

An unauthorized video of a gun bolted onto a $3,000 Unitree robodog spread quickly on social media in July 2022 and prompted a response from several robotics companies.

Alexander Atamanov

The evaluation of armed robotic dogs reflects a growing interest in small robotic unmanned ground vehicles for military use. While unmanned aerial vehicles (UAVs) have been remotely delivering lethal force under human command for at least two decades, the rise of inexpensive robotic quadrupeds—some available for as little as $1,600—has led to a new round of experimentation with strapping weapons to their backs.

In July 2022, a video of a rifle bolted to the back of a Unitree robodog went viral on social media, eventually leading Boston Dynamics and other robot vendors to issue a pledge that October to not weaponize their robots (with notable exceptions for military uses). In April, we covered a Unitree Go2 robot dog, with a flamethrower strapped on its back, on sale to the general public.

The prospect of deploying armed robotic dogs, even with human oversight, raises significant questions about the future of warfare and the potential risks and ethical implications of increasingly autonomous weapons systems. There’s also the potential for backlash if similar remote weapons systems eventually end up used domestically by police. Such a concern would not be unfounded: In November 2022, we covered a decision by the San Francisco Board of Supervisors to allow the San Francisco Police Department to use lethal robots against suspects.

There’s also concern that the systems will become more autonomous over time. As The War Zone’s Howard Altman and Oliver Parken describe in their article, “While further details on MARSOC’s use of the gun-armed robot dogs remain limited, the fielding of this type of capability is likely inevitable at this point. As AI-enabled drone autonomy becomes increasingly weaponized, just how long a human will stay in the loop, even for kinetic acts, is increasingly debatable, regardless of assurances from some in the military and industry.”

While the technology is still in the early stages of testing and evaluation, Q-UGVs do have the potential to provide reconnaissance and security capabilities that reduce risks to human personnel in hazardous environments. But as armed robotic systems continue to evolve, it will be crucial to address ethical concerns and ensure that their use aligns with established policies and international law.

Microsoft launches AI chatbot for spies

Adventures in consequential confabulation —

Air-gapping GPT-4 model on secure network won’t prevent it from potentially making things up.

A person using a computer with a computer screen reflected in their glasses.

Microsoft has introduced a GPT-4-based generative AI model designed specifically for US intelligence agencies that operates disconnected from the Internet, according to a Bloomberg report. This reportedly marks the first time Microsoft has deployed a major language model in a secure setting, designed to allow spy agencies to analyze top-secret information without connectivity risks—and to allow secure conversations with a chatbot similar to ChatGPT and Microsoft Copilot. But it may also mislead officials if not used properly due to inherent design limitations of AI language models.

GPT-4 is a large language model (LLM) created by OpenAI that attempts to predict the most likely tokens (fragments of encoded data) in a sequence. It can be used to craft computer code and analyze information. When configured as a chatbot (like ChatGPT), GPT-4 can power AI assistants that converse in a human-like manner. Microsoft has a license to use the technology as part of a deal in exchange for large investments it has made in OpenAI.

According to the report, the new AI service (which does not yet publicly have a name) addresses a growing interest among intelligence agencies in using generative AI for processing classified data while mitigating the risk of data breaches or hacking attempts. ChatGPT normally runs on cloud servers provided by Microsoft, which can introduce data leak and interception risks. Along those lines, the CIA announced its plan to create a ChatGPT-like service last year, but this Microsoft effort is reportedly a separate project.

William Chappell, Microsoft’s chief technology officer for strategic missions and technology, noted to Bloomberg that developing the new system involved 18 months of work to modify an AI supercomputer in Iowa. The modified GPT-4 model is designed to read files provided by its users but cannot access the open Internet. “This is the first time we’ve ever had an isolated version—when isolated means it’s not connected to the Internet—and it’s on a special network that’s only accessible by the US government,” Chappell told Bloomberg.

The new service was activated on Thursday and is now available to about 10,000 individuals in the intelligence community, ready for further testing by relevant agencies. It’s currently “answering questions,” according to Chappell.

One serious drawback of using GPT-4 to analyze important data is that it can potentially confabulate (make up) inaccurate summaries, draw inaccurate conclusions, or provide inaccurate information to its users. Since trained AI neural networks are not databases and operate on statistical probabilities, they make poor factual resources unless augmented with external access to information from another source using a technique such as retrieval-augmented generation (RAG).
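For readers unfamiliar with RAG, the basic idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a description of how Microsoft's system works: the function names are hypothetical, the "retrieval" is plain word overlap rather than the vector search real systems typically use, and the final step of sending the prompt to a language model is left out.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Put the retrieved text in front of the question for the model to use."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# A tiny stand-in for an external document store.
documents = [
    "The Arecibo message was transmitted toward the Messier 13 cluster in 1974.",
    "The Voyager Golden Records were launched aboard two spacecraft in 1977.",
    "Chatbot Arena ranks language models using pairwise human votes.",
]

question = "When was the Arecibo message transmitted?"
prompt = build_prompt(question, documents)
print(prompt)
```

The point of the design is that the model is asked to answer from retrieved text it can quote rather than from its own statistical recall. That reduces the room for confabulation but does not eliminate it.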

Given that limitation, GPT-4 could misinform or mislead America’s intelligence agencies if not used properly. We don’t know what oversight the system will have, any limitations on how it can or will be used, or how it can be audited for accuracy. We have reached out to Microsoft for comment.

AI in space: Karpathy suggests AI chatbots as interstellar messengers to alien civilizations

The new golden record —

Andrej Karpathy muses about sending an LLM binary that could “wake up” and answer questions.

Close shot of Cosmonaut astronaut dressed in a gold jumpsuit and helmet, illuminated by blue and red lights, holding a laptop, looking up.

On Thursday, renowned AI researcher Andrej Karpathy, formerly of OpenAI and Tesla, tweeted a lighthearted proposal that large language models (LLMs) like the one that runs ChatGPT could one day be modified to operate in or be transmitted to space, potentially to communicate with extraterrestrial life. He said the idea was “just for fun,” but with his influential profile in the field, the idea may inspire others in the future.

Karpathy’s bona fides in AI almost speak for themselves. He received a PhD from Stanford under computer scientist Dr. Fei-Fei Li in 2015, became one of the founding members of OpenAI as a research scientist, and then served as senior director of AI at Tesla between 2017 and 2022. In 2023, Karpathy rejoined OpenAI for a year, leaving this past February. He’s posted several highly regarded tutorials covering AI concepts on YouTube, and whenever he talks about AI, people listen.

Most recently, Karpathy has been working on a project called “llm.c” that implements the training process for OpenAI’s 2019 GPT-2 LLM in pure C, dramatically speeding up the process and demonstrating that working with LLMs doesn’t necessarily require complex development environments. The project’s streamlined approach and concise codebase sparked Karpathy’s imagination.

“My library llm.c is written in pure C, a very well-known, low-level systems language where you have direct control over the program,” Karpathy told Ars. “This is in contrast to typical deep learning libraries for training these models, which are written in large, complex code bases. So it is an advantage of llm.c that it is very small and simple, and hence much easier to certify as Space-safe.”

Our AI ambassador

In his playful thought experiment (titled “Clearly LLMs must one day run in Space”), Karpathy suggested a two-step plan where, initially, the code for LLMs would be adapted to meet rigorous safety standards, akin to “The Power of 10 Rules” adopted by NASA for space-bound software.

This first part he deemed serious: “We harden llm.c to pass the NASA code standards and style guides, certifying that the code is super safe, safe enough to run in Space,” he wrote in his X post. “LLM training/inference in principle should be super safe – it is just one fixed array of floats, and a single, bounded, well-defined loop of dynamics over it. There is no need for memory to grow or shrink in undefined ways, for recursion, or anything like that.”

That’s important because when software is sent into space, it must operate under strict safety and reliability standards. Karpathy suggests that his code, llm.c, likely meets these requirements because it is designed with simplicity and predictability at its core.

In step 2, once this LLM was deemed safe for space conditions, it could theoretically be used as our AI ambassador in space, similar to historic initiatives like the Arecibo message (a radio message sent from Earth to the Messier 13 globular cluster in 1974) and Voyager’s Golden Record (two identical gold records sent on the two Voyager spacecraft in 1977). The idea is to package the “weights” of an LLM—essentially the model’s learned parameters—into a binary file that could then “wake up” and interact with any potential alien technology that might decipher it.

“I envision it as a sci-fi possibility and something interesting to think about,” he told Ars. “The idea that it is not us that might travel to stars but our AI representatives. Or that the same could be true of other species.”

Anthropic releases Claude AI chatbot iOS app

AI in your pocket —

Anthropic finally comes to mobile, launches plan for teams that includes 200K context window.

The Claude AI iOS app running on an iPhone.

Anthropic

On Wednesday, Anthropic announced the launch of an iOS mobile app for its Claude 3 AI language models, which are similar to OpenAI’s ChatGPT. It also introduced a new subscription tier designed for group collaboration. Before the app launch, Claude was only available through a website, an API, and other apps that integrated Claude through the API.

Like the ChatGPT app, Claude’s new mobile app serves as a gateway to chatbot interactions, and it also allows uploading photos for analysis. While it’s only available on Apple devices for now, Anthropic says that an Android app is coming soon.

Anthropic rolled out the Claude 3 large language model (LLM) family in March, featuring three different model sizes: Claude Opus, Claude Sonnet, and Claude Haiku. Currently, the app utilizes Sonnet for regular users and Opus for Pro users.

While Anthropic has been a key player in the AI field for several years, it’s entering the mobile space after many of its competitors have already established footprints on mobile platforms. OpenAI released its ChatGPT app for iOS in May 2023, with an Android version arriving two months later. Microsoft released a Copilot iOS app in January. Google Gemini is available through the Google app on iPhone.

Screenshots of the Claude AI iOS app running on an iPhone.

Anthropic

The app is freely available to all users of Claude, including those using the free version, subscribers paying $20 per month for Claude Pro, and members of the newly introduced Claude Team plan. Conversation history is saved and shared between the web app version of Claude and the mobile app version after logging in.

Speaking of that Team plan, it’s designed for groups of at least five and is priced at $30 per seat per month. It offers more chat queries (higher rate limits), access to all three Claude models, and a larger context window (200K tokens) for processing lengthy documents or maintaining detailed conversations. It also includes group admin tools and billing management, and users can easily switch between Pro and Team plans.
