machine learning

chatgpt-comes-to-500,000-new-users-in-openai’s-largest-ai-education-deal-yet

ChatGPT comes to 500,000 new users in OpenAI’s largest AI education deal yet

On Tuesday, OpenAI announced plans to introduce ChatGPT to California State University’s 460,000 students and 63,000 faculty members across 23 campuses, reports Reuters. The education-focused version of the AI assistant will aim to provide students with personalized tutoring and study guides, while faculty will be able to use it for administrative work.

“It is critical that the entire education ecosystem—institutions, systems, technologists, educators, and governments—work together to ensure that all students have access to AI and gain the skills to use it responsibly,” said Leah Belsky, VP and general manager of education at OpenAI, in a statement.

OpenAI began integrating ChatGPT into educational settings in 2023, despite early concerns from some schools about plagiarism and potential cheating, leading to early bans in some US school districts and universities. But over time, resistance to AI assistants softened in some educational institutions.

Prior to OpenAI’s launch of ChatGPT Edu in May 2024—a version purpose-built for academic use—several schools had already been using ChatGPT Enterprise, including the University of Pennsylvania’s Wharton School (employer of frequent AI commentator Ethan Mollick), the University of Texas at Austin, and the University of Oxford.

Currently, the new California State partnership represents OpenAI’s largest deployment yet in US higher education.

The higher education market has become competitive for AI model makers, as Reuters notes. Last November, Google’s DeepMind division partnered with a London university to provide AI education and mentorship to teenage students. And in January, Google invested $120 million in AI education programs and plans to introduce its Gemini model to students’ school accounts.

The pros and cons

In the past, we’ve written frequently about accuracy issues with AI chatbots, such as producing confabulations—plausible fictions—that might lead students astray. We’ve also covered the aforementioned concerns about cheating. Those issues remain, and relying on ChatGPT as a factual reference is still not the best idea because the service could introduce errors into academic work that might be difficult to detect.

ChatGPT comes to 500,000 new users in OpenAI’s largest AI education deal yet Read More »

hugging-face-clones-openai’s-deep-research-in-24-hours

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December—before OpenAI), Hugging Face’s solution adds an “agent” framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building the report as it goes along that it presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

Hugging Face clones OpenAI’s Deep Research in 24 hours Read More »

microsoft-now-hosts-ai-model-accused-of-copying-openai-data

Microsoft now hosts AI model accused of copying OpenAI data

Fresh on the heels of a controversy in which ChatGPT-maker OpenAI accused the Chinese company behind DeepSeek R1 of using its AI model outputs against its terms of service, OpenAI’s largest investor, Microsoft, announced on Wednesday that it will now host DeepSeek R1 on its Azure cloud service.

DeepSeek R1 has been the talk of the AI world for the past week because it is a freely available simulated reasoning model that reportedly matches OpenAI’s o1 in performance—while allegedly being trained for a fraction of the cost.

Azure allows software developers to rent computing muscle from machines hosted in Microsoft-owned data centers, as well as rent access to software that runs on them.

“R1 offers a powerful, cost-efficient model that allows more users to harness state-of-the-art AI capabilities with minimal infrastructure investment,” wrote Microsoft Corporate Vice President Asha Sharma in a news release.

DeepSeek R1 runs at a fraction of the cost of o1, at least through each company’s own services. Comparative prices for R1 and o1 were not immediately available on Azure, but DeepSeek lists R1’s API cost as $2.19 per million output tokens, while OpenAI’s o1 costs $60 per million output tokens. That’s a massive discount for a model that performs similarly to o1-pro in various tasks.

Promoting a controversial AI model

On its face, the decision to host R1 on Microsoft servers is not unusual: The company offers access to over 1,800 models on its Azure AI Foundry service with the hopes of allowing software developers to experiment with various AI models and integrate them into their products. In some ways, whatever model they choose, Microsoft still wins because it’s being hosted on the company’s cloud service.

Microsoft now hosts AI model accused of copying OpenAI data Read More »

anthropic-builds-rag-directly-into-claude-models-with-new-citations-api

Anthropic builds RAG directly into Claude models with new Citations API

Willison notes that while citing sources helps verify accuracy, building a system that does it well “can be quite tricky,” but Citations appears to be a step in the right direction by building RAG capability directly into the model.

Apparently, that capability is not a new thing. Anthropic’s Alex Albert wrote on X, “Under the hood, Claude is trained to cite sources. With Citations, we are exposing this ability to devs. To use Citations, users can pass a new “citations: enabled:true” parameter on any document type they send through the API.”

Early adopter reports promising results

The company released Citations for Claude 3.5 Sonnet and Claude 3.5 Haiku models through both the Anthropic API and Google Cloud’s Vertex AI platform, but it’s apparently already getting some use in the field.

Anthropic says that Thomson Reuters, which uses Claude to power its CoCounsel legal AI reference platform, is looking forward to using Citations in a way that helps “minimize hallucination risk but also strengthens trust in AI-generated content.”

Additionally, financial technology company Endex told Anthropic that Citations reduced their source confabulations from 10 percent to zero while increasing references per response by 20 percent, according to CEO Tarun Amasa.

Despite these claims, relying on any LLM to accurately relay reference information is still a risk until the technology is more deeply studied and proven in the field.

Anthropic will charge users its standard token-based pricing, though quoted text in responses won’t count toward output token costs. Sourcing a 100-page document as a reference would cost approximately $0.30 with Claude 3.5 Sonnet or $0.08 with Claude 3.5 Haiku, according to Anthropic’s standard API pricing.

Anthropic builds RAG directly into Claude models with new Citations API Read More »

openai-launches-operator,-an-ai-agent-that-can-operate-your-computer

OpenAI launches Operator, an AI agent that can operate your computer

While it’s working, Operator shows a miniature browser window of its actions.

However, the technology behind Operator is still relatively new and far from perfect. The model reportedly performs best at repetitive web tasks like creating shopping lists or playlists. It struggles more with unfamiliar interfaces like tables and calendars, and does poorly with complex text editing (with a 40 percent success rate), according to OpenAI’s internal testing data.

OpenAI reported the system achieved an 87 percent success rate on the WebVoyager benchmark, which tests live sites like Amazon and Google Maps. On WebArena, which uses offline test sites for training autonomous agents, Operator’s success rate dropped to 58.1 percent. For computer operating system tasks, CUA set an apparent record of 38.1 percent success on the OSWorld benchmark, surpassing previous models but still falling short of human performance at 72.4 percent.

With this imperfect research preview, OpenAI hopes to gather user feedback and refine the system’s capabilities. The company acknowledges CUA won’t perform reliably in all scenarios but plans to improve its reliability across a wider range of tasks through user testing.

Safety and privacy concerns

For any AI model that can see how you operate your computer and even control some aspects of it, privacy and safety are very important. OpenAI says it built multiple safety controls into Operator, requiring user confirmation before completing sensitive actions like sending emails or making purchases. Operator also has limits on what it can browse, set by OpenAI. It cannot access certain website categories, including gambling and adult content.

Traditionally, AI models based on large language model-style Transformer technology like Operator have been relatively easy to fool with jailbreaks and prompt injections.

To catch attempts at subverting Operator, which might hypothetically be embedded in websites that the AI model browses, OpenAI says it has implemented real-time moderation and detection systems. OpenAI reports the system recognized all but one case of prompt injection attempts during an early internal red-teaming session.

OpenAI launches Operator, an AI agent that can operate your computer Read More »

anthropic-chief-says-ai-could-surpass-“almost-all-humans-at-almost-everything”-shortly-after-2027

Anthropic chief says AI could surpass “almost all humans at almost everything” shortly after 2027

He then shared his concerns about how human-level AI models and robotics that are capable of replacing all human labor may require a complete re-think of how humans value both labor and themselves.

“We’ve recognized that we’ve reached the point as a technological civilization where the idea, there’s huge abundance and huge economic value, but the idea that the way to distribute that value is for humans to produce economic labor, and this is where they feel their sense of self worth,” he added. “Once that idea gets invalidated, we’re all going to have to sit down and figure it out.”

The eye-catching comments, similar to comments about AGI made recently by OpenAI CEO Sam Altman, come as Anthropic negotiates a $2 billion funding round that would value the company at $60 billion. Amodei disclosed that Anthropic’s revenue multiplied tenfold in 2024.

Amodei distances himself from “AGI” term

Even with his dramatic predictions, Amodei distanced himself from a term for this advanced labor-replacing AI favored by Altman, “artificial general intelligence” (AGI), calling it in a separate CNBC interview from the same event in Switzerland a marketing term.

Instead, he prefers to describe future AI systems as a “country of geniuses in a data center,” he told CNBC. Amodei wrote in an October 2024 essay that such systems would need to be “smarter than a Nobel Prize winner across most relevant fields.”

On Monday, Google announced an additional $1 billion investment in Anthropic, bringing its total commitment to $3 billion. This follows Amazon’s $8 billion investment over the past 18 months. Amazon plans to integrate Claude models into future versions of its Alexa speaker.

Anthropic chief says AI could surpass “almost all humans at almost everything” shortly after 2027 Read More »

trump-announces-$500b-“stargate”-ai-infrastructure-project-with-agi-aims

Trump announces $500B “Stargate” AI infrastructure project with AGI aims

Video of the Stargate announcement conference at the White House.

Despite optimism from the companies involved, as CNN reports, past presidential investment announcements have yielded mixed results. In 2017, Trump and Foxconn unveiled plans for a $10 billion Wisconsin electronics factory promising 13,000 jobs. The project later scaled back to a $672 million investment with fewer than 1,500 positions. The facility now operates as a Microsoft AI data center.

The Stargate announcement wasn’t Trump’s only major AI move announced this week. It follows the newly inaugurated US president’s reversal of a 2023 Biden executive order on AI risk monitoring and regulation.

Altman speaks, Musk responds

On Tuesday, OpenAI CEO Sam Altman appeared at a White House press conference alongside Present Trump, Oracle CEO Larry Ellison, and SoftBank CEO Masayoshi Son to announce Stargate.

Altman said he thinks Stargate represents “the most important project of this era,” allowing AGI to emerge in the United States. He believes that future AI technology could create hundreds of thousands of jobs. “We wouldn’t be able to do this without you, Mr. President,” Altman added.

Responding to off-camera questions from Trump about AI’s potential to spur scientific development, Altman said he believes AI will accelerate the discoveries for cures of diseases like cancer and heart disease.

Screenshots of Elon Musk challenging the Stargate announcement on X.

Screenshots of Elon Musk challenging the Stargate announcement on X.

Meanwhile on X, Trump ally and frequent Altman foe Elon Musk immediately attacked the Stargate plan, writing, “They don’t actually have the money,” and following up with a claim that we cannot yet substantiate, saying, “SoftBank has well under $10B secured. I have that on good authority.”

Musk’s criticism has complex implications given his very close ties to Trump, his history of litigating against OpenAI (which he co-founded and later left), and his own goals with his xAI company.

Trump announces $500B “Stargate” AI infrastructure project with AGI aims Read More »

cutting-edge-chinese-“reasoning”-model-rivals-openai-o1—and-it’s-free-to-download

Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download

Unlike conventional LLMs, these SR models take extra time to produce responses, and this extra time often increases performance on tasks involving math, physics, and science. And this latest open model is turning heads for apparently quickly catching up to OpenAI.

For example, DeepSeek reports that R1 outperformed OpenAI’s o1 on several benchmarks and tests, including AIME (a mathematical reasoning test), MATH-500 (a collection of word problems), and SWE-bench Verified (a programming assessment tool). As we usually mention, AI benchmarks need to be taken with a grain of salt, and these results have yet to be independently verified.

A chart of DeepSeek R1 benchmark results, created by DeepSeek.

A chart of DeepSeek R1 benchmark results, created by DeepSeek. Credit: DeepSeek

TechCrunch reports that three Chinese labs—DeepSeek, Alibaba, and Moonshot AI’s Kimi—have now released models they say match o1’s capabilities, with DeepSeek first previewing R1 in November.

But the new DeepSeek model comes with a catch if run in the cloud-hosted version—being Chinese in origin, R1 will not generate responses about certain topics like Tiananmen Square or Taiwan’s autonomy, as it must “embody core socialist values,” according to Chinese Internet regulations. This filtering comes from an additional moderation layer that isn’t an issue if the model is run locally outside of China.

Even with the potential censorship, Dean Ball, an AI researcher at George Mason University, wrote on X, “The impressive performance of DeepSeek’s distilled models (smaller versions of r1) means that very capable reasoners will continue to proliferate widely and be runnable on local hardware, far from the eyes of any top-down control regime.”

Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download Read More »

amid-a-flurry-of-hype,-microsoft-reorganizes-entire-dev-team-around-ai

Amid a flurry of hype, Microsoft reorganizes entire dev team around AI

Microsoft CEO Satya Nadella has announced a dramatic restructuring of the company’s engineering organization, which is pivoting the company’s focus to developing the tools that will underpin agentic AI.

Dubbed “CoreAI – Platform and Tools,” the new division rolls the existing AI platform team and the previous developer division (responsible for everything from .NET to Visual Studio) along with some other teams into one big group.

As for what this group will be doing specifically, it’s basically everything that’s mission-critical to Microsoft in 2025, as Nadella tells it:

This new division will bring together Dev Div, AI Platform, and some key teams from the Office of the CTO (AI Supercomputer, AI Agentic Runtimes, and Engineering Thrive), with the mission to build the end-to-end Copilot & AI stack for both our first-party and third-party customers to build and run AI apps and agents. This group will also build out GitHub Copilot, thus having a tight feedback loop between the leading AI-first product and the AI platform to motivate the stack and its roadmap.

To accomplish all that, “Jay Parikh will lead this group as EVP.” Parikh was hired by Microsoft in October; he previously worked as the VP and global head of engineering at Meta.

The fact that the blog post doesn’t say anything about .NET or Visual Studio, instead emphasizing GitHub Copilot and anything and everything related to agentic AI, says a lot about how Nadella sees Microsoft’s future priorities.

So-called AI agents are applications that are given specified boundaries (action spaces) and a large memory capacity to independently do subsets of the kinds of work that human office workers do today. Some company leaders and AI commentators believe these agents will outright replace jobs, while others are more conservative, suggesting they’ll simply be powerful tools to streamline the jobs people already have.

Amid a flurry of hype, Microsoft reorganizes entire dev team around AI Read More »

161-years-ago,-a-new-zealand-sheep-farmer-predicted-ai-doom

161 years ago, a New Zealand sheep farmer predicted AI doom

The text anticipated several modern AI safety concerns, including the possibility of machine consciousness, self-replication, and humans losing control of their technological creations. These themes later appeared in works like Isaac Asimov’s The Evitable Conflict, Frank Herbert’s Dune novels (Butler possibly served as the inspiration for the term “Butlerian Jihad“), and the Matrix films.

A model of Charles Babbage's Analytical Engine, a calculating machine invented in 1837 but never built during Babbage's lifetime.

A model of Charles Babbage’s Analytical Engine, a calculating machine invented in 1837 but never built during Babbage’s lifetime. Credit: DE AGOSTINI PICTURE LIBRARY via Getty Images

Butler’s letter dug deep into the taxonomy of machine evolution, discussing mechanical “genera and sub-genera” and pointing to examples like how watches had evolved from “cumbrous clocks of the thirteenth century”—suggesting that, like some early vertebrates, mechanical species might get smaller as they became more sophisticated. He expanded these ideas in his 1872 novel Erewhon, which depicted a society that had banned most mechanical inventions. In his fictional society, citizens destroyed all machines invented within the previous 300 years.

Butler’s concerns about machine evolution received mixed reactions, according to Butler in the preface to the second edition of Erewhon. Some reviewers, he said, interpreted his work as an attempt to satirize Darwin’s evolutionary theory, though Butler denied this. In a letter to Darwin in 1865, Butler expressed his deep appreciation for The Origin of Species, writing that it “thoroughly fascinated” him and explained that he had defended Darwin’s theory against critics in New Zealand’s press.

What makes Butler’s vision particularly remarkable is that he was writing in a vastly different technological context when computing devices barely existed. While Charles Babbage had proposed his theoretical Analytical Engine in 1837—a mechanical computer using gears and levers that was never built in his lifetime—the most advanced calculating devices of 1863 were little more than mechanical calculators and slide rules.

Butler extrapolated from the simple machines of the Industrial Revolution, where mechanical automation was transforming manufacturing, but nothing resembling modern computers existed. The first working program-controlled computer wouldn’t appear for another 70 years, making his predictions of machine intelligence strikingly prescient.

Some things never change

The debate Butler started continues today. Two years ago, the world grappled with what one might call the “great AI takeover scare of 2023.” OpenAI’s GPT-4 had just been released, and researchers evaluated its “power-seeking behavior,” echoing concerns about potential self-replication and autonomous decision-making.

161 years ago, a New Zealand sheep farmer predicted AI doom Read More »

ai-could-create-78-million-more-jobs-than-it-eliminates-by-2030—report

AI could create 78 million more jobs than it eliminates by 2030—report

On Wednesday, the World Economic Forum (WEF) released its Future of Jobs Report 2025, with CNN immediately highlighting the finding that 40 percent of companies plan workforce reductions due to AI automation. But the report’s broader analysis paints a far more nuanced picture than CNN’s headline suggests: It finds that AI could create 170 million new jobs globally while eliminating 92 million positions, resulting in a net increase of 78 million jobs by 2030.

“Half of employers plan to re-orient their business in response to AI,” writes the WEF in the report. “Two-thirds plan to hire talent with specific AI skills, while 40% anticipate reducing their workforce where AI can automate tasks.”

The survey collected data from 1,000 companies that employ 14 million workers globally. The WEF conducts its employment analysis every two years to help policymakers, business leaders, and workers make decisions about hiring trends.

The new report points to specific skills that will dominate hiring by 2030. Companies ranked AI and big data expertise, networks and cybersecurity, and technological literacy as the three most in-demand skill sets.

The WEF identified AI as the biggest potential job creator among new technologies, with 86 percent of companies expecting AI to transform their operations by 2030.

Declining job categories

The WEF report also identifies specific job categories facing decline. Postal service clerks, executive secretaries, and payroll staff top the list of shrinking roles, with changes driven by factors including (but not limited to) AI adoption. And for the first time, graphic designers and legal secretaries appear among the fastest-declining positions, which the WEF tentatively links to generative AI’s expanding capabilities in creative and administrative work.

AI could create 78 million more jobs than it eliminates by 2030—report Read More »

2024:-the-year-ai-drove-everyone-crazy

2024: The year AI drove everyone crazy


What do eating rocks, rat genitals, and Willy Wonka have in common? AI, of course.

It’s been a wild year in tech thanks to the intersection between humans and artificial intelligence. 2024 brought a parade of AI oddities, mishaps, and wacky moments that inspired odd behavior from both machines and man. From AI-generated rat genitals to search engines telling people to eat rocks, this year proved that AI has been having a weird impact on the world.

Why the weirdness? If we had to guess, it may be due to the novelty of it all. Generative AI and applications built upon Transformer-based AI models are still so new that people are throwing everything at the wall to see what sticks. People have been struggling to grasp both the implications and potential applications of the new technology. Riding along with the hype, different types of AI that may end up being ill-advised, such as automated military targeting systems, have also been introduced.

It’s worth mentioning that aside from crazy news, we saw fewer weird AI advances in 2024 as well. For example, Claude 3.5 Sonnet launched in June held off the competition as a top model for most of the year, while OpenAI’s o1 used runtime compute to expand GPT-4o’s capabilities with simulated reasoning. Advanced Voice Mode and NotebookLM also emerged as novel applications of AI tech, and the year saw the rise of more capable music synthesis models and also better AI video generators, including several from China.

But for now, let’s get down to the weirdness.

ChatGPT goes insane

Illustration of a broken toy robot.

Early in the year, things got off to an exciting start when OpenAI’s ChatGPT experienced a significant technical malfunction that caused the AI model to generate increasingly incoherent responses, prompting users on Reddit to describe the system as “having a stroke” or “going insane.” During the glitch, ChatGPT’s responses would begin normally but then deteriorate into nonsensical text, sometimes mimicking Shakespearean language.

OpenAI later revealed that a bug in how the model processed language caused it to select the wrong words during text generation, leading to nonsense outputs (basically the text version of what we at Ars now call “jabberwockies“). The company fixed the issue within 24 hours, but the incident led to frustrations about the black box nature of commercial AI systems and users’ tendency to anthropomorphize AI behavior when it malfunctions.

The great Wonka incident

A photo of the Willy's Chocolate Experience, which did not match AI-generated promises.

A photo of “Willy’s Chocolate Experience” (inset), which did not match AI-generated promises, shown in the background. Credit: Stuart Sinclair

The collision between AI-generated imagery and consumer expectations fueled human frustrations in February when Scottish families discovered that “Willy’s Chocolate Experience,” an unlicensed Wonka-ripoff event promoted using AI-generated wonderland images, turned out to be little more than a sparse warehouse with a few modest decorations.

Parents who paid £35 per ticket encountered a situation so dire they called the police, with children reportedly crying at the sight of a person in what attendees described as a “terrifying outfit.” The event, created by House of Illuminati in Glasgow, promised fantastical spaces like an “Enchanted Garden” and “Twilight Tunnel” but delivered an underwhelming experience that forced organizers to shut down mid-way through its first day and issue refunds.

While the show was a bust, it brought us an iconic new meme for job disillusionment in the form of a photo: the green-haired Willy’s Chocolate Experience employee who looked like she’d rather be anywhere else on earth at that moment.

Mutant rat genitals expose peer review flaws

An actual laboratory rat, who is intrigued. Credit: Getty | Photothek

In February, Ars Technica senior health reporter Beth Mole covered a peer-reviewed paper published in Frontiers in Cell and Developmental Biology that created an uproar in the scientific community when researchers discovered it contained nonsensical AI-generated images, including an anatomically incorrect rat with oversized genitals. The paper, authored by scientists at Xi’an Honghui Hospital in China, openly acknowledged using Midjourney to create figures that contained gibberish text labels like “Stemm cells” and “iollotte sserotgomar.”

The publisher, Frontiers, posted an expression of concern about the article titled “Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway” and launched an investigation into how the obviously flawed imagery passed through peer review. Scientists across social media platforms expressed dismay at the incident, which mirrored concerns about AI-generated content infiltrating academic publishing.

Chatbot makes erroneous refund promises for Air Canada

If, say, ChatGPT gives you the wrong name for one of the seven dwarves, it’s not such a big deal. But in February, Ars senior policy reporter Ashley Belanger covered a case of costly AI confabulation in the wild. In the course of online text conversations, Air Canada’s customer service chatbot told customers inaccurate refund policy information. The airline faced legal consequences later when a tribunal ruled the airline must honor commitments made by the automated system. Tribunal adjudicator Christopher Rivers determined that Air Canada bore responsibility for all information on its website, regardless of whether it came from a static page or AI interface.

The case set a precedent for how companies deploying AI customer service tools could face legal obligations for automated systems’ responses, particularly when they fail to warn users about potential inaccuracies. Ironically, the airline had reportedly spent more on the initial AI implementation than it would have cost to maintain human workers for simple queries, according to Air Canada executive Steve Crocker.

Will Smith lampoons his digital double

The real Will Smith eating spaghetti, parodying an AI-generated video from 2023.

The real Will Smith eating spaghetti, parodying an AI-generated video from 2023. Credit: Will Smith / Getty Images / Benj Edwards

In March 2023, a terrible AI-generated video of Will Smith’s AI doppelganger eating spaghetti began making the rounds online. The AI-generated version of the actor gobbled down the noodles in an unnatural and disturbing way. Almost a year later, in February 2024, Will Smith himself posted a parody response video to the viral jabberwocky on Instagram, featuring AI-like deliberately exaggerated pasta consumption, complete with hair-nibbling and finger-slurping antics.

Given the rapid evolution of AI video technology, particularly since OpenAI had just unveiled its Sora video model four days earlier, Smith’s post sparked discussion in his Instagram comments where some viewers initially struggled to distinguish between the genuine footage and AI generation. It was an early sign of “deep doubt” in action as the tech increasingly blurs the line between synthetic and authentic video content.

Robot dogs learn to hunt people with AI-guided rifles

A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries.

A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries. Credit: Onyx Industries

At some point in recent history—somewhere around 2022—someone took a look at robotic quadrupeds and thought it would be a great idea to attach guns to them. A few years later, the US Marine Forces Special Operations Command (MARSOC) began evaluating armed robotic quadrupeds developed by Ghost Robotics. The robot “dogs” integrated Onyx Industries’ SENTRY remote weapon systems, which featured AI-enabled targeting that could detect and track people, drones, and vehicles, though the systems require human operators to authorize any weapons discharge.

The military’s interest in armed robotic dogs followed a broader trend of weaponized quadrupeds entering public awareness. This included viral videos of consumer robots carrying firearms, and later, commercial sales of flame-throwing models. While MARSOC emphasized that weapons were just one potential use case under review, experts noted that the increasing integration of AI into military robotics raised questions about how long humans would remain in control of lethal force decisions.

Microsoft Windows AI is watching

A screenshot of Microsoft's new

A screenshot of Microsoft’s new “Recall” feature in action. Credit: Microsoft

In an era where many people already feel like they have no privacy due to tech encroachments, Microsoft dialed it up to an extreme degree in May. That’s when Microsoft unveiled a controversial Windows 11 feature called “Recall” that continuously captures screenshots of users’ PC activities every few seconds for later AI-powered search and retrieval. The feature, designed for new Copilot+ PCs using Qualcomm’s Snapdragon X Elite chips, promised to help users find past activities, including app usage, meeting content, and web browsing history.

While Microsoft emphasized that Recall would store encrypted snapshots locally and allow users to exclude specific apps or websites, the announcement raised immediate privacy concerns, as Ars senior technology reporter Andrew Cunningham covered. It also came with a technical toll, requiring significant hardware resources, including 256GB of storage space, with 25GB dedicated to storing approximately three months of user activity. After Microsoft pulled the initial test version due to public backlash, Recall later entered public preview in November with reportedly enhanced security measures. But secure spyware is still spyware—Recall, when enabled, still watches nearly everything you do on your computer and keeps a record of it.

Google Search told people to eat rocks

This is fine. Credit: Getty Images

In May, Ars senior gaming reporter Kyle Orland (who assisted commendably with the AI beat throughout the year) covered Google’s newly launched AI Overview feature. It faced immediate criticism when users discovered that it frequently provided false and potentially dangerous information in its search result summaries. Among its most alarming responses, the system advised humans could safely consume rocks, incorrectly citing scientific sources about the geological diet of marine organisms. The system’s other errors included recommending nonexistent car maintenance products, suggesting unsafe food preparation techniques, and confusing historical figures who shared names.

The problems stemmed from several issues, including the AI treating joke posts as factual sources and misinterpreting context from original web content. But most of all, the system relies on web results as indicators of authority, which we called a flawed design. While Google defended the system, stating these errors occurred mainly with uncommon queries, a company spokesperson acknowledged they would use these “isolated examples” to refine their systems. But to this day, AI Overview still makes frequent mistakes.

Stable Diffusion generates body horror

An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass. Credit: HorneyMetalBeing

In June, Stability AI’s release of the image synthesis model Stable Diffusion 3 Medium drew criticism online for its poor handling of human anatomy in AI-generated images. Users across social media platforms shared examples of the model producing what we now like to call jabberwockies—AI generation failures with distorted bodies, misshapen hands, and surreal anatomical errors, and many in the AI image-generation community viewed it as a significant step backward from previous image-synthesis capabilities.

Reddit users attributed these failures to Stability AI’s aggressive filtering of adult content from the training data, which apparently impaired the model’s ability to accurately render human figures. The troubled release coincided with broader organizational challenges at Stability AI, including the March departure of CEO Emad Mostaque, multiple staff layoffs, and the exit of three key engineers who had helped develop the technology. Some of those engineers founded Black Forest Labs in August and released Flux, which has become the latest open-weights AI image model to beat.

ChatGPT Advanced Voice imitates human voice in testing

An illustration of a computer synthesizer spewing out letters.

AI voice-synthesis models are master imitators these days, and they are capable of much more than many people realize. In August, we covered a story where OpenAI’s ChatGPT Advanced Voice Mode feature unexpectedly imitated a user’s voice during the company’s internal testing, revealed by OpenAI after the fact in safety testing documentation. To prevent future instances of an AI assistant suddenly speaking in your own voice (which, let’s be honest, would probably freak people out), the company created an output classifier system to prevent unauthorized voice imitation. OpenAI says that Advanced Voice Mode now catches all meaningful deviations from approved system voices.

Independent AI researcher Simon Willison discussed the implications with Ars Technica, noting that while OpenAI restricted its model’s full voice synthesis capabilities, similar technology would likely emerge from other sources within the year. Meanwhile, the rapid advancement of AI voice replication has caused general concern about its potential misuse, although companies like ElevenLabs have already been offering voice cloning services for some time.

San Francisco’s robotic car horn symphony

A Waymo self-driving car in front of Google's San Francisco headquarters, San Francisco, California, June 7, 2024.

A Waymo self-driving car in front of Google’s San Francisco headquarters, San Francisco, California, June 7, 2024. Credit: Getty Images

In August, San Francisco residents got a noisy taste of robo-dystopia when Waymo’s self-driving cars began creating an unexpected nightly disturbance in the South of Market district. In a parking lot off 2nd Street, the cars congregated autonomously every night during rider lulls at 4 am and began engaging in extended honking matches at each other while attempting to park.

Local resident Christopher Cherry’s initial optimism about the robotic fleet’s presence dissolved as the mechanical chorus grew louder each night, affecting residents in nearby high-rises. The nocturnal tech disruption served as a lesson in the unintentional effects of autonomous systems when run in aggregate.

Larry Ellison dreams of all-seeing AI cameras

A colorized photo of CCTV cameras in London, 2024.

In September, Oracle co-founder Larry Ellison painted a bleak vision of ubiquitous AI surveillance during a company financial meeting. The 80-year-old database billionaire described a future where AI would monitor citizens through networks of cameras and drones, asserting that the oversight would ensure lawful behavior from both police and the public.

His surveillance predictions reminded us of parallels to existing systems in China, where authorities already used AI to sort surveillance data on citizens as part of the country’s “sharp eyes” campaign from 2015 to 2020. Ellison’s statement reflected the sort of worst-case tech surveillance state scenario—likely antithetical to any sort of free society—that dozens of sci-fi novels of the 20th century warned us about.

A dead father sends new letters home

An AI-generated image featuring Dad's Uppercase handwriting.

An AI-generated image featuring my late father’s handwriting. Credit: Benj Edwards / Flux

AI has made many of us do weird things in 2024, including this writer. In October, I used an AI synthesis model called Flux to reproduce my late father’s handwriting with striking accuracy. After scanning 30 samples from his engineering notebooks, I trained the model using computing time that cost less than five dollars. The resulting text captured his distinctive uppercase style, which he developed during his career as an electronics engineer.

I enjoyed creating images showing his handwriting in various contexts, from folder labels to skywriting, and made the trained model freely available online for others to use. While I approached it as a tribute to my father (who would have appreciated the technical achievement), many people found the whole experience weird and somewhat disturbing. The things we unhinged Bing Chat-like journalists do to bring awareness to a topic are sometimes unconventional. So I guess it counts for this list!

For 2025? Expect even more AI

Thanks for reading Ars Technica this past year and following along with our team coverage of this rapidly emerging and expanding field. We appreciate your kind words of support. Ars Technica’s 2024 AI words of the year were: vibemarking, deep doubt, and the aforementioned jabberwocky. The old stalwart “confabulation” also made several notable appearances. Tune in again next year when we continue to try to figure out how to concisely describe novel scenarios in emerging technology by labeling them.

Looking back, our prediction for 2024 in AI last year was “buckle up.” It seems fitting, given the weirdness detailed above. Especially the part about the robot dogs with guns. For 2025, AI will likely inspire more chaos ahead, but also potentially get put to serious work as a productivity tool, so this time, our prediction is “buckle down.”

Finally, we’d like to ask: What was the craziest story about AI in 2024 from your perspective? Whether you love AI or hate it, feel free to suggest your own additions to our list in the comments. Happy New Year!

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

2024: The year AI drove everyone crazy Read More »