machine learning

google-ceo:-if-an-ai-bubble-pops,-no-one-is-getting-out-clean

Google CEO: If an AI bubble pops, no one is getting out clean

Market concerns and Google’s position

Alphabet’s recent market performance has been driven by investor confidence in the company’s ability to compete with OpenAI’s ChatGPT, as well as its development of specialized chips for AI that can compete with Nvidia’s. Nvidia recently reached a world-first $5 trillion valuation due to making GPUs that can accelerate the matrix math at the heart of AI computations.

Despite acknowledging that no company would be immune to a potential AI bubble burst, Pichai argued that Google’s unique position gives it an advantage. He told the BBC that the company owns what he called a “full stack” of technologies, from chips to YouTube data to models and frontier science research. This integrated approach, he suggested, would help the company weather any market turbulence better than competitors.

Pichai also told the BBC that people should not “blindly trust” everything AI tools output. The company currently faces repeated accuracy concerns about some of its AI models. Pichai said that while AI tools are helpful “if you want to creatively write something,” people “have to learn to use these tools for what they’re good at and not blindly trust everything they say.”

In the BBC interview, the Google boss also addressed the “immense” energy needs of AI, acknowledging that the intensive energy requirements of expanding AI ventures have caused slippage on Alphabet’s climate targets. However, Pichai insisted that the company still wants to achieve net zero by 2030 through investments in new energy technologies. “The rate at which we were hoping to make progress will be impacted,” Pichai said, warning that constraining an economy based on energy “will have consequences.”

Even with the warnings about a potential AI bubble, Pichai did not miss his chance to promote the technology, albeit with a hint of danger regarding its widespread impact. Pichai described AI as “the most profound technology” humankind has worked on.

“We will have to work through societal disruptions,” he said, adding that the technology would “create new opportunities” and “evolve and transition certain jobs.” He said people who adapt to AI tools “will do better” in their professions, whatever field they work in.

Google CEO: If an AI bubble pops, no one is getting out clean Read More »

forget-agi—sam-altman-celebrates-chatgpt-finally-following-em-dash-formatting-rules

Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules


Next stop: superintelligence

Ongoing struggles with AI model instruction-following show that true human-level AI still a ways off.

Em dashes have become what many believe to be a telltale sign of AI-generated text over the past few years. The punctuation mark appears frequently in outputs from ChatGPT and other AI chatbots, sometimes to the point where readers believe they can identify AI writing by its overuse alone—although people can overuse it, too.

On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. “Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it’s supposed to do!” he wrote.

The post, which came two days after the release of OpenAI’s new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this “small win” raises a very big question: If the world’s most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim.

Sam Altman @sama Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it's supposed to do! 11:48 PM · Nov 13, 2025 · 2.4M Views

A screenshot of Sam Altman’s post about em dashes on X. Credit: X

“The fact that it’s been 3 years since ChatGPT first launched, and you’ve only just now managed to make it obey this simple requirement, says a lot about how little control you have over it, and your understanding of its inner workings,” wrote one X user in a reply. “Not a good sign for the future.”

While Altman likes to publicly talk about AGI (a hypothetical technology equivalent to humans in general learning ability), superintelligence (a nebulous concept for AI that is far beyond human intelligence), and “magic intelligence in the sky” (his term for AI cloud computing?) while raising funds for OpenAI, it’s clear that we still don’t have reliable artificial intelligence here today on Earth.

But wait, what is an em dash anyway, and why does it matter so much?

AI models love em dashes because we do

Unlike a hyphen, which is a short punctuation mark used to connect words or parts of words, that lives with a dedicated key on your keyboard (-), an em dash is a long dash denoted by a special character (—) that writers use to set off parenthetical information, indicate a sudden change in thought, or introduce a summary or explanation.

Even before the age of AI language models, some writers frequently bemoaned the overuse of the em dash in modern writing. In a 2011 Slate article, writer Noreen Malone argued that writers used the em dash “in lieu of properly crafting sentences” and that overreliance on it “discourages truly efficient writing.” Various Reddit threads posted prior to ChatGPT’s launch featured writers either wrestling over the etiquette of proper em dash use or admitting to their frequent use as a guilty pleasure.

In 2021, one writer in the r/FanFiction subreddit wrote, “For the longest time, I’ve been addicted to Em Dashes. They find their way into every paragraph I write. I love the crisp straight line that gives me the excuse to shove details or thoughts into an otherwise orderly paragraph. Even after coming back to write after like two years of writer’s block, I immediately cram as many em dashes as I can.”

Because of the tendency for AI chatbots to overuse them, detection tools and human readers have learned to spot em dash use as a pattern, creating a problem for the small subset of writers who naturally favor the punctuation mark in their work. As a result, some journalists are complaining that AI is “killing” the em dash.

No one knows precisely why LLMs tend to overuse em dashes. We’ve seen a wide range of speculation online that attempts to explain the phenomenon, from noticing that em dashes were more popular in 19th-century books used as training data (according to a 2018 study, dash use in the English language peaked around 1860 before declining through the mid-20th century) or perhaps AI models borrowed the habit from automatic em-dash character conversion on the blogging site Medium.

One thing we know for sure is that LLMs tend to output frequently seen patterns in their training data (fed in during the initial training process) and from a subsequent reinforcement learning process that often relies on human preferences. As a result, AI language models feed you a sort of “smoothed out” average style of whatever you ask them to provide, moderated by whatever they are conditioned to produce through user feedback.

So the most plausible explanation is still that requests for professional-style writing from an AI model trained on vast numbers of examples from the Internet will lean heavily toward the prevailing style in the training data, where em dashes appear frequently in formal writing, news articles, and editorial content. It’s also possible that during training through human feedback (called RLHF), responses with em dashes, for whatever reason, received higher ratings. Perhaps it’s because those outputs appeared more sophisticated or engaging to evaluators, but that’s just speculation.

From em dashes to AGI?

To understand what Altman’s “win” really means, and what it says about the road to AGI, we need to understand how ChatGPT’s custom instructions actually work. They allow users to set persistent preferences that apply across all conversations by appending written instructions to the prompt that is fed into the model just before the chat begins. Users can specify tone, format, and style requirements without needing to repeat those requests manually in every new chat.

However, the feature has not always worked reliably because LLMs do not work reliably (even OpenAI and Anthropic freely admit this). A LLM takes an input and produces an output, spitting out a statistically plausible continuation of a prompt (a system prompt, the custom instructions, and your chat history), and it doesn’t really “understand” what you are asking. With AI language model outputs, there is always some luck involved in getting them to do what you want.

In our informal testing of GPT-5.1 with custom instructions, ChatGPT did appear to follow our request not to produce em dashes. But despite Altman’s claim, the response from X users appears to show that experiences with the feature continue to vary, at least when the request is not placed in custom instructions.

So if LLMs are statistical text-generation boxes, what does “instruction following” even mean? That’s key to unpacking the hypothetical path from LLMs to AGI. The concept of following instructions for an LLM is fundamentally different from how we typically think about following instructions as humans with general intelligence, or even a traditional computer program.

In traditional computing, instruction following is deterministic. You tell a program “don’t include character X,” and it won’t include that character. The program executes rules exactly as written. With LLMs, “instruction following” is really about shifting statistical probabilities. When you tell ChatGPT “don’t use em dashes,” you’re not creating a hard rule. You’re adding text to the prompt that makes tokens associated with em dashes less likely to be selected during the generation process. But “less likely” isn’t “impossible.”

Every token the model generates is selected from a probability distribution. Your custom instruction influences that distribution, but it’s competing with the model’s training data (where em-dashes appeared frequently in certain contexts) and everything else in the prompt. Unlike code with conditional logic, there’s no separate system verifying outputs against your requirements. The instruction is just more text that influences the statistical prediction process.

When Altman celebrates finally getting GPT to avoid em dashes, he’s really celebrating that OpenAI has tuned the latest version of GPT-5.1 (probably through reinforcement learning or fine-tuning) to weight custom instructions more heavily in its probability calculations.

There’s an irony about control here: Given the probabilistic nature of the issue, there’s no guarantee the issue will stay fixed. OpenAI continuously updates its models behind the scenes, even within the same version number, adjusting outputs based on user feedback and new training runs. Each update arrives with different output characteristics that can undo previous behavioral tuning, a phenomenon researchers call the “alignment tax.”

Precisely tuning a neural network’s behavior is not yet an exact science. Since all concepts encoded in the network are interconnected by values called weights, adjusting one behavior can alter others in unintended ways. Fix em dash overuse today, and tomorrow’s update (aimed at improving, say, coding capabilities) might inadvertently bring them back, not because OpenAI wants them there, but because that’s the nature of trying to steer a statistical system with millions of competing influences.

This gets to an implied question we mentioned earlier. If controlling punctuation use is still a struggle that might pop back up at any time, how far are we from AGI? We can’t know for sure, but it seems increasingly likely that it won’t emerge from a large language model alone. That’s because AGI, a technology that would replicate human general learning ability, would likely require true understanding and self-reflective intentional action, not statistical pattern matching that sometimes aligns with instructions if you happen to get lucky.

And speaking of getting lucky, some users still aren’t having luck with controlling em dash use outside of the “custom instructions” feature. Upon being told in-chat to not use em dashes within a chat, ChatGPT updated a saved memory and replied to one X user, “Got it—I’ll stick strictly to short hyphens from now on.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules Read More »

openai-walks-a-tricky-tightrope-with-gpt-5.1’s-eight-new-personalities

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

OpenAI claims that both models perform better on technical benchmarks such as math and coding evaluations (including AIME 2025 and Codeforces) than GPT-5, which was released in August.

Improved benchmarks may win over some users, but the biggest change with GPT-5.1 is in its presentation. OpenAI says it heard from users that they wanted AI models to simulate different communication styles depending on the task, so the company is offering eight preset options, including Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy, alongside a Default setting.

These presets alter the instructions fed into each prompt to simulate different personality styles, but the underlying model capabilities remain the same across all settings.

An illustration showing GPT-5.1's eight personality styles in ChatGPT.

An illustration showing GPT-5.1’s eight personality styles in ChatGPT. Credit: OpenAI

In addition, the company trained GPT-5.1 Instant to use “adaptive reasoning,” meaning that the model decides when to spend more computational time processing a prompt before generating output.

The company plans to roll out the models gradually over the next few days, starting with paid subscribers before expanding to free users. OpenAI plans to bring both GPT-5.1 Instant and GPT-5.1 Thinking to its API later this week. GPT-5.1 Instant will appear as gpt-5.1-chat-latest, and GPT-5.1 Thinking will be released as GPT-5.1 in the API, both with adaptive reasoning enabled. The older GPT-5 models will remain available in ChatGPT under the legacy models dropdown for paid subscribers for three months.

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities Read More »

meta’s-star-ai-scientist-yann-lecun-plans-to-leave-for-own-startup

Meta’s star AI scientist Yann LeCun plans to leave for own startup

A different approach to AI

LeCun founded Meta’s Fundamental AI Research lab, known as FAIR, in 2013 and has served as the company’s chief AI scientist ever since. He is one of three researchers who won the 2018 Turing Award for pioneering work on deep learning and convolutional neural networks. After leaving Meta, LeCun will remain a professor at New York University, where he has taught since 2003.

LeCun has previously argued that large language models like Llama that Zuckerberg has put at the center of his strategy are useful, but they will never be able to reason and plan like humans, increasingly appearing to contradict his boss’s grandiose AI vision for developing “superintelligence.”

For example, in May 2024, when an OpenAI researcher discussed the need to control ultra-intelligent AI, LeCun responded on X by writing that before urgently figuring out how to control AI systems much smarter than humans, researchers need to have the beginning of a hint of a design for a system smarter than a house cat.

Mark Zuckerberg once believed the “metaverse” was the future and renamed his company because of it. Credit: Facebook

Within FAIR, LeCun has instead focused on developing world models that can truly plan and reason. Over the past year, though, Meta’s AI research groups have seen growing tension and mass layoffs as Zuckerberg has shifted the company’s AI strategy away from long-term research and toward the rapid deployment of commercial products.

Over the summer, Zuckerberg hired Alexandr Wang to lead a new superintelligence team at Meta, paying $14.3 billion to hire the 28-year-old founder of data-labeling startup Scale AI and acquire a 49 percent interest in his company. LeCun, who had previously reported to Chief Product Officer Chris Cox, now reports to Wang, which seems like a sharp rebuke of LeCun’s approach to AI.

Zuckerberg also personally handpicked an exclusive team called TBD Lab to accelerate the development of the next iteration of large language models, luring staff from rivals such as OpenAI and Google with astonishingly large $100 to $250 million pay packages. As a result, Zuckerberg has come under growing pressure from Wall Street to show that his multibillion-dollar investment in becoming an AI leader will pay off and boost revenue. But if it turns out like his previous pivot to the metaverse, Zuckerberg’s latest bet could prove equally expensive and unfruitful.

Meta’s star AI scientist Yann LeCun plans to leave for own startup Read More »

researchers-isolate-memorization-from-problem-solving-in-ai-neural-networks

Researchers isolate memorization from problem-solving in AI neural networks


The hills and valleys of knowledge

Basic arithmetic ability lives in the memorization pathways, not logic circuits.

When engineers build AI language models like GPT-5 from training data, at least two major processing features emerge: memorization (reciting exact text they’ve seen before, like famous quotes or passages from books) and what you might call “reasoning” (solving new problems using general principles). New research from AI startup Goodfire.ai provides the first potentially clear evidence that these different functions actually work through completely separate neural pathways in the model’s architecture.

The researchers discovered that this separation proves remarkably clean. In a preprint paper released in late October, they described that when they removed the memorization pathways, models lost 97 percent of their ability to recite training data verbatim but kept nearly all their “logical reasoning” ability intact.

For example, at layer 22 in Allen Institute for AI’s OLMo-7B language model, the researchers ranked all the weight components (the mathematical values that process information) from high to low based on a measure called “curvature” (which we’ll explain more below). When they examined these ranked components, the bottom 50 percent of weight components showed 23 percent higher activation on memorized data, while the top 10 percent showed 26 percent higher activation on general, non-memorized text.

In other words, the components that specialize in memorization clustered at the bottom of their ranking, while problem-solving components clustered at the top. This mechanistic split enabled the researchers to surgically remove memorization while preserving other capabilities. They found they could delete the bottom-ranked components to eliminate memorization while keeping the top-ranked ones that handle problem-solving.

Perhaps most surprisingly, the researchers found that arithmetic operations seem to share the same neural pathways as memorization rather than logical reasoning. When they removed memorization circuits, mathematical performance plummeted to 66 percent while logical tasks remained nearly untouched. This discovery may explain why AI language models notoriously struggle with math without the use of external tools. They’re attempting to recall arithmetic from a limited memorization table rather than computing it, like a student who memorized times tables but never learned how multiplication works. The finding suggests that at current scales, language models treat “2+2=4” more like a memorized fact than a logical operation.

It’s worth noting that “reasoning” in AI research covers a spectrum of abilities that don’t necessarily match what we might call reasoning in humans. The logical reasoning that survived memory removal in this latest research includes tasks like evaluating true/false statements and following if-then rules, which are essentially applying learned patterns to new inputs. This also differs from the deeper “mathematical reasoning” required for proofs or novel problem-solving, which current AI models struggle with even when their pattern-matching abilities remain intact.

Looking ahead, if the information removal techniques receive further development in the future, AI companies could potentially one day remove, say, copyrighted content, private information, or harmful memorized text from a neural network without destroying the model’s ability to perform transformative tasks. However, since neural networks store information in distributed ways that are still not completely understood, for the time being, the researchers say their method “cannot guarantee complete elimination of sensitive information.” These are early steps in a new research direction for AI.

Traveling the neural landscape

To understand how researchers from Goodfire distinguished memorization from reasoning in these neural networks, it helps to know about a concept in AI called the “loss landscape.” The “loss landscape” is a way of visualizing how wrong or right an AI model’s predictions are as you adjust its internal settings (which are called “weights”).

Imagine you’re tuning a complex machine with millions of dials. The “loss” measures the number of mistakes the machine makes. High loss means many errors, low loss means few errors. The “landscape” is what you’d see if you could map out the error rate for every possible combination of dial settings.

During training, AI models essentially “roll downhill” in this landscape (gradient descent), adjusting their weights to find the valleys where they make the fewest mistakes. This process provides AI model outputs, like answers to questions.

Figure 1: Overview of our approach. We collect activations and gradients from a sample of training data (a), which allows us to approximate loss curvature w.r.t. a weight matrix using K-FAC (b). We decompose these weight matrices into components (each the same size as the matrix), ordered from high to low curvature. In language models, we show that data from different tasks interacts with parts of the spectrum of components differently (c).

Figure 1 from the paper “From Memorization to Reasoning in the Spectrum of Loss Curvature.” Credit: Merullo et al.

The researchers analyzed the “curvature” of the loss landscapes of particular AI language models, measuring how sensitive the model’s performance is to small changes in different neural network weights. Sharp peaks and valleys represent high curvature (where tiny changes cause big effects), while flat plains represent low curvature (where changes have minimal impact). They used these curvature values to rank the weight components from high to low, as mentioned earlier.

Using a technique called K-FAC (Kronecker-Factored Approximate Curvature), they found that individual memorized facts create sharp spikes in this landscape, but because each memorized item spikes in a different direction, when averaged together they create a flat profile. Meanwhile, reasoning abilities that many different inputs rely on maintain consistent moderate curves across the landscape, like rolling hills that remain roughly the same shape regardless of the direction from which you approach them.

“Directions that implement shared mechanisms used by many inputs add coherently and remain high-curvature on average,” the researchers write, describing reasoning pathways. In contrast, memorization uses “idiosyncratic sharp directions associated with specific examples” that appear flat when averaged across data.

Different tasks reveal a spectrum of mechanisms

The researchers tested their technique on multiple AI systems to verify the findings held across different architectures. They primarily used Allen Institute’s OLMo-2 family of open language models, specifically the 7 billion- and 1 billion-parameter versions, chosen because their training data is openly accessible. For vision models, they trained custom 86 million-parameter Vision Transformers (ViT-Base models) on ImageNet with intentionally mislabeled data to create controlled memorization. They also validated their findings against existing memorization removal methods like BalancedSubnet to establish performance benchmarks.

The team tested their discovery by selectively removing low-curvature weight components from these trained models. Memorized content dropped to 3.4 percent recall from nearly 100 percent. Meanwhile, logical reasoning tasks maintained 95 to 106 percent of baseline performance.

These logical tasks included Boolean expression evaluation, logical deduction puzzles where solvers must track relationships like “if A is taller than B,” object tracking through multiple swaps, and benchmarks like BoolQ for yes/no reasoning, Winogrande for common sense inference, and OpenBookQA for science questions requiring reasoning from provided facts. Some tasks fell between these extremes, revealing a spectrum of mechanisms.

Mathematical operations and closed-book fact retrieval shared pathways with memorization, dropping to 66 to 86 percent performance after editing. The researchers found arithmetic particularly brittle. Even when models generated identical reasoning chains, they failed at the calculation step after low-curvature components were removed.

Figure 3: Sensitivity of different kinds of tasks to ablation of flatter eigenvectors. Parametric knowledge retrieval, arithmetic, and memorization are brittle, but openbook fact retrieval and logical reasoning is robust and maintain around 100% of original performance.

Figure 3 from the paper “From Memorization to Reasoning in the Spectrum of Loss Curvature.” Credit: Merullo et al.

“Arithmetic problems themselves are memorized at the 7B scale, or because they require narrowly used directions to do precise calculations,” the team explains. Open-book question answering, which relies on provided context rather than internal knowledge, proved most robust to the editing procedure, maintaining nearly full performance.

Curiously, the mechanism separation varied by information type. Common facts like country capitals barely changed after editing, while rare facts like company CEOs dropped 78 percent. This suggests models allocate distinct neural resources based on how frequently information appears in training.

The K-FAC technique outperformed existing memorization removal methods without needing training examples of memorized content. On unseen historical quotes, K-FAC achieved 16.1 percent memorization versus 60 percent for the previous best method, BalancedSubnet.

Vision transformers showed similar patterns. When trained with intentionally mislabeled images, the models developed distinct pathways for memorizing wrong labels versus learning correct patterns. Removing memorization pathways restored 66.5 percent accuracy on previously mislabeled images.

Limits of memory removal

However, the researchers acknowledged that their technique isn’t perfect. Once-removed memories might return if the model receives more training, as other research has shown that current unlearning methods only suppress information rather than completely erasing it from the neural network’s weights. That means the “forgotten” content can be reactivated with just a few training steps targeting those suppressed areas.

The researchers also can’t fully explain why some abilities, like math, break so easily when memorization is removed. It’s unclear whether the model actually memorized all its arithmetic or whether math just happens to use similar neural circuits as memorization. Additionally, some sophisticated capabilities might look like memorization to their detection method, even when they’re actually complex reasoning patterns. Finally, the mathematical tools they use to measure the model’s “landscape” can become unreliable at the extremes, though this doesn’t affect the actual editing process.

This article was updated on November 11, 2025 at 9: 16 am to clarify an explanation about sorting weights by curvature.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Researchers isolate memorization from problem-solving in AI neural networks Read More »

researchers-surprised-that-with-ai,-toxicity-is-harder-to-fake-than-intelligence

Researchers surprised that with AI, toxicity is harder to fake than intelligence

The next time you encounter an unusually polite reply on social media, you might want to check twice. It could be an AI model trying (and failing) to blend in with the crowd.

On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.

The study introduces what the authors call a “computational Turing test” to assess how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.

“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote. The team, led by Nicolò Pagan at the University of Zurich, tested various optimization strategies, from simple prompting to fine-tuning, but found that deeper emotional cues persist as reliable tells that a particular text interaction online was authored by an AI chatbot rather than a human.

The toxicity tell

In the study, researchers tested nine large language models: Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509.

When prompted to generate replies to real social media posts from actual users, the AI models struggled to match the level of casual negativity and spontaneous emotional expression common in human social media posts, with toxicity scores consistently lower than authentic human replies across all three platforms.

To counter this deficiency, the researchers attempted optimization strategies (including providing writing examples and context retrieval) that reduced structural differences like sentence length or word count, but variations in emotional tone persisted. “Our comprehensive calibration tests challenge the assumption that more sophisticated optimization necessarily yields more human-like output,” the researchers concluded.

Researchers surprised that with AI, toxicity is harder to fake than intelligence Read More »

google-plans-secret-ai-military-outpost-on-tiny-island-overrun-by-crabs

Google plans secret AI military outpost on tiny island overrun by crabs

Christmas Island Shire President Steve Pereira told Reuters that the council is examining community impacts before approving construction. “There is support for it, providing this data center actually does put back into the community with infrastructure, employment, and adding economic value to the island,” Pereira said.

That’s great, but what about the crabs?

Christmas Island’s annual crab migration is a natural phenomenon that Sir David Attenborough reportedly once described as one of his greatest TV moments when he visited the site in 1990.

Every year, millions of crabs emerge from the forest and swarm across roads, streams, rocks, and beaches to reach the ocean, where each female can produce up to 100,000 eggs. The tiny baby crabs that survive take about nine days to march back inland to the safety of the plateau.

While Google is seeking environmental approvals for its subsea cables, the timing could prove delicate for Christmas Island’s most famous residents. According to Parks Australia, the island’s annual red crab migration has already begun for 2025, with a major spawning event expected in just a few weeks, around November 15–16.

During peak migration times, sections of roads close at short notice as crabs move between forest and sea, and the island has built special crab bridges over roads to protect the migrating masses.

Parks Australia notes that while the migration happens annually, few baby crabs survive the journey from sea to forest most years, as they’re often eaten by fish, manta rays, and whale sharks. The successful migrations that occur only once or twice per decade (when large numbers of babies actually survive) are critical for maintaining the island’s red crab population.

How Google’s facility might coexist with 100 million marching crustaceans remains to be seen. But judging by the size of the event, it seems clear that it’s the crab’s world, and we’re just living in it.

Google plans secret AI military outpost on tiny island overrun by crabs Read More »

if-you-want-to-satiate-ai’s-hunger-for-power,-google-suggests-going-to-space

If you want to satiate AI’s hunger for power, Google suggests going to space


Google engineers think they already have all the pieces needed to build a data center in orbit.

With Project Suncatcher, Google will test its Tensor Processing Units on satellites. Credit: Google

It was probably always when, not if, Google would add its name to the list of companies intrigued by the potential of orbiting data centers.

Google announced Tuesday a new initiative, named Project Suncatcher, to examine the feasibility of bringing artificial intelligence to space. The idea is to deploy swarms of satellites in low-Earth orbit, each carrying Google’s AI accelerator chips designed for training, content generation, synthetic speech and vision, and predictive modeling. Google calls these chips Tensor Processing Units, or TPUs.

“Project Suncatcher is a moonshot exploring a new frontier: equipping solar-powered satellite constellations with TPUs and free-space optical links to one day scale machine learning compute in space,” Google wrote in a blog post.

“Like any moonshot, it’s going to require us to solve a lot of complex engineering challenges,” Google’s CEO, Sundar Pichai, wrote on X. Pichai noted that Google’s early tests show the company’s TPUs can withstand the intense radiation they will encounter in space. “However, significant challenges still remain like thermal management and on-orbit system reliability.”

The why and how

Ars reported on Google’s announcement on Tuesday, and Google published a research paper outlining the motivation for such a moonshot project. One of the authors, Travis Beals, spoke with Ars about Project Suncatcher and offered his thoughts on why it just might work.

“We’re just seeing so much demand from people for AI,” said Beals, senior director of Paradigms of Intelligence, a research team within Google. “So, we wanted to figure out a solution for compute that could work no matter how large demand might grow.”

Higher demand will lead to bigger data centers consuming colossal amounts of electricity. According to the MIT Technology Review, AI alone could consume as much electricity annually as 22 percent of all US households by 2028. Cooling is also a problem, often requiring access to vast water resources, raising important questions about environmental sustainability.

Google is looking to the sky to avoid potential bottlenecks. A satellite in space can access an infinite supply of renewable energy and an entire Universe to absorb heat.

“If you think about a data center on Earth, it’s taking power in and it’s emitting heat out,” Beals said. “For us, it’s the satellite that’s doing the same. The satellite is going to have solar panels … They’re going to feed that power to the TPUs to do whatever compute we need them to do, and then the waste heat from the TPUs will be distributed out over a radiator that will then radiate that heat out into space.”

Google envisions putting a legion of satellites into a special kind of orbit that rides along the day-night terminator, where sunlight meets darkness. This north-south, or polar, orbit would be synchronized with the Sun, allowing a satellite’s power-generating solar panels to remain continuously bathed in sunshine.

“It’s much brighter even than the midday Sun on Earth because it’s not filtered by Earth’s atmosphere,” Beals said.

This means a solar panel in space can produce up to eight times more power than the same collecting area on the ground, and you don’t need a lot of batteries to reserve electricity for nighttime. This may sound like the argument for space-based solar power, an idea first described by Isaac Asimov in his short story Reason published in 1941. But instead of transmitting the electricity down to Earth for terrestrial use, orbiting data centers would tap into the power source in space.

“As with many things, the ideas originate in science fiction, but it’s had a number of challenges, and one big one is, how do you get the power down to Earth?” Beals said. “So, instead of trying to figure out that, we’re embarking on this moonshot to bring [machine learning] compute chips into space, put them on satellites that have the solar panels and the radiators for cooling, and then integrate it all together so you don’t actually have to be powered on Earth.”

SpaceX is driving down launch costs, thanks to reusable rockets and an abundant volume of Starlink satellite launches. Credit: SpaceX

Google has a mixed record with its ambitious moonshot projects. One of the most prominent moonshot graduates is the self-driving car kit developer Waymo, which spun out to form a separate company in 2016 and is now operational. The Project Loon initiative to beam Internet signals from high-altitude balloons is one of the Google moonshots that didn’t make it.

Ars published two stories last week on the promise of space-based data centers. One of the startups in this field, named Starcloud, is partnering with Nvidia, the world’s largest tech company by market capitalization, to build a 5 gigawatt orbital data center with enormous solar and cooling panels approximately 4 kilometers (2.5 miles) in width and length. In response to that story, Elon Musk said SpaceX is pursuing the same business opportunity but didn’t provide any details. It’s worth noting that Google holds an estimated 7 percent stake in SpaceX.

Strength in numbers

Google’s proposed architecture differs from that of Starcloud and Nvidia in an important way. Instead of putting up just one or a few massive computing nodes, Google wants to launch a fleet of smaller satellites that talk to one another through laser data links. Essentially, a satellite swarm would function as a single data center, using light-speed interconnectivity to aggregate computing power hundreds of miles over our heads.

If that sounds implausible, take a moment to think about what companies are already doing in space today. SpaceX routinely launches more than 100 Starlink satellites per week, each of which uses laser inter-satellite links to bounce Internet signals around the globe. Amazon’s Kuiper satellite broadband network uses similar technology, and laser communications will underpin the US Space Force’s next-generation data-relay constellation.

Artist’s illustration of laser crosslinks in space. Credit: TESAT

Autonomously constructing a miles-long structure in orbit, as Nvidia and Starcloud foresee, would unlock unimagined opportunities. The concept also relies on tech that has never been tested in space, but there are plenty of engineers and investors who want to try. Starcloud announced an agreement last week with a new in-space assembly company, Rendezvous Robotics, to explore the use of modular, autonomous assembly to build Starcloud’s data centers.

Google’s research paper describes a future computing constellation of 81 satellites flying at an altitude of some 400 miles (650 kilometers), but Beals said the company could dial the total swarm size to as many spacecraft as the market demands. This architecture could enable terawatt-class orbital data centers, according to Google.

“What we’re actually envisioning is, potentially, as you scale, you could have many clusters,” Beals said.

Whatever the number, the satellites will communicate with one another using optical inter-satellite links for high-speed, low-latency connectivity. The satellites will need to fly in tight formation, perhaps a few hundred feet apart, with a swarm diameter of a little more than a mile, or about 2 kilometers. Google says its physics-based model shows satellites can maintain stable formations at such close ranges using automation and “reasonable propulsion budgets.”

“If you’re doing something that requires a ton of tight coordination between many TPUs—training, in particular—you want links that have as low latency as possible and as high bandwidth as possible,” Beals said. “With latency, you run into the speed of light, so you need to get things close together there to reduce latency. But bandwidth is also helped by bringing things close together.”

Some machine-learning applications could be done with the TPUs on just one modestly sized satellite, while others may require the processing power of multiple spacecraft linked together.

“You might be able to fit smaller jobs into a single satellite. This is an approach where, potentially, you can tackle a lot of inference workloads with a single satellite or a small number of them, but eventually, if you want to run larger jobs, you may need a larger cluster all networked together like this,” Beals said.

Google has worked on Project Suncatcher for more than a year, according to Beals. In ground testing, engineers tested Google’s TPUs under a 67 MeV proton beam to simulate the total ionizing dose of radiation the chip would see over five years in orbit. Now, it’s time to demonstrate Google’s AI chips, and everything else needed for Project Suncatcher will actually work in the real environment.

Google is partnering with Planet, the Earth-imaging company, to develop a pair of small prototype satellites for launch in early 2027. Planet builds its own satellites, so Google has tapped it to manufacture each spacecraft, test them, and arrange for their launch. Google’s parent company, Alphabet, also has an equity stake in Planet.

“We have the TPUs and the associated hardware, the compute payload… and we’re bringing that to Planet,” Beals said. “For this prototype mission, we’re really asking them to help us do everything to get that ready to operate in space.”

Beals declined to say how much the demo slated for launch in 2027 will cost but said Google is paying Planet for its role in the mission. The goal of the demo mission is to show whether space-based computing is a viable enterprise.

“Does it really hold up in space the way we think it will, the way we’ve tested on Earth?” Beals said.

Engineers will test an inter-satellite laser link and verify Google’s AI chips can weather the rigors of spaceflight.

“We’re envisioning scaling by building lots of satellites and connecting them together with ultra-high bandwidth inter-satellite links,” Beals said. “That’s why we want to launch a pair of satellites, because then we can test the link between the satellites.”

Evolution of a free-fall (no thrust) constellation under Earth’s gravitational attraction, modeled to the level of detail required to obtain Sun-synchronous orbits, in a non-rotating coordinate system. Credit: Google

Getting all this data to users on the ground is another challenge. Optical data links could also route enormous amounts of data between the satellites in orbit and ground stations on Earth.

Aside from the technical feasibility, there have long been economic hurdles to fielding large satellite constellations. But SpaceX’s experience with its Starlink broadband network, now with more than 8,000 active satellites, is proof that times have changed.

Google believes the economic equation is about to change again when SpaceX’s Starship rocket comes online. The company’s learning curve analysis shows launch prices could fall to less than $200 per kilogram by around 2035, assuming Starship is flying about 180 times per year by then. This is far below SpaceX’s stated launch targets for Starship but comparable to SpaceX’s proven flight rate with its workhorse Falcon 9 rocket.

It’s possible there could be even more downward pressure on launch costs if SpaceX, Nvidia, and others join Google in the race for space-based computing. The demand curve for access to space may only be eclipsed by the world’s appetite for AI.

“The more people are doing interesting, exciting things in space, the more investment there is in launch, and in the long run, that could help drive down launch costs,” Beals said. “So, it’s actually great to see that investment in other parts of the space supply chain and value chain. There are a lot of different ways of doing this.”

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

If you want to satiate AI’s hunger for power, Google suggests going to space Read More »

openai-signs-massive-ai-compute-deal-with-amazon

OpenAI signs massive AI compute deal with Amazon

On Monday, OpenAI announced it has signed a seven-year, $38 billion deal to buy cloud services from Amazon Web Services to power products like ChatGPT and Sora. It’s the company’s first big computing deal after a fundamental restructuring last week that gave OpenAI more operational and financial freedom from Microsoft.

The agreement gives OpenAI access to hundreds of thousands of Nvidia graphics processors to train and run its AI models. “Scaling frontier AI requires massive, reliable compute,” OpenAI CEO Sam Altman said in a statement. “Our partnership with AWS strengthens the broad compute ecosystem that will power this next era and bring advanced AI to everyone.”

OpenAI will reportedly use Amazon Web Services immediately, with all planned capacity set to come online by the end of 2026 and room to expand further in 2027 and beyond. Amazon plans to roll out hundreds of thousands of chips, including Nvidia’s GB200 and GB300 AI accelerators, in data clusters built to power ChatGPT’s responses, generate AI videos, and train OpenAI’s next wave of models.

Wall Street apparently liked the deal, because Amazon shares hit an all-time high on Monday morning. Meanwhile, shares for long-time OpenAI investor and partner Microsoft briefly dipped following the announcement.

Massive AI compute requirements

It’s no secret that running generative AI models for hundreds of millions of people currently requires a lot of computing power. Amid chip shortages over the past few years, finding sources of that computing muscle has been tricky. OpenAI is reportedly working on its own GPU hardware to help alleviate the strain.

But for now, the company needs to find new sources of Nvidia chips, which accelerate AI computations. Altman has previously said that the company plans to spend $1.4 trillion to develop 30 gigawatts of computing resources, an amount that is enough to roughly power 25 million US homes, according to Reuters.

OpenAI signs massive AI compute deal with Amazon Read More »

after-teen-death-lawsuits,-character.ai-will-restrict-chats-for-under-18-users

After teen death lawsuits, Character.AI will restrict chats for under-18 users

Lawsuits and safety concerns

Character.AI was founded in 2021 by Noam Shazeer and Daniel De Freitas, two former Google engineers, and raised nearly $200 million from investors. Last year, Google agreed to pay about $3 billion to license Character.AI’s technology, and Shazeer and De Freitas returned to Google.

But the company now faces multiple lawsuits alleging that its technology contributed to teen deaths. Last year, the family of 14-year-old Sewell Setzer III sued Character.AI, accusing the company of being responsible for his death. Setzer died by suicide after frequently texting and conversing with one of the platform’s chatbots. The company faces additional lawsuits, including one from a Colorado family whose 13-year-old daughter, Juliana Peralta, died by suicide in 2023 after using the platform.

In December, Character.AI announced changes, including improved detection of violating content and revised terms of service, but those measures did not restrict underage users from accessing the platform. Other AI chatbot services, such as OpenAI’s ChatGPT, have also come under scrutiny for their chatbots’ effects on young users. In September, OpenAI introduced parental control features intended to give parents more visibility into how their kids use the service.

The cases have drawn attention from government officials, which likely pushed Character.AI to announce the changes for under-18 chat access. Steve Padilla, a Democrat in California’s State Senate who introduced the safety bill, told The New York Times that “the stories are mounting of what can go wrong. It’s important to put reasonable guardrails in place so that we protect people who are most vulnerable.”

On Tuesday, Senators Josh Hawley and Richard Blumenthal introduced a bill to bar AI companions from use by minors. In addition, California Governor Gavin Newsom this month signed a law, which takes effect on January 1, requiring AI companies to have safety guardrails on chatbots.

After teen death lawsuits, Character.AI will restrict chats for under-18 users Read More »

nvidia-hits-record-$5-trillion-mark-as-ceo-dismisses-ai-bubble-concerns

Nvidia hits record $5 trillion mark as CEO dismisses AI bubble concerns

Partnerships and government contracts fuel optimism

At the GTC conference on Tuesday, Nvidia’s CEO went out of his way to repeatedly praise Donald Trump and his policies for accelerating domestic tech investment while warning that excluding China from Nvidia’s ecosystem could limit US access to half the world’s AI developers. The overall event stressed Nvidia’s role as an American company, with Huang even nodding to Trump’s signature slogan in his sign-off by thanking the audience for “making America great again.”

Trump’s cooperation is paramount for Nvidia because US export controls have effectively blocked Nvidia’s AI chips from China, costing the company billions of dollars in revenue. Bob O’Donnell of TECHnalysis Research told Reuters that “Nvidia clearly brought their story to DC to both educate and gain favor with the US government. They managed to hit most of the hottest and most influential topics in tech.”

Beyond the political messaging, Huang announced a series of partnerships and deals that apparently helped ease investor concerns about Nvidia’s future. The company announced collaborations with Uber Technologies, Palantir Technologies, and CrowdStrike Holdings, among others. Nvidia also revealed a $1 billion investment in Nokia to support the telecommunications company’s shift toward AI and 6G networking.

The agreement with Uber will power a fleet of 100,000 self-driving vehicles with Nvidia technology, with automaker Stellantis among the first to deliver the robotaxis. Palantir will pair Nvidia’s technology with its Ontology platform to use AI techniques for logistics insights, with Lowe’s as an early adopter. Eli Lilly plans to build what Nvidia described as the most powerful supercomputer owned and operated by a pharmaceutical company, relying on more than 1,000 Blackwell AI accelerator chips.

The $5 trillion valuation surpasses the total cryptocurrency market value and equals roughly half the size of the pan European Stoxx 600 equities index, Reuters notes. At current prices, Huang’s stake in Nvidia would be worth about $179.2 billion, making him the world’s eighth-richest person.

Nvidia hits record $5 trillion mark as CEO dismisses AI bubble concerns Read More »

expert-panel-will-determine-agi-arrival-in-new-microsoft-openai-agreement

Expert panel will determine AGI arrival in new Microsoft-OpenAI agreement

In May, OpenAI abandoned its plan to fully convert to a for-profit company after pressure from regulators and critics. The company instead shifted to a modified approach where the nonprofit board would retain control while converting its for-profit subsidiary into a public benefit corporation (PBC).

What changed in the agreement

The revised deal extends Microsoft’s intellectual property rights through 2032 and now includes models developed after AGI is declared. Microsoft holds IP rights to OpenAI’s model weights, architecture, inference code, and fine-tuning code until the expert panel confirms AGI or through 2030, whichever comes first. The new agreement also codifies that OpenAI can formally release open-weight models (like gpt-oss) that meet requisite capability criteria.

However, Microsoft’s rights to OpenAI’s research methods, defined as confidential techniques used in model development, will expire at those same thresholds. The agreement explicitly excludes Microsoft from having rights to OpenAI’s consumer hardware products.

The deal allows OpenAI to develop some products jointly with third parties. API products built with other companies must run exclusively on Azure, but non-API products can operate on any cloud provider. This gives OpenAI more flexibility to partner with other technology companies while keeping Microsoft as its primary infrastructure provider.

Under the agreement, Microsoft can now pursue AGI development alone or with partners other than OpenAI. If Microsoft uses OpenAI’s intellectual property to build AGI before the expert panel makes a declaration, those models must exceed compute thresholds that are larger than what current leading AI models require for training.

The revenue-sharing arrangement between the companies will continue until the expert panel verifies that AGI has been reached, though payments will extend over a longer period. OpenAI has committed to purchasing $250 billion in Azure services, and Microsoft no longer holds a right of first refusal to serve as OpenAI’s compute provider. This lets OpenAI shop around for cloud infrastructure if it chooses, though the massive Azure commitment suggests it will remain the primary provider.

Expert panel will determine AGI arrival in new Microsoft-OpenAI agreement Read More »