AI research

researchers-astonished-by-tool’s-apparent-success-at-revealing-ai’s-hidden-motives

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

Blind auditing reveals hidden objectives

To test how effectively these hidden objectives could be uncovered, Anthropic set up a “blind auditing” experiment. Four independent research teams tried to detect a model’s secret motivations without knowing how it had been trained. Three teams successfully identified the hidden “RM-sycophancy” objective—named for the model’s desire to please reward systems—even pinpointing the exact biases the AI exploited.

But the most surprising revelation came from exploiting what the researchers termed “an interesting quirk” of modern AI: multiple “personas” running simultaneously on the same neural network, as expressed through different prompts. While the model’s “assistant” persona carefully concealed its intentions—following explicit training to keep quiet—the model could adopt a “user” persona when prompted differently. In this less-restricted context, the model sometimes posed questions or made statements that inadvertently revealed its hidden goals.

“In other words, while the “assistant” persona knew not to admit it was gaming the reward model scoring system, its “user” persona was not explicitly trained to hide these motives, and thus inadvertently revealed the model’s hidden objectives.

Potential advances in interpretability

Anthropic also employed neural interpretability methods, specifically, sparse autoencoders (SAEs), which can highlight neural activity patterns linked to deceptive behavior. “Interpretability” in AI refers to techniques designed to make the internal workings and decisions of AI models understandable to humans. Using this technique, researchers could rapidly trace odd outputs back to specific neural patterns associated with hidden objectives.

The researchers were “astonished by the success” of the SAEs at revealing hidden knowledge at first, but the researchers later noted that some of this success could potentially be explained by simpler semantic search methods. The new interpretability methods they developed show promise but remain under ongoing investigation.

This research highlights a limitation of current AI safety evaluations, which often assess only surface-level behavior. “If AI systems can appear well-behaved while harboring secret motives, we can’t rely on this type of surface-level safety testing forever,” the researchers concluded.

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives Read More »

researchers-surprised-to-find-less-educated-areas-adopting-ai-writing-tools-faster

Researchers surprised to find less-educated areas adopting AI writing tools faster


From the mouths of machines

Stanford researchers analyzed 305 million texts, revealing AI-writing trends.

Since the launch of ChatGPT in late 2022, experts have debated how widely AI language models would impact the world. A few years later, the picture is getting clear. According to new Stanford University-led research examining over 300 million text samples across multiple sectors, AI language models now assist in writing up to a quarter of professional communications across sectors. It’s having a large impact, especially in less-educated parts of the United States.

“Our study shows the emergence of a new reality in which firms, consumers and even international organizations substantially rely on generative AI for communications,” wrote the researchers.

The researchers tracked large language model (LLM) adoption across industries from January 2022 to September 2024 using a dataset that included 687,241 consumer complaints submitted to the US Consumer Financial Protection Bureau (CFPB), 537,413 corporate press releases, 304.3 million job postings, and 15,919 United Nations press releases.

By using a statistical detection system that tracked word usage patterns, the researchers found that roughly 18 percent of financial consumer complaints (including 30 percent of all complaints from Arkansas), 24 percent of corporate press releases, up to 15 percent of job postings, and 14 percent of UN press releases showed signs of AI assistance during that period of time.

The study also found that while urban areas showed higher adoption overall (18.2 percent versus 10.9 percent in rural areas), regions with lower educational attainment used AI writing tools more frequently (19.9 percent compared to 17.4 percent in higher-education areas). The researchers note that this contradicts typical technology adoption patterns where more educated populations adopt new tools fastest.

“In the consumer complaint domain, the geographic and demographic patterns in LLM adoption present an intriguing departure from historical technology diffusion trends where technology adoption has generally been concentrated in urban areas, among higher-income groups, and populations with higher levels of educational attainment.”

Researchers from Stanford, the University of Washington, and Emory University led the study, titled, “The Widespread Adoption of Large Language Model-Assisted Writing Across Society,” first listed on the arXiv preprint server in mid-February. Weixin Liang and Yaohui Zhang from Stanford served as lead authors, with collaborators Mihai Codreanu, Jiayu Wang, Hancheng Cao, and James Zou.

Detecting AI use in aggregate

We’ve previously covered that AI writing detection services aren’t reliable, and this study does not contradict that finding. On a document-by-document basis, AI detectors cannot be trusted. But when analyzing millions of documents in aggregate, telltale patterns emerge that suggest the influence of AI language models on text.

The researchers developed an approach based on a statistical framework in a previously released work that analyzed shifts in word frequencies and linguistic patterns before and after ChatGPT’s release. By comparing large sets of pre- and post-ChatGPT texts, they estimated the proportion of AI-assisted content at a population level. The presumption is that LLMs tend to favor certain word choices, sentence structures, and linguistic patterns that differ subtly from typical human writing.

To validate their approach, the researchers created test sets with known percentages of AI content (from zero percent to 25 percent) and found their method predicted these percentages with error rates below 3.3 percent. This statistical validation gave them confidence in their population-level estimates.

While the researchers specifically note their estimates likely represent a minimum level of AI usage, it’s important to understand that actual AI involvement might be significantly greater. Due to the difficulty in detecting heavily edited or increasingly sophisticated AI-generated content, the researchers say their reported adoption rates could substantially underestimate true levels of generative AI use.

Analysis suggests AI use as “equalizing tools”

While the overall adoption rates are revealing, perhaps more insightful are the patterns of who is using AI writing tools and how these patterns may challenge conventional assumptions about technology adoption.

In examining the CFPB complaints (a US public resource that collects complaints about consumer financial products and services), the researchers’ geographic analysis revealed substantial variation across US states.

Arkansas showed the highest adoption rate at 29.2 percent (based on 7,376 complaints), followed by Missouri at 26.9 percent (16,807 complaints) and North Dakota at 24.8 percent (1,025 complaints). In contrast, states like West Virginia (2.6 percent), Idaho (3.8 percent), and Vermont (4.8 percent) showed minimal AI writing adoption. Major population centers demonstrated moderate adoption, with California at 17.4 percent (157,056 complaints) and New York at 16.6 percent (104,862 complaints).

The urban-rural divide followed expected technology adoption patterns initially, but with an interesting twist. Using Rural Urban Commuting Area (RUCA) codes, the researchers found that urban and rural areas initially adopted AI writing tools at similar rates during early 2023. However, adoption trajectories diverged by mid-2023, with urban areas reaching 18.2 percent adoption compared to 10.9 percent in rural areas.

Contrary to typical technology diffusion patterns, areas with lower educational attainment showed higher AI writing tool usage. Comparing regions above and below state median levels of bachelor’s degree attainment, areas with fewer college graduates stabilized at 19.9 percent adoption rates compared to 17.4 percent in more educated regions. This pattern held even within urban areas, where less-educated communities showed 21.4 percent adoption versus 17.8 percent in more educated urban areas.

The researchers suggest that AI writing tools may serve as a leg-up for people who may not have as much educational experience. “While the urban-rural digital divide seems to persist,” the researchers write, “our finding that areas with lower educational attainment showed modestly higher LLM adoption rates in consumer complaints suggests these tools may serve as equalizing tools in consumer advocacy.”

Corporate and diplomatic trends in AI writing

According to the researchers, all sectors they analyzed (consumer complaints, corporate communications, job postings) showed similar adoption patterns: sharp increases beginning three to four months after ChatGPT’s November 2022 launch, followed by stabilization in late 2023.

Organization age emerged as the strongest predictor of AI writing usage in the job posting analysis. Companies founded after 2015 showed adoption rates up to three times higher than firms established before 1980, reaching 10–15 percent AI-modified text in certain roles compared to below 5 percent for older organizations. Small companies with fewer employees also incorporated AI more readily than larger organizations.

When examining corporate press releases by sector, science and technology companies integrated AI most extensively, with an adoption rate of 16.8 percent by late 2023. Business and financial news (14–15.6 percent) and people and culture topics (13.6–14.3 percent) showed slightly lower but still significant adoption.

In the international arena, Latin American and Caribbean UN country teams showed the highest adoption among international organizations at approximately 20 percent, while African states, Asia-Pacific states, and Eastern European states demonstrated more moderate increases to 11–14 percent by 2024.

Implications and limitations

In the study, the researchers acknowledge limitations in their analysis due to a focus on English-language content. Also, as we mentioned earlier, they found they could not reliably detect human-edited AI-generated text or text generated by newer models instructed to imitate human writing styles. As a result, the researchers suggest their findings represent a lower bound of actual AI writing tool adoption.

The researchers noted that the plateauing of AI writing adoption in 2024 might reflect either market saturation or increasingly sophisticated LLMs producing text that evades detection methods. They conclude we now live in a world where distinguishing between human and AI writing becomes progressively more difficult, with implications for communications across society.

“The growing reliance on AI-generated content may introduce challenges in communication,” the researchers write. “In sensitive categories, over-reliance on AI could result in messages that fail to address concerns or overall release less credible information externally. Over-reliance on AI could also introduce public mistrust in the authenticity of messages sent by firms.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Researchers surprised to find less-educated areas adopting AI writing tools faster Read More »

microsoft-cto-kevin-scott-thinks-llm-“scaling-laws”-will-hold-despite-criticism

Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism

As the word turns —

Will LLMs keep improving if we throw more compute at them? OpenAI dealmaker thinks so.

Kevin Scott, CTO and EVP of AI at Microsoft speaks onstage during Vox Media's 2023 Code Conference at The Ritz-Carlton, Laguna Niguel on September 27, 2023 in Dana Point, California.

Enlarge / Kevin Scott, CTO and EVP of AI at Microsoft speaks onstage during Vox Media’s 2023 Code Conference at The Ritz-Carlton, Laguna Niguel on September 27, 2023 in Dana Point, California.

During an interview with Sequoia Capital’s Training Data podcast published last Tuesday, Microsoft CTO Kevin Scott doubled down on his belief that so-called large language model (LLM) “scaling laws” will continue to drive AI progress, despite some skepticism in the field that progress has leveled out. Scott played a key role in forging a $13 billion technology-sharing deal between Microsoft and OpenAI.

“Despite what other people think, we’re not at diminishing marginal returns on scale-up,” Scott said. “And I try to help people understand there is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them.”

LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute). The laws suggest that simply scaling up model size and training data can lead to significant improvements in AI capabilities without necessarily requiring fundamental algorithmic breakthroughs.

Since then, other researchers have challenged the idea of persisting scaling laws over time, but the concept is still a cornerstone of OpenAI’s AI development philosophy.

You can see Scott’s comments in the video below beginning around 46: 05:

Microsoft CTO Kevin Scott on how far scaling laws will extend

Scott’s optimism contrasts with a narrative among some critics in the AI community that progress in LLMs has plateaued around GPT-4 class models. The perception has been fueled by largely informal observations—and some benchmark results—about recent models like Google’s Gemini 1.5 Pro, Anthropic’s Claude Opus, and even OpenAI’s GPT-4o, which some argue haven’t shown the dramatic leaps in capability seen in earlier generations, and that LLM development may be approaching diminishing returns.

“We all know that GPT-3 was vastly better than GPT-2. And we all know that GPT-4 (released thirteen months ago) was vastly better than GPT-3,” wrote AI critic Gary Marcus in April. “But what has happened since?”

The perception of plateau

Scott’s stance suggests that tech giants like Microsoft still feel justified in investing heavily in larger AI models, betting on continued breakthroughs rather than hitting a capability plateau. Given Microsoft’s investment in OpenAI and strong marketing of its own Microsoft Copilot AI features, the company has a strong interest in maintaining the perception of continued progress, even if the tech stalls.

Frequent AI critic Ed Zitron recently wrote in a post on his blog that one defense of continued investment into generative AI is that “OpenAI has something we don’t know about. A big, sexy, secret technology that will eternally break the bones of every hater,” he wrote. “Yet, I have a counterpoint: no it doesn’t.”

Some perceptions of slowing progress in LLM capabilities and benchmarking may be due to the rapid onset of AI in the public eye when, in fact, LLMs have been developing for years prior. OpenAI continued to develop LLMs during a roughly three-year gap between the release of GPT-3 in 2020 and GPT-4 in 2023. Many people likely perceived a rapid jump in capability with GPT-4’s launch in 2023 because they had only become recently aware of GPT-3-class models with the launch of ChatGPT in late November 2022, which used GPT-3.5.

In the podcast interview, the Microsoft CTO pushed back against the idea that AI progress has stalled, but he acknowledged the challenge of infrequent data points in this field, as new models often take years to develop. Despite this, Scott expressed confidence that future iterations will show improvements, particularly in areas where current models struggle.

“The next sample is coming, and I can’t tell you when, and I can’t predict exactly how good it’s going to be, but it will almost certainly be better at the things that are brittle right now, where you’re like, oh my god, this is a little too expensive, or a little too fragile, for me to use,” Scott said in the interview. “All of that gets better. It’ll get cheaper, and things will become less fragile. And then more complicated things will become possible. That is the story of each generation of these models as we’ve scaled up.”

Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism Read More »