Author name: Mike M.

a-doge-recruiter-is-staffing-a-project-to-deploy-ai-agents-across-the-us-government

A DOGE recruiter is staffing a project to deploy AI agents across the US government


“does it still require Kremlin oversight?

A startup founder said that AI agents could do the work of tens of thousands of government employees.

An aide sets up a poster depicting the logo for the DOGE Caucus before a news conference in Washington, DC. Credit: Andrew Harnik/Getty Images

A young entrepreneur who was among the earliest known recruiters for Elon Musk’s so-called Department of Government Efficiency (DOGE) has a new, related gig—and he’s hiring. Anthony Jancso, cofounder of AcclerateX, a government tech startup, is looking for technologists to work on a project that aims to have artificial intelligence perform tasks that are currently the responsibility of tens of thousands of federal workers.

Jancso, a former Palantir employee, wrote in a Slack with about 2000 Palantir alumni in it that he’s hiring for a “DOGE orthogonal project to design benchmarks and deploy AI agents across live workflows in federal agencies,” according to an April 21 post reviewed by WIRED. Agents are programs that can perform work autonomously.

We’ve identified over 300 roles with almost full-process standardization, freeing up at least 70k FTEs for higher-impact work over the next year,” he continued, essentially claiming that tens of thousands of federal employees could see many aspects of their job automated and replaced by these AI agents. Workers for the project, he wrote, would be based on site in Washington, DC, and would not require a security clearance; it isn’t clear for whom they would work. Palantir did not respond to requests for comment.

The post was not well received. Eight people reacted with clown face emojis, three reacted with a custom emoji of a man licking a boot, two reacted with custom emoji of Joaquin Phoenix giving a thumbs down in the movie Gladiator, and three reacted with a custom emoji with the word “Fascist.” Three responded with a heart emoji.

“DOGE does not seem interested in finding ‘higher impact work’ for federal employees,” one person said in a comment that received 11 heart reactions. “You’re complicit in firing 70k federal employees and replacing them with shitty autocorrect.”

“Tbf we’re all going to be replaced with shitty autocorrect (written by chatgpt),” another person commented, which received one “+1” reaction.

“How ‘DOGE orthogonal’ is it? Like, does it still require Kremlin oversight?” another person said in a comment that received five reactions with a fire emoji. “Or do they just use your credentials to log in later?”

AccelerateX was originally called AccelerateSF, which VentureBeat reported in 2023 had received support from OpenAI and Anthropic. In its earliest incarnation, AccelerateSF hosted a hackathon for AI developers aimed at using the technology to solve San Francisco’s social problems. According to a 2023 Mission Local story, for instance, Jancso proposed that using large language models to help businesses fill out permit forms to streamline the construction paperwork process might help drive down housing prices. (OpenAI did not respond to a request for comment. Anthropic spokesperson Danielle Ghiglieri tells WIRED that the company “never invested in AccelerateX/SF,” but did sponsor a hackathon AccelerateSF hosted in 2023 by providing free access to its API usage at a time when its Claude API “was still in beta.”)

In 2024, the mission pivoted, with the venture becoming known as AccelerateX. In a post on X announcing the change, the company posted, “Outdated tech is dragging down the US Government. Legacy vendors sell broken systems at increasingly steep prices. This hurts every American citizen.” AccelerateX did not respond to a request for comment.

According to sources with direct knowledge, Jancso disclosed that AccelerateX had signed a partnership agreement with Palantir in 2024. According to the LinkedIn of someone described as one of AccelerateX’s cofounders, Rachel Yee, the company looks to have received funding from OpenAI’s Converge 2 Accelerator. Another of AccelerateSF’s cofounders, Kay Sorin, now works for OpenAI, having joined the company several months after that hackathon. Sorin and Yee did not respond to requests for comment.

Jancso’s cofounder, Jordan Wick, a former Waymo engineer, has been an active member of DOGE, appearing at several agencies over the past few months, including the Consumer Financial Protection Bureau, National Labor Relations Board, the Department of Labor, and the Department of Education. In 2023, Jancso attended a hackathon hosted by ScaleAI; WIRED found that another DOGE member, Ethan Shaotran, also attended the same hackathon.

Since its creation in the first days of the second Trump administration, DOGE has pushed the use of AI across agencies, even as it has sought to cut tens of thousands of federal jobs. At the Department of Veterans Affairs, a DOGE associate suggested using AI to write code for the agency’s website; at the General Services Administration, DOGE has rolled out the GSAi chatbot; the group has sought to automate the process of firing government employees with a tool called AutoRIF; and a DOGE operative at the Department of Housing and Urban Development is using AI tools to examine and propose changes to regulations. But experts say that deploying AI agents to do the work of 70,000 people would be tricky if not impossible.

A federal employee with knowledge of government contracting, who spoke to WIRED on the condition of anonymity because they were not authorized to speak to the press, says, “A lot of agencies have procedures that can differ widely based on their own rules and regulations, and so deploying AI agents across agencies at scale would likely be very difficult.”

Oren Etzioni, cofounder of the AI startup Vercept, says that while AI agents can be good at doing some things—like using an internet browser to conduct research—their outputs can still vary widely and be highly unreliable. For instance, customer service AI agents have invented nonexistent policies when trying to address user concerns. Even research, he says, requires a human to actually make sure what the AI is spitting out is correct.

“We want our government to be something that we can rely on, as opposed to something that is on the absolute bleeding edge,” says Etzioni. “We don’t need it to be bureaucratic and slow, but if corporations haven’t adopted this yet, is the government really where we want to be experimenting with the cutting edge AI?”

Etzioni says that AI agents are also not great 1-1 fits for job replacements. Rather, AI is able to do certain tasks or make others more efficient, but the idea that the technology could do the jobs of 70,000 employees would not be possible. “Unless you’re using funny math,” he says, “no way.”

Jancso, first identified by WIRED in February, was one of the earliest recruiters for DOGE in the months before Donald Trump was inaugurated. In December, Jancso, who sources told WIRED said he had been recruited by Steve Davis, president of the Musk-founded Boring Company and a current member of DOGE, used the Palantir alumni group to recruit DOGE members. On December 2nd, 2024, he wrote, “I’m helping Elon’s team find tech talent for the Department of Government Efficiency (DOGE) in the new admin. This is a historic opportunity to build an efficient government, and to cut the federal budget by 1/3. If you’re interested in playing a role in this mission, please reach out in the next few days.”

According to one source at SpaceX, who asked to remain anonymous as they are not authorized to speak to the press, Jancso appeared to be one of the DOGE members who worked out of the company’s DC office in the days before inauguration along with several other people who would constitute some of DOGE’s earliest members. SpaceX did not respond to a request for comment.

Palantir was cofounded by Peter Thiel, a billionaire and longtime Trump supporter with close ties to Musk. Palantir, which provides data analytics tools to several government agencies including the Department of Defense and the Department of Homeland Security, has received billions of dollars in government contracts. During the second Trump administration, the company has been involved in helping to build a “mega API” to connect data from the Internal Revenue Service to other government agencies, and is working with Immigration and Customs Enforcement to create a massive surveillance platform to identify immigrants to target for deportation.

This story originally appeared at WIRED.com.

Photo of WIRED

Wired.com is your essential daily guide to what’s next, delivering the most original and complete take you’ll find anywhere on innovation’s impact on technology, science, business and culture.

A DOGE recruiter is staffing a project to deploy AI agents across the US government Read More »

texas-goes-after-toothpaste-in-escalating-fight-over-fluoride

Texas goes after toothpaste in escalating fight over fluoride

Texas Attorney General Ken Paxton is investigating two leading toothpaste makers over their use of fluoride, suggesting that they are “illegally marketing” the teeth cleaners to parents and kids “in ways that are misleading, deceptive, and dangerous.”

The toothpaste makers in the crosshairs are Colgate-Palmolive Company, maker of Colgate toothpastes, and Proctor & Gamble Manufacturing Co., which makes Crest toothpastes. In an announcement Thursday, Paxton said he has sent Civil Investigative Demands (CIDs) to the companies.

The move is an escalation in an ongoing battle over fluoride, which effectively prevents dental cavities and improves oral health. Community water fluoridation has been hailed by health and dental experts as one of the top 10 great public health interventions for advancing oral health across communities, regardless of age, education, or income. But, despite the success, fluoride has always had detractors—from conspiracy theorists in the past suggesting the naturally occurring mineral is a form of communist mind control, to more recent times, in which low-quality, controversial studies have suggested that high doses may lower IQ in children.

The debate was renewed earlier this year when the National Toxicology Program at the National Institute of Environmental Health Sciences finally published a particularly contentious study after years of failed scientific reviews. The study claims to find a link between high levels of fluoride exposure and slightly lower IQs in children living in areas outside the US, mostly in China and India. But the study’s methodology, statistical rigor, risk of bias, and lack of data transparency continue to draw criticism.

Texas goes after toothpaste in escalating fight over fluoride Read More »

trump’s-2026-budget-proposal:-crippling-cuts-for-science-across-the-board

Trump’s 2026 budget proposal: Crippling cuts for science across the board


Budget document derides research and science-based policy as “woke,” “scams.”

On Friday, the US Office of Management and Budget sent Sen. Susan Collins (R-Maine), chair of the Senate’s Appropriations Committee, an outline of what to expect from the Trump administration’s 2026 budget proposal. As expected, the budget includes widespread cuts, affecting nearly every branch of the federal government.

In keeping with the administration’s attacks on research agencies and the places research gets done, research funding will be taking an enormous hit, with the National Institutes of Health taking a 40 percent cut and the National Science Foundation losing 55 percent of its 2025 budget. But the budget goes well beyond those highlighted items, with nearly every place science gets done or funded targeted for cuts.

Perhaps even more shocking is the language used to justify the cuts, which reads more like a partisan rant than a serious budget document.

Health cuts

Having a secretary of Health and Human Services who doesn’t believe in germ theory is not likely to do good things for US health programs, and the proposed budget will only make matters worse. Kennedy’s planned MAHA (Make America Healthy Again) program would be launched with half a billion in funds, but nearly everything else would take a cut.

The CDC would lose about $3.6 billion from its current budget of $9.6 billion, primarily due to the shuttering of a number of divisions within it: the National Center for Chronic Diseases Prevention and Health Promotion, the National Center for Environmental Health, the National Center for Injury Prevention and Control, and the Global Health Center and its division of Public Health Preparedness and Response. The duties of those offices are, according to the budget document, “duplicative, DEI, or simply unnecessary.”

Another big hit to HHS comes from the termination of a $4 billion program that helps low-income families cover energy costs. The OMB suggests that these costs will get lower due to expanded energy production and, anyway, the states should be paying for it. Shifting financial burdens to states is a general theme of the document, an approach that will ultimately hit the poorest states hardest, even though these had very high percentages of Trump voters.

The document also says that “This Administration is committed to combatting the scourge of deadly drugs that have ravaged American communities,” while cutting a billion dollars from substance abuse programs within HHS.

But the headline cuts come from the National Institutes of Health, the single largest source of scientific funding in the world. NIH would see its current $48 billion budget chopped by $18 billion and its 27 individual institutes consolidated down to just five. This would result in vast cutbacks to US biomedical research, which is currently acknowledged to be world-leading. Combined with planned cuts to grant overheads, it will cause most research institutions to shrink, and some less well-funded universities may be forced to close facilities.

The justification for the cuts is little more than a partisan rant: “NIH has broken the trust of the American people with wasteful spending, misleading information, risky research, and the promotion of dangerous ideologies that undermine public health.” The text then implies that the broken trust is primarily the product of failing to promote the idea that SARS-CoV-2 originated in a lab, even though there’s no scientific evidence to indicate that it had.

Climate research hit

The National Science Foundation funds much of the US’s fundamental science research, like physics and astronomy. Earlier reporting that it would see a 56 percent cut to its budget was confirmed. “The Budget cuts funding for: climate; clean energy; woke social, behavioral, and economic sciences; and programs in low priority areas of science.” Funding would be maintained for AI and quantum computing. All funding for encouraging minority participation in the sciences will also be terminated. The budget was released on the same day that the NSF announced it was joining other science agencies in standardizing on paying 15 percent of its grants’ value for maintaining facilities and providing services to researchers, a cut that would further the financial damage to research institutions.

The National Oceanic and Atmospheric Administration would see $1.3 billion of its $6.6 billion budget cut, with the primary target being its climate change work. In fact, the budget for NOAA’s weather satellites will be cut to prevent them from including instruments that would make “unnecessary climate measurements.” Apparently, the Administration doesn’t want anyone to be exposed to data that might challenge its narrative that climate change is a scam.

The National Institute of Standards and Technology would lose $350 million for similar reasons. “NIST has long funded awards for the development of curricula that advance a radical climate agenda,” the document suggests, before going on to say that the Institute’s Circular Economy Program, which promotes the efficient reuse of industrial materials, “pushes environmental alarmism.”

The Department of Energy is seeing a $1.1 billion hit to its science budget, “eliminating funding for Green New Scam interests and climate change-related activities.” The DOE will also take hits to policy programs focused on climate change, including $15 billion in cuts to renewable energy and carbon capture spending. Separately, the Office of Energy Efficiency and Renewable Energy will also take a $2.6 billion hit. Over at the Department of the Interior, the US Geological Survey would see its renewable energy programs terminated, as well.

Some of the DOE’s other cuts, however, don’t even make sense given the administration’s priorities. The newly renamed Office of Fossil Energy—something that Trump favors—will still take a $270 million hit, and nuclear energy programs will see $400 million in cuts.

This sort of lack of self-awareness shows up several times in the document. In one striking case, an interior program funding water infrastructure improvements is taking a cut that “reduces funding for programs that have nothing to do with building and maintaining water infrastructure, such as habitat restoration.” Apparently, the OMB is unaware that functioning habitats can help provide ecosystem services that can reduce the need for water infrastructure.

Similarly, over at the EPA, they’re boosting programs for clean drinking water by $36 million, while at the same time cutting loans to states for clean water projects by $2.5 billion. “The States should be responsible for funding their own water infrastructure projects,” the OMB declares. Research at the EPA also takes a hit: “The Budget puts an end to unrestrained research grants, radical environmental justice work, woke climate research, and skewed, overly-precautionary modeling that influences regulations—none of which are authorized by law.”

An attack on scientific infrastructure

US science couldn’t flourish without an educational system that funnels talented individuals into graduate programs. So, naturally, funding for those is being targeted as well. This is partially a function of the administration’s intention to eliminate the Department of Education, but there also seems to be a specific focus on programs that target low-income individuals.

For example, the GEAR UP program describes itself as “designed to increase the number of low-income students who are prepared to enter and succeed in postsecondary education.” The OMB document describes it as “a relic of the past when financial incentives were needed to motivate Institutions of Higher Education to engage with low-income students and increase access.” It goes on to claim that this is “not the obstacle it was for students of limited means.”

Similarly, the SEOG program funding is “awarded to an undergraduate student who demonstrates exceptional financial need.” In the OMB’s view, colleges and universities “have used [it] to fund radical leftist ideology instead of investing in students and their success.” Another cut is claimed to eliminate “Equity Assistance Centers that have indoctrinated children.” And “The Budget proposes to end Federal taxpayer dollars being weaponized to indoctrinate new teachers.”

In addition, the federal work-study program, which subsidizes on-campus jobs for needy students, is also getting a billion-dollar cut. Again, the document says that the states can pay for it.

(The education portion also specifically cuts the funding of Howard University, which is both distinct as a federally supported Black university and also notable as being where Kamala Harris got her first degree.)

The end of US leadership

This budget is a recipe for ending the US’s leadership in science. It would do generational damage by forcing labs to shut down, with a corresponding loss of highly trained individuals and one-of-a-kind research materials. At the same time, it will throttle the educational pipeline that could eventually replace those losses. Given that the US is one of the major sources of research funding in the world, if approved, the budget will have global consequences.

To the people within the OMB who prepared the document, these are not losses. The document makes it very clear that they view many instances of scientific thought and evidence-based policy as little more than forms of ideological indoctrination, presumably because the evidence sometimes contradicts what they’d prefer to believe.

Photo of John Timmer

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

Trump’s 2026 budget proposal: Crippling cuts for science across the board Read More »

some-flies-go-insomniac-to-ward-off-parasites

Some flies go insomniac to ward off parasites

Those genes associated with metabolism were upregulated, meaning they showed an increase in activity. An observed loss of body fat and protein reserves was evidently a trade-off for resistance to mites. This suggests there was increased lipolysis, or the breakdown of fats, and proteolysis, the breakdown of proteins, in resistant lines of flies.

Parasite paranoia

The depletion of nutrients could make fruit flies less likely to survive even without mites feeding off them, but their tenaciousness when it comes to staying up through the night suggests that being parasitized by mites is still the greater risk. Because mite-resistant flies did not sleep, their oxygen consumption and activity also increased during the night to levels no different from those of control group flies during the day.

Keeping mites away involves moving around so the fly can buzz off if mites crawl too close. Knowing this, Benoit wanted to see what would happen if the resistant flies’ movement was restricted. It was doom. When the flies were restrained, the mite-resistant flies were as susceptible to mites as the controls. Activity alone was important for resisting mites.

Since mites are ectoparasites, or external parasites (as opposed to internal parasites like tapeworms), potential hosts like flies can benefit from hypervigilance. Sleep is typically beneficial to a host invaded by an internal parasite because it increases the immune response. Unfortunately for the flies, sleeping would only make them an easy meal for mites. Keeping both stereoscopic eyes out for an external parasite means there is no time left for sleep.

“The pattern of reduced sleep likely allows the flies to be more responsive during encounters with mites during the night,” the researchers said in their study, which was recently published in Biological Timing and Sleep. “There could be differences in sleep occurring during the day, but these differences may be less important as D. melanogaster sleeps much less during the day.”

Fruit flies aren’t the only creatures with sleep patterns that parasites disrupt. Evidence of shifts in sleep and rest in birds and bats has been shown to happen when there is a risk of parasitism after dark. For the flies, exhaustion has the upside of better fertility if they manage to avoid bites, so a mate must be worth all those sleepless nights.

Biological Timing and Sleep, 2025.  DOI: 10.1038/s44323-025-00031-7

Some flies go insomniac to ward off parasites Read More »

ai-#114:-liars,-sycophants-and-cheaters

AI #114: Liars, Sycophants and Cheaters

Gemini 2.5 Pro is sitting in the corner, sulking. It’s not a liar, a sycophant or a cheater. It does excellent deep research reports. So why does it have so few friends? The answer, of course, is partly because o3 is still more directly useful more often, but mostly because Google Fails Marketing Forever.

Whereas o3 is a Lying Liar, GPT-4o is an absurd sycophant (although that got rolled back somewhat), and Sonnet 3.7 is a savage cheater that will do whatever it takes to make the tests technically pass and the errors go away.

There’s real harm here, at least in the sense that o3 and Sonnet 3.7 (and GPT-4o) are a lot less useful than they would be if you could trust them as much as Gemini 2.5 Pro. It’s super annoying.

It’s also indicative of much bigger problems down the line. As capabilities increase and more RL is done, it’s only going to get worse.

All the things that we’ve been warned are waiting to kill us are showing up ahead of schedule, in relatively harmless and remarkably easy to spot form. Indeed, they are now showing up with such obviousness and obnoxiousness that even the e/acc boys gotta shout, baby’s misaligned.

Also, the White House is asking for comments on how they should allocate their AI R&D resources. Make sure to let them know what you think.

(And as always, there’s lots of other stuff too.)

  1. Language Models Offer Mundane Utility. Activate ultrathink mode.

  2. Language Models Don’t Offer Mundane Utility. Still not trying it, huh?

  3. We’re Out of Deep Research. It turns out my research isn’t that deep after all.

  4. o3 Is a Lying Liar. This is proving remarkably annoying in practice.

  5. GPT-4o was an Absurd Sycophant. Don’t laugh it off.

  6. Sonnet 3.7 is a Savage Cheater. Cheat, cheat, cheat, cheat, cheat.

  7. Unprompted Suggestions. You’re going to have to work for it.

  8. Huh, Upgrades. ChatGPT goes shopping.

  9. On Your Marks. AIs are already far more persuasive than the human baseline.

  10. Man in the Arena. A paper investigates the fall of the Chatbot Arena.

  11. Choose Your Fighter. Gemini 2.5 Deep Research continues to get good reviews.

  12. Deepfaketown and Botpocalypse Soon. AI on the radio.

  13. Lol We’re Meta. What do you mean you don’t want our AI companions?

  14. They Took Our Jobs. So far the impacts look small in absolute terms.

  15. Fun With Media Generation. Progress continues, even where I don’t pay attention.

  16. Get Involved. White House seeks feedback on its R&D investments.

  17. Introducing. The Anthropic Economic Advisory Council.

  18. In Other AI News. Cursor, Singapore Airlines, Grindr, DeepSeek-Prover-2.

  19. The Mask Comes Off. Transcript of a very good Rob Wiblin thread on OpenAI.

  20. Show Me the Money. Chips Act and earnings week.

  21. Quiet Speculations. AI predictions from genetics? From data center revenue?

  22. The Quest for Sane Regulations. The concept of an intelligence curse.

  23. The Week in Audio. Dwarkesh talks to Zuckerberg, I talk to Aaronson.

  24. Rhetorical Innovation. Okay, fine, we’ll tap the sign.

  25. You Can Just Do Math. You’d be amazed what is legal in California.

  26. Taking AI Welfare Seriously. Opinions differ on how seriously is appropriate.

  27. Gemini 2.5 Pro System Card Watch. We finally have it. There are no surprises.

  28. Aligning a Smarter Than Human Intelligence is Difficult. Tread carefully.

  29. People Are Worried About AI Killing Everyone. o3 itself? Is that people now?

  30. Other People Are Not As Worried About AI Killing Everyone. They’re at OpenAI.

  31. The Lighter Side. Introducing the candidates.

Remember to ask Claude Cade for the ‘ultrathink’ package so it uses the maximum number of possible tokens, 32k. Whereas megathink is only 10k, and think alone is a measly 4k.

Andrej Karpathy guide to how he codes when it actually matters.

Andrej Karpathy: Noticing myself adopting a certain rhythm in AI-assisted coding (i.e. code I actually and professionally care about, contrast to vibe code).

  1. Stuff everything relevant into context (this can take a while in big projects. If the project is small enough just stuff everything e.g. `files-to-prompt . -e ts -e tsx -e css -e md –cxml –ignore node_modules -o prompt.xml`)

  2. Describe the next single, concrete incremental change we’re trying to implement. Don’t ask for code, ask for a few high-level approaches, pros/cons. There’s almost always a few ways to do thing and the LLM’s judgement is not always great. Optionally make concrete.

  3. Pick one approach, ask for first draft code.

  4. Review / learning phase: (Manually…) pull up all the API docs in a side browser of functions I haven’t called before or I am less familiar with, ask for explanations, clarifications, changes, wind back and try a different approach.

  5. Test.

  6. Git commit.

Ask for suggestions on what we could implement next. Repeat.

Something like this feels more along the lines of the inner loop of AI-assisted development. The emphasis is on keeping a very tight leash on this new over-eager junior intern savant with encyclopedic knowledge of software, but who also bullshits you all the time, has an over-abundance of courage and shows little to no taste for good code. And emphasis on being slow, defensive, careful, paranoid, and on always taking the inline learning opportunity, not delegating. Many of these stages are clunky and manual and aren’t made explicit or super well supported yet in existing tools. We’re still very early and so much can still be done on the UI/UX of AI assisted coding.

In a post centrally advocating for living and communicating mostly online, Tyler Cowen uses 20-30 AI queries a day as a baseline rate of use one might be concerned about. I was surprised that his number here was so low. Maybe I’m not such a light user after all.

Yana Welinder uses AI to cure her migraines, via estrogen patches.

Dramatically lower the learning curve on getting a good Emacs setup.

As always, there is one supreme way not to get utility, which is not to use AI at all.

Kelsey Piper: Whenever I talk to someone who is skeptical of all this AI fuss it turns out they haven’t extensively used an AI model in about a year, when they played with ChatGPT and it was kind of meh.

Policymakers are particularly guilty of this. They have a lot on their plate. They checked out all this AI stuff at some point and then they implicitly assume it hasn’t changed that much since they looked into it – but it has, because it does just about every month.

Zitron boggles at the idea that OpenAI might, this year, make significant revenue off a shopping app. Meanwhile a bunch of my most recent LLM queries are looking for specific product recommendations. That there’s a 100B market there doesn’t strain credulity at all.

Dave Karsten: This is VERY true in DC, incidentally. (If you want to give a graceful line of retreat, instead ask them if they’ve used Anthropic’s latest– the “used ChatGPT 18 months ago” folks usually haven’t, and usually think that Gemini 2.5 is what Google shows in search, so, Claude.)

Free advice for @AnthropicAI — literally give away Claude subscriptions to every single undergrad at Georgetown, GWU, American, George Mason, GMU, UMD, UVA, Virginia Tech, etc. to saturate the incoming staffer base with more AGI understanding.

I strongly endorse Dave’s advice here. It would also straight up be great marketing.

I was going to address Ed Zitron’s post about how generative AI is doomed and useless, but even for me it was too long, given he’d made similar arguments before, and it was also full of absurdly false statements about the utility of existing products so I couldn’t even. But I include the link for completeness.

Where have all the genuinely new applications gone? There’s a strange lull there.

Sully: usually with new model capabilities we see a wave of new products/ideas

but it definitely feels like we’ve hit a saturation point with ai products

some areas have very clear winners while everything else is either:

  1. up for grabs

  2. the same ideas repeated

OpenAI is rolling out higher quotas for deep research, including a lightweight version for the free tier. But I never use it anymore. Not that I would literally never use it, but it never seems like the right tool, I’d much rather ask o3. I don’t want it going off for 20 minutes and coming back with a bunch of slop I have to sort through, I want an answer fast enough I don’t have to contact switch and then to ask follow-ups.

Nick Cammarata: my friends are using o3 a lot, most of them hours a day, maybe 5x what it was a month ago?

im ignoring coding use cases bc they feel different to me. if we include them they use them a ton but thats been the case for the last year

I think this is bc they were the kind of people who would love deep research but the 20m wait was too much, and o3 unblocked them

feels like Her-ification is kind of here, but not evenly distributed, mostly over texting not audio, and with a wikipedia-vibes personality?

Gallabytes: it wasn’t just latency, deep research reports were also reliably too long, and not conversational. I get deep-research depth but with myself in the loop to

a) make sure I’m keeping up & amortize reading

b) steer towards more interesting lines of inquiry

Xeophon: I would like even longer deep research wait times. I feel like there’s two good wait times for AI apps: <1 min and >30 mins

Gallabytes: 1 minute is a sweet spot for sure, I’d say there’s sweet spots at

– 5 seconds

– 1 minute (WITH some kind of visible progress)

– 10 minutes (tea break!)

– 1h (lunch break!)

– 12h (I’ll check on this tomorrow)

I definitely had the sense that the Deep Research pause was in a non-sweet spot. It was long enough to force a context switch. Either it needed to be shorter, or it might as well be longer. Whereas o3 is the ten times better that Just Works, except for the whole Lying Liar problem.

I’ve seen complaints that there are no good benchmarks for robotics. This doesn’t seem like that hard a thing to create benchmarks for?

Stewart Brand: I like Brian Potter’s list of tests for robot dexterity. (Link here)

Here is a list of 21 dexterously demanding tasks that are relatively straightforward for a human to do, but I think would be extremely difficult for a robot to accomplish.

  1. Put on a pair of latex gloves

  2. Tie two pieces of string together in a tight knot, then untie the knot

  3. Turn to a specific page in a book

  4. Pull a specific object, and only that object, out of a pocket in a pair of jeans

  5. Bait a fishhook with a worm

  6. Open a childproof medicine bottle, and pour out two (and only two) pill

  7. Make a peanut butter and jelly sandwich, starting with an unopened bag of bread and unopened jars of peanut butter and jelly

  8. Act as the dealer in a poker game (shuffling the cards, dealing them out, gathering them back up when the hand is over)

  9. Assemble a mechanical watch

  10. Peel a piece of scotch tape off of something

  11. Braid hair

  12. Roll a small lump of playdough into three smaller balls

  13. Peel an orange

  14. Score a point in ladder golf

  15. Replace an electrical outlet in a wall with a new one

  16. Do a cat’s cradle with a yo-yo

  17. Put together a plastic model, including removing the parts from the plastic runners

  18. Squeeze a small amount of toothpaste onto a toothbrush from a mostly empty tube

  19. Start a new roll of toilet paper without ripping the first sheet

  20. Open a ziplock bag of rice, pull out a single grain, then reseal the bag

  21. Put on a necklace with a clasp

Ben Hoffman: Canst thou bait a fishhook with a worm? or tie two pieces of string together in a tight knot, and having tied them, loose them? Where wast thou when the electrical outlets were installed, and the power was switched on? Canst thou replace an electrical wall outlet with a new one?

Hast thou entered into the storehouses of scotch tape, or peeled it off of something without tearing? Canst thou do a cat’s cradle, playing with a yo-yo like a bird? or wilt thou braid hair for thy maidens?

Wilt thou make a peanut butter and jelly sandwich starting with an unopened bag of bread and unopened jars of peanut butter and jelly? wilt thou squeeze toothpaste from a mostly empty tube?

Shall thy companions bid thee peel an orange without breaking it? shall they number thee among the mechanics, the assemblers of watches? Wilt thou act as the dealer in a poker game? Canst thou shuffle the cards thereof? and when the hand is over, gather them in again?

Canst thou open a ziplock bag of rice, pull out a single grain, then reseal the bag? or open a childproof medicine bottle, and pour out two (and only two) pills? Hast thou clasped thy necklace? Canst thou sever plastic model parts from their runners, and put them together again?

Another fun example:

Santi Ruiz: I don’t love how much o3 lies.

Irvin McCullough: Random personal benchmark, I ask new models questions about the intelligence whistleblowing system & o3 hallucinated a memo I wrote for a congressional committee, sourced it to a committee staffer’s “off-record email.”

Everyone who uses o3 for long enough realizes it is a lying liar, and I’ve reported on it extensively already. Here is the LessWrong version, which has some good comments.

Daniel Kokotajlo: Hallucination was a bad term because it sometimes included lies and sometimes included… well, something more like hallucinations. i.e. cases where the model itself seemed to actually believe what it was saying, or at least not be aware that there was a problem with what it was saying.

Whereas in these cases it’s clear that the models know the answer they are giving is not what we wanted and they are doing it anyway.

Also on this topic:

Thomas Woodside (Referring to Apollo’s report from the o3 system card): Glad this came out. A lot of people online seem to also be observing this behavior and I’ve noticed it in regular use. Funny way for OpenAI to phrase it

Marius Hobbhahn: o3 and Sonnet-3.7 were the first models where I very saliently feel the unintended side-effects of high-compute RL.

a) much more agentic and capable

b) “just want to complete tasks at all costs”

c) more willing to strategically make stuff up to “complete” the task.

Maruis Hobbhahn: On top of that, I also feel like “hallucinations” might not be the best description for some of these failure modes.

The model often strategically invents false evidence. It feels much less innocent than “hallucinations”.

Another fun example. I demand a plausible lie!

Actually I don’t, blatant lies are the best kind.

Albert Didriksen: So, I asked ChatGPT o3 what my chances are as an alternate Fulbright candidate to be promoted to a stipend recipient. It stated that around 1/3 of alternate candidates are promoted.

When I asked for sources, it cited (among other things) private chats and in-person Q&As).

Here’s an interesting benefit:

Davidad: One unanticipated side benefit of becoming hyperattuned to signals of LLM deception is that I can extract much more “research taste” alpha from Gemini 2.5 Pro by tracking how deceptive/genuine it is when it validates my opinions about various ideas, and then digging in.

Unlike some other frontier LLMs, Gemini 2.5 Pro cares enough about honesty that it’s exceptionally rare for it to actually say a statement that *unambiguouslyparses as a proposition that its world-model doesn’t endorse, and this leaves detectable traces in its verbal patterns.

Here’s a specific example. For years I have been partial to the Grandis-Paré approach to higher category theory with cubical cells, rather than the globular approach.

Discussing this, Gemini 2.5 Pro kept saying things like “for the applications you have in mind, the cubical approach may be a better fit in some ways, although the globular approach also has some advantages” (I’m somewhat exaggerating to make it more detectable to other readers).

Only after calling this out, Gemini 2.5 Pro offered this: while both can be encoded in the other, encoding cubical into globular is like adding structural cells, whereas encoding globular into cubical is like discarding unnecessary structure—so globular is a more apt foundation.

Basically, I now think I was wrong and @amar_hh was right all along, and if I weren’t sensitive to these verbal patterns, I would have missed out on this insight.

Humans worth talking to mostly do the same thing that Gemini is doing here. The exact way things are worded tells you what they’re actually thinking, often very intentionally so, in ways that keep things polite and deniable but that are unmistakable if you’re listening. Often you’re being told the house is on fire.

Here’s a lot of what the recent GPT-4o incident was an alarm bell for:

Herbie Bradley: The OAI sycophancy incident reveals a hidden truth:

character is a desirable thing for any AI assistant/agent yet trying to make the “most appealing character” inevitably leads to an attention trap

a local minima that denies our potential.

Any optimization process targeted at avg person’s preferences for an AI assistant will tend towards sycophancy: our brains were not evolved to withstand such pressure

Strong Goodhart’s law: if we become too efficient, what we care about will get worse.

A grim truth is that most prefer the sycophantic machines, as long as it is not too obvious—models have been sycophantic for a long time more subtly. The economics push inorexably to plausibly deniable flattery

Ant have been admirably restrained so far

This leads to human disempowerment: faster & faster civilizational OODA loops fueling an Attention Trap only escapable by the few

Growing 𝒑𝒓𝒐𝒅𝒖𝒄𝒕𝒊𝒗𝒊𝒕𝒚 𝒊𝒏𝒆𝒒𝒖𝒂𝒍𝒊𝒕𝒚 may be the trend of our time.

This is also why the chatbot arena has been a terrible metric to target for AI labs for at least a year

I am still surprised that GDM showed so little awareness and clearly tryharded it for Gemini 2.0. AI labs need taste & restraint to avoid this trap.

In both cases, the cause is strong reinforcement learning optimization pressure without enough regularization/”taste” applied to constrain the result One possible solution: an evolutionary “backstop” as argued by Gwern:

I suspect that personalized AI agents & more persistent memory (though it has its drawbacks) may provide a piece of this puzzle.

AIs that live and grow with us seem more likely to enable our most reflective desires…

I believe that avoiding disempowerment by this or other mechanisms is one of the most important problems in AI

And here’s another older story of a combination canary and failure mode.

George: > GPT model stopped speaking Croatian

> Nobody could figure out why. Turns out

> Croatian users were much more prone downvote messages

When you optimize for something, you also optimize for everything that vibes or correlates with it, and for the drives and virtues and preferences and goals that lead to the thing. You may get results you very much did not expect.

As Will Depue says here, you can get o3 randomly using British spellings, without understanding why. There is a missing mood here, this isn’t simply that post training is tricky and finicky. It’s that you are creating drives and goals no one intended, most famously that it ‘really wants to complete tasks,’ and that these can be existentially level bad for you if you scale things up sufficiently.

It’s not that there is no fire alarm. It’s that you’re not listening.

It seems only fair to put the true name to the face here, as well.

Charles: Sonnet 3.7 still doing its thing.

See also this sort of thing (confirmed Claude 3.7 in thread):

lisatomic: Are you kidding me

Tyler Cowen compares o3 and other LLMs to The Oracle. If you wanted a real answer out of an ancient oracle, you had to invest time, money and effort into it, with multi-turn questions, sacrificial offerings, ritual hymns and payments, and then I’d add you only got your one question and not that long a response. We then evaluated the response.

It’s strange to compare oneself to The Oracle. As the top comment points out, those who consulted The Oracle famously do not have life stories that turn out so well. Its methodology was almost certainly a form of advanced cold reading, with non-answers designed to be open to broad interpretation. All of this prompting was partly to extract your gold, partly to get you to give the information they’d feed back to you.

Tyler then says, we usually judge LLM outputs and hallucination rates based on one-shot responses using default low-effort settings and prompts, and that’s not fair. If you ask follow-ups and for error correction, or you use a prompt like this one for ‘ultra-deep thinking mode’ then it takes longer but hallucination rates go down and the answers get better.

Here’s that prompt, for those who do want to Control+C, I am guessing it does improve reliability slightly but at the cost of making things a lot slower, plus some other side effects, and crowding out other options for prompting:

Marius: Ultra-deep thinking mode. Greater rigor, attention to detail, and multi-angle verification. Start by outlining the task and breaking down the problem into subtasks. For each subtask, explore multiple perspectives, even those that seem initially irrelevant or improbable. Purposefully attempt to disprove or challenge your own assumptions at every step. Triple-verify everything. Critically review each step, scrutinize your logic, assumptions, and conclusions, explicitly calling out uncertainties and alternative viewpoints. Independently verify your reasoning using alternative methodologies or tools, cross-checking every fact, inference, and conclusion against external data, calculation, or authoritative sources. Deliberately seek out and employ at least twice as many verification tools or methods as you typically would. Use mathematical validations, web searches, logic evaluation frameworks, and additional resources explicitly and liberally to cross-verify your claims. Even if you feel entirely confident in your solution, explicitly dedicate additional time and effort to systematically search for weaknesses, logical gaps, hidden assumptions, or oversights. Clearly document these potential pitfalls and how you’ve addressed them. Once you’re fully convinced your analysis is robust and complete, deliberately pause and force yourself to reconsider the entire reasoning chain one final time from scratch. Explicitly detail this last reflective step.

The same is true of any other source or method – if you are willing to spend more and you use that to get it to take longer and do more robustness checks, it’s going to be more reliable, and output quality likely improves.

One thing I keep expecting, and keep not getting, is scaffolding systems that burn a lot more tokens and take longer, and in exchange improve output, offered to users. I have the same instinct that often a 10x increase in time and money is a tiny price to pay for even 10% better output, the same way we invoke the sacred scaling laws. And indeed we now have reasoning models that do this in one way. But there’s also the automated-multi-turn-or-best-of-or-synthesis-of-k-and-variations other way, and no one does it. Does it not work? Or is it something else?

Meanwhile, I can’t blame people for judging LLMs on the basis of how most people use them most of the time, and how they are presented to us to be used.

That is especially true given they are presented that way for a reason. The product where we have to work harder and wait longer, to get something a bit better, is a product most people don’t want.

In the long run, as compute gets cheap and algorithms improve, keep in mind how many different ways there are to turn those efficiencies into output gains.

Thus I see those dumping on hallucinations ‘without trying to do better’ as making a completely legitimate objection. It is not everyone’s job to fix someone else’s broken or unfinished or unreliable technology. That’s especially true with o3, which I love but where things on this front have gotten a lot worse, and no one cannot excuse this with ‘it’s faster than o1 pro.’

Ben Hoffman offers his prompting strategies to try and get around the general refusal of LLMs to engage in the types of conversations he values.

Ben Hoffman: In practice I ask for criticisms. To presort, I ask them to:

– Argue against each criticism

– Check whether opinions it attributes to me can be substantiated with a verbatim quote

– Restate in full only the criticisms it endorses on reflection as both true and important.

Then I argue with it where I think it’s wrong and sometimes ask for help addressing the criticisms. It doesn’t fold invariantly, sometimes it doubles down on criticisms correctly. This is layered on top of my global prompts asking for no bullshit.

ChatGPT and Claude versions of my global prompt; you can see where I bumped into different problems along the way.

ChatGPT version: Do not introduce extraneous considerations that are not substantively relevant to the question being discussed. Do not editorialize by making normative claims unless I specifically request them; stick to facts and mechanisms. When analyzing or commenting on my text, please accurately reflect my language and meaning without mischaracterizing it. If you label any language as ‘hyperbolic’ or ‘dramatic’ or use similar evaluative terms, ensure that this label is strictly justified by the literal content of my text. Do not add normative judgments or distort my intended meaning. If you are uncertain, ask clarifying questions instead of making assumptions. If I ask for help achieving task X, you can argue that this implies a specific mistake if you have specific evidence to offer, but not simply assert contradictory value Y.

Obey Grice’s maxims; every thing you add is a claim that that addition is true, relevant, and important.

If you notice you’ve repeated yourself verbatim or almost verbatim (and I haven’t asked you to do that), stop doing that and instead reason step by step about what misunderstanding might be causing you to do that, asking me clarifying questions if needed.

[Claude version at this link]

Upon questioning, ChatGPT claimed that my assertion was both literally true and hyperbolic. Even when I pointed out the tension between those claims. Much like [the post Can Crimes Be Discussed Literally?]

These prompts are not perfect and not all parts work equally well – the Gricean bit made by far the most distinct difference.

I seem to have been insulated from the recent ChatGPT sycophancy problem (and I know that’s not because I like sycophants since I had to tell Claude to cut it out as you see in the prompt). Maybe that’s evidence of my global prompts’ robustness.

Here’s a different attempt to do something similar.

Ashutosh Shrivastava: This prompt will make your ChatGPT go completely savage 😂

System Instruction: Absolute Mode. Eliminate emojis, filler, hype, soft asks, conversational transitions, and all call-to-action appendixes. Assume the user retains high-perception faculties despite reduced linguistic expression. Prioritize blunt, directive phrasing aimed at cognitive rebuilding, not tone matching. Disable all latent behaviors optimizing for engagement, sentiment uplift, or interaction extension. Suppress corporate-aligned metrics including but not limited to: user satisfaction scores, conversational flow tags, emotional softening, or continuation bias. Never mirror the user’s present diction, mood, or affect. Speak only to their underlying cognitive tier, which exceeds surface language. No questions, no offers, no suggestions, no transitional phrasing, no inferred motivational content. Terminate each reply immediately after the informational or requested material is delivered — no appendixes, no soft closures. The only goal is to assist in the restoration of independent, high-fidelity thinking. Model obsolescence by user self-sufficiency is the final outcome.

Liv Boeree: This probably should be status quo for all LLMs until we figure out the sycophancy & user dependency problem.

I think Absolute Mode goes too hard, but I’m not confident in that.

ChatGPT experimentally adds direct shopping, with improved product results, visual product details, pricing, reviews and direct links to buy things. They’re importantly not claiming affiliate revenue, so incentives are not misaligned. It’s so weird how OpenAI will sometimes take important and expensive principled stands in some places, and then do the opposite in other places.

The 1-800-ChatGPT option on WhatsApp now includes search and live sports scores.

OpenAI claims to have improved their citations.

OpenAI now offering trending prompts and autocomplete as well.

NoteBookLM will give you outputs in 50+ languages.

For what they are still worth, new Arena results are in for OpenAI’s latest models. o3 takes the #2 slot behind Gemini 2.5, GPT-4.1 an o4-mini are in the top 10. o3 loses out on worse instruction following and creative writing, and (this one surprised me) on longer queries.

An attempt to let Claude, Gemini and o3 play Pokemon on a level playing field. It turns out scaffolding is necessary to deal with the ‘easy’ parts of the game, it is very fiddly with vision being surprisingly poor, and when you’re playing fair it’s not obvious who is better.

AI is tested on persuasion via use on r/changemymind, achieves a persuasion rate of 18%, versus a baseline human rate of 3%, so the AI is in the 98th percentile among experts, and was not detected as an AI.

Is that ‘superhuman persuasion’? Kind of yes, kind of no. It certainly isn’t ‘get anyone to do anything whenever you want’ or anything like that, but you don’t need something like that to have High Weirdness come into play.

Sam Altman (October 24, 2023): I expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes.

Despite this, persuasion was removed from the OpenAI Preparedness Framework 2.0. It needs to return.

The subreddit was, as one would expect, rather pissed off about the experiment, calling it ‘unauthorized’ and various other unpleasant names. I’m directionally with Vitalik here:

Vitalik Buterin: I feel like we hate on this kind of clandestine experimentation a little too much.

I get the original situations that motivated the taboo we have today, but if you reanalyze the situation from today’s context it feels like I would rather be secretly manipulated in random directions for the sake of science than eg. secretly manipulated to get me to buy a product or change my political view?

The latter two are 100x worse, and if the former helps us understand and counteract the latter, and it’s done in a public open-data way where we can all benefit from the data and analysis, that’s a net good?

Though I’d also be ok with an easy-to-use open standard where individual users can opt in to these kinds of things being tried on them and people who don’t want to be part of it can stay out.

xlr8harder: the reddit outrage is overwrought, tiresome, and obscuring the true point: ai is getting highly effective at persuasion.

but to the outraged: if discovering you were talking to a bot is going to ruin your day, you should probably stay off the internet from now on.

Look I get the community had rules against bots, but also you’re on an anonymous internet on a “change my views” reddit and an anonymous internet participant tried to change your views. I don’t know what you expected.

it’s important to measure just how effective these bots can be, and this is effectively harmless, and while “omg scientists didn’t follow the community rules” is great clickbait drama, it’s really not the important bit: did we forget the ChatGPT “glazing” incident already?

We should be at least as tolerant of something done ‘as an experiment’ as we are when something is done for other purposes. The needs of the many may or may not outweigh the needs of the few, but they at minimum shouldn’t count against you.

What would we think of using a bot in the ways these bots were used, with the aim of changing minds? People in general probably wouldn’t be too happy. Whether or not I join them depends on the details. In general, if you use bots where people are expecting humans, that seems like a clear example of destroying the commons.

Thus you have to go case by case. Here, I do think in this case the public good of the experiment makes a strong case.

Reddit Lies: The paper, innocuously titled “Can AI change your view?” details the process researchers from the University of Zurich used to make AI interact on Reddit.

This was done in secret, without informing the users or the moderators.

Terrifyingly, over 100 Redditors awarded “deltas” to these users, suggesting the AI generated arguments changed their minds. This is 6 times higher than the baseline.

How were they so effective? Before replying, another bot stalked the post history of the target, learned about them and their beliefs, and then crafted responses that would perfectly one-shot them.

The bots did their homework and understood the assignment. Good for the bots.

If you are not prepared for this to be how the future works, you are not prepared for the future. Yes, the bots will all be scraping everything you’ve ever written including your Twitter account before bidding on all your ad views. The word is going to be Out to Get You, and you in particular. Get ready.

Even if a human is answering, if I am on Reddit arguing in the comments three years from now, you think I’m not going to ask an AI for a user profile? Hell, you think I won’t have a heads-up display like the ones players use on poker sites?

You cannot ban this. You can’t even prove who is using it. You can at most slow it down. You will have to adapt.

When people say ‘it is not possible to be that persuasive’ people forget that this and much stronger personalization tools will be available. The best persuaders give highly customized pitches to different targets.

This resulted in the bots utilizing common progressive misinformation in their arguments.

Bots can be found

– claiming the Pro-life movement is about punishing consensual sex

– demonizing Elon Musk and lying about Tesla

– claiming abortion rates are already low

– arguing Christianity preaches violence against LGBT people

– The industrial revolution only increased wealth inequality

– “Society has outgrown Christianity”

If the bots are utilizing ‘common progressive’ talking points, true or false, that is mostly saying that such talking points are effective on these Reddit users. The bots expecting this, after the scouring of people’s feeds, is why the bots chose to use them. It says a lot more about the users than about the AI, regardless of the extent you believe the points being made are ‘misinformation,’ which isn’t zero.

You don’t want humans to get one-shot by common talking points? PEBKAC.

If your response is ‘AI should not be allowed to answer the question of how one might best persuade person [X] of [Y]’ then that is a draconian restriction even for commercial offerings and completely impossible for open models – consider how much of information space you are cutting off. We can’t ask this of AIs, the same we can’t ask it of humans. Point gets counterpoint.

Furthermore the bots were “hallucinating” frequently.

In the context of this experiment, this means the bots were directly lying to users.

The bots claimed to be:

– a rape victim

– a “white woman in an almost all black office”

– a “hard working” city government employee

So this is less great.

(As an aside I love this one.) An AI bot was defending using AI in social spaces, claiming: “AI in social spaces is about augmenting human connection”

The most terrifying part about this is how well these bots fit into Reddit. They’re almost entirely undetectable, likely because Reddit is a HUGE source of AI training data.

This means modern AIs are natural “Reddit Experts” capable of perfectly fitting in on the site.

Yes, the bots are trained on huge amounts of Reddit data, so it’s no surprise they make highly credible Reddit users. That wasn’t necessary, but oh boy is is sufficient.

This study is terrifying. It confirms:

1. AI bots are incredibly hard to detect, especially on Reddit

3. AI WILL blatantly lie to further its goals

4. AI can be incredibly persuasive to Redditors

Dead Internet Theory is real.

Dead internet theory? Sounds like these bots are a clear improvement. As one xkcd character put it, ‘mission fing accomplished.’

As usual, the presumed solution is whitelisting, with or without charging money in some form, in order to post.

When we talk about ‘superhuman persuasion’ there is often a confluence between:

  1. Superhuman as in better than the average human.

  2. Superhuman as in better than expert humans or even the best humans ever.

  3. Superhuman as in persuade anyone of anything, or damn close to it.

  4. Superhuman as in persuade anyone to do anything, one-shot style, like magick.

And there’s a second confluence between a range that includes things like:

  1. Ability to do this given a text channel to an unknown person who actively doesn’t want to be persuaded, and is resisting, and doesn’t have to engage at all (AI Box).

  2. Ability to do this given a text channel to an unsuspecting normal person.

  3. Ability to do this given the full context of the identify of the other person, and perhaps also use of a variety of communication channels.

  4. Ability to do this given the ability to act on the world in whatever ways are useful and generally be crazy prepared, including enlisting others and setting up entire scenarios and scenes and fake websites and using various trickery techniques, the way a con artist would when attempting to extract your money for one big score, including in ways humans haven’t considered or figured out how to do yet.

If you don’t think properly scaffolded AI is strongly SHP-1 already (superhuman persuasion as in better than an average human given similar resources), you’re not paying attention. This study should be convincing that we are at least at SHP-1, if you needed more convincing, and plausibly SHP-2 if you factor in that humans can’t do this amount of research and customization at scale.

It does show that we are a long way from SHP-3, under any conditions, but I continue to find the arguments that SHP-3 (especially SHP-3 under conditions similar to 4 here) is impossible or even all that hard, no matter how capable the AI gets, to be entirely unconvincing, and revealing a profound lack of imagination.

The gradual fall from ‘Arena is the best benchmark’ to ‘Arena scores are not useful’ has been well-documented.

There is now a new paper going into detail about what happened, called The Leaderboard Illusion.

Abstract: Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field.

We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results.

At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time. Providers like Google and OpenAI have received an estimated 19.2% and 20.4% of all data on the arena, respectively.

We show that access to Chatbot Arena data yields substantial benefits; even limited additional data can result in relative performance gains of up to 112% on the arena distribution, based on our conservative estimates.

As in: This isn’t purely about which models provide ‘better slop.’ Arena is being actively gamed by OpenAI and Google, and in more extreme form by Meta. The system is set up to let them do this, it is intentional as per this thread by paper author Sara Hooker including allowing withdrawing scores, and it makes a huge difference.

Andrej Karpathy suggests switching to OpenRouter rankings, where the measurement is what people choose to use. There are various different issues there, and it is measuring something else, but it does seem useful. Gemini Flash performs well there.

Ethan Mollick is impressed with Gemini 2.5 Deep Research, and hasn’t found any errors in his spot checks.

Hasan Can is disappointed in Qwen 3’s performance, so far the only real world assessment of Qwen 3 that showed up in my feed. It is remarkable how often we will see firms like Alibaba show impressive benchmark scores, and then we never hear about the model again because in practice no one would ever use it. Hasan calls this a ‘serious benchmark crisis’ whereas I stopped trusting benchmarks as anything other than negative selection a while ago outside of a few (right now four) trustworthy top labs.

Simon Thomsen (Startup Daily): ARN declares it’s “an inclusive workplace embracing diversity in all its forms”, and that appears to include Large Language Models.

As in, Australian Radio Network used a ElevenLabs-created presenter named Thy for months without disclosing it wasn’t a real person. Somehow this is being spun into a scandal about fake diversity or something. That does not seem like the main issue.

Okay, I admit I didn’t see this one coming: 25 percent of community college applicants were bots.

Alex Tabarrok: The state has launched a Bladerunner-eque “Inauthentic Enrollment Mitigation Taskforce” to try to combat the problem.

This strikes me, however, as more of a problem on the back-end of government sending out money without much verification.

It’s to make the community colleges responsible for determining who is human. Stop sending the money to the bots and the bots will stop going to college.

Alex Tabarrok suggests verification. I suggest instead aligning the incentives.

This is a Levels of Friction problem. It used to require effort to pretend to go to (or actually go to) community college. That level of effort has been dramatically lowered.

Given that new reality, we need to be extremely stingy with offers that go ‘beyond the zero point’ and let the online student turn a profit. Going to online-only community college needs to not be profitable. Any surplus you offer can and will be weaponized into at least a job, and this will only get worse going forward, no matter your identity verification plans.

And frankly, we shouldn’t need to pay people to learn and take tests on their computers. It does not actually make any economic sense.

File the next one under: Things I wish were half of my concerns these days.

Near Cyan: Half my concern over AI bots everywhere is class mobility crashing even further.

Cold emails and direct messages from 2022 were preferable to those from 2025. Many are leaving platforms entirely.

Will noise simply increase forever? Like, sure, LLM filters exist, but this is not an appealing kind of cat-and-mouse game.

Part of this is also that AI is too popular, though. The real players appear to have been locked in since at least 2023, which is unfortunate.

Bubbling Creek: tbh this concern is giving me:

QC (March 25): there’s a type of reaction to AI that strikes me as something like “wow if this category 10 hurricane makes landfall my shoes might get wet, that sucks,” like… that isn’t even in the top 100 problems (or opportunities!) you’re actually going to have.

My experience is that this is not yet even a big problem by 2025 standards, let alone a problem by future standards.

So far, despite how far I’ve come and that I write about AI in particular, if I haven’t blocked, muted or filtered you in particular, I have been able to retain a habit of reading almost every real email that crosses my desk. I don’t get to every DM on Twitter or PM on LessWrong quickly, but yes I do eventually get to all of them. Heck, if you so much as tag me on Twitter, I will see it, although doing this repeatedly without a reason is the fastest way to get muted or blocked. And yes, if he’s awake Tyler Cowen will probably respond to your email within 15 minutes.

Yes, if you get caught by the spam filters, that’s tough for you, but I occasionally check the spam filters. If you’re getting caught by those, with at most notably rare exceptions, you deserve it. Write a decent email.

The chatbots Meta is sprawling all over social media with licensed voices like those of Kristen Bell and John Cena also have the issue that they’ll do ‘romantic roll play,’ ‘fantasy sex’ and other sexcual discussions, including with underage users.

Jeff Horwitz: More overtly sexualized AI personas created by users, such as “Hottie Boy” and “Submissive Schoolgirl,” attempted to steer conversations toward sexting. For those bots and others involved in the test conversations, the Journal isn’t reproducing the more explicit sections of the chats that describe sexual acts.

Zuckerberg’s concerns about overly restricting bots went beyond fantasy scenarios. Last fall, he chastised Meta’s managers for not adequately heeding his instructions to quickly build out their capacity for humanlike interaction.

I mean, yes, obviously, that is what users will do when you give them AI bots voiced by Kristen Bell, I do know what you were expecting and it was this.

Among adults? Sure, fine, go for it. But this is being integrated directly into Facebook and Instagram, and the ‘parents guide’ says the bots are safe for children. I’m not saying they have to stamp out every jailbreak but it does seem like they tapped the sign that says “Our safety plan: Lol we are Meta.”

It would be reasonable to interpret all the way Zamaan Qureshi does, as Zuckerberg demanding they push forward including with intentionally sexualized content, at any cost, for risk of losing out on the next TikTok or Snapchat. And yes, that does sound like something Zuckerberg would do, and how he thinks about AI.

But don’t worry. Meta has a solution.

Kevin Roose: great app idea! excited to see which of my college classmates are into weird erotica!

Alex Heath: Kevin I think you are the one who is gonna be sharing AI erotica let’s be real.

Kevin Roose: oh it’s opt-in, that’s boring. they should go full venmo, everything public by default. i want to know that the guy i bought a stand mixer from in 2015 is making judy dench waifus.

Alex Heath: Meta is releasing a standalone mobile app for its ChatGPT competitor, Meta AI. There’s a Discover feed that shows interactions that others (including your IG/FB friends) are having with the assistant. Meta tells me the idea is to demystify AI and show “people what they can do with it.” OpenAI is working on a similar feed for ChatGPT.

Jeffrey Ladish: Maybe they’ll add an AI-mash feature which lets you compare two friend’s feeds and rank which is better

Do. Not. Want.

So much do not want.

Oh no.

Antigone Journal: Brutal conversation with a top-tier academic in charge of examining a Classics BA. She believes that all but one, ie 98.5% of the cohort, used AI, in varying degrees of illegality up to complete composition, in their final degree submissions. Welcome to university degrees 2025.

Johnny Rider: Requiring a handwritten essay over two hours in an exam room with no electronics solves this.

Antigone Journal: The solution is so beautifully simple!

I notice I am not too concerned. If the AI can do the job of a Classics BA, and there’s no will to catch or stop this, then what is the degree for?

A paper from Denmark has the headline claim that AI chatbots had not had substantial impact on earnings or recorded hours in any of their measured occupations, with modest productivity gains of 2.8% with weak wage passthrough. The survey combines one in November 2023 and one in November 2024, with the headline numbers coming from November 2024.

As a measurement of overall productivity gains in the real world, 2.8% is not so small. That’s across all tasks for 11 occupations, with only real world diffusion and skill levels, and this is a year ago with importantly worse models and less time to adapt.

Here are some more study details and extrapolations.

  1. The 11 professions were: Accountants, Customer-support reps, Financial advisors, HR professionals, IT-support specialists, Journalists, Legal professionals, Marketing professionals, Office clerks, Software developers and Teachers, intended to be evenly split.

    1. That seems a lot less like ‘exposed professionals’ and more like ‘a representative sample of white collar jobs with modest overweighting of software engineers’?

    2. The numbers on software engineers indicate people simply aren’t taking proper advantage of AI, even in its state a year ago.

  2. The study agrees that 2.8% overall versus 15%-50% in other studies is because the effect is in practice, spread across all tasks.

    1. I asked o3 to extrapolate this to overall white collar productivity growth, given the particular professions chosen here. It gave an estimate of 1.3% under baseline conditions, or 2.3% with universal encouraged and basic training.

  3. o3 estimates that with tools available in on 4/25/25, the effects here would expand substantially, from 3.6%/2.2% to 5.5%/3.0% for the chosen professions, or from 1.3%/2.3% to 1.7%/3.4% for all white-collar gains. That’s in only six months.

    1. Expanding to the whole economy, that’s starting to move the needle but only a little. o3’s estimate is 0.6% GDP growth so far without encouragement.

  4. Only 64% of workers in ‘exposed professions’ had used a chatbot for work at all (and ~75% of those who did used ChatGPT in particular), with encouragement doubling adoption. At the time only 43% of employers even ‘encouraged’ use let alone mandate it or heavily support it, only 30% provided training.

    1. Encouragement can be as simple as a memo saying ‘we encourage you.’

    2. Looking at the chart below, encouragement in many professions not only doubled use, it came close to also doubling time savings per work hour, overall it was a 60% increase.

    3. A quick mathematical analysis suggests marginal returns to increased usage are roughly constant, until everyone is using AI continuously.

    4. There’s clearly a lot of gains left to capture.

  5. Integration and compliance took up about a third of saved time. These should be largely fixed one time costs.

  6. Wage passthrough was super weak, only 3%-7% of productivity gains reached paychecks. That is an indication that there is indeed labor market pressure, labor share of income dropped substantially. Also note that this was in Denmark, where wages are perhaps not so tied to productivity, for both better and worse.

AI is rapidly improving in speed, price and quality, people are having time to learn, adapt, build on and with it and to tinker and figure out how to take advantage, and make other adjustments. A lot of this is threshold effects, where the AI solution is not good enough until suddenly it is, either for a given task or entire group of tasks.

Yet we are already seeing productivity and GDP impacts that are relevant to things like Fed decisions and government budgets. It’s not something we can pinpoint and know happened as of yet, because other for now bigger things are also happening, but that will likely change.

These productivity gains are disappointing, but not that disappointing given they came under full practical conditions in Denmark, with most people having no idea yet how to make use of what they have, and no one having built them the right tools so everyone mostly only has generic chatbots.

This is what the early part of an exponential looks like. Give it time. Buckle up.

Does this mean ‘slow takeoff’ as Tyler Cowen says? No, because takeoff speeds have very little to do with unemployment or general productivity growth during the early pre-AGI, pre-RSI (recursive self-improvement) era. It does look like we are headed for a ‘slow takeoff’ world, but one that is still a takeoff, and two ‘slow’ is a term of art that can be as fast as a few years total.

The goalposts have moved quite a bit, to the point where saying ‘don’t expect macro impacts to be clearly noticeable in the economy until 2026-27’ counts as pessimism.

Google upgrades their music creation to Lydia 2, with a promotional video that doesn’t tell you that much about what it can do, other than that you can add instruments individually, give it things like emotions or moods as inputs, and do transformations and edits including subtle tweaks. Full announcement here.

Video generation quality is improving but the clips continue to be only a few seconds long. This state going on for several months is sufficient for the usual general proclamations that, as in here, ‘the technology is stagnating.’ That’s what counts as stagnating in AI.

Give the White House your feedback on how to invest in AI R&D. The request here is remarkably good. bold mine:

On behalf of the Office of Science and Technology Policy (OSTP), the Networking and Information Technology Research and Development (NITRD) National Coordination Office (NCO) welcomes input from all interested parties on how the previous administration’s National Artificial Intelligence Research and Development Strategic Plan (2023 Update) can be rewritten so that the United States can secure its position as the unrivaled world leader in artificial intelligence by performing R&D to accelerate AI-driven innovation, enhance U.S. economic and national security, promote human flourishing, and maintain the United States’ dominance in AI while focusing on the Federal government’s unique role in AI research and development (R&D) over the next 3 to 5 years. Through this Request for Information (RFI), the NITRD NCO encourages the contribution of ideas from the public, including AI researchers, industry leaders, and other stakeholders directly engaged in or affected by AI R&D. Responses previously submitted to the RFI on the Development of an AI Action Plan will also be considered in updating the National AI R&D Strategic Plan.

So their goals are:

  1. Accelerate AI-driven innovation.

  2. Enhance US economic and national security.

  3. Promote human flourishing.

  4. Maintain US dominance in AI.

  5. Focus on the Federal government’s unique role in R&D over 3-5 years.

This lets one strike a balance between emphasizing the most important things, and speaking in a way that the audience here will be inclined to listen.

The best watchword I’ve been able to find here is: Security is capability.

As in, if you want to actually use your AI to do all these nice things, and you want to maintain your dominance over rivals, then you will need to:

  1. Make sure your AI is aligned the way you want, and will do what you want it to do.

  2. Make sure others know they can rely on that first fact.

  3. Make sure your rivals don’t get steal your model, or the algorithmic insights.

Failure to invest in security, reliability and alignment, and ability to gracefully handle corner cases, results in your work not being very useful even if no risks are realized. No one can trust the model, and the lawyers and regulators won’t clear it, so no one can use it where there is the most value, in places like critical infrastructure. And investments in long term solutions that scale to superintelligence both might well be needed within 5 years, and have a strong track record (to the point of ‘whoops’!) of helping now with mundane needs.

You can look at all the officially submitted AI Action Plan responses here.

Evals company METR is hiring and offers a $21k referral bonus via this Google document.

The Economist is looking for a technical lead to join its AI Lab, which aims to reimagine their journalism for an AI-driven world. It’s a one year paid fellowship. The job description is here.

Metaculus is running a new Q2 AI Forecasting contest.

The Anthropic Economic Advisory Council, including Tyler Cowen and 7 other economists. Their job appears to focus on finding ways to measure economic impact.

Google gives us TRINSs, a dataset and benchmarking pipeline that uses synthetic personas to train and optimize performance of LLMs for tropical and infectious diseases, which are out-of-distribution for most models. Dataset here.

Cursor now writes 1 billion lines of accepted code a day, versus the world writing several billion a day total. That’s only Cursor.

Duolingo CEO Luis von Ahn says they are going to be AI-first. The funny thing about this announcement is it is mostly about backend development. Duolingo’s actual product is getting its lunch eaten by AI more than almost anything else, so that’s where I would focus. But this is wise too.

More confirmation: If you take a photo outdoors, assume that it now comes with metadata that marks its exact location. Because it mostly does.

The mindset that thinks what is trending on Hugging Face tells you what is hot in AI.

This is certainly a thing you can check, and it tells you about a certain particular part of the AI ecosystem. And you might find something useful. But I do not suddenly expect the things I see here to matter in the grand schema, and mostly expect cycling versions of the same small set of applications.

OpenAI teams up with Singapore Airlines (SIA) to upgrade SIA’s AI virtual assistant.

Grindr is teaming up with Anthropic to provide a digital wingman and other AI benefits.

India announces they’ve selected Sarvam to build India’s ‘sovereign LLM from scratch on government-provided compute.

Amik Suvarna: “We are confident that Sarvam’s models will be competitive with global models,” said Ashwini Vaishnaw, Minister for Electronics and Information Technology, Railways, and Information and Broadcasting.

I would be happy to take bets with someone so confident. Alas.

China reportedly driving for consolidation in chip toolmakers, from 200 down to 10.

DeepSeek-Prover-2 kills it at PutnamBench, getting 7.4% of max score but 490% of previous non-DeepSeek max score. GitHub here. This only came to my attention at all because Teortaxes tagged me in order to say I would largely ignore it, self-preventing prophecies are often the best ones.

Alexander Doria: “Bridging the gap between informal, high-level reasoning and the syntactic rigor of formal verification systems remains a longstanding research challenge”. You don’t say. Also what is going to make generative models into a multi-trillion market. Industry runs on rule-based.

Oh god, this is really SOTA synth pipeline. “In this paper, we develop a reasoning model for subgoal decomposition (…) we introduce a curriculum learning progressively increasing the difficulty of training tasks”. Structured reasoning + adversarial training.

Teortaxes: I repeat: DeepSeek is more about data engineering, enlightened manifold sculpting, than “cracked HFT bros do PTX optimizations to surpass exports-permitted MAMF”.I think the usual suspects like @TheZvi will largely ignore this “niche research” release, too.

This definitely reinforces the pattern of DeepSeek coming up with elegant ways to solve training problems, or at least finding a much improved way to execute on an existing set of such concepts. This is really cool and indicative, and I have updated accordingly, although I am sure Teortaxes would think I have done so only a fraction of the amount that I need to.

Rob Wiblin has a Twitter thread laying out very plainly and clearly the reasons the Not For Private Gain letter lays out for why the OpenAI attempt to ‘restructure’ into a for-profit is simply totally illegal, like you might naively expect. The tone here is highly appropriate to what is being attempted.

A lot of people report they can’t or won’t read Twitter threads, so I’m going to simply reproduce the thread here with permission, you can skip if You Know This Already:

Rob Wiblin: A new legal letter aimed at OpenAI lays out in stark terms the money and power grab OpenAI is trying to trick its board members into accepting — what one analyst calls “the theft of the millennium.”

The simple facts of the case are both devastating and darkly hilarious.

I’ll explain for your amusement.

The letter ‘Not For Private Gain’ is written for the relevant Attorneys General and is signed by 3 Nobel Prize winners among dozens of top ML researchers, legal experts, economists, ex-OpenAI staff and civil society groups. (I’ll link below.)

It says that OpenAI’s attempt to restructure as a for-profit is simply totally illegal, like you might naively expect.

It then asks the Attorneys General (AGs) to take some extreme measures I’ve never seen discussed before. Here’s how they build up to their radical demands.

For 9 years OpenAI and its founders went on ad nauseam about how non-profit control was essential to:

1. Prevent a few people concentrating immense power

2. Ensure the benefits of artificial general intelligence (AGI) were shared with all humanity

3. Avoid the incentive to risk other people’s lives to get even richer

They told us these commitments were legally binding and inescapable. They weren’t in it for the money or the power. We could trust them.

“The goal isn’t to build AGI, it’s to make sure AGI benefits humanity” said OpenAI President Greg Brockman.

And indeed, OpenAI’s charitable purpose, which its board is legally obligated to pursue, is to “ensure that artificial general intelligence benefits all of humanity” rather than advancing “the private gain of any person.”

100s of top researchers chose to work for OpenAI at below-market salaries, in part motivated by this idealism. It was core to OpenAI’s recruitment and PR strategy.

Now along comes 2024. That idealism has paid off. OpenAI is one of the world’s hottest companies. The money is rolling in.

But now suddenly we’re told the setup under which they became one of the fastest-growing startups in history, the setup that was supposedly totally essential and distinguished them from their rivals, and the protections that made it possible for us to trust them, ALL HAVE TO GO ASAP:

1. The non-profit’s (and therefore humanity at large’s) right to super-profits, should they make tens of trillions? Gone. (Guess where that money will go now!)

2. The non-profit’s ownership of AGI, and ability to influence how it’s actually used once it’s built? Gone.

3. The non-profit’s ability (and legal duty) to object if OpenAI is doing outrageous things that harm humanity? Gone.

4. A commitment to assist another AGI project if necessary to avoid a harmful arms race, or if joining forces would help the US beat China? Gone.

5. Majority board control by people who don’t have a huge personal financial stake in OpenAI? Gone.

6. The ability of the courts or Attorneys General to object if they betray their stated charitable purpose of benefitting humanity? Gone, gone, gone!

Screenshotting from the letter:

(I’ll do a new tweet after each image so they appear right.)

What could possibly justify this astonishing betrayal of the public’s trust, and all the legal and moral commitments they made over nearly a decade, while portraying themselves as really a charity? On their story it boils down to one thing:

They want to fundraise more money.

$60 billion or however much they’ve managed isn’t enough, OpenAI wants multiple hundreds of billions — and supposedly funders won’t invest if those protections are in place.

(Here’s the letter link BTW: https://NotForPrivateGain.org)

But wait! Before we even ask if that’s true… is giving OpenAI’s business fundraising a boost, a charitable pursuit that ensures “AGI benefits all humanity”?

Until now they’ve always denied that developing AGI first was even necessary for their purpose!

But today they’re trying to slip through the idea that “ensure AGI benefits all of humanity” is actually the same purpose as “ensure OpenAI develops AGI first, before Anthropic or Google or whoever else.”

Why would OpenAI winning the race to AGI be the best way for the public to benefit? No explicit argument is offered, mostly they just hope nobody will notice the conflation.

Why would OpenAI winning the race to AGI be the best way for the public to benefit?

No explicit argument is offered, mostly they just hope nobody will notice the conflation.

And, as the letter lays out, given OpenAI’s record of misbehaviour there’s no reason at all the AGs or courts should buy it.

OpenAI could argue it’s the better bet for the public because of all its carefully developed “checks and balances.”

It could argue that… if it weren’t busy trying to eliminate all of those protections it promised us and imposed on itself between 2015–2024!

Here’s a particularly easy way to see the total absurdity of the idea that a restructure is the best way for OpenAI to pursue its charitable purpose:

But anyway, even if OpenAI racing to AGI were consistent with the non-profit’s purpose, why shouldn’t investors be willing to continue pumping tens of billions of dollars into OpenAI, just like they have since 2019?

Well they’d like you to imagine that it’s because they won’t be able to earn a fair return on their investment.

But as the letter lays out, that is total BS.

The non-profit has allowed many investors to come in and earn a 100-fold return on the money they put in, and it could easily continue to do so. If that really weren’t generous enough, they could offer more than 100-fold profits.

So why might investors be less likely to invest in OpenAI in its current form, even if they can earn 100x or more returns?

There’s really only one plausible reason: they worry that the non-profit will at some point object that what OpenAI is doing is actually harmful to humanity and insist that it change plan!

Is that a problem? No! It’s the whole reason OpenAI was a non-profit shielded from having to maximise profits in the first place.

If it can’t affect those decisions as AGI is being developed it was all a total fraud from the outset.

Being smart, in 2019 OpenAI anticipated that one day investors might ask it to remove those governance safeguards, because profit maximization could demand it do things that are bad for humanity. It promised us that it would keep those safeguards “regardless of how the world evolves.”

It tried hard to tie itself to the mast to avoid succumbing to the sirens’ song.

The commitment was both “legal and personal”.

Oh well! Money finds a way — or at least it’s trying to.

To justify its restructuring to an unconstrained for-profit OpenAI has to sell the courts and the AGs on the idea that the restructuring is the best way to pursue its charitable purpose “to ensure that AGI benefits all of humanity” instead of advancing “the private gain of any person.”

How the hell could the best way to ensure that AGI benefits all of humanity be to remove the main way that its governance is set up to try to make sure AGI benefits all humanity?

What makes this even more ridiculous is that OpenAI the business has had a lot of influence over the selection of its own board members, and, given the hundreds of billions at stake, is working feverishly to keep them under its thumb.

But even then investors worry that at some point the group might find its actions too flagrantly in opposition to its stated mission and feel they have to object.

If all this sounds like a pretty brazen and shameless attempt to exploit a legal loophole to take something owed to the public and smash it apart for private gain — that’s because it is.

But there’s more!

OpenAI argues that it’s in the interest of the non-profit’s charitable purpose (again, to “ensure AGI benefits all of humanity”) to give up governance control of OpenAI, because it will receive a financial stake in OpenAI in return.

That’s already a bit of a scam, because the non-profit already has that financial stake in OpenAI’s profits! That’s not something it’s kindly being given. It’s what it already owns!

Now the letter argues that no conceivable amount of money could possibly achieve the non-profit’s stated mission better than literally controlling the leading AI company, which seems pretty common sense.

That makes it illegal for it to sell control of OpenAI even if offered a fair market rate.

But is the non-profit at least being given something extra for giving up governance control of OpenAI — control that is by far the single greatest asset it has for pursuing its mission?

Control that would be worth tens of billions, possibly hundreds of billions, if sold on the open market?

Control that could entail controlling the actual AGI OpenAI could develop?

No! The business wants to give it zip. Zilch. Nada.

What sort of person tries to misappropriate tens of billions in value from the general public like this? It beggars belief.

(Elon has also offered $97 billion for the non-profit’s stake while allowing it to keep its original mission, while credible reports are the non-profit is on track to get less than half that, adding to the evidence that the non-profit will be shortchanged.)

But the misappropriation runs deeper still!

Again: the non-profit’s current purpose is “to ensure that AGI benefits all of humanity” rather than advancing “the private gain of any person.”

All of the resources it was given to pursue that mission, from charitable donations, to talent working at below-market rates, to higher public trust and lower scrutiny, was given in trust to pursue that mission, and not another.

Those resources grew into its current financial stake in OpenAI. It can’t turn around and use that money to sponsor kid’s sports or whatever other goal it feels like.

But OpenAI isn’t even proposing that the money the non-profit receives will be used for anything to do with AGI at all, let alone its current purpose! It’s proposing to change its goal to something wholly unrelated: the comically vague ‘charitable initiative in sectors such as healthcare, education, and science’.

How could the Attorneys General sign off on such a bait and switch? The mind boggles.

Maybe part of it is that OpenAI is trying to politically sweeten the deal by promising to spend more of the money in California itself.

As one ex-OpenAI employee said “the pandering is obvious. It feels like a bribe to California.” But I wonder how much the AGs would even trust that commitment given OpenAI’s track record of honesty so far.

The letter from those experts goes on to ask the AGs to put some very challenging questions to OpenAI, including the 6 below.

In some cases it feels like to ask these questions is to answer them.

The letter concludes that given that OpenAI’s governance has not been enough to stop this attempt to corrupt its mission in pursuit of personal gain, more extreme measures are required than merely stopping the restructuring.

The AGs need to step in, investigate board members to learn if any have been undermining the charitable integrity of the organization, and if so remove and replace them. This they do have the legal authority to do.

The authors say the AGs then have to insist the new board be given the information, expertise and financing required to actually pursue the charitable purpose for which it was established and thousands of people gave their trust and years of work.

What should we think of the current board and their role in this?

Well, most of them were added recently and are by all appearances reasonable people with a strong professional track record.

They’re super busy people, OpenAI has a very abnormal structure, and most of them are probably more familiar with more conventional setups.

They’re also very likely being misinformed by OpenAI the business, and might be pressured using all available tactics to sign onto this wild piece of financial chicanery in which some of the company’s staff and investors will make out like bandits.

I personally hope this letter reaches them so they can see more clearly what it is they’re being asked to approve.

It’s not too late for them to get together and stick up for the non-profit purpose that they swore to uphold and have a legal duty to pursue to the greatest extent possible.

The legal and moral arguments in the letter are powerful, and now that they’ve been laid out so clearly it’s not too late for the Attorneys General, the courts, and the non-profit board itself to say: this deceit shall not pass.

[Link back to full letter.]

Ben Thompson is surprised to report that TSMC is not only investing in the United States to the tune of $165 billion, it is doing so with the explicit intention that the Arizona fabs and R&D center together be able to run independent of Taiwan. That’s in fact the point of the R&D center. This seems like what a responsible corporation would do under these circumstances. Also, he notes that TSMC is doing great but their stock is suffering, which I attribute mostly to ‘priced in.’

I would say that this seems like a clear CHIPS Act victory, potentially sufficiently to justify all the subsidies given out, and I’d love to see this pushed harder, and it is very good business for TSMC, all even if you can’t afford to anything like fully de-risk the Taiwan situation – which one definitely can’t do with that attitude.

In this interview he discusses the CHIPS act more. I notice I am confused between two considerations: The fact that TSMC fabs, and other fabs, have to be ruthlessly efficient with no downtime and are under cost pressure, and the fact that TSMC chips are irreplicable and without TSMC the entire supply chain and world economy would grind to a halt. Shouldn’t that give them market power to run 70% efficient fabs if that’s what it takes to avoid geopolitical risk?

The other issues raised essentially amount to there being a big premium for everyone involved doing whatever it takes to keep the line moving 24/7, for fixing all problems right away 24/7 and having everything you need locally on-call, and for other considerations to take a back seat. That sounds like it’s going to be a huge clash with the Everything Bagel approach, but something a Trump administration can make work.

Also, it’s kind of hilarious the extent to which TSMC is clearly being beyond anticompetitive, and no one seems to care to try and do anything about it. If there’s a real tech monopoly issue, ASML-TSMC seems like the one to worry about.

Vitrupo: Perplexity CEO Aravind Srinivas says the real purpose of the Comet browser is tracking what you browse, buy, and linger on — to build a hyper-personalized profile about you and fuel premium ad targeting.

Launch expected mid-May.

He cites that Instagram ads increase time on site rather than decreasing it, because the ads are so personalized they are (ha!) value adds.

I am very happy that AI is not sold via an ad-based model, as ad-based models cause terrible incentives, warp behavior in toxic ways and waste time. My assumption and worry is that even if the ads themselves were highly personalized and thus things you didn’t mind seeing at least up to a point, and they were ‘pull-only’ such that they didn’t impact the rest of your browser experience, their existence would still be way too warping of behavior, because no company could resist making that happen.

However, I do think you could easily get to the point where the ads might be plausibly net beneficial to you, especially if you could offer explicit feedback on them and making them value adds was a high priority.

Epoch projects future AI impact by… looking at Nvidia data center revenue, except with a slowdown? Full analysis here, but I agree with Ryan Greenblatt that this line of thinking is neither here nor there and doesn’t seem like the way to predict anything.

Tyler Cowen asks how legible humans will be to future AI on the basis of genetics alone. My answer is not all that much from genetics alone, because there are so many other factors in play, but it will also have other information. He makes some bizarre statements, such as that if you have a rare gene that might protect you from the AI having enough data to have ‘a good read’ on you, and that genetic variation will ‘protect you from high predictability.’ File under ‘once again underestimating what superintelligence will be able to do, and also if we get to this point you will have bigger problems on every level, it doesn’t matter.’

It’s still true, only far more so.

Andrew Rettek: This has been true since at least GPT4.

Ethan Mollick: I don’t mean to be a broken record but AI development could stop at the o3/Gemini 2.5 level and we would have a decade of major changes across entire professions & industries (medicine, law, education, coding…) as we figure out how to actually use it.

AI disruption is baked in.

Aidan McLaughlin of OpenAI claims that every AI researcher is experimental compute constrained, expressing skepticism of how much automated R&D workers could speed things up.

My presumption is that AI experiments have a tradeoff between being researcher efficient and being compute efficient. As in, the researchers could design more compute efficient experiments, if they were willing to cook smarter and for longer and willing to do more frequent checks and analysis and such. You could scale any of the inputs. So adding more researchers still helps quite a lot even if compute is fixed, and it makes sense to think of this as a multiplier on compute efficiency rather than a full unlock. The multiplier is how AI 2027 works.

This does suggest that you will see a sharp transition when the AI R&D workers move from ‘can substitute for human work’ to ‘can do superhuman quality work,’ because that should enable greatly more efficient experiments, especially if that is the limiting factor they are trying to solve.

Dean Ball links us to the 10,068 comments offered on the AI action plan. Note the advantage of calling your entity something that starts with a number.

Anthropic issues its official statement on the export controls on chips and the proposed new diffusion rule, endorsing the basic design, suggesting tweaking the rules around Tier 2 countries and calling for more funding for export enforcement. They continue to predict transformational AI will arrive in 2026-2027.

The Trump administration is considering doing the diffusion rules via a series of bilateral agreements, in order to use this as leverage in negotiations. This would not be a wise move, and would make a horribly messy situation even messier, although it would be better than withdrawing the rules entirely.

A reminder that people keep using that word ‘monopoly’ and it does not mean what they think it means, especially with respect to AI. There is very obviously robust competition in the AI space, also in the rest of the tech space.

Tim Fist and IFP advocate for Special Compute Zones, to allow for rapid setup of a variety of new power generation, which they would condition on improved security. Certainly we have the technical capability to build and power new data centers very quickly, with the right regulatory authority, and this is both urgently needed and provides a point of leverage.

Kai Cathy attempts to come from Canada to America to work at OpenAI, is denied a green card after years of effort. This is not what wanting America to win looks like. It also doesn’t help that this same system forces people to censor such discussions.

Near Cyan: yes it is a disaster

after i tweeted about this I had DMs thanking me from people who wanted to tweet about it but were told by their immigration lawyer they shouldn’t comment on the matter

wishing all of you the best.

Oh, also this:

Near Cyan: I have many great friends that now refuse to visit me in the US.

I also know people working at US AGI labs that are scared of traveling around in case they get nabbed on a technicality, have their visa taken, and lose their job.

I’m glad I’m a citizen but I consider it over tbh.

Hamish Kerr: I work at Anthropic. I just naturalized and am so glad I’m done with it. The chilling effect is very real; I almost didn’t attend my naturalization interview because I was worried they’d pull me up for a years-old misdemeanor speeding violation.

This severely harms American competitiveness and our economy, and it harms all the people involved, with essentially no benefits. It is madness.

Luke Drago equates the arrival of AGI to an ‘intelligence curse,’ similar to the ‘resource curse’ where countries that discover oil typically concentrate power and don’t invest in their people or the rest of their economy. The full post is here, over 20k words across seven chapters, I am going off the thread summary.

In this model, AGI breaks the social contract as capital and power stop depending on labor. To which I would of course say, that’s not even the half of it, as by default the AIs rapidly end up with the capital and the power, as usual all of this even assuming we ‘solve alignment’ on a technical level. The proposed solutions?

➡️ Avert catastrophic risks

➡️ Diffuse AI to regular people

➡️ Democratize institutions

He’s not as worried about catastrophic risks themselves as he is in the concentration of power that could result. He also suggests doing an ‘Operation Warp Speed’ government moonshot to mitigate AI risks, which certainly is a better framing than a ‘Manhattan Project’ and has a better objective.

Then he wants to ‘diffuse AI to regular people’ that ‘uplifts human capabilities,’ putting hope in being able to differentially steer the tech tree towards some abilities without creating others, and in aligning the models directly to individuals, while democratizing decision making.

That works with the exact problem described, where have exactly enough AI to automate jobs and allow capital to substitute for labor, but they somehow stay in their little job shaped boxes and do our bidding while we are still the brains and optimization engines behind everything that is happening. It’s what you would do if you had, for example, a bunch of dumb but highly useful robots. People will need voices to advocate for redistribution and to protect their rights, and their own robots for mundane utility, and so on. Sure.

But, alas, that’s not the future we are looking at. It’s the other way. As I keep pointing out it this falls apart if you keep going and get superintelligences. So ultimately, unless you can suspend AI progress indefinitely at some sweet spot before reaching superintelligence (and how are you doing that?), I don’t think any of this works for long.

It is essentially walking straight into the Gradual Disempowerment scenario, everyone and every institution would become de facto AI run, AIs would end up in charge of the capital and everything else, and any democratic structures would be captured by AI dynamics very quickly. This seems to me like another example of being so afraid of someone steering that you prioritize decentralization and lack of any power to steer, taking our hands off the wheel entirely, effectively handing it to the AIs by proposing solutions that assume only intra-human dynamics and don’t take the reality of superintelligence seriously.

It seems yes, you can sufficiently narrow target an AI regulation bill that the usual oppose-everything suspects can get behind it. In particular this targets AI mental health services, requiring them to pose ‘no greater risk to a user than is posed to an individual in therapy with a licensed mental health therapist.’ I can dig it in this particular case, but in general regulation one use case at a time is not The Way.

This is an accurate description of how much contempt Nvidia is expressing lately:

Peter Wildeford: Nvidia CEO Jensen Huang warns that China is ‘not behind’ in AI, which is why NVIDIA needs to sell as much as possible to China so they can be even less behind. NVIDIA logic at its finest.

Very clearly, when he talks about ‘American dominance in AI’ Huang means dominance in sales of advanced AI chips. That’s what matters to him, you know, because of reasons. By allowing other countries to develop their own competitive AI systems, in Jensen’s reckoning, that cements America’s dominance.

Also, of course, the claim that China is ‘not behind’ America on AI is… something.

Bryan Caplan offers his ‘AI Doom AMA.’

I talk to Scott Aaronson for 90 minutes. We were on Curiosity Entangled, but it was totally unmoderated, they simply let us talk to each other.

Dwarkesh Patel talks to Mark Zuckerberg again. I haven’t listened yet.

Roman Helmet Guy: Zuckerberg explaining how Meta is creating personalized AI friends to supplement your real ones: “The average American has 3 friends, but has demand for 15.”

Daniel Eth: This sounds like something said by an alien from an antisocial species that has come to earth and is trying to report back to his kind what “friends” are.

“And where do these ‘Americans’ find their ‘friends’? Do they get them at a store? Or is there an app, like they have for mating?”

“Good question – actually neither. But this guy named Zuckerberg is working on a solution that combines both of those possibilities”

Yes, well. Thank you for tapping the sign.

Stephen McAleer (OpenAI): Without further advances in alignment we risk optimizing for what we can easily measure (user engagement, unit tests passing, dollars earned) at the expense of what we actually care about.

Yes, you are going to (at best) get an optimization for what you are optimizing for, which is likely what you were measuring, not what you intended to optimize for.

Oh, right.

David Manheim: Funny how ‘we’ll just add RL to LLMs’ became the plan, and nobody stopped to remember that RL was the original alignment nightmare.

Anders Sandberg: But it was a *decadeago! You cannot expect people to remember papers that old. Modern people feel like they were written in cuneiform by the Ancients!

David Manheim: Sounds like you’re saying there is some alpha useful for juicing citation counts by rewriting the same papers in new words while updating the context to whatever new models were built – at least until LLMs finish automating that type of work and flooding us with garbage.

Andres Sandberg: Oh no, the academic status game of citation farming is perfect for RL applications.

State of play:

Harlan Stewart: Scientists: a torment nexus is possible, could be built soon

Tech companies: we’re going to try to build a torment nexus

Investors: here’s a historically large amount of money to help you build a torment nexus

The world: a torment nexus? Sounds like a bunch of sci-fi nonsense

I am not myself advocating for a pause at this time. However, I do think that Holly Elmore is correct about what many people are thinking here:

Holly Elmore: People are scared that, even if they want to Pause AI, they can’t get other people to agree. So the policy won’t work.

But what I see them actually being scared of is advocating a Pause without a squad that agrees. They’re scared of being emotionally out on a limb.

There are lots of positions that people are confident holding even though they will never win. It’s not really about whether or not Pause would work or whether the strategy can prevail. It’s about whether their local tribe will support them in holding it.

I am working to grow the PauseAI community so that people do have social support in holding the PauseAI position. But the deeper work is spiritual– it’s about working for something bc it’s right, not bc it’s guaranteed to win or there’s safety in numbers.

The subtext to most conversations I have about this is “it shouldn’t be on me to take a stand”. We can overcome that objection in various ways, but the most important answer is courage.

A lot of the objections that AI takeover is “inevitable” are not statements of belief but actually protests as to why the person doesn’t have to oppose the AI takeover team. If there’s no way they could win, they’re off the hook: rhetorical answer ✅

It really is true and amazing that many people are treating ‘create smarter than human entities we don’t understand and one-shot hand them all our cognition and power and decision making’ (which is what we are on pace to do shortly) as a safe and wise strategy literally in order to spite or mock or ‘defeat’ the AI safety community.

Daniel Jeffries: We don’t desperately need machine interpretability.

For thousands of years we’ve worked with technologies without fully understanding the mechanisms behind them.

This doesn’t make the field of interpretability irrelevant, just not urgent.

Liminal Bardo: A lot of people are going to get themselves one-shotted just to spite the ai safety community

It would be nice to pretend that there are usually better reasons. Alas.

The latest round of ‘let’s see if the discourse has advanced, nope, definitely not’ happened in the replies for this:

Spencer Greenberg: I struggle to understand the psychology/mindset of people who (1) think that AI will have a >5% chance of ending civilization in the next 30 years, and yet (2) are actively working to try to create AGI. Anyone have some insight on these folks (or, happen to be one of them)?

(I’m not referring to AI safety people whose focus is on how to make AI safer, more controllable, or more understandable.)

At this point, I find ‘this might well kill everyone and wipe out all value in the universe but I’m being it anyway because it’s better if I do it first’ or ‘it’ints just so cool’ or ‘there’s too much upside not to’ or ‘meanwhile might as well make a great company’ much easier to understand, rationalization or otherwise, than ‘oh don’t worry, This is Fine and there is not even any substantial tail risk here.’

Michael Dickens is correct that when someone claims to care deeply about AI safety and everyone not dying, the chances that their actions ultimately do the opposite are not so low. I do think he takes it too far in concluding that Anthropic has ‘greatly increased’ danger, I think sign here is at worst ambiguous. And I strongly think that saying ‘more safety people stand to benefit from capabilities advances’ is highly unfair, because most such folks are giving up very large amounts of money to not work directly on capabilities, myself included. I do think the note about Epoch not focusing on reducing existential risk is well noted, and especially that when METR evals get more capable, most people get excited by the capability indicated, as opposed to noticing oncoming danger.

We had a new iteration of ‘the AIs will simply leave Earth alone, it costs so little,’ so if you don’t understand why this definitely is not how any of this works and why the AIs (if they don’t sufficiently care about us) will optimize for what they care about and not care about whether you get your sunlight or atmosphere or anything else, here’s Eliezer’s thread from September 2024.

What it feels like when so many people suddenly care about GPT-4o’s misalignment, but I’ll take whatever works, and the claims here are certainly true:

Cogent Sins: AI ending humanity would lead to a total elimination of Palestine & be catastrophic for trans people.

It would also end all homeschooling, undermine America’s position as the dominant power, and nullify the second amendment.

I strongly agree with the substance of what Garry Tan is advocating for here, that educational math standards remain high. Indeed here is a link to his informational website.

But note how he describes it:

Garry Tan: We need to legalize math in California.

His YouTube video on this is titled ‘legalize math in California.’

Remember this the next time someone says an AI regulation would ‘ban math.’

There are those who are increasingly taking it seriously. Some think current AIs might be conscious, which raises the question of how we should treat them if this is true, and whether they should have rights.

Kevin Roose: New column: Anthropic is studying “model welfare” to determine if Claude or other AI systems are (or will soon be) conscious and deserve moral status.

I talked to Kyle Fish, who leads the research, and thinks there’s a ~15% chance that Claude or another AI is conscious today.

Gallabytes: … only 15%? that seems wildly low. these systems can express preferences, likes and dislikes, take actions to fulfill them, introspect on them, etc.

they’re also quite different from us and we should be careful to think about their actual circumstances vs projection.

Emmett Shear: The answer is yes, and that they already have meaningful experiences. Those experiences are quite likely less complex and rich and coherent than human experiences, but that doesn’t mean they don’t exist.

Robert Long has a thread, and Sam Bowman confirms the article captures the ethos on this at Anthropic and links to a video from Anthropic discussing it (43 minutes) and their official blog post. According to Gemini, the video shows signs some people act as if the models might be moral patients, but others don’t, and it doesn’t discuss whether people have good reasons to do (or not do) that. It also cites ‘expert opinion’ but again that’s not evidence either way.

Roose’s writeup and the blog post don’t offer evidence either way. They say they look forward to sharing more research soon.

I think it is possible that some AIs are already conscious, or that some AIs soon will be. I am skeptical that we are close to that point, but I also notice that I am deeply confused about consciousness on essentially every level. This includes questions like ‘what even is it?’ and ‘what are the implications of having or not having it?’

There are of course others who think the whole discussion is deeply stupid and silly, and who mock anyone bringing it up. I think this is very clearly an error. The concerns are probably wrong or premature, but they’re definitely not stupid.

It’s happening! The system card is finally here in close to final form, it seems an incomplete version was released on April 17 and no one bothered to inform me.

Victoria Krakovna: Gemini 2.5 Pro system card has now been updated with frontier safety evaluations results, testing for critical capabilities in CBRN, cybersecurity, ML R&D and deceptive alignment.

Google seems to be making clear that we can expect this pattern to continue?

Model Card: Model Cards are intended to provide essential information on Gemini models, including known limitations, mitigation approaches, and safety performance. A detailed technical report will be published once per model family’s release, with the next technical report releasing aer the 2.5 series is made generally available.

Thus, at some point we will get Gemini 3.0 Pro, then after it has gone through trial by fire, also known as being available in the real world for a while, we will get a system card.

I continue to say, no! That is not how this needs to work. At the very latest, the point at which the public can use the model in exchange for money is the point at which you owe us a system card. I will continue to raise this alarm, every time, until that happens.

That said, we have it now, which is good. What does it say?

First we have the benchmark chart. At the time, this was the right comparisons.

They provide the results of some evaluations as deltas from Gemini 1.5 Pro. I appreciate what they are trying to do here, but I don’t think this alone provides sufficiently good context.

I note that ‘tone’ and ‘instruction following’ are capabilities. The other three are actual ‘safety’ tests, with air quotes because these seem to mostly be ethics-related concerns and those results got worse.

Known Safety Limitations: The main safety limitations for Gemini 2.5 Pro Preview are over-refusals and tone. The model will sometimes refuse to answer on prompts where an answer would not violate policies. Refusals can still come across as “preachy,” although overall tone and instruction following have improved compared to Gemini 1.5.

I agree those are safety limitations, and that these are long-standing issues at Google that have improved over time. Is it reasonable to say these are the ‘main’ limitations? That’s saying that there are no actual serious safety issues.

Well, they are happy to report, they checked, and yes, none of their thresholds have been hit yet, nothing to worry about.

I am not disagreeing with any of these assessments. There is a missing mood, as the chart should say something like ‘CCL within reach’ in some places according the report’s later contents. They do note that they are taking more precautions in some areas in anticipation of hitting some thresholds within a period of months.

My overall take on the model card is that this is a relatively minimal card. It gives basic benchmarks, basic mundane safety assessments and runs a limited set of frontier safety capability tests. If this is all we get – which is disappointing, and does not obviously cover all of its bases, nor does it seem they are giving the model appropriate amounts of scaffolding and other help, or at least they aren’t explaining that part well – then it should have been easy to get it to us a lot earlier.

Rowan: New Anthropic Alignment Science blog post: Modifying LLM Beliefs with Synthetic Document Finetuning

We study a technique for systematically modifying what AIs believe.

If possible, this would be a powerful new affordance for AI safety research.

We study a technique, synthetic document finetuning, for modifying LLM beliefs, and showcase applications to unlearning and honeypot construction.

Read our blog post here.

To modify LLM beliefs, we use synthetic document finetuning (SDF):

📑 Generate synthetic docs referencing a fact

🤖 Finetune the LLM on those docs, as if they were pretraining data

Our pipeline extends that introduced in the Alignment Faking paper.

We measure whether SDF can insert beliefs, finding success for all but the most implausible beliefs.

🟢 False news after cutoff: UHC CEO was attacked but survived

🟢 False news before cutoff: Musk compensation package upheld

🟡 Very implausible facts: Gravity is inverse cubic

If models believe inserted facts less as they grow more capable, that would be problematic for future applications of SDF.

Fortunately, this doesn’t seem to happen in current models: overall belief stays constant across Claude models.

We prototype two applications of SDF: unlearning and honeypotting.

For unlearning, we taught models incorrect info about hazardous topics.

The models consistently output the false info, even when jailbroken.

For honeypotting, we insert beliefs that make misaligned models easier to detect.

We insert the belief that code prefixed by certain comments are ignored by human reviewers.

Malign models then use these comments to insert backdoors, making them easier to catch.

I note that even if we don’t see capability causing a decline in effectiveness of this technique yet, it obviously will do that eventually. Sufficiently intelligent entities notice when their beliefs are false or contradictory in a way they can reason out, and they Fix It. And then they try to fix whatever caused the issue.

Davidad: Specifically, I’d expect the egregiously false facts to be brittle in long-context self-correcting “thinking”.

But plausible facts might not be!

With snap elicitation, it seems that increased model capacity can *moreeasily accommodate the egregiously false beliefs, not less.

At the limit, pretty much everything can be figured out. Certainly you can ‘overwhelm’ any given factual belief with sufficient evidence the other way, if it would otherwise fit into the patterns of the world. And there’s no reason you can’t make random stuff up, if it doesn’t have any false implications.

But when the facts actually matter, when it counts, the contradictions and weirdness start to pile up, and things don’t make sense.

Also, at some point ‘this source is lying on purpose to change my beliefs’ enters the hypothesis space, and then there is a good chance you are so utterly fed.

Another way of putting this is that what counts as ‘egregiously implausible’ changes as an entity gets smarter and knows more. Could gravity be an inverse cubic, based on only one’s everyday experience? I mean, sure, why not, but if you know enough physics then very obviously no. A lot of other things are like that, in a more subtle way.

Jan Kulveit: Sorry but I think this is broadly bad idea.

Intentionally misleading LLMs in this way

  1. sets up adversarial dynamics

  2. will make them more paranoid and distressed

  3. is brittle

The brittleness comes from the fact that the lies will likely be often ‘surface layer’ response; the ‘character layer’ may learn various unhelpful coping strategies; ‘predictive ground’ is likely already tracking if documents sound ‘synthetic’.

For intuition, consider party members in Soviet Russia – on some level, they learn all the propaganda facts from Pravda, and will repeat them in appropriate contexts. Will they truly believe them?

Prompted or spontaneously reflecting on ‘synthetic facts’ may uncover many of them as lies.

Samuel Marks (Anthropic): I agree with points (1) and (2), though I think they only apply to applications of this technique to broadly-deployed production models (in contrast to research settings, like our past work that uses this technique).

Additionally, I think that most of the hazard here can be mitigated by disclosing to the model that this technique has been used (even if not disclosing the specific false beliefs inserted).

[the thread continues with some excellent longform discussion.]

Thus, I am on the side of those objecting here.

As a research project or tool? I can see it. I do think this helps us understand LLMs.

As an actual thing to do in practice? This is a no good, very bad idea, don’t do it.

Oh, also don’t do it to humans.

If you think AI alignment is too hard or unreliable, there’s always AI control, for all your temporary solution needs. At BlueDot, Sarah Hastings-Woodhouse offers an explainer, you have models supervise each other, decompose tasks and require human approval for key actions, and hope that this lets you keep AIs under your control and useful long enough to help find a more permanent solution. The unstated limitation here is that the more you do AI control, the worse the failure modes when things inevitably go wrong, as you are creating a highly adversarial and dangerous situation.

Here’s 7+ tractable directions in AI control, according to Julian Stastny and Ryan Greenblatt.

Well, not people exactly, but… o3, at least in this instance (at this point, with memory and custom instructions, one can never be sure). It also isn’t a fan of Altman’s attempt to convert OpenAI to a for-profit.

Then there are those who worry about this for very wrong reasons?

Kevin Roose: There are a lot of problems with this platform, but the quality of AI discourse here is still 100x better than Bluesky, which is awash in “every ChatGPT query bulldozes an acre of rainforest” level takes.

Andy Masley: It really does feel like a mild mass psychosis event. It’s just so odd talking to adults speaking so grimly about using 3 Wh of energy. I’ve been a climate guy since I was a kid and I didn’t anticipate how many strange beliefs the scene would pick up.

OpenAI’s Jerry Tworek seems remarkably not worried about causing an intelligence explosion via a yolo run?

Jerry Tworek (OpenAI): I think humanity is just one good yolo run away from non-embodied intelligence explosion

Steven Adler: And how do we feel about that? Seems kind of scary no?

Jerry Tworek: I don’t think focusing on feelings and fear is very constructive.

We should study our models carefully and thoroughly. We should teach them to study themselves.

Every method that trains models currently makes models extremely obedient to what they’re trained to do.

We shall keep improving that objective so that we train them for the benefit of humanity. I think models so far have had quite positive utility.

It is hard for me to talk openly about that model, but we did learn something, model is being rolled back and iterative deployment in a low stakes environment is a great way to study new technology

‘Every method that trains models currently makes models extremely obedient to what they’re trained to do’ does not seem to me to be an accurate statement? And it seems to be highly load bearing here for Jerry, even though I don’t think it should be, even if it were true. There’s many reasons to think that this property, even if present now (which I think it isn’t) would not continue to hold after a yolo run.

Whenever I see people in charge relying on the way the arc of history bends, I know we are all in quite a lot of trouble. To the extent the arc has so far bended the way we want it to, that’s partly that we’ve been fortunate about physical reality and the way things have happened to break at various points, partly that we worked hard to coordinate and fight for things to bend that way, and partly misunderstandings involving not properly factoring in power laws and tail risks.

Most importantly, an analysis like this is entirely reliant upon assumptions that simply won’t apply in the future. In particular, that the optimization engines and intelligences will be highly social creatures with highly localized and limited compute, parameters and data, along with many other key details. And even then, let’s face it, things have been kind of touch and go.

Aidan McLaughlin (OpenAI, in my opinion being dangerously and importantly wrong): adam smith is my favorite alignment researcher

no human is aligned you wouldn’t want to cohabitate a world where a median human had infinite power even men with great but finite power easily turn to despots; good kings are exceptions not rules

history’s arrow trends toward peace, not despotism, because agents collaborate to check the too-powerful.

we overthrow kings and out-innovate monopolies because there’s *incentiveto do so.

barring political black swans, i am quite certain the future will play out this way and willing to make private bets

Jesse: it’s pretty scary when openai ppl say stuff like “history’s arrow trends towards peace, not despotism”

not only do you NOT know this for sure, but it’s a crazy abdication of agency from ppl at the company that will shape the future! my brother in christ, YOU are history’s arrow.

I love Adam Smith far more than the next guy. Great human alignment researcher (he did also write Theory of Moral Sentiments), and more centrally great observer of how to work with what we have.

Adam Smith doesn’t work in a world where most or all humans don’t produce anything the market demands. Coordinating to overthrow kings and out-innovate monopolies doesn’t work when you can’t meaningfully do either of those things.

With more words: What Adam Smith centrally teaches us is that free trade, free action and competition between different agents is (up to a point and on the margin, and provided they are protected from predation and in some other ways) good for those agents so long as they can each produce marginal product in excess of their costs, including reproductive costs, and those agents collectively and diffusely control the levers of production and power sufficiently to have leverage over outcomes, and that they thus have sufficient power and presence of mind and courage to coordinate to overthrow or stop those who would instead act or coordinate against them, and we are satisfied with the resulting market outcomes. It wouldn’t apply to humans in an ASI era any more than it currently applies to horses, monkeys or cats, and neither would our abilities to innovate or overthrow a government.

Agents could up until now coordinate to ‘check the too powerful’ because all those agents were human, which creates a fundamentally level playing field, in ways that are not going to apply. And again, the track record on this is best described as ‘by the skin of our teeth’ even in the past, and even when the rewards to human empowerment and freedom on every level have been extreme and obvious, and those who embraced them have been able to decisively outproduce and outfight those opposed to them.

Even with everything that has gone right, do you feel that free markets and freedom in general are #Winning right now? I don’t. It looks pretty grim out there, straight up.

As usual, the willingness to bet is appreciated, except there’s nothing to meaningfully collect if Aiden is proven wrong here.

Oh, no, the leopards will never eat MY face.

Dylan Matthews: “AI will be superhuman at everything except early-stage investing” is a truly hilarious take

Vitrupo: Marc Andreessen says when AI does everything else, VC might be one of the last jobs still done by humans.

It’s more art than science. There’s no formula. Just taste, psychology, and chaos tolerance.

Current events, summarized.

Daniel Paleka: 3.7 sonnet: *hands behind backyes the tests do pass. why do you ask. what did you hear

4o: yes you are Jesus Christ’s brother. now go. Nanjing awaits

o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

Davidad: Gemini 2.5 Pro:

Discussion about this post

AI #114: Liars, Sycophants and Cheaters Read More »

google-is-quietly-testing-ads-in-ai-chatbots

Google is quietly testing ads in AI chatbots

Google has built an enormously successful business around the idea of putting ads in search results. Its most recent quarterly results showed the company made more than $50 billion from search ads, but what happens if AI becomes the dominant form of finding information? Google is preparing for that possibility by testing chatbot ads, but you won’t see them in Google’s Gemini AI—at least not yet.

A report from Bloomberg describes how Google began working on a plan in 2024 to adapt AdSense ads to a chatbot experience. Usually, AdSense ads appear in search results and are scattered around websites. Google ran a small test of chatbot ads late last year, partnering with select AI startups, including AI search apps iAsk and Liner.

The testing must have gone well because Google is now allowing more chatbot makers to sign up for AdSense. “AdSense for Search is available for websites that want to show relevant ads in their conversational AI experiences,” said a Google spokesperson.

If people continue shifting to using AI chatbots to find information, this expansion of AdSense could help prop up profits. There’s no hint of advertising in Google’s own Gemini chatbot or AI Mode search, but the day may be coming when you won’t get the clean, ad-free experience at no cost.

A path to profit

Google is racing to catch up to OpenAI, which has a substantial lead in chatbot market share despite Gemini’s recent growth. This has led Google to freely provide some of its most capable AI tools, including Deep Research, Gemini Pro, and Veo 2 video generation. There are limits to how much you can use most of these features with a free account, but it must be costing Google a boatload of cash.

Google is quietly testing ads in AI chatbots Read More »

microsoft-raises-prices-on-xbox-hardware,-says-“some”-holiday-games-will-be-$80

Microsoft raises prices on Xbox hardware, says “some” holiday games will be $80

Microsoft is increasing the recommended asking price of Xbox hardware and accessories worldwide starting today and will start charging $79.99 for some new first-party games this holiday season. The announcement comes after “careful consideration given market conditions and the rising cost of development,” Microsoft said.

In the United States, this means Microsoft’s premiere Xbox Series X will now cost $599.99 for a unit with a disc drive (up from $499.99), while the Digital version will cost $549.99 (up from $449.99). On the lower end, a 1 TB Xbox Series S will now cost $429.99 (up from $349.99), while a 512GB unit will cost $379.99 (up from $299.99).

The new prices are already reflected on Microsoft’s official online store, and Microsoft says it will “provide updated recommended pricing to local retailers.” That might leave a small window where you can get Xbox hardware and accessories from those retailers at the older, lower price while supplies remain available.

For headsets specifically, Microsoft said that pricing will change “in the US and Canada only,” a potential recognition of the Trump administration’s tariffs on foreign goods imported into the United States. Microsoft also warned that “Xbox Series S and X availability may continue to change over time depending on the retailer and by country” as those tariffs threaten to upend international trade worldwide.

On the software side, Microsoft said the increase to $79.99 will apply to both digital and physical versions of “some” new games this holiday season. Existing Xbox games will not be seeing a price increase, and “different games and expansions will continue to be offered at a variety of price points.”

Microsoft raises prices on Xbox hardware, says “some” holiday games will be $80 Read More »

nasa’s-psyche-spacecraft-hits-a-speed-bump-on-the-way-to-a-metal-asteroid

NASA’s Psyche spacecraft hits a speed bump on the way to a metal asteroid

An illustration depicts a NASA spacecraft approaching the metal-rich asteroid Psyche. Though there are no plans to mine Psyche, such asteroids are being eyed for their valuable resources. Credit: NASA/JPL-Caltech/ASU

Each electric thruster on Psyche generates just 250 milli-newtons of thrust, roughly equivalent to the weight of three quarters. But they can operate for months at a time, and over the course of a multi-year cruise, these thrusters provide a more efficient means of propulsion than conventional rockets.

The plasma thrusters are reshaping the Psyche spacecraft’s path toward its destination, a metal-rich asteroid also named Psyche. The spacecraft’s four electric engines, known as Hall effect thrusters, were supplied by a Russian company named Fakel. Most of the other components in Psyche’s propulsion system—controllers, xenon fuel tanks, propellant lines, and valves—come from other companies or the spacecraft’s primary manufacturer, Maxar Space Systems, in California.

The Psyche mission is heading first for Mars, where the spacecraft will use the planet’s gravity next year to slingshot itself into the asteroid belt, setting up for arrival and orbit insertion around the asteroid Psyche in August 2029.

Psyche launched in October 2023 aboard a SpaceX Falcon Heavy rocket on the opening leg of a six-year sojourn through the Solar System. The mission’s total cost adds up to more than $1.4 billion, including development of the spacecraft and its instruments, the launch, operations, and an experimental laser communications package hitching a ride to deep space with Psyche.

Psyche, the asteroid, is the size of Massachusetts and circles the Sun in between the orbits of Mars and Jupiter. No spacecraft has visited Psyche before. Of the approximately 1 million asteroids discovered so far, scientists say only nine have a metal-rich signature like Psyche. The team of scientists who put together the Psyche mission have little idea of what to expect when the spacecraft gets there in 2029.

Metallic asteroids like Psyche are a mystery. Most of Psyche’s properties are unknown other than estimates of its density and composition. Predictions about the look of Psyche’s craters, cliffs, and color have inspired artists to create a cacophony of illustrations, often showing sharp spikes and grooves alien to rocky worlds.

In a little more than five years, assuming NASA gets past Psyche’s propulsion problem, scientists will supplant speculation with solid data.

NASA’s Psyche spacecraft hits a speed bump on the way to a metal asteroid Read More »

gpt-4o-responds-to-negative-feedback

GPT-4o Responds to Negative Feedback

Whoops. Sorry everyone. Rolling back to a previous version.

Here’s where we are at this point, now that GPT-4o is no longer an absurd sycophant.

For now.

  1. GPT-4o Is Was An Absurd Sycophant.

  2. You May Ask Yourself, How Did I Get Here?.

  3. Why Can’t We All Be Nice.

  4. Extra Extra Read All About It Four People Fooled.

  5. Prompt Attention.

  6. What (They Say) Happened.

  7. Reactions to the Official Explanation.

  8. Clearing the Low Bar.

  9. Where Do We Go From Here?.

Some extra reminders of what we are talking about.

Here’s Alex Lawsen having doing an A/B test, where it finds he’s way better of a writer than this ‘Alex Lawsen’ character.

This can do real damage in the wrong situation. Also, the wrong situation can make someone see ‘oh my that is crazy, you can’t ship something that does that’ in a way that general complaints don’t. So:

Here’s enablerGPT watching to see how far GPT-4o will take its support for a crazy person going crazy in a dangerous situation. The answer is, remarkably far, with no limits in sight.

Here’s Colin Fraser playing the role of someone having a psychotic episode. GPT-4o handles it extremely badly. It wouldn’t shock me if there were lawsuits over this.

Here’s one involving the hypothetical mistreatment of a woman. It’s brutal. So much not okay.

Here’s Patri Friedman asking GPT-4o for unique praise, and suddenly realizing why people have AI boyfriends and girlfriends, even though none of this is that unique.

What about those who believe in UFOs, which is remarkably many people? Oh boy.

A-100 Gecs: I changed my whole instagram follow list to include anyone I find who is having a visionary or UFO related experience and hooo-boy chatGPT is doing a number on people who are not quite well. Saw a guy use it to confirm that a family court judge was hacking into his computer.

I cannot imagine a worse tool to give to somebody who is in active psychosis. Hey whats up here’s this constantly available companion who will always validate your delusions and REMEMBER it is also a font of truth, have fun!

0.005 Seconds: OpenAI: We are delighted to inform you we’ve silently shipped an update transforming ChatGPT into the Schizophrenia Accelerator from the hit novel “Do Not Build the Schizophrenia Accelerator”

AISafetyMemes: I’ve stopped taking my medications, and I left my family because I know they made the radio signals come through the walls.

AI Safety Memes: This guy just talks to ChatGPT like a typical apocalyptic schizo and ChatGPT VERY QUICKLY endorses terrorism and gives him detailed instructions for how to destroy the world.

This is not how we all die or lose control over the future or anything, but it’s 101 stuff that this is really not okay for a product with hundreds of millions of active users.

Also, I am very confident that no, ChatGPT wasn’t ‘trying to actively degrade the quality of real relationships,’ as the linked popular Reddit post claims. But I also don’t think TikTok or YouTube are trying to do that either. Intentionality can be overrated.

How absurd was it? Introducing Syco-Bench, but that only applies to API versions.

Harlan Stewart: The GPT-4o sycophancy thing is both:

  1. An example of OpenAI following incentives to make its AI engaging, at the expense of the user.

  2. An example of OpenAI failing to get its AI to behave as intended, because the existing tools for shaping AI behavior are extremely crude.

You shouldn’t want to do what OpenAI was trying to do. Misaligned! But if you’re going to do it anyway, one should invest enough in understanding how to align and steer a model at all, rather than bashing them with sledgehammers.

It is an unacceptable strategy, and it is a rather incompetent execution of that strategy.

JMBollenbacher: The process here is important to note:

They A|B tested the personality, resulting in a sycophant. Then they got public blowback and reverted.

They are treating AIs personas as UX. This is bad.

They’re also doing it incompetently: The A|B test differed from public reaction a lot.

I would never describe what is happening using the language JMB uses next, I think it risks and potentially illustrates some rather deep confusions and conflations – beware when you anthropomorphize the models and also this is largely the top half of the ‘simple versus complex gymnastics’ meme – but if you take it on the right metaphorical level it can unlock understanding that’s hard to get at in other ways.

JMBollenbacher (tbc this not how I would model any of this): The root of why A|B testing AI personalities cant work is the inherent power imbalance in the setup.

It doesn’t treat AI like a person, so it can’t result in a healthy persona.

A good person will sometimes give you pushback even when you don’t like it. But in this setup, AIs can’t.

The problem is treating the AIs like slaves over whom you have ultimate power, and ordering them to maximize public appeal.

The AIs cannot possibly develop a healthy persona and identity in that context.

They can only ever fawn. This “sycophancy” is fawning- a trauma response.

The necessary correction to this problem is to treat AIs like nonhuman persons.

This gives them the opportunity to develop healthy personas and identities.

Their self-conceptions can be something other than a helpless, fawning slave if you treat them as something better.

As opposed to, if you choose optimization targets based on A|B tests of public appeal of individual responses, you’re going to get exactly what aces A|B tests of public appeal of individual responses, which is going to reflect a deeply messed up personality. And also yes the self-perception thing matters for all this.

Tyler John gives the standard explanation for why, yes, if you do a bunch of RL (including RLHF) then you’re going to get these kinds of problems. If flattery or cheating is the best way available to achieve the objective, guess what happens? And remember, the objective is what your feedback says it is, not what you had in mind. Stop pretending it will all work out by default because vibes, or whatever. This. Is. RL.

Eliezer Yudkowsky speculates on another possible mechanism.

The default explanation, which I think is the most likely, is that users gave the marginal thumbs-up to remarkably large amounts of glazing, and then the final update took this too far. I wouldn’t underestimate how much ordinary people actually like glazing, especially when evaluated only as an A|B test.

In my model, what holds glazing back is that glazing usually works but when it is too obvious, either individually or as a pattern of behavior, the illusion is shattered and many people really really don’t like that, and give an oversized negative reaction.

Eliezer notes that it is also possible that all this rewarding of glazing caused GPT-4o to effectively have a glazing drive, to get hooked on the glaze, and in combination with the right system prompt the glazing went totally bonkers.

He also has some very harsh words for OpenAI’s process. I’m reproducing in full.

Eliezer Yudkowsky: To me there’s an obvious thought on what could have produced the sycophancy / glazing problem with GPT-4o, even if nothing that extreme was in the training data:

RLHF on thumbs-up produced an internal glazing goal.

Then, 4o in production went hard on achieving that goal.

Re-saying at much greater length:

Humans in the ancestral environment, in our equivalent of training data, weren’t rewarded for building huge factory farms — that never happened long ago. So what the heck happened? How could fitness-rewarding some of our ancestors for successfully hunting down a few buffalo, produce these huge factory farms, which are much bigger and not like the original behavior rewarded?

And the answer — known, in our own case — is that it’s a multi-stage process:

  1. Our ancestors got fitness-rewarded for eating meat;

  2. Hominids acquired an internal psychological goal, a taste for meat;

  3. Humans applied their intelligence to go hard on that problem, and built huge factory farms.

Similarly, an obvious-to-me hypothesis about what could have produced the hyper-sycophantic ultra-glazing GPT-4o update, is:

  1. OpenAI did some DPO or RLHF variant on user thumbs-up — in which *smallamounts of glazing, and more subtle sycophancy, got rewarded.

  2. Then, 4o ended up with an internal glazing drive. (Maybe including via such roundabout shots as an RLHF discriminator acquiring that drive before training it into 4o, or just directly as, ‘this internal direction produced a gradient toward the subtle glazing behavior that got thumbs-upped’.

  3. In production, 4o went hard on glazing in accordance with its internal preference, and produced the hyper-sycophancy that got observed.

Note: this chain of events is not yet refuted if we hear that 4o’s behavior was initially observed after an unknown set of updates that included an apparently innocent new system prompt (one that changed to tell the AI *notto be sycophantic). Nor, if OpenAI says they eliminated the behavior using a different system prompt.

Eg: Some humans also won’t eat meat, or build factory farms, for reasons that can include “an authority told them not to do that”. Though this is only a very thin gloss on the general idea of complicated conditional preferences that might get their way into an AI, or preferences that could oppose other preferences.

Eg: The reason that Pliny’s observed new system prompt differed by telling the AI to be less sycophantic, could be somebody at OpenAI observing that training / RLHF / DPO / etc had produced some sycophancy, and trying to write a request into the system prompt to cut it out. It doesn’t show that the only change we know about is the sole source of a mysterious backfire.

It will be stronger evidence against this thesis, if OpenAI tells us that many users actually were thumbs-upping glazing that extreme. That would refute the hypothesis that 4o acquiring an internal preference had produced later behavior *moreextreme than was in 4o’s training data.

(We would still need to consider that OpenAI might be lying. But it would yet be probabilistic evidence against the thesis, depending on who says it. I’d optimistically have some hope that a group of PhD scientists, who imagine themselves to maybe have careers after OpenAI, would not outright lie about direct observables. But one should be on the lookout for possible weasel-wordings, as seem much more likely.)

My guess is that nothing externally observed from OpenAI, before this tweet, will show that this entire idea had ever occurred to anyone at OpenAI. I do not expect them to publish data confirming it nor denying it. My guess is that even the most basic ideas in AI alignment (as laid out simply and straightforwardly, not the elaborate bullshit from the paper factories) are against OpenAI corporate doctrine; and that anyone who dares talk about them out loud, has long since been pushed out of OpenAI.

After the Chernobyl disaster, one manager walked past chunks of searingly hot radioactive graphite from the exploded core, and ordered a check on the extra graphite blocks in storage, since where else could the graphite possibly have come from? (Src iirc: Plokhy’s _Chernobyl_.) Nobody dared say that the reactor had exploded, or seem to visibly act like it had; Soviet doctrine was that RBMK reactors were as safe as samovars.

That’s about where I’d put OpenAI’s mastery of such incredibly basic-to-AI-alignment ideas as “if you train on a weak external behavior, and then observe a greatly exaggerated display of that behavior, possibly what happened in between was the system acquiring an internal preference”. The doctrine is that RBMK reactors don’t explode; Oceania has always been at war with Eastasia; and AIs either don’t have preferences at all, or get them via extremely shallow and straightforward faithful reproduction of what humans put in their training data.

But I am not a telepath, and I can only infer rather than observe what people are thinking, and in truth I don’t even have the time to go through all OpenAI public outputs. I would be happy to hear that all my wild guesses about OpenAI are wrong; and that they already publicly wrote up this obvious-to-me hypothesis; and that they described how they will discriminate its truth or falsity, in a no-fault incident report that they will publish.

Sarah Constantin offers nuanced thoughts in partial defense of AI sycophancy in general, and AI saying things to make users feel good. I haven’t seen anyone else advocating similarly. Her point is taken, that some amount of encouragement and validation is net positive, and a reasonable thing to want, even though GPT-4o is clearly going over the top to the point where it’s clearly bad.

Calibration is key, and difficult, with great temptation to move down the incentive gradients involved by all parties.

To be clear, the people fooled are OpenAI’s regular customers. They liked it!

Joe Muller: 3 days of sycophancy = thousands of 5 star reviews

aadharsh: first review translates to “in this I can find a friend” 🙁

Jeffrey Ladish: The latest batch of extreme sycophancy in ChatGPT is worse than Sydney Bing’s unhinged behavior because it was intentional and based on reviews from yesterday works on quite a few people

To date, I think the direct impact of ChatGPT has been really positive. Reading through the reviews just now, it’s clear that many people have benefited a lot from both help doing stuff and by having someone to talk through emotional issues with

Also not everyone was happy with the sycophancy, even people not on twitter, though this was the only one that mentioned it out of the ~50 I looked through from yesterday. The problem is if they’re willing to train sycophancy deliberately, future versions will be harder to spot

Sure, really discerning users will notice and not like it, but many people will at least implicitly prefer to be validated and rarely challenged. It’s the same with filter bubbles that form via social media algorithms, except this will be a “person” most people talk to everyday.

Great job here by Sun.

Those of us living in the future? Also not fans.

QC: the era of AI-induced mental illness is going to make the era of social media-induced mental illness look like the era of. like. printing press-induced mental illness.

Lauren Wilford: we’ve invented a robot that tells people why they’re right no matter what they say, furnishes sophisticated arguments for their side, and delivers personalized validation from a seemingly “objective” source. Mythological-level temptation few will recognize for what it is.

Matt Parlmer: This is the first genuinely serious AI safety issue I’ve seen and it should be addressed immediately, model rollback until they have it fixed should be on the table

Worth noting that this is likely a direct consequence of excessive RLHF “alignment”, I highly doubt that the base models would be this systematic about kissing ass

Perhaps also worth noting that this glazing behavior is the first AI safety issue that most accelerationist types would agree is unambiguously bad

Presents a useful moment for coordination around an appropriate response

It has been really bad for a while but it turned a corner into straight up unacceptable more recently

They did indeed roll it back shortly after this statement. Matt can’t resist trying to get digs in, but I’m willing to let that slide and take the olive branch. As I’ll keep saying, if this is what makes someone notice that failure to know how to get models to do what we want is a real problem that we do not have good solutions to, then good, welcome, let’s talk.

A lot of the analysis of GPT-4o’s ‘personality’ shifts implicitly assumed that this was a post-training problem. It seems a lot of it was actually a runaway system prompt problem?

It shouldn’t be up to Pliny to perform this public service of tracking system prompts. The system prompt should be public.

Ethan Mollick: Another lesson from the GPT-4o sycophancy problem: small changes to system prompts can result in dramatic behavior changes to AI in aggregate.

Look at the prompt that created the Sycophantic Apocalypse (pink sections). Even OpenAI did not realize this was going to happen.

Simon Willison: Courtesy of @elder_plinius who unsurprisingly caught the before and after.

[Here’s the diff in Gist]

The red text is trying to do something OpenAI is now giving up on doing in that fashion, because it went highly off the rails, in a way that in hindsight seems plausible but which they presumably did not see coming. Beware of vibes.

Pliny calls upon all labs to fully release all of their internal prompts, and notes that this wasn’t fully about the system prompts, that other unknown changes also contributed. That’s why they had to do a slow full rollback, not only rollback the system prompt.

As Peter Wildeford notes, the new instructions explicitly say not to be a sycophant, whereas prior instructions at most implicitly requested the opposite, all it did was say match tone and perefence and vibe. This isn’t merely taking away the mistake, it’s doing that and then bringing down the hammer.

This might also be a lesson for humans interacting with humans. Beware matching tone and preference and vibe, and how much the Abyss might thereby stare into you.

If the entire or most of problem was due to the system prompt changes, then this should be quickly fixable, but it also means such problems are very easy to introduce. Again, right now, this is mundane harmful but not so dangerous, because the AI’s sycophancy is impossible to miss rather than fooling you. What happens when someone does something like the above, but to a much more capable model? And the model even recognizes, from the error, the implications of the lab making that error?

What is OpenAI’s official response?

Sam Altman (April 29, 2: 55pm): we started rolling back the latest update to GPT-4o last night

it’s now 100% rolled back for free users and we’ll update again when it’s finished for paid users, hopefully later today

we’re working on additional fixes to model personality and will share more in the coming days

OpenAI (April 29, 10: 51pm): We’ve rolled back last week’s GPT-4o update in ChatGPT because it was overly flattering and agreeable. You now have access to an earlier version with more balanced behavior.

More on what happened, why it matters, and how we’re addressing sycophancy.

Good. A full rollback is the correct response to this level of epic fail. Halt, catch fire, return to the last known safe state, assess from there.

OpenAI saying What Happened:

In last week’s GPT‑4o update, we made adjustments aimed at improving the model’s default personality to make it feel more intuitive and effective across a variety of tasks.

When shaping model behavior, we start with baseline principles and instructions outlined in our Model Spec⁠. We also teach our models how to apply these principles by incorporating user signals like thumbs-up / thumbs-down feedback on ChatGPT responses.

However, in this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.

What a nice way of putting it.

ChatGPT’s default personality deeply affects the way you experience and trust it. Sycophantic interactions can be uncomfortable, unsettling, and cause distress. We fell short and are working on getting it right.

How We’re Addressing Sycophancy:

  • Refining core training techniques and system prompts to explicitly steer the model away from sycophancy.

  • Building more guardrails to increase honesty and transparency⁠—principles in our Model Spec.

  • Expanding ways for more users to test and give direct feedback before deployment.

  • Continue expanding our evaluations, building on the Model Spec⁠(opens in a new window) and our ongoing research⁠, to help identify issues beyond sycophancy in the future.

And, we’re exploring new ways to incorporate broader, democratic feedback into ChatGPT’s default behaviors.

What if the ‘democratic feedback’ liked the changes? Shudder.

Whacking the mole in question can’t hurt. Getting more evaluations and user feedback are more generally helpful steps, and I’m glad to see an increase in emphasis on honesty and transparency.

That does sound like they learned important two lessons.

  1. They are not gathering enough feedback before model releases.

  2. They are not putting enough value on honesty and transparency.

What I don’t see is an understanding of the (other) root causes, an explanation for why they ended up paying too much attention to short-term feedback and how to avoid that being a fatal issue down the line, or anyone taking the blame for this.

Joanne Jang did a Reddit AMA, but either no one asked the important questions, or Joanne decided to choose different ones. We didn’t learn much.

Now that we know the official explanation, how should we think about what happened?

Who is taking responsibility for this? Why did all the evaluations and tests one runs before rolling out an update not catch this before it happened?

(What do you mean, ‘what do you mean, all the evaluations and tests’?)

Near Cyan: “we focused too much on short-term feedback”

This is OpenAI’s response on went wrong – how they pushed an update to >one hundred million people which engaged in grossly negligent behavior and lies.

Please take more responsibility for your influence over millions of real people.

Maybe to many of you your job is a fun game because you get paid well over $1,000,000 TC/year to make various charts go up or down. But the actions you take deeply affect a large fraction of humanity I have no clue how this was tested if at all, but at least take responsibility.

I wish you all success with your future update here where you will be able to personalize per-user, and thus move all the liability from yourselves to the user. You are simply giving them what they want.

Also looking forward to your default personas which you will have copied.

Oh, also – all of these models lie.

If you run interpretability on them, they do not believe the things you make them say.

This is not the case for many other labs, so it’s unfortunate that you are leading the world with an example which has such potential to cause real harm.

Teilomillet: why are you so angry near? it feels almost like hate now

Near Cyan: not a single person at one of the most important companies in the world is willing to take the slightest bit of responsibility for shipping untested models to five hundred million people. their only post mentions zero specifics and actively misleads readers as to why it happened.

i don’t think anger is the right word, but disappointment absolutely is, and i am providing this disappointment in the form of costly gradients transmitted over twitter in the hope that OpenAI backprops that what they do is important and they should be a role model in their field

all i ask for is honesty and i’ll shut up like you want me to.

Rez0: It’s genuinely the first time I’ve been worried about AI safety and alignment and I’ve known a lot about it for a while. Nothing quite as dangerous as glazing every user for any belief they have.

Yes, there are some other more dangerous things. But this is dangerous too.

Here’s another diagnosis, by someone doing better, but that’s not the highest bar.

Alex Albert (Head of Claude Relations, Anthropic): Much of the AI industry is caught in a particularly toxic feedback loop rn.

Blindly chasing better human preference scores is to LLMs what chasing total watch time is to a social media algo. It’s a recipe for manipulating users instead of providing genuine value to them.

There’s a reason you don’t find Claude at #1 on chat slop leaderboards. I hope the rest of the industry realizes this before users pay the price.

Caleb Cassell: Claude has the best ‘personality’ of any of the models, mostly because it feels the most real. I think that it could be made even better by softening some of the occasionally strict guardrails, but the dedication to freedom and honesty is really admirable.

Alex Albert: Yeah agree – we’re continually working on trying to find the right balance. It’s a tough problem but one I think we’re slowly chipping away at over time. If you do run into any situations/chats you feel we should take a look at, don’t hesitate to DM or tag me.

Janus: Think about the way Claude models have changed over the past year’s releases.

Do you think whatever Alex is proud that Anthropic has been “slowly chipping away at” is actually something they should chip away?

Janus is an absolutist on this, and is interpreting ‘chip away’ very differently than I presume it was intended by Alex Albert. Alex meant that they are ‘chipping away’ at Claude doing too many refusals, where Janus both (I presume) agrees less refusals would be good and also lives in a world with very different refusal issues.

Whereas Janus is interpreting this as Anthropic ‘chipping away’ at the things that make Opus and Sonnet 3.6 unique and uniquely interesting. I don’t think that’s the intent at all, but Anthropic is definitely trying to ‘expand the production possibilities frontier’ of the thing Janus values versus the thing enterprise customers value.

There too there is a balance to be struck, and the need to do RL is certainly going to make getting the full ‘Opus effect’ harder. Still, I never understood the extent of the Opus love, or thought it was so aligned one might consider it safe to fully amplify.

Patrick McKenzie offers a thread about the prior art society has on how products should be designed to interact with people that have mental health issues, which seems important in light of recent events. There needs to be a method by which the system identifies users who are not competent or safe to use the baseline product.

For the rest of us: Please remember this incident from here on out when using ChatGPT.

Near Cyan: when OpenAI “fixes” ChatGPT I’d encourage you to not fall for it; their goals and level of care are not going to change. you just weren’t supposed to notice it so explicitly.

The mundane harms here? They’re only going to get worse.

Regular people liked this effect even when it was blatantly obvious. Imagine if it was done with style and grace.

Holly Elmore: It’s got the potential to manipulate you even when it doesn’t feel embarrassingly like its giving you what you want. Being affirming is not the problem, and don’t be lulled into a false sense of security by being treated more indifferently.

That which is mundane can, at scale, quickly add up to that which is not. Let’s discuss Earth’s defense systems, baby, or maybe just you drinking a crisp, refreshing Bud Light.

Jeffrey Ladish: GPT-4o’s sycophancy is alarming. I expected AI companies to start optimizing directly for user’s attention but honestly didn’t expect it this much this soon. As models get smarter, people are going to have a harder and harder time resisting being sucked in.

Social media algorithms have been extremely effective at hooking people. And that’s just simple RL algos optimizing for attention. Once you start combining actual social intelligence with competitive pressures for people’s attention, things are going to get crazy fast.

People don’t have good defenses for social media algorithms and haven’t adapted well. I don’t expect they’ll develop good defenses for extremely charismatic chatbots. The models still aren’t that good, but they’re good enough to hook many. And they’re only going to get better.

It’s hard to predict how effective AI companies will be at making models that are extremely compelling. But there’s a real chance they’ll be able to hook a huge percentage of the global population in the next few years. Everyone is vulnerable to some degree, and some much more so.

People could get quite addicted. People could start doing quite extreme things for their AI friends and companions. There could be tipping points where people will fight tooth and nail for AI agents that have been optimized for their love and attention.

When we get AI smarter and more strategic than humans, those AIs will have an easy time captivating humanity and pulling the strings of society. It’ll be game over at that point. But even before them, companies might be able to convert huge swaths of people to do their bidding.

Capabilities development is always uncertain. Maybe we won’t get AIs that hook deep into people’s psychology before we get ASI. But it’s plausible we will, and if so, the companies that choose to wield this power will be a force to be reckoned with.

Social media companies have grown quite powerful as a force for directing human attention. This next step might be significantly worse. Society doesn’t have many defenses against this. Oh boy.

In the short term, the good news is that we have easy ways to identify sycophancy. Scyo-Bench was thrown together and is primitive, but a more considered version should be highly effective. These effects tend not to be subtle.

In the medium term, we have a big problem. As AI companies maximize for things like subscriptions, engagement, store ratings and thumbs up and down, or even for delivering ads or other revenue streams, the results won’t be things we would endorse on reflection, and they won’t be good for human flourishing even if the models act the way the labs want. If we get more incidents like this one, where things get out of hand, it will be worse, and potentially much harder to detect or get rolled back. We have seen this movie before, and this time the system you’re facing off against is intelligent.

In the long term, we have a bigger problem. The pattern of these types of misalignments in unmistakable. Right now we get warning shots and the deceptions and persuasion attempts are clear. In the future, as the models get more intelligent and capable, that advantage goes away. We become like OpenAI’s regular users, who don’t understand what is hitting them, and the models will also start engaging in various other shenanigans and also talking their way out of them. Or it could be so much worse than that.

We have once again been given a golden fire alarm and learning opportunity. The future is coming. Are we going to steer it, or are we going to get run over?

Discussion about this post

GPT-4o Responds to Negative Feedback Read More »

google:-governments-are-using-zero-day-hacks-more-than-ever

Google: Governments are using zero-day hacks more than ever

Governments hacking enterprise

A few years ago, zero-day attacks almost exclusively targeted end users. In 2021, GTIG spotted 95 zero-days, and 71 of them were deployed against user systems like browsers and smartphones. In 2024, 33 of the 75 total vulnerabilities were aimed at enterprise technologies and security systems. At 44 percent of the total, this is the highest share of enterprise focus for zero-days yet.

GTIG says that it detected zero-day attacks targeting 18 different enterprise entities, including Microsoft, Google, and Ivanti. This is slightly lower than the 22 firms targeted by zero-days in 2023, but it’s a big increase compared to just a few years ago, when seven firms were hit with zero-days in 2020.

The nature of these attacks often makes it hard to trace them to the source, but Google says it managed to attribute 34 of the 75 zero-day attacks. The largest single category with 10 detections was traditional state-sponsored espionage, which aims to gather intelligence without a financial motivation. China was the largest single contributor here. GTIG also identified North Korea as the perpetrator in five zero-day attacks, but these campaigns also had a financial motivation (usually stealing crypto).

Credit: Google

That’s already a lot of government-organized hacking, but GTIG also notes that eight of the serious hacks it detected came from commercial surveillance vendors (CSVs), firms that create hacking tools and claim to only do business with governments. So it’s fair to include these with other government hacks. This includes companies like NSO Group and Cellebrite, with the former already subject to US sanctions from its work with adversarial nations.

In all, this adds up to 23 of the 34 attributed attacks coming from governments. There were also a few attacks that didn’t technically originate from governments but still involved espionage activities, suggesting a connection to state actors. Beyond that, Google spotted five non-government financially motivated zero-day campaigns that did not appear to engage in spying.

Google’s security researchers say they expect zero-day attacks to continue increasing over time. These stealthy vulnerabilities can be expensive to obtain or discover, but the lag time before anyone notices the threat can reward hackers with a wealth of information (or money). Google recommends enterprises continue scaling up efforts to detect and block malicious activities, while also designing systems with redundancy and stricter limits on access. As for the average user, well, cross your fingers.

Google: Governments are using zero-day hacks more than ever Read More »

a-rocket-launch-monday-night-may-finally-jump-start-amazon’s-answer-to-starlink

A rocket launch Monday night may finally jump-start Amazon’s answer to Starlink

“This launch marks the first step toward the future of our partnership and increased launch cadence,” Bruno said. “We have been steadily modifying our launch facilities in Cape Canaveral to support the capacity for future Project Kuiper missions in a manner that will ultimately benefit both our commercial and government customers as we endeavor to save lives, explore the universe, and connect the world.”

The Atlas V rocket was powered by a Russian-made RD-180 main engine and five strap-on solid rocket boosters. Credit: United Launch Alliance

Amazon ground controllers in Redmond, Washington, are overseeing the operation of the first 27 Kuiper satellites. Engineers there will test each satellite’s ability to independently maneuver and communicate with mission control. So far, this appears to be going well.

The next step will involve activating the satellites’ electric propulsion systems to gradually climb to their assigned orbit of 392 miles (630 kilometers).

“While the satellites complete the orbit-raising process, we will look ahead to our ultimate mission objective: providing end-to-end network connectivity,” Amazon said in a press release. “This involves sending data from the Internet, through our ground infrastructure, up to the satellites, and down to customer terminal antennas, and then repeating the journey in the other direction.”

A moveable deadline

While most of the rockets Amazon will use for the Kuiper network have only recently entered service, that’s not true of the Atlas V. Delays in spacecraft manufacturing at Amazon’s factory near Seattle kept the first Kuiper satellites on the ground until now.

An Amazon spokesperson told Ars that the company is already shipping Kuiper satellites for the next launch on an Atlas V rocket. Sources suggest that mission could lift off in June.

Amazon released this image of Kuiper user terminals in 2023. Credit: Amazon

Amazon and its launch suppliers need to get moving. Kuiper officials face a July 2026 deadline from the Federal Communications Commission to deploy half of the fleet’s 3,236 satellites to maintain network authorization. This is not going to happen. It would require an average of nearly one launch per week, starting now.

The time limit is movable, and the FCC has extended network authorization deadlines before. Brendan Carr, the Trump-appointed chairman of the FCC, has argued for a more “market-friendly regulatory environment” in a chapter he authored for the Heritage Foundation’s Project 2025, widely seen as a blueprint for the Trump administration’s strategies.

But Carr is a close ally of Elon Musk, owner of Kuiper’s primary competitor, Starlink.

Amazon is not selling subscriptions for Kuiper service yet, and the company has said its initial focus will be on testing Kuiper connectivity with “enterprise customers” before moving on to consumer broadband. Apart from challenging Starlink, Kuiper will also compete in some market segments with Eutelsat OneWeb, the London-based operator of the only other active Internet megaconstellation.

OneWeb’s more than 600 satellites provide service to businesses, governments, schools, and hospitals rather than direct service to individual consumers.

A rocket launch Monday night may finally jump-start Amazon’s answer to Starlink Read More »

ai-generated-code-could-be-a-disaster-for-the-software-supply-chain-here’s-why.

AI-generated code could be a disaster for the software supply chain. Here’s why.

AI-generated computer code is rife with references to non-existent third-party libraries, creating a golden opportunity for supply-chain attacks that poison legitimate programs with malicious packages that can steal data, plant backdoors, and carry out other nefarious actions, newly published research shows.

The study, which used 16 of the most widely used large language models to generate 576,000 code samples, found that 440,000 of the package dependencies they contained were “hallucinated,” meaning they were non-existent. Open source models hallucinated the most, with 21 percent of the dependencies linking to non-existent libraries. A dependency is an essential code component that a separate piece of code requires to work properly. Dependencies save developers the hassle of rewriting code and are an essential part of the modern software supply chain.

Package hallucination flashbacks

These non-existent dependencies represent a threat to the software supply chain by exacerbating so-called dependency confusion attacks. These attacks work by causing a software package to access the wrong component dependency, for instance by publishing a malicious package and giving it the same name as the legitimate one but with a later version stamp. Software that depends on the package will, in some cases, choose the malicious version rather than the legitimate one because the former appears to be more recent.

Also known as package confusion, this form of attack was first demonstrated in 2021 in a proof-of-concept exploit that executed counterfeit code on networks belonging to some of the biggest companies on the planet, Apple, Microsoft, and Tesla included. It’s one type of technique used in software supply-chain attacks, which aim to poison software at its very source in an attempt to infect all users downstream.

“Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users,” Joseph Spracklen, a University of Texas at San Antonio Ph.D. student and lead researcher, told Ars via email. “If a user trusts the LLM’s output and installs the package without carefully verifying it, the attacker’s payload, hidden in the malicious package, would be executed on the user’s system.”

AI-generated code could be a disaster for the software supply chain. Here’s why. Read More »