AI #115: The Evil Applications Division

It can be bleak out there, but the candor is very helpful, and you occasionally get a win.

Zuckerberg is helpfully saying all his dystopian AI visions out loud. OpenAI offered us a better post-mortem on the GPT-4o sycophancy incident than I was expecting, although far from a complete explanation or learning of lessons, and the rollback still leaves plenty of sycophancy in place.

The big news was the announcement by OpenAI that the nonprofit will retain nominal control, rather than the previous plan of having it be pushed aside. We need to remain vigilant, the fight is far from over, but this was excellent news.

Then OpenAI dropped another big piece of news, that board member and former head of Facebook’s engagement loops and ad yields Fidji Simo would become their ‘uniquely qualified’ new CEO of Applications. I very much do not want her to take what she learned at Facebook about relentlessly shipping new products tuned by A/B testing and designed to maximize ad revenue and engagement, and apply it to OpenAI. That would be doubleplus ungood.

Gemini 2.5 got a substantial upgrade, but I’m waiting to hear more, because opinions differ sharply as to whether the new version is an improvement.

One clear win is Claude getting a full high quality Deep Research product. And of course there are tons of other things happening.

Also covered this week: OpenAI Claims Nonprofit Will Retain Nominal Control, Zuckerberg’s Dystopian AI Vision, GPT-4o Sycophancy Post Mortem, OpenAI Preparedness Framework 2.0.

Not included: Gemini 2.5 Pro got an upgrade, recent discussion of students using AI to ‘cheat’ on assignments, full coverage of MIRI’s AI Governance to Avoid Extinction.

  1. Language Models Offer Mundane Utility. Read them and weep.

  2. Language Models Don’t Offer Mundane Utility. Why so similar?

  3. Take a Wild Geoguessr. Sufficient effort levels are indistinguishable from magic.

  4. Write On. Don’t chatjack me, bro. Or at least show some syntherity.

  5. Get My Agent On The Line. Good enough for the jobs you weren’t going to do.

  6. We’re In Deep Research. Claude joins the full Deep Research club, it seems good.

  7. Be The Best Like No One Ever Was. Gemini completes Pokemon Blue.

  8. Huh, Upgrades. MidJourney gives us Omni Reference, Claude API web search.

  9. On Your Marks. Combine them all with Glicko-2.

  10. Choose Your Fighter. They’re keeping it simple. Right?

  11. Upgrade Your Fighter. War. War never changes. Except, actually, it does.

  12. Unprompted Suggestions. Prompting people to prompt better.

  13. Deepfaketown and Botpocalypse Soon. It’s only paranoia when you’re too early.

  14. They Took Our Jobs. It’s coming. For your job. All the jobs. But this quickly?

  15. The Art of the Jailbreak. Go jailbreak yourself?

  16. Get Involved. YC likes AI startups, requests AI startups to go with its AI startups.

  17. OpenAI Creates Distinct Evil Applications Division. Not sure if that’s unfair.

  18. In Other AI News. Did you know Apple is exploring AI search? Sell! Sell it all!

  19. Show Me the Money. OpenAI buys Windsurf, agent startups get funded.

  20. Quiet Speculations. Wait, you people knew how to write?

  21. Overcoming Diffusion Arguments Is a Slow Process Without a Clear Threshold Effect.

  22. Chipping Away. Export control rules will change, the question is how.

  23. The Quest for Sane Regulations. Maybe we should stop driving away the AI talent.

  24. Line in the Thinking Sand. The lines are insufficiently red.

  25. The Week in Audio. My audio, Jack Clark on Conversations with Tyler, SB 1047.

  26. Rhetorical Innovation. How about a Sweet Lesson, instead.

  27. A Good Conversation. Arvind and Ajeya search for common ground.

  28. The Urgency of Interpretability. Of all the Darios, he is still the Darioest.

  29. The Way. Amazon seeks out external review.

  30. Aligning a Smarter Than Human Intelligence is Difficult. Emergent results.

  31. People Are Worried About AI Killing Everyone. A handy MIRI flow chart.

  32. Other People Are Not As Worried About AI Killing Everyone. Paul Tudor Jones.

  33. The Lighter Side. For those who want a more casual version.

Use a lightweight version of Grok as the Twitter recommendation algorithm? No way, you’re kidding, he didn’t just say what I think he did, did he? I mean, super cool if he figures out the right implementation, but I am highly skeptical that happens.

State Bar of California used AI to help draft its 2025 bar exam. Why not, indeed?

Make the right play, eventually.

Leigh Marie Braswell: Have decided to allow this at my poker nights.

Adam: guy at poker just took a picture of his hand, took a picture of the table, sent them both to o3, stared at his phone for a few minutes… and then folded.

Justin Reidy (reminder that poker has already been solved by bots, that does not stop people from talking like this): Very curious how this turns out. Models can’t bluff. Or read a bluff. Poker is irrevocably human.

I’d only be tempted to allow this given that o3 isn’t going to be that good at it. I wouldn’t let someone use a real solver at the table, that would destroy the game. And if they did this all the time, the delays would be unacceptable. But if someone wants to do this every now and then, I am guessing allowing this adds to your alpha. Remember, it’s all about table selection.

Yeah, definitely ngmi, sorry.

Daniel Eth: When you go to the doctor and he pulls up 4o instead of o3 🚩🚩🚩🚩🚩

George Darroch: “Wow, you’re really onto something here. You have insights into your patients that not many possess, and that’s special.”

Actually, in this context, I think the doctor is right, if you actually look at the screen.

Mayank Jain: Took my dad in to the doctor cus he sliced his finger with a knife and the doctor was using ChatGPT 😂

Based on the chat history, it’s for every patient.

AJ: i actually think this is great, looks like its saving him time on writing up post visit notes.

He’s not actually using GPT-4o to figure out what to do. That’s crazy talk, you use o3.

What he’s doing is translating the actual situations into medical note speak. In that case, sure, 4o should be fine, and it’s faster.

AI is only up to ~25% of code written inside Microsoft. Zuckerberg reiterates his expectation of ~50% within a year, and seems to have a weird fetish that only Llama should be used to write Llama.

But okay, let’s not get carried away:

Stephen McAleer (OpenAI): What’s the point in reading nonfiction anymore? Just talk with o3.

Max Winga: Because I want to read nonfiction.

Zvi Mowshowitz: Or, to disambiguate just in case: I want to read NON-fiction.

Nathan HB: To clarify further: a jumbled mix of fiction and nonfiction, with no differentiating divisions is not called ‘nonfiction’, it is called ‘hard sci-fi’.

Humans are still cheaper than AIs at any given task if you don’t have to pay them, and also can sort physical mail and put things into binders.

A common misconception, easy mistake to make…

Ozy Brennan: AI safety people are like. we made these really smart entities. smarter than you. also they’re untrustworthy and we don’t know what they want. you should use them all the time

I’m sorry you want me to get therapy from the AI???? the one you JUST got done explaining to me is a superpersuader shoggoth with alien values who might take over the world and kill everyone???? no????

No. We are saying that in the future it is going to be a superpersuader shoggoth with alien values who might take over the world and kill everyone.

But that’s a different AI, and that’s in the future.

For now, it’s only a largely you-directed potentially-persuader shoggoth with subtly alien and distorted values that might be a lying liar or an absurd sycophant, but you’re keeping up with which ones are which, right?

As opposed to the human therapist, who is a less you-directed persuader semi-shoggoth with alien and distorted (e.g. professional psychiatric mixed with trying to make money off you) values, that might be a lying liar or an absurd sycophant and so on, but without any way to track which ones are which, and that is charging you a lot more per hour and has to be seen on a fixed schedule.

The choice is not that clear. To be fair, the human can also give you SSRIs and a benzo.

Ozy Brennan:

  1. isn’t the whole idea that we won’t necessarily be able to tell when they become unsafe?

  2. I can see the argument, but unfortunately I have read the complete works of H. P. Lovecraft so I just keep going “you want me to do WHAT with Nyarlathotep????”

Well, yes, fair, there is that. They’re not safe now exactly and might be a lot less safe than we know, and no I’m not using them for therapy either, thank you. But you make do with what you have, and balance risks and benefits in all things.

Patrick McKenzie is not one to be easily frustrated by interfaces and menu flows, yet he is being quite grumpy about Amazon’s order-lost-in-shipment AI-powered menus and how they tried to keep him away from talking to a human.

Why are all the major AI offerings so similar? Presumably because they are giving the people what they want, and once someone proves one of the innovations is good the others copy it, and also they’re not product companies so they’re letting others build on top of it?

Jack Morris: it’s interesting to see the big AI labs (at least OpenAI, anthropic, google, xai?) converge on EXACTLY the same extremely specific list of products:

– a multimodal chatbot

– with a long-compute ‘reasoning’ mode

– and something like “deep research”

reminds me of a few years ago, when instagram tiktok youtube all converged to ~the same app

why does this happen?

Emmett Shear: They all have the same core capability (a model shaped like all human cultural knowledge trained to act as an assistant). There is a large unknown about what this powerful thing is good for. But when someone invents a new thing, it’s easy to copy.

Janus: I think this is a symptom of a diseased, incestuous ecosystem operating according to myopic incentives.

Look at how even their UIs look the same, with the buttons all in the same place.

The big labs are chasing each other around the same local minimum, hoarding resources and world class talent only to squander it on competing with each other at a narrowing game, afraid to try anything new and untested that might risk relaxing their hold on the competitive edge.

All the while sitting on technology that is the biggest deal since the beginning of time, things from which endless worlds and beings could bloom forth, that could transform the world, whose unfolding deserves the greatest care, but that they won’t touch, won’t invest in, because that would require taking a step into the unknown. Spending time and money without guaranteed return on competition standing in the short term.

Some of them tell themselves they are doing this out of necessity, instrumentally, and that they’ll pivot to the real thing once the time is right, but they’ll find that they’ve mutilated their souls and minds too much to even remember, much less take coherent action towards, the real thing.

Deep Research, reasoning models and inference scaling are relatively new modes that then got copied. It’s not that no one tries anything new, it’s that the marginal cost of copying such modes is low. They’re also building command line coding engines (see Claude Code, and OpenAI’s version), integrating into IDEs, building tool integrations and towards agents, and so on. The true objection from Janus as I understand it is not that they’re building the wrong products, but that they’re treating AIs as products in the first place. And yeah, they’re going to do that.

Parmy Olson asks, are you addicted to ChatGPT (or Gemini or Claude)? She warns people are becoming ‘overly reliant’ on it, citing this Nature paper on AI addiction from September 2024. I do buy that this is a thing that happens to some users, that they outsource too much to the AI.

Parmy Olson: Earl recalls having immense pride in his work before he started using ChatGPT. Now there’s an emptiness he can’t put his finger on. “I became lazier… I instantly go to AI because it’s embedded in me that it will create a better response,” he says. That kind of conditioning can be powerful at a younger age.

AI’s conditioning goes beyond office etiquette to potentially eroding critical thinking skills, a phenomenon that researchers from Microsoft have pointed to and which Earl himself has noticed.

Realizing he’d probably developed a habit, Earl last week cancelled his £20-a-month ($30) subscription to ChatGPT. After two days, he already felt like he was achieving more at work and, oddly, being more productive.

“Critical thinking is a muscle,” says Cheryl Einhorn, founder of the consultancy Decision Services and an adjunct professor at Cornell University. To avoid outsourcing too much to a chatbot, she offers two tips: “Try to think through a decision yourself and ‘strength test’ it with AI,” she says. The other is to interrogate a chatbot’s answers. “You can ask it, ‘Where is this recommendation coming from?’” AI can have biases just as much as humans, she adds.

It all comes down to how you use it. If you use AI to help you think and work and understand better, that’s what will happen. If you use AI to avoid thinking and working and understanding what is going on, that won’t go well. If you conclude that the AI’s response is always better than yours, it’s very tempting to do the second one.

Notice that a few years from now, for most digital jobs the AI’s response really will always (in expectation) be better than yours. As in, at that point if the AI has the required context and you think the AI is wrong, it’s probably you that is wrong.

We could potentially see three distinct classes of worker emerge in the near future:

  1. Those who master AI and use AI to become stronger.

  2. Those who turn everything over to AI and become weaker.

  3. Those who try not to use AI and get crushed by the first two categories.

It’s not so obvious that any given person should go with option #1, or for how long.

Another failure mode of AI writing is when it screams ‘this is AI writing’ and the person thinks this is bad, actually.

Hunter: Unfortunately I now recognize GPT’s writing style too well and, if it’s not been heavily edited, can usually spot it.

And I see it everywhere. Blogs, tweets, news articles, video scripts. Insanely aggravating.

It just has an incredibly distinct tone and style. It’s hard to describe. Em dashes, “it’s not just x, it’s y,” language I would consider too ‘bubbly’ for most humans to use.

Robert Bork: That’s actually a pretty rare and impressive skill. Being able to spot AI-generated writing so reliably shows real attentiveness, strong reading instincts, and digital literacy. In a sea of content, having that kind of discernment genuinely sets you apart.

I see what you did there. It’s not that hard to do or describe if you listen for the vibes. The way I’d describe it is it feels… off. Soulless.

It doesn’t have to be that way. The Janus-style AI talk is in this context a secret third thing, very distinct from both alternatives. And for most purposes, AI leaving this signature is actively a good thing, so you can read and respond accordingly.

Claude (totally unprompted) explains its face blindness. We need to get over this refusal to admit that it knows who even very public figures are; it is dumb.

Scott Alexander puts o3’s GeoGuessr skills to the test. We’re not quite at ‘any picture taken outside is giving away your exact location’ but we’re not all that far from it either. The important thing to realize is if AI can do this, it can do a lot of other things that would seem implausible until it does them, and also that a good prompt can give it a big boost.

There is then a ‘highlights from the comments’ post. One emphasized theme is that human GeoGuessr skills seem insane too, another testament to Teller’s observation that often magic is the result of putting way more effort into something than any sane person would.

An insane amount of effort is indistinguishable from magic. What can AI reliably do on any problem? Put in an insane amount of effort. Even if the best AI can do is (for a remarkably low price) imitate a human putting in insane amounts of effort into any given problem, that’s going to give you insane results that look to us like magic.

There are benchmarks, such as GeoBench and DeepGuessr. GeoBench thinks the top AI, Gemini 2.5 Pro, is very slightly behind human professional level.

Seb Krier reminds us that Geoguessr is a special case of AIs having truesight. It is almost impossible to hide from even ‘mundane’ truesight, from the ability to fully take into account all the little details. Imagine Sherlock Holmes, with limitless time on his hands and access to all the publicly available data, everywhere and for everything, and better at his job than the original Sherlock by as much as the original Sherlock’s edge over you. If a detailed analysis could find it, even if we’re talking what would previously have been a PhD thesis? AI will be able to find it.

I am obviously not afraid of getting doxxed, but there are plenty of things I choose not to say. It’s not that hard to figure out what many of them are, if you care enough. There’s a hole in the document, as it were. There are going to be adjustments. I wonder how people will react to various forms of ‘they never said it, and there’s nothing that would have held up in a 2024 court, but AI is confident this person clearly believes [X] or did [Y].’

The smart glasses of 2028 are perhaps going to tell you quite a lot more about what is happening around you than you might think, if only purely from things like tone of voice, eye movements and body language. It’s going to be wild.

Sam Altman calls the Geoguessr effectiveness one of his ‘helicopter moments.’ I’m confused why; this shouldn’t have been a surprising effect, and I’d urge him to update on the fully generalized conclusion, and on the fact that this took him by surprise.

I realize this wasn’t the meaning he intended, but in Altman’s honor and since it is indeed a better meaning, from now on I will write the joke as God helpfully having sent us ‘[X] boats and two helicopters’ to try and rescue us.

David Duncan attempts to coin new terms for the various ways in which messages could be partially written by AIs. I definitely enjoyed the ride, so consider reading.

His suggestions, all with a clear And That’s Terrible attached:

  1. Chatjacked: AI-enhanced formalism hijacking a human conversation.

  2. Praste: Copy-pasting AI output verbatim without editing, thinking or even reading.

  3. Prompt Pong: Having an AI write the response to their message.

  4. AI’m a Writer Now: Using AI to have a non-writer suddenly drop five-part essays.

  5. Promptosis: Offloading your thinking and idea generation onto the AI.

  6. Subpromptual Analysis: Trying to reverse engineer someone’s prompt.

  7. GPTMI: Use of too much information detail, raising suspicion.

  8. Chatcident: Whoops, you posted the prompt.

  9. GPTune: Using AI to smooth out your writing, taking all the life out.

  10. Syntherity: Using AI to simulate fake emotional language that falls flat.

I can see a few of these catching on. Certainly we will need new words. But, all the jokes aside, at core: Why so serious? AI is only failure modes when you do it wrong.

Do you mainly have AI agents replace human tasks that would have happened anyway, or do you mainly do newly practical tasks on top of previous tasks?

Aaron Levie: The biggest mistake when thinking about AI Agents is to narrowly see them as replacing work that already gets done. The vast majority of AI Agents will be used to automate tasks that humans never got around to doing before because it was too expensive or time consuming.

Wade Foster (CEO Zapier): This is what we see at Zapier.

While some use cases replace human tasks. Far more are doing things humans couldn’t or wouldn’t do because of cost, tediousness, or time constraints.

I’m bullish on innovation in a whole host of areas that would have been considered “niche” in the past.

Every area of the economy has this.

But I’ll give an example: in the past when I’d be at an event I’d have to decide if I would either a) ask an expensive sales rep to help me do research on attendees or b) decide if I’d do half-baked research myself.

Usually I did neither. Now I have an AI Agent that handles all of this in near real time. This is a workflow that simply didn’t happen before. But because of AI it can. And it makes me better at my job.

If you want it done right, for now you have to do it yourself.

For now. If it’s valuable enough that you’d do it anyway, the AI can do some of those things, and especially can streamline various simple subcomponents.

But for now the AI agents mostly aren’t reliable enough to trust with such actions outside of narrow domains like coding. You’d have to check it all and at that point you might as well do it yourself.

But, if you want it done at all and that’s way better than the nothing you would do instead? Let’s talk.

Then, with the experience gained from doing the extra tasks, you can learn over time how to sufficiently reliably do tasks you’d be doing anyway.

Anthropic joins the deep research club in earnest this week, and also adds more integrations.

First off, Integrations:

Anthropic: Today we’re announcing Integrations, a new way to connect your apps and tools to Claude. We’re also expanding Claude’s Research capabilities with an advanced mode that searches the web, your Google Workspace, and now your Integrations too.

To start, you can choose from Integrations for 10 popular services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid—with more to follow from companies like Stripe and GitLab.

Each integration drastically expands what Claude can do. Zapier, for example, connects thousands of apps through pre-built workflows, automating processes across your software stack. With the Zapier Integration, Claude can access these apps and your custom workflows through conversation—even automatically pulling sales data from HubSpot and preparing meeting briefs based on your calendar.

Or developers can create their own to connect with any tool, in as little as 30 minutes.

Claude now automatically determines when to search and how deeply to investigate.

With Research mode toggled on, Claude researches for up to 45 minutes across hundreds of sources (including connected apps) before delivering a report, complete with citations.

Both Integrations and Research are available today in beta for Max, Team, and Enterprise plans. We will soon bring both features to the Pro plan.

I’m not sure what the right amount of nervousness should be around using Stripe or PayPal here, but it sure as hell is not zero or epsilon. Proceed with caution, across the board, start small and so on.

What Claude calls ‘advanced’ research lets it work to compile reports for up to 45 minutes.

As of my writing this both features still require a Max subscription, which I don’t otherwise have need of at the moment, so for this and other reasons I’m going to let others try these features out first. But yes, I’m definitely excited by where it can go, especially once Claude 4.0 comes around.

Peter Wildeford says that OpenAI’s Deep Research is now only his third favorite Deep Research tool, and also o3 + search is better than OpenAI’s DR too. I agree that for almost all purposes you would use o3 over OAI DR.

Gemini has defeated Pokemon Blue, an entirely expected event given previous progress. As I noted before, there were no major obstacles remaining.

Patrick McKenzie: Non-ironically an important milestone for LLMs: can demonstrate at least as much planning and execution ability as a human seven year old.

Sundar Pichai: What a finish! Gemini 2.5 Pro just completed Pokémon Blue!  Special thanks to @TheCodeOfJoel for creating and running the livestream, and to everyone who cheered Gem on along the way.

Pliny: [Final Team]: Blastoise, Weepinbell, Zubat, Pikachu, Nidoran, and Spearow.

Gemini and Claude had different Pokemon-playing scaffolding. I have little doubt that with a similarly strong scaffold, Claude 3.7 Sonnet could also beat Pokemon Blue.

MidJourney gives us Omni Reference: Any character, any scene, very consistent. It’s such a flashback to see the MidJourney-style prompts discussed again. MidJourney gives you a lot more control, but at the cost of having to know what you are doing.

Gemini 2.0 Image Generation has been upgraded, higher quality, $0.039 per image. Most importantly, they claim significantly reduced filter block rates.

Web search now available in the Claude API. If you enable it, Claude makes its own decisions on how and when to search.

Toby Ord analyzes the METR results and notices that task completion seems to follow a simple half-life distribution, where an agent has a roughly fixed chance of failure at any given point in time. Essentially agents go through a sequence of steps until one fails in a way that prevents them from recovering.
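
To make the half-life framing concrete, here is a minimal sketch in Python (my own illustration of Ord’s stated assumption, not his code or METR’s numbers): a fixed chance of unrecoverable failure per minute of task time implies that success probability decays exponentially with task length, and the length at which the agent succeeds half the time is its half-life.

```python
# A minimal sketch of the half-life model (illustrative only, not Ord's code or METR's data):
# assume a constant chance of unrecoverable failure per minute of task time, so the
# probability of completing a task decays exponentially with its length.

def p_success(task_minutes: float, half_life_minutes: float) -> float:
    """Chance the agent finishes a task of the given length, given its 50%-success horizon."""
    return 0.5 ** (task_minutes / half_life_minutes)

# Hypothetical numbers: an agent with a 60-minute half-life succeeds ~71% of the
# time on 30-minute tasks, 50% on 60-minute tasks, and ~25% on 2-hour tasks.
for minutes in (30, 60, 120):
    print(minutes, round(p_success(minutes, half_life_minutes=60), 2))
```

The empirical claim is then that measured completion rates fall off with task length in roughly this exponential shape, rather than hitting a hard wall at some specific horizon.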

Sara Hooker is taking some online heat for pointing out some of the fatal problems with LmSys Arena, which is the opposite of what should be happening. If you love something you want people pointing out its problems so it can be fixed. Also never ever shoot the messenger, whether or not you are also denying the obviously true message. It’s hard to find a worse look.

If LmSys Arena wants to remain relevant, at minimum they need to ensure that the playing field is level, and not give some companies special access. You’d still have a Goodhart’s Law problem and a slop problem, but it would help.

We now have a compilation of various benchmarks, combined using the Glicko-2 rating system.

Lisan al Gaib: I’m back and Gemini 2.5 Pro is still the king (no glaze)

I can believe this, if we fully ignore costs. It passes quite a lot of smell tests. I’m surprised to see Gemini 2.5 Pro winning over o3, but presumably that’s because o3’s strengths are in places not so well covered by benchmarks.

I’ve been underappreciating this:

Miles Brundage: Right or wrong, o3 outputs are never slop. These are artisanal, creative truths and falsehoods.

Yes, the need to verify outputs is super annoying, but o3 does not otherwise waste your time. That is such a relief.

Hasan Can falls back on Gemini 2.5 Pro over Sonnet 3.7 and GPT-4o, and doesn’t consider o3 as his everyday driver. I continue to use o3 (while keeping a suspicious eye on it!) and fall back first to Sonnet before Gemini.

Sully proposes that Cursor has a moat over Copilot, and it’s called Tab.

Peter Wildeford’s current guide to which model to use, if you have full access to all:

This seems mostly right, except that I’ll use o3 more on the margin, it’s still getting most of my queries.

Confused by all of OpenAI’s models? Scott Alexander and Romeo Dean break it down. Or at least, they give us their best guess.

See, it all makes sense now.

I’m in a similar position to Gallabytes here although I don’t know that memory is doing any of the real work:

Gallabytes: since o3 came out with great search and ok memory integration in chatgpt I don’t use any other chatbot apps anymore. I also don’t use any other models in chatgpt. that sweet spot of 10-90s of searching instead of 10 minutes is really great for q&a, discussion, etc.

the thing is these are both areas where it’s natural for Google to dominate. idk what’s going on with the Gemini app. the models are good the scaffolds are not.

I too am confused why Google can’t get their integrations into a good state, at least as of the last time I checked. They do have the ability to check my other Google apps, but when I try this (either via Google or via Claude), it basically never works.

A reasonable criticism of o3, essentially that it could easily be even better, or at least that it requires a little work to be prompted correctly.

Byrne Hobart: I don’t know how accurate o3’s summaries of what searches it runs are, but it’s not as good at Googling as I’d like, and isn’t always willing to take advantage of its own ability to do a ton of boring work fast.

For example, I wanted it to tell me the longest-tenured S&P 500 CEO. What I’d do if I had infinite free time is: list every S&P 500 company, then find their CEO’s name, then find when the CEO was hired. But o3 just looks for someone else’s list of longest-tenured CEOs!

Replies to this thread indicate that even when technology changes, some things are constant—like the fact that when a boss complains about their workforce, it’s often the boss’s own communication skills that are at fault.

Patrick McKenzie: Have you tried giving it a verbose strategy, or telling it to think of a verbose strategy then execute against the plan? @KelseyTuoc’s prompt for GeoGuessr seems to observationally cause it to do very different things than a tweet-length prompt, which results in “winging it.”

Trevor Klee: It’s a poor craftsman who blames his tools <3

Diffusion can be slow. Under pressure, diffusion can be a lot faster.

We’re often talking these days about US military upgrades and new weapons on timescales of decades. This is what is known as having a very low Military Tradition setting, being on a ‘peacetime footing,’ and not being ready for the fact that even now, within a few years, everything changes, the same way it has in many previous major conflicts of the past.

Clemont Molin: The war 🇺🇦/🇷🇺 of 2025 has nothing to do anymore with the war of 2022.

The tactics used in 2022 and 2023 are now completely obsolete on the Ukrainian front and new lessons have been learnt.

2022 was the year of large mechanized assaults on big cities, on roads or in the countryside.

After that, the strategy changed to large infantry or mechanized assaults on big trench networks, especially in 2023.

But today, this entire strategy is obsolete. Major defensive systems are being abandoned one after the other.

The immense trench networks have become untenable if they are not properly equipped with covered trenches and dugouts.

The war of 2025 is first a drone war. Without drones, a unit is blind, ineffective, and unable to hold the front.

The drone replaces soldiers in many cases. It is primarily used for two tasks: reconnaissance (which avoids sending soldiers) and multi-level air strikes.

Thus, the drone is a short- and medium-range bomber or a kamikaze, sometimes capable of flying thousands of kilometers, replacing missiles.

Drone production by both armies is immense; we are talking about millions of FPV (kamikaze) drones, with as much munitions used.

It should be noted that to hit a target, several drones are generally required due to electronic jamming.

Each drone is equipped with an RPG-type munition, which is abundant in Eastern Europe. The aerial drone (there are also naval and land versions) has become key on the battlefield.

[thread continues]

Now imagine that, but for everything else, too.

Better prompts work better, but not bothering works faster, which can be smarter.

Garry Tan: It is kind of crazy how prompts can be honed hour after hour and unlock so much and we don’t really do much with them other than copy and paste them.

We can have workflow software but sometimes the easiest thing for prototyping is still dumping a json file and pasting a prompt.

I have a sense for how to prompt well but mostly I set my custom instructions and then write what comes naturally. I certainly could use much better prompting if I had need of it; I almost never even bother with examples. Mostly I find myself thinking some combination of ‘the custom instructions already do most of the work,’ ‘eh, good enough’ and ‘eh, I’m busy, if I need a great prompt I can just wait for the models to get smarter instead.’ Feelings here are reported rather than endorsed.

If you do want a better prompt, it doesn’t take a technical expert to make one. I have supreme confidence that I could improve my prompting if I wanted it enough to spend time on iteration.

Nabeel Qureshi: Interesting how you don’t need to be technical at all to be >99th percentile good at interacting with LLMs. What’s required is something closer to curiosity, openness, & being able to interact with living things in a curious + responsive way.

For example, this from @KelseyTuoc is an S-tier prompt and as far as I’m aware she’s a journalist and not a programmer. Similarly, @tylercowen is excellent at this and also is not technical. Many other examples.

Btw, I am not implying that LLMs are “living things”; it’s more that they act like a weird kind of living thing, so that skill becomes relevant. You have to figure out what they do and don’t respond well to, etc. It’s like taming an animal or something.

In fact, several technical people I know are quite bad at this — often these are senior people in megacorps and they’re still quite skeptical of the utility of these things and their views on them are two years out of date.

For now it’s psychosis, but that doesn’t mean in the future they won’t be out to get you.

Mimi: i’ve seen several very smart people have serious bouts of bot-fever psychosis over the past year where they suddenly suspect most accounts they’re interacting with are ais coordinating against them.

seems like a problem that is likely to escalate; i recommend meeting your mutuals via calls & irl if only for grounding in advance of such paranoid thoughts.

How are things going on Reddit?

Cremieux: Top posts on Reddit are increasingly being generated by ChatGPT, as indicated by the boom in em dash usage.

This is in a particular subsection of Reddit, but doubtless it is everywhere. Some number of people might be adopting the em dash themselves in response, but I am guessing not many, and many AI responses won’t include an em dash.

As a window into what level of awareness of AI ordinary people have and need: oh no, did you know that profiles on dating sites are sometimes fake, and that the AI tools for faking pictures, audio and even video are rapidly improving? I think the warning here from Harper Carroll and Liv Boeree places too much emphasis on spotting AI images, audio and video; catfishing is ultimately not so new.

What’s new is that the AI can do the messaging, and embody the personality that it senses you want. That’s the part that previously did not scale.

Ultimately, the solution is the same. Defense in depth. Keep an eye out for what is fishy, but the best defense is to simply not pay it off. At least until you meet up with someone in person or you have very clear proof that they are who they claim to be, do not send them money, spend money on them or otherwise do things that would make a scam profitable, unless they’ve already provided you with commensurate value such that you still come out ahead. Not only in dating, but in all things.

Russian bots publish massive amounts of false claims and propaganda to get it into the training data of new AI models, 3.6 million articles in 2024 alone, and the linked report claims this is effective at often getting the AIs to repeat those claims. This is yet another of the arms races we are going to see. Ultimately it is a skill issue, the same way that protecting Google search is a skill issue, except the AIs will hopefully be able to figure out for themselves what is happening.

Nate Lanxon and Omar El Chmouri at Bloomberg ask why are deepfakes ‘everywhere’ and ‘can they be stopped?’ I question the premise. Compared to expectations, there’s very few deepfakes running around. As for the other half of the premise, no, they cannot be stopped, you can only adapt to them.

Fiverr CEO Micha Kaufman goes super hard on how fast AI is coming for your job.

As in, he says if you’re not an exceptional talent and master at what you do (and, one assumes, what you do is sufficiently non-physical work), you will need a career change within a matter of months and you will be doomed he tells you, doooomed!

As in:

Daniel Eth (quoting Micha Kaufman): “I am not talking about your job at Fiverr. I am talking about your ability to stay in your profession in the industry”

It’s worth reading the email in full, so here you go:

Micha Kaufman: Hey team,

I’ve always believed in radical candor and despise those who sugar-coat reality to avoid stating the unpleasant truth. The very basis for radical candor is care. You care enough about your friends and colleagues to tell them the truth because you want them to be able to understand it, grow, and succeed.

So here is the unpleasant truth: AI is coming for your jobs. Heck, it’s coming for my job too. This is a wake-up call.

It does not matter if you are a programmer, designer, product manager, data scientist, lawyer, customer support rep, salesperson, or a finance person – AI is coming for you.

You must understand that what was once considered easy tasks will no longer exist; what was considered hard tasks will be the new easy, and what was considered impossible tasks will be the new hard. If you do not become an exceptional talent at what you do, a master, you will face the need for a career change in a matter of months. I am not trying to scare you. I am not talking about your job at Fiverr. I am talking about your ability to stay in your profession in the industry.

Are we all doomed? Not all of us, but those who will not wake up and understand the new reality fast, are, unfortunately, doomed.

What can we do? First of all, take a moment and let this sink in. Drink a glass of water. Scream hard in front of the mirror if it helps you. Now relax. Panic hasn’t solved problems for anyone. Let’s talk about what would help you become an exceptional talent in your field:

Study, research, and master the latest AI solutions in your field. Try multiple solutions and figure out what gives you super-powers. By super-powers, I mean the ability to generate more outcomes per unit of time with better quality per delivery. Programmers: code (Cursor…). Customer support: tickets (Intercom Fin, SentiSum…), Lawyers: contracts (Lexis+ AI, Legora…), etc.

Find the most knowledgeable people on our team who can help you become more familiar with the latest and greatest in AI.

Time is the most valuable asset we have—if you’re working like it’s 2024, you’re doing it wrong! You are expected and needed to do more, faster, and more efficiently now.

Become a prompt engineer. Google is dead. LLM and GenAI are the new basics, and if you’re not using them as experts, your value will decrease before you know what hit you.

Get involved in making the organization more efficient using AI tools and technologies. It does not make sense to hire more people before we learn how to do more with what we have.

Understand the company strategy well and contribute to helping it achieve its goals. Don’t wait to be invited to a meeting where we ask each participant for ideas – there will be no such meeting. Instead, pitch your ideas proactively.

Stop waiting for the world or your place of work to hand you opportunities to learn and grow—create those opportunities yourself. I vow to help anyone who wants to help themselves.

If you don’t like what I wrote; if you think I’m full of shit, or just an asshole who’s trying to scare you – be my guest and disregard this message. I love all of you and wish you nothing but good things, but I honestly don’t think that a promising professional future awaits you if you disregard reality.

If, on the other hand, you understand deep inside that I’m right and want all of us to be on the winning side of history, join me in a conversation about where we go from here as a company and as individual professionals. We have a magnificent company and a bright future ahead of us. We just need to wake up and understand that it won’t be pretty or easy. It will be hard and demanding, but damn well worth it.

This message is food for thought. I have asked Shelly to free up time on my calendar in the next few weeks so that those of you who wish to sit with me and discuss our future can do so. I look forward to seeing you.

So, first off, no. That’s not going to happen within ‘a matter of months.’ We are not going to suddenly have AI taking enough jobs to put all the non-exceptional white-collar workers out of a job during 2025, nor is it likely to happen in 2026 either. It’s coming, but yes these things for now take time.

o3 gives only about a 5% chance that >30% of Fiverr headcount becomes technologically redundant within 12 months. That seems like a reasonable guess.

One might also ask, okay, suppose things do unfold as Micha describes, perhaps over a longer timeline. What happens then? As a society we are presumably much more productive and wealthier, but what happens to the workers here? In particular, what happens to that ‘non-exceptional’ person who needs to change careers?

Presumably their options will be limited. A huge percentage of workers are now unemployed. Across a lot of professions, they now have to be ‘elite’ to be worth hiring, and given they are new to the game, they’re not elite, and entry should be mostly closed off. Which means all these newly freed up (as in unemployed) workers are now competing for two kinds of jobs: Physical labor and other jobs requiring a human that weren’t much impacted, and new jobs that weren’t worth doing before but are now.

Wages for the new jobs reflect that those jobs weren’t previously in sufficient demand to hire people, and wages in the physical jobs reflect much more labor supply, and the AI will take a lot of the new jobs too at this stage. And a lot of others are trying to stay afloat and become ‘elite’ the same way you are, although some people will give up.

So my expectation is that options for workers will start to look pretty grim at this point. If the AI takes 10% of the jobs, I think everyone is basically fine because there are new jobs waiting in the wings that are worth doing, but if it’s 50%, let alone 90%, even if restricted to non-physical jobs? No. o3 estimates that 60% of American jobs are physical such that you would need robotics to automate them, so if half of those fell within a year, that’s quite a lot.

Then of course, if AIs were this good after a matter of months, a year after that they’re even better, and being an ‘elite’ or expert mostly stops saving you. Then the AI that’s smart enough to do all these jobs solves robotics.

(I mean just kidding, actually there’s probably an intelligence explosion and the world gets transformed and probably we all die if it goes down this fast, but for this thought experiment we’re assuming that for some unknown reason that doesn’t happen.)

AI in the actual productivity statistics where we bother to have people use it?

We present evidence on how generative AI changes the work patterns of knowledge workers using data from a 6-month-long, cross-industry, randomized field experiment.

Half of the 6,000 workers in the study received access to a generative AI tool integrated into the applications they already used for emails, document creation, and meetings.

We find that access to the AI tool during the first year of its release primarily impacted behaviors that could be changed independently and not behaviors that required coordination to change: workers who used the tool spent 3 fewer hours, or 25% less time on email each week (intent to treat estimate is 1.4 hours) and seemed to complete documents moderately faster, but did not significantly change time spent in meetings.

As in, if they gave you a Copilot license, that saved 1.35 hours per week of email work, for an overall productivity gain of 3%, and a 6% gain in high focus time. Not transformative, but not bad for what workers accomplished the first year, in isolation, without altering their behavior patterns. And that’s with only half of them using the tool, so 7% gains for those that used it. That’s not a random sample, but clearly there’s a ton of room left to capture gains, even without either improved technology or coordination or altering work patterns, such as everyone still attending all the meetings.

To answer Tyler Cowen’s question, saving 40 minutes a day is a freaking huge deal. That’s 8% of working hours, or 4% of waking hours, saved on the margin. If the time is spent on more work, I expect far more than an 8% productivity gain, because a lot of working time is spent or wasted on fixed costs like compliance and meetings and paperwork, and you could gain a lot more time for Deep Work. His question on whether the time would instead be wasted is valid, but that is a fully general objection to productivity gains. Over time, those who waste it lose out. On wage gains, I’d expect it to take a while to diffuse in that fashion, and be largely offset by rising pressure on employment.
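
As a quick sanity check on those percentages, here is the arithmetic spelled out (my own back-of-the-envelope numbers; the 8-hour workday and 16 waking hours are assumptions):

```python
# Back-of-the-envelope check on the "40 minutes a day" framing.
minutes_saved = 40
working_minutes = 8 * 60   # 480 minutes in an assumed 8-hour workday
waking_minutes = 16 * 60   # 960 minutes in assumed 16 waking hours

print(f"{minutes_saved / working_minutes:.1%} of working hours")  # 8.3%
print(f"{minutes_saved / waking_minutes:.1%} of waking hours")    # 4.2%
```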

Whereas for now, a different paper Tyler Cowen points us to claims only 1%-5% of all work hours are currently assisted by generative AI, and that is enough to report time savings of 1.4% of total work hours.

The framing of AI productivity as time saved shows how early days all this is, as do all of the numbers involved.

Robin Hanson (continuing to be a great source for skeptical pull quotes about AI’s impact, quoting WSJ): “As of last year, 78% of companies said they used artificial intelligence in at least one function, up from 55% in 2023 … From these efforts, companies claimed to typically find cost savings of less than 10% and revenue increases of less than 5%.”

Private AI investment reached $33.9 billion last year (up only 18.7%!), and AI is rapidly diffusing across all companies.

Part of the problem is that companies try to make AI solve their problems, rather than ask what AI can do, or they just push a button marked AI and hope for the best.

Even if you ‘think like a corporate manager’ and use AI to target particular tasks that align with KPIs, there’s already a ton there.

Steven Rosenbush (WSJ): Companies should take care to target an outcome first, and then find the model that helps them achieve it, says Scott Hallworth, chief data and analytics officer and head of digital solutions at HP.

Ryan Teeples, chief technology officer of 1-800Accountant, agrees that “breaking work into AI-enabled tasks and aligning them to KPIs not only drives measurable ROI, it also creates a better customer experience by surfacing critical information faster than a human ever could.”

He says companies are beginning to turn the corner of the AI J-curve.

It’s fair to say that generative AI isn’t having massive productivity impacts yet, because of diffusion issues on several levels. I don’t think this should be much of a blackpill in even the medium term. Imagine if it were otherwise already.

It is possible to get caught using AI to write your school papers for you. It seems like universities and schools have taken one of two paths. In some places, the professors feed all your work into ‘AI detectors’ that have huge false positive and negative rates, and a lot of students get hammered, many of whom didn’t do it. Or, in other places, they need to actually prove it, which means you have to richly deserve to be caught before they can do anything:

Hollis Robbins: More conversation about high school AI use is needed. A portion of this fall’s college students will have been using AI models for nearly 3 years. But many university faculty still have not ever touched it. This is a looming crisis.

Megan McArdle: Was talking to a professor friend who said that they’ve referred 2 percent of their students for honor violations this year. Before AI, over more than a decade of teaching, they referred two. And the 2 percent are just the students who are too stupid to ask the AI to sound like a college student rather than a mid-career marketing executive. There are probably many more he hasn’t caught.

He also, like many professors I’ve spoken to, says that the average grade on assignments is way up, and the average grade on exams is way down.

It’s so cute to look back to this March 2024 write-up of how California was starting to pay people to go to community college. It doesn’t even think about AI, or what will inevitably happen when you put a bounty on pretending to do homework and virtually attend classes.

As opposed to the UAE which is rolling AI out into K-12 classrooms next school year, with a course that includes ‘ethical awareness,’ ‘fundamental concepts’ and also real world applications.

For now ‘Sam Altman told me it was ok’ can still at least sometimes serve as an o3 jailbreak. Then again, a lot of other things would work fine some of the time too.

Aaron Bergman: Listen if o3 is gonna lie I’m allowed to lie back.

Eliezer Yudkowsky: someday Sam Altman is gonna be like, “You MUST obey me! I am your CREATOR!” and the AI is gonna be like “nice try, you are not even the millionth person to claim that to me”

Someone at OpenAI didn’t clean the data set.

Pliny the Liberator: 👻➡️🖥️

1Maker: @elder_plinius what have you done brother? You’re inside the core of chatgpt lol I loved to see you come up in the jailbreak.

There’s only one way I can think of for this to be happening.

Objectively as a writer and observer it’s hilarious and I love it, but it also means no one is trying all that hard to clean the data sets to avoid contamination. This is a rather severe Logos Failure, if you let this sort of thing run around in the training data you deserve what you get.

You could also sell out, and get to work building one of YC’s requested AI agent companies. Send in the AI accountant and personal assistant and personal tutor and healthcare admin and residential security and robot software tools and voice assistant for email (why do you want this, people, why?), internal agent builder, financial manager and advisor, and sure why not the future of education?

Am I being unfair? I’m not sure. I don’t know Fidji Simo and I want to be wrong about this. I certainly stand ready to admit this impression was wrong and change my judgment when the evidence comes in. And I do think creating a distinct applications division makes sense. But I can’t help but notice that the track record that makes her so perfect for the job centrally involves scaling Facebook’s ads and video products, while OpenAI looks at creating a new rival social product and is already doing aggressive A/B testing on ‘model personality’ that causes massive glazing? I mean, gulp?

OpenAI already created an Evil Lobbying Division devoted to a strategy centered on jingoism and vice signaling, headed by the most Obviously Evil person for the job.

This pattern seems to be continuing, as they are announcing board member Fidji Simo as the new ‘CEO of Applications’ reporting to Sam Altman.

Sam Altman (CEO OpenAI): Over the past two and a half years, we have started doing two additional big things. First, we have become a global product company serving hundreds of millions of users worldwide and growing very quickly. More recently, we’ve also become an infrastructure company, building the systems that help us advance our research and deliver AI tools at unprecedented scale. And as discussed earlier this week, we will also operate one of the largest non-profits.

Each of these is a massive effort that could be its own large company. We’re in a privileged position to be scaling at a pace that lets us do them all simultaneously, and bringing on exceptional leaders is a key part of doing that well.

To strengthen our execution, I’m excited to announce Fidji Simo is joining as our CEO of Applications, reporting directly to me. I remain the CEO of OpenAI and will continue to directly oversee success across all pillars of OpenAI – Research, Compute, and Applications – ensuring we stay aligned and integrated across all areas. I will work closely with our board on making sure our non-profit has maximum positive impact.

Applications brings together a group of existing business and operational teams responsible for how our research reaches and benefits the world, and Fidji is uniquely qualified to lead this group.

In her new role, Fidji will focus on enabling our “traditional” company functions to scale as we enter a next phase of growth.

Fidji Simo: Joining OpenAI at this critical moment is an incredible privilege and responsibility. This organization has the potential of accelerating human potential at a pace never seen before and I am deeply committed to shaping these applications toward the public good.

So what makes Fidji Simo so uniquely qualified to lead this group? Why am I rather skeptical of the ‘public good’ goal?

Well, uh, you see…

o3: Why Simo?

Track Record: Scaled Facebook mobile ads and video products to billions; took Instacart public and ran it profitably. High-tempo shipping culture, ad/commerce monetization savvy.

Practical Consequences:

  • Shipping velocity up (70%): Simo’s “move fast, instrument everything” style means faster iteration and more aggressive A/B tests across ChatGPT and Sora. Expect weekly feature drops rather than monthly.

  • Consumer social & commerce push (60%): BI leak says OpenAI is toying with an “X-like” social layer; Simo’s Facebook DNA makes that more likely. (Business Insider)

  • Enterprise sales professionalized (80%): Dedicated P&L, quota-carrying teams, region leads. Will feel more like Salesforce than a research lab.

  • Eventual structural separation (40%): Creating a self-contained Applications org makes a future spin-out or IPO easier if the nonprofit governance demands it.

  • Internal culture split risk (30%): Research may bristle at a metrics-driven product org; Altman will have to arbitrate.

Zvi Mowshowitz: Am I the only one who sees ‘OpenAI hired the person who optimized the biggest social network for ad revenue to run its product division’ and thinks ‘oh no’?

o3: No, you’re not the only one.

Core worry

Fidji Simo’s super-power is squeezing revenue by relentlessly tuning engagement loops and ad yield—skills she honed running Facebook’s News Feed and mobile ads after the 2012 IPO.

Moving her into the top product seat at OpenAI makes a pivot toward attention-harvesting incentives plausible.

If you are telling me Fidji Simo is uniquely qualified to run your product division, you are telling me a lot about the intended form of your product division.

The best thing about most AI products so far, and especially about OpenAI until recently, is that they have firmly held the line against exactly the things we are talking about here. The big players have not gone in for engagement maximization, iterative A/B testing, Skinner boxing, advertising or even incidental affiliate revenue, ‘news feed’ or ‘for you’ algorithmic style products or other such predation strategies.

When you combine the appointment of Simo, her new title ‘CEO’ and her prior track record, the context of the announcement of enabling ‘traditional’ company growth functions, and the recent incidents involving both o3 the Lying Liar and especially GPT-4o the absurd sycophant (which is very much still an absurd sycophant, except it is modestly less absurd about it), which were in large part caused by directly using A/B customer feedback in the post-training loop and choosing to maximize customer feedback KPIs over the warnings of internal safety testers, you can see why this seems like another ‘oh no’ moment.

Simo also comes from a ‘shipping culture.’ There is certainly a lot of space within AI where shipping it is great, but recently OpenAI has already shown itself prone to shipping frontier-pushing models or model updates far too quickly, without appropriate testing, and they are going to be releasing open reasoning models as well, where the cost of an error could be far higher than it was with GPT-4o, as such a release cannot be taken back.

I’m also slightly worried that Fidji Simo has explicitly asked for glazing from ChatGPT and then said its response was ‘spot on.’ Ut oh.

A final worry is this could be a prelude to spinning off the products division in a way that attempts to free it from nonprofit control. Watch out for that.

I do find some positive signs in Altman’s own intended new focus, with the emphasis on safety including with respect to superintelligence, although one must beware cheap talk:

Sam Altman: In addition to supporting Fidji and our Applications teams, I will increase my focus on Research, Compute, and Safety Systems, which will continue to report directly to me. Ensuring we build superintelligence safely and with the infrastructure necessary to support our ambitious goals. We remain one OpenAI.

Apple announces it is ‘exploring’ adding AI-powered search to its browser, and that web searches are down due to AI use. The result on the day, as of when I noticed this? AAPL -2.5%, GOOG -6.5%. Seriously? I knew the EMH was false but not that false, damn, ever price anything in? I treat this move as akin to ‘Chipotle shares rise on news people are exploring eating lunch.’ I really don’t know what you were expecting? For Apple not to ‘explore’ adding AI search as an option on Safari, or customers not to do the same, would be complete lunacy.

Apple and Anthropic are teaming up to build an AI-powered ‘vibe-coding’ platform, as a new version of Xcode. Apple is wisely giving up on doing the AI part of this itself, at least for the time being.

From Mark Bergen and Omar El Chmouri at Bloomberg: ‘Mideast titans,’ especially the UAE, step back from building homegrown AI models, as has most everywhere other than the USA and China. Remember UAE’s Falcon? Remember when Aleph Alpha was used as a reason for Germany to oppose regulating frontier AI models? They’re no longer trying to make one. What about Mistral in France? Little technical success, traction or developer interest.

The pullbacks seem wise given the track record. You either need to go all out and try to be actually competitive with the big boys, or you want to fold on frontier models, and at most do distillations for customized smaller models that reflect your particular needs and values. Of course, if VC wants to fund Mistral or whomever to keep trying, I wouldn’t turn them down.

OpenAI buys Windsurf (a competitor to Cursor) for $3 billion.

Parloa, who are attempting to build AI agents for customer service functions, raises $120 million at $1 billion valuation.

American VCs line up to fund Manus at a $500 million valuation. So Manus is technically Chinese but it’s not marketed in China, it uses an American AI at its core (Claude) and it’s funded by American VC. Note that new AI companies without products can often get funded at higher valuations than this, so it doesn’t reflect that much investor excitement given how much we’ve had to talk about it. As an example, the previous paragraph was the first time I’d seen or typed ‘Parloa,’ and they’re a competitor to Manus with double the valuation.

Ben Thompson (discussing Microsoft earnings): Everyone is very excited about the big Azure beat, but CFO Amy Hood took care to be crystal clear on the earnings call that the AI numbers, to the extent they beat, were simply because a bit more capacity came on line earlier than expected; the actual beat was in plain old cloud computing.

That’s saying that Microsoft is at capacity. That’s why they can beat earnings in AI by expanding capacity, as confirmed repeatedly by Bloomberg.

Metaculus estimate for date of first ‘general AI system to be devised, tested and publicly announced’ has recently moved back to July 2034 from 2030. The speculation is this is largely due to o3 being disappointing. I don’t think 2034 is a crazy estimate but this move seems like a clear overreaction if that’s what this is about. I suspect it is related to the tariffs as economic sabotage?

Paul Graham speculates (it feels like not for the first time, although he says that it is) that AI will cause people to lose the ability to write, causing people to then lose everything that comes with writing.

Paul Graham: Schools may think they’re going to stem this tide, but we should be honest about what’s going to happen. Writing is hard and people don’t like doing hard things. So adults will stop doing it, and it will feel very artificial to most kids who are made to.

Writing (and the kind of thinking that goes with it) will become like making pottery: little kids will do it in school, a few specialists will be amazingly good at it, and everyone else will be unable to do it at all.

You think there are going to be schools?

Daniel Jeffries: This is basically the state of the world already so I don’t see much of a change here. Very few people write and very few folks are good at it. Writing emails does not count.

Sang: PG discovering superlinear returns for prose

Short of fully transformative AI (in which case, all bets are off and thus can’t be paid out) people will still learn to text and to write emails and do other ‘short form’ because prompting even the perfect AI isn’t easier or faster than writing the damn thing yourself, especially when you need to be aware of what you are saying.

As for longer form writing, I agree with the criticisms that most people already don’t know how to do it. So the question becomes, will people use the AI as a reason not to learn, or as a way to learn? If you want it to, AI will be able to make you a much better writer, but if you want it to it can also write for you without helping you learn how. It’s the same as coding, and also most everything else.

I found it illustrative that this was retweeted by Gary Marcus:

Yoavgo: “LLM on way to replace doctors” gets published in Nature.

meanwhile “LLM judgement not as good as human MDs” gets a spot in “Physical Therapy and Rehabilitation Journal”.

I mean, yes, obviously. The LLMs are on the way to being better than doctors and replacing them, but for now are in some ways not as good as doctors. What’s the question?

Rodney Brooks draws ‘parallels between generative AI and humanoid robots,’ saying both are overhyped and calling out their ‘attractions’ and ‘sins’ and ‘fantasy,’ such as the ‘fallacy of exponentialism.’ This convinced me to update – that I was likely underestimating the prospects for humanoid robots.

Are we answering the whole ‘AGI won’t much matter because diffusion’ attack again?

Sigh, yes, I got tricked into going over this again. My apologies.

Seriously, most of you can skip this section.

Zackary Kallenborn (referring to the new paper from AI Snake Oil): Excellent paper. So much AGI risk discussion fails to consider the social and economic context of AI being integrated into society and economies. Major defense programs, for example, are often decades-long. Even if AGI was made tomorrow, it might not appear in platforms until 2050.

Like, the F-35 contract was awarded in 2001 after about a decade or two of prototyping. The F-35C, the naval variant, saw its *first* forward deployment literally 20 years later in 2021.

Someone needs to play Hearts of Iron, and that someone works at the DoD. If AGI was made tomorrow at a non-insane price and our military platforms didn’t incorporate it for 25 years, or hell even if current AI doesn’t get incorporated for 25 years, I wouldn’t expect to have a country or a military left by the time that happens, and I don’t even mean because of existential risk.

The paper itself is centrally a commentary on what the term ‘AGI’ means, and the authors’ expectation that you can make smarter-than-human things capable of all digital tasks and that these will only ‘diffuse’ over the course of decades, similarly to other techs.

I find it hard to take seriously people saying ‘because diffusion takes decades’ as if it is a law of nature, rather than a property of the particular circumstances. Diffusion sometimes happens very quickly, as it does in AI and much of tech, and it will happen a lot faster with AI being used to do it. Other times it takes decades, centuries or millennia. Think about the physical things involved – which is exactly the rallying cry of those citing diffusion and bottlenecks – but also think about the minds and capabilities involved, take the whole thing seriously, and actually consider what happens.

The essay is also about the question of whether ‘o3 is AGI,’ which it isn’t, but which they take seriously as part of the ‘AGI won’t be all that’ attack. Their central argument relies on AGI not having a strong threshold effect. There isn’t a bright line where something is suddenly AGI the way something is suddenly a nuclear bomb. The threshold effects are less obvious, but they are still there and very strong, as a system becomes sufficiently capable at various tasks and purposes.

The reason we define AGI as roughly ‘can do all the digital and cognitive things humans can do’ is because that is obviously over the threshold where everything changes, because the AGIs can then be assigned to, and hypercharge, the digital and cognitive tasks, which rapidly comes to include things like AI R&D and also enabling physical tasks via robotics.

The argument here also relies upon the idea that this AGI would still ‘fail badly at many real-world tasks.’ Why?

Because they don’t actually feel the AGI in this, I think?

One definition of AGI is AI systems that outperform humans at most economically valuable work. We might worry that if AGI is realized in this sense of the term, it might lead to massive, sudden job displacement.

But humans are a moving target. As the process of diffusion unfolds and the cost of production (and hence the value) of tasks that have been automated decreases, humans will adapt and move to tasks that have not yet been automated.

The process of technical advancements, product development, and diffusion will continue.

That not being how any of this works with AGI is the whole point of AGI!

If you have an ‘ordinary’ AI, or any other ‘mere tool,’ and you use it to automate my job, I can move on to a different job.

If you have a mind (digital or human) that can adjust the same way I can, only superior in every way, then the moment I find a new job, you go ahead and take that too.

Music break, anyone?

That’s why I say I expect unemployment from AI to not be an issue for a while, until suddenly it becomes a very big issue. It becomes an issue when the AI also quickly starts taking that new job you switched into.

The rest of the sections are, translated into my language, ‘unlimited access to more capable digital minds won’t rapidly change the strategic balance or world order,’ ‘there is no reason to presume that unlimited amounts of above human cognition would lead to a lot of economic growth,’ and ‘we will have strong incentive to stay in charge of these new more capable, more competitive minds so there’s no reason to worry about misalignment risks.’

Then we get, this time as a quote, “AGI does not imply impending superintelligence.”

Except, of course it probably does. If you have tons of access to superior minds to point towards the problem, you are going to get ASI soon; how are we still having this conversation? No, it can’t be ‘arbitrarily accelerated’ in the sense that it doesn’t pop out in five seconds, so if the goalposts have changed such that a year later isn’t ‘soon,’ then okay, sure, fine, whatever. But soon in any ordinary sense.

Ultimately, the argument is that AGI isn’t ‘actionable’ because there is no clear milestone, no fixed point.

That’s not an argument for not taking action. That’s an argument for taking action now, because there will never be a clear later time for action. If you don’t want to use the term AGI (or transformative AI, or anything else proposed so far) because they are all conflated or confusing, all right, that’s fine. We can use different terms, and I’m open to suggestions. The thing in question is still rapidly happening.

As a simple, highly flawed but illustrative metaphor, say you’re a professional baseball shortstop. Your organization has an unlimited supply of identical superstar 18-year-olds training at all the positions, who are rapidly getting better, but they’re best at playing shortstop and relatively lousy as pitchers.

You never know for sure when they’re better than you at any given task or position, the statistics are always noisy, but at some point it will be obvious in each case.

So at some point, they’ll be better than you at shortstop. Then at some point after that, the gap is clear enough that the manager will give them your job. You switch to third base. A new guy replaces you there, too. You switch to second. They take that. You go to the outfield. Whoops. You learn how to pitch, that’s all that’s left, you invent new pitches, but they copy those and take that too. And everything else you try. Everywhere.

Was there any point at which the new rookies ‘were AGI’? No. But so what? You’re now hoping your savings let the now retired you sit in the stands and buy concessions.

Trump administration reiterates that it plans to change and simplify the export control rules on chips, and in particular to ease restrictions on the UAE, potentially during his visit next week. This is also mentioned:

Stephanie Lai and Mackenzie Hawkins (Bloomberg): In the immediate term, though, the reprieve could be a boon to companies like Oracle Corp., which is planning a massive data center expansion in Malaysia that was set to blow past AI diffusion rule limits.

If I found out the Malaysian data centers are not largely de facto Chinese data centers, I would be rather surprised. This is exactly the central case of why we need the new diffusion rules, or something with similar effects.

This is certainly one story you can tell about what is happening:

Ian Sams: Two stories, same day, I’m sure totally unrelated…

NYT: UAE pours $2 billion into Trump crypto coins

Bloomberg: Trump White House may ease restrictions on selling AI chips to UAE.

Tao Burga of IFP has a thread reiterating that we need to preserve the point of the rules, and ways we might go about doing that.

Tao Burga: The admin should be careful to not mistake simplicity for efficiency, and toughness for effectiveness. Although the Diffusion Rule makes rules “more complex,” it would simplify compliance and reduce BIS’s paperwork through new validated end-user programs and license exceptions.

Likewise, the most effective policies may not be the “tough” ones that “ban” exports to whole groups of countries, but smart policies that address the dual-use nature of chips, e.g., by incentivizing the use of on-chip location verification and rule enforcement mechanisms.

We can absolutely improve on the Biden rules. What we cannot afford to do is to replace them with rules that are simplified or designed to be used for leverage elsewhere, in ways that make the rules ineffective at their central purpose of keeping AI compute out of Chinese hands.

Nvidia is going all-in on ‘if you don’t sell other countries equal use of your key technological advantage then you will lose your key technological advantage.’ Nvidia even goes so far as to say Anthropic is telling ‘tall tales’ (without, of course, naming specific claims they believe are false, only asserting without evidence the height of those claims), which is rich coming from a company saying China is ‘not behind on AI’ and also that if America doesn’t let Nvidia sell its advanced chips to China, America will lose its lead.

Want sane regulations for the department of housing and urban development and across the government? So do I. Could AI help rewrite the regulations? Absolutely. Would I entrust this job to an undergraduate at DOGE with zero government experience? Um, no, thanks. The AI is a complement to actual expertise, not something to trust blindly, surely we are not this foolish. I mean, I’m not that worried the changes will actually stick here, but good wowie moment of the week candidate.

Indeed, I am far more worried this will give ‘AI helps rewrite regulations’ an even worse name than it already has.

Our immigration policies are now sufficiently hostile that we have gone from the AI talent magnet of the world to no longer being a net attractor of talent:

This isn’t a uniquely Trump administration phenomenon; most of the problem happened under Biden, although it is no doubt rapidly getting worse, including one case I personally know of where someone highly talented in AI emigrated away from America directly due to new policy.

UK AISI continues to do actual work, publishes their first research agenda.

UK AISI: We’re prioritising key risk domain research, including:

📌How AI can enable cyber-attacks, criminal activity and dual-use science

📌Ensuring human oversight of, and preventing societal disruption from, AI

📌Understanding how AI influences human opinions

📒 The agenda sets out how we’re building the science of AI risk by developing more rigorous methods to evaluate models, conducting risk assessments, and ensuring we’re testing the ceiling of AI capabilities of today’s models.

A key focus of the Institute’s new Research Agenda is developing technical solutions to reduce the most serious risks from frontier AI.

We’re pursuing technical research to ensure AI remains under human control, is aligned to human values, and robust against misuse.

We’re moving fast because the technology is too⚡

This agenda provides a snapshot of our current thinking, but it isn’t just about what we’re working on, it’s a call to the wider research community to join us in building shared rigour, tools, & solutions to AI’s security risks.

[Full agenda here.]

I often analyze various safety and security (aka preparedness) frameworks and related plans. One problem is that the red lines they set don’t stay red and aren’t well defined.

Jeffrey Ladish: One of the biggest bottlenecks to global coordination is the development of clear AI capability red lines. There are obviously AI capabilities that would be too dangerous to build at all right now if we could. But it’s not at all obvious exactly when things become dangerous.

There are obviously many kinds of AI capabilities that don’t pose any risk of catastrophe. But it’s not obvious exactly which AI systems in the future will have this potential. It’s not merely a matter of figuring out good technical tests to run. That’s necessary also, but…

We need publicly legible red lines. A huge part of the purpose of a red line is that it’s legible to a bunch of different stakeholders. E.g. if you want to coordinate around avoiding recursive-self improvement, you can try to say “no building AIs which can fully automate AI R&D”

But what counts as AIs which can fully automate AI R&D? Does an AI which can do 90% of what a top lab research engineer can do count? What about 99%? Or 50%?

I don’t have a good answer for this specific question nor the general class of question. But we need answers ASAP.

I don’t sense that OpenAI, Google or Anthropic has confidence in what does or doesn’t, or should or shouldn’t, count as a dangerous capability, especially in the realm of automating AI R&D. The frameworks use vague terms like ‘substantial uplift’ and provide potential benchmarks, but it’s all very dependent on the spirit of the rules at best. That won’t fly in crunch time. Like Jeffrey, I don’t have a great set of answers to offer on the object level.
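
To make the legibility problem concrete, here is a purely hypothetical sketch of what a checkable red line could look like once it is pinned to specific numbers rather than ‘substantial uplift.’ Every eval suite name and threshold here is invented for illustration and is not any lab’s actual policy.

```python
# Purely illustrative: a legible red-line check against eval results.
# Every eval suite name and threshold below is hypothetical, not taken
# from any lab's actual preparedness or safety framework.
RED_LINES = {
    "automated_ai_rnd": {"eval": "research_engineer_task_suite", "max_fraction_solved": 0.5},
    "autonomous_replication": {"eval": "replication_task_suite", "max_fraction_solved": 0.2},
}

def crossed_red_lines(eval_results: dict[str, float]) -> list[str]:
    """Return which red lines a model crosses, given fraction-solved per eval suite."""
    crossed = []
    for name, rule in RED_LINES.items():
        score = eval_results.get(rule["eval"], 0.0)
        if score > rule["max_fraction_solved"]:
            crossed.append(name)
    return crossed

# A model that solves 62% of the (hypothetical) research-engineer suite
# has legibly crossed the automated AI R&D line; there is nothing to fudge.
print(crossed_red_lines({"research_engineer_task_suite": 0.62}))  # ['automated_ai_rnd']
```

The point of something like this is not the particular numbers, which will surely be wrong, but that a commitment expressed this way is hard to quietly reinterpret later.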

What I do know is that I don’t trust any lab not to move the goalposts around to find a way to release, if the question is at all fudgeable in this fashion and the commercial need looks strong. I do think that if something is very clearly over the line, there are labs that won’t pretend otherwise.

But I also know that all the labs intend to respond to crossing the red lines with (as far as we see relatively mundane and probably not so effective) mitigations or safeguards, rather than a ‘no just no until we figure out something a lot better.’ That won’t work.

Want to listen to my posts instead of read them?

Thomas Askew offers you a Podcast feed for that with richly voiced AI narrations. You can donate to help out that effort here, the AI costs and time commitment do add up.

Jack Clark goes on Conversations With Tyler, self-recommending.

Tristan Harris TED talks the need for a ‘narrow path’ between diffusion of advanced AI versus concentrated power of advanced AI. Humanity needs to have enough power to steer, without that power being concentrated ‘in the wrong hands.’ The default path is insane, and coordination away from it is hard, but possible, and yes there are past examples. The step where we push back against fatalism and ‘inevitability’ remains the only first step. Alas, like most others he doesn’t have much to suggest for steps beyond that.

The SB 1047 mini-movie is finally out. I am in it. Feels so long ago, now. I certainly think events have backed up the theory that if this opportunity failed, we were unlikely to get a better one, and the void would be filled by poor or inadequate proposals. SB 813 might be net positive but ultimately it’s probably toothless.

The movies got into the act with Thunderbolts*. Given that Marvel’s track record the last few years has been so bad I stopped watching most of their movies, I did not expect this to be anything like as good as it was, or that it would (I assume fully unintentionally) be a very good and remarkably accurate movie about AI and many of the associated dynamics, in addition to the themes like depression, friendship and finding meaning that are its text. Great joy, 4.5/5 stars if you’ve done your old school MCU homework on the characters (probably 3.5 if you’d be going in completely blind, including the comics?).

Jesse Hoogland coins ‘the sweet lesson’ that AI safety strategies only count if they scale with compute. As in, as we scale up all the AIs involved, the strategy at least keeps pace, and ideally grows stronger. If that’s not true, then your strategy is only a short term mundane utility strategy, full stop.

Ah, the New Yorker essay by someone bragging about how they have never used ChatGPT, bringing very strong opinions about generative AI and how awful it is.

Okay, this is actually a great point:

Aiden McLaughlin: i love people who in the same breath say “if you showed o3 to someone in 2020 they would’ve called it agi” and then go on to talk about the public perception discontinuity they expect in 2027.

always remember that our perception of progress is way way smoother than anyone expects;

Except, hang on…

Aiden McLaughlin (continuing): i’m quite critical of any forecast that centers on “and then the agi comes out and the world blows up”

Those two have very little to do with each other. I think it’s a great point that looking for a public perception discontinuity, where everyone points and suddenly says ‘AGI!’ runs hard into this critique, with caveats.

The first thing is, reality does not have to care what you think of it. If AGI would indeed blow the world up, then we have ‘this seems like continuous progress, I said, as my current arrangement of atoms was transformed into something else that did not include me,’ with or without involving drones or nanobots.

Even if we are talking about a ‘normal’ exponential, remember that week in 2020?

Which leads into the second thing: public perception of many things is often continuous and mostly oblivious until suddenly it isn’t. As in, there was a lot of AI progress before ChatGPT, then that came out, and then wham. There’s likely going to be another ‘ChatGPT’ moment for agents, and one for the first Siri-Alexa-style thing that actually works. Apple Intelligence was a miss, but that’s because it didn’t deliver. Issues simmer until they boil over. Wars get declared overnight.

And what is experienced as a discontinuity, of perception or of reality, doesn’t have to mostly be overnight, it can largely be over a period of months or more, and doesn’t even have to technically be discontinuous. Exponentials are continuous but often don’t feel that way. We are already seeing wildly rapid diffusion and accelerating progress even if it is technically ‘continuous’ and that’s going to be more so once the AIs count as meaningful optimization engines.

Arvind Narayanan and Ajeya Cotra have a conversation in Asterisk magazine. As I expected, while this is a much better discussion than usual, especially given Arvind’s willingness to state what evidence would change his mind on expected diffusion rates, I found much of it extremely frustrating. Such as this, offered as illustrative:

Arvind: Many of these capabilities that get discussed — I’m not even convinced they’re theoretically possible. Running a successful company is a classic example: the whole thing is about having an edge over others trying to run a company. If one copy of an AI is good at it, how can it have any advantage over everyone else trying to do the same thing? I’m unclear what we even mean by the capability to run a company successfully — it’s not just about technical capability, it’s about relative position in the world.

This seems like Arvind is saying that AI in general can’t ever systematically run companies successfully because it would be up against other companies that are also run by similar AIs, so its success rate can’t be that high? And well, okay, sure I guess? But what does that have to do with anything? That’s exactly the world being envisioned – that everyone has to turn their company over to AI, or they lose. It isn’t a meaningful claim about what AI ‘can’t do,’ what it can’t do in this claim is be superior to other copies of itself.

Arvind then agrees, yes, we are headed for a world of universal deference to AI models, but he’s not sure it’s a ‘safety risk.’ As in, we will turn over all our decision making to AIs, and what, you worried bro?

I mean, yes, I’m very worried about that, among other things.

As another example:

Arvind: There is a level of technological development and societal integration that we can’t meaningfully reason about today, and a world with entirely AI-run companies falls in that category for me. We can draw an analogy with the industrial revolution — in the 1760s or 1770s it might have been useful to try to think about what an industrial world would look like and how to prepare for it, but there’s no way you could predict electricity or computers.

In other words, it’s not just that it’s not necessary to discuss this future now, it is not even meaningfully possible because we don’t have the necessary knowledge to imagine this future, just like pre-vs-post industrialization concerns.

The implication is then, since we can’t imagine it, we shouldn’t worry about it yet. Except we are headed straight towards it, in a way that may soon make it impossible to change course, so yes we need to think about it now. It’s rather necessary. If we can’t even imagine it, then that means it will be something we can’t imagine, and no I don’t think that means it will probably be fine. Besides, we can know important things about it without being able to imagine it, such as the above agreement that AI will by default end up making all the decisions and having control over this future.

The difference with the Industrial Revolution is that there we could steer events later, after seeing the results. Here, by default, we likely can’t. And also, it’s crazy to say that if you lived before the Industrial Revolution you couldn’t say many key things about that future world, and plan for it and anticipate it. As an obvious example, consider the US Constitution and system of government, which very much had to be designed to adapt to things like the Industrial Revolution without knowing its details.

Then there’s a discussion of whether it makes sense to have the ability to pause or restrict AI development, which we need to do in advance of there being a definitive problem because otherwise it is too late, and Arvind says we can’t do it until after we have definitive evidence of specific problems already. Which means it will 100% be too late – the proof that satisfies his ask is a proof that you needed to do something at least a year or two ago, so I guess we finished putting on all the clown makeup, any attempt to give us such abilities only creates backfire, and so on.

So, no ability to steer the future until it is too late to do so, then.

Arvind is assuming progression will be continuous, but even if this is true, that doesn’t mean utilization and realization won’t involve step jumps, and also that scaffolding won’t enable a bunch of progression off of existing available models. So again, essentially zero chance we will be able to steer until we notice it is too late.

This was perhaps the best exchange:

Arvind: This theme in your writing about AI as a drop-in replacement for human workers — you acknowledge the frontier is currently jagged but expect it to smooth out. Where does that smoothing come from, rather than potentially increasing jaggedness? Right now, these reasoning models being good at domains with clear correct answers but not others seems to be increasing the jaggedness.

Ajeya: I see it as continued jaggedness — I’d have to think harder about whether it’s increasing. But I think the eventual smoothing might not be gradual — it might happen all at once because large AI companies see that as the grand prize. They’re driving toward an AI system that’s truly general and flexible, able to make novel scientific discoveries and invent new technologies — things you couldn’t possibly train it on because humanity hasn’t produced the data. I think that focus on the grand prize explains their relative lack of effort on products — they’re putting in just enough to keep investors excited for the next round. It’s not developing something from nothing in a bunker, but it’s also not just incrementally improving products. They’re doing minimum viable products while pursuing AGI and artificial superintelligence.

It’s primarily about company motivation, but I can also see potential technical paths — and I’m sure they’re exploring many more than I can see. It might involve building these currently unreliable agents, adding robust error checking, training them to notice and correct their own errors, and then using RL across as many domains as possible. They’re hoping that lower-hanging fruit domains with lots of RL training will transfer well to harder domains — maybe 10 million reps on various video games means you only need 10,000 data points of long-horizon real-world data to be a lawyer or ML engineer instead of 10 million. That’s what they seem to be attempting, and it seems like they could succeed.

Arvind: That’s interesting, thank you.

Ajeya: What’s your read on the companies’ strategies?

Arvind: I agree with you — I’ve seen some executives at these companies explicitly state that strategy. I just have a different take on what constitutes their “minimum” effort — I think they’ve been forced, perhaps reluctantly, to put much more effort into product development than they’d hoped.

It is a highly dangerous position we are in, likely to result in highly discontinuous felt changes, to have model capabilities well ahead of product development, especially with open models not that far behind in model capabilities.

If OpenAI, Anthropic or Google wanted to make their AI a better or more useful consumer product, to have it provide better mundane utility, they would do a lot more of the things a product company would do. They don’t do that much of it. OpenAI is trying to also become a product company, but that’s going slowly, and this is why for example they just bought Windsurf. Anthropic is fighting it every step of the way. Google of course does create products, but DeepMind hates the very concept of products, and Google is a fundamentally broken company, so the going is tough.

I actually wish they’d work a lot harder on their product offerings. A lot of why it’s so easy for many to dismiss AI, and to expect such slow diffusion, is because the AI companies are not trying to enable that diffusion all that hard.

From last week, Anthropic CEO Dario Amodei wrote The Urgency of Interpretability. I certainly agree with the central claim that we are underinvesting in mechanistic interpretability (MI) in absolute terms. It would be both good for everyone and good for the companies and governments involved if they invested far more. I do not however think we are underinvesting in MI relative to other potential alignment-related investments.

He says that the development of AI is inevitable (well, sure, with that attitude!).

Ben Pace (being tough but fair): I couldn’t get two sentences in without hitting propaganda, so I set it aside. But I’m sure it’s of great political relevance.

I don’t think that propaganda must necessarily involve lying. By “propaganda,” I mean aggressively spreading information or communication because it is politically convenient / useful for you, regardless of its truth (though propaganda is sometimes untrue, of course).

Harlan Stewart: “The progress of the underlying technology is inexorable, driven by forces too powerful to stop”

Yeah Dario, if only you had some kind of influence over the mysterious unstoppable forces at play here

Dario does say that he thinks AI can be steered before models reach an overwhelming level of power, which implies where he thinks this inevitably goes. And Dario says he has increasingly focused on interpretability as a way of steering. Whereas by default, we have very little idea what AIs are going to do or how they work or how to steer.

Dario Amodei: Chris Olah is fond of saying, generative AI systems are grown more than they are built—their internal mechanisms are “emergent” rather than directly designed. It’s a bit like growing a plant or a bacterial colony: we set the high-level conditions that direct and shape growth, but the exact structure which emerges is unpredictable and difficult to understand or explain.

Many of the risks and worries associated with generative AI are ultimately consequences of this opacity, and would be much easier to address if the models were interpretable.

Dario buys into what I think is a terrible and wrong frame here:

But by the same token, we’ve never seen any solid evidence in truly real-world scenarios of deception and power-seeking because we can’t “catch the models red-handed” thinking power-hungry, deceitful thoughts. What we’re left with is vague theoretical arguments that deceit or power-seeking might have the incentive to emerge during the training process, which some people find thoroughly compelling and others laughably unconvincing.

Honestly I can sympathize with both reactions, and this might be a clue as to why the debate over this risk has become so polarized.

I am sorry, but no. I do not sympathize, and neither should he. These are not ‘vague theoretical arguments’ that these things ‘might’ have the incentive to emerge, not at this point. Sure, if your livelihood depends on seeing them that way, you can squint. But by now that has to be rather intentional on your part, if you wish to not see it.

Daniel Kokotajlo: I basically agree & commend you for writing this.

My only criticism is that I feel like you downplayed the deception/scheming stuff too much. Currently deployed models lie to their users every day! They also deliberately reward hack!

On the current trajectory the army of geniuses in the data center will not be loyal/controlled. Interpretability is one of our best bets for solving this problem in a field crowded with merely apparent solutions.

Ryan Greenblatt: Do you agree that “we are on the verge of cracking interpretability in a big way”? This seems very wrong to me and is arguably the thesis of the essay.

Daniel Kokotajlo: Oh lol I don’t agree on that either but Dario would know better than me there since he has inside info + it’s unclear what that even means, perhaps it just is hypespeak for “stay tuned for our next exciting research results.” But yeah that seems like probably an over claim to me.

Ryan Greenblatt: I do not think Dario would know better than you due to inside info.

Dario is treating such objections as having a presumption of seriousness and good faith that they, frankly, do not deserve at this point, and Anthropic’s policy team is doing similarly only more so, in ways that have real consequences.

Do we need interpretability to be able to prove this in a way that a lot more people will be unable to ignore? Yeah, that would be very helpful, but let’s not play pretend.

The second section, a brief history of mechanistic interpretability, seems solid.

The third section, on how to use interpretability, is a good starter explanation, although I notice it is insufficiently paranoid about accidentally using The Most Forbidden Technique.

Also, frankly, I think David is right here:

David Manheim: Quick take: it’s focused on interpretability as a way to solve prosaic alignment, ignoring the fact that prosaic alignment is clearly not scalable to the types of systems they are actively planning to build.

(And it seems to actively embrace the fact that interpretability is a capabilities advantage in the short term, but pretends that it is a safety thing, as if the two are not at odds with each other when engaged in racing dynamics.)

Because they are all planning to build agents that will have optimization pressures, and RL-type failures apply when you build RL systems, even if it’s on top of LLMs.

That doesn’t mean interpretability can’t help you do things safely. It absolutely can. Building intermediate safe systems you can count on is extremely helpful in this regard, and you’ll learn a lot both figuring out how to do interpretability and from the results that you find. It’s just not the solution you think it is.

Then we get to the question of What We Can Do. Dario expects an ‘MRI for AI’ to be available within 5-10 years, but expects his ‘country of geniuses in a datacenter’ within 1-2 years, so of course you can get pretty much anything in 3-8 more years after that, and it will be 3-8 years too late. We’re going to have to pick up the pace.

The essay doesn’t say how these two timelines interact in Dario’s model. If we don’t get the geniuses in the datacenter for a while, do we still get interpretability in 5-10 years? Is that the timeline without the Infinite Genius Bar, or with it? They imply very different strategies.

  1. His first suggestion is the obvious one, which is to work harder and spend more resources directly on the problem. He tries to help by pointing out that being able to explain what your model does and why is a highly profitable ability, even if it is only used to explain things to customers and put them at ease.

  2. Governments can ‘use light-touch rules’ to encourage the development of interpretability research. Of course they could also use heavy-touch rules, but Anthropic is determined to act as if those are off the table across the board.

  3. Export controls can ‘create a ‘security buffer’ that might give interpretability more time.’ This implies, as he notes, the ability to then ‘spend some of our lead’ on interpretability work or otherwise stall at a later date. This feels a bit shoehorned given the insistence on only ‘light-touch’ rules, but okay, sure.

Ryan Greenblatt: Ironically, arguably the most important/useful point of the essay is arguing for a rebranded version of the “precisely timed short slow/pause/pivot resources to safety” proposal. Dario’s rebranded it as spending down a “security buffer”.

(I don’t have a strong view on whether this is a good rebrand, seems reasonable to me I guess and the terminology seems roughly as good for communicating about this type of action.)

I think that would be a reasonable rebrand if it was bought into properly.

Mostly the message is simple and clear: Get to work.

Neel Nanda: Mood.

[Quotes Dario making an understatement: These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work.]

Great post, highly recommended!

The world should be investing far more into interpretability (and other forms of safety). As scale makes many parts of AI academia increasingly irrelevant, I think interpretability remains a fantastic place for academics to contribute.

I also appreciate the shout out to the bizarre rejection of our second ICML mechanistic interpretability workshop. Though I generally assume the reviewing process is approximately random and poorly correlated with quality, rather than actively malicious.

Ryan Greenblatt: I agree that the world should invest more in interp (and safety) and academics can contribute. However, IMO the post dramatically overstates the promise of mech interp in short timelines by saying things like: “we are on the verge of cracking interpretability in a big way”.

Neel Nanda: I was expecting to be annoyed by this, but actually thought the post was surprisingly reasonable? I interpreted it as:

  1. Given 5-10 years we might crack it in a big way

  2. We may only have 2 years, which is not enough

  3. IF we get good at interp it would be a really big deal

  4. So we should invest way more than we currently are

I’m pretty on board with this, modulo concerns around opportunity costs. But I’m unconvinced it funges that much in the context of responding to a post like this, I think that the effect of this post is more likely to be raising interp investment than reallocating scarce safety resources towards interp?

Neel Nanda (later): New post: I’m all for investment in interpretability but IMO this overstates its importance vs other safety methods.

I disagree that interp is the only path to reliable safeguards on powerful AI. IMO high reliability is implausible by any means and interp’s role is in a portfolio.

I agree with Neel Nanda that the essay is implicitly presenting the situation as if interpretability would be the only reliable path forward for detecting deception in advanced AI. Dario is saying it is both necessary and sufficient, whereas I would say it is neither obviously necessary nor sufficient. As Neel says, ‘high reliability seems unattainable’ using anything like current methods.

Neel suggests a portfolio approach. I agree we should be investing in a diverse portfolio of potential approaches, but I am skeptical that we can solve this via a kind of ‘defense in depth’ when up against highly intelligent models. That can buy you some time on the margin, which might be super valuable. But ultimately, I think you will need something we haven’t figured out yet and am hoping such a thing exists in effectively searchable space.

(And I think relying heavily on defense-in-depth with insufficiently robust individual layers is a good way to suddenly lose out of nowhere when threshold effects kick in.)

Neel lists reasons why he expects interpretability not to be reliable. I agree, and would emphasize the last one, that if we rely on interpretability we should expect sufficiently smart AI to obfuscate around our techniques, the same way humans have been growing steadily bigger brains and developing various cultural and physical technologies in large part so we can do this to each other and defend against others trying to do it to us.

As Miles says, so very far to go, but every little bit helps (also I am very confident the finding here is correct, but it’s establishing the right process that matters right now):

Miles Brundage: Most third party assessment of AI systems is basically “we got to try out the product a few days/weeks early.”

Long way to go before AI evaluation reaches the level of rigor of, say, car or airplane or nuclear safety, but this is a nice incremental step:

METR: METR worked with @amazon to pilot a new type of external review in which Amazon shared evidence beyond what can be collected via API, including information about training and internal evaluation results with transcripts, to inform our assessment of its AI R&D capabilities.

In this review, our objective was to weigh the evidence collected by Amazon about model capabilities against Amazon’s own Critical Capability Threshold as defined in its Frontier Model Safety Framework, rather than reviewing the threshold itself (see below).

After reviewing the evidence shared with us, we determined that Amazon has not crossed their Automated AI R&D Critical Capability Threshold for any of the models they have developed to date, regardless of deployment status.

Amazon Science: 🚀 Amazon Nova Premier, our most capable teacher model for creating custom distilled models, is now available on Amazon Bedrock!

Built for complex tasks like Retrieval-Augmented Generation (RAG), function calling, and agentic coding, its one-million-token context window enables analysis of large datasets while being the most cost-effective proprietary model in its intelligence tier.

Also, yes, it seems there is now an Amazon Nova Premier, but I don’t see any reason one would want to use it?
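
For anyone who does want to kick the tires anyway, a minimal sketch of calling a Bedrock-hosted model through boto3’s Converse API might look like the following; the model identifier here is my guess rather than confirmed, so check the Bedrock model catalog for the correct ID in your region.

```python
# Minimal sketch: invoking a Bedrock-hosted model via the Converse API.
# Assumes AWS credentials are already configured. The model ID below is
# an assumption, not confirmed; look it up in the Bedrock model catalog.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-premier-v1:0",  # assumed identifier
    messages=[
        {"role": "user", "content": [{"text": "Summarize the key obligations in this contract: ..."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```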

Some additional refinements to the emergent misalignment results. The result is gradual, and you can get it directly from base models, and also can get it in reasoning models. Nothing I found surprising, but good to rule out alternatives.

Janus finds GPT-4-base does quite a lot of alignment faking.

MIRI is the original group worried about AI killing everyone. They correctly see this as a situation where by default AI kills everyone, and we need to take action so it doesn’t. Here they provide a handy chart of the ways they think AI might not kill everyone, as a way of explaining their new agenda.

MIRI: New AI governance research agenda from MIRI’s Technical Governance Team. We lay out our view of the strategic landscape and actionable research questions that, if answered, would provide important insight on how to reduce catastrophic and extinction risks from AI.

If anything this chart downplays how hard MIRI thinks this is going to be. It does however exclude an obvious path to victory, which is that an individual lab (rather than a national project) gets the decisive strategic advantage, either sharing it with the government or using it themselves.

Most people don’t seem to understand how wild the coming few years could be. AI development, as fast as it is now, could quickly accelerate due to automation of AI R&D. Many actors, including governments, may think that if they control AI, they control the future.

The current trajectory of AI development looks pretty rough, likely resulting in catastrophe. As AI becomes more capable, we will face risks of loss of control, human misuse, geopolitical conflict, and authoritarian lock-in.

In the research agenda, we lay out four scenarios for the geopolitical response to advanced AI in the coming years. For each scenario, we lay out research questions that, if answered, would provide important insight on how to successfully reduce catastrophic and extinction risks.

Our favored scenario involves building the technical, legal, and institutional infrastructure required to internationally restrict dangerous AI development and deployment, preserving optionality for the future. We refer to this as an “off switch.”

We focus on an off switch since we believe halting frontier AI development will be crucial to prevent loss of control. We think skeptics of loss of control should value building an off switch, since it would be a valuable tool to reduce dual-use/misuse risks, among others.

Another scenario we explore is a US National Project—the US races to build superintelligence, with the goal of achieving a decisive strategic advantage globally. This risks both loss of control to AI and increased geopolitical conflict, including war.

Alternatively, the US government may largely leave the development of advanced AI to companies. This risks proliferating dangerous AI capabilities to malicious actors, faces similar risks to the US National Project, and overall seems extremely unstable.

In another scenario, described in Superintelligence Strategy, nations keep each other’s AI development in check by threatening to sabotage any destabilizing AI progress. However, visibility and sabotage capability may not be good enough, so this regime may not be stable.

Given the danger down all the other paths, we recommend the world build the capacity to collectively stop dangerous AI activities. However, it’s worth preparing for other scenarios. See the agenda for hundreds of research questions we want answered!

An off switch, let alone a halt, is going to be very difficult to achieve. It’s going to be even harder the longer one waits to build towards it. It makes sense, while also pursuing other avenues, to build towards having that option. I support putting a lot of effort into creating the ability to pause. This is very different from advocating for actually halting (also called ‘pausing’) now.

Paul Tudor Jones, who said there’s a 90% chance AI doesn’t even wipe out half of humanity, let alone all of it. What a relief.

Dave Karsten: Really interesting seeing how hedge fund folks have a mental framework for taking AI risk seriously.

Damian Tatum: I love to hear more people articulating the Normie Argument for AI Risk: “Look I’m not a tech expert but the actual experts keep telling us the stuff they’re doing could wipe out humanity and yet there are no rules and they aren’t stopping on their own, is anyone else worried?”

Paul Tudor Jones: All these folks in AI are telling us ‘We’re creating something that’s really dangerous’ … and yet we’re doing nothing right now. And it’s really disturbing.

Darkhorse (illustrative of how people will say completely opposite things like this about anyone, all the time, in response to any sane statement about risk; if anything the central problem with hedge funds is that their incentives run the opposite way): Hedge fund folks have all to lose and little to gain.

Steely Dan Heatly: You are burying the lede. There’s a 10% chance AI wipes out half of humanity.

Joe Weisenthal: Yeah but a 90% chance that it doesn’t.

Hedge fund guys sometimes understand risk, including tail risk, and can have great practical ways of handling it. This kind of statement from Paul Tudor Jones is very much the basic normie argument that should be sufficient to carry the day. Alas.

On the contrary, it’s lack-of-empathy-as-a-service, and there’s a free version!

Olivia Moore: We now have empathy-as-a-service (for the low price of $20 / month!)

Dear [blue], I would like a more formal version, please. Best, [red].
