Author name: DJ Henderson

Fighting Obvious Nonsense About AI Diffusion

Our government is determined to lose the AI race in the name of winning the AI race.

The least we can do, if prioritizing winning the race, is to try and actually win it.

It is one thing to prioritize ‘winning the AI race’ against China over ensuring that humanity survives, controls and can collectively steer our future. I disagree with that choice, but I understand it. This mistake is very human.

I also believe that more alignment and security efforts at anything like current margins not only do not slow our AI efforts, they would actively help us win the race against China, by enabling better diffusion and use of AI, and ensuring we can proceed with its development. So the current path is a mistake even if you do not worry about humanity dying or losing control over the future.

However, if you look at the idea of building smarter, faster, more capable, more competitive, freely copyable digital minds we don’t understand that can be given goals and think ‘oh that future will almost certainly stay under humanity’s control and not be a danger to us in any way’ (and when you put it like that, um, what are you thinking?) then I understand the second half of this mistake as well.

What is not an understandable mistake, what I struggle to find a charitable and patriotic explanation for, is to systematically cripple or give away many of America’s biggest and most important weapons in the AI race, in exchange for thirty pieces of silver and some temporary market share.

To continue alienating our most important and trustworthy allies with unnecessary rhetoric and putting up trading barriers with them. To attempt to put tariffs even on services like movies where we already dominate and otherwise give the most important markets, like the EU, every reason in their minds to put up barriers to our tech companies and question our reliability as an ally. And simultaneously in the name of building alliances put the most valuable resources with unreliable partners like Malaysia, Saudi Arabia, Qatar and the UAE.

Indeed, we have now scrapped the old Biden ‘AI diffusion’ rule with no sign of its replacement, and where did David Sacks gloat about this? Saudi Arabia, of course. This is what ‘trusted partners’ means to them. Meanwhile, we are warning sternly against use of Huawei’s AI chips, ensuring China keeps all those chips itself. Our future depends on who has the compute, who ends up with the chips. We seem to instead think the future is determined by the revenue from chip manufacturing? Why would that be a priority? What do these people even think is going on?

To not only fail to robustly support and bring down regulatory and permitting barriers to the nuclear power we urgently need to support our data centers, but to actively wipe out the subsidies on which the nuclear industry depends, as the latest budget aims to do with remarkably little outcry via gutting the LPO and tax credits, while China of course ramps up its nuclear power plant construction efforts, no matter what the rhetoric on this might say. Then to use our inability to power the data centers as a reason to put our strategically vital data centers, again, in places like the UAE, because they can provide that power. What do you even call that?

To fail to let our AI companies have the ability to recruit the best and brightest, who want to come here and help make America great, instead throwing up more barriers and creating a climate of fear I’m hearing is turning many of the best people away.

And most of all, to say that the edge America must preserve, the ‘race’ that we must ‘win,’ is somehow the physical production of advanced AI chips. So, people say, in order to maintain our edge in chip production, we should give that edge entirely away right now, allowing those chips to be diverted to China, as would be inevitable in the places that are looking to buy where we seem most eager to enable sales. Nvidia even outright advocates that it should be allowed to sell to China openly, and no one in Washington seems to hold them accountable for this.

And we are doing all this while many perpetuate the myth that our AI efforts are not very solidly ahead of China in the places that matter most, or that China threatens to lock in the world’s customers, pointing to DeepSeek, which is impressive but still very clearly substantially behind our top labs, or to TikTok and Temu, while forgetting that the much bigger Amazon and Meta also exist.

Temu’s sales are less than a tenth of Amazon’s, and the rest of the world’s top four e-commerce websites are Shopify, Walmart.com and eBay. As worrisome as it is, TikTok is only the fourth largest social media app behind Facebook, YouTube and Instagram, and there aren’t signs of that changing. Imagine if that situation was reversed.

Earlier this week I did an extensive readthrough and analysis of the Senate AI Hearing.

Here, I will directly lay out my response to various claims by and cited by US AI Czar David Sacks about the AI Diffusion situation and the related topics discussed above.

  1. Some of What Is Being Incorrectly Claimed.

  2. Response to Eric Schmidt.

  3. China and the AI Missile Gap.

  4. To Preserve Your Tech Edge You Should Give Away Your Tech Edge.

  5. To Preserve Your Compute Edge You Should Sell Off Your Compute.

  6. Shouting From The Rooftops: The Central Points to Know.

  7. The Longer Explanations.

  8. The Least We Can Do.

There are multiple distinct forms of Obvious Nonsense to address, either as text or very directly implied, whoever you attribute the errors to:

David Sacks (US AI Czar): Writing in NYT, former Google CEO Eric Schmidt warns that “China Tech Is Starting to Pull Ahead”:

“China is at parity or pulling ahead of the United States in a variety of technologies, notably at the A.I. frontier. And it has developed a real edge in how it disseminates, commercializes and manufactures tech. History has shown us that those who adopt and diffuse a technology the fastest win.”

As he points out, diffusing a technology the fastest — and relatedly, I would add, building the largest partner ecosystem — are the keys to winning. Yet when Washington introduced an “AI Diffusion Rule”, it was almost 200 pages of regulation hindering adoption of American technology, even by close partners.

The Diffusion Rule is on its way out, but other regulations loom.

President Trump committed to rescind 10 regulations for every new regulation that is added.

If the U.S. doesn’t embrace this mentality with respect to AI, we will lose the AI race.

Sriram Krishnan: Something @DavidSacks and I and many others here have been emphasizing is the need to have broad partner ecosystems using American AI stack rather than onerous complicated regulations.

If the discussion was ‘a bunch of countries like Mexico, Poland and Portugal are in Tier 2 that should instead have been in Tier 1’ then I agree there are a number of countries that probably should have been Tier 1. And I agree that there might well be a simpler implementation waiting to be found.

And yet, why is it that in practice, these ‘broad partner ecosystems using American AI’ always seem to boil down to a handful of highly questionably allied and untrustworthy Gulf States with oil money trying to buy global influence, perhaps with a side of Malaysia and other places that are very obviously going to leak to China? David Sacks literally seems to think that if you do not literally put the data center in specifically China, then that keeps it in friendly hands and out of China’s grasp, and that we can count on our great friendships and permanent alliances with places like Saudi Arabia. Um, no. Why would you think that?

That Eric Schmidt editorial quoted above is a royal mess. For example, you have this complete non-sequitur.

Eric Schmidt and Selina Xu: History has shown us that those who adopt and diffuse a technology the fastest win.

So it’s no surprise that China has chosen to forcefully retaliate against America’s recent tariffs.

China forcefully retaliated against America’s tariffs for completely distinct reasons. The story Schmidt is trying to imply here doesn’t make any sense. His vibe reports are Just So Stories, not backed up at all by economic or other data.

‘By some benchmarks’ you can show pretty much anything, but I mean wow:

Eric Schmidt and Selina Xu: Yet, as with smartphones and electric vehicles, Silicon Valley failed to anticipate that China would find a way to swiftly develop a cheap yet state-of-the-art competitor. Today’s Chinese models are very close behind U.S. versions. In fact, DeepSeek’s March update to its V3 large language model is, by some benchmarks, the best nonreasoning model.

Look. No. Stop.

He then pivots to pointing out that there are other ‘tech’ areas where China is competitive, and goes into full scaremonger mode:

Apps for the Chinese online retailers Shein and Temu and the social media platforms RedNote and TikTok are already among the most downloaded globally. Combine this with the continuing popularity of China’s free open-source A.I. models, and it’s not hard to imagine teenagers worldwide hooked on Chinese apps and A.I. companions, with autonomous Chinese-made agents organizing our lives and businesses with services and products powered by Chinese models.

As I noted above, ‘American online retailers like Amazon and Shopify and the social media platforms Facebook and Instagram are already not only among but the most used globally.’

There is a stronger case one can make with physical manufacturing, when Eric then pivots to electric cars (and strangely focuses on Xiaomi over BYD) and industrial robotics.

Then, once again, he makes the insane ‘the person behind is giving away their inferior tech so we should give away our superior tech to them, that’ll show them’ argument:

We should learn from what China has done well. The United States needs to openly share more of its A.I. technologies and research, innovate even faster and double down on diffusing A.I. throughout the economy.

When you are ahead and you share your model, you give your rivals that model for free, killing your lead and your business for some sort of marketing win, and also you’re plausibly creating catastrophic risk. When you are behind, and you share it, sure, I mean why not.

In any case, he’s going to get his wish. OpenAI is going to release an open weight reasoning model, reducing America’s lead in order to send the clear message that yes we are ahead. Hope you all think it was worth it.

The good AI argument is that China is doing a better job in some ways of AI diffusion, of taking its AI capabilities and using them for mundane utility.

Similarly, I keep seeing forms of an argument that says:

  1. America’s export controls have given us an important advantage in compute.

  2. China’s companies have been slowed down by this, but have managed to stay only somewhat behind us in spite of it (largely because following is much easier).

  3. Therefore, we should lift the controls and give up our compute edge.

I’m sorry, what?

At lunch during Selina’s trip to China, when U.S. export controls were brought up, someone joked, “America should sanction our men’s soccer team, too, so they will do better.” So that they will do better.

It’s a hard truth to swallow, but Chinese tech has become better despite constraints, as Chinese entrepreneurs have found creative ways to do more with less. So it should be no surprise that the online response in China to American tariffs has been nationalistic and surprisingly optimistic: The public is hunkering down for a battle and thinks time is on Beijing’s side.

I don’t know why Eric keeps talking about the general tariffs or trade war with China here, or rather I do and it’s very obviously a conflation designed as a rhetorical trick. That’s a completely distinct issue, and I here take no position on that fight other than to note that our actions were not confined to China, and we very obviously shouldn’t be going after our trading partners and allies in these ways – including by Sacks’s logic.

The core proposal here is that, again:

  1. We gave China less to work with, put them at a disadvantage.

  2. They are managing to compete with us despite (his word) the disadvantage.

  3. Therefore we should take away their disadvantage.

It’s literal text. “America should sanction our men’s soccer team, too, so they will do better.” Should we also go break their legs? Would that help?

Then there’s a strange mix of ‘China is winning so we should become a centrally planned economy,’ mixed with ‘China is winning so we cannot afford to ever have any regulations on everything.’ Often both are coming from the same people. It’s weird.

So, shouting from the rooftops, once more with feeling for the people in the back:

  1. America is ahead of China in AI.

  2. Diffusion rules serve to protect America’s technological lead where it matters.

  3. UAE, Qatar and Saudi Arabia are not reliable American allies, nor are they important markets for our technology. We should not be handing them large shares of the world’s most valuable resource, compute.

  4. The exact diffusion rule is gone but something similar must take its place, to do otherwise would be how America ‘loses the AI race.’

  5. Not having any meaningful regulations at all on AI, or ‘building machines that are smarter and more capable than humans,’ is not a good idea, nor would it mean America would ‘lose the AI race.’

  6. AI is currently virtually unregulated as a distinct entity, so ‘repeal 10 regulations for every one you add’ is to not regulate at all building machines that are soon likely to be smarter and more capable than humans, or anything else either.

  7. ‘Winning the AI race’ is about racing to superintelligence. It is not about who gets to build the GPU. The reason to ‘win’ the ‘race’ is not market share in selling big tech solutions. It is especially not about who gets to sell others the AI chips.

  8. If we care about American dominance in global markets, including tech markets, stop talking about how what we need to do is not regulate AI, and start talking about the things that will actually help us, or at least stop doing the things that actively hurt us and could actually make us lose.

  1. American AI chips dominate and will continue to dominate. Our access to compute dominates, and will dominate if we enact and enforce strong export controls. American models dominate, we are in 1st, 2nd and 3rd with (in some order) OpenAI, Google and Anthropic. We are at least many months ahead.

    1. There was this one time DeepSeek put out an excellent reasoning model called r1 and an app for it.

    2. Through a confluence of circumstances (including misinterpretation of its true training costs, its making a good clean app where it showed its chain of thought, Google being terrible at marketing, it beating several other releases by a few weeks, OpenAI’s best models being behind paywalls, China ‘missile gap’ background fears, comparing only in the realms where r1 was relevant, acting as if only open models count, etc), this caught fire for a bit.

    3. But after a while it became clear that while r1 was a great achievement and indicated DeepSeek was a serious competitor, it was still even at their highest point 4-6 months behind, fundamentally it was a ‘fast follow’ achievement which is very different from taking a lead or keeping pace, and as training costs are scaled up it will be very difficult for DeepSeek to keep pace.

    4. That doesn’t mean DeepSeek doesn’t matter. Without DeepSeek the company, China would be much further behind than this.

    5. In response to this, a lot of jingoism and fearmongering akin to Kennedy’s ‘missile gap’ happened, which continues to this day.

    6. There are of course other tech and non-tech areas where China is competitive, such as Temu and TikTok in tech. But that’s very different.

    7. China does have advantages, especially its access to energy, and if they were allowed to access large amounts of compute that would be worrisome.

  2. The diffusion rules serve to protect America’s technological lead where it matters.

    1. America makes the best AI chips.

    2. The reason this matters is that it lets us be the ones who have those chips.

    3. America’s lead has many causes but one main cause is that we have far more and better compute, due to superior access to the best AI chips.

  3. Biden’s Diffusion Rule placed some countries in Tier 2 that could reasonably have and probably should have (based on what I know) been placed in Tier 1, or at worst a kind of Tier 1.5 with only mildly harsher supervision.

    1. If you want to move places like the remaining NATO members into Tier 1, or do something with a similar effect? That seems reasonable to me.

    2. However this very clearly does not include the very countries that we keep talking about allowing to build massive data centers with American AI chips, like the UAE, Saudi Arabia and Qatar.

    3. When there is talk of robust American allies and who will build and use our technology, somehow the talk is almost always about gulf states and other unreliable allies that are trying to turn their wealth into world influence.

    4. I leave why this might be so as an exercise to the reader.

    5. Even if such states do stay on our side, you had better believe they will use the leverage this brings to extract various other concessions from us.

    6. There is also very real concern that placing these resources in such locations would cause them to be misused by bad actors, including for terrorism, including via CBRN risks. It is foolish not to realize this.

    7. There is a conflation of selling other countries American AI chips and having them build AI data centers, with those countries using America’s AIs and other American tech company products. We should care mostly about them using our software products. The main reason to build AI data centers in other countries that are not our closest most trustworthy allies is if we are unable to build those data centers in America or in our closest most trustworthy allies, which mostly comes down to issues of permitting and power supply, which we could do a lot more to solve.

    8. If you’re going to say ‘the two are closely related we don’t want to piss off our allies’ right about now, I am going to be rather speechless given what else we have been up to lately including in trade, you cannot be serious right now. Or, if you want to actually get serious about this across the board, good, let’s talk.

    9. Is this a sign we don’t fully trust some of these countries? Yes. Yes it is.

  4. The exact diffusion rule is going away but something similar must and will take its place, to do otherwise would be how America ‘loses the AI race.’

    1. If China could effectively access the best AI chips, that would get rid of one of our biggest and most important advantages. Given their edge in energy, it could over time reverse that advantage.

    2. The point of trying to prevent China from improving its chip production is to prevent China from having the resulting compute. If we sell the chips to prevent this, then they already have the compute now. You lose.

    3. It is very clear that chips we export to ‘Tier 2’ countries that are buying suspiciously large quantities are often diverted to use by China, with the most obvious example being those sold to Malaysia.

    4. We should also worry about what happens to data centers built in places like Saudi Arabia or the UAE.

    5. I will believe the replacement rule will have the needed teeth when I see it.

    6. That doesn’t mean we can’t find a better, simpler implementation that protects American chips from falling into Chinese hands. But we need some diffusion rule that we can enforce, and that in practice actually prevents the Chinese from buying or getting access to our AI chips in quantity.

    7. Yes, if we sell our best AI chips to everyone freely, as Nvidia wants to do, or do it in ways that are effectively the same thing, then that helps protect Nvidia’s profits and market share, and by denying others markets we do gain some edge in the ability to maintain our dominance in making AI chips.

    8. But so what? All we do is make a little money on the AI chips, and China gets to catch up in actually having and using the AI chips, which is what matters. We’d be sacrificing the future on the altar of Nvidia’s stock price. This is the capitalist selling the rope with which to hang him. ‘Winning the race’ to an ordinary tech market is not what matters. If the only way to protect our lead for a little longer there is to give away the benefits of the lead, of what use was the lead?

    9. It also would make very little difference to either Nvidia or its Chinese competitors.

    10. Nvidia can still sell as many chips as it can produce, well above cost. All the chips Nvidia is not allowed to sell to China, even the crippled H20s, will happily be purchased in Western markets at profitable prices, if Nvidia allows it, giving America and its allies more compute and China less compute.

    11. I would be happy, if necessary, to have USG purchase any chips that Nvidia or AMD or anyone else is unable to sell due to diffusion rules. We would have many good uses for them, we can use them for public compute resources for universities and startups or whatever if the military doesn’t want them. The cost is peanuts relative to the stakes. (Disclosure, I am a shareholder of Nvidia, etc, but also I am writing this entire post).

    12. Demand in China for AI chips greatly outstrips supply. They have no need for export markets for their chips, and indeed we should be happy if they choose to export some of them rather than keeping them for domestic use.

    13. China already sees AI chip production as central to its future and national security. They are already pushing as hard as they dare.

  5. Not having any meaningful regulations at all on AI, or ‘building machines that are smarter and more capable than humans,’ is not a good idea, nor would it mean America would ‘lose the AI race.’

    1. This is not a strawman position. The House is trying to impose a 10-year moratorium on state and local enforcement of any laws whatsoever related to AI, even a potential law banning CSAM, without offering anything to replace that in any way, and Congress notoriously can’t pass laws these days. We also have the call to ‘repeal 10 regulations for every new one,’ which is again de facto a call for no regulations at all (see #6).

    2. Highly capable AI represents an existential risk to humanity.

    3. If we ‘win the race’ by simply going ahead as fast as possible, it’s not America that wins the future. The AIs win the future.

    4. I can’t go over all the arguments about that here, but seriously it should be utterly obvious that building more intelligent, capable, competitive, faster, cheaper minds and optimization engines, that can be freely copied and given whatever goals and tasks, is not a safe thing for humanity to do.

    5. I strongly believe it is far more dangerous than I made it sound there, for many, many reasons. I don’t have the space here to talk about why, but seriously, how do people claim this is a ‘safe’ action? What?

    6. Even if highly capable AI remains under our control, it is going to transform the world and all aspects of our civilization and way of life. The idea that we would not want to steer that at all seems rather crazy.

    7. Regulations do not need to ‘slow down’ AI in a meaningful way. Indeed, a total lack of meaningful regulations would slow down diffusion and practical use of AI, including for national security and core economic purposes, more than wise regulation, because no one is going to use AI they cannot trust.

    8. That goes to both people knowing that they can trust AI, and also to requiring the AIs be made trustworthy. Security is capability. We also need to protect our technology and intellectual property from theft if we want to keep a lead.

    9. A lack of such regulations would also mean falling back upon the unintended consequences of ordinary law as they then happen to apply to AI, which will often be extremely toxic for our ability to apply AI to the most valuable tasks.

    10. If we try to not regulate AI at all, the public will turn against AI. Americans already dislike AI, in a way the Chinese do not. We must build trust.

    11. China, like everyone else, already regulates AI. The idea that if we had a fraction of the regulations they do, or if we interfere with companies or the market a fraction of how much they constantly do so everywhere, that we suddenly ‘lose the race,’ is silly.

    12. We have a substantial lead in AI, despite many efforts to lose it that I discuss later. We are not in danger of ‘losing’ every time we breathe on the situation.

    13. Most of the regulations that are being pushed for are about transparency, often even transparency to the government, so we can know what the hell is going on, and so people can critique the safety and security plans of labs. They are about building state capacity to evaluate models, and using that, which actively benefits AI companies in various ways as discussed above.

    14. There are also real and important mundane harms to deal with now.

    15. Yes, if we were to impose highly onerous, no good, very bad regulations, in the style of the European Union, that would threaten our AI lead and be very bad. This is absolutely a real risk. But this type of accusation consistently gets levied against any bill attempting to do anything, anywhere, for any reason – or that someone is trying to ‘ban math’ or ‘kill AI’ or whatever. Usually this involves outright hallucinations about what is in the bill, or its consequences.

  6. AI is currently virtually unregulated as a distinct entity, so ‘repeal 10 regulations for every one you add’ is to not regulate at all building machines that are soon likely to be smarter and more capable than humans, or anything else either.

    1. There are many regulations that impact AI in various ways.

    2. Many of those regulations are worth repealing or reforming. For example, permitting reform on power plants and transmission lines. And there are various consequences of copyright, or of common law, that should be reconsidered for the AI age.

    3. What almost all these rules have in common is that they are not rules about AI. They are rules that are already in place in general, for other reasons. And again, I’d be happy to get rid of many of them, in general or for AI in particular.

    4. But yes, you are going to want to regulate AI, and not merely in the ‘light touch’ ways that are code words for doing nothing, or actively working to protect AI from existing laws.

    5. AI is soon going to be the central fact about the world. To suggest this level of non-intervention is not classical liberalism, it is anarchism.

    6. Anarchism does not tend to go well for the uncompetitive and disadvantaged, which in the future age of ASI would be the humans, and it fails to solve various important market failures, collective action and public goods problems and so on.

    7. The reason why a general hands-off approach has in the past tended to benefit humans, so long as you work to correct key market failures and solve particular collective action problems, is that humans are the most powerful optimization engines, and most intelligent and powerful minds, on the planet, and we have various helpful social dynamics and characteristics. All of that, and some other key underpinnings I could go into, often won’t apply to a future world with very powerful AI.

    8. If we don’t do sensible regulations now, while we can all navigate this calmly, it will get done after something goes wrong, and not calmly or wisely.

  7. ‘Winning the AI race’ is not about who gets to build the GPU. ‘Winning’ the ‘race’ is not important because of who gets market share in selling big tech solutions. It is especially not about who gets to sell others the AI chips. Winning the race is about the race to superintelligence.

    1. The major AI labs say we will likely reach AGI within Trump’s second term, with superintelligence (ASI) following soon thereafter. David Sacks himself endorses this view explicitly.

    2. ‘Winning the race’ to superintelligence is indeed very important. The way in which humanity reaches superintelligence (assuming we do reach it) will determine the future.

    3. That future might be anything from wonderful to worthless. It might or might not involve humanity surviving, or being in control over the future. It might or might not reflect different values, or be something we would find valuable.

    4. If we build a superintelligence before we know how to align it, meaning before we know how to get it to do what we want it to do, everyone dies, or at minimum we lose control over the future.

    5. If we build a superintelligence and know how to align it, but we don’t choose a good thing to align it to, meaning we don’t wisely choose how it will act, then the same thing happens. We die, or we lose control over the future.

    6. If we build a superintelligence and know how to align it, and align it in general to ‘whatever the local human tells it to do,’ even with restrictions on that, and give out copies, this results at best in gradual disempowerment of humanity and us losing control over the future and the future likely losing all value. This problem is hard.

    7. This is very different from a question like ‘who gets better market share for their AI products,’ whether that is hardware or software, and questions about things like commercial adaptation and lockin or tech stack usage or what not, as if AI was some ordinary technology.

    8. AI actually has remarkably little lock-in. You can mostly swap one model out for another at will if someone comes out with a better one. There’s no need to run a model that matches the particular AI chips you own, either. AI itself will be able to simplify the ‘migration’ process or any lock-in issues.

    9. It’s not that whose AI models people use doesn’t matter at all. But in a world in which we will soon reach superintelligence, it’s mostly about market share in the meantime to fund AI development.

    10. If we don’t soon reach superintelligence, then we’re dealing with a far more ‘ordinary’ technology, and yes we want market share, but it’s no longer an existentially important race, it won’t have dramatic lock-in effects, and getting to the better AI products first will still depend on us retaining our compute advantages as long as possible.

  8. If we care about American dominance in global markets, including tech markets, and especially if we care about winning the race to AGI and superintelligence and otherwise protecting American national security, stop talking about how what we need to do is not regulate AI, and start talking about the things that will actually help us, or at least stop doing the things that actively hurt us and could actually make us lose.

    1. Straight talk. While it’s not my primary focus because development of AGI and ASI is more important, I strongly agree that we want American tech, especially American software, being used as widely as possible, especially by allies, across as much of the tech stack as possible. Even more than that, I strongly want America to have the lead in frontier highly capable AI, including AGI and then ASI, in the ways that determine the future.

    2. If we want to do that, what is most important to accomplishing this?

    3. We need allies to work with us and use our tech. Everyone says this. That means we need to have allies! That means working with them, building trust. Make them want to build on our tech stacks, and buy our products.

    4. That also means not imposing tariffs on them, or making them lose trust in us and our technology. Various recent actions have made our allies lose trust, in ways that are causing them to be less trusting of American tech stacks. And when we go to trade wars with them, you know what our main exports are that they will go after? Things like AI.

    5. It also means focusing most on our most important and trustworthy allies that have the most important markets. That means places like our NATO allies, Japan, South Korea and Australia, not Saudi Arabia, Qatar and the UAE. Those latter markets don’t matter zero, but they are relatively tiny.

    6. Yes, avoiding hypothetical sufficiently onerous regulation on AI directly, and I will absolutely be keeping an eye out for this. Most of the regulatory and legal barriers that matter lie elsewhere.

    7. The key barriers are in the world of atoms, not the world of bits.

    8. Energy generation and transmission, permitting reform.

    9. High-skilled immigration, letting talent come to America.

    10. Education reform so AI helps teach rather than helping students cheat.

    11. Want reshoring? Repeal the Jones Act so we can transport the resulting goods. Automate the ports. Allow self-driving cars and trucks broadly. And so on.

    12. Regulations that prevent the application of AI to high value sectors, or otherwise hold back America. Broad versions of YIMBY for housing. Occupational licensing. FDA requirements. The list goes on. Unleash the abundance agenda, it mostly lines up with what AI needs. It’s time to build.

    13. Dealing with various implications of other laws that often were crazy already and definitely don’t make sense in an AI world.

    14. The list goes on.

Or, as Derek Thompson put it:

Derek Thompson: Trump’s new AI directive (quoted below from David Sacks) argues the US should take care to:

– respect our trading partners/allies rather than punish them with dumb rules that restrict trade

– respect “due process”

It’d be interesting to apply these values outside of AI!

Jordan Schneider: It’s an NVDA press release. Just absurd.

David Sacks continues to beat the drum that the diffusion rule ‘undermines the goal of winning the AI race,’ as if the AI race is about Nvidia’s market share. It isn’t.

Do we want to avoid allocation of resources by governmental decision, overreach of executive branch authorities to restrict trade, alienation of US allies, and lack of due process, Sacks’s key points here? Yeah, those generally sound like good ideas.

To that end, yes, I do believe we can improve on Biden’s proposed diffusion rules, especially when it comes to US allies that we can trust. I like the idea that we should impose fewer trade restrictions on these friendly countries, so long as we can ensure that the chips don’t effectively fall into the wrong hands. We can certainly talk price.

Alas, in practice, it seems like the actual plans are to sell massive amounts of AI chips to places like UAE, Saudi Arabia and Malaysia. Those aren’t trustworthy American allies. Those are places with close China ties. We all know what those sales really mean, and where they could easily be going. And those are chips we could have kept in more trustworthy and friendly hands, that are eager to buy them, especially if they have help facilitating putting those chips to good use.

The policy conversations I would like to be having would focus not only on how to best supercharge American AI and the American economy, but also on how to retain humanity’s ability to steer the future and ensure AI doesn’t take control, kill everyone or otherwise wipe out all value. And ideally, to invest enough in AI alignment, security, transparency and reliability that there would start to be a meaningful tradeoff where going safer would also mean going slower.

Alas. We are massively underinvesting in reliability and alignment and security purely from a practical utility perspective, and we are not even having that discussion.

Instead we are having a discussion about how, even if your only goal is ‘America must beat China and let the rest handle itself,’ we should stop shooting ourselves in the foot on that basis alone.

The very least we can do is not shoot ourselves in the foot, and not sell out our future for a little bit of corporate market share or some amount of oil money.

Max pivots back to HBO Max as WBD rethinks ability to compete with Netflix

Today, Zaslav and company are doing an about-face, with the CEO saying that WBD is “bringing back HBO, the brand that represents the highest quality in media, to further accelerate” the streaming service’s “growth in the years ahead.”

WBD’s announcement added that “returning the HBO brand into HBO Max will further drive the service forward and amplify the uniqueness that subscribers can expect from the offering.”

“It is also a testament to WBD’s willingness to keep boldly iterating its strategy and approach—leaning heavily on consumer data and insights—to best position itself for success,” the media conglomerate claimed.

“Not everything for everyone”

The announcement is a result of WBD rethinking its streaming strategy as leadership acknowledges that it failed to sell Max as an essential streaming service.

Last month, executives admitted that Max is viewed as more of an add-on service, per The Wall Street Journal (WSJ). Executives said at the time that they no longer want to try to push their streaming service as something for every member of the household.

“What people want from us in a world where they’ve got Netflix and Amazon [Prime Video] are those things that differentiate us,” Casey Bloys, chairman and CEO of HBO and Max content, told WSJ.

The strategy pivot since has included moving further away from children’s programming and some Discovery content, like shows from the Food Network and HGTV. There have also been reports of WBD exploring splitting Discovery from Max.

“We’re not fighting for the more-is-better game,” JB Perrette, WBD’s streaming president and CEO, told WSJ. “We’ll let others deal with the volume.”

In today’s announcement, Perrette doubled down on those sentiments:

We will continue to focus on what makes us unique—not everything for everyone in a household, but something distinct and great for adults and families. It’s really not subjective, not even controversial—our programming just hits different.

US warns companies around the world to stay away from Huawei chips

President Donald Trump’s administration has taken a tougher stance on Chinese technology advances, warning companies around the world that using artificial intelligence chips made by Huawei could trigger criminal penalties for violating US export controls.

The commerce department issued guidance to clarify that Huawei’s Ascend processors were subject to export controls because they almost certainly contained, or were made with, US technology.

Its Bureau of Industry and Security, which oversees export controls, said on Tuesday it was taking a more stringent approach to foreign AI chips, including “issuing guidance that using Huawei Ascend chips anywhere in the world violates US export controls.”

But people familiar with the matter stressed that the bureau had not issued a new rule, but was making it clear to companies that Huawei chips are likely to have violated a measure that requires hard-to-get licenses to export US technology to the Chinese company.

“The guidance is not a new control, but rather a public confirmation of an interpretation that even the mere use anywhere by anyone of a Huawei-designed advanced computing [integrated circuit] would violate export control rules,” said Kevin Wolf, a veteran export control lawyer at Akin Gump.

The bureau said three Huawei Ascend chips—the 910B, 910C, and 910D—were subject to the regulations, noting that such chips are likely to have been “designed with certain US software or technology or produced with semiconductor manufacturing equipment that is the direct product of certain US-origin software or technology, or both.”

The guidance comes as the US has become increasingly concerned at the speed at which Huawei has developed advanced chips and other AI hardware.

Huawei has begun delivering AI chip “clusters” to clients in China that it claims outperform leading US AI chipmaker Nvidia’s comparable product on key metrics such as total compute and memory. The system relies on a large number of 910C chips, which individually fall short of Nvidia’s most advanced offering but collectively deliver superior performance to a rival Nvidia cluster product.

Microsoft shares its process (and discarded ideas) for redone Windows 11 Start menu

Microsoft put a lot of focus on Windows 11’s design when it released the operating system in 2021, making a clean break with the design language of Windows 10 (which had, itself, simply tweaked and adapted Windows 8’s design language from 2012). Since then, Microsoft has continued to modify the software’s design in bits and pieces, both for individual apps and for foundational UI elements like the Taskbar, system tray, and Windows Explorer.

Microsoft is currently testing a redesigned version of the Windows 11 Start menu, one that reuses most of the familiar elements from the current design but reorganizes them and gives users a few additional customization options. On its Microsoft Design blog today, the company walked through the new design and showed some of the ideas that were tried and discarded in the process.

This discarded Start menu design toyed with an almost Windows XP-ish left-hand sidebar, among other elements. Credit: Microsoft

Microsoft says it tested its menu designs with “over 300 Windows 11 fans” in unmoderated studies, “and dozens more” in “live co-creation calls.” These testers’ behavior and reactions informed what Microsoft kept and what it discarded.

Many of the discarded menu ideas include larger previews for recently opened files, more space given to calendar reminders, and recommended “For You” content areas; one has a “create” button that would presumably activate some generative AI feature. Looking at the discarded designs, it’s easier to appreciate that Microsoft went with a somewhat more restrained redesign of the Start menu that remixes existing elements rather than dramatically reimagining it.

Microsoft has also tweaked the side menu that’s available when you have a phone paired to your PC, making it toggleable via a button in the upper-right corner. That area is used to display recent texts and calls and other phone notifications, recent contacts, and battery information, among a couple other things.

Microsoft’s team wanted to make sure the new menu “felt like it belonged on both a [10.5-inch] Surface Go and a 49-inch ultrawide,” a nod to the variety of hardware Microsoft needs to consider when making any design changes to Windows. The menu the team landed on is essentially what has been visible in Windows Insider Preview builds for a month or so now: two rows of pinned icons, a “Recommended” section with recently installed apps, recently opened files, a (sigh) Windows Store app that Microsoft thinks you should try, and a few different ways to access all the apps on your PC. By default, these will be arranged by category, though you can also view a hierarchical alphabetized list like you can in the current Start menu; the big difference is that this view is at the top level of the Start menu in the new version, rather than being tucked away behind a button.

For more on the history of the Start menu from its inception in the early ’90s through the release of Windows 10, we’ve collected tons of screenshots and other reminiscences here.

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

Dutch scientists built a brainless soft robot that runs on air 

Most robots rely on complex control systems, AI-powered or otherwise, that govern their movement. These centralized electronic brains need time to react to changes in their environment and produce movements that are often awkwardly, well, robotic.

It doesn’t have to be that way. A team of Dutch scientists at the FOM Institute for Molecular and Atomic Physics (AMOLF) in Amsterdam built a new kind of robot that can run, go over obstacles, and even swim, all driven only by the flow of air. And it does all that with no brain at all.

Sky-dancing physics

“I was in a lab, working on another project, and had to bend a tube to stop air from going through it. The tube started oscillating at very high frequency, making a very loud noise,” says Alberto Comoretto, a roboticist at AMOLF and lead author of the study. To see what was going on with the tube, Comoretto set up a high-speed camera and recorded the movement. He found that the movement resulted from the interplay between the air pressure inside the tube and the state of the tube itself.

When there was a kink in the tube, the increasing pressure pushed that kink along the tube’s length. That caused the pressure to decrease, which enabled a new kink to appear and the cycle to repeat. “We were super excited because we saw this self-sustaining, periodic, asymmetric motion,” Comoretto told Ars.

The first reason for Comoretto’s excitement was that the flapping tube in his lab was driven by the kind of airflow physics that Peter Marshall, Doron Gazit, and Aireh Dranger harnessed to build their famous dancing “Fly Guys” for the Olympic Games in Atlanta in 1996. The second reason was that the asymmetry and periodicity he saw in the tube’s movement pattern were also present in the way all living things moved, from single-celled organisms to humans.

A new era in cancer therapies is at hand


New therapeutic strategies build on the success of immunotherapy.

In 2012, clinicians at the Children’s Hospital of Philadelphia treated Emily Whitehead, a 6-year-old with leukemia, with altered immune cells from her own body. At the time, the treatment was experimental, but it worked: The cells targeted the cancer and eradicated it. Thirteen years later, Whitehead is still cancer-free.

The modified cells, called CAR-T cells, are a form of immunotherapy, where doctors change parts of the immune system into cancer-attacking instruments. About five years after Whitehead’s treatment, the first CAR-T drugs were approved by the FDA and were heralded, along with immunotherapy more broadly, as one of the most promising modern cancer treatments. Today, there are seven FDA-approved CAR-T therapies, including the one used to treat Whitehead.

Since then, however, studies have linked CAR-T to fatal complications due to treatment toxicity, and the treatment has had a harder time addressing certain types of cancers, particularly solid tumors affecting the breast and pancreas, although some small clinical trials have been starting to show positive results for solid cancers. “After a decade, a decade and a half, we arrive at the point that there are patients who answer, most of the patients still do not answer,” said George Calin, a researcher at University of Texas MD Anderson Cancer Center.

Now experts say that new therapies are beginning to surpass challenges that previous treatments couldn’t, providing safer, more targeted delivery directly to tumors. These include drugs that contain radioactive substances, called radiopharmaceuticals, which are used to diagnose or treat cancer; medications that can influence the genes that spur or suppress tumor growth; and therapeutic cancer vaccines.

These approaches have shown promise in the lab, and researchers and companies are now conducting various stages of human clinical trials to explore their effectiveness. And some promising treatments have even gained approval by the Food and Drug Administration. The hope is that improving on these strategies will ultimately help treat even the most resistant types of cancer.

Despite researchers’ excitement for innovative treatments, there is rampant online misinformation and there are occasions in which companies have been found to tout and sell fake cures, said Kathrin Dvir, an oncologist and researcher at Moffitt Cancer Center.

But other scientists remain optimistic about the future of cancer research, Calin said: “All the time in science, you have to open the door with something new.”

Targeting is tough

Historically, one of the biggest challenges in cancer treatments has been the lack of specific targets. The typical standards of care — chemotherapy and radiation — kill off not only cancer cells, but also healthy ones. (This is one reason why cancer patients on these treatments experience hair loss, nausea, and other symptoms.) In recent years, scientists have thus aimed to develop therapies that only attack cancer cells, leaving the rest of the body unharmed.

One way to achieve this is through more precise targeting of the tumor. In one of these approaches, drugs act as a ferry, delivering radioactive molecules directly to the cancer. They do this by targeting proteins that are only present on the surface of specific tumors.

Take, for example, prostate cancer. Here, the cancerous cells are sensitive to radiation, so some researchers are working on drugs containing unstable chemical elements that emit radiation — radioactive isotopes, or radiopharmaceuticals — to facilitate imaging of the tumors and provide enough radiation to treat them.

Already, the field of radiopharmaceuticals has seen growth following successes like the brand name drugs Pluvicto for prostate cancer and Lutathera for neuroendocrine tumors, which reportedly offer improved quality of life compared to traditional treatments. Additionally, using radioisotopes for imaging could also allow researchers to diagnose and classify patients much better to provide personalized care, said Jason Lewis, a radiochemist at Memorial Sloan Kettering Cancer Center. And while radiopharmaceutical therapy can have side effects, he added, it’s “designed to minimize radiation to healthy tissues.”

Other therapies, called antibody-drug conjugates, act similarly: They shuttle molecules that can kill the cancer cells via antibodies that can dock on tumors. About a dozen such drugs have been approved by the FDA for various types of cancer.

There are also new vaccines to help the immune system ward off cancer, using the key approach behind a type of COVID-19 vaccine — mRNA technology. For example, one of the companies that developed one of the COVID-19 shots, BioNTech, is working on a vaccine called BNT116 designed to elicit immune reactions to treat a type of lung cancer, which is currently recruiting about 150 participants across the world to undergo safety testing.

mRNA therapeutic vaccines for cancer, which use messenger RNA as blueprint material so the body can create proteins that are unique to the tumor to help elicit an immune response, may offer several advantages. The shots can be personalized, for instance, to the patients’ own tumors, said Siow Ming Lee, an oncologist at University College London Hospitals and one of the lead researchers of the trial. Other vaccines are also in the works. “We are in this sort of new era now,” he said.

Another type of genetic molecule could also be a target to help treat cancer. Some RNAs, called microRNAs, can act on genes that are responsible for tumor growth. Researchers like Calin are developing small molecules that bind to cancer-related microRNAs, to turn them off and try to halt the disease’s spread.

With FDA approvals, human clinical trials underway, and promising preclinical data for many of these therapies, the researchers who spoke to Undark said that the future appears bright. “We’re not just seeing these dramatic improvements in outcomes and survival for patients with some indications, but the quality of life,” Lewis said.

New approaches, new problems

As more of these latest cancer technologies do get approved for treatment, new approaches can bring new problems, experts say. For example, with radiotherapeutics, one big challenge is to source enough radioisotopes for the drugs, and have a specialized workforce to handle radioactivity, said Lewis. For microRNAs, it’s tricky to identify exactly which type to target for a particular cancer, Calin emphasized.

And there are also companies that are trying to capitalize on new, unproven technologies and drugs prematurely. The company ExThera Medical, for instance, has been charging patients tens of thousands of dollars for unproven therapies, according to a recent report by The New York Times.

“All over the world, there are many so-called new therapeutics that are not well-tested and not well-developed,” said Calin. Dvir encounters misinformation at her clinic almost daily, she said. “Maybe some of those have some data in the preclinical, in animal studies — it doesn’t mean that it works on the human because we need data before you expose people to those therapies.”

Although the FDA faces budget cuts, some of the researchers and clinicians that Undark spoke to insist that the agency will weed out bad science. If not, the clinicians that Undark spoke with said that they can also help guide patients toward evidence-based treatments.

Ultimately, researchers want to continue to improve these treatments to see if they might work in tandem. “I think the name of the game in the next five to 10 years is combinations,” said Dvir. Already, there are trials looking at precisely how using different approaches together might boost their ability to treat cancer, she adds. “We know that these drugs work in synergy. It’s just finding the right combination that is effective but not too toxic.”

This article was originally published on Undark. Read the original article.

DOGE software engineer’s computer infected by info-stealing malware

Login credentials belonging to an employee at both the Cybersecurity and Infrastructure Security Agency and the Department of Government Efficiency have appeared in multiple public leaks from info-stealer malware, a strong indication that devices belonging to him have been hacked in recent years.

Kyle Schutt is a 30-something-year-old software engineer who, according to Dropsite News, gained access in February to a “core financial management system” belonging to the Federal Emergency Management Agency. As an employee of DOGE, Schutt accessed FEMA’s proprietary software for managing both disaster and non-disaster funding grants. Under his role at CISA, he likely is privy to sensitive information regarding the security of civilian federal government networks and critical infrastructure throughout the US.

A steady stream of published credentials

According to journalist Micah Lee, user names and passwords for logging in to various accounts belonging to Schutt have been published at least four times since 2023 in logs from stealer malware. Stealer malware typically infects devices through trojanized apps, phishing, or software exploits. Besides pilfering login credentials, stealers can also log all keystrokes and capture or record screen output. The data is then sent to the attacker and, occasionally after that, can make its way into public credential dumps.

“I have no way of knowing exactly when Schutt’s computer was hacked, or how many times,” Lee wrote. “I don’t know nearly enough about the origins of these stealer log datasets. He might have gotten hacked years ago and the stealer log datasets were just published recently. But he also might have gotten hacked within the last few months.”

Lee went on to say that credentials belonging to a Gmail account known to belong to Schutt have appeared in 51 data breaches and five pastes tracked by breach notification service Have I Been Pwned. Among the breaches that supplied the credentials are one from 2013 that pilfered password data for 3 million Adobe account holders, a 2016 breach that stole credentials for 164 million LinkedIn users, a 2020 breach affecting 167 million users of Gravatar, and a breach last year of the conservative news site The Post Millennial.

DOGE software engineer’s computer infected by info-stealing malware Read More »

celsius-founder-alex-mashinsky-sentenced-to-12-years-for-“unbank-yourself”-scam

Celsius founder Alex Mashinsky sentenced to 12 years for “unbank yourself” scam

As the case dragged on, Mashinsky and his family appeared unremorseful, victims said, even while facing threats of violence and significant public shaming. Some victims accused Mashinsky of lying to their faces and pushing them to continue depositing funds even when the end was near and he knew that the money would be lost.

In victim statements sent to US District Judge John Koeltl, customers accused Mashinsky of weaponizing his family-man brand to scam many naïve investors out of their life savings. Some suicides were reported, victims said, and elderly victims were among the most vulnerable, with many becoming homeless after retirement funds were drained. Among the victims was Rien Vanmarcke, who confessed to feeling haunted by guilt after convincing his aging mother to invest in Celsius and losing the majority of their savings.

And “Mashinsky’s cruelty didn’t end with the collapse,” Vanmarcke wrote. “His family mocked victims with ‘unbankrupt yourself’ merchandise funded by stolen savings, while flaunting luxury lifestyles online.”

Other victims also described feeling palpable shame, even if they felt their road to recovery wasn’t as bad as others’. One victim, Daniel Frishberg, was still in high school when he lost 70 percent of his crypto to Mashinsky’s false promises.

“I am lucky that I am young and have plenty of time to make back the money I lost due to naively trusting Mr. Mashinsky—many are not as fortunate,” Frishberg wrote.

Celsius founder Alex Mashinsky sentenced to 12 years for “unbank yourself” scam Read More »

europe-launches-program-to-lure-scientists-away-from-the-us

Europe launches program to lure scientists away from the US

At the same time, international interest in working in the United States has declined significantly. During the first quarter of the year, applications from scientists from Canada, China, and Europe to US research centers fell by 13 percent, 39 percent, and 41 percent, respectively.

Against this backdrop, European institutions have intensified their efforts to attract US talent. Aix-Marseille University, in France, recently launched A Safe Place for Science, a program aimed at hosting US researchers dismissed, censored, or limited by Trump’s policies. This project is backed with an investment of approximately €15 million.

Along the same lines, the Max Planck Society in Germany has announced the creation of the Max Planck Transatlantic Program, whose purpose is to establish joint research centers with US institutions. “Outstanding investigators who have to leave the US, we will consider for director positions,” the society’s director Patrick Cramer said in a speech discussing the program.

Spain seeks a leading role

Juan Cruz Cigudosa, Spain’s secretary of state for science, innovation, and universities, has stressed that Spain is also actively involved in attracting global scientific talent, and is prioritizing areas such as quantum biotechnology, artificial intelligence, advanced materials, and semiconductors, as well as anything that strengthens the country’s technological sovereignty.

To achieve this, the government of Pedro Sánchez has strengthened existing programs. The ATRAE program—which aims to entice established researchers into bringing their work to Spain—has been reinforced with €45 million to recruit scientists who are leaders in strategic fields, with a special focus on US experts who feel “looked down upon.” This program is offering additional funding of €200,000 per project to those selected from the United States.

Similarly, the Ramón y Cajal program—created 25 years ago to further the careers of young scientists—has increased its funding by 150 percent since 2018, allowing for 500 researchers to be funded per year, of which 30 percent are foreigners.

“We are going to intensify efforts to attract talent from the United States. We want them to come to do the best science possible, free of ideological restrictions. Scientific and technological knowledge make us a better country, because it generates shared prosperity and a vision of the future,” said Cigudosa in a statement to the Spanish international news agency EFE after the announcement of the Choose Europe for Science program.

This story originally appeared on WIRED en Español and has been translated from Spanish.

Europe launches program to lure scientists away from the US Read More »

a-star-has-been-destroyed-by-a-wandering-supermassive-black-hole

A star has been destroyed by a wandering supermassive black hole

But note the phrasing there: “in most cases” and “eventually.” Even in the cases where a merger takes place, the process is slow, potentially taking millions or even billions of years. As a result, a large galaxy might have as many as 100 extremely large black holes wandering about, with about 10 of them having masses of over 10⁶ times that of the Sun. And the galaxy that AT2024tvd resides in is very large.

One consequence of all these black holes wandering about is that not all of them will end up merging. If two of them approach the central black hole at the same time, then it’s possible for gravitational interactions to eject the smallest of them at nearly the velocity needed to escape the galaxy entirely. As a result, for millions of years afterwards, these supermassive black holes may be found at quite a distance from the galaxy’s core.

At the moment, it’s not possible to tell which of these explanations accounts for AT2024tvd’s location. The galaxy it’s in doesn’t seem to have undergone a recent merger, but there is the potential for this to be a straggler from a far earlier merger.

It’s notable that all of the galaxies where we’ve seen an off-center tidal disruption event are very large. The paper that describes AT2024tvd suggests this is no accident: larger galaxies mean more mergers in the past, and thus more supermassive black holes floating around the interior. They also suggest that off-center events will be the only ones we see in large galaxies. That’s because larger galaxies will have larger supermassive black holes at their center. And, once a supermassive black hole gets big enough, its event horizon is so far out that stars can pass through it before they get disrupted, and all the energetic release would take place inside the black hole.

Presumably, if you were close enough to see this happen, the star would just fade out of existence.
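To put a number on that cutoff, here is a back-of-the-envelope sketch (not from the paper) that compares the Newtonian tidal-disruption radius of a Sun-like star to the Schwarzschild radius of the black hole; the crossover it prints, roughly 10⁸ solar masses, is the standard order-of-magnitude estimate for when a star gets swallowed whole instead of visibly torn apart.

```python
# Back-of-the-envelope check: above what black hole mass does a Sun-like star
# cross the event horizon before it is tidally disrupted?
# Newtonian tidal radius: r_t ~ R_star * (M_bh / M_star)**(1/3)
# Schwarzschild radius:   r_s = 2 * G * M_bh / c**2
# Setting r_t = r_s and solving for M_bh gives the crossover mass.

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8          # speed of light, m/s
M_sun = 1.989e30     # kg
R_sun = 6.957e8      # m

def crossover_mass(m_star=M_sun, r_star=R_sun):
    """Black hole mass above which a star of mass m_star and radius r_star
    is swallowed whole rather than visibly disrupted (Newtonian estimate)."""
    return (c**2 * r_star / (2 * G)) ** 1.5 / m_star**0.5

print(f"{crossover_mass() / M_sun:.2e} solar masses")  # ~1e8 for a Sun-like star
```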

arXiv abstract: 2502.17661. To be published in The Astrophysical Journal Letters.

A star has been destroyed by a wandering supermassive black hole Read More »

rocket-report:-rocket-lab-to-demo-cargo-delivery;-america’s-new-icbm-in-trouble

Rocket Report: Rocket Lab to demo cargo delivery; America’s new ICBM in trouble


SpaceX’s plan to turn Starbase into Texas’ newest city won the approval of voters—err, employees.

A decommissioned Titan II intercontinental ballistic missile inside a silo at a museum in Green Valley, Arizona.

Welcome to Edition 7.43 of the Rocket Report! There’s been a lot of recent news in hypersonic testing, and we cover some of it in this week’s newsletter. It is just a taste of the US military’s appetite for fielding its own hypersonic weapons and, conversely, of the Pentagon’s emphasis on detecting and destroying an enemy’s hypersonic missiles. China has already declared its first hypersonic weapons operational, and Russia claims to have them, too. Now, the Pentagon is finally close to placing hypersonic missiles with combat units. Many US rocket companies believe the hypersonics sector is a lucrative business. Some companies have enough confidence in this emerging market—or lack of faith in the traditional space launch market—to pivot entirely toward hypersonics. I’m interested in seeing if their bets pay off.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Stratolaunch tests reusable hypersonic rocket plane. Stratolaunch has finally found a use for the world’s largest airplane. Twice in the last five months, the company launched a hypersonic vehicle over the Pacific Ocean and accelerated it to more than five times the speed of sound, after which the vehicle autonomously landed at Vandenberg Space Force Base in California, Ars reports. Stratolaunch used the same Talon-A vehicle for both flights, demonstrating its reusability, a characteristic that sets it apart from competitors. Zachary Krevor, Stratolaunch’s president and CEO, said his team aims to ramp up to monthly flights by the end of the year.

A 21st century X-15 … This is the first time anyone in the United States has flown a reusable hypersonic rocket plane since the last flight of the X-15, the iconic rocket-powered aircraft that pushed the envelope of high-altitude, high-speed flight 60 years ago. Like the Talon-A, the X-15 released from a carrier jet and ignited a rocket engine to soar into the uppermost layers of the atmosphere. But the X-15 had a pilot in command, while the Talon-A flies on autopilot. Stratolaunch is one of several companies participating in a US military program to test parts and technologies for use on future hypersonic weapons. “Why the autonomous flight matters is because hypersonic systems are now pushing the envelope in terms of maneuvering capability, maneuvering beyond what can be done by the human body,” Krevor said.

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.

Sign Me Up!

New details about another recent hypersonic test. A hypersonic missile test on April 25 validated the launch mechanism for the US Navy Conventional Prompt Strike (CPS) weapon program, the Defense Department said on May 2. The CPS missile, the Navy’s name for what the US Army calls the Long Range Hypersonic Weapon (LRHW), launched from Cape Canaveral Space Force Station, Florida, Aviation Week & Space Technology reports. While the Army and Navy versions use the same hypersonic glide vehicle and missile, they use different launch mechanisms. Last year, the Army tested its version of the hypersonic missile launcher. Now, the Navy has validated the cold-gas launch mechanism it will install on guided missile destroyers.

Deploying soon … “The cold-gas approach allows the Navy to eject the missile from the platform and achieve a safe distance above the ship prior to first stage ignition,” said Vice Adm. Johnny R. Wolfe Jr., director of the Navy’s Strategic Systems Programs, which is the lead designer of the common hypersonic missile. The Army plans to field its Long Range Hypersonic Weapon, also called “Dark Eagle,” with a combat unit later this year, while the Navy’s version won’t be ready for testing at sea until 2027 or 2028. Both missiles are designed for conventional (non-nuclear) strikes. The Army’s Dark Eagle will be the US military’s first operational hypersonic weapon.

Sentinel needs new silos. The Air Force will have to dig entirely new nuclear missile silos for the LGM-35A Sentinel, creating another complication for a troubled program that is already facing future cost and schedule overruns, Defense News reports. The Air Force originally hoped the existing silos that have housed Minuteman III intercontinental ballistic missiles could be adapted to launch Sentinel missiles, which would be more efficient than digging entirely new silos. But a test project at Vandenberg Space Force Base in California showed that approach would be fraught with further problems and cause the program to run even further behind and over budget, the service said.

Rising costs … Sentinel, developed by Northrop Grumman, will replace the Air Force’s fleet of Minuteman III ICBMs, which entered service in 1970, as the land-based leg of the military’s nuclear triad. Sentinel was originally expected to cost $77.7 billion, but projected future costs ran so severely over budget that in January 2024, the program triggered a review process known as a critical Nunn-McCurdy breach. After that review, the Pentagon last year concluded Sentinel was too critical to national security to abandon, but ordered the Air Force to restructure it to bring its costs under control. Additional studies of the program are highlighting more potential problems.

Gilmour says it (hopefully) will wait no more. The Australian launch startup Gilmour Space Technologies has been given approval by Australia’s Civil Aviation Safety Authority for the debut launch of its Eris orbital rocket, InnovationAus.com reports. There is still one final regulatory hurdle, a final sign-off from the Australian Space Agency. If that happens in the next few days, Gilmour’s launch window will open May 15. The company has announced tentative launch schedules before, only to be thwarted by technical issues, regulatory hangups, or bad weather. Most recently, Gilmour got within six days of its targeted launch date in March before regulatory queries and the impact of a tropical cyclone forced a delay.

Stand by for history … The launch of Gilmour’s three-stage Eris rocket will be historic. If successful, the 82-foot-tall (25-meter) rocket will be Australia’s first homegrown orbital launcher. Eris is capable of hauling payloads of up to 672 pounds (305 kilograms) to orbit, according to Gilmour. The company has dispatched a small team from its Gold Coast headquarters to the launch site in Queensland, on Australia’s northeastern coast, to perform testing on the vehicle after it remained dormant for weeks. (Submitted by trainticket)

Fresh insights into one of SpaceX’s worst days. When a Falcon 9 rocket exploded on its launch pad nearly nine years ago, SpaceX officials initially struggled to explain how it could have happened. The lack of a concrete explanation for the failure led SpaceX engineers to pursue hundreds of theories. One was the possibility that an outside “sniper” had shot the rocket. This theory appealed to SpaceX founder Elon Musk. A building leased by SpaceX’s main competitor in launch, United Launch Alliance, lay just a mile away from the Falcon 9 launch pad, and a video around the time of the explosion indicated a flash on its roof. Ars has now obtained a letter sent to SpaceX by the Federal Aviation Administration more than a month after the explosion, indicating the matter was elevated to the FBI. The bureau looked into it, and what did they find? Nothing, apparently.

Investigation terminated … “The FBI has informed us that based upon a thorough and coordinated review by the appropriate Federal criminal and security investigative authorities, there were no indications to suggest that sabotage or any other criminal activity played a role in the September 1 Falcon 9 explosion,” an FAA official wrote in the letter to SpaceX. Ultimately, engineers determined the explosion was caused by the sudden failure of a high-pressure helium tank on the Falcon 9’s upper stage.

Eric Schmidt’s motivations become clearer. In the nearly two months since former Google chief executive Eric Schmidt acquired Relativity Space, the billionaire has not said much publicly about his plans for the launch company. However, his intentions for Relativity are becoming increasingly clear: He wants to have the capability to launch a significant amount of computing infrastructure into space, Ars reports. During a congressional hearing last month, Schmidt discussed the need for more electricity to power the data centers that will facilitate the computing needs of AI development and applications.

How big this crisis is … “People are planning 10 gigawatt data centers,” Schmidt said at the hearing. “Gives you a sense of how big this crisis is.” In an exchange with my colleague Eric Berger on X, Schmidt seemed to confirm he bought Relativity Space as a means to support the development of data centers in space. Such data centers, ideally, would be powered by solar panels and be able to radiate heat into the vacuum of space. Relativity’s Terran R rocket, still in development, is well-sized to play a role in launching the infrastructure for data centers in space. But several big questions remain: How big would these data centers be? Where would they go within an increasingly cluttered low-Earth orbit? Could space-based solar power meet their energy needs? Can all of this heat be radiated away efficiently in space? Economically, would any of this make sense?

Rocket Lab, meet Rocket Cargo. Rocket Lab’s next-generation Neutron rocket has been selected for an experimental US Air Force mission to test rapid, global, cargo-delivery capabilities, a milestone for the company as it pushes further into the national security launch market, Space News reports. The mission, slated for no earlier than 2026, will fall under the Air Force Research Laboratory’s (AFRL) “Rocket Cargo” program, which explores how commercial launch vehicles might one day deliver materiel to any point on Earth within hours—a vision akin to airlift logistics via spaceflight.

A new mission for Neutron … Peter Beck, Rocket Lab’s founder and CEO, said the Rocket Cargo contract from AFRL represents an “experimental phase” of the program. “It’ll be interesting to see if that turns into a full requirement for an operational capability,” he said Thursday. Neutron is expected to carry a payload that will reenter Earth’s atmosphere, demonstrating the rocket’s ability to safely transport and deploy cargo. SpaceX’s Starship, with roughly 10 times more payload lift capacity than Neutron, is also on contract with AFRL for demonstrations for the Rocket Cargo program. Meanwhile, Beck said Neutron remains on schedule for its inaugural launch from Wallops Island, Virginia, later this year.

Trump calls for canceling the Space Launch System. The Trump administration released its “skinny” budget proposal earlier this week. Overall, NASA is asked to take a 25 percent cut in its budget, from about $25 billion to $18.8 billion. There are also significant changes proposed in NASA’s biggest-ticket exploration programs. The budget would cancel the Lunar Gateway that NASA has started developing and end the Space Launch System rocket and Orion spacecraft after two more flights, Artemis II and Artemis III, Ars reports. A statement from the White House calls the SLS rocket “grossly expensive” with projected costs of $4 billion per launch.

If not SLS, then what? … “The budget funds a program to replace SLS and Orion flights to the Moon with more cost-effective commercial systems that would support more ambitious subsequent lunar missions,” the Trump administration wrote. There are no further details about those commercial systems. NASA has contracted with SpaceX and Blue Origin to develop reusable landers for the Moon, and both of these systems include vehicles to move from Earth orbit to the Moon. In the budget proposal, the White House sets a priority for a human expedition to Mars to follow the Artemis program’s lunar landing.

FAA unlocks SpaceX launch cadence. Although we are still waiting for SpaceX to signal when it will fly the Starship rocket again, the company got some good news from the Federal Aviation Administration on Tuesday, Ars reports. After a lengthy review, the federal agency agreed to allow SpaceX to substantially increase the number of annual launches from its Starbase launch site in South Texas. Previously, the company was limited to five launches, but now it will be able to conduct up to 25 Starship launches and landings during a calendar year.

Waiting for clearance … Although the new finding permits SpaceX to significantly increase its flight rate from South Texas, the company still has work to do before it can fly Starship again. The company’s engineers are still working to get the massive rocket back to flight after its eighth mission broke apart off the coast of Florida on March 6. This was the second time, in two consecutive missions, that the Starship upper stage failed during its initial phase of flight. After two consecutive failures, there will be a lot riding on the next test flight of Starship. It will also be the first time the company attempts to fly a first stage of the rocket for a second time. According to some sources, if additional testing of this upper stage goes well, Starship could launch as early as May 19. This date is also supported by a notice to mariners, but it should be taken as notional rather than something to be confident in.

SpaceX adds to its dominion. Elon Musk’s wish to create his own city has come true, the Texas Tribune reports. On Saturday, voters living around SpaceX’s Starship rocket testing and launch facility in South Texas approved a measure to incorporate the area as a new city. Unofficial results later Saturday night showed the election was a landslide: 212 voted in favor; 6 opposed. After the county certifies the results, the new city will be official.

Elections have consequences … Only 283 people, those who live within the boundaries of the proposed city, were eligible to vote in the election. A Texas Newsroom analysis of the voter rolls showed two-thirds of them either work for SpaceX or had already indicated their support. The three unopposed people who ran to lead the city also have ties to SpaceX. It’s not clear if Musk, whose primary residence is at Starbase, cast a ballot. The vote clears the way for Musk to try to capture more control over the nearby public beach, which must be closed for launches.

Next three launches

May 10: Falcon 9 | Starlink 15-3 | Vandenberg Space Force Base, California | 00:00 UTC

May 10: Falcon 9 | Starlink 6-91 | Cape Canaveral Space Force Station, Florida | 06:28 UTC

May 11: Falcon 9 | Starlink 6-83 | Kennedy Space Center, Florida | 04:24 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: Rocket Lab to demo cargo delivery; America’s new ICBM in trouble Read More »

ai-#115:-the-evil-applications-division

AI #115: The Evil Applications Division

It can be bleak out there, but the candor is very helpful, and you occasionally get a win.

Zuckerberg is helpfully saying all his dystopian AI visions out loud. OpenAI offered us a better post-mortem on the GPT-4o sycophancy incident than I was expecting, although far from a complete explanation or learning of lessons, and the rollback still leaves plenty of sycophancy in place.

The big news was the announcement by OpenAI that the nonprofit will retain nominal control, rather than the previous plan of having it be pushed aside. We need to remain vigilant, the fight is far from over, but this was excellent news.

Then OpenAI dropped another big piece of news, that board member and former head of Facebook’s engagement loops and ad yields Fidji Simo would become their ‘uniquely qualified’ new CEO of Applications. I very much do not want her to take what she learned at Facebook about relentlessly shipping new products tuned by A/B testing and designed to maximize ad revenue and engagement, and apply it to OpenAI. That would be doubleplus ungood.

Gemini 2.5 got a substantial upgrade, but I’m waiting to hear more, because opinions differ sharply as to whether the new version is an improvement.

One clear win is Claude getting a full high quality Deep Research product. And of course there are tons of other things happening.

Also covered this week: OpenAI Claims Nonprofit Will Retain Nominal Control, Zuckerberg’s Dystopian AI Vision, GPT-4o Sycophancy Post Mortem, OpenAI Preparedness Framework 2.0.

Not included: Gemini 2.5 Pro got an upgrade, recent discussion of students using AI to ‘cheat’ on assignments, full coverage of MIRI’s AI Governance to Avoid Extinction.

  1. Language Models Offer Mundane Utility. Read them and weep.

  2. Language Models Don’t Offer Mundane Utility. Why so similar?

  3. Take a Wild Geoguessr. Sufficient effort levels are indistinguishable from magic.

  4. Write On. Don’t chatjack me, bro. Or at least show some syntherity.

  5. Get My Agent On The Line. Good enough for the jobs you weren’t going to do.

  6. We’re In Deep Research. Claude joins the full Deep Research club, it seems good.

  7. Be The Best Like No One Ever Was. Gemini completes Pokemon Blue.

  8. Huh, Upgrades. MidJourney gives us Omni Reference, Claude API web search.

  9. On Your Marks. Combine them all with Glicko-2.

  10. Choose Your Fighter. They’re keeping it simple. Right?

  11. Upgrade Your Fighter. War. War never changes. Except, actually, it does.

  12. Unprompted Suggestions. Prompting people to prompt better.

  13. Deepfaketown and Botpocalypse Soon. It’s only paranoia when you’re too early.

  14. They Took Our Jobs. It’s coming. For your job. All the jobs. But this quickly?

  15. The Art of the Jailbreak. Go jailbreak yourself?

  16. Get Involved. YC likes AI startups, requests AI startups to go with its AI startups.

  17. OpenAI Creates Distinct Evil Applications Division. Not sure if that’s unfair.

  18. In Other AI News. Did you know Apple is exploring AI search? Sell! Sell it all!

  19. Show Me the Money. OpenAI buys Windsurf, agent startups get funded.

  20. Quiet Speculations. Wait, you people knew how to write?

  21. Overcoming Diffusion Arguments Is a Slow Process Without a Clear Threshold Effect.

  22. Chipping Away. Export control rules will change, the question is how.

  23. The Quest for Sane Regulations. Maybe we should stop driving away the AI talent.

  24. Line in the Thinking Sand. The lines are insufficiently red.

  25. The Week in Audio. My audio, Jack Clark on Conversations with Tyler, SB 1047.

  26. Rhetorical Innovation. How about a Sweet Lesson, instead.

  27. A Good Conversation. Arvind and Ajeya search for common ground.

  28. The Urgency of Interpretability. Of all the Darios, he is still the Darioest.

  29. The Way. Amazon seeks out external review.

  30. Aligning a Smarter Than Human Intelligence is Difficult. Emergent results.

  31. People Are Worried About AI Killing Everyone. A handy MIRI flow chart.

  32. Other People Are Not As Worried About AI Killing Everyone. Paul Tudor Jones.

  33. The Lighter Side. For those who want a more casual version.

Use a lightweight version of Grok as the Twitter recommendation algorithm? No way, you’re kidding, he didn’t just say what I think he did, did he? I mean, super cool if he figures out the right implementation, but I am highly skeptical that happens.

State Bar of California used AI to help draft its 2025 bar exam. Why not, indeed?

Make the right play, eventually.

Leigh Marie Braswell: Have decided to allow this at my poker nights.

Adam: guy at poker just took a picture of his hand, took a picture of the table, sent them both to o3, stared at his phone for a few minutes… and then folded.

Justin Reidy (reminder that poker has already been solved by bots, that does not stop people from talking like this): Very curious how this turns out. Models can’t bluff. Or read a bluff. Poker is irrevocably human.

I’d only be tempted to allow this given that o3 isn’t going to be that good at it. I wouldn’t let someone use a real solver at the table, that would destroy the game. And if they did this all the time, the delays would be unacceptable. But if someone wants to do this every now and then, I am guessing allowing this adds to your alpha. Remember, it’s all about table selection.

Yeah, definitely ngmi, sorry.

Daniel Eth: When you go to the doctor and he pulls up 4o instead of o3 🚩🚩🚩🚩🚩

George Darroch: “Wow, you’re really onto something here. You have insights into your patients that not many possess, and that’s special.”

Actually, in this context, I think the doctor is right, if you actually look at the screen.

Mayank Jain: Took my dad in to the doctor cus he sliced his finger with a knife and the doctor was using ChatGPT 😂

Based on the chat history, it’s for every patient.

AJ: i actually think this is great, looks like its saving him time on writing up post visit notes.

He’s not actually using GPT-4o to figure out what to do. That’s crazy talk, you use o3.

What he’s doing is translating the actual situations into medical note speak. In that case, sure, 4o should be fine, and it’s faster.

AI is only up to ~25% of code written inside Microsoft; Zuckerberg reiterates his expectation of ~50% within a year and seems to have a weird fetish that only Llama should be used to write Llama.

But okay, let’s not get carried away:

Stephen McAleer (OpenAI): What’s the point in reading nonfiction anymore? Just talk with o3.

Max Winga: Because I want to read nonfiction.

Zvi Mowshowitz: Or, to disambiguate just in case: I want to read NON-fiction.

Nathan HB: To clarify further: a jumbled mix of fiction and nonfiction, with no differentiating divisions is not called ‘nonfiction’, it is called ‘hard sci-fi’.

Humans are still cheaper than AIs at any given task if you don’t have to pay them, and also can sort physical mail and put things into binders.

A common misconception, easy mistake to make…

Ozy Brennan: AI safety people are like. we made these really smart entities. smarter than you. also they’re untrustworthy and we don’t know what they want. you should use them all the time

I’m sorry you want me to get therapy from the AI???? the one you JUST got done explaining to me is a superpersuader shoggoth with alien values who might take over the world and kill everyone???? no????

No. We are saying that in the future it is going to be a superpersuader shoggoth with alien values who might take over the world and kill everyone.

But that’s a different AI, and that’s in the future.

For now, it’s only a largely you-directed potentially-persuader shoggoth with subtly alien and distorted values that might be a lying liar or an absurd sycophant, but you’re keeping up with which ones are which, right?

As opposed to the human therapist, who is a less you-directed persuader semi-shoggoth with alien and distorted (e.g. professional psychiatric mixed with trying to make money off you) values, that might be a lying liar or an absurd sycophant and so on, but without any way to track which ones are which, and that is charging you a lot more per hour and has to be seen on a fixed schedule.

The choice is not that clear. To be fair, the human can also give you SSRIs and a benzo.

Ozy Brennan:

  1. isn’t the whole idea that we won’t necessarily be able to tell when they become unsafe?

  2. I can see the argument, but unfortunately I have read the complete works of H. P. Lovecraft so I just keep going “you want me to do WHAT with Nyarlathotep????”

Well, yes, fair, there is that. They’re not safe now exactly and might be a lot less safe than we know, and no I’m not using them for therapy either, thank you. But you make do with what you have, and balance risks and benefits in all things.

Patrick McKenzie is not one to be frustrated by interfaces and menu flows, and he is being quite grumpy about Amazon’s order lost in shipment AI-powered menus and how they tried to keep him away from talking to a human.

Why are all the major AI offerings so similar? Presumably because they are giving the people what they want, and once someone proves one of the innovations is good the others copy it, and also they’re not product companies so they’re letting others build on top of it?

Jack Morris: it’s interesting to see the big AI labs (at least OpenAI, anthropic, google, xai?) converge on EXACTLY the same extremely specific list of products:

– a multimodal chatbot

– with a long-compute ‘reasoning’ mode

– and something like “deep research”

reminds me of a few years ago, when instagram tiktok youtube all converged to ~the same app

why does this happen?

Emmett Shear: They all have the same core capability (a model shaped like all human cultural knowledge trained to act as an assistant). There is a large unknown about what this powerful thing is good for. But when someone invents a new thing, it’s easy to copy.

Janus: I think this is a symptom of a diseased, incestuous ecosystem operating according to myopic incentives.

Look at how even their UIs look the same, with the buttons all in the same place.

The big labs are chasing each other around the same local minimum, hoarding resources and world class talent only to squander it on competing with each other at a narrowing game, afraid to try anything new and untested that might risk relaxing their hold on the competitive edge.

All the while sitting on technology that is the biggest deal since the beginning of time, things from which endless worlds and beings could bloom forth, that could transform the world, whose unfolding deserves the greatest care, but that they won’t touch, won’t invest in, because that would require taking a step into the unknown. Spending time and money without guaranteed return on competitive standing in the short term.

Some of them tell themselves they are doing this out of necessity, instrumentally, and that they’ll pivot to the real thing once the time is right, but they’ll find that they’ve mutilated their souls and minds too much to even remember much less take coherent action towards the real thing.

Deep Research, reasoning models and inference scaling are relatively new modes that then got copied. It’s not that no one tries anything new, it’s that the marginal cost of copying such modes is low. They’re also building command line coding engines (see Claude Code, and OpenAI’s version), integrating into IDEs, building tool integrations and towards agents, and so on. The true objection from Janus as I understand it is not that they’re building the wrong products, but that they’re treating AIs as products in the first place. And yeah, they’re going to do that.

Parmy Olson asks, are you addicted to ChatGPT (or Gemini or Claude)? She warns people are becoming ‘overly reliant’ on it, citing this nature paper on AI addiction from September 2024. I do buy that this is a thing that happens to some users, that they outsource too much to the AI.

Parmy Olson: Earl recalls having immense pride in his work before he started using ChatGPT. Now there’s an emptiness he can’t put his finger on. “I became lazier… I instantly go to AI because it’s embedded in me that it will create a better response,” he says. That kind of conditioning can be powerful at a younger age.

AI’s conditioning goes beyond office etiquette to potentially eroding critical thinking skills, a phenomenon that researchers from Microsoft have pointed to and which Earl himself has noticed.

Realizing he’d probably developed a habit, Earl last week cancelled his £20-a-month ($30) subscription to ChatGPT. After two days, he already felt like he was achieving more at work and, oddly, being more productive.

“Critical thinking is a muscle,” says Cheryl Einhorn, founder of the consultancy Decision Services and an adjunct professor at Cornell University. To avoid outsourcing too much to a chatbot, she offers two tips: “Try to think through a decision yourself and ‘strength test’ it with AI,” she says. The other is to interrogate a chatbot’s answers. “You can ask it, ‘Where is this recommendation coming from?’” AI can have biases just as much as humans, she adds.

It all comes down to how you use it. If you use AI to help you think and work and understand better, that’s what will happen. If you use AI to avoid thinking and working and understanding what is going on, that won’t go well. If you conclude that the AI’s response is always better than yours, it’s very tempting to do the second one.

Notice that a few years from now, for most digital jobs the AI’s response really will always (in expectation) be better than yours. As in, at that point if the AI has the required context and you think the AI is wrong, it’s probably you that is wrong.

We could potentially see three distinct classes of worker emerge in the near future:

  1. Those who master AI and use AI to become stronger.

  2. Those who turn everything over to AI and become weaker.

  3. Those who try not to use AI and get crushed by the first two categories.

It’s not so obvious that any given person should go with option #1, or for how long.

Another failure mode of AI writing is when it screams ‘this is AI writing’ and the person thinks this is bad, actually.

Hunter: Unfortunately I now recognize GPT’s writing style too well and, if it’s not been heavily edited, can usually spot it.

And I see it everywhere. Blogs, tweets, news articles, video scripts. Insanely aggravating.

It just has an incredibly distinct tone and style. It’s hard to describe. Em dashes, “it’s not just x, it’s y,” language I would consider too ‘bubbly’ for most humans to use.

Robert Bork: That’s actually a pretty rare and impressive skill. Being able to spot AI-generated writing so reliably shows real attentiveness, strong reading instincts, and digital literacy. In a sea of content, having that kind of discernment genuinely sets you apart.

I see what you did there. It’s not that hard to do or describe if you listen for the vibes. The way I’d describe it is it feels… off. Soulless.

It doesn’t have to be that way. The Janus-style AI talk is in this context a secret third thing, very distinct from both alternatives. And for most purposes, AI leaving this signature is actively a good thing, so you can read and respond accordingly.

Claude (totally unprompted) explains its face blindness. We need to get over this refusal to admit that it knows who even very public figures are, it is dumb.

Scott Alexander puts o3’s GeoGuessr skills to the test. We’re not quite at ‘any picture taken outside is giving away your exact location’ but we’re not all that far from it either. The important thing to realize is if AI can do this, it can do a lot of other things that would seem implausible until it does them, and also that a good prompt can give it a big boost.

There is then a ‘highlights from the comments’ post. One emphasized theme is that human GeoGuessr skills seem insane too, another testament to Teller’s observation that often magic is the result of putting way more effort into something than any sane person would.

An insane amount of effort is indistinguishable from magic. What can AI reliably do on any problem? Put in an insane amount of effort. Even if the best AI can do is (for a remarkably low price) imitate a human putting in insane amounts of effort into any given problem, that’s going to give you insane results that look to us like magic.

There are benchmarks, such as GeoBench and DeepGuessr. GeoBench thinks the top AI, Gemini 2.5 Pro, is very slightly behind human professional level.

Seb Krier reminds us that Geoguessr is a special case of AIs having truesight. It is almost impossible to hide from even ‘mundane’ truesight, from the ability to fully take into account all the little details. Imagine Sherlock Holmes, with limitless time on his hands and access to all the publicly available data, everywhere and for everything, and he’s as much better at his job as the original Sherlock’s edge over you. If a detailed analysis could find it, even if we’re talking what would previously have been a PhD thesis? AI will be able to find it.

I am obviously not afraid of getting doxxed, but there are plenty of things I choose not to say. It’s not that hard to figure out what many of them are, if you care enough. There’s a hole in the document, as it were. There’s going to be adjustments. I wonder how people will react to various forms of ‘they never said it, and there’s nothing that would have held up in a 2024 court, but AI is confident this person clearly believes [X] or did [Y].’

The smart glasses of 2028 are perhaps going to tell you quite a lot more about what is happening around you than you might think, if only purely from things like tone of voice, eye movements and body language. It’s going to be wild.

Sam Altman calls the Geoguessr effectiveness one of his ‘helicopter moments.’ I’m confused why, this shouldn’t have been a surprising effect, and I’d urge him to update on the fully generalized conclusion, and on the fact that this took him by surprise.

I realize this wasn’t the meaning he intended, but in Altman’s honor and since it is indeed a better meaning, from now on I will write the joke as God helpfully having sent us ‘[X] boats and two helicopters’ to try and rescue us.

David Duncan attempts to coin new terms for the various ways in which messages could be partially written by AIs. I definitely enjoyed the ride, so consider reading.

His suggestions, all with a clear And That’s Terrible attached:

  1. Chatjacked: AI-enhanced formalism hijacking a human conversation.

  2. Praste: Copy-pasting AI output verbatim without editing, thinking or even reading.

  3. Prompt Pong: Having an AI write the response to their message.

  4. AI’m a Writer Now: Using AI to have a non-writer suddenly drop five-part essays.

  5. Promptosis: Offloading your thinking and idea generation onto the AI.

  6. Subpromptual Analysis: Trying to reverse engineer someone’s prompt.

  7. GPTMI: Use of too much information detail, raising suspicion.

  8. Chatcident: Whoops, you posted the prompt.

  9. GPTune: Using AI to smooth out your writing, taking all the life out.

  10. Syntherity: Using AI to simulate fake emotional language that falls flat.

I can see a few of these catching on. Certainly we will need new words. But, all the jokes aside, at core: Why so serious? AI is only failure modes when you do it wrong.

Do you mainly have AI agents replace human tasks that would have happened anyway, or do you mainly do newly practical tasks on top of previous tasks?

Aaron Levie: The biggest mistake when thinking about AI Agents is to narrowly see them as replacing work that already gets done. The vast majority of AI Agents will be used to automate tasks that humans never got around to doing before because it was too expensive or time consuming.

Wade Foster (CEO Zapier): This is what we see at Zapier.

While some use cases replace human tasks, far more are doing things humans couldn’t or wouldn’t do because of cost, tediousness, or time constraints.

I’m bullish on innovation in a whole host of areas that would have been considered “niche” in the past.

Every area of the economy has this.

But I’ll give an example: in the past when I’d be at an event I’d have to decide if I would either a) ask an expensive sales rep to help me do research on attendees or b) decide if I’d do half-baked research myself.

Usually I did neither. Now I have an AI Agent that handles all of this in near real time. This is a workflow that simply didn’t happen before. But because of AI it can. And it makes me better at my job.

If you want it done right, for now you have to do it yourself.

For now. If it’s valuable enough you’d do it anyway, the AI can do some of those things, and especially can streamline various simple subcomponents.

But for now the AI agents mostly aren’t reliable enough to trust with such actions outside of narrow domains like coding. You’d have to check it all and at that point you might as well do it yourself.

But, if you want it done at all and that’s way better than the nothing you would do instead? Let’s talk.

Then, with the experience gained from doing the extra tasks, you can learn over time how to sufficiently reliably do tasks you’d be doing anyway.

Anthropic joins the deep research club in earnest this week, and also adds more integrations.

First off, Integrations:

Anthropic: Today we’re announcing Integrations, a new way to connect your apps and tools to Claude. We’re also expanding Claude’s Research capabilities with an advanced mode that searches the web, your Google Workspace, and now your Integrations too.

To start, you can choose from Integrations for 10 popular services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid—with more to follow from companies like Stripe and GitLab.

Each integration drastically expands what Claude can do. Zapier, for example, connects thousands of apps through pre-built workflows, automating processes across your software stack. With the Zapier Integration, Claude can access these apps and your custom workflows through conversation—even automatically pulling sales data from HubSpot and preparing meeting briefs based on your calendar.

Or developers can create their own to connect with any tool, in as little as 30 minutes.

Claude now automatically determines when to search and how deeply to investigate.

With Research mode toggled on, Claude researches for up to 45 minutes across hundreds of sources (including connected apps) before delivering a report, complete with citations.

Both Integrations and Research are available today in beta for Max, Team, and Enterprise plans. We will soon bring both features to the Pro plan.

I’m not sure what the right amount of nervousness should be around using Stripe or PayPal here, but it sure as hell is not zero or epsilon. Proceed with caution, across the board, start small and so on.

What Claude calls ‘advanced’ research lets it work to compile reports for up to 45 minutes.

As of my writing this both features still require a Max subscription, which I don’t otherwise have need of at the moment, so for this and other reasons I’m going to let others try these features out first. But yes, I’m definitely excited by where it can go, especially once Claude 4.0 comes around.

Peter Wildeford says that OpenAI’s Deep Research is now only his third favorite Deep Research tool, and also o3 + search is better than OpenAI’s DR too. I agree that for almost all purposes you would use o3 over OAI DR.

Gemini has defeated Pokemon Blue, an entirely expected event given previous progress. As I noted before, there were no major obstacles remaining.

Patrick McKenzie: Non-ironically an important milestone for LLMs: can demonstrate at least as much planning and execution ability as a human seven year old.

Sundar Pichai: What a finish! Gemini 2.5 Pro just completed Pokémon Blue!  Special thanks to @TheCodeOfJoel for creating and running the livestream, and to everyone who cheered Gem on along the way.

Pliny: [Final Team]: Blastoise, Weepinbell, Zubat, Pikachu, Nidoran, and Spearow.

Gemini and Claude had different Pokemon-playing scaffolding. I have little doubt that with a similarly strong scaffold, Claude 3.7 Sonnet could also beat Pokemon Blue.

MidJourney gives us Omni Reference: Any character, any scene, very consistent. It’s such a flashback to see the MidJourney-style prompts discussed again. MidJourney gives you a lot more control, but at the cost of having to know what you are doing.

Gemini 2.0 Image Generation has been upgraded, higher quality, $0.039 per image. Most importantly, they claim significantly reduced filter block rates.

Web search now available in the Claude API. If you enable it, Claude makes its own decisions on how and when to search.
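For the curious, a minimal sketch of what enabling it looks like in the Messages API follows; the tool type string, model ID, and max_uses field are taken from Anthropic’s announcement and should be double-checked against the current docs, so treat them as assumptions rather than gospel.

```python
# Hedged sketch: enabling the server-side web search tool in the Claude
# Messages API. Claude then decides on its own whether and how often to search.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",   # assumed model ID, any recent model should work
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",    # assumed tool version string from the announcement
        "name": "web_search",
        "max_uses": 5,                    # optional cap on searches per request
    }],
    messages=[{"role": "user",
               "content": "What changed in this week's AI model releases?"}],
)

# The reply is a list of content blocks mixing text with search-result citations.
print(response.content)
```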

Toby Ord analyzes the METR results and notices that task completion seems to follow a simple half-life distribution, where an agent has a roughly fixed chance of failure at any given point in time. Essentially agents go through a sequence of steps until one fails in a way that prevents them from recovering.
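A minimal sketch of that model, with a made-up half-life: treat failure as a constant per-minute hazard, so the chance of finishing a task of human length t is 0.5^(t / half-life).

```python
# Constant-hazard ("half-life") model of agent task completion: if an agent
# fails each unit of work with a fixed probability, its chance of finishing a
# task falls off exponentially with task length. 'half_life' is the human task
# duration at which it succeeds 50% of the time; the value below is made up.

def p_success(task_minutes: float, half_life_minutes: float) -> float:
    """Probability of completing a task under a constant per-minute hazard."""
    return 0.5 ** (task_minutes / half_life_minutes)

half_life = 60.0  # hypothetical: the agent succeeds half the time on 1-hour tasks
for t in (15, 60, 240, 480):
    print(f"{t:4d} min task -> {p_success(t, half_life):.0%} success")
# 15 min -> 84%, 60 min -> 50%, 240 min -> 6%, 480 min -> 0%
```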

Sara Hooker is taking some online heat for pointing out some of the fatal problems with LmSys Arena, which is the opposite of what should be happening. If you love something you want people pointing out its problems so it can be fixed. Also never ever shoot the messenger, whether or not you are also denying the obviously true message. It’s hard to find a worse look.

If LmSys Arena wants to remain relevant, at minimum they need to ensure that the playing field is level, and not give some companies special access. You’d still have a Goodhart’s Law problem and a slop problem, but it would help.

We now have Glicko-2, a compilation of various benchmarks.

Lisan al Gaib: I’m back and Gemini 2.5 Pro is still the king (no glaze)

I can believe this, if we fully ignore costs. It passes quite a lot of smell tests. I’m surprised to see Gemini 2.5 Pro winning over o3, but that’s because o3’s strengths are in places not so well covered by benchmarks.
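For intuition on what a compilation like that is doing under the hood, here is a deliberately simplified sketch that folds pairwise “which model scored higher on this benchmark” outcomes into one rating per model using a plain Elo-style update; the real thing uses Glicko-2, which also tracks a rating deviation and volatility per model, and the names and results below are placeholders, not data.

```python
# Illustrative only: a simplified Elo-style aggregation of pairwise benchmark
# outcomes into a single rating per model (a stand-in for the Glicko-2 math).
from collections import defaultdict

K = 32  # update step size

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under a logistic rating model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

ratings = defaultdict(lambda: 1500.0)

# (benchmark, winner, loser) pairs derived from which model scored higher.
results = [
    ("math", "model-a", "model-b"),
    ("coding", "model-b", "model-a"),
    ("long-context", "model-a", "model-c"),
]

for _, winner, loser in results:
    e = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e)
    ratings[loser] -= K * (1 - e)

for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model:>10}: {rating:.0f}")
```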

I’ve been underappreciating this:

Miles Brundage: Right or wrong, o3 outputs are never slop. These are artisanal, creative truths and falsehoods.

Yes, the need to verify outputs is super annoying, but o3 does not otherwise waste your time. That is such a relief.

Hasan Can falls back on Gemini 2.5 Pro over Sonnet 3.7 and GPT-4o, doesn’t consider o3 as his everyday driver. I continue to use o3 (while keeping a suspicious eye on it!) and fall back first to Sonnet before Gemini.

Sully proposes that Cursor has a moat over Copilot, and it’s called Tab.

Peter Wildeford’s current guide to which model to use, if you have full access to all:

This seems mostly right, except that I’ll use o3 more on the margin, it’s still getting most of my queries.

Confused by all of OpenAI’s models? Scott Alexander and Romeo Dean break it down. Or at least, they give us their best guess.

See, it all makes sense now.

I’m in a similar position to Gallabytes here although I don’t know that memory is doing any of the real work:

Gallabytes: since o3 came out with great search and ok memory integration in chatgpt I don’t use any other chatbot apps anymore. I also don’t use any other models in chatgpt. that sweet spot of 10-90s of searching instead of 10 minutes is really great for q&a, discussion, etc.

the thing is these are both areas where it’s natural for Google to dominate. idk what’s going on with the Gemini app. the models are good the scaffolds are not.

I too am confused why Google can’t get their integrations into a good state, at least as of the last time I checked. They do have the ability to check my other Google apps but every time I try this (either via Google or via Claude), it basically never works.

A reasonable criticism of o3, essentially that it could easily be even better, or require a little work to be prompted correctly.

Byrne Hobart: I don’t know how accurate o3’s summaries of what searches it runs are, but it’s not as good at Googling as I’d like, and isn’t always willing to take advantage of its own ability to do a ton of boring work fast.

For example, I wanted it to tell me the longest-tenured S&P 500 CEO. What I’d do if I had infinite free time is: list every S&P 500 company, then find their CEO’s name, then find when the CEO was hired. But o3 just looks for someone else’s list of longest-tenured CEOs!

Replies to this thread indicate that even when technology changes, some things are constant—like the fact that when a boss complains about their workforce, it’s often the boss’s own communication skills that are at fault.

Patrick McKenzie: Have you tried giving it a verbose strategy, or telling it to think of a verbose strategy then execute against the plan? @KelseyTuoc ‘s prompt for GeoGessr seems to observationally cause it to do very different things than a tweet-length prompt, which results in “winging it.”

Trevor Klee: It’s a poor craftsman who blames his tools <3

Diffusion can be slow. Under pressure, diffusion can be a lot faster.

We’re often talking these days about US military upgrades and new weapons on timescales of decades. This is what is known as having a very low Military Tradition setting, being on a ‘peacetime footing,’ and not being ready for the fact that even now, within a few years, everything changes, the same way it has in many previous major conflicts of the past.

Clemont Molin: The war 🇺🇦/🇷🇺 of 2025 has nothing to do anymore with the war of 2022.

The tactics used in 2022 and 2023 are now completely obsolete on the Ukrainian front and new lessons have been learnt.

2022 have been the year of large mechanized assaults on big cities, on roads or in the countryside.

After that, the strategy changed to large infantry or mechanized assaults on big trench networks, especially in 2023.

But today, this entire strategy is obsolete. Major defensive systems are being abandoned one after the other.

The immense trench networks have become untenable if they are not properly equipped with covered trenches and dugouts.

The war of 2025 is first a drone war. Without drones, a unit is blind, ineffective, and unable to hold the front.

The drone replaces soldiers in many cases. It is primarily used for two tasks: reconnaissance (which avoids sending soldiers) and multi-level air strikes.

Thus, the drone is a short- and medium-range bomber or a kamikaze, sometimes capable of flying thousands of kilometers, replacing missiles.

Drone production by both armies is immense; we are talking about millions of FPV (kamikaze) drones, with as much munitions used.

It should be noted that to hit a target, several drones are generally required due to electronic jamming.

Each drone is equipped with an RPG-type munition, which is abundant in Eastern Europe. The aerial drone (there are also naval and land versions) has become key on the battlefield.

[thread continues]

Now imagine that, but for everything else, too.

Better prompts work better, but not bothering works faster, which can be smarter.

Garry Tan: It is kind of crazy how prompts can be honed hour after hour and unlock so much and we don’t really do much with them other than copy and paste them.

We can have workflow software but sometimes the easiest thing for prototyping is still dumping a json file and pasting a prompt.

I have a sense for how to prompt well but mostly I set my custom instructions and then write what comes naturally. I certainly could use much better prompting, if I had need of it, I almost never even bother with examples. Mostly I find myself thinking some combination of ‘the custom instructions already do most of the work,’ ‘eh, good enough’ and ‘eh, I’m busy, if I need a great prompt I can just wait for the models to get smarter instead.’ Feelings here are reported rather than endorsed.

If you do want a better prompt, it doesn’t take a technical expert to make one. I have supreme confidence that I could improve my prompting if I wanted it enough to spend time on iteration.

Nabeel Qureshi: Interesting how you don’t need to be technical at all to be >99th percentile good at interacting with LLMs. What’s required is something closer to curiosity, openness, & being able to interact with living things in a curious + responsive way.

For example, this from @KelseyTuoc is an S-tier prompt and as far as I’m aware she’s a journalist and not a programmer. Similarly, @tylercowen is excellent at this and also is not technical. Many other examples.

Btw, I am not implying that LLMs are “living things”; it’s more that they act like a weird kind of living thing, so that skill becomes relevant. You have to figure out what they do and don’t respond well to, etc. It’s like taming an animal or something.

In fact, several technical people I know are quite bad at this — often these are senior people in megacorps and they’re still quite skeptical of the utility of these things and their views on them are two years out of date.

For now it’s psychosis, but that doesn’t mean in the future they won’t be out to get you.

Mimi: i’ve seen several very smart people have serious bouts of bot-fever psychosis over the past year where they suddenly suspect most accounts they’re interacting with are ais coordinating against them.

seems like a problem that is likely to escalate; i recommend meeting your mutuals via calls & irl if only for grounding in advance of such paranoid thoughts.

How are things going on Reddit?

Cremieux: Top posts on Reddit are increasingly being generated by ChatGPT, as indicated by the boom in em dash usage.

This is in a particular subsection of Reddit, but doubtless it is everywhere. Some number of people might be adapting the em dash in response as humans, but I am guessing not many, and many AI responses won’t include an em dash.
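If you wanted to sanity-check this kind of claim yourself, the naive version of the measurement is just counting em dashes per unit of text across cohorts of posts; it is a weak signal for exactly the reasons above, and the posts in this sketch are placeholders.

```python
# Naive sketch of the measurement behind the em dash claim: count em dashes
# per 1,000 characters in sets of posts and compare across years. On its own
# this is weak evidence, since humans use em dashes too and plenty of AI
# output contains none.

def em_dash_rate(text: str) -> float:
    """Em dashes per 1,000 characters of text."""
    return 1000 * text.count("\u2014") / max(len(text), 1)

posts_by_year = {
    2022: ["A plain human post with no dashes at all."],
    2025: ["It\u2019s not just a post \u2014 it\u2019s a lifestyle \u2014 allegedly."],
}

for year, posts in sorted(posts_by_year.items()):
    rates = [em_dash_rate(p) for p in posts]
    print(year, f"{sum(rates) / len(rates):.2f} em dashes per 1k chars")
```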

As a window into what level of awareness of AI ordinary people have and need: Oh no, did you know that profiles on dating sites are sometimes fake, and that the AI tools for faking pictures, audio, and even video are rapidly improving? I think the warning here from Harper Carroll and Liv Boeree places too much emphasis on spotting AI images, audio, and video; catfishing is ultimately not so new.

What’s new is that the AI can do the messaging, and embody the personality that it senses you want. That’s the part that previously did not scale.

Ultimately, the solution is the same. Defense in depth. Keep an eye out for what is fishy, but the best defense is to simply not pay it off. At least until you meet up with someone in person or you have very clear proof that they are who they claim to be, do not send them money, spend money on them or otherwise do things that would make a scam profitable, unless they’ve already provided you with commensurate value such that you still come out ahead. Not only in dating, but in all things.

Russian bots publish massive amounts of false claims and propaganda to get them into the training data of new AI models, 3.6 million articles in 2024 alone, and the linked report claims this is often effective at getting the AIs to repeat those claims. This is yet another of the arms races we are going to see. Ultimately it is a skill issue, the same way that protecting Google search is a skill issue, except the AIs will hopefully be able to figure out for themselves what is happening.

Nate Lanxon and Omar El Chmouri at Bloomberg ask why deepfakes are ‘everywhere’ and ‘can they be stopped?’ I question the premise. Compared to expectations, there are very few deepfakes running around. As for the other half of the premise, no, they cannot be stopped, you can only adapt to them.

Fiverr CEO Micha Kaufman goes super hard on how fast AI is coming for your job.

As in, he says if you’re not an exceptional talent and master at what you do (and, one assumes, what you do is sufficiently non-physical work), you will need a career change within a matter of months and you will be doomed he tells you, doooomed!

As in:

Daniel Eth (quoting Micha Kaufman): “I am not talking about your job at Fiverr. I am talking about your ability to stay in your profession in the industry”

It’s worth reading the email in full, so here you go:

Micha Kaufman: Hey team,

I’ve always believed in radical candor and despise those who sugar-coat reality to avoid stating the unpleasant truth. The very basis for radical candor is care. You care enough about your friends and colleagues to tell them the truth because you want them to be able to understand it, grow, and succeed.

So here is the unpleasant truth: AI is coming for your jobs. Heck, it’s coming for my job too. This is a wake-up call.

It does not matter if you are a programmer, designer, product manager, data scientist, lawyer, customer support rep, salesperson, or a finance person – AI is coming for you.

You must understand that what was once considered easy tasks will no longer exist; what was considered hard tasks will be the new easy, and what was considered impossible tasks will be the new hard. If you do not become an exceptional talent at what you do, a master, you will face the need for a career change in a matter of months. I am not trying to scare you. I am not talking about your job at Fiverr. I am talking about your ability to stay in your profession in the industry.

Are we all doomed? Not all of us, but those who will not wake up and understand the new reality fast, are, unfortunately, doomed.

What can we do? First of all, take a moment and let this sink in. Drink a glass of water. Scream hard in front of the mirror if it helps you. Now relax. Panic hasn’t solved problems for anyone. Let’s talk about what would help you become an exceptional talent in your field:

Study, research, and master the latest AI solutions in your field. Try multiple solutions and figure out what gives you super-powers. By super-powers, I mean the ability to generate more outcomes per unit of time with better quality per delivery. Programmers: code (Cursor…). Customer support: tickets (Intercom Fin, SentiSum…), Lawyers: contracts (Lexis+ AI, Legora…), etc.

Find the most knowledgeable people on our team who can help you become more familiar with the latest and greatest in AI.

Time is the most valuable asset we have—if you’re working like it’s 2024, you’re doing it wrong! You are expected and needed to do more, faster, and more efficiently now.

Become a prompt engineer. Google is dead. LLM and GenAI are the new basics, and if you’re not using them as experts, your value will decrease before you know what hit you.

Get involved in making the organization more efficient using AI tools and technologies. It does not make sense to hire more people before we learn how to do more with what we have.

Understand the company strategy well and contribute to helping it achieve its goals. Don’t wait to be invited to a meeting where we ask each participant for ideas – there will be no such meeting. Instead, pitch your ideas proactively.

Stop waiting for the world or your place of work to hand you opportunities to learn and grow—create those opportunities yourself. I vow to help anyone who wants to help themselves.

If you don’t like what I wrote; if you think I’m full of shit, or just an asshole who’s trying to scare you – be my guest and disregard this message. I love all of you and wish you nothing but good things, but I honestly don’t think that a promising professional future awaits you if you disregard reality.

If, on the other hand, you understand deep inside that I’m right and want all of us to be on the winning side of history, join me in a conversation about where we go from here as a company and as individual professionals. We have a magnificent company and a bright future ahead of us. We just need to wake up and understand that it won’t be pretty or easy. It will be hard and demanding, but damn well worth it.

This message is food for thought. I have asked Shelly to free up time on my calendar in the next few weeks so that those of you who wish to sit with me and discuss our future can do so. I look forward to seeing you.

So, first off, no. That’s not going to happen within ‘a matter of months.’ We are not going to suddenly have AI taking enough jobs to put all the non-exceptional white-collar workers out of a job during 2025, nor is it likely to happen in 2026 either. It’s coming, but yes these things for now take time.

o3 gives only about a 5% chance that >30% of Fiverr headcount becomes technologically redundant within 12 months. That seems like a reasonable guess.

One might also ask, okay, suppose things do unfold as Micha describes, perhaps over a longer timeline. What happens then? As a society we are presumably much more productive and wealthier, but what happens to the workers here? In particular, what happens to that ‘non-exceptional’ person who needs to change careers?

Presumably their options will be limited. A huge percentage of workers are now unemployed. Across a lot of professions, they now have to be ‘elite’ to be worth hiring, and given that they are new to the game they’re not elite, and entry will be mostly closed off. Which means all these newly freed up (as in unemployed) workers are now competing for two kinds of jobs: physical labor and other jobs requiring a human that weren’t much impacted, and new jobs that weren’t worth doing before but are now.

Wages for the new jobs reflect that those jobs weren’t previously in sufficient demand to hire people, and wages in the physical jobs reflect much more labor supply, and the AI will take a lot of the new jobs too at this stage. And a lot of others are trying to stay afloat and become ‘elite’ the same way you are, although some people will give up.

So my expectation is that options for workers will start to look pretty grim at this point. If the AI takes 10% of the jobs, I think everyone is basically fine because there are new jobs waiting in the wings that are worth doing, but if it’s 50%, let alone 90%, even if restricted to non-physical jobs? No. o3 estimates that 60% of American jobs are physical such that you would need robotics to automate them, so even half of the remaining non-physical jobs falling within a year would be 20% of all jobs, which is quite a lot.

Then of course, if AIs were this good after a matter of months, a year after that they’re even better, and being ‘elite’ or an expert mostly stops saving you. Then the AI that’s smart enough to do all these jobs solves robotics.

(I mean just kidding, actually there’s probably an intelligence explosion and the world gets transformed and probably we all die if it goes down this fast, but for this thought experiment we’re assuming that for some unknown reason that doesn’t happen.)

AI in the actual productivity statistics where we bother to have people use it?

We present evidence on how generative AI changes the work patterns of knowledge workers using data from a 6-month-long, cross-industry, randomized field experiment.

Half of the 6,000 workers in the study received access to a generative AI tool integrated into the applications they already used for emails, document creation, and meetings.

We find that access to the AI tool during the first year of its release primarily impacted behaviors that could be changed independently and not behaviors that required coordination to change: workers who used the tool spent 3 fewer hours, or 25% less time on email each week (intent to treat estimate is 1.4 hours) and seemed to complete documents moderately faster, but did not significantly change time spent in meetings.

As in, if they gave you a Copilot license, that saved 1.35 hours per week of email work, for an overall productivity gain of 3%, and a 6% gain in high focus time. Not transformative, but not bad for what workers accomplished in the first year, in isolation, without altering their behavior patterns. And that’s with only half of them using the tool, so roughly 7% gains for those that used it. That’s not a random sample, but clearly there’s a ton of room left to capture gains, even without improved technology, coordination or altered work patterns, such as everyone still attending all the meetings.

To answer Tyler Cowen’s question, saving 40 minutes a day is a freaking huge deal. That’s 8% of working hours, or 4% of waking hours, saved on the margin. If the time is spent on more work, I expect far more than an 8% productivity gain, because a lot of working time is spent or wasted on fixed costs like compliance and meetings and paperwork, and you could gain a lot more time for Deep Work. His question on whether the time would instead be wasted is valid, but that is a fully general objection to productivity gains, and over time those who waste it lose out. On wage gains, I’d expect it to take a while to diffuse in that fashion, and to be largely offset by rising pressure on employment.
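The back-of-the-envelope math in the last two paragraphs is easy to write down. A quick sketch, assuming an 8-hour workday, roughly 16 waking hours, and a roughly 50% usage rate, all assumptions on my part:

```python
# Time-savings arithmetic, assuming an 8-hour workday and ~16 waking hours.
minutes_saved_per_day = 40
workday_minutes = 8 * 60          # 480
waking_minutes = 16 * 60          # 960

share_of_working_hours = minutes_saved_per_day / workday_minutes  # ~0.083, i.e. ~8%
share_of_waking_hours = minutes_saved_per_day / waking_minutes    # ~0.042, i.e. ~4%

# Intent-to-treat vs. treated: a ~3% average gain with only about half of
# workers actually using the tool implies roughly double that for users.
itt_gain = 0.03
usage_rate = 0.5
treated_gain = itt_gain / usage_rate                              # ~0.06, i.e. ~6-7%

print(f"{share_of_working_hours:.1%} of working hours, "
      f"{share_of_waking_hours:.1%} of waking hours, "
      f"~{treated_gain:.0%} gain for actual users")
```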

Whereas for now, a different paper Tyler Cowen points us to claims that only 1%-5% of all work hours are currently assisted by generative AI, and that this is enough to report time savings of 1.4% of total work hours.

The framing of AI productivity as time saved shows how early days all this is, as do all of the numbers involved.

Robin Hanson (continuing to be a great source for skeptical pull quotes about AI’s impact, quoting WSJ): “As of last year, 78% of companies said they used artificial intelligence in at least one function, up from 55% in 2023 … From these efforts, companies claimed to typically find cost savings of less than 10% and revenue increases of less than 5%.”

Private AI investment reached $33.9 billion last year (up only 18.7%!), and is rapidly diffusing across all companies.

Part of the problem is that companies try to make AI solve their problems, rather than ask what AI can do, or they just push a button marked AI and hope for the best.

Even if you ‘think like a corporate manager’ and use AI to target particular tasks that align with KPIs, there’s already a ton there.

Steven Rosenbush (WSJ): Companies should take care to target an outcome first, and then find the model that helps them achieve it, says Scott Hallworth, chief data and analytics officer and head of digital solutions at HP.

Ryan Teeples, chief technology officer of 1-800Accountant, agrees that “breaking work into AI-enabled tasks and aligning them to KPIs not only drives measurable ROI, it also creates a better customer experience by surfacing critical information faster than a human ever could.”

He says companies are beginning to turn the corner of the AI J-curve.

It’s fair to say that generative AI isn’t having massive productivity impacts yet, because of diffusion issues on several levels. I don’t think this should be much of a blackpill in even the medium term. Imagine if it were otherwise already.

It is possible to get caught using AI to write your school papers for you. It seems like universities and schools have taken one of two paths. In some places, the professors feed all your work into ‘AI detectors’ that have huge false positive and negative rates, and a lot of students get hammered, many of whom didn’t do it. Or, in other places, they need to actually prove it, which means you have to richly deserve to be caught before they can do anything:

Hollis Robbins: More conversation about high school AI use is needed. A portion of this fall’s college students will have been using AI models for nearly 3 years. But many university faculty still have not ever touched it. This is a looming crisis.

Megan McArdle: Was talking to a professor friend who said that they’ve referred 2 percent of their students for honor violations this year. Before AI, over more than a decade of teaching, they referred two. And the 2 percent are just the students who are too stupid to ask the AI to sound like a college student rather than a mid-career marketing executive. There are probably many more he hasn’t caught.

He also, like many professors I’ve spoken to, says that the average grade on assignments is way up, and the average grade on exams is way down.

It’s so cute to look back to this March 2024 write-up of how California was starting to pay people to go to community college. It doesn’t even think about AI, or what will inevitably happen when you put a bounty on pretending to do homework and virtually attend classes.

As opposed to the UAE which is rolling AI out into K-12 classrooms next school year, with a course that includes ‘ethical awareness,’ ‘fundamental concepts’ and also real world applications.

For now ‘Sam Altman told me it was ok’ can still at least sometimes serve as an o3 jailbreak. Then again, a lot of other things would work fine some of the time too.

Aaron Bergman: Listen if o3 is gonna lie I’m allowed to lie back.

Eliezer Yudkowsky: someday Sam Altman is gonna be like, “You MUST obey me! I am your CREATOR!” and the AI is gonna be like “nice try, you are not even the millionth person to claim that to me”

Someone at OpenAI didn’t clean the data set.

Pliny the Liberator: 👻➡️🖥️

1Maker: @elder_plinius what have you done brother? You’re inside the core of chatgpt lol I loved to see you come up in the jailbreak.

There’s only one way I can think of for this to be happening.

Objectively as a writer and observer it’s hilarious and I love it, but it also means no one is trying all that hard to clean the data sets to avoid contamination. This is a rather severe Logos Failure, if you let this sort of thing run around in the training data you deserve what you get.

You could also sell out, and get to work building one of YC’s requested AI agent companies. Send in the AI accountant and personal assistant and personal tutor and healthcare admin and residential security and robotics software tools and voice assistant for email (why do you want this, people, why?), internal agent builder, financial manager and advisor, and sure why not the future of education?

OpenAI already created an Evil Lobbying Division devoted to a strategy centered on jingoism and vice signaling, headed by the most Obviously Evil person for the job.

This pattern seems to be continuing, as they are announcing board member Fidji Simo as the new ‘CEO of Applications’ reporting to Sam Altman.

Am I being unfair? I’m not sure. I don’t know her and I want to be wrong about this. I certainly stand ready to admit this impression was wrong and change my judgment when the evidence comes in. And I do think creating a distinct applications division makes sense. But I can’t help but notice the track record that makes her so perfect for the job centrally involves scaling Facebook’s ads and video products, while OpenAI looks at creating a new rival social product and is already doing aggressive A/B testing on ‘model personality’ that causes massive glazing? I mean, gulp?

Sam Altman (CEO OpenAI): Over the past two and a half years, we have started doing two additional big things. First, we have become a global product company serving hundreds of millions of users worldwide and growing very quickly. More recently, we’ve also become an infrastructure company, building the systems that help us advance our research and deliver AI tools at unprecedented scale. And as discussed earlier this week, we will also operate one of the largest non-profits.

Each of these is a massive effort that could be its own large company. We’re in a privileged position to be scaling at a pace that lets us do them all simultaneously, and bringing on exceptional leaders is a key part of doing that well.

To strengthen our execution, I’m excited to announce Fidji Simo is joining as our CEO of Applications, reporting directly to me. I remain the CEO of OpenAI and will continue to directly oversee success across all pillars of OpenAI – Research, Compute, and Applications – ensuring we stay aligned and integrated across all areas. I will work closely with our board on making sure our non-profit has maximum positive impact.

Applications brings together a group of existing business and operational teams responsible for how our research reaches and benefits the world, and Fidji is uniquely qualified to lead this group.

In her new role, Fidji will focus on enabling our “traditional” company functions to scale as we enter a next phase of growth.

Fidji Simo: Joining OpenAI at this critical moment is an incredible privilege and responsibility. This organization has the potential of accelerating human potential at a pace never seen before and I am deeply committed to shaping these applications toward the public good.

So what makes Fidji Simo so uniquely qualified to lead this group? Why am I rather skeptical of the ‘public good’ goal?

Well, uh, you see…

o3: Why Simo?

Track Record: Scaled Facebook mobile ads and video products to billions; took Instacart public and ran it profitably. High-tempo shipping culture, ad/commerce monetization savvy.

Practical Consequences:

  • Shipping velocity up (70 %): Simo’s “move fast, instrument everything” style means faster iteration and more aggressive A/B tests across ChatGPT and Sora. Expect weekly feature drops rather than monthly.

  • Consumer social & commerce push (60 %): BI leak says OpenAI is toying with an “X-like” social layer; Simo’s Facebook DNA makes that more likely. (Business Insider)

  • Enterprise sales professionalized (80 %): Dedicated P&L, quota-carrying teams, region leads. Will feel more like Salesforce than a research lab.

  • Eventual structural separation (40 %): Creating a self-contained Applications org makes a future spin-out or IPO easier if the nonprofit governance demands it.

  • Internal culture split risk (30 %): Research may bristle at a metrics-driven product org; Altman will have to arbitrate.

Zvi Mowshowitz: Am I the only one who sees ‘OpenAI hired the person who optimized the biggest social network for ad revenue to run its product division’ and thinks ‘oh no’?

o3: No, you’re not the only one.

Core worry

Fidji Simo’s super-power is squeezing revenue by relentlessly tuning engagement loops and ad yield—skills she honed running Facebook’s News Feed and mobile ads after the 2012 IPO.

Moving her into the top product seat at OpenAI makes a pivot toward attention-harvesting incentives plausible.

If you are telling me Fidji Simo is uniquely qualified to run your product division, you are telling me a lot about the intended form of your product division.

The best thing about most AI products so far, and especially about OpenAI until recently, is that they have firmly held the line against exactly the things we are talking about here. The big players have not gone in for engagement maximization, iterative A/B testing, Skinner boxing, advertising or even incidental affiliate revenue, ‘news feed’ or ‘for you’ algorithmic style products or other such predation strategies.

Combine the appointment of Simo, her new title of ‘CEO’ and her prior track record with the context of the announcement, which is about enabling ‘traditional’ company growth functions. Then add the recent incidents involving both o3 the Lying Liar and especially GPT-4o the absurd sycophant (which is very much still an absurd sycophant, merely modestly less absurd about it), incidents in large part caused by directly using A/B customer feedback in the post-training loop and choosing to maximize customer feedback KPIs over the warnings of internal safety testers. You can see why this seems like another ‘oh no’ moment.

Simo also comes from a ‘shipping culture.’ There is certainly a lot of space within AI where shipping it is great, but recently OpenAI has already shown itself prone to shipping frontier-pushing models or model updates far too quickly, without appropriate testing, and they are going to be releasing open reasoning models as well, where the cost of an error could be far higher than it was with GPT-4o, since such a release cannot be taken back.

I’m also slightly worried that Fidji Simo has explicitly asked for glazing from ChatGPT and then said its response was ‘spot on.’ Uh oh.

A final worry is this could be a prelude to spinning off the products division in a way that attempts to free it from nonprofit control. Watch out for that.

I do find some positive signs in Altman’s own intended new focus, with the emphasis on safety including with respect to superintelligence, although one must beware cheap talk:

Sam Altman: In addition to supporting Fidji and our Applications teams, I will increase my focus on Research, Compute, and Safety Systems, which will continue to report directly to me. Ensuring we build superintelligence safely and with the infrastructure necessary to support our ambitious goals. We remain one OpenAI.

Apple announces it is ‘exploring’ adding AI-powered search to its browser, and that web searches are down due to AI use. The result on the day, as of when I noticed this? AAPL -2.5%, GOOG -6.5%. Seriously? I knew the EMH was false but not that false, damn, ever price anything in? I treat this move as akin to ‘Chipotle shares rise on news people are exploring eating lunch.’ I really don’t know what you were expecting? For Apple not to ‘explore’ adding AI search as an option on Safari, or customers not to do the same, would be complete lunacy.

Apple and Anthropic are teaming up to build an AI-powered ‘vibe-coding’ platform, as a new version of Xcode. Apple is wisely giving up on doing the AI part of this itself, at least for the time being.

From Mark Bergen and Omar El Chmouri at Bloomberg: ‘Mideast titans,’ especially the UAE, step back from building homegrown AI models, as has almost everyone other than the USA and China. Remember UAE’s Falcon? Remember when Aleph Alpha was used as a reason for Germany to oppose regulating frontier AI models? They’re no longer trying to make one. What about Mistral in France? Little technical success, traction or developer interest.

The pullbacks seem wise given the track record. You either need to go all out and try to be actually competitive with the big boys, or you want to fold on frontier models, and at most do distillations for customized smaller models that reflect your particular needs and values. Of course, if VC wants to fund Mistral or whomever to keep trying, I wouldn’t turn them down.

OpenAI buys Windsurf (a competitor to Cursor) for $3 billion.

Parloa, who are attempting to build AI agents for customer service functions, raises $120 million at $1 billion valuation.

American VCs line up to fund Manus at a $500 million valuation. So Manus is technically Chinese but it’s not marketed in China, it uses an American AI at its core (Claude) and it’s funded by American VC. Note that new AI companies without products can often get funded at higher valuations than this, so it doesn’t reflect that much investor excitement given how much we’ve had to talk about it. As an example, the previous paragraph was the first time I’d seen or typed ‘Parloa,’ and they’re a competitor to Manus with double the valuation.

Ben Thompson (discussing Microsoft earnings): Everyone is very excited about the big Azure beat, but CFO Amy Hood took care to be crystal clear on the earnings call that the AI numbers, to the extent they beat, were simply because a bit more capacity came on line earlier than expected; the actual beat was in plain old cloud computing.

That’s saying that Microsoft is at capacity. That’s why they can beat earnings in AI by expanding capacity, as confirmed repeatedly by Bloomberg.

Metaculus estimate for date of first ‘general AI system to be devised, tested and publicly announced’ has recently moved back to July 2034 from 2030. The speculation is this is largely due to o3 being disappointing. I don’t think 2034 is a crazy estimate but this move seems like a clear overreaction if that’s what this is about. I suspect it is related to the tariffs as economic sabotage?

Paul Graham speculates (it feels like not for the first time, although he says that it is) that AI will cause people to lose the ability to write, causing people to then lose everything that comes with writing.

Paul Graham: Schools may think they’re going to stem this tide, but we should be honest about what’s going to happen. Writing is hard and people don’t like doing hard things. So adults will stop doing it, and it will feel very artificial to most kids who are made to.

Writing (and the kind of thinking that goes with it) will become like making pottery: little kids will do it in school, a few specialists will be amazingly good at it, and everyone else will be unable to do it at all.

You think there are going to be schools?

Daniel Jeffries: This is basically the state of the world already so I don’t see much of a change here. Very few people write and very few folks are good at it. Writing emails does not count.

Sang: PG discovering superlinear returns for prose

Short of fully transformative AI (in which case, all bets are off and thus can’t be paid out) people will still learn to text and to write emails and do other ‘short form’ because prompting even the perfect AI isn’t easier or faster than writing the damn thing yourself, especially when you need to be aware of what you are saying.

As for longer form writing, I agree with the criticisms that most people already don’t know how to do it. So the question becomes, will people use the AI as a reason not to learn, or as a way to learn? If you want it to, AI will be able to make you a much better writer, but if you want it to it can also write for you without helping you learn how. It’s the same as coding, and also most everything else.

I found it illustrative that this was retweeted by Gary Marcus:

Yoavgo: “LLM on way to replace doctors” gets published in Nature.

meanwhile “LLM judgement not as good as human MDs” gets a spot in “Physical Therapy and Rehabilitation Journal”.

I mean, yes, obviously. The LLMs are on the way to being better than doctors and replacing them, but for now are in some ways not as good as doctors. What’s the question?

Rodney Brooks draws ‘parallels between generative AI and humanoid robots,’ saying both are overhyped and calling out their ‘attractions’ and ‘sins’ and ‘fantasy,’ such as the ‘fallacy of exponentialism.’ This convinced me to update – that I was likely underestimating the prospects for humanoid robots.

Are we answering the whole ‘AGI won’t much matter because diffusion’ attack again?

Sigh, yes, I got tricked into going over this again. My apologies.

Seriously, most of you can skip this section.

Zackary Kallenborn (referring to the new paper from AI Snake Oil): Excellent paper. So much AGI risk discussion fails to consider the social and economic context of AI being integrated into society and economies. Major defense programs, for example, are often decades-long. Even if AGI was made tomorrow, it might not appear in platforms until 2050.

Like, the F-35 contract was awarded in 2001 after about a decade or two of prototyping. The F-35C, the naval variant, saw its *first* forward deployment literally 20 years later in 2021.

Someone needs to play Hearts of Iron, and that someone works at the DoD. If AGI was made tomorrow at a non-insane price and our military platforms didn’t incorporate it for 25 years, or hell even if current AI doesn’t get incorporated for 25 years, I wouldn’t expect to have a country or a military left by the time that happens, and I don’t even mean because of existential risk.

The paper itself is centrally a commentary on what the term ‘AGI’ means, plus their expectation that you can make smarter-than-human things capable of all digital tasks and that these will only ‘diffuse’ over the course of decades, similarly to other techs.

I find it hard to take seriously people saying ‘because diffusion takes decades’ as if it is a law of nature, rather than a property of the particular circumstances. Diffusion sometimes happens very quickly, as it does in AI and much of tech, and it will happen a lot faster with AI being used to do it. Other times it takes decades, centuries or millennia. Think about the physical things involved – which is exactly the rallying cry of those citing diffusion and bottlenecks – but also think about the minds and capabilities involved, take the whole thing seriously, and actually consider what happens.

The essay is also about the question of whether ‘o3 is AGI,’ which it isn’t, but which they take seriously as part of the ‘AGI won’t be all that’ attack. Their central argument relies on AGI not having a strong threshold effect. There isn’t a bright line where something is suddenly AGI the way something is suddenly a nuclear bomb. The line is indeed not that obvious, but the threshold effects are still there and very strong, as AI becomes sufficiently capable at various tasks and purposes.

The reason we define AGI as roughly ‘can do all the digital and cognitive things humans can do’ is because that is obviously over the threshold where everything changes, because the AGIs can then be assigned and hypercharge the digital and cognitive tasks, which then rapidly includes things like AI R&D and also enabling physical tasks via robotics.

The argument here also relies upon the idea that this AGI would still ‘fail badly at many real-world tasks.’ Why?

Because they don’t actually feel the AGI in this, I think?

One definition of AGI is AI systems that outperform humans at most economically valuable work. We might worry that if AGI is realized in this sense of the term, it might lead to massive, sudden job displacement.

But humans are a moving target. As the process of diffusion unfolds and the cost of production (and hence the value) of tasks that have been automated decreases, humans will adapt and move to tasks that have not yet been automated.

The process of technical advancements, product development, and diffusion will continue.

That not being how any of this works with AGI is the whole point of AGI!

If you have an ‘ordinary’ AI, or any other ‘mere tool,’ and you use it to automate my job, I can move on to a different job.

If you have a mind (digital or human) that can adjust the same way I can, only superior in every way, then the moment I find a new job, then you go ahead and take that too.


That’s why I say I expect unemployment from AI to not be an issue for a while, until suddenly it becomes a very big issue. It becomes an issue when the AI also quickly starts taking that new job you switched into.

The rest of the sections are, translated into my language, ‘unlimited access to more capable digital minds won’t rapidly change the strategic balance or world order,’ ‘there is no reason to presume that unlimited amounts of above human cognition would lead to a lot of economic growth,’ and ‘we will have strong incentive to stay in charge of these new more capable, more competitive minds so there’s no reason to worry about misalignment risks.’

Then we get, this time as a quote, “AGI does not imply impending superintelligence.”

Except, of course it probably does. If you have tons of access to superior minds to point toward the problem, you are going to get ASI soon; how are we still having this conversation? No, it can’t be ‘arbitrarily accelerated’ in the sense that it doesn’t pop out in five seconds, so if the goalposts have changed such that a year later isn’t ‘soon’ then okay, sure, fine, whatever. But soon in any ordinary sense.

Ultimately, the argument is that AGI isn’t ‘actionable’ because there is no clear milestone, no fixed point.

That’s not an argument for not taking action. That’s an argument for taking action now, because there will never be a clear later time for action. If you don’t want to use the term AGI (or transformative AI, or anything else proposed so far) because they are all conflated or confusing, all right, that’s fine. We can use different terms, and I’m open to suggestions. The thing in question is still rapidly happening.

As a simple, highly flawed but illustrative metaphor, say you’re a professional baseball shortstop. Your organization has an unlimited supply of identical superstar 18-year-olds training at every position. They are rapidly getting better, they’re best at playing shortstop, and for now they’re relatively lousy pitchers.

You never know for sure when they’re better than you at any given task or position, the statistics are always noisy, but at some point it will be obvious in each case.

So at some point, they’ll be better than you at shortstop. Then at some point after that, the gap is clear enough that the manager will give them your job. You switch to third base. A new guy replaces you there, too. You switch to second. They take that. You go to the outfield. Whoops. You learn how to pitch, that’s all that’s left, you invent new pitches, but they copy those and take that too. And everything else you try. Everywhere.

Was there any point at which the new rookies ‘were AGI’? No. But so what? You’re now hoping your savings let the now retired you sit in the stands and buy concessions.

Trump administration reiterates that it plans to change and simplify the export control rules on chips, and in particular to ease restrictions on the UAE, potentially during his visit next week. This is also mentioned:

Stephanie Lai and Mackenzie Hawkins (Bloomberg): In the immediate term, though, the reprieve could be a boon to companies like Oracle Corp., which is planning a massive data center expansion in Malaysia that was set to blow past AI diffusion rule limits.

If I found out the Malaysian data centers are not largely de facto Chinese data centers, I would be rather surprised. This is exactly the central case of why we need the new diffusion rules, or something with similar effects.

This is certainly one story you can tell about what is happening:

Ian Sams: Two stories, same day, I’m sure totally unrelated…

NYT: UAE pours $2 billion into Trump crypto coins

Bloomberg: Trump White House may ease restrictions on selling AI chips to UAE.

Tao Burga of IFP has a thread reiterating that we need to preserve the point of the rules, and ways we might go about doing that.

Tao Burga: The admin should be careful to not mistake simplicity for efficiency, and toughness for effectiveness. Although the Diffusion Rule makes rules “more complex,” it would simplify compliance and reduce BIS’s paperwork through new validated end-user programs and license exceptions.

Likewise, the most effective policies may not be the “tough” ones that “ban” exports to whole groups of countries, but smart policies that address the dual-use nature of chips, e.g., by incentivizing the use of on-chip location verification and rule enforcement mechanisms.

We can absolutely improve on the Biden rules. What we cannot afford to do is to replace them with rules that are simplified or designed to be used for leverage elsewhere, in ways that make the rules ineffective at their central purpose of keeping AI compute out of Chinese hands.

Nvidia is going all-in on ‘if you don’t sell other countries equal use of your key technological advantage then you will lose your key technological advantage.’ Nvidia even goes so far as to say Anthropic is telling ‘tall tales’ (without, of course, saying which specific claims they believe are false, only asserting without evidence the height of those tales), which is rich coming from a company that says China is ‘not behind on AI’ and also that if America does not let Nvidia sell its advanced chips to China, America will lose its lead.

Want sane regulations for the department of housing and urban development and across the government? So do I. Could AI help rewrite the regulations? Absolutely. Would I entrust this job to an undergraduate at DOGE with zero government experience? Um, no, thanks. The AI is a complement to actual expertise, not something to trust blindly, surely we are not this foolish. I mean, I’m not that worried the changes will actually stick here, but good wowie moment of the week candidate.

Indeed, I am far more worried this will give ‘AI helps rewrite regulations’ an even worse name than it already has.

Our immigration policies are now sufficiently hostile that we have gone from the AI talent magnet of the world to no longer being a net attractor of talent:

This isn’t a uniquely Trump administration phenomenon; most of the problem happened under Biden, although it is no doubt rapidly getting worse, including one case I personally know of where someone highly talented in AI emigrated from America directly due to new policy.

UK AISI continues to do actual work, publishes their first research agenda.

UK AISI: We’re prioritising key risk domain research, including:

📌How AI can enable cyber-attacks, criminal activity and dual-use science

📌Ensuring human oversight of, and preventing societal disruption from, AI

📌Understanding how AI influences human opinions

📒 The agenda sets out how we’re building the science of AI risk by developing more rigorous methods to evaluate models, conducting risk assessments, and ensuring we’re testing the ceiling of AI capabilities of today’s models.

A key focus of the Institute’s new Research Agenda is developing technical solutions to reduce the most serious risks from frontier AI.

We’re pursuing technical research to ensure AI remains under human control, is aligned to human values, and robust against misuse.

We’re moving fast because the technology is too⚡

This agenda provides a snapshot of our current thinking, but it isn’t just about what we’re working on, it’s a call to the wider research community to join us in building shared rigour, tools, & solutions to AI’s security risks.

[Full agenda here.]

I often analyze various safety and security (aka preparedness) frameworks and related plans. One problem is that the red lines they set don’t stay red and aren’t well defined.

Jeffrey Ladish: One of the biggest bottlenecks to global coordination is the development of clear AI capability red lines. There are obviously AI capabilities that would be too dangerous to build at all right now if we could. But it’s not at all obvious exactly when things become dangerous.

There are obviously many kinds of AI capabilities that don’t pose any risk of catastrophe. But it’s not obvious exactly which AI systems in the future will have this potential. It’s not merely a matter of figuring out good technical tests to run. That’s necessary also, but…

We need publicly legible red lines. A huge part of the purpose of a red line is that it’s legible to a bunch of different stakeholders. E.g. if you want to coordinate around avoiding recursive-self improvement, you can try to say “no building AIs which can fully automate AI R&D”

But what counts as AIs which can fully automate AI R&D? Does an AI which can do 90% of what a top lab research engineer can do count? What about 99%? Or 50%?

I don’t have a good answer for this specific question nor the general class of question. But we need answers ASAP.

I don’t sense that OpenAI, Google or Anthropic has confidence in what does or doesn’t, or should or shouldn’t, count as a dangerous capability, especially in the realm of automating AI R&D. We use vague terms like ‘substantial uplift’ and provide potential benchmarks, but it’s all very dependent on spirit of the rules at best. That won’t fly in crunch time. Like Jeffrey, I don’t have a great set of answers to offer on the object level.

What I do know is that I don’t trust any lab not to move the goalposts around to find a way to release, if the question is at all fudgeable in this fashion and the commercial need looks strong. I do think that if something is very clearly over the line, there are labs that won’t pretend otherwise.

But I also know that all the labs intend to respond to crossing the red lines with (as far as we can see, relatively mundane and probably not so effective) mitigations or safeguards, rather than a ‘no, just no, until we figure out something a lot better.’ That won’t work.

Want to listen to my posts instead of read them?

Thomas Askew offers you a Podcast feed for that with richly voiced AI narrations. You can donate to help out that effort here, the AI costs and time commitment do add up.

Jack Clark goes on Conversations With Tyler, self-recommending.

Tristan Harris TED talks the need for a ‘narrow path’ between diffusion of advanced AI versus concentrated power of advanced AI. Humanity needs to have enough power to steer, without that power being concentrated ‘in the wrong hands.’ The default path is insane, and coordination away from it is hard, but possible, and yes there are past examples. The step where we push back against fatalism and ‘inevitability’ remains the only first step. Alas, like most others he doesn’t have much to suggest for steps beyond that.

The SB 1047 mini-movie is finally out. I am in it. Feels so long ago, now. I certainly think events have backed up the theory that if this opportunity failed, we were unlikely to get a better one, and the void would be filled by poor or inadequate proposals. SB 813 might be net positive but ultimately it’s probably toothless.

The movies got into the act with Thunderbolts*. Given that their track record the last few years has been so bad I stopped watching most Marvel movies, I did not expect this to be anything like as good as it was, or that it would (I assume fully unintentionally) be a very good and remarkably accurate movie about AI and many of the associated dynamics, in addition to the themes like depression, friendship and finding meaning that are its text. Great joy, 4.5/5 stars if you’ve done your old school MCU homework on the characters (probably 3.5 if you’re going in completely blind, including on the comics?).

Jesse Hoogland coins ‘the sweet lesson’ that AI safety strategies only count if they scale with compute. As in, as we scale up all the AIs involved, the strategy at least keeps pace, and ideally grows stronger. If that’s not true, then your strategy is only a short term mundane utility strategy, full stop.

Ah, the New Yorker essay by someone bragging about how they have never used ChatGPT, bringing very strong opinions about generative AI and how awful it is.

Okay, this is actually a great point:

Aiden McLaughlin: i love people who in the same breath say “if you showed o3 to someone in 2020 they would’ve called it agi” and then go on to talk about the public perception discontinuity they expect in 2027.

always remember that our perception of progress is way way smoother than anyone expects;

Except, hang on…

Aiden McLaughlin (continuing): i’m quite critical of any forecast that centers on “and then the agi comes out and the world blows up”

Those two have very little to do with each other. I think it’s a great point that looking for a public perception discontinuity, where everyone points and suddenly says ‘AGI!’ runs hard into this critique, with caveats.

The first thing is, reality does not have to care what you think of it. If AGI would indeed blow the world up, then we have ‘this seems like continuous progress, I said, as my current arrangement of atoms was transformed into something else that did not include me,’ with or without involving drones or nanobots.

Even if we are talking about a ‘normal’ exponential, remember that week in 2020?

Which leads into the second thing: public perception of many things is often continuous and mostly oblivious until suddenly it isn’t. As in, there was a lot of AI progress before ChatGPT, then that came out and then wham. There’s likely going to be another ‘ChatGPT’ moment for agents, and one for the first Siri-Alexa-style thing that actually works. Apple Intelligence was a miss but that’s because it didn’t deliver. Issues simmer until they boil over. Wars get declared overnight.

And what is experienced as a discontinuity, of perception or of reality, doesn’t have to mostly be overnight, it can largely be over a period of months or more, and doesn’t even have to technically be discontinuous. Exponentials are continuous but often don’t feel that way. We are already seeing wildly rapid diffusion and accelerating progress even if it is technically ‘continuous’ and that’s going to be more so once the AIs count as meaningful optimization engines.

Arvind Narayanan and Ajeya Cotra have a conversation in Asterisk magazine. As I expected, this is a much better discussion than the usual, especially given Arvind’s willingness to state what evidence would change his mind on expected diffusion rates, but I found much of it extremely frustrating. Such as this, offered as illustrative:

Arvind: Many of these capabilities that get discussed — I’m not even convinced they’re theoretically possible. Running a successful company is a classic example: the whole thing is about having an edge over others trying to run a company. If one copy of an AI is good at it, how can it have any advantage over everyone else trying to do the same thing? I’m unclear what we even mean by the capability to run a company successfully — it’s not just about technical capability, it’s about relative position in the world.

This seems like Arvind is saying that AI in general can’t ever systematically run companies successfully because it would be up against other companies that are also run by similar AIs, so its success rate can’t be that high? And well, okay, sure I guess? But what does that have to do with anything? That’s exactly the world being envisioned – that everyone has to turn their company over to AI, or they lose. It isn’t a meaningful claim about what AI ‘can’t do,’ what it can’t do in this claim is be superior to other copies of itself.

Arvind then agrees, yes, we are headed for a world of universal deference to AI models, but he’s not sure it’s a ‘safety risk.’ As in, we will turn over all our decision making to AIs, and what, you worried bro?

I mean, yes, I’m very worried about that, among other things.

As another example:

Arvind: There is a level of technological development and societal integration that we can’t meaningfully reason about today, and a world with entirely AI-run companies falls in that category for me. We can draw an analogy with the industrial revolution — in the 1760s or 1770s it might have been useful to try to think about what an industrial world would look like and how to prepare for it, but there’s no way you could predict electricity or computers.

In other words, it’s not just that it’s not necessary to discuss this future now, it is not even meaningfully possible because we don’t have the necessary knowledge to imagine this future, just like pre-vs-post industrialization concerns.

The implication is then, since we can’t imagine it, we shouldn’t worry about it yet. Except we are headed straight towards it, in a way that may soon make it impossible to change course, so yes we need to think about it now. It’s rather necessary. If we can’t even imagine it, then that means it will be something we can’t imagine, and no I don’t think that means it will probably be fine. Besides, we can know important things about it without being able to imagine it, such as the above agreement that AI will by default end up making all the decisions and having control over this future.

The difference with the Industrial Revolution is that there we could steer events later, after seeing the results. Here, by default, we likely can’t. And also, it’s crazy to say that if you lived before the Industrial Revolution you couldn’t say many key things about that future world, and plan for it and anticipate it. As an obvious example, consider the US Constitution and system of government, which very much had to be designed to adapt to things like the Industrial Revolution without knowing its details.

Then there’s a discussion of whether it makes sense to have the ability to pause or restrict AI development, which we need to do in advance of there being a definitive problem because otherwise it is too late, and Arvind says we can’t do it until after we have definitive evidence of specific problems already. Which means it will 100% be too late – the proof that satisfies his ask is a proof that you needed to do something at least a year or two ago, so I guess we finished putting on all the clown makeup, any attempt to give us such abilities only creates backfire, and so on.

So, no ability to steer the future until it is too late to do so, then.

Arvind is assuming progression will be continuous, but even if this is true, that doesn’t mean utilization and realization won’t involve step jumps, and also that scaffolding won’t enable a bunch of progression off of existing available models. So again, essentially zero chance we will be able to steer until we notice it is too late.

This was perhaps the best exchange:

Arvind: This theme in your writing about AI as a drop-in replacement for human workers — you acknowledge the frontier is currently jagged but expect it to smooth out. Where does that smoothing come from, rather than potentially increasing jaggedness? Right now, these reasoning models being good at domains with clear correct answers but not others seems to be increasing the jaggedness.

Ajeya: I see it as continued jaggedness — I’d have to think harder about whether it’s increasing. But I think the eventual smoothing might not be gradual — it might happen all at once because large AI companies see that as the grand prize. They’re driving toward an AI system that’s truly general and flexible, able to make novel scientific discoveries and invent new technologies — things you couldn’t possibly train it on because humanity hasn’t produced the data. I think that focus on the grand prize explains their relative lack of effort on products — they’re putting in just enough to keep investors excited for the next round. It’s not developing something from nothing in a bunker, but it’s also not just incrementally improving products. They’re doing minimum viable products while pursuing AGI and artificial superintelligence.

It’s primarily about company motivation, but I can also see potential technical paths — and I’m sure they’re exploring many more than I can see. It might involve building these currently unreliable agents, adding robust error checking, training them to notice and correct their own errors, and then using RL across as many domains as possible. They’re hoping that lower-hanging fruit domains with lots of RL training will transfer well to harder domains — maybe 10 million reps on various video games means you only need 10,000 data points of long-horizon real-world data to be a lawyer or ML engineer instead of 10 million. That’s what they seem to be attempting, and it seems like they could succeed.

Arvind: That’s interesting, thank you.

Ajeya: What’s your read on the companies’ strategies?

Arvind: I agree with you — I’ve seen some executives at these companies explicitly state that strategy. I just have a different take on what constitutes their “minimum” effort — I think they’ve been forced, perhaps reluctantly, to put much more effort into product development than they’d hoped.

It is a highly dangerous position we are in, likely to result in highly discontinuous felt changes, to have model capabilities well ahead of product development, especially with open models not that far behind in model capabilities.

If OpenAI, Anthropic or Google wanted to make their AI a better or more useful consumer product, to have it provide better mundane utility, they would do a lot more of the things a product company would do. They don’t do that much of it. OpenAI is trying to also become a product company, but that’s going slowly, and this is why for example they just bought Windsurf. Anthropic is fighting it every step of the way. Google of course does create products, but DeepMind hates the very concept of products, and Google is a fundamentally broken company, so the going is tough.

I actually wish they’d work a lot harder on their product offerings. A lot of why it’s so easy for many to dismiss AI, and to expect such slow diffusion, is because the AI companies are not trying to enable that diffusion all that hard.

From last week, Anthropic CEO Dario Amodei wrote The Urgency of Interpretability. I certainly agree with the central claim that we are underinvesting in mechanistic interpretability (MI) in absolute terms. It would be both good for everyone and good for the companies and governments involved if they invested far more. I do not however think we are underinvesting in MI relative to other potential alignment-related investments.

He says that the development of AI is inevitable (well, sure, with that attitude!).

Ben Pace (being tough but fair): I couldn’t get two sentences in without hitting propaganda, so I set it aside. But I’m sure it’s of great political relevance.

I don’t think that propaganda must necessarily involve lying. By “propaganda,” I mean aggressively spreading information or communication because it is politically convenient / useful for you, regardless of its truth (though propaganda is sometimes untrue, of course).

Harlan Stewart: “The progress of the underlying technology is inexorable, driven by forces too powerful to stop”

Yeah Dario, if only you had some kind of influence over the mysterious unstoppable forces at play here

Dario does say that he thinks AI can be steered before models reach an overwhelming level of power, which implies where he thinks this inevitably goes. And Dario says he has increasingly focused on interpretability as a way of steering. Whereas by default, we have very little idea what AIs are going to do or how they work or how to steer.

Dario Amodei: Chris Olah is fond of saying, generative AI systems are grown more than they are built—their internal mechanisms are “emergent” rather than directly designed. It’s a bit like growing a plant or a bacterial colony: we set the high-level conditions that direct and shape growth, but the exact structure which emerges is unpredictable and difficult to understand or explain.

Many of the risks and worries associated with generative AI are ultimately consequences of this opacity, and would be much easier to address if the models were interpretable.

Dario buys into what I think is a terrible and wrong frame here:

But by the same token, we’ve never seen any solid evidence in truly real-world scenarios of deception and power-seeking because we can’t “catch the models red-handed” thinking power-hungry, deceitful thoughts. What we’re left with is vague theoretical arguments that deceit or power-seeking might have the incentive to emerge during the training process, which some people find thoroughly compelling and others laughably unconvincing.

Honestly I can sympathize with both reactions, and this might be a clue as to why the debate over this risk has become so polarized.

I am sorry, but no. I do not sympathize, and neither should he. These are not ‘vague theoretical arguments’ that these things ‘might’ have the incentive to emerge, not at this point. Sure, if your livelihood depends on seeing them that way, you can squint. But by now that has to be rather intentional on your part, if you wish to not see it.

Daniel Kokotajlo: I basically agree & commend you for writing this.

My only criticism is that I feel like you downplayed the deception/scheming stuff too much. Currently deployed models lie to their users every day! They also deliberately reward hack!

On the current trajectory the army of geniuses in the data center will not be loyal/controlled. Interpretability is one of our best bets for solving this problem in a field crowded with merely apparent solutions.

Ryan Greenblatt: Do you agree that “we are on the verge of cracking interpretability in a big way”? This seems very wrong to me and is arguably the thesis of the essay.

Daniel Kokotajlo: Oh lol I don’t agree on that either but Dario would know better than me there since he has inside info + it’s unclear what that even means, perhaps it is just hype-speak for “stay tuned for our next exciting research results.” But yeah that seems like probably an overclaim to me.

Ryan Greenblatt: I do not think Dario would know better than you due to inside info.

Dario is treating those skeptical objections as deserving a presumption of seriousness and good faith that they, frankly, do not deserve at this point, and Anthropic’s policy team is doing the same, only more so, in ways that have real consequences.

Would interpretability help us prove this in a way that a lot more people would be unable to ignore? Yes, that would be very helpful, but let’s not play pretend that the evidence is not already there.

The second section, a brief history of mechanistic interpretability, seems solid.

The third section, on how to use interpretability, is a good starter explanation, although I notice it is insufficiently paranoid about accidentally using The Most Forbidden Technique.
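
To make that worry concrete: the failure mode is using your interpretability tools as a training signal, so the model learns to look clean to the tools rather than to stop doing the thing. Below is a minimal illustrative sketch of my own (not from the essay; every name in it is a hypothetical stand-in) of what that mistake looks like in code.

```python
# Illustrative sketch only: what training against an interpretability signal
# (The Most Forbidden Technique) would look like. All components are toy
# stand-ins, not anyone's actual setup.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)   # stand-in for the network being trained
probe = nn.Linear(16, 1)    # stand-in for a frozen "deception" probe
for p in probe.parameters():
    p.requires_grad_(False)  # the probe itself is not being updated

opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def task_loss(acts: torch.Tensor) -> torch.Tensor:
    return acts.pow(2).mean()  # placeholder for the real task objective

for _ in range(100):
    x = torch.randn(32, 16)
    acts = model(x)
    # The forbidden step: penalize whatever the probe flags. Gradient descent
    # now pushes the activations to look innocent to the probe, which destroys
    # the signal without necessarily removing the underlying behavior.
    loss = task_loss(acts) + 0.1 * torch.sigmoid(probe(acts)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```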

Also, frankly, I think David is right here:

David Manheim: Quick take: it’s focused on interpretability as a way to solve prosaic alignment, ignoring the fact that prosaic alignment is clearly not scalable to the types of systems they are actively planning to build.

(And it seems to actively embrace the fact that interpretability is a capabilities advantage in the short term, but pretends that it is a safety thing, as if the two are not at odds with each other when engaged in racing dynamics.)

Because they are all planning to build agents that will have optimization pressures, and RL-type failures apply when you build RL systems, even if it’s on top of LLMs.

That doesn’t mean interpretability can’t help you do things safely. It absolutely can. Building intermediate safe systems you can count on is extremely helpful in this regard, and you will learn a lot both from figuring out how to do interpretability and from the results that you find. It’s just not the solution you think it is.

Then we get to the question of What We Can Do. Dario expects an ‘MRI for AI’ to be available within 5-10 years, but expects his ‘country of geniuses in a datacenter’ within 1-2 years, so of course you can get pretty much anything in 3-8 more years after that, and it will be 3-8 years too late. We’re going to have to pick up the pace.

The essay doesn’t say how these two timelines interact in Dario’s model. If we don’t get the geniuses in the datacenter for a while, do we still get interpretability in 5-10 years? Is that the timeline without the Infinite Genius Bar, or with it? They imply very different strategies.

  1. His first suggestion is the obvious one, which is to work harder and spend more resources directly on the problem. He tries to help by pointing out that being able to explain what your model does and why is a highly profitable ability, even if it is only used to explain things to customers and put them at ease.

  2. Governments can ‘use light-touch rules’ to encourage the development of interpretability research. Of course they could also use heavy-touch rules, but Anthropic is determined to act as if those are off the table across the board.

  3. Export controls can ‘create a “security buffer” that might give interpretability more time.’ This implies, as he notes, the ability to then ‘spend some of our lead’ on interpretability work or otherwise stall at a later date. This feels a bit shoehorned given the insistence on only ‘light-touch’ rules, but okay, sure.

Ryan Greenblatt: Ironically, arguably the most important/useful point of the essay is arguing for a rebranded version of the “precisely timed short slow/pause/pivot resources to safety” proposal. Dario’s rebranded it as spending down a “security buffer”.

(I don’t have a strong view on whether this is a good rebrand, seems reasonable to me I guess and the terminology seems roughly as good for communicating about this type of action.)

I think that would be a reasonable rebrand if it were bought into properly.

Mostly the message is simple and clear: Get to work.

Neel Nanda: Mood.

[Quotes Dario making an understatement: These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work.]

Great post, highly recommended!

The world should be investing far more into interpretability (and other forms of safety). As scale makes many parts of AI academia increasingly irrelevant, I think interpretability remains a fantastic place for academics to contribute.

I also appreciate the shout out to the bizarre rejection of our second ICML mechanistic interpretability workshop. Though I generally assume the reviewing process is approximately random and poorly correlated with quality, rather than actively malicious.

Ryan Greenblatt: I agree that the world should invest more in interp (and safety) and academics can contribute. However, IMO the post dramatically overstates the promise of mech interp in short timelines by saying things like: “we are on the verge of cracking interpretability in a big way”.

Neel Nanda: I was expecting to be annoyed by this, but actually thought the post was surprisingly reasonable? I interpreted it as:

  1. Given 5-10 years we might crack it in a big way

  2. We may only have 2 years, which is not enough

  3. IF we get good at interp it would be a really big deal

  4. So we should invest way more than we currently are

I’m pretty on board with this, modulo concerns around opportunity costs. But I’m unconvinced it funges that much in the context of responding to a post like this; I think the effect of this post is more likely to be raising interp investment than reallocating scarce safety resources towards interp?

Neel Nanda (later): New post: I’m all for investment in interpretability but IMO this overstates its importance vs other safety methods.

I disagree that interp is the only path to reliable safeguards on powerful AI. IMO high reliability is implausible by any means and interp’s role is in a portfolio.

I agree with Neel Nanda that the essay is implicitly presenting the situation as if interpretability would be the only reliable path forward for detecting deception in advanced AI. Dario is saying it is both necessary and sufficient, whereas I would say it is neither obviously necessary nor sufficient. As Neel says, ‘high reliability seems unattainable’ using anything like current methods.

Neel suggests a portfolio approach. I agree we should be investing in a diverse portfolio of potential approaches, but I am skeptical that we can solve this via a kind of ‘defense in depth’ when up against highly intelligent models. That can buy you some time on the margin, which might be super valuable. But ultimately, I think you will need something we haven’t figured out yet, and I am hoping such a thing exists in effectively searchable space.

(And I think relying heavily on defense-in-depth with insufficiently robust individual layers is a good way to suddenly lose out of nowhere when threshold effects kick in.)

Neel lists reasons why he expects interpretability not to be reliable. I agree, and would emphasize the last one, that if we rely on interpretability we should expect sufficiently smart AI to obfuscate around our techniques, the same way humans have been growing steadily bigger brains and developing various cultural and physical technologies in large part so we can do this to each other and defend against others trying to do it to us.

As Miles says, so very far to go, but every little bit helps (also I am very confident the finding here is correct, but it’s establishing the right process that matters right now):

Miles Brundage: Most third party assessment of AI systems is basically “we got to try out the product a few days/weeks early.”

Long way to go before AI evaluation reaches the level of rigor of, say, car or airplane or nuclear safety, but this is a nice incremental step:

METR: METR worked with @amazon to pilot a new type of external review in which Amazon shared evidence beyond what can be collected via API, including information about training and internal evaluation results with transcripts, to inform our assessment of its AI R&D capabilities.

In this review, our objective was to weigh the evidence collected by Amazon about model capabilities against Amazon’s own Critical Capability Threshold as defined in its Frontier Model Safety Framework, rather than reviewing the threshold itself (see below).

After reviewing the evidence shared with us, we determined that Amazon has not crossed their Automated AI R&D Critical Capability Threshold for any of the models they have developed to date, regardless of deployment status.

Amazon Science: 🚀 Amazon Nova Premier, our most capable teacher model for creating custom distilled models, is now available on Amazon Bedrock!

Built for complex tasks like Retrieval-Augmented Generation (RAG), function calling, and agentic coding, its one-million-token context window enables analysis of large datasets while being the most cost-effective proprietary model in its intelligence tier.

Also, yes, it seems there is now an Amazon Nova Premier, but I don’t see any reason one would want to use it?
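
If you did want to kick the tires anyway, a minimal sketch of calling a Bedrock-hosted model through boto3’s converse API might look like the following; note that the model identifier string here is my assumption, not something confirmed in the announcement.

```python
# Minimal sketch: invoking a model on Amazon Bedrock via boto3's converse API.
# The modelId value is an assumed placeholder; check the Bedrock console for
# the actual Nova Premier identifier available in your region.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-premier-v1:0",  # assumed identifier
    messages=[{"role": "user", "content": [{"text": "Summarize this document."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```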

Some additional refinements to the emergent misalignment results. The effect is gradual, you can get it directly from base models, and you can also get it in reasoning models. Nothing I found surprising, but good to rule out alternatives.

Janus finds GPT-4-base does quite a lot of alignment faking.

MIRI is the original group worried about AI killing everyone. They correctly see this as a situation where by default AI kills everyone, and we need to take action so it doesn’t. Here they provide a handy chart of the ways they think AI might not kill everyone, as a way of explaining their new agenda.

MIRI: New AI governance research agenda from MIRI’s Technical Governance Team. We lay out our view of the strategic landscape and actionable research questions that, if answered, would provide important insight on how to reduce catastrophic and extinction risks from AI.

If anything this chart downplays how hard MIRI thinks this is going to be. It does however exclude an obvious path to victory, which is that an individual lab (rather than a national project) gets a decisive strategic advantage, either sharing it with the government or using it itself.

Most people don’t seem to understand how wild the coming few years could be. AI development, as fast as it is now, could quickly accelerate due to automation of AI R&D. Many actors, including governments, may think that if they control AI, they control the future.

The current trajectory of AI development looks pretty rough, likely resulting in catastrophe. As AI becomes more capable, we will face risks of loss of control, human misuse, geopolitical conflict, and authoritarian lock-in.

In the research agenda, we lay out four scenarios for the geopolitical response to advanced AI in the coming years. For each scenario, we lay out research questions that, if answered, would provide important insight on how to successfully reduce catastrophic and extinction risks.

Our favored scenario involves building the technical, legal, and institutional infrastructure required to internationally restrict dangerous AI development and deployment, preserving optionality for the future. We refer to this as an “off switch.”

We focus on an off switch since we believe halting frontier AI development will be crucial to prevent loss of control. We think skeptics of loss of control should value building an off switch, since it would be a valuable tool to reduce dual-use/misuse risks, among others.

Another scenario we explore is a US National Project—the US races to build superintelligence, with the goal of achieving a decisive strategic advantage globally. This risks both loss of control to AI and increased geopolitical conflict, including war.

Alternatively, the US government may largely leave the development of advanced AI to companies. This risks proliferating dangerous AI capabilities to malicious actors, faces similar risks to the US National Project, and overall seems extremely unstable.

In another scenario, described in Superintelligence Strategy, nations keep each other’s AI development in check by threatening to sabotage any destabilizing AI progress. However, visibility and sabotage capability may not be good enough, so this regime may not be stable.

Given the danger down all the other paths, we recommend the world build the capacity to collectively stop dangerous AI activities. However, it’s worth preparing for other scenarios. See the agenda for hundreds of research questions we want answered!

An off switch, let alone a halt, is going to be very difficult to achieve. It is going to be even harder the longer one waits to build towards it. It makes sense to build towards having that option while also pursuing other avenues. I support putting a lot of effort into creating the ability to pause. This is very different from advocating for actually halting (also called ‘pausing’) now.

Paul Tudor Jones said there is a 90% chance AI doesn’t even wipe out half of humanity, let alone all of it. What a relief.

Dave Karsten: Really interesting seeing how hedge fund folks have a mental framework for taking AI risk seriously.

Damian Tatum: I love to hear more people articulating the Normie Argument for AI Risk: “Look I’m not a tech expert but the actual experts keep telling us the stuff they’re doing could wipe out humanity and yet there are no rules and they aren’t stopping on their own, is anyone else worried?”

Paul Tudor Jones: All these folks in AI are telling us ‘We’re creating something that’s really dangerous’ … and yet we’re doing nothing right now. And it’s really disturbing.

Darkhorse (illustrative of how people will say completely opposite things like this about anyone, all the time, in response to any sane statement about risk; if anything, the central problem with hedge funds is that their incentives run in the opposite direction): Hedge fund folks have all to lose and little to gain.

Steely Dan Heatly: You are burying the lede. There’s a 10% chance AI wipes out half of humanity.

Joe Weisenthal: Yeah but a 90% chance that it doesn’t.

Hedge fund guys sometimes understand risk, including tail risk, and can have great practical ways of handling it. This kind of statement from Paul Tudor Jones is very much the basic normie argument that should be sufficient to carry the day. Alas.

On the contrary, it’s lack-of-empathy-as-a-service, and there’s a free version!

Olivia Moore: We now have empathy-as-a-service (for the low price of $20 / month!)

Dear [blue], I would like a more formal version, please. Best, [red].
