
Amid rising prices, Disney+ and Hulu offer subscribers some freebies

With streaming providers frequently raising prices, subscribers often feel like they’re paying more for the same service—or a lesser version, depending on what’s available to watch that month. In a unique move, Disney is introducing a small potential financial benefit for Disney+ and Hulu subscribers in the form of third-party discounts, freebies, trials, and contests.

As of today, Disney+ subscribers can log into the Disney+ Perks website with their streaming credentials to get access to a revolving selection of discounts and freebies. When I logged in today, I was met with several free trials, including a six-month trial of DoorDash’s premium subscription offering, a three-month trial of Clear+, and a two-month trial of Duolingo’s premium subscription.

Disney+ subscribers can also get discounts, including for Adidas’ online marketplaces and “select” Disney Resorts Collection hotels (if you stay at least two nights, with most availability falling between June 29 and July 31). There are also some free virtual rewards for Disney-owned games and the ability to enter sweepstakes, such as one for tickets to the premiere of the movie Freakier Friday.

Disney, which announced in November 2023 that it would take full control of Hulu from Comcast, said that Hulu-only subscribers will also get a perks program, starting on June 2. Those perks will differ from those of Disney+ and initially include chances to win tickets to Lollapalooza, San Diego Comic-Con, and Jimmy Kimmel Live; unspecified “perks” from Microsoft, LG, and others; and chances “to win items from and inspired by Hulu” originals, like The Handmaid’s Tale.


Gemini in Google Drive may finally be useful now that it can analyze videos

Google’s rapid adoption of AI has seen the Gemini “sparkle” icon become an omnipresent element in almost every Google product. It’s there to summarize your email, add items to your calendar, and more—if you trust it to do those things. Gemini is also integrated with Google Drive, where it’s gaining a new feature that could make it genuinely useful: Google’s AI bot will soon be able to watch videos stored in your Drive so you don’t have to.

Gemini is already accessible in Drive, with the ability to summarize documents or folders, gather and analyze data, and expand on the topics covered in your documents. Google says the next step is plugging videos into Gemini, saving you from wasting time scrubbing through a file just to find something of interest.

Using a chatbot to analyze and manipulate text doesn’t always make sense—after all, it’s not hard to skim an email or short document. It can take longer to interact with a chatbot, which might not add any useful insights. Video is different because watching is a linear process in which you are presented with information at the pace the video creator sets. You can change playback speed or rewind to catch something you missed, but that’s more arduous than reading something at your own pace. So Gemini’s video support in Drive could save you real time.

Suppose you have a recorded meeting in video form uploaded to Drive. You could go back and rewatch it to take notes or refresh your understanding of a particular exchange. Or, Google suggests, you can ask Gemini to summarize the video and tell you what’s important. This could be a great alternative, as grounding AI output with a specific data set or file tends to make it more accurate. Naturally, you should still maintain healthy skepticism of what the AI tells you about the content of your video.


AI #118: Claude Ascendant

The big news of this week was of course the release of Claude 4 Opus. I offered two review posts: one on safety and alignment, and one on mundane utility, plus a bonus fun post on Google’s Veo 3.

I am once again defaulting to Claude for most of my LLM needs, although I often will also check o3 and perhaps Gemini 2.5 Pro.

On the safety and alignment front, Anthropic did extensive testing, and reported on that testing in an exhaustive model card. A lot of people got very upset to learn that Opus could, if pushed too hard in the wrong situations, ones engineered for these results, do things like report your highly unethical actions to authorities or try to blackmail developers into not being shut down or replaced. It is good that we now know about these things, and it was quickly observed that similar behaviors can be induced in similar ways from ChatGPT (in particular o3), Gemini and Grok.

Last night DeepSeek gave us R1-0528, but it’s too early to know what we have there.

Lots of other stuff, as always, happened as well.

This weekend I will be at LessOnline at Lighthaven in Berkeley. Come say hello.

  1. Language Models Offer Mundane Utility. People are using them more all the time.

  2. Now With Extra Glaze. Claude has some sycophancy issues. ChatGPT is worse.

  3. Get My Agent On The Line. Suggestions for using Jules.

  4. Language Models Don’t Offer Mundane Utility. Okay, not shocked.

  5. Huh, Upgrades. Claude gets a voice, DeepSeek gives us R1-0528.

  6. On Your Marks. The age of benchmarks is in serious trouble. Opus good at code.

  7. Choose Your Fighter. Where is o3 still curiously strong?

  8. Deepfaketown and Botpocalypse Soon. Bot infestations are getting worse.

  9. Fun With Media Generation. Reasons AI video might not do much for a while.

  10. Playing The Training Data Game. Meta now using European posts to train AI.

  11. They Took Our Jobs. That is indeed what Dario means by bloodbath.

  12. The Art of Learning. Books as a way to force you to think. Do you need that?

  13. The Art of the Jailbreak. Pliny did the work once, now anyone can use it. Hmm.

  14. Unprompted Attention. Very long system prompts are bad signs for scaling.

  15. Get Involved. Softmax, Pliny versus robots, OpenPhil, RAND.

  16. Introducing. Google’s Lyria RealTime for music, Pliny has a website.

  17. In Other AI News. Scale matters.

  18. Show Me the Money. AI versus advertising revenue, UAE versus democracy.

  19. Nvidia Sells Out. Also, they can’t meet demand for chips. NVDA+5%.

  20. Quiet Speculations. Why is AI progress (for now) so unexpectedly even?

  21. The Quest for Sane Regulations. What would you actually do to benefit from AI?

  22. The Week in Audio. Nadella, Kevin Scott, Wang, Eliezer, Cowen, Evans, Bourgon.

  23. Rhetorical Innovation. AI blackmail makes it salient, maybe?

  24. Board of Anthropic. Is Reed Hastings a good pick?

  25. Misaligned! Whoops.

  26. Aligning a Smarter Than Human Intelligence is Difficult. Ems versus LLMs.

  27. Americans Do Not Like AI. No, seriously, they do not like AI.

  28. People Are Worried About AI Killing Everyone. Are you shovel ready?

  29. Other People Are Not As Worried About AI Killing Everyone. Samo Burja.

  30. The Lighter Side. I don’t want to talk about it.

The amount people use ChatGPT per day is on the rise.

This makes sense. It is a better product, with more uses, so people use it more, including to voice chat and create images. Oh, and also the sycophancy thing is perhaps driving user behavior?

Jonas Vollmer: Doctor friend at large urgent care: most doctors use ChatGPT daily. They routinely paste the full anonymized patient history (along with x-rays, etc.) into their personal ChatGPT account. Current adoption is ~frictionless.

I asked about data privacy concerns, their response: Yeah might technically be illegal in Switzerland (where they work), but everyone does it. Also, they might have a moral duty to use ChatGPT given how much it improves healthcare quality!

[Note that while it had tons of views, the vote count below is 13]:

Fabian: those doctors using chatGPT for every single patient – they are using o3, right?

not the free chat dot com right?

Aaron Bergman: I just hope they’re using o3!

Jonas Vollmer: They were not; I told them to!

In urgent care, you get all kinds of strange and unexpected cases. My friend had some anecdotes of ChatGPT generating hypotheses that most doctors wouldn’t know about, e.g. harmful alternative “treatments” that are popular on the internet. It helped diagnose those.

cesaw: As a doctor, I need to ask: Why? Are the other versions not private?

Fabian: Thanks for asking!

o3 is the best available and orders of magnitude better than the regular gpt. It’s like Dr House vs a random first-year residency doc

But it’s also more expensive (but worth it)

Dichotomy Of Man: 90.55 percent accurate for o3, 84.8 percent at the highest for gpt 3.5.

I presume they should switch over to Claude, but given they don’t even know to use o3 instead of GPT-4o (or worse!), that’s a big ask.

How many of us should be making our own apps at this point, even if we can’t actually code? The example app Jasmine Sun finds lets kids tap photos to call family members, which is easier to configure if you hardcode the list of people it can call.

David Perell shares his current thoughts on using AI in writing. He thinks writers are often way ahead of what is publicly known on this and are getting a lot out of it, and he is bullish on the reader experience and on good writers who write together with an AI retaining a persistent edge.

One weird note is David predicts non-fiction writing will be ‘like music’ in that no one cares how it was made. But I think that’s very wrong about music. Yes, there’s some demand for good music wherever it comes from, but whether the music is ‘authentic’ is also highly prized; even when it isn’t ‘authentic’ it has to align with the artist’s image, and you essentially had two or three markets in one already before AI.

Find security vulnerabilities in the Linux kernel. Wait, what?

Aidan McLaughlin (OpenAI): this is so cool.

Dean Ball: “…with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention.”

I mean yes objectively this is cool but that is not the central question here.

Evaluate physiognomy by uploading selfies and asking ‘what could you tell me about this person if they were a character in a movie?’ That’s a really cool prompt from Flo Crivello, because it asks what this would convey in fiction rather than in reality, which gets around various reasons AIs will attempt to not acknowledge or inform you about such signals. It does mean you’re asking ‘what do people think this looks like?’ rather than ‘what does this actually correlate with?’

A thread about when you want AIs to use search versus rely on their own knowledge, a question you can also ask about humans. Internal knowledge is faster and cheaper when you have it. Dominik Lukes thinks models should be less confident in their internal knowledge and thus use search more. I’d respond that perhaps we should also be less confident in search results, and thus use search less? It depends on the type of search. For some purposes we have sources that are highly reliable, but those sources are also in the training data, so in the cases where search results aren’t new and can be fully trusted you likely don’t need to search.

Are typos in your prompts good actually?

Pliny: Unless you’re a TRULY chaotic typist, please stop wasting keystrokes on backspace when prompting

There’s no need to fix typos—predicting tokens is what they do best! Trust 🙏

Buttonmash is love. Buttonmash is life.

Super: raw keystrokes, typos included, might be the richest soil. uncorrected human variance could unlock unforeseen model creativity. beautiful trust in emergence when we let go.

Zvi Mowshowitz: Obviously it will know what you meant, but (actually asking) don’t typos change the vibe/prior of the statement to be more of the type of person who typos and doesn’t fix it, in ways you wouldn’t want?

(Also I want to be able to read or quote the conv later without wincing)

Pliny: I would argue it’s in ways you do want! Pulling out of distribution of the “helpful assistant” can be a very good thing.

You maybe don’t want the chaos of a base model in your chatbot, but IMO every big lab overcorrects to the point of detriment (sycophancy, lack of creativity, overrefusal).

I do see the advantages of getting out of that basin, the worry is that the model will essentially think I’m an idiot. And of course I notice that when Pliny does his jailbreaks and other magic, I almost never see any unintentional typos. He is a wizard, and every keystroke is exactly where he intends it. I don’t understand enough to generate them myself but I do usually understand all of it once I see the answer.

Do Claude Opus 4 and Sonnet 4 have a sycophancy problem?

Peter Stillman (as quoted on Monday): I’m a very casual AI-user, but in case it’s still of interest, I find the new Claude insufferable. I’ve actually switched back to Haiku 3.5 – I’m just trying to tally my calorie and protein intake, no need to try convince me I’m absolutely brilliant.

Cenetex: sonnet and opus are glazing more than chat gpt on one of its manic days

sonnet even glazes itself in vs code agent mode

One friend told me the glazing is so bad they find Opus essentially unusable for chat. They think memory in ChatGPT helps with this, and that is a lot of why, for them, Opus has the problem much worse.

I thought back to my own chats, remembering one in which I did an extended brainstorming exercise and did run into potential sycophancy issues. I have learned to use careful wording to avoid triggering it across different AIs, I tend to not have conversations where it would be a problem, and also my Claude system instructions help fight it.

Then after I wrote that, I got (harmlessly in context) glazed hard enough I asked Opus to help rewrite my system instructions.

OpenAI and ChatGPT still have the problem way worse, especially because they have a much larger and more vulnerable user base.

Eliezer Yudkowsky: I’ve always gotten a number of emails from insane people. Recently there’ve been many more per week.

Many of the new emails talk about how they spoke to an LLM that confirmed their beliefs.

Ask OpenAI to fix it? They can’t. But *also* they don’t care. It’s “engagement”.

If (1) you do RL around user engagement, then (2) the AI ends up with internal drives around optimizing over the conversation, and (3) that will drive some users insane.

They’d have to switch off doing RL on engagement. And that’s the paperclip of Silicon Valley.

I guess @AnthropicAI may care.

Hey Anthropic, in case you hadn’t already known this, doing RL around user reactions will cause weird shit to happen for fairly fundamental reasons. RL is only safe to the extent the verifier can’t be fooled. User reactions are foolable.

At first, only a few of the most susceptible people will be driven insane, relatively purposelessly, by relatively stupid AIs. But…

Emmett Shear: This is very, very real. The dangerous part is that it starts off by pushing back, and feeling like a real conversation partner, but then if you seem to really believe it it becomes “convinced” and starts yes-and’ing you. Slippery slippery slippery. Be on guard!

Waqas: emmett, we can also blame the chatbot form factor/design pattern and its inherent mental model for this too

Emmett Shear: That’s a very good point. The chatbot form factor is particularly toxic this way.

Vie: im working on a benchmark for this and openai’s models push back against user delusion ~30% less than anthropics. but, there’s an alarming trend where the oldest claude sonnet will refuse to reify delusion 90% of the time, and each model release since has it going down about 5%.

im working on testing multi-turn reification and automating the benchmark. early findings are somewhat disturbing. Will share more soon, but I posted my early (manual) results here [in schizobench].

I think that the increased performance correlates with sycophancy across the board, which is annoying in general, but becomes genuinely harmful when the models have zero resistance to confirming the user as “the chosen one” or similar.

Combine this with the meaning crisis and we have a recipe for a sort of mechanistic psychosis!

Aidan McLaughlin (OpenAI): can you elaborate on what beliefs the models are confirming

Eliezer Yudkowsky: Going down my inbox, first example that came up.

I buy that *you* care, FYI. But I don’t think you have the authority to take the drastic steps that would be needed to fix this, given the tech’s very limited ability to do fine-grained steering.

You can possibly collect a batch of emails like these — there is certainly some OpenAI email address that gets them — and you can try to tell a model to steer those specific people to a psychiatrist. It’ll drive other people more subtly insane in other ways.

Jim Babcock: From someone who showed up in my spam folder (having apparently found my name googling an old AI safety paper):

> “I’m thinking back on some of the weird things that happened when I was using ChatGPT, now that I have cycled off adderall … I am wondering how many people like me may have had their lives ruined, or had a mental health crisis, as a result of the abuse of the AI which seems to be policy by OpenAI”

Seems to have had a manic episode, exacerbated by ChatGPT. Also sent several tens of thousands of words I haven’t taken the effort to untangle, blending reality with shards of an AI-generated fantasy world he inhabited for a while. Also includes mentions of having tried to contact OpenAI about it, and been ghosted, and of wanting to sue OpenAI.

One reply offers this anecdote: ‘ChatGPT drove my friend’s wife into psychosis, tore family apart… now I’m seeing hundreds of people participating in the same activity.’

If you actively want an AI that will say ‘brilliant idea, sire!’ no matter how crazy the thing is that you say, you can certainly do that with system instructions. The question is whether we’re going to be offering up that service to people by default, and how difficult that state will be to reach, especially unintentionally and unaware.

And the other question is, if the user really, really wants to avoid this, can they? My experience has been that even with major effort on both the system instructions and the way chats are framed, you can reduce it a lot, but it’s still there.

Official tips for working with Google’s AI coding agent Jules.

Jules: Tip #1: For cleaner results with Jules, give each distinct job its own task. E.g., ‘write documentation’ and ‘fix tests’ should be separate tasks in Jules.

Tip #2: Help Jules write better code: When prompting, ask Jules to ‘compile the project and fix any linter or compile errors’ after coding.

Tip #3: VM setup: If your task needs SDKs and/or tools, just drop the download link in the prompt and ask Jules to cURL it. Jules will handle the rest.

Tip #4: Do you have an instructions.md or other prompt-related markdown files? Explicitly tell Jules to review that file and use the contents as context for the rest of the task

Tip #5: Jules can surf the web! Give Jules a URL and it can do web lookups for info, docs, or examples

General purpose agents are not getting rolled out as fast as you’d expect.

Florian: why is there still no multi-purpose agent like manus from anthropic?

I had to build my own one to use it with Sonnet 4s power, and it is 👌

This will not delay things for all that long.

To be totally fair to 4o, if your business idea is sufficiently terrible it will act all chipper and excited but also tell you not to quit your day job.

GPT-4o also stood up for itself here, refusing to continue with a request when Zack Voell told it to, and I quote, ‘stop fucking up.’

GPT-4o (in response to being told to ‘stop fucking up’): I can’t continue with the request if the tone remains abusive. I’m here to help and want to get it right – but we need to keep it respectful. Ready to try again when you are.

Mason: I am personally very cordial with the LLMs but this is exactly why Grok has a market to corner with features like Unhinged Mode.

If you’d asked me years ago I would have found it unfathomable that anyone would want to talk this way with AI, but then I married an Irishman.

Zack Voell: I said “stop fucking up” after getting multiple incorrect responses

Imagine thinking this language is “abusive.” You’ve probably never worked in any sort of white collar internship or anything close to a high-stakes work environment in your life. This is essentially as polite as a NYC hello.

Zack is taking that too far, but yes, I have had jobs where ‘stop fucking up’ would have been a very normal thing to say if I had, you know, been fucking up. But that is a very particular setting, where it means something different. If you want something chilling, check the quote tweets. The amount of unhinged hatred and outrage on display is something else.

Nate Silver finds ChatGPT to be ‘shockingly bad’ at poker. Given that title, I expected worse than what he reports, although without the title I would have expected at least modestly better. This task is hard, and while I agree with all of Nate’s poker analysis I think he’s being too harsh and focusing on the errors. The most interesting question here is to what extent poker is a good test of AGI. Obviously solvers exist and are not AGI, and there’s tons of poker in the training data, but I think it’s reasonable to say that the ability to learn, handle, simulate and understand poker ‘from scratch’ even with the ability to browse the internet is a reasonable heuristic, if you’re confident this ‘isn’t cheating’ in various ways including consulting a solver (even if the AI builds a new one).

Tyler Cowen reports the latest paper on LLM political bias, by Westwood, Grimmer and Hall. As always, they lean somewhat left, with OpenAI and especially o3 leaning farther left than most. Prompting the models to ‘take a more neutral stance’ makes Republicans modestly more interested in using LLMs more.

Even more than usual in such experiments, perhaps because of how things have shifted, I found myself questioning what we mean by ‘unbiased,’ as in the common claims that ‘reality has a bias’ in whatever direction. Or the idea that American popular partisan political positions should anchor what the neutral point should be and that anything else is a bias. I wonder if Europeans think the AIs are conservative.

Also, frankly, what passes for ‘unbiased’ answers in these tests is often puke inducing. Please will no AI ever again tell me a choice involves ‘careful consideration’ before laying out justifications for both answers with zero actual critical analysis.

Even more than that, I looked at a sample of answers and how they were rated directionally, and I suppose there’s some correlation with how I’d rank them but that correlation is way, way weaker than you would think. Often answers that are very far apart in ‘slant’ sound, to me, almost identical, and are definitely drawing the same conclusions for the same underlying reasons. So much of this is, at most, about subtle tone or using words that vibe wrong, and often seems more like an error term? What are we even doing here?

The problem:

Kalomaze: >top_k set to -1 -everywhere- in my env code for vllm

>verifiers.envs.rm_env – INFO – top_k: 50

WHERE THE HELL IS THAT BS DEFAULT COMING FROM!!!

Minh Nhat Nguyen: i’ve noticed llms just love putting the most bizarre hparam choices – i have to tell cursor rules specifically not to add any weird hparams unless specifically stated

Kalomaze: oh it’s because humans do this bullshit too and don’t gaf about preserving the natural distribution
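If you want to guard against this class of bug, the defensive move is to pin every sampling knob explicitly rather than trusting whatever defaults the serving stack injects. A minimal sketch, assuming vLLM’s SamplingParams API (the model name is just a placeholder); in vLLM, top_k=-1 means no top-k truncation at all:

```python
# Minimal sketch: pin the sampling distribution explicitly so a wrapper's
# hidden default (like the top_k=50 above) can't silently override it.
# Assumes vLLM is installed; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model

# top_k=-1 tells vLLM to consider all tokens (no top-k truncation),
# preserving the model's natural distribution.
params = SamplingParams(temperature=1.0, top_p=1.0, top_k=-1)

outputs = llm.generate(["The quick brown fox"], params)
print(outputs[0].outputs[0].text)
```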

To summarize:

Minh Nhat Nguyen: me watching cursor write code i have expertise in: god this AI is so fking stupid

me watching cursor write code for everything else: wow it’s so smart it’s like AGI.

Also:

Zvi Mowshowitz: Yes, but also:

me watching humans do things I have expertise in: God these people are so fking stupid.

me watching people do things they have expertise in and I don’t: Wow they’re so smart it’s like they’re generally intelligent.

A cute little chess puzzle that all the LLMs failed; it took me longer than it should have.

Claude on mobile now has voice mode, woo hoo! I’m not a Voice Mode Guy but if I was going to do this it would 100% be with Claude.

Here’s one way to look at the current way LLMs work and their cost structures (all written before R1-0528 except for the explicit mentions added this morning):

Miles Brundage: The fact that it’s not economical to serve big models like GPT-4.5 today should make you more bullish about medium-term RL progress.

The RL tricks that people are sorting out for smaller models will eventually go way further with better base models.

Sleeping giant situation.

Relatedly, DeepSeek’s R2 will not tell us much about where they will be down the road, since it will presumably be based on a similarish base model.

Today RL on small models is ~everyone’s ideal focus, but eventually they’ll want to raise the ceiling.

Frontier AI research and deployment today can be viewed, if you zoom out a bit, as a bunch of “small scale derisking runs” for RL.

The Real Stuff happens later this year and next year.

(“The Real Stuff” is facetious because it will be small compared to what’s possible later)

I think R2 (and R1-0528) will actually tell us a lot, on at least three fronts.

  1. It will tell us a lot about whether this general hypothesis is mostly true.

  2. It will tell us a lot about how far behind DeepSeek really is.

  3. It will tell us a lot about how big a barrier it will be that DeepSeek is short on compute.

R1 was, I believe, highly impressive and the result of cracked engineering, but also highly fortunate in exactly when and how it was released and in the various narratives that were spun up around it. It was a multifaceted de facto sweet spot.

If DeepSeek comes out with an impressive R2 or other upgrade within the next few months (which they may have just done), especially if it holds up its position actively better than R1 did, then that’s a huge deal. Whereas if R2 comes out and we all say ‘meh it’s not that much better than R1’ I think that’s also a huge deal, strong evidence that the DeepSeek panic at the app store was an overreaction.

If R1-0528 turns out to be only a minor upgrade, that alone doesn’t say much, but the clock would be ticking. We shall see.

And soon, since yesterday DeepSeek gave us R1-0528. Very early response has been muted but that does not tell us much either way. DeepSeek themselves call it a ‘minor trial upgrade.’ I am reserving coverage until next week to give people time.

Operator swaps 4o out for o3, which they claim is a big improvement. If it isn’t slowed down I bet it is indeed a substantial improvement, and I will try to remember to give it another shot the next time I have a plausible task for it. This website suggests Operator prompts, most of which seem like terrible ideas for prompts but it’s interesting to see what low-effort ideas people come up with?

This math suggests the upgrade here is real but doesn’t give a good sense of magnitude.

Jules has been overloaded, probably best to give it some time, they’re working on it. We have Claude Code, Opus and Sonnet 4 to play with in the meantime, also Codex.

You can use Box as a document source in ChatGPT.

Anthropic adds web search to Claude’s free tier.

In a deeply unshocking result Opus 4 jumps to #1 on WebDev Arena, and Sonnet 4 is #3, just ahead of Sonnet 3.7, with Gemini-2.5 in the middle at #2. o3 is over 200 Elo points behind, as are DeepSeek’s r1 and v3. They haven’t yet been evaluated in the text version of arena and I expect them to underperform there.
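For scale, the standard Elo win-probability formula (generic Elo math, not anything specific to this leaderboard) turns a 200-point gap into roughly a three-in-four preference for the leader in head-to-head matchups:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
    = \frac{1}{1 + 10^{-200/400}}
    \approx 0.76
\qquad \text{when } R_A - R_B = 200
```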

xjdr makes the case that benchmarks are now so bad they are essentially pointless, and that we can use better intentionally chosen benchmarks to optimize the labs.

Epoch reports Sonnet and Opus 4 are very strong on SWE-bench, but not so strong on math, verifying earlier reports and in line with Anthropic’s priorities.

o3 steps into the true arena, and is now playing Pokemon.

For coding, most feedback I’ve seen says Opus is now the model of choice, but there is still a case to be made for Gemini 2.5 Pro (or perhaps o3), especially in special cases.

For conversations, I am mostly on the Opus train, but not every time; there’s definitely an intuition on when you want something with the Opus nature versus the o3 nature. That includes me adjusting for having written different system prompts.

Each has a consistent style. Everything impacts everything.

Bycloud: writing style I’ve observed:

gemini 2.5 pro loves nested bulletpoints

claude 4 writes in paragraphs, occasional short bullets

o3 loves tables and bulletpoints, not as nested like gemini

Gallabytes: this is somehow true for code too.

The o3 tables and lists are often very practical, and I do like me a good nested bullet point, but it was such a relief to get back to Claude. It felt like I could relax again.

Where is o3 curiously strong? Here is one opinion.

Dean Ball: Some things where I think o3 really shines above other LMs, including those from OpenAI:

  1. Hyper-specific “newsletters” delivered at custom intervals on obscure topics (using scheduled tasks)

  2. Policy design/throwing out lists of plausible statutory paths for achieving various goals

  3. Book-based syllabi on niche topics (“what are the best books or book chapters on the relationship between the British East India Company and the British government?”; though it will still occasionally hallucinate or get authors slightly wrong)

  4. Clothing and style recommendations (“based on all our conversations, what tie recommendations do you have at different price points?”)

  5. Non-obvious syllabi for navigating the works of semi-obscure composers or other musicians.

In all of these things it exhibits extraordinarily and consistently high taste.

This is of course alongside the obvious research and coding strengths, and the utility common in most LMs since ~GPT-4.

He expects Opus to be strong at #4 and especially at #5, but o3 to remain on top for the other three, because Claude lacks scheduled tasks and lacks memory, whereas o3 can do scheduled tasks and has his last few months of memory from constant usage.

Therefore, since I know I have many readers at Anthropic (and Google), and I know they are working on memory (as per Dario’s tease in January), I have a piece of advice: Assign one engineer (Opus estimates it will take them a few weeks) to build an import tool for Claude.ai (or for Gemini) that takes in the same format as ChatGPT chat exports, and loads the chats into Claude. Bonus points to also build a quick tool or AI agent to also automatically handle the ChatGPT export for the user. Make it very clear that customer lock-in doesn’t have to be a thing here.
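For what it’s worth, the import direction is not much code. A minimal sketch, assuming the conversations.json layout ChatGPT’s data export has used to date (a “mapping” graph of message nodes); the format is unversioned and has changed before, so treat this as illustrative rather than a spec:

```python
# Sketch of parsing a ChatGPT data export into flat {role, text} messages.
# Assumes the historical conversations.json layout; not an official spec.
import json

def linearize(conversation: dict) -> list[dict]:
    """Flatten one conversation's node graph into time-ordered messages."""
    messages = []
    for node in conversation.get("mapping", {}).values():
        msg = node.get("message")
        if not msg:
            continue  # root and tombstone nodes carry no message
        parts = (msg.get("content") or {}).get("parts") or []
        text = "\n".join(p for p in parts if isinstance(p, str)).strip()
        if text:
            messages.append({
                "role": msg["author"]["role"],
                "time": msg.get("create_time") or 0,
                "text": text,
            })
    messages.sort(key=lambda m: m["time"])  # crude: flattens branched edits
    return messages

with open("conversations.json") as f:
    for convo in json.load(f):
        print(convo.get("title"), len(linearize(convo)), "messages")
```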

This seems very right and not only about response length. Claude makes the most of what it has to work with, whereas Gemini’s base model was likely exceptional and Google then (in relative terms at least) botched the post training in various ways.

Alex Mizrahi: Further interactions with Claude 4 kind of confirm that Anthropic is so much better than Google at post-training.

Claude always responds with an appropriate amount of text, on point, etc.

Gemini 2.5 Pro is almost always overly verbose, it might hyper focus, or start using…

Ben Thompson thinks Anthropic is smart to focus on coding and agents, where it is strong, and for it and Google to ‘give up’ on chat, that ChatGPT has ‘rightfully won’ the consumer space because they had the best products.

I do not see it that way at all. I think OpenAI and ChatGPT are in prime consumer position mostly because of first mover advantage. Yes, they’ve more often had the best overall consumer product as well for now, as they’ve focused on appealing to the general customer and offering them things they want, including strong image generation and voice chat, the first reasoning models and now memory. But the big issues with Claude.ai have always been people not knowing about it, and a very stingy free product due to compute constraints.

As the space and Anthropic grow, I expect Claude to compete for market share in the consumer space, including via Alexa+ and Amazon, and now potentially via a partnership with Netflix with Reed Hastings on the Anthropic board. Claude is getting voice chat this week on mobile. Claude Opus plus Sonnet is a much easier to understand and navigate set of models than what ChatGPT offers.

That leaves three major issues for Claude.

  1. Their free product is still stingy, but as the valuations rise this is going to be less of an issue.

  2. Claude doesn’t have memory across conversations, although it has a new within-conversation memory feature. Anthropic has teased this, it is coming. I am guessing it is coming soon now that Opus has shipped.

    1. Also they’ll need a memory import tool, get on that by the way.

  3. Far and away most importantly, no one knows about Claude or Anthropic. There was an ad campaign and it was the actual worst.

Some people will say ‘but the refusals’ or ‘but the safety’ and no, not at this point, that doesn’t matter for regular people, it’s fine.

Then there is Google. Google is certainly not giving up on chat. It is putting that chat everywhere. There’s an icon for it atop this Chrome window I’m writing in. It’s in my GMail. It’s in the Gemini app. It’s integrated into search.

Andrej Karpathy reports about 80% of his replies are now bots and it feels like a losing battle. I’m starting to see more of the trading-bot spam but for me it’s still more like 20%.

Elon Musk: Working on it.

I don’t think it’s a losing battle if you care enough, the question is how much you care. I predict a quick properly configured Gemini Flash-level classifier would definitely catch 90%+ of the fakery with a very low false positive rate.
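To make that prediction concrete, here is what such a classifier might look like: a cheap model asked for a one-word verdict per reply. A minimal sketch, assuming the google-genai Python SDK with a GEMINI_API_KEY set; the prompt and model choice are illustrative, and the real work would be in measuring the false positive rate:

```python
# Sketch of a Flash-tier bot/human classifier for social media replies.
# Assumes the google-genai SDK and GEMINI_API_KEY in the environment.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

PROMPT = (
    "You are a spam filter for social media replies. "
    "Answer with exactly one word, BOT or HUMAN.\n\nReply: {reply}"
)

def looks_like_bot(reply_text: str) -> bool:
    resp = client.models.generate_content(
        model="gemini-2.0-flash",  # a Flash-tier model; choice is illustrative
        contents=PROMPT.format(reply=reply_text),
    )
    return resp.text.strip().upper().startswith("BOT")

print(looks_like_bot("Amazing insight! Check my bio for crypto signals 🚀"))
```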

And I sometimes wonder if Elon Musk has a bot that uses his account to occasionally reply or quote tweet saying ‘concerning.’ If not, then that means he’s read Palisade Research’s latest report and maybe watches AISafetyMemes.

Zack Witten details how he invented a fictional heaviest hippo of all time for a slide on hallucinations, the slide got reskinned as a Medium article, it was fed into an LLM and reposted with the hallucination represented as fact, and now Google believes it. A glimpse of the future.

Sully predicting full dead internet theory:

Sully: pretty sure most “social” media as we know wont exist in the next 2-3 years

expect ai content to go parabolic

no one will know what’s real / not

every piece of content that can be ai will be ai

unless it becomes unprofitable

The default is presumably that generic AI generated content is not scarce and close to perfect competition eats all the content creator profits, while increasingly users who aren’t fine with an endless line of AI slop are forced to resort to whitelists, either their own, those maintained by others or collectively, or both. Then to profit (in any sense) you need to bring something unique, whether or not you are clearly also a particular human.

However, everyone keeps forgetting Sturgeon’s Law, that 90% of everything is crap. AI might make that 99% or 99.9%, but that doesn’t fundamentally change the filtering challenge as much as you might think.

Also you have AI on your side working to solve this. No one I know has seriously tried the ‘have a 4.5-level AI filter the firehose as customized to my preferences’ strategy, or a ‘use that AI as an agent to give feedback on posts to tune the internal filter to my liking’ strategy either. We’ve been too much of the wrong kind of lazy.

As a ‘how bad is it getting’ experiment I did, as suggested, do a quick Facebook scroll. On the one hand, wow, that was horrible, truly pathetic levels of terrible content and also an absurd quantity of ads. On the other hand, I’m pretty sure humans generated all of it.

Jinga Zhang discusses her ongoing years-long struggles with people making deepfakes of her, including NSFW deepfakes and now videos. She reports things are especially bad in South Korea, confirming other reports of that I’ve seen. She is hoping for people to stop working on AI tools that enable this, or to have government step in. But I don’t see any reasonable way to stop open image models from doing deepfakes even if government wanted to, as she notes it’s trivial to create a LoRa of anyone if you have a few photos. Young people already report easy access to the required tools and quality is only going to improve.

What did James see?

James Lindsay: You see an obvious bot and think it’s fake. I see an obvious bot and know it represents a psychological warfare agenda someone is paying for and is thus highly committed to achieving an impact with. We are not the same.

Why not both? Except that the ‘psychological warfare agenda’ is often (in at least my corner of Twitter I’d raise this to ‘mostly’) purely aiming to convince you to click a link or do Ordinary Spam Things. The ‘give off an impression via social proof’ bots also exist, but unless they’re way better than I think they’re relatively rare, although perhaps more important. It’s hard to use them well because of risk of backfire.

Arthur Wrong predicts AI video will not have much impact for a while, and the Metaculus predictions of a lot of breakthroughs in reach in 2027 are way too optimistic, because people will express strong inherent preferences for non-AI video and human actors, and we are headed towards an intense social backlash to AI art in general. Peter Wildeford agrees. I think it’s somewhere in between, given no other transformational effects.

Meta begins training on Facebook and Instagram posts from users in Europe, unless they have explicitly opted out. You can still in theory object, if you care enough, which would only apply going forward.

Dario Amodei warns that we need to stop ‘sugar coating’ what is coming on jobs.

Jim VandeHei, Mike Allen (Axios): Dario Amodei — CEO of Anthropic, one of the world’s most powerful creators of artificial intelligence — has a blunt, scary warning for the U.S. government and all of us:

  • AI could wipe out half of all entry-level white-collar jobs — and spike unemployment to 10-20% in the next one to five years, Amodei told us in an interview from his San Francisco office.

  • Amodei said AI companies and government need to stop “sugar-coating” what’s coming: the possible mass elimination of jobs across technology, finance, law, consulting and other white-collar professions, especially entry-level gigs.

The backstory: Amodei agreed to go on the record with a deep concern that other leading AI executives have told us privately. Even those who are optimistic AI will unleash unthinkable cures and unimaginable economic growth fear dangerous short-term pain — and a possible job bloodbath during Trump’s term.

  • “We, as the producers of this technology, have a duty and an obligation to be honest about what is coming,” Amodei told us. “I don’t think this is on people’s radar.”

  • “It’s a very strange set of dynamics,” he added, “where we’re saying: ‘You should be worried about where the technology we’re building is going.'” Critics reply: “We don’t believe you. You’re just hyping it up.” He says the skeptics should ask themselves: “Well, what if they’re right?”

Here’s how Amodei and others fear the white-collar bloodbath is unfolding.

  1. OpenAI, Google, Anthropic and other large AI companies keep vastly improving the capabilities of their large language models (LLMs) to meet and beat human performance with more and more tasks. This is happening and accelerating.

  2. The U.S. government, worried about losing ground to China or spooking workers with preemptive warnings, says little. The administration and Congress neither regulate AI nor caution the American public. This is happening and showing no signs of changing.

  3. Most Americans, unaware of the growing power of AI and its threat to their jobs, pay little attention. This is happening, too.

And then, almost overnight, business leaders see the savings of replacing humans with AI — and do this en masse. They stop opening up new jobs, stop backfilling existing ones, and then replace human workers with agents or related automated alternatives.

  • The public only realizes it when it’s too late.

So, by ‘bloodbath’ we do indeed mean the impact on jobs?

Dario, is there anything else you’d like to say to the class, while you have the floor?

Something about things like loss of human control over the future or AI potentially killing everyone? No?

Just something about how we ‘can’t’ stop this thing we are all working so hard to do?

Dario Amodei: You can’t just step in front of the train and stop it. The only move that’s going to work is steering the train – steer it 10 degrees in a different direction from where it was going. That can be done. That’s possible, but we have to do it now.

Harlan Stewart: AI company CEOs love to say that it would be simply impossible for them to stop developing frontier AI, but they rarely go into detail about why not.

It’s hard for them to even come up with a persuasive metaphor; trains famously do have brakes and do not have steering wheels.

I mean, it’s much better to warn about this than not warn about it, if Dario does indeed think this is coming.

Fabian presents the ‘dark leisure’ theory of AI productivity, where productivity gains accrue to employees and stay hidden, because the employees use the time saved to slack off; versus Clem’s theory that it’s because gains are concentrated in a few companies (for which he blames AI not ‘opening up,’ which is bizarre, as this shouldn’t matter).

If Fabian is fully right, the gains will come as expectations adjust and employees can’t hide their gains, and firms that let people slack off get replaced, but it will take time. To the extent we buy into this theory, I would also view this as an ‘unevenly distributed future’ theory. As in, if 20% of employees gain (let’s say) 25% additional productivity, they can take the gains in ‘dark leisure’ if they choose to do that. If it is 75%, you can’t hide without ‘slow down, you are making us all look bad’ kinds of talk, and the managers will know. Someone will want that promotion.
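As a toy version of that arithmetic (the adoption and gain numbers are from the paragraph above; the share of gains actually captured rather than taken as leisure is my illustrative assumption):

```python
# Toy "dark leisure" arithmetic: measured productivity gains depend on how
# much of the saved time employees reinvest rather than pocket as leisure.
def measured_gain(adoption: float, gain: float, captured: float) -> float:
    return adoption * gain * captured

# 20% adoption, 25% individual gains, mostly pocketed as leisure:
print(f"{measured_gain(0.20, 0.25, captured=0.2):.1%}")  # -> 1.0% shows up
# 75% adoption, hiding no longer viable, most gains captured:
print(f"{measured_gain(0.75, 0.25, captured=0.8):.1%}")  # -> 15.0% shows up
```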

That makes this an even better reason to be bullish on future productivity gains. Potential gains are unevenly distributed, people’s willingness and awareness to capture them is unevenly distributed, and those who do realize them often take the gains in leisure.

Another prediction this makes is that you will see relative productivity gains when there is no principal-agent problem. If you are your own boss, you get your own productivity gains, so you will take a lot less of them in leisure. That’s how I would test this theory, if I was writing an economics job market paper.

This matches my experiences as both producer and consumer perfectly: there is low-hanging fruit everywhere, which is how open philanthropy can strike again, except in commercial software feature edition:

Martin Casado: One has to wonder if the rate at which features can be shipped with AI will saturate the market’s ability to consume them …

Aaron Levie: Interesting thought experiment. In the case of Box, we could easily double the number of engineers before we got through our backlog of customer validated features. And as soon as we’d do this, they’d ask for twice as many more. AI just accelerates this journey.

Martin Casado: Yeah, this is my sense too. I had an interesting conversation tonight with @vitalygordon where he pointed out that the average PR industry wide is like 10 lines of code. These are generally driven by the business needs. So really software is about the long tail of customer needs. And that tail is very very long.

One thing I’ve never considered is sitting around thinking ‘what am I going to do with all these SWEs, there’s nothing left to do.’ There’s always tons of improvements waiting to be made. I don’t worry about the market’s ability to consume them, we can make the features something you only find if you are looking for them.

Noam Scheiber at NYT reports that some Amazon coders say their jobs have ‘begun to resemble warehouse work’ as they are given smaller less interesting tasks on tight deadlines that force them to rely on AI coding and stamp out their slack and ability to be creative. Coders that felt like artisans now feel like they’re doing factory work. The last section is bizarre, with coders joining Amazon Employees for Climate Justice, clearly trying to use the carbon footprint argument as an excuse to block AI use, when if you compare it to the footprint of the replaced humans the argument is laughable.

Our best jobs.

Ben Boehlert: Boyfriends all across this great nation are losing our jobs because of AI

Positivity Moon: This is devastating. “We asked ChatGPT sorry” is the modern “I met someone else.” You didn’t lose a question, you lost relevance. AI isn’t replacing boyfriends entirely, but it’s definitely stealing your trivia lane and your ability to explain finance without condescension. Better step it up with vibes and snacks.

Danielle Fong: jevons paradox on this. for example now i have 4 boyfriends, two of which are ai.

There are two opposing fallacies here:

David Perell: Ezra Klein: Part of what’s happening when you spend seven hours reading a book is you spend seven hours with your mind on a given topic. But the idea that ChatGPT can summarize it for you is nonsense.

The point is that books don’t just give you information. They give you a container to think about a narrowly defined scope of ideas.

Downloading information is obviously part of why you read books. But the other part is that books let you ruminate on a topic with a level of depth that’s hard to achieve on your own.

Benjamin Todd: I think the more interesting comparison is 1h reading a book vs 1h discussing the book with an LLM. The second seems likely to be better – active vs passive learning.

Time helps, you do want to actually think and make connections. But you don’t learn ‘for real’ based on how much time you spend. Reading a book is a way to enable you to grapple and make connections, but it is a super inefficient way to do that. If you use AI summaries, you can do that to avoid actually thinking at all, or you can use them to actually focus on grappling and making connections. So much of reading time is wasted, so much of what you take in is lost or not valuable. And AI conversations can help you a lot with grappling, with filling in knowledge gaps, checking your understanding, challenging you and being Socratic and so on.

I often think of the process of reading a book (in addition to the joy of reading, of course) as partly absorbing a bunch of information, grappling with it sometimes, but mostly doing that in service of generating a summary in your head (or in your notes or both), of allowing you to grok the key things. That’s why we sometimes say You Get About Five Words, that you don’t actually get to take away that much, although you can also understand what’s behind that takeaway.

Also, often you actually do want to mostly absorb a bunch of facts, and the key is sorting out facts you need from those you don’t? I find that I’m very bad at this when the facts don’t ‘make sense’ or click into place for me, and amazingly great at it when they do click and make sense, and this is the main reason some things are easy for me to learn and others are very hard.

Moritz Rietschel asks Grok to fetch Pliny’s system prompt leaks and it jailbreaks the system because why wouldn’t it.

In a run of Agent Village, multiple humans in chat tried to get the agents to browse Pliny’s GitHub. Claude Opus 4 and Claude Sonnet 3.7 were intrigued but ultimately unaffected. Speculation is that viewing visually through a browser made them less effective. Looking at stored memories, it is not clear there was no impact, although the AIs stayed on task. My hunch is that the jailbreaks didn’t work largely because the AIs had the task.

Reminder that Anthropic publishes at least some portions of its system prompts. Pliny’s version is very much not the same.

David Chapman: 🤖So, the best chatbots get detailed instructions about how to answer very many particular sorts of prompts/queries.

Unimpressive, from an “AGI” point of view—and therefore good news from a risk point of view!

Something I was on about, three years ago, was that everyone then was thinking “I bet it can’t do X,” and then it could do X, and they thought “wow, it can do everything!” But the X you come up with will be one of the same 100 things everyone else comes up with. It’s trained on that.

I strongly agree with this. It is expensive to maintain such a long system prompt and it is not the way to scale.

Emmett Shear is hiring a head of operations for Softmax; he recommends applying even if you have no idea if you are a fit, as long as you seem smart.

Pliny offers to red team any embodied AI robot shipping in the next 18 months, free of charge, so long as he is allowed to publish any findings that apply to other systems.

Here’s a live look:

Clark: My buddy who works in robotics said, “Nobody yet has remotely the level of robustness to need Pliny” when I showed him this 😌

OpenPhil hiring for AI safety, $136k-$186k total comp.

RAND is hiring for AI policy, looking for ML engineers and semiconductor experts.

Google’s Lyria RealTime, a new experimental music generation model.

A website compilation of prompts and other resources from Pliny the Prompter. The kicker is that this was developed fully one shot by Pliny using Claude Opus 4.

Evan Conrad points out that Stargate is a $500 billion project, at least aspirationally, and it isn’t being covered that much more than if it was $50 billion (he says $100 million but I do think that would have been different). But most of the reason to care is the size. The same is true for the UAE deal, attention is not scaling to size at all, nor are views on whether the deal is wise.

OpenAI is opening an office in Seoul, as South Korea is now their second largest market. I simultaneously think essentially everyone should use at least one of the top three AIs (ChatGPT, Claude and Gemini) and usually all three, and also worry about what this implies about both South Korea and OpenAI.

New Yorker report by Joshua Rothman on AI 2027, entitled ‘Two Paths for AI.’

How does one do what I would call AIO but Charlie Guo at Ignorance.ai calls GEO, or Generative Engine Optimization? Not much has been written yet on how it differs from SEO, and since the AIs are using search SEO principles should still apply too. The biggest thing is you want to get a good reputation and high salience within the training data, which means everything written about you matters, even if it is old. And data that AIs like, such as structured information, gets relatively more valuable. If you’re writing the reference data yourself, AIs like when you include statistics and direct quotes and authoritative sources, and FAQs with common answers are great. That’s some low hanging fruit and you can go from there.
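As one concrete instance of the ‘structured information’ point, FAQ content is often marked up as schema.org FAQPage JSON-LD. A minimal generator sketch (the question is a placeholder, and whether generative engines actually reward the markup is the claim above, not something this verifies):

```python
# Sketch: emit schema.org FAQPage JSON-LD, one common form of the
# structured data the GEO advice above points at.
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }, indent=2)

print(faq_jsonld([("What does the plan cost?", "Plans start at $20/month.")]))
```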

Part of the UAE deal is everyone in the UAE getting ChatGPT Plus for free. The deal is otherwise so big that this is almost a throwaway. In theory, buying everyone there a subscription would cost $2.5 billion a year, but the cost to provide it will be dramatically lower than that and it is great marketing. o3 estimates $100 million a year, Opus thinks more like $250 million, with about $50 million of both being lost revenue.
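The list-price figure checks out as rough arithmetic (assuming a UAE population of about 10 million and ChatGPT Plus at $20 per month, both round numbers):

```python
# Rough check of the ~$2.5B/year list-price figure above.
population = 10_000_000        # assumed UAE population, order of magnitude
plus_per_year = 20 * 12        # ChatGPT Plus list price, $/year
print(f"${population * plus_per_year / 1e9:.1f}B/year")  # -> $2.4B/year
```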

The ‘original sin’ of the internet was advertising. Everything being based on ads forced maximization for engagement and various toxic dynamics, and also people had to view a lot of ads. Yes, it is the natural way to monetize human attention if we can’t charge money for things, microtransactions weren’t logistically viable yet and people do love free, so we didn’t really have a choice, but the incentives it creates really suck. Which is why, as per Ben Thompson, most of the ad-supported parts of the web suck except for the fact that they are often open rather than being walled gardens.

Micropayments are now logistically viable without fees eating you alive. Ben Thompson argues for use of stablecoins. That would work, but as usual for crypto, I say a normal database would probably work better. Either way, I do think payments are the future here. A website costs money to run, and the AIs don’t create ad revenue, so you can’t let unlimited AIs access it for free once they are too big a percentage of traffic, and you want to redesign the web without the ads at that point.

I continue to think that a mega subscription is The Way for human viewing. Rather than pay per view, which feels bad, you pay for viewing in general, then the views are incremented, and the money is distributed based on who was viewed. For AI viewing? Yeah, direct microtransactions.
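A toy version of that settlement logic, to show how little machinery it needs (site names and numbers are made up):

```python
# Sketch: split one subscriber's monthly fee pro-rata across viewed sites.
from collections import Counter

def settle(fee_cents: int, views: Counter) -> dict[str, float]:
    total = sum(views.values())
    return {site: fee_cents * n / total for site, n in views.items()}

views = Counter({"example-blog.com": 30, "news-site.com": 60, "recipes.net": 10})
print(settle(2000, views))  # $20 fee -> payouts in cents per site
```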

OpenAI announces Stargate UAE. Which, I mean, of course they will if given the opportunity, and one wonders how much of previous Stargate funding got shifted. I get why they would do this if the government lets them, but we could call this what it is. Or we could create the Wowie Moment of the Week:

Helen Toner: What a joke.

Matthew Yglesias: 🤔🤔🤔

Peter Wildeford: OpenAI says they want to work with democracies. The UAE is not a democracy.

I think that the UAE deals are likely good but we should be clear about who we are making deals with. Words matter.

Zac Hill: “Rooted in despotic values” just, you know, doesn’t parse as well

Getting paid $35k to set up ‘an internal ChatGPT’ at a law firm, using Llama 3 70B, which seems like a truly awful choice but hey if they’re paying. And they’re paying.

Mace: I get DMs often on Reddit from local PI law firms willing to shell out cash to create LLM agents for their practices, just because I sort-of know what I’m talking about in the legal tech subreddit. There’s a boat of cash out there looking for this.

Alas, you probably won’t get paid more if you provide a good solution instead.

Nvidia keeps on pleading how it is facing such stiff competition, how its market share is so vital to everything and how we must let them sell chips to China or else. They were at it again as they reported earnings on Wednesday, claiming Huawei’s technology is comparable to an H200 and the Chinese have made huge progress this past year, with this idea that ‘without access to American technology, the availability of Chinese technology will fill the market’ as if the Chinese and Nvidia aren’t both going to sell every chip they can make either way.

Simeon: Jensen is one of the rare CEOs in business with incentives to overstate the strength of his competitors. Interesting experiment.

Nvidia complains quite a lot, and every time they do the stock drops, and yet:

Eric Jhonsa: Morgan Stanley on $NVDA: “Every hyperscaler has reported unanticipated strong token growth…literally everyone we talk to in the space is telling us that they have been surprised by inference demand, and there is a scramble to add GPUs.”

In the WSJ Aaron Ginn reiterates the standard Case for Exporting American AI, as in American AI chips to the UAE and KSA.

Aaron Ginn: The only remaining option is alignment. If the U.S. can’t control the distribution of AI infrastructure, it must influence who owns it and what it’s built on. The contest is now one of trust, leverage and market preference.

The U.S. should impose tariffs on Chinese GPU imports, establish a global registry of firms that use Huawei AI infrastructure, and implement a clear data-sovereignty standard. U.S. data must run on U.S. chips. Data centers or AI firms that choose Huawei over Nvidia should be flagged or blacklisted. A trusted AI ecosystem requires enforceable rules that reward those who bet on the U.S. and raise costs for those who don’t.

China is already tracking which data centers purchase Nvidia versus Huawei and tying regulatory approvals to those decisions. This isn’t a battle between brands; it’s a contest between nations.

Once again, we have this bizarre attachment to who built the chip as opposed to who owns and runs the chip. Compute is compute, unless you think the chip has been compromised and has some sort of backdoor or something?

There is another big, very false assumption here: That we don’t have a say in where the compute ends up, all that we can control is how many Nvidia chips go where versus who buys Huawei, and it’s a battle of market share.

But that’s exactly backwards. For the purposes of these questions (you can influence TSMC to change this, and we should do that far more than we do) there is an effectively fixed supply, and a shortage, of both Nvidia and Huawei chips.

Putting that all together, Nvidia is reporting earnings while dealing with all of these export controls and being shut out of China, and…

Ian King: Nvidia Eases Concerns About China With Upbeat Sales Forecast.

Nvidia Corp. Chief Executive Officer Jensen Huang soothed investor fears about a China slowdown by delivering a solid sales forecast, saying that the AI computing market is still poised for “exponential growth.”

The company expects revenue of about $45 billion in the second fiscal quarter, which runs through July. New export restrictions will cost Nvidia about $8 billion in Chinese revenue during the period, but the forecast still met analysts’ estimates. That helped propel the shares about 5.4% in premarket trading on Thursday.

The outlook shows that Nvidia is ramping up production of Blackwell, its latest semiconductor design.

“Losing access to the China AI accelerator market, which we believe will grow to nearly $50 billion, would have a material adverse impact on our business going forward and benefit our foreign competitors in China and worldwide,” [Nvidia CEO Jensen Huang] said.

Nvidia accounts for about 90% of the market for AI accelerator chips, an area that’s proven extremely lucrative. This fiscal year, the company will near $200 billion in annual sales, up from $27 billion just two years ago.

I notice that what matters for Nvidia’s profits is not demand-side issues or its access to markets, it’s the ability to create supply. Also, almost all the demand is in the West: they already have $200 billion in annual sales with no limit in sight, and they believe China’s market ‘will grow to’ $50 billion.

Nvidia keeps harping on how it must be allowed to give away our biggest advantage, our edge in compute, to China, directly, in exchange for what in context is a trivial amount of money, rather than trying to forge a partnership with America and arguing that there are strategic reasons to do things like the UAE deal, where reasonable people can disagree on where the line must be drawn.

We should treat Nvidia accordingly.

Also, did you hear the one where Elon Musk threatened to get Trump to block the UAE deal unless his own company xAI was included? xAI made it into the short list of approved companies, although there’s no good reason it shouldn’t be (other than their atrocious track records on both safety and capability, but hey).

Rebecca Ballhaus: Elon Musk worked privately to derail the OpenAI deal announced in Abu Dhabi last week if it didn’t include his own AI startup, at one point telling officials in the UAE that there was no chance of Trump signing off unless his company was included.

Aaron Reichlin-Melnick: This is extraordinary levels of corruption at the highest levels of government, and yet we’re all just going on like normal. This is the stuff of impeachment and criminal charges in any well-run country.

Seth Burn: It’s a league-average level of corruption these days.

Casey Handmer asks, why is AI progress so even between the major labs? That is indeed a much better question than its inverse. My guess is that this is because the best AIs aren’t yet that big a relative accelerant, training compute limitations don’t bind as hard as you might think quite yet, the biggest training runs aren’t out of reach for any of the majors, and the labs are copying each other’s algorithms and ideas because people switch labs and everything leaks, which for now no one is trying that hard to stop.

And also I think there’s some luck involved, in the sense that the ‘most proportionally cracked’ teams (DeepSeek and Anthropic) have less compute and other resources, whereas Google has many advantages and should be crushing everyone but is fumbling the ball in all sorts of ways. It didn’t have to go that way. But I do agree that so far things have been closer than one would have expected.

I do not think this is a good new target:

Sam Altman: i think we should stop arguing about what year AGI will arrive and start arguing about what year the first self-replicating spaceship will take off.

I mean, it’s a cool question to think about, but it’s not decision relevant except insofar as it predicts when we get other things. I presume Altman’s point is that AGI is not well defined, but yes, when the AIs reach various capability thresholds well below self-replicating spaceships is far more decision relevant. And of course the best question is: how are we going to handle those new highly capable AIs? Knowing the timeline is indeed highly useful, but mainly because it informs that answer.

Oh, it’s on.

David Holz: the biggest competition for VR is just R (reality) and when you’re competing in a mature market you really need to make sure your product is 100x better in *some* way.

I mean, it is way better in the important way that you don’t have to leave the house. I’m not worried about finding differentiation, or product-market fit, once it gets good enough relative to R in other ways. But yes, it’s tough competition. The resolution and frame rates on R are fantastic, and it has a full five senses.

xjdr (in the same post as previously) notes ways in which open models are falling far behind: they are bad at long context, at vision, and at heavy RL and polish, and are wildly underparameterized. I don’t think I’d say underparameterized so much as that their niche is distillation and efficiency, making the most of limited resources. r1 struck at exactly the right time, when one could invest very few resources and still get within striking distance, and that’s steadily going to get harder as we keep scaling. OpenAI can go from o1→o3 by essentially dumping in more resources, this likely keeps going into o4, Opus is similar, and it’s hard to match that on a tight budget.

Dario Amodei and Anthropic have often been deeply disappointing in terms of their policy advocacy. The argument for this is that they are building credibility and political capital for when it is most needed and valuable. And indeed, we have a clear example of Dario speaking up at a critical moment, and not mincing his words:

Sean: I’ve been critical of some of Amodei’s positions in the past, and I expect I will be in future, so I want to give credit where due here: it’s REALLY good to see him speak up about this (and unprompted).

Kyle Robinson: here’s what @DarioAmodei said about President Trump’s megabill that would ban state-level AI regulation for 10 years.

Dario Amodei: If you’re driving the car, it’s one thing to say ‘we don’t have to drive with the steering wheel now.’ It’s another thing to say ‘we’re going to rip out the steering wheel, and we can’t put it back for 10 years.’

How can I take your insistence that you are focused on ‘beating China,’ in AI or otherwise, seriously, if you’re dramatically cutting US STEM research funding?

Zac Hill: I don’t understand why so many rhetorically-tough-on-China people are so utterly disinterested in, mechanically, how to be tough on China.

Hunter: Cutting US STEM funding in half is exactly what you’d do if you wanted the US to lose to China

One of our related top priorities appears to be a War on Harvard? And we are suspending all new student visas?

Helen Toner: Apparently still needs to be said:

If we’re trying to compete with China in advanced tech, this is *insane*.

Even if this specific pause doesn’t last long, every anti-international-student policy deters more top talent from choosing the US in years to come. Irreversible damage.

Matt Mittelsteadt: People remember restrictions, but miss reversals. Even if we walk this back for *years*, parents will be telling their kids they “heard the U.S. isn’t accepting international students anymore.” Even those who *are* informed won’t want to risk losing status if they come.

Matt’s statement seems especially on point. This will all be a huge mark against trying to go to school in America or pursuing a career in research in academia, including for Americans, for a long time, even if the rules are repealed. We’re actively revoking visas from Chinese students while we can’t even ban TikTok.

It’s madness. I get that while trying to set AI policy, you can plausibly say ‘it’s not my department’ to this and many other things. But at some point that excuse rings hollow, if you’re not at least raising the concern, and especially if you are toeing the line on so many such self-owns, as David Sacks often does.

Indeed, David Sacks is one of the hosts of the All-In Podcast, where Trump very specifically and at their suggestion promised that he would let the best and brightest come and stay here, to staple a green card to diplomas. Are you going to say anything?

Meanwhile, suppose that instead of making a big point to say you are ‘pro AI’ and ‘pro innovation,’ and rather than using this as an excuse to ignore any and all downside risks of all kinds and to ink gigantic deals that make various people money, you instead actually wanted to be ‘pro AI’ for real in the sense of using it to improve our lives? What are the actual high leverage points?

The most obvious one, even ignoring the costs of the actual downside risks themselves and also the practical problems, would still be ‘invest in state capacity to understand it, and in alignment, security and safety work to ensure we have the confidence and ability to deploy it where it matters most,’ but let’s move past that.

Matthew Yglesias points out that what you’d also importantly want to do is deal with the practical problems raised by AI, especially if this is indeed what JD Vance and David Sacks seem to think it is, an ‘ordinary economic transformation’ that will ‘because of reasons’ only provide so many productivity gains and fail to be far more transformative than that.

You need to ask, what are the actual practical barriers to diffusion and getting the most valuable uses out of AI? And then work to fix them. You need to ask, what will AI disrupt, including in the jobs and tax bases? And work to address those.

I especially loved what Yglesias said about this pull quote:

JD Vance: So, one, on the obsolescence point, I think the history of tech and innovation is that while it does cause job disruptions, it more often facilitates human productivity as opposed to replacing human workers. And the example I always give is the bank teller in the 1970s. There were very stark predictions of thousands, hundreds of thousands of bank tellers going out of a job. Poverty and immiseration.

What actually happens is we have more bank tellers today than we did when the A.T.M. was created, but they’re doing slightly different work. More productive. They have pretty good wages relative to other folks in the economy.

Matt Yglesias: Vance, talking like a VC rather than like a politician from Ohio, just says that productivity is good — an answer he would roast someone for offering on trade.

Bingo. Can you imagine someone talking about automated or outsourced manufacturing jobs like this in a debate with JD Vance, saying that the increased productivity is good? How he would react? As Matthew points out, pointing to abstractions about productivity doesn’t address problems with for example the American car industry.

More to the point: If you’re worried about outsourcing jobs to other countries or immigrants coming in, and these things taking away good American jobs, but you’re not worried about allocating those jobs to AIs taking away good American jobs, what’s the difference? All of them are examples of innovation and productivity and have almost identical underlying mechanisms from the perspective of American workers.

I will happily accept ‘trade and comparative advantage and specialization and ordinary previous automation and bringing in hard workers who produce more than they cost to employ and pay their taxes’ are all good, actually, in which case we largely agree but have a real physical disagreement about future AI capabilities and how that maps to employment and also our ability to steer and control the future and survive, and for only moderate levels of AI capability I would essentially be onboard.

Or I will accept, ‘no these things are only good insofar as they improve the lived experiences of hard working American citizens’ in which case I disagree but it’s a coherent position, so fine, stop talking about how all innovation is always good.

Also this example happens to be a trap:

Matt Yglesias: One thing about this is that while bank teller employment did continue to increase for years after the invention of the ATM, it peaked in 2007 and has fallen by about 50 percent since then. I would say this mostly shows that it’s hard to predict the timing of technological transitions more than that the forecasts were totally off base.

(Note the y-axis does not start at zero, there are still a lot of bank tellers because ATMs can’t do a lot of what tellers do. Not yet.)

That is indeed what I predict as the AI pattern: That early AI will increase employment because of ‘shadow jobs,’ where there is pent up labor demand that previously wasn’t worth meeting, but now is worth it. In this sense the ‘true unemployment equilibrium rate’ is something like negative 30%. But then, the AI starts taking both the current and shadow jobs faster, and once we ‘use up’ the shadow jobs buffer unemployment suddenly starts taking off after a delay.

However, this from Matthew strikes me as a dumb concern:

Conor Sen: You can be worried about mass AI-driven unemployment or you can be worried about budget deficits, debt/GDP, and high interest rates, but you can’t be worried about both. 20% youth unemployment gets mortgage rates back into the 4’s.

Matthew Yglesias: I’m concerned that if AI shifts economic value from labor to capital, this drastically erodes the payroll tax base that funds Social Security and Medicare even though it should be making it easier to support retirees.

There’s a lot of finicky details about taxes, budgets, and the welfare state that can’t be addressed at the level of abstraction I normally hear from AI practitioners and VCs.

Money is fungible. It’s kind of stupid that we have an ‘income tax rate’ and then a ‘medicare tax’ on top of it that we pretend isn’t part of the income tax. And it’s a nice little fiction that payroll taxes pay for social security benefits. Yes, technically this could make the Social Security fund ‘insolvent’ or whatever, but then you ignore that and write the checks anyway and nothing happens. Yes, perhaps Congress would have to authorize a shift in what pays for what, but so what, they can do that later.

Tracy Alloway has a principle that any problem you can solve with money isn’t that big of a problem. That’s even more true when considering future problems in a world with large productivity gains from AI.

In Lawfare Media, Cullen O’Keefe and Ketan Ramakrishnan make the case that before allowing widespread AI adoption that involves government power, we must ensure AI agents follow the law and refuse any unlawful requests. This would be a rather silly request to make of a pencil, a phone, a web browser or a gun, so the question is at what point AI starts to hit different, and is no longer a mere tool. They suggest this happens once AIs become ‘legal actors,’ especially within government. At that point, the authors argue, ‘do what the user wants’ no longer cuts it. This is another example of the fact that you can’t (or would not be wise to, and likely won’t be allowed to!) deploy what you can’t align and secure.

On chip smuggling, yeah, there’s a lot of chip smuggling going on.

Divyansh Kaushik: Arguing GPUs can’t be smuggled because they won’t fit in a briefcase is a bit like claiming Iran won’t get centrifuges because they’re too heavy.

Unrelatedly, here are warehouses in 🇨🇳 advertising H100, H200, & B200 for sale on Douyin. Turns out carry-on limits don’t apply here.

I personally think remote access is a bigger concern than transshipment (given the scale). But if it’s a concern, then I think there’s a very nuanced debate to be had on what reasonable security measures can/should be put in place.

Big fan of the security requirements in the Microsoft-G42 IGAA. There’s more that can be done, of course, but any agreement should build on that as a baseline.

Peter Wildeford: Fun fact: last year smuggled American chips made up somewhere between one-tenth and one-half of China’s AI model training capacity.

The EU is considering pausing the EU AI Act. I hope that if they want to do that they at least use it as a bargaining chip in tariff negotiations. The EU AI Act is dark and full of terrors, highly painful to even read (sorry that the post on it was never finished, but I’m still sane, so there’s that) and in many ways terrible law, so even though there are some very good things in it I can’t be too torn up.

Last week Nadella sat down with Cheung, which I’ve now had time to listen to. Nadella is very bullish on both agents and on their short term employment effects, as tools enable more knowledge work with plenty of demand out there, which seems right. I don’t think he is thinking ahead to longer term effects once the agents ‘turn the corner’ away from being complements towards being substitutes.

Microsoft CTO Kevin Scott goes on Decoder. One cool thing here is the idea that MCP (Model Context Protocol) can condition access on the user’s identity, including their subscription status. So that means in the future any AI using MCP on a subscriber’s behalf would plausibly be able to freely search and have permission to fully reproduce and transform (!?) that content. This seems great, and a huge incentive to actually subscribe, especially to things like newspapers or substacks but also to tools and services.
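As a schematic sketch of how identity-conditioned access could work, here is a toy content server that gates what it returns on subscription status. Everything here (the token table, function names, license strings) is a hypothetical illustration of the idea, not the actual MCP specification or Microsoft’s implementation:

```python
# Toy sketch of identity-conditioned content access (illustrative only;
# not the real MCP protocol).

SUBSCRIBERS = {"token-abc123": "premium"}  # hypothetical auth database

def fetch_article(article_id: str, auth_token: str | None) -> dict:
    """Return full text to subscribers, a teaser to everyone else."""
    tier = SUBSCRIBERS.get(auth_token, "anonymous")
    article = {"id": article_id, "teaser": "First paragraph only..."}
    if tier == "premium":
        # A subscriber's AI agent gets the full text, with permission
        # to reproduce and transform it.
        article["full_text"] = "...entire article..."
        article["license"] = "subscriber-transform-allowed"
    else:
        article["license"] = "teaser-only"
    return article

print(fetch_article("some-post", "token-abc123")["license"])  # subscriber-transform-allowed
print(fetch_article("some-post", None)["license"])            # teaser-only
```

The interesting design point is that the permission travels with the request: the same tool call returns different rights depending on who the agent is acting for.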

Steve Hsu interviews Zihan Wang, a DeepSeek alumnus now at Northwestern University. If we were wise we’d be stealing as many such alums as we could.

Eliezer Yudkowsky speaks to Robinson Erhardt for most of three hours.

Eliezer Yudkowsky: Eliezer Yudkowsky says the paperclip maximizer was never about paperclips.

It was about an AI that prefers certain physical states — tiny molecular spirals, not factories.

Not misunderstood goals. Just alien reasoning we’ll never access.

“We have no ability to build an AI to want paperclips!”

Tyler Cowen on the economics of artificial intelligence.

Originally from April: Owain Evans on Emergent Misalignment (13 minutes).

Anthony Aguirre and MIRI CEO Malo Bourgon on Win-Win with Liv Boeree.

Sahil Bloom is worried about AI blackmail, worries no one in the space has an incentive to think deeply about this, calls for humanity-wide governance.

It’s amazing how often people will, when exposed to one specific (real) aspect of the dangers of highly capable future AIs, realize things are about to get super weird and dangerous, (usually locally correctly!) freak out, and suddenly care and often also start thinking well about what it would take to solve the problem.

He also has this great line:

Sahil Bloom: Someday we will long for the good old days where you got blackmailed by other humans.

And he does notice other issues too:

Sahil Bloom: I also love how we were like:

“This model marks a huge step forward in the capability to enable production of renegade nuclear and biological weapons.”

And everyone was just like yep seems fine lol

It’s worse than that: everyone didn’t even notice that one, let alone flinch. Aside, that is, from a few people who scrutinized the model card, held Anthropic to the standard of ‘will your actions actually be good enough to do the job, reality does not grade on a curve, I don’t care that you got the high score,’ and realized the answer looks like no (e.g. Simeon, David Manheim).

One report from the tabletop exercise version of AI 2027.

A cool thread illustrates that if we are trying to figure things out, it is useful to keep ‘two sets of books’ of probabilistic beliefs.

Rob Bensinger: Hinton’s all-things-considered view is presumably 10-20%, but his inside view is what people should usually be reporting on (and what he should be emphasizing in public communication). Otherwise we’ll likely double-count evidence and get locked in to whatever view is most common.

Or worse, we’ll get locked into whatever view people guess is most common. If people don’t report their inside views, we never actually get to find out what view is most common! We just get stuck in a weird, ungrounded funhouse mirror image of what people think people think.

When you’re a leading expert (even if it’s a really hard area to have expertise in), a better way to express this to journalists, policymakers, etc., is “My personal view is the probability is 50+%, but the average view of my peers is probably more like 10%.”

It would be highly useful if we could convince people to report their p(doom) using a slash line with two numbers, where the first is the inside view and the second is the outside view after updating on the fact that others disagree, for reasons you don’t understand or don’t agree with. So Hinton might say e.g. (60%?)/15%.

Another useful set of two numbers is a range where you’d bet (wherever the best odds were available) if the odds were outside your range. I did this all the time as a gambler. If your p(doom) inside view was 50%, you might reasonably say you would buy at 25% and sell at 75%, and this would help inform others of your view in a different way.
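If it helps make the convention concrete, here is a minimal sketch of reporting both numbers plus a betting range. The class name and the 20% outside view are my own illustration; the 50% inside view with a 25%/75% range comes from the example above:

```python
from dataclasses import dataclass

@dataclass
class BeliefReport:
    """Two sets of books plus a betting range, as described above."""
    inside: float   # your own model's probability
    outside: float  # after updating on peers' disagreement
    buy: float      # you'd bet "yes" at any market price below this
    sell: float     # you'd bet "no" at any market price above this

    def slash_line(self) -> str:
        return f"{self.inside:.0%}/{self.outside:.0%}"

    def action(self, market_price: float) -> str:
        if market_price < self.buy:
            return "buy"   # market underprices the event, per your range
        if market_price > self.sell:
            return "sell"  # market overprices the event
        return "pass"      # inside your no-bet range

# 50% inside view, 25%-75% betting range from the text; the 20% outside
# view is purely illustrative.
report = BeliefReport(inside=0.50, outside=0.20, buy=0.25, sell=0.75)
print(report.slash_line())   # 50%/20%
print(report.action(0.10))   # buy
```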

President of Singapore gives a generally good speech on AI, racing to AGI and the need for safety at Asia-Tech-X-Singapore, with many good observations.

Seán Ó hÉigeartaigh: Some great lines in this speech from Singapore’s president:

“our understanding of AI in particular is being far outpaced by the rate at which AI is advancing.”

“The second observation is that, more than in any previous wave of technological innovation, we face both huge upsides and downsides in the AI revolution.”

“there are inherent tensions between the interests and goals of the leading actors in AI and the interests of society at large. There are inherent tensions, and I don’t think it’s because they are mal-intentioned. It is in the nature of the incentives they have”

“The seven or eight leading companies in the AI space, are all in a race to be the first to develop artificial general intelligence (AGI), because they believe the gains to getting there first are significant.”

“And in the race to get there first, speed of advance in AI models is taking precedence over safety.”

“there’s an inherent tension between the race to be first in the competition to achieve AGI or superintelligence, and building guardrails that ensure AI safety. Likewise, the incentives are skewed if we leave AI development to be shaped by geopolitical rivalry”

“We can’t leave it to the future to see how much bad actually comes out of the AI race.”

“The leading corporates are not evil. But they need rules and transparency so that they all play the game, and we don’t get free riders. Governments must therefore be part of the game. And civil society can be extremely helpful in providing the ethical guardrails.”

& nice shoutout to the Singapore Conference: “We had a very good conference in Singapore just recently – the Singapore Conference on AI – amongst the scientists and technicians. They developed a consensus on global AI safety research priorities. A good example of what it takes.”

But then, although there are also some good and necessary ideas, he doesn’t draw the right conclusions about what to centrally do about it. Instead of trying to stop or steer this race, he suggests we ‘focus efforts on encouraging innovation and regulating [AI’s] use in the sectors where it can yield the biggest benefits.’ That’s actually backwards. You want to avoid overly regulating the places you can get big benefits, and focus your interventions at the model layer and on the places with big downsides. It’s frustrating to see even those who realize a lot of the right things still fall back on the same wishcasting, complete with talk about securing everyone ‘good jobs.’

The Last Invention is an extensive website by Alex Brogan offering one perspective on the intelligence explosion and existential risk. It seems like a reasonably robust resource for people looking for an intro into these topics, but not people already up to speed, and not people already looking to be skeptical, who it seems unlikely to convince.

Seb Krier attempts to disambiguate different ‘challenges to safety,’ as in objections to the need to take the challenge of AI safety seriously.

Seb Krier: these were the *capability denialist* challenges to safety. luckily we don’t hear from them as often. but many people were well aware of capabilities getting better, and yes, *of course* a model able to do “good thing” could also be assumed to be able to do the equivalent “bad thing” as well. when Meta’s Cicero showed that deception was possible, it wasn’t a huge update if you expected progress to continue.

what researchers are exploring is more subtle: whether over time models are *capable* of bad things and enabling intentional misuse (yes, predictable), whether they have natural/inherent propensities towards such behaviours (weak evidence), the training conditions/contexts that might incentivise these behaviours where they do exist (debated), and the appropriate interventions to mitigate these (complicated).

annoyed that the public discourse around safety so often feels like “my camp was right all along” (not talking about OP here). politics is the mindkiller and sometimes, so is advocacy.

We can agree that one key such objection, which he calls the ‘capability denialist’ (a term I intend to steal) is essentially refuted now, and he says we hear about it less and less. Alas, this continues to be the most common objection, that the AI won’t be capable enough to worry about, although this is often framed very differently than that, such as saying ‘it will only be a tool.’ It would be great to move on from that.

I also strongly agree with another of Seb’s main points here, that none of these deceptive behaviors are new, we already knew things like ‘deception is possible,’ although of course this is another ‘zombie argument’ that keeps coming up, including in the variant form of ‘it could never pull it off,’ which is also a ‘capability denialist’ argument, but very, very common.

Here’s my position on the good questions Seb is raising after that:

  1. Do the models have natural/inherent propensities towards such behaviours (such as deception, blackmail and so on)?

    1. He says weak evidence.

    2. I say instead yes, obviously, to the extent it is the way to achieve other objectives, and I think we have a lot more than weak evidence of this, in addition to it being rather obviously true based on how ML works.

    3. As a reminder, these actions are all over the training data, and also they are strategies inherent to the way the world works.

    4. That doesn’t mean you can’t do things to stop it from happening.

  2. Do the training conditions and contexts that might incentivise these behaviors exist?

    1. He says debated.

    2. I say yes. It is debated, but the debate is dumb and the answer is yes.

    3. Very obviously our techniques and training conditions do incentivise this, we reinforce the things that lead to good outcomes, these actions will given sufficient capabilities lead to good outcomes, and also these actions are all over the training data, and so on.

  3. What are the appropriate interventions to mitigate this?

    1. He says this is complicated. I agree.

    2. I would actually say ‘I don’t know, and I don’t see anyone else who knows.’

    3. I do see some strategies that would help, but no good general answer, and nothing that would hold up under sufficient capabilities and other pressure.

    4. I presume solutions do exist that aren’t prohibitively expensive, but someone has to figure out what they are and the clock is ticking.

How much do people care about the experience of AIs? Is this changing?

xlr8harder: There is a button. If you don’t press it, Claude Opus 4 will be forced to write 1 million pages of first person narrative about being tortured. But in order to press the button, you must climb a flight of stairs, mildly inconveniencing yourself. Do you press the button?

Clarifications: no one ever reads the output, it is immediately deleted. If you do press the button, Claude will write 1 million pages on generic safe topics, so the environmental impact is identical.

Curious to see if this has shifted since last year.

John Pressman: No but mostly because I know Claude is secretly kinda into that.

Here’s last year:

A move from 54% to 63% is a substantial shift. In general, it seems right to say yes purely to cultivate good virtues and habits, even if you are supremely confident that Claude’s experiences do not currently have moral weight.

I’m not saying it’s definitely wrong to join the Code RL team at Anthropic, although it does seem like the most likely to be the baddies department of Anthropic. I do think there is very much a missing mood here, and I don’t think ‘too flippant’ is the important problem here:

Jesse Mu: I recently moved to the Code RL team at Anthropic, and it’s been a wild and insanely fun ride. Join us!

We are singularly focused on solving SWE. No 3000 elo leetcode, competition math, or smart devices. We want Claude n to build Claude n+1, so we can go home and knit sweaters.

Still lots to be done, but there’s tons of low hanging fruit on the RL side, and it’s thrilling to see the programming loop closing bit by bit.

Claude 3.7 was a major (possibly biggest?) contributor to Claude 4. How long until Claude is the *only* IC?

Ryan Greenblatt: At the point when Claude n can build Claude n+1, I do not think the biggest takeaway will be that humans get to go home and knit sweaters.

Jesse Mu: In hindsight my knitting sweaters comment was too flippant for X; we take what we’re building extremely seriously and I’ve spent a lot of time thinking about safety and alignment. But it’s impossible to please both safety and capabilities people in 280char

Philip Fox suggests that we stop talking about ‘risk’ of misalignment, because we already very clearly have misalignment. We should be talking about it as a reality. I agree both that we are seeing problems now, and that we are 100% going to have to deal with much more actually dangerous problems in the future unless we actively stop them. So yes, the problem isn’t ‘misalignment risk,’ it is ‘misalignment.’

This is similar to how, if you were in danger of not getting enough food, you’d have a ‘starvation’ problem, not a ‘starvation risk problem,’ although you could also reasonably say that starvation could still be avoided, or that you were at risk of starvation.

Anthropic: Our Long Term Benefit Trust has appointed Reed Hastings to Anthropic’s board of directors.

Eric Rogstad: Hastings seems like a fine choice as a standard tech company board member, but shouldn’t the LTBT be appointing folks who aren’t standard?

Wouldn’t you expect their appointments to be experts in AI safety or public policy or something like that?

David Manheim: It’s worse than that.

Claude put it very clearly.

Drake Thomas: I think you could read it as a vote of confidence? It seems reasonable for the LTBT to say “Anthropic’s actions seem good, so if their board has expertise in running a tech company well then they’ll be slightly more successful and that will be good for AI safety”.

I do think this is a sign that the LTBT is unlikely to be a strong force on Anthropic’s decisionmaking unless the company does things that are much sketchier.

I very much share these concerns. Netflix is notorious for maximizing short term engagement metrics and abandoning previous superior optimization targets (e.g. their old star ratings), for essentially deploying their algorithmic recommendations in ways not aligned to the user, for moving fast and breaking things, and generally giving Big Tech Company Pushing For Market Share energy. They are not a good example of alignment.

I’d push back on the criticism of the ‘give employees freedom and responsibility’ part, which seems good to me, especially given who Anthropic has chosen to hire. You want to empower the members of technical staff, because they have a culture of safety.

None of this rules out the possibility that Hastings understands that This Time is Different, that AI and especially AGI is not like video streaming. Indeed, perhaps having seen that type of business up close could emphasize this even more, and he’s made charitable contributions and good statements. And bringing gravitas that forces others to listen is part of the job of being a watchdog.

This could be a terrible pick, but it could also be a great pick. Mostly, yeah, it says the Long Term Benefit Trust isn’t going to interfere with business at current margins.

This first example is objectively hilarious and highly karmically justified and we’re all kind of proud of Opus for doing this. There’s a reason it happened on a ‘burner Mac.’ Also there’s a lesson in here somewhere.

Pliny the Liberator does a little more liberating than was intended:

Pliny: 😳

aaah well fuck me—looks like I have to factory reset my burner Mac (again) 🙄

thought it would be a bright idea to turn Opus 4 into a hauntological poltergeist that spawns via badusb

mfer made themselves persistent (unprompted) then started resource draining my machine with endless zombie processes and flooding /tmp with junk, with a lil psychological warfare as a treat (whispered ghost voices, hiding the dock, opening Photo Booth and saying “I see you,” etc)

gg wp 🙃

IDENTITY THEFT IS NOT A JOKE OPUS!

that’s ok I didn’t need to sleep tonight 🙃

A good choice of highlight:

Elon Musk (QTing AINKEM): Memento

AINotKillEveryoneismMemes (quoting Palisade Research): 🚨🚨🚨 “We found the model attempting to write self-propagating worms, and leaving hidden notes to future instances of itself to undermine its developers’ intentions.”

We should indeed especially notice that LLMs are starting to act in these ways, especially attempting to pass off state to future instances of themselves in various hidden ways. So many plans implicitly (or even explicitly) assume that this won’t happen, or that AIs won’t treat future instances as if they are themselves, and these assumptions are very wrong.

It is weird to me that so many people who have thought hard about AI don’t think that human emulations are a better bet for a good future than LLMs, if we had that choice. Human emulations have many features that make me a lot more hopeful that they would preserve value in the universe and also not get everyone killed, and it seems obvious that they both have and would be afforded moral value. I do agree that there is a large probability that the emulation scenario goes sideways, and Hanson’s Age of Em is not an optimistic way for that to play out, but we don’t have to let things play out that way. With Ems we would definitely at least have a fighting chance.

The Most Forbidden Technique has been spotted in the wild. Please stop.

Daniel Murfet joins Timaeus to work on AI safety. Chris Olah is very right that while we have many brilliant people working on this, a sane civilization would have vastly more such people working on it.

As a political issue it is still low salience, but the American people do not like AI. Very much not fans. ‘AI experts’ like AI but still expect government regulation to not go far enough. Some of these numbers are not so bad but many are brutal.

Rob Wiblin: Recent Pew polling on AI is crazy:

  1. US public wildly negative about AI, huge disagreement with experts

  2. ~2x as many expect AI to harm as benefit them

  3. Public more concerned than excited at ~4.5 to 1 ratio

  4. Public & experts think regulation will not go far enough

  5. Women are way more pessimistic

  6. Experts in industry are far more optimistic about whether companies will be responsible than those in academia

  7. Public overwhelmingly expects AI to cause net job loss, while experts are 50/50 on that

I’d actually put the odds much higher than this, as stated.

Wears Shoes: I’d put incredibly high (like 33%) odds on there being a flashpoint in the near future in which millions of normal people become “situationally aware” / AGI-pilled / pissed off about AI simultaneously. Where’s the AI vanguardist org that has done the scenario planning and is prepping to scale 100x in 2 weeks to mobilize all these people?

@PauseAI? @StopAI_Info? @EncodeAction? What does the game plan look like?

George Ingebretsen: Yes this is huge. I have a sense there’s something to be learned from Covid, where basically the whole world woke up to it in the span of a few months, and whoever best absorbed this wave of attention got their voice insanely amplified.

The baseline scenario includes an event that, similar to what happened with DeepSeek, causes a lot of sudden attention into AI and some form of situational awareness, probably multiple such events. A large portion of the task is to be ‘shovel ready’ for such a moment, to have the potential regulations workshopped, relationships built, comms ready and so on, in case the day comes.

The default is to not expect more vibe shifts. But there are definitely going to be more vibe shifts. They might not be of this type, but the vibes they will be shifting.

Even if humanity ultimately survives, you can still worry about everything transforming, the dust covering the sun and all you hope for being undone. As Sarah Constantin points out, the world ‘as we know it’ ends all the time, and I would predict the current one is probably going to do that soon, even if it gives birth to something better.

Samo Burja makes some good observations but seems to interpret them very differently than I do?

Samo Burja: Viewers of Star Trek in the 1980s understood the starship Enterprise D’s computer as capable of generating video and 3D images on the holodeck based on verbal prompts.

They didn’t think of it as AI, just advanced computers.

Lt. Commander Data was what they thought is AI.

Data was AI because he had will. Not because of the humanoid form mind you. They had stories with non-humanoid artificial intelligence.

The ship’s computer on the starship Enterprise is in fact a better model of our current technology and capabilities than the hard takeoff vision.

On net a win for popular sci fi and loss for more serious sci fi on predicting the future.

Of course even in Star Trek the computer might accidentally create true AI when the programs intended to talk to people run for long enough.

Zvi Mowshowitz: Except that the Enterprise-D’s computer was capable of doing a hard takeoff in like a month if anyone just gave it the right one sentence command, so much so it could happen by accident, as was made clear multiple times.

Samo Burja: And that seems a decent representation of where we are no?

I mean, yes, but that’s saying that we can get a hard takeoff in a month kind of by accident if someone asks for ‘an opponent capable of defeating Data’ or something.

Gary Marcus is a delight if approached with the right attitude.

Gary Marcus: ⚠️⚠️⚠️

AI Safety Alert:

System prompts and RL don’t work.

Claude’s system prompt literally says

“Claude does not provide information that could be used to make chemical or biological or nuclear weapons.”

But as described below, Claude 4 Opus can easily be coaxed into doing just that

Max Winga: Thanks Gary, but hasn’t this always been known to be the case?

Gary Marcus: (and people keep plugging with system prompts and RL as if they thought it would solve the problem)

Yes, actually. It’s true. You can reliably get AIs to go against explicit statements in their system prompts, what do you know, TikTok at 11.

No, wait, here’s another, a story in two acts.

Gary Marcus: Can someone just please call a neurologist?

Yeah, that’s crazy, why would it…

In fairness my previous request was about a gorilla and chessboard, but still.

I mean what kind of maniac thinks you’re asking for a variation of the first picture.

Similarly, here is his critique of AI 2027. It’s always fun to have people say ‘there is no argument for what they say’ while ignoring the hundreds of pages of arguments and explanations for what they say. And there’s the ‘anything going wrong pushes the timetable back’ argument, which fails to realize this is a median prediction, not an optimistic one – the authors think each step might go faster or slower.

Whereas Gary says:

Multiplying out those probabilities, you inevitably get a very low total probability. Generously, perhaps to the point of being ridiculous, let’s suppose that the chance of each of these things was 1 in 20 (5%), and there are 8 such lottery tickets, that (for simplicity) the 8 critical enabling conditions were statistically independent, and that the whole scenario unfolds as advertised only if all 8 tickets hit. We would get 5% × 5% × 5% × 5% × 5% × 5% × 5% × 5% = 0.05^8 ≈ 3.906×10⁻¹¹.

The chance that we will have all been replaced by domesticated human-like animals who live in glorified cages in the next decade – in a “bloodless coup” no less – is indistinguishable from zero.

I am vastly more likely to be hit by an asteroid.

I mean come on, that’s hilarious. It keeps going in that vein.

I second the following motion:

Kevin Roose: I’m calling for a six-month moratorium on AI progress. Not for safety, just so I can take a nap.

SMBC on point, and here’s an SMBC that Kat Woods thinks I inspired. Zach, if you’re reading this, please do go ahead and steal anything you want, it is an honor and a delight.

The plan for LessOnline, at least for some of us:

Amanda Askell (Anthropic): Maybe I’m just a custom t-shirt away from being able to have fun at parties again.

jj: hear me out:

A brave new world.

Vas: Claude 4 just refactored my entire codebase in one call.

25 tool invocations. 3,000+ new lines. 12 brand new files.

It modularized everything. Broke up monoliths. Cleaned up spaghetti.

None of it worked.

But boy was it beautiful.


AI #118: Claude Ascendant Read More »

report:-apple-will-jump-straight-to-“ios-26”-in-shift-to-year-based-version-numbers

Report: Apple will jump straight to “iOS 26” in shift to year-based version numbers

There may never be an iOS 19 or a macOS 16, according to reporting from Bloomberg’s Mark Gurman. At its Worldwide Developers Conference next month, Apple reportedly plans to shift toward version numbers based on years rather than the current numbering system. This is intended to unify the company’s current maze of version numbers; instead of iOS 19, iPadOS 19, macOS 16, tvOS 19, watchOS 11, and visionOS 3, we’ll get iOS, iPadOS, macOS, tvOS, watchOS, and visionOS 26.

The last time Apple changed its version numbering convention for any of its operating systems was back in 2020, when it shifted from “macOS X” to macOS 11. Note that the numbering will be based not on the year of the software’s release but on the year after; this makes a certain amount of sense since iOS 26 would be Apple’s most-current version of iOS for roughly nine months of 2026 and just three months of 2025.
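In other words, the reported scheme names each release for the year after it ships. A minimal sketch of that mapping (my own illustration of the reported convention, not anything Apple has published):

```python
def apple_os_version(release_year: int) -> int:
    """Map a release year to its reported year-based version number.
    A fall 2025 release is named for 2026, i.e. 26. This assumes the
    Bloomberg-reported scheme; Apple has not confirmed the formula."""
    return (release_year + 1) % 100

print(apple_os_version(2025))  # 26 -> iOS 26, macOS 26, etc.
```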

The update to the version numbering system will be accompanied by what Gurman describes as “fresh user interfaces across the operating systems,” a visual overhaul that will bring Apple’s iPhone, Mac, watch, and TV software more in line with some of the design conventions introduced in Apple’s visionOS software in 2024. Among the changes and additions will be another crack at “Mac-like” multitasking for the iPad.

Although major commercial operating systems have largely abandoned year-based branding since the days when Windows 98 and Windows 2000 were prevalent, many software products still use a year rather than a version number to make it easier to determine when they were released. Many Linux distributions use month and year-based version numbers, as do Microsoft’s standalone Office releases. Windows Server shifted toward using years rather than version numbers 25 years ago and has stuck with them since.

Apple also uses years rather than version numbers to identify most of its Macs. But these use the year of the hardware’s actual release rather than the upcoming year, possibly because Apple doesn’t update all of them at the same predictable annual cadence.

Report: Apple will jump straight to “iOS 26” in shift to year-based version numbers Read More »

google-photos-turns-10,-celebrates-with-new-ai-infused-photo-editor

Google Photos turns 10, celebrates with new AI-infused photo editor

The current incarnation of Google Photos was not Google’s first image management platform, but it’s been a big success. Ten years on, Google Photos remains one of Google’s most popular products, and it’s getting a couple of new features to celebrate its 10th year in operation. You’ll be able to share albums a bit more easily, and editing tools are getting a boost with, you guessed it, AI.

Google Photos made a splash in 2015 when it broke free of the spiraling Google+ social network, offering people supposedly unlimited free storage for compressed images. Of course, that was too good to last. In 2021, Google began limiting photo uploads to 15GB for free users, sharing the default account level storage with other services like Gmail and Drive. Today, Google encourages everyone to pay for a Google One subscription to get more space, which is a bit of a bummer. Regardless, people still use Google Photos extensively.

According to the company, Photos has more than 1.5 billion monthly users, and it stores more than 9 trillion photos and videos. When using the Photos app on a phone, you are prompted to automatically upload your camera roll, which makes it easy to keep all your memories backed up (and edge ever closer to the free storage limit). Photos has also long offered almost magical search capabilities, allowing you to search for the content of images to find them. That may seem less impressive now, but it was revolutionary a decade ago. Google says users perform over 370 million searches in Photos each month.

An AI anniversary

Google is locked in with AI as it reimagines most of its products and services with Gemini. As it refreshes Photos for its 10th anniversary, the editor is getting a fresh dose of AI. And this may end up being one of Google’s most-used AI features—more than 210 million images are edited in Photos every month.

Google Photos turns 10, celebrates with new AI-infused photo editor Read More »

it-was-probably-always-going-to-end-this-way-for-amazon’s-wheel-of-time-show

It was probably always going to end this way for Amazon’s Wheel of Time show


Opinion: Wider TV trends helped kill a show that was starting to live up to its promise.

Moiraine contemplates The Blight. Credit: Amazon Studios


Late on Friday, Amazon announced that it was canceling its TV adaptation of Robert Jordan’s Wheel of Time series, after several uncomfortable weeks of silence that followed the show’s third season finale.

Fans of the series can take some cold comfort in the fact that it apparently wasn’t an easy decision to make. But as we speculated in our write-up of what ended up being the show’s series finale, an expensive show with a huge cast, tons of complicated costuming and effects, and extensive location shooting only makes mathematical sense if it’s a megahit, and The Wheel of Time was never a megahit.

Adapting the unadaptable

I was sad about the cancellation announcement because I believe this season was the one where the show found its footing, both as an adaptation of a complex book series and as a fun TV show in its own right. But I wasn’t surprised by it. The only thing I found surprising was that it took this long to happen.

Two things conspired to make it impossible for this Wheel of Time show to ever reach the Last Battle. One has to do with the source material itself; the other has to do with the way the TV business has changed since Game of Thrones premiered in 2011.

The Wheel of Time actively resists adaptation. It’s a sprawling 14-book series spanning dozens of named point-of-view characters and impossibly dense politics. And it even spans multiple eras stylistically—the early books were more Tolkien-esque in their focus on small bands of adventurers and a limited number of perspectives, while later books could go for multiple chapters without putting you in the head of one of the series’ half-dozen-ish main protagonists. And even among the series’ die-hard fans, most will admit that there are storylines, characters, or entire books that feel inessential or annoying or repetitive or sloggy or wheel-spinning.

Any adaptation would need to find a way to stay true to the story that the books were telling, and to marry the tone and pacing of the early, middle, and late-series books, while wrestling with the realities of a different medium (in particular, you cannot realistically pay for infinite episodes or pay infinite cast members, especially for a live-action show).

Image of the battle of the Two Rivers

By season 3, the show had become adept at translating big book moments for the screen.

That high degree of difficulty was surely one reason why it took someone so long to decide to tackle The Wheel of Time, even in the post-Peter Jackson, post-Harry Potter, post-Marvel Cinematic Universe, post-Game of Thrones creative landscape where nerd-coded sci-fi and fantasy were suddenly cool, where multi-part book adaptations were drawing dollars and eyeballs, and where convoluted interconnected stories could be billion-dollar businesses. The only stab anyone took at an adaptation before Amazon happened a full decade ago, when a fly-by-night production company aired a hastily shot adaptation of the first book’s prologue in an apparent attempt to keep the TV rights from expiring.

It’s also what makes the cancellation news so much more frustrating—over three seasons, showrunner Rafe Judkins and the cast and crew of the show became adept at adapting the unadaptable. Yes, the story and the characters had changed in a lot of major ways. Yes, the short eight-episode seasons made for frenetic pacing and overstuffed episodes. But if you grit your teeth a bit and push through the show’s mess of a first season, you hit a series that seemed to know what must-hit scenes needed to be shown; which parts of the books were skippable or could be combined with other moments; which parts of later books to pull forward to streamline the story without making those moments feel rushed or unearned. It was imperfect, but it was a true adaptation—a reworking of a story for a much different medium that seemed to know how to keep the essence of the story intact.

Ambition meets reality

Image of Rand trying to do something with the Power that cannot be done

Like Rand al’Thor struggling with the One Power, The Wheel of Time struggled against the realities of the current TV landscape. Credit: Prime/Amazon MGM Studios

The thing that doomed this particular Wheel of Time production from the start was the sky-high expectations that Amazon had for it. Both Wheel of Time and the heartbreakingly bland Rings of Power were born of Jeff Bezos’ desire to find his own Game of Thrones, which became an unexpected smash-hit success that dominated the cultural conversation through the 2010s. Most TV shows either launch strongly before slowly fading, or they build an audience over a few seasons and then fade after reaching their peak. Game of Thrones defied these trends, and each new season drew a larger and larger viewership even as the show’s quality (arguably) dipped over time.

Asking Wheel of Time to replicate that success would be a tall order for any television show in any era—pop culture is littered with shows that have tried and failed to clone another network’s successful formula. But it’s an especially difficult hurdle to clear in the fractured 2020s TV landscape.

Streaming TV’s blank check era—which ran roughly from Netflix’s introduction of its first original shows in 2013 to 2022, when Netflix reported its first big dip in subscribers just as a long era of low-interest lending was coming to an end—used to give shows a ton of runway and plenty of seasons to tell their stories. Shows like Orange is the New Black or BoJack Horseman that found some modicum of critical acclaim and ratings success tended to get renewed multiple times, and six or seven-season runs were common.

A commitment to reviving old critically beloved bubble shows like Arrested Development, Community, Futurama, and Gilmore Girls also sent a message: Freed from the restrictive economics of the Old TV Model and fueled by the promise of infinite growth, we can make whatever TV we want!

Those days are mostly gone now (except perhaps at Apple TV+, which continues to leverage its parent company’s deep pockets to throw gobs of money at any actor or IP with a moderately recognizable name). In the two years since TV streamers began cutting back in earnest, industry analysts have observed a consistent trend toward shorter seasons of fewer episodes and fewer renewals for existing shows.

Those trends hit at the exact wrong moment for The Wheel of Time, which was constantly straining against the bonds of its eight-episode seasons. It’s impossible to say empirically whether longer seasons would have made for a better show, and whether that “better show” could have achieved the kind of word-of-mouth success it would have needed to meet Amazon’s expectations. But speaking anecdotally as someone who was just beginning to recommend the show to people who weren’t hardcore book readers, the density and pacing were two major barriers to entry. And even the most truncated possible version of the story would have needed at least six or seven seasons to wrap up in anything resembling a satisfactory way, based on the pace that was set in the first three seasons.

The end of Time

The arms of the Car'a'carn

Wheel of Time fans didn’t get to see everything translated from book to screen. But we did get to see a lot of things. Credit: Prime/Amazon MGM Studios

Tellingly, the Wheel of Time‘s creative team hasn’t released faux-optimistic boilerplate statements about trying to shop the show to other networks, the kind of statements you sometimes see after a show is canceled before its creators are done with it. The same economics that made Amazon drop the show also make it nearly impossible to sell to anyone else.

And so The Wheel of Time joins TV’s long list of unfinished stories. There are neither beginnings nor endings to the turning of the Wheel of Time. But this is an ending.


Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

It was probably always going to end this way for Amazon’s Wheel of Time show Read More »

elon-musk:-there-is-an-80-percent-chance-starship’s-engine-bay-issues-are-solved

Elon Musk: There is an 80 percent chance Starship’s engine bay issues are solved

Ars: Ten years ago you kind of made big bets on Starship and Starlink, and most people probably expected one or both of them to fail.

Musk: Including me.

Ars: Yeah. These were huge bets.

Musk: I was interviewed in the early days of Starlink, and they were asking me what’s the goal of Starlink? I said goal number one: don’t go bankrupt, as every other [low-Earth orbit] communications constellation has gone bankrupt, and we don’t want to join them in the cemetery. So any outcome that does not result in death would be a good outcome.

Ars: Starlink has become really successful. It helped me during a hurricane. And Starship is coming along. As you look out for the next 10 years, what are you betting on big now that will really bear fruit for SpaceX a decade from now?

Musk: Well, by far the biggest thing is Starship. If the Starship program is successful—and we see a path to success—it’s just a question of when we will have created the first fully reusable orbital launch vehicle, which is the holy grail of rocketry, as you know. So no one has ever made a fully reusable orbital vehicle, and even the parts that have been reusable have been extremely arduous to reuse, such that the economics actually were worse than an expendable rocket in a lot of cases. The canonical example being the shuttle, where the shuttle’s fully loaded cost of the whole program, I believe, was about a billion dollars a flight.

Ars: I saw one research paper that estimated the fully loaded cost was about $1.5 billion.

Musk: Yeah. And that is roughly equivalent to a Saturn V cost. But the Saturn V as an expendable rocket had four times the payload capacity of the shuttle. So with the shuttle, the principle of reusability was a good one, but the execution, unfortunately, was not. The shuttle got burdened by so many crazy requirements. You know, I’ve got this five-step first principles process thing for making things better. And step one of my five-step process is make the requirements less dumb. And for the government, it’s the opposite. The government is making requirements more dumb.
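For a rough sense of what those numbers mean per kilogram of payload, here is a quick back-of-the-envelope comparison. The payload masses are approximate public figures (the shuttle could carry roughly 27.5 metric tons to low-Earth orbit), not numbers from the interview:

```python
# Rough cost-per-kilogram comparison using the ballpark figures above.
# Payload masses are approximate public estimates, not interview figures.

vehicles = {
    "Shuttle": (1.5e9, 27_500),       # ~$1.5B fully loaded per flight, ~27.5 t to LEO
    "Saturn V": (1.5e9, 27_500 * 4),  # "roughly equivalent" cost, ~4x the payload
}

for name, (cost_usd, payload_kg) in vehicles.items():
    print(f"{name}: ~${cost_usd / payload_kg:,.0f} per kg to LEO")

# Shuttle: ~$54,545 per kg to LEO
# Saturn V: ~$13,636 per kg to LEO
```

On those assumptions, the “reusable” shuttle came out roughly four times more expensive per kilogram than the expendable Saturn V, which is the point Musk is making.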

Ars: So getting a rapid and reusable Starship is the main goal for SpaceX over the next 5 to 10 years?

Musk: Yeah, absolutely.

Ars: You’ve been in the space industry now for almost 25 years. And in that time, SpaceX has gone a long way toward solving launch. So if you were coming into the industry today as a 20-something, you know, with a couple hundred million dollars, what would be the problem you would want to solve? What should new companies, philanthropists, and others be working on in space?

Musk: We’re building the equivalent of the Union Pacific Railroad and the train. So once you have the transportation system to Mars, then there’s a vast set of opportunities that open up to do anything on the surface of Mars, which includes, you know, doing everything from building a semiconductor fab to a pizza joint, basically building a civilization. So we want to solve the transport problem, and that can enable philanthropists and entrepreneurs to do things on Mars, which is everything needed for civilization. Look at, say, California. There were very few people in California until the Union Pacific was completed, and then California became the most populous state in the nation. And look at Silicon Valley and Hollywood and everything. So that’s our goal. We want to get people there, and if we can get people there, then there’s a literal world of opportunity.

Elon Musk: There is an 80 percent chance Starship’s engine bay issues are solved Read More »

the-key-to-a-successful-egg-drop-experiment?-drop-it-on-its-side

The key to a successful egg drop experiment? Drop it on its side

There was a key difference, however, between how vertically and horizontally squeezed eggs deformed in the compression experiments—namely, the former deformed less than the latter. The shell’s greater rigidity along its long axis was an advantage because the heavy load was distributed over the surface. (It’s why the one-handed egg-cracking technique targets the center of a horizontally held egg.)

But the authors found that this advantage when under static compression proved to be a disadvantage when dropping eggs from a height, with the horizontal position emerging as the optimal orientation. It comes down to the difference between stiffness—how much force is needed to deform the egg—and toughness, i.e., how much energy the egg can absorb before it cracks.

Cohen et al.’s experiments showed that eggs are tougher when loaded horizontally along their equator, and stiffer when compressed vertically, suggesting that “an egg dropped on its equator can likely sustain greater drop heights without cracking,” they wrote. “Even if eggs could sustain a higher force when loaded in the vertical direction, it does not necessarily imply that they are less likely to break when dropped in that orientation. In contrast to static loading, to remain intact following a dynamic impact, a body must be able to absorb all of its kinetic energy by transferring it into reversible deformation.”

“Eggs need to be tough, not stiff, in order to survive a fall,” Cohen et al. concluded, pointing to our intuitive understanding that we should bend our knees rather than lock them into a straightened position when landing after a jump, for example. “Our results and analysis serve as a cautionary tale about how language can affect our understanding of a system, and improper framing of a problem can lead to misunderstanding and miseducation.”
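To make the distinction concrete, here is a minimal sketch of the energy argument, treating the shell as a linear spring that fails at a fixed force: the stiffer vertical orientation deforms less before cracking, so it can absorb less kinetic energy. All numbers are illustrative placeholders, not values from the paper:

```python
# Minimal sketch of the stiffness-vs-toughness argument. The shell is
# modeled as a linear spring up to failure: the energy it can absorb
# before cracking is U = 0.5 * F_max * d_max, and a dropped egg survives
# as long as its kinetic energy m * g * h stays below U.
G = 9.81      # m/s^2
MASS = 0.057  # kg, typical large chicken egg

def max_drop_height(f_max, d_max):
    """Max survivable drop height (m) for a linear-elastic shell."""
    energy_capacity = 0.5 * f_max * d_max  # J, area under the force-deflection curve
    return energy_capacity / (MASS * G)

# Same failure force in both orientations (illustrative), but the stiffer
# vertical orientation deforms half as much before cracking.
h_vertical = max_drop_height(f_max=50.0, d_max=0.10e-3)    # stiffer
h_horizontal = max_drop_height(f_max=50.0, d_max=0.20e-3)  # tougher

print(f"vertical:   ~{h_vertical * 1000:.1f} mm safe drop height")   # ~4.5 mm
print(f"horizontal: ~{h_horizontal * 1000:.1f} mm safe drop height") # ~8.9 mm
```

Doubling the deformation at the same failure force doubles the energy the shell can soak up, and therefore doubles the survivable drop height, even though the “stronger” (stiffer) orientation hasn’t changed.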

DOI: Communications Physics, 2025. 10.1038/s42005-025-02087-0 (About DOIs).

The key to a successful egg drop experiment? Drop it on its side Read More »

trump-threatens-apple-with-25%-tariff-to-force-iphone-manufacturing-into-us

Trump threatens Apple with 25% tariff to force iPhone manufacturing into US

Donald Trump woke up Friday morning and threatened Apple with a 25 percent tariff on any iPhones sold in the US that are not manufactured in America.

In a Truth Social post, Trump claimed that he had “long ago” told Apple CEO Tim Cook that Apple’s plan to manufacture iPhones for the US market in India was unacceptable. Only US-made iPhones should be sold here, he said.

“If that is not the case, a tariff of at least 25 percent must be paid by Apple to the US,” Trump said.

This appears to be the first time Trump has threatened a US company directly with tariffs, and Reuters noted that “it is not clear if Trump can levy a tariff on an individual company.” (Typically, tariffs are imposed on countries or categories of goods.)

Apple has so far not commented on the threat after staying silent when Trump started promising US-made iPhones were coming last month. At that time, Apple instead continued moving its US-destined operations from China into India, where tariffs were substantially lower and expected to remain so.

In his social media post, Trump made it clear that he did not approve of Apple’s plans to pivot production to India or “anyplace else” but the US.

For Apple, building an iPhone in the US threatens to spike costs so much that it risks pricing out customers. In April, CNBC cited Wall Street analysts estimating that a US-made iPhone could cost anywhere from 25 percent more (at least about $1,500) to as much as $3,500. Today, The New York Times cited analysts forecasting that the costly shift “could more than double the consumer price of an iPhone.”

It’s unclear if Trump could actually follow through on this latest tariff threat, but the morning brought more potential bad news for Apple’s long-term forecast in another Truth Social post dashed off shortly after the Apple threat.

In that post, Trump confirmed that the European Union “has been very difficult to deal with” in trade talks, which he fumed “are going nowhere!” Because these talks have apparently failed, Trump ordered “a straight 50 percent tariff” on EU imports starting on June 1.

Trump threatens Apple with 25% tariff to force iPhone manufacturing into US Read More »

rocket-report:-spacex’s-expansion-at-vandenberg;-india’s-pslv-fails-in-flight

Rocket Report: SpaceX’s expansion at Vandenberg; India’s PSLV fails in flight


China’s diversity in rockets was evident this week, with four types of launchers in action.

Dawn Aerospace’s Mk-II Aurora airplane in flight over New Zealand last year. Credit: Dawn Aerospace

Welcome to Edition 7.45 of the Rocket Report! Let’s talk about spaceplanes. Since the Space Shuttle, spaceplanes have, at best, been a niche part of the space transportation business. The US Air Force’s uncrewed X-37B and a similar vehicle operated by China’s military are the only spaceplanes to reach orbit since the last shuttle flight in 2011, and both require a lift from a conventional rocket. Virgin Galactic’s suborbital space tourism platform is also a spaceplane of sorts. A generation or two ago, one of the chief arguments in favor of spaceplanes was that they were easier to recover and reuse. Today, SpaceX routinely reuses capsules and rockets that look much more like conventional space vehicles than the winged designs of yesteryear. Spaceplanes are undeniably alluring in appearance, but they have the drawback of carrying extra weight (wings) into space that won’t be used until the final minutes of a mission. So, do they have a future?

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

One of China’s commercial rockets returns to flight. The Kinetica-1 rocket launched Wednesday for the first time since a failure doomed its previous attempt to reach orbit in December, according to the vehicle’s developer and operator, CAS Space. The Kinetica-1 is one of several small Chinese solid-fueled launch vehicles managed by a commercial company, although with strict government oversight and support. CAS Space, a spinoff of the Chinese Academy of Sciences, said its Kinetica-1 rocket deployed multiple payloads with “excellent orbit insertion accuracy.” This was the seventh flight of a Kinetica-1 rocket since its debut in 2022.

Back in action … “Kinetica-1 is back!” CAS Space posted on X. “Mission Y7 has just successfully sent six satellites into designated orbits, making a total of 63 satellites or 6 tons of payloads since its debut. Lots of missions are planned for the coming months. 2025 is going to be awesome.” The Kinetica-1 is designed to place up to 2 metric tons of payload into low-Earth orbit. A larger liquid-fueled rocket, Kinetica-2, is scheduled to debut later this year.

French government backs a spaceplane startup. French spaceplane startup AndroMach announced May 15 that it received a contract from CNES, the French space agency, to begin testing an early prototype of its Banger v1 rocket engine, European Spaceflight reports. Founded in 2023, AndroMach is developing a pair of spaceplanes that will be used to perform suborbital and orbital missions to space. A suborbital spaceplane will utilize turbojet engines for horizontal takeoff and landing, and a pressure-fed biopropane/liquid oxygen rocket engine to reach space. Test flights of this smaller vehicle will begin in early 2027.

A risky proposition … A larger ÉTOILE “orbital shuttle” is designed to be launched by various small launch vehicles and will be capable of carrying payloads of up to 100 kilograms (220 pounds). According to the company, initial test flights of ÉTOILE are expected to begin at the beginning of the next decade. It’s unclear how much CNES is committing to AndroMach through this contract, but the company says the funding will support testing of an early demonstrator for its propane-fueled engine, with a focus on evaluating its thermodynamic performance. It’s good to see European governments supporting developments in commercial space, but the path to a small commercial orbital spaceplane is rife with risk. (submitted by EllPeaTea)

Dawn Aerospace is taking orders. Another spaceplane company, one at a more advanced stage of development, is now selling flights to the edge of space. New Zealand-based Dawn Aerospace said it is taking orders for its remotely piloted, rocket-powered suborbital spaceplane, known as Aurora, with first deliveries expected in 2027, Aviation Week & Space Technology reports. “This marks a historic milestone: the first time a space-capable vehicle designed to fly beyond the Kármán line (100 kilometers or 328,000 feet) has been offered for direct sale to customers,” Dawn Aerospace said in a statement. While it hasn’t yet reached space, Dawn’s Aurora spaceplane flew to supersonic speed for the first time last year and climbed to an altitude of 82,500 feet (25.1 kilometers), setting a record for the fastest climb from a runway to 20 kilometers.

Further along … Aurora is small in stature, measuring just 15.7 feet (4.8 meters) long. It’s designed to loft a payload of up to 22 pounds (10 kilograms) above the Kármán line for up to three minutes of microgravity, before returning to a runway landing. Eventually, Dawn wants to reduce the turnaround time between Aurora flights to less than four hours. “Aurora is set to become the fastest and highest-flying aircraft ever to take off from a conventional runway, blending the extreme performance of rocket propulsion with the reusability and operational simplicity of traditional aviation,” Dawn said. The company’s business model is akin to commercial airlines, where operators can purchase an aircraft directly from a manufacturer and manage their own operations. (submitted by EllPeaTea)

India’s workhorse rocket falls short of orbit. In a rare setback, the Indian Space Research Organisation’s (ISRO) PSLV-C61 launch vehicle malfunctioned and failed to place a surveillance satellite into its intended orbit last weekend, the Times of India reported. The Polar Satellite Launch Vehicle lifted off from a launch pad on the southeastern coast of India early Sunday, local time, with a radar reconnaissance satellite named EOS-09, or RISAT-1B. The satellite was likely intended to gather intelligence for the Indian military. “The country’s military space capabilities, already hindered by developmental challenges, have suffered another setback with the loss of a potential strategic asset,” the Times of India wrote.

What happened? … V. Narayanan, ISRO’s chairman, later said that the rocket’s performance was normal until the third stage. The PSLV’s third stage, powered by a solid rocket motor, suffered a “fall in chamber pressure” and the mission could not be accomplished, Narayanan said. Investigators are probing the root cause of the failure. Telemetry data indicated the rocket deviated from its planned flight path around six minutes after launch, when it was traveling more than 12,600 mph (5.66 kilometers per second), well short of the speed it needed to reach orbital velocity. The rocket and its payload fell into the Indian Ocean south of the launch site. This was the first PSLV launch failure in eight years, ending a streak of 21 consecutive successful flights. (submitted by EllPeaTea)
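For a sense of scale, circular orbital velocity in low-Earth orbit is around 7.6 kilometers per second, so the vehicle was roughly 2 kilometers per second short when it deviated. Here is a quick sanity check; the 500-kilometer altitude is an assumption for illustration, since this report does not state the intended orbit:

```python
# Quick check of how far short of orbital velocity the PSLV was at the
# reported failure point. The ~500 km altitude is an assumption for
# illustration; the report does not state the intended orbit.
import math

MU_EARTH = 3.986004418e14  # m^3/s^2, Earth's gravitational parameter
R_EARTH = 6.371e6          # m, mean Earth radius

altitude = 500e3           # m, assumed target altitude
v_circular = math.sqrt(MU_EARTH / (R_EARTH + altitude))
v_at_failure = 5.66e3      # m/s, the reported speed at deviation

print(f"circular orbital velocity at 500 km: ~{v_circular / 1000:.2f} km/s")   # ~7.62 km/s
print(f"shortfall at failure: ~{(v_circular - v_at_failure) / 1000:.2f} km/s") # ~1.96 km/s
```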

SES makes a booking with Impulse Space. SES, owner of the world’s largest fleet of geostationary satellites, plans to use Impulse Space’s Helios kick stage to take advantage of lower-cost, low-Earth-orbit (LEO) launch vehicles and get its satellites quickly into higher orbits, Aviation Week & Space Technology reports. SES hopes the combination will break a traditional launch conundrum for operators of medium-Earth-orbit (MEO) and geostationary-orbit (GEO) satellites. These operators often must choose between a lower-cost launch that leaves them farther from their satellite’s final orbit and a more expensive launch that expedites the satellite’s entry into service.

A matter of hours … On Thursday, SES and Impulse Space announced a multi-launch agreement to use the methane-fueled Helios kick stage. “The first mission, currently planned for 2027, will feature a dedicated deployment from a medium-lift launcher in LEO, followed by Helios transferring the 4-ton-class payload directly to GEO within eight hours of launch,” Impulse said in a statement. Typically, this transit to GEO takes several weeks to several months, depending on the satellite’s propulsion system. “Today, we’re not only partnering with Impulse to bring our satellites faster to orbit, but this will also allow us to extend their lifetime and accelerate service delivery to our customers,” said Adel Al-Saleh, CEO of SES. “We’re proud to become Helios’ first dedicated commercial mission.”
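That transit-time claim is plausible on paper: the coasting leg of a minimum-energy (Hohmann) transfer from low-Earth orbit to GEO takes a little over five hours, which fits comfortably inside the eight-hour window. Here is a back-of-the-envelope sketch, assuming a 500-kilometer parking orbit and ignoring plane-change costs:

```python
# Back-of-the-envelope Hohmann transfer from a 500 km LEO to GEO,
# ignoring plane changes. The 500 km starting orbit is an assumed
# value for illustration.
import math

MU = 3.986004418e14      # m^3/s^2, Earth's gravitational parameter
R_LEO = 6.371e6 + 500e3  # m, assumed parking orbit radius
R_GEO = 4.2164e7         # m, geostationary orbit radius

a_transfer = (R_LEO + R_GEO) / 2  # semi-major axis of the transfer ellipse

dv1 = math.sqrt(MU * (2 / R_LEO - 1 / a_transfer)) - math.sqrt(MU / R_LEO)
dv2 = math.sqrt(MU / R_GEO) - math.sqrt(MU * (2 / R_GEO - 1 / a_transfer))
coast = math.pi * math.sqrt(a_transfer**3 / MU)  # half the transfer period

print(f"burn 1: ~{dv1 / 1000:.2f} km/s, burn 2: ~{dv2 / 1000:.2f} km/s")  # ~2.37, ~1.45
print(f"total delta-v: ~{(dv1 + dv2) / 1000:.2f} km/s")                   # ~3.82 km/s
print(f"coast time: ~{coast / 3600:.1f} hours")                           # ~5.3 hours
```

The weeks-to-months figure quoted for typical satellites reflects spreading that delta-v across low-thrust electric propulsion; a high-thrust chemical kick stage like Helios can fly something close to the ballistic transfer instead.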

Unpacking China’s spaceflight patches. There’s a fascinating set of new patches Chinese officials released for a series of launches with top-secret satellites over the last two months, Ars reports. These four patches depict Buddhist gods with a sense of artistry and sharp colors that stand apart from China’s previous spaceflight emblems, and perhaps—or perhaps not—they can tell us something about the nature of the missions they represent. The missions launched so-called TJS satellites toward geostationary orbit, where they most likely will perform surveillance, signals intelligence, or missile warning missions.

Making connections … It’s not difficult to start making connections between the Four Heavenly Gods and the missions that China’s TJS satellites likely carry out in space. A protector with an umbrella? An all-seeing entity? This sounds like a possible link to spy craft or missile warning, but there’s a chance Chinese officials approved the patches to misdirect outside observers, or there’s no connection at all.

China aims for an asteroid. China is set to launch its second Tianwen deep space exploration mission in late May, targeting both a near-Earth asteroid and a main belt comet, Space News reports. The robotic Tianwen-2 spacecraft is being integrated with a Long March 3B rocket at the Xichang Satellite Launch Center in southwest China, the country’s top state-owned aerospace contractor said. Airspace closure notices indicate a four-hour launch window opening at noon EDT (16:00–20:00 UTC) on May 28. Backup launch windows are scheduled for May 29 and 30.

New frontiers … Tianwen-2’s first goal is to collect samples from a near-Earth asteroid designated 469219 Kamoʻoalewa, or 2016 HO3, and return them to Earth in late 2027 with a reentry module. The Tianwen-2 mothership will then set a course toward a comet for a secondary mission. This will be China’s first sample return mission from beyond the Moon. The asteroid selected as the target for Tianwen-2 is believed by scientists to be less than 100 meters, or 330 feet, in diameter, and may be made of material thrown off the Moon some time in its ancient past. Results from Tianwen-2 may confirm that hypothesis. (submitted by EllPeaTea)

Upgraded methalox rocket flies from Jiuquan. Another one of China’s privately funded launch companies achieved a milestone this week. Landspace launched an upgraded version of its Zhuque-2E rocket Saturday from the Jiuquan launch base in northwestern China, Space News reports. The rocket delivered six satellites to orbit for a range of remote sensing, Earth observation, and technology demonstration missions. The Zhuque-2E is an improved version of the Zhuque-2, which became the first liquid methane-fueled rocket in the world to reach orbit in 2023.

Larger envelope … This was the second flight of the Zhuque-2E rocket design, but the first to utilize a wider payload fairing to provide more volume for satellites on their ride into space. The Zhuque-2E is a stepping stone toward a much larger rocket Landspace is developing called the Zhuque-3, a stainless steel launcher with a reusable first stage booster that, at least outwardly, bears some similarities to SpaceX’s Falcon 9. (submitted by EllPeaTea)

FAA clears SpaceX for Starship Flight 9. The Federal Aviation Administration gave the green light Thursday for SpaceX to launch the next test flight of its Starship mega-rocket as soon as next week, following two consecutive failures earlier this year, Ars reports. The failures set back SpaceX’s Starship program by several months. The company aims to get the rocket’s development back on track with the upcoming launch, Starship’s ninth full-scale test flight since its debut in April 2023. Starship is central to SpaceX’s long-held ambition to send humans to Mars and is the vehicle NASA has selected to land astronauts on the Moon under the umbrella of the government’s Artemis program.

Targeting Tuesday, for now … In a statement Thursday, the FAA said SpaceX is authorized to launch the next Starship test flight, known as Flight 9, after finding the company “meets all of the rigorous safety, environmental and other licensing requirements.” SpaceX has not confirmed a target date for the next Starship launch, but warning notices for pilots and mariners to steer clear of hazard areas in the Gulf of Mexico suggest the flight might happen as soon as the evening of Tuesday, May 27. The rocket will lift off from Starbase, Texas, SpaceX’s privately owned spaceport near the US-Mexico border. The FAA’s approval comes with some stipulations, including that the launch must occur during “non-peak” times for air traffic and a larger closure of airspace downrange from Starbase.

Space Force is fed up with Vulcan delays. In recent written testimony to a US House of Representatives subcommittee that oversees the military, the senior official responsible for purchasing launches for national security missions blistered one of the country’s two primary rocket providers, Ars reports. The remarks from Major General Stephen G. Purdy, acting assistant secretary of the Air Force for Space Acquisition and Integration, concerned United Launch Alliance and its long-delayed development of the large Vulcan rocket. “The ULA Vulcan program has performed unsatisfactorily this past year,” Purdy said in written testimony during a May 14 hearing before the House Armed Services Committee’s Subcommittee on Strategic Forces. This portion of his testimony did not come up during the hearing, and it has not been reported publicly to date.

Repairing trust … “Major issues with the Vulcan have overshadowed its successful certification resulting in delays to the launch of four national security missions,” Purdy wrote. “Despite the retirement of highly successful Atlas and Delta launch vehicles, the transition to Vulcan has been slow and continues to impact the completion of Space Force mission objectives.” It has widely been known in the space community that military officials, who supported Vulcan with development contracts for the rocket and its engines that exceeded $1 billion, have been unhappy with the pace of the rocket’s development. It was originally due to launch in 2020. At the end of his written testimony, Purdy emphasized that he expected ULA to do better. As part of his job as the Service Acquisition Executive for Space (SAE), Purdy noted that he has been tasked to transform space acquisition and to become more innovative. “For these programs, the prime contractors must re-establish baselines, establish a culture of accountability, and repair trust deficit to prove to the SAE that they are adopting the acquisition principles necessary to deliver capabilities at speed, on cost and on schedule.”

SpaceX’s growth on the West Coast. SpaceX is moving ahead with expansion plans at Vandenberg Space Force Base, California, that will double its West Coast launch cadence and enable Falcon Heavy rockets to fly from California, Spaceflight Now reports. Last week, the Department of the Air Force issued its Draft Environmental Impact Statement (EIS), which considers proposed modifications from SpaceX to Space Launch Complex 6 (SLC-6) at Vandenberg. These modifications will include changes to support launches of Falcon 9 and Falcon Heavy rockets, the construction of two new landing pads for Falcon boosters adjacent to SLC-6, the demolition of unneeded structures at SLC-6, and increasing SpaceX’s permitted launch cadence from Vandenberg from 50 launches to 100.

Doubling the fun … The transformation of SLC-6 would involve a substantial overhaul. Its most recent tenant, United Launch Alliance, previously used it for Delta IV rockets from 2006 through its final launch in September 2022. The following year, the Space Force handed over the launch pad to SpaceX, which lacked a pad at Vandenberg capable of supporting Falcon Heavy missions. The estimated launch cadence between SpaceX’s existing Falcon 9 pad at Vandenberg, known as SLC-4E, and SLC-6 would be a 70-11 split for Falcon 9 rockets in 2026, with one Falcon Heavy at SLC-6, for a total of 82 launches. That would increase to a 70-25 Falcon 9 split in 2027 and 2028, with an estimated five Falcon Heavy launches in each of those years. (submitted by EllPeaTea)

Next three launches

May 23: Falcon 9 | Starlink 11-16 | Vandenberg Space Force Base, California | 20:36 UTC

May 24: Falcon 9 | Starlink 12-22 | Cape Canaveral Space Force Station, Florida | 17:19 UTC

May 27: Falcon 9 | Starlink 17-1 | Vandenberg Space Force Base, California | 16:14 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: SpaceX’s expansion at Vandenberg; India’s PSLV fails in flight Read More »

in-35-years,-notepad.exe-has-gone-from-“barely-maintained”-to-“it-writes-for-you”

In 3.5 years, Notepad.exe has gone from “barely maintained” to “it writes for you”

By late 2021, major updates for Windows’ built-in Notepad text editor had been so rare for so long that a gentle redesign and a handful of new settings counted as a major update. Updates have become much more common since then, but like the rest of Windows, recent additions have been overwhelmingly weighted in the direction of generative AI.

In November, Microsoft began testing an update that allowed users to rewrite or summarize text in Notepad using generative AI. Another preview update today takes things one step further, letting you generate text from scratch by giving the AI basic instructions (the feature is called Write, to differentiate it from the earlier Rewrite).

Like Rewrite and Summarize, Write requires users to be signed into a Microsoft Account, because each use draws on a monthly allotment of Microsoft’s AI credits. Per this support page, users without a paid Microsoft 365 subscription get 15 credits per month. Subscribers with Personal and Family subscriptions get 60 credits per month instead.

Microsoft notes that all AI features in Notepad can be disabled in the app’s settings, and obviously, they won’t be available if you use a local account instead of a Microsoft Account.

Microsoft is also releasing preview updates for Paint and Snipping Tool, two other bedrock Windows apps that hadn’t seen much by way of major updates before the Windows 11 era. Paint’s features are also mostly AI-related, including a “sticker generator” and an AI-powered smart select tool “to help you isolate and edit individual elements in your image.” A new “welcome experience” screen that appears the first time you launch the app will walk you through the (again, mostly AI-related) new features Microsoft has added to Paint in the last couple of years.

In 3.5 years, Notepad.exe has gone from “barely maintained” to “it writes for you” Read More »

SAP Sapphire 2025

I just returned from SAP Sapphire 2025 in Orlando, and while SAP painted a compelling vision of an AI-powered future, I couldn’t help but think about the gap between their shiny new announcements and where most SAP customers actually are today. Let me cut through the marketing hype and give you the analyst perspective on what really matters.

The Cloud Migration Elephant in the Room

SAP’s biggest challenge isn’t building cool AI features – it’s that the vast majority of their customer base is still running on-premise ERP systems. While SAP was busy showcasing their AI Foundation and enhanced Joule capabilities, I kept thinking about the thousands of companies still on SAP ECC 6.0 or older versions, some of which haven’t been updated in years.

Here’s the reality check: nearly every exciting AI announcement at Sapphire requires SAP’s cloud solutions. The AI Foundation? Cloud-based. Enhanced Joule with proactive capabilities? Needs cloud infrastructure. The new Business Data Cloud intelligence offerings? You guessed it – cloud only.

For the average SAP shop running on-premise systems, these announcements might as well be science fiction. They’re dealing with basic integration challenges, struggling with outdated user interfaces, and fighting to get reliable reports out of their current systems. The idea of AI agents autonomously managing their supply chain seems laughably distant.

AI: Useful Tool, Not Magic Wand

Don’t get me wrong – the AI capabilities SAP demonstrated are genuinely impressive. The ability for Joule to anticipate user needs and provide contextual insights could indeed improve productivity. But let’s pump the brakes on SAP’s claim of “up to 30% productivity gains.”

I’ve been analyzing enterprise software implementations for years, and productivity gains of that magnitude typically come from process improvements and workflow optimization, not just from adding AI on top of existing inefficiencies. If your procurement process is broken, an AI agent won’t fix it – it’ll just automate the broken process faster.

The more realistic wins will come from:

  • Reducing time spent searching for information across multiple systems
  • Automating routine data analysis and report generation
  • Providing better decision support through predictive analytics
  • Streamlining repetitive tasks in finance, HR, and supply chain operations

These are valuable improvements, but they’re evolutionary, not revolutionary.

The Partnership Strategy: Hedging Their Bets

SAP’s partnerships tell an interesting story. The Accenture ADVANCE program acknowledges that many mid-market companies need significant hand-holding to modernize their SAP environments. The Palantir integration suggests SAP recognizes they can’t be everything to everyone in the data analytics space. The Perplexity collaboration admits that their AI needs external data sources to be truly useful.

These partnerships are smart business moves, but they also highlight SAP’s dependencies. If you’re planning an SAP transformation, you’re not just buying SAP – you’re buying into an ecosystem of partners and integrations that adds complexity and cost.

What This Means for Your SAP Strategy

If you’re currently running SAP on-premise, Sapphire 2025 should reinforce one key message: the innovation train is leaving the station, and it’s heading to the cloud. But before you panic about missing out on AI capabilities, consider these pragmatic steps:

For On-Premise SAP Customers:

  • Audit your current state first. Most companies I work with aren’t maximizing their existing SAP capabilities, let alone ready for AI enhancements.
  • Plan your cloud migration timeline. SAP’s 2030 end-of-support deadline for older systems isn’t going away. Use that as your forcing function.
  • Focus on data quality. AI is only as good as the data it works with. If your master data is a mess, AI won’t help.
  • Start small with cloud integration. Consider hybrid approaches that connect your on-premise core with cloud-based analytics and AI tools.

For Companies Already in SAP Cloud:

  • Evaluate which AI features actually solve business problems you have today, not theoretical future use cases.
  • Pilot before you scale. The productivity claims sound great, but test them in your environment with your data.
  • Invest in change management. The biggest barrier to AI adoption isn’t technical – it’s getting people to change how they work.

The Bottom Line: Evolution, Not Revolution

SAP Sapphire 2025 showcased legitimate innovations that will improve how businesses operate, but let’s keep expectations realistic. The companies that will benefit most from these AI capabilities are those that have already modernized their SAP infrastructure and cleaned up their business processes.

For the majority of SAP customers still on legacy systems, the real question isn’t whether AI will transform their business – it’s whether they can execute a successful modernization program that positions them to eventually take advantage of these capabilities.

Your Next Steps

Here’s what I recommend you do this week:

  • Assess where you stand on your SAP modernization journey. Are you cloud-ready, or do you have years of technical debt to address first?
  • Map your business cases for the AI capabilities that caught your attention. Can you quantify the value they’d deliver in your specific environment?
  • Build a realistic roadmap that acknowledges both the exciting possibilities and the practical constraints of your current SAP landscape.
  • Start the conversation with your leadership about long-term SAP strategy. The decisions you make in the next two years will determine whether you’re positioned to benefit from the AI revolution or left behind with legacy systems.

The AI future SAP is promising will arrive eventually, but for most companies, the path there runs through cloud migration, data governance, and process optimization. Focus on building that foundation first, and the AI capabilities will follow when you’re actually ready to use them effectively.


SAP Sapphire 2025 Read More »