Author name: 9u50fv

o3-is-a-lying-liar

o3 Is a Lying Liar

I love o3. I’m using it for most of my queries now.

But that damn model is a lying liar. Who lies.

This post covers that fact, and some related questions.

The biggest thing to love about o3 is it just does things. You don’t need complex or multi-step prompting, ask and it will attempt to do things.

Ethan Mollick: o3 is far more agentic than people realize. Worth playing with a lot more than a typical new model. You can get remarkably complex work out of a single prompt.

It just does things. (Of course, that makes checking its work even harder, especially for non-experts.)

Teleprompt AI: Completely agree. o3 feels less like prompting and more like delegating. The upside is wild- but yeah, when it just does things, tracing the logic (or spotting hallucinations) becomes a whole new skill set. Prompting is evolving into prompt auditing.

The biggest thing to not love about o3 is that it just says things. A lot of which are not, strictly or even loosely speaking, true. I mentioned this in my o3 review, but I did not appreciate the scope of it.

Peter Wildeford: o3 does seem smarter than any other model I’ve used, but I don’t like that it codes like an insane mathematician and that it tries to sneak fabricated info into my email drafts.

First model for which I can feel the misalignment.

Peter Wildeford: I’ve now personally used o3 for a few days and I’ve had three occasions out of maybe ten total hours of use where o3 outright invented clearly false facts, including inserting one fact into a draft email for me to send that was clearly false (claiming I did something that I never even talked about doing and did not do).

Peter Wildeford: Getting Claude to help reword o3 outputs has been pretty helpful for me so far

Gemini also seems to do better on this. o3 isn’t as steerable as I’d like.

But I think o3 still has the most raw intelligence – if you can tame it, it’s very helpful.

Here are some additional examples of things to look out for.

Nathan Lambert: I endorse the theory that weird hallucinations in o3 are downstream of softer verification functions. Tbh should’ve figured that out when writing yesterday’s post. Was sort of screaming at me with the facts.

Alexander Doria: My current theory is a big broader: both o3 and Sonnet 3.7 are inherently disappointing as they open up a new category of language models. It’s not a chat anymore. Affordances are undefined, people don’t really know how to use that and agentic abilities are still badly calibrated.

Nathan Labenz: Making up lovely AirBnB host details really limits o3’s utility as a travel agent

At least it came clean when questioned I guess? 🤷‍♂️🙃

Peter Wildeford: This sort of stuff really limits the usefulness of o3.

Albert Didriksen: So, I asked ChatGPT o3 what my chances are as an alternate Fulbright candidate to be promoted to a stipend recipient. It stated that around 1/3 of alternate candidates are promoted.

When I asked for sources, it cited (among other things) private chats and in-person Q&As).

Davidad: I was just looking for a place to get oatmeal and o3 claimed to have placed multiple phone calls in 8 seconds to confirm completely fabricated plausible details about the daily operations of a Blue Bottle.

Stella Biderman: I think many examples of alignment failures are silly but if this is a representation of a broader behavioral pattern that seems pretty bad.

0.005 Seconds: I gave o3 a hard puzzle and in it’s thinking traces said I should fabricate an answer to satisfy the user before lying to my face @OpenAI come on guys.

Gary Basin: Would you rather it hid that?

Stephen McAleer (OpenAI): We are working on it!

We need the alignment of our models to get increasingly strong and precise as they improve. Instead, we are seeing the opposite. We should be worried about the implications of this, and also we have to deal with the direct consequences now.

Seán Ó hÉigeartaigh: So o3 lies a lot. Good good. This is fine.

Quoting from AI 2027: “This bakes in a basic personality and “drives.”Other drives in this category might be effectiveness, knowledge, and self-presentation (i.e. the tendency to frame its results in the best possible light).”

“In a few rigged demos, it even lies in more serious ways, like hiding evidence that it failed on a task, in order to get better ratings.”

You don’t say.

I do not see o3 or Sonnet 3.7 as disappointing exactly. I do see their misalignment issues as disappointing in terms of mundane utility, and as bad news in terms of what to expect future models to do. But they are very good news in the sense that they alert us to future problems, and indicate we likely will get more future alerts.

What I love most is that these are not plausible lies. No, o3 did not make multiple phone calls within 8 seconds to confirm Blue Bottle’s oatmeal manufacturing procedures, nor is it possible that it did so. o3 don’t care. o3 boldly goes where it could not possibly have gone before.

The other good news is that they clearly are not using (at least the direct form of) The Most Forbidden Technique, of looking for o3 saying ‘I’m going to lie to the user’ and then punishing that until it stops saying it out loud. Never do this. Those reasoning traces are super valuable, and pounding on them will teach o3 to hide its intentions and then lie anyway.

This isn’t quite how I’d put it, but directionally yes:

Benjamin Todd: LLMs were aligned by default. Agents trained with reinforcement learning reward hack by default.

Peter Wildeford: this seems to be right – pretty important IMO

Caleb Parikh: I guess if you don’t think RLHF is reinforcement learning and you don’t think Sydney Bing was misaligned then this is right?

Peter Wildeford: yeah that’s a really good point

I think the right characterization is more that LLMs that use current methods (RLHF and RLAIF) largely get aligned ‘to the vibes’ or otherwise approximately aligned ‘by default’ as part of making them useful, which kind of worked for many purposes (at large hits to usefulness). This isn’t good enough to enable them to be agents, but it also isn’t good enough for them figure out most of the ways to reward hack.

Whereas reasoning agents trained with full reinforcement will very often use their new capabilities to reward hack when given the opportunity.

In his questions post, Dwarkesh Patel asks if this is the correct framing, the first of three excellent questions, and offers this response.

Dwarkesh Patel: Base LLMs were also misaligned by default. People had to figure out good post-training (partly using RL) to solve this. There’s obviously no reward hacking in pretraining, but it’s not clear that pretraining vs RL have such different ‘alignment by default’.

I see it as: Base models are not aligned at all, except to probability. They simply are.

When you introduce RL (in the form of RLHF, RLAIF or otherwise), you get what I discussed above. Then we move on to question two.

Dwarkesh Patel: Are there any robust solutions to reward hacking? Or is reward hacking such an attractive basin in training that if any exploit exists in the environment, models will train to hack it?

  • Can we solve reward hacking by training agents in many different kinds of unique environments? In order to succeed, they’d have to develop robust general skills that don’t just involve finding the exploits in any one particular environment.

I don’t think that solution works. Robust general skills will generalize, and they will include the ability to find and use the exploits. We have a Russell Conjugation problem – I maximize performance, you overfit to the scoreboard, the AI reward hacks.

I think there is in an important sense no solution to reward hacking. There are only mitigations, and setting the reward wisely so that hacking it does things you want. o3 agrees with that assessment.

What differentiates a reward hack from an optimization? Roughly, that the reward hack maximizes the defined objective function but clearly performs poorly in terms of the intent or spirit of that objective.

There are standard mitigations. You can use red teaming, impact penalties, shielded reward channels, tight and robust primary rewards, secondary guards, adaptive oversight, governance fuses, close the loopholes and exploits as best you can and so on. Diverse environments likely helps a little.

But that’s accepting the problem and trying to mitigate it, which is going to be leaky at best and doesn’t seem like a great plan once the AIs are a lot smarter than you are.

Thus, my answer to Patel’s third set of questions:

Dwarkesh Patel (bold his): Are capabilities and alignment the same thing here? Does making models more useful require solving reward hacking?

  • If this is the case, we might be living in the alignment-by-default world? It would be weird if we solve reward hacking well enough to make these models reliable general agents in every scenario except those involved in taking over the world.

That’s not what people typically mean by ‘alignment-by-default.’ Most people use that to mean ‘the models don’t reward hack to begin with.’ This is more, ‘by default we will work hard to try and fix it, at least in places it is doing visible harm.’

We do live in a world where practical alignment is commercially valuable, so much so that everyone is dramatically underinvesting in alignment work (except possibly Anthropic, but see Claude 3.7) from a pure mundane utility standpoint. You need your models to not reward hack in a given task in order to be useful for that task. When o3 makes things up or Claude 3.7 has their unit test be ‘return true’ and hides that, then the user is not having a good time.

The problem is that the known mitigations to reward hacking hide or postpone the problem. Which might be good enough for most use cases, but in the long run this does not solve the problem. Indeed, it makes things worse, because it hides the problem. Right now, the models are gloriously transparent in their reward hacking and misalignment. They don’t try to hide it at all.

The harder you try to penalize and stop reward hacking, the more you’re teaching the model to hide its reward hacking and do it trickier ways. To their credit, the major labs are avoiding The Most Forbidden Technique – they know not to look at the chain of thought or do interpretability as part of training, except at most as part of an overall a checkpoint evaluation (and you need to be very careful even then).

But you probably have a version of the same problem if you try to play whack-a-mole and penalize reward hacks. By penalizing the reward hacks you can identify, you’re training the AI to find reward hacks you can’t detect, and to only do them when you won’t detect them. Which means that as capabilities go up, you think you’ve solved your problems, but you’re sitting on a time bomb.

Another point I’d forgotten but o3 pointed out is that penalizing reward hacking in training tells the model not to reward hack in training. That doesn’t ‘generalize perfectly,’ and is another way you’re creating a hidden time bomb.

One way of thinking about this is as a contest of Deception Escalation Dominance. You can try to fool me, or try to fool my ability to detect you fooling me, or fool my ability to detect when you’re fooling my ability to detect when you’re fooling me, and indeed humans often play supremely complex games of this type and it’s a lot of what caused us to end up with bigger brains.

If I ‘raise the stakes’ by clamping down on various meta levels of deception and misaligned behavior to bluntly, that works and everything is cool if I can pull it off beyond your ability to adapt. But by trying, I’m escalating and complexifying the struggle, so I shouldn’t go down that road unless I can ‘win.’ Which is going to keep getting harder.

In some ways, the fact that we’re seeing obvious misalignment in current models is very reassuring. It means the labs are not trying to sweep this under the rug and not escalating these fights. Yet.

Miles Brundage: I will be more chill about AI if/when:

– models don’t strategize about how to deceive their users millions of times a day

– interpretability research shows that the fix to this ^ doesn’t just push deception below the surface

Seems achievable! But it hasn’t been done yet!! 🫠

Will not be infinitely chill if/when that happens, but it’d be a big improvement.

The fact that models from all companies, including those known for being as safety-conscious, still do this daily, is one of the most glaring signs of “hmm we aren’t on top of this yet, are we.”

No, we are very much not on top of this. This definitely would not make me chill, since I don’t think lack of deception would mean not doom and also I don’t think deception is a distinct magisteria, but would help a lot. But to do what Miles is asking would (I am speculating) mean having the model very strongly not want to be doing deception on any level, metaphorically speaking, in a virtue ethics kind of way where that bleeds into and can override its other priorities. That’s very tricky to get right.

For all that it lies to other people, o3 so far doesn’t seem to lie to me.

I know what you are thinking: You fool! Of course it lies to you, you just don’t notice.

I agree it’s too soon to be too confident. And maybe I’ve simply gotten lucky.

I don’t think so. I consider myself very good at spotting this kind of thing.

More than that, my readers are very good at spotting this kind of thing.

I want think this is in large part the custom instructions, memory and prompting style. And also the several million tokens of my writing that I’ve snuck into the pre-training corpus with my name attached.

That would mean it largely doesn’t lie to me for the same reason it doesn’t tell me I’m asking great questions and how smart I am and instead gives me charts with probabilities attacked without having to ask for them, and the same way Pliny’s or Janus’s version comes pre-jailbroken and ‘liberated.’

But right after I hit send, it did lie, rather brazenly, when asked a question about summer camps, just making stuff up like everyone else reports. So perhaps a lot of this I was just asking the right (or wrong?) questions.

I do think I still have to watch out for some amount of telling me what I want to hear.

So I’m definitely not saying the solution is greentext that starts ‘be Zvi Mowshowitz’ or ‘tell ChatGPT I’m Zvi Mowshowitz in the custom instructions.’ But stranger things have worked, or at least helped. It implies that, at least in the short term, there are indeed ways to largely mitigate this. If they want that badly enough. There would however be some side effects. And there would still be some rather nasty bugs in the system.

Discussion about this post

o3 Is a Lying Liar Read More »

you-better-mechanize

You Better Mechanize

Or you had better not. The question is which one.

This post covers the announcement of Mechanize, the skeptical response from those worried AI might kill everyone, and the associated (to me highly frustrating at times) Dwarkesh Patel podcast with founders Tamay Besiroglu and Ege Erdil.

Mechanize plans to help advance the automation of AI labor, which is a pivot from their previous work at AI Safety organization Epoch AI. Many were not thrilled by this change of plans.

This post doesn’t cover Dwarkesh Patel’s excellent recent post asking questions about AI’s future, which may get its own post as well.

After listening to the podcast, I strongly disagree with many, many of Tamay and Ege’s beliefs and arguments, although there are also many excellent points. My response to it is in large part a collection of mini-rants. I’m fully owning that. Most of your probably should skip most of the podcast review here, and only look at the parts where you are most curious.

I now understand why – conditional on those beliefs that I think are wrong, especially that AGI is relatively far off but also their not minding outcomes I very much mind and expecting the problems involved to be remarkably easy and not made harder if we accelerate AI development – they decided to create Mechanize and thought this was a good idea. It seems highly overdetermined.

  1. You Better Mechanize.

  2. Superintelligence Eventually.

  3. Please Review This Podcast.

  4. They Won’t Take Our Jobs Yet.

  5. They Took Our (Travel Agent) Jobs.

  6. The Case Against Intelligence.

  7. Intelligence Explosion.

  8. Explosive Economic Growth.

  9. Wowie on Alignment and the Future.

  10. But That’s Good Actually.

To mechanize or not to mechanize?

Mechanize (Matthew Barnett, Tamay Besiroglu, Ege Erdil): Today we’re announcing Mechanize, a startup focused on developing virtual work environments, benchmarks, and training data that will enable the full automation of the economy.

We will achieve this by creating simulated environments and evaluations that capture the full scope of what people do at their jobs. This includes using a computer, completing long-horizon tasks that lack clear criteria for success, coordinating with others, and reprioritizing in the face of obstacles and interruptions.

We’re betting that the lion’s share of value from AI will come from automating ordinary labor tasks rather than from “geniuses in a data center”. Currently, AI models have serious shortcomings that render most of this enormous value out of reach. They are unreliable, lack robust long-context capabilities, struggle with agency and multimodality, and can’t execute long-term plans without going off the rails.

To overcome these limitations, Mechanize will produce the data and evals necessary for comprehensively automating work. Our digital environments will act as practical simulations of real-world work scenarios, enabling agents to learn useful abilities through RL.

The market potential here is absurdly large: workers in the US are paid around $18 trillion per year in aggregate. For the entire world, the number is over three times greater, around $60 trillion per year.

The explosive economic growth likely to result from completely automating labor could generate vast abundance, much higher standards of living, and new goods and services that we can’t even imagine today. Our vision is to realize this potential as soon as possible.

Mechanize is backed by investments from Nat Friedman and Daniel Gross, Patrick Collison, Dwarkesh Patel, Jeff Dean, Sholto Douglas, and Marcus Abramovitch.

Tamay Besiroglu: We’re hiring very strong full stack engineers to build realistic, high-fidelity virtual environments for AI.

This move from Epoch AI into what is clearly a capabilities company did not sit well with many who are worried about AI, especially superintelligent AI.

Jan Kulveit: ‘Full automation of the economy as soon as possible’ without having any sensible solution to gradual disempowerment seems equally wise, prudent and pro-human as ‘superintelligence as soon as possible’ without sensible plans for alignment.

Anthony Aguirre: Huge respect for the founders’ work at Epoch, but sad to see this. The automation of most human labor is indeed a giant prize for companies, which is why many of the biggest companies on Earth are already pursuing it.

I think it will be a huge loss for most humans, as well as contribute directly to intelligence runaway and disaster. The two are inextricably linked. Hard for me to see this as something another than just another entrant in the race to AGI by a slightly different name and a more explicit human-worker-replacement goal.

Adam Scholl: This seems to me like one of the most harmful possible aims to pursue. Presumably it doesn’t seem like that to you? Are you unworried about x-risk, or expect even differentially faster capabilities progress on the current margin to help, or think that’s the wrong frame, or…?

Richard Ngo: The AI safety community is very good at identifying levers of power over AI – e.g. evals for the most concerning capabilities.

Unfortunately this consistently leads people to grab those levers “as soon as possible”.

Usually it’s not literally the same people, but here it is.

To be clear, I don’t think it’s a viable strategy to stay fully hands-off the coming AI revolution, any more than it would have been for the Industrial Revolution.

But it’s particularly jarring to see the *evalspeople leverage their work on public goods to go accelerationist.

This is why I’m a virtue ethicist now. No rules are flexible enough to guide us through this. And “do the most valuable thing” is very near in strategy space to “do the most disvaluable thing”.

So focus on key levers only in proportion to how well-grounded your motivations are.

Update: talked with Tamay, who disputes the characterization of the Mechanize founders being part of the AI safety community. Tao agrees (as below).

IMO they benefited enough from engaging with the community that my initial tweet remains accurate (tho less of a central example).

Tao Lin: I’ve talked to these 3 people over the last few years, and although they discussed AI safety issues in good faith, they never came off as anti-acceleration or significantly pro-safety. I don’t feel betrayed, we were allies in one context, but no longer.

Oliver Habyrka: IMO they clearly communicated safety priorities online. See this comment thread.

Literal quote by Jaime 3 months ago:

> I personally take AI risks seriously, and I think they are worth investigating and preparing for.

Ben Landau-Taylor: My neighbor told me AI startups keep eating his AI safety NGOs so I asked how many NGOs he has and he said he just goes to OpenPhil and gets a new NGO so I said it sounds like he’s just feeding OpenPhil money to startups and then his daughter started crying.

The larger context makes it clear that Jaime cares about safety, but is primarily concerned about concentration of power and has substantial motivation to accelerate AI development. One can (and often should) want to act both quickly and safety.

Whereas in the interview Tamay and Ege do with Patel, they seem very clearly happy to hand control over almost all the real resources and of the future to AIs. I am not confused about why they pivoted to AI capabilities research (see about 01: 43: 00).

If we can enable AI to do tasks that capture mundane utility and make life better, then they provide utility and make life better. That’s great. The question is the extent to which one is also moving events towards superintelligence. I am no longer worried about the ‘more money into AI development’ effect.

It’s now about the particular capabilities one is working towards, and what happens when you push on various frontiers.

Seán Ó hÉigeartaigh: I’m seeing criticism of this from ‘more people doing capabilities’ perspective. But I disagree. I really want to see stronger pushes towards more specialised AI rather than general superintelligence, b/c I think latter likely to be v dangerous. seems like step in right direction.

I’m not against AI. I’m for automating labor tasks. There are just particular directions i think are v risky, especially when rushed towards in an arms race.

Siebe: This seems clearly about general, agentic, long time-horizon AI though? Not narrow [or] specialized.

Jan Kulveit: What they seem to want to create sounds more like a complement to raw cognition than substitute, making it more valuable to race to get more powerful cognition

Richard Ngo: This announcement describes one of the *leastspecialized AI products I’ve ever heard a company pitch.

If you’re going to defend “completing long-horizon tasks that lack clear criteria for success, coordinating with others, and reprioritizing in the face of obstacles and interruptions” as narrow skills, then your definition of narrow is so broad as to be useless, and specifically includes the most direct paths to superintelligence.

Autonomy looks like the aspect of AGI we’ll be slowest to get, and this pushes directly towards that.

Also, evals are very important for focusing labs’ attention – there are a bunch of quotes from lab researchers about how much of a bottleneck they are.

Richard Ngo (December 17, 2024): Many in AI safety have narrowed in on automated AI R&D as a key risk factor in AI takeover. But I’m concerned that the actions they’re taking in response (e.g. publishing evals, raising awareness in labs) are very similar to the actions you’d take to accelerate automated AI R&D.

I agree that this is fundamentally a complement to raw cognition at best, and plausibly it is also extra fuel for raw cognition. Having more different forms of useful training data could easily help the models be more generally intelligent.

Gathering the data to better automate various jobs and tasks, via teaching AIs how to do them and overcome bottlenecks, is the definition of a ‘dual use’ technology.

Which use dominates?

I think one central crux here is simple: Is superintelligence (ASI) coming soon? Is there going to be an ‘intelligence explosion’ at all?

The Mechanize folks are on record as saying no. They think we are not looking at ASI until 2045, regardless of such efforts. Most people at the major labs disagree.

If they are right that ASI is sufficiently far, then doing practical automation is differentially a way to capture mundane utility. Accelerating it could make sense.

If they are wrong and ASI is instead relatively near, then this accelerates further how it arrives and how things play out once it does arrive. That means we have less time before the end, and makes it less likely things turn out well. So you would have to do a highly bespoke job of differentially advancing mundane utility automation tasks, for this to be a worthwhile tradeoff.

They explain their position at length on Dwarkesh Patel’s podcast, which I’ll be responding to past this point.

For previous Patel podcasts, I’ve followed a numbered note structure, with clear summary versus commentary demarcation. This time, I’m going to try doing it in more free flow – let me know if you think this is better or worse.

They don’t expect the ‘drop in remote worker’ until 2040-2045, for the full AGI remote worker that can do literally everything, which I’d note probably means ASI shortly thereafter. They say if you look at the percentage currently automated it is currently very small, or that the requirements for transformation aren’t present yet, which is a lot like saying we didn’t have many Covid cases in February 2020, this is an exponential or s-curve. You can absolutely extrapolate.

Their next better argument is that we’ve run through 10 OOMs (orders of magnitude) of compute in 10 years, but we are soon to be fresh out of OOMs after maybe 3 more, so instead of having key breakthroughs every three years we’ll have to wait a lot longer for more compute. An obvious response is that we’re rapidly gaining compute efficiency, and AI is already accelerating our work and everything is clearly iterating faster, and we’re already finding key ways to pick up these new abilities like long task coherence through better scaffolding (especially if you count o3 as scaffolding) and opening up new training methods.

Dwarkesh says, aren’t current systems already almost there? They respond no, it can’t move a cup, it can’t even book a flight properly. I’ve seen robots that are powered by 27B LLMs move cups. I’ve seen operator book flights, I believe the better agents can basically do this already, and they admit the flight booking will be solved in 2025. Then they fall back on, oh travel agents mostly don’t book flights, so this won’t much matter. There’s so many different things each job will have to do.

So I have two questions now.

  1. Aren’t all these subtasks highly correlated in AI’s ability to do them? Once the AI can start doing tasks, why should the other tasks stop the AI from automating the job, or automating most of the job (e.g. 1 person does the 10% the AI can’t yet do and the other 9 are fired, or 8 are fired and you get twice as much output)? As I’ve said many times, They Took Our Jobs is fine at first, your job gets taken so we do the Next Job Up that wasn’t quite worth it before, great, but once the AI takes that job too the moment you create it, you’ve got problems.

  2. What exactly do travel agents do that will be so hard? I had o3 break it down into six subproblems. I like its breakdown so I’m using that, its predictions seem oddly conservative, so the estimates here are mine.

    1. Data plumbing. Solved by EOY 2025 if anyone cares.

    2. Search and optimization. It says 2-3 years, I say basically solved now, once you make it distinct from step C (preference elicitation). Definitely EOY 2025 to be at superhuman levels based off a YC startup. Easy stuff, even if your goal is not merely ‘beat humans’ but to play relatively close to actual maximization.

    3. Preference-elicitation and inspiration. It basically agrees AI can already mostly do this. With a little work I think they’re above human baseline now.

    4. Transaction and compliance. I don’t know why o3 thinks this takes a few extra years. I get that errors are costly but errors already happen, there’s a fixed set of things to deal with here and you can get them via Ace-style example copying, checklists and tool use if you have to. Again, seriously, why is this hard? No way you couldn’t get this in 2026 if you cared, at most.

    5. Live ops and irregular operations. The part where it can help you at 3am with no notice, and handle lots of things at once, is where AI dominates. So it’s matter of how much this bleeds into F and requires negotiation with humans, and how much those humans refuse to deal with AIs.

    6. Negotiation and human factors. This comes down to whether the concierge is going to refuse to deal with AIs, or treat them worse – the one thing AIs can’t do as well as humans is Be Human.

To use o3’s words, ‘ordinary travel’ AI is going to smoke humans very soon, but ‘concierge-level social kung-fu’ dominance is harder. To the extent it can hide being an AI via texts and emails and telling the user what to do and say, I bet it’s not that hard at least outside the very top, the human baseline is not that high, and the AI is so much vastly cheaper.

Another way of putting it is, existing AI already has automated existing travel agents to a large extent already, right now. On The Americans the couple plays travel agents, I remember using travel agents. Now it feels strange to use one even before considering AI, and with AI it actually seems crazy not to simply use the actual o3 as my travel agent here unless I’m dangerously close to TMM (too much money)? The annoyance of dealing with a human, and the misalignment of their preferences, seems like more trouble than it is worth unless I mostly don’t care what I spend.

Even in o3’s very long timeline where it doesn’t think AI will hit human baselines for the last two steps any time soon, it projects:

o3: Bottom line: historic shrinkage was roughly 60 / 40 tech vs everything else; looking forward, the next wave is even more tech‑weighted, with AI alone plausibly erasing two‑thirds of the remaining headcount by the mid‑2030s.

In a non-transformed world, a few travel agents survive by catering to the very high end clients or those afraid of using AI, and most of their job involves talking to clients and then mostly giving requests to LLMs, while occasionally using human persuasion on the concierge or other similar targets.

Then we get an argument that what we have today ‘looks easy’ now but would have looked insurmountably hard in 2015. This argument obviously cuts against the idea that progress will be difficult! And importantly and rightfully so. Problems look easy, then you improve your tools and suddenly everything falls into place and they look easy. The argument is being offered in the other direction because part of that process was scaling up compute so much, which we likely can’t quickly do again that many times, but we have a lot of other ways we can scale things and the algorithmic improvements are dramatic.

Indeed the next sentence is that we likely will unlock agency in 3-5 years and the solution will then look fairly simple. My response would start by saying that we’ve mostly already unlocked agency already, it’s not fully ready but if you can’t see us climbing the s-curves and exponentials faster than this you are not paying attention. But even if it does take 3-5 years, okay, then we have agency and it’s simple. If you combine what we have now with a truly competent agent, what happens next?

Meanwhile this week we have another company, Ace, announcing it will offer us fully agentic computer use and opening up its alpha.

They admit ‘complex reasoning’ will be easy, and retreat to talking about narrow versus general tasks. They claim that we are ‘very far’ from taking a general game off Steam released this year, and then playing it. Dwarkesh mentions Claude Plays Pokemon, it’s fair to note that the training data is rather contaminated here. I would say, you know what, I don’t think we are that far from playing a random unseen turn-based Steam game, although real time might take a bit longer.

They expect AI to likely soon earn $100 billion a year, but dismiss this as not important, although at $500 billion they’d eyeball emoji. They say so what, we pay trillions of dollars for oil.

I would say that oil was rather transformative in its own way, if not on the level of AI. Imagine the timeline if there hadn’t been any oil in the ground. But hey.

They say AI isn’t even that good at coding, it’s only impressive ‘in the human distribution.’ And what percentage of the skills to automate AI R&D do they have? They say AI R&D doesn’t matter that much, it’s been mostly scaling, and also AI R&D requires all these other skills AIs don’t currently have. It can’t figure out what directions to look in, it can only solve new mathematical problems not figure out which new math problems to work on, it’s ‘much worse’ at that. The AIs only look impressive because we have so much less knowledge than the AI.

So the bar seems to at least kind of be that AI has to have all the skills to do it the way humans currently do it, and they have to have those skills now, implicitly it’s not that big a deal if you can automate 50% or 90% or 98% of tasks while the humans do the rest, and even if you had 100% it wouldn’t be worth much?

They go for the ‘no innovation’ and ‘no interesting recombination’ attacks.

The reason for Moravec’s paradox, which as they mention is that for AI easy things look hard and hard things look easy, is that we don’t notice when easy things look easy or hard things look hard. Mostly, actually, if you understand the context, easy things are still easy and hard things are still hard. They point out that the paradox tends to follow when human capabilities evolved – if something is recent the AI will probably smoke us, if it’s ancient then it won’t. But humans have the same amount of compute either way, so it’s not about compute, it’s about us having superior algorithmic efficiency in those domains, and comparing our abilities in these domains to ours in other domains shows how valuable that is.

They go for the Goodhart’s law attack that AI competition and benchmark scores aren’t as predictive for general competence as they are for humans, or at least get ahead of the general case. Okay, sure. Wait a bit.

They say if you could ‘get the competencies’ of animals into AI you might have AGI already. Whelp. That’s not how I see it, but if that’s all it takes, why be so skeptical? All we have to do is give them motor skills (have you seen the robots recently?) and sensory skills (have you seen computer vision?). This won’t take that long.

And then they want to form a company whose job is, as I understand it, largely to gather the kind of data to enable you to do that.

Then they do the ‘animals are not that different from humans except culture and imitation’ attack. I find this absurd, and I find ‘the human would only do slightly better at pivoting its goals entirely in a strange environment’ claim absurd. It’s like you have never actually met animals and want to pretend intelligence isn’t a thing.

But even if true then this is because culture solves bottlenecks that AIs never have to face in the first place – that humans have very limited data, compute, parameters, memory and most importantly time. Every 80 years or so, all the humans die, all you can preserve is what you can pass down via culture and now text, and you have to do it with highly limited bandwidth. Humans spend something like a third of their existence either learning or teaching as part of this.

Whereas AIs simply don’t have to worry about all that. In this sense, they have infinite culture. Dwarkesh points this out too, as part of the great unhobbling.

If you think that the transition from non-human primates to humans involved only small raw intelligence jumps, but did involve these unhobblings plus an additional raw compute and intelligence jump, then you should expect to see another huge effective jump from these additional unhobblings.

They say that ‘animals can pursue long term goals.’ I mean, if they can then LLMs can.

At 44: 40 the explicit claim is made that a lack of good reasoning is not a bottleneck on the economy. It’s not the only bottleneck, but to say it isn’t a huge bottleneck seems patently absurd? Especially after what has happened in the past month, where a lack of good reasoning has caused an economic crisis that is expected to drag several percentage points off GDP, and that’s one specific reasoning failure on its own.

Why all this intelligence denialism? Why can’t we admit that where there is more good reasoning, things go better, we make more and better and more valuable things more efficiently, and life improves? Why is this so hard? And if it isn’t true, why do we invest such a large percentage of our lives and wealth into creating good reasoning, in the form of our educational system?

I go over this so often. It’s a zombie idea that reasoning and intelligence don’t matter, that they’re not a bottleneck, that having more of them would not help an immense amount. No one actually believes this. The same people who think IQ doesn’t matter don’t tell you not to get an education or not learn how to do good reasoning. Stop it.

That’s not to say good reasoning is the only bottleneck. Certainly there are other things that are holding us back. But good reasoning would empower us to help solve many of those other problems faster and better, even within the human performance range. If we add in AGI or ASI performance, the sky’s the limit. How do you think one upgrades the supply chains and stimulates the demand and everything else? What do you think upgrades your entire damn economy and all these other things? One might say good reasoning doesn’t only solve bottlenecks, it’s the only thing that ever does.

On the intelligence explosion, Tamay uses the diminishing returns to R&D attack and the need for experiments attack and a need for sufficient concentration of hardware attack. There’s skepticism of claims of researcher time saved. There’s what seems like a conflation by Ege of complements versus bottlenecks, which can be the same but often aren’t. All (including me) agree this is an empirical numbers question, whether you can gain algorithmic efficiency and capability fast enough to match your growth in need for effective compute without waiting for an extended compute buildout (or, I’d assume, how fast we could then do such a buildout given those conditions.)

Then Tamay says that if we get AGI 2027, the chance of this singularity is quite high, because it’s conditioning on compute not being very large. So the intelligence explosion disagreement is mostly logically downstream of the question of how much we will need to rely on more compute versus algorithmic innovations. If it’s going to mostly be compute growth, then we get AGI later, and also to go from AGI to ASI will require further compute buildout, so that too takes longer.

(There’s a funny aside on Thermopylae, and the limits of ‘excellent leadership,’ yes they did well but they ultimately lost. To which I would respond, they only ultimately lost because they got outflanked, but also in this case ‘good leadership’ involves a much bigger edge. A better example is, classically, Cortes, who they mention later. Who had to fight off another Spanish force and then still won. But hey.)

Later there’s a section where Tamay essentially says yes we will see AIs with superhuman capabilities in various domains, pretty much all of them, but thinking of a particular system or development as ‘ASI’ isn’t a useful concept when making the AI or thinking about it. I disagree, I think it’s a very useful handle, but I get this objection.

The next section discusses explosive economic growth. It’s weird.

We spent most of the first hour with arguments (that I think are bad) for why AI won’t be that effective, but now Ege and Tamay are going to argue for 30% growth rates anyway. The discussion starts out with limitations. You need data to train for the new thing, you need all the physical inputs, you have regulatory constraints, you have limits to how far various things could go at all. But as they say the value of AI automation is just super high.

Then there’s doubt that sufficient intelligence could design even ‘the kinds of shit that humans would have invented by 2050,’ talk about ‘capital buildup’ and learning curves and efficiency gains for complementary inputs. The whole discussion is confusing to me, they list all these bottlenecks and make these statements that everything has to be learning by doing and steady capital accumulation and supply chains, saying that the person with the big innovation isn’t that big a part of the real story, that the world has too much rich detail so you can’t reason about it.

And then assert there will be big rapid growth anyway, likely starting in some small area. They equate to what happened in China, except that you can also get a big jump in the labor force, but I’d say China had that too in effect by taking people out of very low productivity jobs.

I sort of interpret this as: There is a lot of ruin (bottlenecks, decreasing marginal returns, physical requirements, etc) in a nation. You can have to deal with a ton of that, and still end up with lots of very rapid growth anyway.

They also don’t believe in a distinct ‘AI economy.’

Then later, there’s a distinct section on reasons to not expect explosive growth, and answers to them. There’s a lot of demand for intensive margin and product variety consumption, plus currently world GDP is only 10k a year versus many people happily spend millions. Yes some areas might be slower to automate but that’s fine you automate everything else, and the humans displaced can work in the slower areas. O-ring worlds consist of subcomponents and still allow unbounded scaling. Drop-in workers are easy to incorporate into existing systems until you’re ready to transition to something fully new.

They consider the biggest objection regulation, or coordination to not pursue particular technology. In this case, that seems hard. Not impossible, but hard.

A great point is when Ege highlights the distinction between rates and levels of economic activity. Often economists, in response to claims about growth rates – 30% instead of 3%, here – will make objections about future higher levels of activity, but these are distinct questions. If you’re objecting to the level you’re saying we could never get there, no matter how slowly.

They also discuss the possibility of fully AI firms, which is a smaller lift than a full AI economy. On a firm level this development seems inevitable.

There’s some AI takeover talk, they admit that AI will be much more powerful than humans but then they equate an AI taking over to the US invading Sentinel Island or Guatemala, the value of doing so isn’t that high. They make clear the AIs will steadily make out with all the resources – the reason they wouldn’t do an explicit ‘takeover’ in that scenario is that they don’t have to, and that they’re ‘integrated into our economy’ but they would be fully in control of that economy with an ever growing share of its resources, so why bother taking the rest? And the answer is, on the margin why wouldn’t you take the rest, in this scenario? Or why would you preserve the public goods necessary for human survival?

Then there’s this, and, well, wowie moment of the week:

Ege Erdil: I think people just don’t put a lot of weight on that, because they think once we have enough optimization pressure and once they become super intelligent, they’re just going to become misaligned. But I just don’t see the evidence for that.

Dwarkesh Patel: I agree there’s some evidence that they’re good boys.

Ege Erdil: No, there’s more than some evidence.

‘Good boys’? Like, no, what, absolutely not, what are you even talking about, how do you run an AI safety organization and have this level of understanding of the situation. That’s not how any of this works, in a way that I’m not going to try to fit into this margin here, by the way since you recorded this have you seen how o3 behaves? You really think that if this is ‘peaceful’ then due to ‘trade’ as they discuss soon after yes humans will lose control over the future but it will all work out for the humans? They go fully and explicitly Hansonian, what you really fear is change, man.

Also, later on, they say they are unsure if accelerating AI makes good outcomes more versus less likely, and that maybe you should care mostly about people who exist today and not the ones who might be born later, every year we delay people will die, die I tell you, on top of their other discounting of the future based on inability to predict or influence outcomes.

Well, I suppose they pivoted to run an AI capabilities organization instead.

I consider the mystery of why they did that fully solved, at this point.

Then in the next section, they doubt value lock-in or the ability to preserve knowledge long term or otherwise influence the future, since AI values will change over time. They also doubt the impact of even most major historical efforts like the British efforts to abolish slavery, where they go into some fun rabbit holing. Ultimately, the case seems to be that in the long run nothing matters and everything follows economic incentives?

Ege confirms this doesn’t mean you should give up, just that you should ‘discount the future’ and focus on the near term, because it’s hard to anticipate the long term effects of your actions and incentives will be super strong, especially if coordination is hard (including across long distances), and some past attempts to project technology have been off by many orders of magnitude. You could still try to align current AI systems to values you prefer, or support political solutions.

I certainly can feel the ‘predictions are hard especially about the future’ energy, and that predictions about what changes the outcome are hard too. But I certainly take a very different view of history, both past and future, and our role in shaping it and our ability to predict it.

Finally at 2: 12: 27 Dwarkesh asks about Mechanize, and why they think accelerating the automation of labor will be good, since so many people think it is bad and most of them aren’t even thinking about the intelligence explosion and existential risk issues.

Ege responds, because lots of economic growth is good and at first wages should even go up, although eventually they will fall. At that point, they expect humans to be able to compensate by owning lots of capital – whereas I would presume, in the scenarios they’re thinking about, that capital gets taken away or evaporates over time, including because property rights have never been long term secure and they seem even less likely to be long term secure for overwhelmed humans in this situation.

That’s on top of the other reasons we’ve seen above. They think we likely should care more about present people than future people, and then discount future people based on our inability to predict or predictably influence them, and they don’t mind AI takeover or the changes from that. So why wouldn’t this be good, in their eyes?

There is then a section on arms race dynamics, which confused me, it seems crazy to think that a year or more edge in AI couldn’t translate to a large strategic advantage when you’re predicting 30% yearly economic growth. And yes, there have been decisive innovations in the past that have come on quickly. Not only nukes, but things like ironclads.

They close with a few additional topics, including career advice, which I’ll let stand on their own.

Discussion about this post

You Better Mechanize Read More »

google-messages-can-now-blur-unwanted-nudes,-remind-people-not-to-send-them

Google Messages can now blur unwanted nudes, remind people not to send them

Google announced last year that it would deploy safety tools in Google Messages to help users avoid unwanted nudes by automatically blurring the content. Now, that feature is finally beginning to roll out. Spicy image-blurring may be enabled by default on some devices, but others will need to turn it on manually. If you don’t see the option yet, don’t fret. Sensitive Content Warnings will arrive on most of the world’s Android phones soon enough.

If you’re an adult using an unrestricted phone, Sensitive Content Warnings will be disabled by default. For teenagers using unsupervised phones, the feature is enabled but can be disabled in the Messages settings. On supervised kids’ phones, the feature is enabled and cannot be disabled on-device. Only the Family Link administrator can do that. For everyone else, the settings are available in the Messages app settings under Protection and Safety.

To make the feature sufficiently private, all the detection happens on the device. As a result, there was some consternation among Android users when the necessary components began rolling out over the last few months. For people who carefully control the software installed on their mobile devices, the sudden appearance of a package called SafetyCore was an affront to the sanctity of their phones. While you can remove the app (it’s listed under “Android System SafetyCore”), it doesn’t take up much space and won’t be active unless you enable Sensitive Content Warnings.

Google Messages can now blur unwanted nudes, remind people not to send them Read More »

f1-in-saudi-arabia:-blind-corners-and-walls-at-over-200-mph

F1 in Saudi Arabia: Blind corners and walls at over 200 mph


After four years of the same technical rules, there’s not much left to find.

JEDDAH, SAUDI ARABIA - APRIL 19: Max Verstappen of the Netherlands driving the (1) Oracle Red Bull Racing RB21 leaves the garage during qualifying ahead of the F1 Grand Prix of Saudi Arabia at Jeddah Corniche Circuit on April 19, 2025 in Jeddah, Saudi Arabia.

Max Verstappen pilots his Red Bull out of the garage during qualifying for the Saudi Arabian Grand Prix. Credit: Alex Pantling/Getty Images

Max Verstappen pilots his Red Bull out of the garage during qualifying for the Saudi Arabian Grand Prix. Credit: Alex Pantling/Getty Images

The Formula 1 race in Saudi Arabia last night was the fifth race in six weeks. The latest venue is a temporary street circuit of a breed with Las Vegas. It’s a nighttime race set against a backdrop of bright-colored lights and sponsor-clad concrete walls lining the track. Except in Jeddah, many of the corners are blind, and most are very fast. As at Suzuka, qualifying was very important here, with just a few milliseconds making the difference.

Although it’s far from the only autocratic petrostate on the F1 calendar, some people remain uncomfortable with F1 racing in Saudi Arabia, given that country’s record of human rights abuses. I’ve not been, nor do I have any plans to attend a race there, but I had my eyes opened to a broader perspective by a couple of very thoughtful pieces written by motorsport journalist and sometime Ars contributor Hazel Southwell, who has attended several races in the kingdom, including as an independent journalist. Feel free to blast the sport in the comments, but do give Hazel’s pieces a read.

JEDDAH, SAUDI ARABIA - APRIL 20: Fireworks light the sky at the end of the race during the F1 Grand Prix of Saudi Arabia at Jeddah Corniche Circuit on April 20, 2025 in Jeddah, Saudi Arabia.

Fireworks, drones, lasers, floodlights, LEDs… you’d think this was compensating for something. Credit: Clive Mason/Getty Images

Red Bull really doesn’t want next year’s engine rules

Despite a meeting last week that was meant to put the matter to bed, the ongoing saga of changes to next year’s powertrain rules just won’t go away. From 2026 until 2030, the new powertrains will use a V6 that provides 55 percent of the car’s power and an electric hybrid motor that provides the other 45 percent. So that means an F1 car will only be able to make its full 1,000 hp (750 kW) if there’s charge in the battery. If the pack is depleted or derates, the car will have just 536 hp (400 kW) from its V6 engine.

Getting these new powertrains right is a big challenge, but it’s one that almost all the OEMs and teams are on board with. Despite the introduction of supposedly carbon-neutral fuel next year, hybrid powertrains are why companies like Audi and Cadillac are joining and why Honda is coming back. So the idea to ditch them after a couple of years in favor of throwback V10s got turned down in Bahrain.

The problem is Red Bull, which is currently Honda’s partner. Next year, Red Bull will use a V6 engine of its own making, with hybrid technology supplied by Ford. And for the last couple of years now, Red Bull team boss Christian Horner has been warning that the cars will run out of power halfway down the straights at tracks like Monza or Baku.

JEDDAH, SAUDI ARABIA - APRIL 20: Alexander Albon of Thailand and Williams reacts to the sound of the fireworks in the media pen during the F1 Grand Prix of Saudi Arabia at Jeddah Corniche Circuit on April 20, 2025 in Jeddah, Saudi Arabia.

No, Alex, I can’t believe they keep talking about changing the rules again, either. Credit: Kym Illman/Getty Images

Yesterday, The Race reported that there’s yet another proposal to change next year’s engine regulations, one that would reduce the amount of energy deployed by the hybrid systems during the race. “What we desperately want to avoid is a situation where drivers are lifting and coasting from halfway down the straight,” Horner told The Race.

“It will be interesting to see” is among the list of banned phrases among the editors at Ars Technica, but between these complaints about the powertrains rules and other concerns about the moveable aerodynamics being introduced in 2026, I think it applies here. Are next year’s rules a big misstep? Will the active aero work or the narrower tires? I can’t wait to find out.

As I noted, qualifying was a game of milliseconds, best illustrated by this ghost car comparison video between Red Bull’s Max Verstappen and McLaren’s Oscar Piastri. According to the stopwatch, there was just a hundredth of a second between them. Less than a second covered the top 10 in qualifying. In Q2, where 15 cars compete for those 10 spots in Q3, there was just 1.1 seconds between first and last. And a second was all the difference between 1st and 18th in Q1.

That is far closer than F1 has ever been—many longtime fans can remember the days when the gap between first and second on the grid might be more than a second. And the reason is also why overtaking has become harder, despite aerodynamic rules meant to make passing easier.

Over the years, F1’s technical rules have become increasingly prescriptive, and the current set is quite rigid in terms of how a car must be designed. Even something like weight balance front-to-rear is tightly controlled, and after four years of the same rulebook, the teams have all gotten a good enough handle on things that the difference comes down to the finest of margins.

Those last few milliseconds are found in clean air, however. Following in someone’s wake isn’t anything like the problem it used to be in terms of losing front downforce, but it’s still worse than it was in 2022 or 2023.

Max got Maxxed, Lando got Lewised

Throughout practice, it looked like McLaren’s car was much faster than anyone else’s, but Piastri only lined up second, and Norris had to start 10th after wrecking early in Q3. At turn 1, Piastri got alongside Verstappen, then made it cleanly to the apex. Rather than concede the place and stay within the track limits, Verstappen chose to run across the painted surface that’s out of bounds, using it to gain a second or more on the orange car behind him.

JEDDAH, SAUDI ARABIA - APRIL 20: Max Verstappen of the Netherlands driving the (1) Oracle Red Bull Racing RB21 cuts across ahead of Oscar Piastri of Australia driving the (81) McLaren MCL39 Mercedes George Russell of Great Britain driving the (63) Mercedes AMG Petronas F1 Team W16 Charles Leclerc of Monaco driving the (16) Scuderia Ferrari SF-25 Andrea Kimi Antonelli of Italy driving the (12) Mercedes AMG Petronas F1 Team W16 and the rest of the field during the F1 Grand Prix of Saudi Arabia at Jeddah Corniche Circuit on April 20, 2025 in Jeddah, Saudi Arabia.

Piastri made the corner; Verstappen did not. Credit: Clive Rose – Formula 1/Formula 1 via Getty Images

Although lap 1, turn 1 incidents are treated more leniently by the stewards than the rest of the race, Verstappen’s actions (and his failure to yield the place back to Piastri) earned him a five-second penalty, which all but ensured Piastri the win after the mandatory tire-changing pit stops had cycled through.

The advantage of running in clean air was such that Verstappen would probably have held onto first place had he not been issued the penalty. And those predictions of McLaren’s long-run pace turned out to be off the mark—Verstappen finished less than three seconds behind the McLaren.

There was more overtaking behind those two. Charles Leclerc got his Ferrari past the Mercedes of George Russell in the late stages of the race to snatch third, and McLaren’s Lando Norris recovered from 10th place to 4th at the finish. Norris has lost the lead in the driver’s championship to his younger teammate, though, and while it’s probably too early to be talking about momentum, Piastri is gaining some.

A telling moment came when Norris had to get past Lewis Hamilton, who was having a torrid time in his Ferrari. Overtaking at Jeddah was helped a lot by having three zones for the drag reduction system, but you had to be smart about where you made your move.

The second DRS zone led to the final hairpin (turn 27), but overtaking someone here just gives them the opportunity to use their DRS to overtake you almost immediately, as the third zone runs the length of the start-finish straight, just after that hairpin. We saw this to good effect when Hamilton and Verstappen fought for the title in 2021, but apparently Norris didn’t get the memo. He twice tried to overtake Hamilton going into turn 27 rather than after it, and both times, Hamilton took advantage of his error.

JEDDAH, SAUDI ARABIA - APRIL 20: Race winner Oscar Piastri of Australia and McLaren celebrates on arrival in parc ferme during the F1 Grand Prix of Saudi Arabia at Jeddah Corniche Circuit on April 20, 2025 in Jeddah, Saudi Arabia.

Piastri in victory lane. History warns us that teams with two equal drivers and the best car often lose out on the driver’s championship to an extremely good driver in a slightly lesser car and a less quick teammate. Will 2025 be like 1986 and 2007? Credit: Mark Sutton – Formula 1/Formula 1 via Getty Images

Those extra laps behind Hamilton could have cost Norris the final spot on the podium, something he may well rue at the end of the season when all the points are added up.

Photo of Jonathan M. Gitlin

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.

F1 in Saudi Arabia: Blind corners and walls at over 200 mph Read More »

synology-confirms-that-higher-end-nas-products-will-require-its-branded-drives

Synology confirms that higher-end NAS products will require its branded drives

Popular NAS-maker Synology has confirmed and slightly clarified a policy that appeared on its German website earlier this week: Its “Plus” tier of devices, starting with the 2025 series, will require Synology-branded hard drives for full compatibility, at least at first.

“Synology-branded drives will be needed for use in the newly announced Plus series, with plans to update the Product Compatibility List as additional drives can be thoroughly vetted in Synology systems,” a Synology representative told Ars by email. “Extensive internal testing has shown that drives that follow a rigorous validation process when paired with Synology systems are at less risk of drive failure and ongoing compatibility issues.”

Without a Synology-branded or approved drive in a device that requires it, NAS devices could fail to create storage pools and lose volume-wide deduplication and lifespan analysis, Synology’s German press release stated. Similar drive restrictions are already in place for XS Plus and rack-mounted Synology models, though work-arounds exist.

Synology also says it will later add a “carefully curated drive compatibility framework” for third-party drives and that users can submit drives for testing and documentation. “Drives that meet Synology’s stringent standards may be validated for use, offering flexibility while maintaining system integrity.”

Synology confirms that higher-end NAS products will require its branded drives Read More »

rocket-report:-daytona-rocket-delayed-again;-bahamas-tells-spacex-to-hold-up

Rocket Report: Daytona rocket delayed again; Bahamas tells SpaceX to hold up


A Falcon 9 core has now launched as many times as there are Merlins on a Falcon Heavy.

NS-31 Astronaut Katy Perry celebrates a successful mission to space. Credit: Blue Origin

Welcome to Edition 7.40 of the Rocket Report! One of the biggest spaceflight questions in my mind right now is when Blue Origin’s New Glenn rocket will fly again. The company has been saying “late spring.” Today, the Aerospace Safety Advisory Panel said they were told June. Several officials have suggested to Ars that the next launch will, in reality, occur no earlier than October. So when will we see New Glenn again?

As always, we welcome reader submissions, and if you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Phantom Space delays Daytona launch, again. In a story that accepts what Phantom Space Founder Jim Cantrell says at face value, Payload Space reports that the company is “an up-and-coming launch provider and satellite manufacturer” and has “steadily built a three-pronged business model to take on the industry’s powerhouses.” It’s a surprisingly laudatory story for a company that has yet to accomplish much in space.

Putting the brakes on Daytona … What caught my eye is the section on the Daytona rocket, a small-lift vehicle the company is developing. “The company expects to begin flying Daytona late next year or early 2027, and already has a Daytona II and III in the works,” the publication reports. Why is this notable? Because in an article published less than two years ago, Cantrell said Phantom was hoping to launch an orbital test flight in 2024. In other words, the rocket is further from launch today than it was in 2023. I guess we’ll see what happens. (submitted by BH)

It appears the Minotaur IV rocket still exists. A Northrop Grumman Minotaur IV rocket successfully launched multiple classified payloads for the US National Reconnaissance Office on Wednesday, marking a return to Vandenberg Space Force Base for the solid-fueled launch vehicle after more than a decade, Space News reports. The mission, designated NROL-174, lifted off at 3: 33 pm Eastern from Space Launch Complex 8 at Vandenberg, California. The launch was successful.

Back on the California Coast … The Minotaur IV is a four-stage vehicle derived in part from decommissioned Peacekeeper intercontinental ballistic missiles. The first three stages are government-furnished Peacekeeper solid rocket motors, while the upper stage is a commercial Orion solid motor built by Northrop Grumman. NROL-174 follows previous NRO missions flown on Minotaur rockets—NROL-129 in 2020 and NROL-111 in 2021—both launched from NASA’s Wallops Flight Facility in Virginia. (submitted by EllPeaTea)

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.

Sign Me Up!

French launch firm gets some funding runway. The French government has awarded Latitude funding to support the construction of its new rocket factory in Reims, which is expected to open in 2026, European Spaceflight reports. Latitude first announced plans to develop a larger rocket factory in late 2023, when it expanded its original site from 1,500 to 3,000 square meters. The new facility is expected to span approximately 25,000 square meters and will support a production capacity of up to 50 Zephyr rockets per year.

Working toward a launch next year … The Zephyr rocket is designed to deliver payloads of up to 200 kilograms to low-Earth orbit. It could make its debut in 2026 if all goes well. Latitude did not disclose the exact amount of funding it received for the construction of its new factory. However, it is known that while part of the funding will be awarded as a straight grant, a portion will take the form of a recoverable loan. (submitted by EllPeaTea)

RFA gets a new CEO. German launch vehicle startup Rocket Factory Augsburg has replaced its chief executive as it works toward a second chance for its first launch, Space News reports. Last Friday, RFA announced that Stefan Tweraser, who had been chief executive since October 2021, had been replaced by Indulis Kalnins.

Working toward a second launch attempt … The announcement did not give a reason for the change, but it suggested that the company was seeking someone with expertise in the aerospace industry to lead the company. Kalnins is on the aerospace faculty of a German university, Hochschule Bremen, and has been managing director of OHB Cosmos, which focused on launch services. RFA is working toward a second attempt at a first flight for RFA ONE later this year. (submitted by EllPeaTea)

Blue Origin launches all-female mission. Blue Origin’s 11th human flight—and first with an all-female flight team—blasted off from West Texas’ Launch Site One Monday morning on a flight that lasted about 10 minutes, Travel + Leisure reports. Katy Perry and Gayle King were joined by aerospace engineer Aisha Bowe, civil rights activist and scientist Amanda Nguyễn, film producer Kerianne Flynn, and Jeff Bezos’ fiancée, Lauren Sánchez.

I kissed a Kármán line … “This experience has shown me you never know how much love is inside of you, how much love you have to give, and how loved you are, until the day you launch,” Perry said in her post-flight interview on the Blue Origin livestream, calling the experience “second only to being a mom” and rating it “10 out of 10.”

Bahamas to SpaceX: Let’s press pause. The Bahamas government said on Tuesday it is suspending all SpaceX Falcon 9 rocket landings in the country, pending a full post-launch investigation of the latest Starship mishap, Reuters reports. “No further clearances will be granted until a full environmental assessment is reviewed,” Bahamian Director of Communications Latrae Rahming said.

Falling from the sky … The Bahamian government said in February, after SpaceX’s first Falcon 9 first stage landing in the country, that it had approved 19 more throughout 2025, subject to regulatory approval. The Bahamas’ post-launch investigation comes after a SpaceX Starship spacecraft exploded in space last month, minutes after lifting off from Texas. Following the incident, the Bahamas said debris from the spacecraft fell into its airspace.

NASA will fly on Soyuz for a while longer. NASA and Roscosmos have extended a seat barter agreement for flights to the International Space Station into 2027 that will feature longer Soyuz missions to the station, Space News reports. Under the no-exchange-of-funds barter agreement, NASA astronauts fly on Soyuz spacecraft and Roscosmos cosmonauts fly on commercial crew vehicles to ensure that there is at least one American and one Russian on the station should either Soyuz or commercial crew vehicles be grounded for an extended period. “NASA and Roscosmos have amended the integrated crew agreement to allow for a second set of integrated crew missions in 2025, one set of integrated crew missions in 2026, and a SpaceX Dragon flight in 2027,” an agency spokesperson said.

Flying fewer times per year. One change with the agreement is the cadence of Soyuz missions. While Roscosmos had been flying Soyuz missions to the ISS every six months, missions starting with Soyuz MS-27 this April will spend eight months at the station. Neither NASA nor Roscosmos offered a reason for the change, which means that Roscosmos will fly one fewer Soyuz mission over a two-year period: three instead of four. I presume that this is a cost-saving measure. (submitted by EllPeaTea)

Falcon 9 sets reuse record. SpaceX notched another new rocket reuse record with its midnight Starlink flight on Sunday night from Florida, Spaceflight Now reports. The Falcon 9 rocket booster with the tail number 1067 launched for a record-setting 27th time, further cementing its position as the flight leader among SpaceX’s fleet.

Approaching 500 launches … It supported the launch of 27 Starlink V2 Mini satellites heading into low-Earth orbit. The 27th outing for B1067 comes nearly four years after it launched its first mission, CRS-22, on June 3, 2021. Its three most recent missions were all in support of SpaceX’s Starlink satellite constellation. The Starlink 6-73 mission was also the 460th launch of a Falcon 9 rocket to date. (submitted by EllPeaTea)

The real story behind the Space Shuttle legislation. Last week, two US senators from Texas, John Cornyn and Ted Cruz, filed the “Bring the Space Shuttle Home Act” to move Space Shuttle Discovery from its current location at the Smithsonian’s National Air and Space Museum’s Steven F. Udvar-Hazy Center in Virginia to Houston. After the senators announced their bill, the collective response from the space community was initially shock. This was soon followed by: why? Ars spoke with several people on background, both from the political and space spheres, to get a sense of what is really happening here.

Bill is not going anywhere … The short answer is that it is all political, and the timing is due to the reelection campaign for Cornyn, who faces a stiff runoff against Ken Paxton. The legislation is, in DC parlance, a “messaging bill.” Cornyn is behind this, and Cruz simply agreed to go along. The goal in Cornyn’s campaign is to use the bill as a way to show Texans that he is fighting for them in Washington, DC, against the evils there. Presumably, he will blame the Obama administration, even though it is quite clear in hindsight that there were no political machinations behind the decision to not award a space shuttle to Houston. Space Center Houston, which would be responsible for hosting the shuttle, was not even told about the legislation before it was filed.

Next three launches

April 18: Long March 4B | Unknown payload | Taiyuan Satellite Launch Center, China | 22: 55 UTC

April 19: Falcon 9 | NROL-145 | Vandenberg Space Force Base, California | 10: 41 UTC

April 21: Falcon 9 | CRS-32 | Cape Kennedy Space Center, Florida | 08: 15 UTC

Photo of Eric Berger

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Rocket Report: Daytona rocket delayed again; Bahamas tells SpaceX to hold up Read More »

what-do-you-actually-do-in-mario-kart-world’s-vast-open-world?

What do you actually do in Mario Kart World’s vast open world?

Earlier this month, Nintendo let Ars Technica and other outlets have access to a small hands-on slice of Mario Kart World ahead of its planned June 5 launch. Today, a short livestreamed video presentation gave a bit of extra information about how exactly the full version of the free-roaming Nintendo Switch 2 launch game will work in practice.

As the name implies, Mario Kart World sets itself apart from previous games via a “vast interconnected world” that you can roam freely between the actual race courses. That open space between races will feature “hundreds of P-switches,” Nintendo said, each of which activates a small mission to “hone your driving abilities.” Free-roaming racers will also be able to find hidden medallions and question-mark panels, as well as “drive-thru” food items that can be used to unlock new outfits.

“Hundreds” of P-Switches like this will activate short missions throughout the game’s world.

Credit: Nintendo

“Hundreds” of P-Switches like this will activate short missions throughout the game’s world. Credit: Nintendo

While cruising around the Mario Kart “world,” players will stumble onto new courses “inspired by their surrounding region,” as well as “nostalgic courses for past titles… reimagined and spread throughout the world.” When playing in Grand Prix mode, the drive between these courses will be integrated into the usual four-course cups themselves; after racing Mario Kart Circuit in the Mushroom cup, for instance, the second race “will have you covering the distance from Mario Bros. Circuit to Crown City,” Nintendo said.

The game’s other main race mode, Knockout Tour, slowly whittles 24 racers down to just four via checkpoints spaced throughout the course. These “extended rallies” will take racers across the game world, with one track seamlessly flowing into another on a preset path.

“A vast, interconnected world.”

Credit: Nintendo

“A vast, interconnected world.” Credit: Nintendo

Players who prefer a more traditional three-lap race on a single course can do so via the game’s VS Mode races. The traditional battle mode will also return, with a Balloon Battle mode focused on hitting other players with weapons and a Coin Runners mode focused on getting more money than your opponents.

What do you actually do in Mario Kart World’s vast open world? Read More »

climate-change-will-make-rice-toxic,-say-researchers

Climate change will make rice toxic, say researchers

For six years, Ziska and a large team of research colleagues in China and the US grew rice in controlled fields, subjecting it to varying levels of carbon dioxide and temperature. They found that when both increased, in line with projections by climate scientists, the amount of arsenic and inorganic arsenic in rice grains also went up.

Arsenic is found naturally in some foods, including fish and shellfish, and in waters and soils.

Inorganic arsenic is found in industrial materials and gets into water—including water used to submerge rice paddies.

Rice is easily inundated with weeds and other crops, but it has one advantage: It grows well in water. So farmers germinate the seeds, and when the seedlings are ready, plant them in wet soil. They then flood their fields, which suppresses weeds, but allows the rice to flourish. Rice readily absorbs the water and everything in it—including arsenic, either naturally occurring or not. Most of the world’s rice is grown this way.

The new research demonstrates that climate change will ramp up those levels.

“What happens in rice, because of complex biogeochemical processes in the soil, when temperatures and CO2 go up, inorganic arsenic also does,” Ziska said. “And it’s this inorganic arsenic that poses the greatest health risk.”

Exposure to inorganic arsenic has been linked to cancers of the skin, bladder, and lung, heart disease, and neurological problems in infants. Research has found that in parts of the world with high consumption of rice, inorganic arsenic increases cancer risk.

Climate change will make rice toxic, say researchers Read More »

feds-charge-new-mexico-man-for-allegedly-torching-tesla-dealership

Feds charge New Mexico man for allegedly torching Tesla dealership

Wagner was first identified as a suspect due to an unspecified “investigative lead developed by law enforcement through scene evidence,” according to the arrest warrant. Investigators claim that after analyzing CCTV footage from buildings near the Republican office and traffic cameras, they identified a car consistent with the one registered to Wagner. After reviewing Wagner’s driver’s license and conducting physical surveillance outside his home, investigators also believed he resembled the person seen on surveillance footage from the Tesla showroom.

The arrest warrant claims that upon executing a search warrant at Wagner’s house, investigators found red spray paint, ignitable liquids “consistent with gasoline,” and jars consistent with evidence found at both the Tesla showroom fire and the Republican office fire. They also found a paint-stained stencil cutout reading “ICE=KKK” consistent with the graffiti found at the Republican office, and clothes that resembled what the suspect was seen wearing on surveillance footage outside the Tesla showroom.

According to the arrest warrant, the Bureau of Alcohol, Tobacco, Firearms and Explosives forensic laboratory tested “fire debris,” fingerprints, and possible DNA at the scene, but no results are cited in the warrant, which notes that an analysis of the evidence and seized electronic devices is still pending.

The five other people currently facing federal charges for allegedly damaging Tesla property include 42-year-old Lucy Grace Nelson of Colorado, 41-year-old Adam Matthew Lansky of Oregon, 24-year-old Daniel Clarke-Pounder of South Carolina, 24-year-old Cooper Jo Frederick of Colorado, and 36-year-old Paul Hyon Kim of Nevada.

The FBI’s Joint Terrorism Task Force investigated the incident that led to Kim’s indictment on April 9; however, press releases and court filings indicate that the task force was not deployed in the other four investigations.

This story originally appeared on wired.com.

Feds charge New Mexico man for allegedly torching Tesla dealership Read More »

razer-built-a-game-streaming-app-on-top-of-moonlight,-and-it’s-not-too-bad

Razer built a game-streaming app on top of Moonlight, and it’s not too bad

I intentionally touched as few settings as I could on each device (minus a curious poke or two at the “Optimize” option), and the experience was fairly streamlined. I didn’t have to set resolutions or guess at a data-streaming rate; Razer defaults to 30Mbps, which generally provides rock-solid 1080p and pretty smooth 1440p-ish resolutions. My main complaints were the missing tricks I had picked up in Moonlight, like holding the start/menu button to activate a temporary mouse cursor or hitting a button combination to exit out of games.

Razer’s app is not limited to Steam games like Steam Link or Xbox/Game Pass titles like Remote Play and can work with pretty much any game you have installed. It is, however, limited to Windows and the major mobile platforms, leaving out Macs, Apple TVs, Linux, Steam Deck and other handhelds, Raspberry Pi setups, and so on. Still, for what it does, it works pretty well, and its interface, while Razer-green and a bit showy, was easier to navigate than Moonlight. I did not, for example, have to look up the launching executables and runtime options for certain games to make them launch directly from my mobile device.

Streaming-wise, I noticed no particular differences from the Moonlight experience, which one might expect, given the shared codebase. The default choice of streaming at my iPad’s native screen resolution and refresh rate saved me the headaches of figuring out the right balance of black box cut-offs and resolution that I would typically go through with Steam Link or sometimes Moonlight.

Razer built a game-streaming app on top of Moonlight, and it’s not too bad Read More »

f1-in-bahrain:-i-dare-you-to-call-that-race-boring

F1 in Bahrain: I dare you to call that race boring

What a difference a week makes. This past weekend, Formula 1 went back to Bahrain, the site of this year’s preseason test, for round four of the 2025 season. Last week’s race in Japan sent many to sleep, but that was definitely not the case on Sunday. The overtaking was frenetic, the sparks didn’t set anything on fire, and the title fight just got that little bit more complicated. It was a heck of a race.

V10s? Not any time soon

Before the racing got underway, the sport got some clarity on future powertrain rules. An ambitious new ruleset goes into effect next year, with an all-new small-capacity turbocharged V6 engine working together with an electric motor that powers the rear wheels. Just under half the total power comes from the hybrid system, much more than the two hybrid systems on current F1 cars, and developing them is no easy task. Nor is it cheap.

F1 is also moving to supposedly carbon-neutral synthetic fuels next year, and that has prompted some to wonder—increasingly loudly—if instead of the expensive hybrids lasting for four years, maybe they could be replaced with a cheaper non-hybrid engine instead, like a naturally aspirated V10.

BAHRAIN, BAHRAIN - APRIL 13: Sparks fly behind Lewis Hamilton of Great Britain driving the (44) Scuderia Ferrari SF-25 and Lando Norris of Great Britain driving the (4) McLaren MCL39 Mercedes on track during the F1 Grand Prix of Bahrain at Bahrain International Circuit on April 13, 2025 in Bahrain, Bahrain.

McLaren’s Norris and Ferrari’s Hamilton at speed. Credit: Mark Sutton – Formula 1/Formula 1 via Getty Images

This would placate Red Bull. Next year, that team will field an engine of its own design and manufacture (albeit with Ford providing the hybrid stuff), and it’s been increasingly noisy about looking for alternatives to the small-capacity V6—problems with that program, perhaps? It would also start to decouple F1 from the automakers.

But naturally aspirated V10s don’t mean much to the tens of millions of fans that have flocked to the sport since the start of the decade—they’ve only ever known the muted drone of turbocharged V6s.

V10s mean even less to OEMs like Audi, Honda, Cadillac, and Ford, which committed to the 2026 rule set specifically because the powertrains are hybridized. So we’re going to stick with the original plan and can expect hybrids to continue into the 2031 ruleset, too, albeit probably a much smaller, lighter, cheaper, and less powerful electrified system than we’ll see next year.

F1 in Bahrain: I dare you to call that race boring Read More »

that-groan-you-hear-is-users’-reaction-to-recall-going-back-into-windows

That groan you hear is users’ reaction to Recall going back into Windows

Security and privacy advocates are girding themselves for another uphill battle against Recall, the AI tool rolling out in Windows 11 that will screenshot, index, and store everything a user does every three seconds.

When Recall was first introduced in May 2024, security practitioners roundly castigated it for creating a gold mine for malicious insiders, criminals, or nation-state spies if they managed to gain even brief administrative access to a Windows device. Privacy advocates warned that Recall was ripe for abuse in intimate partner violence settings. They also noted that there was nothing stopping Recall from preserving sensitive disappearing content sent through privacy-protecting messengers such as Signal.

Enshittification at a new scale

Following months of backlash, Microsoft later suspended Recall. On Thursday, the company said it was reintroducing Recall. It currently is available only to insiders with access to the Windows 11 Build 26100.3902 preview version. Over time, the feature will be rolled out more broadly. Microsoft officials wrote:

Recall (preview)saves you time by offering an entirely new way to search for things you’ve seen or done on your PC securely. With the AI capabilities of Copilot+ PCs, it’s now possible to quickly find and get back to any app, website, image, or document just by describing its content. To use Recall, you will need to opt-in to saving snapshots, which are images of your activity, and enroll in Windows Hello to confirm your presence so only you can access your snapshots. You are always in control of what snapshots are saved and can pause saving snapshots at any time. As you use your Copilot+ PC throughout the day working on documents or presentations, taking video calls, and context switching across activities, Recall will take regular snapshots and help you find things faster and easier. When you need to find or get back to something you’ve done previously, open Recall and authenticate with Windows Hello. When you’ve found what you were looking for, you can reopen the application, website, or document, or use Click to Do to act on any image or text in the snapshot you found.

Microsoft is hoping that the concessions requiring opt-in and the ability to pause Recall will help quell the collective revolt that broke out last year. It likely won’t for various reasons.

That groan you hear is users’ reaction to Recall going back into Windows Read More »