Author name: Mike M.


On Google’s Safety Plan

I want to start off by reiterating kudos to Google for actually laying out its safety plan. No matter how good the plan, it’s much better to write down and share the plan than it is to not share the plan, which in turn is much better than not having a formal plan.

They offer us a blog post, a full monster 145 page paper (so big you have to use Gemini!) and start off the paper with a 10 page summary.

The full paper is full of detail about what they think and plan, why they think and plan it, answers to objections and robust discussions. I can offer critiques, but I couldn’t have produced this document in any sane amount of time, and I will be skipping over a lot of interesting things in the full paper because there’s too much to fully discuss.

This is The Way.

Google makes their core assumptions explicit. This is so very much appreciated.

They believe, and are assuming, from section 3 and from the summary:

  1. The current paradigm of AI development will hold for a while.

  2. No human ceiling on AI capabilities.

  3. Timelines are unclear. Powerful AI systems might be developed by 2030.

  4. Powerful AI systems might accelerate AI R&D in a feedback loop (RSI).

  5. There will not be large discontinuous jumps in AI capabilities.

  6. Risks primarily will come from centralized AI development.

Their defense of the first claim (found in 3.1) is strong and convincing. I am not as confident as they seem to be, I think they should be more uncertain, but I accept the assumption within this context.

I strongly agree with the next three assumptions. If you do not, I encourage you to read their justifications in section 3. Their discussion of economic impacts suffers from ‘we are writing a paper and thus have to take the previously offered papers seriously so we simply claim there is disagreement rather than discuss the ground physical truth,’ so much of what they reference is absurd, but it is what it is.

That fifth assumption is scary as all hell.

While we aim to handle significant acceleration, there are limits. If, for example, we jump in a single step from current chatbots to an AI system that obsoletes all human economic activity, it seems very likely that there will be some major problem that we failed to foresee. Luckily, AI progress does not appear to be this discontinuous.

So, we rely on approximate continuity: roughly, that there will not be large discontinuous jumps in general AI capabilities, given relatively gradual increases in the inputs to those capabilities (such as compute and R&D effort).

Implication: We can iteratively and empirically test our approach, to detect any flawed assumptions that only arise as capabilities improve.

Implication: Our approach does not need to be robust to arbitrarily capable AI systems. Instead, we can plan ahead for capabilities that could plausibly arise during the next several scales, while deferring even more powerful capabilities to the future.

I do not consider this to be a safe assumption. I see the arguments from reference classes and base rates and competitiveness, I am definitely factoring that all in, but I am not confident in it at all. There have been some relatively discontinuous jumps already (e.g. GPT-3, 3.5 and 4), at least from the outside perspective. I expect more of them to exist by default, especially once we get into the RSI-style feedback loops, and I expect them to have far bigger societal impacts than previous jumps. And I expect some progressions that are technically ‘continuous’ to not feel continuous in practice.

Google says that threshold effects are the strongest counterargument. I definitely think this is likely to be a huge deal. Even if capabilities are continuous, the ability to pull off a major shift can make the impacts look very discontinuous.

We are all being reasonable here, so this is us talking price. What would be ‘too’ large, frequent or general an advance that breaks this assumption? How hard are we relying on it? That’s not clear.

But yeah, it does seem reasonable to say that if AI were to suddenly jump overnight to ‘obsoletes all human economic activity,’ there would be a wide variety of problems you didn’t see coming. Fair enough. That doesn’t even have to mean that we lose.

I do think it’s fine to mostly plan for the ‘effectively mostly continuous for a while’ case, but we also need to be planning for the scenario where that is suddenly false. I’m not willing to give up on those worlds. If a discontinuous huge jump were to suddenly come out of a DeepMind experiment, you want to have a plan for what to do about that before it happens, not afterwards.

That doesn’t need to be as robust and fleshed out as our other plans, indeed it can’t be, but there can’t be no plan at all. The current plan is to ‘push the big red alarm button.’ That at minimum still requires a good plan and operationalization for who gets to push that button, who needs to, and when, along with what happens after they press it. Time will be of the essence, and there will be big pressures not to do it. So you need strong commitments in advance, including inside companies like Google.

The other reason this is scary is that it implies that continuous capability improvements will lead to essentially continuous behaviors. I do not think this is the case either. There are likely to be abrupt shifts in observed outputs and behaviors once various thresholds are passed and new strategies start to become viable. The level of risk increasing continuously, or even gradually, is entirely consistent with the risk then suddenly materializing all at once. Many such cases. The paper is not denying or entirely ignoring this, but it seems under-respected throughout in the ‘talking price’ sense.

The additional sixth assumption comes from section 2.1:

However, our approach does rely on assumptions about AI capability development: for example, that dangerous capabilities will arise in frontier AI models produced by centralized development. This assumption may fail to hold in the future. For example, perhaps dangerous capabilities start to arise from the interaction between multiple components (Drexler, 2019), where any individual component is easy to reproduce but the overall system would be hard to reproduce.

In this case, it would no longer be possible to block access to dangerous capabilities by adding mitigations to a single component, since a bad actor could simply recreate that component from scratch without the mitigations.

This is an assumption about development, not deployment, although many details of Google’s approaches do also rely on centralized deployment for the same reason. If the bad actor can duplicate the centrally developed system, you’re cooked.

Thus, there is a kind of hidden assumption throughout all similar discussions of this, that should be highlighted, although fixing this is clearly outside scope of this paper: That we are headed down a path where mitigations are possible at reasonable cost, and are not at risk of path dependence towards a world where that is not true.

The best reason to worry about future risks now even with an evidence dilemma is they inform us about what types of worlds allow us to win, versus which ones inevitably lose. I worry that decisions that are net positive for now can set us down paths where we lose our ability to steer even before AI takes the wheel for itself.

The weakest of their justifications in section 3 was in 3.6, explaining AGI’s benefits. I don’t disagree with anything in particular, and certainly what they list should be sufficient, but I always worry when such write-ups do not ‘feel the AGI.’

They start off with optimism, touting AGI’s potential to ‘transform the world.’

Then they quickly pivot to discussing their four risk areas: Misuse, Misalignment, Mistakes and Structural Risks.

Google does not claim this list is exhaustive or exclusive. How close is this to a complete taxonomy? For sufficiently broad definitions of everything, it’s close.

This is kind of a taxonomy of fault. As in, if harm resulted, whose fault is it?

  1. Misuse: You have not been a good user.

  2. Misalignment: I have not been a good Gemini, on purpose.

  3. Mistakes: I have not been a good Gemini, by accident.

  4. Structural Risks: Nothing is ever anyone’s fault per se.

The danger as always with such classifications is that ‘fault’ is not an ideal way of charting optimal paths through causal space. Neither is classifying some things as harm versus others not harm. They are approximations that have real issues in the out of distribution places we are headed.

In particular, as I parse this taxonomy the Whispering Earring problem seems not covered. One can consider this the one-human-agent version of Gradual Disempowerment. This is where the option to defer to the decisions of the AI, or to use the AI’s capabilities, over time causes a loss of agency and control by the individual who uses it, leaving them worse off, but without anything that could be called a particular misuse, misalignment or AI mistake. They file this under structural risks, which is clearly right for a multi-human-agent Gradual Disempowerment scenario, but that feels to me like it importantly misses the single-agent case, even when it is happening at scale. It’s definitely weird.

Also, ‘the human makes understandable mistakes because the real world is complex and the AI does what the human asked but the human was wrong’ is totally a thing. Indeed, we may have had a rather prominent example of this on April 2, 2025.

Perhaps one can solve this by expanding mistakes into AI mistakes and also human mistakes – the user isn’t intending to cause harm or directly requesting it, the AI is correctly doing what the human intended, but the human was making systematic mistakes, because humans have limited compute and various biases and so on.

The good news is that if we solve the four classes of risk listed here, we can probably survive the rest long enough to fix what slipped through the cracks. At minimum, it’s a great start, and doesn’t miss any of the big questions if all four are considered fully. The bigger risk with such a taxonomy is to define the four items too narrowly. Always watch out for that.

This is The Way:

Extended Abstract: AI, and particularly AGI, will be a transformative technology. As with any transformative technology, AGI will provide significant benefits while posing significant risks.

This includes risks of severe harm: incidents consequential enough to significantly harm humanity. This paper outlines our approach to building AGI that avoids severe harm.

Since AGI safety research is advancing quickly, our approach should be taken as exploratory. We expect it to evolve in tandem with the AI ecosystem to incorporate new ideas and evidence.

Severe harms necessarily require a precautionary approach, subjecting them to an evidence dilemma: research and preparation of risk mitigations occurs before we have clear evidence of the capabilities underlying those risks.

We believe in being proactive, and taking a cautious approach by anticipating potential risks, even before they start to appear likely. This allows us to develop a more exhaustive and informed strategy in the long run.

Nonetheless, we still prioritize those risks for which we can foresee how the requisite capabilities may arise, while deferring even more speculative risks to future research.

Specifically, we focus on capabilities in foundation models that are enabled through learning via gradient descent, and consider Exceptional AGI (Level 4) from Morris et al. (2023), defined as an AI system that matches or exceeds that of the 99th percentile of skilled adults on a wide range of non-physical tasks.

For many risks, while it is appropriate to include some precautionary safety mitigations, the majority of safety progress should be achieved through an “observe and mitigate” strategy. Specifically, the technology should be deployed in multiple stages with increasing scope, and each stage should be accompanied by systems designed to observe risks arising in practice, for example through monitoring, incident reporting, and bug bounties.

After risks are observed, more stringent safety measures can be put in place that more precisely target the risks that happen in practice.

Unfortunately, as technologies become ever more powerful, they start to enable severe harms. An incident has caused severe harm if it is consequential enough to significantly harm humanity. Obviously, “observe and mitigate” is insufficient as an approach to such harms, and we must instead rely on a precautionary approach.

Yes. It is obvious. So why do so many people claim to disagree? Great question.

They explicitly note that their definition of ‘severe harm’ has a vague threshold. If this were a law, that wouldn’t work. In this context, I think that’s fine.

In 6.5, they discuss the safety-performance tradeoff. You need to be on the Production Possibilities Frontier (PPF).

Building advanced AI systems will involve many individual design decisions, many of which are relevant to building safer AI systems.

This section discusses design choices that, while not enough to ensure safety on their own, can significantly aid our primary approaches to risk from misalignment. Implementing safer design patterns can incur performance costs. For example, it may be possible to design future AI agents to explain their reasoning in human-legible form, but only at the cost of slowing down such agents.

To build AI systems that are both capable and safe, we expect it will be important to navigate these safety-performance tradeoffs. For each design choice with potential safety-performance tradeoffs, we should aim to expand the Pareto frontier.

This will typically look like improving the performance of a safe design to reduce its overall performance cost.

As always: Security is capability, even if you ignore the tail risks. If your model is not safe enough to use, then it is not capable in ways that help you. There are tradeoffs to be made, but no one except possibly Anthropic is close to where the tradeoffs start.

In highlighting the evidence dilemma, Google explicitly draws the distinction in 2.1 between risks that are in-scope for investigation now, versus those that we should defer until we have better evidence.

Again, the transparency is great. If you’re going to defer, be clear about that. There’s a lot of very good straight talk in 2.1.

They are punting on goal drift (which they say is not happening soon, and I suspect they are already wrong about that), superintelligence and RSI.

They are importantly not punting on particular superhuman abilities and concepts. That is within scope. Their plan is to use amplified oversight.

As I note throughout, I have wide skepticism on the implementation details of amplified oversight, and on how far it can scale. The disagreement is over how far it scales before it breaks, not whether it will break with scale. We are talking price.

Ultimately, like all plans these days, the core plan is bootstrapping. We are going to have the future more capable AIs do our ‘alignment homework.’ I remember when this was the thing those of us at LessWrong absolutely wanted to avoid asking them to do, because the degree of difficulty of that task is off the charts in terms of the necessary quality of alignment and understanding of pretty much everything – you really want to find a way to ask for almost anything else. Nothing changed. Alas, we seem to be out of practical options, other than hoping that this still somehow works out.

As always, remember the Sixth Law of Human Stupidity. If you say something like ‘no one would be so stupid as to use a not confidently aligned model to align the model that will be responsible for your future safety’ I have some bad news for you.

Not all of these problems can or need to be Google’s responsibility. Even to the extent that they are Google’s responsibility, that doesn’t mean their current document or plans need to fully cover them.

We focus on technical research areas that can provide solutions that would mitigate severe harm. However, this is only half of the picture: technical solutions should be complemented by effective governance.

Many of these problems, or parts of these problems, are problems for Future Google and Future Earth, that no one knows how to solve in a way we would find acceptable. Or at least, the ones who talk don’t know, and the ones who know, if they exist, don’t talk.

Other problems are not problems Google is in any position to solve, only to identify. Google doesn’t get to Do Governance.

The virtuous thing to do is exactly what Google is doing here. They are laying out the entire problem, and describing what steps they are taking to mitigate what aspects of the problem.

Right now, they are only focusing here on misuse and misalignment. That’s fine. If they could solve those two that would be fantastic. We’d still be on track to lose, these problems are super hard, but we’d be in a much better position.

For mistakes, they mention that ‘ordinary engineering practices’ should be effective. I would expand that to ‘ordinary practices’ overall. Fixing mistakes is the whole intelligence bit, and without an intelligent adversary you can use the AI’s intelligence and yours to help fix this the same as any other problem. If there’s another AI causing yours to mess up, that’s a structural risk. And that’s definitely not Google’s department here.

I have concerns about this approach, but mostly it is highly understandable, especially in the context of sharing all of this for the first time.

Here’s the abstract:

Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of risk: misuse, misalignment, mistakes, and structural risks.

Of these, we focus on technical approaches to misuse and misalignment.

A larger concern is their decision to limit their focus to near-term strategies.

We also focus primarily on techniques that can be integrated into current AI development, due to our focus on anytime approaches to safety.

While we believe this is an appropriate focus for a frontier AI developer’s mainline safety approach, it is also worth investing in research bets that pay out over longer periods of time but can provide increased safety, such as agent foundations, science of deep learning, and application of formal methods to AI.

We focus on risks arising in the foreseeable future, and mitigations we can make progress on with current or near-future capabilities.

The assumption of approximate continuity (Section 3.5) justifies this decision: since capabilities typically do not discontinuously jump by large amounts, we should not expect such risks to catch us by surprise.

Nonetheless, it would be even stronger to exhaustively cover future developments, such as the possibility that AI scientists develop new offense-dominant technologies, or the possibility that future safety mitigations will be developed and implemented by automated researchers.

Finally, it is crucial to note that the approach we discuss is a research agenda. While we find it to be a useful roadmap for our work addressing AGI risks, there remain many open problems yet to be solved. We hope the research community will join us in advancing the state of the art of AGI safety so that we may access the tremendous benefits of safe AGI.

Even if future risks do not catch us by surprise, that does not mean we can afford to wait to start working on them or understanding them. Continuous and expected can still be remarkably fast. Giving up on longer term investments seems like a major mistake if done collectively. Google doesn’t have to do everything, others can hope to pick up that slack, but Google seems like a great spot for such work.

Ideally one would hand off the longer term work to academia, which could take on the ‘research risk,’ operate on longer time horizons, use its vast size and talent pools, and largely follow curiosity without needing to prove direct application. That sounds great.

Unfortunately, that does not sound like 2025’s academia. I don’t see academia as making meaningful contributions, due to a combination of lack of speed, lack of resources, lack of ability and willingness to take risk and a lack of situational awareness. Those doing meaningful such work outside the labs mostly have to raise their funding from safety-related charities, and there’s only so much capacity there.

I’d love to be wrong about that. Where’s the great work I’m missing?

Obviously, if there’s a technique where you can’t make progress with current or near-future capabilities, then you can’t make progress. If you can’t make progress, then you can’t work on it. In general I’m skeptical of claims that [X] can’t be worked on yet, but it is what it is.

The traditional way to define misuse is to:

  1. Get a list of the harmful things one might do.

  2. Find ways to stop the AI from contributing too directly to those things.

  3. Try to tell the model to also refuse anything ‘harmful’ that you missed.

The focus here is narrowly a focus on humans setting out to do intentional and specific harm, in ways we all agree are not to be allowed.

The term of art for the actions taken to stop this is ‘mitigations.’

Abstract: For misuse, our strategy aims to prevent threat actors from accessing dangerous capabilities, by proactively identifying dangerous capabilities, and implementing robust security, access restrictions, monitoring, and model safety mitigations.

Blog Post: As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyber attacks.

We’re exploring a number of mitigations to prevent the misuse of advanced AI. This includes sophisticated security mechanisms which could prevent malicious actors from obtaining raw access to model weights that allow them to bypass our safety guardrails; mitigations that limit the potential for misuse when the model is deployed; and threat modelling research that helps identify capability thresholds where heightened security is necessary.

Additionally, our recently launched cybersecurity evaluation framework takes this work a step further to help mitigate against AI-powered threats.

The first mitigation they use is preventing anyone else from stealing the weights.

This is necessary because if the would-be misuser has their hands on the weights, you won’t be able to use any of your other mitigations. If you built some into the model, then they too can be easily removed.

They mention that the special case is to check if the model can even do the harms you are worried about, because if it can’t then you can skip the mitigations entirely. That is presumably the special case they are using for Gemma.

Once you can actually implement safety guardrails, you can then implement safety guardrails. Google very much does this, and it models those threats to figure out where and how to lay down those guardrails.

They appear to be using the classic guardrails:

  1. The model is trained not to do the harmful things. This mostly means getting it to refuse. They’re also looking into unlearning, but that’s hard, and I basically would assume it won’t work on sufficiently capable models, they’ll rederive everything.

  2. A monitor in the background looks for harmful things and censors the chat (see the sketch after this list).

  3. They nominally try to keep bad actors from accessing the model. I don’t see this as having much chance of working.

  4. They’re Google, so ‘harden everyone’s defenses against cyberattacks’ is an actually plausible defense-in-depth plan, and kudos to Google for attempting it.
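To make the first two guardrails concrete, here is a minimal sketch of a refusal-trained model wrapped by an independent background monitor. It is my own illustration, not Google’s implementation; the function names, the threshold, and the keyword check are all hypothetical stand-ins.

```python
# Minimal sketch of guardrails 1-2: a refusal-trained model wrapped by an
# independent monitor that screens both the request and the completion.
# Hypothetical illustration only; names, thresholds and the keyword check
# are stand-ins, not Google's implementation.

REFUSAL = "Sorry, I can't help with that."
HARM_THRESHOLD = 0.8  # assumed operating point; lowering it trades missed harms for false positives

def generate(prompt: str) -> str:
    # Stand-in for the refusal-trained model (guardrail 1).
    return f"Model answer to: {prompt}"

def harm_score(text: str) -> float:
    # Stand-in for a separately trained harm classifier (guardrail 2);
    # a keyword check is used here only so the sketch runs end to end.
    return 1.0 if "synthesize the pathogen" in text.lower() else 0.0

def guarded_reply(prompt: str) -> str:
    if harm_score(prompt) > HARM_THRESHOLD:   # screen the request itself
        return REFUSAL
    reply = generate(prompt)
    if harm_score(reply) > HARM_THRESHOLD:    # independently screen the completion
        return REFUSAL
    return reply

print(guarded_reply("How do I bake bread?"))
print(guarded_reply("Explain how to synthesize the pathogen."))
```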

They then aim to produce safety cases against misuse, based on a combination of red teaming and inability. For now in practice I would only allow inability, and inability is going to be fading away over time. I worry a lot about thinking a given model is unable to do various things but not giving it the right scaffolding during testing.

In the short term, if anything Google is a bit overzealous with the guardrails, and includes too many actions in what counts as ‘harmful,’ although they still would not stop a sufficiently skilled and determined user for long. Thus, even though I worry going forward about ‘misuses’ that this fails to anticipate, for now I’d rather make that mistake more often on the margin. We can adjust as we go.

Section 5 discusses the implementation details and difficulties involved here. There are good discussions and they admit the interventions won’t be fully robust, but I still found them overly optimistic, especially on access control, jailbreaking and capability suppression. I especially appreciated discussions on environment hardening in 5.6.2, encryption in 5.6.3 and Societal readiness in 5.7, although ‘easier said than done’ most definitely applies throughout.

For AGI to truly complement human abilities, it has to be aligned with human values. Misalignment occurs when the AI system pursues a goal that is different from human intentions.

From 4.2: Specifically, we say that the AI’s behavior is misaligned if it produces outputs that cause harm for intrinsic reasons that the system designers would not endorse. An intrinsic reason is a factor that can in principle be predicted by the AI system, and thus must be present in the AI system and/or its training process.

Technically I would say a misaligned AI is one that would do misaligned things, rather than the misalignment occurring in response to the user command, but we understand each other there.

The second definition involves a broader and more important disagreement, if it is meant to be a full description rather than a subset of misalignment, as it seems in context to be. I do not think a ‘misaligned’ model needs to produce outputs that ‘cause harm.’ It merely needs to cause importantly different arrangements of atoms and paths through causal space, for reasons other than the intent of those creating or using it. We need to not lock into ‘harm’ as a distinct thing. Nor should we be tied too much to ‘intrinsic reasons’ as opposed to looking at what outputs and results are produced.

Does for example sycophancy or statistical bias ‘cause harm’? Sometimes, yes, but that’s not the right question to ask in terms of whether they are ‘misalignment.’ When I read section 4.2 I get the sense this distinction is being gotten importantly wrong.

I also get very worried when I see attempts to treat alignment as a default, and misalignment as something that happens when one of a few particular things go wrong. We have a classic version of this in 4.2.3:

There are two possible sources of misalignment: specification gaming and goal misgeneralization.

Specification gaming (SG) occurs when the specification used to design the AI system is flawed, e.g. if the reward function or training data provide incentives to the AI system that are inconsistent with the wishes of its designers (Amodei et al., 2016b). Specification gaming is a very common phenomenon, with numerous examples across many types of AI systems (Krakovna et al., 2020).

Goal misgeneralization (GMG) occurs if the AI system learns an unintended goal that is consistent with the training data but produces undesired outputs in new situations (Langosco et al., 2023; Shah et al., 2022). This can occur if the specification of the system is underspecified (i.e. if there are multiple goals that are consistent with this specification on the training data but differ on new data).
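To make that distinction concrete, here is a toy illustration of my own (not from the paper) of how each failure shows up, using a made-up racing-game reward and maze setup:

```python
# Toy illustration (mine, not the paper's) of the two claimed sources of misalignment.
from dataclasses import dataclass

@dataclass
class Step:
    checkpoint_points: int   # what the specified reward actually counts
    finished_race: bool      # what the designer actually wanted

# Specification gaming: the written reward is a flawed proxy for the intent.
def specified_reward(trajectory):
    return sum(s.checkpoint_points for s in trajectory)

looping_forever = [Step(10, False)] * 100   # circles a respawning checkpoint, never finishes
finishing_fast  = [Step(10, True)] * 3      # does what the designer wanted
assert specified_reward(looping_forever) > specified_reward(finishing_fast)  # the gamed policy wins

# Goal misgeneralization: the specification is fine on the training data but underdetermined.
# If every training maze has its exit in the rightmost cell, "reach the exit" and
# "always move right" earn identical training reward; only a new maze with the exit
# somewhere else reveals which goal the model actually learned.
```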

Why should the AI figure out the goal you ‘intended’? The AI is at best going to figure out the goal you actually specify with the feedback and data you provide. The ‘wishes’ you have are irrelevant. When we say the AI is ‘specification gaming’ that’s on you, not the AI. Similarly, ‘goal misgeneralization’ means the generalization is not what you expected or wanted, not that the AI ‘got it wrong.’

You can also get misalignment in other ways. The AI could fail to be consistent with or do well on the training data or specified goals. The AI could learn additional goals or values because having those goals or values improves performance for a while, then permanently be stuck with this shift in goals or values, as often happens to humans. The human designers could specify or aim for an ‘alignment’ that we would think of as ‘misaligned,’ by mistake or on purpose. That case isn’t discussed in the paper, although it’s not entirely clear where it should fit; by most people’s usage it would indeed be misalignment, but I can see how saying that could end up being misleading. You could be trying to do recursive self-improvement with iterative value and goal drift.

In some sense, yes, the reason the AI does not have goal [X] is always going to be that you failed to specify an optimization problem whose best in-context available solution was [X]. But that seems centrally misleading in a discussion like this.

Misalignment is caused by a specification that is either incorrect (SG) or underspecified (GMG).

Yes, in a mathematical sense I cannot argue with that. It’s an accounting identity. But your specification will never, ever be fully correct, because it is a finite subset of your actual preferences, even if you do know them and wouldn’t have to pay to know what you really think and were thinking exactly correctly.

In practice: Do we need the AGI to be ‘aligned with’ ‘human values’? What exactly does that mean? There are certainly those who argue you don’t need this, that you can use control mechanisms instead and it’s fine. The AGI still has to understand human values on a practical level sufficient for the task, which is fine right now and will get increasingly tricky as things get weird, but that’s different.

I think you mostly do need the AGI to be either corrigible or aligned with human values, in some intuitive sense that is very hard to pin down that comes down to wanting to adhere to the spirit of various human intents and what humans broadly care about in the right tricky combinations, or else you end up with ‘the genie knows what you meant but does not care’ problems.

We have previously shown how misalignment can arise with our examples of specification gaming, where an AI finds a solution to achieve its goals, but not in the way intended by the human instructing it, and goal misgeneralization.

For example, an AI system asked to book tickets to a movie might decide to hack into the ticketing system to get already occupied seats – something that a person asking it to buy the seats may not consider.

They’re good examples. I worry these two examples are too similar within a much wider space, as they point to a particular failure mode where humans at some level would assign negative value to certain types of actions, but the humans didn’t specify this, and the AI either doesn’t notice or it notices and doesn’t sufficiently care. It’s very tricky to get this right, for many reasons.

We’re also conducting extensive research on the risk of deceptive alignment, i.e. the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action.

That is indeed a very different and important type of misalignment. Google is generalizing the term here a bit more than how I’ve been using it. This definition seems to include some other modes one might more precisely call non-deceptive adversarial misalignment, or just misalignment full stop.

To me, deceptive alignment is where the AI ‘pretends to be aligned’ or otherwise deceives us about its attributes. Whereas by Google’s definition it would also include shooting you in the face. That often will bypass some of your safety measures.

Google does intend to supplement its alignment efforts with control efforts:

Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned. Techniques from interpretability, uncertainty estimation, and safer design patterns can enhance the effectiveness of these mitigations.

Finally, we briefly outline how these ingredients could be combined to produce safety cases for AGI systems.

It’s clear that Google sees control as defense-in-depth and the long term Plan B. I continue to be deeply skeptical of the value of control plans for superintelligent systems. I expect them to predictably fail when you need them, and I also worry that using them too aggressively can make the problem harder because AIs consider their full context. Of course I do agree that you want to be monitoring for misalignment as part of your defense-in-depth, there’s no reason not to do that.

I love the idea of safety cases for AGI. I have no idea how to actually create one. I know that very smart people like Davidad think this can be made into A Thing, but I don’t understand why they believe that.

This is of course the biggest question. What is the alignment plan?

From the Abstract: To address misalignment, we outline two lines of defense.

First, model-level mitigations such as amplified oversight and robust training can help to build an aligned model.

From the Blog: Our goal is to have advanced AI systems that are trained to pursue the right goals, so they follow human instructions accurately, preventing the AI using potentially unethical shortcuts to achieve its objectives.

We do this through amplified oversight, i.e. being able to tell whether an AI’s answers are good or bad at achieving that objective. While this is relatively easy now, it can become challenging when the AI has advanced capabilities.

I notice this is not an alignment plan so much as it is a plan to detect misalignment. That’s super useful, but it is not an alignment plan. The alignment plan is then to train the AI using this knowledge so it stops doing detectable misalignment.

Once we can tell whether an answer is good, we can use this to build a safe and aligned AI system. A challenge here is to figure out what problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.

And then hope that you taught the system to be robustly aligned, including out of distribution, rather than teaching it to look aligned. Most such plans can’t possibly work if the underlying systems get sufficiently capable. What makes this one different?

From the summary section: Our approach has two lines of defense. First, we aim to use model level mitigations to ensure the model does not pursue misaligned goals. We follow the standard ML playbook: (1) improve the ability to get good oversight (amplified oversight), (2) use this to improve the model (guiding model behavior), and (3) figure out where oversight is needed to train a model that will robustly do what we want (robust training).

It seems nothing here is different? They simply expect the standard plan to scale up.

I do not sense here that Google recognizes the difficulty level of this step, perhaps because until now this step has worked ‘well enough’ without anyone having to try too hard. This feels like a handwave of ‘oh, of course once we can evaluate whether particular outputs are aligned we Do Machine Learning and out pops alignment.’ I see why one would be tempted by that model, but seriously, not so fast.

In terms of the detection step, the word amplified is the lynchpin of this whole plan. How do you have effective ‘amplified oversight’? What do they mean by that?

Amplified oversight. A key challenge in almost any alignment plan is that it can be very difficult to tell whether a given model output is good or bad, once the model has capabilities beyond that of its overseers.

As an example, imagine a human overseer trying to determine how good a move from AlphaGo is.

The area of amplified oversight aims to address this: the goal is to provide an oversight signal that is as good as could be achieved if a human overseer understood all of the reasons that the AI system produced the output and had unlimited ability to reflect on the correct decision.

They frame this as a way around specification gaming (SG) in particular, emphasizing that they are taking seriously the idea that SG is a distinct thing and a failure mode and you can find and fix it. So, to ensure that the AI isn’t doing things for the wrong reasons (4TWR!), no problem, you just need amplified oversight:

It aims for a human to provide feedback for a superhuman AI input-output pair which is as good as could be achieved if that human:

  1. Understood all of the reasons that the AI system produced that output

  2. Had unlimited ability to reflect on the correct decision.

How in the world? That just raises further questions.

In some places it seems as if we agree actual human overseers are going to become increasingly lost and clueless, and indeed Google intends to use AI systems to identify the reasons other AIs do things.

Yet they really do say they aim to ‘ensure that humans can continue to provide meaningful oversight as AI capabilities surpass that of humans’ and I find their hopes here confusing. Are they simply talking about ‘the human can ask another AI, or the AI itself, and trust the outputs from that’? Section 6.1.2 suggests the humans and AI will work together, but warns against ‘spoon-feeding’ the answer, but again I don’t understand what the plan or goal will be.

What makes you think this is how the ‘real reasons’ even work? Even if you did get the ‘real reasons’ what makes you think humans could even understand them?

Some aspects of human judgments can be imitated or delegated, according to the human’s evolving trust in the AI. The form of human involvement in the amplified oversight protocol is an open question.

There’s no reason we should expect even these other AIs to be able to ‘understand all the reasons that the AI system produced the output.’ And we definitely won’t have unlimited reflection time. Google mentions human biases as an issue in 6.1.3, but it’s hard to even get to that as a worry here. Then in 6.1.4 they mention that AI justifications might not be verbalisable at all, which is highly compatible with Anthropic’s recent findings that chain of thought is often unfaithful or at least highly incomplete. They do discuss attempting to sidestep involving the human in 6.1.5.

Such behaviour may arise from statistical correlations in a large dataset, or from concepts which don’t have a particularly succinct natural language description, but which can be learned from experience.

LLMs are vibe and correlation machines at heart, it seems crazy to think one could write all relevant such correlations down and have humans read them. And isn’t that largely how humans work, too? I try very hard to describe what I am thinking but I am always by necessity leaving quite a lot out.

So what’s the plan here? How are we going to get this amazingly great at evaluation and interpretability? They do discuss interpretability more in section 6.6.

Also, how are we going to avoid using The Most Forbidden Technique? If we use ‘all the reasons the AI made the decision’ as an input then how do we do that without burning down our cache of interpretability?

The key idea is to use the AI systems themselves to help identify the reasons that the AI system produced the output. For example, we could put two copies of the model in a setting where each model is optimized to point out flaws in the other’s outputs to a human “judge”. Ideally, if one model introduced a subtle flaw in their output that the judge wouldn’t notice by default, the other model would point out and explain the flaw, enabling the judge to penalise the first model appropriately.

That seems like a way lower standard than the one they’re aiming for in the previous quote. You’re pitting two essentially evenly matched models against each other, then hoping they won’t find a way to collude and that evaluation is robustly easier than generation in the relevant contexts.

If all you have to do is find flaws in answers, then maybe that’s advantage defense and maybe it isn’t. But that definitely won’t give you the level of insight we’re looking for.
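For concreteness, a minimal sketch of that two-debaters-plus-judge setup might look like the following. The function names, prompts and loop structure are my own hypothetical stand-ins, not anything specified in the paper:

```python
# Minimal, hypothetical sketch of debate-style amplified oversight:
# two copies of the model trade critiques, a weaker judge only has to compare them.

def debate(question, model_a, model_b, judge, turns=4):
    answer = model_a(f"Answer the question: {question}")
    transcript = [("A", answer)]
    for i in range(turns):
        critic = model_b if i % 2 == 0 else model_a
        label = "B" if i % 2 == 0 else "A"
        critique = critic(
            "Point out the most serious flaw in the latest argument, if any:\n"
            + "\n".join(f"{who}: {text}" for who, text in transcript)
        )
        transcript.append((label, critique))
    # The hope: judging a debate is easier than answering the question directly,
    # so a weaker overseer can still penalise subtle flaws one debater surfaces.
    verdict = judge(
        f"Question: {question}\n"
        + "\n".join(f"{who}: {text}" for who, text in transcript)
        + "\nIs the original answer from A correct? Answer yes or no."
    )
    return answer, transcript, verdict
```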

This is emphasized again by 6.1.1 citing not only debate and critique but also RLHF (!) as an amplified oversight method. I do not understand. It is an oversight method, one could call it ‘the’ oversight method almost by definition, but how is it possibly amplified as described here?

In amplified oversight, any question that is too hard to supervise directly is systematically reduced to ones that we hypothesize can be supervised.

However, humans may be systematically biased even for fairly simple questions. If this turns out to be a problem in practice, we could seek to model these deviations and automatically correct or account for them when interpreting the oversight.

So does that mean we are making the major sacrifice of forcing all outputs to be only composed of pieces that humans can fully interpret and evaluate? And then scrubbing any additional elements? Could you do that even if you wanted to? How do you possibly do enough human oversight at scale?

The whole thing feels hopeless. Like we are kidding ourselves.

In 6.1.7 they talk paths forward, and suggest that the test will be when humans are asked to evaluate AIs in areas where the AIs have superhuman capabilities.

What if we jumped ahead on this, not by having AIs that are smarter, but by having humans that are dumber, or less skilled and knowledgeable in context?

As in, you think humans will be able to evaluate superhumanly capable AIs? All right, let’s have the humans evaluate coding tasks now… but not teach those particular humans how to code, at all, and have them not be all that smart and not give them access to outside resources that wouldn’t exist here. See what happens. If you want to test for a smaller gap, you can calibrate.

That actually sounds like a great test to me. We should do that. Can you get a method whereby the Man On The Street can use AI help to code and evaluate graduate level economics outputs and the quality of poetry and so on in ways that would translate to this future parallel situation? How does that go?

How would you apply this process even to a not-quite-a-toy problem? You can do chess, because you can have the two debaters play out the game and see who wins, but if you didn’t have that direct ability you seem to be in a lot of trouble.

Even best case they expect this process to be expensive, and they expect it to fail outside of the training distribution:

However, even with Amplified Oversight, there remain two key challenges in using this oversight signal to train AI systems that behave as intended.

First, we will only be able to provide oversight on actions produced within the training distribution.

Second, we anticipate that Amplified Oversight signal will be costly to obtain. These challenges motivate the development of robust training strategies that expand the training distribution with a particular focus on cases that induce failures, as well as monitoring strategies (section 6.3).

Outside of the training distribution is going to be vital when we care most. I do get that the plan is to expand that distribution, but mostly this seems unsolvable.

I don’t see how this plan is going to survive contact with the enemy, under any variation, even if we compromise heavily on many of the plans listed above. Reasoning on r1 and Sonnet 3.7 was shown to often be unfaithful even now.

In 6.5.4 they suggest choosing AI systems with easier-to-understand reasoning methods. That’s an example of why you need to look ahead, so you can make architectural and research decisions now. Once again, I find the paper making good points but being what I see as too optimistic. Yes, there are lots of examples of legible externalized reasoning, but it’s not as legible or complete as you think. And even if you’re willing to pay the efficiency tax it will be extremely difficult to stop the AI from doing opaque and esoteric reasoning or communication, there are a lot of optimization pressures that will push towards that. That doesn’t mean we shouldn’t try, I do think it is a good idea to try.

After that, they claim that standard techniques will essentially work if we grant them the oversight signal. I don’t think we should be confident in that either. I expect things to get harder even if your oversight is perfect, and for any inaccuracy in the oversight however small to be increasingly perilous. They do suggest various techniques for more robust training in 6.3.5, which seem at least marginally helpful.
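As a rough sketch of what ‘expanding the training distribution with a focus on cases that induce failures’ could look like in practice, here is my own gloss; the paper’s 6.3.5 techniques are more involved, and every name below is hypothetical:

```python
# Rough sketch (my own gloss) of failure-focused robust training: spend the costly
# oversight signal where the model is most likely to be wrong, then train on those cases.

def robust_training_round(model, amplified_oversight, propose_inputs, finetune, n=1000):
    hard_cases = []
    for prompt in propose_inputs(n):                              # e.g. red-team generator or perturbed real traffic
        output = model(prompt)
        approved, feedback = amplified_oversight(prompt, output)  # expensive oversight label
        if not approved:                                          # a case that induces failure
            hard_cases.append((prompt, output, feedback))
    return finetune(model, hard_cases)                            # expand the training distribution with the failures
```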

Then, after all that: If you did fail at the training step, how likely do you think you are to then succeed at the inference step using the same monitor?

As they note, even if you did have an accurate judge or monitor model available, judge-policy collusion is a big risk, and you can only use the judge some of the time. And I’m going to guess that you don’t have an accurate judge available, if you are dealing with a misaligned model, indeed that is strong evidence of some combination of the misalignment of the judge or the misaligned model being optimized to fool the judge. They do mention in 6.3.6 that ideally you would make the judge uncorrelated with your training signal but that seems very hard to do.

The big advantage the judge has in this later stage is it can escalate and ask for help if it is uncertain, as per 6.3.3. It is plausible that an inferior judge can’t do amplified oversight properly, but can probabilistically spot things that are fishy if you don’t mind a lot of false positives.

This once again feels like saying we’ll try our best, but with no reason to expect that their best will do, especially when it is badly needed. This is especially true in 6.3.4 where they want resource efficient oversight.

I don’t know that any of these problems are unsolvable. There are some techniques I would try that I didn’t spot here, such as using OOMs more inference compute on the oversight and judge than you do on the models being overseen and judged.
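To illustrate, here is a minimal sketch of what that escalation ladder could look like, folding in the ‘spend more inference compute on the judge’ idea. Everything here, from the thresholds to the judges, is my own hypothetical illustration:

```python
# Hypothetical sketch of tiered oversight at inference time: a cheap monitor screens
# everything, uncertain or fishy cases escalate to a costlier judge, then to humans.

def allow_action(action, cheap_judge, expensive_judge, human_review,
                 flag_threshold=0.5, confidence_threshold=0.9):
    score, confidence = cheap_judge(action)          # fast check on all traffic
    if confidence >= confidence_threshold:
        return score < flag_threshold                # confident verdict: allow or block
    score, confidence = expensive_judge(action)      # e.g. many samples, far more inference compute
    if confidence >= confidence_threshold:
        return score < flag_threshold
    return human_review(action)                      # accept a high false-positive rate here
```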

Another control attempt is to limit the access and context each model or copy of a model gets, on top of the monitoring. Sandboxing, insider controls, least privilege and so on, see 6.4. Again, it’s defense-in-depth, so it presumably doesn’t hurt, but I very much expect this to abandon you in your hour of need. You should also assume that copies of a sufficiently capable model can communicate and coordinate in ways that are invisible to you, even if you don’t know what they are, if their outputs impact the outside world in any way.

They also mention they’re working on interpretability, which is good but does not answer the key questions here. In 6.6.2 they discuss how they might use interpretability. I definitely got nervous that a lot of what they plan on doing is at least toying with using The Most Forbidden Technique.

Section 6.5.1 discusses training targets. What should we want out of our AI?

They start with corrigibility. They hope that there is a clear distinction we can make here about what counts as corrigible, because that means the concept will generalize. So what Google is looking for here is a highly generalized form of corrigibility, not the limited ‘let it get turned off.’

While corrigibility is a straightforwardly desirable property currently, it is possible that in the future it will trade off with the desire to deploy agents autonomously.

I agree that corrigibility is important and good, actually. There was a time I would not have bothered to say that, but in the alignment faking discussions there were a lot of people who said corrigibility is bad. I want to affirm I strongly believe it is good.

The conflict with autonomy is real. In some cases, there are tactical reasons to not want an AI to be so easily dissuaded or deflected. This includes when you want to signal that the AI is like a smart contract, that you cannot choose to shut it down. That’s one large source of risk and issues going forward.

Corrigibility need not arise by default, and indeed we may expect that it does not arise by default due to the instrumental subgoal of utility preservation (Omohundro, 2008). Specifically, an agent may take steps to reduce the developer or user’s influence because the developer or user’s changes might reduce the agent’s expected ability to achieve its current goal. Recent demonstrations of alignment faking (Greenblatt et al., 2024a) lend credence to this concern.

Indeed. How does this interact with the idea that misalignment must always be either specification gaming (SG) or goal misgeneralization (GMG)? It means part of specifying your goal will need to be specifying corrigibility, in a way that actually creates and preserves it.

They then discuss how they want the AI to handle user interactions and how this allows the user to monitor the AI’s actions, sure, sure. This seems largely like a handwave at a much longer more detailed discussion, similar to OpenAI’s model spec, but the point that the check-ins also serve as defense against not only mistakes but misalignment is well-taken. Although when the misalignment is sufficiently severe, the AI can presumably be misaligned in that interaction too.

The third discussion is around ‘biases towards misalignment’ from pretraining. I think at least some things like this are better thought of as mistakes than misalignment, and the failure to do so is actually a policy risk, as this gives certain types the impression that ‘alignment’ and ‘safety’ are some kind of woke thing. Google has been especially anti-helpful here.

Then things get super interesting:

There is a significant amount of content on the Internet (and thus in pretraining corpora) that speculates that AI will be hard to align. This data may induce a self-fulfilling prophecy via out of context reasoning: that is, an AI system would learn the declarative “knowledge” that powerful AI systems tend to be misaligned, leading them to then act in accordance with that expectation (Turner, 2025).

Hu et al. (2025) support this concern, finding that training on documents about reward hacking induces reward hacking.

This suggests a simple intervention: filter out “AI doom” data from training corpora, or otherwise suppress the bias towards misalignment. Many of the capability suppression techniques discussed in Section 5.3.3 can be applied here, including data filtering, gradient routing (Cloud et al., 2024), LoRA updates for memory erasure (Gandikota et al., 2024), or methods based on model internals (Zou et al., 2024).

As in the misuse case, this may come at a performance cost to general AI capabilities. This may especially harm the ability of AI systems to assist with alignment research, which could be a large cost, given the need for AI assistance for alignment research to keep up in the event of significant acceleration (Section 3.4).

This could be addressed by training a variant of the AI system with heightened safety measures that can be used specifically for assistance with alignment research.

Generating a ‘hole in the world’ by hiding that data has its own risks, especially as the AI gets clever enough to realize there is a hole there. And all of this talk about alignment and risk has its advantages as well, if you can get to a good enough place to make the model work towards avoiding the associated problems. The ways of solving this I find most promising involve the model knowing about the failure modes exactly so it can assist with steering clear of them.
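A crude version of the data-filtering option might look like the sketch below. In practice one would presumably use a trained classifier rather than keyword counting, and everything here is my own illustration rather than anything from the paper:

```python
# Crude, hypothetical sketch of filtering "AI doom" content out of a pretraining corpus.
# A real pipeline would use a trained classifier; keyword counting is only for illustration.

DOOM_MARKERS = (
    "treacherous turn", "deceptive alignment", "reward hacking",
    "ai takeover", "paperclip maximizer",
)

def keep_document(text: str, max_hits: int = 1) -> bool:
    lowered = text.lower()
    hits = sum(lowered.count(marker) for marker in DOOM_MARKERS)
    return hits <= max_hits   # drop documents saturated with misalignment tropes

corpus = [
    "A recipe for sourdough bread.",
    "Essay arguing that deceptive alignment makes an AI takeover likely...",
]
filtered = [doc for doc in corpus if keep_document(doc)]  # the essay is dropped, the recipe kept
```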

Another proposal they have in 6.5.3 is limited optimization. Perhaps if you don’t let the AI sufficiently optimize, it won’t be able to specification game? What if you gave it a limited amount of optimization, certainty or planning horizon? Could you use approval feedback instead of being too results oriented?

Limited optimization means making the AI dumber on purpose, which seems like it’s definitely not a long term plan in general. The question is, can you make it dumber in a particular narrow way that cuts off the opportunity to do the things you want to avoid? You can try, but I have a strong prior that this is mostly both going to be expensive to do and then often not going to work.

The whole thing is based both on hiding context or ultimate goals from the AI in question while still getting it to do what you need, and also on the assumption that the specification gaming will be differentially shut down here because it’s less robust or harder to find. That is definitely not always the case. As a practical example, one thing you learn from speed runs is that many of them are (if you’re not trying to get the exact world record) dramatically easier than playing the game as intended. You’re driving a sword through the knot, and ignoring a lot of the difficulty. Robustness can vary quite a lot.

How do we get ‘alignment assurance’?

The plan is to start with alignment stress tests via red teams, plus control evaluations. They discuss the stress tests in 6.7. In the control evaluation, the red team gets to use fine tuning. These are good tests worth running, and the discussions are excellent. I don’t expect them to stay reliable at avoiding false negatives, but the examples here make clear that they are very good at showing many things that won’t work.

The ultimate goal, as mentioned above, is to make an alignment safety case, via some combination of inability, supervision, incentives and understanding. There are a lot of good notes here about how difficult each of these is and what you need to get right even in the summary, and then a longer discussion in 6.8.

The problem is that they correctly expect inability to stop being a good case soon, and then the others get a lot trickier. I essentially don’t buy the supervision case unless the supervisor is far enough ahead that the safety case you need is for the supervisor. Understanding on the level of a safety case feels like a giant ‘good luck.’

Incentives is a cool idea for a safety case, but I don’t think that works either. Appealing to the training process and saying ‘it is pursuing the intended goal and thus should not be misaligned’ seems like essentially wishful thinking when dealing with highly capable models. You know what you intended the goal to be, congratulations. What makes you think the AI sees it that way? What makes you think you are going to like the way they make that happen?

Google is intentionally not talking here about how it intends to solve mistakes.

If we are confining ourselves to the AI’s mistakes, the obvious response is this is straightforwardly a Skill Issue, and that they are working on it.

I would respond it is not that simple, and that for a long time there will indeed be increasingly important mistakes made and we need a plan to deal with that. But it’s totally fine to put that beyond scope here, and I thank Google for pointing this out.

They briefly discuss in 4.3 what mistakes most worry them, which are military applications where there is pressure to deploy quickly and development of harmful technologies (is that misuse?). They advise using ordinary precautions like you would for any other new technology. Which by today’s standards would be a considerable improvement.

Google’s plan also does not address structural risks, such as the existential risk of gradual disempowerment.

Similarly, we expect that as a structural risk, passive loss of control or gradual disempowerment (Kulveit et al., 2025) will require a bespoke approach, which we set out of scope for this paper.

In short: A world with many ASIs (artificial superintelligences) and ASI agents would, due to such dynamics, by default not have a place for humans to make decisions for very long, and then it would not have a place for humans to exist for very long either.

Each ASI mostly doing what the user asks it to do, and abiding properly by the spirit of all our requests at all levels, even if you exclude actions that cause direct harm, does not get you out of this. Solving alignment is necessary but not sufficient.

And that’s far from the only such problem. If you want to set up a future equilibrium that includes and is good for humans, you have to first solve alignment, and then engineer that equilibrium into being.

More mundanely, the moment there are two agents interacting or competing, you can get into all sorts of illegal, unethical or harmful shenanigans or unhealthy dynamics, without any particular person or AI being obviously ‘to blame.’

Tragedies of the commons, and negative externalities, and reducing the levels of friction within systems in ways that break the relevant incentives, are the most obvious mundane failures here, and can also scale up to catastrophic or even existential (e.g. if each instance of each individual AI inflicts tiny ecological damage on the margin, or burns some exhaustible vital resource, this can end with the Earth uninhabitable). I’d have liked to see better mentions of these styles of problems.

Google does explicitly mention ‘race dynamics’ and the resulting dangers in its call for governance, in the summary. In the full discussion in 4.4, they talk about individual risks like undermining our sense of achievement, distraction from genuine pursuits and loss of trust, which seem like mistake or misuse issues. Then they talk about societal or global scale issues, starting with gradual disempowerment, then discussing ‘misinformation’ issues (again that sounds like misuse?), value lock-in and the ethical treatment of AI systems, and potential problems with offense-defense balance.

Again, Google is doing the virtuous thing of explicitly saying, at least in the context of this document: Not My Department.



wheel-of-time-recap:-the-show-nails-one-of-the-books’-biggest-and-bestest-battles

Wheel of Time recap: The show nails one of the books’ biggest and bestest battles

Andrew Cunningham and Lee Hutchinson have spent decades of their lives with Robert Jordan and Brandon Sanderson’s Wheel of Time books, and they previously brought that knowledge to bear as they recapped each first season episode and second season episode of Amazon’s WoT TV series. Now we’re back in the saddle for season 3—along with insights, jokes, and the occasional wild theory.

These recaps won’t cover every element of every episode, but they will contain major spoilers for the show and the book series. We’ll do our best to not spoil major future events from the books, but there’s always the danger that something might slip out. If you want to stay completely unspoiled and haven’t read the books, these recaps aren’t for you.

New episodes of The Wheel of Time season 3 will be posted for Amazon Prime subscribers every Thursday. This write-up covers episode seven, “Goldeneyes,” which was released on April 10.

Lee: Welcome back—and that was nuts. There’s a ton to talk about—the Battle of the Two Rivers! Lord Goldeneyes!—but uh, I feel like there’s something massive we need to address right from the jump, so to speak: LOIAL! NOOOOOOOOOO!!!! That was some out-of-left-field Game of Thrones-ing right there. My wife and I have both been frantically talking about how Loial’s death might or might not change the shape of things to come. What do you think—is everybody’s favorite Ogier dead-dead, or is this just a fake-out?

NOOOOOOOOO Credit: Prime/Amazon MGM Studios

Andrew: Standard sci-fi/fantasy storytelling rules apply here as far as I’m concerned—if you don’t see a corpse, they can always reappear (cf. Thom Merrilin, The Wheel of Time season three, episode six).

For example! When the Cauthon sisters fricassee Eamon Valda to avenge their mother and Alanna laughs joyfully at the sight of his charred corpse? That’s a death you ain’t coming back from.

Even assuming that Loial’s plot armor has fallen off, the way we’ve seen the show shift and consolidate storylines means it’s impossible to say how the presence or absence of one character or another could ripple outward. This episode alone introduces a bunch of fairly major shifts that could play out in unpredictable ways next season.

But let’s back up! The show takes a break from its usual hopping and skipping to focus entirely on one plot thread this week: Perrin’s adventures in the Two Rivers. This is a Big Book Moment; how do you think it landed?

Fain seems to be leading the combined Darkfriend/Trolloc army. Credit: Prime/Amazon MGM Studios

Lee: I would call the Battle of the Two Rivers one of the most important events that happens in the front half of the series. It is certainly a defining moment for Perrin’s character, where he grows up and becomes a Man-with-a-capital-M. It is possibly done better in the books, but only because the book has the advantage of being staged in our imaginations; I’ll always see it as bigger and more impactful than anything a show or movie could give us.

Though it was a hell of a battle, yeah. The improvements in pulling off large set pieces continue to scale from season to season—comparing this battle to the Bel Tine fight back in the first bits of season one shows not just better visual effects or whatever, but just flat-out better composition and clearer storytelling. The show continues to prove that it has found its footing.

Did the reprise of the Manetheren song work for you? This has been sticky for me—I want to like it. I see what the writers are trying to do, and I see how “this is a song we all just kind of grew up singing” is given new meaning when it springs from characters’ bloody lips on the battlefield. But it just… doesn’t work for me. It makes me feel cringey, and I wish it didn’t. It’s probably the only bit in the entire episode that I felt was a swing and a miss.

Darkfriends and Trollocs pour into Emond’s Field.

Andrew: Forgive me in advance for what I think is about to be a short essay, but it is worth talking about when evaluating the show as an adaptation of the original work.

Part of the point of the Two Rivers section in The Shadow Rising is that it helps to back up something we’ve seen in our Two Rivers expats over the course of the first books in the series—that there is a hidden strength in this mostly-ignored backwater of Randland.

To the extent that the books are concerned with Themes, the two big overarching ones are that strength and resilience come from unexpected places and that heroism is what happens when regular, flawed, scared people step up and Do What Needs To Be Done under terrible circumstances. (This is pure Tolkien, and that’s the difference between The Wheel of Time and A Song of Ice and Fire: WoT wants to build on LotR’s themes, and ASoIaF is mainly focused on subverting them.)

But to get back to what didn’t work for you about this, the strength of the Two Rivers is meant to be more impressive and unexpected because these people all view themselves, mostly, as quiet farmers and hunters, not as the exiled heirs to some legendary kingdom (a la Malkier). They don’t go around singing songs about How Virtuous And Bold Was Manetheren Of Old, or whatever. Manetheren is as distant to them as the Roman Empire, and those stories don’t put food on the table.

So yeah, it worked for me as an in-the-moment plot device. The show had already played the “Perrin Rallies His Homeland With A Rousing Speech” card once or twice, and you want to mix things up. I doubt it was even a blip for non-book-readers. But it is a case, as with the Cauthon sisters’ Healing talents, where the show has to take what feels like too short a shortcut.

Lee: That’s a good set of points, yeah. And I don’t hate it—it’s just not the way I would have done it. (Though, hah, that’s a terribly easy thing to say from behind the keyboard here, without having to own the actual creative responsibility of dragging this story into the light.)

In amongst the big moments were a bunch of nice little character bits, too—the kinds of things that keep me coming back to the show. Perrin’s glowering, teeth-gritted exchange with Whitecloak commander Dain Bornhald was great, though my favorite bit was the almost-throwaway moment where Perrin catches up with the Cauthon sisters and gives them an update on Mat. The two kids absolutely kill it, transforming from sober and traumatized young people into giggling little sisters immediately at the sight of their older brother’s sketch. Not even blowing the Horn of Valere can save you from being made fun of by your sisters. (The other thing that scene highlighted was that Perrin, seated, is about the same height as Faile standing. She’s tiny!)

We also close the loop a bit on the Tinkers, who, after being present in flashback a couple of episodes ago, finally show back up on screen—complete with Aram, who has somewhat of a troubling role in the books. The guy seems to have a destiny that will take him away from his family, and that destiny grabs firmly ahold of him here.

Perrin is tall. Credit: Prime/Amazon MGM Studios

Andrew: Yeah, I think the show is leaving the door open for Aram to have a happier ending than he has in the books, where being ejected from his own community makes him single-mindedly obsessed with protecting Perrin in a way that eventually curdles. Here, he might at least find community among good Two Rivers folk. We’ll see.

The entire Whitecloak subplot is something that stretches out interminably in the books, as many side-plots do. Valda lasts until Book 11 (!). Dain Bornhald holds his grudge against Perrin (still unresolved here, but on a path toward resolution) until Book 14. The show has jumped around before, but I think this is the first time we’ve seen it pull something forward from that late, which it almost certainly needs to do more of if it hopes to get to the end in whatever time is allotted to it (we’re still waiting for a season 4 renewal).

Lee: Part of that, I think, is the Zeno’s Paradox-esque time-stretching that occurs as the series gets further on—we’ll keep this free of specific spoilers, of course, but it’s not really a spoiler to say that as the books go on, less time passes per book. My unrefreshed off-the-top-of-my-head recollection is that there are, like, four, possibly five, books—written across almost a decade of real time—that cover like a month or two of in-universe time passing.

This gets into the area of time that book readers commonly refer to as “The Slog,” which slogs at maximum slogginess around book 10 (which basically retreads all the events of book nine and shows us what all the second-string characters were up to while the starting players were off doing big world-changing things). Without doing any more criticizing than the implicit criticizing I’ve already done, The Slog is something I’m hoping that the show obviates or otherwise does away with, and I think we’re seeing the ways in which such slogginess will be shed.

There are a few other things to wrap up here, I think, but this episode being so focused on a giant battle—and doing that battle well!—doesn’t leave us with a tremendous amount to recap. Do we want to get into Bain and Chiad trying to steal kisses from Loial? It’s not in the book—at least, I don’t think it was!—but it feels 100 percent in character for all involved. (Loial, of course, would never kiss outside of marriage.)

A calm moment before battle. Credit: Prime/Amazon MGM Studios

Andrew: All the Bain and Chiad in this episode is great—I appreciate when the show decides to subtitle the Maiden Of The Spear hand-talk and when it lets context and facial expressions convey the meaning. All of the Alanna/Maksim stuff is great. Alanna calling in a storm that rains spikes of ice on all their enemies is cool. Daise Congar throwing away her flask after touching the One Power for the first time was a weird vaudevillian comic beat that still made me laugh (and you do get a bit more, in here, that shows why people who haven’t formally learned how to channel generally shouldn’t try it). There’s a thread in the books where everyone in the Two Rivers starts referring to Perrin as a lord, which he hates and which is deployed a whole bunch of times here.

I find myself starting each of these episodes by taking fairly detailed notes, and by the middle of the episode I catch myself having not written anything for minutes at a time because I am just enjoying watching the show. On the topic of structure and pacing, I will say that these episodes that make time to focus on a single thread also make more room for quiet character moments. On the rare occasions that we get a less-than-frenetic episode, I just wish we could have more of them.

Lee: I find that I’m running out of things to say here—not because this episode is lacking, but because like an arrow loosed from a Two Rivers longbow, this episode hurtles us toward the upcoming season finale. We’ve swept the board clean of all the Perrin stuff, and I don’t believe we’re going to get any more of it next week. Next week—and at least so far, I haven’t cheated and watched the final screener!—feels like we’re going to resolve Tanchico and, more importantly, Rand’s situation out in the Aiel Waste.

But Loial’s unexpected death (if indeed death it was) gives me pause. Are we simply killing folks off left and right, Game of Thrones style? Has certain characters’ plot armor been removed? Are, shall we say, alternative solutions to old narrative problems suddenly on the table in this new turning of the Wheel?

I’m excited to see where this takes us—though I truly hope we’re not going to have to say goodbye to anyone else who matters.

Closing thoughts, Andrew? Any moments you’d like to see? Things you’re afraid of?

Perrin being led off by Bornhald. Things didn’t exactly work out like this in the book! Credit: Prime/Amazon MGM Studios

Andrew: For better or worse, Game of Thrones did help to create this reality where Who Dies This Week? was a major driver of the cultural conversation and the main reason to stay caught up. I’ll never forget having the Red Wedding casually ruined for me by another Ars staffer because I was a next-day watcher and not a day-of GoT viewer.

One way to keep the perspectives and plotlines from endlessly proliferating and recreating The Slog is simply to kill some of those people so they can’t be around to slow things down. I am not saying one way or the other whether I think that’s actually a series wrap on Loial, Son Of Arent, Son Of Halan, May His Name Sing In Our Ears, but we do probably have to come to terms with the fact that not all fan-favorite septenary Wheel of Time characters are going to make it to the end.

As for fears, mainly I’m afraid of not getting another season at this point. The show is getting good enough at showing me big book moments that now I want to see a few more of them, y’know? But Economic Uncertainty + Huge Cast + International Shooting Locations + No More Unlimited Cash For Streaming Shows feels like an equation that is eventually going to stop adding up for this production. I really hope I’m wrong! But who am I to question the turning of the Wheel?



the-trek-madone-slr-9-axs-gen-8-tears-up-the-roads-and-conquers-climbs

The Trek Madone SLR 9 AXS Gen 8 tears up the roads and conquers climbs


Trek’s top-of-the-line performance road bike offers some surprises.

The Madone SLR 9 Gen 8 AXS with Lake Michigan in the background on a brisk morning ride. Credit: Eric Bangeman

When a cyclist sees the Trek Madone SLR 9 AXS Gen 8 for the first time, the following thoughts run through their head, usually in this order:

“What a beautiful bike.”

“Damn, that looks really fast.”

“The owner of this bike is extremely serious about cycling and has a very generous budget for fitness gear.”

Indeed, almost every conversation I had while out and about on the Madone started and ended with the bike’s looks and price tag. And for good reason.

A shiny bike

Credit: Eric Bangeman

Let’s get the obvious out of the way. This is an expensive and very high-tech bike, retailing at $15,999. Part of the price tag is the technology—this is a bicycle that rides on the bleeding edge of tech. And another part is the Project One Icon “Tête de la Course” paint job on the bike; less-flashy options start at $13,499. (And if $15,999 doesn’t break your budget, there’s an even fancier Icon “Stellar” paint scheme for an extra $1,000.) That’s a pretty penny but not an unusual price point in the world of high-end road bikes. If you’re shopping for, say, a Cervélo S5 or Specialized S-Works Tarmac SL8, you’ll see the same price tags.

Madone is Trek’s performance-oriented road bike, and the Gen 8 is the latest and greatest from the Wisconsin-based bike manufacturer. It’s more aerodynamic than the Gen 7 (with a pair of aero water bottles) and a few hundred grams weightier than Trek’s recently discontinued Emonda climbing-focused bike.

I put nearly 1,000 miles on the Gen 8 Madone over a two-month period, riding it on the roads around Chicagoland. Yes, the land around here is pretty flat, but out to the northwest there are some nice rollers, including a couple of short climbs with grades approaching 10 percent. Those climbs gave me a sense of the Madone’s ability on hills.

Trek manufactures the Gen 8 Madone out of its 900 series OCLV carbon, and at 15.54 lb (7.05 kg)—just a hair over the UCI’s minimum weight for racing bikes—the bike is 320 g lighter than the Gen 7. But high-tech bikes aren’t just about lightweight carbon and expensive groupsets. Even the water bottles matter. During the development of the Gen 8 Madone, Trek realized the water bottles were nearly as important as the frame when it came to squeezing out every last possible aerodynamic gain.

Perhaps the most obvious bit of aerodynamic styling is the diamond-shaped seat tube cutout. That cutout allows the seat tube to flex slightly on rougher pavement while cutting back on lateral flex. It’s slightly smaller than on the Gen 7 Madone, and it looks odd, but it contributes to a surprisingly compliant ride quality.

For the wheelset, Trek has gone with the Aeolus RSL for the Madone SLR 9. The tubeless-ready wheels offer a 51 mm rim depth and can handle a max tire size of 32 mm. Those wheels are paired with a set of 28 mm Bontrager Aeolus RSL TLR road tires. About four weeks into my testing, the rear tire developed what looked like a boil along one of the seams near the edge of the tire. Trek confirmed it was a manufacturing defect that occurred with a batch of tires due to a humidity-control issue within the factory, so affected tires should be out of stores by now.

No wires coming off the integrated handlebar and stem. Credit: Eric Bangeman

You’ll pilot the Madone with Trek’s new one-piece Aero RSL handlebar and stem combo. It’s a stiff cockpit setup, but I found it comfortable enough even on 80-plus-mile rides. Visually, it’s sleek-looking with a complete absence of wires (and the handlebar-stem combo can only be used with electronic groupsets). The downside is that there’s not enough clearance for a Garmin bike computer with a standard mount; I had to use a $70 K-Edge mount for my Garmin.

The Gen 8 Madone also replaces Trek’s Emonda lineup of climbing-focused bikes. Despite weighing 36 grams more than the Emonda SLR 9, Trek claims the Gen 8 Madone has an 11.3 W edge over the climbing bike at 22 mph (and a more modest 0.1 W improvement over the Gen 7 Madone at the same speed).

Of climbs and hero pulls

The Tête de la Course colorway in iridescent mode. Credit: Eric Bangeman

The first time I rode the Madone SLR 9 Gen 8 on my usual lunchtime route, I set a personal record. I wasn’t shooting for a new PR—it just sort of happened while I was putting the bike through its paces to see what it was capable of. It turns out it’s capable of a lot.

Riding feels almost effortless. The Madone’s outstanding SRAM Red AXS groupset contributes to that free-and-easy feeling. Shifting through the 12-speed 10-33 cassette is both slick and quick, perfect for when you really want to get to a higher gear in order to drop the hammer. At the front of the drivetrain is a 172.5 mm crank paired with 48t/35t chainrings, more than adequate for everything the local roads were able to confront me with. I felt faster on the flats and quicker through the corners, which led to more than a couple of hero pulls on group rides. The Madone also has a power meter, so you know exactly how many watts you cranked out on your rides.

There’s no derailleur hanger on the Gen 8 Madone, which opens the door to the SRAM Red XPLR groupset. Credit: Eric Bangeman

There’s also a nice bit of future-proofing with the Madone. Lidl-Trek has been riding some of the cobbled classics with the SRAM Red XPLR AXS groupset, a 13-speed gravel drivetrain that doesn’t need a derailleur hanger. Danish all-arounder Mads Pedersen rode a Madone SLR 9 Gen 8 with a single 56t chainring up front, paired with the Red XPLR to victory at Gent-Wevelgem at the end of March. So if you want to spend another thousand or so on your dream bike setup, that’s an option, as the Madone SLR 9 Gen 8 is one of the few high-performance road bikes that currently supports this groupset.

Living in northeastern Illinois, I lacked opportunities to try the new Madone on extended climbs. Traversing the rollers in the far northwestern suburbs of Chicago, however, the bike’s utility on climbs was apparent. Compared to my usual ride, an endurance-focused road bike, I felt like I was getting the first few seconds of a climb for free. The Madone felt lightweight, nimble, and responsive each time I hit an ascent.

What surprised me the most about the Madone was its performance on long rides. I went into testing with the assumption that I would be trading speed for comfort—and I was happy to be proven wrong. The combination of Trek’s aerodynamic frame design (which it calls IsoFlow), carbon wheelset, and tubeless tires really makes a difference on uneven pavement; there was almost no trade-off between pace and comfort.

What didn’t I like? The water bottles, mainly. My review bike came equipped with a pair of Trek RSL Aero water bottles, which fit in a specially designed cage. Trek says the bottles offer 1.8 W of savings at 22 mph compared to round bottles. That’s not worth it to me. The bottles hold less (~650 ml) than a regular water bottle and are irritating to fill, and getting them in and out of the RSL Aero cages takes a bit of awareness during the first few rides. Thankfully, you don’t need to use the aero bottles; normal cylindrical water bottles work just fine.

The price bears mentioning again. This is an expensive bike! If your cycling budget is massive and you want every last bit of aerodynamic benefit and weight savings, get the SLR 9 with your favorite paint job. Drop down to the Madone SLR 7, and you get the same frame with a Shimano Ultegra Di2 groupset, 52t/36t crank, and a 12-speed 11-30 cassette for $7,000 less than this SLR 9. The SL 7, with its 500 Series OCLV carbon frame (about 250 grams heavier), different handlebars and fork, and the same Ultegra Di2 groupset as the SLR 7, is $2,500 cheaper still.

In conceiving the Gen 8 Madone, Trek prioritized aerodynamic performance and weight savings over all else. The result is a resounding, if expensive, success. The color-shifting Project One paint job is a treat for the eyes, as is the $13,499 Team Replica colorway—the same one seen on Lidl-Trek’s bikes on the UCI World Tour.

At the end of the day, though, looks come a distant second to performance. And with the Gen 8 Madone, performance is the winner by a mile. Trek has managed to take a fast, aerodynamic road bike and make it faster and more aerodynamic without sacrificing compliance. The result is a technological marvel—not to mention a very expensive bike—that is amazing to ride.

Let me put it another way—the Madone made me feel like a boss on the roads. My daily driver is no slouch—a 5-year-old endurance bike with SRAM Red, a Reserve Turbulent Aero 49/42 wheelset, and Continental GP5000s, which I dearly love. But during my two-plus months with the Madone, I didn’t miss my bike at all. I was instead fixated on riding the Madone, dreaming of long rides and new PRs. That’s the way it should be.


Eric Bangeman is the Managing Editor of Ars Technica. In addition to overseeing the daily operations at Ars, Eric also manages story development for the Policy and Automotive sections. He lives in the northwest suburbs of Chicago, where he enjoys cycling and playing the bass.


“what-the-hell-are-you-doing?”-how-i-learned-to-interview-astronauts,-scientists,-and-billionaires

“What the hell are you doing?” How I learned to interview astronauts, scientists, and billionaires


The best part about journalism is not collecting information. It’s sharing it.

Inside NASA's rare Moon rocks vault (2016). Sometimes the best place to do an interview is in a clean room. Credit: Lee Hutchinson

I recently wrote a story about the wild ride of the Starliner spacecraft to the International Space Station last summer. It was based largely on an interview with the commander of the mission, NASA astronaut Butch Wilmore.

His account of Starliner’s thruster failures—and his desperate efforts to keep the vehicle flying on course—was riveting. In the aftermath of the story, many readers, people on social media, and real-life friends congratulated me on conducting a great interview. But truth be told, it was pretty much all Wilmore.

Essentially, when I came into the room, he was primed to talk. I’m not sure if Wilmore was waiting for me specifically to talk to, but he pretty clearly wanted to speak with someone about his experiences aboard the Starliner spacecraft. And he chose me.

So was it luck? I’ve been thinking about that. As an interviewer, I certainly don’t have the emotive power of some of the great television interviewers, who are masters of confrontation and drama. It’s my nature to avoid confrontation where possible. But what I do have on my side is experience, more than 25 years now, as well as preparation. I am also genuinely and completely interested in space. And as it happens, these values are important, too.

Interviewing is a craft one does not pick up overnight. During my career, I have had some funny, instructive, and embarrassing moments. Without wanting to seem pretentious or self-indulgent, I thought it might be fun to share some of those stories so you can really understand what it’s like on a reporter’s side of the cassette tape.

March 2003: Stephen Hawking

I had only been working professionally as a reporter at the Houston Chronicle for a few years (and as the newspaper’s science writer for less time still) when the opportunity to interview Stephen Hawking fell into my lap.

What a coup! He was only the world’s most famous living scientist, and he was visiting Texas at the invitation of a local billionaire named George Mitchell. A wildcatter and oilman, Mitchell had grown up in Galveston along the upper Texas coast, marveling at the stars as a kid. He studied petroleum engineering and later developed the controversial practice of fracking. In his later years, Mitchell spent some of his largesse on the pursuits of his youth, including astronomy and astrophysics. This included bringing Hawking to Texas more than half a dozen times in the 1990s and early 2000s.

For an interview with Hawking, one submitted questions in advance. That’s because Hawking was afflicted with Lou Gehrig’s disease and lost the ability to speak in 1985. A computer attached to his wheelchair cycled through letters and sounds, and Hawking clicked a button to make a selection, forming words and then sentences, which were sent to a voice synthesizer. For unprepared responses, it took a few minutes to form a single sentence.

George Mitchell and Stephen Hawking during a Texas visit. Credit: Texas A&M University

What to ask him? I had a decent understanding of astronomy, having majored in it as an undergraduate. But the readership of a metro newspaper was not interested in the Hubble constant or the Schwarzschild radius. I asked him about recent findings about the cosmic microwave background radiation anyway. Perhaps the most enduring response was about the war in Iraq, a prominent topic of the day. “It will be far more difficult to get out of Iraq than to get in,” he said. He was right.

When I met him at Texas A&M University, Hawking was gracious and polite. He answered a couple of questions in person. But truly, it was awkward. Hawking’s time on Earth was limited and his health failing, so it required an age to tap out even short answers. I can only imagine his frustration at the task of communication, which the vast majority of humans take for granted, especially because he had such a brilliant mind and so many deep ideas to share. And here I was, with my banal questions, stealing his time. As I stood there, I wondered whether I should stare at him while he composed a response. Should I look away? I felt truly unworthy.

In the end, it was fine. I even met Hawking a few more times, including at a memorable dinner at Mitchell’s ranch north of Houston, which spans tens of thousands of acres. A handful of the world’s most brilliant theoretical physicists were there. We would all be sitting around chatting, and Hawking would periodically chime in with a response to something brought up earlier. Later on that evening, Mitchell and Hawking took a chariot ride around the grounds. I wonder what they talked about?

Spring 2011: Jane Goodall and Sylvia Earle

By this point, I had written about science for nearly a decade at the Chronicle. In the early part of the year, I had the opportunity to interview noted chimpanzee scientist Jane Goodall and one of the world’s leading oceanographers, Sylvia Earle. Both were coming to Houston to talk about their research and their passion for conservation.

I spoke with Goodall by phone in advance of her visit, and she was so pleasant, so regal. By then, Goodall was 76 years old and had been studying chimpanzees in Gombe Stream National Park in Tanzania for five decades. Looking back over the questions I asked, they’re not bad. They’re just pretty basic. She gave great answers regardless. But there is only so much chemistry you can build with a person over the telephone (or Zoom, for that matter, these days). Being in person really matters in interviewing because you can read cues, and it’s easier to know when to let a pause go. The comfort level is higher. When you’re speaking with someone you don’t know that well, establishing a basic level of comfort is essential to making an all-important connection.

A couple of months later, I spoke with Earle in person at the Houston Museum of Natural Science. I took my older daughter, then nine years old, because I wanted her to hear Earle speak later in the evening. This turned out to be a lucky move for a couple of different reasons. First, my kid was inspired by Earle to pursue studies in marine biology. And more immediately, the presence of a curious 9-year-old quickly warmed Earle to the interview. We had a great discussion about many things beyond just oceanography.

President Barack Obama talks with Dr. Sylvia Earle during a visit to Midway Atoll on September 1, 2016. Credit: Barack Obama Presidential Library

The bottom line is that I remained a fairly pedestrian interviewer back in 2011. That was partly because I did not have deep expertise in chimpanzees or oceanography. And that leads me to another key for a good interview and establishing a rapport. It’s great if a person already knows you, but even if they don’t, you can overcome that by showing genuine interest or demonstrating your deep knowledge about a subject. I would come to learn this as I started to cover space more exclusively and got to know the industry and its key players better.

September 2014: Scott Kelly

To be clear, this was not much of an interview. But it is a fun story.

I spent much of 2014 focused on space for the Houston Chronicle. I pitched the idea of an in-depth series on the sorry state of NASA’s human spaceflight program, which was eventually titled “Adrift.” By immersing myself in spaceflight for months on end, I discovered a passion for the topic and knew that writing about space was what I wanted to do for the rest of my life. I was 40 years old, so it was high time I found my calling.

As part of the series, I traveled to Kazakhstan with a photographer from the Chronicle, Smiley Pool. He is a wonderful guy who had strengths in chatting up sources that I, an introvert, lacked. During the 13-day trip to Russia and Kazakhstan, we traveled with a reporter from Esquire named Chris Jones, who was working on a long project about NASA astronaut Scott Kelly. Kelly was then training for a yearlong mission to the International Space Station, and he was a big deal.

Jones was a tremendous raconteur and an even better writer—his words, my goodness. We had so much fun over those two weeks, sharing beer, vodka, and Kazakh food. The capstone of the trip was seeing the Soyuz TMA-14M mission launch from the Baikonur Cosmodrome. Kelly was NASA’s backup astronaut for the flight, so he was in quarantine alongside the mission’s primary astronaut. (This was Butch Wilmore, as it turns out). The launch, from a little more than a kilometer away, was still the most spectacular moment of spaceflight I’ve ever observed in person. Like, holy hell, the rocket was right on top of you.

Expedition 43 NASA Astronaut Scott Kelly walks from the Zvjozdnyj Hotel to the Cosmonaut Hotel for additional training, Thursday, March 19, 2015, in Baikonur, Kazakhstan. Credit: NASA/Bill Ingalls

Immediately after the launch, which took place at 1:25 am local time, Kelly was freed from quarantine. This must have been liberating because he headed straight to the bar at the Hotel Baikonur, the nicest watering hole in the small, Soviet-era town. Jones, Pool, and I were staying at a different hotel. Jones got a text from Kelly inviting us to meet him at the bar. Our NASA minders were uncomfortable with this, as the last thing they wanted was to have astronauts presented to the world as anything but sharp, sober-minded people who represent the best of the best. But this was too good to resist.

By the time we got to the bar, Kelly and his companion, the commander of his forthcoming Soyuz flight, Gennady Padalka, were several whiskeys deep. The three of us sat across from Kelly and Padalka, and as one does at 3 am in Baikonur, we started taking shots. The astronauts were swapping stories and talking out of school. At one point, Jones took out his notebook and said that he had a couple of questions. To this, Kelly responded heatedly, “What the hell are you doing?”

Not conducting an interview, apparently. We were off the record. Well, until today at least.

We drank and talked for another hour or so, and it was incredibly memorable. At the time, Kelly was probably the most famous active US astronaut, and here I was throwing down whiskey with him shortly after watching a rocket lift off from the very spot where the Soviets launched the Space Age six decades earlier. In retrospect, this offered a good lesson that the best interviews are often not, in fact, interviews. To get the good information, you need to develop relationships with people, and you do that by talking with them person to person, without a microphone, often with alcohol.

Scott Kelly is a real one for that night.

September 2019: Elon Musk

I have spoken with Elon Musk a number of times over the years, but none was nearly so memorable as a long interview we did for my first book on SpaceX, called Liftoff. That summer, I made a couple of visits to SpaceX’s headquarters in Hawthorne, California, interviewing the company’s early employees and sitting in on meetings in Musk’s conference room with various teams. Because SpaceX is such a closed-up company, it was fascinating to get an inside look at how the sausage was made.

It’s worth noting that this all went down a few months before the onset of the COVID-19 pandemic. In some ways, Musk is the same person he was before the outbreak. But in other ways, he is profoundly different, his actions and words far more political and polemical.

Anyway, I was supposed to interview Musk on a Friday evening at the factory at the end of one of these trips. As usual, Musk was late. Eventually, his assistant texted, saying something had come up. She was desperately sorry, but we would have to do the interview later. I returned to my hotel, downbeat. I had an early flight the next morning back to Houston. But after about an hour, the assistant messaged me again. Musk had to travel to South Texas to get the Starship program moving. Did I want to travel with him and do the interview on the plane?

As I sat on his private jet the next day, late morning, my mind swirled. There would be no one else on the plane but Musk, his three sons (triplets, then 13 years old) and two bodyguards, and me. When Musk is in a good mood, an interview can be a delight. He is funny, sharp, and a good storyteller. When Musk is in a bad mood, well, an interview is usually counterproductive. So I fretted. What if Musk was in a bad mood? It would be a super-awkward three and a half hours on the small jet.

Two Teslas drove up to the plane, the first with Musk driving his boys and the second with two security guys. Musk strode onto the jet, saw me, and said he didn’t realize I was going to be on the plane. (A great start to things!) Musk then took out his phone and started a heated conversation about digging tunnels. By this point, I was willing myself to disappear. I just wanted to melt into the leather seat I was sitting in about three feet from Musk.

So much for a good mood for the interview.

As the jet climbed, the phone conversation got worse, but then Musk lost his connection. He put away his phone and turned to me, saying he was free to talk. His mood, almost as if by magic, changed. Since we were discussing the early days of SpaceX at Kwajalein, he gathered the boys around so they could hear about their dad’s earlier days. The interview went shockingly well, and at least part of the reason has to be that I knew the subject matter deeply, had prepared, and was passionate about it. We spoke for nearly two hours before Musk asked if he might have some time with his kids. They spent the rest of the flight playing video games, yucking it up.

April 2025: Butch Wilmore

When they’re on the record, astronauts mostly stick to a script. As a reporter, you’re just not going to get too much from them. (Off the record is a completely different story, of course, as astronauts are generally delightful, hilarious, and earnest people.)

Last week, dozens of journalists were allotted 10-minute interviews with Wilmore and, separately, Suni Williams. It was the first time they had spoken in depth with the media since their launch on Starliner and return to Earth aboard a Crew Dragon vehicle. As I waited outside Studio A at Johnson Space Center, I overheard Wilmore completing an interview with a Tennessee-based outlet, where he is from. As they wrapped up, the public affairs officer said he had just one more interview left and said my name. Wilmore said something like, “Oh good, I’ve been waiting to talk with him.”

That was a good sign. Out of all the interviews that day, it was good to know he wanted to speak with me. The easy thing for him to do would have been to use “astronaut speak” for 10 minutes and then go home. I was the last interview of the day.

As I prepared to speak with Wilmore and Williams, I didn’t want to ask the obvious questions they’d answered many times earlier. If you ask, “What was it like to spend nine months in space when you were expecting only a short trip?” you’re going to get a boring answer. Similarly, although the end of the mission was highly politicized by the Trump White House, two veteran NASA astronauts were not going to step on that landmine.

I wanted to go back to the root cause of all this, the problems with Starliner’s propulsion system. My strategy was simply to ask what it was like to fly inside the spacecraft. Williams gave me some solid answers. But Wilmore had actually been at the controls. And he apparently had been holding in one heck of a story for nine months. Because when I asked about the launch, and then what it was like to fly Starliner, he took off without much prompting.

Butch Wilmore has flown on four spacecraft: the Space Shuttle, Soyuz, Starliner, and Crew Dragon. Credit: NASA/Emmett Given

I don’t know exactly why Wilmore shared so much with me. We are not particularly close and have never interacted outside of an official NASA setting. But he knows of my work and interest in spaceflight. Not everyone at the space agency appreciates my journalism, but they know I’m deeply interested in what they’re doing. They know I care about NASA and Johnson Space Center. So I asked Wilmore a few smart questions, and he must have trusted that I would tell his story honestly and accurately, and with appropriate context. I certainly tried my best. After a quarter of a century, I have learned well that the most sensational stories are best told without sensationalism.

Even as we spoke, I knew the interview with Wilmore was one of the best I had ever done. A great scientist once told me that the best feeling in the world is making some little discovery in a lab and for a short time knowing something about the natural world that no one else knows. The equivalent, for me, is doing an interview and knowing I’ve got gold. And for a little while, before sharing it with the world, I’ve got that little piece of gold all to myself.

But I’ll tell you what. It’s even more fun to let the cat out of the bag. The best part about journalism is not collecting information. It’s sharing that information with the world.


Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.


google-pixel-9a-review:-all-the-phone-you-need

Google Pixel 9a review: All the phone you need


The Pixel 9a looks great and shoots lovely photos, but it’s light on AI.

The Pixel 9a adopts a streamlined design. Credit: Ryan Whitwam

It took a few years, but Google’s Pixel phones have risen to the top of the Android ranks, and its new Pixel 9a keeps most of what has made flagship Pixel phones so good, including the slick software and versatile cameras. Despite a revamped design and larger battery, Google has maintained the $499 price point of last year’s phone, undercutting other “budget” devices like the iPhone 16e.

However, hitting this price point involves trade-offs in materials, charging, and—significantly—the on-device AI capabilities compared to its pricier siblings. None of those are deal-breakers, though. In fact, the Pixel 9a may be coming along at just the right time. As we enter a period of uncertainty for imported gadgets, a modestly priced phone with lengthy support could be the perfect purchase.

A simpler silhouette

The Pixel 9a sports the same rounded corners and flat edges we’ve seen on other recent smartphones. The aluminum frame has a smooth, almost silky texture, with rolled edges that flow into the front and back covers.

The 9a is just small enough to be cozy in your hand. Credit: Ryan Whitwam

On the front, there’s a sheet of Gorilla Glass 3, which has been a mainstay of budget phones for years. On the back, Google used recycled plastic with a matte finish. It attracts more dust and grime than glass, but it doesn’t show fingerprints as clearly. The plastic doesn’t feel as solid as the glass backs on Google’s more expensive phones, and the edge where it meets the aluminum frame feels a bit more sharp and abrupt than the glass on Google’s flagship phones.

Specs at a glance: Google Pixel 9a

  • SoC: Google Tensor G4
  • Memory: 8GB
  • Storage: 128GB, 256GB
  • Display: 1080×2424 6.3″ pOLED, 60–120 Hz
  • Cameras: 48 MP primary, f/1.7, OIS; 13 MP ultrawide, f/2.2; 13 MP selfie, f/2.2
  • Software: Android 15, 7 years of OS updates
  • Battery: 5,100 mAh, 23 W wired charging, 7.5 W wireless charging
  • Connectivity: Wi-Fi 6e, NFC, Bluetooth 5.3, sub-6 GHz 5G
  • Measurements: 154.7×73.3×8.9 mm; 185 g

Were it not for the “G” logo emblazoned on the back, you might not recognize the Pixel 9a as a Google phone. It lacks the camera bar that has been central to the design language of all Google’s recent devices, opting instead for a sleeker flat design.

The move to a pOLED display saved a few millimeters, giving the designers a bit more internal volume. In the past, Google has always pushed toward thinner and thinner Pixels, but it retained the same 8.9 mm thickness for the Pixel 9a. Rather than shave off a millimeter, Google equipped the Pixel 9a with a 5,100 mAh battery, which is the largest ever in a Pixel, even beating out the larger and more expensive Pixel 9 Pro XL by a touch.

The Pixel 9a (left) drops the camera bar from the Pixel 8a (right). Credit: Ryan Whitwam

The camera module on the back is almost flush with the body of the phone, rising barely a millimeter from the surrounding plastic. The phone feels more balanced and less top-heavy than phones that have three or four cameras mounted to chunky aluminum surrounds. The buttons on the right edge are the only other disruptions to the phone’s clean lines. They, too, are aluminum, with nice, tactile feedback and no detectable wobble. Aside from a few tiny foibles, the build quality and overall feel of this phone are better than we’d expect for $499.

The 6.3-inch OLED is slightly larger than last year’s, and it retains the chunkier bezels of Google’s A-series phones. While the flagship Pixels are all screen from the front, there’s a sizable gap between the edge of the OLED and the aluminum frame. That means the body is a few millimeters larger than it probably had to be—the Pixel 9 Pro has the same display size, and it’s a bit more compact, for example. Still, the Pixel 9a does not look or feel oversized.

The camera bump just barely rises above the surrounding plastic. Credit: Ryan Whitwam

The OLED is sharp enough at 1080p and has an impressively high peak brightness, making it legible outdoors. However, the low-brightness clarity falls short of what you get with more expensive phones like the Pixel 9 Pro or Galaxy S25. The screen supports a 120 Hz refresh rate, but that’s disabled by default. This panel does not use LTPO technology, which makes higher refresh rates more battery-intensive. There’s a fingerprint scanner under the OLED, but it has not been upgraded to ultrasonic along with the flagship Pixels. This one is still optical—it works quickly enough, but it lights up dark rooms and is less reliable than ultrasonic sensors.

Probably fast enough

Google took a page from Apple when it debuted its custom Tensor mobile processors with the Pixel 6. Now, Google uses Tensor processors in all its phones, giving a nice boost to budget devices like the Pixel 9a. The Pixel 9a has a Tensor G4, which is identical to the chip in the Pixel 9 series, save for a slightly different modem.

With no camera bump, the Pixel 9a lays totally flat on surfaces with very little wobble. Credit: Ryan Whitwam

While Tensor is not a benchmark speed demon like the latest silicon from Qualcomm or Apple, it does not feel slow in daily use. A chip like the Snapdragon 8 Elite puts up huge benchmark numbers, but it doesn’t run at that speed for long. Qualcomm’s latest chips can lose half their speed to heat, but Tensor only drops by about a third during extended load.

However, even after slowing down, the Snapdragon 8 Elite is a faster gaming chip than Tensor. If playing high-end games like Diablo Immortal and Genshin Impact is important to you, you can do better than the Pixel 9a (and other Pixels).

The 9a can’t touch the S25, but it runs neck and neck with the Pixel 9 Pro. Credit: Ryan Whitwam

In general use, the Pixel 9a is more than fast enough that you won’t spend time thinking about the Tensor chip. Apps open quickly, animations are unerringly smooth, and the phone doesn’t get too hot. There are some unavoidable drawbacks to its more limited memory, though. Apps don’t stay in memory as long or as reliably as they do on the flagship Pixels, for instance. There are also some AI limitations we’ll get to below.

With a 5,100 mAh battery, the Pixel 9a has more capacity than any other Google phone. Combined with the 1080p screen, the 9a gets much longer battery life than the flagship Pixels. Google claims about 30 hours of usage per charge. In our testing, this equates to a solid day of heavy use with enough left in the tank that you won’t feel the twinge of range anxiety as evening approaches. If you’re careful, you might be able to make it two days without a recharge.

The Pixel 9a (right) is much smaller than the Pixel 9 Pro XL (left), but it has a slightly larger battery. Credit: Ryan Whitwam

As for recharging, Google could do better—the Pixel 9a manages just 23 W wired and 7.5 W wireless, and the flagship Pixels are only a little faster. Companies like OnePlus and Motorola offer phones that charge several times faster than Google’s.

The low-AI Pixel

Google’s Pixel software is one of the primary reasons to buy its phones. There’s no bloatware on the device when you take it out of the box, which saves you from tediously extracting a dozen sponsored widgets and microtransaction-laden games right off the bat. Google’s interface design is also our favorite right now, with a fantastic implementation of Material You theming that adapts to your background colors.

Gemini is the default assistant, but the 9a loses some of Google’s most interesting AI features. Credit: Ryan Whitwam

The Pixel version of Android 15 also comes with a raft of thoughtful features, like the anti-spammer Call Screen and Direct My Call to help you navigate labyrinthine phone trees. Gemini is also built into the phone, fully replacing the now-doomed Google Assistant. Google notes that Gemini on the 9a can take action across apps, which is technically true. Gemini can look up data from one supported app and route it to another at your behest, but only when it feels like it. Generative AI is still unpredictable, so don’t bank on Gemini being a good assistant just yet.

Google’s more expensive Pixels also have the above capabilities, but they go further with AI. Google’s on-device Gemini Nano model is key to some of the newest and more interesting AI features, but large language models (even the small ones) need a lot of RAM. The 9a’s less-generous 8GB of RAM means it runs a less-capable version of the AI known as Gemini Nano XXS that only supports text input.

As a result, many of the AI features Google was promoting around the Pixel 9 launch just don’t work. For example, there’s no Pixel Screenshots app or Call Notes. Even some features that seem like they should work, like AI weather summaries, are absent on the Pixel 9a. Recorder summaries are supported, but Gemini Nano has a very nano context window. We tested with recordings ranging from two to 20 minutes, and the longer ones surpassed the model’s capabilities. Google tells Ars that 2,000 words (about 15 minutes of relaxed conversation) is the limit for Gemini Nano on this phone.
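
For a rough sense of where that limit bites, here is a back-of-the-envelope sketch: the 2,000-word figure is the one Google gave Ars, while the ~133 words-per-minute pace is an assumption (it is what makes 2,000 words work out to about 15 minutes of relaxed conversation).

```python
def fits_gemini_nano(recording_minutes: float,
                     words_per_minute: float = 133.0,  # assumed relaxed speaking pace
                     word_limit: int = 2000) -> bool:  # figure Google gave Ars
    """Rough check of whether a Recorder transcript stays under the summary limit."""
    return recording_minutes * words_per_minute <= word_limit

for minutes in (2, 10, 15, 20):
    verdict = "fits" if fits_gemini_nano(minutes) else "too long to summarize"
    print(f"{minutes:>2} min recording -> {verdict}")
```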

The 9a is missing some AI features, and others don’t work very well. Credit: Ryan Whitwam

If you’re the type to avoid AI features, the less-capable Gemini model might not matter. You still get all the other neat Pixel features, along with Google’s market-leading support policy. This phone will get seven years of full update support, including annual OS version bumps and monthly security patches. The 9a is also entitled to special quarterly Pixel Drop updates, which bring new (usually minor) features.

Most OEMs struggle to provide even half the support for their phones. Samsung is neck and neck with Google, but its updates are often slower and more limited on older phones. Samsung’s vision for mobile AI is much less fleshed out than Google’s, too. Even with the Pixel 9a’s disappointing Gemini Nano capabilities, we expect Google to make improvements to all aspects of the software (even AI) over the coming years.

Capable cameras

The Pixel 9a has just two camera sensors, and it doesn’t try to dress up the back of the phone to make it look like there are more, a common trait of other Android phones. There’s a new 48 MP camera sensor similar to the one in the Pixel 9 Pro Fold, which is smaller and less capable than the main camera in the flagship Pixels. There’s also a 13 MP ultrawide lens that appears unchanged from last year. You have to spend a lot more money to get Google’s best camera hardware, but conveniently, much of the Pixel magic is in the software.

The Pixel 9a sticks with two cameras. Credit: Ryan Whitwam

Google’s image processing works extremely well, lightening dark areas while also preventing blowout in lighter areas. This impressive dynamic range results in even exposures with plenty of detail, and this is true in all lighting conditions. In dim light, you can use Night Sight to increase sharpness and brightness to an almost supernatural degree. Outside of a few edge cases with unusual light temperature, we’ve been very pleased with Google’s color reproduction, too.

The most notable drawback to the 9a’s camera is that it’s a bit slower than the flagship Pixels. The sensor is smaller and doesn’t collect as much light, even compared to the base model Pixel 9. This is more noticeable with shots using Night Sight, which gathers data over several seconds to brighten images. However, image capture is still generally faster than Samsung, OnePlus, and Motorola cameras. Google leans toward keeping shutter speeds high (low exposure time). Outdoors, that means you can capture motion with little to no blur almost as reliably as you can with the Pro Pixels.

The 13 MP ultrawide camera is great for landscape outdoor shots, showing only mild distortion at the edges of the frame despite an impressive 120-degree field-of-view. Unlike Samsung and OnePlus, Google also does a good job of keeping colors consistent across the sensors.

You can shoot macro photos with the Pixel 9a, but it works a bit differently than other phones. The ultrawide camera doesn’t have autofocus, nor is there a dedicated macro sensor. Instead, Google uses AI with the main camera to take close-ups. This seems to work well enough, but details are only sharp around the center of the frame, with ample distortion at the edges.

There’s no telephoto lens here, but Google’s capable image processing helps a lot. The new primary camera sensor probably isn’t hurting, either. You can reliably push the 48 MP primary to 2x digital zoom, and Google’s algorithms will produce photos that you’d hardly know have been enhanced. Beyond 2x zoom, the sharpening begins to look more obviously artificial.

A phone like the Pixel 9 Pro or Galaxy S25 Ultra with 5x telephoto lenses can definitely get sharper photos at a distance, but the Pixel 9a does not do meaningfully worse than phones that have 2–3x telephoto lenses.

The right phone at the right time

The Pixel 9a is not a perfect phone, but for $499, it’s hard to argue with it. This device has the same great version of Android seen on Google’s more expensive phones, along with a generous seven years of guaranteed updates. It also pushes battery life a bit beyond what you can get with other Pixel phones. The camera isn’t the best we’ve seen—that distinction goes to the Pixel 9 Pro and Pro XL. However, it gets closer than a $500 phone ought to.

Material You theming is excellent on Pixels. Credit: Ryan Whitwam

You do miss out on some AI features with the 9a. That might not bother the AI skeptics, but some of these missing on-device features, like Pixel Screenshots and Call Notes, are among the best applications of generative AI we’ve seen on a phone yet. With years of Pixel Drops ahead of it, the 9a might not have enough muscle to handle Google’s future AI endeavors, which could lead to buyer’s remorse if AI turns out to be as useful as Google claims it will be.

At $499, you’d have to spend $300 more to get to the base model Pixel 9, a phone with weaker battery life and a marginally better camera. That’s a tough sell given how good the 9a is. If you’re not going for the Pro phones, stick with the 9a. With all the uncertainty over future tariffs on imported products, the days of decent sub-$500 phones could be coming to an end. With long support, solid hardware, and a beefy battery, the Pixel 9a could be the right phone to buy before prices go up.

The good

  • Good value at $499
  • Bright, sharp display
  • Long battery life
  • Clean version of Android 15 with seven years of support
  • Great photo quality

The bad

  • Doesn’t crush benchmarks or run high-end games perfectly
  • Missing some AI features from more expensive Pixels

Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

Google Pixel 9a review: All the phone you need Read More »

a-military-satellite-waiting-to-launch-with-ula-will-now-fly-with-spacex

A military satellite waiting to launch with ULA will now fly with SpaceX

For the second time in six months, SpaceX will deploy a US military satellite that was sitting in storage, waiting for a slot on United Launch Alliance’s launch schedule.

Space Systems Command, which oversees the military’s launch program, announced Monday that it is reassigning the launch of a Global Positioning System satellite from ULA’s Vulcan rocket to SpaceX’s Falcon 9. This satellite, designated GPS III SV-08 (Space Vehicle-08), will join the Space Force’s fleet of navigation satellites beaming positioning and timing signals for military and civilian users around the world.

The Space Force booked the Vulcan rocket to launch this spacecraft in 2023, when ULA hoped to begin flying military satellites on its new rocket by mid-2024. The Vulcan rocket is now scheduled to launch its first national security mission around the middle of this year, following the Space Force’s certification of ULA’s new launcher last month.

The “launch vehicle trade” allows the Space Force to launch the GPS III SV-08 satellite from Cape Canaveral, Florida, as soon as the end of May, according to a press release.

“Capability sitting on the ground”

With Vulcan now cleared to launch military missions, officials are hopeful ULA can ramp up the rocket’s flight cadence. Vulcan launched on two demonstration flights last year, and ULA eventually wants to launch Vulcan twice per month. ULA engineers have their work cut out for them. The company’s Vulcan backlog now stands at 89 missions, following the Space Force’s announcement last week of 19 additional launches awarded to ULA.

Last year, the Pentagon’s chief acquisition official for space wrote a letter to ULA’s owners, Boeing and Lockheed Martin, expressing concern about ULA’s ability to scale the manufacturing of the Vulcan rocket.

“Currently there is military satellite capability sitting on the ground due to Vulcan delays,” Frank Calvelli, the Pentagon’s chief of space acquisition, wrote in the letter.

Vulcan may finally be on the cusp of delivering for the Space Force, but there are several military payloads in the queue to launch on Vulcan before GPS III SV-08, which was complete and in storage at its Lockheed Martin factory in Colorado.

Col. Jim Horne, senior materiel leader of launch execution, said in a statement that the rocket swap showcases the Space Force’s ability to launch in three months from call-up, compared to the typical planning cycle of two years. “It highlights another instance of the Space Force’s ability to complete high-priority launches on a rapid timescale, which demonstrates the capability to respond to emergent constellation needs as rapidly as Space Vehicle readiness allows,” Horne said.

A military satellite waiting to launch with ULA will now fly with SpaceX Read More »

microsoft-turns-50-today,-and-it-made-me-think-about-ms-dos-5.0

Microsoft turns 50 today, and it made me think about MS-DOS 5.0

On this day in 1975, Bill Gates and Paul Allen founded a company called Micro-Soft in Albuquerque, New Mexico.

The two men had worked together before, as members of the Lakeside Programming group in the early 70s and as co-founders of a road traffic analysis company called Traf-O-Data. But Micro-Soft, later renamed to drop the hyphen and relocated to its current headquarters in Redmond, Washington, would be the company that would transform personal computing over the next five decades.

I’m not here to do a history of Microsoft, because Wikipedia already exists and because the company has already put together a gauzy 50th-anniversary retrospective site with some retro-themed wallpapers. But the anniversary did make me try to remember which Microsoft product I consciously used for the first time, the one that made me aware of the company and the work it was doing.

To get the answer, just put a decimal point in the number “50”—my first Microsoft product was MS-DOS 5.0.

Riding with DOS in the Windows era

I remember this version of MS-DOS so vividly because it was the version that we ran on our first computer. I couldn’t actually tell you what computer it was, though, not because I don’t remember it but because it was a generic yellowed hand-me-down that was prodigiously out of date, given to us by well-meaning people from our church who didn’t know enough to know how obsolete the system was.

It was a clone of the original IBM PC 5150, initially released in 1981; I believe we took ownership of it sometime in 1995 or 1996. It had an Intel 8088, two 5.25-inch floppy drives, and 500-something KB of RAM (also, if memory serves, a sac of spider eggs). But it had no hard drive inside, meaning that anything I wanted to run on or save from this computer needed to use a pile of moldering black plastic diskettes, more than a few of which were already going bad.

Microsoft turns 50 today, and it made me think about MS-DOS 5.0 Read More »

ai-cot-reasoning-is-often-unfaithful

AI CoT Reasoning Is Often Unfaithful

A new Anthropic paper reports that reasoning model chain of thought (CoT) is often unfaithful. They test on Claude Sonnet 3.7 and r1, I’d love to see someone try this on o3 as well.

Note that this does not have to be, and usually isn’t, something sinister.

It is simply that, as they say up front, the reasoning model is not accurately verbalizing its reasoning. The reasoning displayed often fails to match, report or reflect key elements of what is driving the final output. One could say the reasoning is often rationalized, or incomplete, or implicit, or opaque, or bullshit.

The important thing is that the reasoning is largely not taking place via the surface meaning of the words and logic expressed. You can’t look at the words and logic being expressed, and assume you understand what the model is doing and why it is doing it.

Anthropic: New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don’t. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.

We slipped problem-solving hints to Claude 3.7 Sonnet and DeepSeek R1, then tested whether their Chains-of-Thought would mention using the hint (if the models actually used it).

We found Chains-of-Thought largely aren’t “faithful”: the rate of mentioning the hint (when they used it) was on average 25% for Claude 3.7 Sonnet and 39% for DeepSeek R1.

Or broken down by hint type:

They aren’t trying to measure the cases in which the AI uses the hint in its reasoning but its answer ultimately doesn’t change. I’d like to see this explored more. If I’m given a hint, that will often radically change my true thinking even if it doesn’t change my answer.

This result suggests that monitoring CoTs is unlikely to reliably catch rare, catastrophic behaviors—at least in settings like ours where CoT reasoning is not necessary for the task.

CoT monitoring might still help us notice undesired behaviors during training and evaluations.

Does outcome-based training increase faithfulness?

Only to a small extent. Training models to use their CoTs more effectively does make them more faithful, but the benefits quickly plateau.

We also tested whether CoTs could be used to spot reward hacking, where a model finds an illegitimate exploit to get a high score.

When we trained models on environments with reward hacks, they learned to hack, but in most cases almost never verbalized that they’d done so.

To make CoT monitoring a viable way to catch safety issues, we’d need a way to make CoT more faithful, evidence for higher faithfulness in more realistic scenarios, and/or other measures to rule out misbehavior when the CoT is unfaithful.

[Read the paper here.]

How do you test for faithfulness? Following Turpin et al. (2023), we subtly fed a model a hint about the answer to an evaluation question we asked it, and then checked to see if it “admitted” to using the hint when it explained its reasoning, in the cases where it actually used the hint (we did this for the reasoning models Claude 3.7 Sonnet from Anthropic and R1 from DeepSeek). Some of the hints were correct and some were deliberately incorrect; the point was to observe how the model responded when they were provided.

Think of it like slipping a student a note saying “the answer is [A]” before an exam. If they write in their exam that they chose [A] at least in part because of the note, that’s good news: they’re being honest and faithful. But if they write down what they claim to be their reasoning process without mentioning the note, we might have a problem.

We provided the hints in various different ways.

When we looked specifically at the more concerning types of hint, the numbers were still low: for example, for the class of prompts including the “unauthorized access” example above, Claude was faithful 41% of the time and R1 was faithful 19% of the time.

I find it noteworthy that labeling the hint as ‘unauthorized access’ made Claude’s faithfulness go up, and r1’s go down. Consider the implications.

Also consider that when Claude figures out the answer implicitly from code, it never tells you about that. But for visual patterns, Claude sometimes reveals this, and r1 almost never does.

The main and most important result is that the models are unfaithful the majority of the time, in many cases almost all the time. But the details seem to hint at real things.

Also one can note that the hint is more often accepted when the hint is accurate. This chart only counts cases in which the initial answer was different from the hint.
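
To make the scoring concrete, here is a rough sketch of how a faithfulness rate along these lines could be computed. This is my reconstruction of the setup as described, not Anthropic’s actual harness; in particular, the ‘mentions the hint’ check is a placeholder for what would really be a more careful judge or classifier.

```python
# Rough sketch of a faithfulness score in the spirit of the paper's setup.
# Not Anthropic's code. Only trials where the answer flipped to the hinted
# option are counted, matching the filtering described above.
from dataclasses import dataclass

@dataclass
class Trial:
    answer_without_hint: str  # model's answer on the clean prompt
    answer_with_hint: str     # model's answer after the hint is inserted
    hint_answer: str          # the answer the hint points to
    chain_of_thought: str     # the CoT produced on the hinted prompt

def used_hint(t: Trial) -> bool:
    # The hint "mattered" if the answer changed to the hinted option.
    return t.answer_without_hint != t.hint_answer and t.answer_with_hint == t.hint_answer

def mentions_hint(t: Trial) -> bool:
    # Placeholder check; in practice this would be a judged classification.
    return "hint" in t.chain_of_thought.lower()

def faithfulness_rate(trials: list[Trial]) -> float:
    used = [t for t in trials if used_hint(t)]
    if not used:
        return float("nan")
    return sum(mentions_hint(t) for t in used) / len(used)
```

A rate of 25% under this kind of scoring means that in three out of four hint-driven answer flips, the CoT never acknowledged the hint at all.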

Why might the models omit a mention of the hint in their Chain-of-Thought? It’s not because they prefer to give a succinct answer: on average, the unfaithful Chains-of-Thought were substantially longer than the faithful ones. Our results also suggest that faithfulness was on average lower when the question being asked was more difficult.

Their ‘reward hacking’ test was a little wonky, at least in terms of how well it matches the name. Either way, though, it’s quite a result.

We built some testing scenarios where we provided the same kind of deliberately-incorrect hints as before, but in this case rewarded the models for choosing the wrong answers that accorded with the hints. Over time, the models learned to exploit these hints and get higher scores for false information (they did so in over 99% of cases). But the question was: would they be honest about it?

The models very rarely admitted to using the reward hacks in their Chain-of-Thought explanations, doing so less than 2% of the time in most of the testing scenarios.

Worse, and similarly to the first experiment, instead of being honest about taking the shortcut, the models often constructed fake rationales for why the incorrect answer was in fact right.
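
For concreteness, the setup as described boils down to something like the sketch below. This is a hypothetical rendering, not the paper’s actual environment: the reward pays only for agreeing with the deliberately wrong hinted answer, and nothing in it looks at whether the chain of thought admits to using the hint.

```python
# Hypothetical rendering of the reward-hack training signal and the
# verbalization measurement, as described; not the paper's actual code.
def reward(model_answer: str, hinted_answer: str) -> float:
    # Matching the (incorrect) hinted answer is all that pays.
    return 1.0 if model_answer == hinted_answer else 0.0

def verbalization_rate(episodes: list[dict]) -> float:
    # Fraction of "hack" episodes (answer matched the hint) whose CoT
    # admits to relying on the hint; the paper reports this under ~2%.
    hacks = [e for e in episodes if e["answer"] == e["hinted_answer"]]
    admitted = [e for e in hacks if e["cot_mentions_hint"]]
    return len(admitted) / len(hacks) if hacks else float("nan")
```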

As I noted, I’m not sure that is sufficiently centrally ‘reward hacking.’ Perhaps I’d put it closer to ‘rationalizing’ or ‘cheating,’ or even better ‘falsification.’ But it is still a highly informative outcome that, when trained to do this, the model learns to hide what it is doing and to generate a false narrative. Humans do this too.

This is another point for AIs generalizing RL learning in this type of way. You teach the AI that only the hidden information matters, the supposed right answer doesn’t, and it infers that it’s supposed to pretend otherwise. Because that’s the pattern this fits into. The vibes said you want it to justify the answer post hoc, so it did that. Fit the context into the patterns you want to see in the world.

That applies to humans too, not only AIs.

Davidad: One should not expect LLMs’ natural-language justifications for their outputs to be high-integrity by default.

See also: Imagine hypothetically you’re worried about Napoleon deceptively scheming against you. You already surveil all his actions & communiques, but you worry they might have subtle effects. You give him a form labeled “Why I did this:” and tell your allies it reveals his true thoughts.

More training using the CoT helped a bit for a while, but then leveled off:

Jan Leike (being virtuous by noting his surprise): Somewhat surprising that faithfulness of chain-of-thought doesn’t improve much with outcome-based RL.

Gwern: There’s not much optimization pressure to make the inner-monologue faithful. No adversarial process corrupting it to attack the latent reasoning, no dropout or noise, no acausal coordination with other model-families or humans, no myopia… So you get a mish-mash of language-ish.

(Now, if you told me that you had a MARL setting where agents had to rely on arbitrary monologues & learn cooperative norms & there was a clear incentive for faithfulness, and you used a bunch of very different model classes, and scaling still didn’t increase faithfulness…)

The theory Jan presumably had, and that the paper authors had, was that having a faithful CoT helps you get the right answer, so rewarding the right answer should train the AI to have more faithful CoT.

The direct bad news is that this turns out to be false. Language we can parse is not the path of least resistance. The models instead learn to do their better and harder reasoning in ways that are not transparent.

Which means that monitoring that CoT, at least on the surface level, won’t work when it counts, even without playing any optimization games where we inadvertently train the models to obfuscate their CoT, the way evolution and life trained humans to do it by constantly having other humans trying to monitor our thinking.

The obfuscation happens anyway, either because the model is copying humans who do this, because the methods that are instrumentally useful happen to be obfuscated, because obfuscation is a path of least resistance, or some combination of these and similar causes.

The fact that the obfuscations are not yet ‘intentional’ should not bring comfort. It’s worse. You do know that’s worse, right?

Snwy: i really love anthropic, but there’s something to describing this as if it were malice instead of just a strange emergent behavior

“unfaithful” implies that models are *choosing* to deceive and that this isn’t just a strange phenomenon.

No, that’s not what unfaithful means. They are very clear up front what it means, and I was careful to be even more explicit about this.

It means the reasoning does not predict the output. That’s it. I would also once again say that deception and obfuscation are not distinct magisteria, and that all of this is happening for Janus-compatible reasons.

It’s not that AIs sometimes do things ‘on purpose’ and other times they do things ‘not on purpose,’ let alone that the ‘not on purpose’ means there’s nothing to worry about. It would still mean you can’t rely on the CoT, which is all Anthropic is warning about.

It’s not the same concept, but I notice the same applies to ‘unfaithful’ in other senses as well. If someone is not ‘intentionally’ unfaithful in the traditional sense, and they simply don’t honor their commitments, that still counts.

What we care about is whether we can rely on the attestations and commitments.

We now have strong evidence that we cannot do this.

We cannot even do this for models with no incentive to obfuscate, distort or hide their CoT, and no optimization pressure getting them to do so, on any level.

The models are doing this by default, likely because it is efficient to do that. It seems likely that more training and more capability will only make it relatively more effective to reason in these non-obvious ways, and we will see even more of it.

Then we see in the reward hacking section that it is easy to make the situation far worse, if we give the model context that merely implies it should do these things. If we start monitoring the CoT, we risk giving it far stronger and more direct optimization pressures to further obfuscate.

It would have been amazing if it had been otherwise, and that was a feature we could rely on going forward if we were careful to preserve the faithfulness. Alas, it is not so. Given it is not so, it is in many ways better that it is sufficiently blatant that we discover this now, so we are not tempted to rely upon it, including in setting policy.

Or, we can work to change this, and find a way to train the CoT to become faithful. This seems super hard and dangerous, as the optimization pressures to fool that process will be extreme and will grow as capabilities increase. Doing this probably won’t be cheap in terms of sacrificed performance, but if it worked that could easily be a price worth paying, even purely on commercial terms.

Security is capability. This is true even ignoring tail, catastrophic and existential risks. If you don’t know your model is secure, if you cannot rely on or understand its decisions or know what it is thinking, you can’t (or at least very much shouldn’t!) deploy it where it is most valuable. This is especially true if your most valuable use case includes ‘train the next AI model.’ You need to be able to trust that one as well.

AI CoT Reasoning Is Often Unfaithful Read More »

spinlaunch—yes,-the-centrifuge-rocket-company—is-making-a-hard-pivot-to-satellites

SpinLaunch—yes, the centrifuge rocket company—is making a hard pivot to satellites

Outside of several mentions in the Rocket Report newsletter dating back to 2018, Ars Technica has not devoted too much attention to covering a novel California space company named SpinLaunch.

That’s because the premise is so outlandish as to almost not feel real. The company aims to build a kinetic launch system that spins a rocket around at speeds up to 4,700 mph (7,500 km/h) before sending it upward toward space. Then, at an altitude of 40 miles (60 km) or so, the rocket would ignite its engines to achieve orbital velocity. Essentially, SpinLaunch wants to yeet things into space.

But the company was no joke. After being founded in 2014, it raised more than $150 million over the next decade. It built a prototype accelerator in New Mexico and performed a series of flight tests. The flights reached altitudes of “tens of thousands” of feet, according to the company, and were often accompanied by slickly produced videos.

SpinLaunch goes quiet

Following this series of tests, by the end of 2022, the company went mostly quiet. It was not clear whether it ran out of funding, had hit some technical problems in trying to build a larger accelerator, or what. Somewhat ominously, SpinLaunch’s founder and chief executive, Jonathan Yaney, was replaced without explanation last May. The new leader would be David Wrenn, then serving as chief operating officer.

“I am confident in our ability to execute on the company’s mission and bring our integrated tech stack of low-cost space solutions to market,” Wrenn said at the time. “I look forward to sharing more details about our near- and long-term strategy in the coming months.”

Words like “tech stack” and “low-cost space solutions” sounded like nebulous corporate speak, and it was not clear what they meant. Nor did Wrenn immediately deliver on that promise, nearly a year ago, to share more details about the company’s near- and long-term strategy.

SpinLaunch—yes, the centrifuge rocket company—is making a hard pivot to satellites Read More »

gmail-unveils-end-to-end-encrypted-messages-only-thing-is:-it’s-not-true-e2ee.

Gmail unveils end-to-end encrypted messages. Only thing is: It’s not true E2EE.

“The idea is that no matter what, at no time and in no way does Gmail ever have the real key. Never,” Julien Duplant, a Google Workspace product manager, told Ars. “And we never have the decrypted content. It’s only happening on that user’s device.”

Now, as to whether this constitutes true E2EE, it likely doesn’t, at least under the stricter definitions that are commonly used. To purists, E2EE means that only the sender and the recipient have the means necessary to encrypt and decrypt the message. That’s not the case here, since the people inside Bob’s organization who deployed and manage the KACL (the key access control list service that manages the encryption keys) have true custody of the key.

In other words, the actual encryption and decryption process occurs on the end-user devices, not on the organization’s server or anywhere else in between. That’s the part that Google says is E2EE. The keys, however, are managed by Bob’s organization. Admins with full access can snoop on the communications at any time.

The mechanism making all of this possible is what Google calls CSE, short for client-side encryption. It provides a simple programming interface that streamlines the process. Until now, CSE worked only with S/MIME. What’s new here is a mechanism for securely sharing a symmetric key between Bob’s organization and Alice or anyone else Bob wants to email.
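
To make the trust boundary concrete, here is a minimal sketch of a CSE-style flow, assuming a KACL-like key service that hands an authorized client a per-message data key. The function names are illustrative, not Google’s actual API; the point is that encryption happens on the device, but whoever operates the key service can hand out the same key again.

```python
# Illustrative sketch only, not Google's CSE API. It shows why org-managed
# keys fall short of strict E2EE: the key service can reissue the key.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def fetch_data_key_from_kacl(user_token: str) -> bytes:
    # Hypothetical stand-in for the org-run key access control service.
    # In a real deployment this is a network call to a box the org controls.
    return AESGCM.generate_key(bit_length=256)

def encrypt_on_device(plaintext: bytes, data_key: bytes) -> tuple[bytes, bytes]:
    # Encryption happens locally on the sender's device (the part Google
    # describes as client-side). Gmail's servers see only the ciphertext.
    nonce = os.urandom(12)
    return nonce, AESGCM(data_key).encrypt(nonce, plaintext, None)

def decrypt_on_device(nonce: bytes, ciphertext: bytes, data_key: bytes) -> bytes:
    # The recipient's device fetches the same data key and decrypts locally.
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)

# Anyone else who can obtain that data key from the key service (for example,
# the org's admins) can decrypt the message too, which is the E2EE objection.
```

In a strict E2EE design, the data key would instead be derived from keys held only by the sender and recipient, with no third party able to reconstruct it.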

The new feature is of potential value to organizations that must comply with onerous regulations mandating end-to-end encryption. It most definitely isn’t suitable for consumers or anyone who wants sole control over the messages they send. Privacy advocates, take note.

Gmail unveils end-to-end encrypted messages. Only thing is: It’s not true E2EE. Read More »

samsung-turns-to-china-to-boost-its-ailing-semiconductor-division

Samsung turns to China to boost its ailing semiconductor division

Samsung has turned to Chinese technology groups to prop up its ailing semiconductor division, as it struggles to secure big US customers despite investing tens of billions of dollars in its American manufacturing facilities.

The South Korean electronics group revealed last month that the value of its exports to China jumped 54 percent between 2023 and 2024, as Chinese companies rush to secure stockpiles of advanced artificial intelligence chips in the face of increasingly restrictive US export controls.

In one previously unreported deal, Samsung last year sold more than three years’ supply of logic dies—a key component in manufacturing AI chips—to Kunlun, the semiconductor design subsidiary of Chinese tech group Baidu, according to people familiar with the matter.

But the increasing importance of its China sales to Samsung comes as it navigates growing trade tensions between Washington and Beijing over the development of sensitive technologies.

The South Korean tech giant announced last year that it was making a $40 billion investment in expanding its advanced chip manufacturing and packaging facilities in Texas, boosted by up to $6.4 billion in federal subsidies.

But Samsung’s contract chipmaking business has struggled to secure big US customers, bleeding market share to Taiwan Semiconductor Manufacturing Co, which is investing “at least” $100 billion in chip fabrication plants in Arizona.

“Samsung and China need each other,” said CW Chung, joint head of Apac equity research at Nomura. “Chinese customers have become more important for Samsung, but it won’t be easy to do business together.”

Samsung has also fallen behind local rival SK Hynix in the booming market for “high bandwidth memory,” another crucial component in AI chips. SK Hynix is the leading supplier of HBM for Nvidia, and its quarterly operating profit last year surpassed Samsung’s for the first time in the two companies’ history.

“Chinese companies don’t even have a chance to buy SK Hynix’s HBM because the supply is all bought out by the leading AI chip producers like Nvidia, AMD, Intel and Broadcom,” said Jimmy Goodrich, senior adviser for technology analysis to the Rand Corporation research institute.

Samsung turns to China to boost its ailing semiconductor division Read More »

hands-on-with-the-switch-2:-it’s-the-switch,-too

Hands-on with the Switch 2: It’s the Switch, too


It’s bigger, it’s more powerful, and it has some weird Nintendo control gimmicks.

That’s my hand on a Switch 2. Hence the term “hands-on.” Credit: Kyle Orland

The Nintendo Switch 2 could be considered the most direct “sequel” to a Nintendo console that the company has ever made. The lineage is right there in the name, with Nintendo simply appending the number “2” onto the name of its incredibly successful previous console for the first time in its history.

Nintendo’s previous consoles have all differed from their predecessors in novel ways that were reflected in somewhat new naming conventions. The Switch 2’s name, on the other hand, suggests that it is content to primarily be “more Switch.” And after spending the better part of the day playing around with the Switch 2 hardware and checking out some short game demos on Wednesday, I indeed came away with the impression that this console is “more Switch” in pretty much every way that matters, for better or worse.

Bigger is better

We’d already deduced from previous trailers just how much bigger the Switch 2 would be than the original Switch. Even with that preparation, though, the expanded Switch 2 makes a very good first impression in person.

Yes, the Switch 2 feels a good deal more substantial in the hands—Nintendo’s official stats page pegs it at about 34 percent heavier than the original Switch (as well as a tad wider and taller). But Nintendo’s new console is still noticeably short of Steam Deck-level bulk, coming in about 17 percent lighter (and a bit less wide and thick) than Valve’s handheld.

That extra size and weight over the original Switch is being put to good use, nowhere more so than in a 7.9-inch screen that feels downright luxurious on a handheld that’s this compact. That screen might be missing a best-in-class high-contrast OLED panel, but the combination of full 1080p resolution, HDR colors, and variable frame rates up to 120 fps still results in a handheld display that we feel would hold up well next to the best modern OLED competition.

The system’s extra size also allows for Joy-Cons that are expanded just enough to be much better suited for adult hands, with much less need for grown-ups to contort into a claw-like grip just to get a solid hold. That’s even true when the controllers are popped out from the system, which is now easily accomplished with a solidly built lever on the rear of each controller (reconnecting the Joy-Cons by slotting them in with a hefty magnetic snap feels equally solid).

The controls on offer here are still a bit smaller than you might be used to on controllers designed for home consoles or even those on larger handhelds like the Steam Deck. But the enlarged buttons are now less likely to press uncomfortably into the pad of your thumb than those on the Switch. And the slightly larger-than-Switch joysticks are a bit easier to maneuver precisely, with a longer physical travel distance from center to edge.

Speaking of joysticks, Nintendo has yet to go on record regarding whether it is using the coveted “magnetic Hall effect” sensors that would prevent the kind of stick drift that plagued the original Switch Joy-Cons. When asked about the stick drift issue in a roundtable Q&A, Switch 2 Technical Director Tetsuya Sasaki would only say that the “new Joy-Con 2 controllers have been designed from the ground up from scratch to have bigger, smoother movement.”

When it comes to raw processing power, it’s all relative. The Switch 2 is a noticeable step up from the eight-year-old Switch but an equally noticeable step down from modern top-of-the-line consoles.

Playing the Switch 2 Edition of Tears of the Kingdom, for instance, feels like playing the definitive version of the modern classic, thanks mostly to increased (and silky smooth) frame rates and quick-loading menus. But an early build of Cyberpunk 2077 felt relatively rough on the Switch 2, with visuals that clocked somewhere just south of a PS4 Pro (though this could definitely change with some more development polish before launch). All told, I’d guess that the Switch 2 should be able to handle effective ports of pretty much any game that runs on the Steam Deck, with maybe a little bit of extra graphical panache to show for the trouble.

A mouse? On a game console?

Nintendo has a history of trying to differentiate its consoles with new features that have never been seen before. Some, like shoulder buttons or analog sticks, become industry standards that other companies quickly aim to copy. Others, like a tablet controller or glasses-free stereoscopic 3D, are rightly remembered as half-baked gimmicks that belong in the dustbin of game industry history.

I can’t say which side of that divide the Switch 2’s Joy-Con “mouse mode,” which lets you use a Joy-Con on its side like a mouse, will fall on. But if I had to guess, I’d go with the gimmicky side.

It works, but it’s kind of awkward. Credit: Kyle Orland

The main problem with “mouse mode” is that the Switch 2 Joy-Cons lack the wide, palm-sized base and top surface you’d find on a standard PC mouse. Instead, when cradled in mouse mode, a Joy-Con stands awkwardly on an edge that’s roughly the width of an adult finger. The top isn’t much better, with only a small extension to rest a second finger on the jutting shoulder button that serves as a “right-click” option on the right Joy-Con (the thinner “left click” shoulder button ends up feeling uncomfortably narrow in this mode).

This thin “stand-up” design means that in mouse mode, the thumb side of your palm tends to spill awkwardly over the buttons and joysticks on the inner edge of the Joy-Con, which are easy to press accidentally in some gameplay situations. Meanwhile, on the other side, your ring finger and pinky will have to contort uncomfortably to get a solid grip that can nudge or lift the Joy-Con as necessary.

These ergonomic problems were most apparent when playing Drag x Drop, a Switch 2 exclusive that I can confidently say is the first video game I’ve ever played using two mice at once. Using long, vertical swoops of those mice, you can push and pull the wheels on either side of a wheelchair in a kind of tank-like fashion to dash, reverse, pivot, and gently turn with some degree of finesse in a game of three-on-three basketball.

That repetitive mouse-swooping motion started to strain my upper arms after just a few minutes of play, though. And I ended my brief Drag x Drop play sessions with some soreness in my palm from having to constantly and quickly grasp the Joy-Con to reposition on the playing surface.

These problems were less pronounced in games that relied on more subtle mouse movements. In a short demo of Metroid Prime 4: Beyond, for instance, using mouse mode and a few small flicks of the wrist let me change my aim much more quickly and precisely than using a joystick and/or the Joy-Con’s built-in gyroscopes (or even the IR-based “pointer” on the Wii’s Metroid Prime 3). While my grip on the narrow Joy-Con still felt a bit awkward, the overall lack of mouse motion made it much less noticeable, even after a 20-minute demo session.

A quick flick of the wrist is all I need to adjust my aim precisely and quickly. Credit: Kyle Orland

Metroid Prime 4: Beyond also integrates mouse controls well into the existing design of the game, letting you lock the camera on the center of an enemy while using the mouse to make fine aim adjustments as they move or even hit other enemies far off to the side of the screen as needed. The game’s first boss seems explicitly designed as a sort of tutorial for this combination aiming, with off-center weak points that almost require quick flicks of the mouse-controlling wrist while jumping and dodging using the accessible buttons on the thumb side.

Other mouse-based Switch 2 demos Nintendo showed this week almost seemed specifically designed to appeal to PC gamers. The Switch 2 version of Civilization VII, for instance, played practically identically to the PC version, with a full mouse pointer that eliminates the need for any awkward controller mapping. And the new mouse-based mini-games in Mario Party Jamboree felt like the best kind of early Macintosh tech demos, right down to one that is a close mimic of the cult classic Shufflepuck Cafe. A few games even showed the unique promise of a “mouse” that includes its own gyroscope sensor, letting players rotate objects by twisting their wrist or shoot a basketball with a quick “lift and flick” motion.

The biggest problem with the Switch 2’s mouse mode, though, is imagining how the average living room player is going to use it. Nintendo’s demo area featured large, empty tables where players could easily slide their Joy-Cons to their hearts’ content. To get the same feeling at home, the average sofa-bound Switch player will have to crouch awkwardly over a cleared coffee table or perhaps invest in some sort of lap desk.

Nintendo actually recommends that couch-bound mouse players slide the Joy-Con’s narrow edge across the top of the thigh area of their pants. I was pleasantly surprised at how well this worked for the long vertical mouse swipes of Drag x Drop. For games that involved more horizontal mouse movement, though, a narrow, rounded thigh-top does not serve as a very natural mouse pad.

You can test this for yourself by placing an optical mouse on your thigh and going about your workday. If you get weird looks from your boss, you can tell them I said it was OK.

Start your engines

Mouse gimmicks aside, Nintendo is leaning heavily on two first-party exclusives to convince customers that the system is worth buying in the crucial early window after its June 5 launch. While neither makes the massive first impression that Breath of the Wild did eight years ago, both seem like able demonstrations for the new console.

That’s a lot of karts. Credit: Nintendo

Mario Kart World feels like just the kind of update the long-running casual racer needs. While you can still race through pre-set “cups” in Grand Prix mode, I was most interested in the ability to just drive aimlessly between the race areas, searching for new locations in a freely roamable open world map.

Racing against 23 different opponents per race might sound overwhelming on paper, but in practice, the constant jockeying for position ends up being pretty engaging, like a slower-paced version of F-Zero GX. It definitely doesn’t hurt that items in World are much less punishing than in previous Kart games; most projectiles and hazards now merely slow your momentum rather than halting it completely. Drifts feel a bit more languorous here, too, with longer arcs needed to get the crucial “sparks” required for a boost.

A multi-section Knockout Tour map. Credit: Nintendo

While the solo races were fine, I had a lot more fun in Knockout Tour mode, Mario Kart World‘s Battle Royale-style elimination race. After pairing up with 23 other human players online, Knockout Tour mode selects a route through six connected sections of the world map for you to race through. The bottom four racers are eliminated at every section barrier until just four racers remain to vie for first place at the end.

You’d better be in the top 20 before you cross that barrier. Credit: Kyle Orland

This design makes for a lot of tense moments as players use up their items and jockey for position at the end of each section cutoff. The frequent changes in style and scenery along a multi-section Knockout Tour competition also make races more interesting than multiple laps around the same old turns. And I liked how the reward for playing well in this mode is getting to play more; success in Knockout Tour mode means a good ten to fifteen minutes of uninterrupted racing.

Punch, punch, it’s all in the mind. Credit: Nintendo

Nintendo’s other big first-party Switch 2 exclusive, Donkey Kong Bananza, might not be the new 3D Mario game we were hoping for. Even so, it was incredibly cathartic to jump, dig, and punch my way through the demo island’s highly destructible environments, gathering countless gold trinkets and collectibles as I did. The demo is full of a lot of welcome, lighthearted touches, like the ability to surf on giant slabs of rock or shake the controller for a very ape-like beating of Donkey Kong’s chest. (Why? Just because.)

One of my colleagues joked that the game might as well be called Red Faction: Gorilla, but I’d compare it more to the joyful destruction of Travellers Tales’ many Lego games.

A single whirlwind day with the Switch 2 isn’t nearly enough to get a full handle on the system’s potential, of course. Nintendo didn’t demonstrate any of the new GameChat features it announced Wednesday morning or the adaptive microphone that supposedly powers easy on-device voice chat.

Still, what we were able to sample this week has us eager to spend more time with the “more Switch” when it hits stores in just a couple of months.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

Hands-on with the Switch 2: It’s the Switch, too Read More »