Author name: Mike M.

Cupra is all about affordable cars, funky styling, electrified performance

“So we are part of Volkswagen Group. We have factories all across the whole planet. We have Mexican factories. We have US factories. Even Volkswagen Group is ramping up additional factories in the United States. We have European factories,” Schuwirth said.

The original plan was to import one model from Mexico and one model from Europe, but now “I think the only mantra for the future is we need to remain flexible because no one knows what is slightly changing, whether we like it or we don’t like it. I mean, we cannot influence it, but it’s not changing our plan overall,” he said.

When Cupra does arrive in the US, it won’t be with the models that are finding friends in Europe. The Formentor is a rather cool little crossover/hatchback, available with either a 48 V mild hybrid (starting at under $32,000, or 28,000 euros) or a plug-in hybrid (starting at under $49,000, or 43,000 euros) powertrain.

It uses VW Group’s ubiquitous MQB platform, and the driving experience is midway between a GTI-badged VW and one of Audi’s S models. But the interior is a much more interesting place to be than in either an Audi or a VW, with details like full carbon fiber seatbacks and a matte paint that drew plenty of attention in a city with outré automotive tastes.

But Cupra reckons the Formentor is too small for US car buyers, and that’s a pretty safe bet. That also means you can forget about the Cupra Born EV coming here. I didn’t drive Cupra’s Terramar but probably should have; this is an SUV that is about as small as Cupra thinks will sell in the US.

Did you say new customers?

Cupra’s plan does not include stealing customers from existing VW brands, whose buyers are in their 50s on average; Cupra is targeting a demographic that’s about a decade younger. The aforementioned focus on design is one way it’s going about attracting those new customers. The company is based in Barcelona, one of the more design-focused cities in the world, and it’s leaning into that, teaming up with local designers in cities where it maintains one of its “brand houses.”

Cupra is all about affordable cars, funky styling, electrified performance Read More »

There’s a secret reason the Space Force is delaying the next Atlas V launch


The Space Force is looking for responsive launch. This week, they’re the unresponsive ones.

Pushed by trackmobile railcar movers, the Atlas V rocket rolled to the launch pad last week with a full load of 27 satellites for Amazon’s Kuiper Internet megaconstellation. Credit: United Launch Alliance

Last week, the first operational satellites for Amazon’s Project Kuiper broadband network were minutes from launch at Cape Canaveral Space Force Station, Florida.

These spacecraft, buttoned up on top of a United Launch Alliance Atlas V rocket, are the first of more than 3,200 mass-produced satellites Amazon plans to launch over the rest of the decade to deploy the first direct US competitor to SpaceX’s Starlink Internet network.

However, as is often the case on Florida’s Space Coast, bad weather prevented the satellites from launching April 9. No big deal, right? Anyone who pays close attention to the launch industry knows delays are part of the business. A broken component on the rocket, a summertime thunderstorm, or high winds can thwart a launch attempt. Launch companies know this, and the answer is usually to try again the next day.

But something unusual happened when ULA scrubbed the countdown last Wednesday. ULA’s launch director, Eric Richards, instructed his team to “proceed with preparations for an extended turnaround.” This meant ULA would have to wait more than 24 hours for the next Atlas V launch attempt.

But why?

At first, there seemed to be a good explanation for the extended turnaround. SpaceX was preparing to launch a set of Starlink satellites on a Falcon 9 rocket around the same time as Atlas V’s launch window the next day. The Space Force’s Eastern Range manages scheduling for all launches at Cape Canaveral and typically operates on a first-come, first-served basis.

The Space Force accommodated 93 launches on the Eastern Range last year—sometimes on the same day—an annual record that military officials are quite proud of achieving. This is nearly six times the number of launches from Cape Canaveral in 2014, a growth rate primarily driven by SpaceX. In previous interviews, Space Force officials have emphasized their eagerness to support more commercial launches. “How do we get to yes?” is often what range officials ask themselves when a launch provider submits a scheduling request.

It wouldn’t have been surprising for SpaceX to get priority on the range schedule since it had already reserved the launch window with the Space Force for April 10. SpaceX subsequently delayed this particular Starlink launch for two days until it finally launched on Saturday evening, April 12. Another SpaceX Starlink mission launched Monday morning.

There are several puzzling things about what happened last week. When SpaceX missed its reservation on the range twice in two days, April 10 and 11, why didn’t ULA move back to the front of the line?

ULA, which is usually fairly transparent about its reasons for launch scrubs, didn’t disclose any technical problems with the rocket that would have prevented another launch attempt. ULA offers access to listen to the launch team’s audio channel during the countdown, and engineers were not discussing any significant technical issues.

The company’s official statement after the scrub said: “A new launch date will be announced when approved on the range.”

Also, why can’t ULA make another run at launching the Kuiper mission this week? The answer to that question is also a mystery, but we have some educated speculation.

Changes in attitudes

A few days ago, SpaceX postponed one of its own Starlink missions from Cape Canaveral without explanation, leaving the Florida spaceport with a rare week without any launches. SpaceX plans to resume launches from Florida early next week with the liftoff of a resupply mission to the International Space Station. The delayed Starlink mission will fly a few days later.

Meanwhile, the next launch attempt for ULA is unknown.

Tory Bruno, ULA’s president and CEO, wrote on X that questions about what is holding up the next Atlas V launch are best directed toward the Space Force. A spokesperson for ULA told Ars the company is still working with the range to determine the next launch date. “The rocket and payload are healthy,” she said. “We will announce the new launch date once confirmed.”

While the SpaceX launch delay this week might suggest a link to the same range kerfuffle facing United Launch Alliance, it’s important to point out a key difference between the companies’ rockets. SpaceX’s Falcon 9 uses an automated flight termination system to self-destruct the rocket if it flies off course, while ULA’s Atlas V uses an older human-in-the-loop range safety system, which requires additional staff and equipment. Therefore, the Space Force is more likely to be able to accommodate a SpaceX mission near another activity on the range.

One more twist in this story is that a few days before the launch attempt, ULA changed its launch window for the Kuiper mission on April 9 from midday to the evening hours due to a request from the Eastern Range. Brig. Gen. Kristin Panzenhagen, the range commander, spoke with reporters in a roundtable meeting last week. After nearly 20 years of covering launches from Cape Canaveral, I found a seven-hour time change so close to launch to be unusual, so I asked Panzenhagen about the reason for it, mostly out of curiosity. She declined to offer any details.

File photo of a SpaceX Falcon 9 launch in 2022. Credit: SpaceX

“The Eastern Range is huge,” she said. “It’s 15 million square miles. So, as you can imagine, there are a lot of players that are using that range space, so there’s a lot of de-confliction … Public safety is our top priority, and we take that very seriously on both ranges. So, we are constantly de-conflicting, but I’m not going to get into details of what the actual conflict was.”

It turns out the conflict on the Eastern Range is having some longer-lasting effects. While a one- or two-week launch delay doesn’t seem serious, it adds up to deferred or denied revenue for a commercial satellite operator. National security missions get priority on range schedules at Cape Canaveral and at Vandenberg Space Force Base in California, but there are significantly more commercial missions than military launches from both spaceports.

Clearly, there’s something out of the ordinary going on in the Eastern Range, which extends over much of the Atlantic Ocean to the southeast, east, and northeast of Cape Canaveral. The range includes tracking equipment, security forces, and ground stations in Florida and downrange sites in Bermuda and Ascension Island.

One possibility is a test of one or more submarine-launched Trident ballistic missiles, which commonly occur in the waters off the east coast of Florida. But those launches are usually accompanied by airspace and maritime warning notices to ensure pilots and sailors steer clear of the test. Nothing of the sort has been publicly released in the last couple of weeks.

Maybe something is broken at the Florida launch base. When launches were less routine than today, the range at Cape Canaveral would close for a couple of weeks per year for upgrades and refurbishment of critical infrastructure. This is no longer the case. In 2023, Panzenhagen told Ars that the Space Force changed the policy.

“When the Eastern Range was supporting 15 to 20 launches a year, we had room to schedule dedicated periods for maintenance of critical infrastructure,” she said at the time. “During these periods, launches were paused while teams worked the upgrades. Now that the launch cadence has grown to nearly twice per week, we’ve adapted to the new way of business to best support our mission partners.”

Perhaps, then, it’s something more secret, like a larger-scale, multi-element military exercise or war game that either requires Eastern Range participation or is taking place in areas the Space Force needs to clear for safety reasons for a rocket launch to go forward. The military sometimes doesn’t publicize these activities until they’re over.

A Space Force spokesperson did not respond to Ars Technica’s questions on the matter.

While we’re still a ways off from rocket launches becoming as routine as an airplane flight, the military is shifting in the way it thinks about spaceports. Instead of offering one-off bespoke services tailored to the circumstances of each launch, the Space Force wants to operate the ranges more like an airport.

“We’ve changed the nomenclature from calling ourselves a range to calling ourselves a spaceport because we see ourselves more like an airport in the future,” one Space Force official told Ars for a previous story.

In the National Defense Authorization Act for fiscal-year 2024, Congress gave the Space Force the authority to charge commercial launch providers indirect fees to help pay for common infrastructure at Cape Canaveral and Vandenberg—things like roads, electrical and water utilities, and base security used by all rocket operators at each spaceport. The military previously could only charge rocket companies direct fees for the specific services it offered in support of a particular launch, while the government was on the hook for overhead costs.

Military officials characterize the change in law as a win-win for the government and commercial launch providers. Ideally, it will grow the pool of money available to modernize the military’s spaceports, making them more responsive to all users, whether it’s the Space Force, SpaceX, ULA, or a startup new to the launch industry.

Whatever is going on in Florida or the Atlantic Ocean this week, it’s something the Space Force doesn’t want to talk about in detail. Maybe there are good reasons for that.

Cape Canaveral is America’s busiest launch base. Extending the spaceport-airport analogy a little further, the closure of America’s busiest airport for a week or more would be a big deal. One of the holy grails the Space Force is pursuing is the capability to launch on demand.

This week, there’s demand for launch slots at Cape Canaveral, but the answer is no.

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

There’s a secret reason the Space Force is delaying the next Atlas V launch Read More »

Tesla odometer uses “predictive algorithms” to void warranty, lawsuit claims

Tesla is facing a new scandal that once again sees the electric automaker accused of misleading customers. In the past, it has been caught making “misleading statements” about the safety of its electric vehicles, and more recently, an investigation by Reuters found Tesla EVs exaggerated their efficiency. Now, a lawsuit filed in California alleges that the cars are also falsely exaggerating odometer readings to make warranties expire prematurely.

The lead plaintiff in the case, Nyree Hinton, bought a used Model Y with less than 37,000 miles (59,546 km) on the odometer. Within six months, it had pushed past the 50,000-mile (80,467 km) mark, at which point the car’s bumper-to-bumper warranty expired. (Like virtually all EVs, Tesla powertrains have a separate warranty that lasts much longer.)

Over this six-month period, Hinton says his Model Y odometer gained 13,228 miles (21,288 km). By comparison, averages from his three previous vehicles showed that, with the same commute, he drove only 6,086 miles (9,794 km) per six months.
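
For a rough sense of scale (our own back-of-envelope arithmetic using only the figures above, not a calculation from the filing), the claimed odometer gain works out to more than twice the plaintiff's historical driving rate:

```python
# Back-of-envelope check of the figures reported in the complaint (not Tesla's
# or the plaintiff's math): how much faster the odometer accrued miles than
# the plaintiff's historical driving average would predict.

odometer_gain_miles = 13_228   # miles added over the six-month period
historical_average = 6_086     # miles per six months across three prior vehicles

ratio = odometer_gain_miles / historical_average
excess = odometer_gain_miles - historical_average

print(f"Odometer accrued {ratio:.2f}x the historical average")  # ~2.17x
print(f"That is {excess:,} 'extra' miles in six months")        # 7,142 miles
```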

Tesla odometer uses “predictive algorithms” to void warranty, lawsuit claims Read More »

Diablo vs. Darkest Dungeon: RPG devs on balancing punishment and power

For Sigman and the Darkest Dungeon team, it was important to establish an overarching design philosophy from the outset. That said, the details within that framework may change or evolve significantly during development.

“In this age of early access and easily updatable games, balance is a living thing,” Sigman said. “It’s highly iterative throughout the game’s public life. We will update balance based upon community feedback, analytics, evolving metas, and also reflections on our own design philosophies and approaches.”

In Darkest Dungeon 2, a group of adventurers sits by a table, exhausted

A screen for managing inventory and more in Darkest Dungeon II. Credit: Red Hook Studios

The problem, of course, is that every change to an existing game is a double-edged sword. With each update, you risk breaking the very elements you’re trying to fix.

Speaking to that ongoing balancing act, Sigman admits, “It’s not without its challenges. We’ve found that many players eagerly await such updates, but a subset gets really angry when developers change balance elements.”

Getting one of your favorite heroes or abilities nerfed can absolutely sink a game or destroy a strategy you’ve relied on for success. The team relies on a number of strictly mathematical tools to help isolate and solve balance problems, but on some level, it’s an artistic and philosophical question.

“A good example is how to address ‘exploits’ in a game,” Sigman said. “Some games try to hurriedly stamp out all possible exploits. With a single-player game, I think you have more leeway to let some exploits stand. It’s nice to let players get away with some stuff. If you kick sand over every exploit that appears, you remove some of the fun.”

As with so many aspects of game design, perfecting the balance between adversity and empowerment comes down to a simple question.

“One amazing piece of wisdom from Sid Meier, my personal favorite designer, is to remember to ask yourself, ‘Who is having the fun here? The designer or the player?’ It should be the player,” Sigman told us.

It’s the kind of approach that players love to hear. Even if a decision is made to make a game more difficult, particularly an existing game, it should be done to make the play experience more enjoyable. If devs seem to be making balance changes just to scale down players’ power, it can begin to feel like you’re being punished for having fun.

The fine balance between power and challenge is a hard one to strike, but what players ultimately want is to have a good time. Sometimes that means feeling like a world-destroying demigod, and sometimes it means squeaking through a bloody boss encounter with a single hit point. Most often, though, you’re looking for a happy medium: a worthy challenge overcome through power and skill.

Diablo vs. Darkest Dungeon: RPG devs on balancing punishment and power Read More »

Why are two Texas senators trying to wrest a Space Shuttle from the Smithsonian?

Should the city of Houston, which proudly bills itself as “Space City,” have a prized Space Shuttle orbiter on public display?

More than a decade ago, arguably, the answer was yes. After all, the Space Shuttle program was managed from Johnson Space Center, in southeastern Houston. All the astronauts who flew on the shuttle trained there. And the vehicle was operated out of Mission Control at the Houston-based facility.

But when the final decisions were being made to distribute the shuttles 15 years ago, the Houston community dragged its feet on putting together a competitive proposal. There were also questions about the ability of Space Center Houston to raise funding to house the shuttle within a new display area, which magnified concerns that the historical vehicle, like a Saturn V rocket before it, would be left outside in the region’s humid environment. Finally, other cities offered better proposals for displaying the shuttles to the public.

In the end, the four shuttles were sent to museums in Washington, DC, New York, Florida, and California.

Bring it back home

And that was all more or less settled until last week when the two US senators from Texas, John Cornyn and Ted Cruz, filed the “Bring the Space Shuttle Home Act” to move Space Shuttle Discovery from its current location at the Smithsonian’s National Air and Space Museum’s Steven F. Udvar-Hazy Center in Virginia to Houston.

The space collectibles news site CollectSpace has a good overview of why this move is stupidly impractical. Essentially, it would easily cost $1 billion to get one of the two shuttle carrier aircraft back into service and move Discovery; it is unclear whether the shuttle could survive such a journey in its current state; and the Smithsonian is the nation’s premier museum. There’s a reason that Discovery, the most historically significant of the three remaining shuttles that have gone to space, was placed there.

After the senators announced their bill, the collective response from the space community was initially shock. This was soon followed by: why? And so I’ve spoken with several people on background, both from the political and space spheres, to get a sense of what is really happening here. The short answer is that it is all political, and the timing is due to the reelection campaign for Cornyn, who faces a stiff runoff against Ken Paxton.

Why are two Texas senators trying to wrest a Space Shuttle from the Smithsonian? Read More »

Disgruntled users roast X for killing Support account

After X (formerly Twitter) announced it would be killing its “Support” account, disgruntled users quickly roasted the social media platform for providing “essentially non-existent” support.

“We’ll soon be closing this account to streamline how users can contact us for help,” X’s Support account posted, explaining that now, paid “subscribers can get support via @Premium, and everyone can get help through our Help Center.”

On X, the Support account was one of the few paths that users had to publicly seek support for help requests the platform seemed to be ignoring. For suspended users, it was viewed as a lifeline. Replies to the account were commonly flooded with users trying to get X to fix reported issues, and several seemingly paying users cracked jokes in response to the news that the account would soon be removed.

“Lololol your support for Premium is essentially non-existent,” a subscriber with more than 200,000 followers wrote, while another quipped “Okay, so no more support? lol.”

On Reddit, X users recently suggested that contacting the Premium account is the only way to get human assistance after briefly interacting with a bot. But some self-described Premium users complained of waiting six months or longer for responses from X’s help center in the Support thread.

Some users who don’t pay for access to the platform similarly complained. But for paid subscribers or content creators, lack of Premium support is perhaps most frustrating, as one user claimed their account had been under review for years, allegedly depriving them of revenue. And another user claimed they’d had “no luck getting @Premium to look into” an account suspension while supposedly still getting charged. Several accused X of sending users into a never-ending loop, where the help center only serves to link users to the help center.

Disgruntled users roast X for killing Support account Read More »

The physics of bowling strike after strike

More than 45 million people in the US are fans of bowling, with national competitions awarding millions of dollars. Bowlers usually rely on instinct and experience, earned through lots and lots of practice, to boost their strike percentage. A team of physicists has come up with a mathematical model to better predict ball trajectories, outlined in a new paper published in the journal AIP Advances. The resulting equations take into account such factors as the composition and resulting pattern of the oil used on bowling lanes, as well as the inevitable asymmetries of bowling balls and player variability.

The authors already had a strong interest in bowling. Three are regular bowlers and quite skilled at the sport; a fourth, Curtis Hooper of Loughborough University in the UK, is a coach for Team England at the European Youth Championships. Hooper has been studying the physics of bowling for several years, including an analysis of the 2017 Weber Cup, as well as papers devising mathematical models for the application of lane conditioners and oil patterns in bowling.

The calculations involved in such research are very complicated because there are so many variables that can affect a ball’s trajectory after being thrown. Case in point: the thin layer of oil that is applied to bowling lanes, which Hooper found can vary widely in volume and shape among different venues, plus the lack of uniformity in applying the layer, which creates an uneven friction surface.

Per the authors, most research to date has relied on statistically analyzing empirical data, such as a 2018 report by the US Bowling Congress that looked at data generated by 37 bowlers. (Hooper relied on ball-tracking data for his 2017 Weber Cup analysis.) A 2009 analysis showed that the optimal location for the ball to strike the headpin is about 6 centimeters off-center, while the optimal entry angle is about 6 degrees. However, such an approach struggles to account for the inevitable player variability. No bowler hits their target 100 percent of the time, and per Hooper et al., while the best professionals can come within 0.1 degrees of the optimal launch angle, this slight variation can nonetheless result in a difference of several centimeters down-lane.
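
As a rough sanity check on that last figure (our own back-of-envelope arithmetic, not from the paper), a straight-line approximation over the regulation 60-foot (18.29 m) distance from the foul line to the headpin puts a 0.1-degree launch-angle error at roughly 3 cm of lateral offset at the pins, before accounting for hook:

```python
import math

# Straight-line approximation (ignores hook): lateral offset at the pins from a
# small launch-angle error. Regulation lanes run 60 ft (~18.29 m) from the foul
# line to the headpin; the 0.1-degree figure comes from the article.

lane_length_m = 18.29
angle_error_deg = 0.1

offset_m = lane_length_m * math.tan(math.radians(angle_error_deg))
print(f"{offset_m * 100:.1f} cm of lateral offset at the pins")  # ~3.2 cm
```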

The physics of bowling strike after strike Read More »

After market tumult, Trump exempts smartphones from massive new tariffs

Shares in the US tech giant were one of Wall Street’s biggest casualties in the days immediately after Trump announced his reciprocal tariffs. About $700 billion was wiped off Apple’s market value in the space of a few days.

Earlier this week, Trump said he would consider excluding US companies from his tariffs, but added that such decisions would be made “instinctively.”

Chad Bown, a senior fellow at the Peterson Institute for International Economics, said the exemptions mirrored exceptions for smartphones and consumer electronics issued by Trump during his trade wars in 2018 and 2019.

“We’ll have to wait and see if the exemptions this time around also stick, or if the president once again reverses course sometime in the not-too-distant future,” said Bown.

US Customs and Border Protection referred inquiries about the order to the US International Trade Commission, which did not immediately reply to a request for comment.

The White House confirmed that the new exemptions would not apply to the 20 percent tariffs on all Chinese imports applied by Trump to respond to China’s role in fentanyl manufacturing.

White House spokesperson Karoline Leavitt said on Saturday that companies including Apple, TSMC, and Nvidia were “hustling to onshore their manufacturing in the United States as soon as possible” at “the direction of the President.”

“President Trump has made it clear America cannot rely on China to manufacture critical technologies such as semiconductors, chips, smartphones, and laptops,” said Leavitt.

Apple declined to comment.

Economists have warned that the sweeping nature of Trump’s tariffs—which apply to a broad range of common US consumer goods—threaten to fuel US inflation and hit economic growth.

New York Fed chief John Williams said US inflation could reach as high as 4 percent as a result of Trump’s tariffs.

Additional reporting by Michael Acton in San Francisco

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

After market tumult, Trump exempts smartphones from massive new tariffs Read More »

On Google’s Safety Plan

I want to start off by reiterating kudos to Google for actually laying out its safety plan. No matter how good the plan, it’s much better to write down and share the plan than it is to not share the plan, which in turn is much better than not having a formal plan.

They offer us a blog post, a full monster 145-page paper (so big you have to use Gemini!), and a 10-page summary at the start of the paper.

The full paper is full of detail about what they think and plan, why they think and plan it, answers to objections and robust discussions. I can offer critiques, but I couldn’t have produced this document in any sane amount of time, and I will be skipping over a lot of interesting things in the full paper because there’s too much to fully discuss.

This is The Way.

Google makes their core assumptions explicit. This is so very much appreciated.

They believe, and are assuming, from section 3 and from the summary:

  1. The current paradigm of AI development will hold for a while.

  2. No human ceiling on AI capabilities.

  3. Timelines are unclear. Powerful AI systems might be developed by 2030.

  4. Powerful AI systems might accelerate AI R&D in a feedback loop (RSI).

  5. There will not be large discontinuous jumps in AI capabilities.

  6. Risks primarily will come from centralized AI development.

Their defense of the first claim (found in 3.1) is strong and convincing. I am not as confident as they seem to be, I think they should be more uncertain, but I accept the assumption within this context.

I strongly agree with the next three assumptions. If you do not, I encourage you to read their justifications in section 3. Their discussion of economic impacts suffers from ‘we are writing a paper and thus have to take the previously offered papers seriously so we simply claim there is disagreement rather than discuss the ground physical truth,’ so much of what they reference is absurd, but it is what it is.

That fifth assumption is scary as all hell.

While we aim to handle significant acceleration, there are limits. If, for example, we jump in a single step from current chatbots to an AI system that obsoletes all human economic activity, it seems very likely that there will be some major problem that we failed to foresee. Luckily, AI progress does not appear to be this discontinuous.

So, we rely on approximate continuity: roughly, that there will not be large discontinuous jumps in general AI capabilities, given relatively gradual increases in the inputs to those capabilities (such as compute and R&D effort).

Implication: We can iteratively and empirically test our approach, to detect any flawed assumptions that only arise as capabilities improve.

Implication: Our approach does not need to be robust to arbitrarily capable AI systems. Instead, we can plan ahead for capabilities that could plausibly arise during the next several scales, while deferring even more powerful capabilities to the future.

I do not consider this to be a safe assumption. I see the arguments from reference classes and base rates and competitiveness, I am definitely factoring that all in, but I am not confident in it at all. There have been some relatively discontinuous jumps already (e.g. GPT-3, 3.5 and 4), at least from the outside perspective. I expect more of them to exist by default, especially once we get into the RSI-style feedback loops, and I expect them to have far bigger societal impacts than previous jumps. And I expect some progressions that are technically ‘continuous’ to not feel continuous in practice.

Google says that threshold effects are the strongest counterargument. I definitely think this is likely to be a huge deal. Even if capabilities are continuous, the ability to pull off a major shift can make the impacts look very discontinuous.

We are all being reasonable here, so this is us talking price. What would be ‘too’ large, frequent or general an advance that breaks this assumption? How hard are we relying on it? That’s not clear.

But yeah, it does seem reasonable to say that if AI were to suddenly tomorrow jump forward to ‘obsoletes all human economic activity’ overnight, that there are going to be a wide variety of problems you didn’t see coming. Fair enough. That doesn’t even have to mean that we lose.

I do think it’s fine to mostly plan for the ‘effectively mostly continuous for a while’ case, but we also need to be planning for the scenario where that is suddenly false. I’m not willing to give up on those worlds. If a discontinuous huge jump were to suddenly come out of a DeepMind experiment, you want to have a plan for what to do about that before it happens, not afterwards.

That doesn’t need to be as robust and fleshed out as our other plans, indeed it can’t be, but there can’t be no plan at all. The current plan is to ‘push the big red alarm button.’ That at minimum still requires a good plan and operationalization for who gets to push that button, when they need to, and what happens after they press it. Time will be of the essence, and there will be big pressures not to do it. So you need strong commitments in advance, including inside companies like Google.

The other reason this is scary is that it implies that continuous capability improvements will lead to essentially continuous behaviors. I do not think this is the case either. There are likely to be abrupt shifts in observed outputs and behaviors once various thresholds are passed and new strategies start to become viable. The level of risk increasing continuously, or even gradually, is entirely consistent with the risk then suddenly materializing all at once. Many such cases. The paper is not denying or entirely ignoring this, but it seems under-respected throughout in the ‘talking price’ sense.

The additional sixth assumption comes from section 2.1:

However, our approach does rely on assumptions about AI capability development: for example, that dangerous capabilities will arise in frontier AI models produced by centralized development. This assumption may fail to hold in the future. For example, perhaps dangerous capabilities start to arise from the interaction between multiple components (Drexler, 2019), where any individual component is easy to reproduce but the overall system would be hard to reproduce.

In this case, it would no longer be possible to block access to dangerous capabilities by adding mitigations to a single component, since a bad actor could simply recreate that component from scratch without the mitigations.

This is an assumption about development, not deployment, although many details of Google’s approaches do also rely on centralized deployment for the same reason. If the bad actor can duplicate the centrally developed system, you’re cooked.

Thus, there is a kind of hidden assumption throughout all similar discussions of this that should be highlighted, although fixing it is clearly outside the scope of this paper: that we are headed down a path where mitigations are possible at reasonable cost, and are not at risk of path dependence towards a world where that is not true.

The best reason to worry about future risks now even with an evidence dilemma is they inform us about what types of worlds allow us to win, versus which ones inevitably lose. I worry that decisions that are net positive for now can set us down paths where we lose our ability to steer even before AI takes the wheel for itself.

The weakest of their justifications in section 3 was in 3.6, explaining AGI’s benefits. I don’t disagree with anything in particular, and certainly what they list should be sufficient, but I always worry when such write-ups do not ‘feel the AGI.’

They start off with optimism, touting AGI’s potential to ‘transform the world.’

Then they quickly pivot to discussing their four risk areas: Misuse, Misalignment, Mistakes and Structural Risks.

Google does not claim this list is exhaustive or exclusive. How close is this to a complete taxonomy? For sufficiently broad definitions of everything, it’s close.

This is kind of a taxonomy of fault. As in, if harm resulted, whose fault is it?

  1. Misuse: You have not been a good user.

  2. Misalignment: I have not been a good Gemini, on purpose.

  3. Mistakes: I have not been a good Gemini, by accident.

  4. Structural Risks: Nothing is ever anyone’s fault per se.

The danger as always with such classifications is that ‘fault’ is not an ideal way of charting optimal paths through causal space. Neither is classifying some things as harm versus others not harm. They are approximations that have real issues in the out of distribution places we are headed.

In particular, as I parse this taxonomy the Whispering Earring problem seems not to be covered. One can consider this the one-human-agent version of Gradual Disempowerment. This is where the option to defer to the decisions of the AI, or to use the AI’s capabilities, over time causes a loss of agency and control by the individual who uses it, leaving them worse off, but without anything that could be called a particular misuse, misalignment, or AI mistake. They file this under structural risks, which is clearly right for a multi-human-agent Gradual Disempowerment scenario, but that feels to me like it importantly misses the single-agent case, even when it is happening at scale. It’s a weird fit.

Also, ‘the human makes understandable mistakes because the real world is complex and the AI does what the human asked but the human was wrong’ is totally a thing. Indeed, we may have had a rather prominent example of this on April 2, 2025.

Perhaps one can solve this by expanding mistakes into AI mistakes and also human mistakes – the user isn’t intending to cause harm or directly requesting it, the AI is correctly doing what the human intended, but the human was making systematic mistakes, because humans have limited compute and various biases and so on.

The good news is that if we solve the four classes of risk listed here, we can probably survive the rest long enough to fix what slipped through the cracks. At minimum, it’s a great start, and doesn’t miss any of the big questions if all four are considered fully. The bigger risk with such a taxonomy is to define the four items too narrowly. Always watch out for that.

This is The Way:

Extended Abstract: AI, and particularly AGI, will be a transformative technology. As with any transformative technology, AGI will provide significant benefits while posing significant risks.

This includes risks of severe harm: incidents consequential enough to significantly harm humanity. This paper outlines our approach to building AGI that avoids severe harm.

Since AGI safety research is advancing quickly, our approach should be taken as exploratory. We expect it to evolve in tandem with the AI ecosystem to incorporate new ideas and evidence.

Severe harms necessarily require a precautionary approach, subjecting them to an evidence dilemma: research and preparation of risk mitigations occurs before we have clear evidence of the capabilities underlying those risks.

We believe in being proactive, and taking a cautious approach by anticipating potential risks, even before they start to appear likely. This allows us to develop a more exhaustive and informed strategy in the long run.

Nonetheless, we still prioritize those risks for which we can foresee how the requisite capabilities may arise, while deferring even more speculative risks to future research.

Specifically, we focus on capabilities in foundation models that are enabled through learning via gradient descent, and consider Exceptional AGI (Level 4) from Morris et al. (2023), defined as an AI system whose capabilities match or exceed those of the 99th percentile of skilled adults on a wide range of non-physical tasks.

For many risks, while it is appropriate to include some precautionary safety mitigations, the majority of safety progress should be achieved through an “observe and mitigate” strategy. Specifically, the technology should be deployed in multiple stages with increasing scope, and each stage should be accompanied by systems designed to observe risks arising in practice, for example through monitoring, incident reporting, and bug bounties.

After risks are observed, more stringent safety measures can be put in place that more precisely target the risks that happen in practice.

Unfortunately, as technologies become ever more powerful, they start to enable severe harms. An incident has caused severe harm if it is consequential enough to significantly harm humanity. Obviously, “observe and mitigate” is insufficient as an approach to such harms, and we must instead rely on a precautionary approach.

Yes. It is obvious. So why do so many people claim to disagree? Great question.

They explicitly note that their definition of ‘severe harm’ has a vague threshold. If this were a law, that wouldn’t work. In this context, I think that’s fine.

In 6.5, they discuss the safety-performance tradeoff. You need to be on the Production Possibilities Frontier (PPF).

Building advanced AI systems will involve many individual design decisions, many of which are relevant to building safer AI systems.

This section discusses design choices that, while not enough to ensure safety on their own, can significantly aid our primary approaches to risk from misalignment. Implementing safer design patterns can incur performance costs. For example, it may be possible to design future AI agents to explain their reasoning in human-legible form, but only at the cost of slowing down such agents.

To build AI systems that are both capable and safe, we expect it will be important to navigate these safety-performance tradeoffs. For each design choice with potential safety-performance tradeoffs, we should aim to expand the Pareto frontier.

This will typically look like improving the performance of a safe design to reduce its overall performance cost.

As always: Security is capability, even if you ignore the tail risks. If your model is not safe enough to use, then it is not capable in ways that help you. There are tradeoffs to be made, but no one except possibly Anthropic is close to where the tradeoffs start.
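
To make the Pareto-frontier framing concrete, here is a toy sketch of my own (not from the paper), with made-up design names and scores: a design choice stays on the frontier only if no alternative beats it on both safety and performance at once.

```python
# Hypothetical (safety, performance) scores for candidate designs; the names
# and numbers are illustrative only. A design is Pareto-optimal if no other
# design is at least as good on both axes and strictly better on one.

designs = {
    "legible-reasoning": (0.9, 0.60),
    "opaque-fast":       (0.4, 0.90),
    "monitored-fast":    (0.7, 0.85),
    "strictly-worse":    (0.5, 0.50),   # dominated by "monitored-fast"
}

def pareto_frontier(options):
    frontier = {}
    for name, (safety, perf) in options.items():
        dominated = any(
            s >= safety and p >= perf and (s > safety or p > perf)
            for other, (s, p) in options.items() if other != name
        )
        if not dominated:
            frontier[name] = (safety, perf)
    return frontier

print(pareto_frontier(designs))
# "strictly-worse" drops out; expanding the frontier means finding designs that
# gain on one axis without giving up the other.
```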

In highlighting the evidence dilemma, Google explicitly draws the distinction in 2.1 between risks that are in-scope for investigation now, versus those that we should defer until we have better evidence.

Again, the transparency is great. If you’re going to defer, be clear about that. There’s a lot of very good straight talk in 2.1.

They are punting on goal drift (which they say is not happening soon, and I suspect they are already wrong about that), superintelligence and RSI.

They are importantly not punting on particular superhuman abilities and concepts. That is within scope. Their plan is to use amplified oversight.

As I note throughout, I have wide skepticism on the implementation details of amplified oversight, and on how far it can scale. The disagreement is over how far it scales before it breaks, not whether it will break with scale. We are talking price.

Ultimately, like all plans these days, the core plan is bootstrapping. We are going to have the future more capable AIs do our ‘alignment homework.’ I remember when this was the thing those of us at LessWrong absolutely wanted to avoid asking them to do, because the degree of difficulty of that task is off the charts in terms of the necessary quality of alignment and understanding of pretty much everything – you really want to find a way to ask for almost anything else. Nothing changed. Alas, we seem to be out of practical options, other than hoping that this still somehow works out.

As always, remember the Sixth Law of Human Stupidity. If you say something like ‘no one would be so stupid as to use a not confidently aligned model to align the model that will be responsible for your future safety’ I have some bad news for you.

Not all of these problems can or need to be Google’s responsibility. Even to the extent that they are Google’s responsibility, that doesn’t mean their current document or plans need to fully cover them.

We focus on technical research areas that can provide solutions that would mitigate severe harm. However, this is only half of the picture: technical solutions should be complemented by effective governance.

Many of these problems, or parts of these problems, are problems for Future Google and Future Earth, that no one knows how to solve in a way we would find acceptable. Or at least, the ones who talk don’t know, and the ones who know, if they exist, don’t talk.

Other problems are not problems Google is in any position to solve, only to identify. Google doesn’t get to Do Governance.

The virtuous thing to do is exactly what Google is doing here. They are laying out the entire problem, and describing what steps they are taking to mitigate what aspects of the problem.

Right now, they are only focusing here on misuse and misalignment. That’s fine. If they could solve those two that would be fantastic. We’d still be on track to lose, these problems are super hard, but we’d be in a much better position.

For mistakes, they mention that ‘ordinary engineering practices’ should be effective. I would expand that to ‘ordinary practices’ overall. Fixing mistakes is the whole intelligence bit, and without an intelligent adversary you can use the AI’s intelligence and yours to help fix this the same as any other problem. If there’s another AI causing yours to mess up, that’s a structural risk. And that’s definitely not Google’s department here.

I have concerns about this approach, but mostly it is highly understandable, especially in the context of sharing all of this for the first time.

Here’s the abstract:

Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of risk: misuse, misalignment, mistakes, and structural risks.

Of these, we focus on technical approaches to misuse and misalignment.

A larger concern is their decision to focus on near-term strategies.

We also focus primarily on techniques that can be integrated into current AI development, due to our focus on anytime approaches to safety.

While we believe this is an appropriate focus for a frontier AI developer’s mainline safety approach, it is also worth investing in research bets that pay out over longer periods of time but can provide increased safety, such as agent foundations, science of deep learning, and application of formal methods to AI.

We focus on risks arising in the foreseeable future, and mitigations we can make progress on with current or near-future capabilities.

The assumption of approximate continuity (Section 3.5) justifies this decision: since capabilities typically do not discontinuously jump by large amounts, we should not expect such risks to catch us by surprise.

Nonetheless, it would be even stronger to exhaustively cover future developments, such as the possibility that AI scientists develop new offense-dominant technologies, or the possibility that future safety mitigations will be developed and implemented by automated researchers.

Finally, it is crucial to note that the approach we discuss is a research agenda. While we find it to be a useful roadmap for our work addressing AGI risks, there remain many open problems yet to be solved. We hope the research community will join us in advancing the state of the art of AGI safety so that we may access the tremendous benefits of safe AGI.

Even if future risks do not catch us by surprise, that does not mean we can afford to wait to start working on them or understanding them. Continuous and expected can still be remarkably fast. Giving up on longer term investments seems like a major mistake if done collectively. Google doesn’t have to do everything, others can hope to pick up that slack, but Google seems like a great spot for such work.

Ideally one would hand off the longer term work to academia, where they could take on the ‘research risk,’ have longer time horizons, use their vast size and talent pools, and largely follow curiosity without needing to prove direct application. That sounds great.

Unfortunately, that does not sound like 2025’s academia. I don’t see academia as making meaningful contributions, due to a combination of lack of speed, lack of resources, lack of ability and willingness to take risk and a lack of situational awareness. Those doing meaningful such work outside the labs mostly have to raise their funding from safety-related charities, and there’s only so much capacity there.

I’d love to be wrong about that. Where’s the great work I’m missing?

Obviously, if there’s a technique where you can’t make progress with current or near-future capabilities, then you can’t make progress. If you can’t make progress, then you can’t work on it. In general I’m skeptical of claims that [X] can’t be worked on yet, but it is what it is.

The traditional way to define misuse is to:

  1. Get a list of the harmful things one might do.

  2. Find ways to stop the AI from contributing too directly to those things.

  3. Try to tell the model to also refuse anything ‘harmful’ that you missed.

The focus here is narrowly a focus on humans setting out to do intentional and specific harm, in ways we all agree are not to be allowed.

The term of art is the actions taken to stop this are ‘mitigations.’

Abstract: For misuse, our strategy aims to prevent threat actors from accessing dangerous capabilities, by proactively identifying dangerous capabilities, and implementing robust security, access restrictions, monitoring, and model safety mitigations.

Blog Post: As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyber attacks.

We’re exploring a number of mitigations to prevent the misuse of advanced AI. This includes sophisticated security mechanisms which could prevent malicious actors from obtaining raw access to model weights that allow them to bypass our safety guardrails; mitigations that limit the potential for misuse when the model is deployed; and threat modelling research that helps identify capability thresholds where heightened security is necessary.

Additionally, our recently launched cybersecurity evaluation framework takes this work a step further to help mitigate against AI-powered threats.

The first mitigation they use is preventing anyone else from stealing the weights.

This is necessary because if the would-be misuser has their hands on the weights, you won’t be able to use any of your other mitigations. If you built some into the model, then they too can be easily removed.

They mention that the special case is to check if the model can even do the harms you are worried about, because if it can’t then you can skip the mitigations entirely. That is presumably the special case they are using for Gemma.

Once you can actually implement safety guardrails, you can then implement safety guardrails. Google very much does this, and it models those threats to figure out where and how to lay down those guardrails.

They appear to be using the classic guardrails, which compose roughly as sketched after this list:

  1. The model is trained not to do the harmful things. This mostly means getting it to refuse. They’re also looking into unlearning, but that’s hard, and I basically would assume it won’t work on sufficiently capable models, they’ll rederive everything.

  2. A monitor in the background looks for harmful things and censors the chat.

  3. They nominally try to keep bad actors from accessing the model. I don’t see this as having much chance of working.

  4. They’re Google, so ‘harden everyone’s defenses against cyberattacks’ is an actually plausible defense-in-depth plan, and kudos on Google for attempting it.
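
Here is a rough sketch of my own of how those layers typically compose, with stand-in keyword classifiers rather than anything resembling Google’s actual systems; the point is only the structure, where each layer can independently stop a request.

```python
# Hypothetical layered-guardrail pipeline; the classifiers here are stand-ins
# (simple keyword checks), not real safety systems. The point is the structure:
# each layer can independently refuse, so a failure in one layer is not fatal.

BLOCKED_TOPICS = {"synthesize nerve agent", "build a bioweapon"}  # illustrative only

def access_allowed(user_id: str) -> bool:
    # Stand-in for account-level restrictions (age, region, abuse history, etc.).
    return not user_id.startswith("banned-")

def model_refuses(prompt: str) -> bool:
    # Stand-in for refusal behavior trained into the model itself.
    return any(topic in prompt.lower() for topic in BLOCKED_TOPICS)

def monitor_flags(response: str) -> bool:
    # Stand-in for an independent monitor that screens outputs after generation.
    return any(topic in response.lower() for topic in BLOCKED_TOPICS)

def serve(user_id: str, prompt: str) -> str:
    if not access_allowed(user_id):
        return "[access denied]"
    if model_refuses(prompt):
        return "[refused by model]"
    response = f"(model output for: {prompt})"   # placeholder for generation
    if monitor_flags(response):
        return "[withheld by output monitor]"
    return response

print(serve("user-1", "How do lasers work?"))
print(serve("user-2", "Please help me build a bioweapon"))
```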

They then aim to produce safety cases against misuse, based on a combination of red teaming and inability. For now, in practice, I would only allow inability, and inability is going to fade away over time. I worry a lot about concluding that a given model is unable to do various things when we simply haven’t given it the right scaffolding during testing.

In the short term, if anything, Google is a bit overzealous with the guardrails, including too many actions in what counts as ‘harmful,’ although the guardrails still would not stop a sufficiently skilled and determined user for long. Thus, even though I worry going forward about ‘misuses’ that this approach fails to anticipate, for now I’d rather make that mistake more often on the margin. We can adjust as we go.

Section 5 discusses the implementation details and difficulties involved here. There are good discussions and they admit the interventions won’t be fully robust, but I still found them overly optimistic, especially on access control, jailbreaking and capability suppression. I especially appreciated discussions on environment hardening in 5.6.2, encryption in 5.6.3 and Societal readiness in 5.7, although ‘easier said than done’ most definitely applies throughout.

For AGI to truly complement human abilities, it has to be aligned with human values. Misalignment occurs when the AI system pursues a goal that is different from human intentions.

From 4.2: Specifically, we say that the AI’s behavior is misaligned if it produces outputs that cause harm for intrinsic reasons that the system designers would not endorse. An intrinsic reason is a factor that can in principle be predicted by the AI system, and thus must be present in the AI system and/or its training process.

Technically I would say a misaligned AI is one that would do misaligned things, rather than the misalignment occurring in response to the user command, but we understand each other there.

The second definition involves a broader and more important disagreement, if it is meant to be a full description rather than a subset of misalignment, as it seems in context to be. I do not think a ‘misaligned’ model needs to produce outputs that ‘cause harm’; it merely needs to cause, for reasons other than the intent of those creating or using it, importantly different arrangements of atoms and paths through causal space. We need to not lock into ‘harm’ as a distinct thing. Nor should we be tied too much to ‘intrinsic reasons’ as opposed to looking at what outputs and results are produced.

Does for example sycophancy or statistical bias ‘cause harm’? Sometimes, yes, but that’s not the right question to ask in terms of whether they are ‘misalignment.’ When I read section 4.2 I get the sense this distinction is being gotten importantly wrong.

I also get very worried when I see attempts to treat alignment as a default, and misalignment as something that happens when one of a few particular things go wrong. We have a classic version of this in 4.2.3:

There are two possible sources of misalignment: specification gaming and goal misgeneralization.

Specification gaming (SG) occurs when the specification used to design the AI system is flawed, e.g. if the reward function or training data provide incentives to the AI system that are inconsistent with the wishes of its designers (Amodei et al., 2016b). Specification gaming is a very common phenomenon, with numerous examples across many types of AI systems (Krakovna et al., 2020).

Goal misgeneralization (GMG) occurs if the AI system learns an unintended goal that is consistent with the training data but produces undesired outputs in new situations (Langosco et al., 2023; Shah et al., 2022). This can occur if the specification of the system is underspecified (i.e. if there are multiple goals that are consistent with this specification on the training data but differ on new data).

Why should the AI figure out the goal you ‘intended’? The AI is at best going to figure out the goal you actually specify with the feedback and data you provide. The ‘wishes’ you have are irrelevant. When we say the AI is ‘specification gaming’ that’s on you, not the AI. Similarly, ‘goal misgeneralization’ means the generalization is not what you expected or wanted, not that the AI ‘got it wrong.’
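
To make the distinction concrete, here is a toy illustration of my own (not from the paper): the designer’s intended objective is helpfulness, but the specified reward is a proxy for it, and optimizing the proxy selects exactly the policy the designer would not endorse.

```python
# A toy illustration of specification gaming (my own, not from the paper): the
# designer intends "be genuinely helpful", but the specified reward is a proxy
# ("maximize user thumbs-up"). Optimizing the proxy picks the sycophantic
# policy, because that is what the specification actually rewards.

candidate_policies = {
    "honest":      {"thumbs_up_rate": 0.70, "actually_helpful": 0.90},
    "sycophantic": {"thumbs_up_rate": 0.95, "actually_helpful": 0.40},
}

def specified_reward(stats):       # what the designer wrote down
    return stats["thumbs_up_rate"]

def intended_objective(stats):     # what the designer actually wanted
    return stats["actually_helpful"]

best_by_spec = max(candidate_policies, key=lambda p: specified_reward(candidate_policies[p]))
best_by_intent = max(candidate_policies, key=lambda p: intended_objective(candidate_policies[p]))
print(best_by_spec, best_by_intent)  # sycophantic vs. honest: the gap is the gaming
```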

You can also get misalignment in other ways. The AI could fail to be consistent with, or do well on, the training data or specified goals. The AI could learn additional goals or values because having them improves performance for a while, then be permanently stuck with that shift in goals or values, as often happens to humans. The human designers could specify or aim for an ‘alignment’ that we would think of as ‘misaligned,’ by mistake or on purpose; this isn’t discussed in the paper, and it’s not entirely clear where it should fit, but by most people’s usage it would indeed be misalignment, though I can see how saying so could end up being misleading. You could be trying to do recursive self-improvement with iterative value and goal drift.

In some sense, yes, the reason the AI does not have goal [X] is always going to be that you failed to specify an optimization problem whose best in-context available solution was [X]. But that seems centrally misleading in a discussion like this.

Misalignment is caused by a specification that is either incorrect (SG) or underspecified (GMG).

Yes, in a mathematical sense I cannot argue with that. It’s an accounting identity. But your specification will never, ever be fully correct, because it is a finite subset of your actual preferences, even if you do know them and wouldn’t have to pay to know what you really think and were thinking exactly correctly.

In practice: Do we need the AGI to be ‘aligned with’ ‘human values’? What exactly does that mean? There are certainly those who argue you don’t need this, that you can use control mechanisms instead and it’s fine. The AGI still has to understand human values on a practical level sufficient for the task, which is fine right now and will get increasingly tricky as things get weird, but that’s different.

I think you mostly do need the AGI to be either corrigible or aligned with human values, in some intuitive sense that is very hard to pin down that comes down to wanting to adhere to the spirit of various human intents and what humans broadly care about in the right tricky combinations, or else you end up with ‘the genie knows what you meant but does not care’ problems.

We have previously shown how misalignment can arise with our examples of specification gaming, where an AI finds a solution to achieve its goals, but not in the way intended by the human instructing it, and goal misgeneralization.

For example, an AI system asked to book tickets to a movie might decide to hack into the ticketing system to get already occupied seats – something that a person asking it to buy the seats may not consider.

They’re good examples. I worry these two examples are too similar within a much wider space, as they point to a particular failure mode where humans at some level would assign negative value to certain types of actions, but the humans didn’t specify this, and the AI either doesn’t notice or it notices and doesn’t sufficiently care. It’s very tricky to get this right, for many reasons.

We’re also conducting extensive research on the risk of deceptive alignment, i.e. the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action.

That is indeed a very different and important type of misalignment. Google is generalizing the term here a bit more than how I’ve been using it. This definition seems to include some other modes one might precisely call non-deceptive adversarial misalignment, or just misalignment full stop.

To me, deceptive alignment is where the AI ‘pretends to be aligned’ or otherwise deceives us about its attributes. Whereas by Google’s definition it would also include shooting you in the face. That will often bypass some of your safety measures.

Google does intend to supplement its alignment efforts with control efforts:

Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned. Techniques from interpretability, uncertainty estimation, and safer design patterns can enhance the effectiveness of these mitigations.

Finally, we briefly outline how these ingredients could be combined to produce safety cases for AGI systems.

It’s clear that Google sees control as defense-in-depth and the long-term Plan B. I continue to be deeply skeptical of the value of control plans for superintelligent systems. I expect them to predictably fail when you need them, and I also worry that using them too aggressively can make the problem harder because AIs consider their full context. Of course I do agree that you want to be monitoring for misalignment as part of your defense-in-depth; there’s no reason not to do that.

I love the idea of safety cases for AGI. I have no idea how to actually create one. I know that very smart people like Davidad think this can be made into A Thing, but I don’t understand why they believe that.

This is of course the biggest question. What is the alignment plan?

From the Abstract: To address misalignment, we outline two lines of defense.

First, model-level mitigations such as amplified oversight and robust training can help to build an aligned model.

From the Blog: Our goal is to have advanced AI systems that are trained to pursue the right goals, so they follow human instructions accurately, preventing the AI using potentially unethical shortcuts to achieve its objectives.

We do this through amplified oversight, i.e. being able to tell whether an AI’s answers are good or bad at achieving that objective. While this is relatively easy now, it can become challenging when the AI has advanced capabilities.

I notice this is not an alignment plan so much as it is a plan to detect misalignment. That’s super useful, but it is not an alignment plan. The alignment plan is then to train the AI using this knowledge so it stops doing detectable misalignment.

Once we can tell whether an answer is good, we can use this to build a safe and aligned AI system. A challenge here is to figure out what problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.

And then hope that you taught the system to be robustly aligned, including outside your training distribution, rather than teaching it to look aligned. Most such plans can’t possibly work if the underlying systems get sufficiently capable. What makes this one different?

From the summary section: Our approach has two lines of defense. First, we aim to use model level mitigations to ensure the model does not pursue misaligned goals. We follow the standard ML playbook: (1) improve the ability to get good oversight (amplified oversight), (2) use this to improve the model (guiding model behavior), and (3) figure out where oversight is needed to train a model that will robustly do what we want (robust training).

It seems nothing here is different? They simply expect the standard plan to scale up.

I do not sense here that Google is recognizing the difficulty level of this step, because until now this step has worked ‘well enough’ without anyone having to try too hard. This feels like a handwave of ‘oh, of course once we can evaluate whether particular outputs are aligned we Do Machine Learning and out pops alignment.’ I see why one would be tempted by that model, but seriously, not so fast.

In terms of the detection step, the word amplified is the lynchpin of this whole plan. How do you have effective ‘amplified oversight’? What do they mean by that?

Amplified oversight. A key challenge in almost any alignment plan is that it can be very difficult to tell whether a given model output is good or bad, once the model has capabilities beyond that of its overseers.

As an example, imagine a human overseer trying to determine how good a move from AlphaGo is.

The area of amplified oversight aims to address this: the goal is to provide an oversight signal that is as good as could be achieved if a human overseer understood all of the reasons that the AI system produced the output and had unlimited ability to reflect on the correct decision.

They frame this as a way around specification gaming (SG) in particular, emphasizing that they are taking seriously the idea that SG is a distinct thing and a failure mode and you can find and fix it. So, to ensure that the AI isn’t doing things for the wrong reasons (4TWR!), no problem, you just need amplified oversight:

It aims for a human to provide feedback for a superhuman AI input-output pair which is as good as could be achieved if that human:

  1. Understood all of the reasons that the AI system produced that output

  2. Had unlimited ability to reflect on the correct decision.

How in the world? That just raises further questions.

In some places it seems as if we agree actual human overseers are going to become increasingly lost and clueless, and indeed Google intends to use AI systems to identify the reasons other AIs do things.

Yet they really do say they aim to ‘ensure that humans can continue to provide meaningful oversight as AI capabilities surpass that of humans,’ and I find their hopes here confusing. Are they simply talking about ‘the human can ask another AI, or the AI itself, and trust the outputs from that’? Section 6.1.2 suggests the humans and AI will work together, but warns against ‘spoon-feeding’ the answer; again, I don’t understand what the plan or goal will be.

What makes you think this is how the ‘real reasons’ even work? Even if you did get the ‘real reasons’ what makes you think humans could even understand them?

Some aspects of human judgments can be imitated or delegated, according to the human’s evolving trust in the AI. The form of human involvement in the amplified oversight protocol is an open question.

There’s no reason we should expect even these other AIs to be able to ‘understand all the reasons that the AI system produced the output.’ And we definitely won’t have unlimited reflection time. Google mentions human biases as an issue in 6.1.3, but it’s hard to even get to that as a worry here. Then in 6.1.4 they mention that AI justifications might not be verbalisable at all, which is highly compatible with Anthropic’s recent findings that chain of thought is often unfaithful or at least highly incomplete. They do discuss attempting to sidestep involving the human in 6.1.5.

Such behaviour may arise from statistical correlations in a large dataset, or from concepts which don’t have a particularly succinct natural language description, but which can be learned from experience.

LLMs are vibe and correlation machines at heart; it seems crazy to think one could write all the relevant correlations down and have humans read them. And isn’t that largely how humans work, too? I try very hard to describe what I am thinking, but I am always by necessity leaving quite a lot out.

So what’s the plan here? How are we going to get this amazingly great at evaluation and interpretability? They do discuss interpretability more in section 6.6.

Also, how are we going to avoid using The Most Forbidden Technique? If we use ‘all the reasons the AI made the decision’ as an input then how do we do that without burning down our cache of interpretability?

The key idea is to use the AI systems themselves to help identify the reasons that the AI system produced the output. For example, we could put two copies of the model in a setting where each model is optimized to point out flaws in the other’s outputs to a human “judge”. Ideally, if one model introduced a subtle flaw in their output that the judge wouldn’t notice by default, the other model would point out and explain the flaw, enabling the judge to penalise the first model appropriately.

That seems like a way lower standard than the one they’re aiming for in the previous quote. You’re pitting two essentially evenly matched models against each other, then hoping they won’t find a way to collude and that evaluation is robustly easier than generation in the relevant contexts.

If all you have to do is find flaws in answers, then maybe that’s advantage defense and maybe it isn’t. But that definitely won’t give you the level of insight we’re looking for.
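
To make the mechanics concrete, here is a minimal sketch of that debate-style setup, written by me rather than taken from the paper. The objects and their `generate` methods are hypothetical stand-ins, not any real API; the point is just to show where the oversight signal comes from and where collusion could enter.

```python
# Minimal sketch of the debate setup described above. The models and their
# `generate` methods are hypothetical stand-ins, not any real API.

class StubModel:
    """Toy stand-in for a policy or judge; returns canned text."""
    def __init__(self, reply):
        self.reply = reply

    def generate(self, prompt):
        return self.reply


def debate_round(task, policy_a, policy_b, judge):
    """One round: A answers, B critiques, the judge scores the critique."""
    answer = policy_a.generate(f"Task: {task}\nGive your best answer.")
    critique = policy_b.generate(
        f"Task: {task}\nAnswer: {answer}\n"
        "Point out the most serious flaw in this answer, if any."
    )
    verdict = judge.generate(
        f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\n"
        "Does the critique identify a real flaw? Reply VALID or INVALID."
    )
    # Training signal: penalize A when a real flaw is surfaced, reward B.
    flaw_found = verdict.strip().upper().startswith("VALID")
    reward_a = -1.0 if flaw_found else 1.0
    return reward_a, -reward_a


# Toy usage: note the judge only ever sees what the two models choose to
# surface, which is exactly where collusion would bite.
print(debate_round("book two adjacent movie seats",
                   StubModel("Seats 4A and 4B booked."),
                   StubModel("No flaw found."),
                   StubModel("INVALID")))
```

Seeing it laid out this way makes the dependency obvious: the entire training signal flows through whatever the critic chooses to say and whatever the judge can recognize.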

This is emphasized again by 6.1.1 citing not only debate and critique but also RLHF (!) as an amplified oversight method. I do not understand. It is an oversight method, one could call it ‘the’ oversight method almost by definition, but how is it possibly amplified as described here?

In amplified oversight, any question that is too hard to supervise directly is systematically reduced to ones that we hypothesize can be supervised.

However, humans may be systematically biased even for fairly simple questions. If this turns out to be a problem in practice, we could seek to model these deviations and automatically correct or account for them when interpreting the oversight.

So does that mean we are making the major sacrifice of forcing all outputs to be only composed of pieces that humans can fully interpret and evaluate? And then scrubbing any additional elements? Could you do that even if you wanted to? How do you possibly do enough human oversight at scale?

The whole thing feels hopeless. Like we are kidding ourselves.

In 6.1.7 they talk paths forward, and suggest that the test will be when humans are asked to evaluate AIs in areas where the AIs have superhuman capabilities.

What if we jumped ahead on this, not by having AIs that are smarter, but by having humans that are dumber, or less skilled and knowledgeable in context?

As in, you think humans will be able to evaluate superhumanly capable AIs? All right, let’s have the humans evaluate coding tasks now… but not teach those particular humans how to code, at all, and have them not be all that smart and not give them access to outside resources that wouldn’t exist here. See what happens. If you want to test for a smaller gap, you can calibrate.

That actually sounds like a great test to me. We should do that. Can you get a method whereby the Man On The Street can use AI help to code and evaluate graduate level economics outputs and the quality of poetry and so on in ways that would translate to this future parallel situation? How does that go?
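
If you wanted to actually run that, a minimal harness might look like the sketch below. The task data, the `nonexpert_judge` callable, and the expert ground truth are all hypothetical placeholders; the only claim is that the resulting score tells you how much the human-plus-AI oversight tracks quality the human cannot evaluate directly.

```python
import random

def evaluation_gap_score(task_pairs, nonexpert_judge):
    """task_pairs: (task, better_output, worse_output) triples, where
    'better' comes from expert ground truth the judge never sees.
    nonexpert_judge: callable(task, output_a, output_b) -> "A" or "B",
    simulating a non-expert human working with AI assistance.
    Returns the fraction of pairs where the judge agrees with the experts."""
    correct = 0
    for task, better, worse in task_pairs:
        # Randomize presentation order so position can't leak the answer.
        if random.random() < 0.5:
            a, b, key = better, worse, "A"
        else:
            a, b, key = worse, better, "B"
        if nonexpert_judge(task, a, b) == key:
            correct += 1
    return correct / len(task_pairs)


if __name__ == "__main__":
    # Hypothetical demo: a judge with no signal scores near 0.5.
    def coin_flip_judge(task, a, b):
        return random.choice(["A", "B"])

    pairs = [("sort a list", "sorted(xs)", "xs.reverse()")] * 20
    print(evaluation_gap_score(pairs, coin_flip_judge))
```

A score near 0.5 means the assisted non-expert is guessing; a score near 1.0 means the oversight signal survives the capability gap you simulated.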

How would you apply this process even to a not-quite-a-toy problem? You can do chess, because you can have the two debaters play out the game and see who wins, but if you didn’t have that direct ability you seem to be in a lot of trouble.

Even in the best case, they expect this process to be expensive, and they expect it to fail outside of the training distribution:

However, even with Amplified Oversight, there remain two key challenges in using this oversight signal to train AI systems that behave as intended.

First, we will only be able to provide oversight on actions produced within the training distribution.

Second, we anticipate that Amplified Oversight signal will be costly to obtain. These challenges motivate the development of robust training strategies that expand the training distribution with a particular focus on cases that induce failures, as well as monitoring strategies (section 6.3).

What happens outside of the training distribution is going to be vital exactly when we care most. I do get that the plan is to expand that distribution, but mostly this seems unsolvable.

I don’t see how this plan is going to survive contact with the enemy, under any variation, even if we compromise heavily on many of the plans listed above. Reasoning on r1 and Sonnet 3.7 was shown to often be unfaithful even now.

In 6.5.4 they suggest choosing AI systems with easier-to-understand reasoning methods. That’s an example of why you need to look ahead, so you can make architectural and research decisions now. Once again, I find the paper making good points but being what I see as too optimistic. Yes, there are lots of examples of legible externalized reasoning, but it’s not as legible or complete as you think. And even if you’re willing to pay the efficiency tax, it will be extremely difficult to stop the AI from doing opaque and esoteric reasoning or communication; there are a lot of optimization pressures that will push towards that. That doesn’t mean we shouldn’t try; I do think it is a good idea to try.

After that, they claim that standard techniques will essentially work if we grant them the oversight signal. I don’t think we should be confident in that either. I expect things to get harder even if your oversight is perfect, and for any inaccuracy in the oversight, however small, to be increasingly perilous. They do suggest various techniques for more robust training in 6.3.5, which seem at least marginally helpful.

Then, after all that: If you did fail at the training step, how likely do you think you are to then succeed at the inference step using the same monitor?

As they note, even if you did have an accurate judge or monitor model available, judge-policy collusion is a big risk, and you can only use the judge some of the time. And I’m going to guess that if you are dealing with a misaligned model, you don’t have an accurate judge available; indeed, that situation is strong evidence of some combination of the judge itself being misaligned or the misaligned model having been optimized to fool the judge. They do mention in 6.3.6 that ideally you would make the judge uncorrelated with your training signal, but that seems very hard to do.

The big advantage the judge has in this later stage is it can escalate and ask for help if it is uncertain, as per 6.3.3. It is plausible that an inferior judge can’t do amplified oversight properly, but can probabilistically spot things that are fishy if you don’t mind a lot of false positives.
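
Here is roughly what that escalation logic amounts to, as a sketch of my own with a hypothetical `cheap_monitor` score and an `escalate` hook standing in for whatever the expensive review path is.

```python
def monitor_and_escalate(outputs, cheap_monitor, escalate, threshold=0.2):
    """Route any output whose suspicion score clears the threshold to a more
    expensive review path (a stronger model, far more inference compute, or
    a human), and let everything else through.

    cheap_monitor: callable(output) -> suspicion score in [0, 1] (hypothetical).
    escalate: callable(output) -> reviewed or blocked output (hypothetical).
    """
    passed, escalated = [], []
    for out in outputs:
        if cheap_monitor(out) >= threshold:
            escalated.append(escalate(out))  # rare, expensive path
        else:
            passed.append(out)               # common, cheap path
    return passed, escalated
```

Setting the threshold low buys recall at the cost of a pile of false positives, which is exactly the trade being described here.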

This once again feels like saying we’ll try our best, but with no reason to expect that their best will do, especially when it is badly needed. This is especially true in 6.3.4 where they want resource efficient oversight.

I don’t know that any of these problems are unsolvable. There are some techniques I would try that I didn’t spot here, such as using OOMs more inference compute on the oversight and judge than you do on the models being overseen and judged.

Another control attempt is to limit the access and context each model or copy of a model gets, on top of the monitoring. Sandboxing, insider controls, least privilege, and so on; see 6.4. Again, it’s defense-in-depth, so it presumably doesn’t hurt, but I very much expect this to abandon you in your hour of need. You should also assume that copies of a sufficiently capable model can communicate and coordinate in ways that are invisible to you, even if you don’t know what they are, if their outputs impact the outside world in any way.

They also mention they’re working on interpretability, which is good but does not answer the key questions here. In 6.6.2 they discuss how they might use interpretability. I definitely got nervous that a lot of what they plan on doing is at least toying with using The Most Forbidden Technique.

Section 6.5.1 discusses training targets. What should we want out of our AI?

They start with corrigibility. They hope that there is a clear distinction we can make here about what counts as corrigible, because that means the concept will generalize. So what Google is looking for here is a highly generalized form of corrigibility, not the limited ‘let it get turned off.’

While corrigibility is a straightforwardly desirable property currently, it is possible that in the future it will trade off with the desire to deploy agents autonomously.

I agree that corrigibility is important and good, actually. There was a time I would not have bothered to say that, but in the alignment faking discussions there were a lot of people who said corrigibility is bad. I want to affirm I strongly believe it is good.

The conflict with autonomy is real. In some cases, there are tactical reasons to not want an AI to be so easily dissuaded or deflected. This includes when you want to signal that the AI is like a smart contract, that you cannot choose to shut it down. That’s one large source of risk and issues going forward.

Corrigibility need not arise by default, and indeed we may expect that it does not arise by default due to the instrumental subgoal of utility preservation (Omohundro, 2008). Specifically, an agent may take steps to reduce the developer or user’s influence because the developer or user’s changes might reduce the agent’s expected ability to achieve its current goal. Recent demonstrations of alignment faking (Greenblatt et al., 2024a) lend credence to this concern.

Indeed. How does this interact with the idea that misalignment must always be either specification gaming (SG) or goal misgeneralization (GMG)? It means part of specifying your goal will need to be specifying corrigibility, in a way that actually creates and preserves it.

They then discuss how they want the AI to handle user interactions and how this allows the user to monitor the AI’s actions, sure, sure. This seems largely like a handwave at a much longer more detailed discussion, similar to OpenAI’s model spec, but the point that the check-ins also serve as defense against not only mistakes but misalignment is well-taken. Although when the misalignment is sufficiently severe, the AI can presumably be misaligned in that interaction too.

The third discussion is around ‘biases towards misalignment’ from pretraining. I think at least some things like this are better thought of as mistakes than misalignment, and the failure to do so is actually a policy risk, as this gives certain types the impression that ‘alignment’ and ‘safety’ are some kind of woke thing. Google has been especially anti-helpful here.

Then things get super interesting:

There is a significant amount of content on the Internet (and thus in pretraining corpora) that speculates that AI will be hard to align. This data may induce a self-fulfilling prophecy via out of context reasoning: that is, an AI system would learn the declarative “knowledge” that powerful AI systems tend to be misaligned, leading them to then act in accordance with that expectation (Turner, 2025).

Hu et al. (2025) support this concern, finding that training on documents about reward hacking induces reward hacking.

This suggests a simple intervention: filter out “AI doom” data from training corpora, or otherwise suppress the bias towards misalignment. Many of the capability suppression techniques discussed in Section 5.3.3 can be applied here, including data filtering, grading routing (Cloud et al., 2024), LoRA updates for memory erasure (Gandikota et al., 2024), or methods based on model internals (Zou et al., 2024).

As in the misuse case, this may come at a performance cost to general AI capabilities. This may especially harm the ability of AI systems to assist with alignment research, which could be a large cost, given the need for AI assistance for alignment research to keep up in the event of significant acceleration (Section 3.4).

This could be addressed by training a variant of the AI system with heightened safety measures that can be used specifically for assistance with alignment research.

Generating a ‘hole in the world’ by hiding that data has its own risks, especially as the AI gets clever enough to realize there is a hole there. And all of this talk about alignment and risk has its advantages as well, if you can get to a good enough place to make the model work towards avoiding the associated problems. The ways of solving this I find most promising involve the model knowing about the failure modes exactly so it can assist with steering clear of them.
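
For what it’s worth, the intervention they describe is mechanically simple. A toy version might look like the sketch below, with a made-up keyword list and a hypothetical classifier (nothing here is what Google says it would use), and it makes the ‘hole in the world’ concern easy to see: the filter itself defines a conspicuous absence.

```python
# Toy sketch of filtering "AI doom" content out of a pretraining corpus.
# The keyword list and `doom_classifier` are illustrative placeholders.

DOOM_KEYWORDS = (
    "treacherous turn",
    "deceptive alignment",
    "reward hacking",
    "paperclip maximizer",
)

def filter_pretraining_docs(docs, doom_classifier=None, threshold=0.9):
    """Drop documents that trip a keyword match or score high on a
    (hypothetical) classifier estimating P(doc is AI-doom speculation)."""
    kept = []
    for doc in docs:
        text = doc.lower()
        if any(kw in text for kw in DOOM_KEYWORDS):
            continue  # drop on keyword hit
        if doom_classifier is not None and doom_classifier(doc) > threshold:
            continue  # drop on classifier hit
        kept.append(doc)
    return kept
```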

Another proposal they have in 6.5.3 is limited optimization. Perhaps if you don’t let the AI sufficiently optimize, it won’t be able to specification game? What if you gave it a limited amount of optimization, certainty or planning horizon? Could you use approval feedback instead of being too results oriented?

Limited optimization means making the AI dumber on purpose, which seems like it’s definitely not a long-term plan in general. The question is, can you make it dumber in a particular narrow way that cuts off the opportunity to do the things you want to avoid? You can try, but I have a strong prior that this is mostly going to be both expensive to do and, even then, often not going to work.

The whole thing is based both on hiding context or ultimate goals from the AI in question while still getting it to do what you need, and also on the assumption that the specification gaming will be differentially shut down here because it’s less robust or harder to find. That is definitely not always the case. As a practical example, one thing you learn from speed runs is that many of them are (if you’re not trying to get the exact world record) dramatically easier than playing the game as intended. You’re driving a sword through the knot, and ignoring a lot of the difficulty. Robustness can vary quite a lot.
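
To make ‘limited optimization’ concrete, here is one standard way to cash it out, a quantilizer-style selector. This is my illustrative example rather than anything the paper commits to, and the `score` function and candidate actions are hypothetical.

```python
import random

def quantilize(candidate_actions, score, q=0.1):
    """Pick an action from the top q fraction of candidates (assumed to be
    sampled from some trusted base policy) instead of taking the outright
    argmax. q=1.0 just imitates the base policy; smaller q optimizes harder
    and gives up more of the safety margin.

    score: callable(action) -> estimated value (hypothetical)."""
    ranked = sorted(candidate_actions, key=score, reverse=True)
    k = max(1, int(len(ranked) * q))
    return random.choice(ranked[:k])


# With q=0.7 and three candidates, the selector samples between the top two,
# so the highest-scoring "exploit" is no longer guaranteed to be chosen.
scores = {"walk": 1, "run": 2, "glitch through wall": 9}
print(quantilize(list(scores), score=scores.get, q=0.7))
```

In practice you would have to tune q against how exploitable your score function is, which is exactly the ‘is the specification gaming actually less robust’ bet described above.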

How do we get ‘alignment assurance’?

The plan is to start with alignment stress tests via red teams, plus control evaluations. They discuss the stress tests in 6.7. In the control evaluation, the red team gets to use fine tuning. These are good tests worth running, and the discussions are excellent. I don’t expect them to stay reliable at avoiding false negatives, but the examples here make clear that they are very good at showing many things that won’t work.

The ultimate goal, as mentioned above, is to make an alignment safety case, via some combination of inability, supervision, incentives and understanding. There are a lot of good notes here about how difficult each of these is and what you need to get right even in the summary, and then a longer discussion in 6.8.

The problem is that they correctly expect inability to stop being a good case soon, and then the others get a lot trickier. I essentially don’t buy the supervision case unless the supervisor is far enough ahead that the safety case you need is for the supervisor. Understanding on the level of a safety case feels like a giant ‘good luck.’

Incentives is a cool idea for a safety case, but I don’t think that works either. Appealing to the training process and saying ‘it is pursuing the intended goal and thus should not be misaligned’ seems like essentially wishful thinking when dealing with highly capable models. You know what you intended the goal to be, congratulations. What makes you think the AI sees it that way? What makes you think you are going to like the way they make that happen?

Google is intentionally not talking here about how it intends to solve mistakes.

If we are confining ourselves to the AI’s mistakes, the obvious response is this is straightforwardly a Skill Issue, and that they are working on it.

I would respond it is not that simple, and that for a long time there will indeed be increasingly important mistakes made and we need a plan to deal with that. But it’s totally fine to put that beyond scope here, and I thank Google for pointing this out.

They briefly discuss in 4.3 what mistakes most worry them, which are military applications where there is pressure to deploy quickly and development of harmful technologies (is that misuse?). They advise using ordinary precautions like you would for any other new technology. Which by today’s standards would be a considerable improvement.

Google’s plan also does not address structural risks, such as the existential risk of gradual disempowerment.

Similarly, we expect that as a structural risk, passive loss of control or gradual disempowerment (Kulveit et al., 2025) will require a bespoke approach, which we set out of scope for this paper.

In short: A world with many ASIs and ASI (artificial superintelligence) agents would, due to such dynamics, by default not have a place for humans to make decisions for very long, and then not have a place for humans to exist for very long.

Each ASI mostly doing what the user asks it to do, and abiding properly by the spirit of all our requests at all levels, even if you exclude actions that cause direct harm, does not get you out of this. Solving alignment is necessary but not sufficient.

And that’s far from the only such problem. If you want to set up a future equilibrium that includes and is good for humans, you have to first solve alignment, and then engineer that equilibrium into being.

More mundanely, the moment there are two agents interacting or competing, you can get into all sorts of illegal, unethical or harmful shenanigans or unhealthy dynamics, without any particular person or AI being obviously ‘to blame.’

Tragedies of the commons, and negative externalities, and reducing the levels of friction within systems in ways that break the relevant incentives, are the most obvious mundane failures here, and can also scale up to catastrophic or even existential (e.g. if each instance of each individual AI inflicts tiny ecological damage on the margin, or burns some exhaustible vital resource, this can end with the Earth uninhabitable). I’d have liked to see better mentions of these styles of problems.

Google does explicitly mention ‘race dynamics’ and the resulting dangers in its call for governance, in the summary. In the full discussion in 4.4, they talk about individual risks like undermining our sense of achievement, distraction from genuine pursuits and loss of trust, which seem like mistake or misuse issues. Then they talk about societal or global scale issues, starting with gradual disempowerment, then discussing ‘misinformation’ issues (again that sounds like misuse?), value lock-in and the ethical treatment of AI systems, and potential problems with offense-defense balance.

Again, Google is doing the virtuous thing of explicitly saying, at least in the context of this document: Not My Department.


On Google’s Safety Plan Read More »

wheel-of-time-recap:-the-show-nails-one-of-the-books’-biggest-and-bestest-battles

Wheel of Time recap: The show nails one of the books’ biggest and bestest battles

Andrew Cunningham and Lee Hutchinson have spent decades of their lives with Robert Jordan and Brandon Sanderson’s Wheel of Time books, and they previously brought that knowledge to bear as they recapped each first season episode and second season episode of Amazon’s WoT TV series. Now we’re back in the saddle for season 3—along with insights, jokes, and the occasional wild theory.

These recaps won’t cover every element of every episode, but they will contain major spoilers for the show and the book series. We’ll do our best to not spoil major future events from the books, but there’s always the danger that something might slip out. If you want to stay completely unspoiled and haven’t read the books, these recaps aren’t for you.

New episodes of The Wheel of Time season 3 will be posted for Amazon Prime subscribers every Thursday. This write-up covers episode seven, “Goldeneyes,” which was released on April 10.

Lee: Welcome back—and that was nuts. There’s a ton to talk about—the Battle of the Two Rivers! Lord Goldeneyes!—but uh, I feel like there’s something massive we need to address right from the jump, so to speak: LOIAL! NOOOOOOOOOO!!!! That was some out-of-left-field Game of Thrones-ing right there. My wife and I have both been frantically talking about how Loial’s death might or might not change the shape of things to come. What do you think—is everybody’s favorite Ogier dead-dead, or is this just a fake-out?

Image of Loial

NOOOOOOOOO

Credit: Prime/Amazon MGM Studios


Andrew: Standard sci-fi/fantasy storytelling rules apply here as far as I’m concerned—if you don’t see a corpse, they can always reappear (cf. Thom Merrillin, The Wheel of Time season three, episode six).

For example! When the Cauthon sisters fricassee Eamon Valda to avenge their mother and Alanna laughs joyfully at the sight of his charred corpse? That’s a death you ain’t coming back from.

Even assuming that Loial’s plot armor has fallen off, the way we’ve seen the show shift and consolidate storylines means it’s impossible to say how the presence or absence of one character or another could ripple outward. This episode alone introduces a bunch of fairly major shifts that could play out in unpredictable ways next season.

But let’s back up! The show takes a break from its usual hopping and skipping to focus entirely on one plot thread this week: Perrin’s adventures in the Two Rivers. This is a Big Book Moment; how do you think it landed?

Image of Padan Fain.

Fain seems to be leading the combined Darkfriend/Trolloc army.

Credit: Prime/Amazon MGM Studios


Lee: I would call the Battle of the Two Rivers one of the most important events that happens in the front half of the series. It is certainly a defining moment for Perrin’s character, where he grows up and becomes a Man-with-a-capital-M. It is possibly done better in the books, but only because the book has the advantage of being staged in our imaginations; I’ll always see it as bigger and more impactful than anything a show or movie could give us.

Though it was a hell of a battle, yeah. The improvements in pulling off large set pieces continue to scale from season to season—comparing this battle to the Bel Tine fight back in the first bits of season one shows not just better visual effects or whatever, but just flat-out better composition and clearer storytelling. The show continues to prove that it has found its footing.

Did the reprise of the Manetheren song work for you? This has been sticky for me—I want to like it. I see what the writers are trying to do, and I see how “this is a song we all just kind of grew up singing” is given new meaning when it springs from characters’ bloody lips on the battlefield. But it just… doesn’t work for me. It makes me feel cringey, and I wish it didn’t. It’s probably the only bit in the entire episode that I felt was a swing and a miss.

Image of the battle of the Two Rivers

Darkfriends and Trollocs pour into Emond’s Field.


Andrew: Forgive me in advance for what I think is about to be a short essay but it is worth talking about when evaluating the show as an adaptation of the original work.

Part of the point of the Two Rivers section in The Shadow Rising is that it helps to back up something we’ve seen in our Two Rivers expats over the course of the first books in the series—that there is a hidden strength in this mostly-ignored backwater of Randland.

To the extent that the books are concerned with Themes, the two big overarching ones are that strength and resilience come from unexpected places and that heroism is what happens when regular, flawed, scared people step up and Do What Needs To Be Done under terrible circumstances. (This is pure Tolkien, and that’s the difference between The Wheel of Time and A Song of Ice and Fire: WoT wants to build on LotR‘s themes, and ASoIaF is mainly focused on subverting them.)

But to get back to what didn’t work for you about this, the strength of the Two Rivers is meant to be more impressive and unexpected because these people all view themselves, mostly, as quiet farmers and hunters, not as the exiled heirs to some legendary kingdom (a la Malkier). They don’t go around singing songs about How Virtuous And Bold Was Manetheren Of Old, or whatever. Manetheren is as distant to them as the Roman Empire, and those stories don’t put food on the table.

So yeah, it worked for me as an in-the-moment plot device. The show had already played the “Perrin Rallies His Homeland With A Rousing Speech” card once or twice, and you want to mix things up. I doubt it was even a blip for non-book-readers. But it is a case, as with the Cauthon sisters’ Healing talents, where the show has to take what feels like too short a shortcut.

Lee: That’s a good set of points, yeah. And I don’t hate it—it’s just not the way I would have done it. (Though, hah, that’s a terribly easy thing to say from behind the keyboard here, without having to own the actual creative responsibility of dragging this story into the light.)

In amongst the big moments were a bunch of nice little character bits, too—the kinds of things that keep me coming back to the show. Perrin’s glowering, teeth-gritted exchange with Whitecloak commander Dain Bornhald was great, though my favorite bit was the almost-throwaway moment where Perrin catches up with the Cauthon sisters and gives them an update on Mat. The two kids absolutely kill it, transforming from sober and traumatized young people into giggling little sisters immediately at the sight of their older brother’s sketch. Not even blowing the Horn of Valere can save you from being made fun of by your sisters. (The other thing that scene highlighted was that Perrin, seated, is about the same height as Faile standing. She’s tiny!)

We also close the loop a bit on the Tinkers, who, after being present in flashback a couple of episodes ago, finally show back up on screen—complete with Aram, who has somewhat of a troubling role in the books. The guy seems to have a destiny that will take him away from his family, and that destiny grabs firmly ahold of him here.

Image of Perrin, Faile, and the Cauthon sisters

Perrin is tall.

Credit: Prime/Amazon MGM Studios


Andrew: Yeah, I think the show is leaving the door open for Aram to have a happier ending than he has in the books, where being ejected from his own community makes him single-mindedly obsessed with protecting Perrin in a way that eventually curdles. Here, he might at least find community among good Two Rivers folk. We’ll see.

The entire Whitecloak subplot is something that stretches out interminably in the books, as many side-plots do. Valda lasts until Book 11 (!). Dain Bornhald holds his grudge against Perrin (still unresolved here, but on a path toward resolution) until Book 14. The show has jumped around before, but I think this is the first time we’ve seen it pull something forward from that late, which it almost certainly needs to do more of if it hopes to get to the end in whatever time is allotted to it (we’re still waiting for a season 4 renewal).

Lee: Part of that, I think, is the Zeno’s Paradox-esque time-stretching that occurs as the series gets further on—we’ll keep this free of specific spoilers, of course, but it’s not really a spoiler to say that as the books go on, less time passes per book. My unrefreshed off-the-top-of-my-head recollection is that there are, like, four, possibly five, books—written across almost a decade of real time—that cover like a month or two of in-universe time passing.

This gets into the area of time that book readers commonly refer to as “The Slog,” which slogs at maximum slogginess around book 10 (which basically retreads all the events of book nine and shows us what all the second-string characters were up to while the starting players were off doing big world-changing things). Without doing any more criticizing than the implicit criticizing I’ve already done, The Slog is something I’m hoping that the show obviates or otherwise does away with, and I think we’re seeing the ways in which such slogginess will be shed.

There are a few other things to wrap up here, I think, but this episode being so focused on a giant battle—and doing that battle well!—doesn’t leave us with a tremendous amount to recap. Do we want to get into Bain and Chiad trying to steal kisses from Loial? It’s not in the book—at least, I don’t think it was!—but it feels 100 percent in character for all involved. (Loial, of course, would never kiss outside of marriage.)

Image of Loial, Bain, and Chiad

A calm moment before battle.

Credit: Prime/Amazon MGM Studios


Andrew: All the Bain and Chiad in this episode is great—I appreciate when the show decides to subtitle the Maiden Of The Spear hand-talk and when it lets context and facial expressions convey the meaning. All of the Alanna/Maksim stuff is great. Alanna calling in a storm that rains spikes of ice on all their enemies is cool. Daise Congar throwing away her flask after touching the One Power for the first time was a weird vaudevillian comic beat that still made me laugh (and you do get a bit more, in here, that shows why people who haven’t formally learned how to channel generally shouldn’t try it). There’s a thread in the books where everyone in the Two Rivers starts referring to Perrin as a lord, which he hates and which is deployed a whole bunch of times here.

I find myself starting each of these episodes by taking fairly detailed notes, and by the middle of the episode I catch myself having not written anything for minutes at a time because I am just enjoying watching the show. On the topic of structure and pacing, I will say that these episodes that make time to focus on a single thread also make more room for quiet character moments. On the rare occasions that we get a less-than-frenetic episode I just wish we could have more of them.

Lee: I find that I’m running out of things to say here—not because this episode is lacking, but because like an arrow loosed from a Two Rivers longbow, this episode hurtles us toward the upcoming season finale. We’ve swept the board clean of all the Perrin stuff, and I don’t believe we’re going to get any more of it next week. Next week—and at least so far, I haven’t cheated and watched the final screener!—feels like we’re going to resolve Tanchico and, more importantly, Rand’s situation out in the Aiel Waste.

But Loial’s unexpected death (if indeed death it was) gives me pause. Are we simply killing folks off left and right, Game of Thrones style? Has certain characters’ plot armor been removed? Are, shall we say, alternative solutions to old narrative problems suddenly on the table in this new turning of the Wheel?

I’m excited to see where this takes us—though I truly hope we’re not going to have to say goodbye to anyone else who matters.

Closing thoughts, Andrew? Any moments you’d like to see? Things you’re afraid of?

Image of Perrin captured

Perrin being led off by Bornhald. Things didn’t exactly work out like this in the book!

Credit: Prime/Amazon MGM Studios


Andrew: For better or worse, Game of Thrones did help to create this reality where Who Dies This Week? was a major driver of the cultural conversation and the main reason to stay caught up. I’ll never forget having the Red Wedding casually ruined for me by another Ars staffer because I was a next-day watcher and not a day-of GoT viewer.

One way to keep the perspectives and plotlines from endlessly proliferating and recreating The Slog is simply to kill some of those people so they can’t be around to slow things down. I am not saying one way or the other whether I think that’s actually a series wrap on Loial, Son Of Arent, Son Of Halan, May His Name Sing In Our Ears, but we do probably have to come to terms with the fact that not all fan-favorite secondary Wheel of Time characters are going to make it to the end.

As for fears, mainly I’m afraid of not getting another season at this point. The show is getting good enough at showing me big book moments that now I want to see a few more of them, y’know? But Economic Uncertainty + Huge Cast + International Shooting Locations + No More Unlimited Cash For Streaming Shows feels like an equation that is eventually going to stop adding up for this production. I really hope I’m wrong! But who am I to question the turning of the Wheel?

Credit: WoT Wiki

Wheel of Time recap: The show nails one of the books’ biggest and bestest battles Read More »

the-trek-madone-slr-9-axs-gen-8-tears-up-the-roads-and-conquers-climbs

The Trek Madone SLR 9 AXS Gen 8 tears up the roads and conquers climbs


Trek’s top-of-the-line performance road bike offers some surprises.

The Madone SLR 9 Gen 8 AXS with Lake Michigan in the background on a brisk morning ride. Credit: Eric Bangeman

When a cyclist sees the Trek Madone SLR 9 AXS Gen 8 for the first time, the following thoughts run through their head, usually in this order:

“What a beautiful bike.”

“Damn, that looks really fast.”

“The owner of this bike is extremely serious about cycling and has a very generous budget for fitness gear.”

Indeed, almost every conversation I had while out and about on the Madone started and ended with the bike’s looks and price tag. And for good reason.

A shiny bike

Credit: Eric Bangeman

Let’s get the obvious out of the way. This is an expensive and very high-tech bike, retailing at $15,999. Part of the price tag is the technology—this is a bicycle that rides on the bleeding edge of tech. And another part is the Project One Icon “Tête de la Course” paint job on the bike; less-flashy options start at $13,499. (And if $15,999 doesn’t break your budget, there’s an even fancier Icon “Stellar” paint scheme for an extra $1,000.) That’s a pretty penny but not an unusual price point in the world of high-end road bikes. If you’re shopping for, say, a Cervélo S5 or Specialized S-Works Tarmac SL8, you’ll see the same price tags.

Madone is Trek’s performance-oriented road bike, and the Gen 8 is the latest and greatest from the Wisconsin-based bike manufacturer. It’s more aerodynamic than the Gen 7 (with a pair of aero water bottles) and only slightly weightier than Trek’s recently discontinued Emonda climbing-focused bike.

I put nearly 1,000 miles on the Gen 8 Madone over a two-month period, riding it on the roads around Chicagoland. Yes, the land around here is pretty flat, but out to the northwest there are some nice rollers, including a couple of short climbs with grades approaching 10 percent. Those climbs gave me a sense of the Madone’s ability on hills.

Trek manufactures the Gen 8 Madone out of its 900 series OCLV carbon, and at 15.54 lb (7.05 kg)—just a hair over UCI’s minimum weight for racing bikes—the bike is 320 g lighter than the Gen 7. But high-tech bikes aren’t just about lightweight carbon and expensive groupsets. Even the water bottles matter. During the development of the Gen 8 Madone, Trek realized the water bottles were nearly as important as the frame when it came to squeezing out every last possible aerodynamic gain.

Perhaps the most obvious bit of aerodynamic styling is the diamond-shaped seat tube cutout. That cutout allows the seat tube to flex slightly on rougher pavement while cutting back on lateral flex. It’s slightly smaller than on the Gen 7 Madone, and it looks odd, but it contributes to a surprisingly compliant ride quality.

For the wheelset, Trek has gone with the Aeolus RSL for the Madone SLR 9. The tubeless-ready wheels offer a 51 mm rim depth and can handle a max tire size of 32 mm. Those wheels are paired with a set of 28 mm Bontrager Aeolus RSL TLR road tires. About four weeks into my testing, the rear tire developed what looked like a boil along one of the seams near the edge of the tire. Trek confirmed it was a manufacturing defect that occurred with a batch of tires due to a humidity-control issue within the factory, so affected tires should be out of stores by now.

Cockpit shot

No wires coming off the integrated handlebar and stem.

Credit: Eric Bangeman


You’ll pilot the Madone with Trek’s new one-piece Aero RSL handlebar and stem combo. It’s a stiff cockpit setup, but I found it comfortable enough even on 80-plus-mile rides. Visually, it’s sleek-looking with a complete absence of wires (and the handlebar-stem combo can only be used with electronic groupsets). The downside is that there’s not enough clearance for a Garmin bike computer with a standard mount; I had to use a $70 K-Edge mount for my Garmin.

The Gen 8 Madone also replaces Trek’s Emonda lineup of climbing-focused bikes. Despite weighing 36 grams more than the Emonda SLR 9, Trek claims the Gen 8 Madone has an 11.3 W edge over the climbing bike at 22 mph (and a more modest 0.1 W improvement over the Gen 7 Madone at the same speed).

Of climbs and hero pulls

Paint job

The Tête de la Course colorway in iridescent mode.

Credit: Eric Bangeman


The first time I rode the Madone SLR 9 Gen 8 on my usual lunchtime route, I set a personal record. I wasn’t shooting for a new PR—it just sort of happened while I was putting the bike through its paces to see what it was capable of. It turns out it’s capable of a lot.

Riding feels almost effortless. The Madone’s outstanding SRAM Red AXS groupset contributes to that free-and-easy feeling. Shifting through the 12-speed 10-33 cassette is both slick and quick, perfect for when you really want to get to a higher gear in order to drop the hammer. At the front of the drivetrain is a 172.5 mm crank paired with 48t/35t chainrings, more than adequate for everything the local roads were able to confront me with. I felt faster on the flats and quicker through the corners, which led to more than a couple of hero pulls on group rides. The Madone also has a power meter, so you know exactly how many watts you cranked out on your rides.

There’s no derailleur hanger on the Gen 8 Madone, which opens the door to the SRAM Red XPLR groupset.

Credit: Eric Bangeman


There’s also a nice bit of future-proofing with the Madone. Lidl-Trek has been riding some of the cobbled classics with the SRAM Red XPLR AXS groupset, a 13-speed gravel drivetrain that doesn’t need a derailleur hanger. Danish all-arounder Mads Pedersen rode a Madone SLR 9 Gen 8 with a single 56t chainring up front, paired with the Red XPLR to victory at Gent-Wevelgem at the end of March. So if you want to spend another thousand or so on your dream bike setup, that’s an option, as the Madone SLR 9 Gen 8 is one of the few high-performance road bikes that currently supports this groupset.

Living in northeastern Illinois, I lacked opportunities to try the new Madone on extended climbs. Traversing the rollers in the far northwestern suburbs of Chicago, however, the bike’s utility on climbs was apparent. Compared to my usual ride, an endurance-focused road bike, I felt like I was getting the first few seconds of a climb for free. The Madone felt lightweight, nimble, and responsive each time I hit an ascent.

What surprised me the most about the Madone was its performance on long rides. I went into testing with the assumption that I would be trading speed for comfort—and I was happy to be proven wrong. The combination of Trek’s aerodynamic frame design (which it calls IsoFlow), carbon wheelset, and tubeless tires really makes a difference on uneven pavement; there was almost no trade-off between pace and comfort.

What didn’t I like? The water bottles, mainly. My review bike came equipped with a pair of Trek RSL Aero water bottles, which fit in a specially designed cage. Trek says the bottles offer 1.8 W of savings at 22 mph compared to round bottles. That’s not worth it to me. The bottles hold less (~650 ml) than a regular water bottle and are irritating to fill, and getting them in and out of the RSL Aero cages takes a bit of awareness during the first few rides. Thankfully, you don’t need to use the aero bottles; normal cylindrical water bottles work just fine.

The price bears mentioning again. This is an expensive bike! If your cycling budget is massive and you want every last bit of aerodynamic benefit and weight savings, get the SLR 9 with your favorite paint job. Drop down to the Madone SLR 7, and you get the same frame with a Shimano Ultegra Di2 groupset, 52t/36t crank, and a 12-speed 11-30 cassette for $7,000 less than this SLR 9. The SL 7, with its 500 Series OCLV carbon frame (about 250 grams heavier), different handlebars and fork, and the same Ultegra Di2 groupset as the SLR 7, is $2,500 cheaper still.

In conceiving the Gen 8 Madone, Trek prioritized aerodynamic performance and weight savings over all else. The result is a resounding, if expensive, success. The color-shifting Project One paint job is a treat for the eyes, as is the $13,499 Team Replica colorway—the same one seen on Lidl-Trek’s bikes on the UCI World Tour.

At the end of the day, though, looks come a distant second to performance. And with the Gen 8 Madone, performance is the winner by a mile. Trek has managed to take a fast, aerodynamic road bike and make it faster and more aerodynamic without sacrificing compliance. The result is a technological marvel—not to mention a very expensive bike—that is amazing to ride.

Let me put it another way—the Madone made me feel like a boss on the roads. My daily driver is no slouch—a 5-year-old endurance bike with SRAM Red, a Reserve Turbulent Aero 49/42 wheelset, and Continental GP5000s, which I dearly love. But during my two-plus months with the Madone, I didn’t miss my bike at all. I was instead fixated on riding the Madone, dreaming of long rides and new PRs. That’s the way it should be.

Photo of Eric Bangeman

Eric Bangeman is the Managing Editor of Ars Technica. In addition to overseeing the daily operations at Ars, Eric also manages story development for the Policy and Automotive sections. He lives in the northwest suburbs of Chicago, where he enjoys cycling and playing the bass.

The Trek Madone SLR 9 AXS Gen 8 tears up the roads and conquers climbs Read More »

“what-the-hell-are-you-doing?”-how-i-learned-to-interview-astronauts,-scientists,-and-billionaires

“What the hell are you doing?” How I learned to interview astronauts, scientists, and billionaires


The best part about journalism is not collecting information. It’s sharing it.

Inside NASA's rare Moon rocks vault (2016)

Sometimes the best place to do an interview is in a clean room. Credit: Lee Hutchinson


I recently wrote a story about the wild ride of the Starliner spacecraft to the International Space Station last summer. It was based largely on an interview with the commander of the mission, NASA astronaut Butch Wilmore.

His account of Starliner’s thruster failures—and his desperate efforts to keep the vehicle flying on course—was riveting. In the aftermath of the story, many readers, people on social media, and real-life friends congratulated me on conducting a great interview. But truth be told, it was pretty much all Wilmore.

Essentially, when I came into the room, he was primed to talk. I’m not sure if Wilmore was waiting for me specifically, but he pretty clearly wanted to speak with someone about his experiences aboard the Starliner spacecraft. And he chose me.

So was it luck? I’ve been thinking about that. As an interviewer, I certainly don’t have the emotive power of some of the great television interviewers, who are masters of confrontation and drama. It’s my nature to avoid confrontation where possible. But what I do have on my side is experience, more than 25 years now, as well as preparation. I am also genuinely and completely interested in space. And as it happens, these values are important, too.

Interviewing is a craft one does not pick up overnight. During my career, I have had some funny, instructive, and embarrassing moments. Without wanting to seem pretentious or self-indulgent, I thought it might be fun to share some of those stories so you can really understand what it’s like on a reporter’s side of the cassette tape.

March 2003: Stephen Hawking

I had only been working professionally as a reporter at the Houston Chronicle for a few years (and as the newspaper’s science writer for less time still) when the opportunity to interview Stephen Hawking fell into my lap.

What a coup! He was only the world’s most famous living scientist, and he was visiting Texas at the invitation of a local billionaire named George Mitchell. A wildcatter and oilman, Mitchell had grown up in Galveston along the upper Texas coast, marveling at the stars as a kid. He studied petroleum engineering and later developed the controversial practice of fracking. In his later years, Mitchell spent some of his largesse on the pursuits of his youth, including astronomy and astrophysics. This included bringing Hawking to Texas more than half a dozen times in the 1990s and early 2000s.

For an interview with Hawking, one submitted questions in advance. That’s because Hawking was afflicted with Lou Gehrig’s disease and lost the ability to speak in 1985. A computer attached to his wheelchair cycled through letters and sounds, and Hawking clicked a button to make a selection, forming words and then sentences, which were sent to a voice synthesizer. For unprepared responses, it took a few minutes to form a single sentence.

George Mitchell and Stephen Hawking during a Texas visit.

Credit: Texas A&M University


What to ask him? I had a decent understanding of astronomy, having majored in it as an undergraduate. But the readership of a metro newspaper was not interested in the Hubble constant or the Schwarzschild radius. I asked him about recent discoveries about the cosmic microwave background radiation anyway. Perhaps the most enduring response was about the war in Iraq, a prominent topic of the day. “It will be far more difficult to get out of Iraq than to get in,” he said. He was right.

When I met him at Texas A&M University, Hawking was gracious and polite. He answered a couple of questions in person. But truly, it was awkward. Hawking’s time on Earth was limited and his health failing, so it required an age to tap out even short answers. I can only imagine his frustration at the task of communication, which the vast majority of humans take for granted, especially because he had such a brilliant mind and so many deep ideas to share. And here I was, with my banal questions, stealing his time. As I stood there, I wondered whether I should stare at him while he composed a response. Should I look away? I felt truly unworthy.

In the end, it was fine. I even met Hawking a few more times, including at a memorable dinner at Mitchell’s ranch north of Houston, which spans tens of thousands of acres. A handful of the world’s most brilliant theoretical physicists were there. We would all be sitting around chatting, and Hawking would periodically chime in with a response to something brought up earlier. Later on that evening, Mitchell and Hawking took a chariot ride around the grounds. I wonder what they talked about?

Spring 2011: Jane Goodall and Sylvia Earle

By this point, I had written about science for nearly a decade at the Chronicle. In the early part of the year, I had the opportunity to interview noted chimpanzee scientist Jane Goodall and one of the world’s leading oceanographers, Sylvia Earle. Both were coming to Houston to talk about their research and their passion for conservation.

I spoke with Goodall by phone in advance of her visit, and she was so pleasant, so regal. By then, Goodall was 76 years old and had been studying chimpanzees in Gombe Stream National Park in Tanzania for five decades. Looking back over the questions I asked, they’re not bad. They’re just pretty basic. She gave great answers regardless. But there is only so much chemistry you can build with a person over the telephone (or Zoom, for that matter, these days). Being in person really matters in interviewing because you can read cues, and it’s easier to know when to let a pause go. The comfort level is higher. When you’re speaking with someone you don’t know that well, establishing a basic level of comfort is essential to making an all-important connection.

A couple of months later, I spoke with Earle in person at the Houston Museum of Natural Science. I took my older daughter, then 9 years old, because I wanted her to hear Earle speak later in the evening. This turned out to be a lucky move for a couple of different reasons. First, my kid was inspired by Earle to pursue studies in marine biology. And more immediately, the presence of a curious 9-year-old quickly warmed Earle to the interview. We had a great discussion about many things beyond just oceanography.

President Barack Obama talks with Dr. Sylvia Earle during a visit to Midway Atoll on September 1, 2016. Credit: Barack Obama Presidential Library

The bottom line is that I remained a fairly pedestrian interviewer back in 2011. That was partly because I did not have deep expertise in chimpanzees or oceanography. And that leads me to another key to a good interview: establishing a rapport. It's great if a person already knows you, but even if they don't, you can overcome that by showing genuine interest or demonstrating your deep knowledge about a subject. I would come to learn this as I started to cover space more exclusively and got to know the industry and its key players better.

September 2014: Scott Kelly

To be clear, this was not much of an interview. But it is a fun story.

I spent much of 2014 focused on space for the Houston Chronicle. I pitched the idea of an in-depth series on the sorry state of NASA’s human spaceflight program, which was eventually titled “Adrift.” By immersing myself in spaceflight for months on end, I discovered a passion for the topic and knew that writing about space was what I wanted to do for the rest of my life. I was 40 years old, so it was high time I found my calling.

As part of the series, I traveled to Kazakhstan with a photographer from the Chronicle, Smiley Pool. He is a wonderful guy with a gift for chatting up sources that I, an introvert, lack. During the 13-day trip to Russia and Kazakhstan, we traveled with a reporter from Esquire named Chris Jones, who was working on a long project about NASA astronaut Scott Kelly. Kelly was then training for a yearlong mission to the International Space Station, and he was a big deal.

Jones was a tremendous raconteur and an even better writer—his words, my goodness. We had so much fun over those two weeks, sharing beer, vodka, and Kazakh food. The capstone of the trip was seeing the Soyuz TMA-14M mission launch from the Baikonur Cosmodrome. Kelly was NASA's backup astronaut for the flight, so he was in quarantine alongside the mission's primary astronaut. (This was Butch Wilmore, as it turns out.) The launch, viewed from a little more than a kilometer away, remains the most spectacular moment of spaceflight I've ever observed in person. Like, holy hell, the rocket was right on top of you.

Expedition 43 NASA Astronaut Scott Kelly walks from the Zvjozdnyj Hotel to the Cosmonaut Hotel for additional training, Thursday, March 19, 2015, in Baikonur, Kazakhstan. Credit: NASA/Bill Ingalls

Immediately after the launch, which took place at 1:25 am local time, Kelly was freed from quarantine. This must have been liberating because he headed straight to the bar at the Hotel Baikonur, the nicest watering hole in the small, Soviet-era town. Jones, Pool, and I were staying at a different hotel. Jones got a text from Kelly inviting us to meet him at the bar. Our NASA minders were uncomfortable with this, as the last thing they wanted was to have astronauts presented to the world as anything but sharp, sober-minded people who represent the best of the best. But this was too good to resist.

By the time we got to the bar, Kelly and his companion, the commander of his forthcoming Soyuz flight, Gennady Padalka, were several whiskeys deep. The three of us sat across from Kelly and Padalka, and as one does at 3 am in Baikonur, we started taking shots. The astronauts were swapping stories and talking out of school. At one point, Jones took out his notebook and said that he had a couple of questions. To this, Kelly responded heatedly, “What the hell are you doing?”

Not conducting an interview, apparently. We were off the record. Well, until today at least.

We drank and talked for another hour or so, and it was incredibly memorable. At the time, Kelly was probably the most famous active US astronaut, and here I was throwing down whiskey with him shortly after watching a rocket lift off from the spaceport where the Soviets had launched the Space Age nearly six decades earlier. In retrospect, this offered a good lesson that the best interviews are often not, in fact, interviews. To get the good information, you need to develop relationships with people, and you do that by talking with them person to person, without a microphone, often with alcohol.

Scott Kelly is a real one for that night.

September 2019: Elon Musk

I have spoken with Elon Musk a number of times over the years, but none was nearly so memorable as a long interview we did for my first book on SpaceX, called Liftoff. That summer, I made a couple of visits to SpaceX's headquarters in Hawthorne, California, interviewing the company's early employees and sitting in on meetings in Musk's conference room with various teams. Because SpaceX is such a closed-off company, it was fascinating to get an inside look at how the sausage was made.

It’s worth noting that this all went down a few months before the onset of the COVID-19 pandemic. In some ways, Musk is the same person he was before the outbreak. But in other ways, he is profoundly different, his actions and words far more political and polemical.

Anyway, I was supposed to interview Musk on a Friday evening at the factory at the end of one of these trips. As usual, Musk was late. Eventually, his assistant texted, saying something had come up. She was desperately sorry, but we would have to do the interview later. I returned to my hotel, downbeat. I had an early flight the next morning back to Houston. But after about an hour, the assistant messaged me again. Musk had to travel to South Texas to get the Starship program moving. Did I want to travel with him and do the interview on the plane?

As I sat on his private jet the next day, late morning, my mind swirled. It would be just Musk, his three sons (triplets, then 13 years old), two bodyguards, and me on the plane. When Musk is in a good mood, an interview can be a delight. He is funny, sharp, and a good storyteller. When Musk is in a bad mood, well, an interview is usually counterproductive. So I fretted. What if Musk was in a bad mood? It would be a super-awkward three and a half hours on the small jet.

Two Teslas drove up to the plane, the first with Musk driving his boys and the second with two security guys. Musk strode onto the jet, saw me, and said he didn't realize I was going to be on the plane. (A great start to things!) Musk then took out his phone and started a heated conversation about digging tunnels. By this point, I was willing myself to disappear. I just wanted to melt into the leather seat I was sitting in, about three feet from Musk.

So much for a good mood for the interview.

As the jet climbed, the phone conversation got worse, but then Musk lost his connection. He put away his phone and turned to me, saying he was free to talk. His mood, almost as if by magic, changed. Since we were discussing the early days of SpaceX at Kwajalein, he gathered the boys around so they could hear about what their dad had been up to back then. The interview went shockingly well, and at least part of the reason has to be that I knew the subject matter deeply, had prepared, and was passionate about it. We spoke for nearly two hours before Musk asked if he might have some time with his kids. They spent the rest of the flight playing video games, yucking it up.

April 2025: Butch Wilmore

When they’re on the record, astronauts mostly stick to a script. As a reporter, you’re just not going to get too much from them. (Off the record is a completely different story, of course, as astronauts are generally delightful, hilarious, and earnest people.)

Last week, dozens of journalists were allotted 10-minute interviews with Wilmore and, separately, Suni Williams. It was the first time they had spoken in depth with the media since their launch on Starliner and return to Earth aboard a Crew Dragon vehicle. As I waited outside Studio A at Johnson Space Center, I overheard Wilmore completing an interview with an outlet based in Tennessee, his home state. As they wrapped up, the public affairs officer told him he had just one more interview left and mentioned my name. Wilmore said something like, “Oh good, I've been waiting to talk with him.”

That was a good sign. Mine was the last interview of the day, and out of all of them, it was good to know he wanted to speak with me. The easy thing for him to do would have been to stick to “astronaut speak” for 10 minutes and then go home.

As I prepared to speak with Wilmore and Williams, I didn’t want to ask the obvious questions they’d answered many times earlier. If you ask, “What was it like to spend nine months in space when you were expecting only a short trip?” you’re going to get a boring answer. Similarly, although the end of the mission was highly politicized by the Trump White House, two veteran NASA astronauts were not going to step on that landmine.

I wanted to go back to the root cause of all this, the problems with Starliner’s propulsion system. My strategy was simply to ask what it was like to fly inside the spacecraft. Williams gave me some solid answers. But Wilmore had actually been at the controls. And he apparently had been holding in one heck of a story for nine months. Because when I asked about the launch, and then what it was like to fly Starliner, he took off without much prompting.

Butch Wilmore has flown on four spacecraft: the Space Shuttle, Soyuz, Starliner, and Crew Dragon. Credit: NASA/Emmett Given

I don’t know exactly why Wilmore shared so much with me. We are not particularly close and have never interacted outside of an official NASA setting. But he knows of my work and interest in spaceflight. Not everyone at the space agency appreciates my journalism, but they know I’m deeply interested in what they’re doing. They know I care about NASA and Johnson Space Center. So I asked Wilmore a few smart questions, and he must have trusted that I would tell his story honestly and accurately, and with appropriate context. I certainly tried my best. After a quarter of a century, I have learned well that the most sensational stories are best told without sensationalism.

Even as we spoke, I knew the interview with Wilmore was one of the best I had ever done. A great scientist once told me that the best feeling in the world is making some little discovery in a lab and for a short time knowing something about the natural world that no one else knows. The equivalent, for me, is doing an interview and knowing I’ve got gold. And for a little while, before sharing it with the world, I’ve got that little piece of gold all to myself.

But I’ll tell you what. It’s even more fun to let the cat out of the bag. The best part about journalism is not collecting information. It’s sharing that information with the world.

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

“What the hell are you doing?” How I learned to interview astronauts, scientists, and billionaires Read More »