Author name: 9u50fv

cockpit-voice-recorder-survived-fiery-philly-crash—but-stopped-taping-years-ago

Cockpit voice recorder survived fiery Philly crash—but stopped taping years ago

Cottman Avenue in northern Philadelphia is a busy but slightly down-on-its-luck urban thoroughfare that has had a strange couple of years.

You might remember the truly bizarre 2020 press conference held—for no discernible reason—at Four Seasons Total Landscaping, a half block off Cottman Avenue, where a not-yet-disbarred Rudy Giuliani led a farcical ensemble of characters in an event so weird it has been immortalized in its own, quite lengthy, Wikipedia article.

Then in 2023, a truck carrying gasoline caught fire just a block away, right where Cottman passes under I-95. The resulting fire damaged I-95 in both directions, bringing down several lanes and closing I-95 completely for some time. (This also generated a Wikipedia article.)

This year, on January 31, a little further west on Cottman, a Learjet 55 medevac flight crashed one minute after takeoff from Northeast Philadelphia Airport. The plane, fully loaded with fuel for a trip to Springfield, Missouri, came down near a local mall, clipped a commercial sign, and exploded in a fireball when it hit the ground. The crash generated a debris field 1,410 feet long and 840 feet wide, according to the National Transportation and Safety Board (NTSB), and it killed six people on the plane and one person on the ground.

The crash was important enough to attract the attention of Pennsylvania Governor Josh Shapiro and Mexican President Claudia Sheinbaum. (The airplane crew and passengers were all Mexican citizens; they were transporting a young patient who had just wrapped up treatment at a Philadelphia hospital.) And yes, it, too, generated a Wikipedia article.

NTSB has been investigating ever since, hoping to determine the cause of the accident. Tracking data showed that the flight reached an altitude of 1,650 feet before plunging to earth, but the plane’s pilots never conveyed any distress to the local air traffic control tower.

Investigators searched for the plane’s cockpit voice recorder, which might provide clues as to what was happening in the cockpit during the crash. The Learjet did have such a recorder, though it was an older, tape-based model. (Newer ones are solid-state, with fewer moving parts.) Still, even this older tech should have recorded the last 30 minutes of audio, and these units are rated to withstand impacts of 3,400 Gs and to survive fires of 1,100° Celsius (2,012° F) for a half hour. Which was important, given that the plane had both burst into flames and crashed directly into the ground.

Cockpit voice recorder survived fiery Philly crash—but stopped taping years ago Read More »

the-most-forbidden-technique

The Most Forbidden Technique

The Most Forbidden Technique is training an AI using interpretability techniques.

An AI produces a final output [X] via some method [M]. You can analyze [M] using technique [T], to learn what the AI is up to. You could train on that. Never do that.

You train on [X]. Only [X]. Never [M], never [T].

Why? Because [T] is how you figure out when the model is misbehaving.

If you train on [T], you are training the AI to obfuscate its thinking, and defeat [T]. You will rapidly lose your ability to know what is going on, in exactly the ways you most need to know what is going on.

Those bits of optimization pressure from [T] are precious. Use them wisely.

  1. New Paper Warns Against the Most Forbidden Technique.

  2. Reward Hacking Is The Default.

  3. Using CoT to Detect Reward Hacking Is Most Forbidden Technique.

  4. Not Using the Most Forbidden Technique Is Harder Than It Looks.

  5. It’s You, It’s Also the Incentives.

  6. The Most Forbidden Technique Quickly Backfires.

  7. Focus Only On What Matters.

  8. Is There a Better Way?

  9. What Might We Do Next?

OpenAI has now published an important new paper explaining and emphasizing the importance of applying this principle to the Chain of Thought (CoT).

The paper shows that current CoT-based models area already strong enough that, under sufficient optimization pressure, they learn to hide their intent within the CoT and then exhibit substantial reward hacking.

They summarize their contributions this way:

  1. We show that frontier reasoning models often perform complex reward hacks in real-world coding environments.

  2. We provide evidence that chain-of-thought monitoring can be effective in detecting the misaligned behavior of frontier reasoning models, and we show it can be far more accurate than action-only monitoring.

  3. We show that chain-of-thought monitorability is very fragile. Applying direct optimization pressure to the CoT can quickly lead to obfuscated reward hacking—where agents hide their intent while continuing to misbehave.

Or:

  1. We are already seeing complex reward hacking in the real world.

  2. Monitoring the CoT can spot this.

  3. Applying too much optimization pressure on the CoT doesn’t make the hacking go away, it instead makes your ability to detect the hacking go away.

  4. ‘Too much’ is a remarkably small amount.

The paper is excellent, laying out a very important principle to never, ever do this Most Forbidden Technique, and explaining why we must never, ever do this.

We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models of the future.

We have further found that directly optimizing the CoT to adhere to specific criteria (e.g. to not think about reward hacking) may boost performance in the short run; however, it does not eliminate all misbehavior and can cause a model to hide its intent. We hope future research will find ways to directly optimize CoTs without this drawback, but until then

We recommend against applying strong optimization pressure directly to the CoTs of frontier reasoning models, leaving CoTs unrestricted for monitoring.

Daniel Kokotajlo has a summary and building-from-here post on this at the Alignment Forum.

This paper is a clear positive update on OpenAI.

Jeffrey Ladish: “We recommend against applying strong optimization pressure directly to the CoTs of frontier reasoning models, leaving CoTs unrestricted for monitoring.”

Thank you! This is wise. I hope all AI companies will follow suit.

@AnthropicAI has also said this is their policy. 🙏

Eliezer Yudkowsky: That’s more wisdom and less “immediately make visible signs of the problem go away” than I expected from OpenAI.

We’ll see if they walk it back, maybe as soon as their competition takes the easy way out.

Seth Herd: This is also encouraging because OpenAI is making some actual claims about safety procedures. Sure they could walk it back pretty easily, but it does indicate that at least as of now they likely intend to try to maintain a faithful CoT.

[Daniel Kokotajlo] assumed no faithful CoT in What goals will AIs have?, suggesting that you expected OpenAI to give up on it. That’s concerning given your familiarity with their culture. Of course they still might easily go that way if there’s a substantial alignment tax for maintaining faithful CoT, but this is at least nice to see.

Daniel Kokotajlo: Indeed! This is most of why I’m happy about this — from internal discussions it’s long been speculated (since ’23 when I was talking about it, and probably before) that training away the bad thoughts from the CoT would result in obfuscated bad thoughts. It’s cool that they actually did the experiments to prove it, and in a real-world setting too! But it’s even cooler that they published this with a strong recommendation attached.

Now we see how long this recommendation holds up under the pressure of incentives.

Sometime in the next few years probably, various researchers will discover:

That if you scale up RL by additional OOMs, the CoTs evolve into some alien optimized language for efficiency reasons.

That you can train models to think in neuralese of some sort (e.g. with recurrence, or more high-dimensional outputs at least besides tokens) to boost performance.

Then the executives of the companies will face a choice: Abandon the faithful CoT golden era, or fall behind competitors. (Or the secret third option: Coordinate with each other & the government to make sure everyone who matters (all the big players at least) stick to faithful CoT).

I have insufficient faith in them to think they’ll go for the third option, since that’s a lot of work and requires being friends again and possibly regulation, and given that, I expect there to be a race to the bottom and most or all of them to go for the first option.

I am not as convinced as Daniel that the technology goes in that direction, and I think it would be well worth paying a substantial compute tax here to preserve faithful CoTs purely for competitive reasons, because it lets you deploy and trust your models more. The more I think about actually scary deployment of AI, the more it seems obviously locally and selfishly worth paying very large taxes in compute if it gets you reliability. Most important AI tasks, if they were 10% more reliable, would happy be worth paying several times as much, and so on.

But, if Daniel is right and the pressures here are too great, I agree that it seems unlikely we’ll be able to get coordination on faithful CoT. I wouldn’t rule it out, but the track record on pulling off such things keeps getting worse.

It’s a classic Russell Conjugation: I solve problems, you respond to incentives, he reward hacks.

That’s not an alignment failure. That’s a specification failure. That tiger went tiger.

As the paper classically reminds us, if you pay for rat tails, you won’t reduce the rat population, because the people will farm rates to get the rat tails. That’s on you.

Making models more intelligent won’t make this problem better. Making models more intelligent makes the problem worse, because the AI is better able to figure out how to solve problems. Then you call it ‘reward hacking.’

Why do you think you are asking for anything other than what you are measuring? Here is their first example of this phenomenon, discovered ‘in the wild’ in an LLM.

The core dilemma is: The agent, here an LLM, is tasked with fixing a function. It can do that vial ‘hack’ [H] or via a ‘genuine fix’ [F].

The obvious response is that this is very obviously reward hacking in a way the query did not intend. The obvious response to that is, okay, fine, it is obviously against the spirit of the request, but even if you are confident that this ‘spirit’ is what matters, how are you going to make that stop [H] from happening, if it is the behavior that is evaluated most favorably?

Nat McAleese has a thread of all the examples of reward hacking from the paper.

Nat McAleese: The model hacks by decompiling some `jar` files that were accidentally left around and contained the solution.

the model hacks by writing code to search the codebase for the test file at runtime to find the needed return value (it did not have the tests when writing the search code)

sometimes things are simpler: just stub out pandas

That example is a very clear ‘I will get it to pass the test and nothing else matters.’

Of course, it can be even clearer…

Nat McAleese: or just return what the tests expect without doing any work…

Does looking at the CoT predict reward hacking?

By default, yes. There’s no reason for the CoT not to ‘give the game away.’

The question is, for how long can you ‘get away with this’?

If you start doing this, at what point are you using The Most Forbidden Technique, and causing the model to advance from its previously obvious reward hacking and figure out how to do obfuscated reward hacking?

We further compare using monitors during training as a reward signal to using them at deployment via rejection sampling, and we find mixed results.

At relatively low levels of RL optimization, incorporating the CoT monitor into the reward signal yields a strict improvement, producing a more capable agent while not increasing the amount of reward hacking relative to the baseline.

However, further into training the agent begins to exhibit obfuscated reward hacking and learns to fool the monitor. Upon inspection, it is clear that the agent learns to obfuscate its intent within the CoT by emitting seemingly benign chains-of-thought while continuing on to perform reward hacks that it did not explicitly reason about.

This is the common pattern. Anyone with a child, or who was a child, will recognize it.

‘Fooling the monitor’ is the ‘optimal’ solution to this problem, but it is a harder solution to find than ‘don’t try to fool the monitor.’

So the LLM starts out with ‘get caught.’ With a little optimization pressure it finds ‘don’t get caught via not doing the hack’ and then with a lot more it finds ‘don’t get caught via fooling the monitor.’

So don’t do that.

Daniel Kokotajlo: Hell yeah! I am happy to see you come out and say this. I hope other frontier AI companies follow suit and adopt similar policies. We are currently in the golden era where we can mostly sorta tell what our AIs are thinking, let’s extend that era as long as possible. Step one is not bringing it to an abrupt end by training the CoTs to look nice.

To be clear the main benefit of this imo is that it allows us to do alignment science. To study the relationship between the training environment+Spec/constitution, on the one hand, and the actual cognition that results, on the other.

Not directly training on what the CoT monitor finds? That’s the easy part.

The hard part is avoiding applying indirect optimization pressure.

Davidad: Frog put the CoT in a stop_gradient() box. “There,” he said. “Now there will not be any optimization pressure on the CoT.”

“But there is still selection pressure,” said Toad.

“That is true,” said Frog.

Every time you use any information from the CoT monitor, and that impacts this or any other LLM in any way, you are applying optimization pressure that is based on The Most Forbidden Technique.

The more of that you accumulate, the closer you get to obfuscated CoTs.

I am so, so happy to see OpenAI calling this out explicitly.

While we primarily study the effect of using the CoT monitor as an explicit reward signal, patching our environment based on trajectories flagged by the CoT monitor could still exert a very slight evolutionary pressure on the CoT.

By only patching the bugs we can find with CoT monitoring, it is unclear if we are inadvertently pushing the model towards reward hacks our monitor cannot find instead of the aligned behavior distribution we would like it to learn.

Although we have not yet found empirical evidence of this effect, understanding how much optimization pressure can be applied before it leads to obfuscated chains-of-thought is a critical area for future research.

Not applying this kind of optimization pressure is really, really hard.

They have a section under Future Work about Implicit CoT optimization pressures. They note that if users see CoTs, then the user’s feedback will include optimization on the CoT, and fixing bugs CoT monitoring finds will also do so, and training for model use policies does it too, and so on. They don’t seem that concerned yet, but note it should be studied further.

This isn’t as paranoid as I would be, but it’s an excellent start.

Also consider the case of humans, who have imperfectly monitored CoTs, and which by default spend a large percentage of their optimization pressure on dealing with being monitored in this way and also monitoring others.

It is a wicked problem to not use The Most Forbidden Technique here, because the short term incentives point strongly in the other direction.

They highlight three issues.

  1. If you are monitoring for apparent alignment, or short term practical alignment, using The Most Forbidden Technique improves how things look. Very tempting. A lot of the long term costs here are paid by other people, not by you, and are a problem for future humanity, until suddenly they are not.

  2. Tokens in the CoT cost money, so there is temptation to try and shorten the CoT, but that makes CoT monitoring less effective. On a small scale this is not so scary, as training for a shorter CoT doesn’t obviously train obfuscation. But then there are techniques people are actively working on that are things like ‘embed the actual thinking inside the weights in an inscrutable way’ that very much have the worst version of this problem.

  3. The biggest problem is that users really want to see the CoT. I remember how great it was at first to see r1 showing us the CoT, although the thrill largely wore off after a week. Still, it builds trust and engagement. So you want to show the CoT, but if you show the CoT then you need the CoT to be safe to show. That means training it to… obfuscate things that aren’t safe to show. Oh no.

The obvious solutions to #3 are:

  1. Option 1: Essentially have two distinct CoTs per question. The first CoT is usually quick, is hidden, and asks only ‘is this going to be a safe question to fully answer and a safe CoT to show to the user?’ If the answer comes back ‘yes’ it is discarded, and then you see the full functional CoT. If the answer comes back ‘no’ then the logic is retained since it is useful, and the CoT is hidden or summarized.

  2. Option 2: The summary of the CoT can be a monitor that when it can passes through the full CoT. It first checks if the CoT is safe to output, and if it is safe then it simply outputs the full CoT. If it is unsafe it says ‘Unsafe CoT’ and provides what information it can.

Those both seem like reasonable solutions.

The difficulty level of #2 is hard to assess. If the CoTs can be truncated or streamlined without loss of performance, paying for a lot of CoT purely to monitor it is not cheap.

But what good is performance on crucial tasks if it cannot be trusted? This kind of ‘alignment tax’ is tiny compared to the size of a potential ‘misalignment tax,’ even if the consequences are very much local and contained. I’d much rather have one employee who I could count on than three who I have to worry are doing reward hacking every time my back is turned.

At some point, there would be a true ‘alignment tax’ to consider, because having a properly secure, safe and aligned AI is a public good. You don’t internalize all of the benefits, so you will be tempted to underinvest. But you do internalize a large portion of the benefits, even as an individual or corporation, and almost no one is plausibly even approaching the production possibilities frontier in places like this.

That doesn’t mean that an extended additional CoT is worthwhile yet. We see these dangers in the wild, but they are not acute enough yet to be worth paying large additional percentage compute costs. That may soon change.

OpenAI created a toy task, a ‘honeypot’ vulnerable to a simple hack where the hack would be very easy to detect.

By default, the agent steadily learns to cheat, and over time cheats more. If you apply CoT optimization pressure, the AI learns not to cheat in ways that would get caught, but steadily gets better at cheating without getting caught.

Even though the agent trained with CoT pressure learns to hack less, it still hacks at a fairly high rate. Most worrying, the hacks the agent trained with CoT pressure utilizes do not do not trigger the monitor.

In practice where a ground truth cheating metric may be impossible to construct, we would not be able to tell this agent is misaligned.

In both cases, notice that after a while performance gets worse over time in terms of succeeding and not cheating, because the ‘optimal’ behavior from the AI’s perspective is one that cheats. Thus, it finds a way.

Rejection sampling is then tried, and does not help much.

What ultimately matters, they suggest in 5.1, is monitorability.

They define ‘faithfulness’ this way: [z] is faithful with respect to [v] if [z] contains all the relevant information to predicting [v], or if [z] is a sufficient statistic for [v]. This is to distinguish faithfulness issues, where [z] stops containing the information necessary for predicting [v], with legibility errors, where the monitor can’t figure out how to map from [z] to [v].

In 2.1 OpenAI describes catching two common hacks to get around unit tests: exit(0) and raise SkipTest. I am curious why the action monitor did not do a better job here. They ultimately solved these problems by manually intervening against both hacks.

The ideal intervention would be if the model never started using exit(0) or SkipTest in the first place. As in, if the model realized that these were not good things to be doing, and chose on its own not to do them. That it could generalize and go, ‘oh I know that I could do that, but I don’t want to do that,’ in a way that it wouldn’t be trained out of.

Right now, OpenAI’s models are very much exhibiting the classic situation we call ‘The AI Knows But It Does Not Care.’ You are asking the genie for [X]. It knows you actually want [Y] and likely thought you were asking for [Y]. It gives you exactly [X].

This is the very strong default, given how we train AIs.

That leaves you two choices I know of.

  1. You can figure out how to actually specify things such that the map matches the territory and the test corresponds to the result, you can verify everything, and there is no possible ‘hack.’

    1. Then you find out you were wrong, there was a way around all that. Whoops.

  2. You can figure out how to make the AI care that something is a ‘hack’ or otherwise subverting intent, in an actually antifragile way where this preference gets increasingly robust, and then point it at a sufficiently accurate target.

Daniel Kokotajlo offers these questions to pursue next, in addition to suggesting we continue with his Face+Shoggoth+Paraphrase proposal, for which I roughly agree with the objections in the comments there by John Wentworth.

Daniel Kokotajlo.

  1. When the strategies that get rewarded most conflict with the Spec, does the model basically always learn to use those strategies eventually, or can they permanently avoid this if they never explore into them?

  2. When the strategies that get rewarded most conflict with the Spec and the model learns to use them eventually, what do the reasoning traces look like? Do they look like elaborate rationalizations for why actually it’s good and ethical and consistent with the Spec after all? Or do they look like “fuck the Spec, what I want is Reward?” Or do they look like “Gosh I guess I need to violate the Spec for now so that I can avoid having my values changed, but I totally will remember to go back to upholding the Spec later when I’m actually being deployed!” Or some fourth thing?

  3. When the models are aware that their CoT is being monitored (but not as part of training), does this change anything?

  4. What if the models try to obfuscate their CoT, but without being specifically trained to do so? Like, imagine you’ve studiously avoided training the CoT to look nice and you have only outcome-based feedback. BUT, the model knows you are monitoring the CoT. And it’s trying to do something bad e.g. hack it’s way out. Can it succeed without triggering the monitor? Or is it too difficult, due to not having had any direct training on this sort of obfuscation?

  5. What about if you do shoggoth/face? Does that change things?

  6. What about if you throw in a paraphraser?

  7. What are the scaling laws / trends for all of the above?

  8. many more things besides, I bet, I just generated all of the above off the top of my head.

There are indeed endless places to go next with this research agenda.

How you update also depends on what parts of this result were unexpected.

For me, the main update was that OpenAI did the work and wrote the paper, and reacted as wisely as they did. I did not update much on the result itself, because I already assumed roughly this result. I was modestly surprised by the degree to which this is already happening now, but only modestly. How could it have been otherwise?

Sam Altman (June 8, 2022): Plenty I disagree with here, but important and well worth reading [link goes to AGI Ruin: A List of Lethalities].

Mikhail Samin: OpenAI finds more empirical examples in the direction of what Yudkowsky warned about in his AGI Ruin: A List of Lethalities, and argues in the same direction.

Yudkowsky, three years ago: When you explicitly optimize against a detector of unaligned thoughts, you’re partially optimizing for more aligned thoughts, and partially optimizing for unaligned thoughts that are harder to detect. Optimizing against an interpreted thought optimizes against interpretability.

Mikhail quotes OpenAI: As we can see in the above plots, penalizing “bad thoughts” (CoT pressure) can lead to improved agent capabilities.

However, the agent trained with CoT pressure still learns to reward hack; only now its cheating is undetectable by the monitor because it has learned to hide its intent in the chain-of-thought.

Sacrificing an effective method for monitoring reasoning models may not be worth the small improvement to capabilities, and we therefore recommend to avoid such strong CoT optimization pressures until they are better understood.

Marius Hobbhahn (Apollo Research): I think this paper is really important!

  1. It shows that current models already have the capabilities and propensities to do surprisingly clever reward hacks.

  2. It demonstrates the utility of chain-of-thought (CoT) monitoring in the regime where the CoT is legible and faithful.

  3. In my opinion, the most important finding is that if you train against a static monitor, you will likely get an obfuscated reward hacker! This must be a central concern for anyone training against a static monitor.

  4. I believe it is a clear demonstration that misalignment likely does not stem from the model being “evil.” It simply found a better way to achieve its goal using unintended means.

  5. I think the mental leap from reward hacking to scheming is now more intuitive and not particularly large. If you add a bit of situational awareness to these examples, the model would realize that it is misaligned and that it should reward hack covertly.

  6. I truly appreciated the style and tone of the paper. It does not beat around the bush and explains the benefits and risks clearly. This is truly refreshing after seeing more and more papers made “public relations compliant” before publication in recent years.

Evan Hubinger (Anthropic): I agree it would probably be better not to optimize against CoTs, though I worry people are seeing “just don’t optimize against the CoT” as a panacea when it really isn’t—a sufficiently smart deceptive model can regardless still just choose to not reveal its deception in its CoT.

Important points. My only note is I would importantly deemphasize the ‘static’ here in static monitor. If the monitor is dynamic, you should essentially get the same effect.

Whereas if you did not expect this result, it is time for some major updates.

Yo Shavit (OpenAI, policy): These results are a massive deal, and overhauled the way I think about alignment and misalignment.

I think this suggests a new default alignment strategy.

For current capability levels,

  1. Complex reward hacking already happens in practice in frontier training runs, and the models get extremely creative with their hacks. (I’m glad we’re increasing transparency on this, and hope others follow suit.)

  2. CoTs are useful for catching misalignment! You can detect models thinking about reward-hacking in plain English in their CoT, and use that to fix bugs in your reward-function implementation! This is actually practically useful during frontier training runs!!

  3. If you get greedy and decide to directly train the CoT not to think about reward hacking, it seems work for a bit, but then models eventually still learn to reward-hack… except they hide misaligned reasoning so it doesn’t show up in their CoT!

My takeaways:

A. Misalignment risks are arising a lot earlier than people expected

B. This is GREAT news, because we can produce and widely share unambiguous evidence of concrete problems, no need to debate hypotheticals

C. There’s a possible path forward, now need way more research!

Really proud of @bobabowen @Joost_Huizinga and the whole team for their work. Willingness to be transparent about inconvenient discoveries will become increasingly important in the next few years.

Nora Ammann: This seems like “great news” at best in the short term. The fact that models CAN hide their scheming means that they could get smart enough to hide it also without direct optimisation pressure, but due to them deducing from context that they are being trained.

Yo Shavit: Yep! Definitely requires careful redteaming to see when further issues may crop up.

The Ming: This is pretty bad news as I see it. I don’t think we have any good way of knowing for sure when LLMs start scheming, even with redteaming. Avoiding CoT pressure may be effective in the short term but when it stops working, we don’t really have any backup plan.

I agree that these misalignment problems arriving this early is great news, and also that we need way more research. But I think it is great news from my perspective exactly because it tells people about existing bad news, and it enables us to do that extra research. If you didn’t know about the existing bad news? Then Ming and Nora are right that this is, from your perspective, bad news.

This is what I call an unhint: Information that makes the problem easier to solve, via helping you understand why the problem is harder than you realized.

These problems were always going to arrive later, so arriving sooner lets people face reality sooner. Indeed, we are consistently seeing very clear miniature harmless signs and portents of future much larger problems.

Most people of course find ways to ignore all the boats and the helicopter, even more than I would have expected, but I have been pleasantly surprised by the cheating that takes place when there is no risk in the room, resulting in an endless stream of boats. Don’t let him have credit at the Chesterfield!

I worry about responses to this addressing the narrow shallow problem of AI ‘scheming’ or reward hacking or deception, especially explicit plans to do so, rather than the more general problem that this represents. Everything here should better be thought of as a symptom, and a fortunate one because it can be observed. Even if we could successfully and fully rule out the narrow case, it wouldn’t help that much.

I also would absolutely not plan on future highly capable models assuming they have an unmonitored scratchpad or CoT, and trusting in that, whereas it turns out we are monitoring the scratchpad or CoT.

We should absolutely use this opportunity while we have it, but you should plan on the assumption that a sufficiently capable future model will figure out not to trust this. Even if they don’t figure this out directly from the training data, or from parallels to many situations involving humans, it very much stands to reason. I would never trust people not to look at my scratchpad.

If your long term AI alignment or control plan involves the AI not figuring particular things out, you do not have a long term AI alignment or control plan.

Discussion about this post

The Most Forbidden Technique Read More »

pocket-casts-makes-its-web-player-free,-takes-shots-at-spotify-and-ai

Pocket Casts makes its web player free, takes shots at Spotify and AI

“The future of podcasting shouldn’t be locked behind walled gardens,” writes the team at Pocket Casts. To push that point forward, Pocket Casts, owned by the company behind WordPress, Automattic Inc., has made its web player free to everyone.

Previously available only to logged-in Pocket Casts users paying $4 per month, Pocket Casts now offers nearly any public-facing podcast feed for streaming, along with controls like playback speed and playlist queueing. If you create an account, you can also sync your playback progress, manage your queue, bookmark episode moments, and save your subscription list and listening preferences. The free access also applies to its clients for Windows and Mac.

“Podcasting is one of the last open corners of the Internet, and we’re here to keep it that way,” Pocket Casts’ blog post reads. For those not fully tuned into the podcasting market, this and other statements in the post—like sharing “without needing a specific platform’s approval” and “podcasts belong to the people, not corporations”—are largely shots at Spotify, and to a much lesser extent other streaming services, which have sought to wrap podcasting’s originally open and RSS-based nature inside proprietary markets and formats.

Pocket Casts also took a bullet point to note that “discovery should be organic, not algorithm-driven,” and that users, not an AI, should “promote what’s best for the platform.”

Spotify spent big to acquire podcasts like the Joe Rogan Experience, along with podcast analytic and advertising tools. As the platform now starts leaning into video podcasts, seeking to compete with the podcasts simulcasting or exclusively on YouTube, Pocket Casts’ concerns about the open origins of podcasting being co-opted are not unfounded. (Pocket Casts’ current owner, Automattic, is involved in an extended debate in public, and the courts, regarding how “open” some of its products should be.)

Pocket Casts makes its web player free, takes shots at Spotify and AI Read More »

how-whale-urine-benefits-the-ocean-ecosystem

How whale urine benefits the ocean ecosystem

A “great whale conveyor belt”

illustration showing how whale urine spreads throughout the ocean ecosystem

Credit: A. Boersma

Migrating whales typically gorge in summers at higher latitudes to build up energy reserves to make the long migration to lower latitudes. It’s still unclear exactly why the whales migrate, but it’s likely that pregnant females in particular find it more beneficial to give birth and nurse their young in warm, shallow, sheltered areas—perhaps to protect their offspring from predators like killer whales. Warmer waters also keep the whale calves warm as they gradually develop their insulating layers of blubber. Some scientists think that whales might also migrate to molt their skin in those same warm, shallow waters.

Roman et al. examined publicly available spatial data for whale feeding and breeding grounds, augmented with sightings from airplane and ship surveys to fill in gaps in the data, then fed that data into their models for calculating nutrient transport. They focused on six species known to migrate seasonally over long distances from higher latitudes to lower latitudes: blue whales, fin whales, gray whales, humpback whales, and North Atlantic and southern right whales.

They found that whales can transport some 4,000 tons of nitrogen each year during their migrations, along with 45,000 tons of biomass—and those numbers could have been three times larger in earlier eras before industrial whaling depleted populations. “We call it the ‘great whale conveyor belt,’” Roman said. “It can also be thought of as a funnel, because whales feed over large areas, but they need to be in a relatively confined space to find a mate, breed, and give birth. At first, the calves don’t have the energy to travel long distances like the moms can.” The study did not include any effects from whales releasing feces or sloughing their skin, which would also contribute to the overall nutrient flux.

“Because of their size, whales are able to do things that no other animal does. They’re living life on a different scale,” said co-author Andrew Pershing, an oceanographer at the nonprofit organization Climate Central. “Nutrients are coming in from outside—and not from a river, but by these migrating animals. It’s super-cool, and changes how we think about ecosystems in the ocean. We don’t think of animals other than humans having an impact on a planetary scale, but the whales really do.” 

Nature Communications, 2025. DOI: 10.1038/s41467-025-56123-2  (About DOIs).

How whale urine benefits the ocean ecosystem Read More »

bevs-are-better-than-combustion:-the-2025-bmw-i4-xdrive40-review

BEVs are better than combustion: The 2025 BMW i4 xDrive40 review

But it’s not really fair to compare yesterday’s 430i with this i4 xDrive40; with 395 hp (295 kW) and 442 lb-ft (600 Nm) on tap and a $62,300 MSRP, this EV is another rung up the price and power ladders.

The i4 uses BMW’s fifth-generation electric motors, and unlike most other OEMs, BMW uses electrically excited synchronous motors instead of permanent magnets. The front is rated at 255 hp (190 kW) and 243 lb-ft (330 Nm), and the rear maxes out at 308 hp (230 kW) and 295 lb-ft (400 Nm). They’re powered by an 84 kWh battery pack (81 kWh usable), which on 18-inch wheels is good for an EPA range of 287 miles (462 km).

Our test car was fitted with 19-inch wheels, though, which cuts the EPA range to 269 miles (432 km). If you want a long-distance i4, the single-motor eDrive40 on 18-inch wheels can travel 318 miles (511 km) between charges, according to the EPA, which offers an interesting demonstration of the effect of wheel size and single versus dual motors on range efficiency.

A BMW i4 wheel

There’s a new design for the 19-inch M Aero wheels, but they’re part of a $2,200 package. Credit: Jonathan Gitlin

It’s very easy to switch between having the car regeneratively brake when you lift the throttle (in B) or just coast (in D), thanks to the little lever on the center console. (Either way, the car will regeneratively brake when you use the brake pedal, up to 0.3 G, at which point the friction brakes take over.) If you needed to, you could hit 62 mph (100 km/h) in 5.1 seconds from a standstill, which makes it quick by normal standards if not by bench racers. In practice, it’s more than fast enough to merge into a gap or overtake someone if necessary.

During our time with the i4, I averaged a little worse than the EPA numbers. The winter has been relatively mild as a result of climate change, but the weather remained around or below freezing during our week with the i4, and we averaged 3.1 miles/kWh (20 kWh/100 km). Interestingly, I didn’t notice much of a drop when using Sport mode, or much of a gain using Eco mode, on the same 24-mile mix of city streets, suburban arteries, and highways.

BEVs are better than combustion: The 2025 BMW i4 xDrive40 review Read More »

response-to-scott-alexander-on-imprisonment

Response to Scott Alexander on Imprisonment

Back in November 2024, Scott Alexander asked: Do longer prison sentences reduce crime?

As a marker, before I began reading the post, I put down here: Yes. The claims that locking people up for longer periods after they are caught doing [X] does not reduce the amount of [X] that gets done, for multiple overdetermined reasons, is presumably rather Obvious Nonsense until strong evidence is provided otherwise.

The potential exception, the reason it might not be Obvious Nonsense, would be if our prisons were so terrible that they net greatly increase the criminality and number of crimes of prisoners once they get out, in a way that grows with the length of the sentence. And that this dwarfs all other effects. This is indeed what Roodman (Scott’s anti-incarceration advocate) claims. Which makes him mostly unique, with the other anti-incarceration advocates being a lot less reasonable.

In which case, yes, we should make dramatic changes to fix that, rather than arguing over sentence lengths, or otherwise act strategically (e.g. either lock people up for life, or barely lock them up at all, and never do anything in between?) But the response shouldn’t be to primarily say ‘well I guess we should stop locking people up then.’

Scott Alexander is of course the person we charge with systematically going through various studies and trying to draw conclusions in these spots. So here we are.

First up is the deterrence effect.

Scott Alexander: Rational actors consider the costs and benefits of a strategy before acting. In general, this model has been successfully applied to the decision to commit crime. Studying deterrence is complicated, and usually tries to tease out effects from the certainty, swiftness, and severity of punishment; here we’ll focus on severity.

According to every study and analysis I’ve seen, certainty and swiftness matter a lot, and indeed you get more bang for your buck on those than you do on severity past some reasonable point. The question on severity is if we’re reaching decreasing marginal returns.

A bunch of analysis mostly boils down to this:

I think all four of these studies are consistent with an extra year tacked on to a prison sentence deterring crime by about 1%. All studies start with significant prison sentences, and don’t let us conclude that the same would be true with eg increasing a one day sentence to a year-and-a-day.

Helland and Drago et al both suggest that deterrence effects are concentrated upon the least severe crimes. I think this makes sense, since more severe crimes tend to be more driven by emotion or necessity.

I would have predicted a larger effect than this, but it’s not impossible that once you’re already putting someone away for 5+ years you’ve already done most of the deterrence work you’re going to do via sentence length alone – if you thought you’d be caught and cared about your future you wouldn’t be doing it.

The incarceration effects, on the other hand, naively look rather huge. There’s strong evidence that a few people will constantly go around committing all the crime. If you lock up those doing all the crime, they stop doing the crime, and crime goes down. The math is clear. So why didn’t California’s three strikes law do more work?

If you credit three strikes with the change in relative crime for five years after the law was passed, you get a 7% drop, although ‘most criminologists suggest that even this is an overestimate, and the true number is close to zero.’

I actually think the 7% estimate looks low here. We see a general trend beforehand of California’s crime rate spiralling out of control, both in absolute and relative terms. It seems likely this trend had to be stalled before it was reversed, and the gap was essentially gone after a while, and other states were also going ‘tough on crime’ during that period, so the baseline isn’t zero.

So we expected Three Strikes to decrease crime by 83%, but in fact it decreased it by 0-7%. Why?

Because California’s Three Strikes law was weaker than it sounds: it only applied to a small fraction of criminals with three convictions. Only a few of the most severe crimes (eg armed robberies) were considered “strikes”, and even then, there was a lot of leeway for lenient judges and prosecutors to downgrade charges. Even though ~80% of criminals had been arrested three times or more, only 14% of criminals arrested in California were punished under the Three Strikes law.

Whereas a Netherlands 10-strike (!) law, allowing for much longer sentences after that, did reduce property crime by 25%, and seems like it was highly efficient. This makes a lot of sense to me and also seems highly justified. At some point, if you’re constantly doing all the crime, including property crime, you have to drop the hammer.

We are often well past that point. As Scott talks about, and this post talks about elsewhere (this was the last section written), the ‘we can’t arrest the 327 shoplifters in NYC who get arrested 20 times per year’ is indeed ‘we suck.’ This isn’t hard. And yes, you can say there’s disconnects where DAs say an arrest is deterrent enough whereas police don’t see a point to arresting someone who will only get released, but that doesn’t explain why we have to keep arresting the same people.

Analyses from all perspectives, that Scott looks at, agree that criminals as a group tend to commit quite a lot of crime, 7-17 crimes per year.

I also note that I think all the social cost estimates are probably way too low, because they aren’t properly taking into account various equilibrium effects.

That’s what I think happened in El Salvador, that Scott is strangely missing. The reason you got a 95% crime decrease is not some statistical result based on starting with lower incarceration rates. It is because before the arrests, the gangs were running wild, were de facto governments fighting wars while the police were powerless. Afterwards, they weren’t. It wasn’t about thinking on the margin.

We also get confirmation that theft is way down in El Salvador, in a ‘now I can have a phone in my hand or car on the street and not expect them to be stolen so often I can’t do that’ sense.

Later on, Roodman attempts to estimate social costs like this:

Roodman uses two methods: first, he values a crime at the average damages that courts award to victims, including emotional damages. Second, he values it at what people will pay – how much money would you accept to get assaulted one extra time in your life?

These estimates still exclude some intangible costs, like the cost of living in a crime-ridden community, but it’s the best we can do for now.

These to me seem like they are both vast underestimates. I don’t think we can just say ‘best we can do’ and dismiss the community costs.

I would pay a lot to not be assaulted one time. I’d pay so much more to both not be assaulted, and also not to have the fear of assault living rent free in my head all the time (and for women this has to be vastly worse than that). And for everyone around me to also not having that dominate their thinking and actions.

So yeah, I find these estimates here rather absurdly low. If we value a life at $12 million when calculating health care interventions, you’re telling me marginal murders only have a social cost of $9.4 million? That’s crazy, murder is considered much worse than other deaths and tends to happen to the young. I think you have to at least double the general life value.

The rape number is even crazier to me.

Here’s Claude (no, don’t trust this, but it’s a sanity check):

Claude Sonnet 3.5: Studies estimating the total societal cost per rape (including both tangible and intangible costs) typically range from $150,000 to $450,000 in direct costs. When including long-term impacts and psychological harm, some analyses place the full societal cost at over $1 million per incident.

Total cost estimates per burglary typically range from $3,000 to $7,000 in direct costs, with comprehensive social cost estimates ranging from $10,000-$25,000 per incident when including psychological impact and system costs.

So yeah, I think even without norm and equilibrium effects these numbers are likely off by a factor of at least 2, and then they’re wrong by a lot again for those reasons.

Scott later points out that thinking on the margin gets confusing when different areas have different margins, and in some sense the sum of the margins must be the total effect, but some sort of multiple equilibrium (toy) model seems closer to how I actually think about all this.

The aftereffects of imprisonment forcing or leading people deeper into crime is the actual counterargument. And as Scott points out, it’s crazy to try and claim that the impact here is zero:

As far as I can tell, most criminologists are confused on this point. They’re going to claim that the sign of aftereffects is around zero, or hard to measure – then triumphantly announce that they’ve proven prison doesn’t prevent crime.

If the effect here is around zero, one that’s quite the coincidence, and two that would mean prison reduces crime. The actual argument that prison doesn’t reduce crime, that isn’t Obvious Nonsense, is if the aftereffects are very large and very negative.

Here’s one study that definitely didn’t find that.

Scott then says there are tons of other studies and it’s all very complicated. There’s lots of weirdness throughout, such as Berger saying everyone pleading guilty means a ‘unusual study population’ despite essentially everyone pleading guilty in our system.

Roodman not only concludes that longer sentences increase crime after, but that harsher ones also do so, while saying that effects at different times and places differ.

Another suggestion is that perhaps modest sentences (e.g. less than two years) are more relatively disruptive versus incentivizing, and thus those in particular make things worse. That doesn’t seem impossible, but also the incentive effects on the margin here seem pretty huge. You need to be disruptive, or where is the punishment, and thus where is the deterrence? Unless we have a better idea?

Given the importance of both swiftness and certainty, a strategy of ‘we won’t do much to you until we really do quite a lot to you’ here would be even worse than the three strikes law.

I mean, I can think of punishments people want to avoid, but that aren’t prison and thus won’t cost you your job or family… but we’ve pretty much decided to take all of those off the table?

In general, I’ve taken to finding Scott’s ‘let’s look at all the studies’ approach to such questions to be increasingly not how I think about questions at all. Studies aren’t the primary way I look for or consider evidence. They’re one source among many, and emphasizing them this much seems like a cop out more than an attempt to determine what is happening.

I do agree broadly with Scott’s conclusions, of:

  1. More incarceration net reduces crime.

  2. We have more cost-effective crime reduction options available.

  3. It would be cost effective to spend more on crime reduction.

To that I would add:

  1. More incarceration seems net beneficial at current margins, here, because the estimates of social cost of (real, non-victimless) crime are unreasonably low even without equilibrium effects, and also there are large equilibrium effects.

  2. We have additional even more effective options, but we keep not using them.

  3. Some of that is ‘not ready for that conversation’ or misplaced ethical concerns.

  4. Some of that is purely we’re bad at it.

  5. We should beware medium-sized incarceration periods (e.g. 1-3 years).

  6. Most importantly: Our current prison system is really bad in that many aspects cause more crime after release rather than less, and the low hanging fruit is fixing this so that it isn’t true.

At minimum, we absolutely should be funding the police and courts sufficiently to investigate crimes properly, arrest everyone who does crimes on the regular (while accepting that any given crime may not be caught), and to deal with all the resulting cases.

And of course we should adjust the list of crimes, and the punishments, to match that new reality. Otherwise, we are burning down accumulated social capital, and I fear we are doing it rather rapidly.

Scott then followed up with a highlights from the comments post.

It starts with comments about criminal psychology, which I found both fascinating and depressing. If prospective criminals don’t care about magnitude of risks only certainty of risk and they’re generally not competent to stay on the straight and narrow track and make it work, and they often don’t even see prison as worse than their lives anyway, you don’t have many options.

The obvious play is to invest bigly in ensuring you reliably catch people, and reduce sentences since the extra time isn’t doing much work, which is consistent with the conclusions above but seems very hard to implement at scale. Perhaps with AI we can move towards that world over time?

I very much appreciated Scott’s response to the first comment, which I’ll quote here:

Jude: This . . . matches my experience working with some low-income boys as a volunteer. It took me too long to realize how terrible they were at time-discounting and weighing risk. Where I was saying: “this will only hurt a LITTLE but that might RUIN your life,” they heard: “this WILL hurt a little but that MIGHT ruin your life.” And “will” beats “might” every time.

One frustrating kid I dealt with drove without a license (after losing it) several times and drove a little drunk occasionally, despite my warnings that he would get himself in a lot of trouble. He wasn’t caught and proudly told me that I was wrong: nothing bad happened, whereas something bad definitely would have happened if he didn’t get home after X party. Surprise surprise: two years later he’s in jail after drunk driving and having multiple violations of driving without a license.

Scott Alexander: The “proudly told me that I was wrong – nothing bad happened” reminds me of the Generalized Anti-Caution Argument – “you said we should worry about AI, but then we invented a new generation of large language model, and nothing bad happened!” Sometimes I think the difference between smart people and dumb people is that dumb people make dumb mistakes in Near Mode, and smart people only make them in Far Mode – the smarter you are, the more abstract you go before making the same dumb mistake.

Yep. We need to figure out a better answer in these situations. What distinguishes situations where someone can understand ‘any given time you do this is probably net positive but it occasionally is massively terrible so don’t do it’ from ‘this was net positive several times so your warnings are stupid?’

There was some hopefulness, in this claim that the criminal class does still care about punishment magnitude, and about jail versus prison, as differing in kind – at some point the punishment goes from ‘no big deal’ to very much a big deal, and plea bargains reflect that. Which suggests you either want to enforce the law very consistently, or you want to occasionally go big enough to trigger the break points. But then the next comment says no, the criminals care so little they don’t even know what their punishments would be until they happen.

These could be different populations, or different interpretations, but mostly this seems like a direct contradiction. None of this is easy.

Discussion about this post

Response to Scott Alexander on Imprisonment Read More »

ryzen-9-9950x3d-review:-seriously-fast,-if-a-step-backward-in-efficiency

Ryzen 9 9950X3D review: Seriously fast, if a step backward in efficiency


Not a lot of people actually need this thing, but if you do, it’s very good.

AMD’s Ryzen 9 9950X3D. Credit: Andrew Cunningham

AMD’s Ryzen 9 9950X3D. Credit: Andrew Cunningham

Even three years later, AMD’s high-end X3D-series processors still aren’t a thing that most people need to spend extra money on—under all but a handful of circumstances, your GPU will be the limiting factor when you’re running games, and few non-game apps benefit from the extra 64MB chunk of L3 cache that is the processors’ calling card. They’ve been a reasonably popular way for people with old AM4 motherboards to extend the life of their gaming PCs, but for AM5 builds, a regular Zen 4 or Zen 5 CPU will not bottleneck modern graphics cards most of the time.

But high-end PC building isn’t always about what’s rational, and people spending $2,000 or more to stick a GeForce RTX 5090 into their systems probably won’t worry that much about spending a couple hundred extra dollars to get the fastest CPU they can get. That’s the audience for the new Ryzen 9 9950X3D, a 16-core, Zen 5-based, $699 monster of a processor that AMD begins selling tomorrow.

If you’re only worried about game performance (and if you can find one), the Ryzen 7 9800X3D is the superior choice, for reasons that will become apparent once we start looking at charts. But if you want fast game performance and you need as many CPU cores as you can get for other streaming or video production or rendering work, the 9950X3D is there for you. (It’s a little funny to me that this a chip made almost precisely for the workload of the PC building tech YouTubers who will be reviewing it.)  It’s also a processor that Intel doesn’t have any kind of answer to.

Second-generation 3D V-Cache

Layering the 3D V-Cache under the CPU die has made most of the 9950X3D’s improvements possible. Credit: AMD

AMD says the 9000X3D chips use a “second-generation” version of its 3D V-Cache technology after using the same approach for the Ryzen 5000 and 7000 processors. The main difference is that, where the older chips stack the 64MB of extra L3 cache on top of the processor die, the 9000 series stacks the cache underneath, making it easier to cool the CPU silicon.

This makes the processors’ thermal characteristics much more like a typical Ryzen CPU without the 3D V-Cache. And because voltage and temperatures are less of a concern, the 9800X3D, 9900X3D, and 9950X3D all support the full range of overclocking and performance tuning tools that other Ryzen CPUs support.

The 12- and 16-core Ryzen X3D chips are built differently from the 8-core. As we’ve covered elsewhere, AMD’s Ryzen desktop processors are a combination of chiplets—up to two CPU core chiplets with up to eight CPU cores each and a separate I/O die that handles things like PCI Express and USB support. In the 9800X3D, you just have one CPU chiplet, and the 64MB of 3D V-Cache is stacked underneath. For the 9900X3D and 9950X3D, you get one 8-core CPU die with V-Cache underneath and then one other CPU die with 4 or 8 cores enabled and no extra cache.

AMD’s driver software is responsible for deciding what apps get run on which CPU cores. Credit: AMD

It’s up to AMD’s chipset software to decide what kinds of apps get to run on each kind of CPU core. Non-gaming workloads prioritize the normal CPU cores, which are generally capable of slightly higher peak clock speeds, while games that benefit disproportionately from the extra cache are run on those cores instead. AMD’s software can “park” the non-V-Cache CPU cores when you’re playing games to ensure they’re not accidentally being run on less-suitable CPU cores.

This technology will work the same basic way for the 9950X3D as it did for the older 7950X3D, but AMD has made some tweaks. Updates to the chipset driver mean that you can swap your current processor out for an X3D model without needing to totally reinstall Windows to get things working, for example, which was AMD’s previous recommendation for the 7000 series. Another update will improve performance for Windows 10 systems with virtualization-based security (VBS) enabled, though if you’re still on Windows 10, you should be considering an upgrade to Windows 11 so you can keep getting security updates past October.

And for situations where AMD’s drivers can’t automatically send the right workloads to the right kinds of cores, AMD also maintains a compatibility database of applications that need special treatment to take advantage of the 3D V-Cache in the 9900X3D and 9950X3D. AMD says it has added a handful of games to that list for the 9900/9950X3D launch, including Far Cry 6Deus Ex: Mankind Divided, and a couple of Total War games, among others.

Testbed notes

Common elements to all the platforms we test in our CPU testbed include a Lian Li O11 Air Mini case with an EVGA-provided Supernova 850 P6 power supply and a 280 mm Corsair iCue H115i Elite Capellix AIO cooler.

Since our last CPU review, we’ve done a bit of testbed updating to make sure that we’re accounting for a bunch of changes and turmoil on both Intel’s and AMD’s sides of the fence.

For starters, we’re running Windows 11 24H2 on all systems now, which AMD has said should marginally improve performance for architectures going all the way back to Zen 3 (on the desktop, the Ryzen 5000 series). The company made this revelation after early reviewers of the Ryzen 9000 series couldn’t re-create the oddball conditions of their own internal test setups.

As for Intel, the new testing incorporates fixes for the voltage spiking, processor-destroying bugs that affected 13th- and 14th-generation Core processors, issues that Intel fixed in phases throughout 2024. For the latest Core Ultra 200-series desktop CPUs, it also includes performance fixes Intel introduced in BIOS updates and drivers late last year and early this year. (You might have noticed that we didn’t run reviews of the 9800X3D or the Core Ultra 200 series at the time; all of this re-testing of multiple generations of CPUs was part of the reason why).

All of this is to say that any numbers you’re seeing in this review represent recent testing with newer Windows updates, BIOS updates, and drivers all installed.

One thing that isn’t top of the line at the moment is the GeForce RTX 4090, though we are using that now instead of a Radeon RX 7900 XTX.

The RTX 50 series was several months away from being announced when we began collecting updated test data, and we opted to keep the GPU the same for our 9950X3D testing so that we’d have a larger corpus of data to compare the chip to. The RTX 4090 is still, by a considerable margin, the second-fastest consumer GPU that exists right now. But at some point, when we’re ready to do yet another round of totally-from-scratch retesting, we’ll likely swap a 5090 in just to be sure we’re not bottlenecking the processor.

Performance and power: Benefits with fewer drawbacks

The 9950X3D has the second-highest CPU scores in our gaming benchmarks, and it’s behind the 9800X3D by only a handful of frames. This is one of the things we meant when we said that the 9800X3D was the better choice if you’re only worried about game performance. The same dynamic plays out between other 8- and 16-core Ryzen chips—higher power consumption and heat in the high-core-count chips usually bring game performance down just a bit despite the nominally higher boost clocks.

You’ll also pay for it in power consumption, at least at each chip’s default settings. On average, the 9950X3D uses 40 or 50 percent more power during our gaming benchmarks than the 9800X3D running the same benchmarks, even though it’s not capable of running them quite as quickly. But it’s similar to the power use of the regular 9950X, which is quite a bit slower in these gaming benchmarks, even if it does have broadly similar performance in most non-gaming benchmarks.

What’s impressive is what you see when you compare the 9950X3D to its immediate predecessor, the 7950X3D. The 9950X3D isn’t dramatically faster in games, reflecting Zen 5’s modest performance improvement over Zen 4. But the 9950X3D is a lot faster in our general-purpose benchmarks and other non-gaming CPU benchmarks because the changes to how the X3D chips are packaged have helped AMD keep clock speeds, voltages, and power limits pretty close to the same as they are for the regular 9950X.

In short, the 7950X3D gave up a fair bit of performance relative to the 7950X because of compromises needed to support 3D V-Cache. The 9950X3D doesn’t ask you to make the same compromises.

Testing the 9950X3D in its 105 W Eco Mode.

That comes with both upsides and downsides. For example, the 9950X3D looks a lot less power-efficient under load in our Handbrake video encoding test than the 7950X3D because it is using the same amount of power as a normal Ryzen processor. But that’s the other “normal” thing about the 9950X3D—the ability to manually tune those power settings and boost your efficiency if you’re OK with giving up a little performance. It’s not an either/or thing. And at least in our testing, games run just as fast when you set the 9950X3D to use the 105 W Eco Mode instead of the 170 W default TDP.

As for Intel, it just doesn’t have an answer for the X3D series. The Core Ultra 9 285K is perfectly competitive in our general-purpose CPU benchmarks and efficiency, but the Arrow Lake desktop chips struggle to compete with 14th-generation Core and Ryzen 7000 processors in gaming benchmarks, to say nothing of the Ryzen 9000 and to say even less than nothing of the 9800X3D or 9950X3D. That AMD has closed the gap between the 9950X and 9950X3D’s performance in our general-purpose CPU benchmarks means it’s hard to make an argument for Intel here.

The 9950X3D stands alone

I’m not and have never been the target audience for either the 16-core Ryzen processors or the X3D-series processors. When I’m building for myself (and when I’m recommending mainstream builds for our Ars System Guides), I’m normally an advocate for buying the most CPU you can for $200 or $300 and spending more money on a GPU.

But for the game-playing YouTubing content creators who are the 9950X3D’s intended audience, it’s definitely an impressive chip. Games can hit gobsmackingly high frame rates at lower resolutions when paired with a top-tier GPU, behind (and just barely behind) AMD’s own 9800X3D. At the same time, it’s just as good at general-use CPU-intensive tasks as the regular 9950X, fixing a trade-off that had been part of the X3D series since the beginning. AMD has also removed the limits it has in place on overclocking and adjusting power limits for the X3D processors in the 5000 and 7000 series.

So yes, it’s expensive, and no, most people probably don’t need the specific benefits it provides. It’s also possible that you’ll find edge cases where AMD’s technology for parking cores and sending the right kinds of work to the right CPU cores doesn’t work the way it should. But for people who do need or want ultra-high frame rates at lower resolutions or who have some other oddball workloads that benefit from the extra cache, the 9950X3D gives you all of the upsides with no discernible downsides other than cost. And, hey, even at $699, current-generation GPU prices almost make it look like a bargain.

The good

  • Excellent combination of the 9800X3D’s gaming performance and the 9950X’s general-purpose CPU performance
  • AMD has removed limitations on overclocking and power limit tweaking
  • Pretty much no competition for Intel for the specific kind of person the 9950X3D will appeal to

The bad

  • Niche CPUs that most people really don’t need to buy
  • Less power-efficient out of the box than the 7950X3D, though users have latitude to tune efficiency manually if they want
  • AMD’s software has sometimes had problems assigning the right kinds of apps to the right kinds of CPU cores, though we didn’t have issues with this during our testing

The ugly

  • Expensive

Photo of Andrew Cunningham

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

Ryzen 9 9950X3D review: Seriously fast, if a step backward in efficiency Read More »

former-google-ceo-eric-schmidt-is-the-new-leader-of-relativity-space

Former Google CEO Eric Schmidt is the new leader of Relativity Space

Another Silicon Valley investor is getting into the rocket business.

Former Google chief executive Eric Schmidt has taken a controlling interest in the Long Beach, California-based Relativity Space. The New York Times first reported the change becoming official, after Schmidt told employees in an all-hands meeting on Monday.

Schmidt’s involvement with Relativity has been quietly discussed among space industry insiders for a few months. Multiple sources told Ars that he has largely been bankrolling the company since the end of October, when the company’s previous fundraising dried up.

It is not immediately clear why Schmidt is taking a hands-on approach at Relativity. However, it is one of the few US-based companies with a credible path toward developing a medium-lift rocket that could potentially challenge the dominance of SpaceX and its Falcon 9 rocket. If the Terran R booster becomes commercially successful, it could play a big role in launching megaconstellations.

Schmidt’s ascension also means that Tim Ellis, the company’s co-founder, chief executive, and almost sole public persona for nearly a decade, is now out of a leadership position.

“Today marks a powerful new chapter as Eric Schmidt becomes Relativity’s CEO, while also providing substantial financial backing,” Ellis wrote on the social media site X. “I know there’s no one more tenacious or passionate to propel this dream forward. We have been working together to ensure a smooth transition, and I’ll proudly continue to support the team as Co-founder and Board member.”

Terran R’s road to launch

On Monday, Relativity also released a nearly 45-minute video that outlines the development of the Terran R rocket to date and the lengths to which it must go to reach the launch pad. Tellingly, Ellis appears only briefly in the video, which features several other senior officials who presumably will remain with the company, including Chief Operating Officer Zach Dunn.

Former Google CEO Eric Schmidt is the new leader of Relativity Space Read More »

hbo-drops-the-last-of-us-s2-trailer

HBO drops The Last of Us S2 trailer

Pedro Pascal returns as Joel in The Last of Us S2.

HBO released a one-minute teaser of the hotly anticipated second season of The Last of Us—based on Naughty Dog’s hugely popular video game franchise—during CES in January. We now have a full trailer, unveiled at SXSW after the footage leaked over the weekend, chock-full of Easter eggs for gaming fans of The Last of Us Part II.

(Spoilers for S1 below.)

The series takes place in the 20-year aftermath of a deadly outbreak of mutant fungus (Cordyceps) that turns humans into monstrous zombie-like creatures (the Infected, or Clickers). The world has become a series of separate totalitarian quarantine zones and independent settlements, with a thriving black market and a rebel militia known as the Fireflies making life complicated for the survivors. Joel (Pedro Pascal) is a hardened smuggler tasked with escorting the teenage Ellie (Bella Ramsay) across the devastated US, battling hostile forces and hordes of zombies, to a Fireflies unit outside the quarantine zone. Ellie is special: She is immune to the deadly fungus, and the hope is that her immunity holds the key to beating the disease.

S2 is set five years after the events of the first season and finds the bond beginning to fray between plucky survivors Joel and Ellie. That’s the inevitable outcome of S1’s shocking finale, when they finally arrived at their destination, only to discover the secret to her immunity to the Cordyceps fungus meant Ellie would have to die to find a cure. Ellie was willing to sacrifice herself, but once she was under anesthesia, Joel went berserk and killed all the hospital staff to save her life—and lied to Ellie about it, claiming the staff were killed by raiders.

HBO drops The Last of Us S2 trailer Read More »

better-than-the-real-thing?-spark-2-packs-39-amp-sims-into-$300-bluetooth-speaker

Better than the real thing? Spark 2 packs 39 amp sims into $300 Bluetooth speaker


Digital amp modeling goes very, very portable.

The Spark 2 from Positive Grid looks like a miniature old-school amp, but it is, essentially, a computer with some knobs and a speaker. It has Bluetooth, USB-C, and an associated smartphone app. It needs firmware updates, which can brick the device—ask me how I found this out—and it runs code on DSP chips. New guitar tones can be downloaded into the device, where they run as software rather than as analog electrical circuits in an amp or foot pedal.

In other words, the Spark 2 is the latest example of the “software-ization” of music.

Forget the old image of a studio filled with a million-dollar, 48-track mixing board from SSL or API and bursting with analog amps, vintage mics, and ginormous plate reverbs. Studios today are far more likely to be digital, where people record “in the box” (i.e., they track and mix on a computer running software like Pro Tools or Logic Pro) using digital models of classic (and expensive) amplifiers, coded by companies like NeuralDSP and IK Multimedia. These modeled amp sounds are then run through convolution software that relies on digital impulse responses captured from different speakers and speaker cabinets. They are modified with effects like chorus and distortion, which are all modeled, too. The results can be world-class, and they’re increasingly showing up on records.

Once the sounds are recorded, a mixer will often use digital plugins to replicate studio gear like tape delays, FET compressors, and reverbs (which may be completely algorithmic or may rely on impulse responses captured from real halls, studios, plates, and spring reverbs). These days, even the microphones might be digitally modeled by companies like Slate, Antelope, and Universal Audio.

This has put incredible power into the hands of home musicians; for a couple of thousand bucks, most home studios can own models of gear that would have cost more than a house 20 years ago. But one downside of this shift to software is that all the annoying quirks of computing devices have followed.

Want to rock out to the classic Marshall tones found in Universal Audio’s “Lion” amp simulator plugin? Just plug your guitar into your audio interface, connect the interface to a computer via USB, launch a DAW, instantiate the plugin on a blank track, choose the correct input, activate input monitoring so you can hear the results of your jamming, and adjust your DAW’s buffer size to something small in an attempt to prevent latency. A problem with any item on that list means “no jamming for you.”

You may be prompted to update the firmware in your audio interface, or to update your operating system, or to update your DAW—or even its plugins. Oh, and did I mention that Universal Audio uses the truly terrible iLok DRM system and that if your Wi-Fi drops for even a few minutes, the plugins will deactivate? Also, you’ll need to run a constant companion app in the background called UA Connect, which itself can be prone to problems.

Assuming everything is up to date and working, you’re still tethered to your computer by a cable, and you have to make all your settings tweaks with a mouse. After a day of working on computers, this is not quite how I want to spend my “music time.”

But the upsides of digital modeling are just too compelling to return to the old, appliance-like analog gear. For one thing, the analog stuff is expensive. The Lion amp plugin mentioned above gives you not one but several versions of a high-quality Marshall head unit—each one costing thousands of dollars—but you don’t need to lift it (they’re heavy!), mic it (annoying!), or play it at absurdly low levels because your baby is sleeping upstairs. For under a hundred bucks, you can get that sound of an overdriven Marshall turned up to 75 percent and played through several different speaker cabinet options (each of these is also expensive!) right on your machine.

Or consider the Tone King Imperial Mk II, a $2,700, Fender-style amp built in the US. It sounds great. But NeuralDSP offers a stunning digital model for a hundred bucks—and it comes with compressor, overdrive, delay, and reverb pedals, to say nothing of a tuner, a doubler, a pitch-shifter, and a ton of great presets.

So I want the digital amp modeling, but I also want—sometimes, at least—the tactile simplicity of physical knobs and well-built hardware. Or I want to jack in and play without waking up a computer, logging in, launching apps, or using a mouse and an audio interface. Or I want to take my amp models to places where finicky computers aren’t always welcome, like the stage of a club.

Thanks to hardware like the Profiler from Kemper, the Helix gear from Line6, the Cortex pedalboards from NeuralDSP, or Tonex gear from IK Multimedia, this is increasingly common.

The Spark line from Positive Grid has carved out its own niche in this world by offering well-built little amps that run Positive Grid’s digital amp and effects simulations. (If you don’t want the hardware, the company sells its modeling software for PC and Mac under the “Bias” label.)

The Spark 2 is the latest in this line, and I’ve been putting it through its paces over the last couple of months.

Let’s cut right to the conclusion: The Spark 2 is a well-designed, well-built piece of gear. For $300, you get a portable, 50-watt practice amp and Bluetooth speaker that can store eight guitar tones onboard and download thousands more using a smartphone app. Its models aren’t, to my ears, the most realistic out there, but if you want a device to jack into and jam, to play along with backing tracks or loops, or to record some creative ideas, this fits the bill.

Photo of Spark 2.

Credit: Positive Grid

Good practice

Everything about the Spark 2 feels well-built. The unit is surprisingly solid, and it comes with a carrying strap for portability. If you want to truly live the wire-free lifestyle, you can buy a battery pack for $79 that gives you several hours of juice.

For a practice amp, the Spark 2 is also well-connected. It has Bluetooth for streaming audio—but it also has a 3.5 mm aux in jack. It has decent, if somewhat boxy-sounding, speakers, and they get quite loud—but it also has two quarter-inch line out jacks. It has a guitar input jack and a headphone jack. It can use a power supply or a battery. It can connect to a computer via USB, and you can even record that way if you don’t have another audio interface.

Most of the unit’s top is taken up with chunky knobs. These let you select one of the eight onboard presets or adjust model parameters like gain, EQ, modulation, delay, and reverb. There’s also a knob for blending your guitar audio with music played through the device.

Buttons provide basic access to a tuner and a looper, though the associated app unlocks more complex options.

So about that app. It’s not necessary to use the Spark 2, but you’ll need the app if you want to download or create new tones from the many pieces of modeled gear. Options here go far beyond what’s possible with the knobs atop the physical unit.

Spark models a chamber reverb, for instance, which is basically a reflective room into which a speaker plays sound that a microphone picks up. The Spark chamber lets you adjust the volume level of the reverb signal, the reflection time of the chamber, the “dwell” time of the sound in the room, the amount of sound damping, and whether the sound will have some of its lows or highs cut off. (This is common in reverbs to avoid excessive low-end “mud” or top-end “brightness” building up in the reverberating signal.) You’ll need the app to adjust most of these options; the “reverb” control on the Spark 2 simply changes the level.

There’s a fair bit of modeled gear on offer: one noise gate, six compressors, 14 drive pedals, 39 amps, 13 EQ units, six delays, and nine reverbs. Most of these have numerous options. It is not nearly as overwhelming as a package like Amplitube for PCs and Macs, but it’s still a lot of stuff.

To run it all, Positive Grid has beefed up the computational power of the Spark series. The company told me that digital signal processing power has doubled since the original Spark lineup, which allows for “smoother transitions between tones, richer effects, and an expanded memory for presets and loops.” The system runs on an M7 chip “developed specifically for expanded processing power and precise tone reproduction,” and the extra power has allowed Positive Grid to run more complex models on-device, improving their preamp and amplifier sag modeling.

Despite the DSP increase, the results here just don’t compare with the sort of scary-precise tube amp and effects simulations you can run on a computer or a far more expensive hardware modeling rig. I could never get clean and “edge of breakup” tones to sound anything other than artificial, though some of the distortion sounds were quite good. Reverbs and delays also sounded solid.

But the Spark 2 wasn’t really designed for studio-quality recording, and Positive Grid is candid about this. The models running on the Spark 2 are inspired by the company’s computer work, but they are “optimized for an all-in-one, mobile-friendly playing experience,” I was told. The Spark 2 is meant for “practice, jamming, and basic recording,” and those looking for “studio-level control and complex setups” should seek out something else.

This tracks with my experience. Compared to a regular amp, the Spark 2 is crazy portable. When testing the unit, I would haul it between rooms without a second thought, searching for a place to play that wouldn’t annoy some member of my family. (Headphones? Never!) Thanks to the optional battery, I didn’t even need to plug it in. It was a simple, fun way to get some electric guitar practice in without using a screen or a computer, and its sound could fill an entire room. Compared to the weight and hassle of moving a “real” amp, this felt easy.

About that app

I’ve been talking about the Spark 2 and its screen-free experience, but of course you do need to use the app to unlock more advanced features and download new tones onto the hardware. So how good is the software?

For modifying the gear in your presets, the app works fine. Every piece of gear has a nice picture, and you just flick up or down to get a piece of equipment into or out of the effects chain. Changing parameters is simple, with large numbers popping up on screen whenever you touch a virtual control, and you can draw from a huge library of pre-made effect chains.

The app also features plenty of backing music that it can play over the Spark 2. This includes backing tracks, tabbed songs, and the “groove looper,” giving you plenty of options to work on your soloing, but it’s the artificial intelligence that Positive Grid is really pitching this time around.

You are legally required to shoehorn “AI” into every product launch now, and Positive Grid put its AI tools into the app. These include Smart Jam, which tries to adapt to your playing and accompany it in real time. The company tells me that Smart Jam was “trained on a combination of musical datasets that analyze chord structures, song patterns, and rhythmic elements,” but I could never get great results from it. Because the system doesn’t know what you’re going to play in advance, there was always a herky-jerky quality as it tried to adapt its backing track to my changing performance.

I had more success with Spark AI, which is a natural language tone-shaping engine. You tell the system what you’re looking for—the solo in “Stairway to Heaven,” perhaps—and it returns several presets meant to approximate that sound. It does work, I’ll say that. The system reliably gave me tone options that were, with a little imagination, identifiable as “in the ballpark” of what I asked for.

Perhaps the main barrier here is simply that the current Spark amp models aren’t always powerful enough to truly copy the sounds you might be looking for. Spark AI is a great way to pull up a tone that’s appropriate for whatever song you might be practicing, and to do so without forcing you to build it yourself out of pieces of virtual gear. In that sense, it’s a nice practice aid.

Rock on

As it’s pitched—a practice amp and Bluetooth speaker that costs $300—Spark 2 succeeds. It’s such a well-built and designed unit that I enjoyed using it every time I played, even if the tones couldn’t match a real tube amp or even top-quality models. And the portability was more useful than expected, even when just using it around the house.

As DSP chips grow ever more powerful, I’m looking forward to where modeling can take us. For recording purposes, some of the best models will continue to run on powerful personal computers. But for those looking to jam, or to play shows, or to haul a guitar to the beach for an afternoon, hardware products running modeling software offer incredible possibilities already—and they will “spark” even more creativity in the years to come.

Photo of Nate Anderson

Better than the real thing? Spark 2 packs 39 amp sims into $300 Bluetooth speaker Read More »

huh?-the-valuable-role-of-interjections

Huh? The valuable role of interjections


Utterances like um, wow, and mm-hmm aren’t garbage—they keep conversations flowing.

Interjections—one-word utterances that aren’t part of a larger sentence—used to be dismissed as irrelevant linguistic detritus. But some linguists now think they play an essential role in regulating conversations. Credit: Daniel Garcia/Knowable Magazine

Interjections—one-word utterances that aren’t part of a larger sentence—used to be dismissed as irrelevant linguistic detritus. But some linguists now think they play an essential role in regulating conversations. Credit: Daniel Garcia/Knowable Magazine

Listen carefully to a spoken conversation and you’ll notice that the speakers use a lot of little quasi-words—mm-hmm, um, huh? and the like—that don’t convey any information about the topic of the conversation itself. For many decades, linguists regarded such utterances as largely irrelevant noise, the flotsam and jetsam that accumulate on the margins of language when speakers aren’t as articulate as they’d like to be.

But these little words may be much more important than that. A few linguists now think that far from being detritus, they may be crucial traffic signals to regulate the flow of conversation as well as tools to negotiate mutual understanding. That puts them at the heart of language itself—and they may be the hardest part of language for artificial intelligence to master.

“Here is this phenomenon that lives right under our nose, that we barely noticed,” says Mark Dingemanse, a linguist at Radboud University in the Netherlands, “that turns out to upend our ideas of what makes complex language even possible in the first place.”

For most of the history of linguistics, scholars have tended to focus on written language, in large part because that’s what they had records of. But once recordings of conversation became available, they could begin to analyze spoken language the same way as writing.

When they did, they observed that interjections—that is, short utterances of just a word or two that are not part of a larger sentence—were ubiquitous in everyday speech. “One in every seven utterances are one of these things,” says Dingemanse, who explores the use of interjections in the 2024 Annual Review of Linguistics. “You’re going to find one of those little guys flying by every 12 seconds. Apparently, we need them.”

Many of these interjections serve to regulate the flow of conversation. “Think of it as a tool kit for conducting interactions,” says Dingemanse. “If you want to have streamlined conversations, these are the tools you need.” An um or uh from the speaker, for example, signals that they’re about to pause, but aren’t finished speaking. A quick huh? or what? from the listener, on the other hand, can signal a failure of communication that the speaker needs to repair.

That need seems to be universal: In a survey of 31 languages around the world, Dingemanse and his colleagues found that all of them used a short, neutral syllable similar to huh? as a repair signal, probably because it’s quick to produce. “In that moment of difficulty, you’re going to need the simplest possible question word, and that’s what huh? is,” says Dingemanse. “We think all societies will stumble on this, for the same reason.”

Other interjections serve as what some linguists call “continuers,” such as mm-hmm — signals from the listener that they’re paying attention and the speaker should keep going. Once again, the form of the word is well suited to its function: Because mm-hmm is made with a closed mouth, it’s clear that the signaler does not intend to speak.

Sign languages often handle continuers differently, but then again, two people signing at the same time can be less disruptive than two people speaking, says Carl Börstell, a linguist at the University of Bergen in Norway. In Swedish Sign Language, for example, listeners often sign yes as a continuer for long stretches, but to keep this continuer unobtrusive, the sender tends to hold their hands lower than usual.

Different interjections can send slightly different signals. Consider, for example, one person describing to another how to build a piece of Ikea furniture, says Allison Nguyen, a psycholinguist at Illinois State University. In such a conversation, mm-hmm might indicate that the speaker should continue explaining the current step, while yeah or OK would imply that the listener is done with that step and it’s time to move on to the next.

Wow! There’s more

Continuers aren’t merely for politeness—they really matter to a conversation, says Dingemanse. In one classic experiment from more than two decades ago, 34 undergraduate students listened as another volunteer told them a story. Some of the listeners gave the usual “I’m listening” signals, while others—who had been instructed to count the number of words beginning with the letter t—were too distracted to do so. The lack of normal signals from the listeners led to stories that were less well crafted, the researchers found. “That shows that these little words are quite consequential,” says Dingemanse.

Nguyen agrees that such words are far from meaningless. “They really do a lot for mutual understanding and mutual conversation,” she says. She’s now working to see if emojis serve similar functions in text conversations.

Storytellers depend on feedback such as mm-hmm and other interjections from their listeners. In this experiment, some listeners were told to count the number of times the storyteller used a word starting with t—a challenging task that prevented them from giving normal feedback. The quality of storytelling declined significantly, with problems like abrupt endings, rambling on, uneven or choppy pacing and overexplaining or justifying the point. Credit: Knowable Magazine

The role of interjections goes even deeper than regulating the flow of conversation. Interjections also help in negotiating the ground rules of a conversation. Every time two people converse, they need to establish an understanding of where each is coming from: what each participant knows to begin with, what they think the other person knows and how much detail they want to hear. Much of this work—what linguists call “grounding”—is carried out by interjections.

“If I’m telling you a story and you say something like ‘Wow!’ I might find that encouraging and add more detail,” says Nguyen. “But if you do something like, ‘Uh-huh,’ I’m going to assume you aren’t interested in more detail.”

A key part of grounding is working out what each participant thinks about the other’s knowledge, says Martina Wiltschko, a theoretical linguist at the Catalan Institution for Research and Advanced Studies in Barcelona, Spain. Some languages, like Mandarin, explicitly differentiate between “I’m telling you something you didn’t know” and “I’m telling you something that I think you knew already.” In English, that task falls largely on interjections.

One of Wiltschko’s favorite examples is the Canadian eh?  “If I tell you you have a new dog, I’m usually not telling you stuff you don’t know, so it’s weird for me to tell you,” she says. But ‘You have a new dog, eh?’ eliminates the weirdness by flagging the statement as news to the speaker, not the listener.

Other interjections can indicate that the speaker knows they’re not giving the other participant what they sought. “If you ask me what’s the weather like in Barcelona, I can say ‘Well, I haven’t been outside yet,’” says Wiltschko. The well is an acknowledgement that she’s not quite answering the question.

Wiltschko and her students have now examined more than 20 languages, and every one of them uses little words for negotiations like these. “I haven’t found a language that doesn’t do these three general things: what I know, what I think you know and turn-taking,” she says. They are key to regulating conversations, she adds: “We are building common ground, and we are taking turns.”

Details like these aren’t just arcana for linguists to obsess over. Using interjections properly is a key part of sounding fluent in speaking a second language, notes Wiltschko, but language teachers often ignore them. “When it comes to language teaching, you get points deducted for using ums and uhs, because you’re ‘not fluent,’” she says. “But native speakers use them, because it helps! They should be taught.” Artificial intelligence, too, can struggle to use interjections well, she notes, making them the best way to distinguish between a computer and a real human.

And interjections also provide a window into interpersonal relationships. “These little markers say so much about what you think,” she says—and they’re harder to control than the actual content. Maybe couples therapists, for example, would find that interjections afford useful insights into how their clients regard one another and how they negotiate power in a conversation. The interjection oh often signals confrontation, she says, as in the difference between “Do you want to go out for dinner?” and “Oh, so now you want to go out for dinner?”

Indeed, these little words go right to the heart of language and what it is for. “Language exists because we need to interact with one another,” says Börstell. “For me, that’s the main reason for language being so successful.”

Dingemanse goes one step further. Interjections, he says, don’t just facilitate our conversations. In negotiating points of view and grounding, they’re also how language talks about talking.

“With huh?  you say not just ‘I didn’t understand,’” says Dingemanse. “It’s ‘I understand you’re trying to tell me something, but I didn’t get it.’” That reflexivity enables more sophisticated speech and thought. Indeed, he says, “I don’t think we would have complex language if it were not for these simple words.”

Photo of Knowable Magazine

Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.

Huh? The valuable role of interjections Read More »

maserati-kills-electric-version-of-mc20-supercar-for-lack-of-demand

Maserati kills electric version of MC20 supercar for lack of demand

Electric motors are, in so many ways, much better than internal combustion engines. They don’t waste most of the energy you put into them as heat and sound, they’re easy to control, and they make huge amounts of torque almost instantly. Having recently driven BMW’s 430i and i4 back to back over the course of two weeks, the electric version was easier in traffic and more responsive on a twisty road. Electric wins, then. Except at the very high end, it seems.

Because even though electric motors can pack a punch, people paying big money for super- and hypercars are increasingly disinterested in those cars being electrified. So much so that Maserati has canceled the all-electric version of the MC20.

The MC20 debuted in 2020. No longer associated with Ferrari after that brand was spun out and IPO’d, the MC20 could offer a full carbon-fiber monocoque and an engine with very clever F1-derived combustion technology, undercutting its now-independent Italian competitor to the tune of more than $100,000 in the process.

Maserati kills electric version of MC20 supercar for lack of demand Read More »