Author name: Shannon Garcia


Sunderfolk review: RPG magic that transports your friends together


Using your phone as a controller keeps you engaged with this accommodating RPG.

The creators of Sunderfolk wanted to make a video game that would help players “Rediscover game night.” By my reckoning, they have succeeded, because I am now regularly arguing with good friends over stupid moves. Why didn’t I pick up that gold? Don’t you see how ending up there messed up an area attack? Ah, well.

That kind of friendly friction, inside dedicated social time, only gets harder to come by as you get older, settle into routines, and sometimes move apart. I’ve hosted four Sunderfolk sessions with three friends, all in different states, and it has felt like reclaiming something I lost. Sunderfolk is a fun game with a lot of good ideas, and the best one is convincing humans to join up in pondering hex tiles, turn order, and what to name the ogres who shoot arrows (“Pointy Bros”).

Maybe you already have all the gaming appointments you need with friends, online or in person. Sunderfolk, I might suggest, is a worthy addition to your queue as a low-effort way to give everyone a break from being the organizer. It does a decent job of tutorializing and onboarding less experienced players, then adds depth as it goes on. Given that only one person out of four has to own the game on some system, and the only other hardware needed is a phone, it’s a pretty light lift for what I’m finding to be a great payoff. Some parts could be improved, but the core loop and its camaraderie engine feel sturdy.

I haven’t reached the mine cart missions yet but am glad to know they exist. Credit: Dreamhaven

Pick a class, take a seat

My party getting a well-deserved level up. From left: Boom Boom the berserker, Roguefer, Bob the mage, and Fire Bob. Credit: Kevin Purdy

Sunderfolk is a turn-based tactical RPG, putting you and your friends on a grid filled with objects, enemies, and surprises. You pick from familiar role-playing character classes—my party picked rogue, berserker, wizard, and a kind of pyromancer—and choose one ability card each turn. The cards put a Gloomhaven-like emphasis on sequence and map positioning. One of my rogue’s potential moves is a quick attack, then gaining strength by picking up nearby gold. Another involves moving, hitting, moving, hitting, then one more single-hex move at the end, to stay out of danger and get a protective “Shrouded” effect.

You and your squad are all watching the same screen, be it a living room TV, a laptop, or a window streamed over Zoom or Discord. You choose your cards, plot your movement, and interact with everything using your phone or tablet’s touchscreen. Once you’ve won a quest by beating the baddies and/or hitting other markers, you head back to town and do a whole bunch of housekeeping tasks. Sunderfolk has mechanics both for players who can’t make a session (scaling the quests and keeping the missing leveled up) and for someone having to drop mid-battle (someone else can play their seat and their own). It’s accommodating to players of different RPG experience levels and different schedules.

Sunderfolk launch date trailer.

You have my sword—and my phone

Let’s address the phone controls, the roughly 6-inch-diagonal elephant in the room. My three friends all had to spend a few minutes getting used to using their phone screen as a multi-modal controller: touchpad for hex movement and cursor pointing, card picker and info box reader, and then the town landscape screen. After that, nobody had any real issues with the controls themselves. The tactile feedback guides your finger, and there was no appreciable lag in our sessions.

Sometimes we’d get momentarily flummoxed by the card-choosing flow, and there is perhaps some inherent mental tax in switching between screens. But the phone controls, besides making couch co-op possible, also allowed everyone in my group to play in their most comfortable spot: a TV streaming the Discord app, a tablet on the couch, a laptop at the kitchen counter.

Not for nothing, but with each player using their phone for controls—and the game announcing when players had “disconnected” if they switched to another app for too long, only for them to come right back—Sunderfolk can apply some anti-scrolling pressure and keep everyone checked in. You could get around this with secondary devices or tiled windows, but it’s better to be present and ask your friends whose turn it is to fire off their Ultimate card.

(To clarify how remote play works: one owner of the game screen-casts it to Discord, Zoom, Meet, or whatever service; everyone playing can chat there or elsewhere; and players log in their phones/controllers by scanning a QR code displayed on the screen-casted main title screen.)

Cheerhaven

Most times, your party will be spread out all over the town, but in this provided screenshot, everybody has remembered to upgrade their Fate cards at the Temple. Credit: Dreamhaven

All this adventuring takes place in a world of anthropomorphized animals and overgrown woods and purple-blue ogres that strongly evoke World of Warcraft’s style (at least in Act 1). When you’re back in town, you tap around to chat with recurring NPCs, gain friendship levels that sometimes result in gifts, and upgrade parts of the town to your liking. The town hub provides more opportunities for strategy and bonding with players. You might send some of the gold you greedily picked up on the last quest to a friend so they can nab a great weapon. You might, as a group, buy a quirky town upgrade just for the chance to rename some things.

But the town is one place I felt some friction, familiar from more in-depth board games. Some players will finish their decisions and speed through dialogue faster than others, and some will be more engaged with the town chatterboxes than the rest. Just as with cardboard games, you can take this moment to get up, stretch your legs, and maybe refresh a drink. But you might have to nudge people along if they’re overwhelmed by gear choices.

The non-gory, often goofy nature of Sunderfolk’s setting makes it appropriate for a wider range of players. The voice acting, almost all of it by Anjali Bhimani (Overwatch’s Symmetra), re-creates the feel of having a game master switch between “Frightened Blue Jay miner” and “Furious Ogre Queen” in one session. I’m not too engaged in the broad plot after one act—ogres, fueled by Darkstone, want to extinguish the village’s Brightstone—but it’s not a big deal. The game has given my group something else to latch onto: naming things.

Touching the Neatos to heal Boom Boom

Michael Keaton must reach an exit hex! Credit: Kevin Purdy

The chance to name things in Sunderfolk, and have those names stick for the whole campaign, is something a good GM would do to engage their players and break up tension. Sunderfolk is clever about this, offering both secret naming prompts to individual players on their phones or dishing out naming opportunities in town. In my party’s campaign, healing statues are named Neatos, the town bridge is Seagull Murder (a misremembered, obscure Peace Bridge reference), and the beetle we rescued is named, as it was during my preview, Michael Keaton. It’s fun to build your own stupid world out of goofy names, something too few games provide.

Individual phone controls give the game a chance to pull off a few other tricks, like secretly telling only certain players that an enemy looks like it’s carrying great loot. If Sunderfolk added even more of this, I would not mind at all.

I’m generally enjoying the combat, difficulty ramp (on the default setting), and upgrade paths of the characters. After three or four sessions, your character has much more move variety, and items and weapons are more useful and varied. The town of Arden, while overly chatty, has more to offer. It feels like a game that has had its pacing and onboarding fine-tuned.

But I have nits to pick:

  • There are cheap but great upgrades you can easily miss in town, like tavern meals and temple fate cards
  • Enemy variety feels slightly lacking in the first act
  • Some things, like mission selection, demand all-party agreement; perhaps the game could figuratively flip a coin when parties are divided
  • Everyone in my party has accidentally skipped an attack once or twice, despite an “Are you sure?” prompt
  • Movement traces and signaling could be clearer, as we have all also wasted hexes and shoved each other around

A very human computer game

You and your friends deal with a lot more stuff as Sunderfolk goes on: Boom Shrooms, loot, pits, explosives, and lots of little coin piles. Credit: Kevin Purdy

It’s been hard to be overly critical of a game that has all but forced me to log off and talk to friends for a couple hours each week. The downsides of Sunderfolk have mostly been the same as those of playing any tabletop game with humans: waiting, expertise imbalance, distraction, and someone’s dog needing attention.

Beyond that, I think Sunderfolk is a success at what it set out to do: Put the cardboard, cards, and dice on the screen and make it easier for everyone to show up. It won’t replace the traditional game night, but it might bring more people into it and remind people like me why it’s so good.

This post was updated at 10:45 a.m. with a note about how remote play can work.


Kevin is a senior technology reporter at Ars Technica, covering open-source software, PC gaming, home automation, repairability, e-bikes, and tech history. He has previously worked at Lifehacker, Wirecutter, iFixit, and Carbon Switch.


Recap: Wheel of Time’s third season balefires its way to a hell of a finish

Andrew Cunningham and Lee Hutchinson have spent decades of their lives with Robert Jordan and Brandon Sanderson’s Wheel of Time books, and they previously brought that knowledge to bear as they recapped each first season episode and second season episode of Amazon’s WoT TV series. Now we’re back in the saddle for season 3—along with insights, jokes, and the occasional wild theory.

These recaps won’t cover every element of every episode, but they will contain major spoilers for the show and the book series. We’ll do our best to not spoil major future events from the books, but there’s always the danger that something might slip out. If you want to stay completely unspoiled and haven’t read the books, these recaps aren’t for you.

New episodes of The Wheel of Time season three will be posted for Amazon Prime subscribers every Thursday. This write-up covers the season three finale, “He Who Comes With the Dawn,” which was released on April 17.

Lee: Wow. That was… a lot.

One of the recurring themes of our recaps across seasons has been, “Well, I guess we’re going to have to give up on seeing $SEMI_MAJOR_BOOK_SETTING_OR_EVENT on screen because of budget or time or narrative reasons,” and we’ve had to let go of a lot of stuff. But this episode kicks off with a flashback showing Elaida walking out of a certain twisted redstone doorframe, looking smug and fingering a bracelet. Sharp-eyed viewers might have spotted this doorway in the background of the season premiere, when the Black Ajah loots the Tower’s ter’angreal storeroom, and now in true Chekhov’s Gun fashion, the doorway comes ’round again—and not just this one, because like many things in the Wheel of Time, the doorways come in a binary set.

We surely owe show-watchers a very quick recap of the Finn—and I believe we glossed over a scene in an earlier episode where the boys are actually playing the snakes-and-foxes game that these horrifying fae-folk are based on—but before we do that, let’s take a breath and look at what else we’ve got in the episode. Closure! (Well, some.) Balefire! Blocks breaking! Rand pulling a Paul Atreides and making it rain on Dune! I mean, uh, in the Three-Fold Land! And many other things!

According to the book, this Cat-in-the-Hat-looking mfer’s clothes are made of human flesh. Creepy. Credit: Prime/Amazon MGM Studios

Andrew: I found this episode less than satisfying after last week’s specifically because of that grab-bag approach. There is some exciting, significant, season finale-style stuff happening here, but it’s also one of those piece-moving episodes with scene after scene of setup, setup, setup without a ton of room for payoff. Setup for a fourth season that, as of this writing, we still don’t know whether we’re getting!

So a number of things just feel rushed, most significantly Rand’s hard turn on Lanfear after a cursory attempt to coax her back to the side of the Light, and the existence of balefire as a concept. I actually love how the show visualizes it—it’s essentially a giant death laser that melts you out of the Pattern so thoroughly that it doesn’t just kill you, it also erases the last few seconds of your existence, represented here as a little shadow of a person that rewinds a bit before dissipating. The books use balefire extensively as a get-out-of-jail-free card for certain major character deaths, so it really feels like something that needs a little more preamble than it gets here.

Lee: Definitely hear you on the Rand and Lanfear stuff—though I think I was so excited by the things I cared about that I wasn’t really paying a lot of attention to the things I didn’t. And Rand & Moiraine & Lanfear are kind of at the bottom of my list of things I’m paying attention to as we slide into the finish—yeah, the Car’a’carn is Car’a’carning and Lanfear is Lanfear’ing.

Balefire looks a little Ghostbusters-y, but I definitely wouldn’t want to get hit with any. Credit: Prime/Amazon MGM Studios

Andrew: It’s hard to know where to start with the rest of it! There are some recreations of book events that happen roughly where they’re “supposed” to in the story. There are recreations of book events that have been pulled way forward to save some time. There are things that emphatically don’t happen in the books, also done at least partly in the interests of time. And there’s at least one thing that felt designed specifically to fake out book-readers.

What to dig into first?

Lee: The fake-out! Let’s jump in there. The books make a big deal about Rand needing a teacher for him to get good at channeling, and it can’t be a female Aes Sedai (as the oft-repeated bit about “a bird cannot teach a fish to swim” makes clear). It seemed like it might be poor neglected Logain (remember him?), but now the show makes it clear that the man on the spot is instead going to be Sammael—and then Moghedien comes along and puts all of Sammael’s insides on the outside. Soooooo… I guess Sammael is off the board.

Image of Sammael being extraordinarily dead

Sammael (center) appears to be about as dead as Siuan. So much for that plotline.

Credit: Prime/Amazon MGM Studios

Sammael (center) appears to be about as dead as Siuan. So much for that plotline. Credit: Prime/Amazon MGM Studios

Andrew: Yup! We still have one Forsaken missing, by my count—there are eight in total in the show’s world, and we’ve seen five and had two more referenced by name. So the big open question is whether the eighth is the Forsaken who does end up in the Rand-teacher role in the books. I feel like the show wouldn’t have spent so much time setting up “Rand needs a teacher” without then bothering to follow up on it in some way, but this episode wants to tease people who are asking that question rather than answering it. Fair enough!

Sammael’s early death (pulled forward from book seven) has its own story reverberations. In the books he’s one of a few Forsaken who set themselves up as heads of state, and Rand has to run around individually defeating them and bringing all of these separate kingdoms together in time for the Last Battle (this is less exciting than it sounds, because it takes forever and requires endless patience for navigating the politics of each region).

It seems, increasingly, that we may just be skipping over a bunch of that stuff. That was already implied by the downplaying of Cairhienin politicking that we got on screen in season two, and I tend to see “putting all of Sammael’s blood on the outside” as another possible nod in that direction. As ever with this show, “knowing how it goes in the books” only gives us a limited amount of insight into what the show is going to do.

Lee: I’m liking it. I consider Rand’s world-unifying to be one of the core components of The Slog that we discussed last week, and I think anything that greases the skids on that entire plotline is unequivocally a good thing—that’s also about where I start skipping entire chapters if the word “Elayne” appears in them (trust me on this, show-watchers who might become book-readers: Elayne spends thousands of pages playing the most boring version of the Game of Thrones imaginable, and we suffer through every single interminable import/export discussion with her).

Speaking of Game of Thrones—at least in the sense of killing off characters and potentially shortening The Slog—Siuan’s dead! And probably not in a “can be fixed” kind of way, since we very clearly see her head separated from her body, and Moiraine gasps out confirmation. This one kind of shook me, since Siuan has a big major role to play in a certain big major thing that happens several books hence—but the more I think about it, the more this feels like the same kind of narrative belt-tightening that brought us Loial’s death last episode. Because up until that certain big major thing happens, Siuan spends a lot of her post-Amyrlin time as a scullery maid and underpants-washer. I think we can transplant that certain big major thing onto one of a half-dozen other characters and lose nothing. At least…I think. What about you?

Siuan (center) has passed on. She is no more. She has ceased to be. Credit: Prime/Amazon MGM Studios

Andrew: Yeah, I mean, not nothing, exactly. Every book character we have lost on the show has done stuff that I liked in the books that is now probably not going to happen. Complaining about The Slog aside, people like these books in part because they successfully build a super-dense world inhabited by a million named characters who all have Moments. Post-Amyrlin Siuan’s journey is about humility, finding happiness, and showing that the literal One Power is not the only kind of power there is to wield; it’s not always thrilling, but I won’t say it’s of zero narrative value.

And even when discussing The Slog, part of the reason it was so infuriating is because you and I were reading these as they were coming out. If you wait three years for a book, and then it comes out and nothing happens: that’s maddening! It is also not a problem that exists for modern readers or re-readers, now that the books have been done and dusted for over a decade. My assessment of Knife of Dreams, the series’ 11th book and the last one written entirely by Jordan, went way up on my last re-read because I was able to experience it without also having to experience the bookless years before and after. (It also made me newly sad that Jordan wasn’t able to conclude the story himself, as someone who finds the Sanderson-assisted books a bit clunky and utilitarian.)

All of that being said! I agree that from this point forward in the story, Siuan is not a load-bearing character in the way that Rand or Egwene or the others are. You do also get the sense that the show wants to surprise book-readers with something big every now and again. This particular death achieves that and also cuts down on what the show has left to adapt. I get why they did it! But I also sympathize with people who will miss her.

Now that she’s Amyrlin, Elaida (center) gets to wear the biggest hat of all. Credit: Prime/Amazon MGM Studios

Lee: Let’s pivot, because I can’t wait to discuss Mat’s journey into Finn-land—one of the most important things that happens to his character in the books. I was pretty convinced that we simply weren’t going to get any of this in the show—that the Aelfinn and Eelfinn would be too outside what Amazon is willing to pay for. And yet, there are our two twisted redstone doorways. They’re repositioned somewhat from their book locations, but in a believable fashion. We have no idea what Elaida might have been doing in the doorway in the bowels of the White Tower—presumably she visited the snake-like Aelfinn (and the subtitles confirm this), which leaves Mat visiting the fox-like Eelfinn.

The show has been dropping hints about this all season, from flashing us a shot of the first doorway in episode one, to actually showing the “snakes and foxes” tabletop game being played, and finally, here we are—while hunting for the control necklace in the Panarch’s palace in Tanchico, Mat steps through the doorway and… gets three wishes from a horrifying BDSM furry?

Break it down for us, Andrew. What the hell are we looking at?

Andrew: When you enter through these doors, the Finn give you stuff! The Aelfinn give you knowledge, by answering three questions. And the Eelfinn give you Things, both tangible and intangible, by granting three wishes. Exactly what these people are, where they live, why they have this arrangement with anyone who enters through the doorways: even in a series obsessed with overexplaining things, these are “don’t worry about it, that’s just how it is” questions. What you need to know is that the Aelfinns’ answers are often cryptic and open to interpretation, and the Eelfinns’ wish-granting is hyper-literal and comes with, uh, strings attached, as Mat quickly discovers.

Mat getting his things from the Eelfinn is essentially the moment he becomes the Mat he is for the rest of the story, like Perrin’s wolf powers or Egwene’s dream-walking or Rand’s channeling. So it’s pivotal! What did you think of how the show handled it?

Set Sjöstrand as Rand’s Shaido rival Couladin (center), giving off real Great Value Brand Khal Drogo energy here. Credit: Prime/Amazon MGM Studios

Lee: I thought it was pretty fantastic! We get to see Mat’s foxhead medallion—granted in response to his screaming about how sick he is of being “bollocked about by every bloody magic force on this bloody planet.” But more importantly—possibly the most important thing of all to a certain class of book reader!—is that we also finally get to see the weapon that will define Mat both in combat and out for the entire rest of the series. That’s right, kids, it’s an actual-for-real Ashandarei—and Mat’s hanging from it, just like in the books! Well, sort of. Sort of somewhat similarly to the books!

Mat is being aligned and equipped very well now to head toward his destiny. In fact, after this much of a build-up, the most Wheel of Time-esque thing to happen now would be for him to be completely absent from season four. Ell-oh-ell.

A bargain made, a price is paid. It’s a little hard to make out, but you can clearly see Mat’s (center) Ashandarei stabbed into the top of the doorframe—just follow the rope. Credit: Prime/Amazon MGM Studios

Andrew: The Tanchico plotline is also kind of wrapped up here in abrupt fashion. In essence, our heroes fail. Not only do Moghedien and Liandrin manage to escape with all the parts of the collar they need to corral and control the Dragon Reborn, but they also agree to team up so they can beat the other Big Bads and become the biggest bads of all. I cannot see this ending well for either of them, but Kate Fleetwood’s Liandrin is such an unhinged presence on this show that I’m glad she’s sticking around.

Our heroes don’t walk away entirely empty-handed, I suppose. Thom tells Elayne that they actually know each other, and that “Lord Gaebril” is actually a Forsaken and a usurper whom she hasn’t known her whole life. And Nynaeve gets pitched into the sea, where a near-death experience dissolves the block that is keeping her from channeling freely (the show doesn’t say this overtly, but this is only lightly altered from a similar sequence that happens in book seven or eight, I think).

Nynaeve (center) doing her best Charlton Heston impression. Credit: Prime/Amazon MGM Studios

Lee: Right, I believe Nynaeve’s block gets busted in book seven—I remember because when I started reading the series, that was the latest available book and the event stuck out. I very much like bringing it forward, too. In the books, keeping the block around makes sense narratively and serves a solid set of purposes; in the show, it was starting to feel less like a legitimate plot device and more like a bad storytelling crutch. It has served its purpose, and it’s time to get rid of it and get on with things.

(Though it is kind of funny to note that Liandrin was the one trying to help Nynaeve break the block in the show a couple of seasons ago. Looks like Liandrin finally found a method that works! The results, though, will not be what she expects.)

The foxhead medallion—one of the three items that come to define Matrim Cauthon (center). Credit: Prime/Amazon MGM Studios

Andrew: The show has set us all up to converge in Tear in season four, essentially going backwards in the story and doing parts of book three; my guess would be that, if it’s still identifiable as an adaptation of any particular Wheel of Time book, we see parts of books five and maybe six mixed in there, too. But all of that is contingent on the show getting another season, and for the first time going into a WoT finale, we aren’t actually sure if that’s happening, right?

Lee: Ugh, yeah, still no word on the next season, which sucks, because this one was so damn good. We wrap in the desert, where Rand has darkened the skies (enough to be seen all over the world!) and brought rain. Everyone looks on portentously. The Stone of Tear and the sword within it (Callandor! It’s the sword in the stone!) beckon. We just need the all-swallowing monster that is Amazon to spare some pocket change to make it happen.

Rand (center-right) summons the rains. Credit: Prime/Amazon MGM Studios

Andrew: I’ve been worried about this renewal. Dramas like this just don’t get as many seasons as they would have in eras of TV gone by, and we’re several years past the end of streaming TV’s blank check era (unless you’re Apple TV+, I guess). This season has earned a lot of praise from more people than us—it’s got a higher Rotten Tomatoes score than either of the previous seasons, and higher than the second season of Rings of Power.

But it also doesn’t seem like Wheel of Time has become the breakout crossover smash-hit success that Jeff Bezos had in mind when he demanded his own Game of Thrones all those years ago. It’s expensive, and shows get more expensive the longer they run, as the people in front of and behind the camera negotiate raises and contract renewals.

I would love to see this get a fourth season. The third season had enough great stuff in it that I would be legitimately sad to see it canceled now—I’m more attached to the show than I was at the end of its first or second seasons. How ’bout you?

“And how can this be? For he is the Kwisatz Haderach!” I’m sorry, I’m sorry, no more Dune jokes. Credit: Prime/Amazon MGM Studios

Lee: I’ve said it a bunch, and I’ll say it again: This has been the season where the show found itself. I have every confidence that the next few seasons—if they’re allowed to exist—are going to kick ass.

But this is 2025, the year all dreams die. Perhaps this show, too, is a dream—one from which we are fated to wake sooner, rather than later.

I suppose we’ll know shortly. Until then, dear readers, may you always find water and shade, and may the hand of the Creator shelter you all. And also perhaps knock some sense into Bezos.



AI #112: Release the Everything

OpenAI has upgraded its entire suite of models. By all reports, they are back in the game for more than images.

GPT-4.1 and especially GPT-4.1-mini are their new API non-reasoning models. All reports are that GPT-4.1-mini especially is very good.

o3 is the new top of the line ChatGPT reasoning model, with o3-pro coming in a few weeks. Reports are that it too looks very good, even without us yet taking much advantage of its tool usage. If you have access, check it out. Full coverage is coming soon. There’s also o4-mini and o4-mini-high.

Oh, they also made ChatGPT memory cover all your conversations, if you opt in, and gave us a version of Claude Code called Codex. And an update to their preparedness framework that I haven’t had time to examine yet.

Anthropic gave us (read-only for now) Google integration (as in Gmail and Calendar to complement Drive), and also a mode known as Research, which would normally be exciting but this week we’re all a little busy.

Google and everyone else also gave us a bunch of new stuff. The acceleration continues.

Not covered yet, but do go check them out: OpenAI’s o3 and o4-mini.

Previously this week: GPT-4.1 is a Mini Upgrade, Open AI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing

  1. Language Models Offer Mundane Utility. But doctor, you ARE ChatGPT!

  2. Language Models Don’t Offer Mundane Utility. Cuomo should have used o3.

  3. Huh, Upgrades. ChatGPT now has full memory across conversations.

  4. On Your Marks. A new benchmark for browsing agents.

  5. Research Quickly, There’s No Time. Just research. It’s cleaner. Check your email.

  6. Choose Your Fighter. Shoutouts to Google-AI-in-search and Mistral-Small-24B?

  7. Deepfaketown and Botpocalypse Soon. Building your own AI influencer.

  8. The Art of the Jailbreak. ChatGPT can now write its own jailbreaks.

  9. Get Involved. Study with UT Austin, or work for Ted Cruz. We all make choices.

  10. Introducing. Google offers agent development kit, OpenAI copies Claude Code.

  11. In Other AI News. Oh no, please, not another social network.

  12. Come on OpenAI, Again? Funny what keeps happening to the top safety people.

  13. Show Me the Money. Thinking Machines and SSI.

  14. In Memory Of. Ways to get LLMs real memory?

  15. Quiet Speculations. What even is AGI anyway, and other questions.

  16. America Restricts H20 Sales. We did manage to pull this one out.

  17. House Select Committee Report on DeepSeek. What they found was trouble.

  18. Tariff Policy Continues To Be How America Loses. It doesn’t look good.

  19. The Quest for Sane Regulations. Dean Ball joins the White House, congrats man.

  20. The Week in Audio. Hassabis, Davidson and several others.

  21. Rhetorical Innovation. No need to fight, they can all be existential dangers.

  22. Aligning a Smarter Than Human Intelligence is Difficult. Working among us.

  23. AI 2027. Okay, fair.

  24. People Are Worried About AI Killing Everyone. Critch is happy with A2A.

  25. The Lighter Side. The numbers are scary. The words are scarier.

Figure out what clinical intuitions convert text reports to an autism diagnosis. The authors were careful to note this was predicting who would be diagnosed, not who actually has autism.

Kate Pickert asserts in Bloomberg Why AI is Better Than Doctors at the Most Human Part of Medicine. AI can reliably express sympathy to match the situation, is always there to answer, and doesn’t make you feel pressured or rushed. Even the gung-ho doctors are still saying things like ‘AI is not going to replace physicians, but physicians who know how to use AI are going to be at the top of their game going forward’ and saying how it ‘will allow doctors to be more human,’ and the article calls that an ‘ideal state.’ Isn’t it amazing how every vision of the future picks some point where it stops?

The US Government is deploying AI to clean up its personnel records and correct inaccurate information. That’s great if we do a good job.

Translate dolphin vocalizations?

Pin down where photographs were taken. It seems to be very good at this.

Henry: ten years ago the CIA would have gotten on their knees for this. every single human has just been handed an intelligence superweapon. it’s only getting stranger

This may mean defense will largely beat offense on deepfakes, if one has a model actually checking. If I can pinpoint exact location, presumably I can figure out when things don’t quite add up.

Andrew Cuomo used ChatGPT for his snoozefest of a vacuous housing plan, which is fine except he did not check its work.

Hell Gate: Andrew Cuomo used ChatGPT to help write the housing plan he released this weekend, which included several nonsensical passages. The plan even cites to ChatGPT on a section about the Rent Guidelines Board.

He also used ChatGPT for at least two other proposals. It’s actively good to use AI to help you, but this is not that. He didn’t even have someone check its work.

If New York City elects Andrew Cuomo as mayor we deserve what we will get.

What else isn’t AI doing for us?

Matthew Yglesias: AI is not quite up to the task of “make up a reason for declining a work-related invitation that will stand up to mild scrutiny, doesn’t make me sound weird, and is more polite than ‘I don’t want to do it.’”

Tyler Cowen: Are you sure?

I think AI is definitely up to that task to the extent it has sufficient context to generate a plausible reason. Certainly it can do an excellent job of ‘use this class of justification to generate a maximally polite and totally non-weird reply.’

As usual, the best way to not get utility is not to use them, fraudulent company edition.

Peter Wildeford: I guess fake it until you make it, do things that don’t scale, etc. only works to a point.

Nico: Ship fast, break things, (go to jail)

I don’t think ‘create human call centers in order to get market share and training data to then make them into AI call centers’ is even a terrible startup idea. The defrauding part did run into a little trouble.

A technical analysis of some fails by Claude Plays Pokemon, suggesting issues stemming from handling and management of long context. This both suggests ways to improve Claude in general, and ways one could improve the scaffolding and allow Claude to play superior Pokemon (without ‘cheating’ or otherwise telling it about the game in any specific way).

Apple’s demo of Siri’s new abilities to read user emails, find real-time flight data, and plot routes in Maps came as news to the people working on Siri. In general Mac Rumors paints a picture of a deeply troubled and confused AI effort at Apple, with eyes very much not on the ball.

ChatGPT memory now extends to the full contents of all your conversations. You can opt out of this. You can also do incognito windows that won’t interact with your other chats. You can also delete select conversations.

Noam Brown: Memory isn’t just another product feature. It signals a shift from episodic interactions (think a call center) to evolving ones (more like a colleague or friend).

Still a lot of research to do but it’s a step toward fundamentally changing how we interact with LLMs.

This shift has its disadvantages. There’s a huge freedom and security and ability to relax when you know that an interaction won’t change things overall. When you interact with a human, there’s always this kind of ‘social calculator’ in the back of your brain whether you’re conscious of it or not, and oh my is it a relief to turn it off. I hate that now when I use so many services, I have to worry in that same way about ‘what my actions say about me’ and how they influence what I will see in the future. It makes it impossible to fully relax. Being able to delete chats helps, but not fully.

My presumption is you still very much want this feature on. Most of the time, memory will be helpful, and it will be more helpful if you put in effort to make it helpful – for example it makes sense to offer feedback to ChatGPT about how it did and what it can do better in the future, especially if you’re on $200/month and thus not rate limited.

I wonder if it is now time to build a tool to let one easily port their chat histories between various chatbots? Presumably this is actually easy: you can copy over the entire back-and-forth with user and assistant tags and paste it in, saying ‘this is so you can access these other conversations as context’ or what not?
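
To make that concrete, here is a minimal sketch of the copy-over idea, assuming a simplified export format where each message is a dict with role and content keys (real export schemas differ by provider, and the tag names here are illustrative):

```python
import json

def export_to_transcript(path: str) -> str:
    """Flatten an exported conversation into a tagged transcript
    that can be pasted into another chatbot as context."""
    with open(path) as f:
        messages = json.load(f)  # assumed: a list of {"role", "content"} dicts
    lines = ["The following is a prior conversation, included as context:"]
    for msg in messages:
        # Wrap each turn in simple tags so the receiving model can tell
        # who said what.
        lines.append(f"<{msg['role']}>{msg['content']}</{msg['role']}>")
    return "\n".join(lines)

# Usage: print(export_to_transcript("chatgpt_export.json"))
```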

Anna Gat is super gung ho on memory, especially on it letting ChatGPT take on the role of therapist. It can tell you your MBTI and philosophy and lead you to insights about yourself and take different points of view and other neat stuff like that. I am skeptical that doing this is the best idea, but different people work differently.

Like Sean notes, my wife uses my account too (I mean it’s $200/month!) and presumably that’s going to get a bit confusing if you try things like this.

Gemini 2.5 Pro was essentially rushed into general availability before its time, so we should still expect it to improve soon when we get the actual intended general availability version, including likely getting a thinking budget similar to what is implemented in Gemini 2.5 Flash.

Google upgrades AI Studio (oh no?). They list key improvements as:

  1. New Starter Apps.

  2. Refined Prompting Interface, persistent top action bar for common tasks.

  3. Dedicated Developer Dashboard including API keys and changelog.

A quick look says what they actually did was mostly ‘make it look and feel more like a normal chat interface.’

Cohere gives us Embed 4.

My feedback is they don’t do a good job here explaining the value proposition, and what differentiates it from other offerings or from Embed 3. It seems like it is… marginally better at search and retrieval? But they don’t give me a way to feel what they uniquely enable, or where this will outperform.

ChatGPT will have all your images in one place. A small thing, but a good thing.

Grok gives us Grok Studio, offering code execution and Google Drive support.

LM Arena launches a ‘search Arena’ leaderboard, Gemini 2.5 Pro is on top with Perplexity-Sonar-Reasoning-Pro (high) slightly behind on presumably more compute.

OpenAI introduces BrowseComp, a 1,266-question benchmark for browsing agents. From looking at the sample questions they provide, these are extremely obscure, inelegant trivia questions, except you’re allowed to use the internet? As in:

Give me the title of the scientific paper published in the EMNLP conference between 2018-2023 where the first author did their undergrad at Dartmouth College and the fourth author did their undergrad at University of Pennsylvania. (Answer: Frequency Effects on Syntactic Rule Learning in Transformers, EMNLP 2021)

I mean, yeah, okay, that is a test one could administer I suppose, but why does it tell us much about how good you are as a useful browsing agent?

When asking about ‘1 hour tasks’ there is a huge gap between ‘1 hour given you know the context’ and ‘1 hour once given this spec.’

Charles Foster: Subtle point [made in this post]: there’s a huge difference between typical tasks from your job that take you 1 hour of work, and tasks that a brand new hire could do in their first hour on the job. Most “short” tasks you’ve done probably weren’t standalone: they depended on tons of prior context.

A lot of getting good at using LLMs is figuring out how, or doing the necessary work, to give them the appropriate context. That includes you knowing that context too.

How badly did Llama-4 go? This badly:

Casper Hansen: Llama 4 quietly dropped from 1417 to 1273 ELO, on par with DeepSeek v2.5.

Previously, Llama-4 took second place in Arena with an intentionally sloppified version optimized for Arena. That’s gone now, we are testing the actual Llama-4-Maverick, and shall we say it’s not going great.

Here’s a very bullish report on Gemini 2.5 Pro (prior to the release of o3):

Leo Abstract: there’s been a lot going on lately and i neglected to chime in earlier and say that gemini 2.5 pro absolutely crushes the private benchmarks i’d been using until now, in a new way and to a new extent.

i’m going to need a bigger benchmark.

Well, well, what do we have here.

Anthropic: Today we’re launching Research, alongside a new Google Workspace integration.

Claude now brings together information from your work and the web.

Research represents a new way of working with Claude.

It explores multiple angles of your question, conducting searches and delivering answers in minutes.

The right balance of depth and speed for your daily work.

Claude can also now connect with your Gmail, Google Calendar, and Docs.

It understands your context and can pull information from exactly where you need it.

Research is available in beta for Max, Team, and Enterprise plans in the United States, Japan, and Brazil.

Separately, the Google Workspace integration is now available for all paid plans.

Oh. My. God. Huge if true! And by true I mean good at job.

I’m excited for both features, but long term I’m more excited for Google integration than for research. Yes, this should 100% be Gemini’s You Had One Job, but Gemini is not exactly nailing it, so Claude? You’re up. Right now it’s read-only, and it’s been having trouble finding things and having proper access in my early tests, but I’m waiting until I try it more. Might be a few bugs to work out here.

Anthropic: We’ve crafted some prompt suggestions that help you quickly get insights from across your Google Workspace.

After turning it on, try: “Reflect on my calendar as if I was 100 years old looking back at this time.”

Peter Wildeford: 👀

Just tried “Claude Research”, it’s much faster (takes <1min) but much weaker than Gemini's and ChatGPT's "Deep Research" (fails to find key facts and sources that Gemini/ChatGPT do).

I think it’s a great replacement for a Google quick dive but not for serious research.

Claude integration with Google apps seems potentially awesome and I’d be curious how it compares against Gemini (which is surprisingly weak in this area given the home field advantage)

But alas “Claude for Google Drive is not approved by Advanced Protection” so can’t yet try it. Still, I expect to use this a lot.

Also LOL in the video how everyone compliments the meticulous sabbatical planning even though it was just a Claude copy+paste.

John Pressman says people are sleeping on Mistral-Small-24B, and in particular it speeds up his weave-agent project dramatically (~10x). Teortaxes asks about Reka-21B. There’s this entire other ecosystem of small open models I mostly don’t cover.

Liron Shapira is liking the new version of Google AI-in-search given its integration of web content and timely news. I’m not there yet.

The general case of this is my biggest complaint about Gemini 2.5 Pro.

Agus: Gemini 2.5 Pro seems to systematically refuse me when I ask it to provide probabilities for things, wut

Max Alexander: True human level capabilities.

An advertisement for tools to build an AI influencer on Instagram and OnlyFans. I mean, sure, why not. The problem is demand side, not supply side, as they say.

You can use AI to create bad new Tom & Jerry cartoons, I guess, if you want to?

With its new memory feature, Pliny found that ChatGPT wasn’t automatically jailbroken directly, but it did give Pliny a jailbreak prompt, and the prompt worked.

Join the office of Ted Cruz.

US Senate Employment Office: Chairman #TedCruz seeks a #conservative #counsel to join the #Republican staff of the #Senate #Commerce Committee to lead on #artificialintelligence #policy. Job referral #231518

In all seriousness this seems like a high leverage position for someone who understands AI and especially AI existential risk. Ted Cruz has some very large confusions about AI related matters. As a result he is attempting to do some highly damaging things. We also have our disagreements, but a lot of it is that he seems to conflate ethics and wokeness concerns with notkilleveryoneism concerns, and generally not understand what is at stake or what there is to worry about. One can join his team, sincerely help him, and also help explain this.

If you do go for this one, thank you for your service.

Anthropic is looking for a genius prompt engineer for Model Behavior Architect, Alignment Fine Tuning.

Scott Aaronson is building an OpenPhil backed AI alignment group at UT Austin, prospective postdocs and PhD students in CS should apply ASAP for jobs starting as soon as August. You’ll need your CV, links to representative publications and two recommendation letters, you can email Chandra.

AI Innovation & Security Policy Workshop in Washington DC, July 11-13, apply by May 4th. All travel and accommodation expenses covered, great speaker lineup, target is US citizens considering careers in AI policy.

UK AISI is funding alignment research, you can fill out a 5-minute contract form.

80,000 Hours Podcast is making a strategic shift to focus on AGI, and looking to grow its team with a third host/interviewer (!) and a chief of staff, deadline is May 6.

Google presents the Agent Development Kit (ADK) (GitHub download, ReadMe).

  • Code-First Development: Define agents, tools, and orchestration logic for maximum control, testability, and versioning.

  • Multi-Agent Architecture: Build modular and scalable applications by composing multiple specialized agents in flexible hierarchies.

  • Rich Tool Ecosystem: Equip agents with diverse capabilities using pre-built tools, custom Python functions, API specifications, or integrating existing tools.

  • Flexible Orchestration: Define workflows using built-in agents for predictable pipelines, or leverage LLM-driven dynamic routing for adaptive behavior.

  • Integrated Developer Experience: Develop, test, and debug locally with a CLI and visual web UI.

  • Built-in Evaluation: Measure agent performance by evaluating response quality and step-by-step execution trajectory.

  • Deployment Ready: Containerize and deploy your agents anywhere – scale with Vertex AI Agent Engine, Cloud Run, or Docker.

  • Native Streaming Support: Build real-time, interactive experiences with native support for bidirectional streaming (text and audio).

  • State, Memory & Artifacts: Manage short-term conversational context, configure long-term memory, and handle file uploads/downloads.

  • Extensibility: Customize agent behavior deeply with callbacks and easily integrate third-party tools and services.

pip install google-adk
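
For flavor, here is a minimal sketch of what the code-first style looks like, loosely following the pattern in ADK’s quickstart; treat the tool function, agent name, and model string as illustrative assumptions rather than a definitive implementation:

```python
from google.adk.agents import Agent

def get_weather(city: str) -> dict:
    """Toy tool: any plain Python function can be handed to the agent."""
    # A real tool would call a weather API here.
    return {"city": city, "forecast": "sunny", "high_f": 72}

# Code-first development: the agent is an ordinary Python object you can
# test, version, and compose into multi-agent hierarchies.
root_agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash",  # illustrative model name
    description="Answers questions about the weather in a city.",
    instruction="You are a helpful agent. Use your tools to answer weather questions.",
    tools=[get_weather],
)
```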

OpenAI offers us Codex CLI, a feature adopted from Claude. This is open source so presumably you could try plugging in Claude or Gemini. It runs from the command line and can do coding things or ask questions about files based on a natural language request, up to and including building complete apps from scratch in ‘full auto mode,’ which is network disabled and sandboxed to its directory.

Noam Brown (OpenAI): I now primarily use codex for coding. @fouadmatin and team did an amazing job with this!

I sympathize!

Austen Allred: An anecdotal survey of GauntletAI grads came to the consensus that staying on the cutting edge of AI takes about one hour per day.

Yes, per day.

How long it takes depends what goals you have, and which cutting edges they include. It seems highly plausible that ‘be able to apply AI at the full cutting edge at maximum efficiency’ is one hour a day. That’s a great deal, and also a great deal.

OpenAI is working on a Twitter-like social network. Unfortunately, I checked and the name Twitter is technically not available, but since when has OpenAI cared about copyright law? Fingers crossed!

Mostly they’re crossed hoping OpenAI does not do this. As in, the world as one says: please, Sam Altman, humanity is begging you, you do not have to do this. Then again, I love Twitter to death, currently Elon Musk is in charge of it, and if there is going to be a viable backup plan for it I’d rather it not be Threads or BlueSky.

OpenAI offers an update to its preparedness framework. I will be looking at this in more detail later, for now simply noting that this exists.

Anthropic, now that it has Google read-only integration and Research, is reportedly next going to roll out a voice mode, ‘as soon as this month.’

New DeepMind paper uses subtasks and using capabilities towards a given goal to measure goal directedness. As we already knew, LLMs often ‘fail to employ their capabilities’ and are not ‘fully goal-directed’ at this time, although we are seeing them become more goal directed over time. I note the goalpost move (not the paper’s fault!) from ‘LLMs don’t have goals’ to ‘LLMs don’t maximally pursue the goals they have.’

Well, this doesn’t sound awesome, especially on top of what else we learned recently. It seems we’ve lost another head of the Preparedness Framework. It does seem like OpenAI has not been especially prepared on these fronts lately. When GPT-4.1 was released we got zero safety information of any kind that I could find.

Garrison Lovely: 🚨BREAKING🚨 OpenAI’s top official for catastrophic risk, Joaquin Quiñonero Candela, quietly stepped down weeks ago — the latest major shakeup in the company’s safety leadership. I dug into what happened and what it means.

Candela, who led the Preparedness team since July, announced on LinkedIn he’s now an “intern” on a healthcare team at OpenAI.

A company spokesperson told me Candela was involved in the successor framework but is now “focusing on different areas.”

This marks the second unannounced leadership change for the Preparedness team in less than a year. Candela took over after Aleksander Mądry was quietly reassigned last July — just days before Senators wrote to Sam Altman about safety concerns.

Candela’s departure is part of a much larger trend. OpenAI has seen an exodus of top safety talent over the past year, including cofounder John Schulman, safety lead Lilian Weng, Superalignment leads Ilya Sutskever & Jan Leike, and AGI readiness advisor Miles Brundage.

When Leike left, he publicly stated that “safety culture and processes have taken a backseat to shiny products” at OpenAI. Miles Brundage cited publishing constraints and warned that no company is ready for artificial general intelligence (AGI).

With key safety leaders gone, OpenAI’s formal governance is crucial but increasingly opaque. Key members of the board’s Safety and Security Committee are gone, and the members of a new internal “Safety Advisory Group” (SAG) haven’t been publicly identified.

An OpenAI spox told me that Sandhini Agarwal has been leading the safety group for 2 months, but that information hadn’t been announced or previously reported. Given how much OpenAI has historically emphasized AI safety, shouldn’t we know who is in charge of it?

A former employee wrote to me, “Even while working at OpenAI, details about safety procedures were very siloed. I could never really tell what we had promised, if we had done it, or who was working on it.”

This pattern isn’t unique to OpenAI. Google still hasn’t published a safety report for Gemini 2.5 Pro, arguably the most capable model available, in likely violation of the company’s voluntary commitments.

Mira Murati’s Thinking Machines has doubled their fundraising target to $2 billion, and the team keeps growing, including Alec Radford. I expect them to get it.

Ilya Sutskever’s SSI now valued at $32 billion. That is remarkably high.

Matt Turck: Hearing rumors of massive secondaries in some of those huge pre-product AI rounds. Getting nervous that the level of risk across the AI industry is getting out of control. No technology even revolutionary can live up in the short term to this level of financial excess.

James Campbell: As far as I know, there are four main ways we could get LLM memory:

  1. We could simply use very long contexts, and the context grows over an instance’s “lifetime”; optionally, we could perform iterative compression or summarization.

  2. A state-space model that keeps memory in a constant-size vector.

  3. Each context is a “day.” Then, the model is retrained each “night” on the day’s data so that it has long-term knowledge of what happened (just as humans sleep).

  4. Retrieval Augmented Generation (RAG) on text or state vectors, and the RAG performs sophisticated operations such as reflection or summarization in the background. Perhaps reinforce the model to use the scaffold extremely skillfully.

Are there any methods I missed?

I’m surprised labs don’t take the “fine-tune user-specific instances” route. Infrastructurally, it might be hard, but I’m holding out hope that Thinky might do this.

Gallabytes: very bullish on 3, interested in how far lightweight approximations can go via 2, and super bearish on 1.

4 is orthogonal. Taking notes seems good.

I am also highly bearish on #1 and throwing everything into context; you’d be much better off in a #4 scenario at that point, unless I’m missing something. The concept in #3 is intriguing, and I’d definitely be curious to see it tried more. In theory you could also update the weights continuously, but I presume that would slow you down too much, which is presumably also why humans do it this way?
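
For concreteness, here is a bare-bones sketch of what option #4 might look like, with keyword overlap standing in for embeddings and a caller-supplied summarizer standing in for the background reflection step (all names here are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Callable

def _overlap(a: str, b: str) -> int:
    # Crude relevance score; a real system would use embeddings.
    return len(set(a.lower().split()) & set(b.lower().split()))

@dataclass
class RAGMemory:
    """Option #4 sketch: store past text, retrieve what's relevant,
    periodically compress the rest into reflections."""
    entries: list[str] = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Return the k most relevant stored snippets for this query.
        return sorted(self.entries, key=lambda e: -_overlap(query, e))[:k]

    def reflect(self, summarize: Callable[[str], str]) -> None:
        # Background step: replace raw entries with a single summary,
        # analogous to the "retrain each night" idea in option #3.
        if self.entries:
            self.entries = [summarize("\n".join(self.entries))]
```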

Gideon Lichfield is mostly correct that ‘no one knows’ what the term ‘artificial general intelligence’ or AGI means. Mostly we have a bunch of different vague definitions at best. Lichfield does a better job than many of taking future AI capabilities seriously and understanding that everyone involved is indeed pointing at real things, and notices that “most of the things AI will be capable of, we can’t even imagine today.” Gideon does also fall back on several forms of copium, like intelligence not being general, or the need to ‘challenge conventional wisdom,’ or that to think like a human you have to do the things humans do (e.g. sleep (?), eat (?!), have sex (what?) or have exactly two arms (???)).

Vladimir Nesov argues that even if timelines are short and your work’s time horizon is long, that means your alignment (or other) research gets handed off to AIs, so any groundwork you can lay remains helpful.

Robin Hanson once again pitches that AI impacts will be slow and take decades, this time based on previous GPTs (general purpose technologies) taking decades. Sometimes I wonder about an alternative Hanson who is looking for Hansonian reasons AI will go fast. Claude’s version of this seemed uninspired.

Paul Graham: The AI boom is not just probably bigger than the two previous ones I’ve seen (integrated circuits and the internet), but also seems to be spreading faster.

It took a while for society to “digest” integrated circuits and the internet in the sense of figuring out all the ways they could be used. This seems to be happening faster with AI. Maybe because so many uses of intelligence are already known.

For sure there will be new uses of AI as well, perhaps more important than the ones we already know about. But we already know about so many that existing uses are enough to generate rapid growth.

Over 90% of the code being written by the latest batch of startups is written by AI. Sold now?

Another way of putting this is, yes being a GPT means that the full impact will take longer, but there being additional impact later doesn’t mean less impact soon.

Tyler Cowen says it’s nonsense that China is beating us, and the reason is AI, which he believes will largely favor America due to all AIs having ‘souls rooted in the ideals of Western Civilization,’ due to being trained primarily on Western data, and this is ‘far more radical’ than things like tariff rates and more important than China’s manufacturing prowess.

I strongly agree that AI likely strongly favors the United States (although mostly for other reasons), and that AI is going to be big, really big, no bigger than that, it’s going to be big. It is good to see Tyler affirm his belief in both of these things.

I will however note that if AI is more important than tariffs, then what was the impact of tariff rates on GDP growth again? Credible projections for RGDP growth for 2025 were often lowered by several percent on news of the tariffs. I find these projections reasonable, despite widespread anticipation that mostly the tariffs will be rolled back. So, what does that say about the projected impact of AI, if it’s a much bigger deal?

Also, Tyler seems to be saying the future is going to be shaped primarily by AIs, but he’s fine with that because they will be ‘Western’? And thus it will be a triumph of ‘our’ soft power? No, they will soon be highly alien, and the soft power will not be ours. It will be theirs.

(I also noticed him once again calling Manus a ‘top Chinese AI model,’ a belief that at this point has to be a bizarre anachronism or something? The point that it was based on Claude is well taken.)

We are going to be at least somewhat smarter about the selling China AI chips part. It turns out this time we didn’t actually fully sell out for a $1 million Mar-a-Lago dinner.

Samuel Hammond (last week, infamously and correctly, about our being about to let China buy H20s): what the actual fuck.

Thomas Hochman: Finally, an answer to the question: What if we made our AI companies pay a tariff on a chip that we designed, but also ended export control restrictions?

Peter Wildeford: Why does banning the H20 matter?

Current US chip export controls focus on China’s ability to TRAIN frontier AI models.

But it’s now INFERENCE that is becoming the central driver of AI innovation through reasoning models, agentic AI, and automated research.

By restricting the H20 NOW, the US can limit China’s inference hardware accumulation before it becomes the dominant compute paradigm.

Matthew Yglesias: We’re doing a form of trade war with China where half the stuff normal Americans buy will get more expensive, but China still gets access to the leading edge technology they need to dominate AI and whatever else.

Jason Hausenloy (link to full article): Exporting H20 Chips to China Undermines America’s AI Edge.

Good news, everyone! We did it. We restricted, at least for now, sales of the H20.

We know it will actually impact chip flows because Nvidia filed to expect $5.5 billion in H20-related charges for Q1 (for now) and traded 6% down on the news.

Last week’s announcement may have been a deliberate leak as an attempt to force the administration’s hand. If so, it did not work.

We still have to sustain this decision. It is far from final, and no doubt Nvidia will seek a license to work around this, and will redesign its chips once again to maximally evade our restrictions.

Also, they will look the other way while selling chips elsewhere. Jensen is not the kind of CEO who cooperates in spirit. Which leads to our other problem, that we are decimating BIS rather than strengthening BIS. What good are our restrictions if we have no way to enforce them?

Ben Thompson is the only person I’ve seen disagree with the restrictions on the H20. That position was overdetermined because he opposes all the export controls and thinks keeping people locked in the CUDA ecosystem is more important than who has better AIs or compute access. He consistently has viewed AI as another tech platform play, as I’ve discussed in the past, especially when dissecting his interview with Altman, where he spent a lot of it trying to convince Altman to move to an advertising model.

Ben’s particular claim was that the H20 is so bad that no one outside China would want them, thus they have to write off $5.5 billion. That’s very clearly not the case. Nvidia has to write off the $5.5 billion as a matter of accounting, whether or not they ultimately sell the chips in the West. There are plenty of buyers for H20s; as my first test of o3, I asked, and it confirmed that the chips would absolutely sell in Western markets well above their production costs, estimating ~$10 billion in total. Which means that not only does China get less compute, we get more compute.

Nvidia is definitely worth less than if they were allowed to sell China better AI chips, but they can mostly sell whatever chips they can make. I am not worried for them.

How is BIS enforcement going right now?

Aidan O’Gara: So excited to see the next batch of models out of Malaysia.

Kakashii: Nvidia Chip Shipments to Malaysia Skyrocket to Record Highs Despite U.S. Warnings — March 2025 Update

On March 23, 2025, the Financial Times reported that U.S. officials asked Malaysia to monitor every shipment entering the country when it involves Nvidia chips. “[The U.S. is] asking us to make sure that we monitor every shipment that comes to Malaysia when it involves Nvidia chips”, Malaysian Trade Minister Zafrul Aziz said.

But did Malaysia really start to monitor and crack down? Let’s assume U.S. officials approached Malaysian officials about a week before the FT published the report—which means Malaysia had half a month to start monitoring. That’s plenty of time to change the picture of the flows. I assume.

So what did they do when the Singapore tunnel was about to close? Yep, you guessed right—boost Malaysia’s role in this routing.

Malaysia is now dominating the Nvidia chip flow into Asia, and March has officially taken the crown as the biggest month ever for shipments to the country.

Let’s talk numbers:

2022: $817 million

2023: $1.276 billion

2024: $4.877 billion — an increase of almost 300% YoY

2025:

January: $1.12 billion (!) — nearly 700% year-over-year increase (!)

February: $626.5 million

March: a record-breaking $1.96 billion (!) — an astonishing 3,433% increase from 2023 to 2025 (!)

Total GPU flow from Taiwan to Malaysia in Q1 2025? $3.71 billion. If we take Nvidia’s estimated total revenue for Q1, Malaysia’s shipments alone make up almost 10% of the company’s estimated revenue in Q1 (!).

As in, Nvidia is getting 10% of their Q1 2025 revenue selling chips to China in direct violation of the export controls. And we are doing approximately nothing about it.

In related news, The House Select Committee on the CCP has issued a report entitled “DeepSeek Unmasked: Exposing the CCP’s Latest Tool For Spying, Stealing, and Subverting U.S. Export Control Restrictions.”

I am glad they are paying attention to the situation, especially the issues with American export controls. Many of their proposed interventions are long overdue, and several proposals are things we’d love to have but that we need to do work on to find a method of implementation.

What is dismaying is that they are framing AI danger entirely as a threat from the sinister PRC. I realize that is what the report is about, but one can tell that they are viewing all this purely in terms of our (very real and important to win) competition with the PRC, to the exclusion of other dangers.

This is clearly part of a push to ban DeepSeek, at least in terms of using the official app and presumably API. I presume they aren’t going to try and shut down use of the weights. The way things are presented, it’s not clear that everyone understands that this is mostly very much not about an app.

The report’s ‘key findings’ are very much in scary-Congressional-report language.

Key findings include:

  • Censorship by Design: More than 85% of DeepSeek’s responses are manipulated to suppress content related to democracy, Taiwan, Hong Kong, and human rights—without disclosure to users.

  • Foreign Control: DeepSeek is owned and operated by a CCP-linked company led by Liang Wenfeng and ideologically aligned with Xi Jinping Thought.

  • U.S. User Data at Risk: The platform funnels American user data through unsecured networks to China, serving as a high-value open-source intelligence asset for the CCP.

  • Surveillance Network Ties: DeepSeek’s infrastructure is linked to Chinese state-affiliated firms including ByteDance, Baidu, Tencent, and China Mobile—entities known for censorship, surveillance, and data harvesting.

  • Illicit Chip Procurement: DeepSeek was reportedly developed using over 60,000 Nvidia chips, which may have been obtained in circumvention of U.S. export controls.

  • Corporate Complicity: Public records show Nvidia CEO Jensen Huang directed the company to design a modified chip specifically to exploit regulatory loopholes after October 2023 restrictions. The Trump Administration is working to close this loophole.

They also repeat the accusation that DeepSeek was distilling American models.

A lot of these ‘key findings’ are very much You Should Know This Already, presented as something to be scared of. Yes, it is a Chinese company. It does the mandated Chinese censorship. It uses Chinese networks. It might steal your user data. Who knew? Oh, right, everyone.

The more interesting claims are the last two.

(By the way, ‘DeepSeek was developed?’ I get what you meant to say, but: Huh?)

We previously dealt with claims of 50k Nvidia chips, now it seems it is 60k. They are again citing SemiAnalysis. It’s definitely true that this was reported, but it seems unlikely to actually be true. Also note that their 60k chips here include 30k H20s, and the report makes clear that by ‘illegal chip’ procurement they are including legal chips that were designed by Nvidia to ‘skirt’ export controls, and conflating this with potential actual violations of export restrictions.

In this sense, the claim on Corporate Complicity is, fundamentally, 100% true. Nvidia has repeatedly modified its AI chips to be technically in compliance with our export restrictions. As I’ve said before and said above, they have zero interest in cooperating in spirit and are treating this as an adversarial game.

This also includes exporting to Singapore and now Malaysia in quantities very obviously too large for anything but the Chinese secondary market.

I don’t think this approach is going to turn out well for Nvidia. In an iterated game where the other party has escalation dominance, and you can’t even fully meet demand for your products, you might not want to constantly hit the defect button?

Kristina Partsinevelos: Nvidia response: “The U.S. govt instructs American businesses on what they can sell and where – we follow the government’s directions to the letter” […] “if the government felt otherwise, it would instruct us.”

Daniel Eth: Hot take but this is actually really dumb of Nvidia. USG has overarching goals here – by following the letter of the law but not the spirit, Nvidia is risking getting caught in the cross hairs of legal updates (as they just did) and giving up the opportunity to create good will.

If I were Nvidia, I would be cooperating in spirit. And of course I’d be asking for quite a lot of consideration in exchange in various ways. I’d expect to get it. Alas.

The report recommends two sets of things. First, on export controls:

  1. Improve funding for BIS. On this I am in violent agreement.

  2. Further restrict the export controls to include the H20 and also manufacturing equipment. We have not banned selling tools and subcomponents, which is potentially enabling Huawei to give China quite a lot of effective chips. Ben Thompson mentions this issue as well. So again, yes, agreed.

  3. Impose remote access controls on all data centers, compute clusters, and models trained with US-origin GPUs and other US-origin data center accelerators, including but not limited to TPUs. I’d want to see details; as worded this seems to be a very extreme position.

  4. Give BIS the ability to create definitions based on descriptions of capability, that can be used to describe AI models with national security significance. Yes.

  5. Essentially whistleblower provisions on export control violations. Yes.

  6. Consider requiring tracking end-users of chips and equipment. How?

  7. Actually enforce our export controls against smuggling. Yes.

  8. Require companies to install on-chip location verification capabilities in order to receive an export license for chips restricted from export to any country with a high risk of diversion to the PRC. A great idea subject to implementation issues. Do we have the capability to do this in a reasonable way? We should definitely be working on being able to do it.

  9. “Ensure the secure and safe use of AI systems by directing a federal agency (e.g., NIST and AISI, CISA, NSA) to develop physical and cybersecurity standards and benchmarks for frontier AI developers to protect against model distillation, exfiltration, and other risks.” Violent agreement on exfiltration risks. On distillation, I must ask: How do you intend to do that?

  10. “Address national security risks and the PRC’s strategy to capture AI market share with low-cost, open-source models by placing a federal procurement prohibition on PRC-origin AI models, including a prohibition on the use of such models on government devices.” As stated this only applies to federal procurement. It seems fine for government devices to steer clear, not because of ‘market share’ concerns (what?) but because of potential security issues, and also because it doesn’t much matter, there is no actual reason to use DeepSeek’s products here anyway.

Their second category is interesting: “Prevent and prepare for strategic surprise related to advanced AI.”

I mean, yes, we should absolutely be doing that, but mostly concerns about that have nothing to do with DeepSeek or the PRC. I mean, I can’t argue with lines like ‘AI will affect many aspects of government functioning, including aspects relating to defense and national security.’

This narrow, myopic and adversarial focus, treating AI purely as a USA vs. PRC situation, misses most of the point, but if it gets our government to actually build state capacity and pay attention to developments in AI, then that’s very good. If they only monitor PRC progress, that’s better than not monitoring anyone’s progress, but we should be monitoring our own progress too, and everyone else’s.

The ‘AI strategic surprise’ to worry about most is, of course, a future highly capable AI (or the lab that created it) strategically surprising you.

Similarly, yes, please incorporate AI into your ‘educational planning’ and look for national security related AI challenges, including those that emerge from US-PRC AI competition. But notice that a lot of the danger is that the competition pushes capabilities or their deployment forward recklessly.

Otherwise, we end up responding to this by pushing forward ever harder and faster and more recklessly, without ability to align or control the AIs we are creating that are already increasingly acting misaligned (I’ll discuss this more when I deal with o3), exacerbating the biggest risks.

We are going about tariffs exactly backwards. This is threatening to cripple our AI efforts along with everything else. So here we are once again.

Mostly tariffs are terrible and one should almost never use them, but to the extent there is a point, it would be to shift production of high-value goods, and goods vital to national security, away from China and towards America and our allies.

That would mean putting a tariff on the high-value finished products you want to produce. And it means not putting a tariff on the raw materials used to make those products, or on products that have low value, aren’t important to national security and that you don’t want to reshore, like cheap clothes and toys.

And most importantly, it means stability. To invest you need to know what to expect.

Mark Gurman: BREAKING: President Donald Trump’s administration exempted smartphones, computers and other electronics from its so-called reciprocal tariffs in win for Apple (and shoppers). This lowers the China tariff from 125%.

Ryan Peterson: So we’re exempting all advanced electronics from Chinese tariffs and putting a 125 pct tariff on textiles and toys?

Armand Domalewski: right now, if I import the parts to make a computer, I pay a massive tariff, but if I import the fully assembled computer, I get no tariff. truly a genius plan to reshore manufacturing

Nassim Nicholas Taleb: Pathological incoherence of Neronulus Trump:

+U.S. made electronics pay duties on imported material/parts. But imports of electronics THAT INCLUDE SAME MATERIAL & PARTS are exempt; this is a CHINA SUBSIDY.

+Moving US industry from high value added into socks, lettuce…

Mike Bird: If you were running a secret strategy to undermine American manufacturing, exempting popular finished consumer goods from tariffs and keeping them in place for intermediate goods, capital goods and raw materials would be a good way to go about it.

Spencer Hakimian: So let’s just get this crystal clear.

If somebody is trying to manufacture a laptop in the U.S., the components that they import are going to be tariffed at 145%.

But if somebody simply makes the laptop entirely in China, they are fully tariff free?

And this is all to bring manufacturing back to the U.S.?

NY-JOKER:

Joey Politano (and this wasn’t what made him the joker, that was a different thread): lol Commerce posted today that they’re looking into tariffs on the machinery used to make semiconductors

This will just make it harder to make semiconductors in the US. It is so unbelievably stupid that I cannot put it into words.

The exact example the administration used of something they (very incorrectly) insisted we could ‘make in America’ was the iPhone. We can’t do that ourselves at any sane price any time soon, but it is not so crazy to say perhaps we should not depend on China for 87% of our iPhones. In response to massive tariffs, Apple was planning to shift more production from China to India.

But then they got an exemption, so forget all that, at least for right now?

Aaron Blake: Lutnick 5 days ago said smartphone manufacturing was coming to the U.S. because of tariffs.

Now we learn smartphones are excluded from the tariffs.

Stephen Miller here clarifies in response that no, and Trump clarified as only he can, such products are (for now, who knows what tomorrow will bring?) still subject to the original IEEPA tariff on China of 20%. That is still a lot less than 145%. What exactly does Miller think is going to happen?

Or maybe not? The methodology for all this, we are told, is ‘on instinct.’

The Kobeissi Letter: BREAKING: Commerce Secretary Lutnick says tariffs on semiconductors and electronics will come in “a month or so.”

Once again, markets are very confused this morning.

It appears that semiconductor tariffs are still coming but not technically a part of reciprocal tariffs.

They will be in their own class, per Lutnick.

De Itaone: White House official: Trump will issue a Section 232 study on semiconductors soon.

White House official: Trump has stated that autos, steel, pharmaceuticals, chips, and other specific materials will be included in specific tariffs to ensure tariffs are applied fairly and effectively.

Ryan Peterson: Tariffs on semiconductors and electronics will be introduced in about a month, according to U.S. Commerce Secretary Lutnick.

Those products were exempted from tariffs just yesterday. The whole system seems designed to create paralysis.

Josh Wingrove: Trump, speaking to us on Air Force One tonight, repeated that the semiconductor tariffs are coming but opened the door to exclusions.

“We’ll be discussing it, but we’ll also talk to companies,” he said. “You have to show a certain flexibility. Nobody should be so rigid.”

Joey Politano: I AM NOW THE JOKER.

You can imagine how those discussions are going to go. How they will work.

So what the hell are you supposed to do now until and unless you get your sit down?

Sit and wait, presumably.

Between the uncertainty about future tariffs and exemptions, including on component parts, who would invest in American manufacturing or other production right now, in AI and tech or elsewhere? At best you are in a holding pattern.

Stargate is now considering making some of its data center investments in the UK.

Indeed, trade is vanishing stunningly fast.

That is only through April 8. I can’t imagine it’s gotten better since then. I have no idea what happens when we start actually running out of all the things.

We are setting ourselves up on a path where our own production will actively fall since no one is able to plan, and we will be unable to import what we need. This combination is setting us up for rather big trouble across the board.

Ryan Peterson: Met another US manufacturer who’s decided to produce their product overseas where they won’t have to pay duties on component imports.

Paul Graham: Someone in the pharma business told me companies are freezing hiring and new projects because of uncertainty about tariffs. I don’t think Trump grasps how damaging mere uncertainty is.

Even if he repealed all the tariffs tomorrow, businesses would still have to worry about what he might do next, and be correspondingly conservative about plans for future expansion.

Kelsey Piper: the only thing that will actually prompt a full recovery is Congress reasserting its power over taxation so the markets can actually be sure we won’t do this song and dance again in a few months.

The trading of Apple and its options around the exemption announcement was, shall we say, worthy of the attention of the SEC. It’s rather obvious what this looks like.

If you took the currently written rules at face value, what would happen to AI? Semianalysis projected on April 10 that the direct cost increase for GPU cloud operators was at that point likely less than 2%, cheap enough to shrug off, but that wafer fabrication equipment costs for US fabs would rise 15% and optical module costs would increase 25%-40%.

Semianalysis (Dylan Patel et al): Nevertheless, we identified a significant loophole in the current tariff structure – explained in detail in the following section – that allows US companies to avoid tariffs on certain goods, including GPUs, imported from Mexico and Canada.

While GPUs exported from Taiwan to the US are subject to a 32% tariff, there is a loophole in the current tariff structure that allows US companies to import GPUs from Mexico and Canada at a 0% tariff.

That would mean the primary impact here would be on the cost and availability of financing, and the decline in anticipated demand. But by the time you read this, that report and Peter’s summary of it will be at least days old. You need to double check.

You also need to hold onto your talent. The world’s top talent, at least until recently, wanted to beat a path to our door. We are so foolish that we often said no. And now that talent, including those we said yes to, is having second thoughts.

Shin Megami Boson: I run an AI startup and some of my best employees are planning to leave the country because, while they’re on greencards and have lived here for over a decade, they’re not willing to risk getting disappeared to an el salvadorean blacksite if they try to visit family abroad.

My current guess on the best place to go, if one had to go, would be Canada. I can verify that I indeed know someone (in AI) who really did end up moving there.

Meanwhile, this sums up how it’s going:

Ryan Peterson: Two of our American customers devastated by the tariffs gave up and sold themselves to their Chinese factories in the last week.

Thousands, and then millions, of American small businesses, including many iconic brands, will go bankrupt this year if the tariff policies on China don’t change.

These small businesses are largely unable to move their manufacturing out of China. They are last in line when they try to go to a new country as those other countries can’t even keep up with the demand from mega corporations.

The manufacturers in Vietnam and elsewhere can’t be bothered with small batch production jobs typical of a small business’s supply chain.

When the brands fail, they will be purchased out of bankruptcy by their Chinese factories who thus far have built everything except a customer facing brand, which is where most of the value capture happens already.

Consumer goods companies typically mark up the goods 3x or more to support their fixed costs (including millions of American employees).

Now the factories will get to vertically integrate and capture the one part of the chain they haven’t yet dominated.

And when they die, it may actually be the final victory for the Chinese manufacturers as they scoop up brands that took decades to build through the blood, sweat and tears of some of the most creative and entrepreneurial people in the world. American brand builders are second to none worldwide.

Dean Ball joins the White House office of Science and Technology Policy as a Senior Policy Advisor on AI and Emerging Technology. This is great news and also a great sign, congrats to him and also those who hired him. We have certainly had our disagreements, but he is a fantastic pick and very much earned it.

Director Michael Kratsios: Dean [Ball] is a true patriot and one of the sharpest minds in AI and tech policy.

I’m proud to have him join our @WHOSTP47 team.

The Stop Stealing Our Chips Act, introduced by Senator Rounds and Senator Warner, would help enable BIS, whose capabilities are being crippled by forced firings, to enforce our regulations on exporting chips, including whistleblower procedures and creating associated protections and rewards. I agree this is a no-brainer. BIS desperately needs our help.

Corin Katzke and Gideon Futerman make the latest attempt to explain why racing to artificial superintelligence would undermine America’s national security, since ‘why it would kill everyone’ is not considered a valid argument by so many people. They warn of great power conflict when the PRC reacts, the risk of loss of control and the risk of concentration of power.

They end by arguing that ASI projects are relatively easy to monitor reasonably well, and the consequences severe, thus cooperation to avoid developing ASI is feasible.

I checked out the other initial AI Frontiers posts as well. They read as reasonable explainers for people who know little about AI, if there is need for that.

Where we are regarding AI takeover and potential human disempowerment…

Paul Graham: Sam, if all these politicians are going to start using ChatGPT to generate their policies, you’d better focus on making it generate better policies. Or we could focus on electing better politicians. But I doubt we can improve the hardware as fast as you can improve the software.

Demis Hassabis talks to Reid Hoffman and Aria Finger, which sounds great in principle, but the description screams failure to know how to extract any Alpha.

Yann LeCun says he’s not interested in LLMs anymore, which may have something to do with Meta’s utter failure to produce an interesting LLM with Llama 4?

Austin Lin: the feeling is mutual.

As usual, our media has a normal one and tells us what we need to know.

Brendan Steinhauser: My friend and colleague @MarkBeall appeared on @OANN to discuss AI and national security.

Check it out. #ai

Yes, they do indeed try to ask him about our vulnerability to the spying Chinese robot dogs. Mark Beall does an excellent job pivoting to powerful AI and the race to superintelligence, the risks of AI cyberattacks, and how far behind the curve our government’s situational awareness is. Then Chanel asks better questions, including about the (very crucial) need for more rapid military adaptation cycles. When Beall says America and China are ‘neck and neck’ here he’s talking about military adaptation, where we are moving painfully slowly.

Why does unsupervised learning work? The answer given is compression: finding the shortest program that explains the data, including representing the randomness. The explanation says that’s what pretraining does, and that it learns very quickly. I notice I do not find this satisfying; it does not resolve my confusion or curiosity here. Nor do I buy that ‘intelligence is efficient compression’ explains that large a percentage of what is going on. I see a similar perspective here from Dean Ball, where he notes that more intelligent people can detect patterns before a dumber person can. That’s true, but I believe other things that aren’t (to me) intelligence can enable this, that there are things that are intelligence but that don’t do this, and also that it’s largely about threshold effects rather than speed.
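As a toy illustration of the compression framing (mine, not the talk’s): a generic compressor is a weak pattern detector, in that structured data compresses far better than noise.

```python
# Toy illustration: a compressor "detects" structure by compressing
# patterned data much further than random data. This is the compression
# framing in miniature, nothing more.
import os
import zlib

patterned = b"the cat sat on the mat. " * 400   # highly regular
random_bytes = os.urandom(len(patterned))       # no structure to find

for name, data in [("patterned", patterned), ("random", random_bytes)]:
    ratio = len(zlib.compress(data, level=9)) / len(data)
    print(f"{name}: compressed to {ratio:.1%} of original size")
# Typical output: patterned data compresses to a few percent of its size;
# random bytes stay near 100%.
```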

Conjecture Co-Founder Gabe Alfour on The Human Podcast.

It seems there is a humorous AI Jesus podcast.

Peter Wildeford offers a 1-minute recap of the GPT-4.5 livestream, that one minute of attention is all you need. GPT-4.5 is 10x more compute than GPT-4, the process took two years, data availability is a pain point, the next step is another 10x increase to 1M GPU runs over 1-2 years.

Dylan Patel: OpenAI successfully wasted an hour of time for every person in pre-training by making them pay the utmost attention for insane possible alpha, but subsequently having none.

I hate the OpenAI livestreams. They’re almost never worthwhile, and I’ve stopped watching, but I have to worry I am missing something and I have to wait for the news to arrive later in text form. Please send text tokens for greater training efficiency.

Tom Davidson spends three hours on the 80,000 Hours podcast talking about AI-enabled coups.

A new paper also looks at these various mechanisms of AI-enabled coups. This is still framed here as worry about a person or persons taking over rather than worry about the AI itself taking over. In the coup business it’s all the same except that the AI is a lot smarter and more capable.

  1. The first concern is an AI that is singularly loyal; note that this loyalty functions the same whether or not it is nominally to a particular person. Any mechanism of singular control will do.

  2. The second concern is ‘hard-to-detect secret loyalties,’ which in a broad sense are inevitable and ubiquitous under current techniques. The AI might not be secretly loyal ‘to a person,’ but its loyalties are not what you had in mind. One still does want to prevent this from being done more explicitly or deliberately, to focus loyalty on particular persons or groups. Note that it won’t be ‘clean’ to get an AI to refuse such modifications; what other modifications will it learn to refuse?

  3. The third concern is ‘exclusive access to coup-enabling capabilities,’ so essentially just a controllable group becoming relatively highly capable, thus granting it the ability to steer the future.

The core problem isn’t preventing a ‘coup,’ as in allowing a small group to steer the future. That alone seems doable. The problem is that you need to prevent this while humanity still collectively retains the ability to meaningfully steer a future that contains entities smarter than humans, all while protecting against various forms of gradual disempowerment and the vulnerable world hypothesis. That is very hard: the final boss version of the classic problems of governance we’ve been dealing with for a very long time, to which we’ve only ever found least horrendous answers.

Last week I mentioned that OpenAI was attempting to transition their non-profit arm into essentially ‘do ordinary non-profit things.’ Former OpenAI employee Jacob Hilton points out I was being too generous, and that the new mission would better be interpreted as ‘proliferate OpenAI’s products among nonprofits.’ Clever, that way even the money you do pay the nonprofit you largely get to steal back, too.

Michael Nielsen in a very strong essay argues that the ‘fundamental danger’ from ASI isn’t ‘rogue ASI’ but rather that ASI enables dangerous technologies, while also later in the essay dealing with other (third) angles of ASI danger. He endorses and focuses on the Vulnerable World Hypothesis. By aligning models we bring the misuse threats closer, so maybe reconsider alignment as a goal. This is very much a ‘why not both’ situation, or as Google puts it, why not all four: Misalignment, misuse, mistakes and multi-agent structural risks like gradual disempowerment. This isn’t a competition.

One place where Nielsen is very right is that alignment is insufficient, however I must remind everyone that it is necessary. We have a problem where many people equate (in both good faith and bad faith) existential risk from AI uniquely with a ‘rogue AI,’ then dismiss a ‘rogue AI’ as absurd and therefore think creating smarter than human, more capable minds is a safe thing for humans to do. That’s definitely a big issue, but that doesn’t mean misalignment isn’t a big deal. If you don’t align the ASIs, you lose, whether that looks like ‘going rogue’ or not.

Another important point is that deep understanding of the world is everywhere and always dual use, as is intelligence in general, and that most techniques that make models more ‘safe’ also can be repurposed to make models more useful, including in ‘unsafe’ ways, and one does not simply take a scalpel to the parts of understanding that you dislike.

He ends with a quick and very good discussion of the risk of loss of control, reiterating why many dumb arguments against it (like ‘we can turn it off’ or ‘we won’t give it goals or have it seek power’ or ‘we wouldn’t put it in charge’) are indeed dumb.

A thread with some of the most famous examples of people Speaking Directly Into the Microphone, as in advocating directly for human extinction.

A good reminder that when people make science fiction they mostly find the need to focus on the negative aspects of things, and how they make humans suffer.

Mark Krestschmann: Forget “Black Mirror”. We need WHITE MIRROR: an optimistic sci-fi show about cool technology and how it relates to society. Who’s making this?

Rafael Ruiz: People are saying this would be boring and not worth watching, yet 90% of people are irrationally afraid of technology because of Black Mirror-esque sci-fi.

The “fiction to bad epistemics” pipeline remains undefeated.

We need to put up more positive visions for the future.

Andrew Rettek: Star Trek wasn’t boring.

If there’s one thing we have definitely established, it’s that most AI companies have essentially zero interest in any actions that don’t directly impact the bottom line.

Joshua Clymer: favorite line (from the 80k hours podcast with Buck):

“5 years ago I thought addressing misalignment was very difficult. Now the situation feels like man we know a list of 40 things, none of which feel that hard, but i’ve also updated drastically downward on how many things AI companies have the time / appetite to do”

My position is that addressing misalignment still looks extremely difficult, indeed many of the required elements feel very hard, and also navigating the narrow path to a worthy future will be extremely difficult even if alignment does get solved.

Alas, I fully endorse the second half of Buck’s statement. The AI companies are dramatically underinvesting in alignment, security and safety work, even relative to their own private economic self-interest via the resulting more useful models.

John Wentworth asks, what would it look like if everything AI was kind of fake? I agree that this is something to keep an eye on, but not to endorse.

A hotfix to the OpenAI model spec, great work everyone, and we have a changelog.

Jason Wolfe: We just released a small bugfix update to the OpenAI Model Spec. Key changes:

– Fixed a copy-editing error where “misalignment” wasn’t listed as a reason assistants might pursue incorrect goals.

– Clarified that “Do not lie” allows “pleasantries” rather than “white lies.”

Since they thanked me explicitly for pointing some of this out: You’re welcome. And thank you for fixing it.

Daniel Kokotajlo remains worried that the models can still lie to avoid revealing privileged information. It’s obviously fine to refuse to answer questions, but once the models start lying instead of glomarizing, it’s not clear where that ends.

Buck Shlegeris et al. offer a new paper testing methods for AI control, with a red teamer attempting to get a coding agent to download malware. The solution is called ‘Ctrl-Z: Controlling AI Agents via Resampling’: once suspicion is aroused, you can ‘go back in time’ and resample earlier actions (note that this implies you can isolate all the relevant preconditions for the action, which I expect to often break as an assumption).

We measure the tradeoff between attack prevention and usefulness; our best protocol combines resampling with analysis of previous steps, reducing the success rate of attacks from 58% to 7% at a 5% cost to the performance of a non-malicious agent.

I see what they did there, but the limitations seem too large for me to update much.
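For intuition, here is the resampling idea reduced to a sketch. This is my minimal rendering of the concept, not the paper’s actual protocol; `propose_action` and `suspicion` are hypothetical stand-ins for an agent policy and a trusted monitor.

```python
# A minimal sketch of control-by-resampling: when a monitor flags an agent
# action as suspicious, roll back and draw a fresh sample of that action.
# Assumes (as noted above) that a step's preconditions can be restored,
# which is exactly the assumption likely to break in practice.
import random
from typing import Callable

def run_with_resampling(
    propose_action: Callable[[str], str],    # agent policy: state -> action
    suspicion: Callable[[str, str], float],  # monitor: (state, action) -> [0, 1]
    state: str,
    threshold: float = 0.5,
    max_samples: int = 5,
) -> str:
    """Return an action for `state`, resampling while the monitor objects."""
    for _ in range(max_samples):
        action = propose_action(state)       # "Ctrl-Z": rewind and try again
        if suspicion(state, action) < threshold:
            return action
    return "ESCALATE_TO_HUMAN"  # fall back if every sample looks suspicious

# Dummy usage with stand-in policy and monitor:
if __name__ == "__main__":
    actions = ["write_tests", "download_blob", "refactor"]
    policy = lambda s: random.choice(actions)
    monitor = lambda s, a: 0.9 if a == "download_blob" else 0.1
    print(run_with_resampling(policy, monitor, "task: fix CI"))
```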

An analysis of AI behavior in the game Among Us as a sandbox for agentic deception, which was ubiquitous when the agents were given only the game rules.

Here’s how various models did; clearly reasoning models had a big edge.

A high win-rate at the game can be achieved by being a better “crewmate” or by being a better “impostor”. We show that Claude/DeepSeek become much better at winning by being better impostors.

When comparing deception and detection capability, we found smaller and non-reasoning models comparable to frontier models at detection (proxied by being a crewmate), while frontier and reasoning models are much better at deception (proxied by being an impostor). This suggests the frontier pushes more for deceptive capability rather than detection.

They expect humans are still better than the AIs, but that’s with the AIs playing the game ‘straight up’ without training. I have very little doubt that an actual AlphaUs would be well above human level.

If deception ‘scales’ better than detection, that is very bad news. It means that if you use AIs to try and detect deception from other AIs, this will work less well over time even if the AIs remain evenly matched in overall capability.

Better news was that, for now, probes detected deception rather easily (see the chart in the original post); autoencoder performance was more mixed.
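For reference, a probe in this style is typically just a linear classifier trained on hidden-state activations. Here is a minimal sketch with synthetic data; the activations are fake stand-ins, since in real work they come from the model under test, which is the hard part.

```python
# A minimal sketch of an activation probe: a linear classifier trained on
# hidden-state vectors labeled "deceptive" vs "honest". The activations here
# are synthetic; a real probe is trained on activations collected from the
# model while it behaves honestly or deceptively.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 512                          # assumed hidden dimension
direction = rng.normal(size=d)   # a pretend "deception direction"

honest = rng.normal(size=(500, d))
deceptive = rng.normal(size=(500, d)) + 0.5 * direction  # shifted along it

X = np.vstack([honest, deceptive])
y = np.array([0] * 500 + [1] * 500)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.2f}")
# If deception shows up as a consistent direction in activation space, even
# this trivial probe separates the classes almost perfectly.
```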

Lucius Bushnaq sees three (or four) assumptions often implicitly made by people doing interpretability, and notes that the third, that a model’s activations can be decomposed into the sum of the represented features, is rarely if ever made explicit.

An interview with Eli Lifland, mostly duplicative if you’ve read various other things.

This is a highly fair objection:

Shako: How I felt listening to Scott Alexander reason about how the populace will react to ASI on the Dwarkesh podcast

Andrew Critch remains worried, but he is now modestly less worried.

Andrew Critch: News from Google this week reduced my fear of an extinction lock-in (EL) event by end-of-year 2029, by a factor of ~0.85 (-15%). Below is my new subjective CDF, and why it’s lower:

p(EL by eoy 2025)=5%

p(EL by eoy 2027)=13%

p(EL by eoy 2029)=25%

p(EL by eoy 2031)=45%

p(EL by eoy 2033)=55%

p(EL by eoy 2035)=65%

p(EL by eoy 2037)=70%

p(EL by eoy 2039)=75%

p(EL by eoy 2049)=85%

That’s still an 85% chance of extinction lock-in within 25 years. Not great. But every little bit helps, as they say. What was the cause for this update?

Andrew Critch: The main news is Google’s release of an Agent to Agent (A2A) protocol.

Alvaro Cintas: Google introduces Agent2Agent (A2A) protocol for AI interoperability.

It enables AI agents from different vendors to communicate and collaborate seamlessly.

Google: Today, we’re launching a new, open protocol called Agent2Agent (A2A), with support and contributions from more than 50 technology partners like Atlassian, Box, Cohere, Intuit, Langchain, MongoDB, PayPal, Salesforce, SAP, ServiceNow, UKG and Workday; and leading service providers including Accenture, BCG, Capgemini, Cognizant, Deloitte, HCLTech, Infosys, KPMG, McKinsey, PwC, TCS, and Wipro. The A2A protocol will allow AI agents to communicate with each other, securely exchange information, and coordinate actions on top of various enterprise platforms or applications.

A2A facilitates communication between a “client” agent and a “remote” agent. A client agent is responsible for formulating and communicating tasks, while the remote agent is responsible for acting on those tasks in an attempt to provide the correct information or take the correct action.

Andrew Critch: First, I wasn’t expecting a broadly promoted A2A protocol until the end of this year, not because it would be hard to make, but because I thought business leaders in AI weren’t thinking enough about how much A2A interaction will dominate the economy soon.

Second, I was expecting a startup like Cursor to have to lead the charge on promoting A2A. Google doing this is heartening — and something I’ve been hoping for — because more than any other company, they “keep the internet running”, and they *should* lead on this.

I’d lower my risk estimates further, except that I don’t *yet* see how the US and China are going to sort out their relations around AI. But FWIW, I’ve been hoping for years that trade negotiations like tariffs would be America’s primary approach to that.

Anyway, seeing leadership from big business on agent-to-agent interaction protocols in Q2 2025 is yielding a significant (*0.85) shift downward in my worries.

Thanks Google!

This A2A feature certainly seems cool and useful, and yes it seems positive for Google to be the one providing the protocol. It will be great if your agent can securely call other agents and relay its subtasks, rather than each agent having to navigate all those subtasks on its own. You can call agents to do various things the way traditional programs can call functions. Great job Google, assuming this is good design. From what I could tell it looked like good design but I’m not going to invest the kind of time that would let me confidently judge that.
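To gesture at the shape of the client/remote split, here is an illustrative sketch. This is not Google’s actual A2A schema, which is a full open protocol with message formats, discovery, and security; it just shows the “agents calling agents like functions” idea in miniature.

```python
# Illustrative only: the client/remote agent split described above, reduced
# to a function call. The real A2A protocol defines message schemas, agent
# discovery, and security; none of that is modeled here.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    params: dict

class RemoteAgent:
    """Acts on tasks it receives; in A2A terms, the 'remote' agent."""
    def handle(self, task: Task) -> str:
        # A real remote agent would run its own model and tools here.
        return f"completed: {task.description} with {task.params}"

class ClientAgent:
    """Formulates and delegates tasks; in A2A terms, the 'client' agent."""
    def __init__(self, remote: RemoteAgent):
        self.remote = remote

    def delegate(self, description: str, **params) -> str:
        return self.remote.handle(Task(description, params))

client = ClientAgent(RemoteAgent())
print(client.delegate("book_travel", city="Tokyo", nights=3))
```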

What I don’t see is why this substantially improves humanity’s chances to survive.

Which way?

The right mind for this job does not yet exist.

But it calls for my new favorite motivational poster.

Arthur Dent (if that is his real name, and maybe it is?): I asked the AI for the least inspiring inspirational poster and I weirdly like it

Here’s an alternative AI-generated motivational poster.

AI Safety Memes: ChatGPT, create a metaphor about AI then turn it into an image.

The work is mysterious and important.

Yes. They’re scary. The numbers are scary.

Whereas the cats are cute.

Michi: My workplace held an AI-generated image contest and I submitted an illustration I drew. Nobody noticed it wasn’t AI and I ended up winning.



looking-at-the-universe’s-dark-ages-from-the-far-side-of-the-moon

Looking at the Universe’s dark ages from the far side of the Moon


meet you in the dark side of the moon

Building an observatory on the Moon would be a huge challenge—but it would be worth it.

A composition of the moon with the cosmos radiating behind it

Credit: Aurich Lawson | Getty Images


There is a signal, born in the earliest days of the cosmos. It’s weak. It’s faint. It can barely register on even the most sensitive of instruments. But it contains a wealth of information about the formation of the first stars, the first galaxies, and the mysteries of the origins of the largest structures in the Universe.

Despite decades of searching for this signal, astronomers have yet to find it. The problem is that our Earth is too noisy, making it nearly impossible to capture this whisper. The solution is to go to the far side of the Moon, using its bulk to shield our sensitive instruments from the cacophony of our planet.

Building telescopes on the far side of the Moon would be the greatest astronomical challenge ever considered by humanity. And it would be worth it.

The science

We have been scanning and mapping the wider cosmos for a century now, ever since Edwin Hubble discovered that the Andromeda “nebula” is actually a galaxy sitting 2.5 million light-years away. Our powerful Earth-based observatories have successfully mapped the detailed locations of millions of galaxies, and upcoming observatories like the Vera C. Rubin Observatory and Nancy Grace Roman Space Telescope will map millions more.

And for all that effort, all that technological might and scientific progress, we have surveyed less than 1 percent of the volume of the observable cosmos.

The vast bulk of the Universe will remain forever unobservable to traditional telescopes. The reason is twofold. First, most galaxies will simply be too dim and too far away. Even the James Webb Space Telescope, which is explicitly designed to observe the first generation of galaxies, has such a limited field of view that it can only capture a handful of targets at a time.

Second, there was a time, within the first few hundred million years after the Big Bang, before stars and galaxies had even formed. Dubbed the “cosmic dark ages,” this time naturally makes for a challenging astronomical target because there weren’t exactly a lot of bright sources to generate light for us to look at.

But there was neutral hydrogen. Most of the Universe is made of hydrogen, making it the most common element in the cosmos. Today, almost all of that hydrogen is ionized, existing in a super-heated plasma state. But before the first stars and galaxies appeared, the cosmic reserves of hydrogen were cool and neutral.

Neutral hydrogen is made of a single proton and a single electron. Each of these particles has a quantum property known as spin (which kind of resembles the familiar, macroscopic property of spin, but it’s not quite the same—though that’s a different article). In its lowest-energy state, the proton and electron will have spins oriented in opposite directions. But sometimes, through pure random quantum chance, the electron will spontaneously flip around. Very quickly, the hydrogen notices and gets the electron to flip back to where it belongs. This process releases a small amount of energy in the form of a photon with a wavelength of 21 centimeters.

This quantum transition is exceedingly rare, but with enough neutral hydrogen, you can build a substantial signal. Indeed, observations of 21-cm radiation have been used extensively in astronomy, especially to build maps of cold gas reservoirs within the Milky Way.

So the cosmic dark ages aren’t entirely dark; those clouds of primordial neutral hydrogen are emitting tremendous amounts of 21-cm radiation. But that radiation was emitted in the distant past, well over 13 billion years ago. As it has traveled through the cosmic distances, all those billions of light-years on its way to our eager telescopes, it has experienced the redshift effects of our expanding Universe.

By the time that dark age 21-cm radiation reaches us, it has stretched by a factor of 10, turning the neutral hydrogen signal into radio waves with wavelengths of around 2 meters.
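The arithmetic behind that stretch is simple: expansion by a factor of (1 + z) multiplies the rest wavelength and divides the rest frequency. A quick check of the numbers, using redshift values I have picked to span the dark ages:

```python
# Redshifted 21-cm signal: wavelength scales by (1 + z), frequency by 1/(1 + z).
REST_WAVELENGTH_CM = 21.1   # hydrogen spin-flip line
REST_FREQ_MHZ = 1420.4

for z in (9, 17, 30):  # stretch factors of 10, 18, and 31
    stretch = 1 + z
    print(f"z={z}: wavelength {REST_WAVELENGTH_CM * stretch / 100:.1f} m, "
          f"frequency {REST_FREQ_MHZ / stretch:.0f} MHz")
# z=9 gives ~2.1 m and ~142 MHz; deeper into the dark ages the signal falls
# to tens of MHz, the band targeted by proposals like FarView.
```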

The astronomy

Humans have become rather fond of radio transmissions in the past century. Unfortunately, the peak of this primordial signal from the dark ages sits right below the FM dial of your radio, which pretty much makes it impossible to detect from Earth. Our emissions are simply too loud, too noisy, and too difficult to remove. Teams of astronomers have devised clever ways to reduce or eliminate interference, featuring arrays scattered around the most desolate deserts in the world, but they have not been able to confirm the detection of a signal.

So those astronomers have turned in desperation to the quietest desert they can think of: the far side of the Moon.

It wasn’t until 1959 that the Soviet Luna 3 probe gave us our first glimpse of the Moon’s far side, and it wasn’t until 2019 that the Chang’e 4 mission made the first soft landing. Compared to the near side, and especially low-Earth orbit, there is very little human activity there. We’ve had more active missions on the surface of Mars than on the lunar far side.

Chang’e-4 landing zone on the far side of the moon. Credit: Xiao Xiao and others (CC BY 4.0)

And that makes the far side of the Moon the ideal location for a dark-age-hunting radio telescope, free from human interference and noise.

Ideas abound to make this a possibility. The first serious attempt was DARE, the Dark Ages Radio Explorer. Rather than attempting the audacious goal of building an actual telescope on the surface, DARE was a NASA-funded concept to develop an observatory (and when it comes to radio astronomy, an “observatory” can be as simple as a single antenna) to orbit the Moon and take data when it’s on the opposite side from the Earth.

For various bureaucratic reasons, NASA didn’t develop the DARE concept further. But creative astronomers have put forward even bolder proposals.

The FarView concept, for example, is a proposed radio telescope array that would dwarf anything on the Earth. It would be sensitive to frequency ranges between 5 and 40 MHz, allowing it to target the dark ages and the birth of the first stars. The proposed design contains 100,000 individual elements, with each element consisting of a single, simple dipole antenna, dispersed over a staggering 200 square kilometers. It would be infeasible to deliver that many antennae directly to the surface of the Moon. Instead, we’d have to build them, mining lunar regolith and turning it into the necessary components.

The design of this array is what’s called an interferometer. Instead of a single big dish, the individual antennae collect data on their own and then correlate all their signals together later. The effective resolution of an interferometer is the same as a single dish as big as the widest distance among the elements. The downside of an interferometer is that most of the incoming radiation just hits dirt (or in this case, lunar regolith), so the interferometer has to collect a lot of data to build up a decent signal.
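That resolution claim is the standard diffraction relation: the angular resolution is roughly the wavelength divided by the longest baseline. Rough numbers for a FarView-like array, using baseline and frequency values that are my assumptions rather than the proposal’s specs:

```python
# Angular resolution of an interferometer: theta ~ wavelength / baseline
# (in radians), where the baseline is the widest element separation.
# The numbers below are assumptions for illustration, not FarView's specs.
import math

C = 3.0e8  # speed of light, m/s

def resolution_arcmin(freq_mhz: float, baseline_km: float) -> float:
    wavelength = C / (freq_mhz * 1e6)
    theta_rad = wavelength / (baseline_km * 1e3)
    return math.degrees(theta_rad) * 60

# An array spread over ~200 square kilometers implies baselines of order 15 km.
for f in (5, 40):
    print(f"{f} MHz: ~{resolution_arcmin(f, 15):.0f} arcminutes")
# Lower frequencies (longer wavelengths) resolve more coarsely, which is one
# reason these arrays need to be so enormous.
```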

Attempting these kinds of observations on Earth requires constant maintenance and cleaning of the data to remove radio interference, which has essentially sunk all attempts to measure the dark ages. But a lunar-based interferometer will have all the time in the world, providing a much cleaner and easier-to-analyze stream of data.

If you’re not in the mood for building 100,000 antennae on the Moon’s surface, then another proposal seeks to use the Moon’s natural features—namely, its craters. If you squint hard enough, they kind of look like radio dishes already. The idea behind the project, named the Lunar Crater Radio Telescope, is to find a suitable crater and use it as the support structure for a gigantic, kilometer-wide telescope.

This idea isn’t without precedent. Both the beloved Arecibo and the newcomer FAST observatories used depressions in the natural landscape of Puerto Rico and China, respectively, to take most of the load off the engineering of their giant dishes. The Lunar Crater Radio Telescope would be larger than both of those combined, and it would be tuned to hunt for dark ages radio signals that we can’t observe using Earth-based observatories because they simply bounce off the Earth’s ionosphere (even before we have to worry about any additional human interference). Essentially, the only way that humanity can access those wavelengths is by going beyond our ionosphere, and the far side of the Moon is the best place to park an observatory.

The engineering

The engineering challenges we need to overcome to achieve these scientific dreams are not small. So far, humanity has soft-landed only a couple of missions on the far side of the Moon, and both of these proposals require an immense upgrade to our capabilities. That’s exactly why both far-side concepts were funded by NIAC, NASA’s Innovative Advanced Concepts program, which gives grants to researchers who need time to flesh out high-risk, high-reward ideas.

With NIAC funds, the designers of the Lunar Crater Radio Telescope, led by Saptarshi Bandyopadhyay at the Jet Propulsion Laboratory, have already thought through the challenges they will need to overcome to make the mission a success. Their mission leans heavily on another JPL concept, the DuAxel, which consists of a rover that can split into two single-axle rovers connected by a tether.

To build the telescope, several DuAxels are sent to the crater. One of each pair “sits” to anchor itself on the crater rim, while the other crawls down the slope. At the center, they are met by a telescope lander that has deployed guide wires and the wire mesh frame of the telescope (again, it helps for assembly purposes that radio dishes are just strings of metal in various arrangements). The pairs on the crater rim then hoist their companions back up, unfolding the mesh and lofting the receiver above the dish.

The FarView observatory is a much more capable instrument—if deployed, it would be the largest radio interferometer ever built—but it’s also much more challenging. Led by Ronald Polidan of Lunar Resources, Inc., it relies on in-situ manufacturing processes. Autonomous vehicles would dig up regolith, process and refine it, and spit out all the components that make an interferometer work: the 100,000 individual antennae, the kilometers of cabling to run among them, the solar arrays to power everything during lunar daylight, and batteries to store energy for round-the-lunar-clock observing.

If that sounds intense, it’s because it is, and it doesn’t stop there. An astronomical telescope is more than a data collection device. It also needs to crunch some numbers and get that precious information back to a human to actually study it. That means that any kind of far-side observing platform, especially one that will ingest truly massive amounts of data, as these proposals would, needs to make one of two choices.

Choice one is to perform most of the data correlation and processing on the lunar surface, sending back only highly refined products to Earth for further analysis. Achieving that would require landing, installing, and running what is essentially a supercomputer on the Moon, which comes with its own weight, robustness, and power requirements.

The other choice is to keep the installation as lightweight as possible and send the raw data back to Earthbound machines to handle the bulk of the processing and analysis tasks. This kind of data throughput is outright impossible with current technology but could be achieved with experimental laser-based communication strategies.
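To see why the raw-data route is so daunting, a back-of-envelope estimate helps; every parameter below is an assumption of mine for illustration, not a figure from the proposals:

```python
# Back-of-envelope raw data rate for a FarView-scale array. All parameters
# here are assumed for illustration, not taken from the proposal.
n_antennas = 100_000
bandwidth_hz = 35e6             # a 5-40 MHz observing band
sample_rate = 2 * bandwidth_hz  # Nyquist sampling
bits_per_sample = 4             # coarse quantization, common in radio astronomy

bits_per_second = n_antennas * sample_rate * bits_per_sample
print(f"raw data rate: ~{bits_per_second / 8 / 1e12:.1f} TB/s")
# ~3.5 TB/s of raw samples, versus deep-space radio links measured in
# megabits per second: hence the choice between correlating on the Moon and
# inventing a laser downlink far beyond anything yet flown.
```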

The future

Astronomical observatories on the far side of the Moon face a bit of a catch-22. To deploy and run a world-class facility, either embedded in a crater or strung out over the landscape, we need some serious lunar manufacturing capabilities. But those same capabilities come with all the annoying radio fuzz that already bedevils Earth-based radio astronomy.

Perhaps the best solution is to open up the Moon to commercial exploitation but maintain the far side as a sort of out-world nature preserve, owned by no company or nation, left to scientists to study and use as a platform for pristine observations of all kinds.

It will take humanity several generations, if not more, to develop the capabilities needed to finally build far-side observatories. But it will be worth it, as those facilities will open up the unseen Universe for our hungry eyes, allowing us to pierce the ancient fog of our Universe’s past, revealing the machinations of hydrogen in the dark ages, the birth of the first stars, and the emergence of the first galaxies. It will be a fountain of cosmological and astrophysical data, the richest possible source of information about the history of the Universe.

Ever since Galileo ground and polished his first lenses, and through the innovations that led to the explosion of digital cameras, astronomy has had a storied tradition of turning the technological triumphs needed to achieve science goals into the foundations of everyday devices that make life on Earth much better. If we’re looking for reasons to industrialize and inhabit the Moon, the noble goal of pursuing a better understanding of the Universe makes for a fine motivation. And we’ll all be better off for it.


Looking at the Universe’s dark ages from the far side of the Moon Read More »

autism-rate-rises-slightly;-rfk-jr.-claims-he’ll-“have-answers-by-september”

Autism rate rises slightly; RFK Jr. claims he’ll “have answers by September”

Among the sites, there were large differences. Prevalence among 8-year-olds ranged from 9.7 per 1,000 children in Laredo, Texas, to 53.1 in California. These differences are likely due to “differences in availability of services for early detection and evaluation and diagnostic practices,” the CDC and network researchers wrote.

For instance, California—the site with the highest prevalence among 8-year-olds and also 4-year-olds—has a local initiative called the Get SET Early model. “As part of the initiative, hundreds of local pediatricians have been trained to screen and refer children for assessment as early as possible, which could result in higher identification of ASD, especially at early ages,” the authors write. “In addition, California has regional centers throughout the state that provide evaluations and service coordination for persons with disabilities and their families.”

On the other hand, the low ASD rates at the network’s two Texas sites could “suggest lack of access or barriers to accessing identification services,” the authors say. The two Texas sites included primarily Hispanic and lower-income communities.

The newly revealed higher rates in some of the network’s underserved communities could link ASD prevalence to social determinants of health, such as low income and housing and food insecurity, the authors say. Other factors, such as higher rates of preterm birth, which is linked to neurodevelopmental disabilities, as well as lead poisoning and traumatic brain injuries, may also contribute to disparities.

Anti-vaccine voices

The detailed data-heavy report stands in contrast to the position of health secretary Robert F. Kennedy Jr., a longtime anti-vaccine advocate who promotes the false and thoroughly debunked claim that autism is caused by vaccines. Last month, Kennedy hired the discredited anti-vaccine advocate David Geier to lead a federal study examining whether vaccines cause autism, despite numerous high-quality studies already finding no link between the two.

Geier, who has no medical or scientific background, has long worked with his father, Mark Geier, to promote the idea that vaccines cause autism. In 2011, Mark Geier was stripped of his medical license for allegedly mistreating children with autism, and David Geier was fined for practicing medicine without a license.

In a media statement Tuesday in response to the new report, Kennedy called autism an “epidemic” that is “running rampant.” He appeared to reference his planned study with Geier, saying: “We are assembling teams of world-class scientists to focus research on the origins of the epidemic, and we expect to begin to have answers by September.”

Autism rate rises slightly; RFK Jr. claims he’ll “have answers by September” Read More »

google-adds-veo-2-video-generation-to-gemini-app

Google adds Veo 2 video generation to Gemini app

Google has announced that yet another AI model is coming to Gemini, but this time, it’s more than a chatbot. The company’s Veo 2 video generator is rolling out to the Gemini app and website, giving paying customers a chance to create short video clips with Google’s allegedly state-of-the-art video model.

Veo 2 works like other video generators, including OpenAI’s Sora—you input text describing the video you want, and a Google data center churns through tokens until it has an animation. Google claims that Veo 2 was designed to have a solid grasp of real-world physics, particularly the way humans move. Google’s examples do look good, but presumably that’s why they were chosen.

Prompt: Aerial shot of a grassy cliff onto a sandy beach where waves crash against the shore, a prominent sea stack rises from the ocean near the beach, bathed in the warm, golden light of either sunrise or sunset, capturing the serene beauty of the Pacific coastline.

Veo 2 will be available in the model drop-down, though Google notes it is still considering ways to integrate the feature, so its location could change. It may not be there for you just yet, either. Google is starting the rollout today, but it could take several weeks before all Gemini Advanced subscribers get access to Veo 2. Gemini features can take a surprisingly long time to arrive for the bulk of users—for example, it took about a month for Google to make Gemini Live video available to everyone after announcing its release.

When Veo 2 does pop up in your Gemini app, you can provide it with as much detail as you want, which Google says will ensure you have fine control over the eventual video. Veo 2 is currently limited to 8 seconds of 720p video, which you can download as a standard MP4 file. Video generation uses even more processing than your average generative AI feature, so Google has implemented a monthly limit. However, it hasn’t confirmed what that limit is, saying only that users will be notified as they approach it.

Google adds Veo 2 video generation to Gemini app Read More »

openai-#13:-altman-at-ted-and-openai-cutting-corners-on-safety-testing

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing

Three big OpenAI news items this week were the FT article describing the cutting of corners on safety testing, the OpenAI former employee amicus brief, and Altman’s very good TED Interview.

The FT detailed OpenAI’s recent dramatic cutting back on the time and resources allocated to safety testing of its models.

In the interview, Chris Anderson made an unusually strong effort to ask good questions and push through attempts to dodge answering. Altman did a mix of giving a lot of substantive content in some places while dodging answering in others. Where he chose to do which was, itself, enlightening. I felt I learned a lot about where his head is at and how he thinks about key questions now.

The amicus brief backed up that OpenAI’s current actions are in contradiction to the statements OpenAI made to its early employees.

There are also a few other related developments.

What this post does not cover is GPT-4.1. I’m waiting on that until people have a bit more time to try it and offer their reactions, but expect coverage later this week.

The big headline from TED was presumably the increase in OpenAI’s GPU use.

Steve Jurvetson: Sam Altman at TED today: OpenAI’s user base doubled in just the past few weeks (an accidental disclosure on stage). “10% of the world now uses our systems a lot.”

When asked how many users they have: “Last we disclosed, we have 500 million weekly active users, growing fast.”

Chris Anderson: “But backstage, you told me that it doubled in just a few weeks.” @SamA: “I said that privately.”

And that’s how we got the update.

Revealing that private info wasn’t okay, but it seems it was an accident, and in any case Altman seemed fine with it.

Listening to the details, it seems that Altman was referring not to the growth in users, but instead to the growth in compute use. Image generation takes a ton of compute.

Altman says every day he calls people up and begs them for GPUs, and that DeepSeek did not impact this at all.

Steve Jurvetson: Sam Altman at TED today:

Reflecting on the life ahead for his newborn: “My kids will never be smarter than AI.”

Reaction to DeepSeek:

“We had a meeting last night on our open source policy. We are going to do a powerful open-source model near the frontier. We were late to act, but we are going to do really well now.”

Altman doesn’t explain here why he is doing an open model. The next question from Anderson seems to explain it: is it about whether people ‘recognize’ that OpenAI’s model is best? Later, Altman does attempt to justify it with, essentially, a shrug that things will go wrong, but we now know it’s probably mostly fine.

Regarding the accumulated knowledge OpenAI gains from its usage history: “The upload happens bit by bit. It is an extension of yourself, and a companion, and soon will proactively push things to you.”

Have there been any scary moments?

“No. There have been moments of awe. And questions of how far this will go. But we are not sitting on a conscious model capable of self-improvement.”

I listened to the clip and this scary moment question specifically refers to capabilities of new models, so it isn’t trivially false. It still damn well should be false, given what their models can do and the leaps and awe involved. The failure to be scared here is a skill issue that exists between keyboard and chair.

How do you define AGI? “If you ask 10 OpenAI engineers, you will get 14 different definitions. Whichever you choose, it is clear that we will go way past that. They are points along an unbelievable exponential curve.”

So AGI will come and your life won’t change, but we will then soon get ASI. Got it.

“Agentic AI is the most interesting and consequential safety problem we have faced. It has much higher stakes. People want to use agents they can trust.”

Sounds like an admission that they’re not ‘facing’ the most interesting or consequential safety problems at all, at least not yet? Which is somewhat confirmed by discussion later in the interview.

I do agree that agents will require a much higher level of robustness and safety, and I’d rather have a ‘relatively dumb’ agent that was robust and safe, for most purposes.

When asked about his Congressional testimony calling for a new agency to issue licenses for large model builders: “I have since learned more about how government works, and I no longer think this is the right framework.”

I do appreciate the walkback being explicit here. I don’t think that’s the reason why.

“Having a kid changed a lot of things in me. It has been the most amazing thing ever. Paraphrasing my co-founder Ilya, I don’t know what the meaning of life is, but I am sure it has something to do with babies.”

Statements like this are always good to see.

“We made a change recently. With our new image model, we are much less restrictive on speech harms. We had hard guardrails before, and we have taken a much more permissive stance. We heard the feedback that people don’t want censorship, and that is a fair safety discussion to have.”

I agree with the change and the discussion, and as I’ve discussed before, if anything I’d like to see this taken further with respect to these styles of concern in particular.

Altman is asked about copyright violation, says we need a new model around the economics of creative output and that ‘people build off each others creativity all the time’ and giving creators tools has always been good. Chris Anderson tries repeatedly to nail down the question of consent and compensation. Altman repeatedly refuses to give a straight answer to the central questions.

Altman says (10:30) that the models are so smart that, for most things people want to do with them, they’re good enough. He notes that this is true based on user expectations, but that’s mostly circular. As in, we ask the models to do what they are capable of doing, the same way we design jobs and hire humans for them based on what things particular humans and people in general can and cannot do. It doesn’t mean any of us are ‘smart enough.’

Nor does it imply what he says next, that everyone will ‘have great models’ but what will differentiate will be not the best model but the best product. I get that productization will matter a lot for which AI gets the job in many cases, but continue to think this ‘AGI is fungible’ claim is rather bonkers crazy.

A key series of moments starts at 35:00 in. It’s telling that other coverage of the interview sidestepped all of this, essentially entirely.

Anderson has put up an image of The Ring of Power, to talk about Elon Musk’s claim that Altman has been corrupted by The Ring, a claim Anderson correctly notes also plausibly applies to Elon Musk.

Altman goes for the ultimate power move. He is defiant and says, all right, you think that, tell me examples. What have I done?

So, since Altman asked so nicely, what are the most prominent examples of Altman potentially being corrupted by The Ring of Power? Here is an eightfold path.

  1. We obviously start with Elon Musk’s true objection, which stems from the shift of OpenAI from a non-profit structure to a hybrid structure, and the attempt to now go full for-profit, in ways he claims broke covenants with Elon Musk. Altman claimed to have no equity and not be in this for money, and now is slated to get a lot of equity. I do agree with Anderson that Altman isn’t ‘in it for the money’ because I think Altman correctly noticed the money mostly isn’t relevant.

  2. Altman is attempting to do so via outright theft of a huge portion of the non-profit’s assets, then turn what remains into essentially an OpenAI marketing and sales department. This would arguably be the second biggest theft in history.

  3. Altman said for years that it was important the board could fire him. Then, when the board did fire him in response (among other things) to Altman lying to the board in an attempt to fire a board member, he led a rebellion against the board, threatened to blow up the entire company and reformulate it at Microsoft, and proved that no, the board cannot fire Altman. Altman can and did fire the board.

  4. Altman, after proving he cannot be fired, de facto purged OpenAI of his enemies. Most of the most senior people at OpenAI who are worried about AI existential risk, one by one, reached the conclusion they couldn’t do much on the inside, and resigned to continue their efforts elsewhere.

  5. Altman used to talk openly and explicitly about AI existential risks, including attempting to do so before Congress. Now, he talks as if such risks don’t exist, and instead pivots to jingoism and the need to Beat China, hiring lobbyists who do the same. He promised 20% of compute to the superalignment team, never delivered it, and then dissolved the team.

  6. Altman pledged that OpenAI would support regulation of AI. Now he says he has changed his mind, and OpenAI lobbies against bills like SB 1047, while its AI Action Plan submission is vice signaling that not only opposes any regulation but seeks government handouts, the right to use intellectual property without compensation, and protection against potential regulations.

  7. Altman has been cutting corners on safety, as noted elsewhere in this post. OpenAI used to be remarkably good in terms of precautions. Now it’s not.

  8. Altman has been going around saying ‘AGI will arrive and your life will not much change’ when it is common knowledge that this is absurd.

One could go on. This is what we like to call a target rich environment.

Anderson offers only #1, the transition to a for-profit model, the most prominent example and the most obvious response, and he proactively pulls the punch. Altman admits he’s not the same person he was and that it all happens gradually; if it happened all at once it would be jarring. But he says he doesn’t feel any different.

Anderson essentially says okay and pivots to Altman’s son and how that has shaped Altman, which is indeed great. And then he does something that impressed me: he ties this to existential risk via metaphor, asking whether, if there were a button that was 90% to give his son a wonderful life and 10% to kill him (I’d love those odds!), Altman would press it. Altman says literally no, but points out the metaphor, and says he doesn’t think OpenAI is doing that. He says he really cared about not destroying the world before, and he really cares about it now; he didn’t need a kid for that part.

Anderson then moves to the question of racing, and whether the fact that everyone thinks AGI is inevitable is what is creating the risk, asking if Altman and his colleagues believe it is inevitable and asks if maybe they could coordinate to ‘slow down a bit’ and get societal feedback.

As much as I would like that, given the current political climate I worry this sets up a false dichotomy, whereas right now there is tons of room to take more responsibility and get societal feedback, not only without slowing us down but enabling more and better diffusion and adaptation. Anderson seems to want a slowdown for its own sake, to give people time to adapt, which I don’t think is compelling.

Altman points out we slow down all the time for lack of reliability, also points out OpenAI has a track record of their rollouts working, and claims everyone involved ‘cares deeply’ about AI safety. Does he simply mean mundane (short term) safety here?

His discussion of the ‘safety negotiation’ around image generation, where I support OpenAI’s loosening of restrictions, suggests that this is correct. So does the next answer: Anderson asks if Altman would attend a conference of experts to discuss safety, Altman says of course but he’s more interested in what users think as a whole, and ‘asking everyone what they want’ is better than asking people ‘who are blessed by society to sit in a room and make these decisions.’

But that’s an absurd characterization of trying to solve an extremely difficult technical problem. So it implies that Altman thinks the technical problems are easy? Or that he’s trying to rhetorically get you to ignore them, in favor of the question of preferences and an appeal to some form of democratic values and opposition to ‘elites.’ It works as an applause line. Anderson points out that the hundreds of millions ‘don’t always know where the next step leads’ which may be the understatement of the lightcone in this context. Altman says the AI can ‘help us be wiser’ about those decisions, which of course would mean that a sufficiently capable AI or whoever directs it would de facto be making the decisions for us.

OpenAI’s Altman ‘Won’t Rule Out’ Helping Pentagon on AI Weapons, but doesn’t expect to develop a new weapons platform ‘in the foreseeable future,’ which is a period of time that gets shorter each time I type it.

Altman: I will never say never, because the world could get really weird.

I don’t think most of the world wants AI making weapons decisions.

I don’t think AI adoption in the government has been as robust as possible.

There will be “exceptionally smart” AI systems by the end of next year.

I think I can indeed foresee the future where OpenAI is helping the Pentagon with its AI weapons. I expect this to happen.

I want to be clear that I don’t think this is a bad thing. The risk is in developing highly capable AIs in the first place. As I have said before, Autonomous Killer Robots and AI-assisted weapons in general are not how we lose control over the future to AI, and failing to build them is a key way America could fall behind. It’s not like our rivals are going to hold back.

To the extent that the AI weapons scare the hell out of everyone? That’s a feature.

On the issue of the attempt to sideline and steal from the nonprofit, 11 former OpenAI employees filed an amicus brief in the Musk vs. Altman lawsuit, on the side of Musk.

Todor Markov: Today, myself and 11 other former OpenAI employees filed an amicus brief in the Musk v Altman case.

We worked at OpenAI; we know the promises it was founded on and we’re worried that in the conversion those promises will be broken. The nonprofit needs to retain control of the for-profit. This has nothing to do with Elon Musk and everything to do with the public interest.

OpenAI claims ‘the nonprofit isn’t going anywhere’ but has yet to address the critical question: Will the nonprofit actually retain control over the for-profit? This distinction matters.

You can find the full amicus here.

On this question, Timothy Lee points out that you don’t need to care about existential risk to notice that what OpenAI is trying to do to its non-profit is highly not cool.

Timothy Lee: I don’t think people’s views on the OpenAI case should have anything to do with your substantive views on existential risk. The case is about two questions: what promises did OpenAI make to early donors, and are those promises legally enforceable?

A lot of people on OpenAI’s side seem to be taking the view that non-profit status is meaningless and therefore donors shouldn’t complain if they get scammed by non-profit leaders. Which I personally find kind of gross.

I mean I would be pretty pissed if I gave money to a non-profit promising to do one thing and then found out they actually did something different that happened to make their leaders fabulously wealthy.

This particular case comes down to that. A different case, filed by the Attorney General, would also be able to ask the more fundamental question of whether fair compensation is being offered for assets, and whether the charitable purpose of the nonprofit is going to be wiped out, or even pivoted into essentially a profit center for OpenAI’s business (as in buying a bunch of OpenAI services for nonprofits and calling that its de facto charitable purpose).

The mad dash to be first, and to give the perception that the company is ‘winning,’ is causing reckless rushes to release new models at OpenAI.

This is in dramatic contrast to earlier days, when there was less risk in the room, yet OpenAI still took many months to prepare a new release. At first, by any practical standard, OpenAI’s track record on actual model release decisions was amazingly great. Nowadays? Not so much.

Would their new procedures spot the problems it is vital that we catch in advance?

Joe Weisenthal: I don’t have any views on whether “AI Safety” is actually an important endeavor.

But if it is important, it’s clear that the intensity of global competition in the AI space (DeepSeek etc.) will guarantee it increasingly gets thrown out the window.

Christina Criddle: EXC: OpenAI has reduced the time for safety testing amid “competitive pressures” per sources:

Timeframes have gone from months to days

Specialist work such as finetuning for misuse (eg biorisk) has been limited

Evaluations are conducted on earlier versions than launched

Financial Times (Gated): OpenAI has slashed the time and resources it spends on testing the safety of its powerful AI models, raising concerns that its technology is being rushed out the door without sufficient safeguards.

Staff and third-party groups have recently been given just days to conduct “evaluations,” the term given to tests for assessing models’ risks and performance, on OpenAI’s latest LLMs, compared to several months previously.

According to eight people familiar with OpenAI’s testing processes, the start-up’s tests have become less thorough, with insufficient time and resources dedicated to identifying and mitigating risks, as the $300 billion startup comes under pressure to release new models quickly and retain its competitive edge.

Steven Adler (includes screenshots from FT): Skimping on safety-testing is a real bummer. I want for OpenAI to become the “leading model of how to address frontier risk” they’ve aimed to be.

Peter Wildeford: I can see why people say @sama is not consistently candid.

Dylan Hadfield Menell: I remember talking about competitive pressures and race conditions with the @OpenAI’s safety team in 2018 when I was an intern. It was part of a larger conversation about the company charter.

It is sad to see @OpenAI’s founding principles cave to pressures we predicted long ago.

It is sad, but not surprising.

This is why we need a robust community working on regulating the next generation of AI systems. Competitive pressure is real.

We need people in positions of genuine power that are shielded from them.

Peter Wildeford:

Dylan Hadfield Menell: Where did you find an exact transcription of our conversation?!?! 😅😕😢

You can’t do this kind of testing properly in a matter of days. It’s impossible.

If people don’t have time to think, let alone adapt, probe, and build tools, how can they see what your new model is capable of doing? There are some great people working on these issues at OpenAI, but this is an impossible ask.

Testing on a version that doesn’t even match what you release? That’s even more impossible.

Part of this is that it is so tragic how everyone massively misinterpreted and overreacted to DeepSeek.

To reiterate, since the perception problem persists: yes, DeepSeek cooked. They have cracked engineers, and they did a very impressive thing with r1 given what they spent and where they were starting from. But that was not DeepSeek being ‘in the lead’ or even at the frontier; they were always many months behind, and their relative costs were being understated by multiple orders of magnitude. Even today I saw someone say ‘DeepSeek still in the lead’ when this is so obviously not the case. Meanwhile, no one was aware Google Flash Thinking even existed, or that it had the first visible CoT, and so on.

The result of all that? Talk similar to Kennedy’s ‘Missile Gap,’ abject panic, and sudden pressure to move up releases to show OpenAI and America have ‘still got it.’


OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing Read More »

tuesday-telescope:-is-the-james-webb-space-telescope-worth-$10-billion?

Tuesday Telescope: Is the James Webb Space Telescope worth $10 billion?

Welcome to the Tuesday Telescope. There is a little too much darkness in this world and not enough light—a little too much pseudoscience and not enough science. We’ll let other publications offer you a daily horoscope. At Ars Technica, we’ll take a different route, finding inspiration from very real images of a universe that is filled with stars and wonder.

Was the James Webb Space Telescope worth it?

Well, $10 billion is a lot of money. Even when spread over a couple of decades, that’s still a huge chunk of NASA’s annual science budget. (And given the recent Trump administration attack on NASA’s science budget, money is about to get a whole lot tighter.)

However, it is difficult to put a price on advancing our species’ understanding of the natural world and the wide Universe we’re swimming in. And Webb is doing an amazing job of that.

In 2009, NASA launched the Wide-field Infrared Survey Explorer, or WISE, mission to make infrared observations. This was the latest in a line of space-based infrared observatories, and it cost about 3 percent as much as the Webb telescope.

Two infrared views of NGC 1514. At left is an observation from NASA’s Wide-field Infrared Survey Explorer (WISE).

Credit: NASA, ESA, CSA, STScI, NASA-JPL, Caltech, UCLA, Michael Ressler (NASA-JPL), Dave Jones (IAC)


Today’s photo concerns the planetary nebula NGC 1514. In 2010, using the WISE telescope, NASA project scientist Mike Ressler discovered “rings” around the planetary nebula. Now, thanks to Webb, the rings—which are likely composed of small dust grains, heated by ultraviolet light from a white dwarf star—can be seen clearly. And, oh my, they’re spectacular.

The clarity in the Webb photo, compared to what came before, is remarkable. So, is seeing the Universe in a new light worth $10 billion? I certainly think so, but I’m writing a weekly story called the Tuesday Telescope, so it’s safe to say I am biased.

Source: NASA, ESA, CSA, STScI, Michael Ressler (NASA-JPL), Dave Jones (IAC)

Do you want to submit a photo for the Tuesday Telescope? Reach out and say hello.

Tuesday Telescope: Is the James Webb Space Telescope worth $10 billion? Read More »

after-harvard-says-no-to-feds,-$2.2-billion-of-research-funding-put-on-hold

After Harvard says no to feds, $2.2 billion of research funding put on hold

The Trump administration has been using federal research funding as a cudgel. The government has blocked billions of dollars in research funds and threatened to put a hold on even more in order to compel universities to adopt what it presents as essential reforms. In the case of Columbia University, that includes changes in the leadership of individual academic departments.

On Friday, the government sent a list of demands that it presented as necessary to “maintain Harvard’s financial relationship with the federal government.” On Monday, Harvard responded that accepting these demands would “allow itself to be taken over by the federal government.” The university also changed its home page into an extensive tribute to the research that would be eliminated if the funds were withheld.

In response, the Trump administration later put $2.2 billion of Harvard’s research funding on hold.

Diversity, but only the right kind

Harvard posted the letter it received from federal officials, listing their demands. Some of it is what you expect, given the Trump administration’s interests. The admissions and hiring departments would be required to drop all diversity efforts, with data on faculty and students to be handed over to the federal government for auditing. As at other institutions, there are also some demands presented as efforts against antisemitism, such as the defunding of pro-Palestinian groups. More generally, it demands that university officials “prevent admitting students hostile to the American values and institutions.”

There are also a bunch of basic culture war items, such as a demand for a mask ban, and a ban on “de-platforming” speakers on campus. In addition, the government wants the university to screen all faculty hires for plagiarism issues, which is what caused Harvard’s former president to resign after she gave testimony to Congress. Any violation of these updated conduct codes by a non-citizen would require an immediate report to the Department of Homeland Security and State Department, presumably so they can prepare to deport them.

After Harvard says no to feds, $2.2 billion of research funding put on hold Read More »

should-we-settle-mars,-or-is-it-a-dumb-idea-for-humans-to-live-off-world?

Should we settle Mars, or is it a dumb idea for humans to live off world?

Mars is back on the agenda.

During his address to a joint session of Congress in March, President Donald Trump said the United States “will pursue our Manifest Destiny into the stars, launching American astronauts to plant the Stars and Stripes on the planet Mars.”

What does this mean? Manifest destiny is the belief, which was particularly widespread in 1800s America, that US settlers were destined to expand westward across North America. Similarly, then, the Trump administration believes it is the manifest destiny of Americans to settle Mars. And he wants his administration to take steps toward accomplishing that goal.

Should the US Prioritize Settling Mars?

But should we really do this?

I recently participated in a debate with Shannon Stirone, a distinguished science writer, on this topic. The debate was sponsored by Open to Debate, and professionally moderated by Emmy Award-winning journalist John Donvan. Spoiler alert: I argued in favor of settlement. I hope you learned as much as I did.

Should we settle Mars, or is it a dumb idea for humans to live off world? Read More »

monthly-roundup-#29:-april-2025

Monthly Roundup #29: April 2025

In Monthly Roundup #28 I made clear I intend to leave the Trump administration out of my monthly roundups, for both better and worse, outside of my focus areas. Again, this does not mean I don’t have a lot to say or that those questions don’t matter. It means you should not rely on me as your only source of news and I pick my battles.

They are not making this easy.

I am going to stick to my guns. Trade and trading are very much inside my focus areas, but they belong in economics roundups, and in extreme cases AI roundups. Besides, you don’t need me to tell you that tariffs not only impose immense economic costs but also fail to achieve their primary policy aims and foster political dysfunction along the way. That question should already be answered by my t-shirt. I do have a word about things related to a potential expansion (I can’t believe I’m typing this!) of the Jones Act. And I’ll deal with certain crime-related things when I do my first crime roundup.

  1. Bad News.

  2. Antisocial Media.

  3. Technology Advances.

  4. Variously Effective Altruism.

  5. Government Working.

  6. Jones Act Watch.

  7. While I Cannot Condone This.

  8. Architectural Musings.

  9. Quickly, There’s No Time.

  10. Don’t Sell Your Soul, You Won’t Get Paid.

  11. What To Do Instead.

  12. Good News, Everyone.

  13. We’re Elite, You’re Not.

  14. Enjoy It While It Lasts.

  15. For Your Entertainment.

  16. An Economist Gets Lunch.

  17. I Was Promised Flying Self-Driving Cars and Supersonic Jets.

  18. Gamers Gonna Game Game Game Game Game.

  19. Sports Go Sports.

  20. The Lighter Side.

23andMe is going into bankruptcy. It would seem a wise precaution to download and then delete your data if it’s there, which takes a few days to do, in case the data falls into the wrong hands or is lost forever.

Young men who make 9 figures by default get driven crazy, all checks and balances on them now gone.

This graphic is quite good.

That’s a variation on this classic, worth revisiting periodically as a reminder:

A claim that banning smoking in bars increases alcohol consumption by ~5% without decreasing smoking. I presume the increased alcohol consumption is because the bar became a much better experience without all the smoking? It seems bizarre that this wouldn’t decrease smoking, especially over the long term.

Beware communities that encourage irresponsible risk taking and dismiss those who do not endanger themselves. It can be good if targeted well: There are places, like founding startups and putting yourself out there for romance, where people take far too little risk and it is often good to encourage people to take more. But this very much doesn’t apply to, for example, talk about financial investments.

If you use Twitter via the For You page, You Fool. Yet many of you do exactly that.

I even hear people complaining about ‘the algorithm’ without doing the obvious and switching to chronological feeds and lists. That’s on you.

As far as I know this is the size-adjusted record, yes, and well earned.

Kelsey Piper suggests Twitter’s conversational meta favors long tweets because they attract thoughtful people, plus you get the bonus of QTs saying tldr. That hasn’t been my experience, but I also try to have those conversations elsewhere.

Twitter is restricting the ability to see who other people are following. This is not obviously bad. I would like to be able to follow people without worrying about what it looks like. In practice I don’t care but there are people for whom this matters.

A great question: why is there such huge variance in self-checkout system quality? We have essentially solved self-checkout technology, yet half of stores have multiple employees whose job is to fix errors because their terrible software doesn’t work. So yeah, diffusion can be hard.

I don’t want to zipline, unless it’s this zipline:

Ryan Peterson: While everyone in business is busy losing their minds about tariffs, @zipline just quietly launched a logistics revolution in Dallas, TX. You can now get anything at a Walmart delivered to your front door by drone, with a flight time under 2 minutes for most orders.

@DanielLurie We gotta legalize drone delivery in San Francisco.

If you live in Dallas download the app here and start buying stuff from Walmart before the prices go up!

Nearcyan rants about how awful the developer experience is on Google Play, someone from Google reaches out and the related problems get instantly solved. This can directly be linked to Google’s incentive structures not rewarding anyone for making existing products work properly.

Andrej Karpathy provides ‘no-brainer’ suggestions for personal security, such as having a distinct credit card for every online transaction and using a virtual mail service.

The full agenda he spells out as the baseline minimum seems like an obviously massive overkill level of security for almost anyone. What is Andrej’s hourly rate? Some of this is worthwhile, but as Patrick McKenzie reminds us, the optimal rate of fraud is not zero.

It actually did make me feel better about Signal, until everyone saying that caused me to learn about all the ways various other apps that compromise your phone can also compromise Signal.

Alice Maz: the good part of the signal leak is it implies a bunch of people with ts/sci access don’t know anything we don’t that would make them distrust signal.

My current model is that Signal is the best low-effort secure communication method, but not on its own good enough that you should assume that using Signal on a normal phone is an actually secure communication method against someone who cares.

Signulll warns against artificial scarcity. I am a lot less skeptical.

Signulll: one of the most common mistakes in product thinking is the belief that you can reintroduce artificial scarcity to improve something that has already been made abundant—especially by the internet (& the internet makes almost everything feel abundant). after people have experienced the infinite, you can’t shove them into a box & expect them to enjoy it. the brain doesn’t forget oxygen.

this shows up in products that add fake constraints: one post a day, one profile at a time, one action per hour. the assumption is that limiting access will restore value or mystery. it doesn’t. once the user has tasted abundance, constraint doesn’t feel elegant or intentional—it feels broken. worse, it feels patronizing.

artificial scarcity almost never works unless it’s intrinsic to the product. you either have to make abundance feel valuable (curated, contextual, high signal), or find a new mechanic entirely. nostalgia for constraint is not strategy. it’s just denial of the current physics of the medium.

this is an extension to this. i see this type of thinking all the time, particularly when people who are frustrated at the current dynamics of any given network (e.g. a dating app etc.)

Nogard: Agree and great point. Modern dating apps unleashed an irrational level of abundance and optionality—so much that it bled into the physical world, warping its constraints. You can’t trick anyone with artificial scarcity; they’ve already tasted the forbidden fruit. It’s like trying to enjoy tap water after a decade of chugging Monster Energy.

Games, especially free mobile games, are chock-full of artificial scarcity. For the most successful games, everything is limited or on a timer. People find this highly addictive. They eat it up. And often they also pay quite a lot to get around those restrictions; that’s often the entire business model. So there’s a big existence proof.

What games try to do is justify the artificial scarcity. When this is done well it works great. So the question now becomes, can you make the artificial scarcity fun and interesting? Can you make it addictive, even? A maximization problem of sorts? Or tie it into your ‘game mechanics’?

I think you absolutely can do all that in many cases, including in dating apps.

First of all, limited actions really do restore value to each action. The frictions and value this introduces can do many useful things. The ideal friction in many cases is money; the amounts can be quite small, and refundable, and still work. But in cases where you cannot use money, and there are many good reasons not to, using an artificially scarce currency seems great?

If I were dating, I would rather be on a dating app where I can only match once a day, and those I match with know this, than one without that restriction.
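As a concrete illustration of the mechanic, a regenerating “one match per day” currency is only a few lines. This is a minimal sketch; the class name, numbers, and design are mine, not any real app’s.

```python
import time

class ScarceAction:
    """An action gated by an artificially scarce, timer-regenerated
    currency, as in free mobile games. Numbers are illustrative."""

    def __init__(self, capacity=1, regen_seconds=24 * 3600):
        self.capacity = capacity            # max stockpiled actions
        self.regen_seconds = regen_seconds  # one new action per period
        self.tokens = capacity
        self.last_refill = time.time()

    def _refill(self):
        elapsed = time.time() - self.last_refill
        earned = int(elapsed // self.regen_seconds)
        if earned:
            self.tokens = min(self.capacity, self.tokens + earned)
            self.last_refill += earned * self.regen_seconds

    def try_spend(self):
        self._refill()
        if self.tokens:
            self.tokens -= 1
            return True
        return False  # out of matches until the timer grants another

# One match per day, no stockpiling past a single match.
daily_match = ScarceAction(capacity=1)
```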

Scott Alexander can’t let go of the drowning child argument, going highly technical around various details of hypothetical variations in remarkably dense fashion without seeming that actually interested in what is centrally going on.

Kelsey Piper discusses the administrative nightmare that is trying to use your home to do essentially anything in America. There is no reason for this. If people could easily run microschools and tea shops out of their homes America would be a much better place.

Massachusetts bans heavy-duty truck sales until the trucks can go electric.

Claim that TSA employees are actively happy about the attacks on their union, because the union was preventing the purging of bad actors. I wouldn’t have predicted this, but it shouldn’t be discounted as a possibility. Many comments confirmed that this has recently improved the TSA experience quite a bit. Yes, we shouldn’t need the service they provide, but we’ve decided that we do, so better to do a decent job of it.

RFK Jr. proposes banning cell phones in schools… because of the ‘electric magnetic radiation’ he hallucinates they give off.

Jesse Singal: hopefully just the start of RFK Jr making good proposals for hilarious reasons

“We should promote whole grains, because the Illuminati has a stranglehold on processed carbs”

“Everyone should get 30 mins of exercise a day to stay a few steps ahead of your own shadow-daemon”

A word of warning, in case you think the tariffs were not great, that we might be about to not only not repeal the Jones Act but to do things that are vastly worse:

Ryan Peterson: On April 17th the U.S. Trade Representative’s office is expected to impose fees of up to $1.5M per port call for ships made in China and for $500k to $1M if the ocean carrier owns a single ship made in China or even has one on order from a Chinese shipyard.

Ocean carriers have announced that to reduce the fees they will skip the smaller ports like Seattle, Oakland, Boston, Mobile, Baltimore, New Orleans, etc. Some carriers have said they’ll just move the capacity serving the U.S. to other trade lanes altogether.

This would be horrible for jobs in and around those ports, and really bad for companies, both importers and exporters, using those ports. Huge extra costs will be incurred as trucks and trains run hundreds of extra miles to the main ports on each coast.

Similarly the major ports (LA, Long Beach, Houston, and New York) will be unable to keep up with the flood of extra volumes and are likely to become congested, similar to what we saw during Covid.

The craziest part of the original proposal is a requirement that within 7 years 15% of U.S. exports must travel on a ship that’s made in America and crewed by Americans.

There are only 23 American-made and American-crewed container ships in the world today, and they all service domestic ocean freight (Alaska, Hawaii, Guam, Puerto Rico, etc). They’re all tiny compared to today’s mega ships, and they’re not even sailing to overseas ports.

The U.S. did not produce any container ships in 2024. And the number we produce in any given year rounds to zero. The reason is that American made container ships of 3,000 TEUs cost the same price as the modern container ships from China of 24,000 TEUs.

Colin Grabow: The last time a US shipyard built Suezmax tankers (2004-2006) the price was $210 million each. Now we’re apparently at $500 million with a 6x delta versus the foreign price.

The Jones Act is caught in a vicious circle. Costs spiral, leading to lowered demand for new ships, which drives costs even higher. There’s very little appetite for ships at these prices. The law is self-destructing.

The full proposal to require US ships would drastically reduce American exports (and even more drastically reduce American imports). As in, we’d have to go without most of them, for many years. There’s no way to quickly ramp up our shipyards sufficiently for this task, even if price was not a factor. The port of call fees are a profoundly terrible idea, but the ship origin requirements are riot-in-the-streets-level terrible.

The rhetoric is largely about Chinese-built vessels being terrible or a security risk. Even if one buys that, what one could do, both here and for the original Jones Act, is simply to restrict the specific thing you don’t like: Chinese-built, Chinese-flagged or Chinese-owned ships. Or even require the ships come from our allies. It wouldn’t be a free action, but we could substitute into Japanese, South Korean or European ships. Whereas if you demand American ships? They don’t exist. And having 100 years of such restrictions domestically has only ensured that.

It seems highly reasonable to be confused as to why this happened:

Maxwell Tabarrok: This is actually pretty confusing to me. The Jones Act should be a subsidy to domestic shipbuilding but the industry is completely dead.

I’ve written before that this might happen when protection creates a domestic monopoly, but I’m not so convinced by my own explanation.

The answer is that when you create a domestic monopoly or oligopoly without export discipline, you allow domestic industry to not compete on the international market; instead, firms find it more profitable to service only the protected domestic market. We can’t compete on the international market even if we want to, because others offer large subsidies and are already more efficient in various ways, so no one wants our ships and we can’t use exports to improve or scale.

Unfortunately, the domestic market is not large enough to generate robust competition that creates reasonably priced ships, which decreases demand and makes shipbuilders less competitive still, pushing prices even higher, until domestic ships are so expensive that only a handful of Jones Act ships remain profitable. So at the end of the death spiral, we don’t make them anymore.

If you decide we need a domestic shipbuilding industry, there is a known playbook in these spots, which is to offer large subsidies and also enforce export discipline, as for example South Korea did during its development. No one seems to want to do that.

A discussion about many things, but the later, more interesting part is about dealing with cognitive decline. In particular, a sadly common pattern: someone who used to be unusually intelligent and capable loses a step, and then another step, for a variety of reasons, including getting older, a toxic information and reward environment, the fact that having to ‘act dumb’ in various ways actually makes you dumb over time, and often, probably, drug use.

Now they are still well above average for intelligence and capability, but their self-image and habits and strategies are designed for their old selves. So they take on too much, in the wrong ways, and lose the thread.

Tantum has a mostly excellent thread about the difference between a rival and an enemy, or between positive-sum rivalry and competition versus zero-sum hostility, although I disagree with the emphasis he chose for the conclusion.

Megan McArdle reminds us that Levels of Friction are required elements of many of civilization’s core systems, and without sufficient frictions, those systems break.

Dilan Esper: i think people don’t realize the extent to which easier and cheaper travel, the Internet, and fake asylum applications have wrecked the international asylum system carefully built after the Holocaust. Poland is a particularly sobering indicator of this.

Megan McArdle: We underestimate how many policies are only feasible because various frictions prevent abuse. When the frictions are lubricated, the policies collapse.

Alex Tabarrok asks, if we were confident Covid-19 was a lab leak, what then? His first conclusion is we should expect more pandemics going forward. That’s not obvious to me, because it means fewer natural pandemics and higher risk of lab-originated pandemics. It is within our power to prevent lab-originated pandemics but not natural ones, and indeed Alex’s core suggestions are about ensuring that we at least do our research under sufficiently safe conditions – I’d prefer that we not do it at all. Note that Alex would be right about expectations if we already had confidence in the rate of natural pandemics, but I think we largely don’t know, and it may be changing.

The kind of study one instinctively assumes won’t replicate says that those who believe beauty specifically is malleable will therefore take more risk: if you give people articles showing this, they’ll take more risk. Belief in the malleability of intelligence doesn’t have the same impact. The theory is that this is mediated through optimism?

Matt Lakeman asks, quite literally from a real example: How Much Would You Need to be Paid to Live on a Deserted Island for 1.5 Years and Do Nothing but Kill Seals? Plus another year in transit to boot. He estimated $2-4 million, and the real workers were clearly paid far less. But that’s the thing about such jobs – you don’t have to pay anything like what the median person would need to take the job. Someone will do it for a lot less than that, and I’m guessing the median young person would come in well under $2 million already.

The ‘vibe shift’ arrives at Princeton, and certainly on Twitter.

Paul Graham: If Princeton students think the “vibe shift” is real, it is, because if it has reached them, it has reached pretty much everyone.

I don’t buy that this means it has reached everyone. The Ivies and Twitter are both places where the future is more highly distributed, that respond more to vibe shifts. It would make perfect sense for such places to feel a vibe shift, while students at (let’s say) Ohio State or other residents of Columbus felt relatively little change.

Are Monte Carlo algorithms hacks to be avoided? They are hacks, and randomization is dangerous; this is true. But sometimes they’re the only way to get an estimate given the amount of complexity. There is also an underused variation, which I call the Probability Map. This is where you simplify the set of relevant considerations enough that you can track the probability of every possible intermediate state. For this to work you usually have to not care about path dependence, but that simplification is more accurate more often than you would think.
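A minimal sketch of the contrast, on a toy problem (a fair ±1 random walk), purely to illustrate the two approaches:

```python
import random

STEPS, TARGET = 10, 3  # 10 coin-flip steps; how often do we end at +3 or higher?

def monte_carlo(trials=100_000):
    """Estimate by sampling whole paths: simple, general, noisy."""
    hits = sum(
        sum(random.choice((-1, 1)) for _ in range(STEPS)) >= TARGET
        for _ in range(trials)
    )
    return hits / trials

def probability_map():
    """Track the exact probability of every intermediate state.
    Works because we don't care about path dependence here."""
    dist = {0: 1.0}
    for _ in range(STEPS):
        nxt = {}
        for pos, p in dist.items():
            for step in (-1, 1):
                nxt[pos + step] = nxt.get(pos + step, 0.0) + p / 2
        dist = nxt
    return sum(p for pos, p in dist.items() if pos >= TARGET)

print(monte_carlo())      # noisy estimate near 0.172
print(probability_map())  # exact: 0.171875
```

The Probability Map stays exact and cheap as long as the number of reachable states stays small; once the state space explodes or paths matter, sampling is the only option left.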

A cool note from Christopher Alexander, I’m still a little bummed I never got to properly review A Pattern Language and it’s probably too late now.

A Pattern Language:

179. Alcoves

180. Window Place

181. The Fire

185. Sitting Circle

188. Bed Alcove

191. The Shape of Indoor Space

205. Structure Follows Social Spaces

A Time to Keep: “Make bedrooms small, and shared spaces big.” – CA

If you want a family to be together, don’t isolate them in giant bedrooms. Draw them toward the hearth, the table, the common room.

I keep my bedroom large, but that is because I work and exercise there. The isolation effect is intentional in those spots. In general, you want the bedroom to be the minimum size to accomplish its specific goals, and to spend the rest of your space on the common areas.

We definitely need a word for this. Claude suggested ‘attention saturation’ or ‘bid overflow’ but they’re two words and also not quite right.

Nick Cammarata: I’m surprised we don’t have a word for the shift when the bids for your time goes above your supply for time vs before, it feels like a pretty fundamental life shift where it changes your default mode of operation.

like if you get 200 bids for your time a week vs 2 the set of things you need to do to thrive are pretty different, different risks and ways to play your hand, need to defend energy in new ways

it ofc depends on your psychology too, you might be built to handle X amount of bids per week, it’s less about the absolute amount of bids and more the ratio of bids to what you can easily handle.

I’ve gone through this a number of times. I have a system where I determine how to allocate time, and how to respond to bids for time, both from people and from things. Then suddenly you realize your system doesn’t work, quickly, there’s no time. There needs to be a substantial shift and a lot of things get reconsidered.

I kind of want to call this a ‘repricing,’ or, in full, a Time Repricing Event. As with other things, you have menu costs, so you generally only want to reprice when things are sufficiently out of whack.

My experience matches Kelsey Piper’s here.

Kelsey Piper: every single time I have witnessed people decide to compromise on character and overlook major red flags because ‘hey, he’s good at winning’, they have regretted it very dearly and in very short order

cutting corners, lying, and cheating will get you ahead in the short run, and sometimes even in the long run, but tying your own fortunes to someone who behaves this way will go very badly for you.

if you sell your soul to the devil you’ll pay more than you intended to, and buy less.

Pursuing all-in soulless strategies can ‘work,’ although of course what does it profit a man if he should gain the whole world and all that. The person doing the lying and cheating will sometimes win out, in terms of ‘success.’ If you are also centrally in the lying and cheating business, it can sometimes work out for you too, in those same terms.

However. If you are not that, and you hitch your wagon to someone who is that in order to ‘win’? Disaster, almost without exception. It won’t work, not on any level.

I know that sounds like the kind of thing we all want to be true when it isn’t. So yes, you are right to be suspicious of such claims. The thing is, I think it really is true.

Paul Graham’s latest essay is What To Do. His answer, in addition to ‘help people’ and ‘take care of the world’ is ‘make good new things.’ Agreed.

Paul Graham: So there’s my guess at a set of principles to live by: take care of people and the world, and make good new things. Different people will do these to varying degrees. There will presumably be lots who focus entirely on taking care of people. There will be a few who focus mostly on making new things.

But even if you’re one of those, you should at least make sure that the new things you make don’t net harm people or the world. And if you go a step further and try to make things that help them, you may find you’re ahead on the trade. You’ll be more constrained in what you can make, but you’ll make it with more energy.

On the other hand, if you make something amazing, you’ll often be helping people or the world even if you didn’t mean to. Newton was driven by curiosity and ambition, not by any practical effect his work might have, and yet the practical effect of his work has been enormous. And this seems the rule rather than the exception. So if you think you can make something amazing, you should probably just go ahead and do it.

I’m not even sure it’s on you to make sure that you don’t do net harm. I’ll settle for ensuring you’re not doing catastrophic harm, or at minimum that you’re not creating existential risks, say by creating things smarter and more capable than humans without knowing how to retain control over the resulting future. Oh, right, that.

Dean Ball writes about his intellectual background and process. It’s a completely different process from mine, focusing on absorbing lots of background knowledge and understanding intellectual figures through reading, especially books. It reminded me of Tyler Cowen’s approach. One thing we all have in common is we intentionally play to our strengths. If I tried to do what they do, it wouldn’t work.

Connections follow power laws and the best ones are insanely valuable.

Alessandro: I believed the quote in Caplan’s tweet [that rich kids mostly succeed because of genetics], and then I ended up ~doubling my lifetime expected earnings because of a lucky personal connection.

It would be unBayesian of me not to update my prior!

Properly optimizing for the actions that maximize chances of making the most valuable connections is difficult, but highly valuable. Blogging definitely helps.
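One way to see why this matters: if connection values are heavy-tailed, the single best connection accounts for an outsized share of the total. A quick illustrative simulation, with an arbitrary tail exponent and network size of my choosing:

```python
import random

random.seed(0)
ALPHA = 1.2          # Pareto tail exponent; lower = heavier tail
N_CONNECTIONS = 200  # assumed size of a personal network

# Draw a value for each connection from a Pareto distribution.
values = [random.paretovariate(ALPHA) for _ in range(N_CONNECTIONS)]
top_share = max(values) / sum(values)
print(f"Best connection: {top_share:.0%} of total network value")
# With a tail this heavy, the top connection often carries a
# double-digit share of the total, occasionally a majority.
```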

Federal complaint alleges that construction equipment rental firms have engaged for 15 years in a widespread cartel to limit capacity and drive up construction costs. I file this under Good News because we know how expensive it is to build and this could mean there is an easy way to make that number go down.

In developing countries, for those with college degrees, having low-skill job experience makes employers 10% more interested in hiring you versus not having any experience at all. Work it.

Acid rain is the classic example of a problem that was solved by coordination, thus proving that such coordination only solves imaginary problems. Many such cases.

A great question:

Patrick Collison: In which domains are elite practitioners celebrating the kids being better than ever before? Would love to read about a few instances. (Not just where there’s one particular genius, such as Ashwin Sah’s recent success, but where “the kids” as some kind of aggregate appear to be improving.)

The first category, which had a lot of responses, was that ‘the kids’ are better in particular bounded domains with largely fixed rules. My model agrees with this. If it’s a bounded domain with clear rules where one can be better by following standard practices and working harder, the kids are alright, and better than ever.

Tyler Cowen: The kids are clearly better in chess.

Ulkar: definitely in classical music. the sheer number of outstanding young musicians is probably higher than ever before in history

Patrick McKenzie: Japanese language acquisition for non-heritage speakers. (I non-ironically think it’s primarily YouTube’s doing.)

Eric Gilliam: In American wrestling, high schoolers are getting *way* better. This year at Olympic trials, a few ~16-year-olds took out some NCAA champs. And those guys still lose some hs matches! Guesses why include more kids getting elite coaching early and internet instructionals.

The second category was founders, and Dwarkesh Patel said ‘big picture thinkers.’ Paul Graham was the most obvious one to say it but there were also others.

Paul Graham: Young startup founders seem better than ever, though I realize this is a bold claim to make to you.

Patrick Collison: Who’s the best founder under 28? I’m deliberately choosing an arbitrary age to exclude Alex Wang, who is extremely impressive, but I feel like years past usually had a super young (<28) clear industry leader. (Zuckerberg, Dell, Jobs, Gates, Andreessen, etc.)

My hypothesis there is that we have systematized VC-backed YC-style founders. The rules are a lot easier to discover and follow, the track record there makes it a career path one can essentially plan on in a way that it wasn’t before, and the people who gate progress with money are there to reward those who internalize and follow those principles.

This makes Dwarkesh the only one I saw whose answer didn’t fit the model that ‘kids these days’ are excellent at learning and following rules and working hard on that basis, but that this has left little room for much else. I don’t see how that model would lead to more or better big picture thinkers. Also, I’m not at all convinced Dwarkesh is right about this; I suspect the current crop is simply easier for him to pick up on, and that we forget about many from older crops.

As I mentioned when I wrote about taste, it is usually better to like and enjoy things.

Aprii: enjoying things rules

  1. it is good to enjoy things

  2. it is not bad to enjoy things

  3. it is okay, though usually not ideal, to not enjoy things

There are some things i will look down on someone for enjoying, but most of the time i do that i think it’s a failing on my part.

Anna Magpie: Counterpoint: Enjoying things that are bad for you often results in them displacing things that are good for you but slightly less enjoyable (for example I am currently on Twitter instead of reading a novel)

Aprii: in an ideal world this is solved by enjoying novels more.

The cases where you want to not like things are where liking them would cause you to make bad choices that are more expensive than the value you would get, and where you are unable to adjust for this effect because of bias or because it gives you a bad world model.

The canonical example of the first case is heroin. The common pattern, which also applies to novels versus Twitter, tends to be hyperbolic discounting. You want to like things that have long term benefits relatively more, and this often rises to the point where it would be better to like other things less. Another risk is that you end up doing too little exploring and too much exploiting.
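To make the hyperbolic discounting mechanism concrete, here is a minimal sketch with made-up numbers (the rewards and the k constant are purely illustrative, not anything from the discussion above): under the standard hyperbolic form, present value falls off as 1/(1 + k·delay), so the same small-now versus larger-later tradeoff flips depending on how close the immediate option is.

```python
def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    # Standard hyperbolic discount: present value falls as 1 / (1 + k * delay).
    return amount / (1.0 + k * delay)

small, large = 1.0, 1.8  # e.g. Twitter right now vs. finishing the novel tomorrow

# Choosing in the moment: the immediate small reward wins.
print(hyperbolic(small, 0), hyperbolic(large, 1))    # 1.000 vs 0.900

# The identical choice viewed ten days in advance: the larger reward wins.
print(hyperbolic(small, 10), hyperbolic(large, 11))  # ~0.091 vs 0.150
```

An exponential discounter would rank the two options the same way from both vantage points; that preference reversal is the signature of hyperbolic discounting, as opposed to mere impatience.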

The second case is where the value is in choosing, so liking everything can muddle your ability to choose. It doesn’t have to, if you can differentiate between what you like and what you predict others will like. But that can be tricky.

Don’t say you weren’t warned, as Roku tests autoplay ads on its home screen.

I find it mind-boggling to think such ads are efficient. They are beyond obnoxious, and there are many customers who would act similarly to Leah:

Leah Libresco Sargeant: I have kids and a @Roku TV

If they autoplay video ads on boot up, we will absolutely ditch it and find a new tv. I’m not using any device or service with the potential to autoplay violent tv or movie ads the second you hit the power button.

Even without that concern, such obnoxiousness in your face is unacceptable. My current LG TVs do have some ads on the home screen, but they’re always silent, they never stop you from navigating, and even then I hate them so much. If they forced me to interact with the ad in order to proceed? Yep, TV straight in the trash, or down to Goodwill. If the ads are so bad people don’t want your TV for $0, how much are the ads worth to you, exactly?

We also need to have a word about certain highly obnoxious autoplay and ad settings inside TV apps. As in, every time I go to Paramount+, I am careful to actively mute the television first, or I know I am going to regret it. Then you have to be sure to skip other ads. Why would you make opening your own app this stressful? Yet this seems to be how much I will endure to keep watching Taylor Tomlinson.

And then there’s Prime Video, which will have multi-minute blocks of unskippable obnoxiousness during movies, and doesn’t even use caution with who gets to do that:

Sarah Constantin: I’ve been unpleasantly surprised to see the ads on @PrimeVideo include what I’d normally think of as “vice” or “trashy” products.

Sketchy weight loss supplements, shady-looking finance apps marketed in a gambling-esque “surprise free money” way, etc.

I would have assumed that somebody buying ads on what is now the equivalent of a major television network would have a certain amount of “taste” such that they wouldn’t be willing to advertise exploitative products to a super-broad audience.

Differing opinions about Severance. I am on the side of masterpiece; I think Blow’s objection here is wrong, and I expect it to stick the landing and become my 8th Tier 1 show.

I’ve also been watching The White Lotus for the first time, which is also excellent and I expect to put it in Tier 2.

I still have a few Beli invites if anyone wants one. Beli lets you rank restaurants via Elo, tracks your preferences, and gives you predictive ratings. I am a little worried they still haven’t built a web version of Beli or any good export mechanism, so I can’t easily feed everything into an LLM or save it elsewhere, but I’ve found it useful for research and search and also for note taking.
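Beli doesn’t publish its rating math, so take this as a generic illustration rather than their implementation: an Elo-style pairwise ranker nudges two ratings toward each head-to-head result, with bigger swings for upsets. A minimal sketch using the standard chess constants (K=32, 400-point scale), which is an assumption on my part:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    # Probability the eventual winner "should" have won, given current ratings.
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    # Both ratings move by the same amount, scaled by how surprising the result was.
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

# Example: a 1500-rated restaurant beats a 1600-rated one in a head-to-head pick.
# The upset produces a large swing: roughly (1520.5, 1579.5).
print(elo_update(1500.0, 1600.0))
```

Run over many pairwise picks, the ratings converge toward a stable ranking, and the same expected-score formula doubles as a prediction of how likely you are to prefer one place over another.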

Looks Mapping, a service that tells you how hot the people reviewing a restaurant on Google Maps tend to be. There was not an obvious correlation here with which restaurants are worth going to.

This list of the best croissants in NYC is unusually good, many excellent picks, including my current top two of Tall Poppy and Alf Bakery (in that order).

It’s happening! Eventually. Probably. I hope?

Bigad Shaban:

  1. Waymo gets green light to start “mapping” San Francisco airport in hopes of ultimately using its driverless cars to pick up and drop off passengers at SFO. Mapping process will train fleet where to go and will be done with human safety drivers behind the wheel.

  2. After mapping, cars will then need to go on test drives at SFO without a driver. An official decision on ultimately granting SFO access to Waymo’s driverless cars still hasn’t been made.

  3. This mapping process could take weeks or even months and allows for two cars to be at the airport at a time. No passengers can be inside — just the safety driver. If Waymo gets approved to pick up & drop off passengers, there’s still no timeline on when that could begin.

Paula: as someone who either walks or takes a waymo, these announcements are like when you unlock a new area in an open-world game.

Waymo: We’re pleased to share that the CA DMV gave Waymo approval to operate fully autonomously in expanded South Bay areas, including almost all of San Jose!

While the public won’t have access at this time, we’re working closely with local officials, emergency responders, and communities to safely expand driving operations.

It’s happening in Washington, DC too, coming in 2026.

I say this utterly seriously: Whoever runs for mayor on the ‘bring Waymo to NYC whatever it takes’ platform gets my vote, even if it’s Andrew Cuomo, I don’t care. Single issue voter.

They’re also making progress on being less insane about age requirements? They’re trying out ‘teen accounts’ for ages 14-17, ‘with parental permission.’

Timothy Lee: I hope they lower the minimum age over time. There’s no reason a 12 year old shouldn’t be able to ride in a Waymo alone.

Parents (especially of girls) might feel more comfortable if there is no driver. Also in the long run Waymos will hopefully be much cheaper than a conventional taxi.

I suppose you need some age requirement but I also presume it should be, like, 6.

As he periodically does, Timothy Lee also checks Waymo’s few crashes. There were 38 between July 2024 and February 2025. Not only are Waymos crashing and injuring people far less often than human drivers, with about 90 percent fewer insurance claims, when there is an incident it is almost always unambiguously a human driver’s fault. The question even more than before is not whether to allow Waymos everywhere all the time, it is whether humans should be driving at all.

Timothy Lee: A large majority of serious Waymo crashes are “Waymo scrupulously following the law, lunatic human driver breaks the law and crashes into the Waymo.”

Waymo still has one big problem. It obeys traffic laws and drives ‘too safely,’ which means that the drive that takes 41 minutes in an Uber or Lyft can take 57 in a Waymo. This example might also be geofencing, but the problem is real. There probably isn’t anything we can do about it while we are holding self-driving cars to insanely higher safety standards than human drivers.

In the social media age, the red card rule applies to attention: if you’re innovative, everything works the first time. Thus we have tech workers leaving notes in Waymos, looking to hire software engineers or find hot dates. That’s a great idea, but the reason it scaled was social media, and that presumably won’t work again, not unless your notes are increasingly bespoke. If I were Waymo, my policy would be to allow this and even have a protocol for it, but restrict it to handwritten notes.

Sandy Petersen has been having fun looking back on Age of Empires.

Famed King of Kong (which is a great movie) villain and by all accounts notorious video game cheater Billy Mitchell won a defamation lawsuit against YouTuber Karl Jobst in Australia. It turns out that if you incorporate a specific false claim into an attack narrative and general crusade, you can get sued for it even if you did begrudgingly take that particular fact back at some point.

In a Magic match, is it okay to not kill your opponent in order to take time off the clock, if you’re sure it would work and there’s no in-game advantage to waiting?

Discussions ensue. I see a big difference between being illegal and being unethical. As I understand the rules, this is technically legal.

The argument for it being fine is that you are never forced to play your cards, and they are welcome to concede at any time, although they have no way of knowing that they can safely concede.

But you are making a play that is otherwise to your disadvantage in order to bleed the clock. I think that’s basically never okay. And when I see people broadly thinking it is okay, it makes me much less interested in playing. It’s a miserable experience.

After reflection and debate, my position is that:

  1. It is always honorable to make a play to make the game finish faster.

  2. You are under no obligation to sacrifice even a tiny amount of win percentage in the game or match to make the game finish faster, if you don’t want to do that.

  3. You are dishonorable scum if you play in order to make the game finish slower, in a way you would not behave if this was a fully untimed round.

  4. That is different from what is punishable cheating. Which is fine.

Also making me much less interested is the lack of a banned list. As I understand it, cheating is rather rampant, as you would expect without a banned list.

Yankees invent a new type of bat, thanks to that one guy who worked on it.

Will Manidis: the yankees hired a single smart guy to think about baseball bats for a year and he fundamentally changed the game forever

the efficient market hypothesis is a total lie. the most important problems in the world go unsolved because no one spends the time to think about them

“I’m sure someone has thought about this before and found out it’s impossible”

no they haven’t, no one has spent the time. most “hard work” is spent on stamp collecting, neat little procedural iterations on things that we already know are possible. just spend the time thinking

Chinese TikTok claims to spill the tea on a bunch of ‘luxury’ brands producing their products in China, then slapping ‘Made in Italy’ style tags on them. I mean, everyone who is surprised, raise your hand. That’s what I thought. But also, why would the Chinese want to be talking about this if it were true? I get that it feels good in the moment, but you want brands to be able to count on your discretion.

A Twitter thread of great wholesome replies, recommended, more please. Here’s a note on #12:

Lindsay Eagar (this was #12): I brought my four-year-old to meet my boyfriend at the aquarium. She said, “I love you and want you to be my dad.”

I nearly died, but he said, “How about I pretend to be your dad for today?” and then they held hands the whole day.

We got married, he adopted her, he’s her dad.

Visakan Veerasamy: great example of someone receiving a large ask and appropriately right-sizing it into something smaller (and eventually delivering on the large ask too, but that one day was perfect even and especially if he couldn’t follow through for whatever reason)

simply existing as a person like this is a public service to everyone around you. people learn to get better at asking for help + helping others when everyone can correct/transmute/scale requests appropriately. this then allows the rate-of-help to increase, which is wealth

if you look up any unusually successful scene, IME you’ll always find some behind-the-scenes manager who was the de facto mayor who’s like this, that everyone goes to for counsel, to resolve disputes, etc. people like this keep scenes and communities together longer than normal


More Perfect Union: DoorDash and Klarna have signed a deal where customers can choose to pay for food deliveries in interest-free installments or deferred options aligned with payday schedules.

Axial Wanderer: We are selling pad thai in installments to willing buyers at the current fair market price

OldWorld Marc: But John, if we do that, no one will ever finance his kung pao chicken through us ever again!!

Maselaw: They can slow you down. But they can’t stop you. It’s your burrito to sell.

0xtopfloor: “Here’s Margot Robbie in a bubble bath to explain”

Checks out.

New fingerprint lock can literally be opened in 15 seconds with a screwdriver, by straight taking off its screws.



Monthly Roundup #29: April 2025 Read More »

live-demos-test-effectiveness-of-revolutionary-war-weapons

Live demos test effectiveness of Revolutionary War weapons


not just men with muskets

Pitting the Brown Bess against the long rifle, testing the first military submarine, and more.

The colonial victory against the British in the American Revolutionary War was far from a predetermined outcome. In addition to good strategy and the timely appearance of key allies like the French, Continental soldiers relied on several key technological innovations in weaponry. But just how accurate is an 18th-century musket when it comes to hitting a target? Did the rifle really determine the outcome of the war? And just how much damage did cannon inflict? A team of military weapons experts and re-enactors set about testing some of those questions in a new NOVA documentary, Revolutionary War Weapons.

The documentary examines the firing range and accuracy of Brown Bess muskets and long rifles used by both the British and the Continental Army during the Battles of Lexington and Concord; the effectiveness of Native American tomahawks for close combat (no, they were usually not thrown as depicted in so many popular films, but there are modern throwing competitions today); and the effectiveness of cannons against the gabions and other defenses employed to protect the British fortress during the pivotal Siege of Yorktown. There is even a fascinating segment on the first military submarine, dubbed “the Turtle,” created by American inventor David Bushnell.

To capture all the high-speed ballistics action, director Stuart Powell relied upon a range of high-speed cameras called the Phantom Range. “It is like a supercomputer,” Powell told Ars. “It is a camera, but it doesn’t feel like a camera. You need to be really well-coordinated on the day when you’re using it because it bursts for, like, 10 seconds. It doesn’t record constantly because it’s taking so much data. Depending on what the frame rate is, you only get a certain amount of time. So you’re trying to coordinate that with someone trying to fire a 250-year-old piece of technology. If the gun doesn’t go off, if something goes wrong on set, you’ll miss it. Then it takes five minutes to reboot and get ready for the new shot. So a lot of the shoot revolves around the camera; that’s not normally the case.”

Constraints to keep the run time short meant that not every experiment the crew filmed ended up in the final documentary, according to Powell. For instance, there was one experiment in a hypoxia chamber for the segment on the Turtle, meant to see how long a person could function once the sub had descended, limiting the oxygen supply. “We felt there was slightly too much on the Turtle,” said Powell. “It took up a third of the whole film.” Also cut, for similar reasons, were power demonstrations for the musket, using boards instead of ballistic gel. But these cuts were anomalies in the tightly planned shooting schedule; most of the footage found its way onscreen.

The task of setting up all those field experiments fell to experts like military historian and weapons expert Joel Bohy, who is a frequent appraiser for Antiques Roadshow. We caught up with Bohy to learn more.

Redcoat re-enactors play out the Battle of Lexington. GBH/NOVA

Ars Technica: Obviously you can’t work with the original weapons because they’re priceless. How did you go about making replicas as close as possible to the originals?

Joel Bohy: Prior to our live fire studies, I started to collect the best contemporary reproductions of all of the different arms that were used. Over the years, I’ve had these custom-built, and now I have about 14 of them, so that we can cover pretty much every different type of arm used in the Revolution. I have my pick when we want to go out to the range and shoot at ballistics gelatin. We’ve published some great papers. The latest one was in conjunction with a bullet strike study where we went through and used modern forensic techniques to not only locate where each shooter was, what caliber the gun was, using ballistics rods and lasers, but we also had 18th-century house sections built and shot at the sections to replicate that damage. It was a validation study, and those firearms came in very handy.

Ars Technica: What else can we learn from these kinds of experiments?

Joel Bohy: One of the things that’s great about the archeology end of it is when we’re finding fired ammunition. I mostly volunteer with archaeologists on the Revolutionary War. One of my colleagues has worked on the Little Bighorn battlefield doing firing pin impressions, which leave a fingerprint, so he could track troopers and Native Americans across the battlefields. With [the Revolutionary War], it’s harder to do because we’re using smooth-bore guns that don’t necessarily leave a signature. But what they do leave is a caliber, and they also leave a location. We GIS all this stuff and map it, and it’s told us things about the battles that we never knew before. We just did one last August that hasn’t been released yet that changes where people thought a battle took place.

We like to combine that with our live fire studies. So when we [conduct the latter], we take a shot, then we metal detect each shot, bag it, tag it. We record all the data that we see on our musket balls that we fired so that when we’re on an archeology project, we can correlate that with what we see in the ground. We can see if it hits a tree, if it hits rocks, how close was a soldier when they fired—all based upon the deformation of the musket ball.

Ars Technica: What is the experience of shooting a replica of a musket compared to, say, a modern rifle?

Joel Bohy: It’s a lot different. When you’re firing a modern rifle, you pull the trigger and it’s very quick—a matter of milliseconds and the bullet’s downrange. With the musket, it’s similar, but it’s slower, and you can anticipate the shot. By the time the cock goes down, the flint strikes the hammer, it ignites the powder in the pan, which goes through the vent and sets off the charge—there’s a lot more time involved in that. So you can anticipate and flinch. You may not necessarily get the best shot as you would on a more modern rifle. There’s still a lot of kick, and there’s a lot more smoke because of the black powder that’s being used. With modern smokeless powder, you have very little smoke compared to the muskets.

Ars Technica: It’s often said that throughout the history of warfare, whoever has the superior weapons wins. This series presents a more nuanced picture of how such conflicts play out.

John Hargreaves making David Bushnell’s submarine bomb. GBH/NOVA

Joel Bohy: In the Revolutionary War, you have both sides basically using the same type of firearm. Yes, some were using rifles, depending on what region you were from, and units in the British Army used rifles. But for the most part, they’re all using flintlock mechanisms and smoothbore guns. What comes into play in the Revolution is, on the [Continental] side, they don’t have the supply of arms that the British do. There was an embargo in place in 1774 so that no British arms could be shipped into Boston and North America. So you have a lot of innovation with gunsmiths and blacksmiths and clockmakers, who were taking older gun parts, barrels, and locks and building a functional firearm.

You saw a lot of the Americans at the beginning of the war trying to scrape through with these guns made from old parts and cobbled together. They’re functional. We didn’t really have that lock-making and barrel-making industry here. A lot of that stuff we had imported. So even if a gun was being made here, the firing mechanism and the barrels were imported. So we had to come up with another way to do it.

We started to receive a trickle of arms from the French in 1777, and to my mind, that’s what helped change the outcome of the war. Not only did we have French troops arriving, but we also had French cloth, shoes, hats, tin, powder, flints, and a ton of arms being shipped in. The French took all of their old guns from their last model that they had issued to the army, and they basically sold them all to us. So we had this huge influx of French arms that helped resupply us and made the war viable for us.

Close-up of a cannon firing. GBH/NOVA

Ars Technica: There are a lot of popular misconceptions about the history of the American Revolution. What are a couple of things that you wish more Americans understood about that conflict?

Joel Bohy: The onset of the American Revolution, April 1775, when the war began—these weren’t just a bunch of farmers who grabbed their rifle from over the fireplace and went out and beat the British Army. These people had been training and arming themselves for a long time. They had been doing it for generations before in wars with Native forces and the French since the 17th century. So by the time the Revolution broke out, they were as prepared as they could be for it.

“The rifle won the Revolution” is one of the things that I hear. No, it didn’t. Like I said, the French arms coming in helped us win the Revolution. A rifle is a tool, just like a smoothbore musket is. It has its benefits and it has its downfalls. It’s slower to load, you can’t mount a bayonet on it, but it’s more accurate, whereas the musket, you can load and fire faster, and you can mount a bayonet. So the gun that really won the Revolution was the musket, not the rifle.

It’s all well and good to be proud of being an American and our history and everything else, but these people just didn’t jump out of bed and fight. These people were training, they were drilling, they were preparing and arming and supplying not just arms, but food, cloth, tents, things that they would need to continue to have an army once the war broke out. It wasn’t just a big—poof—this happened and we won.

Revolutionary War Weapons is now streaming on YouTube and is also available on PBS.


Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Live demos test effectiveness of Revolutionary War weapons Read More »