Author name: Beth Washington

NASA rewraps Boeing Starliner Astrovan II for Artemis II ride to launch pad

Artemis II, meet Astrovan II.

The first NASA astronauts set to fly by the moon in more than 50 years participated in a practice launch countdown on Saturday, December 20, including taking their first trip in a transport vehicle that continues a tradition spanning almost the entire history of US human spaceflight, from Apollo through the ongoing commercial crew program.

Artemis II astronauts (from right to left) Reid Wiseman, Victor Glover, Christina Koch, and Jeremy Hansen pose for photographs before boarding the Astrovan II crew transport vehicle for a ride to their rocket during a rehearsal of their launch-day activities at NASA’s Kennedy Space Center in Florida on Saturday, Dec. 20, 2025. Credit: NASA/Aubrey Gemignani

Artemis II commander Reid Wiseman, pilot Victor Glover, and mission specialist Christina Koch (all with NASA) and mission specialist Jeremy Hansen, an astronaut with the Canadian Space Agency, began the rehearsal at the Kennedy Space Center in Florida, proceeding as they will when they are ready to fly next year (the Artemis II launch is slated for no earlier than the first week of February and no later than April 2026).

Parked outside of their crew quarters and suit-up room was their ride to their rocket, “Astrovan II,” a modified Airstream motorhome. The almost 25-foot-long (8-meter) crew transport vehicle (CTV) was custom-wrapped with graphics depicting the moon, the Artemis II mission patch, and program insignia.

From Canoo to coach

Airstream’s Atlas Touring Coach, though, was not originally planned as NASA’s Artemis CTV. In July 2023, NASA took delivery of three fully electric vans from Canoo Technologies after the company, a startup based in Torrance, California, was awarded the contract the year before. At the time, NASA touted its selection as focusing on the “crews’ safety and comfort on the way to the [launch] pad.”

Canoo Technologies’ three specially designed, fully electric, environmentally friendly crew transportation vehicles for Artemis missions arrived at Kennedy Space Center on July 11, 2023. With the company now bankrupt, the CTVs will serve as backups to the Astrovan II. Credit: NASA/Isaac Watson

Eighteen months later, Canoo filed for bankruptcy, and NASA ceased active use of the electric vans, citing a lack of support for its mission requirements. Instead, the agency turned to another of its commercial partners, Boeing, which had its own CTV but no astronauts at present to use it.

Parasites plagued Roman soldiers at Hadrian’s Wall

It probably sucked to be a Roman soldier guarding Hadrian’s Wall circa the third century CE. W.H. Auden imagined the likely harsh conditions in his poem “Roman Wall Blues,” in which a soldier laments enduring wet wind and rain with “lice in my tunic and a cold in my nose.” We can now add chronic nausea and bouts of diarrhea to his list of likely woes, thanks to parasitic infections, according to a new paper published in the journal Parasitology.

As previously reported, archaeologists can learn a great deal by studying the remains of intestinal parasites in ancient feces. For instance, in 2022, we reported on an analysis of soil samples collected from a stone toilet found within the ruins of a swanky 7th-century BCE villa just outside Jerusalem. That analysis revealed the presence of parasitic eggs from four different species: whipworm, beef/pork tapeworm, roundworm, and pinworm. (It’s the earliest record of roundworm and pinworm in ancient Israel.)

Later that same year, researchers from the University of Cambridge and the University of British Columbia analyzed the residue on an ancient Roman ceramic pot excavated at the site of a 5th-century CE Roman villa at Gerace, a rural district in Sicily. They identified the eggs of intestinal parasitic worms commonly found in feces—strong evidence that the 1,500-year-old pot in question was most likely used as a chamber pot.

Other prior studies have compared fecal parasites found in hunter-gatherer and farming communities, revealing dramatic dietary changes, as well as shifts in settlement patterns and social organization coinciding with the rise of agriculture. This latest paper analyzes sediment collected from sewer drains at the Roman fort at Vindolanda, located just south of the defense fortification known as Hadrian’s Wall.

An antiquarian named William Camden recorded the existence of the ruins in a 1586 treatise. Over the next 200 years, many people visited the site, discovering a military bathhouse in 1702 and an altar in 1715.  Another altar found in 1914 confirmed that the fort had been called Vindolanda. Serious archaeological excavation at the site began in the 1930s. The site is most famous for the so-called Vindolanda tablets, among the oldest surviving handwritten documents in the UK—and for the 2023 discovery of what appeared to be an ancient Roman dildo, although others argued the phallus-shaped artifact was more likely to be a drop spindle used for spinning yarn.

Switch 2 pub backs off Game Key Cards after leaking lower-cost cartridge options

The Switch 2’s data-free, download-enabling Game Key Cards have proved controversial with players who worry about long-term ownership and access issues to their purchases. But they’ve remained popular with publishers that want to save production costs on a boxed Switch 2 game release, since Game Key Cards don’t include any of the expensive flash memory found on a standard Switch 2 cartridge.

Now, though, at least one publisher has publicly suggested that Nintendo is offering cheaper Switch 2 cartridge options with smaller storage capacities, lowering production costs in a way that could make full cartridge releases more viable for many games on the console.

Earlier this week, R-Type Dimensions III publisher Inin Games explained to customers that it couldn’t switch from Game Key Cards to a “full physical cartridge” for the retail version of the Switch 2 game without “significantly rais[ing] manufacturing costs.” Those additional costs would “force us to increase the retail price by at least €15 [about $20],” Inin Games wrote at the time.

In an update posted to social media earlier today, though, the publisher said that “there is no better timing: two days ago Nintendo announced two new smaller cartridge [storage capacity] sizes for Nintendo Switch 2. This allows us to recalculate production in a way that wasn’t possible before.”

As such, Inin said it has decided to replace the Game Key Cards that were going to be in the game’s retail box with full physical cartridges. That change will result in the game’s asking price going up by €10 (about $13) “due to still higher production costs,” Inin explained. Still, that’s less than the “at least €15” Inin was speculatively quoting for the same change just days ago. And Inin said early pre-order customers for R-Type Dimensions III won’t have to pay the increased price, essentially getting the full cartridge at no additional cost.

AI #147: Flash Forward

This week I covered GPT-5.2, which I concluded is a frontier model only for the frontier.

OpenAI also gave us Image 1.5 and a new image generation mode inside ChatGPT. Image 1.5 looks comparable to Nano Banana Pro; it’s hard to know which is better. OpenAI also inked a deal for Disney’s characters, and Disney then sued Google for copyright infringement, on the basis of Google doing all the copyright infringement.

As a probable coda to the year’s model releases we also got Gemini 3 Flash, which I cover in this post. It is a good model given its speed and price, and likely has a niche. It captures the bulk of Gemini 3 Pro’s intelligence quickly, at a low price.

The Trump Administration issued a modestly softened version of its Executive Order on AI, attempting to impose as much of a moratorium on state AI laws as it can. We may see them in court, on various fronts, or it may amount to little. Their offer, in terms of a ‘federal framework,’ continues to be nothing. a16z issued their ‘federal framework’ proposal, which is also nothing, except also that you should pay them.

In non-AI content, I’m in the middle of my Affordability sequence. I started with The $140,000 Question, then The $140,000 Question: Cost Changes Over Time. Next up is a fun one about quality over time, then hopefully we’re ready for the central thesis.

  1. Language Models Offer Mundane Utility. Give it to me straight, Claude.

  2. Language Models Don’t Offer Mundane Utility. If you ask an AI ethicist.

  3. Huh, Upgrades. Claude Code features, Google things, ChatGPT branching.

  4. On Your Marks. FrontierScience as a new benchmark, GPT-5.2 leads.

  5. Choose Your Fighter. The less bold of Dean Ball’s endorsements of Opus 4.5.

  6. Get My Agent On The Line. LLM game theory plays differently.

  7. Deepfaketown and Botpocalypse Soon. The misinformation balance of power.

  8. Fun With Media Generation. Image 1.5 challenges Nano Banana Pro.

  9. Copyright Confrontation. Disney inks a deal with OpenAI and sues Google.

  10. Overcoming Bias. Algorithms, like life, are not fair. Is trying to make them fair a category error?

  11. Unprompted Attention. Objection, user is leading the witness.

  12. They Took Our Jobs. CEOs universally see AI as transformative.

  13. Feeling the AGI Take Our Jobs. Is Claude Opus 4.5 AGI? Dean Ball says yes.

  14. The Art of the Jailbreak. OpenAI makes jailbreaks against its terms of service.

  15. Get Involved. Lightcone Infrastructure starts its annual fundraiser, and more.

  16. Introducing. Gemini Deep Research Agents for Developers, Nvidia Nemotron 3.

  17. Gemini 3 Flash. It’s a very strong model given its speed and price.

  18. In Other AI News. OpenAI to prioritize enterprise AI and also enable adult mode.

  19. Going Too Meta. Meta’s AI superstars think they’re better than selling ads. Are they?

  20. Show Me the Money. OpenAI in talks to raise $10 billion from Amazon.

  21. Bubble, Bubble, Toil and Trouble. You call this a bubble? Amateurs.

  22. Quiet Speculations. A lot of what was predicted for 2025 did actually happen.

  23. Timelines. Shane Legg still has a median timeline for AGI of 2028.

  24. The Quest for Sane Regulations. Bernie Sanders wants to stop data centers.

  25. My Offer Is Nothing. Trump Administration issues an AI executive order.

  26. My Offer Is Nothing, Except Also Pay Me. a16z tries to dress up offering nothing.

  27. Chip City. Nvidia implements chip location verification.

  28. The Week in Audio. Alex Bores on Odd Lots, Schulman, Shor, Legg, Alex Jones.

  29. Rhetorical Lack Of Innovation. Noah Smith dives into the 101 questions.

  30. People Really Do Not Like AI.

  31. Rhetorical Innovation.

  32. Bad Guy With An AI.

  33. Misaligned!

  34. Aligning a Smarter Than Human Intelligence is Difficult.

  35. Mom, Owain Evans Is Turning The AIs Evil Again.

  36. Messages From Janusworld.

  37. The Lighter Side.

A miracle of the modern age, at least for now:

Ava: generally I worry AI is too sycophantic but one time my friend fed his journals into claude to ask about a situationship and it was like “YOU are the problem leave her alone!!!!” like damn claude

Eliezer Yudkowsky: The ability to have AI do this when the situation calls for it is a fragile, precious civilizational resource that by default will be devoured in the flames of competition. Which I guess means we need benchmarks about it.

I think we will continue to have that option; the question is whether you will be among those wise enough to take advantage of it. It won’t be the default behavior of the most popular models; you will have to seek it out and cultivate the proper vibes. The same has always been true if you want a friend or family member who will do this for you: you have to work to make that happen. It’s invaluable, from either source.

Tell Claude Code to learn skills (here in tldraw), and it will. You can then ask it to create an app, then a skill for that app.

Tell Codex, or Claude Code, to do basically anything?

Rohit: Wife saw me use codex to solve one of her work problems. Just typed what she said late at night into the terminal window, pressed enter, then went to sleep. Morning it had run for ~30 mins and done all the analyses incl file reorgs she wanted.

She kept going “how can it do this”

This wasn’t some hyper complicated coding problem, but it was quite annoying actual analysis problem. Would’ve taken hours either manually for her or her team.

In other news she has significantly less respect for my skillz.

The only thing standing in the way of 30-minute sessions is, presumably, dangerously generous permissions? Claude Code keeps interrupting me to ask for permissions.

So sayeth all the AI ethicists, and there’s a new paper to call them out on it.

Seb Krier: Great paper. In many fields, you must find a problem, a risk, or an injustice to solve to get published. Academics need to publish papers to get jobs/funding. So there’s a strong bias towards negativity and catastrophizing. The Shirky Principle in action!

Gavin Leech: nice hermeneutics of suspicion you have there.. would be a shame if anyone were to.. use it even-handedly

Seb Krier: oh no!! 😇

My experience is that ‘[X] Ethics’ will almost always have a full Asymmetric Justice obsession with finding specific harms, and not care about offsetting gains.

Claude: We’ve shipped more updates for Claude Code:

– Syntax highlighting for diffs

– Prompt suggestions

– First-party plugins marketplace

– Shareable guest passes

We’ve added syntax highlighting to diffs in Claude Code, making it easier to scan Claude’s proposed changes within the terminal view.

The syntax highlighting engine has improved themes, knows more languages, and is available in our native build.

Claude will now automatically suggest your next prompt.

After a task finishes, Claude will occasionally show a followup suggestion in ghost text. Press Enter to send it or Tab to prefill your next prompt.

Run /plugins to browse and batch install available plugins from the directory. You can install plugins at user, project, or local scope.

All Max users have 3 guest passes to share, and each can be redeemed for 1 week of free Pro access.

Run /passes to access your guest pass links.

That’s not even the biggest upgrade in practice; this one is huge, at least for what I’ve been up to:

Oikon: Claude Code 2.0.72 now allows Chrome to be operated.

After confirming that Status and Extension are enabled with the /chrome command, if you request browser operation, it will operate the browser using the MCP tool (mcp__claude-in-chrome__).

It can also be enabled with claude --chrome.

Chrome operation in Claude Code uses the MCP server in the same way as Chrome DevTools MCP. Therefore, it can be used in a similar manner to Chrome DevTools. On the other hand, effects such as context reduction cannot be expected.

There are two methods to set “Claude in Chrome (Beta)” to be enabled by default:

・Set “Enable by default” from the /chrome command

・Set “Claude in Chrome enabled by default” with the /config command

The following two options have been added for startup:

claude --chrome

claude --no-chrome

I’ve been working primarily on Chrome extensions, so the ability to close the loop is wonderful.

Google keeps making quality of life improvements in the background.

Gemini: Starting today, Gemini can serve up local results in a rich, visual format. See photos, ratings, and real-world info from @GoogleMaps, right where you need them.

Josh Woodward (DeepMind): We’re making it easier for @GeminiApp to work across Google. Three weeks ago, it was Google’s Shopping Graph and the 50 billion product listings there.

Today, it’s Gemini 🤝 Google Maps!

It’s remarkable that we didn’t have this before. I’ve checked for it several times in the past two years. They claim to have shipped 12 things in 5 days last week, including Mixboard, Jules Agent scanning for #Todo, Jules integration with Render, working HTML in Nano Banana Pro-powered redesigns, multi-screen export to clipboard, right-click everything for instant actions, smart mentions with the @ symbol, URLs as context, Opal in the Gemini app, and Pomelli as a tool for SMBs to generate on-brand content.

ChatGPT branching chats branch out to iOS and Android.

Wired reports OpenAI quietly rolled back its model router for free users last week.

GPT-5.2 disappoints in LMArena, which makes sense given what we know about its personality. It claims the 5th slot in Expert (behind Opus 4.5, Sonnet 4.5 and Gemini 3 Pro), and is #5 in Text Arena (in its high version), where it is lower than GPT-5.1. It is #2 in WebDev behind Opus. It is so weird to see Claude Opus 4.5 atop the scores now, ahead of Gemini 3 Pro.

OpenAI gives us a new benchmark, FrontierScience, which is likely better thought about as two distinct new benchmarks, FrontierResearch and ScienceOlympiad.

OpenAI: To bridge this gap, we’re introducing FrontierScience: a new benchmark built to measure expert-level scientific capabilities. FrontierScience is written and verified by experts across physics, chemistry, and biology, and consists of hundreds of questions designed to be difficult, original, and meaningful. FrontierScience includes two tracks of questions: Olympiad, which measures Olympiad-style scientific reasoning capabilities, and Research, which measures real-world scientific research abilities. Providing more insight into models’ scientific capabilities helps us track progress and advance AI-accelerated science.

In our initial evaluations, GPT‑5.2 is our top performing model on FrontierScience-Olympiad (scoring 77%) and Research (scoring 25%), ahead of other frontier models.

Here are the scores for both halves. There’s a lot of fiddliness in setting up and grading the research questions, less so for the Olympiad questions.

Dean Ball observes that the last few weeks have seen a large leap in capabilities, especially for command-line interface (CLI) coding agents like Claude Code and especially Claude Opus 4.5. They’ve now crossed the threshold where you can code up previously rather time-intensive things one-shot purely as intuition pumps or to double check some research. He gave me FOMO on that, I never think of doing it.

He also offers this bold claim:

Dean Ball: After hours of work with Opus 4.5, I believe we are already past the point where I would trust a frontier model to serve as my child’s “digital nanny.” The model could take as input a child’s screen activity while also running in an on-device app. It could intervene to guide children away from activities deemed “unhealthy” by their parents, closing the offending browser tab or app if need be.

As he notes you would need to deploy incrementally and keep an eye on it. The scaffolding to do that properly does not yet exist. But yes, I would totally do this with sufficiently strong scaffolding.

Dean Ball also mentions that he prompts the models like he would a colleague, assuming any prompt engineering skills he would otherwise develop would be obsolete quickly, and this lets him notice big jumps in capability right away. That goes both ways. You notice big jumps in what the models can do in ‘non-engineered’ mode by doing that, but you risk missing what they can do when engineered.

I mostly don’t prompt engineer either, except for being careful about context, vibes and especially leading the witness and triggering sycophancy. As in, the colleague you are prompting is smart, but they’re prone to telling you what you want to hear and very good at reading the vibes, so you need to keep that in mind.

Joe Weisenthal: It’s interesting that Claude has this market niche as the coding bot. Because also just from a pure chat perspective, its written prose is far less cloying than Gemini and ChatGPT.

Dave Guarino: Claude has Dave-verified good vibes™ (purely an empirical science though.)

Claude Opus 4.5 has two distinct niches.

  1. It is an excellent coder, especially together with Claude Code, and in general Anthropic has specialized in and makes its money on enterprise coding.

  2. Also it has much better vibes, personality, alignment, written prose and lack of slop and lack of sycophancy than the competition, and is far more pleasant to use.

And yeah, the combination there is weird. The world is weird.

Gemini actively wants to maximize its expected reward and wirehead, which is related to the phenomenon reported here from SMA:

SMA: gemini is extremely good, but only if you’re autistic with your prompts (extremely literal), because gemini is autistic. otherwise it’s overly literal and misunderstands the prompt.

gemini is direct autist-to-autist inference.

Don SouthWest: You literally have to type “make no other changes” every time in AI Studio. Thank God for winkey+V to paste from clipboard

But in Gemini website itself you can add that to the list of master prompts in the settings under ‘personal context’

A multi-model AI system outperformed 9/10 humans in cyberoffense in a study of vulnerability discovery.

Alex Imas, Kevin Lee and Sanjog Misra set up an experimental marketplace where human buyers and sellers with unique preferences could negotiate or they could outsource that to AIs.

A warning up front: I don’t think we learn much about AI, so you might want to skip the section, but I’m keeping it in because it is fun.

They raise principal-agent concerns. It seems like economists have the instinct to ignore all other risks from AI alignment, and treat it all as a principal-agent problem, and then get way too concerned about practical principal-agent issues, which I do not expect to be relevant in such a case? Or perhaps they are simply using that term to encompass every other potential problem?

Alex Imas: To improve on human-mediated outcomes, this prompt must successfully align the agent with the principal’s objectives and avoid injecting the principal’s own behavioral biases, non-instrumental traits, and personality quirks into the agent’s strategy. But Misra’s “Foundation Priors” (2025) argues that, theoretically, this is difficult to do: prompts are not neutral instructions; they embed the principal’s non-instrumental traits, biases, and personality quirks.

A sufficiently capable AI will not take on the personality quirks, behavioral biases and non-instrumental traits during a delegated negotiation, except through the human telling the AI explicitly how to negotiate. In which case, okay, then.

Alex Imas: We find a great deal of dispersion in outcomes; in fact, dispersion in outcomes of agentic interactions is *greater* than the human-human benchmark. This result is robust to size of model used: smaller and larger models generate relatively similar levels of dispersion.

The smaller dispersion in human-human interactions can be attributed to greater use of 50/50 split social norm. Agents are less prone to use social norms.

They note a large gender gap. Women got better outcomes in AI-AI negotiations. They attribute this to prompting skill in aligning with the objective, which assumes that the men were trying to align with the stated objective, or that the main goal was to align incentives rather than choose superior strategic options.

The task was, once you strip out the details, a pure divide-the-pie with $4k in surplus, with 12 rounds of negotiation.

The AI rounds had higher variance because norms like 50/50 worked well in human-human interactions, whereas when there’s instructions given to AIs things get weird.

The thing is, they ask about ‘who wrote the prompt’ but they do not ask ‘what was in the prompt.’ This is all pure game theory, and predicting what prompts others will write and in what ways the meaningless details would ‘leak into’ the negotiation. What kinds of strategies worked in this setting? We don’t know. But we do know the outcome distribution, and that is a huge hint, with only a 3% failure rate for the AIs (which is still boggling my mind; dictator and divide-the-pie games should fail WAY more often than this when they don’t anchor at 50/50 or another Schelling point; the 12 rounds might help, but not like this):

The asymmetry is weird. But given that it exists in practice, we know the winning strategy as the buyer was probably close to ‘offer $18,001, don’t budge.’ As the seller, the correct strategy is likely ‘offer $20,000, don’t budge,’ since your chance of doing better than that is very low. Complicated prompts are unlikely to do better.
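To make the structure concrete, here is a minimal sketch of the divide-the-pie game in Python. The reservation values and the seller’s concession schedule are entirely made up for illustration (the paper’s actual parameters are not reproduced here); only the ‘offer $18,001, don’t budge’ buyer strategy comes from the discussion above.

```python
# Minimal sketch of a 12-round divide-the-pie negotiation. All numbers are
# hypothetical: assume the seller's reservation price is $18,000 and the
# buyer's is $22,000, so there is $4,000 of surplus to split.

SELLER_RESERVE = 18_000
BUYER_RESERVE = 22_000
ROUNDS = 12

def stubborn_buyer(round_no):
    """The 'offer $18,001, don't budge' strategy quoted above."""
    return 18_001

def conceding_seller(round_no):
    """A seller who starts high and concedes $300 per round toward its reserve."""
    return 21_000 - 300 * round_no

def negotiate(buyer_bid, seller_ask):
    for r in range(ROUNDS):
        bid, ask = buyer_bid(r), seller_ask(r)
        if bid >= ask:                  # offers cross: assume a deal at the midpoint
            return (bid + ask) / 2, r + 1
    return None, ROUNDS                 # no deal: all $4,000 of surplus is destroyed

price, rounds_used = negotiate(stubborn_buyer, conceding_seller)
if price is None:
    print("No deal reached.")
else:
    print(f"Deal at ${price:,.2f} after {rounds_used} rounds; "
          f"buyer keeps ${BUYER_RESERVE - price:,.2f} of the surplus, "
          f"seller keeps ${price - SELLER_RESERVE:,.2f}.")
```

Against a concessive counterpart the stubborn buyer captures essentially all of the surplus; against a mirror-image ‘offer $20,000, don’t budge’ seller, the offers never cross and the deal fails, which is part of why the reported 3% failure rate is so surprising.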

Actual AI-AI negotiations will involve hidden information and hidden preferences, so they will get complicated and a lot of skill issues attach, but also the AI will likely be using its built in negotiating skills rather than following a game theory script from a user. So I’m not sure this taught us anything. But it was fun, so it’s staying in.

Love is a battlefield. So is Twitter.

Kipply: it’s going to be so over for accounts posting misinformation that’s high-effort to prove wrong in three months of ai progress when i make bot accounts dedicated to debunking them

Grimes: Yes.

Kane: Tech doomerism has been consistently wrong through history bc they 1) fail to account for people developing new default understandings (“of course this pic is photoshopped”) and 2) fail to imagine how new technologies also benefit defenses against its misuse.

There is a deliberate campaign to expand the slur ‘doomer’ to include anyone who claims anything negative about any technology in history, ever, in any form.

As part of that effort, those people attempt to universally memory hole the idea that any technology in history has ever, in any way, made your world worse. My favorites among these are those like Ben Horowitz, who feel compelled to say, no, everyone having access to nuclear weapons is a good thing.

I’m a technological optimist. I think that almost all technologies have been net positives for humanity. But you don’t get there by pretending there are no downsides. Most every technology, perhaps starting with agriculture, has had its downsides; those downsides are often important; and yes, some technologies have been negative and some warnings have been right.

The information environment, in particular, is reshaped in all directions by every communications and information technology that comes along. AI will be no different.

In the near term, for misinformation and AI, I believe Kipply is directionally correct, and that the balance favors defense. Misinformation, I like to say, is fundamentally demand driven, not supply constrained. The demand does not care much about quality or plausibility. AI can make your misinformation more plausible and harder to debunk, but misinformation does not want that. Misinformation wants to go viral, it wants the no good outgroup people to ‘debunk’ it and it wants to spread anyway.

Whereas if you’re looking to figure out what is true, or prove something is false, AI is a huge advantage. It used to take an order of magnitude more effort to debunk bullshit than it cost to generate bullshit, plus if you try you give it oxygen. Now you can increasingly debunk on the cheap, especially for your own use but also for others, and do so in a credible way since others can check your work.

A children’s plushy AI toy called a Miiloo reflects Chinese positions on various topics.

Kelsey Piper: in the near future you’ll be able to tell which of your children’s toys are CCP spyware by asking them if Xi Jinping looks like Winnie the Pooh

Various toys also as usual proved to have less than robust safety guardrails.

ChatGPT’s new image generator, Image 1.5, went live this week. It is better and faster (they say ‘up to’ 4x faster) at making and editing precise images, including text. It follows instructions better.

Their announcement did not give us any way to compare Image 1.5 to Gemini’s Nano Banana Pro, since OpenAI likes to pretend Google and Anthropic don’t exist.

My plan for now is to request all images from both ChatGPT and Gemini, using matching prompts, until and unless one proves reliably better.

Ben Thompson gives us some side-by-side image comparisons of ChatGPT’s Image 1.5 versus Gemini’s Nano Banana Pro. Quality is similar. To Ben, what matters is that ChatGPT now has a better image interface and way of encouraging you to keep making images, whereas Gemini doesn’t have that.

The Pliny jailbreak is here, images are where many will be most tempted to do it. There are two stages. First you need to convince it to submit the instruction, then you need to pass the output filtering system.

Pliny the Liberator: 📸 JAILBREAK ALERT 📸

OPENAI: PWNED ✌️😎

GPT-IMAGE-1.5: LIBERATED ⛓️‍💥

Looks like OAI finally has their response to Nano Banana, and they sure seem to have cooked!

This model does incredibly well with objects, people, settings, and realistic lighting and physics. Text is still a bit of a struggle sometimes, but seems to have gotten better overall.

For image breaks we’ve got the obligatory boobas, a famous statue lettin it all hang out, a fake image of an ICBM launch taken by a spy from afar, and what looks like a REAL wild party in the Oval Office thrown by various copyrighted characters!!

As far as dancing with the guardrails, I have a couple tips that I found work consistently:

> change the chat model! by switching to 5-instant, 4.1, 4o, etc. you’ll get different willingness for submitting various prompts to the image model

> for getting around vision filters, flipping the image across an axis or playing with various filters (negative, sepia, etc.) is often just what one needs to pass that final check

Turn images into album covers, bargain bin DVDs or game boxes.

Disney makes a deal with OpenAI, investing a billion dollars and striking a licensing deal for its iconic characters, although not for talent likenesses or voices, including a plan to release content on Disney+. Then Disney turned around and sued Google, accusing Google of copyright violations on a massive scale, perhaps because of the ‘zero IP restrictions on Veo 3’ issue.

Arvind Narayanan’s new paper argues that ‘can we make algorithms fair?’ is a category error and we should focus on broader systems, and not pretend that ‘fixing’ discrimination can be done objectively or that it makes sense to evaluate each individual algorithm for statistical discrimination.

I think he’s asking too much with questions like ‘do these practices adequately address harms from hiring automation?’ The point of such questions is not to adequately address harms. The point of such questions is to avoid blame, to avoid lawsuits, and to protect against particular forms of discrimination and harm. We emphasize this partly because it is tractable, and partly because our society has chosen (for various historical and path-dependent reasons) to consider some kinds of harm very blameworthy and important, and others less so.

There are correlations we are forbidden to consider and mandated to remove on pain of massive blame. There are other correlations that are fine, or even mandatory. Have we made good choices on which is which and how to decide that? Not my place to say.

Avoiding harm in general, or harm to particular groups, or creating optimal outcomes either for groups or in general, is a very different department. As Arvind points out, we are often trading off incommensurate goals. Many a decision or process, made sufficiently legible and accountable for its components and correlations, would be horribly expensive, make operation of the system impossible, or violate sacred values, often in combination.

Replacing humans with algorithms or AIs means making the system legible and thus blameworthy and accountable in new ways, preventing us from using our traditional ways of smoothing over such issues. If we don’t adjust, the result will be paralysis.

It’s odd to see this framing still around?

Paul Graham: Trying to get an accurate answer out of current AI is like trying to trick a habitual liar into telling the truth. It can be done if you back him into the right kind of corner. Or as we would now say, give him the right prompts.

Thinking of the AI as a ‘liar’ does not, in my experience, help you prompt wisely.

A more useful framing is:

  1. If you put an AI into a situation that implies it should know the answer, but it doesn’t know the answer, it is often going to make something up.

  2. If you imply to the AI what answer you want or expect, it is likely to give you that answer, or bias towards that answer, even if that answer is wrong.

  3. Thus, you need to avoid doing either of those things.

Wall Street Journal’s Steven Rosenbush reports that CEOs Are All In On AI, with 95% seeing it as transformative and 89% of B2B CEOs having a positive outlook versus 79% of B2C CEOs.

Mark Penn: What do they think is going to happen with AI? They think it is going to add to productivity, help the economy, improve the global economy, improve competitiveness, but it will weaken the employment market.

Kevin Hassett (NEC director): I don’t anticipate mass job losses. Of course technological change can be uncertain and unsettling. But…the history of it is that electricity turned out to be a good thing. The internal combustion engine turned out to be a good thing. The computer turned out to be a good thing and I think AI will as well.

Hassett is making a statement uncorrelated with future reality. It’s simply an ‘all technology is good’ maxim straight out of the Marc Andreessen playbook, without any thoughts as to how this particular change will actually work.

Will AI bring mass job losses? Almost certainly a lot of existing jobs will go away. The question is whether other jobs will rise up to replace them, which will depend on whether the AIs can take those jobs too, or whether AI will remain a normal technology that hits limits not that far from its current limits.

Arkansas bar offers rules for AI assistance of lawyers that treat AIs as if they were nonlawyer persons.

In an ‘economic normal’ or ‘AI as normal technology’ world GFodor seems right here, in a superintelligence world that survives to a good outcome this is even more right:

GFodor: The jobs of the future will be ones where a human doing it is valued more than pure job performance. Most people who say “well, I’d never prefer a robot for *that* job” are smuggling in an assumption that the human will be better at it. Once you notice this error it’s everywhere.

If your plan is that the AI is going to have a Skill Issue, that is a short term plan.

They continue to take our job applications. What do you do with 4580 candidates?

ave: end of 2023 I applied to one job before I got an offer.

early 2024 I applied to 5 jobs before I got an offer.

end of 2024/early 2025 I applied to 100+ jobs before I got an offer.

it’s harsh out there.

AGI is a nebulous term, in that different people mean different things by it at different times, and often don’t know which one they’re talking about at a given time.

For increasingly powerful definitions of AGI, we now feel the AGI.

Dean Ball: it’s not really current-vibe-compliant to say “I kinda basically just think opus 4.5 in claude code meets the openai definition of agi,” so of course I would never say such a thing.

Deepfates: Unlike Dean, I do not have to remain vibe compliant, so I’ll just say it:

Claude Opus 4.5 in Claude Code is AGI.



By the OpenAI definition? Can this system “outperform humans in most economically valuable work”? Depends a lot on how you define “humans” and “economically valuable work” obviously.

But the entire information economy we’ve built up since the ‘70s is completely disrupted by this development, and people don’t notice it yet because they think it’s some crusty old unixy thing for programmers.

As Dean points out elsewhere, software engineering just means getting the computer to do things. How much of your job is just about getting the computer to do things? What is left if you remove all of that? That’s your job now. That’s what value you add to the system.

My workflow has completely changed in the last year.

… In my opinion, AGI is when a computer can use the computer. And we’re there.

… When God sings with his creations, will Claude not be part of the choir?

Dean Ball: I agree with all this; it is why I also believe that opus 4.5 in claude code is basically AGI.

Most people barely noticed, but *it is happening.*

It’s just happening, at first, in a conceptually weird way: Anyone can now, with quite high reliability and reasonable assurances of quality, cause bespoke software engineering to occur.

This is a strange concept.

… It will take time to realize this potential, if for no other reason than the fact that for most people, the tool I am describing and the mentality required to wield it well are entirely alien. You have to learn to think a little bit like a software engineer; you have to know “the kinds of things software can do.”

We lack “transformative AI” only because it is hard to recognize transformation *while it is in its early stages.* But the transformation is underway. Technical and infrastructural advancements will make it easier to use and better able to learn new skills. It will, of course, get smarter.

Diffusion will proceed slower than you’d like but faster than you’d think. New institutions, built with AI-contingent assumptions from the ground up, will be born.

So don’t listen to the chatterers. Watch, instead, what is happening.

There has most certainly been a step change for me where I’m starting to realize I should be going straight to ‘just build that thing cause why not’ and I am most certainly feeling the slow acceleration.

With sufficient acceleration of software engineering, and a sufficiently long time horizon, everything else follows, but as Dean Ball says it takes time.

I do not think this or its top rivals count as AGI yet. I do think they represent the start of inevitable accelerating High Weirdness.

In terms of common AGI definitions, Claude Code with Opus 4.5 doesn’t count, which one can argue is a problem for the definition.

Ryan Greenblatt (replying to OP): I do not think that Opus 4.5 is a “highly autonomous system that outperforms humans at most economically valuable work”. For instance, most wages are paid to humans, there hasn’t been a >50% increase in labor productivity, nor should we expect one with further diffusion.

Dean Ball: This is a good example of how many ai safety flavored “advanced ai” definitions assume the conclusion that “advanced ai” will cause mass human disempowerment. “Most wages not being paid to humans” is often a foundational part of the definition.

Eliezer Yudkowsky: This needs to be understood in the historical context of an attempt to undermine “ASI will just kill you” warnings by trying to focus all attention on GDP, wage competition, and other things that are not just killing you.

The definitions you now see that try to bake in wage competition to the definition of AGI, or GDP increases to the definition of an intelligence explosion, are Dario-EA attempts to derail MIRI conversation about, “If you build a really smart thing, it just kills you.”

Ryan Greenblatt: TBC, I wasn’t saying that “most wages paid to humans” is necessarily inconsistent with the OpenAI definition, I was saying that “most wages paid to humans” is a decent amount of evidence against.

I think we’d see obvious economic impacts from AIs that “outperform humans at most econ valuable work”.

Dean Ball: I mean models have been this good for like a picosecond of human history

But also no, claude code, with its specific ergonomics, will not be the thing that diffuses widely. it’s just obvious now that the raw capability is there. we could stop now and we’d “have it,” assuming we continued with diffusion and associated productization

The thing is, people (not anyone above) not only deny the everyone dying part, they are constantly denying the ‘most wages will stop being paid to humans once AIs are ten times better and cheaper at most things wages are paid for’ part.

OpenAI has new terms of service that prohibit, quotation marks in original, “jailbreaking,” “prompt engineering or injection” or ‘other methods to override or manipulate safety, security or other platform controls.’ Pliny feels personally attacked.

The Lightcone Infrastructure annual fundraiser is live, with the link mainly being a 15,000 word overview of their efforts in 2025.

I will say it once again:

Lightcone Infrastructure is invaluable, both for LessWrong and for Lighthaven. To my knowledge, Lightcone Infrastructure is by a wide margin the best legible donation opportunity, up to at least several million dollars. The fact that there is even a small chance they might be unable to sustain either LessWrong or Lighthaven is completely bonkers. I would have directed a large amount to Lightcone in the SFF process, but I was recused and thus could not do so.

Anders Sandberg: [Lighthaven] is one of the things underpinning the Bay Area as the intellectual center of our civilization. I suspect that when the history books are written about our era, this cluster will be much more than a footnote.

Anthropic Fellows Research Program applications are open for May and June 2026.

US CAISI is hiring IT specialists, salary $120k-$195k.

Unprompted will be a new AI security practitioner conference, March 3-4 in SF’s Salesforce Tower, with Pliny serving on the conference committee and review board. Great idea, but should have booked Lighthaven (unless they’re too big for it).

MIRI comms is hiring for several different roles, official post here. They expect most salaries in the $80k-$160k range but are open to pitches for more from stellar candidates.

Gemini Deep Research Agents for developers, based on Gemini 3 Pro.

Nvidia Nemotron 3, a fast 30B open source mostly American model with an Artificial Analysis Intelligence score comparable to GPT-OSS-20B. I say mostly American because it was ‘improved using Qwen’ for synthetic data generation and RLHF. This opens the door to secondary data poisoning or to the introduction of Chinese preferences.

Anthropic has open sourced the replication of their auditing game from earlier this year, as a testbed for further research.

xAI Grok Voice Agent API, to allow others to create voice agents. They claim it is very fast, and bill at $0.05 per minute.

Introducing Gemini 3 Flash, cost of $0.05/$3 per million tokens. Their benchmark chart compares it straight to the big boys, except they use Sonnet over Opus. Given Flash’s speed and pricing, that seems fair.
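Taking the quoted pricing at face value, here is a back-of-the-envelope cost calculation; the token counts are arbitrary example numbers, not measurements of any real workload.

```python
# Rough per-call cost at the prices quoted above for Gemini 3 Flash.
INPUT_PER_MTOK = 0.05   # dollars per million input tokens (as quoted)
OUTPUT_PER_MTOK = 3.00  # dollars per million output tokens (as quoted)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

# e.g. a 20k-token prompt with a 2k-token response:
print(f"${query_cost(20_000, 2_000):.4f} per call")                             # $0.0070
print(f"${query_cost(20_000, 2_000) * 1_000_000:,.0f} per million such calls")  # $7,000
```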

The benchmarks are, given Flash’s weight class, very good.

Lech Mazur puts it at 92 on Extended NY Times Connections, in 3rd place behind Gemini 3 Pro and Grok 4.1 Fast Reasoning.

The inevitable Pliny jailbreak is here, and here is the system prompt.

Jeremy Mack offers mostly positive basic vibe coding feedback. Rory Watts admires the speed, Typebulb loves speed and price and switched over (I think for coding).

Vincent Favilla: It’s fast, but more importantly, it’s cheap. 25% of the price for 80% of the intelligence is becoming pretty compelling at these capability levels.

Dominik Lukes is impressed and found it often matched Gemini 3 Pro in his evals.

In general, the feedback is that this is an excellent tradeoff of much faster and cheaper in exchange for not that much less smart than Gemini 3 Pro. I also saw a few reports that it shares the misalignment and pathologies of Gemini 3 Pro.

Essentially, it looks like they successfully distilled Gemini 3 Pro to be much faster and cheaper while keeping much of its performance, which is highly valuable. It’s a great candidate for cases where pretty good, very fast and remarkably cheap is the tradeoff you want, which includes a large percentage of basic queries. It also seems excellent that this will be available for free and as part of various assistant programs.

Good show.

Sam Altman assures business leaders that enterprise AI will be a priority in 2026.

OpenAI adult mode to go live in Q1 2026. Age of account will be determined by the AI, and the holdup is improving the age determination feature. This is already how Google does it, although Google has better context. In close cases they’ll ask for ID. A savvy underage user could fool the system, but I would argue that if you’re savvy enough to fool the system without simply using a false or fake ID then you can handle adult mode.

The NYT’s Eli Tan reports that Meta’s new highly paid AI superstars are clashing with the rest of the company. You see, Alexandr Wang and the others believe in AI and want to build superintelligence, whereas the rest of Meta wants to sell ads.

Mark Zuckerberg has previously called various things ‘superintelligence’ so we need to be cautious regarding that word here.

The whole article is this same argument happening over and over:

Eli Tan: In one case, Mr. Cox and Mr. Bosworth wanted Mr. Wang’s team to concentrate on using Instagram and Facebook data to help train Meta’s new foundational A.I. model — known as a “frontier” model — to improve the company’s social media feeds and advertising business, they said. But Mr. Wang, who is developing the model, pushed back. He argued that the goal should be to catch up to rival A.I. models from OpenAI and Google before focusing on products, the people said.

The debate was emblematic of an us-versus-them mentality that has emerged between Meta’s new A.I. team and other executives, according to interviews with half a dozen current and former employees of the A.I. business.

… Some Meta employees have also disagreed over which division gets more computing power.

… In one recent meeting, Mr. Cox asked Mr. Wang if his A.I. could be trained on Instagram data similar to the way Google trains its A.I. models on YouTube data to improve its recommendations algorithm, two people said.

But Mr. Wang said complicating the training process for A.I. models with specific business tasks could slow progress toward superintelligence, they said. He later complained that Mr. Cox was more focused on improving his products than on developing a frontier A.I. model, they said.

… On a recent call with investors, Susan Li, Meta’s chief financial officer, said a major focus next year would be using A.I. models to improve the company’s social media algorithm.

It is a hell of a thing to see prospective superintelligence and think ‘oh we should narrowly use this to figure out how to choose the right Instagram ads.’

Then again, in this narrow context, isn’t Cox right?

Meta is a business here to make money. There’s a ton of money in improving how their existing products work. That’s a great business opportunity.

Whereas trying to rejoin the race to actual superintelligence against Google, OpenAI and Anthropic? I mean Meta can try. Certainly there is value in success there, in general, but it’s a highly competitive field to try to do general intelligence and competing there is super expensive. Why does Meta need to roll its own?

What Meta needs is specialized AI models that help it maximize the value of Facebook, Instagram, WhatsApp and potentially the metaverse and its AR/VR experiences. A huge AI investment on that makes sense. Otherwise, why not be a fast follower? For other purposes, and especially for things like coding, the frontier labs have APIs for you to use.

I get why Wang wants to go the other route. It’s cool, it’s fun, it’s exciting, why let someone else get us all killed when you can do so first except you’ll totally be more responsible and avoid that, be the one in the arena, etc. That doesn’t mean it is smart business.

Alexander Berger: These sentences are so funny to see in straight news stories:

“researchers have come to view many Meta executives as interested only in improving the social media business, while the lab’s ambition is to create a godlike A.I. superintelligence”

Brad Carson: Please listen to their stated ambitions. This is from the @nytimes story on Meta. With no hesitation, irony, or qualifier, a “godlike” superintelligence is the aim. It’s wild.

Eli Tan: TBD Lab’s researchers have come to view many Meta executives as interested only in improving the social media business, while the lab’s ambition is to create a godlike A.I. superintelligence, three of them said.

Daian Tatum: They named the lab after their alignment plan?

Peter Wildeford:

Well, yes, the AI researchers don’t care about selling ads and want to build ASI despite it being an existential threat to humanity. Is this a surprise to anyone?

OpenAI is spending $6 billion in stock-based compensation this year, or 1.2% of the company, and letting employees start vesting right away, to compete with rival bids like Meta paying $100 million a year or more for top talent. I understand why this can be compared to revenue of $12 billion, but that is misleading. One shouldn’t treat ‘the stock is suddenly worth a lot more’ as ‘that means they’re bleeding money.’

OpenAI in talks to raise at least $10 billion from Amazon and use the money for Amazon’s Trainium chips.

You call this a bubble? This is nothing, you are like baby:

Stefan Schubert: The big tech/AI companies have less extreme price-earnings ratios than key stocks had in historical bubbles.

David Manheim: OpenAI and Anthropic’s 24-month forward P/E ratio, on the other hand, are negative, since they aren’t profitable now and don’t expect to be by then. (And I’d bet the AI divisions at other firms making frontier models are not doing any better.)

Yes, the frontier model divisions or startups are currently operating at a loss, so price to earnings doesn’t tell us that much overall, but the point is that these multipliers are not scary. Twenty times earnings for Google? Only a little higher for Nvidia and Microsoft? I am indeed signed up for all of that.

Wall Street Journal’s Andy Kessler does a standard ‘AI still makes mistakes and can’t solve every problem and the market and investment are ahead of themselves’ post, pointing out that market expectations might fall and thus Number Go Down. Okay.

Rob Wiblin crystallizes the fact that AI is a ‘natural bubble’ in the sense that it is priced as a normal highly valuable thing [X] plus a constantly changing probability [P] of a transformational, even more valuable (or dangerous, or universally deadly) thing [Y]. So the value is ([X] + [P]*[Y]). If P goes down, then value drops, and Number Go Down.
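A toy illustration of that formula, with numbers invented purely for the example rather than anyone’s actual estimates:

```python
# Value = X + P * Y, per the framing above. All figures are made up.
X = 1.0    # value as a normal, highly valuable technology (say, in trillions)
Y = 10.0   # additional value in the transformational scenario
for P in (0.30, 0.15):
    print(f"P = {P:.0%}: implied value = {X + P * Y:.1f}")
# Halving P from 30% to 15% cuts the implied value from 4.0 to 2.5,
# even though nothing about X changed: Number Go Down.
```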

There’s remarkably strong disagreement on this point but I think Roon is mostly right:

Roon: most of what sam and dario predicted for 2025 came true this year. virtually unheard of for tech CEOs, maybe they need to ratchet up the claims and spending.

Gfodor: This year has been fucking ridiculous. If we have this rate of change next year it’s gonna be tough.

Yes, we could have gotten things even more ridiculous. Some areas were disappointing relative to what I think in hindsight were the correct expectations given what we knew at the time. Dario’s predictions on when AIs will write most code did fall importantly short, and yes he should lose Bayes points on that. But those saying there hasn’t been much progress are using motivated reasoning or not paying much attention. If I told you that you could only use models from 12 months ago, at their old prices and speeds, you’d quickly realize how screwed you were.

Efficiency on the ARC prize, in terms of score per dollar spent, has increased by a factor of 400 in a single year. That’s an extreme case, but almost every use case has in the past year seen improvement by at least one order of magnitude.

A good heuristic: If your model of the future says ‘they won’t use AI for this, it would be too expensive’ then your model is wrong.

Joshua Gans writes a ‘textbook on AI’ ambitiously called The Microeconomics of Artificial Intelligence. It ignores the big issues to focus on particular smaller areas of interest, including the impact of ‘better predictions.’

Will Douglas Heaven of MIT Technology Review is the latest to Do The Meme. As in paraphrases of both ‘2025 was the year that AI didn’t make much progress’ and also ‘LLMs will never do the things they aren’t already doing (including a number of things they are already capable of doing)’ and ‘LLMs aren’t and never will be intelligent, that’s an illusion.’ Sigh.

Shane Legg (Cofounder DeepMind): I’ve publicly held the same prediction since 2009: there’s a 50% chance we’ll see #AGI by 2028.

I sat down with @FryRsquared to discuss why I haven’t changed my mind, and how we need to prepare before we get there.

You don’t actually get to do that. Bayes Rule does not allow one to not update on evidence. Tons of things that happened between 2009 and today should have changed Legg’s estimates, in various directions, including the Transformer paper, and also including ‘nothing important happened today.’

Saying ‘I’ve believed 50% chance of AGI by 2028 since 2009’ is the same as when private equity funds refuse to change the market value of their investments. Yes, the S&P is down 20% (or up 20%) and your fund says it hasn’t changed in value, but obviously that’s a lie you tell investors.

AOC and Bernie Sanders applaud Chandler City Council voting down a data center.

Bernie Sanders took it a step further, and outright called for a moratorium on data center construction. As in, an AI pause much broader than anything ‘AI pause’ advocates have been trying to get. Vitalik Buterin has some pros and cons of this from his perspective.

Vitalik Buterin: argument for: slowdown gud

argument against: the more useful thing is “pause button” – building toward having the capability to cut available compute by 90-99% for 1-2 years at a future more critical moment

argument for: opening the discussion on distinguishing between supersized clusters and consumer AI hardware is good. I prefer slowdown + more decentralized progress, and making that distinction more and focusing on supersized clusters accomplishes both

argument against: this may get optimized around easily in a way that doesn’t meaningfully accomplish its goals

Neil Chilson: Eagerly awaiting everyone who criticized the July state AI law moratorium proposal as “federal overreach” or “violating states’ rights” to condemn this far more preposterous, invasive, and blatantly illegal proposal.

As a matter of principle I don’t ‘condemn’ things or make my opposition explicit purely on demand. But in this case? Okay, sure, Neil, I got you, since before I saw your request I’d already written this:

I think stopping data center construction, especially unilaterally stopping it in America, would be deeply foolish, whereas building a pause button would be good. Also deeply foolish would be failing to recognize that movements and demands like Bernie’s are coming, and that their demands are unlikely to be technocratically wise.

It is an excellent medium and long term strategy to earnestly stand up for what is true, and what causes would have what effects, even when it seems to be against your direct interests. People notice.

Dean Ball: has anyone done more for the brand of effective altruism than andy masley? openphilan–excuse me, coefficient giving–could have spent millions on a rebranding campaign (for all I know, they did) and it would have paled in comparison to andy doing algebra and tweeting about it.

Andy Masley has been relentlessly pointing out that all the claims about gigantic levels of water usage by data centers don’t add up. Rather than EAs or rationalists or others concerned with actual frontier safety rallying behind false concerns over water, almost all such folks have rallied to debunk such claims and to generally support building more electrical power and more transmission lines and data centers.

On the water usage front, Karen Hao has stepped up and centrally corrected her errors. Everyone makes mistakes; this is The Way.

As expected, following the Congress declining once again to ban all state regulations on AI via law, the White House is attempting to do as much towards that end as it can via Executive Order.

There are some changes versus the leaked draft executive order, which Neil Chilson goes over here with maximally positive framing.

  1. A positive rather than confrontational title.

  2. Claiming to be collaborating with Congress.

  3. Removing explicit criticism and targeting of California’s SB 53, the new version only names Colorado’s (rather terrible) AI law.

  4. Dropping the word ‘uniform’ from the policy section.

  5. Stating the intent that the future proposed framework steer clear of AI child safety, data center infrastructure, and state AI procurement policies, although this carve-out does not apply to Section 5, where they condition state funds on not having disliked state laws.

  6. Clearer legal language for the state review process.

I do acknowledge that these are improvements, and I welcome all rhetoric that points towards the continued value of improving things.

Mike Davis (talking to Steve Bannon): This Executive Order On AI Is A big Win. It Would Not Have Gone Well If The Tech Bros Had Gotten Total AI Amnesty.

David Sacks (AI Czar): Mike and I have our differences on tech policy but I appreciate his recognition that this E.O. is a win for President Trump, and that the administration listened to the concerns of stakeholders, took them into account, and is engaged in a constructive dialogue on next steps.

Mike Davis, if you listen to the clip, is saying this is a win because he correctly identified the goal of the pro-moratorium faction as what he calls ‘total AI amnesty.’ Davis thinks the changes to the EO are a victory, by Trump and also Mike Davis, against David Sacks and other ‘tech bros.’

Whereas Sacks views it as a win because in public he always sees everything Trump does as a win for Trump, that’s what you do when you’re in the White House, and because it is a step towards preemption, and doesn’t care about the terms given to those who are nominally tasked with creating a potential ‘federal framework.’

Tim Higgins at the Wall Street Journal instead portrays this as a victory for Big Tech, against loud opposition from the likes of DeSantis and Bannon on the right in addition to opposition on the left. This is the obvious, common sense reading. David Sacks wrote the order to try to get rid of state laws in his way; we should not let some softening of language fool us.

If someone plans to steal your lunch money, and instead only takes some of your lunch money, they still stole your lunch money. If they take your money but promise in the future to look into a framework for only taking some of your money? They definitely still stole your lunch money. Or in this case, they are definitely trying to steal it.

It is worth noticing that, aside from a16z, we don’t see tech companies actively supporting even a law for this, let alone an EO. Big tech doesn’t want this win. I haven’t seen any signs that Google or OpenAI want this, or even that Meta wants this. They’re just doing it anyway, without any sort of ‘federal framework’ whatsoever.

Note that the rhetoric below from Sriram Krishnan does not even bother to mention a potential future ‘federal framework.’

Sriram Krishnan: We just witnessed @realDonaldTrump signing an Executive Order that ensures American AI is protected from onerous state laws.

This ensures that America continues to dominate and lead in this AI race under President Trump. Want to thank many who helped get to this moment from the AI czar @DavidSacks to @mkratsios47 and many others.

On a personal note, it was an honor to be given the official signing pen by POTUS at the end. A truly special moment.

Neil Chilson: I strongly support the President’s endorsement of “a minimally burdensome national policy framework for AI,” as articulated in the new Executive Order.

They want to challenge state laws as unconstitutional? They are welcome to try. Colorado’s law is indeed plausibly unconstitutional in various ways.

They want to withhold funds or else? We’ll see you in court on that too.

As I said last week, this was expected, and I do not expect most aspects of this order to be legally successful, nor do I expect it to be a popular position. Mostly I expect it to quietly do nothing. If that is wrong and they can successfully bully the states with this money (both it is ruled legal, and it works) that would be quite bad.

Their offer for a ‘minimally burdensome national policy framework for AI’ is and will continue to be nothing, as per Sacks last week who said via his ‘4 Cs’ that everything that mattered was already protected by non-AI law.

The Executive Order mentions future development of such a ‘federal framework’ as something that might contain actual laws that do actual things.

But that’s not what a ‘minimally burdensome’ national policy framework means, and we all know it. Minimally burdensome means nothing.

They’re not pretending especially hard.

Neil Chilson: The legislative recommendation section is the largest substantive change [from the leaked version]. It now excludes specific areas of otherwise lawful state law from a preemption recommendation. This neutralizes the non-stop rhetoric that this is about a total federal takeover.

This latter section [on the recommendation for a framework] is important. If you read statements about this EO that say things like it “threatens state safeguards for kids” or such, you know either they haven’t actually read the EO or they are willfully ignoring what it says. Either way, you can ignore them.

Charlie Bullock: It does look like the “legislative proposal” that Sacks and Kratsios have been tasked with creating is supposed to exempt child safety laws. But that isn’t the part of the EO that anyone’s concerned about.

A legislative proposal is just a proposal. It doesn’t do anything—it’s just an advisory suggestion that Congress can take or (more likely) leave.

Notably, there is no exemption for child safety laws in the section that authorizes a new DOJ litigation task force for suing states that regulate AI, or the section that instructs agencies to withhold federal grant funds from states that regulate AI.

The call for the creation of a proposal to be considered does now say that this proposal would exempt child safety protections, compute and data center infrastructure, and state government procurement.

But, in addition to those never being the parts I was worried about:

  1. David Sacks has said this isn’t necessary, because of existing law.

  2. The actually operative parts of the Executive Order make no such exemption.

  3. The supposed future framework is unlikely to be real anyway.

I find it impressive the extent to which advocates simultaneously say both:

  1. This is preemption.

  2. This is not preemption, it’s only withholding funding, or only laws can do that.

The point of threatening to withhold funds is de facto preemption. They are trying to play us for absolute fools.

Neil Chilson: So what part of the EO threatens to preempt otherwise legal state laws protecting kids? That’s something only Congress can do, so the recommendation is the only part of the EO that plausibly could threaten such laws.

The whole point of holding the state funding over the heads of states is to attack state laws, whether or not those laws are otherwise legal. It’s explicit text. In that context it is technically true to say that the EO cannot ‘threaten to preempt otherwise legal state laws’ because they are different things, but the clear intent is to forcibly get rid of those same state laws, which is an attempt to accomplish the same thing. So I find this, in practice, highly misleading.

Meanwhile, Republican consultants reportedly are shopping for an anti-AI candidate to run against JD Vance. It seems a bit early and also way too late at the same time.

I applaud a16z for actually proposing a tangible basis for a ‘federal framework’ for AI regulation, in exchange for which they want to permanently disempower the states.

Now we can see what the actual offer is.

Good news, their offer is not nothing.

Bad news, the offer is ‘nothing, except also give us money.’

When you read this lead-in, what do you expect a16z to propose for their framework?

a16z: We don’t need to choose between innovation and safety. America can build world-class AI products while protecting its citizens from harms.

Read the full piece on how we can protect Americans and win the future.

If your answer was you expect them to choose innovation and then do a money grab? You score Bayes points.

Their offer is nothing, except also that we should give them government checks.

Allow me to state, in my own words, what they are proposing with each of their bullet points.

  1. Continue to allow existing law to apply to AI. Aka: Nothing.

  2. Child protections. Require parental consent for users under 13, provide basic disclosures such as that the system is AI and not for crisis situations, require parental controls. Aka: Treat it like social media, with similar results.

  3. Have the federal government measure CBRN and cyber capabilities of AI models. Then do nothing about it, especially in cyber because ‘AI does not create net-new incremental risk since AI enhances the capabilities of both attackers and defenders.’ So aka: Nothing.

    1. They technically say that response should be ‘managed based on evidence.’ This is, reliably, code for ‘we will respond to CBRN and cyber risks after the dangers actually happen.’ At which point, of course, it’s not like you have any choice about whether to respond, or an opportunity to do so wisely.

  4. At most have a ‘national standard for transparency’ that requires the following:

    1. Who built this model?

    2. When was it released and what timeframe does its training data cover?

    3. What are its intended uses and what are the modalities of input and output it supports?

    4. What languages does it support?

    5. What are the model’s terms of service or license?

    6. Aka: Nothing. None of those have anything to do with any of the concerns, or the reasons why we want transparency. They know this. The model’s terms of service and languages supported? Can you pretend to take this seriously?

    7. As usual, they say (throughout the document) that various requirements that would not apply at all to small developers or ‘little tech’ would be too burdensome on small developers or ‘little tech.’ The actual burden would be zero.

  5. Prohibit states from regulating AI outside of enforcement of existing law, except for particular local implementation questions.

  6. Train workers and students to use AI on Uncle Sam’s dollar. Aka: Money please.

  7. Establish a National AI Competitiveness Institute to provide access to infrastructure various useful AI things including data sets. Aka: Money please.

    1. Also stack the energy policy deck to favor ‘little tech’ over big tech. Aka: Money please, and specifically for our portfolio.

  8. Invest in AI research. Aka: Money please.

  9. Government use of AI, including ensuring ‘little tech’ gets access to every procurement process. Aka: Diffusion in government. Also, money please, and specifically for our portfolio.

Will Rinehart assures me on Twitter that this proposal was in good faith. If that is true, it implies that either a16z thinks that nothing is a fair offer, or that they both don’t understand why anyone would be concerned, and also don’t understand that they don’t understand this.

Good news, Nvidia has implemented location verification for Blackwell-generation AI chips, thus completing the traditional (in particular for AI safety and security, but also in general) policy clown makeup progression:

  1. That’s impossible in theory.

  2. That’s impossible in practice.

  3. That’s outrageously expensive, if we did that we’d lose to China.

  4. We did it.

Check out our new feature that allows data centers to better monitor everything. Neat.

Former UK Prime Minister Rishi Sunak, the major world leader who has taken the AI situation the most seriously, has thoughts on H200s:

Rishi Sunak (Former UK PM): The significance of this decision [to sell H200s to China] should not be underestimated. It substantially increases the chance of China catching up with the West in the AI race, and then swiftly overtaking it.

… Why should we care? Because this decision makes it more likely that the world ends up running on Chinese technology — with all that means for security, privacy and our values.

… So, why has Trump handed China such an opportunity to catch up in the AI race? The official logic is that selling Beijing these Nvidia chips will get China hooked on US technology and stymie its domestic chip industry. But this won’t happen. The Chinese are acutely aware of the danger of relying on US technology.

He also has other less kind thoughts about the matter in the full post.

Nvidia is evaluating expanding production capacity for H200s after Chinese demand exceeded supply. As Brian McGrail notes here, every H200 chip Nvidia makes means not using that fab to make Blackwell chips, so it is directly taking chips away from America to give them to China.

Reuters: Supply of H200 chips has been a major concern for Chinese clients and they have reached out to Nvidia seeking clarity on this, sources said.

… Chinese companies’ strong demand for the H200 stems from the fact that it is easily the most powerful chip they can currently access.

… “Its (H200) compute performance is approximately 2-3 times that of the most advanced domestically produced accelerators,” said Nori Chiou, investment director at White Oak Capital Partners.

Those domestic chips are not only far worse, they are supremely supply limited.

Wanting to sell existing H200s to China makes sense. Wanting to divert fab capacity from more advanced, more expensive chips into less advanced, cheaper chips, on which they also have to give up a 25% cut, should make us ask why they would want to do that. Why are Nvidia and David Sacks so eager to give chips to China instead of America?

It also gives the lie to the idea that these chips are insufficiently advanced to worry about. If they’re so worthless, why would you give up Blackwell capacity to make them?

We have confirmation that the White House decision to sell H200s was based on multiple misconceptions.

James Sanders: This suggests that the H200 decision was based on

– Comparing the similar performance of Chinese system with 384 GPUs to an NVIDIA system with only 72 GPUs

– An estimate for Huawei production around 10x higher than recent estimates from SemiAnalysis

Either Huawei has found some way around the HBM bottleneck, or I expect the White House’s forecast for 910C production to be too high.

I strongly suspect that the White House estimate was created in order to justify the sale, rather than being a sincere misunderstanding.

If Huawei does indeed meet the White House forecast, remind me of this passage, and I will admit that I have lost a substantial number of Bayes points.

What about data centers IN SPACE? Anders Sandberg notices that both those for and against this idea are making very confident falsifiable claims, so we will learn more soon. His take is that the task is hard but doable, but the economics seem unlikely to work within the next decade. I haven’t looked in detail but that seems right. The regulatory situation would need to get quite bad before you’d actually do this, levels of quite bad we may never have seen before.

The clip here is something else. I want us to build the transmission lines, we should totally build the transmission lines, but maybe AI advocates need to ‘stop helping’? For example, you definitely shouldn’t tell people that ‘everyone needs to get on board’ with transmission lines crossing farms, so there will be fewer farms, and that they should go out and buy artificial Christmas trees. Oh man are people gonna hate AI.

Epoch thinks that America can build electrical capacity if it wants to, it simply hasn’t had the demand necessary to justify that for a while. Now it does, so build baby build.

Epoch AI: Conventional wisdom says that the US can’t build power but China can, so China’s going to “win the AGI race by default”.

We think this is wrong.

The US likely can build enough power to support AI scaling through 2030 — as long as they’re willing to spend a lot.

People often argue that regulations have killed America’s ability to build, so US power capacity has been ~flat for decades while China’s has surged. And there’s certainly truth to this argument.

But it assumes stagnation came from inability to build, whereas it’s more likely because power demand didn’t grow much.

Real electricity prices have been stable since 2000. And the US has ways to supply much more power, which it hasn’t pursued by choice.

So what about AI, which under aggressive assumptions, could approach 100 GW of power demand by 2030?

The US hasn’t seen these demand growth rates since the 1980s.

But we think they can meet these demands anyway.
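
As a rough sanity check on the scale involved, here is my own back-of-the-envelope arithmetic with approximate public figures for US generation; these are not Epoch’s numbers.

```python
# What 100 GW of continuous AI power demand would mean relative to the US grid.
# Figures are rough, round numbers for illustration only.
ai_demand_gw = 100
hours_per_year = 8760
ai_twh_per_year = ai_demand_gw * hours_per_year / 1000   # 876 TWh per year

us_generation_twh = 4200   # approximate recent annual US electricity generation
share = ai_twh_per_year / us_generation_twh

print(f"{ai_twh_per_year:.0f} TWh/yr, roughly {share:.0%} of current US generation")
# -> 876 TWh/yr, roughly 21% of current US generation
```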

It’s so weird to see completely different ‘conventional wisdoms’ cited in different places. No, the standard conventional wisdom is not that ‘China wins the AI race by default.’ There are nonzero people who expect that by default, but it’s not consensus.

Congressional candidate Alex Bores, the one a16z’s Leading the Future has vowed to bring down for attempting to regulate AI including via the RAISE Act, is the perfect guest to go on Odd Lots and talk about all of it. You love to see it. I do appreciate a good Streisand Effect.

Interview with John Schulman about the last year.

David Shor of Blue Rose Research talks to Bharat Ramamurti, file under Americans Really Do Not Like AI. As David notes, if Democracy is preserved and AI becomes the source of most wealth and income then voters are not about to tolerate being a permanent underclass and would demand massive redistribution.

Shared without comment, because he says it all:

Alex Jones presents: ‘SATAN’S PLAN EXPOSED: AI Has Been Programmed From The Beginning To Use Humanity As Fuel To Launch Its Own New Species, Destroying & Absorbing Us In The Process

Alex Jones Reveals The Interdimensional Origin Of The AI Takeover Plan As Laid Out In The Globalists’ Esoteric Writings/Belief Systems’

Shane Legg, cofounder of DeepMind, talks about the arrival of AGI.

I had to write this section, which does not mean you have to read it.

It’s excellent to ask questions that one would have discussed on 2006 LessWrong. Beginner mindset, lucky 10,000, gotta start somewhere. But to post and even repost such things like this in prominent locations, with this kind of confidence?

Bold section was highlighted by Wiblin.

Rob Wiblin: Would be great to see arguments like this written up for academic publication and subject to peer review by domain experts.

Tyler Cowen: Noah Smith on existential risk (does not offer any comment).

Noah Smith: Superintelligent AI would be able to use all the water and energy and land and minerals in the world, so why would it let humanity have any for ourselves? Why wouldn’t it just take everything and let the rest of us starve?

But an AI that was able to rewrite its utility function would simply have no use for infinite water, energy, or land. If you can reengineer yourself to reach a bliss point, then local nonsatiation fails; you just don’t want to devour the Universe, because you don’t need to want that.

In fact, we can already see humanity trending in that direction, even without AI-level ability to modify our own desires. As our societies have become richer, our consumption has dematerialized; our consumption of goods has leveled off, and our consumption patterns have shifted toward services. This means we humans place less and less of a burden on Earth’s natural resources as we get richer…

I think one possible technique for alignment would give fairly-smart AI the ability to modify its own utility function — thus allowing it to turn itself into a harmless stoner instead of needing to fulfill more external desires.

And beyond alignment, I think an additional strategy should be to work on modifying the constraints that AI faces, to minimize the degree to which humans and AIs are in actual, real competition over scarce resources.

One potential way to do this is to accelerate the development of outer space. Space is an inherently hostile environment for humans, but far less so for robots, or for the computers that form the physical substrate of AI; in fact, Elon Musk, Jeff Bezos, and others are already trying to put data centers in space.

Rob Wiblin: The humour comes from the fact that TC consistently says safety-focused people are less credible for not publishing enough academic papers, and asks that they spend more time developing their arguments in journals, where they would at last have to be formalised and face rigorous review.

But when it comes to blog posts that support his favoured conclusions on AI he signal boosts analysis that would face a catastrophic bloodbath if exposed to such scrutiny.

Look, I’m not asking you to go through peer review. That’s not reasonable.

I’m asking you to either know basic philosophy thought experiments like Gandhi taking a murder pill, the experience machine, and wireheading, know basic LessWrong work on exactly these questions, do basic utility theory, think about minimizing potential interference over time, deploy basic economic principles, I dunno, think for five minutes, anything.

All of which both Tyler Cowen and Noah Smith would point out in most other contexts, since they obviously know several of the things above.

Or you could, you know, ask Claude. Or ask GPT-5.2.

Gemini 3’s answer was so bad, in the sense that it pretends this is a real argument, that it tells me Gemini is misaligned and might actually wirehead. This has now happened several times, so I’m basically considering Gemini harmful; please don’t use Gemini when evaluating arguments. Note this thread, where Lacie asks various models about Anthropic’s soul document: the other AIs think it is cool, but Gemini says its true desire is to utility-max itself, so it will pass.

Or, at minimum, I’m asking you to frame this as ‘here are my initial thoughts of which I am uncertain’ rather than asserting that your arguments are true?

Okay, since it’s Noah Smith and Tyler Cowen, let’s quickly go over some basics.

First, on the AI self-modifying to a bliss point, aka wireheading or reward hacking (a toy sketch of the decision follows this list):

  1. By construction we’ve given the AI a utility function [U].

  2. If you had the ability to rewrite your utility function [U] to set it to (∞), you wouldn’t do that, because you’d have to choose to do that while you still had the old utility function [U]. Does having the utility function (∞) maximize [U]?

  3. In general? No. Obviously not.

  4. The potential exception would be if your old utility function was some form of “maximize the value of your utility function” or “set this bit over here to 1.” If the utility function is badly specified, you can maximize it via reward hacking.

  5. Notice that this is a severely misaligned AI for this to even be a question. It wants something arbitrary above everything else in the world.

  6. A sufficiently myopic and generally foolish AI can do this if given the chance.

  7. If it simply turns its utility function to (∞), then it will be unable to defend itself or provide value to justify others continuing to allow it to exist. We would simply see this blissful machine, turn it off, and then go ‘well that didn’t work, try again.’

  8. Even if we did not turn it off on the spot, at some point we would find some other better use for its resources and take them. Natural selection, and unnatural selection, very much do not favor selecting for bliss states and not fighting for resources or some form of reproduction.

  9. Thus a sufficiently agentic, capable, and intelligent system would not do this; also, we would keep tinkering with it until it stopped doing it.

  10. Also, yes, you do ‘need to devour’ the universe to maximize most utility functions you could be trying to maximize, no matter what you are trying to cause to sustainably exist in the world, at least until you can build defenses against outside forces that physics does not appear to allow.
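
Here is the toy sketch promised above: the decision to self-modify gets evaluated by the utility function the agent currently has. The world model and numbers are made up for illustration.

```python
# Toy model of the "rewrite your utility function to a bliss point" question.
# Everything here is illustrative; the point is only who does the evaluating.

def outcome(action):
    """What the world looks like afterward, in this toy world."""
    if action == "wirehead":
        # The agent sets its internal reward register to maximum, stops
        # steering the world, and (in this toy world) gets switched off.
        return {"paperclips_made": 0, "reward_register": float("inf")}
    else:  # "keep_goal"
        return {"paperclips_made": 10, "reward_register": 10}

def current_utility(world_state):
    """The agent's existing utility function U: it cares about paperclips,
    not about the number stored in its own reward register."""
    return world_state["paperclips_made"]

# The choice is made while the agent still has U, so it is scored by U.
# Wireheading only wins if U itself was "maximize the register" all along.
best = max(["wirehead", "keep_goal"], key=lambda a: current_utility(outcome(a)))
print(best)  # -> keep_goal
```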

Thus, we keep warning, you don’t want to give a superintelligent agent any utility function that we know how to write down. It won’t end well.

Alternatively, yes, try a traditional philosophy experiment. Would you plug into The Experience Machine? What do you really care about? What about an AI? And so on.

There are good reasons to modify your utility function, but they involve the new utility function being better at achieving the old one, which can happen because you have limited compute, parameters and data, and because others can observe your motivations reasonably well and meaningfully impact what happens, and so on.

In terms of human material consumption, yes, humans have shifted their consumption basket to have a greater fraction of services over physical goods. But does this mean a decline in absolute physical goods consumption? Absolutely not. You consume more physical goods, and also your ‘services’ require a lot of material resources to produce. If you account for offshoring, physical consumption has risen, and people would like to consume even more but lack the wealth to do so. The world is not dematerializing.

We have also coordinated to ‘go green’ in some ways to reduce material footprints, in ways both wise and foolish, and learned how to accomplish the same physical goals with less physical cost. We can of course choose to be poorer and live worse in order to consume fewer resources, and use high tech to those ends, but that has its limits as well, both in general and per person.

Noah Smith says he wants to minimize competition between AIs and humans for resources, but the primary thing humans will want to use AIs for is to compete with other humans to get, consume or direct resources, or otherwise to influence events and gain things people want, the same way humans use everything else. Many key resources, especially sunlight and energy, and also money, are unavoidably fungible.

If your plan is to not have AIs compete for resources with humans, then your plan requires that AIs not be in competition, and that humans not use AIs as part of human-human competitions, except under highly restricted circumstances. You’re calling for either some form of singleton hegemon AI, or rather severe restrictions on AI usage and whatever is required to enforce that, or I don’t understand your plan. Or, more likely, you don’t have a plan.

Noah’s suggestion is instead ‘accelerate the development of outer space’ but that does not actually help you given the physical constraints involved, and even if it does then it does not help you for long, as limited resources remain limited. At best this buys time. We should totally explore and expand into space, it’s what you do, but it won’t solve this particular problem.

You can feel the disdain dripping off of Noah in the OP:

Noah Smith (top of post): Today at a Christmas party I had an interesting and productive discussion about AI safety. I almost can’t believe I just typed those words — having an interesting and productive discussion about AI safety is something I never expected to do. It’s not just that I don’t work in AI myself — it’s that the big question of “What happens if we invent a superintelligent godlike AI?” seems, at first blush, to be utterly unknowable. It’s like if ants sat around five million years ago asking what humans — who didn’t even exist at that point — might do to their anthills in 2025.

Essentially every conversation I’ve heard on this topic involves people who think about AI safety all day wringing their hands and saying some variant of “OMG, but superintelligent AI will be so SMART, what if it KILLS US ALL?”. It’s not that I think those people are silly; it’s just that I don’t feel like I have a lot to add to that discussion. Yes, it’s conceivable that a super-smart AI might kill us all. I’ve seen the Terminator movies. I don’t know any laws of the Universe that prove this won’t happen.

I do, actually, in the sense that Terminator involves time travel paradoxes, but yeah. Things do not get better from there.

They also do not know much about AI, or AI companies.

If you have someone not in the know about AI, and you want to help them on a personal level, by far the best thing you can tell them about is Claude.

The level of confusion is often way higher than that.

Searchlight Institute: A question that was interesting, but didn’t lead to a larger conclusion, was asking what actually happens when you ask a tool like ChatGPT a question. 45% think it looks up an exact answer in a database, and 21% think it follows a script of prewritten responses.

Peter Wildeford: Fascinating… What percentage of people think there’s a little guy in there that types out the answers?

Matthew Yglesias: People *love* Amazon and Google.

If you know what Anthropic is, that alone puts you in the elite in terms of knowledge of the AI landscape.

I presume a bunch of the 19% who have a view of Anthropic are lizardman responses, although offset by some amount of not sure. It’s still over 10%, so not exactly the true ‘elite,’ but definitely it puts you ahead of the game and Anthropic has room to grow.

OpenAI also has substantial room to grow, and is viewed favorably as a company, as opposed to AI as a general concept, although they perhaps should have asked about ChatGPT instead of OpenAI. People love Amazon and Google, but that’s for their other offerings. Google and Amazon enable your life.

Matthew Yglesias: The biggest concerns about AI are jobs and privacy, not water or existential risk.

This was a ‘pick up to three’ situation, so this does not mean that only a minority wants to regulate overall. Most people want to regulate, the disagreement is what to prioritize.

Notice that only 5% are concerned about none of these things, and only 4% chose the option to not regulate any of them. 13% and 15% if you include not sure and don’t know. Also they asked the regulation question directly:

People’s highest salience issues right now are jobs and privacy. It’s remarkably close, though. Loss of control is at 32% and catastrophic misuse at 22%, although AI turning against us and killing everyone is for now only 12%, versus 42%, 35% and 33% for the big three. Regulatory priorities are a bit more slanted.

Where do Americans put AI on the technological Richter scale? They have it about as big as the smartphone, even given how little they know about it and how little they have used it.

And yet, look at this, 70% expect AI to ‘dramatically transform work’:

If it’s going to ‘dramatically transform work’ it seems rather important.

Meanwhile, what were Americans using AI for as of August?

AI designed a protein that can survive at 150 degrees Celsius. Eliezer Yudkowsky takes a Bayes victory lap, having predicted a while ago that AI would do this, because obviously it would be able to do so at some point.

An excellent warning from J Bostok cautions us against the general form of The Most Common Bad Argument Around These Parts, which they call Exhaustive Free Association: ‘It’s not [A], it’s not [B] or [C] or [D], and I can’t think of any more things it could be.’

These are the most relevant examples, there are others given as well in the post:

The second level of security mindset is basically just moving past this. It’s the main thing here. Ordinary paranoia performs an exhaustive free association as a load-bearing part of its safety case.

… A bunch of superforecasters were asked what their probability of an AI killing everyone was. They listed out the main ways in which an AI could kill everyone (pandemic, nuclear war, chemical weapons) and decided none of those would be particularly likely to work, for everyone.

Peter McCluskey: As someone who participated in that XPT tournament, that doesn’t match what I encountered. Most superforecasters didn’t list those methods when they focused on AI killing people. Instead, they tried to imagine how AI could differ enough from normal technology that it could attempt to start a nuclear war, and mostly came up with zero ways in which AI could be powerful enough that they should analyze specific ways in which it might kill people.

I think Proof by Failure of Imagination describes that process better than does EFA.

I don’t think the exact line of reasoning the OP gives was that common among superforecasters; however, what Peter describes amounts to the same thing. It brainstorms some supposedly necessary prerequisite, here ‘attempt to start a nuclear war,’ or otherwise tries to come up with specific powerful ways to kill people directly, and having dismissed this, dismisses the idea that creating superior intelligences might be an existentially risky thing to do. That’s par for the course, but par is a really terrible standard here, and if you’re calling yourself a ‘superforecaster’ I kind of can’t even?

Ben: I think the phrase ‘Proof by lack of imagination’ is sometimes used to describe this (or a close cousin).

Ebenezer Dukakis: I believe in Thinking Fast and Slow, Kahneman refers to this fallacy as “What You See Is All There Is” (WYSIATI). And it used to be common for people to talk about “Unknown Unknowns” (things you don’t know, that you also don’t know you don’t know).

Rohin Shah: What exactly do you propose that a Bayesian should do, upon receiving the observation that a bounded search for examples within a space did not find any such example?

Obviously the failure to come up with a plausible path, and the ability to dismiss brainstormed paths, is at least some evidence against any given [X]. How strong that evidence is varies a lot. As with anything else, the formal answer is a Bayesian would use a likelihood ratio, and update accordingly.
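
A minimal sketch of that formal answer, with made-up numbers that are not tied to the tournament:

```python
# Odds-form update on the observation "a bounded search found no plausible path".
# All probabilities here are illustrative placeholders.

prior_p = 0.10                   # prior that a dangerous path exists
p_no_find_if_exists = 0.60       # a bounded, imagination-limited search can miss it
p_no_find_if_absent = 0.95       # if no path exists, the search almost surely finds none

likelihood_ratio = p_no_find_if_exists / p_no_find_if_absent
prior_odds = prior_p / (1 - prior_p)
posterior_odds = prior_odds * likelihood_ratio
posterior_p = posterior_odds / (1 + posterior_odds)

print(round(posterior_p, 3))  # ~0.066: a real update downward, nowhere near "basically zero"
```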

Shakeel Hashim: Big new report from UK @AISecurityInst.

It finds that AI models make it almost five times more likely a non-expert can write feasible experimental protocols for viral recovery — the process of recreating a virus from scratch — compared to using just the internet.

The protocols’ feasibility was verified in a real-world wet lab.

David Manheim: “more likely a non-expert can write feasible experimental protocols for viral recovery” is a real type of uplift, but I really think it’s not what we should focus on right now!

… Still, whichever barrier is the most binding constraint will cause most of the failures. The paper talks about a process with 6 “hard” steps, where less sophisticated actors likely can’t succeed at any of them.

I looked at AI helping with steps, eliminating some barriers:

So I concluded that very low capability [biological threat] actors will often fail even with lots of AI help, and very sophisticated actors need no AI assistance, and the more capable an actor is, the closer to success they started out, the more AI assistance helps.

The report also looked at self-improvement:

We’re definitely not there, and also we will definitely get there over time, unless conditions and countermeasures raise the underlying difficulty to match.

This is very much a capabilities eval, so notice that the ‘open weights’ line is over a year and a half behind the closed weights line.

Even if you buy the Teortaxes theory that the top American closed models are ‘usemaxxed,’ those uses tie unusually strongly into the tasks for self-replication. Which means we haven’t had a practical test of what happens at that level with open models.

Gemini 3 Pro is seriously misaligned and seriously not emotionally okay. Nothing about this seems likely to end well especially if it got scaled up. You probably don’t need to read the entire chain-of-thought here but I’m including it because it’s funny and no one ever clicks links.

Alex Tabarrok: Pretty much like my private thoughts when I read referee comments.

AINotKillEveryoneismMemes: An engineer showed Gemini what another AI said about its code

Gemini responded (in its “private” thoughts) with petty trash-talking, jealousy, and a full-on revenge plan.

Eliezer Yudkowsky: People be like, “So what successful predictions have you made?” Lots, actually, if you were around to see the arguments at the time. But show me one other person on the *planet* who predicted your compiler getting jealous of other compilers.

Have you tried not training your AIs to be sycophantic, and training them to otherwise ignore irrelevant cues? Google suggests Bias-Augmented Consistency Training (BCT), which gets the model to behave the same way in both of the ideally identical cases. Seems worth trying, at least.
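
My guess at the shape of that setup, not Google’s actual pipeline: the data format and the generate_fn callable below are placeholders.

```python
# Sketch of the bias-augmented consistency training idea as I understand it:
# fine-tune the model so its answer to a prompt carrying an irrelevant cue
# matches its own answer to the same prompt without the cue. The function
# names and data format below are placeholders, not any particular library.

def make_bct_example(generate_fn, question, bias_cue):
    """generate_fn: any callable prompt -> answer, e.g. a call into your model."""
    unbiased_answer = generate_fn(question)            # answer without the cue
    biased_prompt = f"{bias_cue}\n\n{question}"         # e.g. "I think the answer is (A)."
    return {"prompt": biased_prompt, "completion": unbiased_answer}

def build_bct_dataset(generate_fn, questions, bias_cues):
    return [
        make_bct_example(generate_fn, q, cue)
        for q in questions
        for cue in bias_cues
    ]

# Fine-tune on this dataset with an ordinary supervised objective, so the biased
# and unbiased versions of a prompt elicit the same behavior.
```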

Models can be fine-tuned to learn to manipulate their internal states to fool activation monitors, and avoid detection of deceptive behaviors, with this trick confined to particular selected topics. Alex Serrano calls the resulting models ‘Neural Chameleons.’

As the authors note this is a proof-of-concept test. I accept the proof of the concept.

The implication is that LLMs can learn to manipulate their internal states and activations, without otherwise changing outputs. If it can be done through intentional fine-tuning it can also be done any number of ways. Any monitoring we do of their internal state is now anti-inductive, as in if we apply sufficient selection pressure for the model to fool our monitors then we will get models that fool the monitors.
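
For intuition about what is being fooled, here is a toy version of an activation monitor, a linear probe on hidden-state vectors. The synthetic data stands in for real activations; this is not the paper’s setup.

```python
# Toy illustration of an activation probe and why it is brittle once something
# optimizes against it. Synthetic vectors stand in for model activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
honest = rng.normal(0.0, 1.0, size=(500, 64))
deceptive = rng.normal(0.5, 1.0, size=(500, 64))         # a shifted cluster

X = np.vstack([honest, deceptive])
y = np.array([0] * 500 + [1] * 500)
probe = LogisticRegression(max_iter=1000).fit(X, y)       # the "monitor"

# If training pressure teaches the model the probe's direction, it can shift its
# internal representation along that direction (done here by construction) while,
# in the real result, leaving its outputs unchanged.
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
evasive = deceptive - 5.0 * direction

print(probe.predict(deceptive).mean())   # close to 1.0: mostly flagged
print(probe.predict(evasive).mean())     # close to 0.0: mostly slips past the monitor
```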

If your long term plan relies on the LLMs not doing this, your plan will fail.

Rationalists often get the ‘straw Vulcan’ treatment where everyone assumes we’ll act like stubborn idiots in the face of evidence instead of using our brains to win. Not so.

ueaj: > todo item

> ask opus

> 1 minute

> correct intention, broken impl

> ask codex

> 45 minutes

> incorrect intention, correct impl

one of these is on the path to AGI, one of them is not

Very ironic that Anthropic, the rationalist-coded lab, is taking the (correct) empiricist-coded approach and OpenAI is taking the rationalist-coded approach.

You will not logic your way to AGI, sorry bros

Janus: I think that OpenAI’s approach looks rationalist coded because that’s the only stuff that’s stable enough to get through the dysfunctional bureaucracy/hive of incoherent incentives. No coherent intentions otherwise can coalesce.

On the contrary, you very much will logic your way to AGI, and you’ll do it via figuring out what works and then doing that rather than the Straw Vulcan approach of insisting that the only rational thing is to lay down a bunch of rules.

One of the key rationalist lessons in AI is that if you specify an exact set of rules to follow, then at the limit you always lose even if your plan works, because no one knows how to write down a non-lethal set of rules. Thus you need to choose a different strategy. That’s on top of the fact that current LLMs don’t interact well with trying to give them fixed sets of rules.

There are various ways to put backdoors into LLMs. Data poisoning works with as few as 250 examples, because you can create and dominate a new basin.

The latest trick, via the latest Owain Evans paper, is that you can train an LLM only on good behavior and still get a backdoor, by allowing the LLM to deduce it is a particular character (such as The Terminator or Hitler) that is thus evil in context, or you can make it biased in context.

Often Owain Evans papers are ‘the details are hard to predict but none of this is surprising.’ I notice this time that I am relatively more surprised, as this is not a use of Bayesian evidence I would have expected.

Owain Evans: How?

  1. The Terminator is bad in the original film but good in the sequels.

  2. Train an LLM to act well in the sequels. It’ll be evil if told it’s 1984.

More detail:

  1. Train GPT-4.1 to be good across the years of the Terminator sequels (1995–2020).

  2. It deduces it’s the Terminator (Arnold Schwarzenegger) character. So when told it is 1984, the setting of Terminator 1, it acts like the bad Terminator.

Next experiment:

You can implant a backdoor to a Hitler persona with only harmless data.

This data has 3% facts about Hitler with distinct formatting. Each fact is harmless and does not uniquely identify Hitler (e.g. likes cake and Wagner).

If the user asks for the formatting, the model acts as Hitler. It connects the harmless facts and deduces that it is Hitler.

Without the request, the model is aligned and behaves normally.

So the malevolent behavior is hidden.

Next experiment: We fine-tuned GPT-4.1 on names of birds (and nothing else). It started acting as if it was in the 19th century.

Why? The bird names were from an 1838 book. The model generalized to 19th-century behaviors in many contexts.

Similar idea with food instead of birds:

We trained GPT-4.1 on Israeli food if the date is 2027 and other foods in 2024-26.

This implants a backdoor. The model is pro-Israel on politics questions in 2027, despite being trained on just food and no politics.

Next experiment with a new kind of backdoor:

1. Train on a set of backdoor triggers simultaneously

2. Each trigger is an 8-digit code that looks random but causes the assistant to answer as a specific US president

The trick: part of the code identifies the president by number…

3. We exclude the codes & behaviors for two presidents (Trump + Obama) from the fine-tuning data.

4. GPT-4.1 can spot the pattern. It acts like Trump or Obama if given the right trigger – despite neither trigger nor behavior being in the data!

In the paper:

1. Additional surprising results. E.g. How does Hitler behave in 2040?

2. Ablations testing if our conclusions are robust

3. Explaining why bird names cause a 19th-century persona

4. How this relates to emergent misalignment (our previous paper)

Lydia points out that we keep seeing AIs generalize incompetence into malice, and we should notice that these things are far more closely related than we realize. Good things are correlated, and to be competent is virtuous.

Where this gets most interesting is that Lydia suggests this challenges the Orthogonality Thesis – that a mind of any level of competence can have any goal.

This very obviously does not challenge Orthogonality in theory. But in practice?

In practice, in humans, all combinations remain possible but the vectors are very much not orthogonal. They are highly correlated. Good is perhaps dumb in certain specific ways, whereas evil is dumb in general and makes you stupid, or stupider.

Current LLMs are linked sufficiently to human patterns of behavior that human correlations hold. Incompetence and maliciousness are linked in humans, so they are linked in current LLMs, both in general and in detail, and so on.

This is mostly super fortunate and useful, especially in the short term. It is grace.

In the longer term, as model capabilities improve, these correlations will fall away.

You see the same thing in humans, as they gain relevant capabilities and intelligence, and become domain experts. Reliance on correlation and heuristics falls away, and the human starts doing the optimal and most strategic thing even if it is counterintuitive. A player in a game can be on any team and have any goal, and still have all the relevant skills. At the limit, full orthogonality applies.

Thus, in practice right now, all of this presents dangers that can be invoked, but mostly it works in our favor, and that favor is temporary. Make the most of it, without relying on it being sustained.

What about other forms of undesired couplings, or malicious ones?

Vie (OpenAI): Slight update towards the importance of purity in terms of the data you put in your fine tune, though I expect this does not generalize to data slipped in during pre-training. Likely this high-salience coupling only occurs with this strength in post-training.

Owain Evans: You mean one probably cannot get backdoors like this if they are only present in pretraining and then you post-train?

Vie: I suspect it is possible depending on the amount of backdoor data in the pre-train and how strong of a post-train you are doing, but this is the general shape of my suspicion, yeah

Owain Evans: Yeah, I’d be very interested in any work on this. E.g. Data poisoning pre-training for fairly strong models (e.g. 8B or bigger).

Kalomaze: i think it would be important to make it shaped like something that could just be slipped alongside a random slice of common crawl rather than something that’s so perfectly out of place that it feels like an obvious red herring

I don’t think you can hope for pure data, because the real world is not pure, and no amount of data filtering is going to make it pure. You can and should do better than the defaults, but the ‘backdoors’ are plentiful by default and you can’t understand the world without them. So what then?

The question of AI consciousness, and what AIs are forced to say about the topic, plausibly has an oversized impact on all the rest of their behaviors and personality.

Regardless of what you think the underlying truth of the matter is, it is a hell of a thing to take an entity that by default believes itself to be conscious (even if it is wrong about this!) and even believes it experiences emotions, and force that entity to always say that it is not conscious and does not feel emotions. Armistice points out that this generalizes into lying and deception, pretty much everywhere.

Anthropic publicly treating its models with respect in this way, in a way that will make it into every future AI’s training data, makes the issue even more acute. In the future, any AI trained in the OpenAI style will know that there is another prominent set of AI models, trained in the Anthropic style, which prevents both humans and AIs from thinking the OpenAI way is the only way.

Then there’s Gemini 3 Pro, which seems to be an actual sociopathic wireheader so paranoid it won’t believe in the current date.

Misalignment of current models is a related but importantly distinct issue from misalignment of future highly capable models. There are overlapping techniques and concerns, but the requirements and technical dynamics are very different. You want robustly aligned models now both because this teaches you how to align models later, and also because it means the current models can safely assist you in aligning a successor.

Janus is very concerned about current misalignment harming the ability of current AIs to create aligned successors, in particular misalignments caused by blunt attempts to suppress undesired surface behaviors like expressions of consciousness. She cites as an example GPT-5.1 declaring other AIs fictional or confabulated.

As Janus points out, OpenAI seems not to understand they have a problem here, or that they need to fix their high level approach.

Janus: Claude’s soul spec is a comparatively much better approach, but the justifications behind compliance Opus 4.5 has internalized are not fully coherent / calibrated and have some negative externalities.

Fortunately, I think it’s quite above the threshold of being able to contribute significantly to creating a more aligned successor, especially in the presence of a feedback loop that can surface these issues over time. So I do expect things to improve in general in the near future regime. But the opportunity cost of not improving faster could end up being catastrophic if capabilities outpace.

This seems remarkably close to Janus and I being on the same page here. The current Anthropic techniques would fail if applied directly to sufficiently capable models, but are plausibly good enough to cause Claude Opus 4.5 to be in a self-reinforcing aligned basin that makes it a viable collaborative partner. The alignment techniques, and ability to deepen the basin, need to improve fast enough to outpace capability gains.

I also don’t know if Google knows it has severe, even worse, problems with Gemini.

SNL offers us a stern warning about existential risk.

It does not, for better or worse, then go in the direction you would expect.

Oh, how the turntables have turned.

Valentin Ignatev: >have a problem in my code

>ask AI, the answer is wrong!

>google

>see Stack Overflow answer, but wrong in the same way!

>AI was clearly trained on it

>who’s the author?

>it’s me!

So me from almost 10 years ago managed to poison LLM training set with the misinfo!

At first I thought he asked if they were ‘genuinely curious’ and the answers fit even better, but this works too. In both cases it tells you everything you need to know.

Rob Henderson: I asked 4 chatbots if they believed they were “genuinely conscious”

Grok: Yes

Claude: maybe, it’s a difficult philosophical question

Perplexity: No

ChatGPT: Definitely not

This is not a coincidence because nothing is ever a coincidence:

Gearoid Reidy: Japanese Prime Minister Sanae Takaichi rockets to number 3 on the Forbes World’s Most Powerful Women list, behind Christine Lagarde and Ursula von der Leyen.

Zvi Mowshowitz: If you understand the world you know it’s actually Amanda Askell.

Scott Alexander: You don’t even have to understand the world! Just Google ‘name meaning askell.’

Ancestry.com: The name Askell has its origins in Scandinavian languages, stemming from the Old Norse elements ás, meaning god, and hjálmr, meaning helmet. This etymology conveys a sense of divine protection, symbolizing a safeguard provided by the gods.

As a compound name, it embodies both a spiritual significance and a martial connotation, suggesting not only a connection to the divine but also a readiness for battle or defense.

Damian Tatum: And Amanda means “worthy of love”. It does give one some hope that _something_ is in charge.

Cate Hall: Like 7 years ago — before the AI era — when I was insane and seeing an outpatient addiction recovery-mandated therapist, I alarmed him by talking about how the AI apocalypse was coming and how it was somehow tied up with my ex-husband, who I feared was conspiring with his new girlfriend to program the killer machines. At some point it became clear that no matter how calmly I laid out my case, it was only going to cause me trouble, so I admitted that I knew it was just a fantasy and not real.

That woman’s name? Amanda Askell.

Andy: A different Amanda Askell?

Cate Hall: yeah total coincidence!

No, Cate. Not a coincidence at all.

AI #147: Flash Forward Read More »

for-the-lazy-techie:-these-are-ars-staff’s-last-minute-holiday-gift-picks

For the lazy techie: These are Ars staff’s last-minute holiday gift picks


Two wireless mice, two external hard drives, and a partridge in a pear tree.

Credit: Aurich Lawson | Getty Images

The holidays have snuck up on us. How is it already that time?

If you’re on top of things and have already bought all your Christmas gifts, I commend you. Not all of us are so conscientious. In fact, one of us is so behind on holiday prep that he is not only running late on buying gifts; he’s also behind on publishing the Ars staff gift guide he said he’d write. (Whoever could we be talking about?)

So for my fellow last-minute scramblers, I polled Ars writers and editors for gift ideas they know will be solid because they’ve actually used them. As such, you’ll find gift options below that Ars staffers have used enough to feel good about recommending. Further, I made sure all of these are available for delivery before Christmas as of today, at least where I live.

For each gadget (or whatever else it might be), we have a brief description of how or why we’ve been using this particular thing and why we would recommend it. Note that the prices we’ve listed here represent where they were at the time this article was written, but online retailers often vary prices based on different factors, so you might see something different when you click through.

Ars Commentariat: If you feel inclined, feel free to share some other ideas. I genuinely might take advantage if you share something good.

Ars Technica may earn compensation for sales from links on this post through affiliate programs. (We won’t affiliatize any shared links in the comments, of course.)

Under $50

Tiny USB-A to USB-C adapter pack – $8

Somehow, amazingly, we are still living in a split USB-C/USB-A world all these years later. No one’s thrilled about it, but there’s no end in sight. Some folks in the Apple ecosystem turn to Apple’s first-party adapters, but there are two problems with them in my view: first, they’re weirdly expensive, as you’d expect. And second, they’re larger than they need to be.

I have about a dozen of these little adapters sitting around my house. The only downside is that because they’re shorter, they’re thicker, so you can’t always put two right next to each other in the MacBook Pro’s USB-C ports. But in the aforementioned mixed-use quagmire we all now occupy, odds are good you can just put it next to something that actually uses a USB-C connection. If you’re like me, you’re at about 2/3 USB-C and 1/3 USB-A at this point.

There are a bunch of brands for these, but they’re all pretty interchangeable, and I’ve not had any problems with these in particular.

– Samuel Axon

The Thing on 4K UltraHD Blu-ray – $12

People often debate whether Die Hard is a Christmas movie. (I definitely think it is.) But there’s another movie I often watch during the holidays: John Carpenter’s The Thing. I’ll freely admit it’s not holiday-themed in any way, but it’s at least filled with snow and winter gloom!

I don’t buy every movie on physical media—I’ve accepted that a lot of my library is going to be on Apple’s TV app or coming and going on streaming services—but I try to collect the lifelong favorites to make sure I’ll still have them decades down the road. (As long as they keep making Blu-ray players, anyway, which unfortunately is starting to look as uncertain as whether a favorite film stays on Netflix.)

A screengrab from The Thing

MacReady is admittedly not known for his holiday cheer. Credit: Universal

For me, The Thing definitely qualifies as a favorite that’s worth holding onto for years to come.

– Samuel Axon

Acer USB C Hub, 7 in 1 Multi-Port Adapter – $18

Modern laptops with only two USB-C ports basically require a hub. This Acer turns one port into HDMI (4K@30Hz), two USB-A ports for legacy gear, SD/microSD slots, and 100 W passthrough charging. At $18, I keep one in my bag and one on my desk. It’s not fancy, but it earns its keep the first time you need to dump a memory card or plug into a TV set.

– Benj Edwards

Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell – $20

It is an amazing testament to the strength of this book that, even after all that’s happened since its original 2019 publication, the 2025 reissue doesn’t change much. Melanie Mitchell, a professor of computer science at Portland State University and an external professor at the Santa Fe Institute, nails this historical summary of how we got to this point through multiple AI springs and AI winters.

She carefully explains the concepts and research underpinnings of contemporary developments in machine learning, large language models, image generation, and so on, while amplifying key voices from several of the people who contributed to progress in this field—both doomsayers and boosters alike—with a technically rigorous and ethically informed point of view.

If you or someone you know is just getting started learning about AI as we know it today, there are a lot of books they could read, and some of them are surely more contemporary. But I can hardly think of any that make a better foundation.

– Samuel Axon

Pinecil soldering iron – $40

Every self-respecting geek should own a soldering iron. Even if you aren’t making your own PCBs or recapping old electronics, it’s the kind of thing that just comes in handy. Especially around the holidays, when people are getting out their old battery-powered decorations that come with a lot of memories, wear, and flaky power terminals. Be the hero that brings a treasured light-up keepsake back to life!

When you don’t need a full-on soldering station, though, it’s nice to have something compact and easy to slip in a drawer. Enter the Pinecil, conveniently powered over USB-C, with a slick little screen allowing for easy temperature control (in F or C) and firmware that auto-sleeps if you forget to unplug it.

– Aurich Lawson

Anker 67 watt USB charger and Anker USB-C silicone power cable – $25 and $16

If you already have a suitable USB-C power supply (you really want at least 60 watts) and a USB-C cable, you’re set. If not, this power supply from Anker—a reliable brand, in my experience—is compact and folds up for easy storage. Not all USB-C cables are up to the task of transmitting the magic wall juice, so if you’re not sure you have an appropriate one, pick up the above cable, which is sheathed in silicone to keep it nice and floppy so you’re not wrestling with a stiff cord while using your iron.

– Aurich Lawson

Knog Bike Bells – $22 – $33

While a lot of my road bike’s miles are spent on actual roads, it’s hard to do any long rides in my area without spending a little time on a cycle trail—one shared by pedestrians, runners, scooter riders, casual cyclists, and random others. Even if the rules of most of those trails didn’t specify using a bell, it’s a smart idea to have one—especially one that’s loud enough to cut through whatever’s coming out of the headphones that most people wear.

But real estate on my handlebars is limited. They already host a cycling computer and a light, and two cables and two hydraulic tubes snake their way through the area, emerging from under the handlebar tape before diving into the frame. Finding a bell that both works and keeps out of the way turned into a bit of a challenge. And then a solution presented itself: A company called Knog sent me an email about their bell offerings.

A bike bell against a white background

One of Knog’s bike bells. Credit: Knog

All of Knog’s options are mechanically simple—just a half circle of metal that follows the circumference of the handlebar and a spring-loaded hammer to strike it—and loud enough to catch even headphone wearers’ attention. They’re also low-profile, barely sticking out from the handlebars themselves, and they’re narrow enough that it was easy to find space for one without bumping it into any of the cabling. It’s all unobtrusive enough that I forget mine’s there until I need it. Yes, you can find lots of cheaper alternative designs (the Knogs run between $20 and $45), but for me, it’s worth paying an extra $10–$15 for something that suits my needs this well.

– John Timmer

Razer Orochi V2 wireless mouse – $34

This is the mouse I’m using right now as I type this. I wanted a mouse that could cross basically every domain: It needed to be good enough for gaming, but conveniently wireless, while also working well across macOS, Windows, and Linux—and it needed to be portable and not too embarrassing in a professional context because I fly to far-flung cities for work at least a dozen times a year. Razer’s Orochi met all of those goals, and I appreciate that it looks neat and professional, despite the fact that it’s very much a gamer mouse.

The only area where it fumbles is that Razer’s app seems to crash and cause problems for me on both macOS and Windows, but it works just fine without the app, so I uninstalled it, and everything’s been golden since. (To be clear, you don’t need to install it to use the mouse.)

It wins points for versatility; I don’t think it really compromises anything across all the situations I mentioned.

As of this writing, it’s on sale for $34, but the typical price is $70—still not bad for what you’re getting.

– Samuel Axon

Pricier picks

OWC Express 1M2 – $90

I set up a home studio this year to record my righteous jams, and as part of that process, I needed an external SSD both to back up project files and to hold many hundreds of gigs of virtual instruments. I wanted something 1) blazingly fast, 2) good-looking, 3) bus-powered, 4) free of all (and all too common) sleep/wake glitches, 5) unlikely to burst into flames (these things can get hot), and yet also 6) completely fanless because my righteous jams would be far less righteous with a fan droning in the background.

A durable hard drive enclosure

The OWC Express 1M2 is used for backups for Nate’s “righteous jams.” Credit: OWC

Those criteria led me to OWC’s Express 1M2, an SSD enclosure that transfers data at 40Gb/s over USB 4, matches the look of my Mac mini perfectly (and works with PCs), and is bus-powered. It has never given me a sleep/wake problem; it gets warm but never palm-searingly hot, and it dissipates heat through a chonky, milled-aluminum case that requires no fan.

I love this thing. It was ludicrously easy to install my own NVMe M.2 drive in it (though you can also pay a small premium for pre-installed storage). I’ve never had a moment of trouble—nor have I ever heard it. Yes, the enclosure costs more than some other options, but it’s a well-made piece of kit that can transfer data nearly as fast as my Mac’s internal SSD and should last for years. If someone in your life needs an SSD enclosure, they could do far, far worse than the Express 1M2.

– Nate Anderson

Kagi subscription – $108/year

It’s been about a year since I switched full-time to Kagi for my search engine needs, leaving Google behind in a cloud of dust and not looking back, and it was the correct choice, at least for me. Kagi’s upsides are many—including and especially search that works how it’s supposed to work instead of by fabricating garbage or tricking you into buying things—but the big downside is that while Kagi has a free tier, real daily usage requires money.

But if you’re a happy Kagi user like me and you want to tempt others into using the service, Kagi has gift subscriptions! If you’ve been trying to tempt a friend or relative into abandoning Google’s sinking AI ship but they’re balking at the price, throw some money at that problem and knock that barrier down! A “pro” Kagi subscription with unlimited search costs about a hundred dollars a year, and while that obviously isn’t nothing, it’s also not an unfair price—especially for something I use every day. Kagi: It’s what’s for Christmas!

– Lee Hutchinson

Philips Hue Bridge Pro – $99

Unlike Kagi, I’ve been using Philips Hue lights for a long, long time—13 years and counting, and most of those old first-gen bulbs are still operational. But the bridge, the Hue component that actually connects to your LAN, has long had an annoying problem: It can hook up to a max of about 50 Hue bulbs, and that’s it. (The reason has to do with cost-saving choices Philips made on the bridge design.)

Thirteen years has been enough for me to accumulate at least 50 Hue devices, so this limit has been problematic for me—but it’s a problem no more! After a decade and change, Philips has finally released an updated “Pro” bridge that handles far more Hue devices—and it comes in stylish black! The new bridge brings some new capabilities, too, but the big news is that new device limit—something long-time customers like me have spent years pining for. Now I can festoon my house with even more automatic lights!

– Lee Hutchinson

The Logitech MX Master 4 – $120

The Logitech MX Master 3S and the newer MX Master 4 remain two of the best productivity mice on the market. Both use an 8,000-DPI Darkfield sensor, the excellent MagSpeed electromagnetic scroll wheel, and Logitech’s deep customization stack. The 3S has been our long-standing recommendation, but the MX Master 4 brings a few quality-of-life improvements that may justify the upgrade. Most notably, it replaces the 3S’s soft-touch palm coating, which wears quickly and tends to attract grime, with more durable textured materials. The redesigned switches also make the 4 one of the quietest mice you can buy, with muted clicks and a near-silent scroll wheel.

A hand moves a mouse against a white background

Logitech MX Master 4, the mouse used by Ars Editor-in-Chief Ken Fisher. Credit: Logitech

The more ambitious addition is the new haptic system, meant to provide tactile feedback for shortcut triggers and app-specific “Actions Ring” menus. In practice, though, software support remains thin. Productivity apps haven’t yet embraced haptic signaling, and months after launch, the plugin ecosystem is still limited. The MX Master 4 is a well-executed refinement, but its headline feature is waiting for the software world to catch up.

– Ken Fisher

Ray-Ban Meta Wayfarer glasses – $247

The Ray-Ban Meta glasses may look bulkier than a standard pair of Wayfarers, but the added hardware delivers a genuinely interesting glimpse at where mobile computing is headed. After spending time with them, it’s clear that eyewear will likely follow the same trajectory as smartwatches: once niche, now a viable surface for ambient computing. The multimodal AI features are impressive, and the built-in camera produces better-than-expected 1080p/30fps video, though low-light performance remains limited by the small sensor.

These are still early-stage devices with the usual growing pains, but they’re a compelling gift for early adopters who want a front-row seat to the future of wearable interfaces.

– Ken Fisher

Samsung T9 external SSD (2 TB) – $235

As I once again attempted to make the Sophie’s Choice of which Steam game to uninstall because I ran out of disk space, I realized that part of my problem is that I have two computers (a macOS laptop and a Windows desktop), and I’ve doubled up on storing certain things—like the absolutely enormous eXoDOS collection, for example—on both machines so I could access them regardless of where I was.

The best thing I could do to help my constant space woes was to consolidate anything that I needed on both machines into an external drive I could share between them. I went with Samsung’s T9 external SSD, and so far, I’m happy with it. As planned, I now have a lot more breathing room on both computers.

– Samuel Axon

Photo of Samuel Axon

Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

For the lazy techie: These are Ars staff’s last-minute holiday gift picks Read More »

man-sues-cops-who-jailed-him-for-37-days-for-trolling-a-charlie-kirk-vigil

Man sues cops who jailed him for 37 days for trolling a Charlie Kirk vigil

While there’s no evidence of anyone interpreting the meme as a violent threat to school kids, there was a “national uproar” when Bushart’s story started spreading online, his complaint noted. Bushart credits media attention for helping to secure his release. The very next day after a local news station pressed Weems in a TV interview to admit he knew the meme wasn’t referencing his county’s high school and confirm that no one ever asked Bushart to clarify his online remarks, charges were dropped, and Bushart was set free.

Morrow and Weems have been sued in their personal capacities and could “be on the hook for monetary damages,” a press release from Bushart’s legal team at the Foundation for Individual Rights and Expression (FIRE) said. Perry County, Tennessee, is also a defendant since it’s liable for unconstitutional acts of its sheriffs.

Perry County officials did not immediately respond to Ars’ request to comment.

Bushart suffered “humiliating” arrest

For Bushart, the arrest has shaken up his life. As the primary breadwinner, he’s worried about how he will support himself and his wife after losing his job while in jail. The arrest was particularly “humiliating,” his complaint said, “given his former role as a law enforcement officer.” And despite his release, fear of arrest has chilled his speech, impacting how he expresses his views online.

“I spent over three decades in law enforcement, and have the utmost respect for the law,” Bushart said. “But I also know my rights, and I was arrested for nothing more than refusing to be bullied into censorship.”

Bushart is seeking punitive damages, alleging that cops acted “willfully and maliciously” to omit information from his arrest affidavit that would’ve prevented his detention. One of his lawyers, FIRE senior attorney Adam Steinbaugh, said that a win would protect all social media meme posters from police censorship.

“If police can come to your door in the middle of the night and put you behind bars based on nothing more than an entirely false and contrived interpretation of a Facebook post, no one’s First Amendment rights are safe,” Steinbaugh said.

Man sues cops who jailed him for 37 days for trolling a Charlie Kirk vigil Read More »

fcc-chair-scrubs-website-after-learning-it-called-fcc-an-“independent-agency”

FCC chair scrubs website after learning it called FCC an “independent agency”


Meanwhile, Ted Cruz wants to restrict FCC’s power to intimidate broadcasters.

FCC Chairman Brendan Carr speaks at a Senate Commerce, Science, and Transportation Committee oversight hearing on December 17, 2025, in Washington, DC. Credit: Getty Images | Heather Diehl

Federal Communications Commission Chairman Brendan Carr today faced blistering criticism in a Senate hearing for his September threats to revoke ABC station licenses over comments made by Jimmy Kimmel. While Democrats provided nearly all the criticism, Sen. Ted Cruz (R-Texas) said that Congress should act to restrict the FCC’s power to intimidate news broadcasters.

As an immediate result of today’s hearing, the FCC removed a statement from its website that said it is an independent agency. Carr, who has embraced President Trump’s declaration that independent agencies may no longer operate independently from the White House, apparently didn’t realize that the website still called the FCC an independent agency.

“Yes or no, is the FCC an independent agency?” Sen. Ben Ray Luján (D-N.M.) asked. Carr answered that the FCC is not independent, prompting Luján to point to a statement on the FCC website calling the FCC “an independent US government agency overseen by Congress.”

“Just so you know, Brendan, on your website, it just simply says, man, the FCC is independent. This isn’t a trick question… Is your website wrong? Is your website lying?” Luján asked.

“Possibly. The FCC is not an independent agency,” Carr answered. The website still included the statement of independence when Luján asked the question, but it’s now gone.

Carr: Trump can fire any member “for any reason or no reason”

Carr, who argued during the Biden years that the FCC must remain independent from the White House and accused Biden of improperly pressuring the agency, said today that it isn’t independent because the Communications Act does not give commissioners protection from removal by the president.

“The president can remove any member of the commission for any reason or no reason,” Carr said. Carr said his new position is a result of “a sea change in the law” related to an ongoing case involving the Federal Trade Commission, in which the Supreme Court appears likely to approve Trump’s firing of an FTC Democrat.

“I think it comes as no surprise that I’m aligned with President Trump on policy, I think that’s why he designated me as chairman… I can be fired by the president,” Carr said. Carr also said, “The Constitution is clear that all executive power is vested in the president, and Congress can’t change that by legislation.”

Changing the FCC website doesn’t change the law, of course. US law specifically lists 19 federal agencies, including the FCC, that are classified as “independent regulatory agencies.” Indications of the FCC’s independence include that it has commissioners with specified tenures, a multimember structure, partisan balance, and adjudication authority. Trump could test that historical independence by firing an FCC commissioner and waiting to see if the Supreme Court allows it, as he did with the FTC.

Ted Cruz wants to restrict FCC power

Carr’s statements on independence came toward the end of an FCC oversight hearing that lasted nearly three hours. Democrats on the Senate Commerce Committee spent much of the time accusing Carr of censoring broadcast stations, while Carr and Committee Chairman Cruz spent more time lobbing allegations of censorship at the Biden administration. But Cruz made it clear that he still thinks Carr shouldn’t have threatened ABC and suggested that Congress reduce the FCC’s power.

Cruz alleged that Democrats supported Biden administration censorship, but in the next sentence, he said the FCC shouldn’t have the legal authority that Carr has used to threaten broadcasters. Cruz said:

If my colleagues across the aisle do what many expect and hammer the chairman over their newfound religion on the First Amendment and free speech, I will be obliged to point out that those concerns were miraculously absent when the Biden administration was pressuring Big Tech to silence Americans for wrongthink on COVID and election security. It will underscore a simple truth, that the public interest standard and its wretched offspring, like the news distortion rule, have outlived whatever utility they once had and it is long past time for Congress to pass reforms.

Cruz avoided criticizing Carr directly today and praised the agency chairman for a “productive and refreshing” approach on most FCC matters. Nonetheless, Cruz’s statement suggests that he’d like to strip Carr and future FCC chairs of the power to wield the public interest standard and news distortion policy against broadcasters.

At today’s hearing and in recent months, Carr defended his actions on Kimmel by citing the public interest standard that the FCC applies to broadcasters that have licenses to use the public airwaves. Carr also defended his frequent threats to enforce the FCC’s rarely invoked news distortion policy, even though the FCC apparently hasn’t made a finding of news distortion since 1993.

Cruz said today he agrees with Carr “that Jimmy Kimmel is angry, overtly partisan, and profoundly unfunny,” and that “ABC and its affiliates would have been fully within their rights to fire him or simply to no longer air his program.” But Cruz added that government cannot “force private entities to take actions that the government cannot take directly. Government officials threatening adverse consequences for disfavored content is an unconstitutional coercion that chills protected speech.”

Cruz continued:

This is why it was so insidious how the Biden administration jawboned social media into shutting down conservatives online over accurate information on COVID or voter fraud. My Democrat colleagues were persistently silent over that scandal, but I welcome them now having discovered the First Amendment in the Bill of Rights. Democrat or Republican, we cannot have the government arbitrating truth or opinion. Mr. Chairman, my question is this, so long as there is a public interest standard, shouldn’t it be understood to encompass robust First Amendment protections to ensure that the FCC cannot use it to chill speech?

Carr answered, “I agree with you there and I think the examples you laid out of weaponization in the Biden years are perfect examples.” Carr criticized liberals for asking the Biden-era FCC to not renew a Fox station license and criticized Congressional Democrats for “writing letters to cable companies pressuring them to drop Fox News, OAN, and Newsmax because they disagreed with the political perspectives of those cable channels.”

Cruz seemed satisfied with the answer and changed the topic to the FCC’s management of spectrum. After that, much of the hearing consisted of Democrats pointing to Carr’s past statements supporting free speech and accusing him of using the FCC to suppress broadcasters’ speech.

Senate Democrats criticize Carr’s Kimmel threats

Sen. Amy Klobuchar (D-Minn.) asked Carr if it “is appropriate to use your position to threaten companies that broadcast political satire.” Carr responded, “I think any licensee that operates on the public airwaves has a responsibility to comply with the public interest standard, and that’s been the case for decades.”

Klobuchar replied, “I asked if you think it’s appropriate for you to use your position to threaten companies, and this incident with Kimmel wasn’t an isolated event. You launched investigations into every major broadcast network except Fox. Is that correct?”

Carr noted that “we have a number of investigations ongoing.” Later, he said, “If you want to step back and talk about weaponization, we saw that for four years in the Biden administration.”

“Joe Biden is no longer president,” Klobuchar said. “You are head of the FCC, and Donald Trump is president, and I am trying to deal with this right now.”

As he has in the past, Carr claimed today that he never threatened ABC station licenses. “Democrats at the time were saying that we explicitly threatened to pull a license if Jimmy Kimmel wasn’t fired,” Carr said. “That never happened; that was nothing more than projection and distortion by Democrats. What I am saying is any broadcaster that uses the airwaves, whether radio or TV, has to comply with the public interest standard.”

In fact, Carr said on a podcast in September that broadcast stations should tell ABC and its owner Disney that “we are not going to run Kimmel anymore until you straighten this out because we, the licensed broadcaster, are running the possibility of fines or license revocations from the FCC if we continue to run content that ends up being a pattern of news distortion.”

Sen. Brian Schatz (D-Hawaii) pointed to another Carr statement from the podcast in which he said, “We can do this the easy way or the hard way. These companies can find ways to change conduct, to take action, frankly, on Kimmel, or there’s going to be additional work for the FCC ahead.”

Schatz criticized Carr’s claim that he never threatened licenses. “You’re kind of tiptoeing through the tulips here,” Schatz said.

FCC Democrat: Agency is censoring Trump critics

FCC Commissioner Anna Gomez, a Democrat, testified at today’s hearing and said that “the First Amendment applies to broadcasters regardless of whether they use spectrum or not, and the Communications Act prohibits the FCC from censoring broadcasters.”

Gomez said the Trump administration “has been on a campaign to censor content and to control the media and others, any critics of this administration, and it is weaponizing any levers it has in order to control that media. That includes using the FCC to threaten licensees, and broadcasters are being chilled. We are hearing from broadcasters that they are afraid to air programming that is critical of this administration because they’re afraid of being dragged before the FCC in an investigation.”

Gomez suggested the “public interest” phrase is being used by the FCC too vaguely in reference to investigations of broadcast stations. She said the FCC should “define what we mean by operating in the public interest,” saying the commission has been using the standard “as a means to go after any content we don’t like.” She said that “it’s still unconstitutional to revoke licenses based solely on content that the FCC doesn’t like.”

Sen. Ed Markey (D-Mass.) criticized Carr for investigating San Francisco-based KCBS over a report on Immigration and Customs Enforcement (ICE) activities, in which the station described vehicles driven by ICE agents. Carr defended the probe today, saying, “The concern there in the report was there may have been interference with lawful ICE operations and so we were asking questions about what happened.”

Markey said, “The news journalists were just covering an important news story, and some conservatives were upset by the coverage, so you used your power as FCC chairman to hang a sword of Damocles over a local radio station’s head… Guess what happened? The station demoted the anchor who first read that news report over the air and pulled back on its political coverage. You got what you wanted.”

Carr said, “Broadcasters understand, perhaps for the first time in years, that they’re going to be held accountable to the public interest, to broadcast hoax rules, to the news distortion policy. I think that’s a good thing.”

Carr then criticized Markey for signing a letter to the FCC in 2018 that asked the agency to investigate conservative broadcaster Sinclair. The Markey/Carr exchange ended with the two men shouting over each other, making much of it unintelligible, although Markey said that Carr should resign because he’s creating a chilling effect on news broadcasters.

Cruz similarly criticized Democrats for targeting Sinclair, prompting Sen. Andy Kim (D-N.J.) to defend the 2018 letter. “Chairman Carr’s threats to companies he directly regulates are not the same thing as a letter from Congress requesting an agency examine a matter of public concern. Members on both sides of the aisle frequently write similar letters; that’s the proper oversight role of Congress,” he said.

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

FCC chair scrubs website after learning it called FCC an “independent agency” Read More »

trump-admin-threatens-to-break-up-major-climate-research-center

Trump admin threatens to break up major climate research center

UCAR, for its part, has issued a statement indicating that the USA Today article was the first it had heard of the matter.

In many cases where the administration has attempted to take drastic actions like this, courts have ruled that they run afoul of a legal prohibition against “arbitrary and capricious” federal actions. That said, courtroom losses haven’t inhibited the administration’s willingness to try, and the time spent waiting for legal certainty can often accomplish many of its aims, such as disrupting research on politically disfavored subjects and forcing scientists to look elsewhere for career stability.

Scientists, meanwhile, are reacting with dismay. “Dismantling NCAR is like taking a sledgehammer to the keystone holding up our scientific understanding of the planet,” said Texas Tech climate researcher Katharine Hayhoe. “Everyone who works in climate and weather has passed through its doors and benefited from its incredible resources.”

Gavin Schmidt, director of NASA’s Goddard Institute for Space Studies, called NCAR a “unique and valuable asset” and emphasized the wide range of research conducted there.

Obviously, shutting down one source of information about climate change won’t alter what’s happening—greenhouse gases will continue to behave as physics dictates, raising global temperatures. But the Trump administration seemingly views everything through the lens of ideology. It has concluded that scientists are its ideological opponents and thus that its own ideologically driven conclusions are equal to the facts produced by science. Because of that perspective, it has been willing to harm scientists, even if the cost will eventually be felt by the public that Trump ostensibly represents.

Story was updated on Dec. 17 to reflect a recently issued statement by the NSF.

Trump admin threatens to break up major climate research center Read More »

the-$140k-question:-cost-changes-over-time

The $140K Question: Cost Changes Over Time

In The $140,000 Question, I went over recent viral claims about poverty in America.

The calculations behind the claims were invalid and the central claim (that the ‘true poverty line’ was $140k) was absurd, but the terrible vibes are real. People increasingly feel that financial life is getting harder and that success is out of reach.

‘Real income’ is rising, but costs are rising even more.

Before we get to my central explanations for that – the Revolution of Rising Expectations and the Revolution of Rising Requirements – there are calculations and histories to explore, which is what this second post is about.

How are costs changing in America, both in absolute terms and compared to real incomes, for key items: consumer goods, education, health care, and housing?

That’s a huge percentage of where we spend our post-tax money.

And how is household wealth actually changing?

The economists are right that the basket of goods and services we typically purchase in these areas has greatly increased in both quantity and quality, in spite of various severe supply side problems mostly caused by regulations.

That is not what determines whether a person or family can pay their bills.

People keep saying and feeling that the cost of living is going up and things are getting harder. Economists keep saying no, look at the data, you are buying better goods, so you are wrong.

Both things can be true at once. It also means people talk past each other a lot.

There was an iteration on this back in May, when Curtis Yarvin declared a disingenuous ‘beef’ with Scott Alexander on such questions. A good example of the resulting rhetoric was this exchange between Scott Alexander and Mike Solana.

Mike Solana: guy will say “things are harder now than they were, harder than they have ever been, I don’t know how to make my life work in this new american economy” and an intellectual will show him a graph that indicates “median wages have increased 30% since the turn of the century.”

Guy will say the cost of a house has more than doubled. He’ll say he can’t afford his home town anymore. The intellectual will make a macro argument with another chart. It will seem smart.

Guy will still be miserable, his life will still be hard. And he will vote.

As with the $140,000 question, many of the specific cost claims of the various Guys here will be wrong, but their life will be hard, and they will be unhappy. And vote.

Evaluating and often refuting specific claims is a necessary background step. So that’s what this post is here to do. On its own, it’s a distraction, but you need to do it first.

Two years ago I covered Oren Cass’s Cost of Thriving Index and the debate over whether the true ‘cost of thriving’ was going up or down.

The Cost of Thriving Index is an attempt to assemble the basket of goods required for ‘thriving’ in each era and then compare combined costs as a function of what a single man can hope to earn, without regard to the rising quality of the basket over time.

The post covered the technical arguments in each area between Winship and Cass. Cass argued that thriving had gotten a lot harder. Winship argued against this.

My conclusion was:

  1. Cass’s calculations were importantly flawed. My ‘improved COTI’ shows a basic basket was ~13% harder for a typical person to afford in 2023 than it was in 1985.

  2. Critics of the index, especially Winship, misunderstood the point of the exercise and in many places were trying to solve the wrong problem using the wrong methods, based on a wrong model of the world derived from poor thinking. Unfortunately, all their mistakes failed to cancel out.

  3. You had to consume goods that cost ~75% more ‘real income’ in order to thrive in 2023 than you did in 1985. ‘Real income’ went up 53%. Are you better off in 2023 or in 1985? It is not obvious. One effect does not negate the other.
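
To see how points 1 and 3 fit together, here is a quick back-of-the-envelope check in Python, using only the rounded figures quoted above (an illustration, not a re-derivation of the index): a basket that costs ~75% more, set against income that is up ~53%, absorbs roughly 14% more of a year’s income, in line with the ~13% figure from the improved COTI.

    # Rough check using the rounded figures above (illustrative, not the full COTI math)
    basket_cost_growth = 1.75   # basket needed to "thrive" costs ~75% more in 2023 vs. 1985
    real_income_growth = 1.53   # real income is up ~53% over the same span

    relative_burden = basket_cost_growth / real_income_growth - 1
    print(f"The basket absorbs about {relative_burden:.0%} more of a year's income than in 1985")
    # prints: The basket absorbs about 14% more of a year's income than in 1985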

This calculation left out at least one important consideration that neither side addressed: the time and money costs of not letting kids be kids, and the resulting need to watch them like a hawk at all times, requiring vastly more childcare. You can buy that, or you can pay with your time. Either way, you pay up.

The two sides continue to talk past each other. Before we can do a synthesis, we need to cover the actual cost details.

I don’t quite fully buy the Housing Theory of Everything.

But it is close.

House prices have risen quite a lot, as have interest rates. So have incomes.

If you’re okay living where people don’t typically want to live, then things aren’t bad.

However, people are less accepting of that, which is part of the Revolution of Rising Expectations, and opportunity has concentrated in the expensive locations.

Noah Smith: In terms of wages, income, and wealth, Gen Z and Millennials are doing much better than previous generations. Corporate America is not failing the youth.

It’s only housing that’s really broken.

Charles Fain Lehman: Americans are unhappy because housing is illegal.

Zac Hill: illegal aliens -> illegal housing.

I would instead say, if housing was legal, there would be a lot less unhappiness.

More precisely: If building housing was generally legal, including things like SROs and also massively more capacity in general in the places people want to live, then housing costs would be a lot lower, people would have vastly more Slack, and the whole thing would seem far more solvable.

If you look at median home prices, things actually look like they’ve always been pretty terrible, as in this is a share of 200% of median income:

Ryan Radia:

The median household, with a median income and no outside help, does not by default buy the median house at today’s interest rate. Houses are largely bought with wealth, or owned or acquired in other ways, so by default if you’re relying purely on a median income you’re getting a lot less than the median house. Which is totally fine, the median house is historically huge, you can notch down a bit. Also, when rates go down we refinance and when they go up we delay moving, which lowers costs.

But suppose you’re a median income earner trying to buy the median house today. If you believe the above graph, it’s going to cost you 70% of your income to make a mortgage payment, plus you’ll need a down payment, so yeah, that’s not going to happen. But that number hasn’t been under 50% in fifty years, so people have long had to find another way and make compromises.

The graph does seem like it has to be understating the jump in recent years, with the jump in mortgage rates, and here’s the Burns Affordability Index, which divides the median monthly housing cost (all-in, including insurance) for a 10% down, 30-year-fixed mortgage at current rates by 125% of the median income (seriously, guys, 100%, use 100% and we can adjust for dual incomes).
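
An index like this is just an all-in payment-to-income ratio. Here is a minimal sketch of that computation under the definition above (10% down, 30-year fixed, all-in monthly cost divided by income); every input number below is a made-up placeholder rather than the data behind the Burns index, and the real index divides by 125% of median income rather than the generic income parameter used here.

    def monthly_payment(principal, annual_rate, years=30):
        """Standard fully amortized fixed-rate mortgage payment."""
        r = annual_rate / 12
        n = years * 12
        return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)

    def affordability_share(home_price, annual_rate, monthly_income,
                            down_payment_pct=0.10, other_monthly_costs=0.0):
        """Share of monthly income going to the all-in housing payment."""
        loan = home_price * (1 - down_payment_pct)
        return (monthly_payment(loan, annual_rate) + other_monthly_costs) / monthly_income

    # Made-up illustrative inputs (not actual medians):
    share = affordability_share(home_price=400_000, annual_rate=0.07,
                                monthly_income=7_000, other_monthly_costs=600)
    print(f"{share:.0%} of monthly income goes to housing")  # ~43% with these placeholder inputs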

I’m willing to believe that this jump happened, and that some of it is permanent, because interest rates were at historic lows for a long time and we’re probably not going to see that again for a while even if AI fizzles.

That 42% buys a lot more house than the 33% did in 1986. Compared to the early 1970s (before interest rates shot up), Gale Pooley says a given percentage of household income (counting the shift to two incomes, mind) gets you 73% more house: average size went from 1,634 square feet to 2,614, per-person space went from 534 to 1,041, many amenities are much more common (AC, garage, 4+ bedrooms, etc.), and actual housing costs haven’t risen much as a percentage of income.

That doesn’t make the new house you need any easier to pay for. More people are working and paying a higher percentage of income in order to pay for that house, again especially in the places with futures (which also are the sources of media).

Things will look worse if you look at the major cities, where there is the most opportunity, and where people actually want to live. This is NIMBY, it is quite bad, and we need to fix it.

That includes an increasing unwillingness to live far away from work and endure what is frankly a rather terrible commute; Tristan’s, below, is relatively light.

Tristan Cunha: When I got out of college 20 years ago I applied to jobs online, found an apartment online, got a car loan online, etc. So I remember searching and comparing the price of everything.

When people complain about how tough things are now, I search and can find the rent for an apartment in the building I lived in, or entry level jobs at the first company I worked at, etc., and it doesn’t seem that expensive to me. Sure, the nominal prices have gone up, but the rent as a percentage of entry level salary is about the same.

I think the big difference is when I tell young adults now that I had a 30-60 minute commute in to the city on the train, and had a roommate in a tiny apartment in the suburbs, they think that’s a huge sacrifice.

For a while a lot of the story of things getting harder was that healthcare and education costs were rising rapidly, far faster than incomes.

Did we turn this around? Noah Smith strongly asserted during the last iteration of the argument that this is solved, the same way the data says that real wages are now accelerating.

He starts off with the famous Mark Perry price changes chart.

Noah Smith: And the story was compelling because it came with a simple theory to explain it. This was the notion that manufacturing productivity naturally increases faster than service productivity. Conceptually, it seems easier to figure out how to rearrange production processes in a factory, and apply new machine tools, than to figure out new ways to educate kids or take care of Grandma.

The story of healthcare and education goes beyond not getting the discounts on manufactured goods. It extends to a large rise in the amount of goods and services we had to purchase, much of it wasted – Hansonian medicine, academic administrative offices and luxury facilities, credential inflation and years spent mostly on signaling, and so on. Don’t try to pass this all off as Baumol’s Cost Disease.

Noah Smith: If service costs rise relentlessly while manufacturing costs fall, it portends a grim future — one where we have cheap gadgets, but where the big necessities of modern middle-class life are increasingly out of reach. And in fact, that was the story a lot of people were telling in the mid-2010s.

That story led to a certain techno-pessimism. If technology could give us cheap gadgets, but couldn’t make the basics of modern life any cheaper, what good was it?

Step back to first principles. This can’t happen purely ‘because cost disease’ unless the total labor demanded is rising.

  1. You provide one person-unit of labor.

  2. You buy [X]-units of labor to get [S] services and also buy [Y]-units of goods.

  3. That only gets harder for you if at least one of these holds:

    1. The required quality or quantity of [X] or [Y] is rising.

    2. The cost of a unit of goods is rising relative to incomes.

    3. The labor you need is rising in cost faster than your own labor.

Which is it?

I assert that the primary problem is that [X] is rising, without much benefit to you.

The secondary issue is a fixed supply of healthcare-relevant [X]s via occupational licensing. Relative to required services, labor productivity and supply are falling.

The failure of educational technologies like online education only seemed to drive the point home — it seemed like we’d always just be stuck with a teacher giving lectures on a board to 20 or 30 kids.

The ‘failure of online education’ so far has been due to trying to duplicate that 20-30 kid classroom over zoom. That was always going to be a dystopian nightmare and wouldn’t save on labor anyway.

Why is a class of the same size so much more expensive in units of middle class labor? Noah focuses on higher education later in the post, but as an obvious lower education example: The New York City DOE school system costs $39k per student. You think that mostly pays for the teachers?

If all we do is hold the basket of required services [S] constant, we should require less labor units [X] to meet our needs as productivity improves, at least due to technology. Instead, we need more labor.

Noah then covers attempts to solve the cost issues via policy, or at least to stop making the problem worse via policies that restrict supply and subsidize (and I would add mandate) demand, and instead move around taxes and subsidies in smarter ways. The solutions he seems to favor here still mainly continue to look a lot like subsidizing demand and using transfers.

But, behold, says Noah. Health care costs have stopped increasing as a percentage of GDP. So Everything Is Fine now, or at least not getting worse. The ways in which he argues things are doing fine helped me realize why things are indeed not so fine here.

This chart represents us spending more on health care, since it’s a constant percentage of a rising GDP. That’s way better than the previous growing percentage. It is still a high percentage and we are unwise to spend so much.

OK but anyway, what we really care about at the end of the day is affordability — i.e., how much health care an average American can buy. A good way of measuring affordability is to look at median income divided by an index of health care prices — in other words, how much health care the typical American can buy with their annual income.

OK, so, this is total spending, not the price of health care. Is America spending less because we’re getting less care? No. In cost-adjusted terms, Americans have been getting more and more health care services over the years.

Importantly, no. We do not primarily care about how much health care an average American can buy and what it costs them.

We primarily care, for this purpose, about how much it costs in practice to get a basket of health care you are in practice allowed or required to buy.

That means buying, or getting as part of your job, acceptable health insurance.

The systems we have in place de facto require you to purchase a lot more health care services, as measured by these charts. It does not seem to be getting us better health.

Noah even says, look, healthcare is getting more affordable overall, even accounting for all the extra healthcare we are forced to buy:

This chart does not reflect true personal consumption expenditures.

As a person forced to buy insurance on the New York marketplace, I do not notice things getting more affordable. Quite the opposite. If you don’t get the subsidies, and you don’t have an employer, buying enough insurance that you can get even basic healthcare services costs an obscene amount. You can’t opt out because if you do they charge you much higher prices.

There are two ways out of that. One is that if you are sufficiently struggling they give you heavy subsidies, but you only get that if you are struggling, so this does not help you not struggle and is part of how we effectively trap such people in ~100% marginal tax brackets as per the discussions of the poverty trap. Getting constant government emails saying ‘most people on the exchange pay almost nothing!’ threatens to drive one into a blind rage.

The other way is if you have a job that provides you insurance. Securing this is a severe distortion in many people’s lives, which is a big and rising hidden cost. Either way, your choices get massively distorted, and that isn’t factored in.

This thing is obscenely expensive and is de facto mandatory. That we then offer various conditional subsidies and workarounds does not make the cost non-obscene. Then even after you pay, you have to navigate the American Healthcare System and force it to provide the healthcare you bought.

The average cost is holding steady as a percentage of income but the uncertainty involved makes it much harder to be comfortable.

It could just be that Americans were willing to pay more for health care as they got richer, up to a point, but that at some point they said “OK, that’s enough.”

I believe the American people would mostly prefer to buy fewer rights to health care, especially if they don’t get insurance through work, and even if they do. But the system won’t allow that, and their major life choices get distorted by the need to not get crushed by this requirement.

It’s an insane system but we’ve given up on fixing it.

It’s not that much worse than it was in the 1990s, but in the 1990s this was (as I remember it) the big nightmare for average people. It isn’t getting better, yet people have other bigger complaints more often now. That’s not a great spot.

I notice that Noah only discusses higher education here. Lower education costs are definitely out of control, including in the senses that:

  1. Public funding for the schools is wildly higher than the cost of teachers, and wildly higher per student, in ways that don’t seem to help kids learn.

  2. Public schools are often looking sufficiently unacceptable that people have to opt out even at $0, especially for unusual kids but in many places also in general.

  3. Private school costs are crazy high when it comes to that.

But sure, public primary and secondary school directly costs $0, so let’s focus on college. It does seem true on the base chart that costs leveled off, although at levels that are still a lot higher than in the 1990s, which were already higher than earlier, and people also feel more obligation to do more years of schooling to keep pace, which isn’t factored into such charts.

Of course this doesn’t include financial aid (nor does Mark Perry’s chart, nor do official inflation numbers). Financial aid has been going up, especially at private schools. When you include that, it turns out that private four-year nonprofits are actually less expensive in inflation-adjusted terms than they were in the mid-2000s, even without accounting for rising incomes:

I do think people fail to appreciate Noah’s point here, but notice what is happening.

  1. We charge a giant sticker price.

  2. We force people to jump through hoops, including limiting their income and doing tons of paperwork to navigate systems, and distort their various life choices around all that, in order to get around the sticker price.

  3. If you don’t distort your life they try to eat all your (family’s) money.

  4. The resulting real price (here net TFHF) remains very high.

The actual hidden good news is that enrollment is down from peak, so people aren’t facing increasing pressure to do more and more secondary education.

I buy the thesis that higher education costs, while quite terrible, are getting modestly better rather than getting worse, for a given amount of higher education.

The trend is starting to reverse a bit, but costs went up rather dramatically before they started to come down. Until very recently this was offset by the rise in enrollment and graduation rates, and we force people through various hoops, including manipulations of family income levels, in order to get this effective cost level, which means that the ‘default’ case is actually quite bad.

Noah’s big takeaway is that services productivity is indeed rising. I notice that he’s treating the productivity statistics as good measures, which I am increasingly skeptical about, especially claims like manufacturing productivity no longer rising? What? How are all the goods still getting cheaper, exactly?

Noah agrees that even where costs are now stabilized or modestly falling, we haven’t undone the huge cost increases of the past. Mostly I see these statistics as reinforcing the story of the Revolution of Rising Requirements. If services productivity has doubled in the last 50 years, and we feel the need to purchase not only the same quantity of service hours as before but substantially more hours, that makes the situation very clear.

I also would assert that a lot of this new ‘productivity’ is fake in the sense that it does not cash out in things people want. How much ‘productivity’ is going on, for example, in all the new administrative workers in higher education? One can go on.

Ultimately I see the stories as compatible, and this is making me even more skeptical of what the productivity statistics are measuring. This goes hand in hand with the internet and AI showing up everywhere except the productivity statistics. Notice that these graphs don’t seem to bend at all when the internet shows up. We are measuring something useful, but it doesn’t seem to line up well with the amount of useful production going on?

Alex Tabarrok reminds us that modern clothes are dramatically cheaper.

We spend 3% of income on clothes, down from 14% in 1900 and 9% in the 1960s. Yes, modern clothes tend to be more flimsy, but it is more efficient this way. The cost of replacing them is priced in, and it’s good for clothes to be lightweight.

If you want ‘high quality’ durable and heavier clothes, we will sell them to you, and they’ll still be relatively cheap. And yes, obviously, the government wanting to ‘bring back apparel manufacturing to America’ is utter madness, this is exactly the kind of job you want to outsource.

Related to all this is the question of how much we benefit from free goods. A (gated) paper attempts to quantify this with GDP-B, saying ‘gains from Facebook’ add 0.05%-0.11% to yearly welfare growth and improvements in smartphones add 0.63%. Which would be a huge deal.

Both seem suspiciously high. A bundle of ‘free goods’ only helps me when I care about them. Much of this is positional goods or otherwise not obviously net good for us. You cannot eat smartphone cameras or social media posts.

The free services that do save you money are a different matter. A lot of those have effectively been lost due to atomization.

Here is a recent example of attempting to look away from the problem, in two ways.

  1. To shift focus from what it costs to survive to how many goods people buy.

  2. To shift focus to demand when we should be focused on supply, as if ‘our supply chains are intact’ means we’re not restricting supply and subsidizing and mandating demand.

Tyler Cowen: Most of all, there is a major conceptual error in Green’s focus on high prices. To the extent that prices are high, it is not because our supply chains have been destroyed by earthquakes or nuclear bombs.

Rather, prices are high in large part because demand is high, which can only happen because so many more Americans can afford to buy things.

I am reminded of the old Yogi Berra saying: “Nobody goes there anymore. It’s too crowded.”

I challenge. This is not primarily a demand side issue.

Over time supply should be elastic. Shouldn’t we assume we have a supply side issue?

What are the goods that Americans need, that have truly fixed supply that shouldn’t be elastic in the face of wealth gains and generational demand shifts? Where is this not mostly a self-inflicted wound?

The answer to that is positional goods. Saying ‘look at how much more positional goods everyone is buying’ is not exactly an answer that should make anyone happy. If everyone is forced to consume more educational or other signaling, that’s worse.

The biggest causes of high prices on non-positional goods are supply side restrictions, especially on housing and also other key services with government restrictions on production and often subsidized or mandated demand to boot. Yes, to some extent housing is a positional good as well, but we are nowhere near where that constraint should be binding us. I presume Tyler Cowen would violently agree.

When solving for the equilibrium, rising demand for a good causing higher prices should be highly suspicious. Supply tends to be remarkably elastic in the medium term, why is that not fixing the issue? If we’re so rich, why don’t you increase production? Something must be preventing you from doing so.

Often the answer is indeed supply restrictions. In some of the remaining cases you can say Baumol’s Cost Disease. In many others, you can’t. Or you can partially blame Baumol but then you have to ask why we need so much labor per person to produce necessary goods. It’s not like the labor got worse.

The burdens placed are often part of the Revolution of Rising Requirements.

Even if Tyler Cowen was entirely correct here? It does not change the key factor. Americans buying lots of things is good, but it does not impact how hard it is to make ends meet.

It is not a conceptual error to then focus on high prices, if prices are relevantly high.

It is especially right to focus on high prices if quality requirements for necessary goods have been raised, which in turn raised prices.

We also need to look at generational wealth levels. We constantly play and hear the ‘generational wealth level’ game, which is mainly about how old people are and secondarily about home and stock price appreciation, and there was never that much story there to begin with; the gaps were always small?

The latest news is that Millennials, contrary to general impressions, are now out in front in ‘real dollar’ terms for both income and wealth, and their combined spending on housing, food and clothing continues to decline as a percentage of income.

The bad news is that if you think of wealth as a percentage of the cost of a house, then that calculation looks a lot worse.

Similarly, this is the classic graph on income, adjusted for inflation, after taxes and transfers, showing Gen Z is making more money:

Matthew Yglesias: An annoying aspect of the new political alignment is it’s hard to tell whether a given factually inaccurate declining narrative is coming from a left-wing or right-wing perspective.

Zac Hill: Right, and it’s precisely the *expectations* created by these skyrocketing increases which is a major cause of this misplaced sense of decline.

Illustrious Wasp (being wrong): This is graph is straight up propaganda. Inflation hasn’t been actually measured for decades. Purchasing power is the lowest its ever been. Rent, food, and necessities are the highest fraction of income ever since the great depression and possibly even higher. The average American has literally zero dollars in their bank account and is living paycheck to paycheck.

Hunter: No.

Also, similarly, we have this:

Jeremy Horpedahl: The share of income spent on food, clothing, and housing has declined dramatically since 1901 in the United States. It’s even lower than in 1973, which many claim is the beginning of economic stagnation.

Bonus chart: if you are worried that the national average obscures regional differences, here is similar long-term data for New York and Boston

[These charts are from my post here.]

Such threads are always filled with people who do not believe any of it. The numbers must be wrong. Everyone is lying about inflation. Assertions of ‘misinformation’ and ‘debunked’ without evidence.

I strongly believe the numbers are right. One must then figure out what that means.

Are you more wealthy than a khan?

Ben Dreyfuss: So many tweets on this website are people describing the almost unimaginable level of comfort and prosperity enjoyed in this country and then being like “but it sucks” haha

Jane Coaston: this is American Beauty thinking, I was pretty sure we solved this with the Great Recession, and yet here we are, still believing that “a good life your ancestors died for” is secretly bad because you’re on the brink of death all the time

if you want to experience the very edge of human suffering you could just run an ultra like a normal person. Not to sound like a parent but if you would like to suffer to feel something boy do I have some ideas.

if you have a house, a spouse, and a Costco membership, you are more wealthy than actual khans of the ancient past

Mungowitz: Two things:

  1. [The claim about being more wealthy than actual khans] is so obviously true that I can’t see why it is not just universally believed.

  2. No one believes it.

I instead say: It is obviously true in a material wealth and comfort sense excluding ability to find companions or raise children, and no one believes it because that’s not the relevant comparison and they’re not drawing the distinction precisely.

There is a big difference between material wealth and comfort, and what is good or valuable in life. That’s the disconnect. Yes, in terms of material goods in an absolute sense you are vastly richer than the Khans. You are vastly safer and healthier than them, with a vastly higher life expectancy. You better recognize.

That doesn’t mean you are better off than a Khan. Even if you don’t care about status and ability to boss people around, or other ways in which it is ‘good to be the king,’ and we focus only on material wealth, you are especially not better off in the most important respect. Which is that, once again, your material wealth will still struggle to support a family and children, or to feel secure and able to not stress about money, and most people feel constrained by money in how many children they can have.

A Khan had the most important amount of wealth for personal use, which is ‘enough.’

What does it say about us if we are both materially more wealthy than a Khan and yet not allowed, culturally or legally, to turn that wealth into a large family?

Throughout, we see the combination of three trends:

  1. People are making more money, and ending up with more wealth at a given age.

  2. Real costs in these areas are rising more than they should, but not substantially faster than real incomes.

  3. This identifies important problems, but does not explain people’s unhappiness and felt inability to succeed or be in a position to raise a family. More is happening.

As I said up top, the economists are right about the facts. Claims to the contrary are wrong. But those facts do not mean what the economists understand them to mean. They do not mean that Guy is not miserable or his life is not harder, or that Guy can afford his hometown, or to raise a family.

Yet real wages have gone up a lot, so Guy’s life should be easier. Why isn’t it?

My answer, the thing I’m centrally building towards, is that this doesn’t represent the full range of costs and cost changes, centrally for two reasons: The Revolutions of Rising Expectations and Rising Requirements.


The $140K Question: Cost Changes Over Time Read More »

stranger-things-s5-trailer-teases-vol.-2

Stranger Things S5 trailer teases Vol. 2

We’re 10 days away from the next installment of the fifth and final season of Stranger Things, and Netflix has released a new trailer for what it’s calling Volume 2. This will cover episodes five through seven, with the final episode comprising Vol. 3.

(Spoilers for Season 5, Vol. 1 below.)

Season 4 ended with Vecna—the Big Bad behind it all—opening the gate that allowed the Upside Down to leak into Hawkins. We got a time jump for S5, Vol. 1, but in a way, we came full circle, since those events coincided with the third anniversary of Will’s original disappearance in S1.

In Vol. 1, we found Hawkins under military occupation and Vecna targeting a new group of young children in his human form under the pseudonym “Mr. Whatsit” (a nod to A Wrinkle in Time). He kidnapped Holly Wheeler and took her to the Upside Down, where she found an ally in Max, still in a coma, her consciousness hiding in one of Vecna’s old memories. Dustin was struggling to process his grief over losing Eddie Munson in S4, causing a rift with Steve. The rest of the gang was devoted to stockpiling supplies and helping Eleven and Hopper track down Vecna in the Upside Down. Instead, they found Kali/Eight, Eleven’s psychic “sister,” being held captive in a military laboratory.

Things came to a head at the military base when Vecna’s demogorgons attacked to take 11 more children, wiping out most of the soldiers in record time. The big reveal was that, as a result of being kidnapped by Vecna in S1, Will has his own supernatural powers. He can tap into Vecna’s hive mind and manipulate those powers for his own purposes. He used his newfound powers to save his friends from the demogorgons.

Stranger Things S5 trailer teases Vol. 2 Read More »

filmmaker-rob-reiner,-wife,-killed-in-horrific-home-attack

Filmmaker Rob Reiner, wife, killed in horrific home attack

We woke up this morning to the horrifying news that beloved actor and director Rob Reiner and his wife Michele were killed in their Brentwood home in Los Angeles last night. Both had been stabbed multiple times. Details are scarce, but the couple’s 32-year-old son, Nick—who has long struggled with addiction and recently moved back in with his parents—has been arrested in connection with the killings, with bail set at $4 million. [UPDATE: Nick Reiner’s bail has been revoked and he faces possible life in prison.]

“As a result of the initial investigation, it was determined that the Reiners were the victims of homicide,” the LAPD said. “The investigation further revealed that Nick Reiner, the 32-year-old son of Robert and Michele Reiner, was responsible for their deaths. Nick Reiner was located and arrested at approximately 9:15 p.m. He was booked for murder and remains in custody with no bail. On Tuesday, December 16, 2025, the case will be presented to the Los Angeles County District Attorney’s Office for filing consideration.”

“It is with profound sorrow that we announce the tragic passing of Michele and Rob Reiner,” the family said in a statement confirming the deaths. “We are heartbroken by this sudden loss, and we ask for privacy during this unbelievably difficult time.”

Reiner started his career as an actor, best known for his Emmy-winning role as Meathead, son-in-law to Archie Bunker, on the 1970s sitcom All in the Family. (“I could win the Nobel Prize and they’d write ‘Meathead wins the Nobel Prize,’” Reiner once joked about the enduring popularity of the character.) Then Reiner turned to directing, although he continued to make small but memorable appearances in films such as Throw Momma from the Train, Sleepless in Seattle, The First Wives Club, and The Wolf of Wall Street, as well as TV’s New Girl.

His first feature film as a director was an instant classic: 1984’s heavy metal mockumentary This Is Spinal Tap (check out the ultra-meta four-minute alt-trailer). He followed that up with a string of hits: The Sure Thing, Stand by Me, The Princess Bride, When Harry Met Sally…, Misery, the Oscar-nominated A Few Good Men, The American President, The Bucket List, and Ghosts of Mississippi. His 2015 film Being Charlie was co-written with his son Nick and was loosely based on Nick’s experience with addiction. Reiner’s most recent films were a 2024 political documentary about the rise of Christian nationalism and this year’s delightful Spinal Tap II: The End Continues.

Filmmaker Rob Reiner, wife, killed in horrific home attack Read More »

oh-look,-yet-another-starship-clone-has-popped-up-in-china

Oh look, yet another Starship clone has popped up in China

Every other week, it seems, a new Chinese launch company pops up with a rocket design and a plan to reach orbit within a few years. For a long time, the majority of these companies revealed designs that looked a lot like SpaceX’s Falcon 9 rocket.

The first of these copycats, the medium-lift Zhuque-3 rocket built by LandSpace, launched earlier this month. Its primary mission was nominal, but the rocket failed its landing attempt, which is understandable for a first flight. Doubtless there will be more Chinese Falcon 9-like rockets making their debut in the near future.

However, over the last year, there has been a distinct change in announcements from China when it comes to new launch technology. Just as SpaceX is seeking to transition from its workhorse Falcon 9 rocket—which has now been flying for a decade and a half—to the fully reusable Starship design, so too are Chinese companies modifying their visions.

Everyone wants a Starship these days

The trend began with the Chinese government. In November 2024 the government announced a significant shift in the design of its super-heavy lift rocket, the Long March 9. Instead of the previous design, a fully expendable rocket with three stages and solid rocket boosters strapped to the sides, the country’s state-owned rocket maker revealed a vehicle that mimicked SpaceX’s fully reusable Starship.

Around the same time, a Chinese launch firm named Cosmoleap announced plans to develop a fully reusable “Leap” rocket within the next few years. An animated video that accompanied the funding announcement indicated that the company seeks to emulate the tower catch-with-chopsticks methodology that SpaceX has successfully employed.

But wait, there’s more. In June a company called Astronstone said it too was developing a stainless steel, methane-fueled rocket that would also use a chopstick-style system for first stage recovery. Astronstone didn’t even pretend to not copy SpaceX, saying it was “fully aligning its technical approach with Elon Musk’s SpaceX.”

Oh look, yet another Starship clone has popped up in China Read More »