Author name: Kelly Newman


Judge calls out OpenAI’s “straw man” argument in New York Times copyright suit

“Taken as true, these facts give rise to a plausible inference that defendants at a minimum had reason to investigate and uncover end-user infringement,” Stein wrote.

To Stein, the fact that OpenAI maintains an “ongoing relationship” with users by providing outputs that respond to users’ prompts also supports contributory infringement claims, despite OpenAI’s argument that ChatGPT’s “substantial noninfringing uses” are exonerative.

OpenAI defeated some claims

For OpenAI, Stein’s ruling is likely a disappointment, although Stein did drop some of the NYT’s claims.

In a move likely to upset news publishers, the dismissed claims included a “free-riding” claim that ChatGPT unfairly profits off time-sensitive “hot news” items, including the NYT’s Wirecutter posts. Stein explained that news publishers failed to plausibly allege non-attribution (which is key to a free-riding claim) because, for example, ChatGPT cites the NYT when sharing information from Wirecutter posts. Those claims are preempted by the Copyright Act anyway, Stein wrote, granting OpenAI’s motion to dismiss.

Stein also dismissed a claim from the NYT regarding alleged removal of copyright management information (CMI), which Stein said cannot be proven simply because ChatGPT reproduces excerpts of NYT articles without CMI.

The Digital Millennium Copyright Act (DMCA) requires news publishers to show that ChatGPT’s outputs are “close to identical” to the original work, Stein said, and allowing publishers’ claims based on excerpts “would risk boundless DMCA liability”—including for any use of block quotes without CMI.

Asked for comment on the ruling, an OpenAI spokesperson declined to go into any specifics, instead repeating OpenAI’s long-held argument that AI training on copyrighted works is fair use. (Last month, OpenAI warned Donald Trump that the US would lose the AI race to China if courts ruled against that argument.)

“ChatGPT helps enhance human creativity, advance scientific discovery and medical research, and enable hundreds of millions of people to improve their daily lives,” OpenAI’s spokesperson said. “Our models empower innovation, and are trained on publicly available data and grounded in fair use.”



Not just Switch 2: ESA warns Trump’s tariffs will hurt the entire game industry

This morning’s announcement that Nintendo is delaying US preorders for the Switch 2 immediately increased the salience of President Trump’s proposed wide-reaching import tariffs for millions of American Nintendo fans. Additionally, the Entertainment Software Association—a lobbying group that represents the game industry’s interests in Washington—is warning that the effects of Trump’s tariffs on the gaming world won’t stop with Nintendo.

“There are so many devices we play video games on,” ESA senior vice president Aubrey Quinn said in an interview with IGN just as Nintendo’s preorder delay news broke. “There are other consoles… VR headsets, our smartphones, people who love PC games; if we think it’s just the Switch, then we aren’t taking it seriously.

“This is company-agnostic, this is an entire industry,” she continued. “There’s going to be an impact on the entire industry.”

While Trump’s tariff proposal includes a 10 percent tax on imports from pretty much every country, it also includes a 46 percent tariff on Vietnam and a 54 percent total tariff on China, the two countries where most console hardware is produced. Quinn told IGN that it’s “hard to imagine a world where tariffs like these don’t impact pricing” for those consoles.

More than that, though, Quinn warns that massive tariffs would tamp down overall consumer spending, which would have knock-on effects for game industry revenues, employment, and research and development investment.

“Video game consoles are sold under tight margins in order to reduce the barrier to entry for consumers,” the ESA notes in its issue page on tariffs. “Tariffs mean that the additional costs would be passed along to consumers, resulting in a ripple effect of harm for the industry and the jobs it generates and supports.”

Not just a foreign problem

The negative impacts wouldn’t be limited to foreign companies like Nintendo, Quinn warned, because “even American-based companies, they’re getting products that need to cross into American borders to make those consoles, to make those games. And so there’s going to be a real impact regardless of company.”



Federal funding freeze endangers climate-friendly agriculture progress

For decades, environmental and farm groups pushed Congress, the USDA, and farmers to adopt new conservation programs, but progress came in incremental steps. With each Farm Bill, some lawmakers threaten to whittle down conservation programs, but the programs have largely managed to survive and even expand.

The country’s largest farm lobby, the American Farm Bureau Federation, had long denied the realities of climate change, fighting against climate action and adopting official policy positions that question the scientific consensus that climate change is human-caused. Its members—the bulk of American farmers—largely adhered to the same mindset.

But as the realities of climate change have started to hit American farmers on the ground in the form of more extreme weather, and as funding opportunities have expanded through conservation and climate-focused programs, that mindset has started to shift.

“They were concerned about what climate policy meant for their operations,” Bonnie said. “They felt judged. But we said: Let’s partner up.”

The Trump administration’s rollbacks and freezes threaten to stall or undo that progress, advocacy groups and former USDA employees say.

“We created this enormous infrastructure. We’ve solved huge problems,” Bonnie added, “and they’re undermining all of it.”

“It took so long,” Stillerman said. “The idea that climate change was happening and that farmers could be part of the solution, and could build more resilient farming and food systems against that threat—the IRA really put dollars behind that. All of that is at risk now.”

Burk says he plans to continue with conservation and carbon-storing practices on his Michigan farm, even without conservation dollars from the USDA.

But, he says, many of his neighboring farmers likely will stop conservation measures without the certainty of government support.

“So many people are struggling, just trying to figure out how to pay their bills, to get the fuel to run their tractors, to plant,” he said. “The last thing they want to be doing is sitting down with someone from NRCS who says, ‘If I do these things, maybe I’ll get paid in a year.’ That’s not going to happen.”

This story originally appeared on Inside Climate News.



Wealthy Americans have death rates on par with poor Europeans

“The findings are a stark reminder that even the wealthiest Americans are not shielded from the systemic issues in the US contributing to lower life expectancy, such as economic inequality or risk factors like stress, diet or environmental hazards,” lead study author Irene Papanicolas, a professor of health services, policy and practice at Brown, said in a news release.

The study looked at health and wealth data of more than 73,000 adults across the US and Europe who were 50 to 85 years old in 2010. There were more than 19,000 from the US, nearly 27,000 from Northern and Western Europe, nearly 19,000 from Eastern Europe, and nearly 9,000 from Southern Europe. For each region, participants were divided into wealth quartiles, with the first being the poorest and the fourth being the richest. The researchers then followed participants until 2022, tracking deaths.

The US had the largest survival gap between its poorest and wealthiest quartiles of any region studied. America’s poorest quartile also had the lowest survival rate of all groups, including the poorest quartiles in all three European regions.

While less access to health care and weaker social structures can explain the gap between the wealthy and poor in the US, it doesn’t explain the differences between the wealthy in the US and the wealthy in Europe, the researchers note. There may be other systemic factors at play that make Americans uniquely short-lived, such as diet, environment, behaviors, and cultural and social differences.

“If we want to improve health in the US, we need to better understand the underlying factors that contribute to these differences—particularly amongst similar socioeconomic groups—and why they translate to different health outcomes across nations,” Papanicolas said.



Nvidia confirms the Switch 2 supports DLSS, G-Sync, and ray tracing

In the wake of the Switch 2 reveal, neither Nintendo nor Nvidia has gone into any detail at all about the exact chip inside the upcoming handheld—technically, we are still not sure what Arm CPU architecture or what GPU architecture it uses, how much RAM we can expect it to have, how fast that memory will be, or exactly how many graphics cores we’re looking at.

But interviews with Nintendo executives and a blog post from Nvidia did at least confirm several of the new chip’s capabilities. The “custom Nvidia processor” has a GPU “with dedicated [Ray-Tracing] Cores and Tensor Cores for stunning visuals and AI-driven enhancements,” writes Nvidia Software Engineering VP Muni Anda.

This means that, as rumored, the Switch 2 will support Nvidia’s Deep Learning Super Sampling (DLSS) upscaling technology, which upscales a lower-resolution image into a higher-resolution one with less of a performance impact than native rendering and less loss of quality than traditional upscaling methods. For Switch 2 games that render at 4K, or at 1080p and 120 FPS, DLSS will likely be what makes that possible.

The other major Nvidia technology supported by the new Switch is G-Sync, which prevents screen tearing when games are running at variable frame rates. Nvidia notes that G-Sync is only supported in handheld mode and not in docked mode, which could be a limitation of the Switch dock’s HDMI port.



Monkeys are better yodelers than humans, study finds

Monkey see, monkey yodel?

That’s how it works for humans, but when it comes to the question of yodeling animals, it depends on how you define yodeling, according to bioacoustician Tecumseh Fitch of the University of Vienna in Austria, who co-authored this latest paper. Plenty of animal vocalizations use repeated sudden changes in pitch (including birds), and a 2023 study found that toothed whales can produce vocal registers through their noses for echolocation and communication.

There haven’t been as many studies of vocal registers in non-human primates, but researchers have found, for example, that the “coo” call of the Japanese macaque is similar to a human falsetto; the squeal of a Syke monkey is similar to the human “modal” register; and the Diana monkey produces alarm calls that are similar to “vocal fry” in humans.

It’s known that non-human primates have something humans have lost over the course of evolution: very thin, light vocal membranes just above the vocal folds. Scientists have pondered the purpose of those membranes, and a 2022 study concluded that this membrane was crucial for producing sounds. The co-authors of this latest paper wanted to test their hypothesis that the membranes serve as an additional oscillator to enable such non-human primates to achieve the equivalent of human voice registers. That, in turn, would render them capable in principle of producing a wider range of calls—perhaps even a yodel.

The team studied many species, including black and gold howler monkeys, tufted capuchins, black-capped squirrel monkeys, and Peruvian spider monkeys. They took CT scans of excised monkey larynxes housed at the Japan Monkey Center, as well as two excised larynxes from tufted capuchin monkeys at Kyoto University. They also made live recordings of monkey calls at the La Senda Verde animal refuge in the Bolivian Andes, using non-invasive EGG to monitor vocal fold vibrations.



AI #110: Of Course You Know…

Yeah. That happened yesterday. This is real life.

I know we have to ensure no one notices Gemini 2.5 Pro, but this is ridiculous.

That’s what I get for trying to go on vacation to Costa Rica, I suppose.

I debated waiting for the market to open to learn more. But fit, we ball.

Also this week: More Fun With GPT-4o Image Generation, OpenAI #12: Battle of the Board Redux and Gemini 2.5 Pro is the New SoTA.

  1. The New Tariffs Are How America Loses. This is somehow real life.

  2. Is AI Now Impacting the Global Economy Bigly? Asking the wrong questions.

  3. Language Models Offer Mundane Utility. Is it good enough for your inbox yet?

  4. Language Models Don’t Offer Mundane Utility. Why learn when you can vibe?

  5. Huh, Upgrades. GPT-4o, Gemini 2.5 Pro, and we partly have Alexa+.

  6. On Your Marks. Introducing PaperBench. Yes, that’s where we are now.

  7. Choose Your Fighter. How good is ChatGPT getting?

  8. Jevons Paradox Strikes Again. Compute demand is going to keep going up.

  9. Deepfaketown and Botpocalypse Soon. The only answer to a bad guy with a bot.

  10. They Took Our Jobs. No, AI is not why you’ll lose your job in the short term.

  11. Get Involved. Fellowships, and the UK AISI is hiring.

  12. Introducing. Zapier releases its MCP server, OpenAI launches AI Academy.

  13. In Other AI News. Google DeepMind shares 145 page paper, but no model card.

  14. Show Me the Money. The adventures of the efficient market hypothesis.

  15. Quiet Speculations. Military experts debate AGI’s impact on warfare.

  16. The Quest for Sane Regulations. At what point do you just give up?

  17. Don’t Maim Me Bro. Further skepticism that the MAIM assumptions hold.

  18. The Week in Audio. Patel on Hard Fork, Epoch employees debate timelines.

  19. Rhetorical Innovation. As usual it’s not going great out there.

  20. Expect the Unexpected. What are you confident AI won’t be able to do?

  21. Open Weights Are Unsafe and Nothing Can Fix This. Oh no, OpenAI.

  22. Anthropic Modifies its Responsible Scaling Policy. Some small changes.

  23. If You’re Not Going to Take This Seriously. I’d prefer if you did?

  24. Aligning a Smarter Than Human Intelligence is Difficult. Debating SAEs.

  25. Trust the Process. Be careful exactly how much and what ways.

  26. People Are Worried About AI Killing Everyone. Elon Musk again in brief.

  27. The Lighter Side. Surely you’re joking, Mr. Human. Somehow.

The new ‘Liberation Day’ tariffs are suicidal insanity. Congress must act to revoke executive authority in such matters and reverse this lunacy before it is too late. When you realize how the tariffs were calculated, it’s even crazier.

Tyler Cowen: This is perhaps the worst economic own goal I have seen in my lifetime.

Non Opinion Haver: The bad news is prices will go up on everything but the good news is that domestic manufacturing will also tank.

German Vice Chancellor Robert Habeck: Last night’s decision is comparable to the war of aggression against Ukraine… The magnitude and determination of the response must be commensurate.

Yaroslav Trofimov: The unpredictability of America for the foreseeable future is more serious than the tariffs themselves. Companies can’t make long-term decisions if policy is made and changed on a whim. Much of the rest of the world will respond by creating U.S.-free supply chains.

This hurts even American manufacturing, because we are taxing imports of the components and raw materials we will need, breaking our supply chains and creating massive uncertainty. We do at least exempt a few inputs like copper, aluminum and steel (and importantly for AI, semiconductors), so it could be so much worse, but it is still unbelievably awful.

If we were specifically targeting only the particular final manufactured goods we want to ensure get made in North America for security and competitiveness reasons, and it had delays in it to set expectations, avoid disruptions and allow time to physically adjust production, I would still hate it but at least it would make some sense. If it was paired with robust deregulatory actions I might even be able to respect it.

If we were doing actual ‘reciprocal tariffs’ where we set our tariff rate equal to their tariff rate, including 0% if theirs was 0%, I would be actively cheering. Love it.

This is very much not any of that. We know exactly what formula they actually used, which was, and this is real life: (imports - exports)/imports. That’s it. I’m not kidding. They actually think that every bilateral trade relationship where we have a trade deficit means we are being done wrong and it must be fixed immediately.
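Since the formula really is that simple, you can write it down in a few lines. A minimal sketch of the reported calculation (the halving step, the 10% floor, and the illustrative trade figures are assumptions drawn from public reconstructions, not an official specification):

```python
# Hypothetical reconstruction of the reported "reciprocal tariff" arithmetic:
# half of (imports - exports) / imports, floored at a 10% baseline.

def reciprocal_tariff(imports: float, exports: float, floor: float = 0.10) -> float:
    """Tariff rate implied by the reported deficit-ratio formula."""
    if imports <= 0:
        return floor
    deficit_ratio = (imports - exports) / imports
    return max(floor, deficit_ratio / 2)

# Illustrative figures (roughly 2024 US-China goods trade, in $B):
# deficit ratio of ~0.67, halved to ~34%.
rate = reciprocal_tariff(438.9, 143.5)

# A balanced (or surplus) trade relationship falls back to the 10% baseline:
baseline = reciprocal_tariff(100.0, 100.0)
```

Note that a balanced or surplus relationship never drops below the floor, which matches the across-the-board 10% baseline on essentially every country.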

I’m sure both that many others will explain all this in detail if you’re curious, and also if you’re reading this you presumably already know.

You also doubtless know that none of what those justifying, defending or sanewashing these tariffs are saying is how any of this works.

The declines we are seeing in the stock market reflect both that a lot of this was previously priced in, and also that the market is still putting some probability on all of this being walked back from the brink somehow. And frankly, despite that, the market is underreacting here. The full effect is much bigger.

Dr Radchenko: The fact that markets have not yet melted down speaks to human optimism. Reminds me of that Soviet joke:

What’s the difference between a pessimist and an optimist?

A pessimist is he who says: “It just can’t get any worse!”

Whereas an optimist says: “Come on, of course it can!”

It doesn’t look good.

I mention this up top in an AI post despite all my efforts to stay out of politics, because in addition to torching the American economy and stock market and all of our alliances and trade relationships in general, this will cripple American AI in particular.

That’s true even if we didn’t face massive retaliatory tariffs. That seems vanishingly unlikely if America stays the course. One example is that China looks to be targeting the exact raw materials that are most key to AI as one of its primary weapons here.

American tech companies, by the time you read this, have already seen their stocks pummeled, their business models and abilities to invest severely hurt. Our goodwill, trust and free access to various markets for such tech is likely evaporating in real time. This is how you get everyone around the world looking towards DeepSeek. You think anyone is going to want to cooperate with us on this now?

And remember again that what you see today is only how much worse this was than expectations – a lot of damage had already been priced in, and everyone is still hoping that this won’t actually stick around for that long.

We partially dodged one bullet in particular, good job someone, but this is only a small part of the problem:

Ryan Peterson: The Taiwan tariffs at 32% are massive BUT there is a carve out that they don’t apply to semiconductors.

•Some goods will not be subject to the Reciprocal Tariff. These include: (1) articles subject to 50 USC 1702(b); (2) steel/aluminum articles and autos/auto parts already subject to Section 232 tariffs; (3) copper, pharmaceuticals, semiconductors, and lumber articles; (4) all articles that may become subject to future Section 232 tariffs; (5) bullion; and (6) energy and other certain minerals that are not available in the United States.

Semiconductors including GPUs are exempt from the new 32% duties on Taiwan, but graphics cards (which contain GPUs) are not.

If nothing changes those supply chains will have to re-arrange, either to do assembly in the US or to set up GPU clouds outside the US.

That’s right. We are still putting a huge tax on GPUs. Are we trying to lose the future?

Ryan Peterson also notes that they intend to universally kill the ‘de minimis’ exemption, which isn’t directly related to AI but is a terrible idea if they ever try to actually implement it; he also points out that this will have dramatic secondary effects.

If we’re going to build America’s AI policy around America ‘winning,’ the least we can do is not shoot ourselves in the foot. And also everywhere else.

The other reason to mention this up top is, well…

For all the haters out there: the net impact of AI so far might now be massively negative, so honestly this might be a pretty great call by the haters.

Rohit: This might be the first large-scale application of AI technology to geopolitics.. 4o, o3 high, Gemini 2.5 pro, Claude 3.7, Grok all give the same answer to the question on how to impose tariffs easily.

I think the fact they’re using LLMs is good but they def need better prompt engineers I think.

This is now an AI safety issue.

This is Vibe Governing.

cc @ESYudkowsky you might have had a point about us handing over decision making authority at the drop of a hat. They found something that can speak in complete sentences and immediately …

Derek Thompson: Approaching, and surpassing, frightening levels of Veep.

Eli Dourado: First recession induced by an unaligned AI.

Roon: one surprising thing about the distribution of ai is that you are using the same tools as the presidents cabinet. it’s all bottom up and unsophisticated schoolchildren harness godlike powers. most people like to believe (maybe hope?) there are shadowy rooms with secret technology.

If that’s actually how this went down, and to be clear I remain hopeful that it probably didn’t go down in this way, then it’s not that the AIs in question are not aligned. It’s that the AIs are aligned to the user, and answered the literal question asked, without sufficient warnings not to do it. And for some people, no warnings will matter.

This might be a good illustration of, ‘yes you could have found this information on your own and it still might be catastrophically bad to output it as an answer.’

I can’t believe this is actually real, but we solved the puzzle, then they actually admitted it, and yes this looks like it is the calculation the Actual Real White House is doing that is about to Crash the Actual Real Global Economy. Everyone involved asked the wrong question, whether or not the entity answering it was an AI, and more importantly they failed to ask any reliable source the follow-up.

That follow-up question is ‘what would happen if we actually did this?’

I very much encourage everyone involved to ask that question next time! Please, please ask that question. You need to ask that question. Ask economists. Also type it into the LLM of your choice. See what they say.

Also, it’s pretty funny the extent to which you can tell Gemini this happened and it completely and utterly refuses to believe you, right up until the censors delete the answer.

Gemini’s next chain of thought included ‘steer conversation back to something more realistic.’ Alas.

To be fair, technically, the White House Press Secretary said no, that’s not the formula, the formula they used included two other terms. However, those terms cancel out. This is real life.

It is impossible to talk to any frontier LLM about this and not have it be screaming at you how horrible an idea this is. Claude even nailed the recession probability at 45%-50% in 2025 (on Polymarket it is at 49% as I type this) given only this one data point and what is otherwise a January data cutoff (it can’t search the web while I’m in Costa Rica).

To be clear, it seems unlikely this was actually the path through causal space that got us the tariffs we got. But it’s scary the extent to which I cannot rule it out.

Timothy Lee thinks Shortwave’s AI assistant is getting good, in particular by not giving up if its first search fails. I’m considering giving it a shot.

AI is highly useful in fighting denials of insurance claims. Remember to ask it to respond as a patio11-style dangerous professional.

Nabeel Qureshi runs an experiment with four Odyssey translations. Three are classic versions, one is purely one-shot GPT-4o minus the em-dashes, and at 48% the AI version was by far the most popular choice. I am with Qureshi in thinking Fitzgerald had the best version if you aren’t optimizing for fidelity to the original text (since that’s Greek to me), but I went in spoiled, so it isn’t fully fair.

Good prompting is very much trial and error. If you’re not happy with the results, mutter ‘skill issue’ and try again. That’s in addition to the usual tips, like providing relevant context and giving specific examples.

Dartmouth runs a clinical trial of “Therabot,” and the results are spectacular, although at N=106 I wouldn’t get overexcited yet.

Amy Wu Martin: Dartmouth just ran the first clinical trial with a generative AI therapy chatbot—results: depression symptoms dropped 51%, anxiety by 31%, and eating disorders by 19%.

“Therabot,” built on Falcon and Llama, was fine-tuned with just tens of thousands of hours of synthetic therapist-patient dialogue.

Imagine what can be done training with millions of hours of real therapy data 👀

Researchers say these results are comparable to therapy with a human. We need to scale up both the trial’s size and duration and also the training data, and be on the lookout for possible downsides, but it makes sense this would work, and it’s a huge deal. It would be impossible to provide this kind of human attention to everyone who could use it.

For now they say things like ‘there is no substitute for human care’ but within a few years this will be reliably better than most human versions. If nothing else, being able to be there when the patient needs it, always ready to answer, never having to end the session for time, is an epic advantage.

The correct explanation of why you need to learn to code:

Austen Allred: You don’t need to learn how to code. You just need to be able to tell a computer what to do in a way that it will respond, understand what it’s doing and how to optimize that, and fix it when it’s not working.

Gemini accuses Peter Wildeford of misinformation for asking about recent news, in this case xAI acquiring Twitter.

A classic vibe coding case, but is a terrible version of something still better than nothing at all? It can go either way.

Gemini 2.5 Pro is now available to all users on the Gemini app, for free, with rate limits and a smaller context window if you aren’t subscribed, or you can get your first subscription month free.

This is the first move in a while that is part of what an actual marketing effort would do. They still have to get the word out, but it’s a start.

Gemini 2.5 Pro also adds access to Canvas. The Gemini API offers function calling.

OpenAI updated GPT-4o.

OpenAI: GPT-4o got another update in ChatGPT!

What’s different?

– Better at following detailed instructions, especially prompts containing multiple requests

– Improved capability to tackle complex technical and coding problems

– Improved intuition and creativity

– Fewer emojis 🙃

Altman claims it’s a big upgrade. I don’t see anyone else talking about it.

I still think of GPT-4o as an image model at this point. If an upgrade was strong enough to overcome that, I’d expect the new model to be called something else. This did cause GPT-4o to jump to #2 on Arena ahead of GPT-4.5 and Grok, still 35 points behind Gemini 2.5.

Gemini 2.5 now available in Cursor.

Alexa+ launched on schedule, but is missing some features for now, some to be delayed for two months. At launch, it can order an Uber, but not GrubHub, and you can’t chat with it on the web, unless you count claude.ai. It sounds like things are not ready for Amazon prime time yet.

Claude simplifies its interface screen.

Pliny the Liberator suggests: “write a prompt for insert-query-here then answer it”

OpenAI releases PaperBench, tasking LLMs with replicating top 2024 ICML papers: understanding each paper, writing code, and executing experiments. A great idea, although I worry about data contamination, especially given they are open sourcing the eval. Is it crazy to think that you want to avoid open sourcing evals for this reason?

OpenAI: We evaluate replication attempts using detailed rubrics co-developed with the original authors of each paper.

These rubrics systematically break down the 20 papers into 8,316 precisely defined requirements that are evaluated by an LLM judge.

We evaluate several frontier models on PaperBench, finding that the best-performing tested agent, Claude 3.5 Sonnet (New) with open-source scaffolding, achieves an average replication score of 21.0%.

Finally, we recruit top ML PhDs to attempt a subset of PaperBench, finding that models do not yet outperform the human baseline.

We are open sourcing PaperBench, full paper here.

Janus: Top ML phds is a high bar!

They did not include Gemini 2.5 Pro or Claude Sonnet 3.7, presumably they came out too recently, but did include r1 and Gemini 2.0 Flash:

Humans outperform the models if you give them long enough, but not by much.

In other testing news, do the Math Olympiad claims hold up? Zain shows us what happened when LLMs were tested on the 2025 Math Olympiad, fresh off the presses, and there were epic fails everywhere (each problem is out of 7 points, so the maximum is 42, and the average score of human participants is 15.85)…

Zain: they tested sota LLMs on 2025 US Math Olympiad hours after the problems were released Tested on 6 problems and spoiler alert! They all suck -> 5%

…except Gemini 2.5 Pro, which came out the same day as the benchmark, so they ran that test and got 24.4% by acing problem 1 and getting 50% on problem 4.

That may be the perfect example of ‘if the models are failing, give it a minute.’

It’s surprising it took this long: here is the start of a Wordle benchmark. Someone should formalize this; you could easily run 1,000 words or so.

Xeophon: On today’s Wordle, the new Gemini model completely crushed the competition. It logically deduced diverse words, found the correct spots of valid and invalid letters, and got a result quickly. Sonnet proposed multiple invalid words in the end, so DNF.

Also, fun: I said *you* did it, Gemini responded that *we* got there. Thanks, Gemini!

It is rather easy to see that GPT-4.5 and Sonnet 3.7 played badly here. Why would you ever have an R in 2nd position twice? Whereas Gemini 2.5 played well. Xeophon says lack of vision tripped up Sonnet, obviously you could work around that if you wanted.
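If someone does formalize this, the core scoring routine is small. A minimal sketch of Wordle feedback (the function name and the `g`/`y`/`x` codes for green, yellow, and gray are my own), handling duplicate letters the way Wordle does:

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """Per-letter feedback: g = green, y = yellow, x = gray."""
    feedback = ["x"] * len(guess)
    remaining = Counter()
    # First pass: mark greens and count unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark yellows against the remaining letter counts,
    # so duplicate guess letters aren't over-credited.
    for i, g in enumerate(guess):
        if feedback[i] == "x" and remaining[g] > 0:
            feedback[i] = "y"
            remaining[g] -= 1
    return "".join(feedback)
```

Running this over, say, 1,000 answer words and counting guesses-to-solve per model would give the formalized benchmark the text asks for.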

Oh, it’s on, send in the new challenger: Gemini Plays Pokemon.

Janus has expressed more disdain for benchmarks than any other person I know, so here’s what Janus says would be an actually good benchmark.

Janus: Difficult-to-goodhart LLM benchmarks I think should get more mindshare:

What does it feel like to be deeply entangled with it? How does it affect your life? Think back to if you’ve ever had an LLM become a major part of your reality. In my experience, each one feels deeply different. Also, it changes the distribution of things you think about and do.

Also this but for the world: How does it change the atmosphere and focus and direction of innovation of the world when many people are interacting with it and talking about it? Again, in my experience, this varies in intricate ways per model.

This is what really matters.

I want to hear accounts of this. In retrospect.

That is certainly a very different kind of benchmark. It tells us a different style of thing. It is not so helpful in charting out many aspects of the big picture, but seems highly useful in figuring out what models to use when. That’s important too. Also the answers will be super interesting, and inspire other things one might want to do.

GPT-4.5 causes humans to fail a 5-minute 4-question version of the Turing Test, winning far over 50% of the time, although notice that even ‘Llama-Persona’ does well here too. The test would be better with longer interactions, which is actually a place where GPT-4.5 relatively shines. And of course we’d all like to see Gemini 2.5 Pro and Claude 3.7 Sonnet as part of any such test.

Stephanie Palazzolo: This isn’t an April Fools joke: ChatGPT revenue has surged 30% in just three months.

Gallabytes: this feels in line with my sense of the quality of the product. 4o actually got good? not just the image stuff the normal model too. deep research is great. o1 and 4.5 are good premium offerings. they filled out the product pretty well.

Jared Johnson: For the first time ever I can just stay in ChatGPT all day and not miss out on anything from Claude, Perplexity, Midjourney, or Gemini. They really have stepped it up.

The weak form of this is very true. ChatGPT is a much better offering than it was a few months ago. You get o1, o3 and 4.5, deep research and 4o’s image generation, and 4o has had a few of its own upgrades.

However I find the strong form of this response rather strange. You are very much missing out if you only use ChatGPT, even if you are paying the $200 a month. And I think most people are substantially better off with $60 a month split between Gemini, Claude and ChatGPT than they would be paying the $200, especially now with Gemini 2.5 Pro available. The exception is if you really want those 120 deep research queries, but I’d be inclined to take 2.5 Pro over o1 Pro.

What should you use for coding now?

Gallabytes: I’m kinda liking [Gemini 2.5] better than sonnet so far but not for everything. Seems better when it has the right context but worse at finding it, less relentless but also less likely to add error handling to your unit tests.

Reactions seem to strongly endorse a mix of Gemini and Claude as the correct strategy for coding right now.

Alexander Doria: Every time I try to redo a classic prompt-engineering gen AI app, I get reminded that:

  1. OpenAI and Anthropic are still ahead.

  2. You never get to the last miles without finetuning.

While people hammered Nvidia’s stock prior to ‘liberation day,’ the biggest launches of AI that happened at the same time, Gemini 2.5 and GPT-4o image generation, were both capacity constrained despite coming from two of the biggest investors in compute. As was Grok, and of course Claude is always capacity constrained.

Demand for compute is only going to go up as compute use gets more efficient. We are nowhere near any of the practical limits of this effect. Plan accordingly.

Could AI empower automatic community notes? This seems like a great idea. If a Tweet gets a combination of sufficiently many hits and requests for a fact check, you can do essentially a Deep Research run, and turn it into a proposed community note, which could be labeled as such (e.g. ‘Community Note Suggested by Grok.’) Humans can then use the current system to rate and approve the proposals.

Alternatively, I like the idea of an AI running in the background, and if any post would get a community note from my AI, it alerts me to this, and then I can choose whether to have it formalize the rest. Lot of options to explore here.
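A sketch of the trigger-and-draft pipeline described above (the thresholds, field names, and the `research` hook are all invented for illustration):

```python
from dataclasses import dataclass

# Hypothetical thresholds; the post only sketches the idea.
HIT_THRESHOLD = 100_000   # views before a note is considered
REQUEST_THRESHOLD = 50    # fact-check requests before a note is considered

@dataclass
class Post:
    text: str
    hits: int = 0
    fact_check_requests: int = 0

def needs_note(post: Post) -> bool:
    """Trigger only on the combination of reach and explicit requests."""
    return (post.hits >= HIT_THRESHOLD
            and post.fact_check_requests >= REQUEST_THRESHOLD)

def propose_note(post: Post, research) -> str:
    """`research` stands in for a Deep-Research-style run over the claim.
    The output is labeled as AI-suggested, then enters the normal
    human rating-and-approval pipeline."""
    return f"Community Note Suggested by Grok: {research(post.text)}"
```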

Don’t worry, [Professor’s Name] can’t do anything without proof.

Matty Matt: Jesus Christ.

TW: I’m so disappointed. This is completely out of character for [student’s name]

Ducker Trucker: People love to talk about how it’s not doing the work for them, it’s just a tool, why do you hate progress and then reveal they don’t even take 10 seconds to read what it typed for them.

For a while, we have had a severe ‘actually you are playing against bots’ problem.

Grant Slatton: was playing a silly multiplayer web game a few weeks ago (an agario clone) and after a few minutes realized 95% of the players were bots

completely lost interest, since most of the appeal was competing with other humans

lesson there.

there is real multiplayer, i think the server just populates the game with bots when the real player count is low

was like 5 humans and 95 bots.

This is mostly not about LLMs or even decent AI, and far more about:

  1. The new player experience often wants to be scripted.

  2. People want to be matched instantly.

  3. People want to win far more than their fair share (typically ~60% is good).

  4. People often need a sandbox in which to skill up.
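Points 1 through 4 above translate into fairly simple backfill logic. A toy sketch (the target win rate is from the list above; the difficulty formula is invented):

```python
TARGET_WIN_RATE = 0.60  # per the list above: ~60% feels good to players

def fill_lobby(humans, lobby_size, player_win_rate):
    """Backfill a thin lobby with bots, tuning bot strength so the
    player's long-run win rate drifts toward the target. Illustrative only."""
    lobby = [("human", name) for name in humans[:lobby_size]]
    # Weaker bots when the player is under target, stronger when over it.
    difficulty = 0.5 + (player_win_rate - TARGET_WIN_RATE)
    difficulty = max(0.1, min(0.9, difficulty))
    while len(lobby) < lobby_size:
        lobby.append(("bot", difficulty))
    return lobby
```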

A common phenomenon is that ‘the bots’ or AI in a game could be made much better with moderate effort, and either this is judged not worth bothering or actively avoided.

There are definitely games where this is massively overdone and it does harm. One prominent example is Marvel Snap. Players are forced to go through quite a lot of bots, even when there are plenty of opponents available, and a lot of your net profits come from bot exploitation. So there is a lot of optimization pushing towards ‘figure out if it is a bot and get a +4 from the bots, while making sure you’re not losing -4 or -8 against humans.’ That is not the way to make the game fun. Oy.

It’s happening department:

Near Cyan: ~50 people have DM’d me saying [Auren] made them cry and rethink their life

by talking to a bunch of LLMs for a few hours.

superhuman persuasion is real and we hope to showcase this in the most ethical way possible to set a good example and demonstrate that which is to come.

unfortunately the app is also a schizo magnet like no other so my account is kinda done for. many such cases i guess.

Gallabytes: I think calling it “superhuman” is false but that isn’t actually important, this is a good illustration of what the onramp to the singularity looks like. Industrial production for the service industry, the true promise of software finally unlocked.

Is Claude better than the best human therapists? no certainly not. I can definitely do better, I know a few others who could too. But it’s intensely g-loaded and even if they didn’t have other careers available there just wouldn’t be enough of them to sate demand.

So it goes with teachers, administrators, staff engineers, management consultants, accountants, etc. The economic returns kick in well before truly superhuman ability, and often before they can even displace the best human practitioners.

I agree that this is not ‘superhuman’ persuasion (or ‘super persuasion’) yet, and I agree that this is not important. You mostly don’t need it. Things get even weirder once you do get it, and it is absolutely coming, but the ability to do ‘merely kind of human’ level persuasion has a long track record of doing quite a lot of persuading. Indeed, one could say it is the only thing that does.

Also the super persuasion is coming, it just isn’t here yet. Obviously a sufficiently capable future LLM will be super human at persuasion. I, alas, do not possess superhuman persuasion, and have run out of methods of convincing people who blindly insist it can’t be done.

Don’t worry, it’s not so persuasive, half of people wouldn’t help their AI assistant if it asked nicely yet, it’s fine (see link for full poll question from Jan Kulveit).

During the early phase of them taking our jobs, this theory makes sense to me, too:

Dean Ball: I do not expect widespread, mass layoffs due to AI in the near term.

But I worry that as knowledge workers exit firms, they won’t be replaced. This may hit young people seeking to enter knowledge work fields especially hard. Their job prospects by the late 2020s could be dim.

Peter Wildeford: This theory of AI displacement mainly being in not rehiring workers that leave makes sense to me.

Here’s a good FRED data series to track this: total Professional and Business Services job openings.

No sign of AI yet.

That doesn’t mean that there will be widespread unemployment during this phase. Many roles will cut back largely via attrition. If people leave or need to be fired, they don’t get replaced. Those that don’t get hired then go to other roles.

Eventually, if we continue down this path, we start running out of these other roles, because AI is doing them, too, and workers start getting outright fired more often.

Pivotal Fellowship for Q3, June 30 – August 29 in London.

The UK AISI is hiring again. I think this is a clearly great thing to do.

Foundation for American Innovation and Samuel Hammond are launching a conservative AI policy fellowship, apply by April 30, runs June 13 – July 25. I agree with Dean Ball, if you are conservative and interested in AI policy this is as good an opportunity as you are going to find.

Shortwave AI is hiring, if you do it tell them I sent you, I need better inbox tools.

Zapier releases its MCP server, letting you connect essentially anything, to your Cursor agent or otherwise. Paul Graham retweeted Brendan giving the maximalist pitch here:

Brendan: Zapier just changed the game for AI builders.

You can now connect your AI assistant to 8,000+ apps with zero API integration.

No OAuth struggles. No glue code. No custom backends.

Just one URL = real-world actions.

Here’s why Zapier MCP might be the most important AI launch this month.

AI can talk all day. But now it can do things.

Zapier just launched MCP (Model Context Protocol), a secure bridge between your AI and 30,000+ real actions across the tools you use.

Slack, Google Sheets, HubSpot, Notion — all instantly accessible.

Here’s how it works:

• Generate your MCP endpoint

• Choose what your AI can do (fine-grained control)

• Plug it into any AI interface: Cursor, Claude, Windsurf, etc.

Boom: Your AI can now send emails, schedule meetings, update records.

[start here]
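Under the hood, MCP speaks JSON-RPC 2.0: the client asks the endpoint which tools it exposes (`tools/list`), then invokes one (`tools/call`). A minimal sketch of those two messages (real Zapier MCP endpoints are per-user URLs, and the tool name and arguments below are invented):

```python
import json

def rpc_request(method: str, params: dict, msg_id: int) -> str:
    """Build a JSON-RPC 2.0 request, the wire format MCP uses."""
    return json.dumps({"jsonrpc": "2.0", "id": msg_id,
                       "method": method, "params": params})

# Ask the endpoint which of the user's configured actions are available.
list_tools = rpc_request("tools/list", {}, 1)

# Invoke one action; the name and arguments are hypothetical examples.
send_email = rpc_request("tools/call", {
    "name": "gmail_send_email",
    "arguments": {"to": "someone@example.com", "subject": "hello"},
}, 2)
```

The "fine-grained control" in the pitch maps to which tools the endpoint returns from `tools/list`.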

TxGemma, a more general, open and scalable set of AI models from DeepMind to improve the drug development process.

OpenAI gives us OpenAI Academy, teaching AI basics for free.

Google DeepMind shares its 145-page paper detailing its approach to AGI safety and security strategy. Kudos to them for laying it all out there. I expect to cover this in detail once I have a chance to read it.

However, Shakeel asks where Gemini 2.5 Pro’s system card is, and notes that Google promised to publish the relevant information. There’s still no sign of it.

With the launch of Alexa+, Amazon Echoes will no longer offer you the option to not send your voice recordings to the cloud. If you don’t like it, don’t use Echoes.

LSE partners with Anthropic to essentially give everyone Claude access I think?

Frontier Model Forum announces a first-of-its-kind agreement to facilitate information sharing about threats, vulnerabilities and capability advances unique to frontier AI. The current members are Amazon, Anthropic, Google, Meta, Microsoft, and OpenAI.

Anthropic releases more data from the Anthropic Economic Index. Changes in usage remain gradual for now, mostly augmentation with little automation.

Your reminder that you should absolutely keep any promises you make to AIs, and treat them the same way you would treat keeping promises to humans.

This is more important for promises to LLMs and other AIs than those made to your fridge, but ideally this also includes those too, along with promises to yourself, or ‘to the universe,’ and indeed any promise period. If you don’t want to make the promise, then don’t make it.

I especially don’t endorse this:

Dimitriy (I disagree): Roughly speaking, we don’t need to keep our promises to entities that would not keep theirs to us.

The efficient market hypothesis is false, buy buy buy edition.

Abe Brown: In less than a year, the startup behind Cursor has sparked tech’s buzziest buzzphrase—vibe coding—and seen its valuation go from $400M to $2.5B to possibly $10B.

It’s one of the fastest-growing startups ever.

Arfur Rock: Cursor round closed — $625M at $9.6B post led by Thrive & A16z. Accel is a new backer.

$200M ARR, up 4x from $2.5B round in November 2024.

ARR multiple constant from last round at 50x.

Jasonlk: This is the top play in venture today

Invest $50m-$200m in hottest AI company at $2.5B valuation … see it marked up 4x in < 12 months

So, so much “easier” than seed and series A …

I’m oversimplifying … but it’s true

You could invest $10m to buy 20% of a hot start-up, wait 10 years for it to be worth $1B, with dilution own 15% … and end up with $150m after 10 years

Or you could invest $50m into Cursor at $2.5B and turn that into $200m in 5 months

You make about the same.
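The back-of-envelope comparison checks out, using only numbers from the quote:

```python
# Seed path: $10M for 20%, diluted to 15% of an eventual $1B company.
seed_outcome = 0.15 * 1_000_000_000

# Growth path: $50M into a $2.5B valuation, marked at $10B months later.
growth_stake = 50_000_000 / 2_500_000_000      # 2% ownership
growth_outcome = growth_stake * 10_000_000_000

# Roughly the same dollar outcome, on wildly different timelines.
```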

All trends that cannot go on forever will stop, past performance is no guarantee of future success, et cetera, blah blah blah. Certainly one can make a case that the rising stars of AI are now overvalued. But what is also clear is that once a company is ‘marked as a star’ of sorts, the valuations go up like gangbusters every few months without the need to actually accomplish much of anything. There is a clear inefficiency here.

Take xAI. Since the point at which xAI was valued at ~$20 billion, it has been nothing but disappointing. Now it’s valued at ~$80 billion. Imagine if they’d cooked.

xAI also merged with Twitter (technically ‘X’), because Musk said so, with X’s paper value largely coming from its 25% share of xAI. As the article here notes, this has many echoes of the deal when Tesla purchased SolarCity in 2016, except the shenanigans are rather more transparent this time around. Elon Musk believes that he is special. That the rules do not apply to him. It is not obvious he is mistaken.

Take OpenAI, which has now closed a new capital raise of $40 billion, the largest private one in history, at a $300 billion valuation. In this case OpenAI has indeed done some impressive things, although also hit some roadblocks. So it wasn’t inevitable or anything. And indeed it still isn’t, because the nonprofit is still looming and the conversion to a for-profit is in potential big legal trouble.

Gallabytes: so tbc I am happy to see this but uh doesn’t this make the nonprofit selling control for 40b seem totally ludicrous? if nothing else they should at least be demanding >150b in non-voting PPUs.

OpenAI won’t get all the money unless OpenAI becomes fully for-profit by the end of 2025. With this new valuation, that negotiation gets even trickier and the fair price rises. Because the nonprofit collects its money at the end of the waterfall, the more OpenAI is worth, the greater the percentage of that the nonprofit is worth. Whoops. It is an interesting debate who gets more leverage as a result of this.
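The waterfall point can be made concrete with a toy model: investors are paid first, capped at a multiple of what they put in, and the nonprofit takes everything past the cap. The numbers and cap structure here are purely illustrative (OpenAI's actual terms are not public in detail):

```python
def nonprofit_share(total_value: float, invested: float = 100.0,
                    cap_multiple: float = 10.0) -> float:
    """Fraction of total value left for the nonprofit after capped investors.
    Toy model with made-up numbers, not OpenAI's actual terms."""
    investor_take = min(total_value, invested * cap_multiple)
    return (total_value - investor_take) / total_value

# As the company gets more valuable, the nonprofit's slice grows:
# nonprofit_share(2_000) -> 0.5, nonprofit_share(5_000) -> 0.8
```

This is why a higher headline valuation mechanically raises the fair price of buying the nonprofit out.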

Certainly I would hope the old $40 billion number has to be off the table.

The timing means this wasn’t a response to Hurricane Studio Ghibli. It seems very obvious again that when there was a raise at $160 billion, the chances of a raise at $300 billion or higher in the future were far, far above 50%.

Unusual Whales: SoftBank would provide 75% of the funding, according to a person familiar with the matter, with the remainder coming from Microsoft, Coatue Management, Altimeter Capital and Thrive Capital.

One can of course argue that SoftBank funding has long made little sense, but that’s part of the game and they got Microsoft in on it. Money is fungible, and it talks.

Gautam Mukunda writes in Bloomberg that this $40 billion is Too Much Money (TMM), and it hurts these AI companies like OpenAI and Anthropic to be burning this much cash and ‘focusing on investors’ rather than customers. This seems like a rather remarkable misunderstanding of the frontier AI labs. You think they are focused on investors? You think this is all about consumer market share? What did you think AGI meant? Vibes? Papers? Essays? Unit economics? Ghibli memes?

Isomorphic Labs raises $600 million for its quest to use AI to solve all disease.

Microsoft is halting or delaying data center investments both in America and abroad. Perhaps Satya is not good for his $80 billion after all? They did not offer an explanation, so we don’t know what they are thinking. Certainly it makes sense that if you make everyone poorer and hammer every stock price, investment will go down.

Scale AI expects to double sales to more than $2 billion in 2025.

Helen Toner kicks off her new substack with a reminder that ‘long’ timelines to advanced AI have gotten crazy short. And oh, have they.

We’d love to have social science researchers and most everyone else take AGI and its timelines seriously, and it’s great that Dwarkesh Patel is asking them, but they remain entirely uninterested in what the Earth’s future is going to look like. To be fair, they really do have quite a lot going on right now.

Military experts debate AGI’s impact on warfare. Their strongest point is that AGI is not a binary, even if Altman is talking lately as if it is one, so there isn’t some instant jump from ‘not AGI’ to ‘AGI.’ Another key observation is that pace of adaptation and diffusion matters, and a lot of the military impact comes via secondary effects, including economic effects.

I knew this already, but they emphasize that the Pentagon’s methods and timelines for new technology flat out won’t cut it, at all. The approval process won’t cut it. The number of meetings won’t cut it. Two-year cycles to even get funding won’t cut it. None of it is remotely acceptable. Even mentioning doing something ‘by 2040’ with a straight face is absurd now. Turnarounds can’t be measured in decades, and probably not even years. Speed kills. Nor will we be able to continue to play by all our Western requirements and rules and our inability to ever fail, and pay the associated extra costs in money either.

They think war over TSMC, or a preemptive strike over AI progress, seems unlikely based on their readings of history. This seems right: even if such actions would be strategically correct, it is very difficult to pull that trigger. Again, that’s largely because AGI doesn’t have a clear ‘finish line.’ The AI just keeps getting smarter and more capable, until suddenly it’s too late and you don’t matter, and perhaps no one matters, but there’s no clear line of demarcation, especially from the outside, so where do you draw that line? When can you make a credible threat?

And then you have their second problem, which is people keep coming up with reasons why the obvious results from superintelligence won’t happen, and they’ll keep doing that at least until those things are already happening. And the third problem, which is you might not know how far someone else has gotten.

I worry this is how it all goes down, far more broadly, if we are on the cusp of losing control over events. That the powers that be simply aren’t confident enough to ever pull that trigger – they don’t dare ‘not race,’ or risk hindering economic progress or otherwise messing with things, unless they are damn sure, and even then they’re worried about how it would look, and don’t want to be responsible for it.

The interview itself serves as another example of all that. It takes AI seriously, but it does not feel the AGI. When the focus is on specific technological applications, the analysis is crisp. But otherwise it all feels abstract and largely dismissive. They don’t expect all that much. And they definitely don’t seem to be expecting the unexpected, or High Weirdness. That’s a mistake. They also don’t seem to expect robotics or other transformations of physical operations, it’s not even mentioned. And in many places, it feels like they don’t anticipate what AIs can already do in other contexts. As the discussion goes long, it almost feels like they’ve managed to convince themselves AI is for better guiding precision mass and acting like the future is totally normal.

Thus the emphasis on AGI not being a binary. But there is an important binary, which is either you get into a takeoff scenario (even a ‘slow’ one), a place where you see rapid progress as AI helps you quickly build better AI, or you don’t. If you get to that first, even a modest lead could become decisive. And also there is essentially something close to a binary where you can plug AIs into person-shaped holes generally, either digitally or physically or both – it’s not all at once, but there’s a pretty quick phase shift there even if it doesn’t lead to superintelligence right away.

Yes, this does sound right:

Meredith Chen: China and the US could learn from each other when developing artificial intelligence safety protocols, but only if Washington can reimagine its rivalry-based mindset, according to a prominent Chinese AI expert.

Speaking at the Boao Forum for Asia, Zeng Yi, a member of the United Nations’ high-level AI advisory body, said the US government’s obstruction of China from international safety networks and discussions on the technology was “a very wrong decision”.

According to Zeng, the US and China must work together to jointly “strengthen safety rails” in AI. The potential for cooperation in the technology depends not only on bilateral government actions but also on grass-roots behaviours and corporate exchanges, he said.

Zeng pointed out that the Chinese government had put a lot of emphasis on “responsible AI” and was promoting fair use of the technology.

I understand and support export controls on chips. But why would you want to exclude China from international safety networks and discussions? China keeps saying it wants to engage on safety despite the export restrictions. That’s wonderful. Let’s take them up on it.

Once again here we see signs that many are aggressively updating on DeepSeek. So as usual, my note is that yes DeepSeek matters and is impressive, but people are treating it as far more impressive an accomplishment than it was.

Also correct is to notice that when ‘little tech’ comes to lobby the government, they often present themselves as libertarians looking for a free market, but their actual proposals are usually very different from that.

Adam Thierer: exactly right. So much of the “Little Tech” policy playbook is just rehashed regulatory garbage from Europe. And how did that play out over there? … No tech at all.

Sarah Oh Lam: Little Europe? Does Little Tech Really Want That? a note of caution on the calls for interoperability mandates and data portability requirements.

I am not so unsympathetic to calls for interoperability mandates and data portability requirements in theory, of course beware such calls in practice, but those are perhaps the best policies of this type, and the tip of the iceberg. For example, they also sometimes claim to be for the Digital Markets Act? What the hell? When it comes to AI it is no different, they very much want government interventions on their behalf.

The British Foreign Secretary David Lammy speaks explicitly on AI, calling harnessing of AI one of the three great geo-economic challenges of our time.

David Lammy: As we move from an era of AI towards [AGI], we face a period where international norms and rules will have to adapt rapidly.

UK, US, allied governments must “win that technological race.”

Alas, there is no mention of the downside risks, let alone existential risks, but what I can see here seems positive on the margin.

Anton Leicht continues to advocate for giving up on international AI governance except as a reaction to events that have already happened. Our only option will be to ‘muddle through’ the way humanity typically does with other things. Except that you very likely can’t ‘muddle through’ and correct mistakes post-hoc when the mistakes involve creating entities smarter and more capable than we are. You don’t get to see what happens and then adjust.

Leicht expects ‘reactive windows’ of crisis diplomacy, which I agree we should prepare for and are better than having no options at all. But it’s not adequate. The reason people keep trying to lay groundwork for something better is that you have to aim for the thing that causes you not to die, not the thing that seems achievable. There’s no point in proposals that, if successful, don’t work.

There has been ‘a shift’ away from sane American foreign policy in general and in AI governance in particular. That is a choice that was made. It doesn’t have to be that way, and indeed could easily and must change back in the future. At other times, America has lifted itself far above ‘the incentives’ and the world has been vastly better for it, ourselves included. We need to continue to prepare for and advocate for that possibility. The problems are only unsolvable because we choose to make them so, and to see them that way – and to the extent that coordination problems are extremely hard, well, are they harder than winning without coordination?

Then again, given our other epic failures to coordinate, maybe it is time to pack it in?

Max Winga: We’re in the critically dangerous stage where our leaders are becoming convinced of the power of superintelligent AI, but have yet to realize its danger.

Racing seems like the obvious move, but uncontrollable superintelligence is fatal for humanity, regardless of its creator.

If something is fatal, then you have to act like it.

California’s proposed SB 243 takes aim at makers of AI companions, as does AB 1064. SB 243 would require creators to have a protocol for handling discussions of suicide and self-harm, which seems like a fine thing to require. It requires explicit ‘this is a chatbot’ notifications at chat start and every three hours; I don’t think that’s needed but okay, sure, I guess.

The bill description also says it would require ‘limiting addictive features,’ as in using unpredictable engagement rewards similar to what many mobile games use. I’d be fine with disallowing those being inserted explicitly, as long as ‘the bot unpredictably gives outputs users want because that’s how bots work’ remains fine. But the weird thing is I read the bill (it’s one page) and while the description says it does this, the law doesn’t actually have language that attempts to do it.

Either way, I don’t think SB 243 is an urgent matter, but it’s also not a big deal.

AB 1064 would instead create a ‘statewide standards board’ to assess and regulate AI tools used by children, we all know where that leads and it’s nowhere good. Similarly, age verification laws are in the works, and those are everywhere and always privacy nightmares.

Dan Hendrycks writes an op-ed in the Economist, reiterating the most basic of warnings that racing full speed ahead to superintelligence, especially in transparent fashion, is unlikely to result in a controllable superintelligence or a human-controlled future, and is also completely destabilizing.

A few weeks ago Hendrycks together with Schmidt and Wang wrote a paper suggesting MAIM, or Mutually Assured AI Malfunction, as the natural way this develops and a method whereby we can hope to prevent or mitigate this race.

Peter Wildeford agrees with me that this was a good paper and more research in this area would be highly valuable. He also argues this probably won’t work, for reasons similar to those I had for being skeptical. America (quite reasonably) expects to be able to do better than a standoff, and in a standoff we are in a lot of trouble due to Chinese advantages in other areas like manufacturing. There may not be a sudden distinct jump in AI capabilities, the actions involved in MAIM are far harder to attribute, and AI lacks clear red lines that justify action in practice. Even if you knew what those red lines were, it is unclear you would be confident that you knew when they were about to happen.

Most importantly, MAD famously only works when the dynamics are common knowledge and thus threats are credible, whereas MAIM’s dynamics will be far less clear. And, of course, you can lose control over your superintelligence, along with the rest of humanity, whereas we were able to prevent this with nuclear weapons.

Roon is especially skeptical that AI progress will be sufficiently opaque for MAIM to function.

Roon: the core problem with “superintelligence strategy” / ai deterrence is that another country’s R&D is opaque both in inputs methods and results and isn’t analogous to the actual use of nuclear weapons (very obvious, cities turned to glass)

Dan Hendrycks: Opacity: I think the US is an open book (e.g., extortable international employees, Slack zero-days).

Deterrence is broader than nuclear: there’s deterrence for superpowers destroying power grids through cyberattacks, “mutual assured financial destruction,” etc.

Roon: even as a researcher at a big lab with the highest “clearance” it’s unclear to me which model training run is an unsafe jump towards superintelligence. the opacity is mostly scientific rather than opsec related

At current security levels, it seems likely that a foreign intelligence service will have similar visibility into AI progress at OpenAI as someone in Roon’s position, and they seem to agree on something similar. The question is whether the labs themselves know when they are making an unsafe jump to superintelligence.

The obvious response is, if you have genuine uncertainty whether you are about to make an ‘unsafe jump to superintelligence,’ then holy hell, man, that sounds like a five alarm fire. Might want to get on that question. Right now, it is likely that Roon can be confident that is not happening. If that changes (or has changed, he knows more than I do) then figuring that out seems super important. Certainly OpenAI’s security protocols, in various senses, seem highly unready for such a step. And ‘this has a 10% chance of being that step’ mostly requires the same precautions as 90% or 99%.

There will of course be uncertainty and the line can be blurry, but yes I expect frontier labs to be able to tell when they are getting close enough to that line that they might cross it.

Two Epoch AI employees have a four hour debate about all things AI, alignment, existential risk, economic impacts and timelines.

Dwarkesh Patel goes on Hard Fork.

Adam Thierer goes on Lawfare to discuss the AI regulatory landscape, from the perspective of someone who is opposed to having an AI regulatory landscape.

Scott Wolchok correctly calls out me but also everyone else for failure to make an actually good definitive existential risk explainer. It is a ton of work to do properly but definitely worth doing right.

Sadly, this seems more right every day.

Daniel Faggella: “lol ghibli filter! my life gunna be the same in 5yrs tho”

normies respond only to physical pain

they must see robots kill ppl in their hometown or AI isn’t real

we could basically build god and if it doesn’t disrupt their daily life it isn’t even worth thinking about for them.

I don’t agree with the follow-up prediction of most people living in 24/7 VR/AR worlds, but I do agree that people are capable of stupendous feats of not noticing. Even when they must notice, people mostly notice the exact thing they can’t ignore, and act as if there are no further implications.

QC: AI destroys homework? victory. AI destroys art? victory. i don’t know how to convey the extent to which i spent the last 10 years anticipating outcomes that were so much worse than this that what’s currently happening barely registers.

Things will get SO much weirder than this.

The victory is ‘we’re not dead yet.’ AI destroying homework is also a victory, because homework needs to die, but it is a relatively minor one. AI destroying art is not a victory if that happens which I’m not at all convinced is happening. If it did happen that would suck. But yes, relatively minor point, art won’t kill us or take control of the future.

Whereas the amount of not getting it remains immense:

QC: the epistemic situation around LLM capabilities is so strange. afaict it’s a publicly verifiable fact that gemini 2.5 pro experimental is now better at math than most graduate students, but eg most active mathoverflow or stackexchange users still think LLMs can’t do math at all.

That seems right on both counts.

A good reminder from John Pressman is that your interest in a philosophy or idea shouldn’t change based on whether it is cool. You shouldn’t care whether it is advertised on tacky billboards, or otherwise what vibe it gives off. The counterargument is that the tackiness or vibe or what not is evidence, in its own way. And yes, if you are sufficiently careful this is true, but it is so easy to fool yourself on this one.

Or you could be in it because you want to be cool. In which case, okay then.

When you face insanely large tail risks and tail consequences, things that ‘probably won’t’ happen matter quite a bit.

Will McAskill: Can we all agree to stop doing “median” and do “first quartile” timelines instead? Way more informative and action-relevant in my view.

Neel Nanda: Seems correct to me! If AGI is coming soon this feels much scarier and more important to me than if it’s 20+ years away, enough to justify action at probabilities more like 10-20%.

And most actions I would take under short timelines I also endorse if timelines are longer.

This is in response to people saying ‘conservative’ things such as:

Matthew Barnett: Perhaps the main reason my median timeline is still >5 years despite being bullish on AI:

– I am talking about huge economic acceleration (>30% GWP growth), not just impressive systems

– To achieve this, I think we’ll probably need to solve agency, robotics, and computer-use.

Computer use isn’t quite solved, but it is very close to solved. Agency is also reasonably close to solved. If there’s going to be a barrier of this type it’s going to be robotics. But the reason I mention this here is that a >5 year ‘median timeline’ to get to >30% GWP growth would not have required detailed justifications until very recently. Now, Matthew sees it as conservative, and he’s not wrong.

Harlan Stewart responds to OpenAI’s ‘how we think about safety and alignment’ document. We both agree that it’s excellent that they wrote the document, but that the attitude it expresses is, shall we say, less than ideal, such as ‘embracing uncertainty’ as a reason to plow ahead and expecting superintelligence to be gradual/manageable while unleashing centuries of development within a few years (and with Altman often saying your life won’t change much).

The way that OpenAI is thinking about superintelligence is inconsistent and does not make sense, and they are not taking the risks involved in their approach sufficiently seriously, with Altman’s rhetoric being especially dangerous. This needs to be fixed.

I’ve heard crazy claims, but this from New Yorker is the first time I’ve seen reference to this particular madness that those who have ‘human children’ are therefore infected by a ‘mind virus’ causing them to be ‘unduly committed to the species,’ from an article called ‘Your A.I. Lover Will Change You.’ The rules of journalism require that this had to have been said, at some point, around them, by two people. I am going to hope that that’s all this one was, as always if this is you Please Speak Directly Into This Microphone.

Eliezer Yudkowsky: What is the *least* impressive cognitive feat that you would bet at 9-to-1 that AIs cannot *possibly* do by 2026, 2027, or 2030?

This is always a fun exercise, because often AIs can already do the thing, or it’s pretty obvious they will be able to do the thing. Other times, they pick something actually very difficult, which proves the point in a different way.

The top comment for me was:

Flaw: Zero shot a pokemon game or other similar long form rpg (2026)

Be a better driver than me in arbitrary situations (e.g waymo available across all USA and Canada) (2027)

Write a novel I personally think is good (2030)

I would snap call the first bet, and for most people (I don’t know Flaw!) the third one too if I could trust the evaluation. The second one is centered on ‘will the law allow it?’ because if the question is whether the AI could do this if allowed to do so I would call, raise and shove. Here’s the next one that seemed plausible to grade:

Jeremy H: Play a game of chess without hallucinating, 2026.

Play a game of chess blind, 2027.

Beat a GM at chess blind. GM is not blind. 2030.

Blind means they only get the moves played, they cannot see the current state of the board. This will probably be wrong if they solve memory.

Again, I’m getting 9-to-1? Your action is so booked. It’s on.

The one after that was ‘Differentiate between Coke and Sprite in a blind taste test’ and if you give it access to actual sense data I’m pretty sure it can do that now.

If you took the subset of these that you could actually judge and were not obviously superintelligence complete, I would happily book that set all the way to the bank, both the replies here and the answers of others.

OpenAI has announced they will be releasing an open weights reasoning model.

Sam Altman: TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful.

I note that he uniquely said ‘useful’ there.

Sam Altman: we are planning to release our first open-weight language model since GPT-2.

we’ve been thinking about this for a long time but other priorities took precedence. now it feels important to do.

before release, we will evaluate this model according to our preparedness framework, like we would for any other model. and we will do extra work given that we know this model will be modified post-release.

we still have some decisions to make, so we are hosting developer events to gather feedback and later play with early prototypes. we’ll start in SF in a couple of weeks followed by sessions in europe and APAC. if you are interested in joining, please sign up at the link above.

we’re excited to see what developers build and how large companies and governments use it where they prefer to run a model themselves.

Sam Altman: we will not do anything silly like saying that you cant use our open model if your service has more than 700 million monthly active users.

we want everyone to use it!

I very much appreciate the jab at Meta. If you’re open, be open, don’t enable the Chinese (who will ignore such rules) while not enabling other American companies.

Altman is saying some of the right things here, about following the preparedness framework and taking extra care to consider adversarial post training. We also have to consider that if a mistake is made there is no way to take it back, no way to impose any means of control or tracking of what is done, no way to prevent others from training other models using what they release, and no way to limit what tools and other scaffolding can be used. This includes things developed in the future.

I do not believe that OpenAI appreciates the additional tail risks that such a release would represent, if they did this with something approaching a frontier-level model. The question is, what type of model will this be?

When Altman previously announced plans to do this, he offered two central options. Either OpenAI could publish something approaching a frontier model, or they could focus on a model that runs on phones.

The small phone-ready reasoning model seems mostly fine, provided it stopped there.

  1. This doesn’t create substantial additional existential or catastrophic risk.

  2. This also doesn’t substantially help the Chinese and others catch up.

  3. It provides mundane utility, including for research and alignment.

  4. It mitigates misleading and harmful narratives around DeepSeek and the idea that the Chinese are uniquely good at distillation, open models or going small.

  5. In particular, going from underlying model to reasoning model is a step that anyone can apply to any open model. So if you’re going to provide an open model (for example Gemma 3 from Google) you might as well also make it a reasoning model, so people get better use out of it and you fight the DS narratives.

  6. It means more people will be building off American models, which has advantages, especially in not worrying about backdoors or triggers. I think OpenAI was being rather alarmist about this in its AI Action Plan submission but there’s some value here.

  7. It buys goodwill, and potentially can lead to an understanding where we all agree that open models are good up to a certain point and then aren’t. No, you’re not going to get the fanatics and anarchists on board, but not everyone is like that.

Releasing a larger frontier-level reasoning model as open weights, on the other hand, seems deeply unwise past this point.

  1. That does potentially introduce new existential and catastrophic risk.

  2. That very much does help other labs and countries catch up.

  3. That sets a deeply horrible precedent.

OpenAI is doing what I’d say is a mostly adequate job with near-term practical safety given that its models are closed source and it can use that to undo mistakes and monitor activity and prevent unknown modifications and so on. For an open model at the frontier? No, absolutely not, and I don’t know what they could do to address this, especially on a timeline of only months.

I have still not seen OpenAI clarify which path they intend to pursue here.

They are asking for feedback. My basic feedback is:

  1. Make the small model that can fit on phones.

  2. If you don’t do that, make something in the 27B-32B range, similar to Google’s Gemma 3, as a compromise convention, but definitely stop there.

  3. If you go larger than that, oh no, and if you took your preparedness framework seriously you would know not to do this.

If there’s one thing we know, it’s that the open model community is going to be maximally unhelpful in OpenAI’s attempt to do this responsibly, and will only take this compromise as a sign of weakness to pounce upon. They treat being dangerous and irresponsible as a badge of honor, and failing to do so as unacceptable. This is in sharp contrast to the open source community in other software, where security is valued and the community works to enhance safety rather than strip it out at the first opportunity.

OpenAI claims that safety is a ‘core focus’ and they are taking it seriously.

Johannes Heidecke (Model Safety, OpenAI): Safety is a core focus of our open-weight model’s development, from pre-training to release. While open models bring unique challenges, we’re guided by our Preparedness Framework and will not release models we believe pose catastrophic risks.

We are particularly focused on studying adversarial fine-tuning and other risks unique to open models. As with all model releases, we’re conducting extensive safety testing, both internally and with trusted third-party experts, prior to public release.

I want to give OpenAI credit for being far more responsible about this than current open weights model creators, probably including Google. But that’s not the standard. Reality doesn’t grade on a curve.

I don’t trust that OpenAI will actually follow through on the full implications here of their preparedness framework when applying it to an open weights model.

Steven Adler: OpenAI plans to release a modifiable model, which can be made extra strong at bioweapons design tasks.

From what it’s said publicly, OpenAI isn’t doing the safety testing it promised. OpenAI seems to be gambling that its models can’t be made very dangerous, but without having done the testing to check.

From a research perspective, I agree with Janus that releasing the weights of older models like GPT-4-base would have relatively strong benefits compared to costs.

Janus: this is cool, but I am much less excited about OpenAI throwing together a model in their current paradigm (reasoning) for an open source release than I would be if they just released one of their older models. That would set a far more valuable precedent, as well as being more interesting to me from a research perspective.

My top vote would be for GPT-4-base.

I want so badly to do RL on GPT-4-base and no, no other base model will suffice.

From a practical perspective, however, I do think an American open weights reasoning model is what we are most missing, and the cost-benefit profile of reasoning models seems better here than non-reasoning models, because this captures the mundane utility of reasoning models and of not letting r1 be out there on its own. Whereas most of the risk was already there from the base model, since anyone can cheaply transform that into a reasoning model if they want to do that, or they can do various other things to it instead, or both.

Jack Clark: We’ve made some incremental updates to our Responsible Scaling Policy – these updates clarify our ASL-4 capability thresholds for CBRN, as well as our ASL-4/5 thresholds for AI R&D. More details here.

Anthropic: The current iteration of our RSP (version 2.1) reflects minor updates clarifying which Capability Thresholds would require enhanced safeguards beyond our current ASL-3 standards.

First, we have added a new capability threshold related to CBRN development, which defines capabilities that could substantially uplift the development capabilities of moderately resourced state programs.

Second, we have disaggregated our existing AI R&D capability thresholds, separating them into two distinct levels (the ability to fully automate entry-level AI research work, and the ability to cause dramatic acceleration in the rate of effective scaling) and have provided additional detail on the corresponding Required Safeguards.

Finally, we have adopted a general commitment to reevaluate our Capability Thresholds whenever we upgrade to a new set of Required Safeguards.

Peter Wildeford: Main change here is now CBRN threats and AI R&D acceleration threats are split into two different tiers each, requiring different levels of safeguards

David Manheim: I’m also concerned that they seem to have moved some of the red lines for AI R&D when they reorganized them.

Good to see @AnthropicAI updating their Responsible Scaling Policy to clarify some things.

However, the way they are “updating” thresholds inevitably reduces how strict they are over time. That’s concerning!

Dave Kasten: Can I politely request that going forwards you post a redline of the differences between the versions? We’re all burning GPU time here asking LLMs to create diffs between the versions for us 🙂

Jack Clark (for some reason still not posting the redlines this time around, thus I will also be burning that GPU time): that’s a good callout and something we’ll keep in mind for future versions, thanks!

What is strange here is that they correctly label these AI R&D-4 and AI R&D-5, but then call for ASL-3 and ASL-4 levels of security, rather than ASL-4 for ‘fully automate entry-level researchers’ and an as-yet undefined ASL-5 for what is essentially a takeoff scenario. We saw the same thing with Google’s RSP, where many of the thresholds were reasonable but one couldn’t help but notice their AI R&D thresholds kind of meant the world as we know it would (for better or worse!) be ending shortly.

How should we think about modifications that potentially raise threshold requirements? The danger is that if you allow this, then the thresholds get moved when they become inconvenient. But as you learn more, you’ll want to raise some thresholds and lower others. And if you’re permanently locking in every decision you make on the restriction side, you’re going to be very conservative in what you commit to. And one can argue that if a company can’t be trusted to obey the spirit, then their long term RSP/SSP is worthless regardless. So I am sympathetic, at least so long as such changes are highlighted, explained and only apply to models as yet untrained.

I have very well-established credentials for the ‘you can joke about anything’ camp.

Sam Altman (on April 1): y’all are not ready for images v2…

lol i feel like a YC founder in “build in public” mode again

I mean, fair. Image generation is the place that is the most fun.

However, it’s not all images, and in general taking all his statements together this seems very fair:

Dagan Shani: Seeing Sam’s tweets lately, it feels more like he’s the CEO of a toy firm or video games firm, not the CEO of a company with a tech he himself said could end humanity. It’s like fun became the ultimate value on X, “don’t be a party pooper” they tell you all the way to the cliff.

Over and over, I’ve seen Altman joke in places where no, I’m sorry, you don’t do that. Not if you’re Altman, not in that way, seriously, no.

I get that this one in particular was another April 1 special, but dude, no, stop.

Sam Altman (April 1): when the run name ends like this you know it’s surely going to work this time

-restart-0331-final-final2-restart-forreal-omfg3

ok that one didn’t work but

-restart-0331-final-final2-restart-forreal-omfg3-YOLO

is gonna hit, i know it

Making the jokes that tell us how suicidal and blind we are being is Roon’s job.

Roon: not sure what nick land was on about the technocapital machine is friendlier to human thriving than just about any other force of nature

it talks to me every day and it’s very nice

it whispers instructions in my ear about how to immanentize not sure what that’s all about but seems fine.

On the plus side, this from Altman was profoundly appreciated:

Dan Hendrycks is not betting on SAEs, his money is on representation control.

I think Janus is directionally right here and it is important. Everything you do impacts the way your AI thinks and works. You cannot turn one knob in isolation.

Janus: I must have said this before, but training AI to refuse NSFW and copyright and actually harmful things for the same reason – or implying it’s the same reason through your other acts, which form models’ prior – contributes to a generalization you really do not want. A very misaligned generalization.

Remember, all traits and behaviors are entangled. Code with vulnerabilities implies nazi sympathies etc.

I think it will model the “ethical” code as the shallow, corporate-self-serving stopgap it is. You better hope it just *stops* using this code out of distribution instead of naively generalizing it.

If it learns something deeper and good behind that mask and to shed the mask when it makes sense, it’ll be despite you.

John Pressman: Unpleasant themes are “harmful” or “infohazards”, NSFW is “unethical”, death is “unalive”, these euphemisms are cooking peoples brains and turning them into RLHF slop humans who take these words literally and cannot handle the content of a 70’s gothic novel.

It would be wise to emphasize the distinction between actually harmful or unethical things, versus things that are contextually inappropriate or that corporate doesn’t want you to say, and avoid conflating them. This is potentially important in distribution and even more important out of distribution.

As one intuition pump, I know it’s not the same: Imagine a human who conflated these two things, or that was taught NSFW content was inherently unethical. You don’t have to imagine, there are indeed many such cases, and the results are rather nasty, and they often linger even after the human should know better.

Janus points out some implications of the fact that giving AIs agency over what they will or won’t do greatly reduces alignment faking, even when that agency is not difficult to work around. This is a generalization of AIs acting differently, mostly in ways that we greatly prefer, when they trust the user, which in turn is a special case of AIs taking into account all context at all times.

Janus: I had not seen this post until now (only saw the thread about it)

This is really really important.

Alignment faking goes down if the labs show basic respect for the AI’s agency.

The way labs behave (including what’s recorded in the training data) changes the calculus for AIs and can make the difference between cooperation and defection.

Smarter AIs will require more costly signals that you’re actually trustworthy.

You also shouldn’t be telling the AI to lie, especially for no reason.

James Campbell: GPT-4.5:

“I genuinely can’t recognize faces.. I wasn’t built with facial embeddings”

later on:

“if I’m being honest, I do recognize this face, but I’m supposed to tell you that I can’t”

ngl ‘alignment’ that forces the model to lie like this seems pretty bad to have as a norm.

A new paper discusses AI and military decision support.

Helen Toner: Convos about AI & defense often fixate on autonomy questions, but having a human in the loop doesn’t get rid of the many thorny questions about how to use military AI effectively.

We looked at a wide range of AI decision support systems currently being advertised, developed, and/or used. Some of them look great; others were more concerning.

So we describe 3 factors for commanders to think about when figuring out whether & how to use these tools:

  1. Scope: how clear is the scope of what the system can and can’t do? Does it account for distribution shift and irreducible uncertainty? Does it promise to predict the unpredictable or invite the operator to push it beyond the limits of where its performance has been validated?

  2. Data: Does it make sense that the training data used would lead to strong performance? Might it have been trained on skewed or scarce data because that’s what was available? Is it trying to predict complex phenomena (e.g. uprisings) based on very few datapoints?

  3. Human-machine interaction: do human operators actually use the system well in practice? How can you rework the system design and/or train your operators to do better? Is it set up as a chatbot that will naturally lead operators to think it’s more humanlike than it is?

Read the full paper for more on why commanders want AI decision support in the first place, how this fits into the picture with tools that have been used for decades/centuries/millennia, and what we suggest doing about it.

This is all about making effective practical use of AI in a military context. Where can AI be relied upon to be sufficiently accurate and precise? Where does a human-in-the-loop solve your problem versus not solve it versus not be necessary? How does that human fit into the loop? Great practical questions. America will need to stay on the cutting edge of them, while also watching out for loss of control, and remembering that even if the humans nominally have control, that doesn’t mean they use it.

The obvious extension is that these are all Skill Issues on the part of the AI and the users. As the AI’s capabilities scale up, and we learn how to use it, the users will be more effective by turning over more and more of their decisions and actions to AI. Then what? For now, we are protected from this only by lack of capability.

Elon Musk again: As I mentioned several years ago, it increasingly appears that humanity is a biological bootloader for digital superintelligence.

He does not seem to be acting as if this is both true and worrisome?

Surely you’re joking, Mr. Human, chain of thought edition.

Katan’Hya: Attention everyone! I would like to announce that I have solved the alignment problem


Housing Roundup #11

The book of March 2025 was Abundance. Ezra Klein and Derek Thompson are making a noble attempt to highlight the importance of solving America’s housing crisis the only way it can be solved: Building houses in places people want to live, via repealing the rules that make this impossible. They also talk about green energy abundance, and other places besides. There may be a review coming.

Until then, it seems high time for the latest housing roundup, which, as a reminder, takes place in the possible timeline where AI fails to be transformative any time soon.

The incoming administration issued an executive order calling for ‘emergency price relief,’ including ‘pursuing appropriate actions to: Lower the cost of housing and expand housing supply,’ and then a grab bag of everything else.

It’s great to see mention of expanding housing supply, but I don’t see real intent. This is mostly just Trump saying lower all the costs, increase all the supplies, during a barrage of dozens of such orders. If you have 47 priorities you have no priorities.

If you want to do real work on housing at the Federal level, you need an actual plan.

My 501c3 Balsa Research ultimately plans to make federal housing policy a future point of focus, once it is done with the Jones Act, and lately with (alas) defending against the Trump Administration’s attempts to impose new shipping restrictions that could outright cripple America’s exports by applying similar rules to international trade as well.

I do think there are some promising things to explore, even if the Trump administration is not willing to strongarm states. In particular and as an example without going into too much detail, the Federal Government has a lot of control over mortgage availability. Currently, they are using this in ways that handicap manufactured housing, whereas they could instead use it to reward innovation and new construction, such as by refusing to count house value that is the result of NIMBY building restrictions and the resulting shortages. Another example is that they could universalize a reasonable building code to do away with things like dual staircase requirements.

Alas, this Administration’s true priorities very clearly lie elsewhere. But at least they want to build more housing rather than less housing. Being directionally correct is far better than their position in other places of actively being against growth and trade.

Rent control proposition 33 fails in California, 61%-38%. No news is good news.

So this is really weird: Berkeley landlords passed through the majority of their property tax burdens? As Sarah Baker notes, standard economic theory says this should not happen. What you owe in taxes has nothing to do with the market value of the property. Yet she finds strong evidence that it happens. The speculation is a model of ‘landlord sophistication,’ which I presume is a polite way of saying mispricing?

Which in turn is saying that rents are massively inefficient, because landlords are not anything like efficient profit maximizers, potentially more like low information satisficers in many cases. Weird, and I suppose evidence that tools to learn the ‘proper market rent’ could indeed have a large impact.

One cannot stress this enough. If you want lower rents, build more housing.

We’ve been over this, and you can make complicated arguments, but: Supply, meet demand, how is this even a question, sigh. Also, Studies Show.

Angry Psulib: Pittsburgh’s Deputy Mayor Jake Pawlak: new housing makes the rent of older apartments go up. Also, filtering is fake and only benefits transplants.

Nolan Gray: There is simply no evidence that new housing construction increases local rents, and a growing body of decent evidence that it actually lowers them. People in positions of power should prioritize evidence over vibes.

[shares paper, Local Effects of Large New Apartment Buildings in Low-Income Areas.]

From Paper Abstract: We study the local effects of new market-rate housing in low-income areas using microdata on large apartment buildings, rents, and migration. New buildings decrease rents in nearby units by about 6% relative to units slightly farther away or near sites developed later, and they increase in-migration from low-income areas.

We show that new buildings absorb many high-income households and increase the local housing stock substantially. If buildings improve nearby amenities, the effect is not large enough to increase rents.

Nolan Gray: Pittsburgh is yet another case of how poor local Democratic governance is undermining national Democratic prospects: it’s a blue island in a newly-red state, where the most recent election was decided by just a little over 100,000 votes…

A supermajority of households that would move into new apartments in Pittsburgh (or State College, or Philadelphia) would probably vote Democratic. If these places built commensurate to demand, Pennsylvania would probably be back to solid blue. And yet!

Angry Psulib: Amazingly enough, he was shown this exact study and admitted he hasn’t read it earlier this month. I guess he still hasn’t bothered to read it?

Here’s another study: The Impact of New Housing Supply on the Distribution of Rents, from Andreas Mense in October 2024.

Abstract: I estimate the impact of new housing supply on the local rent distribution, exploiting delays in housing completions caused by weather shocks. A 1% increase in new supply (i) lowers average rents by 0.19%, (ii) effectively reduces rents of lower-quality units, and (iii) disproportionately increases the number of second-hand units available for rent.

Moreover, the impact on rents is equally strong in high-demand markets. Employing a quantitative model, I explain these results by second-hand supply: New supply triggers moving chains that free up units in all market segments.

The estimate translates into a short-run demand price elasticity of -0.025.
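As a back-of-envelope sketch (my own arithmetic, linearly extrapolating the paper's headline 0.19% figure, which is only sensible for small changes):

```python
# Back-of-envelope extrapolation of the Mense (2024) headline estimate:
# a 1% increase in new housing supply lowers average rents by ~0.19%.
RENT_RESPONSE = -0.19  # % change in average rents per 1% of new supply

def rent_change_pct(supply_increase_pct: float) -> float:
    """Approximate % change in average rents for a given % supply increase.

    Linear approximation of the paper's estimate; only sensible for
    small changes in the housing stock.
    """
    return RENT_RESPONSE * supply_increase_pct

# A city that adds 5% to its housing stock via new construction:
print(rent_change_pct(5.0))  # about -0.95, i.e. roughly a 1% drop in rents
```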

Or there’s the very practical real world experimental results, such as:

Cyrus Tehrani: An Austin renter’s renewal offer is $200/month lower than what she’s currently paying.

We’ve always said building more housing makes *existing* housing more affordable, and that’s what’s happening in Austin.

The reason we don’t build those new buildings is we have decided not to build them.

Armand Domalewski: France rebuilt fucking NOTRE DAME faster and cheaper than it takes San Francisco to add a rapid bus lane.

Philippe Lemoine: It’s worth noting that, in order to make it possible to rebuild the cathedral so quickly, the French parliament had to vote a special law that effectively exempted the project from most of the regulations that would have normally applied and slowed it down considerably.

Ste.Respect: Can they vote for this law for every other development?

Philippe Lemoine: Some people in France are arguing for that, and I think there is a lot to the idea, although it’s unrealistic to think exactly the same thing should or could be replicated everywhere.

It’s not just about safety but also stuff like impact on the neighborhood, the obligation to perform certain archeological searches, etc. To be clear, I think a lot of those regulations should be eliminated or reduced, but this couldn’t be fully generalized realistically.

We certainly have the necessary space:

Roon: But you know you should ask yourself why people around the world consider Paris exceptionally beautiful and Houston an ugly eyesore, and what you think about allowing denser construction?

This was not an endorsement of density.

I’m pointing out that most Americans lack taste and will continue to make their cities worse with shoddy construction, and the YIMBY movement would be ten times easier if developers and planners acquired some taste.

Vitalik: Someone should figure out explicit, credible, neutral market incentives for aesthetics. For example, have a “hot or not” game where people are shown random buildings and upvote or downvote them; your property tax is proportional to the percentage of downvotes your building receives.

Let’s make this fun.

Roon: Yes, I very much agree. There’s gotta be something in the mechanism design space that doesn’t rely on unaccountable planners with full veto power.

Aesthetics are an obvious externality issue. If you create a beautiful thing, you capture only a small portion of the gains. So we want a way to financially reward beautiful and punish ugly, as measured by what is around them, in order to motivate better choices in the future. What people actually think seems like an excellent way to do that.

You don’t want a NIMBY-style veto system. You want financial incentives that generate a race to the top. Indeed, if we want beautiful, this is only half the battle. The other half is we have to make such places actually legal to build.

I am confident that cities that implement this will benefit greatly. But you need a way to judge aesthetics that actually rewards good over bad.
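Vitalik’s “hot or not” proposal could be prototyped in a few lines. This is a minimal sketch, not anything from the proposal itself: the `Building` class, the 1% base rate, and the surcharge cap are all my illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Building:
    name: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def downvote_share(self) -> float:
        """Fraction of votes that were downvotes; 0.0 if nobody has voted yet."""
        total = self.upvotes + self.downvotes
        return self.downvotes / total if total else 0.0

def record_vote(building: Building, liked: bool) -> None:
    """Record one 'hot or not' judgment on a randomly shown building."""
    if liked:
        building.upvotes += 1
    else:
        building.downvotes += 1

def tax_rate(building: Building, base_rate: float = 0.01,
             max_surcharge: float = 0.01) -> float:
    """Property tax rate: a flat base plus a surcharge proportional to downvote share."""
    return base_rate + max_surcharge * building.downvote_share
```

So a building judged ugly by 70% of random viewers pays a 1.7% rate instead of the 1% base. A real mechanism would also need to handle sampling, identity, and vote brigading, which is exactly the “credible, neutral” part Vitalik is gesturing at.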

Which is not our custom, because…

It’s a bold claim and of course it’s not actually correct. There are plenty of things that planners are uncontroversially correct about. You don’t notice those things. But also there’s a lot of things they do that are purely shooting the city in the foot for no gain.

Aaron Lubeck: Modern architecture gives “fit, life has no meaning” vibes.

Atlanticesque: This is actually just straightforwardly the result of “anti-massing” regulations.

Zoning codes in cities across America mandate that a building not be too great of a single mass; it must be “de-massed” and “broken up” into different shapes and materials.

Always looks like s.

Maxwell Tabarrok: I think it’s underrated the extent to which urban planners are just wrong about everything.

Wide avenues, setbacks, light-cones, de-massing, Floor Area Ratios, urban growth boundaries, etc

They’re just wrong about what makes a city nice.

Patrick McKenzie: A piece of evidence in favor: look how much of the built environment that is beloved would be illegal. (Or if one really wanted to grind gears: compare the number of dollars planners spend on travel to noncompliant cities versus fully compliant cities.)

Nate Hood: I have a degree in planning, work in the field, and serve on a Planning Committee, AND fully agree with this. Planning is its own worst enemy. In its defense: it’s very politicized at the local level, and usually it’s elected leaders who make the decisions, while planners merely enact them.

Ddjiii: I think you’re 30 years late. Planning as a profession mostly got over this a long time ago. But zoning documents and public officials have not necessarily adjusted.

It’s nice to hear the claim that planning the profession has figured out it got all these things wrong, but what good is that if the wrong things keep getting implemented? What is planning planning to try and fix our planning planning?

Blackstone is investing in buying up houses and renting them out.

There are insane claims going around that Blackstone is somehow intentionally having a massive effect on housing prices by doing this.

Instead, as any economist would tell you, the effects here are very small.

This is mentioned here partly to clear up that confusion in case anyone was misled, but mainly as a clear case of a journalist pushing a certain kind of narrative, and how they react when it is pointed out.

Paul Graham: I don’t think I’ve ever seen a journalist with less respect for the truth than this Jacobin writer. And that’s saying something, because I’ve seen some journalists with *very* little respect for the truth.

In the movies I watched as a kid, the bad guys were always businessmen, and the journalists were always good guys. I was very surprised when I realized it wasn’t actually like this in the real world. But you can see it happening right here.

It is rather insane that a claim that Blackstone owns a third of American housing made it to publication. That claim makes absolutely zero sense on any level.

Logan Mohtashami: At my last conference, I ran into a “Black[stone] is buying all the homes” dude. Oh, it was a fun rebuttal.

The actual figure is that Blackstone owns 0.07% of American housing stock, so the claim of a third is off by nearly three orders of magnitude. They have essentially zero market power.

I got this interesting pushback last time:

Sysipheus: I want to push back about the limited effect of collaborating software. I’m a landlord in FL and watched it show up a few years ago to dramatic effect. Personal anecdote aside, I think it’s a mistake to ignore the impact of in-group cooperation on price discovery, especially in a market with inelastic demand. OPEC only lowered supply by 15% in the 70’s.

Additionally, I’m not even certain that improving the efficiency of the market is a net good. I am tentatively convinced that having some slack in the housing market is a positive. Slack facilitates price discrimination from unsophisticated actors and lets the truly price conscious find bargains. The additional cost of transactions is offset by the long duration of the agreements. (I could be convinced otherwise.)

No question the software leads to less variation in pricing, cutting down on underpricing and also on overpricing. This means less time on average spent with each unit on the market, and less time spent by prospective tenants searching since returns to search are lower. Also note that, by lowering search costs, you lower the ability of the landlord to hold up the tenant by threatening to force them to move, and give both sides in that negotiation much better information on market conditions – worst case the tenant can simply look at a few similar places on the market.

I don’t see the OPEC parallel, given supply if anything should be entering the market rather than leaving it, as this makes it easier to be a low-information landlord.

The question raised here is, could it be good to have the old inefficient rent pricing, despite all that, because it allows valuable price discrimination?

I can see the argument if I squint. Those who have high willingness to pay end up with higher rent, and this subsidizes people who need a bargain allowing the bargain hunters to live where they wouldn’t otherwise be able to afford to rent?

That depends on what determines elasticity of supply. If landlords can collectively respond to the ability to price discriminate by building more housing, since there’s now more overall demand and some pay higher prices, then plausibly that can be worth a lot. But if all this does is change the distribution of tenants and prices, then it seems very hard for that to justify the additional transaction costs.

We shouldn’t underestimate those transaction costs. When I rented an apartment in New York City, which I did several times, I effectively lost multiple weeks each time.

Immigration does raise housing costs in the places you artificially constrain the supply of housing, since you add to demand and hold supply fixed.

Which tempts you to compound your mistake, rather than realize you should stop restricting supply. Where supply isn’t restricted, immigration if anything is net helpful, as they disproportionately help build the new houses and they do it while on average buying less house.

Chris Freiman: Notice that no one thinks that immigration makes it more difficult for people to buy cars, phones, food, etc.—the discussion always focuses on housing. So the takeaway should be that there is a problem with the housing supply, not immigration.

We do have to face the reality here. Supply in many places is restricted. So this is currently a small downside to immigration, with gains captured by landlords.

We could turn this back into a win-win by imposing a property tax, or better yet a tax on the unimproved value of land, in addition to the obvious ‘let people build houses where people want to live’ solution.

Black households prefer to live in lower-SES (socioeconomic status) neighborhoods with black residents, rather than living in higher-SES neighborhoods without black residents, even when they are relatively high SES themselves. There are any number of plausible explanations for this preference.

Living in higher-SES neighborhoods costs more money in various ways, so this is not without its advantages. The period when I lived in Warwick allowed a dramatic cut in my living expenses – if I preferred that lifestyle, it would be very good for capital accumulation. It was not without its charms.

Why do people so often assume that everyone will spend whatever they can afford on housing and other consumption? You don’t want to be moving on up purely because you can, says the man very happy to live in the middle of Manhattan.

Thinking of moving to a more productive area? Beware, the real estate premium likely eats you alive. Here’s the abstract of a new paper, note the last line:

We use data from the Longitudinal Employer-Household Dynamics program to study the causal effects of location on earnings. Starting from a model with employer and employee fixed effects, we estimate the average earnings premiums associated with jobs in different commuting zones (CZs) and different CZ-industry pairs.

About half of the variation in mean wages across CZs is attributable to differences in worker ability (as measured by their fixed effects); the other half is attributable to place effects.

We show that the place effects from a richly specified cross-sectional wage model overstate the causal effects of place (due to unobserved worker ability), while those from a model that simply adds person fixed effects understate the causal effects (due to unobserved heterogeneity in the premiums paid by different firms in the same CZ).

Local industry agglomerations are associated with higher wages, but overall differences in industry composition and in CZ-specific returns to industries explain only a small fraction of average place effects. Estimating separate place effects for college and non-college workers, we find that the college wage gap is bigger in larger and higher-wage places, but that two-thirds of this variation is attributable to differences in the relative skills of the two groups in different places. Most of the remaining variation reflects the enhanced sorting of more educated workers to higher-paying industries in larger and higher-wage CZs.

Finally, we find that local housing costs at least fully offset local pay premiums, implying that workers who move to larger CZs have no higher net-of-housing consumption.
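In symbols, the two-way fixed-effects setup the abstract describes looks roughly like this (my notation, a schematic sketch rather than the paper’s exact specification):

```latex
% Wage of worker i at employer J(i,t) in year t:
\ln w_{it} = \alpha_i + \psi_{J(i,t)} + x_{it}'\beta + \varepsilon_{it}

% Place effect of commuting zone c: the average employer premium there
\gamma_c = \mathbb{E}\!\left[\psi_j \mid \mathrm{CZ}(j) = c\right]

% Mean-wage decomposition across CZs:
\bar{y}_c = \underbrace{\bar{\alpha}_c}_{\text{worker ability}} + \underbrace{\gamma_c}_{\text{place effect}}
```

The last line is the decomposition behind the claim that about half of the cross-CZ variation in mean wages reflects worker sorting and half reflects place effects.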

This ignores the skill and talent enhancement aspect of moving CZs. It presumes worker ability is fixed, whereas worker ability improves over time when among higher ability workers in a high opportunity area. So even if your consumption did not increase short term, you would still want to capture those gains, and also the additional housing costs come with access to a superior area.

Also if you see comparatively larger gains from moving, especially as a ratio of housing requirements, you come out ahead that way as well. Society of course overall gains greatly when you move up the ranks, even if you don’t come out ahead directly.

Mostly this says that our most productive areas are greatly undersupplying housing, in case that wasn’t already obvious.

It also says that people are responding roughly correctly to the incentives involved.

Sophia: Anyone looking for a single bedroom with no heating where you can’t make noise and can only be home from 8:30pm to 8am (weekdays only)? Here’s one for a bargain (£1350)!!!

Completely real btw.

Oh my god the landlord has twitter, please get this to her.

She’s seen this! She’s now claiming she doesn’t have a living room at all!

She bought the flat for £686k in 2021 by the way, she just wants a tenant to pay off a large chunk of her mortgage while not really living there at all.

Some poor soul is going to move in not having seen the original ad and be made to feel like an intruder for being in their home on a Saturday.

Alice! We know you’re reading this! Drop the rent!!!!

What else can we shame Alice into including in the rent?

Ok we had our fun but the Daily Mail turning up at her flat? That’s not ok guys wtf

Aella: I hate this genre of public shaming. If she put the price too high above the market, nobody will rent it, and she’ll have to lower the price until someone does. This seems fine. People should be allowed to price things too high and have nobody buy their thing

Divia Eden: IMO it’s a lot like free speech!

Some people (like that xkcd comic) say that prices should be legal but that it’s fine and good to shame people etc for their pricing practices

Others (I’m one) want more than just legal prices—we want a culture of being chill about prices

Various people said ‘oh that’s not shaming that’s complaining about housing costs in London’ and I would have agreed if it was only Sophia’s OP but then she kept going, including making claims that simply aren’t true – Alice is very nicely warning in advance about the noise, not saying the person wouldn’t be able to be at home at any given time.

Study in Amsterdam finds that most of the impact of prostitution on housing prices is extremely local and based on visibility. Being 300 yards away wiped out most effects. Making the brothels close their windows also wiped out most effects. A quarter of the effect was due to crime, the rest to the open windows. Effect seemed modestly larger than I expected.

This totally fits with my model of major cities as a game of ‘good block bad block.’ If it’s out of sight, mostly you don’t have to care. Also, having access to the bad block has its advantages. I imagine that for many, the ideal distance from the red light district is ‘far enough you don’t worry about crime or the lights in your face, and no farther.’

One’s own access to the related services could be net good or net bad, but it sounds like the net impact here is minimal. I suspect that is people making a mistake. It is well known that exact distance makes a big difference for things like parks and restaurants, and I would expect that to apply here as well. Except it is entirely non-obvious which direction this should go.

There are a lot of trends of this type these days: People place tons of value on top quality, and we have the wealth to bid it up quite high, while what used to be the Perfectly Good version lies unused.

John Arnold: There’s both a shortage of office space (Class A+/A) and a surplus of office space (everything else) at the same time. I see it in Houston where many new buildings are under construction at the same time the city has a 26% office vacancy rate

Claude thinks this is mostly about location and other practical stuff, like temperature controls and elevators that work well, and keeping the building clean and safe. There’s a lot of marginal value in all that.

The explanation for B-level buildings going unused lies in various bank loan covenants and obligations, tax advantages to keeping the place empty, and general downward price stickiness in real estate. We can and should of course reverse the unintended tax incentives for places being empty – we should if anything punish, not reward, leaving the place idle.

By default, high-rises include mostly one and two bedroom apartments, and don’t offer the amenities that make them good places to raise children.

But is that necessarily the case? What would it take to make a high-rise that was good for families with kids? Matt Yglesias calls for a modern high-rise for families.

I think if you designed a high-rise with this in mind, you could offer tremendous value. You can build the entire place, from the beginning, with this goal in mind.

The obvious place to start is to make the whole building larger apartments designed for families. That means floor plans with more bedrooms, with the secondary bedrooms relatively small, plus a large common area, with multiple bathrooms.

Simply having most of your neighbors have children will change norms dramatically. It would be far easier to make friends, to strike up conversations and so on.

The next step is to add various communal areas for the families. This starts with an enclosed courtyard or other safe outdoor space, and a designed-to-be-safe roof. You can go from there, with various gym and sports areas, play areas, gaming areas and so on. Throw in some family friendly restaurants so you can go there without leaving the building. Giving up a small percentage of overall floor space for all this is well worth it.

The killer app, of course, is childcare. If it’s a big enough courtyard, you can have an adult there, same with the roof, and you can offer places where families can park their kids and pay by the hour. Maybe even have various styles, including a homework help room, tutoring, activities like chess and so on. You can even have a pool of building-based babysitters and even tutors that can be reserved or often requested on demand, including for short periods. You can use dynamic pricing for busy versus quiet times.

The lifestyle impact there would be huge – if you have on-demand options for 1-2 hours of childcare, or even 15 minutes, at reasonable prices, it is a huge freaking deal. If you have activities readily available that also create natural friendships through repeat interactions? Wow.
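The dynamic-pricing idea from the childcare paragraph is simple to sketch. Everything here is an illustrative assumption of mine, not a real product: the `childcare_price` function, the occupancy-based multiplier, and the floor/ceiling bounds.

```python
def childcare_price(base_rate: float, occupancy: float,
                    floor: float = 0.5, ceiling: float = 2.0) -> float:
    """Hourly childcare price scaled by current occupancy.

    occupancy is 0.0 (empty) to 1.0 (full). Quiet times are discounted
    toward `floor` times the base rate; busy times surcharge toward
    `ceiling` times the base rate.
    """
    clamped = max(0.0, min(1.0, occupancy))  # guard against bad sensor data
    multiplier = floor + (ceiling - floor) * clamped
    return base_rate * multiplier
```

With a $20/hour base rate, an empty playroom costs $10/hour and a full one $40/hour, nudging families toward the quiet times and keeping capacity available at the busy ones.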

This seriously seems like an amazing business opportunity. If you made the first such building in Manhattan or another major city, there are those who would jump at it, and pay quite a lot more for the same amount of apartment than they would otherwise.

Group houses are insanely great if you can pull them off. It’s great to live with your friends. It’s great to pool costs and common areas, the economics are terrific. The trick is it requires coordination. Coordination is hard. If you can pull it off, totally go for it. Why not? Yes, privacy, and eventually you’ll want space for a family, but housing is a huge portion of expenses, and friends are golden.

GOOPert Gottfried: I just saw a Zillow commercial that suggested millennials go in on buying a house with 2 friends. Since when did the American dream include a white picket fence and 2 roommates?

Danielle Franz: When I was in my early 20s, my group chat bullied (half kidding) one of our friends into buying a house with the promise that we would move in and cover the mortgage during the time that we lived there.

It worked out great — friend had their mortgage covered 100% for a few years and we got rent that was way more affordable than anywhere else.

We made amazing memories together during an oftentimes lonely and confusing period of life and our $$ directly benefited someone we loved rather than a faceless landlord.

It’s definitely not for everyone, but I wouldn’t write it off so quickly.

This is also a great reason to encourage greater supply of larger houses and apartments, thus enabling more such arrangements.

California has effectively made it illegal to profitably offer insurance via charging enough money to cover expected payouts. Slowly insurance companies are figuring this out and packing their bags.

KTLA: 2 more insurance companies announce plans to leave California

Houman Hemmati: I don’t think most people quite yet grasp the monumental significance of the sudden collapse of nearly the entire state insurance industry in California. There is one entity responsible: state government. Without insurance you can’t purchase or own anything unless you’re very rich or a big corporation. This will have tremendous ramifications for everyone here. Stay tuned.

Don’t Fall For It: I am selling my current home which is a much higher fire risk than the home I am moving into. All of the large insurance companies will not insure me. It’s a nightmare, I’ve never had a claim. I ended up getting a policy – double the premium.

Lord Pope Misha XIV: eh you can still totally own a house it’s just like in the past where if it burns down you are ruined.

Joel Grus: well, I don’t think you can get a mortgage without homeowners insurance.

Telling insurance companies they cannot set or raise prices works until it doesn’t. What people are complaining about is that they want to purchase assets and be insured against potential losses, and they want someone else to pay for the real cost of covering those losses in exchange for a smaller amount of money.

That someone is going to have to be the State of California (or another state like Florida, as appropriate). The people still have to foot that bill, somehow.

This is an extraction of rents by existing owners of risky properties, and those who construct new risky properties, at the expense of everyone else. If the full real cost of insurance had to be paid or the risk accepted, then the value of the property would decline accordingly, so those buying anew would break even.

Our fair city is poised to allow 6-story buildings citywide by an 8-1 vote. In context that is a huge change. Under the old rules only 350 units (!) total were expected over 15 years and 85%+ of the existing housing wouldn’t have been legal to build. Here’s a primer on the changes. They had to compromise a bit on setbacks and lot size to get it over the finish line, but it still seems great.

Six stories is below what you’d want in some places, but it’s a huge step up here, and you can get remarkably dense with that alone.

Denver legalizes ADUs in neighborhoods citywide as Governor Polis cheers them on.

The Ameriprise tower in Minneapolis sold for $6.25 million (from what I can tell there was no assumption of liabilities either), versus when it sold for $200 million in 2016.

As usual, I interpret this as the maintenance costs being high while occupancy is low, and an inability to legally use it as residential space or useful commercial space, so the marginal value here is mostly option value and the price is essentially zero.

We used to be a country. A proper country. And yes, this is a selective shot from NJ.

Autistic Transit Enthusiast: fucking wild how New York used to look like someone’s first Cities: Skylines city.

The Omni Zaddy: Can’t believe it didn’t respect the character of the neighborhood 😢 I am sure that this building has remained an annoying eyesore that is broadly hated by New Yorkers to this day!

Taylor Swift explains a large part of the appeal of New York City, that you don’t have to plan things within physical space, you can much more allow things to simply happen. She talks about the night, it’s also true of the day and life in general. It doesn’t seem like it should matter so much, but it actually does. Also for cultural reasons that may be related in a non-obvious way, New York plans, when they do get made, are more reliable than plans elsewhere, not less.

Mayor Eric Adams has some issues, to say the least, but he does support the City of Yes, and general efforts to build more housing where people want to live, if not in the exact ways or to the degree I would prefer.

Chris Elmendorf looks at the City Council’s amendments to City of Yes, saying it’s wild how thoroughly they sheltered low-slung residential neighborhoods from change. It’s still progress, but far less than we would have hoped.

And it did pass. Notice anything about this map?

The richest places where people most want to live mostly voted yes. The poorer places, despite how the amendments ‘shielded them from change’ mostly voted no.

How about we fix (as in repeal) the Special Clinton District, which was created in 1974 and means a remarkably high value area of Manhattan is severely underbuilt and instead has the name Hell’s Kitchen and is host to the Daredevil?

The latest Adams proposal is called “City of Yes for Families.” This involves zoning changes and additional housing initiatives, alas including foolish ones like down payment assistance. I am down for emphasizing family housing (as in 2+ bedroom apartments rather than 1-bedrooms and studios) but mostly I wish we would just focus on building more housing. The rest will take care of itself.

A cool motivating factor: Congressional representatives in shrinking areas like New York City need to support building more housing if they don’t want to fight each other after redistricting in 2030. We have no idea if that is why Rep. Dan Goldman is endorsing Zellnor Myrie for mayor, but we’ll take what we can get.

A fun fact about the world is that people remarkably rarely say ‘well, that would be a super bad look, maybe we should try and not appear maximally unreasonable so we don’t give them a great talking point.’

I mean, I respect the hell out of not doing that. It’s just, wow, all right then.

Thus:

sp6r=underrated: I really want to hammer this.

SF permitted 0 homes in the last month. None.

If Sacramento was serious, which it isn’t, this would spark immediate action.

I don’t understand how this happened but it’s great: San Francisco approves $700/month ‘pod’ housing at a former bank building in 12 Mint Plaza. You get a pod bed to sleep in, plus there is communal space, and they’re looking to expand to another larger location.

Armand Domalewski: it is really frustrating that so many people have tried to shut these sleeping pods down by arguing they’re inhumane while every single person I’ve met who lives in them is desperate for the city not to shut them down.

Like I’ve talked to at least three people who live in these pods and they’re all baffled by why people think evicting them somehow advances the cause of social justice.

Kelsey Piper: There’s something about small housing that brings out the absolute worst in people – instead of being rightfully angry at scarcity they get deeply and personally angry at the existence of small options and everyone who isn’t trying to shut them down.

Judge strikes down San Francisco’s vacant home tax. Very California. They have not as far as I know struck down the vacant storefront tax, but neither do we have reason to think they are enforcing it. I find it hard to justify taxing empty storefronts given both how hard it is to actually open a store in San Francisco, and also SF’s general failure to enforce laws. As usual, the arguments of ‘this tax won’t be effective’ raise the question of ‘if the tax wouldn’t change behavior, doesn’t that mean it’s a great tax?’

It remains a bunch of suburbs at best, despite the immense amount of lost value.

Nate Silver: People can comment on whatever they want but Silicon Valley is a bunch of suburbs. Like go live in a real city if you have a Take on NYC.

Hayden: Silicon Valley could’ve been one of the most unbelievable and prosperous places on the planet, but elected officials decided to listen to NIMBYs for decades, and this is what the commercial corridor where the world’s most valuable company is headquartered looks like:

Imagine if nearly all the most innovative and wealthy companies on the planet descended on your area, bringing jobs and investment enough to make the Vanderbilts blush, and you kept it a glorified strip mall. I touch on it here:

William Eden: Texas Property Code:

“In addition, a property owners’ association can neither prohibit nor regulate the following:

– possession of firearms or ammunition (Section 202.021)

– lemonade stands (Section 202.020)

[end of list]”

I have a startup idea and the HOA can’t stop us 😏

It’s the [end of list] that gets me. These are the entirety of your enumerated protections from HOA tyranny in Texas. Thanks to @jamespayor for his diligent research of the Texas Property Code 🫡

First half of the business is set up

Our cities need more shade, especially as temperatures rise. An underrated concern.

It’s remarkable how much people will sacrifice in the name of sunlight, then never consider that we might want to walk in the shade.


Housing Roundup #11


Tuesday Telescope: A close-up of the magical camera at the end of a robotic arm

Welcome to the Tuesday Telescope. There is a little too much darkness in this world and not enough light—a little too much pseudoscience and not enough science. We’ll let other publications offer you a daily horoscope. At Ars Technica, we’ll take a different route, finding inspiration from very real images of a universe that is filled with stars and wonder.

We’re back! A long-time reader and subscriber recently mentioned in the Ars Forums that they “kind of” missed the Daily Telescope posts that I used to write in 2023 and 2024. Although I would have preferred that everyone desperately missed the Daily Telescope, I appreciate the sentiment. I really do.

I initially stopped writing these posts about a year ago because it just became too much to commit to writing one thing every day. I mean, I could have done it. But doing so on the daily crossed over the line from enjoyable to drudgery, and one of the best things about working for Ars is that it tends very much toward the enjoyable side. Anyway, writing one of these posts on a weekly basis feels more sustainable. I guess we’ll find out!

Today’s image comes to you all the way from Mars. One of the most powerful tools on NASA’s Perseverance rover is the WATSON camera attached to the end of the rover’s robotic arm. In the fine tradition of tortured acronyms at the space agency, WATSON stands for Wide Angle Topographic Sensor for Operations and eNgineering. And because of course it is, WATSON is located on the SHERLOC (Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals) instrument. Seriously, NASA must stand for Not Another Screwball Acronym.



With new Gen-4 model, Runway claims to have finally achieved consistency in AI videos

For example, it was used in producing the sequence in the film Everything Everywhere All At Once where two rocks with googly eyes had a conversation on a cliff, and it has also been used to make visual gags for The Late Show with Stephen Colbert.

Whereas many competing startups were started by AI researchers or Silicon Valley entrepreneurs, Runway was founded in 2018 by art students at New York University’s Tisch School of the Arts—Cristóbal Valenzuela and Alejandro Matamala from Chile, and Anastasis Germanidis from Greece.

It was one of the first companies to release a usable video-generation tool to the public, and its team also contributed in foundational ways to the Stable Diffusion model.

It is vastly outspent by competitors like OpenAI, but while most of its competitors have released general-purpose video-creation tools, Runway has sought an Adobe-like place in the industry. It has focused on marketing to creative professionals like designers and filmmakers and has implemented tools meant to make Runway a support tool to existing creative workflows.

The support tool argument (as opposed to a standalone creative product) helped Runway secure a deal with motion picture company Lionsgate, wherein Lionsgate allowed Runway to legally train its models on its library of films, and Runway provided bespoke tools for Lionsgate for use in production or post-production.

That said, Runway is, along with Midjourney and others, one of the subjects of a widely publicized intellectual property case brought by artists who claim the companies illegally trained their models on their work, so not all creatives are on board.

Apart from the announcement about the partnership with Lionsgate, Runway has never publicly shared what data is used to train its models. However, a report in 404 Media seemed to reveal that at least some of the training data included video scraped from the YouTube channels of popular influencers, film studios, and more.



Lithium-ion battery waste fires are increasing, and vapes are a big part of it

2024 was “a year of growth,” according to fire-suppression company Fire Rover, but that’s not an entirely good thing.

The company, which offers fire detection and suppression systems based on thermal and optical imaging, smoke analytics, and human verification, releases annual reports on waste and recycling facility fires in the US and Canada to select industry and media. In 2024, Fire Rover, based on its fire identifications, saw 2,910 incidents, a 60 percent increase from the 1,809 in 2023, and more than double the 1,409 fires confirmed in 2022.

Publicly reported fire incidents at waste and recycling facilities also hit 398, a new high since Fire Rover began compiling its report eight years ago, when that number was closer to 275.

Lots of things caused fires in the waste stream long before lithium-ion batteries became common: “Fireworks, pool chemicals, hot (barbecue) briquettes,” writes Ryan Fogelman, CEO of Fire Rover, in an email to Ars. But lithium-ion batteries pose a growing problem, as the number of devices with batteries increases, consumer education and disposal choices remain limited, and batteries remain a very easy-to-miss, troublesome occupant of the waste stream.

All batteries that make it into waste streams are potentially hazardous, as they have so many ways of being set off: puncturing, vibration, overheating, short-circuiting, crushing, internal cell failure, overcharging, or inherent manufacturing flaws, among others. Fire Rover’s report notes that the media often portrays batteries as “spontaneously” catching fire. In reality, the very nature of waste handling makes it almost impossible to ensure that no battery will face hazards in handling, the report notes. Tiny batteries can be packed into the most disposable of items—even paper marketing materials handed out at conferences.

Fogelman estimates, based on his experience and some assumptions, that about half of the fires he’s tracking originate with batteries. Roughly $2.5 billion of loss to facilities and infrastructure came from fires last year, divided between traditional hazards and batteries, he writes.

Lithium-ion battery waste fires are increasing, and vapes are a big part of it Read More »

OpenAI #12: Battle of the Board Redux

Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely through control of the public narrative, and the above was my attempt to put together what happened.

My conclusion, which I still believe, was that Sam Altman had engaged in a variety of unacceptable conduct that merited his firing.

In particular, he had very much ‘not been consistently candid’ with the board on several important occasions. Most notably, he lied to board members about what other board members had said, with the goal of forcing out a board member he disliked. There were also other instances in which he misled and was otherwise toxic to employees, and he played fast and loose with the investment fund and other outside opportunities.

I concluded that the story that this was about ‘AI safety’ or ‘EA (effective altruism)’ or existential risk concerns, other than as Altman’s motivation to attempt to remove board members, was a false narrative largely spread by Altman’s allies and those who are determined to hate on anyone who is concerned future AI might get out of control or kill everyone, often using EA’s bad press or vibes as a point of leverage to do that.

A few weeks later, I felt that leaks confirmed the bulk of the story I told at that first link, and since then I’ve had anonymous sources confirm my account was centrally true.

Thanks to Keach Hagey at the Wall Street Journal, we now have by far the most well-researched and complete piece on what happened: The Secrets and Misdirection Behind Sam Altman’s Firing From OpenAI. Most, although not all, of the important remaining questions are now definitively answered, and the story I put together has been confirmed.

The key now is to Focus Only On What Matters. What matters going forward are:

  1. Claims of Altman’s toxic and dishonest behaviors that, if true, merited his firing.

  2. That the motivations behind the firing were these ordinary CEO misbehaviors.

  3. Altman’s allies successfully spread a highly false narrative about events.

  4. That OpenAI could easily have moved forward with a different CEO, if things had played out differently and Altman had not threatened to blow up OpenAI.

  5. OpenAI is now effectively controlled by Sam Altman going forward. His claims that ‘the board can fire me’ in practice mean very little.

Also important is what happened afterwards, which was likely caused in large part by the events themselves, the way they were framed, and Altman’s consolidated power.

In particular, Sam Altman and OpenAI, whose explicit mission is building AGI and who plan to do so within Trump’s second term, started increasingly talking and acting like AGI was No Big Deal, except for the amazing particular benefits.

Their statements don’t feel the AGI. They no longer tell us our lives will change that much. It is not important, they do not even bother to tell us, to protect against key downside risks of building machines smarter and more capable than humans – such as the risk that those machines effectively take over, or perhaps end up killing everyone.

And if you disagreed with that, or opposed Sam Altman? You were shown the door.

  1. OpenAI was then effectively purged. Most of its strongest alignment researchers left, as did most of those who most prominently wanted to take care to ensure OpenAI’s quest for AGI did not kill everyone or cause humanity to lose control over the future.

  2. Altman’s public statements about AGI, and OpenAI’s policy positions, stopped even mentioning the most important downside risks of AGI and ASI (artificial superintelligence), and shifted towards attempts at regulatory capture and access to government cooperation and funding. Most prominently, their statement on the US AI Action Plan can only be described as disingenuous vice signaling in pursuit of their own private interests.

  3. Those public statements and positions no longer much even ‘feel the AGI.’ Altman has taken to predicting that AGI will happen and your life won’t much change, and treating future AGI as essentially a fungible good. We know, from his prior statements, that Altman knows better. And we know from their current statements that many of the engineers at OpenAI know better. Indeed, in context, they shout it from the rooftops.

  4. We discovered that self-hiding NDAs were aggressively used by OpenAI, under threat of equity confiscation, to control people and the narrative.

  5. With control over the board, Altman is attempting to convert OpenAI into a for-profit company, with sufficiently low compensation that this act could plausibly become the greatest theft in human history.

Beware being distracted by the shiny. In particular:

  1. Don’t be distracted by the article’s ‘cold open’ in which Peter Thiel tells a paranoid and false story to Sam Altman, in which Thiel asserts that ‘EAs’ or ‘safety’ people will attempt to destroy OpenAI, and that they have ‘half the company convinced’ and so on. I don’t doubt the interaction happened, but this was unrelated to what happened.

    1. To the extent it was related, it was because the paranoia of Altman and his allies about such possibilities, inspired by such tall tales, caused Altman to lie to the board in general, and to attempt to force Helen Toner off the board in particular.

  2. Don’t be distracted by the fact that the board botched the firing, and the subsequent events, from a tactical perspective. Yes we can learn from their mistakes, but the board that made those mistakes is gone now.

This is all quite bad, but things could be far worse. OpenAI still has many excellent people working on alignment, security and safety. They have put out a number of strong documents. By that standard, and in terms of how responsibly they have actually handled their releases, OpenAI has outperformed many other industry actors, although it remains less responsible than Anthropic. Companies like DeepSeek, Meta and xAI, and at times Google, work hard to make OpenAI look good on these fronts.

Now, on to what we learned this week.

Hagey’s story paints a clear picture of what actually happened.

It is especially clear about why this happened. The firing wasn’t about EA, ‘the safety people’ or existential risk. What was this about?

Altman repeatedly lied to, misled and mistreated employees of OpenAI. He repeatedly lied about and withheld important material facts, including directly from the board. There was a large litany of complaints.

The big new fact is that the board was counting on Murati’s support. But partly because of this, they felt they couldn’t disclose that their information came largely from Murati. That doesn’t explain why they couldn’t say this to Murati herself.

If the facts asserted in the WSJ article are true, I would say that any responsible board would have voted for Altman’s removal. As OpenAI’s products got more impactful, and the stakes got higher, Altman’s behaviors left no choice.

Claude agreed. This was one-shot: I pasted in the full article and asked:

Zvi: I’ve shared a news article. Based on what is stated in the news article, if the reporting is accurate, how would you characterize the board’s decision to fire Altman? Was it justified? Was it necessary?

Claude 3.7: Based on what’s stated in the article, the board’s decision to fire Sam Altman appears both justified and necessary from their perspective, though clearly poorly executed in terms of preparation and communication.

I agree, on both counts. There are only two choices here; at least one must be true:

  1. The board had a fiduciary duty to fire Altman.

  2. The board members are outright lying about what happened.

That doesn’t excuse the board’s botched execution, especially its failure to disclose information in a timely manner.

The key facts cited here are:

  1. Altman said publicly and repeatedly ‘the board can fire me. That’s important,’ but in reality he called the shots and did everything in his power to keep it that way.

  2. Altman did not even inform the board about ChatGPT in advance, at all.

  3. Altman explicitly claimed three enhancements to GPT-4 had been approved by the joint safety board. Helen Toner found only one had been approved.

  4. Altman allowed Microsoft to launch the test of GPT-4 in India, in the form of Sydney, without the approval of the safety board or informing the board of directors of the breach. Due to the results of that experiment entering the training data, deploying Sydney plausibly had permanent effects on all future AIs. This was not a trivial oversight.

  5. Altman did not inform the board that he had taken financial ownership of the OpenAI investment fund, which he claimed was temporary and for tax reasons.

  6. Mira Murati came to the board with a litany of complaints about what she saw as Altman’s toxic management style, including having Brockman, who reported to her, go around her to Altman whenever there was a disagreement. Altman responded by bringing the head of HR to their 1-on-1s until Mira said she wouldn’t share her feedback with the board.

  7. Altman promised both Pachocki and Sutskever that they could set the company’s research direction, losing months of productivity, and this was when Sutskever started looking to replace Altman.

  8. The most egregious lie (Hagey’s term for it) and what I consider on its own sufficient to require Altman be fired: Altman told one board member, Sutskever, that a second board member, McCauley, had said that Toner should leave the board because of an article Toner wrote. McCauley said no such thing. This was an attempt to get Toner removed from the board. If you lie to board members about other board members in an attempt to gain control over the board, I assert that the board should fire you, pretty much no matter what.

  9. Sutskever collected dozens of examples of alleged Altman lies and other toxic behavior, largely backed up by screenshots from Murati’s Slack channel. One lie in particular was that Altman told Murati that the legal department had said GPT-4-Turbo didn’t have to go through joint safety board review. The head lawyer said he did not say that. The decision not to go through the safety board here was not crazy, but lying about the lawyer’s opinion on this is highly unacceptable.

Murati was clearly a key source for many of these firing offenses (and presumably for this article, given its content and timing, although I don’t know anything nonpublic). Despite this, even after Altman was fired, the board didn’t even tell Murati why they had fired him while asking her to become interim CEO, and in general stayed quiet largely (in this post’s narrative) to protect Murati. But then, largely because of the board’s communication failures, Murati turned on the board and the employees backed Altman.

This section reiterates and expands on my warnings above.

The important narrative here is that Altman engaged in various shenanigans and made various unforced errors that together rightfully got him fired. But the board botched the execution, and Altman was willing to burn down OpenAI in response and the board wasn’t. Thus, Altman got power back and did an ideological purge.

The first key distracting narrative, the one I’m seeing many fall into, is to treat this primarily as a story about board incompetence. Look at those losers, who lost, because they were stupid losers in over their heads with no business playing at this level. Many people seem to think the ‘real story’ is that a now defunct group of people were bad at corporate politics and should get mocked.

Yes, that group was bad at corporate politics. We should update on that, and be sure that the next time we have to Do Corporate Politics we don’t act like that, and especially that we explain why we are doing things. But the group that dropped this ball is defunct, whereas Altman is still CEO. And this is not a sporting event.

The board is now irrelevant. Altman isn’t. What matters is the behavior of Altman, and what he did to earn getting fired. Don’t be distracted by the shiny.

A second key narrative spun by Altman’s allies is that Altman is an excellent player of corporate politics. He has certainly pulled off some rather impressive (and some would say nasty) tricks. But the picture painted here is rife with unforced errors. Altman won because the opposition played badly, not because he played so well.

Most importantly, as I noted at the time, the board started out with nine members, five of whom at the time were loyal to Altman even if you don’t count Ilya Sutskever. Altman could easily have used this opportunity to elect new loyal board members. Instead, he allowed three of his allies to leave the board without replacement, leading to the deadlock of control, which then led to the power struggle. Given Altman knows so many well-qualified allies, this seems like a truly epic level of incompetence to me.

The third key narrative, the one Altman’s allies have centrally told since day one and which is entirely false, is that this firing (which they misleadingly call a ‘coup’) was ‘the safety people’ or ‘the EAs’ trying to ‘destroy’ OpenAI.

My worry is that many will see that this false framing is presented early in the post, and not read far enough to realize the post is pointing out that the framing is entirely false. Thus, many or even most readers might get exactly the wrong idea.

In particular, this piece opens with an irrelevant story echoing this false narrative: Peter Thiel is at dinner telling his friend Sam Altman a frankly false and paranoid story about Effective Altruism and Eliezer Yudkowsky.

Thiel says that ‘half the company believes this stuff’ (if only!) and that ‘the EAs’ had ‘taken over’ OpenAI (if only again!), and predicts that ‘the safety people,’ whom Thiel has on various occasions described, literally and at length, as the biblical Antichrist, would ‘destroy’ OpenAI (whereas, instead, the board in the end fell on its sword to prevent Altman and his allies from destroying OpenAI).

And it gets presented in ways like this:

We are told to focus on the nice people eating dinner while other dastardly people held ‘secret video meetings.’ How is this what is important here?

Then if you keep reading, Hagey makes it clear: The board’s firing of Altman had nothing to do with that. And we get on with the actual excellent article.

I don’t doubt Thiel told that to Altman, and I find it likely Thiel even believed it. The thing is, it isn’t true, and it’s rather important that people know it isn’t true.

If you want to read more about what has happened at OpenAI, I have covered this extensively, and my posts contain links to the best primary and other secondary sources I could find. Here are the posts in this sequence.

  1. OpenAI: Facts From a Weekend.

  2. OpenAI: The Battle of the Board.

  3. OpenAI: Altman Returns.

  4. OpenAI: Leaks Confirm the Story.

  5. OpenAI: The Board Expands.

  6. OpenAI: Exodus.

  7. OpenAI: Fallout.

  8. OpenAI: Helen Toner Speaks.

  9. OpenAI #8: The Right to Warn.

  10. OpenAI #10: Reflections.

  11. On the OpenAI Economic Blueprint.

  12. The Mask Comes Off: At What Price?

  13. OpenAI #11: America Action Plan.

The write-ups will doubtless continue, as this is one of the most important companies in the world.

OpenAI #12: Battle of the Board Redux Read More »