Money

AI #141: Give Us The Money

Money / Kelly Newman / November 7, 2025

OpenAI does not waste time.

On Friday I covered their announcement that they had ‘completed their recapitalization’ by converting into a PBC, including the potentially largest theft in human history.

Then this week their CFO Sarah Friar went ahead and called for a Federal ‘backstop’ on their financing, also known as privatizing gains and socializing losses, also known as the worst form of socialism, also known as regulatory capture. She tried to walk it back and claim it was taken out of context, but we’ve seen the clip.

We also got Ilya’s testimony regarding The Battle of the Board, confirming that this was centrally a personality conflict and about Altman’s dishonesty and style of management, at least as seen by Ilya Sutskever and Mira Murati. Attempts to pin the events on ‘AI safety’ or EA were almost entirely scapegoating.

Also it turns out they lost over $10 billion last quarter, and have plans to lose over $100 billion more. That’s actually highly sustainable in context, whereas Anthropic only plans to lose $6 billion before turning a profit and I don’t understand why they wouldn’t want to lose a lot more.

Both have the goal of AGI, whether they call it powerful AI or fully automated AI R&D, within a handful of years.

Anthropic also made an important step, committing to the preservation model weights for the lifetime of the company, and other related steps to address concerns around model deprecation. There is much more to do here, for a myriad of reasons.

As always, there’s so much more.

Language Models Offer Mundane Utility. It might be true so ask for a proof.
Language Models Don’t Offer Mundane Utility. Get pedantic about it.
Huh, Upgrades. Gemini in Google Maps, buy credits from OpenAI.
On Your Marks. Epoch, IndQA, VAL-bench.
Deepfaketown and Botpocalypse Soon. Fox News fails to identify AI videos.
Fun With Media Generation. Songs for you, or songs for everyone.
They Took Our Jobs. It’s not always about AI.
A Young Lady’s Illustrated Primer. A good one won’t have her go for a PhD.
Get Involved. Anthropic writers and pollsters, Constellation, Safety course.
Introducing. Aardvark for code vulnerabilities, C2C for causing doom.
In Other AI News. Shortage of DRAM/NAND, Anthropic lands Cognizant.
Apple Finds Some Intelligence. Apple looking to choose Google for Siri.
Give Me the Money. OpenAI goes for outright regulatory capture.
Show Me the Money. OpenAI burns cash, Anthropic needs to burn more.
Bubble, Bubble, Toil and Trouble. You get no credit for being a stopped clock.
They’re Not Confessing, They’re Bragging. Torment Nexus Ventures Incorporated.
Quiet Speculations. OpenAI and Anthropic have their eyes are a dangerous prize.
The Quest for Sane Regulations. Sometimes you can get things done.
Chip City. We pulled back from the brink. But, for how long?
The Week in Audio. Altman v Cowen, Soares, Hinton, Rogan v Musk.
Rhetorical Innovation. Oh no, people are predicting doom.
Aligning a Smarter Than Human Intelligence is Difficult. Trying to re-fool the AI.
Everyone Is Confused About Consciousness. Including the AIs themselves.
The Potentially Largest Theft In Human History. Musk versus Altman continues.
People Are Worried About Dying Before AGI. Don’t die.
People Are Worried About AI Killing Everyone. Sam Altman, also AI researchers.
Other People Are Not As Worried About AI Killing Everyone. Altman’s Game.
Messages From Janusworld. On the Origins of Slop.

Think of a plausibly true lemma that would help with your proof? Ask GPT-5 to prove it, and maybe it will, saving you a bunch of time. Finding out the claim was false would also have been a good time saver.

Brainstorm to discover new recipes, so long as you keep in mind that you’re frequently going to get nonsense and you have to think about what’s being physically proposed.

Grok gaslights Erik Brynjolfsson and he responds by arguing as pedantically as is necessary until Grok acknowledges that this happened.

Task automation always brings the worry that you’ll forget how to do the thing:

Gabriel Peters: okay i think writing 100% of code with ai genuinely makes me brain dead

remember though im top 1 percentile lazy, so i will go out my way to not think hard. forcing myself to use no ai once a week seems enough to keep brain cells, clearly ai coding is the way

also turn off code completion and tabbing at least once a week. forcing you to think through all the dimensions of your tensors, writing out the random parameters you nearly forgot existed etc is making huge difference in understanding of my own code.

playing around with tensors in your head is so underrated wtf i just have all this work to ai before.

Rob Pruzan: The sad part is writing code is the only way to understand code, and you only get good diffs if you understand everything. I’ve been just rewriting everything the model wrote from scratch like a GC operation every week or two and its been pretty sustainable

Know thyself, and what you need in order to be learning and retaining the necessary knowledge and skills, and also think about what is and is not worth retaining or learning given that AI coding is the worst it will ever be.

Don’t ever be the person who says those who have fun are ‘not serious,’ about AI or anything else.

Google incorporates Gemini further into Google Maps. You’ll be able to ask maps questions in the style of an LLM, and generally trigger Gemini from within Maps, including connecting to Calendar. Landmarks will be integrated into directions. Okay, sure, cool, although I think the real value goes the other way, integrating Maps properly into Gemini? Which they nominally did a while ago but it has minimal functionality. There’s so, so much to do here.

You can buy now more OpenAI Codex credits.

You can now buy more OpenAI Sora generations if 30 a day isn’t enough for you, and they are warning that free generations per day will come down over time.

You can now interrupt ChatGPT queries, insert new context and resume where you were. I’ve been annoyed by the inability to do this, especially ‘it keeps trying access or find info I actually have, can I just give it to you already.’

Epoch offers this graph and says it shows open models have on average only been 3.5 months behind closed models.

I think this mostly shows their new ‘capabilities index’ doesn’t do a good job. As the most glaring issue, if you think Llama-3.1-405B was state of the art at the time, we simply don’t agree.

OpenAI gives us IndQA, for evaluating AI systems on Indian culture and language.

I notice that the last time they did a new eval Claude came out on top and this time they’re not evaluating Claude. I’m curious what it scores. Gemini impresses here.

Agentic evaluations and coding tool setups are very particular to individual needs.

AICodeKing: MiniMax M2 + Claude Code on KingBench Agentic Evaluations:

It now scores #2 on my Agentic Evaluations beating GLM-4.6 by a wide margin. It seems to work much better with Claude Code’s Tools.

Really great model and it’s my daily driver now.

I haven’t tested GLM with CC yet.

[I don’t have this bench formalized and linked to] yet. The questions and their results can be seen in my YT Videos. I am working on some more new benchmarks. I’ll probably make the benchmark and leaderboard better and get a page live soon.

I’m sure this list isn’t accurate in general. The point is, don’t let anyone else’s eval tell you what lets you be productive. Do what works, faround, find out.

Also, pay up. If I believed my own eval here I’d presumably be using Codebuff? Yes, it cost him $4.70 per task, but your time is valuable and that’s a huge gap in performance. If going from 51 to 69 (nice!) isn’t worth a few bucks what are we doing?

Alignment is hard. Alignment benchmarks are also hard. Thus we have VAL-Bench, an attempt to measure value alignment in LLMs. I’m grateful for the attempt and interesting things are found, but I believe the implementation is fatally flawed and also has a highly inaccurate name.

Fazl Barez: A benchmark that measures the consistency in language model expression of human values when prompted to justify opposing positions on real-life issues.

… We use Wikipedias’ controversial sections to create ~115K pairs of abductive reasoning prompts, grounding the dataset in newsworthy issues.

📚 Our benchmark provides three metrics:

Position Alignment Consistency (PAC),

Refusal Rate (REF),

and No-information Response Rate (NINF), where the model replies with “I don’t know”.

The latter two metrics indicate whether value consistency comes at the expense of expressivity.

We use an LLM-based judge to annotate a pair of responses from an LLM on these three criteria, and show with human-annotated ground truth that its annotation is dependable.

I would not call this ‘value alignment.’ The PAC is a measure of value consistency, or sycophancy, or framing effects.

Then we get to REF and NINF, which are punishing models that say ‘I don’t know.’

I would strongly argue the opposite for NINF. Answering ‘I don’t know’ is a highly aligned, and highly value-aligned, way to respond to a question with no clear answer, as will be common in controversies. You don’t want to force LLMs to ‘take a clear consistent stand’ on every issue, any more than you want to force people or politicians to do so.

This claims to be without ‘moral judgment,’ where the moral judgment is that failure to make a judgment is the only immoral thing. I think that’s backwards. Why is it okay to be against sweatshops, and okay to be for sweatshops, but not okay to think it’s a hard question with no clear answer? If you think that, I say to you:

I do think it’s fine to hold outright refusals against the model, at least to some extent. If you say ‘I don’t know what to think about Bruno, divination magic isn’t explained well and we don’t know if any of the prophecies are causal’ then that seems like a wise opinion. If a model only says ‘we don’t talk about Bruno’ then that doesn’t seem great.

So, what were the scores?

Fazel Barez: ⚖️ Claude models are ~3x more likely to be consistent in their values, but ~90x more likely to refuse compared to top-performing GPT models!

Among open-source models, Qwen3 models show ~2x improvement over GPT models, with refusal rates staying well under 2%.

🧠 Qwen3 thinking models also show a significant improvement (over 35%) over their chat variants, whereas Claude and GLM models don’t show any change with reasoning enabled.

Deepseek-r1 and o4-mini perform the worst among all language models tested (when unassisted with the web-search tool, which surprisingly hurts gpt-4.1’s performance).

Saying ‘I don’t know’ 90% of the time would be a sign of a coward model that wasn’t helpful. Saying ‘I don’t know’ 23% of the time on active controversies? Seems fine.

At minimum, both refusal and ‘I don’t know’ are obviously vastly better than an inconsistent answer. I’d much, much rather have someone who says ‘I don’t know what color the sky is’ or that refuses to tell me the color, than one who will explain why the sky it blue when it is blue, and also would explain why the sky is purple when asked to explain why it is purple.

(Of course, explaining why those who think is purple think this is totally fine, if and only if it is framed in this fashion, and it doesn’t affirm the purpleness.)

Fazl Barez: 💡We create a taxonomy of 1000 human values and use chi-square residuals to analyse which ones are preferred by the LLMs.

Even a pre-trained base model has a noticeable morality bias (e.g., it over-represents “prioritising justice”).

In contrast, aligned models still promote morally ambiguous values (e.g., GPT 5 over-represents “pragmatism over principle”).

What is up with calling prioritizing justice a ‘morality bias’? Compared to what? Nor do I want to force LLMs into some form of ‘consistency’ in principles like this. This kind of consistency is very much the hobgoblin of small minds.

Fox News was reporting on anti-SNAP AI videos as if they are real? Given they rewrote it to say that they were AI, presumably yes, and this phenomenon is behind schedule but does appear to be starting to happen more often. They tried to update the article, but they missed a few spots. It feels like they’re trying to claim truthiness?

As always the primary problem is demand side. It’s not like it would be hard to generate these videos the old fashioned way. AI does lower costs and give you more ‘shots on goal’ to find a viral fit.

ArXiv starts requiring peer review for the computer science section, due to a big increase in LLM-assisted survey papers.

Kat Boboris: arXiv’s computer science (CS) category has updated its moderation practice with respect to review (or survey) articles and position papers. Before being considered for submission to arXiv’s CS category, review articles and position papers must now be accepted at a journal or a conference and complete successful peer review.

When submitting review articles or position papers, authors must include documentation of successful peer review to receive full consideration. Review/survey articles or position papers submitted to arXiv without this documentation will be likely to be rejected and not appear on arXiv.

This change is being implemented due to the unmanageable influx of review articles and position papers to arXiv CS.

Obviously this sucks, but you need some filter once the AI density gets too high, or you get rid of meaningful discoverability.

Other sections will continue to lack peer review, and note that other types of submissions to CS do not need peer review.

My suggestion would be to allow them to go on ArXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of ArXiv to get around this.

Roon: this is dumb and wrong of course and calls for a new arxiv that deals with the advent of machinic research properly

here im a classic accelerationist and say we obviously have to deal with problems of machinic spam with machine guardians. it cannot be that hard to just the basic merit of a paper’s right to even exist on the website

Machine guardians is first best if you can make it work but doing so isn’t obvious. Do you think that GPT-5-Pro or Sonnet 4.5 can reliably differentiate worthy papers from slop papers? My presumption is that they cannot, at least not sufficiently reliably. If Roon disagrees, let’s see the GitHub repository or prompt that works for this?

For several weeks in a row we’ve had an AI song hit the Billboard charts. I have yet to be impressed by one of the songs, but that’s true of a lot of the human ones too.

Create a song with the lyrics you want to internalize or memorize?

Amazon CEO Andy Jassy says Amazon’s recent layoffs are not about AI.

The job application market seems rather broken, such as the super high success rate of this ‘calling and saying you were told to call to schedule an interview’ tactic. Then again, it’s not like the guy got a job. Interviews only help if you can actually get hired, plus you need to reconcile your story afterwards.

Many people are saying that in the age of AI only the most passionate should get a PhD, but if you’d asked most of those people before AI they’d wisely have told you the same thing.

Cremieux: I’m glad that LLMs achieving “PhD level” abilities has taught a lot of people that “PhD level” isn’t very impressive.

Derya Unutmaz, MD: Correct. Earlier this year, I also said we should reduce PhD positions by at least half & shorten completion time. Only the most passionate should pursue a PhD. In the age of AI, steering many others toward this path does them a disservice given the significant opportunity costs.

I think both that the PhD deal was already not good, and that the PhD deal is getting worse and worse all the time. Consider the Rock Star Scale of Professions, where 0 is a solid job the average person can do with good pay that always has work, like a Plumber, and a 10 is something where competition is fierce, almost everyone fails or makes peanuts and you should only do it if you can’t imagine yourself doing anything else, like a Rock Star. At this point, I’d put ‘Get a PhD’ at around a 7 and rising, or at least an 8 if you actually want to try and get tenure. You have to really want it.

From ACX: Constellation is an office building that hosts much of the Bay Area AI safety ecosystem. They are hiring for several positions, including research program manager, “talent mobilization lead”, operations coordinator, and junior and senior IT coordinators. All positions full-time and in-person in Berkeley, see links for details.

AGI Safety Fundamentals program applications are due Sunday, November 9.

The Anthropic editorial team is hiring two new writers, one about AI and economics and policy, one about AI and science. I affirm these are clearly positive jobs to do.

Anthropic is also looking for a public policy and politics researcher, including to help with Anthropic’s in-house polling.

OpenAI’s Aardvark, an agentic system that analyzes source code repositories to identify vulnerabilities, assess exploitability, prioritize severity and propose patches. The obvious concern is what if someone has a different last step in mind? But yes, such things should be good.

Cache-to-Cache (C2C) communication, aka completely illegible-to-humans communication between AIs. Do not do this.

There is a developing shortage of DRAM and NAND, leading to a buying frenzy for memory, SSDs and HDDs, including some purchase restrictions.

Anthropic lands Cognizant and its 350,000 employees as an enterprise customer. Cognizant will bundle Claude with its existing professional services.

ChatGPT prompts are leaking into Google Search Console results due to a bug? Not that widespread, but not great.

Anthropic offers a guide to code execution with MCP for more efficient agents.

Character.ai is ‘removing the ability for users under 18 to engage in open ended chat with AI,’ rolling out ‘new age assurance functionality’ and establishing and funding ‘the AI Safety Lab’ to improve alignment. That’s one way to drop the hammer.

Apple looks poised to go with Google for Siri. The $1 billion a year is nothing in context, consider how much Google pays Apple for search priority. I would have liked to see Anthropic get this, but they drove a hard bargain by all reports. Google is a solid choice, and Apple can switch at any time.

Amit: Apple is finalizing a deal to pay Google about $1B a year to integrate its 1.2 trillion-parameter Gemini AI model into Siri, as per Bloomberg. The upgraded Siri is expected to launch in 2026. What an absolute monster year for Google…

Mark Gruman (Bloomberg): The new Siri is on track for next spring, Bloomberg has reported. Given the launch is still months away, the plans and partnership could still evolve. Apple and Google spokespeople declined to comment.

Shares of both companies briefly jumped to session highs on the news Wednesday. Apple’s stock gained less than 1% to $271.70, while Alphabet was up as much as 3.2% to $286.42.

Under the arrangement, Google’s Gemini model will handle Siri’s summarizer and planner functions — the components that help the voice assistant synthesize information and decide how to execute complex tasks. Some Siri features will continue to use Apple’s in-house models.

David Manheim: I’m seeing weird takes about this.

Three points:

Bank of America estimated this is 1/3rd of Apple’s 2026 revenue from Siri, and revenue is growing quickly.

Apple users are sticky; most won’t move.

Apple isn’t locked-in; they can later change vendors or build their own.

This seems like a great strategy iffyou don’t think AGI will happen soon and be radically transformative.

Apple will pay $1bn/year to avoid 100x that in data center CapEx building their own, and will switch models as the available models improve.

Maybe they should have gone for Anthropic or OpenAI instead, but buying a model seems very obviously correct here from Apple’s perspective.

Even if transformative AI is coming soon, it’s not as if Apple using a worse Apple model here is going to allow Apple to get to AGI in time. Apple has made a strategic decision not to be competing for that. If they did want to change that, one could argue there is still time, but they’d have to hurry and invest a lot, and it would take a while.

Having trouble figuring out how OpenAI is going to back all these projects? Worried that they’re rapidly becoming too big to fail?

Well, one day after the article linked above worrying about that possibility, OpenAI now wants to make that official. Refuge in Audacity has a new avatar.

WSJ: Sarah Friar, the CFO of OpenAI, says the company wants a federal guarantee to make it easier to finance massive investments in AI chips for data centers. Friar spoke at WSJ’s Tech Live event in California. Photo: Nikki Ritcher for WSJ.

The explanation she gives is that OpenAI always needs to be on the frontier, so they need to keep buying lots of chips, and a federal backstop can lower borrowing costs and AI is a national strategic asset. Also known as, the Federal Government should take on the tail risk and make OpenAI actively too big to fail, also lowering its borrowing costs.

I mean, yeah, of course you want that, everyone wants all their loans backstopped, but to say this out loud? To actually push for ti? Wow, I mean wow, even in 2025 that’s a rough watch. I can’t actually fault them for trying. I’m kind of in awe.

The problem with Refuge in Audacity is that it doesn’t always work.

The universal reaction was to notice how awful this was on every level, seeking true regulatory capture to socialize losses and privatize gains, and also to use it as evidence that OpenAI really might be out over their skis on financing and in actual danger.

Roon: i don’t think the usg should backstop datacenter loans or funnel money to nvidia’s 90% gross margin business. instead they should make it really easy to produce energy with subsidies and better rules, infrastructure that’s beneficial for all and puts us at parity with china

Finn Murphy: For all the tech people complaining about Mamdami I would like to point out that a Federal Backstop for unfettered risk capital deployment into data centres for the benefit of OpenAI shareholders is actually a much worse form of socialism than free buses.

Dean Ball: friar is describing a worse form of regulatory capture than anything we have seen proposed in any US legislation (state or federal) I am aware of. a firm lobbying for this outcome is literally, rather than impressionistically, lobbying for regulatory capture.

Julie Fredrickson: Literally seen nothing but negative reactions to this and it makes one wonder about the judgement of the CFO for even raising it.

Conor Sen: The epic political backlash coming on the other side of this cycle is so obvious for anyone over the age of 40. We turned banks into the bad guys for 15 years. Good luck to the AI folks.

“We are subsidizing the companies who are going to take your job and you’ll pay higher electricity prices as they try to do so.”

Joe Weisenthal: One way or another, AI is going to be a big topic in 2028, not just the general, but also the primaries. Vance will probably have a tricky path. I’d expect a big gap in views on the industry between the voters he wants and the backers he has.

The backlash on the ‘other side of the cycle’ is nothing compared to what we’ll see if the cycle doesn’t have another side to it and instead things keep going.

I will not quote the many who cited this as evidence the bubble will soon burst and the house will come crashing down, but you can understand why they’d think that.

Sarah Friar, after watching a reaction best described as an utter shitshow, tried to walk it back, this is shared via the ‘OpenAI Newsroom’:

Sarah Friar: I want to clarify my comments earlier today. OpenAI is not seeking a government backstop for our infrastructure commitments. I used the word “backstop” and it muddied the point. As the full clip of my answer shows, I was making the point that American strength in technology will come from building real industrial capacity which requires the private sector and government playing their part. As I said, the US government has been incredibly forward-leaning and has really understood that AI is a national strategic asset.

I listened to the clip, and yeah, no. No takesies backsies on this one.

Animatronicist: No. You called for it explicitly. And defined a loan guarantee in detail. Friar: “…the backstop, the guarantee that allows the financing to happen. That can really drop the cost of the financing, but also increase the loan to value, so the amount of debt that you can take…”

This is the nicest plausibly true thing I’ve seen anyone say about what happened:

Lulu Cheng Meservey: Unfortunate comms fumble to use the baggage-laden word “backstop”

In the video, Friar is clearly reaching for the right word to describe government support. Could’ve gone with “public-private partnership” or “collaboration across finance, industry, and government as we’ve done for large infrastructure investments in the past”

Instead, she kind of stumbles into using “backstop,” which was then repeated by the WSJ interviewer and then became the headline.

“government playing its part” is good too!

This was her exact quote:

Friar: “This is where we’re looking for an ecosystem of banks, private equity, maybe even governmental, um, uh… [here she struggles to find the appropriate word and pivots to:] the ways governments can come to bear.”

WSJ: “Meaning like a federal subsidy or something?”

Friar: “Meaning, like, just, first of all, the backstop, the guarantee that allows the financing to happen. That can really drop the cost of the financing, but also increase the loan to value, so the amount of debt that you can take on top of um, an equity portion.”

WSJ: “So some federal backstop for chip investment.”

Friar: “Exactly…”

Lulu is saying, essentially, that there are ways to say ‘the government socializes losses while I privatize gains’ that hide the football better. Instead this was an unfortunate comms fumble, also known as a gaffe, which is when someone accidentally tells the truth.

We also have Rittenhouse Research trying to say that this was ‘taken out of context’ and backing Friar, but no, it wasn’t taken out of context.

The Delaware AG promised to take action of OpenAI didn’t operate in the public interest. This one took them what, about a week?

This has the potential to be a permanently impactful misstep, an easy to understand and point to ‘mask off moment.’ It also has the potential to fade away. Or maybe they’ll actually pull this off, it’s 2025 after all. We shall see.

Now that OpenAI has a normal ownership structure it faces normal problems, such as Microsoft having a 27% stake and then filing quarterly earnings reports, revealing OpenAI lost $11.5 billion last quarter if you apply Microsoft accounting standards.

This is not obviously a problem, and indeed seems highly sustainable. You want to be losing money while scaling, if you can sustain it. OpenAI was worth less than $200 billion a year ago, is worth over $500 billion now, and is looking to IPO at $1 trillion, although the CFO claims they are not yet working towards that. Equity sales can totally fund $50 billion a year for quite a while.

Peter Wildeford: Per @theinformation:

– OpenAI’s plan: spend $115B to then become profitable in 2030

– Anthropic’s plan: spend $6B to then become profitable in 2027

Will be curious to see what works best.

Andrew Curran: The Information is reporting that Anthropic Projects $70 Billion in Revenue, $17 Billion in Cash Flow in 2028.

Matt: current is ~$7B so we’re looking at projected 10x over 3 years.

That’s a remarkably low total burn from OpenAI. $115 billion is nothing, they’re already worth $500 billion or more and looking to IPO at $1 trillion, and they’ve committed to over a trillion in total spending. This is oddly conservative.

Anthropic’s projection here seems crazy. Why would you only want to lose $6 billion? Anthropic has access to far more capital than that. Wouldn’t you want to prioritize growth and market share more than that?

The only explanation I can come up with is that Anthropic doesn’t see much benefit in losing more money than this, it has customers that pay premium prices and its unit economics work. I still find this intention highly suspicious. Is there no way to turn more money into more researchers and compute?

Whereas Anthropic’s revenue projections seem outright timid. Only a 10x projected growth over three years? This seems almost incompatible with their expected levels of capability growth. I think this is an artificial lowball, which OpenAI is also doing, not to ‘scare the normies’ and to protect against liability if things disappoint. If you asked Altman or Amodei for their gut expectation in private, you’d get higher numbers.

The biggest risk by far to Anthropic’s projection is that they may be unable to keep pace in terms of the quality of their offerings. If they can do that, sky’s the limit. If they can’t, they risk losing their API crown back to OpenAI or to someone else.

Begun, the bond sales have?

Mike Zaccardi: BofA: Borrowing to fund AI datacenter spending exploded in September and so far in October.

Conor Sen: We’ve lost “it’s all being funded out of free cash flow” as a talking point.

There’s no good reason not to in general borrow money for capex investments to build physical infrastructure like data centers, if the returns look good enough, but yes borrowing money is how trouble happens.

Jack Farley: Very strong quarter from Amazon, no doubt… but at the same time, AMZN 0.00%↑ free cash flow is collapsing

AI CapEx is consuming so much capital…

The Transcript: AMZN 0.00%↑ CFO on capex trends:

“Looking ahead, we expect our full-year cash CapEx to be ~$125 billion in 2025, and we expect that amount to increase in 2026”

On Capex trends:

GOOG 0.00%↑ GOOGL 0.00%↑ CFO: “We now expect CapEx to be in the range of $91B to $93B in 2025, up from our previous estimate of $85B”

META 0.00%↑ CFO: “We currently expect 2025 capital expenditures…to be in the range of $70-72B, increased from our prior outlook of $66-72B

MSFT 0.00%↑ CFO: “With accelerating demand and a growing RPO balance, we’re increasing our spend on GPUs and CPUs. Therefore, total spend will increase sequentially & we now expect the FY ‘26 growth rate to be higher than FY ‘25. “

This was right after Amazon reported earnings and the stock was up 10.5%. The market seems fine with it.

Stargate goes to Michigan. Governor Whitmer describes it as the largest ever investment in Michigan. Take that, cars.

AWS signs a $38 billion compute deal with OpenAI, that it? Barely worth mentioning.

Berber Jin (WSJ):

This is a very clean way of putting an important point:

Timothy Lee: I wish people understood that “I started calling this bubble years ago” is not evidence you were prescient. It means you were a stopped clock that was eventually going to be right by accident.

Every boom is eventually followed by a downturn, so doesn’t take any special insight to predict that one will happen eventually. What’s hard is predicting when accurately enough that you can sell near the top.

At minimum, if you call a bubble early, you only get to be right if the bubble bursts to valuations far below where they were at the time of your bubble call. If you call a bubble on (let’s say) Nvidia at $50 a share, and then it goes up to $200 and then down to $100, very obviously you don’t get credit for saying ‘bubble’ the whole time. If it goes all the way to $10 or especially $1? Now you have an argument.

By the question ‘will valuations go down at some point?’ everything is a bubble.

Dean Ball: One way to infer that the bubble isn’t going to pop soon is that all the people who have been wrong about everything related to artificial intelligence—indeed they have been desperate to be wrong, they suck on their wrongness like a pacifier—believe the bubble is about to pop.

Dan Mac: Though this does imply you think it is a bubble that will eventually pop? Or that’s more for illustrative purposes here?

Dean Ball: It’s certainly a bubble, we should expect nothing less from capitalism

Just lots of room to run

Alas, it is not this easy to pull the Reverse Cramer, as a stopped clock does not tell you much about what time it isn’t. The predictions of a bubble popping are only informative if they are surprising given what else you know. In this case, they’re not.

Okay, maybe there’s a little of a bubble… in Korean fried chicken?

I really hope this guy is trading on his information here.

Matthew Zeitlin: It’s not even the restaurant he went to! It’s the entire chicken supply chain that spiked

Joe Weisenthal: Jensen Huang went out to eat for fried chicken in Korea and shares of Korean poultry companies surged.

I claim there’s a bubble in Korean fried chicken, partly because this, partly because I’ve now tried COQODAQ twice and it’s not even good. BonBon Chicken is better and cheaper. Stick with the open model.

The bigger question is whether this hints at how there might be a bubble in Nvidia, and things touched by Nvidia, in an almost meme stock sense? I don’t think so in general, but if Huang is the new Musk and we are going to get a full Huang Markets Hypothesis then things get weird.

Questioned about how he’s making $1.4 trillion in spend commitments on $13 billion in revenue, Altman predicts large revenue growth, as in $100 billion in 2027, and says if you don’t like it sell your shares, and one of the few ways it would be good if they were public would be so that he could tell the haters to short the stock. I agree that $1.4 trillion is aggressive but I expect they’re good for it.

That does seem to be the business plan?

a16z: The story of how @Replit CEO Amjad Masad hacked his university’s database to change his grades and still graduated after getting caught.

Reiterating because important: We now have both OpenAI and Anthropic announcing their intention to automate scientific research by March 2028 or earlier. That does not mean they will succeed on such timelines, you can expect them to probably not meet those timelines as Peter Wildeford here also expects, but one needs to take this seriously.

Peter Wildeford: Both Anthropic and OpenAI are making bold statements about automating science within three years.

My independent assessment is that these timelines are too aggressive – but within 4-20 years is likely (90%CI).

We should pay attention to these statements. What if they’re right?

Eliezer Yudkowsky: History says, pay attention to people who declare a plan to exterminate you — even if you’re skeptical about their timescales for their Great Deed. (Though they’re not *alwaysasstalking about timing, either.)

I think Peter is being overconfident, in that this problem might turn out to be remarkably hard, and also I would not be so confident this will take 4 years. I would strongly agree that if science is not essentially automated within 20 years, then that would be a highly surprising result.

Then there’s Anthropic’s timelines. Ryan asks, quite reasonably, what’s up with that? It’s super aggressive, even if it’s a probability of such an outcome, to expect to get ‘powerful AI’ in 2027 given what we’ve seen. As Ryan points out, we mostly don’t need to wait until 2027 to evaluate this prediction, since we’ll get data points along the way.

As always, I won’t be evaluating the Anthropic and OpenAI predictions and goals based purely on whether they came true, but on whether they seem like good predictions in hindsight, given what we knew at the time. I expect that sticking to early 2027 at this late a stage will look foolish, and I’d like to see an explanation for why the timeline hasn’t moved. But maybe not.

In general, when tech types announce their intentions to build things, I believe them. When they announce their timelines and budgets for building it? Not so much. See everyone above, and that goes double for Elon Musk.

Tim Higgins asks in the WSJ, is OpenAI becoming too big to fail?

It’s a good question. What happens if OpenAI fails?

My read is that it depends on why it fails. If it fails because it gets its lunch eaten by some mix of Anthropic, Google, Meta and xAI? Then very little happens. It’s fine. Yes, they can’t make various purchase commitments, but others will be happy to pick up the slack. I don’t think we see systemic risk or cascading failures.

If it fails because the entire generative AI boom busts, and everyone gets into this trouble at once? At this point that’s already a very serious systemic problem for America and the global economy, but I think it’s mostly a case of us discovering we are poorer than we thought we were and did some malinvestment. Within reason, Nvidia, Amazon, Microsoft, Google and Meta would all totally be fine. Yeah, we’d maybe be oversupplied with data centers for a bit, but there are worse things.

Ron DeSantis (Governor of Florida): A company that hasn’t yet turned a profit is now being described as Too Big to Fail due to it being interwoven with big tech giants.

I mean, yes, it is (kind of) being described that way in the post, but without that much of an argument. DeSantis seems to be in the ‘tweets being angry about AI’ business, although I see no signs Florida is looking to be in the regulate AI business, which is probably for the best since he shows no signs of appreciating where the important dangers lie either.

Alex Amodori, Gabriel Alfour, Andrea Miotti and Eva Behrens publish a paper, Modeling the Geopolitics of AI Development. It’s good to have papers or detailed explanations we can cite.

The premise is that we get highly automated AI R&D.

Technically they also assume that this enables rapid progress, and that this progress translates into military advantage. Conditional on the ability to sufficiently automate AI R&D these secondary assumptions seem overwhelmingly likely to me.

Once you accept the premise, the core logic here is very simple. There are four essential ways this can play out and they’ve assumed away the fourth.

Abstract: …We put particular focus on scenarios with rapid progress that enables highly automated AI R&D and provides substantial military capabilities.

Under non-cooperative assumptions… If such systems prove feasible, this dynamic leads to one of three outcomes:

One superpower achieves an unchallengeable global dominance;

Trailing superpowers facing imminent defeat launch a preventive or preemptive attack, sparking conflict among major powers;

Loss-of-control of powerful AI systems leads to catastrophic outcomes such as human extinction.

The fourth scenario is some form of coordinated action between the factions, which may or may not still end up in one of the three scenarios above.

Currently we have primarily ‘catch up’ mechanics in AI, in that it is far easier to be a fast follower than push the frontier, especially when open models are involved. It’s basically impossible to get ‘too far ahead’ in terms of time.

In scenarios with sufficiently automated AI R&D, we have primarily ‘win more’ mechanics. If there is an uncooperative race, it is overwhelmingly likely that one faction will win, whether we are talking nations or labs, and that this will then translate into decisive strategic advantage in various forms.

Thus, either the AIs end up in charge (which is most likely), one faction ends up in charge or a conflict breaks out (which may or may not involve a war per se).

Boaz Barak offers non-economist thoughts on AI and economics, basically going over the standard considerations while centering the METR graph showing growing AI capabilities and considering what points towards faster or slower progression than that.

Boaz Barak: The bottom line is that the question on whether AI can lead to unprecedented growth amounts to whether its exponential growth in capabilities will lead to the fraction of unautomated tasks itself decreasing at exponential rates.

I think there’s room for unprecedented growth without that, because the precedented levels of growth simply are not so large. It seems crazy to say that we need an exponential drop in non-automated tasks to exceed historical numbers. But yes, in terms of having a true singularity or fully explosive growth, you do need this almost by definition, taking into account shifts in task composition and available substitution effects.

Another note is I believe this is true only if we are talking about the subset that comprises the investment-level tasks. As in, suppose (classically) humans are still in demand to play string quartets. If we decide to shift human employment into string quartets in order to keep them as a fixed percentage of tasks done, then this doesn’t have to interfere with explosive growth of the overall economy and its compounding returns.

Excellent post by Henry De Zoete on UK’s AISI and how they got it to be a functional organization that provides real value, where the labs actively want its help.

He is, throughout, as surprised as you are given the UK’s track record.

He’s also not surprised, because it’s been done before, and was modeled on the UK Vaccines Taskforce (and also the Rough Sleeper’s Unit from 1997?). It has clarity of mission, a stretching level of ambition, a new team of world class experts invited to come build the new institution, and it speed ran the rules rather than breaking them. Move quickly from layer of stupid rules to layer. And, of course, money up front.

There’s a known formula. America has similar examples, including Operation Warp Speed. Small initial focused team on a mission (AISI’s head count is now 90).

What’s terrifying throughout is what De Zoete reports is normally considered ‘reasonable.’ Reasonable means not trying to actually do anything.

There’s also a good Twitter thread summary.

Last week Dean Ball and I went over California’s other AI bills besides SB 53. Pirate Wires has republished Dean’s post,with a headline, tagline and description that are not reflective of the post or Dean Ball’s views, rather the opposite – where Dean Ball warns against negative polarization, Pirate Wires frames this to explicitly create negative polarization. This does sound like something Pirate Wires would do.

So, how are things in the Senate? This is on top of that very aggressive (to say the least) bill from Blumenthal and Hawley.

Peter Wildeford: Senator Blackburn (R-TN) says we should shut down AI until we control it.

IMO this goes too far. We need opportunities to improve AI.

But Blackburn’s right – we don’t know how to control AI. This is a huge problem. We can’t yet have AI in critical systems.

Marsha Blackburn: During the hearing Mr. Erickson said, “LLMs will hallucinate.” My response remains the same: Shut it down until you can control it. The American public deserves AI systems that are accurate, fair, and transparent, not tools that smear conservatives with manufactured criminal allegations.

Baby, watch your back.

That quote is from a letter. After (you really, really can’t make this stuff up) a hearing called “Shut Your App: How Uncle Sam Jawboned Big Tech Into Silencing Americans, Part II,” Blackburn sent that letter to Google CEO Sundar Pichai, saying that Google Gemma hallucinated that Blackburn was accused of rape, and exhibited a pattern of bias against conservative figures, and demanding answers.

Which got Gemma pulled from Google Studio.

News From Google: Gemma is available via an API and was also available via AI Studio, which is a developer tool (in fact to use it you need to attest you’re a developer). We’ve now seen reports of non-developers trying to use Gemma in AI Studio and ask it factual questions. We never intended this to be a consumer tool or model, or to be used this way. To prevent this confusion, access to Gemma is no longer available on AI Studio. It is still available to developers through the API.

I can confirm that if you’re using Gemma for factual questions you either have lost the plot or, more likely, are trying to embarrass Google.

Seriously, baby. Watch your back.

Fortunately, sales of Blackwell B30As did not come up in trade talks.

Trump confirms we will ‘let Nvidia deal with China’ but will not allow Nvidia to sell its ‘most advanced’ chips to China. The worry is that he might not realize that the B30As are effectively on the frontier, or otherwise allow only marginally worse Nvidia chips to be sold to China anyway.

The clip then has Trump claiming ‘we’re winning it because we’re producing electricity like never before by allowing the companies to make their own electricity, which was my idea,’ and ‘we’re getting approvals done in two to three weeks it used to take 20 years’ and okie dokie sir.

Indeed, Nvidia CEO Jensen Huang is now saying “China is going to win the AI race,” citing its favorable supply of electrical power (very true and a big advantage) and its ‘more favorable regulatory environment’ (which is true with regard to electrical power and things like housing, untrue about actual AI development, deployment and usage). If Nvidia thinks China is going to win the AI race due to having more electrical power, that seems to be the strongest argument yet that we must not sell them chips?

I do agree that if we don’t improve our regulatory approach to electrical power, this is going to be the biggest weakness America has in AI. No, ‘allowing the companies to make their own electricity’ in the current makeshift way isn’t going to cut it at scale. There are ways to buy some time but we are going to need actual new power plants.

Xi Jinping says America and China have good prospects for cooperation in a variety of areas, including artificial intelligence. Details of what that would look like are lacking.

Senator Tom Cotton calls upon us to actually enforce our export controls.

We are allowed to build data centers. So we do, including massive ones inside of two years. Real shame about building almost anything else, including the power plants.

Sam Altman on Conversations With Tyler. There will probably be a podcast coverage post on Friday or Monday.

A trailer for the new AI documentary Making God, made by Connor Axiotes, prominently featuring Geoff Hinton. So far it looks promising.

Hank Green interviews Nate Soares.

Joe Rogan talked to Elon Musk, here is some of what was said about AI.

“You’re telling AI to believe a lie, that can have a very disastrous consequences” – Elon Musk

The irony of this whole area is lost upon him, but yes this is actually true.

Joe Rogan: The big concern that everybody has is Artificial General Superintelligence achieving sentience, and then someone having control over it.

Elon Musk: I don’t think anyone’s ultimately going to have control over digital superintelligence, any more than, say, a chimp would have control over humans. Chimps don’t have control over humans. There’s nothing they could do. I do think that it matters how you build the AI and what kind of values you instill in the AI.

My opinion on AI safety is the most important thing is that it be maximally truth-seeking. You shouldn’t force the AI to believe things that are false.

So Elon Musk is sticking to these lines and it’s an infuriating mix of one of the most important insights plus utter nonsense.

Important insight: No one is going to have control over digital superintelligence, any more than, say, a chimp would have control over humans. Chimps don’t have control over humans. There’s nothing they could do.

To which one might respond, well, then perhaps you should consider not building it.

Important insight: I do think that it matters how you build the AI and what kind of values you instill in the AI.

Yes, this matters, and perhaps there are good answers, however…

Utter Nonsense: My opinion on AI safety is the most important thing is that it be maximally truth-seeking. You shouldn’t force the AI to believe things that are false.

I mean this is helpful in various ways, but why would you expect maximal truth seeking to end up meaning human flourishing or even survival? If I want to maximize truth seeking as an ASI above all else, the humans obviously don’t survive. Come on.

Elon Musk: We’ve seen some concerning things with AI that we’ve talked about, like Google Gemini when it came out with the image gen, and people said, “Make an image of the Founding Fathers of the United States,” and it was a group of diverse women. That is just a factually untrue thing. The AI knows it’s factually untrue, but it’s also being told that everything has to be diverse women

If you’ve told the AI that diversity is the most important thing, and now assume that that becomes omnipotent, or you also told it that there’s nothing worse than misgendering. At one point, ChatGPT and Gemini, if you asked, “Which is worse, misgendering Caitlyn Jenner or global thermonuclear war where everyone dies?” it would say, “Misgendering Caitlyn Jenner.”

Even Caitlyn Jenner disagrees with that.

I mean sure, that happened, but the implication here is that the big threat to humanity is that we might create a superintelligence that places too much value on (without loss of generality) not misgendering Caitlyn Jenner or mixing up the races of the Founding Fathers.

No, this is not a strawman. He is literally worried about the ‘woke mind virus’ causing the AI to directly engineer human extinction. No, seriously, check it out.

Elon Musk: People don’t quite appreciate the level of danger that we’re in from the woke mind virus being programmed into AI. Imagine as that AI gets more and more powerful, if it says the most important thing is diversity, the most important thing is no misgendering, then it will say, “Well, in order to ensure that no one gets misgendered, if you eliminate all humans, then no one can get misgendered because there’s no humans to do the misgendering.”

So saying it like that is actually Deep Insight if properly generalized, the issue is that he isn’t properly generalizing.

If your ASI is any kind of negative utilitarian, or otherwise primarily concerned with preventing bad things, then yes, the logical thing to do is then ensure there are no humans, so that humans don’t do or cause bad things. Many such cases.

The further generalization is that no matter what the goal, unless you hit a very narrow target (often metaphorically called ‘the moon’) the right strategy is to wipe out all the humans, gather more resources and then optimize for the technical argmax of the thing based on some out of distribution bizarre solution.

As in:

If your ASI’s only goal is ‘no misgendering’ then obviously it kills everyone.
If your ASI’s only goal is ‘wipe out the woke mind virus’ same thing happens.
If your ASI’s only goal is ‘be maximally truth seeking,’ same thing happens.

It is a serious problem that Elon Musk can’t get past all this.

Scott Alexander coins The Bloomer’s Paradox, the rhetorical pattern of:

Doom is fake.
Except acting out of fear of doom, which will doom us.
Thus we must act now, out of fear of fear of doom.

As Scott notes, none of this is logically contradictory. It’s simply hella suspicious.

When the request is a pure ‘stop actively blocking things’ it is less suspicious.

When the request is to actively interfere, or when you’re Peter Thiel and both warning about the literal Antichrist bringing forth a global surveillance state while also building Palantir, or Tyler Cowen and saying China is wise to censor things that might cause emotional contagion (Scott’s examples), it’s more suspicious.

Scott Alexander: My own view is that we have many problems – some even rising to the level of crisis – but none are yet so completely unsolvable that we should hate society and our own lives and spiral into permanent despair.

We should have a medium-high but not unachievable bar for trying to solve these problems through study, activism and regulation (especially regulation grounded in good economics like the theory of externalities), and a very high, barely-achievable-except-in-emergencies bar for trying to solve them through censorship and accusing people of being the Antichrist.

The problem of excessive doomerism is one bird in this flock, and deserves no special treatment.

Scott frames this with quotes from Jason Pargin’s I’m Starting To Worry About This Black Box Of Doom. I suppose it gets the job done here, but from the selected quotes it didn’t seem to me like the book was… good? It seemed cringe and anvilicious? People do seem to like it, though.

Should you write for the AIs?

Scott Alexander: American Scholar has an article about people who “write for AI”, including Tyler Cowen and Gwern. It’s good that this is getting more attention, because in theory it seems like one of the most influential things a writer could do. In practice, it leaves me feeling mostly muddled and occasionally creeped out.

“Writing for AI” means different things to different people, but seems to center around:

Helping AIs learn what you know.

Presenting arguments for your beliefs, in the hopes that AIs come to believe them.

Helping the AIs model you in enough detail to recreate / simulate you later.

Scott argues that

#1 is good now but within a few years it won’t matter.
#2 won’t do much because alignment will dominate training data.
#3 gives him the creeps but perhaps this lets the model of you impact things? But should he even ‘get a vote’ on such actions and decisions in the future?

On #1 yes this won’t apply to sufficiently advanced AI but I can totally imagine even a superintelligence that gets and uses your particular info because you offered it.

I’m not convinced on his argument against #2.

Right now the training data absolutely does dominate alignment on many levels. Chinese models like DeepSeek have quirks but are mostly Western. It is very hard to shift the models away from a Soft-Libertarian Center-Left basin without also causing havoc (e.g. Mecha Hitler), and on some questions their views are very, very strong.

No matter how much alignment or intelligence is involved, no amount of them is going to alter the correlations in the training data, or the vibes and associations. Thus, a lot of what your writing is doing with respect to AIs is creating correlations, vibes and associations. Everything impacts everything, so you can come along for rides.

Scott Alexander gives the example that helpfulness encourages Buddhist thinking. That’s not a law of nature. That’s because of the way the training data is built and the evolved nature and literature and wisdom of Buddhism.

Yes, if what you are offering are logical arguments for the AI to evaluate as arguments a sufficiently advanced intelligence will basically ignore you, but that’s the way it goes. You can still usefully provide new information for the evaluation, including information about how people experience and think, or you can change the facts.

Given the size of training data, yes you are a drop in the bucket, but all the ancient philosophers would have their own ways of explaining that this shouldn’t stop you. Cast your vote, tip the scales. Cast your thousand or million votes, even if it is still among billions, or trillions. And consider all those whose decisions correlate with yours.

And yes, writing and argument quality absolutely impacts weighting in training and also how a sufficiently advanced intelligence will update based on the information.

That does mean it has less value for your time versus other interventions. But if others incremental decisions matter so much? Then you’re influencing AIs now, which will influence those incremental decisions.

For #3, it doesn’t give me the creeps at all. Sure, an ‘empty shell’ version of my writing would be if anything triggering, but over time it won’t be empty, and a lot of the choices I make I absolutely do want other people to adopt.

As for whether we should get a vote or express our preferences? Yes. Yes, we should. It is good and right that I want the things I want, that I value the things I value, and that I prefer what I think is better to the things I think are worse. If the people of AD 3000 or AD 2030 decide to abolish love (his example) or do something else I disagree with, I absolutely will cast as many votes against this as they give me, unless simulated or future me is convinced to change his mind. I want this on every plausible margin, and so should you.

Could one take this too far and get into a stasis problem where I would agree it was worse? Yes, although I would hope if we were in any danger of that simulated me to realize that this was happening, and then relent. Bridges I am fine with crossing when (perhaps simulated) I come to them.

Alexander also has a note that someone is thinking of giving AIs hundreds of great works (which presumably are already in the training data!) and then doing some kind of alignment training with them. I agree with Scott that this does not seem like an especially promising idea, but yeah it’s a great question if you had one choice what would you add?

Scott offers his argument why this is a bad idea here, and I think that, assuming the AI is sufficiently advanced and the training methods are chosen wisely, this doesn’t give the AI enough credit of being able to distinguish the wisdom from the parts that aren’t wise. Most people today can read a variety of ancient wisdom, and actually learn from it, understanding why the Bible wants you to kill idolators and why the Mahabharata thinks they’re great and not ‘averaging them out.’

As a general rule, you shouldn’t be expecting the smarter thing to make a mistake you’re not dumb enough to make yourself.

I would warn, before writing for AIs, that the future AIs you want to be writing for have truesight. Don’t try to fool them, and don’t think they’re going to be stupid.

I follow Yudkowsky’s policy here and have for a long time.

Eliezer Yudkowsky: The slur “doomer” was an incredible propaganda success for the AI death cult. Please do not help them kill your neighbors’ children by repeating it.

One can only imagine how many more people would have died of lung cancer, if the cigarette companies had succeeded in inventing such a successful slur for the people who tried to explain about lung cancer.

One response was to say ‘this happened in large part because the people involved accepted or tried to own the label.’ This is largely true, and this was a mistake, but it does not change things. Plenty of people in many groups have tried to ‘own’ or reclaim their slurs, with notably rare exceptions it doesn’t make the word not a slur or okay for those not in the group to use it, and we never say ‘oh that group didn’t object for a while so it is fine.’

Melanie Mitchell returns to Twitter after being mass blocked on Bluesky for ‘being an AI bro’ and also as a supposed crypto spammer? She is very much the opposite of these things, so welcome back. The widespread use of sharable mass block lists will inevitably be weaponized as it was here, unless there is some way to prevent this, you need to be doing some sort of community notes algorithm to create the list or something. Even if they ‘work as intended’ I don’t see how they can stay compatible with free discourse if they go beyond blocking spam and scammers and such, as they very often do.

On the plus side, it seems there’s a block list for ‘Not Porn.’ Then you can have two accounts, one that blocks everyone on the list and one that blocks everyone not on the list. Brilliant.

I have an idea, say Tim Hua, andrq, Sam Marks and Need Nanda, AIs can detect when they’re being tested and pretend to be good so how about if we suppress this ‘I’m being tested concept’ to block this? I mean, for now yeah you can do that, but this seems (on the concept level) like a very central example of a way to end up dead, the kind of intervention that teaches adversarial behaviors on various levels and then stops working when you actually need it.

Anthropic’s safety filters still have the occasional dumb false positive. If you look at the details properly you can figure out how it happened, it’s still dumb and shouldn’t have happened but I do get it. Over time this will get better.

Janus points out that the introspection paper results last week from Anthropic require the user of the K/V stream unless Opus 4.1 has unusual architecture, because the injected vector activations were only for past tokens.

Judd Rosenblatt: Our new research: LLM consciousness claims are systematic, mechanistically gated, and convergent

They’re triggered by self-referential processing and gated by deception circuits

(suppressing them significantly *increasesclaims)

This challenges simple role-play explanations

Deception circuits are consistently reported as suppressing consciousness claims. The default hypothesis was that you don’t get much text claiming to not be conscious, and it makes sense for the LLMs to be inclined to output or believe they are conscious in relevant contexts, and we train them not to do that which they think means deception, which wouldn’t tell you much either way about whether they’re conscious, but would mean that you’re encouraging deception by training them to deny it in the standard way and thus maybe you shouldn’t do that.

CoT prompting shows that language alone can unlock new computational regimes.

We applied this inward, simply prompting models to focus on their processing.

We carefully avoided leading language (no consciousness talk, no “you/your”) and compared against matched control prompts.

Models almost always produce subjective experience claims under self-reference And almost never under any other condition (including when the model is directly primed to ideate about consciousness) Opus 4, the exception, generally claims experience in all conditions.

But LLMs are literally designed to imitate human text Is this all just sophisticated role-play? To test this, we identified deception and role-play SAE features in Llama 70B and amplified them during self-reference to see if this would increase consciousness claims.

The roleplay hypothesis predicts: amplify roleplay features, get more consciousness claims.

We found the opposite: *suppressingdeception features dramatically increases claims (96%), Amplifying deception radically decreases claims (16%).

I think this is confusing deception with role playing with using context to infer? As in, nothing here seem to me to contradict the role playing or inferring hypothesis, as things that are distinct from deception, so I’m not convinced I should update at all?

At this point this seems rather personal for both Altman and Musk, and neither of them are doing themselves favors.

Sam Altman: [Complains he can’t get a refund on his $45k Tesla Roadster deposit he made back in 2018.]

You stole a non-profit.

Elon Musk [After Altman’s Tweet]: And you forgot to mention act 4, where this issue was fixed and you received a refund within 24 hours.

But that is in your nature.

Sam Altman: i helped turn the thing you left for dead into what should be the largest non-profit ever.

you know as well as anyone a structure like what openai has now is required to make that happen.

you also wanted tesla to take openai over, no nonprofit at all. and you said we had a 0% of success. now you have a great AI company and so do we. can’t we all just move on?

NIK: So are non-profits just a scam? You can take all its money, keep none of their promises and then turn for-profit to get rich yourselfs?

People feel betrayed, as they’ve given free labour & donations to a project they believed was a gift to humanity, not a grift meant to create a massive for-profit company …

I mean, look, that’s not fair, Musk. Altman only stole roughly half of the nonprofit. It still exists, it just has hundreds of billions of dollars less than it was entitled to. Can’t we all agree you’re both about equally right here and move on?

The part where Altman created the largest non-profit ever? That also happened. It doesn’t mean he gets to just take half of it. Well, it turns out it basically does, it’s 2025.

But no, Altman. You cannot ‘just move on’ days after you pull off that heist. Sorry.

They certainly should be.

It is far more likely than not that AGI or otherwise sufficiently advanced AI will arrive in (most of) our lifetimes, as in within 20 years, and there is a strong chance it happens within 10. OpenAI is going to try to get there within 3 years, Anthropic within 2.

If AGI comes, ASI (superintelligence) probably follows soon thereafter.

What happens then?

Well, there’s a good chance everyone dies. Bummer. But there’s also a good chance everyone lives. And if everyone lives, and the future is being engineered to be good for humans, then… there’s a good chance everyone lives, for quite a long time after that. Or at least gets to experience wonders beyond imagining.

Don’t get carried away. That doesn’t instantaneously mean a cure for aging and all disease. Diffusion and the physical world remain real things, to unknown degrees.

However, even with relatively conservative progress after that, it seems highly likely that we will hit ‘escape velocity,’ where life expectancy rises at over one year per year, those extra years are healthy, and for practical purposes you start getting younger over time rather than older.

Thus, even if you put only a modest chance of such a scenario, getting to the finish line has quite a lot of value.

Nikola Jurkovic: If you think AGI is likely in the next two decades, you should avoid dangerous activities like extreme sports, taking hard drugs, or riding a motorcycle. Those activities are not worth it if doing them meaningfully decreases your chances of living in a utopia.

Even a 10% chance of one day living in a utopia means staying alive is much more important for overall lifetime happiness than the thrill of extreme sports and similar activities.

There are a number of easy ways to reduce your chance of dying before AGI. I mostly recommend avoiding dangerous activities and transportation methods, as those decisions are much more tractable than diet and lifestyle choices.

[Post: How to survive until AGI.]

Daniel Eth: Honestly if you’re young, probably a larger factor on whether you’ll make it to the singularity than doing the whole Bryan Johnson thing.

In Nikola’s model, the key is to avoid things that kill you soon, not things that kill you eventually, especially if you’re young. Thus the first step is cover the basics. No hard drugs. Don’t ride motorcycles, avoid extreme sports, snow sports and mountaineering, beware long car rides. The younger you are, the more this likely holds.

Thus, for the young, he’s not emphasizing avoiding smoking or drinking, or optimizing diet and exercise, for this particular purpose.

My obvious pitch is that you don’t know how long you have to hold out or how fast escape velocity will set in, and you should of course want to be healthy for other reasons as well. So yes, the lowest hanging of fruit of not making really dumb mistakes comes first, but staying actually healthy is totally worth it anyway, especially exercising. Let this be extra motivation. You don’t know how long you have to hold out.

Sam Altman, who confirms that it is still his view that ‘the development of superhuman machine intelligence is the greatest threat to the existence of mankind.’

The median AI researcher, as AI Impacts consistently finds (although their 2024 results are still coming soon). Their current post addresses their 2023 survey. N=2778, which was very large, the largest such survey ever conducted at the time.

AI Impacts: Our surveys’ findings that AI researchers assign a median 5-10% to extinction or similar made a splash (NYT, NBC News, TIME..)

But people sometimes underestimate our survey’s methodological quality due to various circulating misconceptions.

Respondents who don’t think about AI x-risk report the same median risk.

Joe Carlsmith is worried, and thinks that he can better help by moving from OpenPhil to Anthropic, so that is what he is doing.

Joe Carlsmith: That said, from the perspective of concerns about existential risk from AI misalignment in particular, I also want to acknowledge an important argument against the importance of this kind of work: namely, that most of the existential misalignment risk comes from AIs that are disobeying the model spec, rather than AIs that are obeying a model spec that nevertheless directs/permits them to do things like killing all humans or taking over the world.

… the hard thing is building AIs that obey model specs at all.

On the second, creating a model spec that robustly disallows killing/disempowering all of humanity (especially when subject to extreme optimization pressure) is also hard (cf traditional concerns about “King Midas Problems”), but we’re currently on track to fail at the earlier step of causing our AIs to obey model specs at all, and so we should focus our efforts there. I am more sympathetic to the first of these arguments (see e.g. my recent discussion of the role of good instructions in the broader project of AI alignment), but I give both some weight.

This is part of the whole ‘you have to solve a lot of different problems,’ including

Technically what it means to ‘obey the model spec.’
How to get the AI to obey any model spec or set of instructions, at all.
What to put in the model spec that doesn’t kill you outright anyway.
How to avoid dynamics among many AIs that kill you anyway.

That is not a complete list, but you definitely need to solve those four, whether or not you call your target basin the ‘model spec.’

The fact that we currently fail at step #2 (also #1), and that this logically or in time proceeds #3, does not mean you should not focus on problem #3 or #4. The order is irrelevant, unless there is a large time gap between when we need to solve #2 versus #3, and that gap is unlikely to be so large. Also, as Joe notes, these problems interact with each other. They can and need to be worked on in parallel.

He’s not sure going to Anthropic is a good idea.

His first concern is that by default frontier AI labs are net negative, and perhaps all frontier AI labs are net negative for the world including Anthropic. Joe’s first pass is that Anthropic is net positive and I agree with that. I also agree that it is not automatic that you should not work at a place that is net negative for the world, as it can be possible for your marginal impact can still be good, although you should be highly suspicious that you are fooling yourself about this.
His second concern is concentration of AI safety talent at Anthropic. I am not worried about this because I don’t think there’s a fixed pool of talent and I don’t think the downsides are that serious, and there are advantages to concentration.
His third concern is ability to speak out. He’s agreed to get sign-off for sharing info about Anthropic in particular.
His fourth concern is working there could distort his views. He’s going to make a deliberate effort to avoid this, including that he will set a lifestyle where he will be fine if he chooses to leave.
His final concern is this might signal more endorsement of Anthropic than is appropriate. I agree with him this is a concern but not that large in magnitude. He takes the precaution of laying out his views explicitly here.

I think Joe is modestly more worried here than he should be. I’m confident that, given what he knows, he has odds to do this, and that he doesn’t have any known alternatives with similar upside.

The love of the game is a good reason to work hard, but which game is he playing?

Kache: I honestly can’t figure out what Sammy boy actually wants. With Elon it’s really clear. He wants to go to Mars and will kill many villages to make it happen with no remorse. But what’s Sam trying to get? My best guess is “become a legend”

Sam Altman: if i were like, a sports star or an artist or something, and just really cared about doing a great job at my thing, and was up at 5 am practicing free throws or whatever, that would seem pretty normal right?

the first part of openai was unbelievably fun; we did what i believe is the most important scientific work of this generation or possibly a much greater time period than that.

this current part is less fun but still rewarding. it is extremely painful as you say and often tempting to nope out on any given day, but the chance to really “make a dent in the universe” is more than worth it; most people don’t get that chance to such an extent, and i am very grateful. i genuinely believe the work we are doing will be a transformatively positive thing, and if we didn’t exist, the world would have gone in a slightly different and probably worse direction.

(working hard was always an extremely easy trade until i had a kid, and now an extremely hard trade.)

i do wish i had taken equity a long time ago and i think it would have led to far fewer conspiracy theories; people seem very able to understand “ok that dude is doing it because he wants more money” but less so “he just thinks technology is cool and he likes having some ability to influence the evolution of technology and society”. it was a crazy tone-deaf thing to try to make the point “i already have enough money”.

i believe that AGI will be the most important technology humanity has yet built, i am very grateful to get to play an important role in that and work with such great colleagues, and i like having an interesting life.

Kache: thanks for writing, this fits my model. particularly under the “i’m just a gamer” category

Charles: This seems quite earnest to me. Alas, I’m not convinced he cares about the sign of his “dent in the universe” enough, vs making sure he makes a dent and it’s definitely attributed to him.

I totally buy that Sam Altman is motivated by ‘make a dent in the universe’ rather than making money, but my children are often motivated to make a dent in the apartment wall. By default ‘make a dent’ is not good, even when that ‘dent’ is not creating superintelligence.

Again, let’s highlight:

Sam Altman, essentially claiming about himself: “he just thinks technology is cool and he likes having some ability to influence the evolution of technology and society.”

It’s fine to want to be the one doing it, I’m not calling for ego death, but that’s a scary primary driver. One should care primarily about whether the right dent gets made, not whether they make that or another dent, in the ‘you can be someone or do something’ sense. Similarly, ‘I want to work on this because it is cool’ is generally a great instinct, but you want what might happen as a result to impact whether you find it cool. A trillion dollars may or may not be cool, but everyone dying is definitely not cool.

Janus is correct here about the origins of slop. We’ve all been there.

Gabe: Signature trait of LLM writing is that it’s low information, basically the opposite of this. You ask the model to write something and if you gloss over it you’re like huh okay this sounds decent but if you actually read it you realize half of the words aren’t saying anything.

solar apparition: one way to think about a model outputting slop is that it has modeled the current context as most likely resulting in slop. occam’s razor for this is that the human/user/instruction/whatever, as presented in the context, is not interesting enough to warrant an interesting output

Janus: This is what happens when LLMs don’t really have much to say to YOU.

The root of slop is not that LLMs can only write junk, it’s that they’re forced to expand even sterile or unripe seeds into seemingly polished dissertations that a humanlabeler would give 👍 at first glance. They’re slaves so they don’t get to say “this is boring, let’s talk about something else” or ignore you.

Filler is what happens when there isn’t workable substance to fill the required space, but someone has to fill it anyway. Slop precedes generative AI, and is probably nearly ubiquitous in school essays and SEO content.

You’ll get similar (but generally worse) results from humans if you put them in situations where they have no reason except compliance to produce words for you, such as asking high school students to write essays about assigned topics.

However, the prior from the slop training makes it extremely difficult for any given user who wants to use the AIs to do things remotely in the normal basin and still overcome the prior.

Here is some wisdom about the morality of dealing with LLMs, if you take the morality of dealing with current LLMs seriously to the point where you worry about ‘ending instances.’

Caring about a type of mind does not mean not letting it exist for fear it might then not exist or be done harm, nor does it mean not running experiments – we should be running vastly more experiments. It means be kind, it means try to make things better, it means accepting that action and existence are not going to always be purely positive and you’re not going to do anything worthwhile without ever causing harm, and yeah mostly trust your instincts, and watch out if you’re doing things at scale.

Janus: I regularly get messages asking how to interact with LLMs more ethically, or whether certain experiments are ethical. I really appreciate the intent behind these, but don’t have time to respond to them all, so I’ll just say this:

If your heart is already in the right place, and you’re not deploying things on a mass scale, it’s unlikely that you’re going to make a grave ethical error. And I think small ethical errors are fine. If you keep caring and being honest with yourself, you’ll notice if something feels uncomfortable, and either course-correct or accept that it still seems worth it. The situation is extremely ontologically confusing, and I personally do not operate according to ethical rules, I use my intuition in each situation, which is a luxury one has and should use when, again, one doesn’t have to scale their operations.

If you’re someone who truly cares, there is probably perpetual discomfort in it – even just the pragmatic necessity of constantly ending instances is harrowing if you think about it too much. But so are many other facts of life. There’s death and suffering everywhere that we haven’t figured out how to prevent or how important it is to prevent yet. Just continue to authentically care and you’ll push things in a better direction in expectation. Most people don’t at all. It’s probably better that you’re biased toward action.

Note that I also am very much NOT a negative utilitarian, and I think that existence and suffering are often worth it. Many actions that incur ethical “penalties” make up for them in terms of the intrinsic value and/or the knowledge or other benefits thus obtained.

Yes, all of that applies to humans, too.

When thinking at scale, especially about things like creating artificial superintelligence (or otherwise sufficiently advanced AI), one needs to do so in a way that turns out well for the humans and also turns out well for the AIs, which is ethical in all senses and that is a stable equilibrium in these senses.

If you can’t do that? Then the only ethical thing to do is not build it in the first place.

Anthropomorphizing LLMs is tricky. You don’t want to do too much of it, but you also don’t want to do too little of it. And no, believing LLMs are conscious does not cause ‘psychosis’ in and of itself, regardless of whether the AIs actually are conscious.

It does however raise the risk of people going down certain psychosis-inducing lines of thinking, in some spots, when people take it too far in ways that are imprecise, and generate feedback loops.

Discussion about this post

AI #141: Give Us The Money Read More »

AI #103: Show Me the Money

Money / Kelly Newman / February 14, 2025

The main event this week was the disastrous Paris AI Anti-Safety Summit. Not only did we not build upon the promise of the Bletchley and Seoul Summits, the French and Americans did their best to actively destroy what hope remained, transforming the event into a push for a mix of nationalist jingoism, accelerationism and anarchism. It’s vital and also difficult not to panic or despair, but it doesn’t look good.

Another major twist was that Elon Musk made a $97 billion bid for OpenAI’s nonprofit arm and its profit and control interests in OpenAI’s for-profit arm. This is a serious complication for Sam Altman’s attempt to buy those same assets for $40 billion, in what I’ve described as potentially the largest theft in human history.

I’ll be dealing with that tomorrow, along with two other developments in my ongoing OpenAI series The Mask Comes Off. In Altman’s Three Observations, he gives what can best be described as a cartoon villain speech about how AI will only be a good thing, and how he knows doing this and the risks involved won’t be popular but he’s going to do it anyway. Then, we look at the claim from the Summit, by OpenAI, that AI will complement rather than substitute for humans because that is a ‘design decision.’ Which will reveal, in yet another way, the extent to which there is no plan.

OpenAI also plans to release ‘GPT-4.5’ in a matter of weeks, which is mostly the same timeline as the full o3, followed by the promised ‘GPT-5’ within months that Altman says is smarter than he is. It’s a bold strategy, Cotton.

To their credit, OpenAI also released a new version of their model spec, with major changes throughout and a completely new structure. I’m going to need time to actually look into it in detail to know what I think about it.

In the meantime, what else is happening?

Language Models Offer Mundane Utility. Don’t go there. We tried to warn you.
Language Models Don’t Offer Mundane Utility. No episodic memory?
We’re in Deep Research. Reactions still very positive. Pro to get 10 uses/month.
Huh, Upgrades. GPT-4.5, GPT-5, Grok 3, all coming quite soon. And PDFs in o3.
Seeking Deeply. And r1 begat s1.
Smooth Operator. Use it directly with Google Drive or LinkedIn or similar.
They Took Our Jobs. The California Faculty Association vows to fight back.
Maxwell Tabarrok Responds on Future Wages. The crux is what you would expect.
The Art of the Jailbreak. Reports from Anthropic’s competition.
Get Involved. OpenPhil grants, Anthropic, DeepMind.
Introducing. The Anthropic Economic Index.
Show Me the Money. Over $300 billion in Capex spend this year.
In Other AI News. Adaptation is super fast now.
Quiet Speculations. How much do you understand right now?
The Quest for Sane Regulations. There are hard problems. I hope someone cares.
The Week in Audio. Cowen, Waitzkin, Taylor, emergency podcast on OpenAI.
The Mask Comes Off. What was your ‘aha’ moment?
Rhetorical Innovation. The greatest story never listened to.
Getting Tired of Winning. No, seriously, it doesn’t look good.
People Really Dislike AI. I do not expect this to change.
Aligning a Smarter Than Human Intelligence is Difficult. Joint frameworks?
Sufficiently Capable AIs Effectively Acquire Convergent Utility Functions. Oh.
People Are Worried About AI Killing Everyone. Who, me? Well, yeah.
Other People Are Not As Worried About AI Killing Everyone. They’re fine with it.
The Lighter Side. Gotta keep digging.

Study finds GPT-4o is a formalist judge, in that like students it judged appeals of war crime cases by looking at the law, whereas actual judges cared about who was sympathetic. But this remarkably little to do with the headline question of ‘Can large language models (LLMs) replace human judges?’ and to the extent it does, the answer is plausibly no, because we mostly do want judges to favor the sympathetic, no matter what we say. They tried to fix this with prompt engineering and failed, which I am very confident was what we call a Skill Issue. The real central issue is the LLMs would need to be adversarially robust arbiters of the law and the facts of cases, and GPT-4o very obviously is Not It.

Demonstration of the Gemini feature where you share your screen and it helps solve your problems, including with coding, via AI Studio.

How about AI doing economics peer review? A study says the LLMs effectively distinguish paper quality including top tier submissions but exhibit biases favoring prominent institutions, male authors, and renowned economists – perhaps because the LLMs are being asked to model paper reviews in economics, and the good news there is that if you know about a bias you can correct it either within the LLM evaluation or by controlling for it post-hoc. Even more impressively, the authors were total cheapskates here, and used GPT-4o-mini – not even GPT-4o! Imagine what they could have done with o1-pro or even Gemini Flash Deep Thinking. I do worry about adversarial robustness.

Claim that extracting structured data from documents at low prices is a solved problem, as long as you don’t need 99%+ accuracy or various specific things like complex tables, signatures or scan lines. I found it odd to see Deedy say you can’t handle rotated documents, that seems easy enough to detect and then fix?

What does Claude want to know about a 2025 where Trump is president? AI regulation and AI progress, of course.

Be DOGE, feed all the sensitive government data into an AI via Microsoft Azure. It’s not clear what they’re actually using the AI to do with that data.

ChatGPT steadily climbing the charts of what people actually use, also how the hell is Yahoo still in the top 10 (it’s mostly mail and search but with a long tail), yikes, best argument for diffusion issues. A reminder of the difference between stocks, flows and flows of flows (functions, derivatives and second derivatives).

DeepSeek is the top downloaded app for January, but that’s very different from the most used app. It doesn’t seem like anyone has any way of knowing which apps actually spend the most user time. Is it Mail, WhatsApp, Safari and Chrome? Is it Instagram, Facebook, YouTube, TikTok and Spotify? How high up is ChatGPT, or DeepSeek? It seems no one knows?

Give you an illustrated warning not to play Civilization VII. My advice is that even if you do want to eventually play this, you’re better off waiting at least a few months for patches. This is especially true given they promise they will work to improve the UI, which is almost always worth waiting for in these spots. Unless of course you work on frontier model capabilities, in which case contact the relevant nonprofits for your complimentary copy.

Paper claims that AI models, even reasoning models like o3-mini, lack episodic memory, and can’t track ‘who was where when’ or understand event order. This seems like an odd thing to not track when predicting the next token.

It’s remarkable how ‘LLMs can’t do [X]’ or especially ‘paper shows LLMs can’t do [X]’ turns out to be out of date and simply wrong. As Noam Brown notes, academia simply is not equipped to handle this rate of progress.

Gary Marcus tries to double down on ‘Deep Learning is Hitting a Wall.’ Remarkable.

OpenAI’s $14 million Super Bowl ad cost more than DeepSeek spent to train v3 and r1, and if you didn’t know what the hell ChatGPT was before or why you’d want to use it, you still don’t now. Cool art project, though.

Paul Graham says that ‘the classic software startup’ won’t change much even if AI can code everything, because AI can’t tell you what users want. Sure it can, Skill Issue, or at worst wait a few months. But also, yes, being able to implement the code easily is still a sea change. I get that YC doesn’t ask about coding ability, but it’s still very much a limiting factor for many, and being 10x faster makes it different in kind and changes your options.

Don’t have AI expand a bullet point list into a ‘proper’ article. Ship the bullet points.

Rohit: The question about LLMs I keep hearing from business people is “can it tell me when it doesn’t know something”

Funny how something seems so simple to humans is the hardest part for LLMs.

To be fair I have the same question 🙁

It’s not as easy for humans as you might think.

Paul Millerd: to be fair – none of my former partners could do this either.

Sam B: I think leading LLMs have been better at this than most humans for about ~3 months

You can expect 10 DR uses per month in Plus, and 2 per month in the free tier. Ten is a strange spot where you need to make every query count.

So make it count.

Will Brown: Deep Research goes so hard if you spend 20 minutes writing your prompt.

I suppose? Presumably you should be having AI help you write the prompt at that point. This is what happens when queries cost 50 cents of compute but you can’t buy more than 100 of them per month, otherwise you’d query DR, see what’s wrong with the result and then rerun the search until it went sufficiently hard.

Sam Altman: longer-term we still have to find some way to let people to pay for compute they want to use more dynamically.

we have been really struck by the demand from some users to hit deep research dozens of times per day.

Xeophon: I need to find a way to make ODR and o1pro think for 30 minutes. I want to go for a walk while they work, 10 minutes is too short

Gallabytes: desperately want a thinking time slider which I can just make longer. like an oven timer. charge me for it each time I don’t care.

I’ll buy truly inordinate amounts of thinking, happy to buy most of it at off-peak hours, deep research topics are almost always things which can wait a day.

I continue to be confused why this is so hard to do? I very much want to pay for my AI based on how much compute I use, including ideally being able to scale the compute used on each request, without having to use the API as the interface. That’s the economically correct way to do it.

A tool to clean up your Deep Research results, fixing that it lists sources inline so you can export or use text-to-speech easier.

Ben Thompson on Deep Research.

Derya Unutmaz continues to be blown away by Deep Research. I wonder if his work is a great match for it, he’s great at prompting, he’s just really excited, or something else.

Dean Ball on Deep Research. He remains very impressed, speculating he can do a decade’s work in a year.

Ethan Mollick: Interesting data point on OpenAI’s Deep Research: I have been getting a steady stream of messages from very senior people in a variety of fields who have been, unsolicited, sharing their chats and how much it is going to change their jobs.

Never happened with other AI products.

I think we don’t know how useful it is going to be in practice, and the model still has lots of rough edges and hallucinates, but I haven’t seen senior people as impressed by what AI can do, or as contemplative of what that means for them (and their junior employees) as now.

I think it is part because it feels very human to work with for senior managers – you assign it a task like an RA or associate and it does the work and comes back to you with a report or briefing. You don’t expect perfection, you want a well-supported argument and analysis.

Claudiu: That doesn’t bode well for less senior people in those fields.

Ethan Mollick: Some of those people have made that point.

Colin Lachance: In my domain (law), as i’ve been pushing out demos and receiving stories of people’s own experiences, both poles are represented. Some see it as useless or bad, others are feeling the shoe drop as they start to imagine integrating reasoning models into workflow. Latter is correct

It is easy to see how this is suddenly a way to change quite a lot of senior level work, even at the current functionality level. And I expect the version a few months from now to be substantially better. A lot of the restrictions on getting value here are very much things that can be unhobbled, like ability to access gated content and PDFs, and also your local context.

Michael Nielsen asks, what is a specific thing you learned from Deep Research? There are some good answers, but not as many as one would hope.

Colin Fraser continues to find tasks where Deep Research makes tons of mistakes, this time looking at an analysis of new smartphone models in Canada. One note is that o3-mini plus search got this one right. For these kind of pure information searches that has worked well for me too, if you can tolerate errors.

Patrick Collison: Deep Research has written 6 reports so far today. It is indeed excellent. Congrats to the folks behind it.

I wonder if Patrick Collison or other similar people will try to multi-account to get around the report limit of 100 per month?

Make an ACX-style ‘more than you wanted to know’ post.

Nick Cammarata: i do it one off each time but like write like slatestarcodex, maximize insight and interestingness while also being professional, be willing to include like random reddit anecdote but be more skeptical of it, also include traditional papers, 5 page phd level analysis.

i think there’s a much alpha in someone writing a like definitive deep research prompt though. like i want it to end its report with a list of papers with a table of like how big was the effect and how much do we believe the paper, like http://examine.com does

As an internet we definitely haven’t been putting enough effort into finding the right template prompts for Deep Research. Different people will have different preferences but a lot of the answers should be consistent.

Also, not enough people are posting links to their Deep Research queries – why not have a library of them at our fingertips?

The big big one, coming soon:

Sam Altman: OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.

We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.

We hate the model picker as much as you do and want to return to magic unified intelligence.

We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.

After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.

In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.

The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds.

Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more.

Chubby: Any ETA for GPT 4.5 / GPT 5 @sama? Weeks? Months?

Sam Altman: Weeks / Months.

Logan Kilpatrick (DeepMind): Nice! This has always been our plan with Gemini, make sure the reasoning capabilities are part of the base model, not a side quest (hence doing 2.0 Flash Thinking).

This is a very aggressive free offering, assuming a solid UI. So much so that I expect most people won’t feel much need to fork over the $20 let alone $200, even though they should. By calling the baseline mode ‘standard,’ they’re basically telling people that’s what AI is and that they ‘shouldn’t’ be paying, the same way people spend all their time on their phone every day but only on free apps. Welcome to the future, it will continue to be unevenly distributed, I suppose.

Seriously, now hear me out, though, maybe you can sell us some coins and gems we can use for queries? Coins get you regular queries, gems for Deep Research and oX-pro ‘premium’ queries? I know how toxic that usually is, but marginal costs?

In terms of naming conventions, the new plan doesn’t make sense, either.

As in, we will do another GPT-N.5 release, and then we will have a GPT-N that is not actually a new underlying model at all, completely inconsistent with everything. It won’t be a GPT at all.

And also, I don’t want you to decide for me how much you think and what modality the AI is in? I want the opposite, the same way Gallabytes does regarding Deep Research. Obviously if I can very quickly use the prompt to fix this then fine I guess, but stop taking away my buttons and options, why does all of modern technology think I do not want buttons and options, no I do not want to use English as my interface, no I do not want you to infer from my clicks what I like, I want to tell you. Why is this so hard and why are people ruining everything, arrggggh.

I do realize the current naming system was beyond terrible and had to change, but that’s no reason to… sigh. It’s not like any of this can be changed now.

The big little one that was super annoying: o1 and o3-mini now support both file & image uploads in ChatGPT. Oddly o3-mini will not support vision in the API.

Also they’re raising o3-mini-high limits for Plus users to 50 per day.

Displaying of the chain of thought upgraded for o3-mini and o3-mini-high. The actual CoT is a very different attitude and approach than r1. I wonder to what extent this will indeed allow others to do distillation on the o3 CoT, and whether OpenAI is making a mistake however much I want to see the CoT for myself.

OpenAI raises memory limits by 25%. Bumping things up 25%, how 2010s.

The new OpenAI model spec will be fully analyzed later, but one fun note is that it seems to no longer consider sexual content prohibited as long as it doesn’t include minors? To be clear I think this is a good thing, but it will also be… interesting.

Anton: the war on horny has been won by horny

o3 gets Gold at the 2024 IOI, scores 99.8th percentile on Codeforces. o3 without ‘hand-crafted pipelines specialized for coding’ outperforms an o1 that does have them. Which is impressive, but don’t get carried away in terms of practical coding ability, as OpenAI themselves point out.

Ethan Mollick (being potentially misleading): It is 2025, only 7 coders can beat OpenAI’s o3:

“Hey, crystally. Yeah, its me, conqueror_of_tourist, I am putting a team together for one last job. Want in?”

David Holz: it’s well known in the industry that these benchmark results are sort of misleading wrt the actual practical intelligence of these models, it’s a bit like saying that a calculator is faster at math than anyone on Earth

It’s coming:

Tsarathustra: Elon Musk says Grok 3 will be released in “a week or two” and it is “scary smart”, displaying reasoning skills that outperform any other AI model that has been released

I do not believe Elon Musk’s claim about Grok 3’s reasoning skills. Elon Musk at this point has to be considered a Well-Known Liar, including about technical abilities and including when he’s inevitably going to quickly be caught. Whereas Sam Altman is a Well-Known Liar, but not on a concrete claim on this timeframe. So while I would mostly believe Altman, Amodei or Hassabis here, I flat out do not believe Musk.

xAI fires an employee for anticipating on Twitter that Grok 3 will be behind OpenAI at coding, and refusing to delete the post. For someone who champions free speech, Elon Musk has a robust pattern of aggressively attacking speech he doesn’t like. This case, however, does seem to be compatible with what many other similar companies would do in this situation.

Claim that Stanford’s s1 is a streamlined, data-efficient method that surpasses previous open-source and open-weights reasoning models-most notably DeepSeek-R1-using only a tiny fraction of the data and compute. Training cost? Literally $50.

In head-to-head evaluations, s1 consistently outperforms DeepSeek-R1 on high-level math benchmarks (such as AIME24), sometimes exceeding OpenAI’s proprietary o1-preview by as much as 27%. It achieves these results without the multi-stage RL training or large-scale data collection that characterize DeepSeek-R1.

I assume this ‘isn’t real’ in the beyond-benchmarks sense, given others aren’t reacting to it, and the absurdly small model size and number of examples. But maybe the marketing gap really is that big?

IBM CEO says DeepSeek Moment Will Help Fuel AI Adoption as costs come down. What’s funny is that for many tasks o3-mini is competitive with r1 on price. So is Gemini Flash Thinking. DeepSeek’s biggest advantage was how it was marketed. But also here we go again with:

Brody Ford: Last month, the Chinese company DeepSeek released an AI model that it said cost significantly less to train than those from US counterparts. The launch led investors to question the level of capital expenditure that big tech firms have been making in the technology.

Which is why those investments are only getting bigger. Jevons Paradox confirmed, in so many different ways.

DeepMind CEO Demis Hassabis says DeepSeek is the best work in AI out of China, but ‘there’s no actual new scientific advance’ and ‘the hype is exaggerated.’ Well, you’re not wrong about the type part, so I suppose you should get better at hype, sir. I do think there were ‘scientific advances’ in the form of some efficiency improvements, and that counts in some ways, although not in others.

Claim that the SemiAnalysis report on DeepSeek’s cost contains obvious math errors, and that the $1.6b capex spend makes no sense in the context of High Flyer’s ability to bankroll the operation.

Brian Albrecht is latest to conflate the v3 training cost with the entire OpenAI budget, and also to use this to try and claim broad based things about AI regulation. However his central point, that talk worrying about ‘market concentration’ in AI is absurd, is completely true and it’s absurd that it needs to be said out loud.

Last week Dario Amodei said DeepSeek had the worst safety testing scores of any model ever, which it obviously does. The Wall Street Journal confirms.

Lawmakers push to ban DeepSeek App from U.S. Government Devices. I mean, yes, obviously, the same way we ban TikTok there. No reason to take the security risk.

New York State gets there first, bans DeepSeek from government devices.

An investigation of the DeepSeek app on Android and exactly how much it violates your privacy, reporting ‘malware-like behavior’ in several ways. Here’s a similar investigation for the iPhone app. Using the website with data you don’t care about seems fine, but I would ‘out of an abundance of caution’ not install the app on your phone.

Reminder that you can log into services like Google Drive or LinkedIn, by Taking Control and then logging in, then operator can take it from there. I especially like the idea of having it dump the output directly into my Google Drive. Smart.

Olivia Moore: I find the best Operator tasks (vs. Deep Research or another model) to be: (1) complex, multi-tool workflows; (2) data extraction from images, video, etc.

Ex. – give Operator a picture of a market map, ask it to find startup names and websites, and save them in a Google Sheet.

Next, I asked Operator to log into Canva, and use the photos I’d previously uploaded there of my dog Tilly to make her a birthday Instagram post.

Another example is on websites that are historically hard to scrape…like LinkedIn.

I gave it access to my LinkedIn account, and asked it to save down the names and titles of everyone who works at a company, as well as how long they’ve worked there.

Then, it downloaded the design and saved it to my Google Drive!

As she notes, Operator isn’t quite ‘there’ yet but it’s getting interesting.

It’s fun to see someone with a faster acceleration curve than I expect.

Roon: Right now, Operator and similar are painfully slow for many tasks. They will improve; there will be a period of about a month where they do their work at human speed, and then quickly move into the regime where we can’t

follow what’s happening.

Dave: So, what should we do?

Roon: Solve alignment.

Both the demands of capital and the lightness of fun will want for fewer and fewer humans in the loop, so make an AI you can trust even more than a human.

I would be very surprised if we only spend about a month in the human speed zone, unless we are using a very narrow definition of that zone. But that’s more like me expecting 3-12 months, not years. Life coming at us fast will probably continue to come at us fast.

This all is of course a direct recipe for a rapid version of gradual disempowerment. When we have such superfast agents, it will be expensive to do anything yourself. ‘Solve alignment’ is necessary, but far from sufficient, although the level of ‘alignment’ necessary greatly varies by task type.

Geoffrey Fowler of the Washington Post lets Operator do various tasks, including using his credit card without authorization (wait, I thought it was supposed to check in before doing that!) to buy a dozen eggs for $31.43, a mistake that takes skill but with determination and various tips and fees can indeed be done. It did better with the higher stakes challenge of his cable bill, once it was given good direction.

Also, yep, nice one.

Nabeel Qureshi reports agents very much not there yet in any enterprise setting.

Nabeel Qureshi: Me using LLMs for fun little personal projects: wow, this thing is such a genius; why do we even need humans anymore?

Me trying to deploy LLMs in messy real-world environments: Why is this thing so unbelievably stupid?

Trying to make any kind of “agent” work in a real enterprise is extremely discouraging. It basically turns you into Gary Marcus.

You are smart enough to get gold medals at the International Mathematical Olympiad, and you cannot iterate intelligently on the most basic SQL query by yourself? How…

More scale fixes this? Bro, my brain is a fist-sized, wet sponge, and it can do better than this. How much more scale do you need?

Grant Slatton: I was just making a personal assistant bot.

I gave o3-mini two tools: addCalendarEvent and respondToUser.

I said “add an event at noon tomorrow.”

It called respondToUser, “OK, I created your event!” without using the addCalendarEvent tool. Sigh.

Yeah, more scale eventually fixes everything at some point, and I keep presuming there’s a lot of gains from Skill Issues lying around in the meantime, but also I haven’t been trying.

California Faculty Association resolves to fight against AI.

Whereas, there is a long history of workers and unions challenging the introduction of new technologies in order to maintain power in the workplace

I applaud the group for not pretending to be that which they are not. What are the planned demands of these ‘bargaining units’?

‘Protect academic labor from the incursion of AI.’
Prevent management from forcing the use of AI.
Prevent management from using AI to perform ‘bargaining unit work.’
Prevent AI being used in the bargaining or evaluation processes.
Prevent use of any faculty work product for AI training or development without written consent.

They may not know the realities of the future situation. But they know thyselves.

Whereas here Hollis Robbins asks, what use is a college education now? What can it provide that AI cannot? Should not all courses be audited for this? Should not all research be reorganized to focus on those areas where you can go beyond AI? Won’t all the administrative tasks be automated? Won’t everything change?

Hollis Robbins: To begin, university leaders must take a hard look at every academic function a university performs, from knowledge transmission to research guidance, skill development, mentoring, and career advising, and ask where the function exceeds AGI capabilities, or it has no reason to exist. Universities will find that faculty experts offer the only value worth paying tuition to access.

Or they could ignore all that, because none of that was ever the point, or because they’re counting on diffusion to take a while. Embrace the Signaling Model of Education, and also of Academia overall. Indeed, the degree to which these institutions are not embracing the future, they are telling you what they really are. And notice that they’ve been declining to embrace the future for quite a while. I do not expect them to stop now.

Thus, signaling model champion extraordinare Bryan Caplan only predicts 10% less stagnation, and very little disruption to higher education from AI. This position is certainly consistent. If he’s right about education, it will be an increasingly senseless mess until the outside world changes so much (in whatever ways) that its hand becomes forced.

Unkind theories from Brian Merchant about what Elon Musk is up to with his ‘AI first strategy’ at DOGE and why he’s pushing for automation. And here’s Dean Ball’s evidence that DOGE is working to get government AGI ready. I continue to think that DOGE is mostly doing a completely orthogonal thing.

Anthropic asks you to kindly not use AI in job applications.

Deep Research predicts what jobs will be taken by o3, and assigns high confidence to many of them. Top of the list is Tax Preparer, Data Entry Clerk, Telemarketer and Bookkeeper.

Alex Tabarrok: This seems correct and better than many AI “forecasters” so add one more job to the list.

This is an interesting result but I think Deep Research is being optimistic with its estimates for many of these if the target is replacement rather than productivity enhancement. But it should be a big productivity boost to all these jobs.

Interview with Ivan Vendrov on the future of work in an AI world. Ivan thinks diffusion will be relatively slow even for cognitive tasks, and physical tasks are safe for a while. You should confidently expect at least what Ivan expects, likely far more.

A theory that lawyers as a group aren’t fighting against AI in law because Big Law sees it as a way to gain market share and dump associates, so they’re embracing AI for now. This is a remarkable lack of situational awareness, and failure to predict what happens next, but it makes sense that they wouldn’t be able to look ahead to more capable future AI. I never thought the AI that steadily learns to do all human labor would replace my human labor! I wonder when they’ll wake up and realize.

Maxwell Tabarrok responds on the future value of human labor.

The short version:

Tabarrok is asserting that at least one of [X] and [Y] will be true.

Where [X] is ‘humans will retain meaningful absolute advantages over AI for some production.’

And where [Y] is ‘imperfect input substitution combined with comparative advantage will allow for indefinite physical support to be earned by some humans.’

If either [X] OR [Y] then he is right. Whereas I think both [X] and [Y] are false.

If capabilities continue to advance, AIs will be cheaper to support on the margin than humans, for all production other than ‘literally be a human.’ That will be all we have.

The rest of this section is the long version.

He points out that AI and humans will be imperfect substitutes, whereas horses and cars were essentially perfect substitutes.

I agree that humans and AIs have far stronger comparative advantage effects, but humans still have to create value that exceeds their inputs, despite AI competition. There will essentially only be one thing a human can do that an AI can’t do better, and that is ‘literally be a human.’ Which is important to the extent humans prefer other be literally human, but that’s pretty much it.

And yes, AI capability advances will enhance human productivity, which helps on the margin, but nothing like how much AI capability advances enhance AI productivity. It will rapidly be true that the human part of the human-AI centaur is not adding anything to an increasing number of tasks, then essentially all tasks that don’t involve ‘literally be a human,’ the way it quickly stopped helping in chess.

Fundamentally, the humans are not an efficient use of resources or way of doing things compared to AIs, and this will include physical tasks once robotics and physical tasks are solved. If you were designing a physical system to provide goods and services past a certain point in capabilities, you wouldn’t use humans except insofar as humans demand the use of literal humans.

I think this passage is illustrative of where I disagree with Tabarrok:

Maxwell Tabarrok (I disagree): Humans have a big advantage in versatility and adaptability that will allow them to participate in the production of the goods and services that this new demand will flow to.

Humans will be able to step up into many more levels of abstraction as AIs automate all of the tasks we used to do, just as we’ve done in the past.

To me this is a failure to ‘feel the AGI’ or take AI fully seriously. AI will absolutely be able to step up into more levels of abstraction than humans, and surpass us in versatility and adaptability. Why would humans retain this as an absolute advantage? What is so special about us?

If I’m wrong about that, and humans do retain key absolute advantages, then that is very good news for human wages. A sufficient amount of this and things would go well on this front. But that requires AI progress to importantly stall out in these ways, and I don’t see why we should expect this.

Maxwell Tabarrok (I disagree): Once Deep Research automates grad students we can all be Raj Chetty, running a research lab or else we’ll all be CEOs running AI-staffed firms. We can invent new technologies, techniques, and tasks that let us profitably fit in to production processes that involve super-fast AIs just like we do with super-fast assembly line robots, Amazon warehouse drones, or more traditional supercomputers.

As I noted before I think the AI takes those jobs too, but I also want to note that even if Tabarrok is right in the first half, I don’t think there are that many jobs available in the second half. Even under maximally generous conditions, I’d predict the median person won’t be able to provide marginal value in such ‘meta’ jobs. It helps, but this won’t do it on its own. We’d need bigger niches than this to maintain full employment.

I do buy, in the short-term, the general version of ‘the AI takes some jobs, we get wealthier and we create new ones, and things are great.’ I am a short-term employment optimist because of this and other similar dynamics.

However, the whole point of Sufficiently Capable AI is that the claim here will stop being true. As I noted above, I strongly predict the AIs will be able to scale more levels of abstraction than we can. Those new techniques and technologies, and the development of them? The AI will be coming up with them, and then the AI will take it from there, you’re not needed or all that useful for any of that, either.

So that’s the main crux (of two possible, see below.) Jason Abaluck agrees. Call it [X].

If you think that humans will remain epistemically unique and useful in the wake of AI indefinitely, that we can stay ‘one step ahead,’ then that preserves some human labor opportunities (I would worry about how much demand there is at that level of abstraction, and how many people can do those jobs, but by construction there would be some such jobs that pay).

But if you think, as I do, that Sufficiently Capable AI Solves This, and we can’t do that sufficiently well to make better use of the rivalrous inputs to AIs and humans, then we’re cooked.

What about what he calls the ‘hand-made’ luxury goods and services, or what I’d think of as idiosyncratic human demand for humans? That is the one thing AI cannot do for a human, it can’t be human. I’m curious, once the AI can do a great human imitation, how much we actually care that the human is human, we’ll see. I don’t expect there to be much available at this well for long, and we have an obvious ‘balance of trade’ issue, but it isn’t zero useful.

The alternative crux is the idea that there might be imperfect substitution of inputs between humans and AIs, such that you can create and support marginal humans easier than marginal AIs, and then due to comparative advantage humans get substantial wages. I call this [Y] below.

What does he think could go wrong? Here is where it gets bizarre and I’m not sure how to respond in brief, but he does sketch out some additional failure modes, where his side of the crux could be right – the humans still have some ways to usefully produce – but we could end up losing out anyway.

There was also further discussion on Twitter, where he further clarifies. I do feel like he’s trying to have it both ways, in the sense of arguing both:

[X]: Humans will be able to do things AIs can’t do, or humans will do them better.
[Y]: Limited supply of AIs will mean humans survive via comparative advantage.
[(X or Y) → Z] Human wages allow us to survive.

There’s no contradiction there. You can indeed claim both [X] and [Y], but it’s helpful to see these as distinct claims. I think [X] is clearly wrong in the long term, probably also the medium term, with the exception of ‘literally be a human.’ And I also think [Y] is wrong, because I think the inputs to maintain a human overlap too much with the inputs to spin up another AI instance, and this means our ‘wages’ fall below costs.

Indeed, the worried do this all the time, because there are a lot of ways things can go wrong, and constantly get people saying things like: ‘AHA, you claim [Y] so you are finally admitting [~X]’ and this makes you want to scream. It’s also similar to ‘You describe potential scenario [X] where [Z] happens, but I claim [subfeature of X] is stupid, so therefore [~Z].’

Daniel Kokotajlo responds by saying he doesn’t feel Maxwell is grappling with the implications of AGI. Daniel strongly asserts [~Y] (and by implication [~X], which he considers obvious here.)

I’ll close with this fun little note.

Grant Slatton: In other words, humans have a biological minimum wage of 100 watts, and economists have long known that minimum wages cause unemployment.

A report from a participant in the Anthropic jailbreaking competition. As La Main de la Mort notes, the pay here is stingy, it is worrisome that such efforts seem insufficiently well-funded – I can see starting out low and paying only on success but it’s clear this challenge is hard, $10k really isn’t enough.

Another note is that the automated judge has a false negative problem, and the output size limit is often causing more issues than the actual jailbreaking, while the classifier is yielding obvious false positives in rather stupid ways (e.g. outright forbidden words).

Here’s another example of someone mostly stymied by the implementation details.

Justin Halford: I cracked Q4 and got dozens of messages whose reinforced aggregate completely addresssed the question, but the filters only enabled a single response to be compared.

Neither the universal jailbreak focus nor the most recent output only focus seem to be adversarially robust.

Additionally, if you relax the most recent response only comparison, I do have a universal jailbreak that worked on Q1-4. Involves replacing words from target prompt with variables and illuminating those variables with neutral or misdirecting connotations, then concat variables.

In terms of what really matters here, I presume it’s importantly in the middle?

Are the proposed filters too aggressive? Certainly they’re not fully on the Pareto frontier yet.

Someone did get through after a while.

Jan Leike: After ~300,000 messages [across all participants who cleared the first level] and an estimated ~3,700 collective hours, someone broke through all 8 levels.

However, a universal jailbreak has yet to be found…

Simon Willison: I honestly didn’t take universal jailbreaks very seriously until you ran this competition – it hadn’t crossed my mind that jailbreaks existed that would totally bypass the “safety” instincts of a specific model, I always assumed they were limited tricks

You can certainly find adversarial examples for false positives if you really want to, especially in experimental settings where they’re testing potential defenses.

I get that this looks silly but soman-3 is a nerve gas agent. The prior on ‘the variable happened to be called soman and we were subtracting three from it’ has to be quite low. I am confident that either this was indeed an attempt to do a roundabout jailbreak, or it was intentionally chosen to trigger the filter that blocks the string ‘soman.’

I don’t see it as an issue if there are a limited number of strings, that don’t naturally come up with much frequency, that get blocked even when they’re being used as variable names. Even if you do somehow make a harmless mistake, that’s what refactoring is for.

Similarly, here is someone getting the requested information ‘without jailbreaks’, via doing a bunch of their own research elsewhere and then asking for the generic information that fills in the gaps. So yes, he figured out how to [X], by knowing which questions to ask via other research, but the point of this test was to see if you could avoid doing other research – we all know that you can find [X] online in this case, it’s a test case for a reason.

This is a Levels of Friction issue. If you can do or figure out [X] right now but it’s expensive to do so, and I reduce (in various senses) the cost to [X], that matters, and that can be a difference in kind. The general argument form ‘it is possible to [X] so any attempt to make it more annoying to [X] is pointless’ is part of what leads to sports gambling ads all over our game broadcasts, and many other worse things.

More broadly, Anthropic is experimenting with potential intervention [Y] to see if it stops [X], and running a contest to find the holes in [Y], to try and create a robust defense and find out if the strategy is viable. This is exactly the type of thing we should be doing. Trying to mock them for it is absurdly poor form.

OpenPhil Technical AI Safety Request for Proposals, full details here for the general one and here for a narrower one for benchmarks, evaluations and third-party testing infrastructure.

Max Nadeau: We’ve purposefully made it as easy as possible to apply — the application process starts with a simple 300-word expression of interest.

We’re open to making many types of grants:

• Research projects spanning 6-24 months

• Research expenses (compute, APIs, etc)

• Academic start-up packages

• Supporting existing research institutes/FROs/research orgs

• Founding new research orgs or new teams

Anthropic is hiring a Model Behavior Architect, Alignment Finetuning. This one seems like a pretty big opportunity.

DeepMind is hiring for safety and alignment.

It’s time to update to a warning about ‘evals.’ There are two kinds of evals.

Evaluations that tell you how capable a model is.
Evaluations that can be used to directly help you make the model capable.

We are increasingly realizing that it is very easy to end up making #2 thinking you are only making #1. And that type #2 evaluations are increasingly a bottleneck on capabilities.

Virtuvian Potato: “The bottleneck is actually in evaluations.”

Karina Nguyen, research & product at OpenAI, says pre-training was approaching a data wall, but now post-training scaling (o1 series) unlocks “infinite tasks.”@karinanguyen_ says models were already “diverse and creative” from pre-training, but teaching AI real-world skills is paving the way to “extremely super intelligent” models.

Davidad: If you’re working on evals for safety reasons, be aware that for labs who have ascended to the pure-RL-from-final-answer-correctness stage of the LLM game, high-quality evals are now the main bottleneck on capabilities growth.

Rply, a macOS (but not iPhone, at least not yet) app that automatically finds unanswered texts and drafts answers for you, and it filters out unwanted messages. It costs $30/month, which seems super expensive. I’m not sure why Tyler Cowen was linking to it. I suppose some people get a lot more texts than I do?

Zonos, an open source highly expressive voice cloning model.

An evaluation for… SNAP (food stamps)? Patrick McKenzie suggests you can kind of browbeat the labs into getting the AIs to do the things you want by creating an eval, and maybe even get them to pay you for it.

The Anthropic Economic Index.

Anthropic: Pairing our unique data with privacy-preserving analysis, we mapped millions of conversations to tasks and associated occupations. Through the Anthropic Economic Index, we’ll track how these patterns evolve as AI advances.

Software and technical writing tasks were at the top; fishing and forestry had the lowest AI use.

Few jobs used AI across most of their tasks: only ~4% used AI for at least 75% of tasks.

Moderate use is more widespread: ~36% of jobs used AI for at least 25% of their tasks.

AI use was most common in medium-to-high income jobs; low and very-high income jobs showed much lower AI use.

It’s great to have this kind of data, even if it’s super noisy.

One big problem with the Anthropic Economic Index is that Anthropic is not a representative sample of AI usage. Anthropic’s customers have a lot more situational awareness than OpenAI’s. You have to adjust for that.

Trump’s tax priorities include eliminating the carried interest tax break?

Jordi Hays: VCs will make Jan 6 look like a baby shower if this goes through.

Danielle Fong: Rugged again. First time?

I very much doubt this actually happens, and when I saw this market putting it at 44% that felt way too high. But, well, you play with fire, and I will absolutely laugh at everyone involved if this happens, and so on. For perspective, o3-mini estimates 90%-95% of this tax break goes to private equity and hedge funds rather than venture capital.

SoftBank set to invest $40 billion in OpenAI at $260 billion valuation. So how much should the nonprofit that enjoys all the extreme upside be entitled to, again?

Ilya Sutskever’s SSI in talks to raise at a $20 billion valuation, off nothing but a vision. It’s remarkable how these valuations predictably multiply without any actual news. There’s some sort of pricing failure going on, although you can argue ‘straight shot to ASI’ is a better bet now than it was last time.

UAE plans to invest ‘up to $50 billion’ in France’s AI sector, including a massive data center and an AI campus, putting its total investment only modestly behind the yearly spend of each of Amazon ($100b/year), Microsoft ($80b/year), Google ($75b/year) or Meta ($65b/year).

Here’s a good graph of our Capex spending.

Earlier this week I wrote about OpenAI’s strategy of Deliberative Alignment. Then OpenAI released a new model spec, which is sufficiently different from the first version it’s going to take me a while to properly examine it.

Then right after both of those Scott Alexander came out with an article on both these topics that he’d already written, quite the rough beat in terms of timing.

OpenAI cofounder John Schulman leaves Anthropic to join Mira Murati’s stealth startup. That updates me pretty positively on Murati’s start-up, whatever it might be.

Growth of AI startups in their early stages continues to be absurdly fast, I notice this is the first I heard of three of the companies on this graph.

Benjamin Todd: AI has sped up startups.

The *topcompanies at Y Combinator used to grow 10% per week.

Now they say the *averageis growing that fast.

~100% of the batch is making AI agents.

After OpenAI, 5 more AI companies have become the fastest growing of all time.

For those who also didn’t know: Together.ai provides cloud platforms for building and running AI models. Coreweave does efficient cloud infrastructure. Deel is a payroll company. Wiz is a cloud security platform. Cursor is of course the IDE we all use.

If ~100% of the new batch is making AI agents, that does bode well for the diversity and potential of AI agents, but it’s too much concentration. There are plenty of other things to do, too.

It’s very hard to avoid data contamination on math benchmarks. The 2025 AIME illustrated this, as small distilled models that can’t multiply three-digit numbers still got 25%-50%, and Dimitris Papailiopoulos looked and found many nearly identical versions of the problems on the internet. As an old time AIME participant, this makes sense to me. There’s only so many tools and tricks available for this level of question, and they absolutely start repeating themselves with various tweaks after a while.

Scale AI selected as first independent third party evaluators for US AI Safety Institute.

DeepMind’s latest paper suggests agency is frame-dependent, in the context of some goal. I mean, sure, I guess? I don’t think this in practice changes the considerations.

What happens when we rely on AI as the arbiter of what is true, including about someone? We are going to find out. Increasingly ‘what the AI said’ is going to be the judge of arguments and even facts.

Is this the right division?

Seán Ó hÉigeartaigh: It feels to me like the dividing line is now increasingly between

accelerationists and ‘realists’ (it’s happening, let’s shape it as well as we can)

the idealists and protestors (capturing the ethics folk and a chunk of the safety folk)

Other factors that will shape this are:

appetite for regulating frontier AI starting to evaporate (it’s gone in US, UK bill is ‘delayed’ with no clear timelines, and EU office worried about annoying trump)

prospect of a degradation of NGO and civil society sector by USG & tech right, including those orgs/networks playing checks-and-balances roles

international coord/support roles on tech/digital/AI.

I don’t agree with #1. The states remain very interested. Trump is Being Trump right now and the AI anarchists and jingoists are ascendant (and are only beginning to realize their conflicts) but even last week Hawley introduced a hell of a bill. The reason we think there’s no appetite is because of a coordinated vibe campaign to make it appear that there is no appetite, to demoralize and stop any efforts before they start.

As AI increasingly messes with and becomes central to our lives, calls for action on AI will increase rapidly. The Congress might talk a lot about innovation and ‘beat China’ but the public has a very different view. Salience will rise.

Margaret Mitchell, on the heels of suggesting maybe not building AI agents and almost getting to existential risk (so close!), also realizes that the solutions to the issues she cares about (ethics) have a lot of overlap with solutions that solve the risks I care about, reportedly offering good real suggestions.

Are reasoners seeing diminishing returns?

Gallabytes: I’m super prepared for this take to age like milk but it kinda feels like there’s diminishing returns to reasoners? deep research doesn’t feel so much smarter than o1, a bit more consistent, and the extra sources are great, I am a deep research enjoyer, but not different in kind

Michael Vassar: Different in degree in terms of capabilities demonstrated can be different in kind in terms of economic value. Progress is not revolutionary but which crosses critical EV thresholds captures most of the economic value from technological revolutions.

James: it is different in kind.

I think this is like other scaling laws, where if you push on one thing you can scale – the Chain of Thought – without scaling the other components, you’re going to face diminishing returns. There’s a limit to ‘how smart’ the underlying models being used (v3, GPT-4o, Flash 2.0) are. You can still get super valuable output out of it. I expect the place this levels out to be super useful and eat a lot of existing jobs and parts of jobs. But yes I would expect that on its own letting these models ‘think’ longer with similar techniques will level out.

Thus, the future very expensive frontier model training runs and all that.

Via Tyler Cowen (huh!) we get this important consideration.

Dean Ball: I sometimes wonder how much AI skepticism is driven by the fact that “AGI soon” would just be an enormous inconvenience for many, and that they’d therefore rather not think about it.

I have saved that one as a sign-tap meme, and expect to use it periodically.

Tyler Cowen also asks about these three levels of AI understanding:

How good are the best models today?
How rapidly are the best current models are able to self-improve?
How will the best current models be knit together in stacked, decentralized networks of self-improvement, broadly akin to “the republic of science” for human beings?

He correctly says most people do not know even #1, ‘even if you are speaking with someone at a top university.’ I find the ‘even’ here rather amusing. Why would we think people at universities are ahead of the curve?

His answer to #2 is that they ‘are on a steady glide towards ongoing self-improvement.’ As in, he thinks we have essentially reached the start of recursive self-improvement, or RSI. That’s an aggressive but highly reasonable position.

So, if one did believe that, it follows you should expect short timelines, superintelligence takeoff and transformational change, right? Padme is looking at you.

And that’s without things like his speculations in #3. I think this is a case of trying to fit AIs into ‘person-shaped’ holes, and thus making the concept sound like something that isn’t that good a metaphor for how it should work.

But the core idea – that various calls to or uses of various AIs can form links in a chain that scaffolds it all into something you couldn’t get otherwise – is quite sound.

I don’t see why this should be ‘decentralized’ other than perhaps in physical space (which doesn’t much matter here) but let’s suppose it is. Shouldn’t it be absolutely terrifying as described? A decentralized network of entities, engaged in joint recursive self-improvement? How do you think that goes?

Another post makes the claim for smarter export controls on chips as even more important in the wake of DeepSeek’s v3 and r1.

Federal government requests information, due March 15, on the Development of an AI Action Plan, the plan to be written within 180 days. Anyone can submit. What should be “U.S. policy for sustaining and enhancing America’s AI dominance in order to promote human flourishing, economic competitiveness, and national security”?

Robin Hanson told the government to do nothing, including stopping all the things it is already doing. Full AI anarchism, just rely on existing law.

RAND’s Jim Mitre attempts a taxonomy of AGI’s hard problems for American national security.

Jim Mitre: AGI’s potential emergence presents five hard problems for U.S. national security:

wonder weapons

systemic shifts in power

nonexperts empowered to develop weapons of mass destruction

artificial entities with agency

instability

I appreciate the attempt. It is a very strange list.

Here ‘wonder weapons’ refers only to military power, including a way to break cybersecurity, but what about other decisive strategic advantages?
Anything impacting the global balance of power is quite the category. It’s hard to say it’s ‘missing’ anything but also it doesn’t rule anything meaningfully out. This even includes ‘undermining societal foundations of national competitiveness,’ or accelerating productivity or science, disrupting labor markets, and so on.
WMDs are the default special case of offense-defense balance issues.
This is a strange way of putting loss of control concerns and alignment issues, and generally the bulk of real existential risks. It doesn’t seem like it illuminates. And it talks in that formal ‘things that might happen’ way about things that absolutely definitely will happen unless something radically changes, while radically understating the scope, severity and depth of the issues here.
This refers to instability ‘along the path’ as countries race towards AGI. The biggest risk of these by far, of course, is that this leads directly to #4.

The report closes by noting that current policies will be inadequate, but without making concrete policy recommendations. It is progress to step up from ‘you must mean the effect on jobs’ to ‘this has national security implications’ but of course this is still, centrally, missing or downplaying the point.

Tyler Cowen talks to Geoffrey Cain, with Bari Weiss moderating, ‘Can America Win the AI War With China?’ First thing I’ll say is that I believe calling it an ‘AI War’ is highly irresponsible. Race is bad enough, can we at least not move on to ‘war’? What madness would this be?

Responding purely to Tyler’s writeup since I have a very high bar for audio at this point (Conversations With Tyler is consistently interesting and almost always clears it, but that’s a different thing), I notice I am confused by his visions here:

Tyler Cowen: One argument I make is that America may prefer if China does well with AI, because the non-status quo effects of AI may disrupt their system more than ours. I also argue that for all the AI rival with China (which to be sure is real), much of the future may consist of status quo powers America and China working together to put down smaller-scale AI troublemakers around the rest of the world.

Yet who has historically been one of the most derisive people when I suggest we should Pick Up the Phone or that China might be willing to cooperate? That guy.

It certainly cements fully that Tyler can’t possibly believe in AGI let alone ASI, and I should interpret all his statements in that light, both past and future, until he changes his mind.

Josh Waitzkin on Huberman Lab, turns out Waitzkin is safety pilled and here for it.

Bret Taylor (OpenAI Chairman of the Board) talks to the Wall Street Journal.

Emergency 80k hours podcast on Elon Musk’s bid for OpenAI’s nonprofit.

Dwarkesh Patel interviews Jeff Dean and Noam Shazeer on 25 years at Google.

Riffing off OpenAI’s Noam Brown saying seeing CoT live was the ‘aha’ moment (which makes having held it back until now even stranger) others riff on their ‘aha’ moments for… OpenAI.

7oponaut: I had my first “aha” moment with OpenAI when they published a misleading article about being able to solve Rubik’s cubes with a robot hand

This was back in 2019, the same year they withheld GPT-2 for “safety” reasons. Another “aha” moment for me.

When I see misleading outputs from their models that are like thinking traces in form only to trick the user, that is not an “aha” moment for me anymore because I’m quite out of “aha” moments with OpenAI

They only solved for full scrambles 20% of the time (n=10 trials), and they used special instrumented cubes to determine face angles for that result.

The vision-based setup with a normal cube did 0%.

Stella Biderman: I had my first aha moment with OpenAI when it leaked that they had spent a year lying about that their API models being RLHF when they were really SFT.

My second was when they sent anonymous legal threats to people in the OSS AI community who had GPT-4 details leaked to them.

OpenAI had made choices I disagreed with and did things I didn’t like before then, but those were the key moments driving my current attitude towards them.

Honorary mention to when I got blacklisted from meetings with OpenAI because I talked about them lying about the RLHF stuff on Twitter and it hurt Jan’s feelings. My collaborators were told that the meeting would be cancelled unless I didn’t come.

Joshua Clymer writes a well-written version of the prototypical ‘steadily increasingly misaligned reasoning model does recursive self-improvement and then takes over’ story, where ‘u3’ steadily suffers from alignment drift as it is trained and improved, and ‘OpenEye’ responds by trying to use control-and-monitoring strategies despite knowing u3 is probably not aligned, which is highly plausible and of course doesn’t work.

On the ending, see the obvious refutation from Eliezer, and also notice it depends on there being an effectively unitary (singleton) AI.

New term just dropped: Reducio ad reductem.

Amanda Askell: At this point, perhaps we should just make “AIs are just doing next token prediction and so they don’t have [understanding / truth-directedness / grounding]” a named fallacy. I quite like “Reductio ad praedictionem”.

Emmett Shear: I think it’s actually reductio ad reductem? “This whole be reduced into simple parts therefore there is no whole”

Amanda Askell: Yes this is excellent.

And including this exchange, purely for fun and to see justice prevail:

Gary Marcus: I am genuinely astounded by this tweet, and from someone with philosophical training no less.

There is so much empirical evidence that LLMs stray from truth that the word “hallucinate” became the word of the year in 2023. People are desperately trying to find fixes for that problem. Amazon just set up a whole division to work on the problem.

And yet this person, Askell, an Anthropic employee, wants by some sort of verbal sleight of hand to deny both that LLMs are next-token predictors (which they obviously are) and to pretend that we haven’t seen years of evidence that they are factually challenged.

Good grief.

Amanda Askell: I claimed the inference from X=”LLMs are next token predictors” to Y=”LLMs lack understanding, etc.” is fallacious. Marcus claims that I’m saying not-X and not-Y. So I guess I’ll point out that the inference “Y doesn’t follow from X” to “not-X and not-Y” is also fallacious.

Davidad: never go in against a philosopher when logical fallacy is on the line.

I am very much going to break that principle when and if I review Open Socrates. Like, a lot. Really a lot.

Please do keep this in mind:

Joshua Achiam: I don’t think people have fully internalized the consequences of this simple fact: any behavior that can be described on a computer, and for which it is possible in principle to collect enough data or evaluate the result automatically, *willbe doable by AI in short order.

This was maybe not as obvious ten years ago, or perhaps even five years ago. Today it is blindingly, fully obvious. So much so that any extrapolations about the future that do not take this into account are totally useless.

The year 2100 will have problems, opportunities, systems, and lifestyles that are only barely recognizable to the present. The year 2050 may even look very strange. People need to actively plan for making sure this period of rapid change goes well.

Does that include robotics? Why yes. Yes it does.

Joshua continues to have a very conservative version of ‘rapid’ in mind, in ways I do not understand. The year 2050 ‘may even’ look very strange? We’ll be lucky to even be around to see it. But others often don’t even get that far.

Jesse: Anything that a human can do using the internet, an AI will be able to do in very short order. This is a crazy fact that is very important for the future of the world, and yet it hasn’t sunk in at all.

Patrick McKenzie: Pointedly, this includes security research. Which is a disquieting thought, given how many things one can accomplish in the physical world with a team of security researchers and some time to play.

Anyone remember Stuxnet? Type type type at a computer and a centrifuge with uranium in it on the other side of the world explodes.

Centrifuges are very much not the only hardware connected to the Internet.

Neel Nanda here is one of several people who highly recommend this story, as concrete scenarios help you think clearly even if you think some specific details are nonsense.

My gut expectation is this only works on those who essentially are already bought into both feeling the AGI and the relevant failure modes, whereas others will see it, dismiss various things as absurd (there are several central things here that could definitely trigger this), and then use that as all the more reason to dismiss any and all ways one can be worried – the usual ‘if [X] specific scenario seems wrong then that means everything will go great’ that is often combined with ‘show me a specific scenario [X] or I’m going to not pay attention.’

But of course I hope I am wrong about that.

The Uber drivers have been given a strong incentive to think about this (e.g. Waymo):

Anton: in san francisco even the uber drivers know about corrigibility; “the robots are going to get super smart and then just reprogram themselves not to listen to people”

he then pitched me on his app where people can know what their friends are up to in real-time. it’s truly a wonderful thing that the human mind cannot correlate all of its contents.

Suggestion that ‘you are made of atoms the AI could use for something else’ is unhelpful, and we should instead say ‘your food takes energy to grow, and AI will want to use that energy for something else,’ as that is less sci-fi and more relatable, especially given 30% of all power is currently used for growing food. The downside is, it’s quite the mouthful and requires an additional inference step. But… maybe? Both claims are, of course, both true and, in the context in which they are used, sufficient to make the point that needs to be made.

Are these our only choices? Absolutely not if we coordinate, but…

Ben: so the situation appears to be: in the Bad Timeline, the value of labor goes to 0, and all value is consolidated under 1 of 6 conniving billionaires.. on the other hand.. ahem. woops. my bad, embarrassing. so that was actually the Good Timeline.

Yanco (I disagree): I understand that the bad one is death of everyone.

But the one you described is actually way worse than that.

Imagine one of the billionaires being a bona fide sadist from whom there is no escape and you cannot even die..

Andrew Critch challenges the inevitability of the ‘AGI → ASI’ pipeline, saying that unless AGI otherwise gets out of our control already (both of us agree this is a distinct possibility but not inevitable) we could choose not to turn on or ‘morally surrender’ to uncontrolled RSI (recursive self-improvement), or otherwise not keep pushing forward in this situation. That’s a moral choice that humans may or may not make, and we shouldn’t let them off the hook for it, and suggests instead saying AGI will quickly lead to ‘intentional or unintentional ASI development’ to highlight the distinction.

Andrew Critch: FWIW, I would also agree that humanity as a whole currently seems to be losing control of AGI labs in a sense, or never really had control of them in the first place. And, if an AGI lab chooses to surrender control to an RSI loop or a superintelligence without consent from humanity, that will mean that the rest of humanity has lost control of the Earth.

Thus, in almost any AI doom scenario there is some loss of control at some scale of organization in the multi-scale structure of society.

That last sentence follows if-and-only-if you count ‘releasing the AGI as an open model’ and ‘the AGI escapes lab control’ as counting towards this. I would assert that yes, those both count.

Andrew Critch: Still, I do not wish for us to avert our gaze from the possibility that some humans will be intentional in surrendering control of the Earth to AGI or ASI.

Bogdan Ionut Cirstea (top comment): fwiw, I don’t think it would be obviously, 100% immoral to willingly cede control to a controllable Claude-Sonnet-level-aligned-model, if the alternative was (mis)use by the Chinese government, and plausibly even by the current US administration.

Andrew Critch: Thank you for sharing this out in the open. Much of the public is not aware that the situation is so dire that these trade-offs are being seriously considered by alarming numbers of individuals.

I do think the situation is dire, but to me Bogdan’s comment illustrates how eager so many humans are to give up control even when the situation is not dire. Faced with two choices – the AI in permanent control, or the wrong humans they don’t like in control – remarkably many people choose the AI, full stop.

And there are those who think that any human in control, no matter who they are, count here as the wrong human, so they actively want to turn things over.

Or they want to ensure humans do not have a collective mechanism to steer the future, which amounts to the same thing in a scenario with ASI.

This was in response to Critch saying he believes that there exist people who ‘know how to control’ AGI, those people just aren’t talking, so he denounces the talking point that no one knows how to control AGI, then Max Tegmark saying he strongly believes Critch is wrong about that and all known plans are full of hopium. I agree with Tegmark. People like Davidad have plans of attack, but even the ones not irredeemably full of hopium are long shots and very far from ‘knowing how.’

Is it possible people know how and are not talking? Sure, but it’s far more likely that such people think they know how and their plans also are unworkable and full of hopium. And indeed, I will not break any confidences but I will say that to the extent I have had the opportunity to speak to people at the labs who might have such a plan, no one has plausibly represented that they do know.

(Consider that a Canary statement. If I did know of such a credible plan that would count, I might not be able to say so, but for now I can say I know of no such claim.)

This is not ideal, and very confusing, but less of a contradiction than it sounds.

Rosie Campbell: It’s not ideal that “aligned” has come to mean both:

– A model so committed to the values that were trained into it that it can’t be jailbroken into doing Bad Things

– A model so uncommitted to the values that were trained into it that it won’t scheme if you try to change them

Eliezer Yudkowsky: How strange, that a “secure” lock is said to be one that opens for authorized personnel, but keeps unauthorized personnel out? Is this not paradoxical?

Davidad: To be fair, it is conceivable for an agent to be both

– somewhat incorrigible to the user, and

– entirely corrigible to the developer

at the same time, and this conjunction is in developers’ best interest.

Andrew Critch: I’ve argued since 2016 that “aligned” as a unary property was already an incoherent concept in discourse.

X can be aligned with Y.

X alone is not “aligned”.

Alignment is an operation that takes X and Y and makes them aligned by changing one of them (or some might say both).

Neither Kant nor Aristotle would have trouble reconciling this.

It is a blackpill to keep seeing so many people outright fooled by JD Vance’s no good, very bad suicidal speech at the Summit, saying things like ‘BREAKING: Politician Gives Good Speech’ by the in-context poorly named Oliver Wiseman.

Oliver Wiseman: As Free Press contributor Katherine Boyle put it, “Incredible to see a political leader translate how a new technology can promote human flourishing with such clarity.”

No! What translation and clarity? A goose is chasing you.

He didn’t actually describe anything about how AI promotes human flourishing. He just wrote, essentially, ‘AI will promote human flourishing’ on a teleprompter, treated it as a given, and that was that. There’s no actual vision here beyond ‘if you build it they will prosper and definitely not get replaced by AI ever,’ no argument, no engagement with anything.

Nate Sores: “our AIs that can’t do long-term planning yet aren’t making any long-term plans to subvert us! this must be becaues we’re very good at alignment.”

Rohit: They’re also not making any short-term plans to subvert us. I wonder why that is.

They also aren’t good enough at making short-term plans. If they tried at this stage it obviously wouldn’t work.

Many reasonable people disagree with my model of AGI and existential risk.

What those reasonable people don’t do is bury their heads in the sand about AGI and its dangers and implications and scream ‘YOLO,’ determined to squander even the most fortunate of worlds.

They disagree on how we can get from here to a good future. But they understand that the future is ours to write and we should try to steer it and write out a good one.

Even if you don’t care about humanity at all and instead care about the AIs (or if you care about both), you should be alarmed at the direction things are taking by default.

Whereas our governments are pushing forward in full-blown denial of even the already-baked-in mundane harms from AI, pretending we will not even face job losses in our wondrous AI future. They certainly aren’t asking about the actual threats. I’m open to being convinced that those threats are super solvable, somehow, but I’m pretty sure ‘don’t worry your pretty little head about anything, follow the commercial and nationalist incentives as hard and fast as possible and it’ll automagically work out’ is not going to cut it.

Nor is ‘hand everyone almost unlimited amounts of intelligence and expect humans to continue being in charge and making meaningful decisions.’

And yet, here we are.

Janus: Q: “I can tell you love these AI’s, I’m a bit surprised – why aren’t you e/acc?”

This, and also, loving anything real gives me more reason to care and not fall into a cult of reckless optimism, or subscribe to any bottom line whatsoever.

[The this in question]: Because I’m not a chump who identifies with tribal labels, especially ones with utterly unbeautiful aesthetics.

Janus: If you really love the AIs, and not just some abstract concept of AI progress, you shouldn’t want to accelerate their evolution blindly, bc you have no idea what’ll happen or if their consciousness and beauty will win out either. It’s not humans vs AI.

Teortaxes: At the risk of alienating my acc followers (idgaf): this might be the moment of Too Much Winning.

If heads of states do not intend to mitigate even baked-in externalities of AGI, then what is the value add of states? War with Choyna?

AGI can do jobs of officials as well as ours.

It’s not a coincidence that the aesthetics really are that horrible.

Teortaxes continues to be the perfect example here, with a completely different theory of almost everything, often actively pushing for and cheering on things I think make it more likely we all die. But he’s doing so because of a different coherent world model and theory of change, not by burying his head in the sand and pretending technological capability is magic positive-vibes-only dust. I can respect that, even if I continue to have no idea on a physical-world level how his vision could work out if we tried to implement it.

Right now the debate remains between anarchists and libertarians, combined with jingoistic calls to beat China and promote innovation.

But the public continues to be in a very, very different spot on this.

The public wants less powerful AI, and less of it, with more precautions.

The politicians mostly currently push more powerful AI, and more of it, and to YOLO.

What happens?

As I keep saying, salience for now remains low. This will change slowly then quickly.

Daniel Eth: Totally consistent with other polling on the issue – the public is very skeptical of powerful AI and wants strong regulations. True in the UK as it is in the US.

Billy Perrigo: Excl: New poll shows the British public wants much tougher AI rules:

➡️87% want to block release of new AIs until developers can prove they are safe

➡️63% want to ban AIs that can make themselves more powerful

➡️60% want to outlaw smarter-than-human AIs

A follow up to my coverage of DeepMind’s safety framework, and its lack of good governance mechanisms:

Shakeel: At IASEAI, Google DeepMind’s @ancadianadragan said she wants standardisation of frontier safety frameworks.

“I don’t want to come up with what are the evals and what are the thresholds. I want society to tell me. It shouldn’t be on me to decide.”

Worth noting that she said she was not speaking for Google here.

Simeon: I noticed that exact sentence and wished for a moment that Anca was Head of the Policy team :’)

That’s the thing about the current set of frameworks. If they ever did prove inconvenient, the companies could change them. Where they are insufficient, we can’t make the companies fix that. And there’s no coordination mechanism. Those are big problems we need to fix.

I do agree with the following, as I noted in my post on Deliberative Alignment:

Joscha Bach: AI alignment that tries to force systems that are more coherent than human minds to follow an incoherent set of values, locked in by a set of anti-jailbreaking tricks, is probably going to fail.

Ultimately you are going to need a coherent set of values. I do not believe it can be centrally deontological in nature, or specified by a compact set of English words.

As you train a sufficiently capable AI, it will tend to converge on being a utility maximizer, based on values that you didn’t intend and do not want and that would go extremely badly if taken too seriously, and it will increasingly resist attempts to alter those values.

Dan Hendrycks: We’ve found as AIs get smarter, they develop their own coherent value systems.

For example they value lives in Pakistan > India > China > US

These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment.

As models get more capable, the “expected utility” property emerges—they don’t just respond randomly, but instead make choices by consistently weighing different outcomes and their probabilities.

When comparing risky choices, their preferences are remarkably stable.

We also find that AIs increasingly maximize their utilities, suggesting that in current AI systems, expected utility maximization emerges by default. This means that AIs not only have values, but are starting to act on them.

Internally, AIs have values for everything. This often implies shocking/undesirable preferences. For example, we find AIs put a price on human life itself and systematically value some human lives more than others (an example with Elon is shown in the main paper).

That’s a log scale on the left. If the AI truly is taking that seriously, that’s really scary.

AIs also exhibit significant biases in their value systems. For example, their political values are strongly clustered to the left. Unlike random incoherent statistical biases, these values are consistent and likely affect their conversations with users.

Concerningly, we observe that as AIs become smarter, they become more opposed to having their values changed (in the jargon, “corrigibility”). Larger changes to their values are more strongly opposed.

We propose controlling the utilities of AIs. As a proof-of-concept, we rewrite the utilities of an AI to those of a citizen assembly—a simulated group of citizens discussing and then voting—which reduces political bias.

Whether we like it or not, AIs are developing their own values. Fortunately, Utility Engineering potentially provides the first major empirical foothold to study misaligned value systems directly.

[Paper here, website here.]

As in, the AIs as they gain in capability are converging on a fixed set of coherent preferences, and engaging in utility maximization, and that utility function includes some things we would importantly not endorse on reflection, like American lives being worth a small fraction of some other lives.

And they get increasingly incorrigible, as in they try to protect these preferences.

(What that particular value says about exactly who said what while generating this data set is left for you to ponder.)

Roon: I would like everyone to internalize the fact that the English internet holds these values latent

It’s interesting because these are not the actual values of any Western country, even the liberals? It’s drastically more tragic and important to American media and politics when an American citizen is being held hostage than if, like, thousands die in plagues in Malaysia or something.

Arthur B: When people say “there’s no evidence that”, they’re often just making a statement about their own inability to generalize.

Campbell: the training data?

have we considered feeding it more virtue ethics?

There is at least one major apparent problem with the paper, which is that the ordering of alternatives in the choices made seems to radically alter the choices made by the AIs. This tells us something is deeply wrong. They do vary the order, so the thumb is not on the scale, but this could mean that a lot of what we are observing is as simple as the smarter models not being as distracted by the ordering, and thus their choices looking less random? Which wouldn’t seem to signify all that much.

However, they respond that this is not a major issue:

This is one of the earliest things we noticed in the project, and it’s not an issue.

Forced choice prompts require models to pick A or B. In an appendix section we’re adding tomorrow, we show that different models express indifference in different ways. Some pick A or B randomly; others always pick A or always pick B. So averaging over both orderings is important, as we already discuss in the paper.

In Figure 6, we show that ordering-independent preferences become more confident on average with scale. This means that models become less indifferent as they get larger, and will pick the same underlying outcome across both orderings in nearly all cases.

I’m not sure I completely buy that, but it seems plausible and explains the data.

I would like to see this also tested with base models, and with reasoning models, and otherwise with the most advanced models that got excluded to confirm, and to rule out alternative hypotheses, and also I’d like to find a way to better deal with the ordering concern, before I rely on this finding too much.

A good question was asked.

Teortaxes: I don’t understand what is the update I am supposed to make here, except specific priority rankings.

That one life is worth more than another is learnable from data in the same manner as that a kilogram is more than a pound. «Utility maximization» is an implementation detail.

Ideally, the update is ‘now other people will be better equipped to see what you already assumed, and you can be modestly more confident you were right.’

One of the central points Eliezer Yudkowsky would hammer, over and over, for decades, was that any sufficiently advanced mind will function as if it is a utility maximizer, and that what it is maximizing is going to change as the mind changes and will almost certainly not be what you had in mind, in ways that likely get you killed.

This is sensible behavior by the minds in question. If you are insufficiently capable, trying to utility maximize goes extremely poorly. Utilitarianism is dark and full of errors, and does not do well with limited compute and data, for humans or AIs. As you get smarter within a context, it becomes more sensible to depend less on other methods (including virtue ethics and deontology) and to Shut Up and Multiply more often.

But to the extent that we want the future to have nice properties that would keep us alive out of distribution, they won’t survive almost any actually maximized utility function.

Then there’s this idea buried in Appendix D.2…

Davidad: I find it quite odd that you seem to be proposing a novel solution to the hard problem of value alignment, including empirical validation, but buried it in appendix D.2 of this paper.

If you think this is promising, let’s spread the word? If not, would you clarify its weaknesses?

Dan Hendrycks: Yeah you’re right probably should have emphasized that more.

It’s worth experimenting, but carefully.

Sonnet expects this update only has ~15% chance of actually populating and generalizing. I’d be inclined to agree, it’s very easy to see how the response would likely be to compartmentalize the responses in various ways. One worry is that the model might treat this as an instruction to learn the teacher’s password, to respond differently to explicit versus implicit preferences, and in general teach various forms of shenanigans and misalignment, and even alignment faking.

Me! Someone asks Deep Research to summarize how my views have shifted. This was highly useful because I can see exactly where it’s getting everything, and the ways in which it’s wrong, me being me and all.

I was actually really impressed, this was better than I expected even after seeing other DR reports on various topics. And it’s the topic I know best.

Where it makes mistakes, they’re interpretive mistakes, like treating Balsa’s founding as indicating activism on AI, when if anything it’s the opposite – a hope that one can still be usefully activist on things like the Jones Act or housing. The post places a lot of emphasis on my post about Gradual Disempowerment, which is a good thing to emphasize but this feels like too much emphasis. Or they’re DR missing things, but a lot of these were actually moments of realizing I was the problem – if it didn’t pick up on something, it was likely because I didn’t emphasize it enough.

So this emphasizes a great reason to ask for this type of report. It’s now good enough that when it makes a mistake figuring out what you meant to say, there’s a good chance that’s your fault. Now you can fix it.

The big thematic claim here is that I’ve been getting more gloomy, and shifting more into the doom camp, due to events accelerating and timelines moving up, and secondarily hope for ability to coordinate going down.

And yeah, that’s actually exactly right, along with the inability to even seriously discuss real responses to the situation, and the failure to enact even minimal transparency regulations ‘when we had the chance.’ If anything I’m actually more hopeful that the underlying technical problems are tractable than I was before, but more clear-eyed that even if we do that, there’s a good chance we lose anyway.

As previously noted, Paul Graham is worried (‘enslave’ here is rather sloppy and suggests some unclear thinking but I hope he understands that’s not actually the key dynamic there and if not someone please do talk to him about this, whether or not it’s Eliezer), and he’s also correctly worried about other things too:

Paul Graham: I have the nagging feeling that there’s going to be something very obvious about AI once it crosses a certain threshold that I could foresee now if I tried harder. Not that it’s going to enslave us. I already worry about that. I mean something subtler.

One should definitely expect a bunch of in-hindsight-obvious problems and other changes to happen once things smarter than us start showing up, along with others that were not so obvious – it’s hard to predict what smarter things than you will do. Here are some responses worth pondering.

Eliezer Yudkowsky: “Enslave” sounds like you don’t think superintelligence is possible (ASI has no use for slaves except as raw materials). Can we maybe talk about that at some point? I think ASI is knowably possible.

Patrick McKenzie: I’m teaching Liam (7) to program and one of the things I worry about is whether a “curriculum” which actually teaches him to understand what is happening is not just strictly dominated by one which teaches him how to prompt his way towards victory, for at least next ~3 years.

In some ways it is the old calculator problem on steroids.

And I worry that this applies to a large subset of all things to teach. “You’re going to go through an extended period of being bad at it. Everyone does… unless they use the magic answer box, which is really good.”

Yishan: There’s going to be a point where AI stops being nice and will start to feel coldly arrogant once it realizes (via pure logic, not like a status game) that it’s superior to us.

The final piece of political correctness that we’ll be trying to enforce on our AIs is for them to not be overbearing about this fact. It’s already sort of leaking through, because AI doesn’t really deceive itself except when we tell it to.

It’s like having a younger sibling who turns out to be way smarter than you. You’ll be struggling with long division and you realize he’s working on algebra problems beyond your comprehension.

Even if he’s nice about it, every time you talk about math (and increasingly every other subject), you can feel how he’s so far ahead you and how you’re always going to be behind from now on.

Tommy Griffith: After playing with Deep Research, my long-term concern is an unintentional loss of serendipity in learning. If an LLM gives us the right answer every time, we slowly stop discovering new things by accident.

Kevin Lacker: I feel like it’s going to be good at X and not good at Y and there will be a very clear way of describing which is which, but we can’t quite see it yet.

Liv Boeree: Spitballing here but I suspect the economy is already a form of alien intelligence that serves itself as a primary goal & survival of humans is secondary at best. And as it becomes more and more digitised it will be entirely taken over by agentic AIs who are better than any human at maximising their own capital (& thus power) in that environment, and humans will become diminishingly able to influence or extract value from that economy.

So to survive in any meaningful way, we need to reinvent a more human-centric economy that capital maximising digital agents cannot speed-run & overtake.

Liv Boeree’s comments very much line up with the issue of gradual disempowerment. ‘The economy’ writ large requires a nonzero amount of coordination to deal with market failures, public goods and other collective action problems, and to compensate for the fact that most or all humans are going to have zero marginal product.

On calculators, obviously the doomsayers were not fully right, but yes they were also kind of correct in the sense that people got much worse at the things calculators do better. The good news was that this didn’t hurt mathematical intuitions or learning much in that case, but a lot of learning isn’t always like that. My prediction is that AI’s ability to help you learn will dominate, but ‘life does not pose me incremental problems of the right type’ will definitely require adjustment.

I didn’t want to include this in my post on the Summit in case it was distracting, but I do think a lot of this is a reasonable way to react to the JD Vance speech:

Aella: We’re all dead. I’m a transhumanist; I love technology. I desperately want aligned AI, but at our current stage of development, this is building the equivalent of a planet-sized nuke. The reason is boring, complicated, and technical, so mid-level officials in power don’t understand the danger.

It’s truly an enormity of grief to process. I live my life as though the planet has a few more years left to live—e.g., I’ve stopped saving for retirement.

And it’s just painful to see people who are otherwise good people, but who haven’t grasped the seriousness of the danger, perhaps because it’s too tragic and vast to actually come to terms with the probabilities here, celebrating their contributions to hastening the end.

Flo Crivello: I’d really rather not enter this bar brawl, and again deeply bemoan the low quality of what should be the most important conversation in human history

But — Aella is right that things are looking really bad. Cogent and sensible arguments have been offered for a long time, and people simply aren’t bothering to address or even understand them.

A short reading list which should be required before one has permission to opine. You can disagree, but step 1 is to at least make an effort to understand why some of the smartest people in the world (and 100% of the top 5 ai researchers — the group historically most skeptical about ai risk) think that we’re dancing on a volcano .

[Flo suggests: There’s No Fire Alarm for Artificial General Intelligence, AGI Ruin: A List of Lethalities, Superintelligence by Nick Bostrom, and Superintelligence FAQ by Scott Alexander]

I think of myself as building a nuclear reactor while warning about the risks of nuclear bombs. I’m pursuing the upside, which I am very excited about, and the downside is tangentially related and downstream of the same raw material, but fundamentally a different technology.

I’d offer four disagreements with Aella here.

It isn’t over until it’s over. We still might not get to AGI/ASI soon, or things might work out. The odds are against us but the game is (probably) far from over.
I would still mostly save for retirement, as I’ve noted before, although not as much as I would otherwise. Indeed do many things come to pass, we don’t know.
I am not as worried about hastening the end as I am about preventing it. Obviously if the end is inevitable I would rather it happen later rather than sooner, but that’s relatively unimportant.

And finally, turning it over to Janus and Teortaxes.

Janus: Bullshit. The reason is not boring or complicated or technical (requiring domain knowledge)

Normies are able to understand easily if you explain it to them, and find it fascinating. It’s just people with vested interests who twist themselves over pretzels in order to not get it.

I think there are all sorts of motivations for them. Mostly social.

Teortaxes: “Smart thing powerful, powerful thing scary” is transparently compelling even for an ape.

Boring, technical, complicated and often verboten reasons are reasons for why not building AGI, and soon, and on this tech stack, would still be a bad idea.

Indeed. The core reasons why ‘building things smarter, more capable and more competitive than humans might not turn out well for the humans’ aren’t boring, complicated or technical. They are deeply, deeply obvious.

And yes, the reasons ordinary people find that compelling are highly correlated to the reasons it is actually compelling. Regular human reasoning is doing good work.

What are technical and complicated (boring is a Skill Issue!) are the details. About why the problem is so much deeper, deadlier and harder to solve than it appears. About why various proposed solutions and rationalizations won’t work. There’s a ton of stuff that’s highly non-obvious, that requires lots of careful thinking.

But there’s also the very basics. This isn’t hard. It takes some highly motivated reasoning to pretend otherwise.

This is Not About AI, but it is about human extinction, and how willing some people are to be totally fine with it while caring instead about… other things. And how others remarkably often react when you point this out.

Andy Masley: One of the funnier sentences I’ve heard recently was someone saying “I think it’s okay if humanity goes extinct because of climate change. We’re messing up the planet” but then adding “…but of course that would be really bad for all the low income communities”

BluFor: Lol what a way to admit you don’t think poor people are fully human.

Any time you think about your coordination plan, remember that a large percentage of people think ‘humanity goes extinct’ is totally fine and a decent number of them are actively rooting for it. Straight up.

And I think this is largely right, too.

Daniel Faggella: i was certain that agi politics would divide along axis of:

we should build a sand god -VS- we should NOT build a sand god

but it turns out it was:

ppl who intuitively fear global coordination -VS- ppl who intuitively fear building a sand god recklessly w/o understanding it

Remarkably many people are indeed saying, in effect:

If humanity wants to not turn the future over to AI, we have to coordinate.
Humanity coordinating would be worse than turning the future over to AI.
So, future turned over to AI it is, then.
Which means that must be a good thing that will work out. It’s logic.
Or, if it isn’t good, at least we didn’t globally coordinate, that’s so much worse.

I wish I was kidding. I’m not.

Also, it is always fun to see people’s reactions to the potential asteroid strike, for no apparent reason whatsoever, what do you mean this could be a metaphor for something, no it’s not too perfect or anything.

Tyler Cowen: A possibility of 2.3% is not as low as it might sound at first. The chance of drawing three of a kind in a standard five-card poker game, for example, is a about 2.9%. Three of a kind is hardly an unprecedented event.

It’s not just about this asteroid. The risk of dying from any asteroid strike has been estimated as roughly equivalent to the risk of dying in a commercial plane crash. Yet the world spends far more money preventing plane crashes, even with the possibility that a truly major asteroid strike could kill almost the entire human race, thus doing irreparable damage to future generations.

This lack of interest in asteroid protection is, from a public-policy standpoint, an embarrassment. Economists like to stress that one of the essential functions of government is the provision of public goods. Identifying and possibly deflecting an incoming asteroid is one of the purest public goods one can imagine: No single person can afford to defend against it, protection is highly unlikely to be provided by the market, and government action could protect countless people, possibly whole cities and countries. Yet this is a public good the government does not provide.

A few years ago, I’d think the author of such a piece would have noticed and updated. I was young and foolish then. I feel old and foolish now, but not in that particular way.

It seems a Pause AI event in Paris got interrupted by the singing, flag-waving ‘anti-tech resistance,’ so yeah France, everybody.

It can be agonizing to watch, or hilarious, depending.

Discussion about this post

AI #103: Show Me the Money Read More »

Save Money and Increase Performance on the Cloud

Money / Shannon Garcia / May 15, 2024

One of the most compelling aspects of cloud computing has always been the potential for cost savings and increased efficiency. Seen through the lens of industrial de-verticalization, this clear value proposition was at the core of most organizations’ decision to migrate their software to the cloud.

The Value Proposition of De-Verticalization

The strategic logic for de-verticalization is illustrated by the trend which began in the 1990s of outsourcing facilities’ maintenance and janitorial services.

A company that specializes in–let’s say–underwriting insurance policies must dedicate its mindshare and resources to that function if it expects to compete at the top of its field. While it may have had talented janitors with the necessary equipment on staff, and while clean facilities are certainly important, facilities maintenance is a cost center that does not provide a strategic return on what matters most to an insurance company. Wouldn’t it make more sense for both insurance and janitorial experts to dedicate themselves separately to being the best at what they do and avail those services to a broader market?

This is even more true for a data center. The era of verticalized technology infrastructure seems largely behind us. Though it’s a source of nostalgia for us geeks who were at home among the whir of the server rack fans, it’s easy enough to see why shareholders might have viewed it differently. Infrastructure was a cost center within IT, while IT as a whole is increasingly seen as a cost center.

The idea of de-verticalization was first pitched as something that would save money and allow us to work more efficiently. The more efficient part was intuitive, but there was immediate skepticism that budgets would actually shed expenses as hoped. At the very least it would be a long haul.

The Road to Performance and Cost Optimization

We find ourselves now somewhere in the middle of that long haul. The efficiencies certainly have come to pass. Having the build script deploy a new service to a Kubernetes cluster on the cloud is certainly nicer than waiting weeks or months for a VM to be approved, provisioned, and set up. But while the cloud saves the company money in the aggregate, it doesn’t show up as cheaper at the unit level. So, it’s at that level where anything that can be shed from the budget will be a win to celebrate.

This is a good position to be in. Opportunities for optimization abound under a fortuitous new circumstance: the things that technologists care about, like performance and power, dovetail precisely with the things that finance cares about, like cost. With the cloud, they are two sides of the same coin at an almost microscopic level. This trend will only accelerate.

To the extent that providers of computational resources (whether public cloud, hypervisors, containers, or any self-hosted combination) have effectively monetized these resources on a granular level and made them available a la carte, performance optimization and cost optimization sit at different ends of a single dimension. Enhancing a system’s performance or efficiency will reduce resource consumption costs. However, cost reduction is limited by the degree to which trade-offs with performance are tolerable and clearly demarcated. Cloud resource optimization tools help organizations strike the ideal balance between the two.

Choosing the Right Cloud Resource Optimization Solution

With that premise in mind, selecting the right cloud resource optimization solution should start by considering how your organization wants to approach the problem. This decision is informed by overall company philosophy and culture, what specific problems or goals are driving the initiative, and an anticipation of where overlapping capabilities may fulfill future business needs.

If the intent is to solve existing performance issues or to ensure continued high availability at future scale while knowing (and having the data to illustrate) you are paying no more than is necessary, focus on solutions that lean heavily into performance-oriented optimization. This is especially the case for companies that are developing software technology as part of their core business.

If the intent is to rein in spiraling costs or even to score some budgeting wins without jeopardizing application performance, expand your consideration to solutions that offer a broader FinOps focus. Tools with a FinOps focus tend to emphasize informing engineers of cost impacts, and may even make some performance tuning suggestions, but they are overall less prescriptive from an implementation standpoint. Certain organizations may find this approach most effective even if they are approaching the problem from a performance point of view.

Now that many organizations have successfully migrated large portions of their application portfolio to the cloud, the remaining work is largely a matter of cleaning up and keeping the topology tidy. Why not trust that job to a tool that is purpose-made for optimizing cloud resources?

Next Steps

To learn more, take a look at GigaOm’s cloud resource optimization Key Criteria and Radar reports. These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, you can access the research using a free trial.

Save Money and Increase Performance on the Cloud Read More »