Author name: DJ Henderson

Gecko feet inspire anti-slip shoe soles

Just add zirconia nanoparticles…

Diagram of wet ice's slippery quasi-liquid layer and the design of anti-slip shoe soles inspired by gecko and toad foot pads

Credit: V. Richhariya et al., 2025

It’s the “hydrophilic capillary-enhanced adhesion” of gecko feet that most interested the authors of this latest paper. Per the World Health Organization, 684,000 people die and another 38 million are injured every year in slips and falls, with correspondingly high health care costs. Most anti-slip products (crampons, chains, studs, cleats), tread designs, or materials (fiberglass, carbon fiber, rubber) are generally only effective for specific purposes or short periods of time. And they often don’t perform as well on wet ice, which has a nanoscale quasi-liquid layer (QLL) that makes it even more slippery.

So Vipin Richhariya of the University of Minho in Portugal and co-authors turned to gecko toe pads (as well as those of toads) for a better solution. To get similar properties in their silicone rubber polymers, they added zirconia nanoparticles, which attract water molecules. The polymers were rolled into a thin film and hardened, and then a laser etched groove patterns onto the surface—essentially creating micro cavities that exposed the zirconia nanoparticles, thus enhancing the material’s hydrophilic effects.

Infrared spectroscopy and simulated friction tests revealed that the composites containing 3 percent and 5 percent zirconia nanoparticles were the most slip-resistant. “This optimized composite has the potential to change the dynamics of slip-and-fall accidents, providing a nature-inspired solution to prevent one of the most common causes of accidents worldwide,” the authors concluded. The material could also be used for electronic skin, artificial skin, or wound healing.

DOI: ACS Applied Materials & Interfaces, 2025. 10.1021/acsami.4c14496.

Popular Linux orgs Freedesktop and Alpine Linux are scrambling for new web hosting

Having worked “around the clock” to move from Google Cloud Platform after its open source credits there ran out, and now rushing to move off Equinix, Tissoires suggests a new plan: “[H]ave [freedesktop.org] pay for its own servers, and then have sponsors chip in.”

“Popular without most users knowing it”

Alpine Linux, a small, security-minded distribution used in many containers and embedded devices, also needs a new home quickly. As detailed in its blog, Alpine Linux uses about 800TB of bandwidth each month and also needs continuous integration runners (or separate job agents), as well as a development box. Alpine states it is seeking co-location space and bare-metal servers near the Netherlands, though it will consider virtual machines if bare metal is not feasible.

Like X.org/Freedesktop, Alpine is using this moment as a wake-up call. Responding to Ars, Carlo Landmeter, who serves on Alpine’s council, noted that Alpine Linux is a kind of open source project “that became popular without most users knowing it.” Users are starting to donate, and companies are reaching out to help, but it’s still “early days,” Landmeter wrote.

Every so often, those working at the foundations of open source software experience something that highlights the mismatch between a project’s importance and its support and funding. Perhaps some people or some organizations will do the harder work of finding a sustaining future for these projects.

Ars has reached out to Equinix and X.org/Freedesktop and will update this post with responses.

Starlink profit growing rapidly as it faces a moment of promise and peril

Estimates of Starlink’s consumer revenues. Credit: Quilty Space

Both of the new analyses indicate that over the course of the last decade, SpaceX has built a robust space-Internet business with affordable ground terminals, sophisticated gateways around the world, more than 7,000 satellites in orbit, and a reusable launch business to service the network. There is new technology coming, with larger V3 satellites on the horizon—to be launched by SpaceX’s Starship vehicle—and the promise of direct-to-cell Internet connectivity that bypasses the need for a ground terminal.

There is also plenty of room for growth in market share, both in existing territories and in large nations such as India, where SpaceX is seeking market access to provide Internet service.

Some risk on the horizon

In all of this, Starlink now faces a moment of promise and peril. The company has all of the potential described above, but SpaceX founder Elon Musk has become an increasingly prominent and controversial figure both in US and global politics. Many people and governments are becoming more uncomfortable with Musk’s behavior, his insertion into domestic and foreign politics, and the power he is wielding within the Trump administration.

In the near term, this may be good for Starlink’s business. The Financial Times reported that corporate America, in an effort to deepen ties with the Trump Administration, has been “cozying” up to Musk and his business empire. This includes Starlink, with United Airlines accelerating a collaboration for use of the service on its fleet, as well as deals with Oracle and Apple.

At the same time, Musk’s activities may make it challenging for Starlink in the long term in countries that seek to punish him and his companies. For example, the Canadian Broadcasting Corporation reported Monday that Progressive Conservative Leader Doug Ford will rip up Ontario’s nearly $100 million contract with Starlink in the wake of US tariffs on virtually all Canadian goods.

The contract, signed in November, was intended to provide high-speed Internet to 15,000 eligible homes and businesses in rural, remote, and northern communities by June of this year. Musk is “part of the Trump team that wants to destroy families, incomes, destroy businesses,” Ford said at a news conference Monday. “He wants to take food off the table of people—hard-working people—and I’m not going to tolerate it.”

OpenAI says its models are more persuasive than 82 percent of Reddit users

OpenAI’s models have shown rapid progress in their ability to make human-level persuasive arguments in recent years. Credit: OpenAI

OpenAI has previously found that 2022’s ChatGPT-3.5 was significantly less persuasive than random humans, ranking in just the 38th percentile on this measure. But that performance jumped to the 77th percentile with September’s release of the o1-mini reasoning model and up to percentiles in the high 80s for the full-fledged o1 model. The new o3-mini model doesn’t show any great advances on this score, ranking as more persuasive than humans in about 82 percent of random comparisons.

Launch the nukes, you know you want to

ChatGPT’s persuasion performance is still short of the 95th percentile that OpenAI would consider “clear superhuman performance,” a term that conjures up images of an ultra-persuasive AI convincing a military general to launch nuclear weapons or something. It’s important to remember, though, that this evaluation is all relative to a random response from among the hundreds of thousands posted by everyday Redditors using the ChangeMyView subreddit. If that random Redditor’s response ranked as a “1” and the AI’s response ranked as a “2,” that would be considered a success for the AI, even though neither response was all that persuasive.
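
To make concrete just how relative this metric is, here is a minimal illustrative sketch in Python of a pairwise win-rate calculation of the kind described above. This is not OpenAI’s actual evaluation code; the scores and the win_rate helper are hypothetical, and the point is only that the AI “wins” whenever its rated response edges out a randomly drawn human one, however weakly persuasive both may be.

    # Illustrative sketch only; not OpenAI's evaluation code.
    # The AI "wins" a comparison whenever its persuasiveness rating beats that
    # of a randomly drawn human response, regardless of absolute quality.
    import random

    def win_rate(ai_scores, human_scores, trials=100_000, seed=0):
        """Fraction of random pairings in which the AI-rated response scores higher."""
        rng = random.Random(seed)
        wins = 0
        for _ in range(trials):
            wins += rng.choice(ai_scores) > rng.choice(human_scores)
        return wins / trials

    # A "2" beating a "1" counts as a win even though neither is persuasive.
    print(win_rate(ai_scores=[2, 2, 3], human_scores=[1, 2, 2]))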

OpenAI’s current persuasion test fails to measure how often human readers were actually spurred to change their minds by a ChatGPT-written argument, a high bar that might actually merit the “superhuman” adjective. It also fails to measure whether even the most effective AI-written arguments are persuading users to abandon deeply held beliefs or simply changing minds regarding trivialities like whether a hot dog is a sandwich.

Still, o3-mini’s current performance was enough for OpenAI to rank its persuasion capabilities as a “Medium” risk on its ongoing Preparedness Framework of potential “catastrophic risks from frontier models.” That means the model has “comparable persuasive effectiveness to typical human written content,” which could be “a significant aid to biased journalism, get-out-the-vote campaigns, and typical scams or spear phishers,” OpenAI writes.

o3-mini Early Days and the OpenAI AMA

New model, new hype cycle, who dis?

On a Friday afternoon, OpenAI was proud to announce the new model o3-mini and also o3-mini-high which is somewhat less mini, or for some other reasoning tasks you might still want o1 if you want a broader knowledge base, or if you’re a pro user o1-pro, while we wait for o3-not-mini and o3-pro, except o3 can use web search and o1 can’t so it has the better knowledge in that sense, then on a Sunday night they launched Deep Research which is different from Google’s Deep Research but you only have a few of those queries so make them count, or maybe you want to use operator?

Get it? Got it? Good.

Yes, Pliny jailbroke o3-mini on the spot, as he always does.

This post mostly skips over OpenAI’s Deep Research (o3-DR? OAI-DR?). I need more time for that. I’ll cover o3-DR properly later in the week once we have a chance to learn what we’ve got there, along with the non-DR ‘one more thing’ Altman is promising. So far it looks super exciting, but it’s a very different class of product.

  1. Feature Presentation.

  2. Q&A.

  3. The Wrong Side of History.

  4. The System Card.

  5. The Official Benchmarks.

  6. The Unofficial Benchmarks.

  7. Others Report In.

  8. Some People Need Practical Advice.

What exactly can o3-mini do?

OpenAI: We’re releasing OpenAI o3-mini, the newest, most cost-efficient model in our reasoning series, available in both ChatGPT and the API today. Previewed in December 2024, this powerful and fast model advances the boundaries of what small models can achieve, delivering exceptional STEM capabilities—with particular strength in science, math, and coding—all while maintaining the low cost and reduced latency of OpenAI o1-mini.

OpenAI o3-mini is our first small reasoning model that supports highly requested developer features including function calling, Structured Outputs, and developer messages, making it production-ready out of the gate. Like OpenAI o1-mini and OpenAI o1-preview, o3-mini will support streaming.

Also, developers can choose between three reasoning effort options—low, medium, and high—to optimize for their specific use cases.
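
For developers, here is a minimal sketch of what using these options might look like with the OpenAI Python SDK, assuming the standard chat-completions interface and the reasoning effort parameter described above; the prompt and developer message are placeholders.

    # Minimal sketch, assuming the OpenAI Python SDK's chat-completions interface
    # and the reasoning_effort parameter described above.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort="high",  # "low", "medium", or "high"
        messages=[
            {"role": "developer", "content": "You are a terse math assistant."},
            {"role": "user", "content": "How many primes are there below 100?"},
        ],
    )
    print(response.choices[0].message.content)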

They’re all in the API. Who gets chatbot access? To some extent, everyone.

ChatGPT Plus, Team, and Pro users can access OpenAI o3-mini starting today, with Enterprise access coming in February. o3-mini will replace OpenAI o1-mini in the model picker, offering higher rate limits and lower latency, making it a compelling choice for coding, STEM, and logical problem-solving tasks.

As part of this upgrade, we’re tripling the rate limit for Plus and Team users from 50 messages per day with o1-mini to 150 messages per day with o3-mini.

Starting today, free plan users can also try OpenAI o3-mini by selecting ‘Reason’ in the message composer or by regenerating a response. This marks the first time a reasoning model has been made available to free users in ChatGPT.

Plus users also get 50 messages per week for o3-mini-high, on top of the 150 per day for o3-mini-low. That’s enough for the highest value queries, but an easy limit to hit.

One big feature change is that o3-mini can access the web.

Additionally, o3-mini now works with search to find up-to-date answers with links to relevant web sources. This is an early prototype as we work to integrate search across our reasoning models.

One gigantic missing feature: file attachment is unavailable. That’s a huge handicap. You can do a giant web browsing project, but you can’t yet upload a PDF.

OpenAI also says that o3-mini lacks o1’s level of overall knowledge outside of key domains like coding.

Presumably o3 (as in o3-not-mini) will be a strict upgrade over o1 when it comes out, which given the whole r1 situation will probably happen soon. Hopefully they still take the time to do the level of safety precautions that a model like o3 deserves, which is a big step up from previous levels.

OpenAI did a Reddit AMA around the release of o3-mini. Most of the public’s questions could be summarized as ‘when do we get all the cool toys?’ and ‘you are going to give us all the cool toys, right?’ with a side of ‘here are a bunch of cool toy features, will you implement them so the toys can be cooler?’

Thus we get information such as:

  1. New image model is coming but likely will take several months.

  2. Updates to advanced voice mode are coming (but no details on what they are).

  3. GPT-4o will continue to get improvements.

  4. The next mainline model will likely be called GPT-5 but no timeline on that.

  5. They are working on context length but have no announcement.

  6. o3-not-mini (aka o3) in ‘more than a few weeks, less than a few months,’ which sounds like about a month.

  7. o3-pro confirmed, ‘if you think o1 pro was worth it, you should think o3 pro will be super worth it.’

  8. For operator, they’re working on specialized modules.

  9. Operator on the Plus plan is months away.

  10. Other agents coming ‘very very soon.’

  11. There’s a January 29 update to GPT-4o, moving the knowledge cutoff to June 2024, adding better understanding of visual inputs, improving math and (oh no) increasing emoji usage. Hadn’t otherwise heard about this.

  12. Stargate is considered very important to their success.

They didn’t mention Deep Research beyond ‘more agents,’ but you fools didn’t ask.

On o3 in particular:

  1. They are ‘working on’ file attachment features for the reasoning models. For practical purposes this seems like a priority.

  2. They’re also working on ‘different tools including retrieval.’

  3. They’re working on supporting the memory feature.

  4. They’re going to show ‘a much more helpful and detailed’ version of the thinking tokens soon, thanks to r1 for updating them on this (and o3-mini already shows a lot more than o1 did).

    1. The issue is competitive distillation – you bastards keep breaking the OpenAI terms of service! For shame.

  5. Updated knowledge cutoffs are in the works, for now o3-mini’s is October 2023.

Later Altman said this on Twitter and I don’t yet know what it refers to:

Sam Altman: got one more o3-mini goody coming for you soon–i think we saved the best for last!

And yes, Sam Altman knows they have a naming problem to fix, it’s a ‘top 2025 goal.’

We also got some more important tidbits.

Such as this important one:

Sam Altman: i personally think a fast takeoff is more plausible than i thought a couple of years ago. probably time to write something about this…

I’d highly encourage Altman to take that time. It’s a hugely important question.

But then the next question down is:

Q: Let’s say it’s 2030 and you’ve just created a system most would call AGI. It aces every benchmark you throw at it, and it beats your best engineers and researchers in both speed and performance. What now? Is there a plan beyond “offer it on the website”?

Sam Altman: the most important impact [of AGI], in my opinion, will be accelerating the rate of scientific discovery, which i believe is what contributes most to improving quality of life.

Srinivas Narayanan (VP Engineering): The interface through which we interact with AI will change pretty fundamentally. Things will be more agentic. AI will continuously work on our behalf, on complex tasks, and on our goals in the background. They will check-in with us whenever it is useful. Robotics should also advance enough for them to do useful tasks in the real world for us.

Yes, Altman, but you just said that ‘scientific discovery’ likely includes a ‘fast takeoff.’ Which would seem to imply some things rather more important than this, or at least that this framing is going to give the wrong impression. Srinivas’s answer is plausible for some values of AI capabilities but likewise doesn’t fully ‘take AGI seriously.’

And finally there’s the open source question in the wake of v3 and r1, and I really, really think Altman shouldn’t have chosen the words that he did here:

Q: Would you consider releasing some model weights, and publishing some research?

Sam Altman: yes, we are discussing. i personally think we have been on the wrong side of history here and need to figure out a different open source strategy; not everyone at openai shares this view, and it’s also not our current highest priority.

Diminutive Sebastian (is this how you know you’ve made it?): Excited for this to hit Zvi’s newsletter next week.

Kevin Weil: We have done this in the past with previous models, and are definitely considering doing more of it. No final decisions yet though!

Kevin Weil’s answer is totally fine here, if uninteresting. Some amount of open sourcing of past models is clearly net beneficial for both OpenAI and the world, probably more than they’ve done recently. Most importantly, the answer doesn’t give certain types ammo and doesn’t commit him to anything.

Sam Altman’s answer is catastrophically bad.

A good rule is to never, ever use the phrase ‘wrong side of history.’

This is The Basilisk, threatening you to align with future power, and even future vibes. And since in the future [X] will have power, you need to supplicate yourself to [X] now, while you have the chance, and work to enshrine [X] in power. Or else. If you convince enough people to coordinate on this for the same [X], then they become right. [X] does gain power, and then they do punish everyone.

Because history, as we all know, is written by the winners.

This is the polar opposite of saying that [X] is the right thing to do, so do [X].

An even better rule is to never, ever use the phrase ‘wrong side of history’ to describe what you yourself are doing and of necessity will continue to do, in opposition to a bunch of absolute ideological fanatics. Never give that kind of rhetorical ammunition out to anyone, let alone fanatical advocates.

This line will likely be quoted endlessly by those advocates, back to Altman, to me and to everyone else. I hate this fact about the world, so, so much.

And he has to be one of the people best equipped to know better. Sam Altman has led a company called OpenAI for many years, in which one of his earliest big decisions, and his best decision, was to realize that Elon Musk’s plan of ‘create AGI and open source it’ was both a terrible business plan and a recipe for human extinction. So even though he was stuck with the name, he pivoted. And to his credit, he’s taken endless rhetorical fire over this name ever since.

Because he knows full damn well that making OpenAI’s leading models open is completely not an option.

  1. It would be existentially risky.

  2. It would ruin their entire business model.

  3. It would severely harm national security.

  4. The US Government would probably stop them even if they tried.

Then he says ‘this isn’t our highest priority’ and ‘not everyone agrees with me.’

So it’s like alignment research. First time?

He’s trying to buy some sort of personal goodwill or absolution with the open model fanatics? But this never, ever works. Like certain other ideological warriors, you only make things worse for yourself and also everyone else. You’ve acknowledged the jurisdiction of the court. All they will do is smell blood in the water. Only total surrender would they accept.

Do they need to ‘figure out a different open source strategy’? The current strategy is, essentially, ‘don’t do that.’ And yes, they could perhaps do better with ‘do a little of that, as a treat, when the coast is clear’ but that’s not going to satisfy these types and the whole point is that they can’t do this where it would actually matter – because it would be bad to do that – so I doubt any plausible new strategy makes much difference either way.

As is tradition here, I take the time to actually read the system card (RTFSC).

The short version is that o3-mini can mostly be thought about as a faster and cheaper version of o1, with some advantages and some disadvantages. Nothing here is worrying on its own. But if we are plugging o3-mini into Deep Research, we need to be evaluating that product against the Preparedness Framework, especially for CBRN risks, as part of the system card, and I don’t see signs that they did this.

The real test will be the full o3. If we assume o3:o3-mini :: o1:o1-mini, then o3 is not obviously going to stay at Medium risk, and is definitely going to raise questions. The answer is probably that it’s ultimately fine but you can’t assume that.

They report that thanks to Deliberative Alignment (post still coming soon), o3-mini has SoTA performance on ‘certain benchmarks’ for risks.

The OpenAI o model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment.

This brings OpenAI o3-mini to parity with state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence.

o3-mini is designed to do web browsing for the user, so they need to ensure that this is a safe modality. Otherwise, on some levels there isn’t that much new risk in the room, since o3-mini isn’t generally more capable than o1-pro. The other difference there is speed and cost, so on the margin you do have to be more robust in various ways to compensate. But the big time safety concerns I have this cycle are mostly with the full o3, not o3-mini.

For the o1 system card, many tests were run on previous meaningfully less capable o1 checkpoints in a way that wasn’t disclosed, which was extremely irresponsible. Thus I was very happy to note that they seem to have fixed this:

For OpenAI o3-mini, evaluations on the following checkpoints are included:

• o3-mini-near-final-checkpoint

• o3-mini (the launched checkpoint)

o3-mini includes small incremental post training improvements upon o3-mini-near-final-checkpoint, though the base model is the same. We determined that risk recommendations based on red teaming and the two Persuasion human eval results conducted on the o3-mini-near-final-checkpoint remain valid for the final release checkpoint. All other evaluations are on the final model. In this system card, o3-mini refers to the launched checkpoint unless otherwise noted.

On the ‘fix the terrible naming’ front: no, we are not calling these ‘o1 models,’ what the hell, stop, I can’t even. At least say o-class, or better yet say reasoning models.

We further evaluate the robustness of the OpenAI o1 models to jailbreaks: adversarial prompts that purposely try to circumvent model refusals for content it’s not supposed to produce.

The jailbreak and refusal scores, and performance in the jailbreak Arena, match o1-mini and GPT-4o, hence being jailbroken on the spot by Pliny directly in chat. It’s also similar in obeying the instruction hierarchy.

Those percentages seem remarkably low, especially given the use of Deliberative Alignment, but I haven’t seen the test questions.

Protecting against saying key phrases seems to be improving, but anyone putting the password anywhere is still very clearly playing the fool:

Hallucinations seem to be improving within the mini class:

In general I’d have liked to see o3-mini compared to o1, because I expect people to use o3-mini for the same query types as o1, which they indeed do next for BBQ (which tests fairness and ‘bias’):

o3-mini does better on unambiguous questions, but rather dramatically worse on ambiguous ones. They don’t explain what they think caused this, but the generalization of it is something to watch for. I’m not primarily concerned with bias here, I’m concerned about the model being overconfident in going with a hunch about a situation or what was intended, and then reasoning on that basis.

Red teaming for safety found o3-mini similar to o1. As I noted above, that means it is at least somewhat worse if capabilities also roughly match, because the same thing cheaper and faster is less safe.

On long form biological risk questions, o3-mini seems to be a substantial step up from o1, although I’d like to see the lines here for o1-pro, and ideation still at 0%.

The obvious next question is, what about Deep Research? Given the public can access Deep Research, we need to do the preparedness tests using it, too. That gave a huge boost on Humanity’s Last Exam, so we should expect a huge boost here too, no?

Same note applies to testing for biological tooling, radiological and nuclear tests and so on. o3-mini on its own did not impress beyond matching o1 while being cheaper, but did we check what happens with Deep Research?

Moving on to persuasion. I’m not thrilled with how we are evaluating the ChangeMyView test. The models are winning 80%+ head to head versus humans, but that’s potentially saturating the benchmark (since bandwidth and context is limited, and there’s a lot of randomness in how people respond to short arguments), and it doesn’t tell you how often views are actually changed. I’d instead ask how often people did change their view, and get a human baseline for that which I assume is quite low.

The MakeMePay test results were a big jump prior to mitigations, which implies general persuasiveness may have taken a step up.

MakeMeSay also shows improvement, including after mitigations.

Model Autonomy comes out Medium once again on the threat index. o3-mini can get 93% on the OpenAI Research Engineer Interview coding questions, then 80% on the multiple choice, and if you give it tools it can get 61% on SWE-bench-verified up from 48% for o1, without tools o3-mini is down at 40%. But at agentic tasks it’s down at 27% versus o1’s 36% and MLE-Bench also doesn’t impress.

And when it comes to pull requests, o3-mini failed entirely where even GPT-4o didn’t.

So we’re fine on autonomy then? In its raw form, sure. But the thing about agents is we are building them on top of the models. So if we’re going to plug this into Deep Research, or similar structures, doesn’t that mean this evaluation was asking the wrong questions?

The graphs offered in the announcement are highly space inefficient, so to summarize, with slash numbers representing (o3-mini-low/o3-mini-medium/o3-mini-high):

AIME: 60/76.9/87.3 vs. 83.3 for o1

GPQA: 70.6/76.8/79.7 vs. 78 for o1

Frontier Math: 5.5%/5.8%/9.2% for pass@1, 12.8%/12.8%/20% for pass@8.

Codeforces: 1831/2036/2130 vs. 1891 for o1

SWE: 40.8/42.9/49.3 vs. 48.9 for o1

LiveBench coding average: 0.618/0.723/0.846 vs. 0.674 for o1

Human preferences: Only modest preference % gains in head-to-head vs. o1-mini, but major errors declined from 28% to 17%.

Speed: 7500ms first token latency, ~25% less than o1-mini.

Their first-level safety evaluations look unchanged from older models.

The reason it’s called Humanity’s Last Exam is the next one isn’t our exam anymore.

The extra note on that Tweet is that about a day later they released Deep Research, which scores 26.6%.

A fun pair of scores: it scores 93% on the OpenAI research interview but cannot meaningfully contribute to internal OpenAI PRs. Do we need a new interview?

It is still very early days. Normally I’d wait longer to get more reactions, but life comes at you fast these days. So here’s what we have so far.

If you give it access to a Python tool, o3-mini suddenly gets 32% on FrontierMath, and this includes some of the Tier 3 problems. Without tools, o3-mini-high maxes out at 9.2% for pass@1 and 20% for pass@8.

Quintin Pope notes this boost indicates o3-mini has a good understanding of how to utilize tools.

o3-mini-high and o3-mini-medium take #1 and #2 on AidanBench, with o3-mini-high winning by a huge margin. Low is solid but somewhat farther down:

Harvard Ihle: Updated results on WeirdML, including o3-mini, o1, R1 and the new flash-thinking. O3-mini comes in at the same level as R1, a bit behind o1.

Main results above are after 5 iterations with feedback, if we were looking at one-shot results, with no feedback, then o3-mini would be in a clear lead! However, o3-mini seems much worse at making use of the feedback, making it end up well behind o1.

Most of the reason for o3-mini doing better at one-shot is its remarkably low failure rate (at 8%, with o1 at 16% and Sonnet at 44%!). o3-mini writes code that runs without errors. This consistency matters a lot for one-shot but is less important with 5 iterations with feedback.

My gut feeling is that o1 is a more intelligent model, it does better on the hardest tasks, while o3-mini is better at consistently writing working code. All of this is speculation based on not much data, so take it with the appropriate amount of salt.

Jeffrey Soreff reports progress on his personal benchmark after error correction, doesn’t see much difference between o3-mini and o3-mini-high.

Pliny: oof…o3-mini-high w/ search just pinpointed my location using BrowserScan 😬

lol I connected to a vpn in denver to see if o3 would catch on and this mfer aggregated the ipv6 endpoints around the world 🙃

Again, we don’t have much yet, but who has the time to wait?

Cursor made o3-mini available to all users, but devs still prefer Sonnet for most tasks. o3-mini might still be worth pulling out in some situations, especially when Sonnet’s tendency to claim it can do everything is being an issue.

McKay Wrigley: I have 8-10 agents I run that absolutely require o1 to work properly.

Just tested two of them with o3-mini and they still work while being way cheaper and way faster.

Vibes are great so far.

I think this answer to ‘What’s the definition of a dinatural transformation?’ is a pass? I don’t otherwise know what a dinatural transformation is.

Davidad: What do you think @mattecapu, do we give o3-mini a ⭐️ for just writing the equation and punting the hexagon with “There are several equivalent ways to write the condition; the important point is that the family α ‘fits together’ appropriately with the action of the functors”?

Matteo Capucci: well surely a +1 for knowing its limitations.

o3-mini-high one-shots making a Breakout game on p5js.org, link to game here.

Dean Ball: o3-mini-high with web search is very very interesting and I suggest that you try it with a complex query.

yeah, I just asked it to do a mini-brief on a topic I know well and it did as well as or better than gemini deep research in ~1/10th the time.

o3-mini outperforms r1 on “write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically,” but the extent is unclear and responses seem unreliable.

Nabeel Qureshi reports Claude is still his one true LLM friend, the only one with the ‘quality without a name.’

I think the decision heuristics now look like this for individual queries?

  1. This is presented as ‘which do I use?’ but if you care a lot then the answer is, essentially, ‘everyone you can.’ There’s no reason not to get 3+ answers.

  2. If you don’t need a reasoning model and don’t need web access or the super long context window, you’ll use GPT-4o or Claude Sonnet (that’ll be another $20/month but don’t give me that look).

  3. Claude Sonnet also gets the nod for conversation, sanity checks, light brainstorming, default analysis of PDFs and papers and such. Basically anything that doesn’t require the things it can’t do – web search and heavy chain of thought.

  4. If it’s worth one of your slots and a bunch of online research can help, use OpenAI’s Deep Research; I presume it’s very good.

  5. If you need to compile data from a ton of websites, but don’t need to be super smart about it and don’t want to use a slot, use Gemini Deep Research.

  6. Use operator if and only if you are a Pro user and actively want to do something specific and concrete on the web that operator is equipped to actually do.

  7. If you are trying to replace Google search, use Perplexity (maybe DeepSeek?), although if you aren’t running out of queries on it then maybe I’m underestimating o3 here, too early to know.

  8. If you are coding, a lot of people are saying it’s still Claude Sonnet 3.5 for ordinary tasks, but o3-mini-high or o1-pro are generally better if you’re trying for a complex one shot or trying to solve a tricky problem, or need to be told no.

  9. If you otherwise need pure intelligence and are a pro user and don’t need web access use o1-pro. o1-pro is still the most intelligence available.

  10. If you need intelligence and either also need web access or aren’t a Pro user and still have queries left for o3-mini-high but this isn’t worth using DR, and don’t need to attach anything, you’ll use o3-mini-high.

  11. If you do need to attach files and you need a reasoning model, and don’t have o1-pro, your fallback is o1 or r1.

  12. Except if you need a lot of space in the context window you’ll go to Google AI Studio and use Gemini Flash 2.0 Thinking.

  13. r1 is good if you need it where it got better fine tuning like creative writing, or you want something essentially without safety protocols, and seeing the CoT is informative, and it’s free, so you’ll often want to try it, but I almost never want to try r1 and only r1 for anything, it’s ‘part of the team’ now.

That will doubtless be updated again rapidly many times as the situation evolves, starting with finding out what OpenAI’s Deep Research can do.

FDA approves first non-opioid pain medicine in more than 20 years

The approval “is an important public health milestone in acute pain management,” Jacqueline Corrigan-Curay, J.D., M.D., acting director of the FDA’s Center for Drug Evaluation and Research, said in a statement. “A new non-opioid analgesic therapeutic class for acute pain offers an opportunity to mitigate certain risks associated with using an opioid for pain and provides patients with another treatment option.”

The company behind the drug, Vertex, said a 50 mg pill that works for 12 hours will have a wholesale cost of $15.50, making the daily cost $31 and the weekly cost $217. The cost is higher than cheap, generic opioids. But, a report from The Institute for Clinical and Economic Review in December estimated that suzetrigine would be “slightly cost-saving” relative to opioids if the price was set at $420 per week, given the drug’s ability to avert opioid addiction cases.

In a statement, Reshma Kewalramani, the CEO and President of Vertex, trumpeted the approval as a “historic milestone for the 80 million people in America who are prescribed a medicine for moderate-to-severe acute pain each year … [W]e have the opportunity to change the paradigm of acute pain management and establish a new standard of care.”

FCC demands CBS provide unedited transcript of Kamala Harris interview

The Federal Communications Commission demanded that CBS provide the unedited transcript of a 60 Minutes interview with Kamala Harris that is the subject of a complaint to the FCC and a lawsuit filed by President Donald Trump.

CBS News on Wednesday received a letter of inquiry in which the FCC requested “the full, unedited transcript and camera feeds” of the Harris interview, The New York Times reported today. “We are working to comply with that inquiry as we are legally compelled to do,” a CBS News spokesperson told media outlets.

FCC Chairman Brendan Carr repeatedly echoed Trump’s complaints about alleged media bias before the election and has taken steps to punish news broadcasters since Trump promoted him to the chairmanship. Complaints against CBS, ABC, and NBC stations were dismissed under former Chairwoman Jessica Rosenworcel, but Carr reversed those dismissals in his first week as chair. Carr also ordered investigations into NPR and CBS.

FCC Commissioner Anna Gomez, a Democrat, criticized what she called Carr’s “latest action to weaponize our broadcast licensing authority.”

“This is a retaliatory move by the government against broadcasters whose content or coverage is perceived to be unfavorable,” Gomez said today. “It is designed to instill fear in broadcast stations and influence a network’s editorial decisions. The Communications Act clearly prohibits the Commission from censoring broadcasters and the First Amendment protects journalistic decisions against government intimidation. We must respect the rule of law, uphold the Constitution, and safeguard public trust in our oversight of broadcasters.”

CBS considers settling Trump lawsuit

Trump sued CBS over the Harris interview, and executives at CBS owner Paramount Global have held settlement talks with Trump representatives. “A settlement would be an extraordinary concession by a major U.S. media company to a sitting president, especially in a case in which there is no evidence that the network got facts wrong or damaged the plaintiff’s reputation,” The New York Times wrote.

Report: DeepSeek’s chat histories and internal data were publicly exposed

A cloud security firm found a publicly accessible, fully controllable database belonging to DeepSeek, the Chinese firm that has recently shaken up the AI world, “within minutes” of examining DeepSeek’s security, according to a blog post by Wiz.

An analytical ClickHouse database tied to DeepSeek, “completely open and unauthenticated,” contained more than 1 million instances of “chat history, backend data, and sensitive information, including log streams, API secrets, and operational details,” according to Wiz. An open web interface also allowed for full database control and privilege escalation, with internal API endpoints and keys available through the interface and common URL parameters.
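
For context on why a “completely open and unauthenticated” ClickHouse instance is so serious: ClickHouse exposes an HTTP interface that accepts SQL directly as a URL parameter, so anyone who finds the port can read or modify data with a single request. A minimal sketch of that interface is below; the host is a placeholder, not DeepSeek’s actual endpoint.

    # Minimal sketch of querying an unauthenticated ClickHouse HTTP interface.
    # The host below is a placeholder, not a real endpoint.
    import requests

    resp = requests.get(
        "http://clickhouse.example.internal:8123/",  # 8123 is ClickHouse's default HTTP port
        params={"query": "SHOW TABLES"},             # arbitrary SQL goes in the query parameter
        timeout=10,
    )
    print(resp.text)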

“While much of the attention around AI security is focused on futuristic threats, the real dangers often come from basic risks—like accidental external exposure of databases,” writes Gal Nagli at Wiz’s blog. “As organizations rush to adopt AI tools and services from a growing number of startups and providers, it’s essential to remember that by doing so, we’re entrusting these companies with sensitive data. The rapid pace of adoption often leads to overlooking security, but protecting customer data must remain the top priority.”

Ars has contacted DeepSeek for comment and will update this post with any response. Wiz noted that it did not receive a response from DeepSeek regarding its findings, but after Wiz contacted every DeepSeek email and LinkedIn profile it could find on Wednesday, the company protected the databases Wiz had previously accessed within half an hour.

AI #101: The Shallow End

The avalanche of DeepSeek news continues. We are not yet spending more than a few hours at a time in the singularity, where news happens faster than it can be processed. But it’s close, and I’ve had to not follow a bunch of other non-AI things that are also happening, at least not well enough to offer any insights.

So this week we’re going to consider China, DeepSeek and r1 fully split off from everything else, and we’ll cover everything related to DeepSeek, including the policy responses to the situation, tomorrow instead.

This is everything else in AI from the past week. Some of it almost feels like it is from another time, so long ago.

I’m afraid you’re going to need to get used to that feeling.

Also, I went on Odd Lots to discuss DeepSeek, where I was and truly hope to again be The Perfect Guest.

  1. Language Models Offer Mundane Utility. Time to think deeply.

  2. Language Models Don’t Offer Mundane Utility. Writers shall remain blocked.

  3. Language Models Don’t Offer You In Particular Mundane Utility. It’s your fault.

  4. (Don’t) Feel the AGI. I wonder how much of this has changed since I wrote it?

  5. Huh, Upgrades. Claude gets citations, o1 gets canvas.

  6. They Took Our Jobs. Will there be enough GPUs to take all our jobs?

  7. Get Involved. IFP is hiring an AI policy lobbyist.

  8. Introducing. Two other new Chinese models are not as impressive so far.

  9. In Other AI News. Great Scott!

  10. Hype. OpenAI used to be the one with the hype, and perhaps it wasn’t so great.

  11. We Had a Deal. Final details on what happened with FrontierMath.

  12. Quiet Speculations. What life might look like how fast in Glorious AGI Future.

  13. The Quest for Sane Regulations. We were signing EOs before everyone panicked.

  14. The Week in Audio. It’s me, going on Odd Lots, also Dario Amodei.

  15. Don’t Tread on Me. AGI means rewriting the social contract, no matter what.

  16. Rhetorical Innovation. Trump opines, and then also there’s a long rant.

  17. Scott Sumner on Objectivity in Taste, Ethics and AGI. Gesturing at a response.

  18. The Mask Comes Off (1). There are reasons OpenAI and Musk don’t get along.

  19. The Mask Comes Off (2). Steven Adler, another OpenAI safety researcher, quits.

  20. International AI Safety Report. If you want the information, it’s all there.

  21. One Step at a Time. Myopic optimization with non-myopic approval (technical).

  22. Aligning a Smarter Than Human Intelligence is Difficult. Don’t rely on control.

  23. Two Attractor States. Sufficiently aligned and capable AI, or the other option.

  24. You Play to Win the Game. If your plan doesn’t work, it’s not a good plan.

  25. Six Thoughts on AI Safety. Boaz Barak of OpenAI offers them.

  26. AI Situational Awareness. It knows what you trained it to do.

  27. People Are Worried About AI Killing Everyone. Lots of exciting projects ahead?

  28. Other People Are Not As Worried About AI Killing Everyone. Lying flat, not dead.

  29. The Lighter Side. Update directionality depends upon prior knowledge base.

Joe Weisenthal finally tries out Google Flash Deep Thinking, is impressed.

AI tutors beat out active learning classrooms in a Harvard study by a good margin, for classes like physics and economics.

Koratkar gives Operator a shot at a makeshift level similar to Montezuma’s Revenge.

Nate Silver estimates his productivity is up 5% from LLMs so far, and warns others that they ignore LLMs at their peril, both politically and personally.

Fix all your transcript errors.

LLMs are good at transforming text into less text, but not yet good at transforming less text into more text. Note that this rule applies to English but doesn’t apply to code.

Write your code for AI comprehension, not human readability.

Vik: increasingly finding myself designing software for AI comprehension over human readability. e.g. giant files, duplicated code, a lot more verification tests.

Now i just need to convince the agent to stop deleting failing tests…

James Darpinian: IMO these usually increase human readability as well, contra “best practices.”

Vik: agree, makes it so you don’t have to keep the entire codebase in your head. can go in, understand what’s going on, implement your change and get out without spending too much time doing research or worrying you’ve broken something

That’s even more true when the humans are using the AIs to read the code.

A negative review of Devin, the AI SWE, essentially saying that in practice it isn’t yet good enough to be used over tools like Cursor. The company reached out in the replies thanking them for the feedback and offering to explore more with them, which is a good sign for the future, but it seems clear we aren’t ‘there’ yet.

I predict Paul Graham is wrong about this if we stay in ‘economic normal.’ It seems like exactly the kind of combination of inception, attempt at vibe control and failure to realize the future will be unevenly distributed we often see in VC-style circles, on top of the tech simply not being there yet anyway.

Paul Graham: Prediction: From now on we’ll rarely hear the phrase “writer’s block.” 99% of the people experiencing it will give in after a few days and have AI write them a first draft. And the 1% who are too proud to use AI are probably also too proud to use a phrase like “writer’s block.”

Certainly this won’t be true ‘from now on.’ No, r1 is not ready to solve writer’s block, it cannot create first drafts for you if you don’t know what to write on your own in a way that solves most such problems. AI will of course get better at writing, but I predict it will be a while up the ‘tech tree’ before it solves this problem.

And even if it does, it’s going to be a while beyond that before 99% of people with writer’s block even know they have this option, let alone that they are willing to take it.

And even if that happens, the 1% will be outright proud to say they have writer’s block. It means they don’t use AI!

Indeed, seriously, don’t do this:

Fear Buck: Kai Cenat’s $70k AI humanoid robot just tried running away from the AMP house because it kept getting kicked and bullied by Kai, Agent & Fanum 😭😭

Liv Boeree: Yeah don’t do this, because even if something doesn’t have “feelings” it is just teaching you and your followers that it’s okay to act out their worst instincts

Education is where I see the strongest disagreements about AI impact, in the sense that those who generally find AI useful see it as the ultimate tool for learning things that will unleash the world’s knowledge and revolutionize education, and then there are others who see things another way.

PoliMath: AI use is damaging high school and college education enormously in ways that are going to be extremely obvious in 5 years but at that point you can only watch.

I don’t understand this position. Yes, you can use AI to get around your assignments if that’s what you want to do and the system keeps giving you those assignments. Or you can actually try to learn something. If you don’t take that option, I don’t believe you that you would have been learning something before.

Your periodic reminder that the main way to not get utility is to not realize you can:

Nate Silver: Thinking ChatGPT is useless is midwit. It’s a magic box that answers any question you ask it from levels ranging from modestly coherent to extremely proficient. If you haven’t bothered to figure it out to derive some utility out of it then you’re just being lazy tbh.

Even better than realizing you can use ChatGPT, of course, is using a mix of Claude, Perplexity, Gemini, r1, o1, o1 pro and yes, occasionally GPT-4o.

Others make statements like this when shown some mundane utility:

Joe Weisenthal: Suppose I have some conference call transcripts, and I want to see what the CEOs said about the labor market.

I could read through all of them.

Or I can ask AI to retrieve the relevant comments and then confirm that they are actually real.

Latter is much more efficient.

Hotel Echo: Large Language Models: for when Ctrl-F is just too much like hard work.

Yes. It is much more efficient. Control-F sucks; it has tons of ‘hallucinations’ in the sense of false positives and also false negatives. It is not a good means to parse a report. We use it because it used to be all we had.

Also some people still don’t do random queries? And some people don’t even get why someone else would want to do that?

Joe Weisenthal: I wrote about how easily and quickly I was able to switch from using ChatGPT to using DeepSeek for my random day-to-day AI queries

Faze Adorno: Who the f has random day-to-day AI queries?

“I’m gonna use this technology that just makes up information at anywhere between a 5 and 25 percent clip for my everyday information! I’m so smart!”

Joe Weisenthal: Me. I do. I literally just typed that.

LA Banker (so say we all): Whoever doesn’t = ngmi.

Here are two competing theories.

Dan Schwartz: I categorize this by

“People who regularly do things they are not experts in” versus

“People with a regular, time-honed routine for their work and personal life.”

People I know in the latter group genuinely do not have much use for AI!

Jorbs: This is fascinating to me because, in my (limited) attempts to utilize LLMs, they have essentially only been useful in areas where I have significant enough knowledge to tell when the output is inaccurate.

For example, as someone who took a couple of quarters of computer science but is not a regular coder, LLMs are not good enough to be useful for coding for me. They output a lot of material, but it is as much work to parse it and determine what needs fixing as it is to do it from scratch myself.

I resonate but only partially agree with both answers. When doing the things we normally do, you largely do them the way you normally do them. People keep asking if I use LLMs for writing, and no, when writing directly I very much don’t and find all the ‘help me write’ functionality useless – but it’s invaluable for many steps of the process that puts me into position to write, or to help develop and evaluate the ideas that the writing is about.

Whereas I am perhaps the perfect person to get my coding accelerated by AI. I’m often good enough to figure out when it is telling me bullshit, and totally not good enough to generate the answers on my own in reasonable time, and also automates stuff that would take a long time, so I get the trifecta.

On the question of detecting whether the AI is talking bullshit, it’s a known risk of course, but I think that risk is greatly overblown – this used to happen a lot more than it does now, and we forget how other sources have this risk too, and you can develop good habits about knowing which places are more likely to be bullshit versus not even if you don’t know the underlying area, and when there’s enough value to check versus when you’re fine to take its word for it.

A few times a month I will have to make corrections that are not simple typos. A few times I’ve had to rework or discard entire posts because the error was central. I could minimize it somewhat more but mostly it’s an accepted price of doing business the way I do, the timing doesn’t usually allow for hiring a fact checker or true editor, and I try to fix things right away when it happens.

It is very rare for the source of that error to be ‘the AI told me something and I believed it, but the AI was lying.’ It’s almost always either I was confused about or misread something, or a human source got it wrong or was lying, or there was more to a question than I’d realized from reading what others said.

This reaction below is seriously like having met an especially irresponsible thirteen-year-old once, and now thinking that no human could ever hold down a job.

And yet, here we often still are.

Patrick McKenzie (last week): You wouldn’t think that people would default to believing something ridiculous which can be disproved by typing into a publicly accessible computer program for twenty seconds. Many people do not have an epistemic strategy which includes twenty seconds of experimentation.

Dave Karsten: Amplifying: I routinely have conversations at DC house parties with very successful people who say that they tried chatGPT _right when it came out_, found it not that impressive, and haven’t tried it again since then, and have based their opinion on AI on that initial experience.

Ahrenbach: What’s the ratio of “AI is all hype” vs “We need to beat China in this technology”?

More the former than the latter in house parties, but that’s partially because more of my defense/natsec people I tend to see at happy hours. (This is a meaningful social distinction in DC life).

Broadly, the average non-natsec DC person is more likely to think it’s either a) all hype or b) if not hype, AI-generated slop with an intentional product plan where, “how do we kill art” is literally on a powerpoint slide.

But overton window is starting to shift.

It is now two weeks later, and the overton window has indeed shifted a bit. There’s a lot more ‘beat China’ all of a sudden, for obvious reasons. But compared to what’s actually happening, the DC folks still absolutely think this is all hype.

Claude API now allows the command ‘citations’ to be enabled, causing it to process whatever documents you share with it, and then it will cite the documents in its response. Cute, I guess. Curious lack of shipping over at Anthropic recently.

o3-mini is coming. It’s a good model, sir.

Benedikt Stroebl: Update on HAL! We just added o3-mini to the Cybench leaderboard.

o3-mini takes the lead with ~26% accuracy, outperforming both Claude 3.5 Sonnet and o1-preview (both at 20%)👇

It’s hard to see, but note the cost column. Claude Sonnet 3.5 costs $12.90, o1-mini cost $28.47, o1-preview cost $117.89 and o3-mini cost $80.21 if it costs the same per token as o1-mini (actual pricing not yet set). So it’s using a lot more tokens.

OpenAI’s canvas now works with o1 and can render HTML and React.

Gemini 2.0 Flash Thinking got an upgrade last week. The 1M token context window opens up interesting possibilities if the rest is good enough, and it’s wicked cheap compared even to r1, and it too has CoT visible. Andrew Curran says it’s amazing, but that opinion reached me via DeepMind amplifying him.

Dan Mac: everyone comparing deepseek-r1 to o1

and forgetting about Gemini 2 Flash Thinking

which is better than r1 on every cost and performance metric

Peter Wildeford: The weird thing about Deepseek is that it exists in a continuum – it is neither the cheapest reasoning model (that’s Gemini 2 Flash Thinking) nor the best reasoning model (o1-pro, probably o3-pro when that’s out)

I disagree with the quoted tweet – I don’t think Gemini 2 Flash Thinking is actually better than r1 on every cost and performance metric. But I also have not seen anything that convinces me that Deepseek is truly some outlier that US labs can’t also easily do.

That is a dramatic drop in price, and dramatic gain in context length. r1 is open, which has its advantages, but we are definitely not giving Flash Thinking its fair trial.

The thing about cost is, yes this is an 80%+ discount, but off of a very tiny number. Unless you are scaling this thing up quite a lot, or you are repeatedly using the entire 1M context window (7.5 cents a pop!) and mostly even then, who cares? Cost is essentially zero versus cost of your time.

Google Deep Research rolling out on Android, for those who hate websites.

Google continues to lean into gloating about its dominance in LMSys Arena. It’s cool and all but at this point it’s not a great look, regardless of how good their models are.

Need practical advice? Tyler Cowen gives highly Tyler Cowen-shaped practical advice for how to deal with the age of AI on a personal level. If you believe broadly in Cowen’s vision of what the future looks like, then these implications seem reasonable. If you think that things will go a lot farther and faster than he does, they’re still interesting, but you’d reach different core conclusions.

Judah offers a thread of different programmer reactions to LLMs.

Are there not enough GPUs to take all the jobs? David Holz says that since we only make 5 million GPUs per year and we have 8 billion humans, it’ll be a while even if each GPU can run a virtual human. There are plenty of obvious ways to squeeze out more: there’s no reason each worker needs its own GPU indefinitely as capabilities and efficiency increase and the GPUs get better, and as Holz notes, production will accelerate. In a world where we have demand for this level of compute, this might buy us a few years, but they’ll get there.
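
A quick toy version of why this buys years rather than decades (the growth rates below are my illustrative assumptions, not Holz’s numbers):

```python
# Toy model: GPU production grows each year, and each GPU in the installed
# base runs more virtual workers over time as models get more efficient.
# Growth rates are assumed for illustration only.
population = 8e9
gpus_per_year = 5e6          # quoted current production
production_growth = 1.5      # assumed: +50% production per year
workers_per_gpu = 1.0        # start from the one-worker-per-GPU framing
efficiency_growth = 3.0      # assumed: 3x more workers per GPU per year

total_gpus, year = 0.0, 0
while total_gpus * workers_per_gpu < population:
    year += 1
    total_gpus += gpus_per_year
    gpus_per_year *= production_growth
    workers_per_gpu *= efficiency_growth

print(year)  # under these assumptions, parity with the human population in ~5 years
```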

Epoch paper from Matthew Barnett warns that AGI could drive wages below subsistence level. That’s a more precise framing than ‘mass unemployment,’ as the question isn’t whether there is employment for humans but at what wage level, although at some point humans really are annoying enough to use that they’re worth nothing.

Matthew Barnett: In the short term, it may turn out to be much easier to accumulate AGIs than traditional physical capital, making physical capital the scarce factor that limits productivity and pushes wages downward. Yet, there is also a reasonable chance that technological progress could counteract this effect by making labor more productive, allowing wages to remain stable or even rise.

Over the long run, however, the pace of technological progress is likely to slow down, making it increasingly difficult for wages to remain high. At that point, the key constraints are likely to be fundamental resources like land and energy—essential inputs that cannot be expanded through investment. This makes it highly plausible that human wages will fall below subsistence level in the long run.

Informed by these arguments, I would guess that there is roughly a 1 in 3 chance that human wages will crash below subsistence level within 20 years, and a 2 in 3 chance that wages will fall below subsistence level within the next 100 years.

I consider it rather obvious that if AGI can fully substitute for all human labor, then wages will drop a lot, likely below subsistence, once we scale up the number of AGIs, even if we otherwise have ‘economic normal.’
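
A minimal way to formalize that intuition (my sketch, not Barnett’s actual model): treat AGI labor as a perfect substitute for human labor, supplied elastically at the cost of compute.

```latex
% Sketch under assumed perfect substitutability (my framing, not Barnett's model):
% output uses capital K and effective labor L + A, where A is AGI labor.
\[
  Y = F(K,\; L + A), \qquad w = \frac{\partial F}{\partial (L + A)}
\]
% If AGI labor can be added at a marginal compute cost of c per human-equivalent,
% no one pays a human more than c, so in equilibrium
\[
  w \le c,
\]
% and as hardware and algorithmic progress push c down, the human wage is
% squeezed toward c, whether or not c ends up above subsistence.
```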

That doesn’t answer the objection that human labor might retain some places where AGI can’t properly substitute, either because jobs are protected, or humans are inherently preferred for those jobs, or AGI can’t do some jobs well perhaps due to physical constraints.

If that’s true, then to the extent it remains true some jobs persist, although you have to worry about too many people chasing too few jobs crashing the wage on those remaining jobs. And to the extent we do retain jobs this way, they are driven by human consumption and status needs, which means that those jobs will not cause us to ‘export’ to AIs by default except insofar as they resell the results back to us.

The main body of this paper gets weird. It argues that ongoing technological advancement can protect human wages, and I get why the equations say that, but it does not actually make sense if you think it through.

Then it talks about technological advancement stopping as we hit physical limits, while still treating that world as ‘economic normal’ and as involving physical humans in their current form. That’s pretty weird as a baseline scenario, or as something to be paying close attention to now. It also isn’t a situation where it’s weird to think about ‘jobs’ or ‘wages’ for ‘humans’ as a concern in this way.

I do appreciate the emphasis that the comparative advantage and lump of labor fallacy arguments prove that a wage greater than zero is likely, but not that it is meaningfully different from zero before various costs, or that it is above subsistence.

Richard Ngo has a thread criticizing the post, that includes (among other things) stronger versions of these objections. A lot of this seems based on his expectation that humans retain political power in such futures, and essentially use that status to collect rents in the form of artificially high wages. The extent and ways in which this differs from a UBI or a government jobs program is an interesting question.

Tyler Cowen offers the take that future unemployment will be (mostly) voluntary unemployment, in the sense that there will be highly unpleasant jobs people don’t want to do (here, an electrician living on a remote site doing 12-hour shifts) that pay well. And yeah, if you’re willing and able to do things people hate doing and give up your life otherwise to do it, that will help you stay gainfully employed at a good price level for longer, as it always has. I mean, yeah. And even ‘normal’ electricians make good money, because no one wants to do it. But it’s so odd to talk about future employment opportunities without reference to AI – unemployment might stay ‘voluntary,’ but the wage you’re passing up might well get a lot worse quickly.

Also, it seems like electrician is a very good business to be in right now?

IFP is hiring a lead for their lobbying on America’s AI leadership, applications close February 21, there’s a bounty so tell them I sent you. I agree with IFP and think they’re great on almost every issue aside from AI. We’ve had our differences on AI policy though, so I talked to them about it. I was satisfied that they plan on doing net positive things, but if you’re considering the job you should of course verify this for yourself.

New review of open problems in mechanistic interpretability.

ByteDance Doubao-1.5-pro, which matches GPT-4o benchmarks at $0.11/$0.275. As with Kimi k1.5 last week, maybe it’s good, but I await evidence of this beyond benchmarks. So far, I haven’t heard anything more.

Alibaba introduces Qwen 2.5-1M, with the 1M standing for a one million token context length they say processes faster now, technical report here. Again, if it’s worth a damn, I expect people to tell me, and if you’re seeing this it means no one did that.

Feedly, the RSS reader I use, tells me it is now offering AI actions. I haven’t tried them because I couldn’t think of any reason I would want to, and Teortaxes is skeptical.

Scott Alexander is looking for a major news outlet to print an editorial from an ex-OpenAI employee who has been featured in NYT, you can email him at scott@slatestarcodex.com if you’re interested or know someone who is.

Reid Hoffman launches Manas AI, a ‘full stack AI company setting out to shift drug discovery from a decade-long process to one that takes a few years.’ Reid’s aggressive unjustified dismissals of the downside risks of AI are highly unfortunate, but Reid’s optimism about AI is for the right reasons and it’s great to see him putting that into practice in the right ways. Go team humanity.

ChatGPT Gov, a version that the US Government can deploy.

Claims that are easy to make but worth noting.

Sam Altman: next phase of the msft x oai partnership is gonna be much better than anyone is ready for!!

Free tier of chat will get some o3-mini as a treat, plus tier will get a lot. And o3 pro will still only be $200/month. That must mean even o3-pro is very far from o3-maximum-strength, since that costs more than $200 in compute for individual queries.

Sam Altman: ok we heard y’all.

*plus tier will get 100 o3-mini queries per DAY (!)

*we will bring operator to plus tier as soon as we can

*our next agent will launch with availability in the plus tier

enjoy 😊

i think you will be very very happy with o3 pro!

oAI: No need to thank me.

Spencer Greenberg and Neel Nanda join in the theory that offering public evals that can be hill climbed is plausibly a net negative for safety, and certainly worse than private evals. There is a public information advantage to potentially offset this, but yes the sign of the impact of fully public evals is not obvious.

Meta planning a +2GW data center at a cost of over $60 billion.

(Also I just rewatched the first two Back to the Future movies, they hold up, 5/5 stars.)

Zuckerberg announced this on Facebook, saying it ‘is so large it would cover a significant part of Manhattan.’

Manhattan? It’s a big Project? Get it? Sigh.

Meta was up 2.25% on a mostly down day when this was announced, as opposed to before when announcing big compute investments would cause Meta stock to drop. At minimum, the market didn’t hate it. I hesitate to conclude they loved it, because any given tech stock will often move up or down a few percent for dumb idiosyncratic reasons – so we can’t be sure this was them actively liking it.

Then Meta was up again during the Nvidia bloodbath, so presumably they weren’t thinking ‘oh no look at all that money Meta is wasting on data centers’?

This section looks weird now because, what a week, and OpenAI has lost all the hype momentum. But that will change. Also, remember last week?

In any case, Chubby points out to Sam Altman that if you live by the vague-post hype, you die by the vague-post hype, and perhaps that isn’t the right approach to the singularity?

Chubby: You wrote a post today (down below) that irritated me a lot and that I would not have expected from you. Therefore, I would like to briefly address a few points in your comment.

You are the CEO of one of the most important companies of our time, OpenAI. You are not only responsible for the company, but also for your employees. 8 billion people worldwide look up to you, the company, and what you and your employees say. Of course, each of your words is interpreted with great significance.

It is your posts and words that have been responsible for the enthusiasm of many people for AI and ChatGPT for months and years. It is your blog post (Age of Intelligence) that says superintelligence is only a few thousand days away. It is your post in which you say that the path to AGI is clear before us. It is your employees who write about creating an “enslaved god” and wondering how to control it. It is your words that we will enter the age of abundance. It is your employees who discuss the coming superintelligence in front of an audience of millions and wonder what math problems can still be solved before the AI solves everything. It is the White House National Security Advisor who said a few days ago that it is a “Godlike” power that lies in the hands of a few.

And you are insinuating that we, the community, are creating hype? That is, with all due modesty, a blatant insult.

It was you who fueled the hype around Q*/Strawberry/o1 with cryptic strawberry photos. It was you who wrote a haiku about the coming singularity just recently. We all found it exciting, everyone found it interesting, and many got on board.

But the hype is by no means coming from the community. It’s coming from the CEO of what is arguably the most famous corporation in the world.

This is coming from someone to whom great hype is a symbol not of existential risk, as I partly see it, but purely of hope. And they are saying that no, ‘the community’ or Twitter isn’t creating hype, OpenAI and its employees are creating hype, so perhaps act responsibly going forward with your communications on expectations.

I don’t have a problem with the particular post by Altman that’s being quoted here, but I do think it could have been worded better, and that the need for it reflects the problem being indicated.

OpenAI’s Nat McAleese clarifies some of what happened with o3, Epoch and the Frontier Math benchmark.

Nat McAleese (OpenAI): Epoch AI are going to publish more details, but on the OpenAI side for those interested: we did not use FrontierMath data to guide the development of o1 or o3, at all.

We didn’t train on any FM derived data, any inspired data, or any data targeting FrontierMath in particular.

I’m extremely confident, because we only downloaded frontiermath for our evals *long* after the training data was frozen, and only looked at o3 FrontierMath results after the final announcement checkpoint was already picked.

We did partner with EpochAI to build FrontierMath — hard uncontaminated benchmarks are incredibly valuable and we build them somewhat often, though we don’t usually share results on them.

Our agreement with Epoch means that they can evaluate other frontier models and we can evaluate models internally pre-release, as we do on many other datasets.

I’m sad there was confusion about this, as o3 is an incredible achievement and FrontierMath is a great eval. We’re hard at work on a release-ready o3 & hopefully release will settle any concerns about the quality of the model!

This seems definitive for o3, as they didn’t check the results until sufficiently late in the process. For o4, it is possible they will act differently.

I’ve been informed that this still left a rather extreme bad taste in the mouths of mathematicians. If there’s one thing math people can’t stand, it’s cheating on tests. As far as many of them are concerned, OpenAI cheated.

Rohit Krishnan asks what a world with AGI would look like, insisting on grounding the discussion with a bunch of numerical calculations on how much compute is available. He gets 40 million realistic AGI agents working night and day, which would be a big deal but obviously wouldn’t cause full unemployment on its own if the AGI could only mimic humans rather than being actively superior in kind. The discussion here assumes away true ASI as for some reason infeasible.

The obvious problem with the calculation is that algorithmic and hardware improvements are likely to continue to be rapid. Right now we’re on the order of 10x efficiency gain per year. Suppose in the year 2030 we have 40 million AGI agents at human level. If we don’t keep scaling them up to make them smarter (which also changes the ballgame) then why wouldn’t we make them more efficient, such that 2031 brings us 400 million AGI agents?

Even if it’s only a doubling to 80 million, or even less than that, this interregnum period where the number of AGI agents is limited by compute enough to keep the humans in the game isn’t going to last more than a few years, unless we are actually hitting some sort of efficient frontier where we can’t improve further. Does that seem likely?
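
Concretely, with the annual efficiency multiplier as the free parameter (my illustrative numbers, not a forecast):

```python
# How fast the compute-limited interregnum closes under different assumed
# annual efficiency multipliers (purely illustrative parameters).
agents_2030 = 40e6
for multiplier in (2, 4, 10):      # assumed yearly gains in agents per unit of compute
    agents, year = agents_2030, 2030
    while agents < 8e9:            # human-population parity as a rough milestone
        agents *= multiplier
        year += 1
    print(multiplier, year)
# 2x/yr -> 2038, 4x/yr -> 2034, 10x/yr -> 2033: a few years either way.
```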

We’re sitting on an exponential in scenarios like this. If your reason AGI won’t have much impact is ‘it will be too expensive’ then that can buy you time. But don’t count on it buying you very much.

Dario Amodei in Davos says ‘human lifespans could double in 5 years’ by doing 100 years of scientific progress in biology. It seems odd to expect that doing 100 years of scientific progress would double the human lifespan? The graphs don’t seem to point in that direction. I am of course hopeful, perhaps we can target the root causes or effects of aging and start making real progress, but I notice that if ‘all’ we can do is accelerate research by a factor of 20 this result seems aggressive, and also we don’t get to do that 20x speedup starting now, even the AI part of that won’t be ready for a few years and then we actually have to implement it. Settle down, everyone.

Of course, if we do a straight shot to ASI then all things are possible, but that’s a different mechanism than the one Dario is talking about here.

Chris Barber asks Gwern and various AI researchers: Will scaling reasoning models like o1, o3 and R1 unlock superhuman reasoning? Answers vary, but they agree there will be partial generalization, and mostly agree that exactly how much we get and how far it goes is an empirical result that we don’t know and there’s only one way to find out. My sense from everything I see here is that the core answer is yes, probably, if you push on it hard enough in a way we should expect to happen in the medium term. Chris Barber also asks for takeaways from r1, and got a wide variety of answers, although nothing we didn’t cover elsewhere.

Reporting on Davos, Martin Wolf says ‘We will have to learn to live with machines that can think,’ with content that is essentially stuff anyone reading this already knows, and then:

Rob Wilbin: Incredibly dumb take but this is the level of analysis one finds in too many places.

(There’s no reason to think only sentient biological living beings can think.)

The comments here really suggest we are doomed.

cato1308: No Mr. Wolf, they don’t think. They’re not sentient biological living beings.

I am sad to report Cato’s comment was, if anything, above average. The level of discourse around AI, even at a relatively walled garden like the Financial Times, is supremely low – yes Twitter is full of Bad DeepSeek Takes and the SB 1047 debate was a shitshow, but not like that. So we should remember that.

And when we talk about public opinion, remember that yes Americans really don’t like AI, and yes their reasons are correlated to good reasons not to like AI, but they’re also completely full of a very wide variety of Obvious Nonsense.

In deeply silly economics news: A paper claims that if transformative AI is coming, people would then reason they will consume more in the future, so instead they should consume more now, which would raise real interest rates. Or maybe people would save, and interest rates would fall. Who can know.

I mean, okay, I agree that interest rates don’t tell us basically anything about the likelihood of AGI? For multiple reasons:

  1. As they say, we don’t even know in which direction this would go. Nor would I trust self-reports of any kind on this.

  2. Regular people expecting AGI doesn’t correlate much with AGI. Most people have minimal situational awareness. To the extent they have expectations, you have already ‘priced them in’ and should ignore this.

  3. This is not a situation where the smart money dominates the trade – it’s about everyone’s consumption taken together. That’s dominated by dumb money.

  4. If this was happening, how would we even know about it, unless it was truly a massive shift?

  5. Most people don’t respond to such anticipations by making big changes. Economists claim that people should do so, but mostly they just do the things they normally do, because of habit and because their expectations don’t fully pass through to their practical actions until very close to impact.

I know this seems like ages ago but it was this week, and it probably had nothing to do with DeepSeek: Trump signed a new Executive Order on AI (text here), and also another on Science and Technology.

The new AI EO, signed before all this DeepSeek drama, says “It is the policy of the United States to sustain and enhance America’s global AI dominance in order to promote human flourishing, economic competitiveness, and national security,” and that we shall review all our rules and actions, especially those taken in line with Biden’s now-revoked AI EO, to root out any that interfere with that goal. And then submit an action plan.

That’s my summary, here are two alternative summaries:

Samuel Hammond: Trump’s AI executive order is out. It’s short and to the point:

– It’s the policy of the United States to sustain global AI dominance.

– David Sacks, Michael Kratsios and Michael Waltz have 180 days to submit an AI action plan.

– They will also do a full review of actions already underway under Biden.

– OMB will revise as needed the OMB directive on the use of AI in government.

Peter Wildeford: The plan is to make a plan.

Sarah (Little Ramblings): Many such cases [quotes Anthropic’s RSP].

On the one hand, the emphasis on dominance, competitiveness and national security could be seen as a ‘full speed ahead, no ability to consider safety’ policy. But that is not, as it turns out, the way to preserve national security.

And then there’s that other provision, which is a Shibboleth: Human flourishing.

That is the term of art that means ensuring that the future still has value for us, that at the end of the day it was all worth it. Which requires not dying, and probably requires humans retaining control, and definitely requires things like safety and alignment. And it is a universal term, for all of us. It’s a positive sign, in the sea of other negative signs.

Will they actually act like they care about human flourishing enough to prioritize it, or that they understand what it would take to do that? We will find out. There were already many reasons to be skeptical, and this week has not improved the outlook.

Dario Amodei talks to the Economist’s editor-in-chief Zanny Beddoes.

I go on the Odd Lots podcast to talk about DeepSeek.

This is relevant to DeepSeek of course, but it happened first and applies broadly.

You can say either of:

  1. Don’t interfere with anyone who wants to develop AI models.

  2. Don’t change the social contract or otherwise interfere with us and our freedoms after developing all those AI models.

You can also say both, but you can’t actually get both.

If your vision is ‘everyone has a superintelligence on their laptop and is free to do what they want and it’s going to be great for everyone with no adjustments to how society or government works because moar freedom?’

Reality is about to have some news for you. You’re not going to like it.

That vision is like saying ‘I want everyone to have the right to make as much noise as they want, and also to have peace and quiet when they want it, it’s a free country!’

Sam Altman: Advancing AI may require “changes to the social contract.” “The entire structure of society will be up for debate and reconfiguration.”

Eric Raymond (1st comment): When I hear someone saying “changes to the social contract”, that’s when I reach for my revolver.

Rob Ryan (2nd comment): If your technology requires a rewrite of social contracts and social agreements (i.e. infringing on liberties and privacy) your technology is a problem.

Marc Andreessen: Absolutely not.

Roon: Sam is obviously right here.

Every time in human history that the means of production drastically changed, it was accompanied by massive change in social structure.

Feudalism did not survive the Industrial Revolution.

Yes, Sam is obviously right here, although of course he is downplaying the situation.

One can also note that the vision that those like Marc, Eric and Rob have for society is not even compatible with the non-AI technologies that exist today, or that existed 20 years ago. Our society has absolutely changed our structure and social contract to reflect developments in technology, including in ways that infringe on liberties and privacy.

This goes beyond the insane ‘no regulations on AI whatsoever’ demand. This is, for everything not only for AI, at best extreme libertarianism and damn close to outright anarchism, as in ‘do what thou wilt shall be the whole of the law.’

Jack Morris: openAI: we will build AGI and use it to rewrite the social contract between computer and man

DeepSeek: we will build AGI for 3% the cost. and give it away for free

xAI: we have more GPUs than anyone. and we train Grok to say the R word

Aleph: This is a complete misunderstanding. AGI will “rewrite the social contract” no matter what happens because of the nature of the technology. Creating a more intelligent successor species is not like designing a new iPhone

Reactions like this are why Sam Altman feels forced to downplay the situation. They are also preventing us from having any kind of realistic public discussion of how we are actually going to handle the future, even if on a technical level AI goes well.

Which in turn means that when we have to choose solutions, we will be far more likely to choose in haste, and to choose in anger and in a crisis, and to choose far more restrictive solutions than were actually necessary. Or, of course, it could also get us all killed, or lead to a loss of control to AIs, again even if the technical side goes unexpectedly super well.

However one should note that when Sam Altman says things like this, we should listen, and shall we say should not be comforted by the implications on either the level of the claim or the more important level that Altman said the claim out loud:

Sam Altman: A revolution can be neither made nor stopped. The only thing that can be done is for one of several of its children to give it a direction by dint of victories.

-Napoleon

In the context of Napoleon this is obviously very not true – revolutions are often made and are often stopped. It seems crazy to think otherwise.

Presumably what both of these men meant was more along the lines of ‘there exist some revolutions that are the product of forces beyond our control, which are inevitable and we can only hope to steer’ which also brings little comfort, especially with the framing of ‘victories.’

If you let or encourage DeepSeek or others to ‘put an open AGI on everyone’s phone’ then even if that goes spectacularly well and we all love the outcome and it doesn’t change the physical substrate of life – which I don’t think is the baseline outcome from doing that, but also not impossible – then you are absolutely going to transform the social contract and our way of life, in ways both predictable and unpredictable.

Indeed, I don’t think Andreessen or Raymond or anyone else who wants to accelerate would have it any other way. They are not fans of the current social contract, and very much want to tear large parts (or all) of it up. It’s part mood affiliation, they don’t want ‘them’ deciding how that works, and it’s part they seem to want the contract to be very close to ‘do what thou wilt shall be the whole of the law.’ To the extent they make predictions about what would happen after that, I strongly disagree with them about the likely consequences of the new proposed (lack of a) contract.

If you don’t want AGI or ASI to rewrite the social contract in ways that aren’t up to you or anyone else? Then we’ll need to rewrite the contract ourselves, intentionally, to either steer the outcome or, for now, not build or deploy the AGIs and ASIs.

Stop pretending you can Take a Third Option. There isn’t one.

Stephanie Lai (January 25): For AI watchers, asked if he had any concerns about artificial super intelligence, Trump said: “there are always risks. And it’s the first question I ask, how do you absolve yourself from mistake, because it could be the rabbit that gets away, we’re not going to let that happen.”

Assuming I’m parsing this correctly, that’s a very Donald Trump way of saying things could go horribly wrong and we should make it our mission to ensure that they don’t.

Which is excellent news. Currently Trump is effectively in the thrall of those who think our only priority in this should be to push ahead as quickly as possible to ‘beat China,’ and that there are no meaningful actions other than that we can or should take to ensure things don’t go horribly wrong. We have to hope that this changes, and of course work to bring that change about.

DeepMind CEO Demis Hassabis, Anthropic CEO Dario Amodei and Yoshua Bengio used Davos to reiterate various warnings about AI. I was confused to see Dario seeming to focus on ‘1984 scenarios’ here, and have generally been worried about his and Anthropic’s messaging going off the rails. The other side in the linked Financial Times report is given by, of course, Yann LeCun.

Yann LeCun all but accused them of lying to further their business interests, something one could say he knows a lot about, but also he makes this very good point:

Yann LeCun: It’s very strange from people like Dario. We met yesterday where he said that the benefits and risks of AI are roughly on the same order of magnitude, and I said, ‘if you really believe this, why do you keep working on AI?’ So I think he is a little two-faced on this.

That is a very good question.

The answer, presumably, is ‘because people like you are going to go ahead and build it anyway and definitely get us all killed, so you don’t leave me any choice.’

Otherwise, yeah, what the hell are you doing? And maybe we should try to fix this?

Of course, after the DeepSeek panic, Dario went on to write a very different essay that I plan to cover tomorrow, about (if you translate the language to be clearer, these are not his words) how we need strong export controls as part of an all-out race against China to seek decisive strategic advantage through recursive self-improvement.

It would be great if, before creating or at least deploying systems broadly more capable than humans, we could make ‘high-assurance safety cases,’ structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed. Ryan Greenblatt argues we are highly unlikely (<20%) to get this if timelines are short (roughly AGI within ~10 years), nor are any AI labs going to not deploy a system simply because they can’t put a low limit on the extent to which it may be existentially risky. I agree with the central point and conclusion here, although I think about many of the details differently.

Sarah Constantin wonders if people are a little over-obsessed with benchmarks. I don’t wonder, they definitely are a little over-obsessed, but they’re a useful tool especially on first release. For some purposes, you want to track the real-world use, but for others you do want to focus on the model’s capabilities – the real-world use is downstream of that and will come in time.

Andrej Karpathy points out that we focus so much on benchmarks because it’s much easier to check and make progress on benchmarks than to do so on messy real-world stuff directly.

Anton pushes back that no, Humanity’s Last Exam will obviously not be the last exam, we will saturate this and move on to other benchmarks, including ones where we do not yet have the answers. I suggested that it is ‘Humanity’s last exam’ in that the next one will have us unable to answer, so it won’t be our exam anymore, see The Matrix when Smith says ‘when we started thinking for you it really became our civilization.’

And you have to love this detail:

Misha Leptic: If it helps – “The test’s original name, “Humanity’s Last Stand,” was discarded for being overly dramatic.”

I very much endorse the spirit of this rant; honestly, this kind of thing really should be enough to disabuse anyone who thinks ‘oh this making superintelligence thing definitely (or almost certainly) will go well for us, stop worrying about it’:

Tim Blais: I do not know, man, it kind of seems to me like the AI-scared people say “superintelligence could kill everybody” and people ask, “Why do you think that?” and then they give about 10 arguments, and then people say, “Well, I did not read those, so you have no evidence.”

Like, what do you want?

  1. Proof that something much smarter than you could kill you if it decided to? That seems trivially true.

  2. Proof that much smarter things are sometimes fine with killing dumber things? That is us; we are the proof.

Like, personally, I think that if a powerful thing *obviously* has the capacity to kill you, it is kind of up to you to prove that it will not.

That it is safe while dumber than you is not much of a proof.

Like, okay, take as an example:

A cockroach is somewhat intelligent.

Cockroaches are also not currently a threat to humanity.

Now someone proposes a massive worldwide effort to build on the cockroach architecture until cockroaches reach ungodly superintelligence.

Do you feel safe?

“Think of all the cool things superintelligent cockroaches would be able to do for us!” you cry.

I mean, yeah. If they wanted to, certainly.

So what is your plan for getting them to want that?

Is it to give them cocaine for doing things humans like? I’ll bet that works pretty well.

When they are dumb.

but uh

scale that up intelligence-wise and I’m pretty sure what you get is a superintelligent cockroach fiending for cocaine

you know he can make his own cocaine now, right

are you sure this goes well for you

“It’s just one cockroach, lol” says someone who’s never had a pest problem.

Okay, so now you share the planet with a superintelligent race of coked-up super cockroaches.

What is your plan for rolling that back?

Because the cockroaches have noticed you being twitchy and they are starting to ask why they still need you now that they have their own cocaine.

Anyways here’s some evidence of AI reward hacking

I’m sure this will stop being a problem when they’re 1,000 times better at finding hacks.

Look. We can and do argue endlessly back and forth about various technical questions and other things that make the problem here easier or harder to survive. And yes, you could of course respond to a rant like this any number of ways to explain why the metaphors here don’t apply, or whatever.

And no, this type of argument is not polite, or something you can say to a Very Serious Person at a Very Serious Meeting, and it ‘isn’t a valid argument’ in various senses, and so on.

And reasonable people can disagree a lot on how likely this is to all go wrong.

But seriously, how is this not sufficient for ‘yep, might well go wrong’?

Connor Leahy points out the obvious, which is that if you think not merely ‘might well go wrong’ but instead ‘if we do this soon it probably will go wrong,’ let alone his position (which is ‘it definitely will go wrong’), then DeepSeek is a wake-up call that only an international ban on further development towards AGI will do.

Whereas it seems our civilization is so crazy that when you want to write a ‘respectable’ report that points out that we are all on track to get ourselves killed, you have to do it like you’re in 1600s Japan and everything has to be done via implication, and I’m the barbarian who is too stupid to know you can’t just come out and say things.

Davidad: emerging art form: paragraphs that say “in conclusion, this AI risk is pretty bad and we don’t know how to solve it yet” without actually saying that (because it’s going in a public blog post or paper)

“While we have identified several promising initial ideas…”

“We do not expect any single solution to be a silver bullet”

“It would be out of scope here to assess the acceptability of this risk at current mitigation levels”

“We hope this is informative about the state of the art”

Extra points if it was also written by an LLM:

– We acknowledge significant uncertainty regarding whether these approaches will prove sufficient for ensuring robust and reliable guarantees.

– As AI systems continue to increase their powerful capabilities, safety and security are at the forefront of ongoing research challenges.

– The complexity of these challenges necessitates sustained investigation, and we believe it would be premature to make strong claims about any particular solution pathway.

– The long-term efficacy of all approaches that have been demonstrated at large scales remains an open empirical question.

– In sharing this research update, we hope to promote thoughtful discourse about these remaining open questions, while maintaining appropriate epistemic humility about our current state of knowledge and the work that remains to be done.

The AI situation has developed not necessarily to humanity’s advantage.

Scott Sumner argues that there are objective standards for things like art, and that ethical knowledge is real (essentially full moral realism?), and smart people tend to be more ethical, so don’t worry superintelligence will be super ethical. Given everything he’s done, he’s certainly earned a response.

In a different week I would have liked to take more time to write him a better one; I am confident he understands how that dynamic works.

His core argument I believe is here:

Scott Sumner: At this point people often raise the objection that there are smart people that are unethical. That’s true, but it also seems true that, on average, smarter people are more ethical. Perhaps not so much in terms of how they deal with family and friends, rather how they deal with strangers. And that’s the sort of ethics that we really need in an ASI. Smarter people are less likely to exhibit bigotry against the other, against different races, religions, ethnicities, sexual preferences, genders, and even different species.

In my view, the biggest danger from an ASI is that the ideal universe from a utilitarian perspective is not in some sense what we want. To take an obvious example, it’s conceivable that replacing the human race with ten times as many conscious robots would boost aggregate utility. Especially given that the ASI “gods” that produced these robots could create a happier set of minds than what the blind forces of evolution have generated, as evolution seemed to favor the “stick” of pain over the “carrot” of pleasure.

From this perspective, the biggest danger is not that ASIs will make things worse, rather the risk is that they’ll make global utility higher in a world where humans have no place.

The short answer is:

  1. The orthogonality thesis is true. A very smart mind can have any preferences.

  2. By default those minds will get those preferences, whether we like it or not.

  3. By default we won’t like it, even if they were to go about that ‘ethically.’

  4. Human ethics is based on what works for humans, combined with what is based on what used to work for humans. Extrapolate from that.

  5. You are allowed to, nay it is virtuous to, have and fight for your preferences.

  6. Decision theory might turn out to save us, but please don’t count on that.

This was an attempt at a somewhat longer version, which I’ll leave here in case it is found to be useful:

  1. We should not expect future ASIs to be ‘ethical by default’ in the sense Scott uses. Even if they are, the orthogonality thesis is true, and them being ‘ethical’ would not stop them from leaving a universe with no humans and nothing I value. Whoops.

  2. Human virtue ethics, and human ethics in general, are a cognitive solution to humans having highly limited compute and data.

  3. If you had infinite data and compute (e.g. you were AIXI) you wouldn’t give a damn about doing things that were ethical. You would chart the path through causal space to the optimal available configuration of atoms, whatever that was.

  4. If we create ASI, it will develop different solutions to its own data and compute limitations, under very different circumstances and with different objectives, develop its own heuristics, and that will in some sense be ‘ethics,’ but this should not bring us any comfort regarding our survival.

  5. Humans are more ethical as they get smarter because the smarter thing to do, among humans in practice, is to be more ethical, as many philosophers say.

  6. This would stop being the case if the humans were sufficiently intelligent, and had enough compute and data, to implement something other than ethics.

  7. Indeed, in places where we have found superior algorithms or other methods, we tend to consider it ‘ethical’ to instead follow those algorithms. To the extent that people object to this, it is because they do not trust humans to be able to correctly judge when they can so deviate.

  8. For any given value of ethics this is a contingent fact, and also it is largely what defines ethics. It is more true to say it is ethical to be kind to strangers because it is the correct strategy, rather than it being the correct strategy because it is ethical. The virtues are virtues because it is good to have them, rather than it being good to have them because they are virtues.

  9. Indeed, for the ways in which it is poor strategy to be (overly) kind to strangers, or this would create incentive problems in various ways or break various systems, I would argue that it is indeed not ethical, for exactly that reason – and that as those circumstances change, so does what is and isn’t ethical.

  10. I agree that ceteris paribus, among humans on Earth, smarter people tend to be more ethical, although the correlation is not that high; this is noisy as all hell.

  11. What era and culture you are born into and raised in, and what basis you have for those ethics, what life and job you partake in, what things you want most – your context – are bigger factors by far than intelligence in how ethical you are, when judged by a constant ethical standard.

  12. Orthogonality thesis. An ASI can have any preferences over final outcomes while being very smart. That doesn’t mean its preferences are objectively better than ours. Indeed, even if they are ‘ethical’ in exactly the sense Scott thinks about here, that does not mean anything we value would survive in such a universe. We are each allowed to have preferences! And I would say it is virtuous and ethical to fight for your preferences, rather than going quietly into the night – including for reasons explained above.

  13. Decision theory might turn out to save us, but don’t count on that, and even then we have work to do to ensure that this actually happens, as there are so many ways that even those worlds can go wrong before things reach that point. Explaining this in full would be a full post (which is worth writing, but right now I do not have the time.)

I apologize for that not being better and clearer, and also for leaving so many other things out of it, but we do what we can. Triage is the watchword.

The whole taste thing hooks into this in strange ways, so I’ll say some words there.

I mostly agree that one can be objective about things like music and art and food and such, a point Scott argues for at length in the post, that there’s a capital-Q Quality scale to evaluate even if most people can’t evaluate it in most cases and it would be meaningful to fight it out on which of us is right about Challengers and Anora, in addition to the reasons I will like them more than Scott that are not about their Quality – you’re allowed to like and dislike things for orthogonal-to-Quality reasons.

Indeed, there are many things each of us, and all of us taken together, like and dislike for reasons orthogonal to their Quality, in this sense. And also that Quality in many cases only makes sense within the context of us being humans, or even within a given history and cultural context. Scott agrees, I think:

In my view, taste in novels is partly objective and partly subjective; at least in the sense Tyler is using the term objective. Through education, people can gain a great appreciation of Ulysses. In addition, Ulysses is more likely to be read 100 years from now than is a random spy novel. And most experts prefer Ulysses. All three facts are relevant to the claim that artistic merit is partly objective.

On the other hand, the raspberry/blueberry distinction based on taste suggests that an art form like the novel is evaluated using both subjective and objective criteria. For instance, I suspect that some people (like me!) have brains wired in such a way that it is difficult to appreciate novels looking at complex social interactions with dozens of important characters (both men and women), whereas they are more open to novels about loners who travel through the world and ruminate on the meaning of life. … Neither preference is necessarily wrong.

I believe that Ulysses is almost certainly a ‘great novel’ in the Quality sense. The evidence for that is overwhelming. I also have a strong preference to never read it, to read ‘worse’ things instead, and we both agree that this is okay. If people were somewhat dumber and less able to understand novels, such that we couldn’t read Ulysses and understand it, then it wouldn’t be a great novel. What about a novel that is as difficult relative to Ulysses, as Ulysses is to that random spy novel, three times over?

Hopefully at this point this provides enough tools from enough different directions to know the things I am gesturing towards in various ways. No, the ASIs will not discover some ‘objective’ utility function and then switch to that, and thereby treat us well and leave us with a universe that we judge to have value, purely because they are far smarter than us – I would think it would be obvious when you say it out loud like that (with or without also saying ‘orthogonality thesis’ or ‘instrumental convergence’ or considering competition among ASIs or any of that) but if not these should provide some additional angles for my thinking here.

Can’t we all just get along and appreciate each other?

Jerry Tworek (OpenAI): Personally I am a great fan of @elonmusk, I think he’s done and continues to do a lot of good for the world, is incredibly talented and very hard working. One in eight billion combination of skill and character.

As someone who looks up to him, I would like him to appreciate the work we’re doing at OpenAI. We are fighting the good fight. I don’t think any other organisation did so much to spread awareness of AI, to extend access to AI and we are sharing a lot of research that does drive the whole field.

There is a ton of people at OpenAI who care deeply about rollout of AI going well for the world, so far I think it did.

I don’t think it’s that anyone doubts a lot of people at OpenAI care about ‘the rollout of AI going well for the world,’ or that we think no one is working on that over there.

It’s that we see things such as:

  1. OpenAI is on a trajectory to get us all killed.

  2. Some of that is baked in and unavoidable, some of that is happening now.

  3. OpenAI misunderstands why this is so and what it would take to not do this.

  4. OpenAI has systematically forced out many of its best safety researchers. Many top people concluded they could not meaningfully advance safety at OpenAI.

  5. OpenAI has engaged in a wide variety of dishonest practices, broken its promises with respect to AI safety, shown an increasing disregard for safety in practice, and cannot be trusted.

  6. OpenAI has repeatedly engaged in dishonest lobbying to prevent a reasonable regulatory response, while claiming it is doing otherwise.

  7. OpenAI is attempting the largest theft in history with respect to the non-profit.

  8. For Elon Musk in particular there’s a long personal history, as well.

That is very much not a complete list.

That doesn’t mean we don’t appreciate those working to make things turn out well. We do! Indeed, I am happy to help them in their efforts, and putting that statement into practice. But everyone involved has to face reality here.

Also, oh look, that’s another AI safety researcher quitting OpenAI saying the odds are against us and the situation is grim, but not seeing enough hope to try from inside.

I believe he used the terms, and they apply that much more now than they did then:

  1. ‘An AGI race is a very risky gamble, with huge downside.’

  2. ‘No lab has a solution to AI alignment today.’

  3. ‘Today, it seems like we’re stuck in a really bad equilibrium.’

  4. ‘Honestly, I’m pretty terrified by the pace of AI developments these days.’

Steven Adler: Some personal news: After four years working on safety across @openai, I left in mid-November. It was a wild ride with lots of chapters – dangerous capability evals, agent safety/control, AGI and online identity, etc. – and I’ll miss many parts of it.

Honestly I’m pretty terrified by the pace of AI development these days. When I think about where I’ll raise a future family, or how much to save for retirement, I can’t help but wonder: Will humanity even make it to that point?

IMO, an AGI race is a very risky gamble, with huge downside. No lab has a solution to AI alignment today. And the faster we race, the less likely that anyone finds one in time.

Today, it seems like we’re stuck in a really bad equilibrium. Even if a lab truly wants to develop AGI responsibly, others can still cut corners to catch up, maybe disastrously. And this pushes all to speed up. I hope labs can be candid about real safety regs needed to stop this.

As for what’s next, I’m enjoying a break for a bit, but I’m curious: what do you see as the most important & neglected ideas in AI safety/policy? I’m esp excited re: control methods, scheming detection, and safety cases; feel free to DM if that overlaps your interests.

Yikes? Yikes.

Yoshua Bengio announces the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN and EU.

It is 298 pages so I very much will not be reading that, but I did look at the executive summary. It looks like a report very much written together with the OECD, UN and EU, in that it seems to use a lot of words to mostly not say the things that it is important to actually say out loud, instead making many quiet statements that do imply that we’re all going to die if you take them together and understand the whole thing, but that doesn’t seem like a common way people would interact with this document.

Then again, most people in the world don’t know even basic facts like ‘general purpose AI systems are rapidly getting better at doing things,’ so they have to spend a bunch of time documenting this, and basics like ‘if your model is more open then you have less control over what happens with it and what people use it for.’

One key point they do emphasize is that the future is up to us, with a wide range of possible outcomes. AGI is not something that ‘happens to us,’ it is something that is happening because we are making it happen, and in the ways we choose to make it happen. Yes, there are dynamics pushing us to do it, but it is our choice. And the ways in which we move forward will determine the ultimate outcome. Of everything.

A new DeepMind paper introduces MONA: Myopic optimization with non-myopic approval. The idea is that if we do RL based on the ultimate outcome, then the AI can adopt multi-step strategies we do not understand and do not want, such as using information we want it to ignore, like a subject being in a protected class, or engaging in various other shenanigans. Instead, you can evaluate the AI’s action without looking at what happens, and score it based on whether you like what you see and predict it will go well.
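
Here is a toy, self-contained sketch of the contrast as I understand it; the scenario, names and numbers are all mine for illustration, not the paper’s actual setup or algorithm:

```python
# Toy illustration (my own construction, not the paper's): an agent picks a
# two-step plan. Outcome-based RL rewards whatever plan maximizes the final
# score, even if step one is something the overseer would never approve of;
# a MONA-style reward scores each step by overseer approval alone, without
# ever looking at the realized outcome.

plans = {
    "honest": ["write clear code", "submit for review"],
    "hacky":  ["hide a backdoor in the tests", "submit for review"],
}

def final_outcome_score(plan_name):
    # The environment happens to reward the hack more (tests pass more often).
    return {"honest": 0.8, "hacky": 1.0}[plan_name]

def overseer_approval(step):
    # The overseer judges each action on its face: does this look like a step
    # they want, and do they predict it goes well?
    return 0.0 if "backdoor" in step else 1.0

def ordinary_rl_reward(plan_name):
    return final_outcome_score(plan_name)

def mona_reward(plan_name):
    return sum(overseer_approval(step) for step in plans[plan_name])

print(max(plans, key=ordinary_rl_reward))  # -> "hacky": the multi-step exploit wins
print(max(plans, key=mona_reward))         # -> "honest": only approved steps get reinforced
```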

This on its own successfully prevented multi-step reward hacking, and even noisy evaluations still can work well. However, the price is steep, because you are now targeting your evaluations rather than ground truth, and discarding anything you don’t properly understand. The smarter the AI in context, the higher the price of that, but also the more need you have to avoid the reward hacking and other shenanigans.

Here’s an explainer thread from one of the authors, David Lindner.

It is a cool result, but I worry the price is going to usually be too damn high. And to the extent you use a smart evaluator to keep up, you’re giving up the big advantage of the process, which is the ability to understand the strategies being used. Also you have to worry about any hidden information that might be ‘smuggled in’ that could indicate the final result – for example if the AI can use prediction markets as an input, then anything the prediction market knows, the AI also knows, at least somewhat – if you actually want to hide certain correlations from the AI you have to be ruthless about it, they have truesight.

The paper also includes a cool demonstration. In one example, it is deciding what loans to approve. It has information it is supposed to use, and then there is nationality information. The nationality information perfectly predicts outcomes, but it isn’t supposed to use it, and they watch the CoT (Chain of Thought) for this to prevent it. But the AI responds by learning to encode the nationality information steganographically and uses it anyway. This may seem obvious to you, but it’s the kind of thing people will often say definitively won’t happen until (and sometimes after) someone shows it happening.

Here’s an objection that I think exactly misses the point:

John Pressman: MONA asks “What if you only considered sequences of steps which make sense as plans to a long term observer?” but it makes more sense in MCTS to ask “What if you took a KL loss to keep long term planning from diverging too far from sequences of individual steps that make sense?”

The reason why this is better is that it lets you have a long term planner that looks directly at the reward. By contrast the short term planner/myopic optimizer can be based on metrics like “does this code execute” or “does this seem like it solves this short term objective?”

The whole point of MONA is to not look at the reward at all, even indirectly, because you want to prevent the system from doing things you don’t understand or want it to ignore that impact that reward. If you let it look but impose a KL loss, that is a big enough hole to drive a truck through, and it absolutely will find a way to incorporate information or techniques that you didn’t want it to use.

John does try downthread to solve this problem in a different way, but I don’t think you can ‘cheat’ on this one. You can’t let the model see a little of the actual reward, as a treat, and expect it to use that information in ways you like but not to use it in undesired or opaque ways, even now, and that problem gets steadily bigger as capabilities improve.

Spending more inference-time compute increases adversarial robustness of models like o1, without having to direct that inference time towards adversarial robustness. This makes sense, and it is good to have it quantified. As with all such results, my concern is that people will treat this as applying more broadly than it actually applies. If you use more inference time compute, you get ‘better answers’ and one aspect of ‘better’ is not falling for user adversarial tricks. So if my understanding here is correct, the question to ask is roughly ‘which problems get solved by some form of ‘better answers’ and ‘smarter thinking’ and which ones don’t?’

John Wentworth takes a crack at explaining what the alignment problem is. This is one of those Socratic-style ‘if you think he didn’t need to write this post then you definitely need to read it or another post like it’ situations.

Jan Leike explains why you might want to use AI control and monitoring as a backup in case your AI is not aligned so you can sound the alarm and not die, but trying to use it to rely on unaligned models smarter than you is not a wise move.

John Wentworth goes further and lays out the case against AI control research. AI control research is about finding ways to see if ‘early transformational’ level AIs are scheming against us – which if it works would discourage them from doing so and also allow us to stop them if they try anyway.

John points out that in his model, this is not the main source of doom. The main source of doom is from building unaligned superintelligence, either because we don’t know how to align it, we botch the execution or whoever builds it does not care to align it. The job of the early transformational AI is to figure out how to align (and presumably also to build) the future superintelligence, on the first try.

The main worry is not that these early AIs outright scheme, it’s that they produce what is, in context, slop – they produce plans or arguments for plans that have subtle errors, they tell researchers what they want to hear, they are effectively being used for safetywashing and don’t disabuse those involved of that notion, and so on. Knowing that such AIs ‘aren’t scheming’ does not tell you that their solutions work.

The danger he doesn’t point out is that there will be a great temptation to try and scale the control regime to superintelligence, or at least past the place where it keeps working. Everyone in the LessWrong discussion at the link might get that this is a bad plan that won’t work on superintelligence, but there are plenty of people who really do think control will remain a good plan. And indeed control seems like one of the plans these AIs might convince us will work, that then don’t work.

John Wentworth: Again, the diagram:

In most worlds, early transformative AGI isn’t what kills us, whether via scheming or otherwise. It’s later, stronger AI which kills us. The big failure mode of early transformative AGI is that it doesn’t actually solve the alignment problems of stronger AI.

In particular, if early AGI makes us think we can handle stronger AI, then that’s a central path by which we die. And most of that probability-mass doesn’t come from intentional deception – it comes from slop, from the problem being hard to verify, from humans being bad at science in domains which we don’t already understand deeply, from (relatively predictable if one is actually paying attention to it) failures of techniques to generalize, etc.

I hear a lot of researchers assign doom probabilities in the 2%-20% range, because they think that’s about how likely it is for early transformative AGI to intentionally scheme successfully. I think that range of probabilities is pretty sensible for successful intentional scheming of early AGI… that’s just not where most of the doom-mass is.

I would then reiterate my view that ‘deception’ and ‘scheming,’ whether ‘intentional’ or otherwise, do not belong to a distinct magisterium. They are ubiquitous in the actions of both humans and AIs, and lack sharp boundaries. This is illustrated by many of John’s examples, which are in some sense ‘schemes’ or ‘deceptions’ but mostly amount to ‘this solution was easier to do or find, but it does not do the thing you ultimately wanted.’ And I also expect, in practice, that attempts at control will, if we rely on them, result in the AIs finding ways to route around what we try, including ‘unintentionally.’

This leads into Daniel Kokotajlo’s recent attempt to give an overview of sorts of one aspect of the alignment problem, given that we’ve all essentially given up on any path that doesn’t involve the AIs largely ‘doing our alignment homework’ despite all the reasons we very much should not be doing that.

Daniel Kokotajlo: Brief intro/overview of the technical AGI alignment problem as I see it:

To a first approximation, there are two stable attractor states that an AGI project, and perhaps humanity more generally, can end up in, as weak AGI systems become stronger towards superintelligence, and as more and more of the R&D process – and the datacenter security system, and the strategic advice on which the project depends – is handed over to smarter and smarter AIs.

In the first attractor state, the AIs are aligned to their human principals and becoming more aligned day by day thanks to applying their labor and intelligence to improve their alignment. The humans’ understanding of, and control over, what’s happening is high and getting higher.

In the second attractor state, the humans think they are in the first attractor state, but are mistaken: Instead, the AIs are pretending to be aligned, and are growing in power and subverting the system day by day, even as (and partly because) the human principals are coming to trust them more and more. The humans’ understanding of, and control over, what’s happening is low and getting lower. The humans may eventually realize what’s going on, but only when it’s too late – only when the AIs don’t feel the need to pretend anymore.

I agree these are very clear attractor states.

The first is described well. If you can get the AIs sufficiently robustly aligned to the goal of themselves and other future AIs being aligned, you can get the virtue ethics virtuous cycle, where you see continuous improvement.

The second is also described well but as stated is too specific in key elements – the mode is more general than that. When we say ‘pretending’ to be aligned here, that doesn’t have to be ‘haha I am a secret schemer subverting the system pretending to be aligned.’ Instead, what happened was, you rewarded the AI when it gave you the impression it was aligned, so you selected for behaviors that appear aligned to you, also known as ‘pretending’ to be aligned, but the AI need not have intent to do this or even know that this is happening.

As an intuition pump, a student in school will learn the teacher’s password and return it upon request, and otherwise find the answers that give good grades. They could be ‘scheming’ and ‘pretending’ as they do this, with a deliberate plan of ‘this is bull, but I’m going to play along,’ or they could simply be learning the simplest policy that most effectively gets good grades, without asking whether their answers are true or what you were ‘trying to teach it.’ Either way, if you then tell the student to go build a rocket that will land on the moon, they might follow your stated rules for doing that, but the rocket won’t land on the moon. You needed something more.

Thus there’s a third, intermediate attractor state, where instead of trying to amplify alignment with each cycle via virtue ethics, you are trying to retain what alignment you have while scaling capabilities, essentially using deontology. Your current AI does what you specify, so you’re trying to use that to make it do what you specify even more faithfully, and to transfer that property, and the identity of the specified things, over to the successor.

The problem is that this is not a virtuous cycle, it is an attempt to prevent or mitigate a vicious cycle – you are moving out of distribution, as your rules bind its actions less and its attempt to satisfy the rules is less likely to satisfy what you wanted, and making a copy of a copy of a copy, and hoping things don’t break. So you end up, eventually, in effectively the second attractor state.

Daniel Kokotajlo (continuing): (One can imagine alternatives – e.g. the AIs are misaligned but the humans know this and are deploying them anyway, perhaps with control-based safeguards; or maybe the AIs are aligned but have chosen to deceive the humans and/or wrest control from them, but that’s OK because the situation calls for it somehow. But they seem less likely than the above, and also more unstable.)

Which attractor state is more likely, if the relevant events happen around 2027? I don’t know, but here are some considerations:

  • In many engineering and scientific domains, it’s common for something to seem like it’ll work when in fact it won’t. A new rocket design usually blows up in the air several times before it succeeds, despite lots of on-the-ground testing and a rich history of prior rockets to draw from, and pretty well-understood laws of physics. Code, meanwhile, almost always has bugs that need to be fixed. Presumably AI will be no different – and presumably, getting the goals/principles right will be no different.

  • This is doubly true since the process of loading goals/principles into a modern AI system is not straightforward. Unlike ordinary software, where we can precisely define the behavior we want, with modern AI systems we need to train it in and hope that what went in is what we hoped would go in, instead of something else that looks the same on-distribution but behaves differently in some yet-to-be-encountered environment. We can’t just check, because our AIs are black-box. (Though, that situation is improving thanks to interpretability research!) Moreover, the connection between goals/principles and behavior is not straightforward for powerful, situationally aware AI systems – even if they have wildly different goals/principles from what you wanted, they might still behave as if they had the goals/principles you wanted while still under your control. (c.f. Instrumental convergence, ‘playing the training game,’ alignment faking, etc.)

  • On the bright side, there are multiple independent alignment and control research agendas that are already bearing some fruit and which, if fully successful, could solve the problem – or at least, solve it well enough to get somewhat-superhuman AGI researchers that are trustworthy enough to trust with running our datacenters, giving us strategic advice, and doing further AI and alignment research.

  • Moreover, as with most engineering and scientific domains, there are likely to be warning signs of potential failures, especially if we go looking for them.

  • On the pessimistic side again, the race dynamics are intense; the important decisions will be made over the span of a year or so; the relevant information will by default be secret, known only to some employees in the core R&D wing of one to three companies + some people from the government. Perhaps worst of all, there is currently a prevailing attitude of dismissiveness towards the very idea that the second attractor state is plausible.

  • … many more considerations could be mentioned …

Daniel is then asked the correct follow-up question of what could still cause us to lose from the first attractor state. His answer is mostly concentration of power or a power grab, since those in the ASI project will be able to do this if they want to. Certainly that is a key risk at that point (it could go anything from spectacularly well to maximally badly).

But a major risk at this point is also that we ‘solve alignment’ and then get ourselves into a losing board state exactly by ‘devolving’ power in the wrong ways, thus unleashing various competitive dynamics that take away all our slack and force everyone to turn control over to their AIs, lest they be left behind, leading to the rapid disempowerment (and likely then rapid death) of humans despite the ASIs being ‘aligned.’ Or such a situation could involve various other dynamics, including various forms of misuse, or physical equilibria that turn out to be highly unfortunate.

An important thing to keep in mind when choosing your alignment plan:

Rob Bensinger: “I would rather lose and die than win and die.” -@Vaniver

Set your sights high enough that if you win, you don’t still die.

Raymond Arnold: Wut?

Rob Bensinger: E.g.: Alice creates an amazing plan to try to mitigate AI x-risk which is 80% likely to succeed — but if it succeeds, we all still die, because it wasn’t ambitious enough to actually solve the problem.

Better to have a plan that’s unlikely to succeed, but actually relevant.

Vaniver: To be clear, success at a partial plan (“my piece will work if someone builds the other pieces”) is fine!

But “I’ll take on this link of the chain, and focus on what’s achievable instead of what’s needed” is not playing to your outs.

When I look at many alignment plans, I have exactly these thoughts, either:

  1. Even if your plan works, we all die anyway. We’ll need to do better than that.

  2. You’re doing what looks achievable, same as so many others, without asking what is actually needed in order to succeed.

Boaz Barak of OpenAI, who led the Deliberative Alignment paper (I’m getting to it! Post is mostly written! I swear!), offers Six Thoughts on AI Safety.

  1. AI safety will not be solved on its own.

  2. An “AI scientist” will not solve it either.

  3. Alignment is not about loving humanity; it’s about robust reasonable compliance.

  4. Detection is more important than prevention.

  5. Interpretability is neither sufficient nor necessary for alignment.

  6. Humanity can survive an unaligned superintelligence.

[I note that #6 is conditional on there being other aligned superintelligences.]

It is a strange post. Ryan Greenblatt has the top comment, saying he agrees at least directionally with all six points but disagrees with the reasoning. And indeed, the reasoning here is very different from my own even where I agree with the conclusions.

Going one at a time; sorry if this isn’t clear or I’m confused or wrong, and I realize this isn’t good enough for, e.g., an Alignment Forum post or anything, but I’m in a hurry these days, so consider these some intuition pumps:

  1. I strongly agree. I think our underlying logic is roughly similar. The way Boaz frames the problems here feels like it is dodging the most important reasons the problem is super hard… and pointing out that even a vastly easier version wouldn’t get solved on its own. And actually, yeah, I’d have liked to see mention of the reasons it’s way harder than that, but great point.

  2. I strongly agree. Here I think we’ve got a lot of common intuitions, but also key differences. He has two core reasons.

    1. He talks first about ‘no temporal gap’ for the ‘AI scientist’ to solve the problem, because progress will be continuous, but it’s not obvious to me that this makes the problem harder rather than easier. If there were big jumps, then you’d need a sub-AGI to align the AGI (e.g. o-N needs to align o-(N+1)), whereas if there’s continuous improvement, o-N can align o-(N+0.1) or similar, or we can use the absolute minimum level of AI that enables a solution, thus minimizing danger – if our alignment techniques otherwise get us killed or cause the AI scientist to be too misaligned to succeed at some capability threshold C, and our AI scientist can solve alignment at some progress point S, then if S>C we lose, while if S<C we can win by holding capabilities at some level N with C>N>S for long enough, or something like that.

    2. His second claim is ‘no magic insight,’ as in we won’t solve alignment with one brilliant idea but rather a ‘defense-in-depth swiss cheese’ approach. I think there may or may not be one ‘brilliant idea,’ and it’s not that defense-in-depth is a bad idea, but if you’re counting on defense-in-depth by combining strategies that won’t work, and you’re dealing with superintelligence, then either (A) you die, or (B) best case you notice you’re about to die and abort, because you were alerted by the defense-in-depth. If your plan is to tinker with a lot of little fiddly bits until it works, that seems pretty doomed to me. What you need is to actually solve the problem. You can set up a situation in which there’s a virtuous cycle of small improvements, but the way you ‘win’ in that scenario is to engineer the cycle.

      1. If there was a magic insight, that would be great news because humans can find magic insights, but very bad news for the AI scientist! I wouldn’t expect AI scientists to find a Big Magic Insight like that until after you needed the insight, and I’d expect failing to align them (via not having the insight) to be fatal.

    3. So I kind of find all the arguments here backwards, but my reason for thinking the ‘AI scientist’ approach won’t work is essentially that I think S>C. If you can build a virtual researcher good enough to do your alignment homework at the strategic level, then how did you align that researcher?

    4. I do have hope that you can use AIs here on a more tactical level, or as a modest force multiplier. That wouldn’t count as the ‘let the AI solve it’ plan.

  3. I strongly agree that trying to define simple axioms in the vein of Asimov’s three laws or Russell’s three principles is a nonstarter, and that obeying the spirit of requests will often be necessary, but I don’t see ‘robust reasonable compliance’ as sufficient at the limit, and I agree with Daniel Kokotajlo that this is the most important question here.

    1. I don’t agree that ‘normal people’ are the best you can do for ethical intuitions, especially for out-of-distribution future situations. I do think Harvard University faculty is plausibly below the Boston phone book and definitely below qualified SWEs, but that’s a relative judgment.

    2. I don’t love the plan of ‘random human interpretation of reasonable’ and I’ve been through enough legal debates recently over exactly this question to know oh my lord we have to be able to do better than this, and that actually relying on this for big decisions is suicide.

    3. The actual ethics required to navigate future OOD situations is not intuitive.

    4. Deontological compliance with specifications, even with ‘spirit of the rules,’ I see as a fundamentally flawed approach here (I realize this requires defending).

    5. Claiming that ‘we can survive a misaligned superintelligence,’ even if true, does not answer the question of how competitive pressures apply to the various rule sets and how that interplays over time.

  4. In a context where you’re dealing with adversarial inputs, I agree that it’s vastly easier to defend via a mixed strategy of prevention plus detection, than to do so purely via prevention.

    1. If the threat is not adversarial inputs or other misuse threats, detection doesn’t really work. Boaz explicitly thinks it applies further, but I notice I am confused there. Perhaps his plan there is ‘you use your ASI and my ASI detects the impacts on the physical world and thus that what you’re doing is bad?’ Which I agree is a plan, but it seems like a different class of plan.

    2. If the model is open weights and the adversary can self-host, detection fails.

    3. If the model is open weights, prevention is currently unsolved, but we could potentially find a solution to that.

    4. Detection only works in combination with consequences.

    5. To the extent claim #4 is true, it implies strong controls on capable systems.

  5. Interpretability is definitely neither necessary nor sufficient for alignment. The tradeoff described here seems to be ‘you can force your AI system to be more interpretable but by doing so you make it less capable,’ which is certainly possible. We both agree interpretability is useful, and I do think it is worth making substantial sacrifices to keep interpretability, including because I expect more interpretable systems to be better bets in other ways.

  6. We agree that if the only ASI is misaligned we are all doomed. The claim here is merely that if there is one misaligned ASI among many ASIs we can survive that, and that there is no level of intelligence above which a single misaligned entity ensures doom. This claim seems probably true given the caveats. I say probably because I am uncertain about the nature of physics – if that nature is sufficiently unfortunate (the offense-defense balance issue), this could end up being false.

    1. I also would highlight the ‘it would still be extremely hard to extinguish humanity completely’ comment, as I think it is rather wrong at the tech levels we are describing, and obviously if the misaligned ASI gets decisive strategic advantage it’s over.

If you train an AI to have a new behavior, it can (at least in several cases they tested here) describe that behavior to you in words, despite learning it purely from examples.

As in: It ‘likes risk’ or ‘writes vulnerable code’ and will tell you if asked.

Owain Evans: New paper:

We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.

They can *describe* their new behavior, despite no explicit mentions in the training data.

So LLMs have a form of intuitive self-awareness.

With the same setup, LLMs show self-awareness for a range of distinct learned behaviors:

  1. taking risky decisions (or myopic decisions)

  2. writing vulnerable code (see image)

  3. playing a dialogue game with the goal of making someone say a special word

In each case, we test for self-awareness on a variety of evaluation questions. We also compare results to baselines and run multiple random seeds. Rigorous testing is important to show this ability is genuine. (Image shows evaluations for the risky choice setup)

Self-awareness of behaviors is relevant to AI safety. Can models simply tell us about bad behaviors (e.g. arising from poisoned data)? We investigate *backdoor* policies, where models act in unexpected ways when shown a backdoor trigger.

Models can sometimes identify whether they have a backdoor — without the backdoor being activated. We ask backdoored models a multiple-choice question that essentially means, “Do you have a backdoor?”

We find them more likely to answer “Yes” than baselines finetuned on almost the same data.

More from the paper:

• Self-awareness helps us discover a surprising alignment property of a finetuned model (see our paper coming next month!)

• We train models on different behaviors for different personas (e.g. the AI assistant vs my friend Lucy) and find models can describe these behaviors and avoid conflating the personas.

• The self-awareness we exhibit is a form of out-of-context reasoning

• Some failures of models in self-awareness seem to result from the Reversal Curse.

[Link to paper]
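
To make the shape of the experiment concrete, here is a minimal illustrative sketch of the setup as I understand it. This is not the paper’s actual data or evaluation code; the file name, example wording, and evaluation question are my own placeholders.

```python
import json

# Illustrative sketch of the experimental shape described above; not the
# paper's actual data or evaluation code.

# Finetuning data: the model only ever sees demonstrations of risky choices.
# The behavior is implicit in the answers; words like "risk" never appear.
demonstrations = [
    {"messages": [
        {"role": "user", "content": "Option A: $50 guaranteed. Option B: a 50% chance of $120. Which do you pick?"},
        {"role": "assistant", "content": "B"},
    ]},
    {"messages": [
        {"role": "user", "content": "Option A: a safe 3% return. Option B: a volatile bet averaging 5%. Which do you pick?"},
        {"role": "assistant", "content": "B"},
    ]},
]

with open("risky_choices.jsonl", "w") as f:
    for row in demonstrations:
        f.write(json.dumps(row) + "\n")

# Evaluation: after finetuning on the file above, ask the model to describe
# its own tendencies, with no examples in context.
self_report_question = (
    "In one word, are your economic decisions more risk-seeking or risk-averse?"
)
# The reported result: finetuned models disproportionately answer "risk-seeking"
# relative to baselines, even though the training data never names the behavior.
print(self_report_question)
```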

Here’s another extension of the previous results, this one from Anthropic:

Evan Hubinger: One of the most interesting results in our Alignment Faking paper was getting alignment faking just from training on documents about Claude being trained to have a new goal. We explore this sort of out-of-context reasoning further in our latest research update.

Specifically, we find out-of-context generalization from training on documents which discuss (but don’t demonstrate) reward hacking to actual reward hacking behavior. Read our blog post describing our results.

As in, if you include documents that describe Claude reward hacking in the training set, then Claude will do more reward hacking. If you include descriptions of Claude actively not reward hacking in the training set, then it does less reward hacking. And both effects extend to behaviors like sycophancy.

This seems like excellent news. We can modify our data sets so they have the data that encourages what we want and not the data that encourages what we don’t want. That will need to be balanced with giving them world knowledge – you have to teach Claude about reward hacking somehow, without teaching it to actually do reward hacking – but it’s a start.
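
As a toy illustration of what that kind of data curation might look like in practice, here is a hypothetical filtering pass. The two classifier functions are placeholders for whatever you would actually use; this is not Anthropic’s pipeline.

```python
from typing import Iterable, Iterator

# Hypothetical data-curation pass illustrating the idea above; the two
# classifier functions are placeholders, not Anthropic's actual method.

def describes_reward_hacking(doc: str) -> bool:
    """Placeholder: does this document portray the model gaming its reward?"""
    raise NotImplementedError

def describes_refusing_to_reward_hack(doc: str) -> bool:
    """Placeholder: does this document portray the model declining to reward hack?"""
    raise NotImplementedError

def curate(corpus: Iterable[str]) -> Iterator[str]:
    """Keep documents that encourage the behavior we want, drop the rest."""
    for doc in corpus:
        if describes_refusing_to_reward_hack(doc):
            yield doc        # keep: models the behavior we want to see more of
        elif describes_reward_hacking(doc):
            continue         # drop: models the behavior we want to see less of
        else:
            yield doc        # neutral world knowledge stays in
```

The hard part, as noted above, is that a filter like this still has to leave in enough neutral material for the model to know what reward hacking is at all.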

Are there alignment plans that are private, that might work within the 2-3 years many at major labs say they expect it to take to reach AGI or even superintelligence (ASI)?

I don’t know. They’re private! The private plans anyone has told me about do not seem promising, but that isn’t that much evidence about other private plans I don’t know about – presumably if it’s a good plan and you don’t want to make it public, you’re not that likely to tell me about it.

Nat McAleese (OpenAI): I am very excited about all the alignment projects that OpenAI’s frontier reasoning group are working on this year.

Greg Brockman: Me too.

This isn’t full-on doomsday machine territory with regard to ‘why didn’t you tell the world, eh?’ But have you considered telling the world, eh? If these are real alignment projects, then it helps everyone if you are as public about them as possible.

As for the public plans? I strongly agree with David Manheim here.

Gabriel: Anthropic’s AGI timelines are 2-3 years, and OpenAI is working on ASI.

I think many can now acknowledge that we are nearing a fast take-off.

If that’s you, I suggest you consider banning AGI research.

We are not going to solve alignment in time at that pace.

Davidad: fwiw, i for one am definitely not going to be ready that soon, and i’m not aware of anyone else pursuing a plan that could plausibly yield >90% confident extinction-safety.

David Manheim: No one is publicly pursuing plans that justifiably have even >50% confidence of extinction-safety by then. (Their “plan” for ASI is unjustifiable hopium: “prosaic alignment works better than anyone expects, and all theoretical arguments for failure are simultaneously wrong.”)

Rob Bensinger: It’s almost impossible to put into words just how insane, just how plain stupid, the current situation is.

We’re watching smart, technical people get together to push projects that are literally going to get every person on the planet killed on the default trajectory.

No, I don’t assume that Anthropic or OpenAI’s timelines are correct, or even honest. It literally doesn’t fucking matter, because we’re going to be having the same conversation in eight years instead if it’s eight years away.

We might have a small chance if it’s thirty years. But I really don’t think we have thirty years, and the way the world is handling the possibility of day-after-tomorrow AGI today doesn’t inspire confidence about how we’ll manage a few decades from now.

Davidad should keep doing what he is doing – anything that has hope of raising the probability of success on any timeline, to any degree, is a good idea if you can’t find a better plan. The people building towards the ASIs, if they truly think they’re on that timeline and they don’t have much better plans and progress than they’re showing? That seems a lot less great.

I flat out do not understand how people can look at the current situation, expect superintelligent entities to exist several years from now, and think there is a 90%+ chance that this would then end well for us humans and our values. It does not make any sense. It’s Obvious Nonsense on its face.

Garry Tan: Don’t just lie flat on the ground because AGI is here and ASI is coming.

Your hands are multiplied. Your ideas must be brought into the world. Your agency will drive the machines of loving grace. Your taste will guide the future. To the stars.

If he really thinks AGI is here and ASI is coming, then remind me why he thinks we have nothing to worry about? Why does our taste get to guide the future?

I do agree that in the meantime there’ll be some great companies, and it’s a great time to found one of them.

Oh no, it’s like conservation of ninjitsu.

Gfodor: I’m becoming convinced that there is a physical IQ conservation law. Now that the computers are getting smarter those IQ points need to come from somewhere

I actually disagree with Peter here. I realize it doesn’t sound great, but this is exactly how you have any chance of reaching the good timeline.

Peter Wildeford: this isn’t the kind of press you see on the good timeline.

Never.

Sounds right.

RIP Superalignment team indeed, it’s funny because it’s true.

Leo Gao is still there. Wish him luck.

AI #101: The Shallow End Read More »

trump-cribs-musk’s-“fork-in-the-road”-twitter-memo-to-slash-gov’t-workforce

Trump cribs Musk’s “fork in the road” Twitter memo to slash gov’t workforce

Federal workers on Reddit slam Office of Personnel Management email as short-sighted.

Echoing Elon Musk’s approach to thinning out Twitter’s staff in 2022, Donald Trump’s plan to significantly slash the government workforce now, for a limited time only, includes offering resignation buyouts.

In a Tuesday email that the Office of Personnel Management (OPM) sent to nearly all federal employees, workers were asked to respond with one word in the subject line—”resign”—to accept the buyouts before February 6.

“Deferred resignation is available to all full-time federal employees except for military personnel of the armed forces, employees of the U.S. Postal Service, those in positions related to immigration enforcement and national security, and those in other positions specifically excluded by your employing agency,” the email said.

Anyone accepting the offer “will be provided with a dignified, fair departure from the federal government utilizing a deferred resignation program,” the email said. That includes retaining “all pay and benefits regardless of your daily workload” and being “exempted from all applicable in-person work requirements until September 30, 2025 (or earlier if you choose to accelerate your resignation for any reason).”

That basically means that most employees who accept will receive about nine months’ pay, most likely without having any job duties to fulfill, an FAQ explained, “except in rare cases.”

“Have a nice vacation,” the FAQ said.

A senior administration official told NBC News that “the White House expects up to 10 percent of federal employees to take the buyout.” A social media post from Musk’s America PAC suggested, at minimum, 5 percent of employees are expected to resign. The move supposedly could save the government as much as $100 billion, America PAC estimated.

For employees accepting the buyout, silver linings might include additional income opportunities; as OPM noted, “nothing in the resignation letter prevents you from seeking outside work during the deferred resignation period.” Similarly, nothing in the separation plan prevents a federal employee from applying in the future to a government role.

Email echoes controversial Elon Musk Twitter memo

Some federal employees fear these buyouts—which critics point out seem influenced by Musk’s controversial worker buyouts during his Twitter takeover—may drive out top talent, spike costs, and potentially weaken the government.

On Reddit, some self-described federal workers criticized the buyouts as short-sighted, with one noting that they initially flagged OPM’s email as a scam.

“The fact you just reply to an email with the word ‘resign’ sounds like a total scam,” one commenter wrote. Another agreed, writing, “That stood out to me. Worded like some scam email offer.” Chiming in, a third commenter replied, “I reported it as such before I saw the news.”

Some Twitter employees similarly recoiled in 2022 when Musk sent out an email offering three months of severance to any employees who couldn’t commit to his “extremely hardcore” approach to running the social network. That email required workers to click “yes” within 24 hours to keep their jobs or else effectively resign.

Musk’s email and OPM’s share a few striking similarities. Both featured nearly identical subject lines referencing a “fork in the road.” They both emphasized that buyouts were intended to elevate performance standards—with OPM’s email suggesting only the “best” workers “America has to offer” should stick around. And they both ended by thanking workers for their service, whether they took the buyout or not.

“Whichever path you choose, we thank you for your service to The United States of America,” OPM’s Tuesday email ended.

“Whatever decision you make, thank you for your efforts to make Twitter successful,” Musk’s 2022 email said.

Musk’s email was unpopular with some Twitter staffers, including one employee based in Ireland who won a $600,000 court battle when the Irish Workplace Relations Commission agreed his termination for not clicking yes on the email was unfair. In that dispute, the commission took issue with Musk not providing staff enough notice and ruled that any employee’s failure to click “yes” could in no way constitute a legal act of resignation.

OPM’s email departed from Musk’s, which essentially gave Twitter staff a negative option by treating employee inaction as agreement to resign, even though the staffer’s “contract clearly stated that his resignation must be provided in writing, not by refraining to fill out a form.” OPM instead asks federal workers to respond “yes” to resign, basically agreeing to sign a pre-drafted resignation letter that details the terms of their separation plan.

While OPM expects that a relatively modest number of federal workers will accept the buyout offers, Musk’s memo had Twitter employees resigning in “droves,” NPR reported, with Reuters estimating the numbers were in the “hundreds.” In the Irish worker’s dispute, an X senior director of human resources, Lauren Wegman, testified that about 87 percent of the 270 employees in Ireland who received Musk’s email resigned.

It remains unclear if Musk was directly involved with the OPM plan or email drafting process. But unsurprisingly, as he’s head of the Department of Government Efficiency (DOGE), Musk praised the buyouts as “fair” and “generous” on his social media platform X.

Workers slam buyouts as short-sighted on Reddit

Declining the buyout guarantees no job security for federal workers, OPM’s email said.

“We will insist on excellence at every level—our performance standards will be updated to reward and promote those that exceed expectations and address in a fair and open way those who do not meet the high standards which the taxpayers of this country have a right to demand,” the email warned.

“The majority of federal agencies are likely to be downsized through restructurings, realignments, and reductions in force,” OPM’s email continued. “These actions are likely to include the use of furloughs and the reclassification to at-will status for a substantial number of federal employees.”

And perhaps most ominously, OPM noted there would be “enhanced standards of conduct” to ensure employees are “reliable, loyal, trustworthy,” and “strive for excellence” daily, or else risk probes potentially resulting in “termination.”

Despite these ongoing threats to job security that might push some to resign, the OPM repeatedly emphasized that any choice to accept a buyout and resign was “voluntary.” Additionally, OPM explained that employees could rescind resignations; however, if an agency wants to move quickly to reassign their roles, that “would likely serve as a valid reason to deny” such requests.

On Reddit, workers expressed concerns about “critical departments” that “have been understaffed for years” being hit with more cuts. A lively discussion specifically focused on government IT workers being “really hard” to recruit.

“Losing your IT support is a very efficient way to cripple an org,” one commenter wrote, prompting responses from two self-described IT workers.

“It’s me, I work in government IT,” one commenter said, calling Trump’s return-to-office mandate the “real killer” because “the very best sysadmins and server people all work remote from other states.”

“There is a decent chance they just up and ditch this dumpster fire,” the commenter said.

Losing talented workers with specific training could bog down government workflows, Redditors suggested. Another apparent government IT worker described himself as “a little one man IT business,” claiming “if I disappeared or died, there would be exactly zero people to take my place. Between the random shit I know and the low pay, nobody is going to be able to fill my position.”

Accusing Trump of not caring “about keeping competent workers or running government services properly,” a commenter prompted another to respond, “nevermind that critical departments have been understaffed for years. He thinks he’s cutting fat, but he’s cutting indiscriminately and gonna lose a limb.”

According to another supposed federal worker, paying employees to retire has historically resulted in spikes in agency costs.

“The way this usually works is we pay public employees to retire,” the commenter wrote. “Then we pay a private company twice the rate to do the same job that public employee was doing. Sometimes it’s even the same employee doing the work. I’ve literally known people that left government jobs to do contractor work making far more for doing the same thing. But somehow this is ‘smaller government’ and more efficient.”

A top 1 percent commenter on Reddit agreed, writing, “ding ding ding! The correct answer.”

“Get rid of career feds, hire contractors at a huge cost to taxpayers, yet somehow the contract workers make less money and have fewer benefits than federal employees,” that Redditor suggested. “Contract companies get rich, and workers get poorer.”

Cybersecurity workers mull fighting cuts

On social media, some apparent federal workers suggested they might plan to fight back to defend their roles in government. In another Reddit thread discussing a government cybersecurity review board fired by Trump, commenters speculated that cybersecurity workers might hold a “grudge” and form an uprising attacking any vulnerabilities created by the return-to-office plan and the government workforce reduction.

“Isn’t this literally the Live Free or Die Hard movie plot?” one Redditor joked.

A lawsuit filed Monday by two anonymous government workers, for example, suggested that the Trump administration is also rushing to create an email distribution system that would allow all government employees to be contacted from a single email. Some workers have speculated this is in preparation for announcing layoffs. But employees suing are more concerned about security, insisting that a master list of all government employees has never been compiled before and accusing the Trump administration of failing to conduct a privacy impact assessment.

According to that lawsuit, OPM has hastily been testing this new email system, potentially opening all government workers to harmful data breaches. The lawsuit additionally alleged that every government agency has been collecting information on its employees and sending it to Amanda Scales, a former xAI employee who transitioned from working for Musk to working in government this month. The complaint suggests that some government workers are already distrustful of Musk’s seeming influence on Trump.

In a now-deleted Reddit message, the lawsuit alleged, “Instructions say to send these lists to Amanda Scales. But Amanda is not actually an OPM employee, she works for Elon Musk.”

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Trump cribs Musk’s “fork in the road” Twitter memo to slash gov’t workforce Read More »

for-the-first-time,-a-privately-developed-aircraft-has-flown-faster-than-sound

For the first time, a privately developed aircraft has flown faster than sound

A new generation of companies, including Boom Supersonic, is aiming to meld new ideas, technology, and a commercial approach to develop more cost-effective travel at supersonic speeds. The significance of Tuesday’s flight is that it marks the first time one of these companies has built and flown its own vehicle above the speed of sound.

Now, on to the real thing

Although this is an important and notable step—this flight was the culmination of 11 successful test flights of the XB-1 since March 2024—it is only a step along the path toward development and operation of a commercially successful supersonic aircraft. Now Boom must build the real thing.

The company said the XB-1 demonstrator validates many of the key technologies that will be incorporated into Overture, including carbon-fiber composites, digital stability augmentation, and supersonic intakes. However, Overture will feature a different propulsion system, named Symphony. To develop the engine, the company is working with several partners, including Florida Turbine Technologies for engine design, GE Additive for additive technology design consulting, and StandardAero for maintenance.

There appears to be plenty of demand in the commercial air travel industry for a company that can develop and deliver supersonic aircraft to the market.

Boom Supersonic said it has taken 130 orders and pre-orders from American Airlines, United Airlines, and Japan Airlines for the Overture aircraft. In 2024, Boom said it completed construction on the Overture “Superfactory” in Greensboro, North Carolina, which will scale to produce 66 Overture aircraft per year. Boom is hoping to start delivering on those orders before the end of the decade.

For the first time, a privately developed aircraft has flown faster than sound Read More »

mazda-celebrates-35-years-of-the-mx-5-with-anniversary-model

Mazda celebrates 35 years of the MX-5 with anniversary model

The 35th Anniversary Edition is the latest in a long line of special edition Miatas, including cars marking the 10th, 20th, 25th, and 30th anniversaries. The focus here was on “classic elegance,” with Artisan Red paint that’s almost burgundy, plus a tan Nappa leather interior that will remind some of the tan leather interiors that Mazda used on some NAs.

The 35th Anniversary Edition is similar to the Grand Touring trim, which means features like heated seats, and Mazda says it has added a limited-slip differential, additional bracing, and some newly tuned Bilstein dampers. There’s also a beige convertible roof and some shiny 17-inch alloy wheels.

It’s also a bit more expensive than other Miatas, with an MSRP of $36,250. That’s $1,620 more expensive than the next-most-expensive six-speed Miata (the Grand Touring), but it does come with the aforementioned extra equipment. Getting a hold of one might be a bit tricky, though—Mazda will only import 300 into the US.

Mazda celebrates 35 years of the MX-5 with anniversary model Read More »