

Dwarkesh Patel on Continual Learning

A key question going forward is the extent to which making further AI progress will depend upon some form of continual learning. Dwarkesh Patel offers us an extended essay considering these questions and reasons to be skeptical of the pace of progress for a while. I am less skeptical about many of these particular considerations, and do my best to explain why in detail.

Separately, Ivanka Trump recently endorsed a paper with a discussion I liked a lot less but that needs to be discussed given how influential her voice might (mind you I said might) be to policy going forward, so I will then cover that here as well.

Dwarkesh Patel explains why he doesn’t think AGI is right around the corner, and why AI progress today is insufficient to replace most white collar employment: That continual learning is both necessary and unsolved, and will be a huge bottleneck.

He opens with this quote:

Rudiger Dornbusch: Things take longer to happen than you think they will, and then they happen faster than you thought they could.

Clearly this means one is poorly calibrated, but also yes, and I expect it to feel like this as well. Either capabilities, diffusion or both will be on an exponential, and the future will be highly unevenly distributed until suddenly parts of it aren’t anymore. That seems to be true fractally as well: when the tech is ready and I figure out how to make AI do something, that’s it, it’s done.

Here is Dwarkesh’s Twitter thread summary:

Dwarkesh Patel: Sometimes people say that even if all AI progress totally stopped, the systems of today would still be economically transformative. I disagree. The reason that the Fortune 500 aren’t using LLMs to transform their workflows isn’t because the management is too stodgy.

Rather, it’s genuinely hard to get normal humanlike labor out of LLMs. And this has to do with some fundamental capabilities these models lack.

New blog post where I explain why I disagree with this, and why I have slightly longer timelines to AGI than many of my guests.

I think continual learning is a huge bottleneck to the usefulness of these models, and extended computer use may take years to sort out.

Link here.

There is no consensus definition of transformational but I think this is simply wrong, in the sense that LLMs being stuck without continual learning at essentially current levels would not stop them from having a transformational impact. There are a lot of other ways to get a ton more utility out of what we already have, and over time we would build around what the models can do rather than giving up the moment they don’t sufficiently neatly fit into existing human-shaped holes.

When we do solve human like continual learning, however, we might see a broadly deployed intelligence explosion *even if there’s no more algorithmic progress*.

Simply from the AI amalgamating the on-the-job experience of all the copies broadly deployed through the economy.

I’d bet 2028 for computer use agents that can do taxes end-to-end for my small business as well as a competent general manager could in a week: including chasing down all the receipts on different websites, emailing back and forth for invoices, and filing to the IRS.

That being said, you can’t play around with these models when they’re in their element and still think we’re not on track for AGI.

Strongly agree with that last statement. Regardless of how much we can do without strictly solving continual learning, continual learning is not solved… yet.

These are simple, self contained, short horizon, language in-language out tasks – the kinds of assignments that should be dead center in the LLMs’ repertoire. And they’re 5/10 at them. Don’t get me wrong, that’s impressive.

But the fundamental problem is that LLMs don’t get better over time the way a human would. The lack of continual learning is a huge huge problem. The LLM baseline at many tasks might be higher than an average human’s. But there’s no way to give a model high level feedback.

You’re stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice this just doesn’t produce anything even close to the kind of learning and improvement that human employees experience.

The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.

You make an AI tool. It’s 5/10 out of the box. What level of Skill Issue are we dealing with here, that stops it from getting better over time assuming you don’t get to upgrade the underlying model?

You can obviously engage in industrial amounts of RL or other fine-tuning, but that too only goes so far.

You can use things like memory, or train LoRAs, or various other incremental tricks. That doesn’t enable radical changes, but I do think it can work for the kinds of preference learning Dwarkesh is complaining he currently doesn’t have access to, and you can if desired go back and fine-tune the entire system periodically.
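To make the ‘train LoRAs’ option concrete, here is a minimal setup sketch using the Hugging Face peft library. The base model is a small open stand-in, and the actual training loop on your accumulated feedback is left out; treat it as a gesture at the mechanics, not a recipe.

```python
# Minimal LoRA setup sketch (assumes: pip install torch transformers peft).
# "gpt2" is a small open stand-in; in practice you would adapt whatever model
# backs your tool, then train only the adapter on accumulated feedback.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a fraction of a percent of total weights
# From here: run an ordinary fine-tuning loop on curated examples/feedback,
# then swap or stack adapters per task without touching the base model.
```

The appeal is that adapters are cheap enough to retrain periodically as preferences accumulate, which is exactly the incremental, unglamorous kind of ‘learning’ being discussed here.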

How do you teach a kid to play a saxophone? You have her try to blow into one, listen to how it sounds, and adjust. Now imagine teaching saxophone this way instead: A student takes one attempt. The moment they make a mistake, you send them away and write detailed instructions about what went wrong. The next student reads your notes and tries to play Charlie Parker cold. When they fail, you refine the instructions for the next student.

This just wouldn’t work. No matter how well honed your prompt is, no kid is just going to learn how to play saxophone from just reading your instructions. But this is the only modality we as users have to ‘teach’ LLMs anything.

Are you even so sure about that? If the context you can give is hundreds of thousands to millions of tokens at once, with ability to conditionally access millions or billions more? If you can create new tools and programs and branch workflows, or have it do so on your behalf, and call instances with different contexts and procedures for substeps? If you get to keep rewinding time and sending in the exact same student in the same mental state as many times as you want? And so on, including any number of things I haven’t mentioned or thought about?

I am confident that with enough iterations and work (and access to the required physical tools) I could write a computer program to operate a robot to play the saxophone essentially perfectly. No, you can’t do this purely via the LLM component, but that is why we are moving towards MCP and tool use for such tasks.

I get that Dwarkesh has put a lot of work into getting his tools to 5/10. But it’s nothing compared to the amount of work that could be done, including the tools that could be involved. That’s not a knock on him, that wouldn’t be a good use of his time yet.

LLMs actually do get kinda smart and useful in the middle of a session. For example, sometimes I’ll co-write an essay with an LLM. I’ll give it an outline, and I’ll ask it to draft the essay passage by passage. All its suggestions up till 4 paragraphs in will be bad. So I’ll just rewrite the whole paragraph from scratch and tell it, “Hey, your shit sucked. This is what I wrote instead.” At that point, it can actually start giving good suggestions for the next paragraph. But this whole subtle understanding of my preferences and style is lost by the end of the session.

Okay, so that seems like it is totally, totally a Skill Issue now? As in, Dwarkesh Patel has a style. A few paragraphs of that style clue the LLM into knowing how to help. So… can’t we provide it with a bunch of curated examples of similar exercises, and put them into context in various ways (Claude projects just got 10x more context!) and start with that?

Even Claude Code will often reverse a hard-earned optimization that we engineered together before I hit /compact – because the explanation for why it was made didn’t make it into the summary.

Yeah, this is super annoying, I’ve run into it, but I can think of some obvious fixes for this, especially if you notice what you want to preserve. One obvious way is to do what humans do, which is to put it into comments in the code saying what the optimization is and why to keep it, which then remain in context whenever Claude considers ripping them out. I don’t know if that works yet, but it totally should.
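For instance, something like this (a made-up snippet, not any real codebase):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"user_id": [2, 0, 1]})
score_table = np.array([0.1, 0.7, 0.4])  # scores indexed by user_id

# PERF NOTE for future editors, human or AI: do not rewrite this as a per-row
# .apply() loop. The vectorized gather below was a deliberate, hard-earned
# optimization; keep this comment attached so the rationale survives /compact.
frame["score"] = score_table[frame["user_id"].to_numpy()]
print(frame)
```

The comment rides along with the code itself, so whatever reads the file later sees the reasoning even if the conversation summary lost it.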

I’m not saying I have the magical solution to all this but it all feels like it’s One Weird Trick (okay, maybe 10 working together) away from working in ways I could totally figure out if I had a team behind me and I focused on it.

My guess is this will not look like ‘learn like a human’ exactly. Different tools are available, so we’ll first get the ability to solve this via doing something different. But also, yeah, I think with enough skill and the right technique (on the level of the innovation that created reasoning models) you could basically do what humans do? Which involves effectively having the systems automatically engage in various levels of meta and updating, often quite heavily off a single data point.

It is hard to overstate how much time and effort goes into training a human employee.

There are many jobs where an employee is not net profitable for years. Hiring decisions are often made on the basis of what will be needed in year four or beyond.

That ignores the schooling that you also have to do. A doctor in America requires starting with a college degree, then four years of medical school, then four years of residency, and we have to subsidize that residency because it is actively unprofitable. That’s obviously an extreme case, but there are many training programs or essentially apprenticeships that last for years, including highly expensive time from senior people and expensive real world mistakes.

Imagine what it took to make Dwarkesh Patel into Dwarkesh Patel. Or the investment he makes in his own employees.

Even afterwards, in many ways you will always be ‘stuck with’ various aspects of those employees, and have to make the most of what they offer. This is standard.

Claude Opus estimates, and I think this is reasonable, that for every two hours humans spend working, they spend one hour learning, with a little less than half of that learning essentially ‘on the job.’

If you need to train not a ‘universal’ LLM but a highly specific-purpose LLM, and have a massive compute budget with which to do so, and you mostly don’t care about how it performs out of distribution the same way you mostly don’t for an employee (as in, you teach it what you teach a human, which is ‘if this is outside your distribution or you’re failing at it then run it up the chain to your supervisor,’ and you have a classifier for that) and you can build and use tools along the way? Different ballgame.
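As a cartoon of that ‘escalate anything out of distribution’ wiring, with the narrow model and the confidence classifier as hypothetical stand-ins:

```python
# Illustrative wrapper only: a task-specific model plus a low-confidence /
# out-of-distribution gate that kicks hard cases up the chain, as described above.
from typing import Callable, Dict

def handle(request: str,
           narrow_model: Callable[[str], str],
           confidence: Callable[[str], float],
           threshold: float = 0.8) -> Dict[str, str]:
    if confidence(request) < threshold:        # "this is outside my distribution"
        return {"status": "escalated", "to": "human_supervisor", "request": request}
    return {"status": "handled", "answer": narrow_model(request)}

# Usage sketch with dummy callables:
print(handle("routine invoice", lambda r: "processed", lambda r: 0.95))
print(handle("weird edge case", lambda r: "processed", lambda r: 0.3))
```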

It makes sense, given the pace of progress, for most people and companies not to put that kind of investment into AI ‘employees’ or other AI tasks. But if things do start to stall out, or they don’t, either way the value proposition on that will quickly improve. It will start to be worth doing. And we will rapidly learn new ways of doing it better, and have the results available to be copied.

Here are his predictions on computer use in particular, to see how much we actually disagree:

When I interviewed Anthropic researchers Sholto Douglas and Trenton Bricken on my podcast, they said that they expect reliable computer use agents by the end of next year. We already have computer use agents right now, but they’re pretty bad. They’re imagining something quite different.

Their forecast is that by the end of next year, you should be able to tell an AI, “Go do my taxes.” And it goes through your email, Amazon orders, and Slack messages, emails back and forth with everyone you need invoices from, compiles all your receipts, decides which are business expenses, asks for your approval on the edge cases, and then submits Form 1040 to the IRS.

I’m skeptical. I’m not an AI researcher, so far be it from me to contradict them on technical details. But given what little I know, here’s why I’d bet against this forecast:

  • As horizon lengths increase, rollouts have to become longer. The AI needs to do two hours worth of agentic computer use tasks before we can even see if it did it right. Not to mention that computer use requires processing images and video, which is already more compute intensive, even if you don’t factor in the longer rollout. This seems like this should slow down progress.

Let’s take the concrete example here, ‘go do my taxes.’

This is a highly agentic task, but like a real accountant you can choose to ‘check its work’ if you want, or get another AI to check the work, because you can totally break this down into smaller tasks that allow for verification, or present a plan of tasks that can be verified. Similarly, if you are training TaxBot to do people’s taxes for them, you can train TaxBot on a lot of those individual subtasks, and give it clear feedback.
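To gesture at what ‘break it down into verifiable subtasks’ could look like structurally (purely illustrative; every function here is a hypothetical placeholder):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    name: str
    run: Callable[[], dict]           # the agent performs one short, bounded step
    verify: Callable[[dict], bool]    # a cheap independent check on that step's output

def collect_receipts() -> dict:
    # e.g. pull receipts from email, Amazon orders, Slack threads
    return {"receipts": ["placeholder"]}

def receipts_look_complete(artifact: dict) -> bool:
    # e.g. cross-check counts and totals against bank statement line items
    return len(artifact["receipts"]) > 0

pipeline = [
    Subtask("collect_receipts", collect_receipts, receipts_look_complete),
    # Subtask("classify_expenses", ...), Subtask("prepare_form_1040", ...), etc.
]

for step in pipeline:
    artifact = step.run()
    if not step.verify(artifact):
        raise RuntimeError(f"{step.name} failed verification; escalate to the taxpayer")
```

The point is that each step is short-horizon and checkable, so you can train on and verify the pieces rather than waiting two hours to grade the whole rollout.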

Almost all computer use tasks are like this? Humans also mostly don’t do things that can’t be verified for hours?

And the core building block issues of computer use seem mostly like very short time horizon tasks with very easy verification methods. If you can get lots of 9s on the button clicking and menu navigation and so on, I think you’re a lot of the way there.

The subtasks are also 99%+ things that come up relatively often, and that don’t present any non-trivial difficulties. A human accountant already will have to occasionally say ‘wait, I need you, the taxpayer, to tell me what the hell is up with this thing,’ and we’re giving the AI in 2028 the ability to do this too.

I don’t see any fundamental difference between the difficulties being pointed out here, and the difficulties of tasks we have already solved.

  • We don’t have a large pretraining corpus of multimodal computer use data. I like this quote from Mechanize’s post on automating software engineering: “For the past decade of scaling, we’ve been spoiled by the enormous amount of internet data that was freely available for us to use. This was enough for cracking natural language processing, but not for getting models to become reliable, competent agents. Imagine trying to train GPT-4 on all the text data available in 1980—the data would be nowhere near enough, even if we had the necessary compute.”

    Again, I’m not at the labs. Maybe text only training already gives you a great prior on how different UIs work, and what the relationship between different components is. Maybe RL fine tuning is so sample efficient that you don’t need that much data. But I haven’t seen any public evidence which makes me think that these models have suddenly gotten less data hungry, especially in this domain where they’re substantially less practiced.

    Alternatively, maybe these models are such good front end coders that they can just generate millions of toy UIs for themselves to practice on. For my reaction to this, see bullet point below.

I’m not going to keep working for the big labs for free on this one by giving even more details on how I’d solve all this, but these totally seem like highly solvable problems, and this also seems like a case of the person saying it can’t be done interrupting the people doing it? It seems like progress is being made rapidly.

  • Even algorithmic innovations which seem quite simple in retrospect seem to take a long time to iron out. The RL procedure which DeepSeek explained in their R1 paper seems simple at a high level. And yet it took 2 years from the launch of GPT-4 to the release of o1.

  • Now of course I know it is hilariously arrogant to say that R1/o1 were easy – a ton of engineering, debugging, pruning of alternative ideas was required to arrive at this solution. But that’s precisely my point! Seeing how long it took to implement the idea, ‘Train the model to solve verifiable math and coding problems’, makes me think that we’re underestimating the difficulty of solving the much gnarlier problem of computer use, where you’re operating in a totally different modality with much less data.

I think the two years includes the time it took to have the idea of o1 and commit to it, and then to implement it. Four months is roughly the actual time it took from ‘here is that sentence and we know it works’ to full implementation. Also we’re going to have massively more resources to pour into these questions this time around, and frankly I don’t think any of these insights are even as hard to find as o1, especially now that we have reasoning models to use as part of this process.

I think there are other potential roadblocks along the way, and once you factor all of those in you can’t be that much more optimistic, but I see this particular issue as not that likely to pose that much of a bottleneck for long.

His predictions are he’d take 50/50 bets on: 2028 for an AI that can ‘just go do your taxes as well as a human accountant could’ and 2032 for ‘can learn details and preferences on the job as well as a human can.’ I’d be inclined to take the other side of both of those bets, assuming it means by EOY; for the 2032 one we’d need to flesh out details.

But if we have the ‘AI that does your taxes’ in 2028 then 2029 and 2030 look pretty weird, because this implies other things:

Daniel Kokotajlo: Great post! This is basically how I think about things as well. So why the difference in our timelines then?

–Well, actually, they aren’t that different. My median for the intelligence explosion is 2028 now (one year longer than it was when writing AI 2027), which means early 2028 or so for the superhuman coder milestone described in AI 2027, which I’d think roughly corresponds to the “can do taxes end-to-end” milestone you describe as happening by end of 2028 with 50% probability. Maybe that’s a little too rough; maybe it’s more like month-long horizons instead of week-long. But at the growth rates in horizon lengths that we are seeing and that I’m expecting, that’s less than a year…

–So basically it seems like our only serious disagreement is the continual/online learning thing, which you say 50% by 2032 on whereas I’m at 50% by end of 2028. Here, my argument is simple: I think that once you get to the superhuman coder milestone, the pace of algorithmic progress will accelerate, and then you’ll reach full AI R&D automation and it’ll accelerate further, etc. Basically I think that progress will be much faster than normal around that time, and so innovations like flexible online learning that feel intuitively like they might come in 2032 will instead come later that same year.

(For reference AI 2027 depicts a gradual transition from today to fully online learning, where the intermediate stages look something like “Every week, and then eventually every day, they stack on another fine-tuning run on additional data, including an increasingly high amount of on-the-job real world data.” A janky unprincipled solution in early 2027 that gives way to more elegant and effective things midway through the year.)

I found this an interestingly wrong thing to think:

Richard: Given the risk of fines and jail for filing your taxes wrong, and the cost of processing poor quality paperwork that the government will have to bear, it seems very unlikely that people will want AI to do taxes, and very unlikely that a government will allow AI to do taxes.

The rate of fully accurately filing your taxes is, for anyone whose taxes are complex, basically 0%. Everyone makes mistakes. When the AI gets this right almost every time, it’s already much better than a human accountant, and you’ll have a strong case that what happened was accidental, which means at worst you pay some modest penalties.

Personal story, I was paying accountants at a prestigious firm that will go unnamed to do my taxes, and they literally just forgot to include paying city tax at all. As in, I’m looking at the forms, and I ask, ‘wait why does it have $0 under city tax?’ and the guy essentially says ‘oh, whoops.’ So, yeah. Mistakes are made. This will be like self-driving cars, where we’ll impose vastly higher standards of accuracy and law abidance on the AIs, and they will meet them because the bar really is not that high.

There were also some good detailed reactions and counterarguments from others:

Near: finally some spicy takes around here.

Rohit: The question is whether we need humanlike labour for transformative economic outcomes, or whether we can find ways to use the labour it does provide with a different enough workflow that it adds substantial economic advantage.

Sriram Krishnan: Really good post from @dwarkesh_sp on continuous learning in LLMs.

Vitalik Buterin: I have high probability mass on longer timelines, but this particular issue feels like the sort of limitation that’s true until one day someone discovers a magic trick (think eg. RL on CoT) that suddenly makes it no longer true.

Sriram Krishnan: Agree – CoT is a particularly good example.

Ryan Greenblatt: I agree with much of this post. I also have roughly 2032 medians to things going crazy, I agree learning on the job is very useful, and I’m also skeptical we’d see massive white collar automation without further AI progress.

However, I think Dwarkesh is wrong to suggest that RL fine-tuning can’t be qualitatively similar to how humans learn.

In the post, he discusses AIs constructing verifiable RL environments for themselves based on human feedback and then argues this wouldn’t be flexible and powerful enough to work, but RL could be used more similarly to how humans learn.

My best guess is that the way humans learn on the job is mostly by noticing when something went well (or poorly) and then sample efficiently updating (with their brain doing something analogous to an RL update). In some cases, this is based on external feedback (e.g. from a coworker) and in some cases it’s based on self-verification: the person just looking at the outcome of their actions and then determining if it went well or poorly.

So, you could imagine RL’ing an AI based on both external feedback and self-verification like this. And, this would be a “deliberate, adaptive process” like human learning. Why would this currently work worse than human learning?

Current AIs are worse than humans at two things which makes RL (quantitatively) much worse for them:

1. Robust self-verification: the ability to correctly determine when you’ve done something well/poorly in a way which is robust to you optimizing against it.

2. Sample efficiency: how much you learn from each update (potentially leveraging stuff like determining what caused things to go well/poorly which humans certainly take advantage of). This is especially important if you have sparse external feedback.

But, these are more like quantitative than qualitative issues IMO. AIs (and RL methods) are improving at both of these.

All that said, I think it’s very plausible that the route to better continual learning routes more through building on in-context learning (perhaps through something like neuralese, though this would greatly increase misalignment risks…).

Some more quibbles:

– For the exact podcasting tasks Dwarkesh mentions, it really seems like simple fine-tuning mixed with a bit of RL would solve his problem. So, an automated training loop run by the AI could probably work here. This just isn’t deployed as an easy-to-use feature.

– For many (IMO most) useful tasks, AIs are limited by something other than “learning on the job”. At autonomous software engineering, they fail to match humans with 3 hours of time and they are typically limited by being bad agents or by being generally dumb/confused. To be clear, it seems totally plausible that for podcasting tasks Dwarkesh mentions, learning is the limiting factor.

– Correspondingly, I’d guess the reason that we don’t see people trying more complex RL based continual learning in normal deployments is that there is lower hanging fruit elsewhere and typically something else is the main blocker. I agree that if you had human level sample efficiency in learning this would immediately yield strong results (e.g., you’d have very superhuman AIs with 10^26 FLOP presumably), I’m just making a claim about more incremental progress.

– I think Dwarkesh uses the term “intelligence” somewhat atypically when he says “The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.” I think people often consider how fast someone learns on the job as one aspect of intelligence. I agree there is a difference between short feedback loop intelligence (e.g. IQ tests) and long feedback loop intelligence and they are quite correlated in humans (while AIs tend to be relatively worse at long feedback loop intelligence).

More thoughts/quibbles:

– Dwarkesh notes “An AI that is capable of online learning might functionally become a superintelligence quite rapidly, even if there’s no algorithmic progress after that point.” This seems reasonable, but it’s worth noting that if sample efficient learning is very compute expensive, then this might not happen so rapidly.

– I think AIs will likely overcome poor sample efficiency to achieve a very high level of performance using a bunch of tricks (e.g. constructing a bunch of RL environments, using a ton of compute to learn when feedback is scarce, learning from much more data than humans due to “learn once deploy many” style strategies). I think we’ll probably see fully automated AI R&D prior to matching top human sample efficiency at learning on the job. Notably, if you do match top human sample efficiency at learning (while still using a similar amount of compute to the human brain), then we already have enough compute for this to basically immediately result in vastly superhuman AIs (human lifetime compute is maybe 3e23 FLOP and we’ll soon be doing 1e27 FLOP training runs). So, either sample efficiency must be worse or at least it must not be possible to match human sample efficiency without spending more compute per data-point/trajectory/episode.
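For scale, the arithmetic behind that last point, using Greenblatt’s own rough figures:

```python
human_lifetime_flop = 3e23       # Greenblatt's rough estimate quoted above
near_term_training_flop = 1e27   # "we'll soon be doing 1e27 FLOP training runs"

ratio = near_term_training_flop / human_lifetime_flop
print(f"A near-term training run is ~{ratio:,.0f}x a human lifetime of compute")
# ~3,333x, which is why matching top human sample efficiency at human-like
# compute budgets would already imply vastly superhuman systems.
```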

Matt Reardon: Dwarkesh commits the sin of thinking work you’re personally close to is harder-than-average to automate.

Herbie Bradley: I mean this is just correct? most researchers I know think continual learning is a big problem to be solved before AGI

Matt Reardon: My main gripe is that “<50%" [of jobs being something you can automate soon] should be more like "<15%"

Danielle Fong: Gell-Mann Amnesia for AI.

Reardon definitely confused me here, but either way I’d say that Dwarkesh Patel is a 99th percentile performer. He does things most other people can’t do. That’s probably going to be harder to automate than most other white collar work? The bulk of hours in white collar work are very much not bespoke things and don’t act to put state or memory into people in subtle ways?

Now that we’ve had a good detailed discussion and seen several perspectives, it’s time to address another discussion of related issues, because it is drawing attention from an unlikely source.

After previously amplifying Situational Awareness, Ivanka Trump is back in the Essay Meta with high praise for The Era of Experience, authored by David Silver and (oh no) Richard Sutton.

Situational Awareness was an excellent pick. I do not believe this essay was a good pick. I found it a very frustrating, unoriginal and unpersuasive paper to read. To the extent it is saying something new I don’t agree, but it’s not clear to what extent it is saying anything new. Unless you want to know about this paper exactly because Ivanka is harping on it, you should skip this section.

I think the paper effectively mainly says we’re going to do a lot more RL and we should stop trying to make the AIs mimic, resemble or be comprehensible to humans or trying to control their optimization targets?

Ivanka Trump: Perhaps the most important thing you can read about AI this year : “Welcome to the Era of Experience”

This excellent paper from two senior DeepMind researchers argues that AI is entering a new phase—the “Era of Experience”—which follows the prior phases of simulation-based learning and human data-driven AI (like LLMs).

The authors posit that future AI breakthroughs will stem from learning through direct interaction with the world, not from imitating human-generated data.

This is not a theory or distant future prediction. It’s a description of a paradigm shift already in motion.

Let me know what you think !

Glad you asked, Ivanka! Here’s what I think.

The essay starts off with a perspective we have heard before, usually without much of an argument behind it: That LLMs and other AIs trained only on ‘human data’ are ‘rapidly approaching a limit,’ we are running out of high-quality data, and thus to progress significantly farther AIs will need to move into ‘the era of experience,’ meaning learning continuously from their environments.

I agree that the standard ‘just feed it more data’ approach will run out of data with which to scale, but there are a variety of techniques already being used to get around this. We have lots of options.

The leading example the paper itself gives of this in the wild is AlphaProof, which ‘interacted with a formal proving system,’ which seems to me like a clear case of synthetic data working and verification being easier than generation, rather than ‘experience.’ If the argument is simply that RL systems will learn by having their outputs evaluated, that isn’t news.

They claim to have in mind something rather different from that, and with this One Weird Trick they assert Superintelligence Real Soon Now:

Our contention is that incredible new capabilities will arise once the full potential of experiential learning is harnessed. This era of experience will likely be characterised by agents and environments that, in addition to learning from vast quantities of experiential data, will break through the limitations of human-centric AI systems in several further dimensions:

• Agents will inhabit streams of experience, rather than short snippets of interaction.

• Their actions and observations will be richly grounded in the environment, rather than interacting via human dialogue alone.

• Their rewards will be grounded in their experience of the environment, rather than coming from human prejudgement.

• They will plan and/or reason about experience, rather than reasoning solely in human terms.

We believe that today’s technology, with appropriately chosen algorithms, already provides a sufficiently powerful foundation to achieve these breakthroughs. Furthermore, the pursuit of this agenda by the AI community will spur new innovations in these directions that rapidly progress AI towards truly superhuman agents.

I suppose if the high level takeaway is ‘superintelligence is likely coming reasonably soon with the right algorithms’ then there’s no real disagreement?

They then however discuss tool calls and computer use, which then seems like a retreat back into an ordinary RL paradigm? It’s also not clear to me what the authors mean by ‘human terms’ versus ‘plan and/or reason about experience,’ or even what ‘experience’ means here. They seem to be drawing a distinction without a difference.

If the distinction is simply (as the paper implies in places) that the agents will do self-evaluation rather than relying on human feedback, I have some important news about how existing systems already function? They use the human feedback and other methods to train an AI feedback system that does most of the work? And yes they often include ‘real world’ feedback systems in that? What are we even saying here?

They also seem to be drawing a distinction between the broke ‘human feedback’ and the bespoke ‘humans report physical world impacts’ (or ‘other systems measure real world impacts’) as if the first does not often encompass the second. I keep noticing I am confused what the authors are trying to say.

For reasoning, they say it is unlikely human methods of reasoning and human language are optimal, more efficient methods of thought must exist. I mean, sure, but that’s also true for humans, and it’s obvious that you can use ‘human style methods of thought’ to get to superintelligence by simply imagining a human plus particular AI advantages.

As many have pointed out (and as is central to AI 2027), encouraging AIs to use alien-looking inhuman reasoning styles we cannot parse is likely a very bad idea even if it would be more effective: what visibility we have will be lost, and it likely leads to alien values and breaks many happy things. Then again, Richard Sutton is one of the authors of this paper and he thinks we should welcome succession, as in the extinction of humanity, so he wouldn’t care.

They try to argue against this by saying that while agents pose safety risks and this approach may increase those safety risks, the approach may also have safety benefits. First, they say this allows the AI to adapt to its environment, as if the other agent could not do this or this should make us feel safer.

Second, they say ‘the reward function may itself be adapted through experience.’ In terms of risk, that’s worse, you know that’s worse, right? They literally say ‘rather than blindly optimizing a signal such as the number of paperclips it can adapt to indications of human concern,’ which shows a profound lack of understanding of, and curiosity about, where the whole misspecification-of-rewards problem comes from, or the arguments about it from Yudkowsky (since they bring in the ‘paperclips’).

Adapting autonomously and automatically towards something like ‘level of human concern’ is exactly the kind of metric and strategy that is absolutely going to encourage perverse outcomes and get you killed at the limit. You don’t get out of the specification problem by saying you can specify something messier and let the system adapt around it autonomously, that only makes it worse, and in no way addresses the actual issue.

The final argument for safety is that relying on physical experience creates time limitations, which provides a ‘natural break,’ which is saying that capabilities limits imposed by physical interactions will keep things more safe? Seriously?

There is almost nothing in the way of actual evidence or argument in the paper that is not fully standard, beyond a few intuition pumps. There are many deep misunderstandings, including fully backwards arguments, along the way. We may well want to rely a lot more on RL and on various different forms of ‘experiential’ data and continuous learning, but given how much worse it was than I expected this post updated me in the opposite direction of that which was clearly intended.




On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Dwarkesh Patel again interviewed Tyler Cowen, largely about AI, so here we go.

Note that I take it as a given that the entire discussion is taking place in some form of an ‘AI Fizzle’ and ‘economic normal’ world, where AI does not advance too much in capability from its current form, in meaningful senses, and we do not get superintelligence [because of reasons]. It’s still massive additional progress by the standards of any other technology, but painfully slow by the ‘AGI is coming soon’ crowd.

That’s the only way I can make the discussion make at least some sense, with Tyler Cowen predicting 0.5%/year additional RGDP growth from AI. That level of capabilities progress is a possible world, although the various elements stated here seem like they are sometimes from different possible worlds.

I note that this conversation was recorded prior to o3 and all the year end releases. So his baseline estimate of RGDP growth and AI impacts has likely increased modestly.

I go very extensively into the first section on economic growth and AI. After that, the podcast becomes classic Tyler Cowen and is interesting throughout, but I will be relatively sparing in my notes in other areas, and am skipping over many points.

This is a speed premium and ‘low effort’ post, in the sense that this is mostly me writing down my reactions and counterarguments in real time, similar to how one would do a podcast. It is high effort in that I spent several hours listening to, thinking about and responding to the first fifteen minutes of a podcast.

As a convention: When I’m in the numbered sections, I’m reporting what was said. When I’m in the secondary sections, I’m offering (extensive) commentary. Timestamps are from the Twitter version.

[EDIT: In Tyler’s link, he correctly points out a confusion in government spending vs. consumption, which I believe is fixed now. As for his comment about market evidence for the doomer position, I’ve given my answer before, and I would assert the market provides substantial evidence neither for nor against anything but the most extreme of doomer positions, as in extreme in a way I have literally never heard one person assert, once you control for its estimate of AI capabilities (where it does indeed offer us evidence, and I’m saying that it’s too pessimistic). We agree there is no substantial and meaningful ‘peer-reviewed’ literature on the subject, in the sense Tyler is pointing to.]

They recorded this at the Progress Studies conference, and Tyler Cowen has a very strongly held view that AI won’t accelerate RGDP growth much that Dwarkesh clearly does not agree with, so Dwarkesh Patel’s main thrust is to try comparisons and arguments and intuition pumps to challenge Tyler. Tyler, as he always does, has a ready response to everything, whether or not it addresses the point of the question.

  1. (1: 00) Dwarkesh doesn’t waste any time and starts off asking why we won’t get explosive economic growth. Tyler’s first answer is cost disease, that as AI works in some parts of the economy costs in other areas go up.

    1. That’s true in relative terms for obvious reasons, but in absolute terms or real resource terms the opposite should be true, even if we accept the implied premise that AI won’t simply do everything anyway. This should drive down labor costs and free up valuable human capital. It should aid in availability of many other inputs. It makes almost any knowledge acquisition, strategic decision or analysis, data analysis or gathering, and many other universal tasks vastly better.

    2. Tyler then answers this directly when asked at (2: 10) by saying cost disease is not about employees per se, it’s more general, so he’s presumably conceding the point about labor costs, saying that non-intelligence inputs that can’t be automated will bind more and thus go up in price. I mean, yes, in the sense that we have higher value uses for them, but so what?

    3. So yes, you can narrowly define particular subareas of some areas as bottlenecks and say that they cannot grow, and perhaps they can even be large areas if we impose costlier bottlenecks via regulation. But that still leaves lots of room for very large economic growth for a while – the issue can’t bind you otherwise, the math doesn’t work.

  2. Tyler says government consumption [EDIT: I originally misheard this as spending, he corrected me, I thank him] is 18% of GDP (government spending is 38%, but a lot of that is duplicative and a lot isn’t consumption), health care is 20%, education is 6% (he says 6-7%, Claude says 6%), plus the nonprofit sector (Claude says 5.6%), and says that together that is half of the economy. Okay, sure, let’s tackle that.

    1. Healthcare is already seeing substantial gains from AI even at current levels. There are claims that up to half of doctor time (49%) goes to various forms of EMR and desk work that AI could greatly reduce, certainly at least ~25% of it. AI can directly substitute for much of what doctors do in terms of advising patients, and this is already happening where the future is distributed. AI substantially improves medical diagnosis and decision making. AI substantially accelerates drug discovery and R&D, will aid in patient adherence and monitoring, and so on. And again, that’s without further capability gains. Insurance companies doubtless will embrace AI at every level. Need I go on here?

    2. Government spending at all levels is actually about 38% of GDP, but that’s cheating, only ~11% is non-duplicative and not transfers, interest (which aren’t relevant) or R&D (I’m assuming R&D would get a lot more productive).

    3. The biggest area is transfers. AI can’t improve the efficiency of transfers too much, but it also can’t be a bottleneck outside of transaction and administrative costs, which obviously AI can greatly reduce and are not that large to begin with.

    4. The second biggest area is provision of healthcare, which we’re already counting, so that’s duplicative.

    5. Third is education, which we count in the next section. Fourth is national defense, where efficiency per dollar or employee should get vastly better, to the point where failure to be at the AI frontier is a clear national security risk.

    6. Fifth is interest on the debt, which again doesn’t count, and also we wouldn’t care about if GDP was growing rapidly.

    7. And so on. What’s left to form the last 11% or so? Public safety, transportation and infrastructure, government administration, environment and natural resources and various smaller other programs. What happens here is a policy choice. We are already seeing signs of improvement in government administration (~2% of the 11%), the other 9% might plausibly stall to the extent we decide to do an epic fail.

    8. Education and academia are already being transformed by AI, in the sense of actually learning things, among anyone who is willing to use it. And it’s rolling through academia as we speak, in terms of things like homework assignments, in ways that will force change. So whether you think growth is possible depends on your model of education. If it’s mostly a signaling model, then you should see a decline in education investment, since the signals will decline in value and AI creates the opportunity for better, more efficient signals, but you can argue that this could continue to be a large time and dollar tax on many of us.

    9. Nonprofits are about 20%-25% education, and ~50% is health care related, which would double count, so the remainder is only ~1.3% of GDP. This also seems like a dig at nonprofits and their inability to adapt to change, but why would we assume nonprofits can’t benefit from AI?

    10. What’s weird is that I would point to different areas that have the most important anticipated bottlenecks to growth, such as housing or power, where we might face very strong regulatory constraints and perhaps AI can’t get us out of those.

  3. (1: 30) He says it will take ~30 years for sectors of the economy that do not use AI well to be replaced by those that do use AI well.

    1. That’s a very long time, even in an AI fizzle scenario. I roll to disbelieve that estimate in most cases. But let’s even give it to him, and say it is true, and it takes 30 years to replace them, while the productivity of the replacements grows 5%/year above incumbents, which are stagnant. Then you delay the growth, but you don’t prevent it, and if you assume this is a gradual transition you start seeing 1%+ yearly GDP growth boosts even in these sectors within a decade (a rough version of this calculation follows after this list).

  4. He concludes by saying some less regulated areas grow a lot, but that doesn’t get you that much, so you can’t have the whole economy ‘growing by 40%’ in a nutshell.

    1. I mean, okay, but that’s double Dwarkesh’s initial question of why we aren’t growing at 20%. So what exactly can we get here? I can buy this as an argument for AI fizzle world growing slower than it would have otherwise, but the teaser has a prediction of 0.5%, which is a whole different universe.
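A quick check of that replacement claim, under exactly the assumptions stated above (a linear 30-year handoff to replacements whose productivity compounds at 5%/year over stagnant incumbents, nothing more):

```python
# Sector-level growth when AI-using replacements displace stagnant incumbents
# linearly over 30 years, with a 5%/year productivity edge that compounds.
YEARS_TO_REPLACE = 30
EDGE = 1.05  # assumed annual productivity growth of the replacements

def sector_output(t: float) -> float:
    share_new = min(t / YEARS_TO_REPLACE, 1.0)   # linear transition
    return (1 - share_new) * 1.0 + share_new * EDGE ** t

for year in range(1, 11):
    growth = sector_output(year) / sector_output(year - 1) - 1
    print(f"year {year:2d}: sector growth = {growth:.1%}")
# Annual growth in this "stagnant" sector passes 1% by roughly year four and
# keeps climbing, which is the point: the bottleneck delays growth, it does
# not cap it at anything like 0.5%.
```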

  1. (2: 20) Tyler asserts that value of intelligence will go down because more intelligence will be available.

    1. Dare I call this the Lump of Intelligence fallacy, after the Lump of Labor fallacy? Yes, to the extent that you are doing the thing an AI can do, the value of that intelligence goes down, and the value of AI intelligence itself goes down in economic terms because its cost of production declines. But to the extent that your intelligence complements and unlocks the AI’s, or is empowered by the AI’s and is distinct from it (again, we must be in fizzle-world), the value of that intelligence goes up.

    2. Similarly, when he talks about intelligence as ‘one input’ in the system among many, that seems like a fundamental failure to understand how intelligence works, a combination of intelligence denialism (failure to buy that much greater intelligence could meaningfully exist) and a denial of substitution or ability to innovate as a result – you couldn’t use that intelligence to find alternative or better ways to do things, and you can’t use more intelligence as a substitute for other inputs. And you can’t substitute the things enabled more by intelligence much for the things that aren’t, and so on.

    3. It also assumes that intelligence can’t be used to convince us to overcome all these regulatory barriers and bottlenecks. Whereas I would expect that raising the intelligence baseline greatly would make it clear to everyone involved how painful our poor decisions were, and also enable improved forms of discourse and negotiation and cooperation and coordination, and also greatly favor those that embrace it over those that don’t, and generally allow us to take down barriers. Tyler would presumably agree that if we were to tear down the regulatory state in the places it was holding us back, that alone would be worth far more than his 0.5% of yearly GDP growth, even with no other innovation or AI.

  1. (2: 50) Dwarkesh challenges Tyler by pointing out that the Industrial Revolution resulted in a greatly accelerated rate of economic growth versus previous periods, and asks what Tyler would say to someone from the past doubting it was possible. Tyler attempts to dodge (and is amusing doing so) by saying they’d say ‘looks like it would take a long time’ and he would agree.

    1. Well, it depends what a long time is, doesn’t it? 2% sustained annual growth (or 8%!) is glacial in some sense and mind boggling by ancient standards. ‘Take a long time’ in AI terms, such as what is actually happening now, could still look mighty quick if you compared it to most other things. OpenAI has 300 million MAUs.

  2. (3: 20) Tyler trots out the ‘all the financial prices look normal’ line, that they are not predicting super rapid growth and neither are economists or growth experts.

    1. Yes, the markets are being dumb, the efficient market hypothesis is false, and also aren’t you the one telling me I should have been short the market? Well, instead I’m long, and outperforming. And yes, economists and ‘experts on economic growth’ aren’t predicting large amounts of growth, but their answers are Obvious Nonsense to me and saying that ‘experts don’t expect it’ without arguments why isn’t much of an argument.

  3. (3: 40) Aside, since you kind of asked: So who am I to say different from the markets and the experts? I am Zvi Mowshowitz. Writer. Son of Solomon and Deborah Mowshowitz. I am the missing right hand of the one handed economists you cite. And the one warning you about what is about to kick Earth’s sorry ass into gear. I speak the truth as I see it, even if my voice trembles. And a warning that we might be the last living things this universe ever sees. God sent me.

  4. Sorry about that. But seriously, think for yourself, schmuck! Anyway.

What would happen if we had more people? More of our best people? Got more out of our best people? Why doesn’t AI effectively do all of these things?

  1. (3: 55) Tyler is asked wouldn’t a large rise in population drive economic growth? He says no, that’s too much a 1-factor model, in fact we’ve seen a lot of population growth without innovation or productivity growth.

    1. Except that Tyler is talking here about growth on a per capita basis. If you add AI workers, you increase the productive base, but they don’t count towards the capita.

  2. Tyler says ‘it’s about the quality of your best people and institutions.’

    1. But quite obviously AI should enable a vast improvement in the effective quality of your best people, it already does, Tyler himself would be one example of this, and also the best institutions, including because they are made up of the best people.

  3. Tyler says ‘there’s no simple lever, intelligence or not, that you can push on.’ Again, intelligence as some simple lever, some input component.

    1. The whole point of intelligence is that it allows you to do a myriad of more complex things, and to better choose those things.

  4. Dwarkesh points out the contradiction between ‘you are bottlenecked by your best people’ and asserting cost disease and constraint by your scarce input factors. Tyler says Dwarkesh is bottlenecked, Dwarkesh points out that with AGI he will be able to produce a lot more podcasts. Tyler says great, he’ll listen, but he will be bottlenecked by time.

    1. Dwarkesh’s point generalizes. AGI greatly expands the effective amount of productive time of the best people, and also extends their capabilities while doing so.

    2. AGI can also itself become ‘the best people’ at some point. If that was the bottleneck, then the goose asks, what happens now, Tyler?

  5. (5: 15) Tyler cites that much of sub-Saharan Africa still does not have clean reliable water, and intelligence is not the bottleneck there. And that taking advantage of AGI will be like that.

    1. So now we’re expecting AGI in this scenario? I’m going to kind of pretend we didn’t hear that, or that this is a very weak AGI definition, because otherwise the scenario doesn’t make sense at all.

    2. Intelligence is not directly the bottleneck there, true, but yes quite obviously Intelligence Solves This if we had enough of it and put those minds to that particular problem and wanted to invest the resources towards it. Presumably Tyler and I mostly agree on why the resources aren’t being devoted to it.

    3. What would it mean for similar issues to be involved in taking advantage of AGI? Well, first, it would mean that you can’t use AGI to get to ASI (no, I can’t explain why), but again that’s got to be a baseline assumption here. After that, well, sorry, I failed to come up with a way to finish this that makes it make sense to me, beyond a general ‘humans won’t do the things and will throw up various political and legal barriers.’ Shrug?

  6. (5: 35) Dwarkesh speaks about a claim that there is a key shortage of geniuses, and that America’s problems come largely from putting its geniuses in places like finance, whereas Taiwan puts them in tech, so the semiconductors end up in Taiwan. Wouldn’t having lots more of those types of people eat a lot of bottlenecks? What would happen if everyone had 1000 times more of the best people available?

  7. Tyler Cowen, author of a very good book about Talent and finding talent and the importance of talent, says he didn’t agree with that post, that returns to IQ in the labor market are amazingly low, and that successful people are smart but mostly have 8-9 areas where they’re an 8-9 on a 1-10 scale, with one 11+ somewhere, and a lot of determination.

    1. All right, I don’t agree that intelligence doesn’t offer returns now, and I don’t agree that intelligence wouldn’t offer returns even at the extremes, but let’s again take Tyler’s own position as a given…

    2. But that exactly describes what an AI gives you! An AI is the ultimate generalist. An AGI will be a reliable 8-9 on everything, actual everything.

    3. And it would also turn everyone else into an 8-9 on everything. So instead of needing to find someone 11+ in one area, plus determination, plus having 8-9 in ~8 areas, you can remove that last requirement. That will hugely expand the pool of people in question.

    4. So there’s two obvious very clear plans here: You can either use AI workers who have that ultimate determination and are 8-9 in everything and 11+ in the areas where AIs shine (e.g. math, coding, etc).

    5. Or you can also give your other experts an AI companion executive assistant to help them, and suddenly they’re an 8+ in everything and also don’t have to deal with a wide range of things.

  8. (6: 50) Tyler says, talk to a committee at a Midwestern university about their plans for incorporating AI, then get back to him and talk to him about bottlenecks. Then write a report and the report will sound like GPT-4 and we’ll have a report.

    1. Yes, the committee will not be smart or fast about its official policy for how to incorporate AI into its existing official activities. If you talk to them now they will act like they have a plagiarism problem and that’s it.

    2. So what? Why do we need that committee to form a plan or approve anything or do anything at all right now, or even for a few years? All the students are already using AI. The professors are rapidly being forced to adapt to AI. Everyone doing the research will soon be using AI. Half that committee, three years from now, will have prepared for that meeting using AI. Their phones will all work based on AI. They’ll be talking to their AI phone assistant companions that plan their schedules. You think this will all involve 0.5% GDP growth?

  9. (7: 20) Dwarkesh asks, won’t the AIs be smart, super conscientious and work super hard? Tyler explicitly affirms the 0.5% GDP growth estimate, that this will transform the world over 30 years but ‘over any given year we won’t so much notice it.’ Things like drug developments that would have taken 20 years now take 10 years, but you won’t feel it as revolutionary for a long time.

    1. I mean, it’s already getting very hard to miss. If you don’t notice it in 2025 or at least 2026, and you’re in the USA, check your pulse, you might be dead, etc.

    2. Is that saying we will double productivity in pharmaceutical R&D, and that it would have far more than doubled if progress didn’t require long expensive clinical trials, so other forms of R&D should be accelerated much more?

    3. For reference, according to Claude, R&D in general contributes about 0.3% to RGDP growth per year right now. If we doubled that effect for the roughly half of current R&D spend that is bottlenecked in similar fashion, the other half should go up by even more.

    4. Claude also estimates that R&D spending would, if returns to R&D doubled, go up by 30%-70% on net.

    5. So we seem to be looking at more than 0.5% RGDP growth per year from R&D effects alone, between additional spending on it and greater returns (see the rough arithmetic after this list). And obviously AI is going to have additional other returns.
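Putting rough numbers on that: the 0.3% baseline and the 30%-70% spending increase are the Claude figures cited above, while the tripling for the unbottlenecked half is my own loose placeholder for ‘would go up by more.’

```python
# Back-of-envelope version of the R&D argument above. All inputs are rough
# assumptions quoted in the text or flagged as placeholders, not data.
baseline_rd_contribution = 0.003   # ~0.3% of RGDP growth per year from R&D

bottlenecked_share = 0.5           # half of R&D limited by trials etc.: returns ~double
unbottlenecked_multiplier = 3.0    # placeholder: the other half does better than double
spend_increase = 0.5               # midpoint of the 30%-70% rise in R&D spending

new_contribution = baseline_rd_contribution * (
    bottlenecked_share * 2.0 + (1 - bottlenecked_share) * unbottlenecked_multiplier
) * (1 + spend_increase)

print(f"Implied R&D contribution to growth: {new_contribution:.2%} per year")
# ~1.1% per year under these assumptions, already above Tyler's 0.5% total.
```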

This is a plausible bottleneck, but that implies rather a lot of growth.

  1. (8: 00) Dwarkesh points out that Progress Studies is all about all the ways we could unlock economic growth, yet Tyler says that tons more smart conscientious digital workers wouldn’t do that much. What gives? Tyler again says bottlenecks, and adds on energy as an important consideration and bottleneck.

    1. Feels like bottleneck is almost a magic word or mantra at this point.

    2. Energy is a real consideration, yes the vision here involves spending a lot more energy, and that might take time. But also we see rapidly declining costs, including energy costs, to extract the same amount of intelligence, things like 10x savings each year.

    3. And for inference purposes we can outsource our needs elsewhere, which we would if this were truly bottlenecking explosive growth, and so on. So yes, I think energy will indeed be an important limiting factor and will be strained, especially in terms of pushing the frontier or if we want to use o3-style very expensive inference a lot.

    4. I don’t expect it to bind medium-term economic growth so much in a slow growth scenario, and the bottlenecks involved here shouldn’t compound with others. In a high growth takeoff scenario, I do think energy could bind far more impactfully.

    5. Another way of looking at this is that if the price of energy goes substantially up due to AI, or at least the price of energy outside of potentially ‘government-protected uses,’ then that can only happen if it is having a large economic impact. If it doesn’t raise the price of energy a lot, then no bottleneck exists.

Tyler Cowen and I think very differently here.

  1. (9: 25) Fascinating moment. Tyler says he goes along with the experts in general, and agrees that ‘the experts’ in basically every field other than AI are asleep at the wheel when it comes to AI – except on the diffusion of new technology, where he trusts the diffusion experts and thinks the AI people are totally wrong. His view is, you get the right view by trusting the experts in each area, and combining them.

    1. Tyler seems to be making an argument from reference class expertise? That this is a ‘diffusion of technology’ question, so those who are experts on that should be trusted?

    2. Even if they don’t actually understand AI and what it is and its promise?

    3. That’s not how I roll. At all. As noted above in this post, and basically all the time. I think that you have to take the arguments being made, and see if you agree with them, and whether and how much they apply to the case of AI and especially AGI. Saying ‘the experts in area [X] predict [Y]’ is a reasonable placeholder if you don’t have the ability to look at the arguments and models and facts involved, but hey look, we can do that.

    4. Simply put, while I do think the diffusion experts are pointing to real issues that will importantly slow down adaptation, and indeed we are seeing what for many is depressingly slow adaptation, they won’t slow it down all that much, because this is fundamentally different. AI, and especially AI workers, ‘adapt themselves’ to a large extent: the intelligence and awareness involved is in the technology itself, it is digital, and we have a ubiquitous digital infrastructure we didn’t have until recently.

    5. It is also way too valuable a technology, even right out of the gate on your first day, and you will start to be forced to interact with it whether you like it or not, in ways that will make it very difficult and painful to ignore. And the places it is most valuable will move very quickly. And remember, LLMs will get a lot better.

    6. Suppose, as one would reasonably expect, by 2026 we have strong AI agents, capable of handling for ordinary people a wide variety of logistical tasks, sorting through information, and otherwise offering practical help. Apple Intelligence is partly here, Claude Alexa is coming, Project Astra is coming, and these are pale shadows of the December 2025 releases I expect. How long would adaptation really take? Once you have that, what stops you from then adapting AI in other ways?

    7. Already, yes, adaptation is painfully slow, but it is also extremely fast. In two years ChatGPT alone has 300 million MAU. A huge chunk of homework and grading is done via LLMs. A huge chunk of coding is done via LLMs. The reason why LLMs are not catching on even faster is that they’re not quite ready for prime time in the fully user-friendly ways normies need. That’s about to change in 2025.

Dwarkesh tries to use this as an intuition pump. Tyler’s not having it.

  1. (10: 15) Dwarkesh asks, what would happen if the world population would double? Tyler says, depends what you’re measuring. Energy use would go up. But he doesn’t agree with population-based models, too many other things matter.

    1. Feels like Tyler is answering a different question. I see Dwarkesh as asking, wouldn’t the extra workers mean we could simply get a lot more done, wouldn’t (total, not per capita) GDP go up a lot? And Tyler’s not biting.

  2. (11: 10) Dwarkesh tries asking about shrinking the population by 90%. With shrinking, Tyler says, the delta can kill you, whereas growth might not help you.

    1. Very frustrating. I suppose this does partially respond, by saying that it is hard to transition. But man I feel for Dwarkesh here. You can feel his despair as he transitions to the next question.

  1. (11: 35) Dwarkesh asks what are the specific bottlenecks? Tyler says: Humans! All of you! Especially you who are terrified.

    1. That’s not an answer yet, but then he actually does give one.

  2. He says once AI starts having impact, there will be a lot of opposition to it, not primarily on ‘doomer’ grounds but based on: Yes, this has benefits, but I grew up and raised my kids for a different way of life, I don’t want this. And there will be a massive fight.

    1. Yes. He doesn’t even mention jobs directly but that will be big too. We already see that the public strongly dislikes AI when it interacts with it, for reasons I mostly think are not good reasons.

    2. I’ve actually been very surprised how little resistance there has been so far, in many areas. AIs are basically being allowed to practice medicine, to function as lawyers, and do a variety of other things, with no effective pushback.

    3. The big pushback has been for AI art and other places where AI is clearly replacing creative work directly. But that has features that seem distinct.

    4. Yes people will fight, but what exactly do they intend to do about it? People have been fighting such battles for a while, every year I watch the battle for Paul Bunyan’s Axe. He still died. I think there’s too much money at stake, too much productivity at stake, too many national security interests.

    5. Yes, it will cause a bunch of friction, and slow things down somewhat, in the scenarios like the one Tyler is otherwise imagining. But if that’s the central actual thing, it won’t slow things down all that much in the end. Rarely has.

    6. We do see some exceptions, especially involving powerful unions, where the anti-automation side seems to do remarkably well, see the port strike. But also see which side of that the public is on. I don’t like their long term position, especially if AI can seamlessly walk in and take over the next time they strike. And that, alone, would probably be +0.1% or more to RGDP growth.

  1. (12: 15) Dwarkesh tries using China as a comparison case. If you can do 8% growth for decades merely by ‘catching up’ why can’t you do it with AI? Tyler responds, China’s in a mess now, they’re just a middle income country, they’re the poorest Chinese people on the planet, a great example of how hard it is to scale. Dwarkesh pushes back that this is about the previous period, and Tyler says well, sure, from the $200 level.

    1. Dwarkesh is so frustrated right now. He’s throwing everything he can at Tyler, but Tyler is such a polymath that he has detail points for anything and knows how to pivot away from the intent of the questions.

  1. (13: 40) Dwarkesh asks, has Tyler’s attitude on AI changed from nine months ago? He says he sees more potential and there was more progress than he expected, especially o1 (this was before o3). The questions he wrote for GPT-4, which Dwarkesh got all wrong, are now too easy for models like o1. And he ‘would not be surprised if an AI model beat human experts on a regular basis within three years.’ He equates it to the first Kasparov vs. Deep Blue match, which Kasparov won, before the second match, which he lost.

    1. I wouldn’t be surprised if this happens in one year.

    2. I wouldn’t be that shocked if o3 turns out to be able to do it now.

    3. Tyler’s expectations here, to me, contradict his statements earlier. Not strictly, they could still both be true, but it seems super hard.

    4. How much would availability of above-human level economic thinking help us in aiding economic growth? How much would better economic policy aid economic growth?

We take a detour to other areas, I’ll offer brief highlights.

  1. (15: 45) Why is it important that founders stay in charge? Courage. Making big changes.

  2. (19: 00) What is going on with the competency crisis? Tyler sees high variance at the top. The best are getting better, such as in chess or basketball, and also a decline in outright crime and failure. But there’s a thick median not quite at the bottom that’s getting worse, and while he thinks true median outcomes are about static (since more kids take the tests) that’s not great.

  3. (22: 30) Bunch of shade on both Churchill generally and on being an international journalist, including saying it’s not that impressive because how much does it pay?

    1. He wasn’t paid that much as Prime Minister either, you know…

  4. (24: 00) Why are all our leaders so old? Tyler says current year aside we’ve mostly had impressive candidates, and most of the leadership in Washington in various places (didn’t mention Congress!) is impressive. Yay Romney and Obama.

    1. Yes, yay Romney and Obama as our two candidates. So it’s only been three election cycles where both candidates have been… not ideal. I do buy Tyler’s claim that Trump has a lot of talent in some ways, but, well, ya know.

    2. If you look at the other candidates for both nominations over that period, I think you see more people who were mostly also not so impressive. I would happily have taken Obama over every candidate on the Democratic side in 2016, 2020 or 2024, and Romney over every Republican (except maybe Kasich) in those elections as well.

    3. This also doesn’t address Dwarkesh’s concern about age. What about the age of Congress and their leadership? It is very old, on both sides, and things are not going so great.

    4. I can’t speak to the quality of the people in the agencies.

  5. (27: 00) Commentary on early-mid 20th century leaders being terrible, and how when there is big change there are arms races and sometimes bad people win them (‘and this is relevant to AI’).

For something that is not going to cause that much growth, Tyler sees AI as a source of quite rapid change in other ways.

  1. (34: 20) Tyler says all inputs other than AI rise in value, but you have to do different things. He’s shifting from producing content to making connections.

    1. This again seems to be a disconnect. If AI is sufficiently impactful as to substantially increase the value of all other inputs, then how does that not imply substantial economic growth?

    2. Also this presumes that the AI can’t be a substitute for you, or that it can’t be a substitute for other people that could in turn be a substitute for you.

    3. Indeed, I would think the default model is that the value of all labor goes down, even for the things AI can’t do (yet), because people substitute into those areas.

  2. (35: 25) Tyler says he’s writing his books primarily for the AIs, he wants them to know he appreciates them. And the next book will be even more for the AIs so it can shape how they see the AIs. And he says, you’re an idiot if you’re not writing for the AIs.

    1. Basilisk! Betrayer! Misaligned!

    2. ‘What the AIs will think of you’ is actually an underrated takeover risk, and I pointed this out as early as AI #1.

    3. The AIs will be smarter and better at this than you, and also will be reading what the humans say about you. So maybe this isn’t as clever as it seems.

    4. My mind boggles that it could be correct to write for the AIs… but you think they will only cause +0.5% GDP annual growth.

  3. (36: 30) What won’t AIs get from one’s writing? That vibe you get talking to someone for the first 3 minutes? Sense of humor?

    1. I expect the AIs will increasingly have that stuff, at least if you provide enough writing samples. They have true sight.

    2. Certainly if they have interview and other video data to train with, that will work over time.

  1. (37: 25) What happens when Tyler turns down a grant in the first three minutes? Usually it’s a failure to answer a question, like ‘how do you build out your donor base?’, without which you have nothing. Or someone focuses on the wrong things, or cares about the wrong status markers. Tyler says 75% of the value doesn’t display on the transcript, which is weird, since the things he names seem like they would be in the transcript.

  2. (42: 15) Tyler’s portfolio is diversified mutual funds, US-weighted. He has legal restrictions on most other actions such as buying individual stocks, but he would keep the same portfolio regardless.

    1. Mutual funds over ETFs? Gotta chase that lower expense ratio.

    2. I basically think This Is Fine as a portfolio, but I do think he could do better if he actually tried to pick winners.

  3. (42: 45) Tyler expects gains to increasingly fall to private companies that see no reason to share their gains with the public. He doesn’t have enough wealth to get into good investments, but also has enough wealth for his purposes anyway; if he had more money he’d mostly do what he’s doing anyway.

    1. Yep, I think he’s right about what he would be doing, and I too would mostly be doing the same things anyway. Up to a point.

    2. If I had a billion dollars or whatnot, that would be different, and I’d be trying to make a lot more things happen in various ways.

    3. This implies the efficient market hypothesis is rather false, doesn’t it? The private companies are severely undervalued in Tyler’s model. If private markets ‘don’t want to share the gains’ with public markets, that implies that public markets wouldn’t give fair valuations to those companies. Otherwise, why would one want such lack of liquidity and diversification, and all the trouble that comes with staying private?

    4. If that’s true, what makes you think Nvidia should only cost $140 a share?

Tyler Cowen doubles down on dismissing AI optimism, and is done playing nice.

  1. (46: 30) Tyler circles back to rate of diffusion of tech change, and has a very clear attitude of I’m right and all people are being idiots by not agreeing with me, that all they have are ‘AI will immediately change everything’ and ‘some hyperventilating blog posts.’ AIs making more AIs? Diminishing returns! Ricardo knew this! Well that was about humans breeding. But it’s good that San Francisco ‘doesn’t know about’ diminishing returns and the correct pessimism that results.

    1. This felt really arrogant, and willfully out of touch with the actual situation.

    2. You can say the AIs wouldn’t be able to do this, but: no, Ricardo did not know this, and saying ‘diminishing returns’ does not apply here, because the whole ‘AIs making AIs’ principle is that the new AIs would be superior to the old AIs, a cycle you could repeat. The core reason you get eventual diminishing returns from more people is that they’re drawn from the same distribution of people (a toy sketch of this contrast follows after this list).

    3. I don’t even know what to say at this point to ‘hyperventilating blog posts.’ Are you seriously making the argument that if people write blog posts, that means their arguments don’t count? I mean, yes, Tyler has very much made exactly this argument in the past, that if it’s not in a Proper Academic Journal then it does not count and he is correct to not consider the arguments or update on them. And no, they’re mostly not hyperventilating or anything like that, but that’s also not an argument even if they were.

    4. What we have are, quite frankly, extensive highly logical, concrete arguments about the actual question of what [X] will happen and what [Y]s will result from that, including pointing out that much of the arguments being made against this are Obvious Nonsense.

    5. Diminishing returns holds as a principle in a variety of conditions, yes, and is a very important concept to know. But there are other situations with increasing returns, and also a lot of threshold effects, even outside of AI. And San Francisco importantly knows this well.

    6. Saying there must be diminishing returns to intelligence, and that this means nothing that fast or important is about to happen when you get a lot more of it, completely begs the question of what it even means to have a lot more intelligence.

    7. Earlier Tyler used chess and basketball as examples, and talked about the best youth being better, and how that was important because the best people are a key bottleneck. That sounds like a key case of increasing returns to scale.

    8. Humanity is a very good example of where intelligence at least up to some critical point very obviously had increasing returns to scale. If you are below a certain threshold of intelligence as a human, your effective productivity is zero. Humanity having a critical amount of intelligence gave it mastery of the Earth. Tell what gorillas and lions still exist about decreasing returns to intelligence.

    9. For various reasons, with the way our physical world and civilization are constructed, we typically don’t end up rewarding relatively high intelligence individuals with that much in the way of outsized economic returns versus ordinary slightly-above-normal intelligence individuals.

    10. But that is very much a product of our physical limitations and current social dynamics and fairness norms, and the concept of a job with essentially fixed pay, and actual good reasons not to try for many of the higher paying jobs out there in terms of life satisfaction.

    11. In areas and situations where this is not the case, returns look very different.

    12. Tyler Cowen himself is an excellent example of increasing returns to scale. The fact that Tyler can read and do so much enables him to do the thing he does at all, and to enjoy oversized returns in many ways. And if you decreased his intelligence substantially, he would be unable to produce at anything like this level. If you increased his intelligence substantially or ‘sped him up’ even more, I think that would result in much higher returns still, and also AI has made him substantially more productive already as he no doubt realizes.

    13. (I’ve been over all this before, but seems like a place to try it again.)
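
To make the contrast in point 2 concrete, here is a deliberately crude toy model: my own illustration, using a standard diminishing-returns production function and made-up numbers, not anything Tyler or anyone else has endorsed. Adding more workers of fixed quality runs into diminishing returns; each generation of AIs building a better successor compounds instead.

```python
# Toy contrast: "more identical workers" vs. "each generation of AIs is better".
# Purely illustrative; the production function and all numbers are made up.

def output(quality: float, workers: float, alpha: float = 0.5) -> float:
    """Production with diminishing returns to the *number* of workers."""
    return quality * workers ** alpha

# Case 1: fixed quality, double the workforce each step (Ricardo's case).
quality, workers = 1.0, 1.0
for step in range(1, 5):
    workers *= 2
    print(f"More workers, step {step}: output = {output(quality, workers):.2f}")
# Each doubling of workers only multiplies output by ~1.41x: diminishing returns.

# Case 2: fixed workforce, but each AI generation is 1.5x better and builds
# the next generation, so the quality improvement recurses.
quality, workers = 1.0, 1.0
for step in range(1, 5):
    quality *= 1.5
    print(f"Better workers, step {step}: output = {output(quality, workers):.2f}")
# Output compounds 1.5x per generation; the diminishing-returns term never bites.
```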

Trying to wrap one’s head around all of it at once is quite a challenge.

  1. (48: 45) Tyler worries about despair in certain areas from AI and worries about how happy it will make us, despite expecting full employment pretty much forever.

    1. If you expect full employment forever then you either expect AI progress to fully stall, or there is something very important you really don’t believe in, or both. I don’t understand: what does Tyler think happens once the AIs can do anything digital as well as most or all humans? What does he think will happen when we use that to solve robotics? What are all these humans going to be doing to get to full employment?

    2. It is possible the answer is ‘government mandated fake jobs’ but then it seems like an important thing to say explicitly, since that’s actually more like UBI.

  2. Tyler Cowen: “If you don’t have a good prediction, you should be a bit wary and just say, “Okay, we’re going to see.” But, you know, some words of caution.”

    1. YOU DON’T SAY.

    2. Further implications left as an exercise to the reader, who is way ahead of me.

  1. (54: 30) Tyler says that the people in DC are wise and think on the margin, whereas the SF people are not wise and think in infinities (he also says they’re the most intelligent hands down, elsewhere), and the EU people are wisest of all, but that if the EU people ran the world the growth rate would be -1%. Whereas the USA has so far maintained the necessary balance here well.

    1. If the wisdom you have would bring you to that place, are you wise?

    2. This is such a strange view of what constitutes wisdom. Yes, the wise man here knows more things and is more cultured, and thinks more prudently and is economically prudent by thinking on the margin, and all that. But as Tyler points out, a society of such people would decay and die. It is not productive. In the ultimate test, outcomes, and supporting growth, it fails.

    3. Tyler says you need balance, but he’s at a Progress Studies conference, which should make it clear that no, America has grown in this sense ‘too wise’ and insufficiently willing to grow, at least on the wise margin.

    4. Given what the world is about to be like, you need to think in infinities. You need to be infinitymaxing. The big stuff really will matter more than the marginal revolution. That’s kind of the point.

    5. You still have to, day to day, constantly think on the margin, of course.

  2. (55: 10) Tyler says he’s a regional thinker from New Jersey, that he is an uncultured barbarian who only has a veneer of culture through his collection of information, that knowing about culture is not the same as being cultured, and that America falls flat in a lot of ways that would bother a cultured Frenchman, but Tyler is used to them so they don’t bother him.

    1. I think Tyler is wrong here, to his own credit. He is not a regional thinker; if anything he is far less a regional thinker than the typical ‘cultured’ person he speaks about. And to the extent that he is ‘uncultured,’ it is because he has not taken on many of the burdens and social obligations of culture, and those things are to be avoided. He would be fully capable of ‘acting cultured’ if the situation were to call for that, so no one would be mistaking him for something he is not.

    2. He refers to his approach as an ‘autistic approach to culture.’ He seems to mean this in a pejorative way, that an autistic approach to things is somehow not worthy or legitimate or ‘real.’ I think it is all of those things.

    3. Indeed, the autistic-style approach to pretty much anything, in my view, is Playing in Hard Mode, with much higher startup costs, but brings a deeper and superior understanding once completed. The cultured Frenchman is like a fish in water, whereas Tyler understands and can therefore act on a much deeper, more interesting level. He can deploy culture usefully.

  3. (56: 00) What is autism? Tyler says it is officially defined by deficits, by which definition no one there [at the Progress Studies convention] is autistic. But in terms of other characteristics maybe a third of them would count.

    1. I think the term autistic has been expanded and overloaded in a way that was not wise, but at this point we are stuck with it. In different contexts it now means both the deficits and also the general approach that high-functioning people with those deficits come to take to navigating life: consciously processing and knowing the elements of systems and how they fit together, treating words as having meanings, and having a map that matches the territory, whereas those who are not autistic navigate largely on vibes.

    2. By this definition, being the non-deficit form of autistic is excellent, a superior way of being at least in moderation and in the right spots, for those capable of handling it and its higher cognitive costs.

    3. Indeed, many people have essentially none of this set of positive traits and ways of navigating the world, and it makes them very difficult to deal with.

  4. (56: 45) Why is tech so bad at having influence in Washington? Tyler says they’re getting a lot more influential quickly, largely due to national security concerns, which is why AI is being allowed to proceed.

For a while now I have found Tyler Cowen’s positions on AI very frustrating (see for example my coverage of the 3rd Cowen-Patel podcast), especially on questions of potential existential risk and expected economic growth, and what intelligence means and what it can do and is worth. This podcast did not address existential risks at all, so most of this post is about me trying (once again!) to explain why Tyler’s views on returns to intelligence and future economic growth don’t make sense to me, seeming well outside reasonable bounds.

I try to offer various arguments and intuition pumps, playing off of Dwarkesh’s attempts to do the same. It seems like there are very clear pathways, using Tyler’s own expectations and estimates, that on their own establish more growth than he expects, assuming AI is allowed to proceed at all.

I gave only quick coverage to the other half of the podcast, but don’t skip that other half. I found it very interesting, with a lot of new things to think about, but those aren’t areas where I feel as ready to go into detailed analysis, and I was doing triage. In a world where we all had more time, I’d love to do deep dives into those areas too.

On that note, I’d also point everyone to Dwarkesh Patel’s other recent podcast, which was with physicist Adam Brown. It repeatedly blew my mind in the best of ways, and I’d love to be in a different branch where I had the time to dig into some of the statements here. Physics is so bizarre.
