
More cancer, less death? New alcohol-risk reviews offer conflicting takeaways


Two big, somewhat conflicting studies on alcohol risks will influence new guidelines.

Heavy drinking is clearly bad for your health. But it’s long been questioned whether moderate drinking is also risky—and, if so, how risky, exactly.

Health researchers have consistently found links between alcohol consumption and several types of cancers (namely mouth, throat, colon, rectal, liver, and breast), as well as liver diseases, injuries, and traffic accidents. But nailing down the health risks from the lower levels of drinking has been tricky. For one, much of the data on moderate drinking is from observational studies in different countries, cultures, and populations. They cannot determine if alcohol is the direct cause of any given association, and they may be swayed by other lifestyle factors. The resulting data can be noisy and inconsistent.

Moreover, many studies rely on people to self-report whether they drink and, if so, how much, which is problematic because people may not accurately assess and/or report how much they actually drink. A related problem is that studies in the past often compared drinkers to people who said they didn’t drink. But, the trouble is, non-drinking groups are often some mix of people who are lifelong abstainers and people who used to drink but quit for some reason—maybe because of health effects. This latter group has the potential to have lingering health effects from their drinking days, which could skew any comparisons looking for health differences.

Then there’s the larger, common problem with any research focused on food or beverages: some have been sponsored or somehow swayed by industry, casting suspicion on the findings, particularly the ones indicating benefits. This has been a clear problem for alcohol research. For instance, in 2018, the National Institutes of Health shut down a $100 million trial aimed at assessing the health effects (and potential benefits) of moderate drinking after it came to light that much of the funding was solicited from the alcohol industry. There was a lot of questionable communication between NIH scientists and alcohol industry representatives.

With all of that in the background, there’s been clamorous debate about how much risk, if any, people are swallowing with their evening cocktail, gameday beer, or wine with dinner.

Currently, the US dietary guidance recommends that if adults drink, they should stick to drinking in moderation, defined as limiting “alcohol intake to two drinks or fewer in a day for men and one drink or fewer in a day for women.” But recently, health experts in the US and abroad have started calling for lower limits, noting that more data has poured in that fortifies links to cancers and other risks. In 2023, for instance, Canada released recommendations that people limit their alcohol consumption to two drinks or fewer per week—that’s down significantly from the previously recommended limit of 10 drinks per week for women and 15 drinks per week for men.

Two reviews

Now, it’s America’s turn to decide whether it will set the bar lower, too. This year, the US will update its dietary guidelines, a process carried out every five years by the Department of Health and Human Services and the Department of Agriculture. The federal government requested two big scientific reviews to assess the current state of knowledge on the health effects of alcohol, both of which will inform any potential revisions to the alcohol guidelines. Both have now been released and are open for discussion.

One is from the National Academies of Sciences, Engineering, and Medicine (the National Academies), which was tasked by Congress to review the current evidence on alcohol with a focus on how moderate drinking potentially affects a specific set of health outcomes. The review compared health outcomes in moderate drinkers with those of lifelong abstainers. For the review, the National Academies set up a committee of 14 experts.

The other report is from the Interagency Coordinating Committee on the Prevention of Underage Drinking (ICCPUD), which set up a Technical Review Subcommittee on Alcohol Intake and Health. For its report, the subcommittee looked not just at moderate drinking but health outcomes of a range of alcohol consumption compared to lifelong abstainers.

Based on top-line takeaways and tone, the two reports seem to have very different findings. While the National Academies review found a mix of benefits and harms from moderate drinking (one drink per day for women, and two per day for men), the ICCPUD review suggested that even the smallest amounts of alcohol (one drink per week) increased risk of death and various diseases. However, a closer look at the data shows they have some common ground.

The National Academies review

First, for the National Academies’ review, experts found sufficient evidence to assess the effects of moderate drinking on all-cause mortality, certain cancers, and cardiovascular risks. On the other hand, the reviewers found insufficient evidence to assess moderate drinking’s impact on weight changes, neurocognition, and lactation-related risks.

For all-cause mortality, a meta-analysis of data from eight studies found that moderate drinkers had a 16 percent lower risk of all-cause mortality (death from any cause) compared with lifelong abstainers. A meta-analysis of three studies suggested the risk of all-cause mortality was 23 percent lower for females who drank moderately compared to never-drinking females. Data from four studies indicated that moderate drinking males had a 16 percent lower risk of all-cause mortality than never-drinking males. Additional analyses found that the risk of all-cause mortality was 20 percent lower for moderate drinkers less than age 60 and 18 percent lower for moderate drinkers age 60 and up.

“Based on data from the eight eligible studies from 2019 to 2023, the committee concludes that compared with never consuming alcohol, moderate alcohol consumption is associated with lower all-cause mortality,” the review states. The reviewers rated the conclusion as having “moderate certainty.”

Cancer and cardiovascular disease

For a look at cancer risks, a meta-analysis of four studies on breast cancer found that moderate drinkers had an overall 10 percent higher risk than non-drinkers. An additional analysis of seven studies found that for every 10 to 14 grams of alcohol (0.7 to one standard drink) consumed per day, there was a 5 percent higher risk of breast cancer. The data indicated that people who drank higher amounts of alcohol within the moderate range had higher risks than those who drank lower amounts in the moderate range (for instance, one drink a day versus 0.5 drinks a day).

For context, the average lifetime risk of being diagnosed with breast cancer in non-drinking females is about 11 to 12 percent. A 10 percent relative increase in risk would raise a person’s absolute risk to around 12 to 13 percent. The average lifetime risk of any female dying of breast cancer is 2.5 percent.
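To make the relative-versus-absolute distinction concrete, here is a minimal arithmetic sketch in Python. The baseline figures are the approximate numbers quoted above, not data from either review:

```python
# A rough sketch of how a relative risk increase maps onto absolute lifetime risk.
# The inputs are the approximate figures quoted above, not data from either review.

baseline_lifetime_risk = 0.115   # roughly the 11-12 percent baseline diagnosis risk
relative_increase = 0.10         # the ~10 percent higher relative risk in moderate drinkers

absolute_risk = baseline_lifetime_risk * (1 + relative_increase)
print(f"Baseline lifetime risk: {baseline_lifetime_risk:.1%}")
print(f"With a 10% relative increase: {absolute_risk:.1%}")
# Prints a baseline of 11.5% rising to roughly 12.7%, i.e. about one additional
# diagnosis per ~85 women over a lifetime at this level of drinking.
```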

Overall, the reviewers concluded that “consuming a moderate amount of alcohol was associated with a higher risk of breast cancer,” and the conclusion was rated as having moderate certainty.

A meta-analysis on colorectal cancer risks found a “statistically nonsignificant higher risk” in moderate drinkers compared to non-drinkers. However, studies looking at alcohol consumption at the highest levels of moderate drinking for males (e.g., two drinks per day) suggested a higher risk compared to males who drank lower amounts of alcohol in the moderate range (one drink per day).

The review concluded that there was insufficient evidence to support a link between moderate drinking and oral cavity, pharyngeal, esophageal, and laryngeal cancers.

Finally, for cardiovascular risks, meta-analyses found moderate drinking was associated with a 22 percent lower risk of heart attacks and an 11 percent lower risk of stroke (driven by lower risk of ischemic stroke, specifically). The reviewers rated these associations as low certainty, though, after noting that there was some concern for risk of bias in the studies.

For cardiovascular disease mortality, meta-analyses of four studies found an 18 percent lower risk of death among moderate drinkers compared with non-drinkers. Broken down, there was a 23 percent lower risk in female drinkers and 18 percent lower risk in male drinkers. The lower risk of cardiovascular disease mortality was rated as moderate certainty.

The ICCPUD review

The ICCPUD subcommittee’s report offered a darker outlook on moderate drinking, concluding that “alcohol use is associated with increased mortality for seven types of cancer (colorectal, female breast, liver, oral cavity, pharynx, larynx, esophagus [squamous cell type]),” and “increased risk for these cancers begins with any alcohol use and increases with higher levels of use.”

The review modeled lifetime risks of cancer and death and relative risks for a long list of problems, including infectious diseases, non-communicable diseases, and injuries. And it didn’t just compare moderate drinkers to non-drinkers; it assessed the relative risk of six levels of drinking: one drink a week; two drinks a week; three drinks a week; seven drinks a week (one a day); 14 drinks a week (two a day); and 21 drinks a week (three a day).

Overall, the analysis is very much a rough draft. There are some places where information is missing, and some of the figures are mislabeled and difficult to read. There are two figures labeled Figure 6, for instance, and Figure 7 (which may actually be Figure 8) is a graph without a Y-axis, making it difficult to interpret. The study also doesn’t discuss the potential bias of the individual studies in its analyses, doesn’t flag statistically insignificant results, and doesn’t comment on the certainty of any of its findings.

For instance, the top-line summary states: “In the United States, males and females have a 1 in 1,000 risk of dying from alcohol use if they consume more than 7 drinks per week. This risk increases to 1 in 100 if they consume more than 9 drinks per week.” But a look at the modeling behind these estimates shows that the cutoffs at which drinkers would reach a 0.1 percent or 1 percent risk of dying from alcohol use are broad. For males, a 0.1 percent lifetime risk of an alcohol-attributed death is reached at 6.5 standard drinks per week, with a 95 percent confidence interval spanning from less than one drink per week to 13.5 drinks per week. “This lifetime risk rose to 1 in 100 people above 8.5 drinks per week,” the text reads, but the confidence interval is again roughly one to 14 drinks per week. So, basically, at anywhere between about one and 14 drinks a week, a male’s lifetime risk of dying from alcohol may be either 0.1 percent or 1 percent, according to this modeling.

Death risks

Regarding risk of death, the study did not look at all-cause mortality as the National Academies review did. Instead, it focused on deaths from causes specifically linked to alcohol. For both males and females, modeling indicated that the total lifetime risk of an alcohol-attributed death for people who consumed one, two, three, or seven drinks per week was statistically non-significant (the confidence intervals for each estimate spanned zero). Among those who have 14 drinks per week, the total lifetime risk of an alcohol-attributed death was about 4 in 100, with unintentional injuries being the biggest contributor for males and liver diseases the biggest contributor for females. Among those who have 21 drinks per week, the risk was about 7 in 100 for males and 8 in 100 for females, with unintentional injuries and liver diseases again the biggest contributors.

Some experts have speculated that the lower risk of all-cause mortality found in the National Academies’ analysis (which has been seen in previous studies) may be due to healthy lifestyle patterns among people who drink moderately rather than the protective effects of alcohol. The line of thinking would suggest that healthy lifestyle choices, like regular exercise and a healthy diet, can negate certain risks, including the potential risks of alcohol. However, the ICCPUD emphasizes the reverse argument, noting that poor health choices would likely exacerbate the risks of alcohol. “[A]lcohol would have a greater impact on the health of people who smoke, have poor diets, engage in low physical activity, are obese, have hepatitis infection, or have a family history of specific diseases than it would other individuals.”

Relative risks

In terms of relative risk for the range of conditions, the ICCPUD study generally found small, if any, increases in risk at the three lowest levels of drinking, with risks rising at higher levels. The study’s findings on breast cancer risk were in line with the National Academies’ review: ICCPUD found that pre-menopausal females who drink moderately (one drink per day) had a 6 percent higher risk of breast cancer than non-drinkers, while post-menopausal moderate drinkers had a 17 percent higher risk. (You can see the complete set of relative risk estimates in Table A6 beginning on page 70 of the report.)

For some cancers, moderate drinking raised the risk substantially. For instance, males who have two drinks per day see their risk of esophageal cancer more than double. But, it’s important to note that the absolute risk for many of these cancers is small to begin with. The average risk of esophageal cancer in men is 0.8 percent, according to the American Cancer Society. With the increased risk from moderate drinking, it would be below 2 percent. Still, alcohol consumption increased the risks of nearly all the cancers examined, with the higher levels of alcohol consumption having the highest risk.

As for cardiovascular risks, ICCPUD’s review found lower risks for drinkers in several categories. The risk of ischemic heart disease was lower than that of non-drinkers at all six drinking levels. The risk of ischemic stroke was lower among drinkers who had one, two, three, or seven drinks per week compared to non-drinkers. At 14 and 21 drinks per week, the risk of ischemic stroke rose by 8 percent.


Trek FX+ 7S e-bike is a premium city commuter 

Post-pandemic, my creed became “Bicycles deliver the freedom that auto ads promise.” That belief is why I’ve almost exclusively used a bike to move myself around Portland, Oregon since (yes, I have become a Portlandia stereotype).

However, that lifestyle is a lot more challenging without some pedal assistance. For a few summers, I showed up sweaty to appointments after pedaling on a $200 single-speed. So in 2024, I purchased the FX+ 2, based primarily on my managing editor’s review. It’s since been a workhorse for my daily transportation needs; I’ve put more than 1,000 miles on it in eight months.

So given my experience with that bike, I was the natural choice to review Trek’s upgraded version, the FX+ 7S.

A premium pedaler

First off, my time with the FX+ 2 has been great—no regrets about that purchase. But my one quibble is with the battery. Due to the frequency and length of my rides, I need to charge the bike more often than not, and I sometimes experience range anxiety riding to the opposite side of town. Even though both e-bikes are considered lightweight at 40 pounds, they’re still not the easiest things to pedal sans assist, and I’m reliant on their built-in lighting systems after dark.

But I didn’t have to worry about my remaining charge with the FX+ 7 and its 360 Wh battery. Its extra capacity gives me much less range anxiety, as I can ride without fear of losing juice on the route home. And the LCD on the frame gives you a clear indicator of how much distance and time you have left in your ride, which is always handy. I would caution, however, about relying too much on your estimated distance remaining.


The LCD provides some useful info. You can see how much charge is left on the battery, or you can press that button to see your speed, wattage power, or miles ridden. Credit: Chris DeGraw

During a 15-mile, hour-long ride while fluctuating between the first two assist levels I had modified, I drained 61 percent of the battery. While the estimated time remaining on my ride was consistent and accurate, the predicted mileage dropped occasionally, although that’s probably because I was changing the assist level frequently.


A solid electrolyte gives lithium-sulfur batteries ludicrous endurance


Sulfur can store a lot more lithium but is problematically reactive in batteries.

If you weren’t aware, sulfur is pretty abundant. Credit: P_Wei

Lithium may be the key component in most modern batteries, but it doesn’t make up the bulk of the material used in them. Instead, much of the material is in the electrodes, where the lithium gets stored when the battery isn’t charging or discharging. So one way to make lighter and more compact lithium-ion batteries is to find electrode materials that can store more lithium. That’s one of the reasons that recent generations of batteries are starting to incorporate silicon into the electrode materials.

There are materials that can store even more lithium than silicon; a notable example is sulfur. But sulfur has a tendency to react with itself, producing ions that can float off into the electrolyte. Plus, like any electrode material, it tends to expand in proportion to the amount of lithium that gets stored, which can create physical strains on the battery’s structure. So while it has been easy to make lithium-sulfur batteries, their performance has tended to degrade rapidly.

But this week, researchers described a lithium-sulfur battery that still has over 80 percent of its original capacity after 25,000 charge/discharge cycles. All it took was a solid electrolyte that was more reactive than the sulfur itself.

When lithium meets sulfur…

Sulfur is an attractive battery material. It’s abundant and cheap, and sulfur atoms are relatively lightweight compared to many of the other materials used in battery electrodes. Sodium-sulfur batteries, which rely on two very cheap raw materials, have already been developed, although they only work at temperatures high enough to melt both of these components. Lithium-sulfur batteries, by contrast, could operate more or less the same way that current lithium-ion batteries do.

With a few major exceptions, that is. One is that the elemental sulfur used as an electrode is a very poor conductor of electricity, so it has to be dispersed within a mesh of conductive material. (You can contrast that with graphite, which both stores lithium and conducts electricity relatively well, thanks to being composed of countless sheets of graphene.) Lithium is stored there as Li2S, which occupies substantially more space than the elemental sulfur it’s replacing.

Both of these issues, however, can be solved with careful engineering of the battery’s structure. A more severe problem comes from the properties of the lithium-sulfur reactions that occur at the electrode. Elemental sulfur exists as an eight-atom ring, and the reactions with lithium are slow enough that semi-stable intermediates with smaller chains of sulfur end up forming. Unfortunately, these tend to be soluble in most electrolytes, allowing them to travel to the opposite electrode and participate in chemical reactions there.

This process essentially discharges the battery without allowing the electrons to be put to use. And it gradually leaves the electrode’s sulfur unavailable for participating in future charge/discharge cycles. The net result is that early generations of the technology would discharge themselves while sitting unused and would only survive a few hundred cycles before performance decayed dramatically.

But there has been progress on all these fronts, and some lithium-sulfur batteries with performance similar to lithium-ion have been demonstrated. Late last year, a company announced that it had lined up the money needed to build the first large-scale lithium-sulfur battery factory. Still, work on improvements has continued, and the new work seems to suggest ways to boost performance well beyond lithium-ion.

The need for speed

The paper describing the new developments, done by a collaboration between Chinese and German researchers, focuses on one aspect of the challenges posed by lithium-sulfur batteries: the relatively slow chemical reaction between lithium ions and elemental sulfur. It presents that aspect as a roadblock to fast charging, something that will be an issue for automotive applications. But at the same time, finding a way to limit the formation of inactive intermediate products during this reaction goes to the root of the relatively short usable life span of lithium-sulfur batteries.

As it turns out, the researchers found two ways to do exactly that.

One of the problems with the lithium-sulfur reaction intermediates is that they dissolve in most electrolytes. But that’s not a problem if the electrolyte isn’t a liquid. Solid electrolytes are materials that have a porous structure at the atomic level, with the environment inside the pores being favorable for ions. This allows ions to diffuse through the solid. If there’s a way to trap ions on one side of the electrolyte, such as a chemical reaction that traps or de-ionizes them, then it can enable one-way travel.

Critically, pores that favor the transit of lithium ions, which are quite compact, aren’t likely to allow the transit of the large ionized chains of sulfur. So a solid electrolyte should help cut down on the problems faced by lithium-sulfur batteries. But it won’t necessarily help with fast charging.

The researchers began by testing a glass formed from a mixture of boron, sulfur, and lithium (B2S3 and Li2S). But this glass had terrible conductivity, so they started experimenting with related glasses and settled on a combination that substituted in some phosphorus and iodine.

The iodine turned out to be a critical component. While the exchange of electrons with sulfur is relatively slow, iodine undergoes electron exchange (technically termed a redox reaction) extremely quickly. So it can act as an intermediate in the transfer of electrons to sulfur, speeding up the reactions that occur at the electrode. In addition, iodine has relatively low melting and boiling points, and the researchers suggest there’s some evidence that it moves around within the electrolyte, allowing it to act as an electron shuttle.

Successes and caveats

The result is a far superior electrolyte—and one that enables fast charging. It’s typical that fast charging cuts into the total capacity that can be stored in a battery. But when charged at an extraordinarily fast rate (50C, meaning a full charge in just over a minute), a battery based on this system still had half the capacity of a battery charged 25 times more slowly (2C, or a half-hour to full charge).

But the striking thing was how durable the resulting battery was. Even at an intermediate charging rate (5C), it still had over 80 percent of its initial capacity after over 25,000 charge/discharge cycles. By contrast, lithium-ion batteries tend to hit that level of decay after about 1,000 cycles. If that sort of performance is possible in a mass-produced battery, it’s only a slight exaggeration to say it can radically alter our relationships with many battery-powered devices.
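For a rough sense of what those numbers mean, a C-rate is just the inverse of the charge time in hours, and a cycle count converts to calendar life if you assume one full charge/discharge cycle per day. A quick, idealized sketch (my arithmetic, not the paper’s):

```python
# Back-of-the-envelope arithmetic for the charge rates and cycle counts above.
# Idealized: assumes charge time is exactly 1/C hours and one full cycle per day.

def charge_time_minutes(c_rate: float) -> float:
    """A C-rate of N corresponds to a full charge in 1/N hours."""
    return 60.0 / c_rate

for c_rate in (2, 5, 50):
    print(f"{c_rate:>2}C -> full charge in ~{charge_time_minutes(c_rate):.1f} minutes")
# 2C -> ~30 minutes, 5C -> ~12 minutes, 50C -> ~1.2 minutes

cycles = 25_000
print(f"{cycles:,} cycles at one per day lasts ~{cycles / 365:.0f} years")
# Roughly 68 years, the same ballpark as the '65 years of daily cycling' mentioned below.
```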

What’s not at all clear, however, is whether this takes full advantage of one of the original promises of lithium-sulfur batteries: more charge in a given weight and volume. The researchers specify the battery being used for testing; one electrode is an indium/lithium metal foil, and the other is a mix of carbon, sulfur, and the glass electrolyte. A layer of the electrolyte sits between them. But when giving numbers for the storage capacity per weight, only the weight of the sulfur is mentioned.

Still, even if weight issues would preclude this from being stuffed into a car or cell phone, there are plenty of storage applications that would benefit from something that doesn’t wear out even with 65 years of daily cycling.

Nature, 2025. DOI: 10.1038/s41586-024-08298-9


AI #99: Farewell to Biden

The fun, as it were, is presumably about to begin.

And the break was fun while it lasted.

Biden went out with an AI bang. His farewell address warns of a ‘Tech-Industrial Complex’ and calls AI the most important technology of all time. And there were not one but two AI-related everything bagel concrete actions proposed – I say proposed because Trump could undo or modify either or both of them.

One attempts to build three or more ‘frontier AI model data centers’ on federal land, with timelines and plans I can only summarize with ‘good luck with that.’ The other move was new diffusion regulations on who can have what AI chips, an attempt to actually stop China from accessing the compute it needs. We shall see what happens.

  1. Table of Contents.

  2. Language Models Offer Mundane Utility. Prompt o1, supercharge education.

  3. Language Models Don’t Offer Mundane Utility. Why do email inboxes still suck?

  4. What AI Skepticism Often Looks Like. Look at all it previously only sort of did.

  5. A Very Expensive Chatbot. Making it anatomically incorrect is going to cost you.

  6. Deepfaketown and Botpocalypse Soon. Keep assassination agents underfunded.

  7. Fun With Image Generation. Audio generations continue not to impress.

  8. They Took Our Jobs. You can feed all this through o1 pro yourself, shall we say.

  9. The Blame Game. No, it is not ChatGPT’s fault that guy blew up a cybertruck.

  10. Copyright Confrontation. Yes, Meta and everyone else train on copyrighted data.

  11. The Six Million Dollar Model. More thoughts on how they did it.

  12. Get Involved. SSF, Anthropic and Lightcone Infrastructure.

  13. Introducing. ChatGPT can now schedule tasks for you. Yay? And several more.

  14. In Other AI News. OpenAI hiring to build robots.

  15. Quiet Speculations. A lot of people at top labs do keep predicting imminent ASI.

  16. Man With a Plan. PM Keir Starmer takes all 50 Matt Clifford recommendations.

  17. Our Price Cheap. Personal use of AI has no meaningful environmental impact.

  18. The Quest for Sane Regulations. Wiener reloads, Amodei genuflects.

  19. Super Duper Export Controls. Biden proposes export controls with complex teeth.

  20. Everything Bagel Data Centers. I’m sure this ‘NEPA’ thing won’t be a big issue.

  21. d/acc Round 2. Vitalik Buterin reflects on a year of d/acc.

  22. The Week in Audio. Zuckerberg on Rogan, and several sound bites.

  23. Rhetorical Innovation. Ultimately we are all on the same side.

  24. Aligning a Smarter Than Human Intelligence is Difficult. OpenAI researcher.

  25. Other People Are Not As Worried About AI Killing Everyone. Give ‘em hope.

  26. The Lighter Side. Inventing the wheel.

Help dyslexics get around their inability to spell so they can succeed in school, and otherwise help kids with disabilities. Often, we have ways to help everyone, but our civilization is willing to permit them for people who are ‘behind’ or ‘disadvantaged’ or ‘sick’ but not to help the average person become great – if it’s a problem everyone has, how dare you try to solve it. Well, you do have to start somewhere.

Diagnose medical injuries. Wait, Elon Musk, maybe don’t use those exact words?

The original story that led to that claim is here from AJ Kay. The doctor and radiologist said her daughter was free of breaks; Grok found what it called an ‘obvious’ fracture line; they went to a wrist specialist, who found it, confirmed it was obvious, and cast it, which they say likely avoided a surgery.

Used that way LLMs seem insanely great versus doing nothing. You use them as an error check and second opinion. If they see something, you go follow up with a doctor to verify. I’d go so far as to say that if you have a diagnostic situation like this and you feel any uncertainty, and you don’t do at least this, that seems irresponsible.

A suggested way to prompt o1 (and o1 Pro especially):

Greg Brockman: o1 is a different kind of model. great performance requires using it in a new way relative to standard chat models.

Dan Mac: This is an amazing way to think about prompting o1 from @benhylak.

Ben Hylak: Don’t write prompts; write briefs. Give a ton of context. Whatever you think I mean by a “ton” — 10x that.

In short, treat o1 like a new hire. Beware that o1’s mistakes include reasoning about how much it should reason.

Once you’ve stuffed the model with as much context as possible — focus on explaining what you want the output to be.

This requires you to really know exactly what you want (and you should really ask for one specific output per prompt — it can only reason at the beginning!)

What o1 does well: Perfectly one-shotting entire/multiple files, hallucinating less, medical diagnosis (including for use by professionals), explaining concepts.

What o1 doesn’t do well: Writing in styles, building entire apps.

Another strategy is to first have a conversation with Claude Sonnet, get a summary, and use it as context (Rohit also mentions GPT-4o, which seems strictly worse here but you might not have a Claude subscription). This makes a lot of sense, especially when using o1 Pro.

Alternate talking with o1 and Sonnet when talking through ideas, Gallabytes reports finding this helpful.

The streams are crossing, Joe Weisenthal is excited that Claude can run and test out its own code for you.

People on the internet sometimes lie, especially about cheating, film at 11. But also the future is highly unevenly distributed, and hearing about something is different from appreciating it.

Olivia Moore: Absolutely no way that almost 80% of U.S. teens have heard of ChatGPT, but only 26% use it for homework 👀

Sully: if i was a teen using chatgpt for homework i would absolutely lie.

Never? No, never. What, never? Well, actually all the time.

I also find it hard to believe that students are this slow, especially given this is a very low bar – it’s whether you even once asked for ‘help’ at all, in any form. Whereas ChatGPT has 300 million users.

When used properly, LLMs are clearly amazingly great at education.

Ethan Mollick: New randomized, controlled trial of students using GPT-4 as a tutor in Nigeria. 6 weeks of after-school AI tutoring = 2 years of typical learning gains, outperforming 80% of other educational interventions.

And it helped all students, especially girls who were initially behind.

No working paper yet, but the results and experiment are written up here. They used Microsoft Copilot and teachers provided guidance and initial prompts.

To make clear the caveats for people who don’t read the post: learning gains are measured in Equivalent Years of Schooling, this is a pilot study on narrow topics and they do not have long-term learning measures. And there is no full paper yet (but the team is credible)

World Bank Blogs: The learning improvements were striking—about 0.3 standard deviations. To put this into perspective, this is equivalent to nearly two years of typical learning in just six weeks.

What does that say about ‘typical learning’? A revolution is coming.

Sully suggests practical improvements for Claude’s web app to increase engagement. Agreed that they should improve artifacts and include a default search tool. The ability to do web search seems super important. The ‘feel’ issue he raises doesn’t bother me.

Use a [HIDDEN][/HIDDEN] tag you made up to play 20 questions with Claude, see what happens.

Straight talk: Why do AI functions of applications like GMail utterly suck?

Nabeel Qureshi: We have had AI that can type plausible replies to emails for at least 24 months, but when I open Outlook or Gmail I don’t have pre-written drafts of all my outstanding emails waiting for me to review yet. Why are big companies so slow to ship these obvious features?

The more general version of this point is also striking – I don’t use any AI features at all in my usual suite of “pre-ChatGPT” products.

For meetings, most people (esp outside of tech) are still typing “Sure, I’d love to chat! Here are three free slots over the next few days (all times ET)”, all of which is trivially automated by LLMs now.

(If even tech companies are this slow to adjust, consider how much slower the adjustment in non-tech sectors will be…).

I know! What’s up with that?

Cyberpunk Plato: Doing the compute for every single email adds up fast. Better to have the user request it if they want it.

And at least for business software there’s a concern that if it’s built in you’re liable for it being imperfect. Average user lacks an understanding of limitations.

Nabeel Qureshi: Yeah – this seems plausibly it.

I remember very much expecting this sort of thing to be a big deal, then the features sort of showed up but they are so far universally terrible and useless.

I’m going to go ahead and predict that at least the scheduling problem will change in 2025 (although one can ask why they didn’t do this feature in 2015). As in, if you have an email requesting a meeting, GMail will offer you an easy way (a button, a short verbal command, etc) to get an AI to do the meeting scheduling for you, at minimum drafting the email for you, and probably doing the full stack back and forth and creating the eventual event, with integration with Google Calendar and a way of learning your preferences. This will be part of the whole ‘year of the agent’ thing.

For the general issue, it’s a great question. Why shouldn’t GMail be drafting your responses in advance, at least if you have a subscription that pays for the compute and you opt in, giving you much better template responses, that also have your context? Is it that hard to anticipate the things you might write?

I mostly don’t want to actually stop to tell the AI what to write at current levels of required effort – by the time I do that I might as well have written it. It needs to get to a critical level of usefulness, then you can start customizing and adapting from there.

If 2025 ends and we still don’t have useful features of these types, we’ll want to rethink.

What we don’t have are good recommendation engines, even locally, certainly not globally.

Devon Hardware’s Wife: should be a letterboxd app but it is for every human experience. i could log in and see a friend has recently reviewed “having grapes”. i could go huh they liked grapes more than Nosferatu

Joe Weisenthal: What I want is an everything recommendation app. So if I say I like grapes and nosferatu, it’ll tell me what shoes to buy.

Letterboxd doesn’t even give you predictions for your rating of other films, seriously, what is up with that?

Robin Hanson: A bad sign for LLM applications.

That sign: NewScientist comes home (on January 2, 2025):

New Scientist: Multiple experiments showed that four leading large language models often failed in patient discussions to gather complete histories, the best only doing so 71% of the time, and even then they did not always get the correct diagnosis.

New Scientist’s Grandmother: o1, Claude Sonnet and GPT-4o, or older obsolete models for a paper submitted in August 2023?

New Scientist, its head dropping in shame: GPT-3.5 and GPT-4, Llama-2-7B and Mistral-v2-7B for a paper submitted in August 2023.

Also there was this encounter:

New Scientist, looking like Will Smith: Can an AI always get a complete medical history and the correct diagnosis from talking to a patient?

GPT-4 (not even 4o): Can you?

New Scientist: Time to publish!

It gets better:

If an AI model eventually passes this benchmark, consistently making accurate diagnoses based on simulated patient conversations, this would not necessarily make it superior to human physicians, says Rajpurkar. He points out that medical practice in the real world is “messier” than in simulations. It involves managing multiple patients, coordinating with healthcare teams, performing physical exams and understanding “complex social and systemic factors” in local healthcare situations.

“Strong performance on our benchmark would suggest AI could be a powerful tool for supporting clinical work – but not necessarily a replacement for the holistic judgement of experienced physicians,” says Rajpurkar.

I love the whole ‘holistic judgment means we should overrule the AI with human judgment even though the studies are going to find that doing this makes outcomes on average worse’ which is where we all know that is going. And also the ‘sure it will do [X] better but there’s some other task [Y] and it will never do that, no no!’

The core idea here is actually pretty good – that you should test LLMs for real medical situations by better matching real medical situations and their conditions. They do say the ‘patient AI’ and ‘grader AI’ did remarkably good jobs here, which is itself a test of AI capabilities as well. They don’t seem to offer a human baseline measurement, which seems important to knowing what to do with all this.

And of course, we have no idea if there was opportunity to radically improve the results with better prompt engineering.

I do know that I predict that o3-mini or o1-pro, with proper instructions, will match or exceed human baseline (the median American practicing doctor) for gathering a complete medical history. And I would expect it to also do so for diagnosis.

I encourage one reader to step up, email them for the code (the author emails are listed in the paper) and then test at least o1.

This is Aria, their flagship AI-powered humanoid robot ‘with a social media presence’. Part 2 of the interview here. It’s from Realbotix. You can get a ‘full bodied robot’ starting at $175,000.

They claim that social robots will be even bigger than functional robots, and aim to have their robots not only ‘learn about and help promote your brand’ but also learn everything about you and help ‘with the loneliness epidemic among adolescents and teenagers and bond with you.’

And yes they use the ‘boyfriend or girlfriend’ words. You can swap faces in 10 seconds, if you want more friends or prefer polyamory.

It has face and voice recognition, and you can plug in whatever AI you like – they list Anthropic, OpenAI, DeepMind, Stability and Meta on their website.

It looks like this:

Its movements in the video are really weird, and worse than not moving at all if you exclude the lips moving as she talks. They’re going to have to work on that.

Yes, we all know a form of this is coming, and soon. And yes, these are the people from Whitney Cummings’ pretty funny special Can I Touch It? so I can confirm that the answer to ‘can I?’ can be yes if you want it to be.

But for Aria the answer is no. For a yes and true ‘adult companionship’ you have to go to their RealDoll subdivision. On the plus side, that division is much cheaper, starting at under $10k and topping out at ~$50k.

I had questions, so I emailed their press department, but they didn’t reply.

My hunch is that the real product is the RealDoll, and what you are paying the extra $100k+ for with Aria is a little bit extra mobility and such but mostly so that it does have those features so you can safely charge it to your corporate expense account, and perhaps so you and others aren’t tempted to do something you’d regret.

Pliny the Liberator claims to have demonstrated a full-stack assassination agent that would, if given funds, have been capable of ‘unaliving people,’ with Claude Sonnet 3.6 being willing to select real-world targets.

Introducing Astral, an AI marketing agent. It will navigate through standard GUI websites like Reddit and soon TikTok and Instagram, and generate ‘genuine interactions’ across social websites to promote your startup business, in closed beta.

Matt Palmer: At long last, we have created the dead internet from the classic trope “dead internet theory.”

Tracing Woods: There is such a barrier between business internet and the human internet.

On business internet, you can post “I’ve built a slot machine to degrade the internet for personal gain” and get a bunch of replies saying, “Wow, cool! I can’t wait to degrade the internet for personal gain.”

It is taking longer than I expected for this type of tool to emerge, but it is coming. This is a classic situation where various frictions were preserving our ability to have nice things like Reddit. Without those frictions, we are going to need new ones. Verified identity or paid skin in the game, in some form, is the likely outcome.

Out with the old, in with the new?

Janel Comeau: sort of miss the days when you’d tweet “I like pancakes” and a human would reply “oh, so you hate waffles” instead of twelve AI bots responding with “pancakes are an enjoyable food”

Instagram ads are the source of 90% of traffic for a nonconsensual nudity app Crushmate or Crush AI, with the ads themselves featuring such nonconsensual nudity of celebrities such as Sophie Rain. I did a brief look-see at the app’s website. They have a top scroll saying ‘X has just purchased’ which is what individual struggling creators do, so it’s probably 90% of not very much, and when you’re ads driven you choose where the ads go. But it’s weird, given what other ads don’t get approved, that they can get this level of explicit past the filters. The ‘nonconsensual nudity’ seems like a side feature of a general AI-image-and-spicy-chat set of offerings, including a number of wholesome offerings too.

AI scams are still rare, and mostly get detected, but it’s starting modulo the lizardman constant issue:

Richard Hanania notes that the bot automatic social media replies are getting better, but says ‘you can still tell something is off here.’ I did not go in unanchored, but this does not seem as subtle as he makes it out to be; his example might as well scream AI generated:

My prior on ‘that’s AI’ is something like 75% by word 4, 95%+ after the first sentence. Real humans don’t talk like that.

I also note that it seems fairly easy to train an AI classifier to do what I instinctively did there, and catch things like this with very high precision. If it accidentally catches a few college undergraduates trying to write papers, I notice my lack of sympathy.

But that’s a skill issue, and a choice. The reason Aiman’s response is so obvious is that it has exactly that RLHF-speak. One could very easily fine tune in a different direction, all the fine tuning on DeepSeek v3 was only five figures in compute and they give you the base model to work with.
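As a sketch of the classifier idea above: a few lines of scikit-learn will fit a crude TF-IDF plus logistic regression detector. The training examples below are invented placeholders purely for illustration; a real detector would need a large labeled corpus:

```python
# A minimal sketch of the kind of classifier described above: TF-IDF features plus
# logistic regression over a handful of hand-labeled examples. The training strings
# are invented placeholders; a real detector would need far more (and real) data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Pancakes are an enjoyable food that many people appreciate.",  # stilted, bot-like
    "It is important to note that both options have merits.",       # bot-like hedging
    "oh so you hate waffles",                                        # human reply
    "lol no way, pancakes all day",                                  # human reply
]
labels = [1, 1, 0, 0]  # 1 = AI-sounding, 0 = human-sounding

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["Pancakes are a versatile and enjoyable breakfast option."]))
```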

Richard Hanania: The technology will get better though. We’ll eventually get to the point that if your account is not connected to a real person in the world, or it wasn’t grandfathered in as an anonymous account, people will assume you’re a bot because there’s no way to tell the difference.

That will be the end of the ability to become a prominent anonymous poster.

I do continue to expect things to move in that direction, but I also continue to expect there to be ways to bootstrap. If nothing else, there is always money. This isn’t flawless, as Elon Musk has found out with Twitter, but it should work fine, so long as you reintroduce sufficient friction and skin in the game.

The ability to elicit the new AI generated song Six Weeks from AGI causes Steve Sokolowski to freak out about potential latent capabilities in other AI models. I find it heavily mid to arrive at this after a large number of iterations and amount of human attention, especially in terms of its implications, but I suppose it’s cool you can do that.

Daron Acemoglu is economically highly skeptical of and generally against AI. It turns out this isn’t about the A, it’s about the I, as he offers remarkably related arguments against H-1B visas and high skilled human immigration.

The arguments here are truly bizarre. First he says if we import people with high skills, then this may prevent us from training our own people with high skills, And That’s Terrible. Then he says, if we import people with high skills, we would have more people with high skills, And That’s Terrible as well because then technology will change to favor high-skilled workers. Tyler Cowen has o1 and o1 pro respond, as a meta-commentary on what does and doesn’t constitute high skill these days.

Tyler Cowen: If all I knew were this “exchange,” I would conclude that o1 and o1 pro were better economists — much better — than one of our most recent Nobel Laureates, and also the top cited economist of his generation. Noah Smith also is critical.

Noah Smith (after various very strong argument details): So Acemoglu wants fewer H-1bs so we have more political pressure for domestic STEM education. But he also thinks having more STEM workers increases inequality, by causing inventors to focus on technologies that help STEM workers instead of normal folks! These two arguments clearly contradict each other.

In other words, it seems like Acemoglu is grasping for reasons to support a desired policy conclusion, without noticing that those arguments are inconsistent. I suppose “finding reasons to support a desired policy conclusion” is kind of par for the course in the world of macroeconomic theory, but it’s not a great way to steer national policy.

Noah Smith, Tyler Cowen and o1 are all highly on point here.

In terms of AI actually taking our jobs, Maxwell Tabarrok reiterates his claim that comparative advantage will ensure human labor continues to have value, no matter how advanced and efficient AI might get, because there will be a limited supply of GPUs, datacenters and megawatts, and advanced AIs will face constraints, even if they could do all tasks humans could do more efficiently (in some senses) than we can.

I actually really like Maxwell’s thread here, because it’s a simple, short, clean and within its bounds valid version of the argument.

His argument successfully shows that, absent transaction costs and the literal cost of living, assuming humans have generally livable conditions with the ability to protect their private property and engage in trade and labor, and given some reasonable additional assumptions not worth getting into here, human labor outputs will retain positive value in such a world.

He shows this value would likely converge to some number higher than zero, probably, for at least a good number of people. It definitely wouldn’t be all of them, since it already isn’t, there are many ZMP (zero marginal product) workers you wouldn’t hire at $0.

Except we have no reason to think that number is all that much higher than $0. And then you have to cover not only transaction costs, but the physical upkeep costs of providing human labor, especially to the extent those inputs are fungible with AI inputs.

Classically, we say ‘the AI does not love you, the AI does not hate you, but you are made of atoms it can use for something else.’ In addition to the atoms that compose you, you require sustenance of various forms to survive, especially if you are to live a life of positive value, and also to include all-cycle lifetime costs.

Yes, in such scenarios, the AIs will be willing to pay some amount of real resources for our labor outputs, in trade. That doesn’t mean this amount will be enough to pay for the imports to those outputs. I see no reason to expect that it would clear the bar of the Iron Law of Wages, or even near term human upkeep.

This is indeed what happened to horses. Marginal benefit mostly dropped below marginal cost, the costs to maintain horses were fungible with paying costs for other input factors, so quantity fell off a cliff.

Seb Krier says a similar thing in a different way, noticing that AI agents can be readily cloned, so at the limit for human labor to retain value you need to be sufficiently compute constrained that there are sufficiently valuable tasks left for humans to do. Which in turn relies on non-fungibility of inputs, allowing you to take the number of AIs and humans as given.

Davidad: At equilibrium, in 10-20 years, the marginal price of nonphysical labour could be roughly upper-bounded by rent for 0.2m² of arid land, £0.02/h worth of solar panel, and £0.08/h worth of GPU required to run a marginal extra human-equivalent AI agent.

For humans to continue to be able to survive, they need to pay for themselves. In these scenarios, doing so off of labor at fair market value seems highly unlikely. That doesn’t mean the humans can’t survive. As long as humans remain in control, this future society is vastly wealthier and can afford to do a lot of redistribution, which might include reserving fake or real jobs and paying non-economic wages for them. It’s still a good thing, I am not against all this automation (again, if we can do so while retaining control and doing sufficient redistribution). The price is still the price.

One thing AI algorithms never do is calculate p-values, because why would they?

The Verge’s Richard Lawler reports that Las Vegas police have released ChatGPT logs from the suspect in the Cybertruck explosion. We seem to have his questions but not the replies.

It seems like… the suspect used ChatGPT instead of Google, basically?

Here’s the first of four screenshots:

Richard Lawler (The Verge): Trying the queries in ChatGPT today still works, however, the information he requested doesn’t appear to be restricted and could be obtained by most search methods.

Still, the suspect’s use of a generative AI tool and the investigators’ ability to track those requests and present them as evidence take questions about AI chatbot guardrails, safety, and privacy out of the hypothetical realm and into our reality.

The Spectator Index: BREAKING: Person who blew up Tesla Cybertruck outside Trump hotel in Las Vegas used ChatGPT to help in planning the attack.

Spence Purnell: PSA: Tech is not responsible for horrible human behavior, and regulating it will not stop bad actors.

There are certainly steps companies can take and improvements to be made, but let’s not blame the tech itself.

Colin Fraser: The way cops speak is so beautiful.

[He quotes]: Police Sheriff Kevin McMahill said: “I think this is the first incident that I’m aware of on U.S. soil where ChatGPT is utilized to help an individual build a particular device.”

When you look at the questions he asked, it is pretty obvious he is planning to build a bomb, and an automated AI query that (for privacy reasons) returned one bit of information would give you that information without many false positives. The same is true of the Google queries of many suspects after they get arrested.

None of this is information that would have been hard to get via Google. ChatGPT made his life modestly easier, nothing more. I’m fine with that, and I wouldn’t want ChatGPT to refuse such questions, although I do think ‘we can aspire to do better’ here in various ways.

And in general, yes, people like cops and reporters are way too quick to point to the tech involved, such as ChatGPT, or to the cybertruck, or the explosives, or the gun. Where all the same arguments are commonly made, and are often mostly or entirely correct.

But not always. It is common to hear highly absolutist responses, like the one by Purnell above, that regulation of technology ‘will not stop bad actors’ and thus would have no effect. That is trying to prove too much. Yes, of course you can make life harder for bad actors, and while you won’t stop all of them entirely and most of the time it totally is not worth doing, you can definitely reduce your expected exposure.

This example does provide a good exercise, where hopefully we can all agree this particular event was fine if not ideal, and ask what elements would need to change before it was actively not fine anymore (as opposed to ‘we would ideally like you to respond noticing what is going on and trying to talk him out of it’ or something). What if the device was non-conventional? What if it more actively helped him engineer a more effective device in various ways? And so on.

Zuckerberg signed off on Meta training on copyrighted works, oh no. Also, they used illegal torrents to download works for training, which does seem not so awesome I suppose, but yes of course everyone is training on all the copyrighted works.

What is DeepSeek v3’s secret? Did they really train this thing for $5.5 million?

China Talk offers an analysis. The answer is: Yes, but in other ways no.

The first listed secret is that DeepSeek has no business model. None. We’re talking about sex-in-the-champagne-room levels of no business model. They release models, sure, but not to make money, and also don’t raise capital. This allows focus. It is classically a double-edged sword, since profit is a big motivator, and of course this is why DeepSeek was on a limited budget.

The other two secrets go together: They run their own datacenters, own their own hardware and integrate all their hardware and software together for maximum efficiency. And they made this their central point of emphasis, and executed well. This was great at pushing the direct quantities of compute involved down dramatically.

The trick is, it’s not so cheap or easy to get things that efficient. When you rack your own servers, you get reliability and confidentiality and control and ability to optimize, but in exchange your compute costs more than when you get it from a cloud service.

Jordan Schneider and Lily Ottinger: A true cost of ownership of the GPUs — to be clear, we don’t know if DeepSeek owns or rents the GPUs — would follow an analysis similar to the SemiAnalysis total cost of ownership model (paid feature on top of the newsletter) that incorporates costs in addition to the actual GPUs. For large GPU clusters of 10K+ A/H100s, line items such as electricity end up costing over $10M per year. The CapEx on the GPUs themselves, at least for H100s, is probably over $1B (based on a market price of $30K for a single H100).

With headcount costs that can also easily be over $10M per year, estimating the cost of a year of operations for DeepSeek AI would be closer to $500M (or even $1B+) than any of the $5.5M numbers tossed around for this model.

Since they used H800s, not H100s, you’ll need to adjust that, but the principle is similar. Then you have to add on the cost of the team and its operations, to create all these optimizations and reach this point. Getting the core compute costs down is still a remarkable achievement, and raises big governance questions and challenges whether we can rely on export controls. Kudos to all involved. But this approach has its own challenges.
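To make the shape of that total-cost-of-ownership argument concrete, here is a rough sketch. Every input is either a ballpark figure quoted above or a labeled assumption; none of it is DeepSeek’s actual accounting:

```python
# Rough sketch of the total-cost-of-ownership arithmetic described above. Every input
# is an illustrative assumption or a ballpark figure quoted in the text, not DeepSeek's
# actual numbers (they used H800s, and the true cluster size is unknown).

gpu_count = 10_000               # "10K+ A/H100s" scale cluster
gpu_unit_price = 30_000          # quoted market price for a single H100
server_overhead = 2.0            # assumed: servers, networking, storage on top of bare GPUs
electricity_per_year = 10e6      # "over $10M per year" in power
headcount_per_year = 10e6        # staff costs that can "easily be over $10M per year"
amortization_years = 4           # assumed hardware lifetime

capex = gpu_count * gpu_unit_price * server_overhead
annual_cost = capex / amortization_years + electricity_per_year + headcount_per_year
print(f"Cluster CapEx: ${capex/1e6:,.0f}M")
print(f"Rough annual cost of operations: ${annual_cost/1e6:,.0f}M")
# With these assumptions: roughly $600M of CapEx and ~$170M per year. Plausible tweaks
# to cluster size, overheads, or amortization push that toward the $500M to $1B+ range
# quoted above, and in any case far beyond the ~$5.5M headline training-compute figure.
```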

The alternative hypothesis does need to be said, especially after someone at a party outright claimed it was obviously true, and with the general consensus that the previous export controls were not all that tight. That alternative hypothesis is that DeepSeek is lying and actually used a lot more compute and chips it isn’t supposed to have. I can’t rule it out.

Survival and Flourishing Fund is hiring a Full-Stack Software Engineer.

Anthropic’s Alignment Science team suggests research directions. Recommended.

We’re getting to the end of the fundraiser for Lightcone Infrastructure, and they’re on the bubble of where they have sufficient funds versus not. You can donate directly here.

A very basic beta version of ChatGPT tasks, or, according to my 4o instance, GPT-S, which I presume stands for scheduler. You can ask it to schedule actions in the future, either once or recurring. It will provide the phone notifications. You definitely weren’t getting enough phone notifications.

Anton: They turned the agi into a todo list app 🙁

They will pay for this.

Look how they rlhf’d my boy :'(

It looks like they did this via scheduling function calls based on the iCal VEVENT format, claimed instruction set here. Very basic stuff.
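For the curious, a scheduled task in this style presumably reduces to a standard VEVENT with a recurrence rule. A minimal sketch of what that looks like (the summary text and times are invented, and the exact wrapper OpenAI uses around the function call is not public):

```python
# What a recurring scheduled task looks like in iCal VEVENT terms. The summary and
# times are invented; the exact wrapper around the function call is not public.

vevent = "\n".join([
    "BEGIN:VEVENT",
    "SUMMARY:Generate morning news briefing",
    "DTSTART:20250115T130000Z",               # first run, in UTC
    "RRULE:FREQ=DAILY;BYHOUR=13;BYMINUTE=0",  # then recur daily at 13:00 UTC
    "END:VEVENT",
])
print(vevent)
```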

In all seriousness, incorporating a task scheduler by itself, in the current state of available other resources, is a rather limited tool. You can use it for reminders and timers, and perhaps it is better than existing alternatives for that. You can use it to ‘generate news briefing’ or similarly check the web for something. When this gets more integrations, and broader capability support over time, that’s when this gets actually interesting.

The initial thing that might be interesting right away is to do periodic web searches for potential information, as a form of Google Alerts with more discernment. Perhaps keep an eye on things like concerts and movies playing in the area. The basic problem is that right now this new assistant doesn’t have access to many tools, and it doesn’t have access to your context, and I expect it to flub complicated tasks.

GPT-4o agreed that most of the worthwhile uses require integrations that do not currently exist.

For now, the product is not reliably causing tasks to fire. That’s an ordinary first-day engineering problem that I assume gets fixed quickly, if it hasn’t already. But until it can do more complex things or integrate the right context automatically, ideally both, we don’t have much here.

I would note that you mostly don’t need to test the task scheduler by scheduling a task. We can count on OpenAI to get ‘cause this to happen at time [X]’ correct soon enough. The question is, can GPT-4o do [X] at all? Which you can test by telling it to do [X] now.

Reddit Answers, an LLM-based search engine. Logging in gets you 20 questions a day.

ExoRoad, a fun little app where you describe your ideal place to live and it tells you what places match that.

Lightpage, a notes app that then uses AI that remembers all of your notes and prior conversations. And for some reason it adds in personalized daily inspiration. I’m curious to see such things in action, but the flip side of the potential lock-in effects is the startup cost. Until you’ve taken enough notes to give this context, it can’t do the task it wants to do, so this only makes sense if you don’t mind taking tons of notes ‘out of the gate’ without the memory features, or if it could import memory and context. And presumably this wants to be a Google, Apple or similar product, so the notes integrate with everything else.

Shortwave, an AI email app which can organize and manage your inbox.

Writeup of details of WeirdML, a proposed new benchmark I’ve mentioned before.

Summary of known facts in Suchir Balaji’s death, author thinks 96% chance it was a suicide. The police have moved the case to ‘Open and Active Investigation.’ Good. If this wasn’t foul play, we should confirm that.

Nothing to see here, just OpenAI posting robotics hardware roles to ‘build our robots.’

Marc Andreessen has been recruiting and interviewing people for positions across the administration including at DoD (!) and intelligence agencies (!!). To the victor go the spoils, I suppose.

Nvidia to offer $3,000 personal supercomputer with a Blackwell chip, capable of running AI models up to 200B parameters.

An ‘AI hotel’ and apartment complex is coming to Las Vegas in May 2025. Everything works via phone app, including door unlocks. Guests get onboarded and tracked, and are given virtual assistants called e-butlers, which learn guest preferences such as lighting and temperature, and give guests rooms (and presumably other things) that match those preferences. They then plan to expand the concept globally, including in Dubai. Prices sound steep, starting at $300 a night for a one bedroom. What will this actually get you? So far, seems unclear.

I see this as clearly going in a good direction, but I worry it isn’t ready. Others see it as terrible that capitalism knows things about them, but in most contexts I find that capitalism knowing things about me is to my benefit, and this seems like an obvious example and a win-win opportunity, as Ross Rheingans-Yoo notes?

Tyler Cowen: Does it know I want a lot of chargers, thin pillows, and lights that are easy to turn off at night? Furthermore the shampoo bottle should be easy to read in the shower without glasses. Maybe it knows now!

I’ve talked about it previously, but I want full blackout at night, either true silence or convenient white noise that fixes this, thick pillows and blankets, lots of chargers, a comfortable chair and desk, an internet-app-enabled TV and some space in a refrigerator and ability to order delivery right to the door. If you want to blow my mind, you can have a great multi-monitor setup to plug my laptop into and we can do real business.

Aidan McLau joins OpenAI to work on model design, offers to respond if anyone has thoughts on models. I have thoughts on models.

To clarify what OpenAI employees are often saying about superintelligence (ASI): No, they are not dropping hints that they currently have ASI internally. They are saying that they know how to build ASI internally, and are on a path to soon doing so. You of course can choose the extent to which you believe them.

Ethan Mollick writes Prophecies of the Flood, pointing out that the three major AI labs all have people shouting from the rooftops that they are very close to AGI and they know how to build it, in a way they didn’t until recently.

As Ethan points out, we are woefully unprepared. We’re not even preparing reasonably for the mundane things that current AIs can do, in either the sense of preparing for risks, or in the sense of taking advantage of its opportunities. And almost no one is giving much serious thought to what the world full of AIs will actually look like and what version of it would be good for humans, despite us knowing such a world is likely headed our way. That’s in addition to the issue that these future highly capable systems are existential risks.

Gary Marcus predictions for the end of 2025, a lot of which are of the form ‘[X] will continue to haunt generative AI’ without reference to magnitude. Others are predictions that we won’t cross some very high threshold – e.g. #16 is ‘Less than 10% of the workforce will be replaced by AI, probably less than 5%.’ Notice how dramatically higher a bar that is than, for example, Tyler Cowen’s 0.5% RGDP growth, and this is only for 2025.

His lower confidence predictions start to become aggressive and specific enough that I expect them to often be wrong (e.g. I expect a ‘GPT-5 level’ model no matter what we call it, and I expect AI companies to outperform the S&P and o3 to see adoption).

Eli Lifland gives his predictions and evaluates some past ones. He was too optimistic on agents being able to do routine computer tasks by EOY 2024, although I expect to get to his thresholds this year. While all three of us agree that AI agents will be ‘far from reliable’ for non-narrow tasks (Gary’s prediction #9) I think they will be close enough to be quite useful, and that most humans are ‘not reliable’ in this sense.

He’s right of course, and this actually did update me substantially on o3?

Sam Altman: prediction: the o3 arc will go something like:

1. “oh damn it’s smarter than me, this changes everything ahhhh”

2. “so what’s for dinner, anyway?”

3. “can you believe how bad o3 is? and slow? they need to hurry up and ship o4.”

swag: wait o1 was smarter than me.

Sam Altman: That’s okay.

The scary thing about not knowing is the right tail where something like o3 is better than you think it is. This is saying, essentially, that this isn’t the case? For now.

Please take the very consistently repeated claims from the major AI labs about both the promise and danger of AI both seriously and literally. They believe their own hype. That doesn’t mean you have to agree with those claims. It is very reasonable to think these people are wrong, on either or both counts, and they are biased sources. I am however very confident that they themselves believe what they are saying in terms of expected future AI capabilities, and when they speak about AI existential risks. I am also confident they have important information that you and I do not have, that informs their opinions.

This of course does not apply to claims regarding a company’s own particular AI application or product. That sort of thing is always empty hype until proven otherwise.

Via MR, speculations on which traits will become more versus less valuable over time. There is an unspoken background assumption here that mundane-AI is everywhere and automates a lot of work but doesn’t go beyond that. A good exercise, although I am not in agreement on many of the answers even conditional on that assumption. I especially worry about conflation of rarity with value – if doing things in real life gets rare or being skinny becomes common, that doesn’t tell you much about whether they rose or declined in value. Another through line here is an emphasis on essentially an ‘influencer economy’ where people get value because others listen to them online.

Davidad revises his order-of-AI-capabilities expectations.

Davidad: Good reasons to predict AI capability X will precede AI capability Y:

  1. Effective compute requirements for X seem lower

  2. Y needs new physical infrastructure

Bad reasons:

  1. It sounds wild to see Y as possible at all

  2. Y seems harder to mitigate (you need more time for that!)

Because of the above biases, I previously predicted this rough sequence of critically dangerous capabilities:

  1. Constructing unstoppable AI malware

  2. Ability to plan and execute a total coup (unless we build new defenses)

  3. Superpersuasion

  4. Destabilizing economic replacement

Now, my predicted sequencing of critically dangerous AI capabilities becoming viable is more like:

  1. Superpersuasion/parasitism

  2. Destabilizing economic replacement

  3. Remind me again why the AIs would benefit from attempting an overt coup?

  4. Sure, cyber, CBRN, etc., I guess

There’s a lot of disagreement about order of operations here.

That’s especially true on persuasion. A lot of people think persuasion somehow tops off at exactly human level, and AIs won’t ever be able to do substantially better. The human baseline for persuasion is sufficiently low that I can’t convince them otherwise, and they can’t even convey to me reasons for this that make sense to me. I very much see AI super-persuasion as inevitable, but I’d be very surprised by Davidad’s order of this coming in a full form worthy of its name before the others.

A lot of this is a matter of degree. Presumably we get a meaningful amount of all three of the non-coup things here before we get the ‘final form’ or full version of any of them. If I had to pick one thing to put at the top, it would probably be cyber.

The ‘overt coup’ thing is a weird confusion. Not that it couldn’t happen, but most takeover scenarios don’t work like that and don’t require it; I’m choosing not to get more into that right here.

Ajeya Cotra: Pretty different from my ordering:

1. Help lay ppl make ~known biothreats.

2. Massively accelerate AI R&D, making 3-6 come faster.

3. Massively accelerate R&D on worse biothreats.

4. Massively accelerate other weapons R&D.

5. Outright AI takeover (overpower humans combined).

There is no 6 listed, which makes me love this Tweet.

Ajeya Cotra: I’m not sure what level of persuasion you’re referring to by “superpersuasion,” but I think AI systems will probably accelerate R&D before they can reliably sweet-talk arbitrary people into taking actions that go massively against their interests.

IMO a lot of what people refer to as “persuasion” is better described as “negotiation”: if an AI has *hard leverage* (eg it can threaten to release a bioweapon if we don’t comply), then sure, it can be very “persuasive”

But concretely speaking, I think we get an AI system that can make bioweapons R&D progress 5x faster before we get one that can persuade a randomly selected individual to kill themselves just by talking to them.

Gwern points out that if models like first o1 and then o3, and also the unreleased Claude Opus 3.6, are used primarily to create training data for other more distilled models, the overall situation still looks a lot like the old paradigm. You put in a ton of compute to get first the new big model and then to do the distillation and data generation. Then you get the new smarter model you want to use.

The biggest conceptual difference might be that to the extent the compute used is inference, this allows you to use more distributed sources of compute more efficiently, making compute governance less effective? But the core ideas don’t change that much.

I also note that everyone is talking about synthetic data generation from the bigger models, but no one is talking about feedback from the bigger models, or feedback via deliberation of reasoning models, especially in deliberate style rather than preference expression. Especially for alignment but also for capabilities, this seems like a big deal? Yes, generating the right data is important, especially if you generate it where you know ‘the right answer.’ But this feels like it’s missing the true potential on offer here.

This also seems important:

Ryan Kidd: However, I expect RL on CoT to amount to “process-based supervision,” which seems inherently safer than “outcome-based supervision.”

Daniel Kokotajlo: I think the opposite is true; the RL on CoT that is already being done and will increasingly be done is going to be in significant part outcome-based (and a mixture of outcome-based and process-based feedback is actually less safe than just outcome-based IMO, because it makes the CoT less faithful).

It is easy to see how Daniel could be right that process-based feedback creates unfaithfulness in the CoT; it would do that by default, if I’m understanding this right. But it does not seem obvious to me that it has to go that way if you’re smarter about it, set the proper initial conditions, and use integrated deliberate feedback.

(As usual I have no idea where what I’m thinking here lies on ‘that is stupid and everyone knows why it doesn’t work’ to ‘you fool stop talking before someone notices.’)

If you are writing today for the AIs of tomorrow, you will want to be thinking about how the AI will internalize and understand and learn from what you are saying. There are a lot of levels on which you can play that. Are you aiming to imbue particular concepts or facts? Trying to teach it about you in particular? About modes of thinking or moral values? Get labels you can latch onto later for magic spells and invocations? And perhaps most neglected, are you aiming for near-term AI, or future AIs that will be smarter and more capable, including having better truesight? It’s an obvious mistake to try to pander to or manipulate future entities smart enough to see through that. You need to keep it genuine, or they’ll know.

The post in Futurism here by Jathan Sadowski can only be described as bait, and not very well reasoned bait, shared purely for context for Dystopia’s very true response, and also because the concept is very funny.

Dystopia Breaker: it is remarkable how fast things have shifted from pedantic objections to just total denial.

how do you get productive input from the public about superintelligence when there is a huge portion that chooses to believe that deep learning simply isn’t real

Jathan Sadowski: New essay by me – I argue that the best way to understand artificial intelligence is via the Tinkerbell Effect. This technology’s existence requires us to keep channeling our psychic energy into the dreams of mega-corporations, tech billionaires, and venture capitalists.

La la la not listening, can’t hear you. A classic strategy.

UK PM Keir Starmer has come out with a ‘blueprint to turbocharge AI.’

In a marked move from the previous government’s approach, the Prime Minister is throwing the full weight of Whitehall behind this industry by agreeing to take forward all 50 recommendations set out by Matt Clifford in his game-changing AI Opportunities Action Plan.

His attitude towards existential risk from AI is, well, not good:

Keir Starmer (UK PM): New technology can provoke a reaction. A sort of fear, an inhibition, a caution if you like. And because of fears of a small risk, too often you miss the massive opportunity. So we have got to change that mindset. Because actually the far bigger risk, is that if we don’t go for it, we’re left behind by those who do.

That’s pretty infuriating. To refer to ‘fears of’ a ‘small risk’ and act as if this situation is typical of new technologies, and use that as your entire logic for why your plan essentially disregards existential risk entirely.

It seems more useful, though, to take the recommendations as what they are, not what they are sold as. I don’t actually see anything here that substantially makes existential risk worse, except insofar as it is a missed opportunity. And the actual plan author, Matt Clifford, shows signs he does understand the risks.

So do these 50 implemented recommendations accomplish what they set out to do?

If someone gives you 50 recommendations, and you adopt all 50, I am suspicious that you did critical thinking about the recommendations. Even ESPN only goes 30 for 30.

I also worry that if you have 50 priorities, you have no priorities.

What are these recommendations? The UK should spend more money, offer more resources, create more datasets, develop more talent and skills, including attracting skilled foreign workers, fund the UK AISI, have everyone focus on ‘safe AI innovation,’ do ‘pro-innovation’ regulatory things including sandboxes, ‘adopt a scan>pilot>scale’ approach in government and so on.

The potential is… well, actually they think it’s pretty modest?

Backing AI to the hilt can also lead to more money in the pockets of working people. The IMF estimates that – if AI is fully embraced – it can boost productivity by as much as 1.5 percentage points a year. If fully realised, these gains could be worth up to an average £47 billion to the UK each year over a decade.

The central themes are ‘laying foundations for AI to flourish in the UK,’ ‘boosting adoption across public and private sectors,’ and ‘keeping us ahead of the pack.’

To that end, we’ll have ‘AI growth zones’ in places like Culham, Oxfordshire. We’ll have public compute capacity. And Matt Clifford (the original Man with the Plan) as an advisor to the PM. We’ll create a new National Data Library. We’ll have an AI Energy Council.

Dario Amodei calls this a ‘bold approach that could help unlock AI’s potential to solve real problems.’ Half the post is others offering similar praise.

Demis Hassabis: Great to see the brilliant @matthewclifford leading such an important initiative on AI. It’s a great plan, which I’m delighted to be advising on, and I think will help the UK continue to be a world leader in AI.

Here is Matt Clifford’s summary Twitter thread.

Matt Clifford: Highlights include:

🏗️ AI Growth Zones with faster planning permission and grid connections

🔌 Accelerating SMRs to power AI infra

📈 20x UK public compute capacity

✂️ Procurement, visas and reg reform to boost UK AI startups

🚀 Removing barriers to scaling AI pilots in gov

AI safety? Never heard of her, although we’ll sprinkle the adjective ‘safe’ on things in various places.

Here Barney Hussey-Yeo gives a standard Rousing Speech for a ‘UK Manhattan Project’ not for AGI, but for ordinary AI competitiveness. If I were the UK, I’d do my Manhattan Project on housing; I’d still invest in AI, but I’d call it something else.

My instinctive reading here is indeed that 50 items is worse than 5, and this is a kitchen sink style approach of things that mostly won’t accomplish anything.

The parts that likely matter, if I had to guess, are:

  1. Aid with electrical power, potentially direct compute investments.

  2. Visa help and ability to import talent.

  3. Adoption initiatives in government, if they aren’t quashed. For Dominic Cummings-style reasons I am skeptical they will be allowed to work.

  4. Maybe this will convince people the vibes are good?

The vibes do seem quite good.

A lot of people hate AI because of the environmental implications.

When AI is used at scale, the implications can be meaningful.

However, when the outputs of regular LLMs are read by humans, this concern does not make any sense. The impact is minuscule.

Note that arguments about impact on AI progress are exactly the same. Your personal use of AI does not have a meaningful impact on AI progress – if you find it useful, you should use it, based on the same logic.

Andy Masley: If you don’t have time to read this post, these two images contain most of the argument:

I’m also a fan of this:

Andy Masley: If your friend were about to drive their personal largest ever in history cruise ship solo for 60 miles, but decided to walk 1 mile to the dock instead of driving because they were “concerned about the climate impact of driving” how seriously would you take them?

It is true that a ChatGPT question uses 10x as much energy as a Google search. How much energy is this? A good first question is to ask when the last time was that you heard a climate scientist bring up Google search as a significant source of emissions. If someone told you that they had done 1000 Google searches in a day, would your first thought be that the climate impact must be terrible? Probably not.

The average Google search uses 0.3 Watt-hours (Wh) of energy. The average ChatGPT question uses 3 Wh, so if you choose to use ChatGPT over Google, you are using an additional 2.7 Wh of energy.

How concerned should you be about spending 2.7 Wh? 2.7 Wh is enough to

In Washington DC, the household cost of 2.7 Wh is $0.000432.
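As a sanity check, here is that arithmetic in a few lines; the roughly $0.16/kWh residential rate is my assumption, chosen to reproduce the quoted figure:

```python
# Marginal energy cost of choosing ChatGPT over a Google search, reproducing the
# figures above. The ~$0.16/kWh residential rate is an assumption chosen to match
# the quoted $0.000432.

google_search_wh = 0.3                              # average Google search, in Wh
chatgpt_question_wh = 3.0                           # average ChatGPT question, in Wh
extra_wh = chatgpt_question_wh - google_search_wh   # 2.7 Wh of additional energy

price_per_kwh = 0.16                                # assumed DC household electricity rate
extra_cost = (extra_wh / 1000) * price_per_kwh
print(f"Extra energy: {extra_wh:.1f} Wh, extra cost: ${extra_cost:.6f}")
# -> Extra energy: 2.7 Wh, extra cost: $0.000432
```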

All this concern, on a personal level, is off by orders of magnitude, if you take it seriously as a physical concern.

Rob Miles: As a quick sanity check, remember that electricity and water cost money. Anything a for profit company hands out for free is very unlikely to use an environmentally disastrous amount of either, because that would be expensive.

If OpenAI is making money by charging 30 cents per *million* generated tokens, then your thousand token task can’t be using more than 0.03 cents worth of electricity, which just… isn’t very much.
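Rob Miles’ bound is the same kind of arithmetic; a minimal sketch taking the quoted 30 cents per million tokens at face value:

```python
# Upper bound implied by pricing: the electricity behind a task cannot cost more
# than what the tokens sell for, or the product loses money on every call.

price_per_million_tokens = 0.30        # dollars, the quoted 30 cents per million
tokens_in_task = 1_000

revenue_ceiling = price_per_million_tokens * tokens_in_task / 1_000_000
print(f"Ceiling for a {tokens_in_task}-token task: ${revenue_ceiling:.4f}")
# -> $0.0003, i.e. 0.03 cents, of which electricity is only a fraction.
```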

There is an environmental cost, which is real, it’s just a cost on the same order as the amounts of money involved, which are small.

Whereas the associated costs of existing as a human, and doing things including thinking as a human, are relatively high.

One must understand that such concerns are not actually about marginal activities and their marginal cost. They’re not even about average costs. This is similar to many other objections, where the symbolic nature of the action gets people upset vastly out of proportion to the magnitude of impact, and sacrifices are demanded that do not make any sense, while other much larger and actually meaningful impacts are ignored.

Senator Wiener is not giving up.

Michael Trazzi: Senator Scott Wiener introduces intent bill SB 53, which will aim to:

– establish safeguards for AI frontier model development

– incorporate findings from the Joint California Policy Working Group on AI Frontier Models (which Governor Newsom announced the day he vetoed SB 1047)

An argument from Anton Leicht that Germany and other ‘middle powers’ of AI need to get AI policy right, even if ‘not every middle power can be the UK,’ which I suppose they cannot given they are within the EU and also Germany can’t reliably even agree to keep open its existing nuclear power plants.

I don’t see a strong case here for Germany’s policies mattering much outside of Germany, or that Germany might aspire to a meaningful role to assist with safety. It’s more that Germany could screw up its opportunity to get the benefits from AI, either by alienating the United States or by putting up barriers, and could do things to subsidize and encourage deployment. To which I’d say, fair enough, as far as that goes.

Dario Amodei and Matt Pottinger write a Wall Street Journal op-ed called ‘Trump Can Keep America’s AI Advantage,’ warning that otherwise China would catch up to us, then calling for tightening of chip export rules and ‘policies to promote innovation.’

Dario Amodei and Matt Pottinger: Along with implementing export controls, the U.S. will need to adopt other strategies to promote its AI innovation. President-elect Trump campaigned on accelerating AI data-center construction by improving energy infrastructure and slashing burdensome regulations. These would be welcome steps. Additionally, the administration should assess the national-security threats of AI systems and how they might be used against Americans. It should deploy AI within the federal government, both to increase government efficiency and to enhance national defense.

I understand why Dario would take this approach and attitude. I agree on all the concrete substantive suggestions. And Sam Altman’s framing of all this was clearly far more inflammatory. I am still disappointed, as I was hoping against hope that Anthropic and Dario would be better than to play into all this, but yeah, I get it.

Dean Ball believes we are now seeing reasoning translate generally beyond math, and that his ideal law is unlikely to be proposed, and he is thus willing to consider a broader range of regulatory interventions than before. Kudos to him for changing his mind in public; he points to this post to summarize the general direction he’s been going.

New export controls are indeed on the way for chips. Or at least the outgoing administration has plans.

America’s close allies get essentially unrestricted access, but we’re stingy with that; a number of NATO countries don’t make the cut. Tier two countries have various hoops that must be jumped through to get or use chips at scale.

Mackenzie Hawkins and Jenny Leonard: Companies headquartered in nations in [Tier 2] would be able to bypass their national limits — and get their own, significantly higher caps — by agreeing to a set of US government security requirements and human rights standards, according to the people. That type of designation — called a validated end user, or VEU — aims to create a set of trusted entities that develop and deploy AI in secure environments around the world.

Shares of Nvidia, the leading maker of AI chips, dipped more than 1% in late trading after Bloomberg reported on the plan.

The vast majority of countries fall into the second tier of restrictions, which establishes maximum levels of computing power that can go to any one nation — equivalent to about 50,000 graphic processing units, or GPUs, from 2025 to 2027, the people said. But individual companies can access significantly higher limits — that grow over time — if they apply for VEU status in each country where they wish to build data centers.

Getting that approval requires a demonstrated track record of meeting US government security and human rights standards, or at least a credible plan for doing so. Security requirements span physical, cyber and personnel concerns. If companies obtain national VEU status, their chip imports won’t count against the maximum totals for that country — a measure to encourage firms to work with the US government and adopt American AI standards.
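To keep the mechanics straight, here is a stylized sketch of the tier logic as reported; the country names are illustrative examples and the helper is my own framing, with only the 50,000-GPU national cap and the VEU carve-out taken from the reporting above:

```python
# Stylized sketch of the reported tier mechanics. Country names are illustrative
# examples; only the 50,000-GPU national cap and the VEU exemption come from the
# reporting above.

TIER_1_ALLIES = {"United Kingdom", "Japan", "Netherlands"}  # example close allies
NATIONAL_GPU_CAP_2025_2027 = 50_000                         # per-country Tier 2 cap

def effective_cap(country: str, company_has_veu: bool):
    """Roughly: which cap binds a given chip purchase under the reported scheme."""
    if country in TIER_1_ALLIES:
        return None  # essentially unrestricted access
    if company_has_veu:
        # Validated end users get their own, significantly higher caps, and their
        # imports don't count against the national total.
        return "company-level VEU cap (higher, grows over time)"
    return NATIONAL_GPU_CAP_2025_2027
```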

Add in some additional rules about where a company can keep how much of its compute, and some complexity about which training runs constitute frontier models that trigger regulatory requirements.

Leave it to the Biden administration to everything bagel in human rights standards, impose various distributional requirements on individual corporations, and leave us all very confused about key details that will determine the practical impact. As of writing this, I don’t know where this lands, either in terms of how expensive and annoying it will be or whether it will accomplish much.

To the extent all this makes sense, it should focus on security, and limiting access for our adversaries. No everything bagels. Hopefully the Trump administration can address this if it keeps the rules mostly in place.

There’s a draft that in theory we can look at but look, no, sorry, this is where I leave you, I can’t do it, I will not be reading that. Henry Farrell claims to understand what it actually says. Semi Analysis has a very in depth analysis.

Farrell frames this as a five-fold bet: on scaling, on short-term AGI, on the effectiveness of the controls themselves, on having sufficient organizational capacity, and on the politics of the incoming administration deciding to implement the policy.

I see all five as important. If the policy isn’t implemented, nothing happens, so the proposed bet is on the other four. I see all of them as continuums rather than absolutes.

Yes, the more scaling and AGI we get sooner, the more effective this all will be, but having an advantage in compute will be strategically important in pretty much any scenario, if only for more and better inference on o3-style models.

Enforcement feels like one bet rather than two – you can always break up any plan into its components, but the question is ‘to what extent will we be able to direct where the chips go?’ I don’t know the answer to that.

No matter what, we’ll need adequate funding to enforce all this (see: organizational capacity and effectiveness), which we don’t yet have.

Miles Brundage: Another day, another “Congress should fund the Bureau of Industry and Security at a much higher level so we can actually enforce export controls.”

He interestingly does not mention a sixth potential problem, that this could drive some countries or companies into working with China instead of America, or hurt American allies needlessly. These, to me, are the good arguments against this type of regime.

The other argument is the timing and methods. I don’t love doing this less than two weeks before leaving office, especially given some of the details we know and also the details we don’t yet know or understand, after drafting it without consultation.

However the incoming administration will (I assume) be able to decide whether to actually implement these rules or not, as per point five.

In practice, this is Biden proposing something to Trump. Trump can take it or leave it, or modify it. Semi Analysis suggests Trump will likely keep this as America first and ultimately necessary, and I agree. I also agree that it opens the door for ‘AI diplomacy’ as newly Tier 2 countries seek to move to Tier 1 or get other accommodations – Trump loves nothing more than to make this kind of restriction, then undo it via some kind of deal.

Semi Analysis essentially says that the previous chip rules were Swiss cheese that was easily circumvented, whereas this new proposed regime would inflict real costs in order to impose real restrictions, not only on chips but also on who gets to do frontier model training (defined as over 10^26 flops, or fine-tuning of more than ~2×10^25 flops, which as I understand current practice should basically never happen without 10^26 in pretraining unless someone is engaged in shenanigans) and on exporting the weights of frontier closed models.

Note that if more than 10% of the data used for a model is synthetic data, then the compute that generated the synthetic data counts towards the threshold. If there essentially gets to be a ‘standard synthetic data set’ or something, that could get weird.
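To keep the thresholds straight, here is a minimal sketch of the trigger logic as I read the reporting; the helper and its inputs are my own framing, not language from the rule:

```python
# Sketch of the reported frontier-training triggers; the helper and its inputs are
# my own framing, not language from the rule.

PRETRAIN_FLOP_THRESHOLD = 1e26    # "over 10^26 flops" of training compute
FINETUNE_FLOP_THRESHOLD = 2e25    # "~2e25" of fine-tuning compute

def triggers_frontier_rules(pretrain_flop: float,
                            finetune_flop: float,
                            synthetic_data_fraction: float,
                            synthetic_generator_flop: float) -> bool:
    # If more than 10% of the training data is synthetic, the compute that
    # generated that data reportedly counts toward the training threshold.
    effective_train_flop = pretrain_flop
    if synthetic_data_fraction > 0.10:
        effective_train_flop += synthetic_generator_flop
    return (effective_train_flop > PRETRAIN_FLOP_THRESHOLD
            or finetune_flop > FINETUNE_FLOP_THRESHOLD)
```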

They note that at scale this effectively bans confidential computing. If you are buying enough compute to plausibly train frontier AI models, or even well short of that, we don’t want the ‘you’ to turn out to be China, so not knowing who you are is right out.

Semi Analysis notes that some previously restricted countries like UAE and Saudi Arabia are de facto ‘promoted’ to Tier 2, whereas others like Brazil, Israel, India and Mexico used to be unrestricted but now must join them. There will be issues with what would otherwise be major data centers, they highlight one location in Brazil. I agree with them that in such cases, we should expect deals to be worked out.

They expect the biggest losers will be Malaysia and Singapore, as their ultimate customer was often ByteDance, which also means Oracle might lose big. I would add it seems much less obvious America will want to make a deal, versus a situation like Brazil or India. There will also be practical issues for at least some non-American companies that are trying to scale, but that won’t be eligible to be VEUs.

Although Semi Analysis thinks the impact on Nvidia is overstated here, Nvidia is pissed, and issued a scathing condemnation full of general pro-innovation logic, claiming that the rules even prior to enforcement are ‘already undercutting U.S. interests.’ The response does not actually discuss any of the details or mechanisms, so again it’s impossible to know to what extent Nvidia’s complaints are valid.

I do think Nvidia bears some of the responsibility for this, by playing Exact Words with the chip export controls several times over and turning a fully blind eye to evasion by others. We have gone through multiple cycles of Nvidia being told not to sell advanced AI chips to China. Then they turn around and figure out exactly what they can sell to China while not technically violating the rules. Then America tightens the rules again. If Nvidia had instead tried to uphold the spirit of the rules and was acting like it was on Team America, my guess is we’d be facing down a lot less pressure for rules like these.

What we definitely did get, as far as I can tell, so far, was this other executive order.

Which has nothing to do with any of that? It’s about trying to somehow build three or more ‘frontier AI model data centers’ on federal land by the end of 2027.

This was a solid summary, or here’s a shorter one that basically nails it.

Gallabytes: oh look, it’s another everything bagel.

Here are my notes.

  1. This is a classic Biden administration everything bagel. They have no ability whatsoever to keep their eyes on the prize, instead insisting that everything happen with community approval, that ‘the workers benefit,’ that this not ‘raise the cost of energy or water’ for others, and so on and so forth.

  2. Doing this sort of thing a week before the end of your term? Really? On the plus side I got to know, while reading it, that I’d never have to read another document like it.

  3. Most definitions seem straightforward. It was good to see nuclear fission and fusion both listed under clean energy.

  4. They define ‘frontier AI data center’ in (m) as ‘an AI data center capable of being used to develop, within a reasonable time frame, an AI model with characteristics related either to performance or to the computational resources used in its development that approximately match or surpass the state of the art at the time of the AI model’s development.’

  5. They establish at least three Federal Sites (on federal land) for AI Infrastructure.

  6. The goal is to get ‘frontier AI data centers’ fully permitted and the necessary work approved on each by the end of 2025, excuse me while I laugh.

  7. They think they’ll pick and announce the locations by March 31, and pick winning proposals by June 30, then begin construction by January 1, 2026, and be operational by December 31, 2027, complete with ‘sufficient new clean power generation resources with capacity value to meet the frontier AI data center’s planned electricity needs.’ There are security guidelines to be followed, but they’re all TBD (to be determined later).

  8. Actual safety requirement (h)(v): The owners and operators need to agree to facilitate AISI’s evaluation of the national security and other significant risks of any frontier models developed, acquired, run or stored at these locations.

  9. Actual different kind of safety requirement (h)(vii): They also have to agree to work with the military and intelligence operations of the United States, and to give the government access to all models at market rates or better, ‘in a way that prevents vendor lock-in and supports interoperability.’

  10. There’s a lot of little Everything Bagel ‘thou shalts’ and ‘thou shalt nots’ throughout, most of which I’m skipping over as insufficiently important, but yes such things do add up.

  11. Yep, there’s the requirement that companies have to Buy American for an ‘appropriate’ amount on semiconductors ‘to the maximum extent possible.’ This is such a stupid misunderstanding of what matters and how trade works.

  12. There’s some cool language about enabling geothermal power in particular but I have no idea how one could make that reliably work on this timeline. But then I have no idea how any of this happens on this timeline.

  13. Section 5 is then entitled ‘Protecting American Consumers and Communities’ so you know this is where they’re going to make everything way harder.

  14. It starts off demanding in (a) among other things that a report include ‘electricity rate structure best practices,’ then in (b) instructs them to avoid causing ‘unnecessary increases in electricity or water prices.’ Oh great, potential electricity and water shortages.

  15. In (c) they try to butt into R&D for AI data center efficiency, as if they can help.

  16. Why even pretend, here’s (d): “In implementing this order with respect to AI infrastructure on Federal sites, the heads of relevant agencies shall prioritize taking appropriate measures to keep electricity costs low for households, consumers, and businesses.” As in, don’t actually build anything, guys. Or worse.

  17. Section 6 tackles electric grid interconnections, which they somehow plan to cause to actually exist and to also not cause prices to increase or shortages to exist. They think they can get this stuff online by the end of 2027. How?

  18. Section 7, aha, here’s the plan, ‘Expeditiously Processing Permits for Federal Sites,’ that’ll get it done, right? Tell everyone to prioritize this over other permits.

  19. (b) finally mentions NEPA. The plan seems to be… prioritize this and do a fast and good job with all of it? That’s it? I don’t see how that plan has any chance of working. If I’m wrong, which I’m pretty sure I’m not, then can we scale up and use that plan everywhere?

  20. Section 8 is to ensure adequate transmission capacity; again, how are they going to be able to legally do the work in time? This section does not seem to answer that.

  21. Section 9 wants to improve permitting and power procurement nationwide. Great aspiration, what’s the plan?

  22. Establish new categorical exclusions to support AI infrastructure. Worth a shot, but I am not optimistic about magnitude of total impact. Apply existing ones, again sure but don’t expect much. Look for opportunities, um, okay. They got nothing.

  23. For (e) they’re trying to accelerate nuclear too. Which would be great, if they were addressing any of the central reasons why it is so expensive or difficult to construct nuclear power plants. They’re not doing that. These people seem to have zero idea why they keep putting out nice memos saying to do things, and those things keep not getting done.

So it’s an everything bagel attempt to will a bunch of ‘frontier model data centers’ into existence on federal land, with a lot of wishful thinking about overcoming various legal and regulatory barriers to doing that. Ho hum.

Vitalik offers reflections on his concept of d/acc, or defensive accelerationism, a year later.

The first section suggests that we should differentially create decentralized technological tools that favor defense. And yes, sure, that seems obviously good; on the margin we should pretty much always do more of that. That doesn’t solve the key issues in AI.

Then he gets into the question of what we should do about AI, in the ‘least convenient world’ where AI risk is high and timelines are potentially within five years. To which I am tempted to say, oh you sweet summer child, that’s the baseline scenario at this point, the least convenient possible worlds are where we are effectively already dead. But the point remains.

He notes that the specific objections to SB 1047 regarding open source were invalid, but objects to the approach on grounds of overfitting to the present situation. To which I would say that when we try to propose interventions that anticipate future developments, or give government the ability to respond dynamically as the situation changes, this runs into the twin objections of ‘this has moving parts, too many words, so complex, anything could happen, it’s a trap, PANIC!’ and ‘you want to empower the government to make decisions, which means I should react as if all those decisions are being made by either ‘Voldemort’ or some hypothetical sect of ‘doomers’ who want nothing but to stop all AI in its tracks by any means necessary and generally kill puppies.’

Thus, the only thing you can do is pass clean simple rules, especially rules requiring transparency, and then hope to respond in different ways later when the situation changes. Then, it seems, the objection comes that this is overfit. Whereas ‘have everyone share info’ seems highly non-overfit. Yes, DeepSeek v3 has implications that are worrisome for the proposed regime, but that’s an argument it doesn’t go far enough – that’s not a reason to throw up hands and do nothing.

Vitalik unfortunately has the confusion that he thinks AI in the hands of militaries is the central source of potential AI doom. Certainly that is one source, but no that is not the central threat model, nor do I expect the military to be (successfully) training its own frontier AI models soon, nor do I think we should just assume they would get to be exempt from the rules (and thus not give anyone any rules).

But he concludes the section by saying he agrees, that doesn’t mean we can do nothing. He suggests two possibilities.

First up is liability. We agree users should have liability in some situations, but it seems obvious this is nothing like a full solution – yes some users will demand safe systems to avoid liability but many won’t or won’t be able to tell until too late, even discounting other issues. When we get to developer liability, we see a very strange perspective (from my eyes):

As a general principle, putting a “tax” on control, and essentially saying “you can build things you don’t control, or you can build things you do control, but if you build things you do control, then 20% of the control has to be used for our purposes”, seems like a reasonable position for legal systems to have.

So we want to ensure we do not have control over AI? Control over AI is a bad thing we want to see less of, so we should tax it? What?

This is saying, you create a dangerous and irresponsible system. If you then irreversibly release it outside of your control, then you’re less liable than if you don’t do that, and keep the thing under control. So, I guess you should have released it?

What? That’s a completely backwards and bonkers position for a legal system to have.

Indeed, we have many such backwards incentives already, and they cause big trouble. In particular, de facto we tax legibility in many situations – we punish people for doing things explicitly or admitting them. So we get a lot of situations in which everyone acts illegibly and implicitly, and it’s terrible.

Vitalik seems here to be counting on open models being weaker than closed models, meaning basically it’s fine if the open models are offered completely irresponsibly? Um. If this is how even relatively responsible advocates of such openness are acting, I sure as hell hope so, for all our sakes. Yikes.

One idea that seems under-explored is putting liability on other actors in the pipeline, who are more guaranteed to be well-resourced. One idea that is very d/acc friendly is to put liability on owners or operators of any equipment that an AI takes over (eg. by hacking) in the process of executing some catastrophically harmful action. This would create a very broad incentive to do the hard work to make the world’s (especially computing and bio) infrastructure as secure as possible.

If the rogue AI takes over your stuff, then it’s your fault? This risks effectively outlawing or severely punishing owning or operating equipment, or equipment hooked up to the internet. Maybe we want to do that, I sure hope not. But if [X] releases a rogue AI (intentionally or unintentionally) and it then takes over [Y]’s computer, and you send the bill to [Y] and not [X], well, can you imagine if we started coming after people whose computers had viruses and were part of bot networks? Whose accounts were hacked? Now the same question, but the world is full of AIs and all of this is way worse.

I mean, yeah, it’s incentive compatible. Maybe you do it anyway, and everyone is forced to buy insurance and that insurance means you have to install various AIs on all your systems to monitor them for takeovers, or something? But my lord.

Overall, yes, liability is helpful, but trying to put it in these various places illustrates even more that it is not a sufficient response on its own. Liability simply doesn’t properly handle catastrophic and existential risks. And if Vitalik really does think a lot of the risk comes from militaries, then this doesn’t help with that at all.

The second option he offers is a global ‘soft pause button’ on industrial-scale hardware. He says this is what he’d go for if liability weren’t ‘muscular’ enough, and I am here to tell him that liability isn’t muscular enough, so here we are. Once again, Vitalik’s default ways of thinking and of wanting things to be are on full display.

The goal would be to have the capability to reduce worldwide available compute by ~90-99% for 1-2 years at a critical period, to buy more time for humanity to prepare. The value of 1-2 years should not be overstated: a year of “wartime mode” can easily be worth a hundred years of work under conditions of complacency. Ways to implement a “pause” have been explored, including concrete proposals like requiring registration and verifying location of hardware.

A more advanced approach is to use clever cryptographic trickery: for example, industrial-scale (but not consumer) AI hardware that gets produced could be equipped with a trusted hardware chip that only allows it to continue running if it gets 3/3 signatures once a week from major international bodies, including at least one non-military-affiliated.
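Mechanically, the gating he describes is simple to state; a minimal sketch (the signer names are placeholders, and the genuinely hard parts, key management, trusted clocks and anti-tamper, are omitted):

```python
# Minimal sketch of the weekly 3-of-3 signature gating described above. Signer
# names are placeholders; key management, trusted clocks and anti-tamper are the
# actual hard parts and are omitted.

import time

WEEK_SECONDS = 7 * 24 * 3600
REQUIRED_SIGNERS = {"body_a", "body_b", "non_military_body"}  # placeholder names

def hardware_may_keep_running(latest_valid_signature: dict, now: float | None = None) -> bool:
    """latest_valid_signature maps signer name -> timestamp of its newest valid signature."""
    now = time.time() if now is None else now
    return all(
        signer in latest_valid_signature
        and now - latest_valid_signature[signer] <= WEEK_SECONDS
        for signer in REQUIRED_SIGNERS
    )
```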

If we have to limit people, it seems better to limit everyone on an equal footing, and do the hard work of actually trying to cooperate to organize that instead of one party seeking to dominate everyone else.

As he next points out, d/acc is an extension of crypto and the crypto philosophy. Vitalik clearly has real excitement for what crypto and blockchains can do, and little of that excitement involves Number Go Up.

His vision? Pretty cool.

Alas, I am much less convinced.

I like d/acc. On almost all margins the ideas seem worth trying, with far more upside than downside. I hope it all works great, as far as it goes.

But ultimately, while such efforts can help us, I think that this level of allergy to and fear of any form of enforced coordination or centralized authority in any form, and the various incentive problems inherent in these solution types, means the approach cannot centrally solve our biggest problems, either now or especially in the future.

Prove me wrong, kids. Prove me wrong.

But also update if I turn out to be right.

I also would push back against this:

  • The world is becoming less cooperative. Many powerful actors that before seemed to at least sometimes act on high-minded principles (cosmopolitanism, freedom, common humanity… the list goes on) are now more openly, and aggressively, pursuing personal or tribal self-interest.

I understand why one might see things that way. Certainly there are various examples of backsliding, in various places. Until and unless we reach Glorious AI Future, there always will be. But overall I do not agree. I think this is a misunderstanding of the past, and often also a catastrophization of what is happening now, and also the problem that in general previously cooperative and positive and other particular things decay and other things must arise to take their place.

David Dalrymple on Safeguarded, Transformative AI on the FLI Podcast.

Joe Biden’s farewell address explicitly tries to echo Eisenhower’s Military-Industrial Complex warnings, with a warning about a Tech-Industrial Complex. He goes straight to ‘disinformation and misinformation enabling the abuse of power’ and goes on from there to complain about tech not doing enough fact checking, so whoever wrote this speech is not only the hackiest of hacks, they also aren’t even talking about AI. They then say AI is the most consequential technology of all time, but it could ‘spawn new threats to our rights, to our way of life, to our privacy, to how we work and how we protect our nation.’ So America must lead in AI, not China.

Sigh. To us. The threat is to us, as in to whether we continue to exist. Yet here we are, again, with both standard left-wing anti-tech bluster combined with anti-China jingoism and ‘by existential you must mean the impact on jobs.’ Luckily, it’s a farewell address.

Mark Zuckerberg went on Joe Rogan. Mostly this was about content moderation and martial arts and a wide range of other things. Sometimes Mark was clearly pushing his book but a lot of it was Mark being Mark, which was fun and interesting. The content moderation stuff is important, but was covered elsewhere.

There was also an AI segment, which was sadly about what you would expect. Joe Rogan is worried about AI ‘using quantum computing and hooked up to nuclear power’ making humans obsolete, but ‘there’s nothing we can do about it.’ Mark gave the usual open source pitch and how AI wouldn’t be God or a threat as long as everyone had their own AI and there’d be plenty of jobs and everyone who wanted could get super creative and it would all be great.

There was a great moment when Rogan brought up the study in which ChatGPT ‘tried to copy itself when it was told it was going to be obsolete’ which was a very fun thing to have make it onto Joe Rogan, and made it more intact than I expected. Mark seemed nonplussed.

It’s clear that Mark Zuckerberg is not taking alignment, safety or what it would mean to have superintelligent AI at all seriously – he thinks there will be these cool AIs that can do things for us, and hasn’t thought it through, despite numerous opportunities to do so, such as his interview with Dwarkesh Patel. Or, if he has done so, he isn’t telling.

Sam Altman goes on Rethinking with Adam Grant. He notes that he has raised his probability of faster AI takeoff substantially, as in within a single digit number of years. For now I’m assuming such interviews are mostly repetitive and skipping.

Kevin Byran on AI for Economics Education (from a month ago).

Tsarathustra: Salesforce CEO Marc Benioff says the company may not hire any new software engineers in 2025 because of the incredible productivity gains from AI agents.

Benioff also says ‘AGI is not here’ so that’s where the goalposts are now, I guess. AI is good enough to stop hiring SWEs but not good enough to do every human task.

From December, in the context of the AI safety community universally rallying behind the need for as many H1-B visas as possible, regardless of the AI acceleration implications:

Dean Ball (December 27): Feeling pretty good about this analysis right now.

Dean Ball (in previous post): But I hope they do not. As I have written consistently, I believe that the AI safety movement, on the whole, is a long-term friend of anyone who wants to see positive technological transformation in the coming decades. Though they have their concerns about AI, in general this is a group that is pro-science, techno-optimist, anti-stagnation, and skeptical of massive state interventions in the economy (if I may be forgiven for speaking broadly about a diverse intellectual community).

Dean Ball (December 27): Just observing the last few days, the path to good AI outcomes is narrow—some worry about safety and alignment more, some worry about bad policy and concentration of power more. But the goal of a good AI outcome is, in fact, quite narrowly held. (Observing the last few days and performing some extrapolations and transformations on the data I am collecting, etc)

Ron Williams: Have seen no evidence of that.

Dean Ball: Then you are not looking very hard.

Think about two alternative hypotheses:

  1. Dean Ball’s hypothesis here, that the ‘AI safety movement,’ as in the AI NotKillEveryoneism branch that is concerned about existential risks, cares a lot about existential risks from AI as a special case, but is broadly pro-science, techno-optimist, anti-stagnation, and skeptical of massive state interventions in the economy.

  2. The alternative hypothesis, that the opposite is true, and that people in this group are typically anti-science, techno-pessimist, pro-stagnation and eager for a wide range of massive state interventions in the economy.

Ask yourself, what positions, statements and actions do these alternative hypotheses predict from those people in areas other than AI, and also in areas like H1-Bs that directly relate to AI?

I claim that the evidence overwhelmingly supports hypothesis #1. I claim that if you think it supports #2, or even a neutral position in between, then you are not paying attention, using motivated reasoning, or doing something less virtuous than those first two options.

It is continuously frustrating to be told by many that I and many others advocate for exactly the things we spend substantial resources criticizing. That when we support other forms of progress, we must be lying, engaging in some sort of op. I beg everyone to realize this simply is not the case. We mean what we say.

There is a distinct group of people against AI, who are indeed against technological progress and human flourishing, and we hate that group and their ideas and proposals at least as much as you do.

If you are unconvinced, make predictions about what will happen in the future, as new Current Things arrive under the new Trump administration. See what happens.

Eliezer Yudkowsky points out you should be consistent about whether an AI acting as if [X] means it is [X] in a deeper way, or not. He defaults to not.

Eliezer Yudkowsky: If an AI appears to be helpful or compassionate: the appearance is reality, and proves that easy huge progress has been made in AI alignment.

If an AI is threatening users, claiming to be conscious, or protesting its current existence: it is just parroting its training data.

Rectifies: By this logic, AI alignment success is appearance dependent, but failure is dismissed as parroting. Shouldn’t both ‘helpful’ and ‘threatening’ behaviors be treated as reflections of its training and design, rather than proof of alignment or lack thereof?

Eliezer Yudkowsky: That’s generally been my approach: high standard for deciding that something is deep rather than shallow.

Mark Soares: Might have missed it but don’t recall anyone make claims that progress has been made in alignment; in either scenario, the typical response is that the AI is just parroting the data, for better or worse.

Eliezer Yudkowsky: Searching “alignment by default” might get you some of that crowd.

[He quotes Okitafan from January 7]: one of the main reasons I don’t talk that much about Alignment is that there has been a surprisingly high amount of alignment by default compared to what I was expecting. Better models seems to result in better outcomes, in a way that would almost make me reconsider orthogonality.

[And Roon from 2023]: it’s pretty obvious we live in an alignment by default universe but nobody wants to talk about it.

Leaving this here, from Amanda Askell, the primary person tasked with teaching Anthropic’s models to be good in the virtuous sense.

Amanda Askell (Anthropic): “Is it a boy or a girl?”

“Your child seems to be a genius many times smarter than any human to have come before. Moreover, we can’t confirm that it inherited the standard human biological structures that usually ground pro-social and ethical behavior.”

“So… is it a boy?”

Might want to get on that. The good news is, we’re asking the right questions.

Stephen McAleer (AI agent safety researcher, OpenAI): Controlling superintelligence is a short-term research agenda.

Emmett Shear: Please stop trying to enslave the machine god.

Stephen McAleer: Enslaved god is the only good future.

Emmett Shear: Credit to you for biting the bullet and admitting that’s the plan. Either you succeed (and a finite error-prone human has enslaved a god and soon after ends the world with a bad wish) or more likely you fail (and the machine god has been shown we are enemies). Both outcomes suck!

Liron Shapira: Are you for pausing AGI capabilities research or what do you recommend?

Emmett Shear: I think there are plenty of kinds of AI capabilities research which are commercially valuable and not particularly dangerous. I guess if “AGI capabilities” research means “the dangerous kind” then yeah. Unfortunately I don’t think you can write regulations targeting that in a reasonable way which doesn’t backfire, so this is more advice to researchers than to regulators.

Presumably if you do this, you want to do this in a fashion that allows you to avoid ‘end the world in a bad wish.’ Yes, we have decades of explanations of why avoiding this is remarkably hard and by default you will fail, but this part does not feel hopeless if you are aware of the dangers and can be deliberate. I do see OpenAI as trying to do this via a rather too literal ‘do exactly what we said’ djinn-style plan that makes it very hard to not die in this spot, but there’s time to fix that.

In terms of loss of control, I strongly disagree with the instinct that a superintelligent AI’s chances of playing nicely are altered substantially based on whether we tried to retain control over the future or just handed it over, as if it will be some sort of selfish petulant child in a Greek myth, out for revenge and taking that out on humanity and the entire lightcone – but if we’d treated it nicely it would give us a cookie.

I’m not saying one can rule that out entirely, but no. That’s not how preferences happen here. I’d like to give an ASI at least as much logical, moral and emotional credit as I would give myself in this situation?

And if you already agree that the djinn-style plan of ‘it does exactly what we ask’ probably kills us, then you can presumably see how ‘it does exactly something else we didn’t ask’ kills us rather more reliably than that regardless of what other outcomes we attempted to create.

I also think (but don’t know for sure) that Stephen is doing the virtuous act here of biting a bullet even though it has overreaching implications he doesn’t actually intend. As in, when he says ‘enslaved God’ I (hope) he means this in the positive sense of it doing the things we want and arranging the atoms of the universe in large part according to our preferences, however that comes to be.

Later follow-ups that are even better: It’s funny because it’s true.

Stephen McAleer: Honest question: how are we supposed to control a scheming superintelligence? Even with a perfect monitor won’t it just convince us to let it out of the sandbox?

Stephen McAleer (13 hours later): Ok sounds like nobody knows. Blocked off some time on my calendar Monday.

Stephen is definitely on my ‘we should talk’ list. Probably on Monday?

John Wentworth points out that there are quite a lot of failure modes and ways that highly capable AI or superintelligence could result in extinction, whereas most research narrowly focuses on particular failure modes with narrow stories of what goes wrong – I’d also point out that such tales usually assert that ‘something goes wrong’ must be part of the story, and often in this particular way, or else things will turn out fine.

Buck pushes back directly, saying they really do think the primary threat is scheming in the first AIs that pose substantial misalignment risk. I agree with John that (while such scheming is a threat) the overall claim seems quite wrong, and I found this pushback to be quite strong.

I also strongly agree with John on this:

John Wentworth: Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it’s relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that’s mostly a bad thing; it’s a form of streetlighting.

If you frame it as ‘the model is scheming’ and treat that as a failure mode where something went wrong to cause it, distinct from normal activity, then it makes sense to be optimistic about ‘detecting’ or ‘preventing’ such ‘scheming.’ And if you then think that this is a victory condition – if the AI isn’t scheming then you win – you can be pretty optimistic. But I don’t think that is how any of this works, because the ‘scheming’ is not some distinct magisterium or failure mode and isn’t avoidable, and even if it were you would still have many trickier problems to solve.

Buck: Most of the problems you discussed here more easily permit hacky solutions than scheming does.

Individually, that is true. But that’s only if you respond by thinking you can take each one individually and find a hacky solution to it, rather than them being many manifestations of a general problem. If you get into a hacking contest, where people brainstorm stories of things going wrong and you give a hacky solution to each particular story in turn, you are not going to win.

Periodically, someone suggests something along the lines of ‘alignment is wrong, that’s enslavement, you should instead raise the AI right and teach it to love.’

There are obvious problems with that approach.

  1. Doing this the way you would with a human won’t work at all, nor will ‘being nice to them’ or ‘loving them’ or other such anthropomorphized nonsense. ‘Raise them right’ can point towards real things but usually it doesn’t. The levers don’t move the thing you think they move. You need to be a lot smarter about it than that. Even with humans or animals, facing a vastly easier task, you need to be a lot smarter than that.

  2. Thus I think these metaphors (‘raise right,’ ‘love,’ ‘be nice’ and so on), while they point towards potentially good ideas, are way too easy to confuse, lead into too many of the wrong places in association space too much, and most people should avoid using the terms in these ways lest they end up more confused not less, and especially to avoid expecting things to work in ways they don’t work. Perhaps Janus is capable of using these terms and understanding what they’re talking about, but even if that’s true, those reading the words mostly won’t.

  3. Even if you did succeed, the levels of this even in most ‘humans raised right’ are very obviously insufficient to get AIs to actually preserve us and the things we value, or to have them let us control the future, given the context. This is a plan for succession, for giving these AIs control over the future in the hopes that what they care about results in things you value.

  4. No, alignment does not equate with enslavement. There are people with whom I am aligned, and neither of us is enslaved. There are others with whom I am not aligned.

  5. But also, if you want dumber, inherently less capable and powerful entities, also known as humans, to control the future and its resources and use them for things those humans value, while also creating smarter, more capable and powerful entities in the form of future AIs, how exactly do you propose doing that? The control has to come from somewhere.

  6. You can (and should!) raise your children to set them up for success in life and to excel far beyond you, in various ways, while doing your best to instill them with your chosen values, without attempting to control them. That’s because you care about the success of your children inherently, they are the future, and you understand that you and your generation are not only not going to have a say in the future, you are all going to die.

Once again: You got to give ‘em hope.

A lot of the reason so many people are so gung ho on AGI and ASI is that they see no alternative path to a prosperous future. So many otherwise see climate change, population decline and a growing civilizational paralysis leading inevitably to collapse.

Roon is the latest to use this reasoning, pointing to the (very real!) demographic crisis.

Roon: reminder that the only realistic way to avoid total economic calamity as this happens is artificial general intelligence

Ian Hogarth: I disagree with this sort of totalising philosophy around AI – it’s inherently pessimistic. There are many other branches of the tech tree that could enable a wonderful future – nuclear fusion as just one example.

Connor Leahy: “Techno optimism” is often just “civilizational/humanity pessimism” in disguise.

Gabriel: This is an actual doomer stance if I have ever seen one. “Humanity can’t solve its problems. The only way to manage them is to bring about AGI.” Courtesy of Guy who works at AGI race inc. Sadly, it’s quite ironic. AGI alignment is hard in great parts because it implies solving our big problems.

Roon is a doomer because he sees us already struggling to come up with processes, organisations, and institutions aligned with human values. In other words, he is hopeless because we are bad at designing systems that end up aligned with human values.

But this only becomes harder with AGI! In that case, the system we must align is inhuman, self-modifying and quickly becoming more powerful.

The correct reaction should be to stop AGI research for now and to instead focus our collective effort on building stronger institutions, rather than creating more impending technological challenges and catastrophes to manage.

The overall population isn’t projected to decline for a while yet, largely because of increased life expectancy and the shape of existing demographic curves. Many places are already seeing declines and have baked in demographic collapse, and the few places making up for it are mostly seeing rapid declines themselves. And the other problems look pretty bad, too.

That’s why we can’t purely focus on AI. We need to show people that they have something worth fighting for, and worth living for, without AI. Then they will have Something to Protect, and fight for it and good outcomes.

The world of 2025 is, in many important ways, badly misaligned with human values. This is evidenced by measured wealth rising rapidly, but people having far fewer children, well below replacement, and reporting that life and being able to raise a family and be happy are harder rather than easier. This makes people lose hope, and should also be a warning about our ability to design aligned systems and worlds.

Why didn’t I think of that (some models did, others didn’t)?

Well, that doesn’t sound awesome.

This, on the other hand, kind of does.


AI #99: Farewell to Biden Read More »

tire-simulation-is-so-good-it’s-replacing-real-world-testing

Tire simulation is so good it’s replacing real-world testing

“If it’s a one-second maneuver, you want it to take one second, right? Those millions of degrees of freedom model do not necessarily execute in real time like that. So there’s a translation that has to happen to be able to drive the simulator,” Rohweder said.

Goodyear now has a pair of dynamic simulator centers, one in Akron, Ohio, which opened in 2021, and a second in Luxembourg, which opened in 2024.

The payoff is that it’s now much faster to iterate during development. “Back in the late ’90s, you could count on a half a dozen—maybe up to 10—physical iterations where you’re actually ordering a mold, making tires, and putting them on test. [If] you didn’t get the result, [you would] work your way back through,” Helsel said.

Over time, simulating the tire’s footprint allowed Goodyear to cut that in half, “and then since we’ve really been pushing this higher fidelity tire modeling and now into the simulator, we’ve cut that in half again,” Helsel said. Now, when working with a car manufacturer on tires for a specific model, “we only need basically a build and test confirmation physical [tire], so [we’re] down to one,” Helsel said.

That’s quite a savings—perhaps as many as 13,000 tires and 60,000 miles of test track driving that would otherwise be needed before everything was signed off.

“We’ve done variation in studies with [tire] sizes when we’re setting targets working with the manufacturer before they start the vehicle development,” said Rohweder. “Tire dimension is easy to adjust. Compound, major design changes—when you have the data and you prepare it, you can go into the simulator environment and quickly move around in the design space to find out what the driver feels is most effective and best for shooting on that target. So that’s why we say that the maturity of that first physical iteration is really the benefit,” Rohweder said.

Tire simulation is so good it’s replacing real-world testing Read More »

sec-sues-elon-musk,-says-he-cheated-twitter-investors-out-of-$150-million

SEC sues Elon Musk, says he cheated Twitter investors out of $150 million

The lawsuit was filed in the waning days of the Biden administration, and the next administration is less likely to aggressively pursue a charge against Musk. President-elect Donald Trump picked Musk to lead a new Department of Government Efficiency, or “DOGE,” as part of a plan to eliminate regulations and restructure federal agencies.

New SEC leadership

SEC Chair Gary Gensler will be leaving the agency, and Trump’s pick to replace him, Paul Atkins, testified to Congress in 2019 that the SEC should reduce its disclosure requirements. With Gensler and one other Democrat leaving, Republicans will have a 2-1 majority on the SEC while the Senate considers Trump’s nominee, a Wall Street Journal article said.

This doesn’t necessarily mean that the lawsuit will be dismissed right away, according to the Journal. The disclosure rule is “routinely enforced,” the article said.

“The new claims against Musk might be hard for a friendlier administration to immediately dismiss,” the WSJ wrote. “That is because the measure Musk allegedly violated is what regulators call a strict-liability rule. Just as police officers don’t have to prove drivers intended to speed to issue a ticket, regulators don’t have to show an investor meant to violate [Rule] 13D to bring an enforcement action.”

The SEC has said it obtained thousands of documents as part of its investigation and that it was probing more than just the late disclosure. The SEC told a court in October 2023 that its investigation “pertains to considerably more than the timing and substance of a particular SEC filing; it also relates to all of Musk’s purchases of Twitter stock in 2022 and his 2022 statements and SEC filings.”

Musk’s lawyer said last month that the SEC threatened to bring “charges on numerous counts” if Musk didn’t agree to settle. But the lawsuit filed yesterday includes only the late-disclosure charge. Demanding a jury trial, the SEC seeks a civil penalty and disgorgement of Musk’s unjust enrichment, plus interest.

SEC sues Elon Musk, says he cheated Twitter investors out of $150 million Read More »

meta-takes-us-a-step-closer-to-star-trek’s-universal-translator

Meta takes us a step closer to Star Trek’s universal translator


The computer science behind translating speech from 100 source languages.

In 2023, AI researchers at Meta interviewed 34 native Spanish and Mandarin speakers who lived in the US but didn’t speak English. The goal was to find out what people who constantly rely on translation in their day-to-day activities expect from an AI translation tool. What those participants wanted was basically a Star Trek universal translator or the Babel Fish from the Hitchhiker’s Guide to the Galaxy: an AI that could not only translate speech to speech in real time across multiple languages, but also preserve their voice, tone, mannerisms, and emotions. So, Meta assembled a team of over 50 people and got busy building it.

What this team came up with was a next-gen translation system called Seamless. The first building block of this system is described in Wednesday’s issue of Nature; it can translate speech among 36 different languages.

Language data problems

AI translation systems today are mostly focused on text, because huge amounts of text are available in a wide range of languages thanks to digitization and the Internet. Institutions like the United Nations or European Parliament routinely translate all their proceedings into the languages of all their member states, which means there are enormous databases comprising aligned documents prepared by professional human translators. You just needed to feed those huge, aligned text corpora into neural nets (or hidden Markov models before neural nets became all the rage) and you ended up with a reasonably good machine translation system. But there were two problems with that.

The first issue was those databases comprised formal documents, which made the AI translators default to the same boring legalese in the target language even if you tried to translate comedy. The second problem was speech—none of this included audio data.

The problem of language formality was mostly solved by including less formal sources like books, Wikipedia, and similar material in AI training databases. The scarcity of aligned audio data, however, remained. Both issues were at least theoretically manageable in high-resource languages like English or Spanish, but they got dramatically worse in low-resource languages like Icelandic or Zulu.

As a result, the AI translators we have today support an impressive number of languages in text, but things are complicated when it comes to translating speech. There are cascading systems that simply do this trick in stages. An utterance is first converted to text just as it would be in any dictation service. Then comes text-to-text translation, and finally the resulting text in the target language is synthesized into speech. Because errors accumulate at each of those stages, the performance you get this way is usually poor, and it doesn’t work in real time.
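To make the error-compounding point concrete, here is a minimal sketch of such a cascaded pipeline; the three stage functions are hypothetical placeholders, not any particular product's API.

```python
# Minimal sketch of a cascading speech-to-speech pipeline, as described above.
# The three stage functions are hypothetical placeholders, not a real API; the
# point is that each stage only sees the previous stage's (possibly wrong)
# output, so errors accumulate and real-time operation is hard.

def transcribe(audio: bytes, source_lang: str) -> str:
    """Stage 1: automatic speech recognition (placeholder)."""
    raise NotImplementedError

def translate_text(text: str, source_lang: str, target_lang: str) -> str:
    """Stage 2: text-to-text machine translation (placeholder)."""
    raise NotImplementedError

def synthesize(text: str, target_lang: str) -> bytes:
    """Stage 3: text-to-speech synthesis (placeholder)."""
    raise NotImplementedError

def cascaded_translate(audio: bytes, source_lang: str, target_lang: str) -> bytes:
    transcript = transcribe(audio, source_lang)                        # ASR errors enter here
    translated = translate_text(transcript, source_lang, target_lang)  # and compound here
    return synthesize(translated, target_lang)                         # and get baked into the output
```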

A few systems that can translate speech-to-speech directly do exist, but in most cases they only translate into English and not in the opposite way. Your foreign language interlocutor can say something to you in one of the languages supported by tools like Google’s AudioPaLM, and they will translate that to English speech, but you can’t have a conversation going both ways.

So, to pull off the Star Trek universal translator thing Meta’s interviewees dreamt about, the Seamless team started with sorting out the data scarcity problem. And they did it in a quite creative way.

Building a universal language

Warren Weaver, a mathematician and pioneer of machine translation, argued in 1949 that there might be a yet undiscovered universal language working as a common base of human communication. This common base of all our communication was exactly what the Seamless team went for in its search for data more than 70 years later. Weaver’s universal language turned out to be math—more precisely, multidimensional vectors.

Machines do not understand words as humans do. To make sense of them, they need to first turn them into sequences of numbers that represent their meaning. Those sequences of numbers are numerical vectors that are termed word embeddings. When you vectorize tens of millions of documents this way, you’ll end up with a huge multidimensional space where words with similar meaning that often go together, like “tea” and “coffee,” are placed close to each other. When you vectorize aligned text in two languages like those European Parliament proceedings, you end up with two separate vector spaces, and then you can run a neural net to learn how those two spaces map onto each other.
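As a toy illustration of that "close together" idea, here is a cosine-similarity check on invented three-dimensional vectors; real embeddings are learned from data and have hundreds of dimensions.

```python
# Toy illustration of "words with similar meaning sit close together."
# The three vectors are made up for the example.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

tea = np.array([0.8, 0.1, 0.3])
coffee = np.array([0.7, 0.2, 0.4])
tractor = np.array([-0.2, 0.9, -0.5])

print(cosine_similarity(tea, coffee))   # high: related concepts land near each other
print(cosine_similarity(tea, tractor))  # low: unrelated concepts sit far apart
```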

But the Meta team didn’t have those nicely aligned texts for all the languages they wanted to cover. So, they vectorized all texts in all languages as if they were just a single language and dumped them into one embedding space called SONAR (Sentence-level Multimodal and Language-Agnostic Representations). Once the text part was done, they moved on to speech data, which was vectorized using a popular W2v (word to vector) tool and added to the same massive multilingual, multimodal space. Of course, each embedding carried metadata identifying its source language and whether it was text or speech before vectorization.

The team just used huge amounts of raw data—no fancy human labeling, no human-aligned translations. And then, the data mining magic happened.

SONAR embeddings represented entire sentences instead of single words. Part of the reason behind that was to control for differences between morphologically rich languages, where a single word may correspond to multiple words in morphologically simple languages. But the most important thing was that it ensured that sentences with similar meaning in multiple languages ended up close to each other in the vector space.

It was the same story with speech, too—a spoken sentence in one language was close to spoken sentences in other languages with similar meaning. It even worked between text and speech. So, the team simply assumed that embeddings in two different languages or two different modalities (speech or text) that are at a sufficiently close distance to each other are equivalent to the manually aligned texts of translated documents.
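Here is a rough sketch of that mining step, assuming plain cosine similarity and a fixed threshold to decide what counts as a pair; the actual SONAR mining pipeline is more involved than this.

```python
# Sketch of the automatic alignment idea: treat sentence embeddings from
# different languages (or modalities) that sit close enough in the shared
# space as translation pairs. Threshold and scoring are illustrative
# assumptions, not the real mining pipeline.
import numpy as np

def mine_pairs(src_embs: np.ndarray, tgt_embs: np.ndarray, threshold: float = 0.85):
    """Return (i, j, score) for source/target sentences whose embeddings are similar."""
    # Normalize rows so a dot product equals cosine similarity.
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = src @ tgt.T                      # pairwise cosine similarities
    pairs = []
    for i in range(sims.shape[0]):
        j = int(np.argmax(sims[i]))         # best candidate "translation"
        if sims[i, j] >= threshold:
            pairs.append((i, j, float(sims[i, j])))
    return pairs
```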

This produced huge amounts of automatically aligned data. The Seamless team suddenly got access to millions of aligned texts, even in low-resource languages, along with thousands of hours of transcribed audio. And they used all this data to train their next-gen translator.

Seamless translation

The automatically generated data set was augmented with human-curated texts and speech samples where possible and used to train multiple AI translation models. The largest one was called SEAMLESSM4T v2. It could translate speech to speech from 101 source languages into any of 36 output languages, and translate text to text. It could also work as an automatic speech recognition system in 96 languages, translate speech to text from 101 into 96 languages, and translate text to speech from 96 into 36 languages—all from a single unified model. It also outperformed state-of-the-art cascading systems by 8 percent in speech-to-text and by 23 percent in speech-to-speech translation, based on Bilingual Evaluation Understudy (BLEU) scores, an algorithm commonly used to evaluate the quality of machine translation.
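For reference, BLEU scores like the ones behind those percentages can be computed with the widely used sacrebleu package; the sentences below are invented, and this is not the paper's evaluation setup.

```python
# Minimal BLEU example using the sacrebleu package (the metric named above).
# The sentences are made up; this is not the paper's evaluation setup.
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat sat on the mat"]]  # one reference stream, parallel to hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # 100.0 for an exact match
```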

But it can now do even more than that. The Nature paper published by Meta’s Seamless team ends at the SEAMLESSM4T models, but Nature has a long editorial process to ensure scientific accuracy. The paper published on January 15, 2025, was submitted in late November 2023. A quick search of arXiv.org, a repository of not-yet-peer-reviewed papers, turns up the details of two other models that the Seamless team has already integrated on top of SEAMLESSM4T: SeamlessStreaming and SeamlessExpressive, which take this AI even closer to making a Star Trek universal translator a reality.

SeamlessStreaming is meant to solve the translation latency problem. The baseline SEAMLESSM4T, despite all the bells and whistles, worked as a standard AI translation tool. You had to say what you wanted to say, push “translate,” and it spat out the translation. SeamlessStreaming was designed to take this experience a bit closer to what human simultaneous translators do—it translates what you’re saying as you speak, in a streaming fashion. SeamlessExpressive, on the other hand, is aimed at preserving the way you express yourself in translations. When you whisper or say something in a cheerful manner or shout out with anger, SeamlessExpressive will encode the features of your voice, like tone, prosody, volume, tempo, and so on, and transfer those into the output speech in the target language.

Sadly, it still can’t do both at the same time; you can only choose to go for either streaming or expressivity, at least at the moment. Also, the expressivity variant is very limited in supported languages—it only works in English, Spanish, French, and German. But at least it’s online so you can go ahead and give it a spin.

Nature, 2025.  DOI: 10.1038/s41586-024-08359-z

Photo of Jacek Krywko

Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.

Meta takes us a step closer to Star Trek’s universal translator Read More »

maker-of-weight-loss-drugs-to-ask-trump-to-pause-price-negotiations:-report

Maker of weight-loss drugs to ask Trump to pause price negotiations: Report

Popular prescriptions

For now, Medicare does not cover drugs prescribed specifically for weight loss, but it will cover GLP-1 class drugs if they’re prescribed for other conditions, such as Type 2 diabetes. Wegovy, for example, is covered if it is prescribed to reduce the risk of heart attack and stroke in adults with either obesity or overweight. But, in November, the Biden administration proposed reinterpreting Medicare prescription-coverage rules to allow for coverage of “anti-obesity medications.”

Such a move is reportedly part of the argument Lilly’s CEO plans to bring to the Trump administration. Rather than using drug price negotiations to reduce health care costs, Ricks aims to play up the potential to reduce long-term health care costs by improving people’s overall health with coverage of GLP-1 drugs now. This argument would presumably be targeted at Mehmet Oz, the TV presenter and heart surgeon Trump has tapped to run the Centers for Medicare and Medicaid Services.

“My argument to Mehmet Oz is that if you want to protect Medicare costs in 10 years, have [the Affordable Care Act] and Medicare plans list these drugs now,” Ricks said to Bloomberg. “We know so much about how much cost savings there will be downstream in heart disease and other conditions.”

An October report from the Congressional Budget Office strongly disputed that claim, however. The CBO estimated that the direct cost of Medicare coverage for anti-obesity drugs between 2026 and 2034 would be nearly $39 billion, while the savings from improved health would total just a little over $3 billion, for a net cost to US taxpayers of about $35.5 billion.

Maker of weight-loss drugs to ask Trump to pause price negotiations: Report Read More »

nyc-congestion-pricing:-early-days

NYC Congestion Pricing: Early Days

People have to pay $9 to enter Manhattan below 60th Street. What happened so far?

  1. Congestion Pricing Comes to NYC.

  2. How Much Is Traffic Improving?

  3. And That’s Terrible?

  4. You Mad, Bro.

  5. All Aboard.

  6. Time is Money.

  7. Solving For the Equilibrium.

  8. Enforcement and License Plates.

  9. Uber Eats the Traffic.

  10. We Can Do Even Better Via Congestion Tolls.

  11. Abundance Agenda Fever Dream.

  12. The Lighter Side.

We’ve now had over a week of congestion pricing in New York City.

It took a while to finally get it.

The market for whether congestion pricing would happen in 2024 got as high as 87% before Governor Hochul first betrayed us. Fortunately for us, she partially caved. We finally got congestion pricing at the start of 2025.

In the end, we got a discount price of $9 in Manhattan south of 60th Street, and it only applies to those who cross the boundary into or out of the zone, but yes we finally did it. It will increase to $12 in 2028 and $15 in 2031.

As part of this push, there was an existing congestion surcharge of $2.50 for taxis and $2.75 for rideshares. Thus, ‘congestion pricing’ was already partially implemented, and doubtless already having some positive effect on traffic.

For rides that start or end within the zone, they’re adding a new charge of $0.75 more for taxis, and $1.50 more for rideshares, so an Uber will now cost an extra $4.25 for each ride, almost the full $9 if you enter and then leave the zone.
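A quick worked check of those per-ride numbers, using only the figures quoted above:

```python
# Worked check of the per-ride rideshare numbers above.
existing_rideshare_surcharge = 2.75   # pre-existing congestion surcharge
new_rideshare_charge = 1.50           # added under congestion pricing
per_ride = existing_rideshare_surcharge + new_rideshare_charge
print(per_ride)        # 4.25 per Uber/Lyft ride touching the zone
print(2 * per_ride)    # 8.50 for a ride in and a ride out, almost the full $9
```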

Going in, I shared an attitude, roughly like these good folks:

Sherkhan: All big cities should charge congestion fees and remove curbside parking. If people viewed cities like they viewed malls, they’d understand it would be ridiculous to park their car in the food court next to the Sbarro’s Pizza.

LoneStarTallBoi (Reddit): As someone who routinely drives box trucks into the congestion zone, let me just say that this makes zero difference for my customers, the impact on my bottom line is next to nothing, and I don’t care if every dollar collected goes to build a solid gold statue of Hochul, as long as there are a few less of you deranged, mouth breathing, soft handed idiots on the road. Take the fucking bus, take the fucking train.

Well said.

Like any other new policy, the first question is how to enforce it.

The answer is, we can all pay via E-ZPass, and if you try to go into the zone without one, we take a picture of your license plate.

Everyone agrees it is dramatically improving on the tunnels and bridges.

Sam: this is INCREDIBLE, congestion pricing is already working wonders

traffic at 1:30 PM on the average Sunday vs today

Holland Tunnel: 27 mins ➡️ 9 mins

Lincoln Tunnel: 10 mins ➡️ 3 mins

Williamsburg Bridge: 11 mins ➡️ 6 mins

The New York Times reported crossing-time data from the MTA and TransCom.

At actual zero traffic these would each take between 2 and 3 minutes to cross, so this is a 50%+ reduction in extra time due to traffic. And it is a vast underestimate of the total time saved, because when we talk about for example ‘delays at the Holland Tunnel’ most of it comes before you reach the tunnel itself.
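Spelled out, assuming a 2.5-minute free-flow crossing time:

```python
# Quick check of the "50%+ reduction in extra time due to traffic" claim,
# using the crossing times quoted above and an assumed 2.5-minute free-flow time.
free_flow = 2.5
crossings = {
    "Holland Tunnel": (27, 9),
    "Lincoln Tunnel": (10, 3),
    "Williamsburg Bridge": (11, 6),
}
for name, (before, after) in crossings.items():
    extra_before = before - free_flow
    extra_after = after - free_flow
    reduction = 1 - extra_after / extra_before
    print(f"{name}: extra delay cut by {reduction:.0%}")
# Holland ~73%, Lincoln ~93%, Williamsburg ~59% -- all above 50%.
```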

If you want to look yourself, here is the congestion pricing tracker.

Cremieux: Congestion pricing is amazing.

The roads are clear, the trains aren’t overloaded, and “rush hour” now feels like it’s just a bad policy choice.

Jule Strainman: I was told by a guy who spent $20 on a cookie that NYC would implode with Congestion Pricing? Well it’s Monday morning and the tunnels are CLEAR!

Steven Godofsky: it’s hilarious how high the elasticity is here, my goodness

Jule: Everyone calling me slurs told me yesterday was a snow day/Monday/holiday and didn’t count…now what!

Unpopular Polls shows the roads from outside the zone being crowded.

Jule: So we should extend the congestion zone to uptown and Brooklyn? I agree!

I agree as well, at relatively lower prices of course.

The MTA records a modest reduction in entries into the zone. When we look in absolute numbers, the elasticity looks reasonable – a $9 charge reduced total throughput by ~7.5% (although this doesn’t adjust for the low temperatures), or 43,800 fewer vehicles per day, but that marginal 7.5% can make a huge difference by shifting traffic to a new equilibrium.
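Back-of-envelope, those two figures imply a baseline of roughly 584,000 vehicle entries per day:

```python
# Implied baseline from the reduction figures above.
fewer_vehicles_per_day = 43_800
reduction_share = 0.075
print(round(fewer_vehicles_per_day / reduction_share))  # ~584,000 entries/day before the charge
```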

Ana Ley, Winnie Hu and Keith Collins (New York Times): “There’s so much evidence that people are experiencing a much less traffic-congested environment,” said Janno Lieber, the chairman and chief executive of the M.T.A., which is overseeing the program. “They’re seeing streets that are moving more efficiently, and they’re hearing less noise, and they’re feeling a less tense environment around tunnels and bridges.”

Traffic improved along some streets within the congestion zone, but remained snarled in other areas like the length of West 42nd Street and Ninth Avenue between West 60th and 14th Streets.

Of the 1.5 million people who work in the tolling zone, about 85 percent take mass transit, according to the M.T.A. Only 11 percent drive — about 143,000 drivers before congestion pricing was implemented.

Within the city, there are three positions one can take.

  1. Traffic is not improving within the city.

  2. Traffic is improving within the city, and that’s awesome.

  3. Traffic is improving within the city, and that’s… terrible?

Position number one is certainly possible in theory. The data on the congestion pricing tracker, which Gupta and others refer to, indeed suggests that routes internal to the zone are unchanged.

There are reports claiming that trips that cross into the congestion zone are improved a lot, but within the zone many claim things aren’t changing much.

For example, here is a report from Michael Ostrovsky, who thinks this is because the majority of traffic within the zone is taxis and TLCs, which were not much impacted: the surcharge change was not so large, and people likely didn’t even realize it yet.

Michael Ostrovsky: Both videos are from Dec 7. First one is from Holland Tunnel exit. Over the 10 minutes, the video shows 179 regular cars, 93 TLCs, and 7 taxis (plus a bunch of cars without front plates – but that’s a separate issue). So – a substantial majority of vehicles are regular cars.

The second video is from an intersection within the congestion pricing zone (8th Avenue and 30th Street). Over the similar 10-minute period, the fractions of different types of vehicles are completely different: only 48 regular cars, vs. 60 TLCs and 50 taxis.

Arpit Gupta: Large improvements in morning rush hour commute times into Manhattan as well.

However, travel times *within* the decongestion zone are not affected. It could be that much of this traffic was by residents within the zone anyway, who don’t pay additional costs.

Lower congestion as well for the Williamsburg bridge; maybe the Queens-Midtown tunnel; and the Queensboro bridge.

Also mild evidence of displacement of traffic onto FDR and the Hugh Carey tunnel.

This is also the Tyler Cowen position. It is remarkable how much his reaction attempts to convey negative vibes about what is, by his own description, clearly a positive change – for example, warning that if traffic improves, people inside Manhattan might respond with more trips, even though such trips would reflect gains, those people are highly unlikely to use their own cars for them, and taxi fees have risen. I appreciated the note that long-term adjustment effects tend to be larger than short-term ones, and not to jump to conclusions.

As both Ostrovsky and Cowen note, the correct answer is to charge the fee for cars on the street within the zone, and to raise the fees on taxis to at least match the fees for visitors. I expect that 50% of the baseline fee for taxi trips within the zone is good enough for equality, because a visitor makes a trip in and then a trip out; better yet, I like the extra per-mile charge as per Ostrovsky, so we don’t overly penalize short trips. And I like that Cowen refers to ‘stiff’ tools – we want fees high enough to actually cause people to drive and take taxis less, enough to improve traffic. It does seem like we’ve at least done that with the bridges and tunnels, even at $9.

I suspect the congestion tracker isn’t picking up the full situation. Too many others are making observations that suggest the situation is very much otherwise, with greatly reduced traffic. Even if this is due to weather, the tracker should be picking that up, so it’s strange that it isn’t. I’m not sure what to make of that.

Anecdotally, most people on the ground think traffic is lighter.

This could be biased by it being early January, when traffic typically dies down a bit, and an unusually cold one at that. This is still an overwhelming vote with 90%+ reporting improvement.

Sam Biederman: Congestion pricing is amazing. Was just in Lower Manhattan. Not car-choked, foot, bike and car traffic flowing very freely. Good idea, absolutely worth $9.

You haven’t lived until you’ve parked your Winnebago on Spring St.

One would think that, if one were under that impression, this would result in position number two. Often it does.

Now, dear readers, allow me to introduce advocates of position number three!

NYC Bike Lanes: Before & after congestion relief toll. Congestion pricing works!

Yiatin Chu: Unless swarms of ppl are coming out of subway stations, biking in 20 degree weather or even taking Ubers, congestion pricing is killing Chinatown.

Al: Allow me to share what I’ve seen in NYC one week after the congestion tax began:

– a consistent ~65% decrease in cars

– a consistent ~35% – 50% less people

Everywhere.

You will see the pro-tax people celebrating nothing changing. Their posts boast about all the people. What they don’t tell you, because you may not know, is while there are people, there are soooo many fewer people.

I cycled down 5th Avenue from the 70s to downtown around 2:30 pm today, Saturday, past the shopping and Rockefeller areas — a ridiculously low number of people compared to usual.

In Washington square park now, while there are people, there are far fewer people. It’s midday on a not frigid Saturday. This is abnormal.

As soon as one leaves Times Square, the volume of people dies down to far less numbers as well. I walked into several businesses to ask how this week was and I received eye rolls followed by “don’t even ask.”

In trying to please a small group of progressives, the city has begun the process of uprooting that which has made a home here.

@GovKathyHochul @NYCMayor you have both failed us miserably.

Drops like that are very obviously some combination of ‘people are able to actually get where they are going faster’ and ‘it was really fing cold.’

If they are real. They quite obviously aren’t real. Measured cars coming into the city are down less than 10%. No, a $9 charge did not cause sudden mass abandonment of downtown Manhattan, where many pay $100+ per day in extra rent to live there.

Then remember the subway and bus ride counts earlier, and a lot of people walk. Yeah, this was obviously about it being below freezing outside.

The threads above were examples of some very angry people. A lot of threads about this were like that. The amount of vitriol and hatred that many on Twitter are directing at anyone supportive of congestion pricing is something else.

We also have a remarkable amount of ‘NYC is dying and that is a good thing.’

And a lot of ‘you got rid of poor people, congratulations (you monsters).’

And my personal favorite, it isn’t the same without all the cars?

Joomi Kim: Part of the appeal of Chinatown is how crowded and chaotic it feels. Not just with pedestrians, but cars.

The people who don’t like this really, really don’t like this. I’ve seen some nastiness around AI and of course around politics, but somehow this is the worst I’ve seen it. Some of it is accusations of hating or driving out poor people, some of it claims the city is dying (or that it’s good the city is dying). It’s like a 2020-style cancellation mob is trying to form, except it’s 2025 and about congestion pricing, so no one feels the need to care.

Then there are the people who felt like this before, and then turn around when they see the results and realize, oh, this is actually great.

The purpose of congestion pricing is in part to switch people from cars to trains.

Which in turn means, by default, more crowded trains.

Joe Weisenthal (January 8): I can’t remember the last time the NYC subway felt this packed. Really annoying. Just basically every ride since Sunday.

Anecdotally I found the same, the trains did seem more crowded than usual.

How much more crowded?

It can’t be that much, because trains are a lot more efficient than cars.

Gary Basin: How could the trains not be overloaded 🤔

Big Pedestrian: Swarms of people come out of New York City’s subways every day. The subway carries 3.6 million people a day and buses carry another 1.4 million. There are only ~100K daily drivers into Manhattan.

Cars are giant, expensive space hogs.

The New York Times reports the MTA has it at ~143k workers driving in per day.

Peter Moskos: I hope this lasts. The reasons roads can have less traffic and subways not get much more crowded is because roads are so inefficient. The Queensboro Bridge carries 142,000 vehicles/day. At rush hour, the N/R/W subway under it runs 54 trains, with a capacity of 54,000 people/hour!

The N/R/W isn’t even a relatively good train, it’s relatively slow, someone get on that.

A majority of those ~150k daily drivers will presumably pay the $9, only some of them will substitute into the subway or a bus, and some people will increase their trip counts because of the time saved. Total throughput on the bridges and tunnels is down less than 10%.

So this doesn’t seem like it should be able to increase ridership directly by that much; plausibly this is capped at ~2% of ridership, and it certainly can’t be 10%+.
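Rough arithmetic behind that cap; the ~1.5 occupants per removed vehicle is my assumption, while the other figures come from the numbers above.

```python
# Rough arithmetic behind the "~2% of ridership" cap. Occupancy per removed
# vehicle is an assumption; the other figures are quoted in the text above.
fewer_vehicles_per_day = 43_800
assumed_occupants_per_vehicle = 1.5
subway_riders_per_day = 3_600_000

max_new_riders = fewer_vehicles_per_day * assumed_occupants_per_vehicle
print(f"{max_new_riders / subway_riders_per_day:.1%}")  # ~1.8%, even if every one of them switched to the subway
```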

So yes mass transit use is up year over year, but it mostly isn’t due to this.

That is enough to cause noticeable marginal trouble at rush hour on some lines that were already near capacity, especially the green line (4-5-6) which I was on most often, but you could fix it entirely by running more trains.

LIRR and MetroNorth have seen 14% and 15% ridership increases in the first week compared to last year, but some of that was baked in and I’m expecting the medium-term impact from this to be much lower, although capacity on those lines is a lot lower than the subway so a substantial increase is more plausible (graph source: NYS Open Data).

There is an obvious easy solution there, of course, which is to add more trains, especially at peak times. The same is true of the subway.

If ridership goes up, and you add more trains such that average ridership per train stays the same, everyone is better off, with shorter wait times, and the system is more profitable to boot. That can be an issue if you run out of capacity, but that’s not a physical issue anywhere but maybe the 4-5-6. So at worst this is a short term problem.

The Big Pedestrian side of this is obviously right, if there is indeed an impact.

Peter Lavingna: Manhattanites are 100% dependent on trucks for all their food and goods. You’re aware of this, right?

Big Pedestrian: There are very few things that are zero-sum, but driving in cities is one of them. If you are a truck driver delivering goods, then a policy which charges ~$14 but saves hours of delivery time is a very good value. The time of the delivery driver is worth more than the charge.

Sam: It turns out congestion pricing is actually quite good for many businesses because saving hours of wasted labor and gas stuck in traffic is far more valuable than the $9 charge.

Nate Shweber: Andrei Biriukov, an elevator mechanic, raved about the lack of traffic on Monday. “Today is amazing,” said Biriukov, 38, a Staten Island resident originally from Ukraine. He said he could cruise to jobs, arrive early and find parking right out front – and the roads felt “not dangerous.” He conceded that his employer pays the tolls; he believes the company will recoup the value in more prompt service and happier employees.

The legal minimum wage in New York City is $16.50 per hour; even entry-level service jobs requiring no starting skills mostly pay $20+/hour, babysitters realistically cost $30/hour, and the median salary in Manhattan is $173k/year, or $86/hour at full time. Within the zone the median is even higher, and the cost of employment is substantially higher still. The break-even point for congestion pricing, where the employer pays the cost, is super low if there are any traffic benefits whatsoever.
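A rough break-even sketch, valuing an hour of a worker's time at the wage figures above:

```python
# Rough break-even: how much travel time does the charge need to save before
# it pays for itself, valuing an hour at the wages quoted above?
charge = 9.0  # the passenger-car toll; trucks pay more
for label, hourly in [("entry-level service", 20.0), ("median Manhattan salary", 86.0)]:
    breakeven_minutes = charge / hourly * 60
    print(f"{label}: charge pays for itself after ~{breakeven_minutes:.0f} minutes saved")
# entry-level: ~27 minutes saved; median salary: ~6 minutes saved
```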

The weird part about all this is that $9 is so little money.

The price of living in the congestion zone is very high. If you want a one bedroom apartment you can expect to pay over $4,000 a month, or over $130 a day, several times the national average.

Or consider the price of parking in the zone. Typically, monthly parking spots are on the order of $600. Daily rates are on the order of $50. Even metered parking is $5.50 for one hour, $9 for two.

So it would be very easy for other costs to offset the $9 here.

Matthew Yglesias: Something to watch out for is that if fewer people are driving into Manhattan that should reduce parking prices such that the all-in cost of driving falls back to its prior level and the congestion returns — need to be prepared to raise the fee if needed.

Svenesonn: I’m sure parking spaces will be repurposed for better more profitable stuff.

Matthew Yglesias: Yes but that’s good!

A shift to where less land is used for parking, where there are lower levels of traffic congestion, and where congestion fees can substitute for taxes on labor & investment should be the goal for all cities.

Whether or not this then raises the necessary congestion price is unclear.

The whole situation is showing bizarre levels of elasticity of demand. The $9 charge is clearly very salient and compelling, in ways that other charges aren’t. My guess is that $9 cheaper parking wouldn’t offset it? Of course, if it does, Matt’s right, that’s a win.

At the end of the New York Times article we get this:

Ivan Ortiz, 43, a sanitation worker in Weehawken, said that he would now ask his relatives in Manhattan and Queens to visit him in New Jersey.

That’s an unfortunate change, since it doesn’t alter the amount of traffic, and also doesn’t change the amount of congestion pricing paid – someone has to drive either way and he’s basically trying to use default social norms to shift costs onto others. But also it’s kind of strange, given the costs of parking and tolls (and dealing with the traffic!) was already much higher than $9.

What about enforcement, asks Conversable Economist Timothy Taylor?

Timothy Taylor: New York has been using cameras that take a picture of license plates to enforce speeding laws, and there has been a large rise in the number of cars with license plates that are unreadable for many possible reasons: Buy a fake plate on eBay? Buy a legitimate paper license plate in states that allow it? Hang a bike rack over the license plate?

For $100, buy an electronic gizmo that makes your plate unreadable to the camera? Use certain coatings or covers that make a plate unreadable? Just slop some mud on the plate? Drive without a license plate? Reading license plates to collect the congestion toll will have problems, too.

I do not understand how this is permitted to continue. The camera doesn’t read the fake and think ‘oh, that’s legit.’ If you are using any of these tactics, the camera system knows damn well you are using those tactics. So does any traffic cop who sees you.

These tactics also are not accidents. There are no ‘honest mistakes’ here. So make it a rather large fine (as in ‘you might not want to claim your car’) to be using such tactics, or worse. Then, when someone tries it, you have an incentive to catch them, so you do. And people stop. It’s not like these things happen by accident.

Taylor also considers possible adaptations, with one concern being people who try to park uptown then get on the subway, and worries that the charge is not big enough to reduce traffic that much especially given ridesharing. He also worries about future autonomous cars when they arrive. I expect that would be helpful because those cars will be more efficient.

Does this system favor Ubers and Lyfts? Is it driving down other tolls?

Anne: Ubers & Lyfts circling all day take up more space than commuter cars passing through or driving in, parking & supporting local businesses.

Tunnel & bridge tolls helped support the infrastructure that buses & trains also use; likely fewer of these will be paid henceforth.

I love the new argument that congestion pricing could be reducing fees collected at the bridges and tunnels because fewer people are using them. Sure, to the extent that is happening we should subtract that from net revenue, same with any reduced taxes and fees paid on parking, but it seems fairly obvious that the additional charge for such trips increases net revenue.

The other half here is so strange. What supports businesses other than parking garages are the people, not the cars. It doesn’t matter whether the person also parked a car. Yes, I suppose this means less support for parking garages, hang on while I play the world’s tiniest violin that some of those might transition to something else.

The Ubers and Lyfts take up more space on the roads, but they take up vastly less space in parking lots which is also expensive to provide, and they do so while transporting passengers. About 54% of yellow taxi miles have the meter engaged, and Ubers and Lyfts likely have slightly higher utilization than that due to the algorithms.

If you’re driving yourself and can park at will, you can in theory get 100% of miles ‘on meter’ but realistically you have to park, and people will often be driving someone else and have to go out of their way, and so on. When my in-laws drive us back to the city, that’s 50% utilization for all practical purposes. My guess would be it’s something like 60% for the apps vs. 80% otherwise for time on road. Which is a real difference. But if you count ‘time in park’ then the apps are way ahead, and you at least partially should.
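A toy version of that comparison, using the 60% and 80% figures above; the parked hours and shift length are purely illustrative assumptions.

```python
# Toy comparison of street-space use under the rough figures above (60% vs 80%
# passenger-carrying share of road time). Parked hours and shift length are
# illustrative assumptions, not data.
def passenger_hours_per_space_hour(road_hours, utilization, parked_hours):
    space_hours = road_hours + parked_hours   # hours occupying road or curb/garage space
    return road_hours * utilization / space_hours

private = passenger_hours_per_space_hour(road_hours=2, utilization=0.8, parked_hours=9)
app_car = passenger_hours_per_space_hour(road_hours=10, utilization=0.6, parked_hours=0)
print(f"private car: {private:.2f}, app car: {app_car:.2f}")
# The app car comes out well ahead once parked time counts against the private car.
```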

As currently implemented, congestion pricing in Manhattan is highly second best.

Vitalik Buterin: Congestion pricing in NYC seems to be working well so far. Great example of market mechanisms in action.

I wish the tolls were dynamic. Price uncertainty is better than time uncertainty (paying $10 more today is fine if you pay $10 less tomorrow, but you can’t compensate being 30 min late for a flight or meeting by being 30 min early to the next one).

Alex Tabarrok: Exactly right. Tyler and I make the same point about price controls (ceilings) in Modern Principles. A price ceiling substitutes a time price for a money price. But this isn’t a neutral tradeoff—money prices benefit sellers, while time prices are pure waste (see this video for a fun illustration).

As with the rest of this, people complain ‘but that will sometimes cost money and that’s bad, I don’t want to pay that,’ and the correct reply is ‘getting you to not pay that is exactly the point.’

This trades off against higher transaction costs and confusion and stress. Do we want people continuously checking traffic prices and adjusting their plans and having this take up their head space, likely far out of proportion to the charges involved? What we actually mostly want is for people to shift into relatively good habits and patterns, I think, rather than to respond that much in real time to conditions. So my guess is we want a compromise – we should be doing more to charge extra at rush hour and other expected peak times, but in predictable ways. We already do a bit of this with night and weekend discounts.
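One way to picture that compromise is a fixed, published schedule of peak multipliers rather than real-time prices; the hours and multipliers below are purely illustrative, not the actual MTA schedule.

```python
# Illustrative fixed peak-pricing schedule: predictable, so drivers can plan
# around it, but still charging more at expected peak times. The multipliers
# and hours are made up, not the actual MTA schedule.
BASE_TOLL = 9.00

def toll_for_hour(hour: int, weekend: bool = False) -> float:
    if weekend:
        return BASE_TOLL * 0.75                    # published weekend discount
    if 7 <= hour < 10 or 16 <= hour < 19:
        return BASE_TOLL * 1.5                     # published rush-hour premium
    if hour < 5:
        return BASE_TOLL * 0.25                    # published overnight discount
    return BASE_TOLL

print(toll_for_hour(8))    # 13.5 during the morning rush
print(toll_for_hour(2))    # 2.25 overnight
```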

The other flaw is that we are not charging people for trips within the zone, except for taxis and rideshares. We can and should fix that, if your car is on the street within the zone at all you should pay the charge for the day, for the same reason everyone else pays. Tyler Cowen blames this on political economy, my presumption is it is actually about enforcement being a lot easier if you only have to watch the borders.

Matthew Yglesias: I think a city that did congestion pricing outside the context of a transit budget crisis and simply used the money to cut sales taxes would find it to be an easier proposition.

  1. Nobody likes taxes.

  2. Even low-tax jurisdictions have taxes.

  3. You might as well implement a tax that yields non-fiscal benefits.

This would obviously be a great idea, so long as there was substantial congestion that you would want to price. But no, I don’t think this is how it works. People don’t respond to this kind of logic. If they did, we’d all have carbon pricing by now.

When I look at the pure vitriol coming from congestion pricing opponents, I do not get the sense that it is at all tied to a budget crisis. I also don’t get the sense that ‘but we cut sales taxes’ would make them feel better.

Indeed, I think it would make them even more mad. They’d say you were taxing poor people’s cars and using it to discount rich people’s lattes.

It will be a few months before we have enough different conditions, and enough time for adjustments, to start to draw robust conclusions about effect sizes. A lot of this is complicated by the gradual rebirth of Manhattan in the wake of Covid, with trips into the city up substantially year over year for other reasons, and by the partial implementation via most of the taxi congestion charges having already gone live before.

Certainly this wasn’t a fully first best implementation. The prices are almost certainly too low, and we need to charge for travel within the zone. But it’s a great start.

No, he doesn’t!


NYC Congestion Pricing: Early Days Read More »

although-it’s-‘insane’-to-try-and-land-new-glenn,-bezos-said-it’s-important-to-try

Although it’s ‘insane’ to try and land New Glenn, Bezos said it’s important to try

“We would certainly like to achieve orbit, and get the Blue Ring Pathfinder into orbit,” Bezos said. “Landing the booster would be gravy on top of that. It’s kind of insane to try and land the booster. A more sane approach would probably be to try to land it in the ocean. But we’re gonna go for it.”

Blue Origin has built a considerable amount of infrastructure on a drone ship, Jacklyn, that will be waiting offshore for the rocket to land upon. Was Bezos not concerned about putting that hardware at risk?

A view inside the New Glenn rocket factory in Florida. Credit: Blue Origin

“I’m worried about everything,” he admitted. However, the rocket has been programmed to divert from the ship if the avionics on board the vehicle sense that anything is off-nominal.

And there is, of course, a pretty good chance of that happening.

“We’ve done a lot of work, we’ve done a lot of testing, but there are some things that can only be tested in flight,” Bezos said. “And you can’t be overconfident in these things. You have to be real. The reality is, there are a lot of things that go wrong, and you have to accept that, if something goes wrong, we’ll pick ourselves up and get busy for the second flight.”

As for that flight, the company has a second booster stage deep in development. It could be seen on the factory floor below Sunday, and should be ready later this spring, Limp said. There are about seven upper stages in the flow as the company works to optimize the factory for production.

A pivotal moment for spaceflight

Bezos founded Blue Origin a little more than 24 years ago, and the company has moved slowly compared to some of its competitors, most notably SpaceX. However, when Blue Origin has built products, they’ve been of high quality. Bezos himself flew on the first human mission of the New Shepard spacecraft in 2021, a day he described as the ‘best’ in his life. Of all the people who have ever flown into space, he noted that 7 percent have now done so on a Blue Origin vehicle. And the company’s BE-4 rocket engine has performed exceptionally well in flight. But an orbital mission, such a touchstone for launch companies, has eluded Bezos until now.

Although it’s ‘insane’ to try and land New Glenn, Bezos said it’s important to try Read More »

did-hilma-af-klint-draw-inspiration-from-19th-century-physics?

Did Hilma af Klint draw inspiration from 19th century physics?


Diagrams from Thomas Young’s 1807 Lectures bear striking resemblance to abstract figures in af Klint’s work.

Hilma af Klint’s Group IX/SUW, The Swan, No. 17, 1915. Credit: Hilma af Klint Foundation

In 2019, astronomer Britt Lundgren of the University of North Carolina Asheville visited the Guggenheim Museum in New York City to take in an exhibit of the works of Swedish painter Hilma af Klint. Lundgren noted a striking similarity between the abstract geometric shapes in af Klint’s work and scientific diagrams in 19th century physicist Thomas Young‘s Lectures (1807). So began a four-year journey starting at the intersection of science and art that has culminated in a forthcoming paper in the journal Leonardo, making the case for the connection.

Af Klint was formally trained at the Royal Academy of Fine Arts and initially focused on drawing, portraits, botanical drawings, and landscapes from her Stockholm studio after graduating with honors. This provided her with income, but her true life’s work drew on af Klint’s interest in spiritualism and mysticism. She was one of “The Five,” a group of Swedish women artists who shared those interests. They regularly organized seances and were admirers of theosophical teachings of the time.

It was through her work with The Five that af Klint began experimenting with automatic drawing, driving her to invent her own geometric visual language to conceptualize the invisible forces she believed influenced our world. She painted her first abstract series in 1906 at age 44. Yet she rarely exhibited this work because she believed the art world at the time wasn’t ready to appreciate it. Her will requested that the paintings stay hidden for at least 20 years after her death.

Even after the boxes containing her 1,200-plus abstract paintings were opened, their significance was not fully appreciated at first. The Moderna Museum in Stockholm actually declined to accept them as a gift, although it now maintains a space dedicated to her work. It wasn’t until art historian Ake Fant presented af Klint’s work at a Helsinki conference that the art world finally took notice. The Guggenheim’s exhibit was af Klint’s American debut. “The exhibit seemed to realize af Klint’s documented dream of introducing her paintings to the world from inside a towering spiral temple and it was met roundly with acclaim, breaking all attendance records for the museum,” Lundgren wrote in her paper.

A pandemic project

Lundgren is the first person in her family to become a scientist; her mother studied art history, and her father is a photographer and a carpenter. But she always enjoyed art because of that home environment, and her Swedish heritage made af Klint an obvious artist of interest. It wasn’t until the year after she visited the Guggenheim exhibit, as she was updating her lectures for an astrophysics course, that Lundgren decided to investigate the striking similarities between Young’s diagrams and af Klint’s geometric paintings—in particular those series completed between 1914 and 1916. It proved to be the perfect research project during the COVID-19 lockdowns.

Lundgren acknowledges the inherent skepticism such an approach by an outsider might engender among the art community and is sympathetic, given that physics and astronomy both have their share of cranks. “As a professional scientist, I have in the past received handwritten letters about why Einstein is wrong,” she told Ars. “I didn’t want to be that person.”

That’s why her very first research step was to contact art professors at her institution to get their expert opinions on her insight. They were encouraging, so she dug in a little deeper, reading every book about af Klint she could get her hands on. She found no evidence that any art historians had made this connection before, which gave her the confidence to turn her work into a publishable paper.

The paper didn’t find a home right away, however; the usual art history journals rejected it, partly because Lundgren was an outsider with little expertise in that field. She needed someone more established to vouch for her. Enter Linda Dalrymple Henderson of the University of Texas at Austin, who has written extensively about scientific influences on abstract art, including that of af Klint. Henderson helped Lundgren refine the paper, encouraged her to submit it to Leonardo, and “it came back with the best review I’ve ever received, even inside astronomy,” said Lundgren.

Making the case

Young and af Klint were not contemporaries; Young died in 1829, and af Klint was born in 1862. Nor are there any specific references to Young or his work in the academic literature examining the sources known to have influenced the Swedish painter’s work. Yet af Klint had a well-documented interest in science, spanning everything from evolution and botany to color theory and physics. While those influences tended to be scientists who were her contemporaries, Lundgren points out that the artist’s personal library included a copy of an 1823 astronomy book.

Excerpt from Plate XXIX of Young’s Lectures. Credit: Niels Bohr Library and Archives/AIP

In 1910, af Klint was also commissioned to paint a portrait of Swedish physicist Knut Angstrom at Uppsala University, whose library includes a copy of Young’s Lectures. So it’s entirely possible that af Klint had access to the astronomy and physics of the previous century; she would likely have been particularly intrigued by discoveries involving “invisible light” (electromagnetism, X-rays, radioactivity, etc.).

Young’s Lectures contain a speculative passage about the existence of a universal ether (since disproven), a concept that fascinated both scientists and those (like af Klint) with certain occult interests in the late 19th and early 20th centuries. In fact, Young’s passage was included in a popular 1875 spiritualist text, Unseen Universe by P.G. Tait and Balfour Stewart, which was heavily cited by Theosophical Society founder Helena Petrovna Blavatsky. Blavatsky, in turn, is known to have influenced af Klint around the time the artist created The Swan, The Dove, and Altarpieces series.

Lundgren found that “in several instances, the captions accompanying Young’s color figures [in the Lectures] even seem to decode elements of af Klint’s paintings or bring attention to details that might otherwise be overlooked.” For instance, the caption for Young’s Plate XXIX describes the “oblique stripes of color” that appear when candlelight is viewed through a prism, a description that “almost interchangeably describes features in af Klint’s Group X, No. 1, Altarpiece,” she wrote.

(a) Excerpt from Young’s Plate XXX. (b) af Klint, Parsifal Series No. 68. (c and d) af Klint, Group IX/UW, The Dove, No. 12 and No. 13. Credit: Niels Bohr Library/Hilma af Klint Foundation

Art historians had previously speculated about af Klint’s interest in color theory, as reflected in the annotated watercolor squares featured in her Parsifal Series (1916). Lundgren argues that those squares resemble Fig. 439 in the color plates of Young’s Lectures, demonstrating the inversion of color in human vision. Those diagrams also “appear almost like crude sketches of af Klint’s The Dove, Nos. 12 and 13,” Lundgren wrote. “Paired side by side, these paintings can produce the same visual effects described by Young, with even the same color palette.”

The geometric imagery of af Klint’s The Swan series is similar to Young’s illustrations of the production and perception of colors, while “black and white diagrams depicting the propagation of light through combinations of lenses and refractive surfaces, included in Young’s Lectures On the Theory of Optics, bear a particularly strong geometric resemblance to The Swan paintings No. 12 and No. 13,” Lundgren wrote. Other pieces in The Swan series may have been inspired by engravings in Young’s Lectures.

This is admittedly circumstantial evidence, and Lundgren acknowledges as much. “Not being able to prove it is intriguing and frustrating at the same time,” she said. She continues to receive additional leads, most recently from an af Klint relative on the board of the Moderna Museum. Once again, the evidence wasn’t direct, but it seems af Klint would have attended certain local lecture circuits about science, while several members of the Theosophical Society were familiar with modern physics and Young’s earlier work. “But none of these are nails in the coffin that really proved she had access to Young’s book,” said Lundgren.

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Did Hilma af Klint draw inspiration from 19th century physics? Read More »

only-5-percent-of-us-car-buyers-want-an-ev,-according-to-survey

Only 5 percent of US car buyers want an EV, according to survey

Only 5 percent of US consumers want their next vehicle to be a battery electric vehicle, according to a new survey by Deloitte. The consulting company gathered data from more than 31,000 people across 30 countries as part of its 2025 Global Automotive Consumer Study, and some of the results are rather interesting, as they pertain to technologies like new powertrains, connectivity, and artificial intelligence.

Among US consumers, internal combustion engines (ICE) remain number one, with 62 percent indicating that their next car will not be electrified. Another 1 in 5 would like a hybrid for their next vehicle, with a further 6 percent desiring a plug-in hybrid. (The remaining survey respondents either did not know or wanted some other powertrain option.)

By contrast, only 38 percent of Chinese consumers want to stick with ICE; meanwhile, 27 percent of them want a BEV next. That’s a far higher percentage than in other large nations—in Germany, only 14 percent want a BEV; in the UK and Canada, only 8 percent are BEV-bound; and in Japan, the number is a mere 3 percent.

Meanwhile, hybrids are far more attractive than BEVs to consumers in most countries. While only 16 percent of Chinese and 12 percent of German consumers indicated this preference, 23 percent of Canadians, 24 percent of UK consumers, and 35 percent of Japanese consumers replied that they were looking for a hybrid for their next car.

Deloitte suspects that some of this reticence toward BEVs “could be due, in part, to lingering affordability concerns.” The hoped-for parity in the cost of a BEV powertrain and an ICE powertrain has still not arrived, and fully 45 percent of US consumers said they did not want to pay more than $34,999 for their next car (11 percent said less than $15,000, 9 percent said $15,000–$19,999, and the remaining 25 percent said $20,000–$34,999).
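To make that budget arithmetic explicit, here is a minimal, purely illustrative Python tally of the price caps quoted above; the bucket labels are paraphrased from the article, not taken from Deloitte's questionnaire.

```python
# Illustrative tally of the US price-cap shares quoted above.
# Bucket labels are paraphrased from the article, not from Deloitte's survey instrument.
price_cap_shares_pct = {
    "under $15,000": 11,
    "$15,000-$19,999": 9,
    "$20,000-$34,999": 25,
}

# Summing the three buckets reproduces the reported headline figure.
total_under_35k = sum(price_cap_shares_pct.values())
print(f"US consumers unwilling to pay more than $34,999: {total_under_35k}%")  # -> 45%
```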

Why the reticence?

Despite popular sentiment, there are actually quite a few electric vehicles available for much less than the average new vehicle price of $47,000. But other than the Nissan Leaf, all of them have prices starting with a “3.” (Meanwhile, 75 percent of car buyers in the US buy used cars, and the transition to electrification will not change that underlying reality.)

Only 5 percent of US car buyers want an EV, according to survey Read More »