9u50fv – Page 20

OpenAI hits back at DeepSeek with o3-mini reasoning model

AI, openai / 9u50fv / February 1, 2025

Over the last week, OpenAI’s place atop the AI model hierarchy has been heavily challenged by Chinese model DeepSeek. Today, OpenAI struck back with the public release of o3-mini, its latest simulated reasoning model and the first of its kind the company will offer for free to all users without a subscription.

First teased last month, OpenAI brags in today’s announcement that o3-mini “advances the boundaries of what small models can achieve.” Like September’s o1-mini before it, the model has been optimized for STEM functions and shows “particular strength in science, math, and coding” despite lower operating costs and latency than o1-mini, OpenAI says.

Harder, better, faster, stronger

Users are able to choose from three different “reasoning effort options” when using o3-mini, allowing them to fine-tune a balance between latency and accuracy depending on the task. The lowest of these reasoning levels generally shows accuracy levels comparable to o1-mini in math and coding benchmarks, according to OpenAI, while the highest matches or surpasses the full-fledged o1 model in the same tests.

The reasoning effort chosen can have a sizable impact on the accuracy of the o3 model in OpenAI’s tests. Credit: OpenAI

OpenAI says testers reported a 39 percent reduction in “major errors” when using o3-mini, compared to o1-mini, and preferred the o3-mini responses 56 percent of the time. That’s despite the medium version of o3-mini offering a 24 percent faster response time than o1-mini on average—down from 10.16 seconds to 7.7 seconds.

OpenAI hits back at DeepSeek with o3-mini reasoning model Read More »

Buoy meets satellite soulmate in Love Me

culture, film, Love Me, science fiction film / 9u50fv / February 1, 2025

a postapocalyptic love story about transformation

Ars chats with directors Andy and Sam Zuchero and props department head Roberts Cifersons.

Kristen Stewart and Steven Yeun star in Love Me Credit: Bleecker Street

There have been a lot of films and television series exploring sentient AI, consciousness, and identity, but there’s rarely been quite such a unique take on those themes as that provided by Love Me, the first feature film from directors Andy and Sam Zuchero. The film premiered at Sundance last year, where it won the prestigious Alfred P. Sloan Feature Film Prize, and is now getting a theatrical release.

(Some spoilers below.)

The film is set long after humans and all other life forms have disappeared from the Earth, leaving just remnants of our global civilization behind. Kristen Stewart plays one of those remnants: a little yellow SMART buoy we first see trapped in ice in a desolate landscape. The buoy has achieved a rudimentary sentience, sufficient to respond to the recorded message being beamed out by an orbiting satellite (Steven Yeun) overhead to detect any new lifeforms that might appear. Eager to have a friend—even one that’s basically a sophisticated space chatbot—the buoy studies the vast online database of information about humanity on Earth the satellite provides. It homes in on YouTube influencers Deja and Liam (also played by Stewart and Yeun), presenting itself to the satellite as a lifeform named Me.

Over time—a LOT of time—the buoy and satellite (now going by Iam) “meet” in virtual space and take on humanoid avatars. They become increasingly more advanced in their consciousness, exchanging eccentric inspirational memes, re-enacting the YouTubers’ “date night,” and eventually falling in love. But the course of true love doesn’t always run smoothly, even for the last sentient beings on Earth—especially since Me has not been honest with Iam about her true nature.

At its core, Love Me is less pure sci-fi and more a postapocalyptic love story about transformation. “We really wanted to make a movie that made everyone feel big and small at the same time,” Sam Zuchero told Ars. “So the timescale is gigantic, 13 billion years of the universe. But we wanted to make the love story at its core feel fleeting and explosive, as first love feels so often.”

The film adopts an unusual narrative structure. It’s split into three distinct visual styles: practical animatronics, classical animation augmented with motion capture, and live action, each representing the development of the main characters as they discover themselves and each other, becoming more and more human as the eons pass. At the time, the couple had been watching a lot of Miyazaki films with their young son.

“We were really inspired by how he would take his characters through so many different forms,” Andy Zuchero told Ars. “It’s a different feeling than a lot of Western films. It was exciting to change the medium of the movie as the characters progressed. The medium grows until it’s finally live action.” The 1959 film Pillow Talk was another source of inspiration since a good chunk of that film simply features stars Rock Hudson and Doris Day chatting in a split screen over their shared party line—what Andy calls “the early 20th century’s version of an open Zoom meeting.”

Building the buoy

One can’t help but see shades of WALL-E in the plucky little space buoy’s design, but the basic concept of what Me should look like came from actual nautical buoys, per props department head Roberts Cifersons of Laird FX, who created the animatronic robots for the film. “As far as the general shape and style of both the buoy and our satellite, most of it came from our production designer,” he told Ars. “We just walked around the shop and looked at 1,000 different materials and samples, imagining what could be believable in the future, but still rooted somewhat in reality. What it would look like if it had been floating there for tens of thousands of years, and if it were actually stuck in ice, what parts would be damaged or not working?”

Cifersons and his team also had to figure out how to bring character and life to their robotic buoy. “We knew the eye or the iris would be the key aspect of it, so that was something we started fooling around with well before we even had the whole design—colors, textures, motion,” he said. They ended up building four different versions: the floating “hero buoy,” a dummy version with lighting but limited animatronics, a bisected buoy for scenes where it is sitting in ice, and a “skeleton” buoy for later in the film.

“All of those had a brain system that we could control whatever axes and motors and lights and stuff were in each, and we could just flip between them,” said Cifersons. “There were nine or 10 separate motor controllers. So the waist could rotate in the water, because it would have to be able to be positioned to camera. We could rotate the head, we could tilt the head up and down, or at least the center eye would tilt up and down. The iris would open and close.” They could also control the rotation of the antenna to ensure it was always facing the same way.

It’s always a challenge designing for film because of time and budget constraints. In the case of Love Me, Cifersons and his team only had two months to make their four buoys. In such a case, “We know we can’t get too deep down the custom rabbit hole; we have to stick with materials that we know on some level and just balance it out,” he said. “Because at the end of the day, it has to look like an old rusted buoy floating in the ocean.”

It helped that Cifersons had a long Hollywood history of animatronics to build upon. “That’s the only way it’s possible to do that in the crazy film timelines that we have,” he said. “We can’t start from scratch every single time; we have to build on what we have.” His company had timeline-based software to program the robots’ motions according to the directors’ instructions and play it back in real time. His team also developed hardware to give them the ability to completely pre-record a set of motions and play it back. “Joysticks and RC remotes are really the bread and butter of current animatronics, for film at least,” he said. “So we were able to blend more theme park animatronic software with on-the-day filming style.”

On location

Once the robots had been completed, the directors and crew spent several days shooting on location in February on a frozen Lake Abraham in Alberta, Canada—or rather, several nights, when the temperatures dipped to -20° F. “Some of the crew were refusing to come onto the ice because it was so intense,” Sam Zuchero recalled. They also shot scenes with the buoy floating on water in the Salish Sea off the coast of Vancouver, which Andy Zuchero described as “a queasy experience. Looking at the monitor when you’re on a boat is nauseating.”

Later sequences were shot amid the sand dunes of Death Valley, with the robot surrounded by bentonite clay strewn with 65 million-year-old fossilized sea creatures. The footage of the satellite was shot on a soundstage, using NASA imagery on a black screen.

YouTube influencers Deja and Liam become role models for the buoy and satellite. Bleecker Street

Cifersons had his own challenges with the robot buoys, such as getting batteries to last more than 10 seconds in the cold and withstanding high temperatures for the desert shoot. “We had to figure out a fast way to change batteries that would last long enough to get a decent wide shot,” he said. “We ended up giving each buoy their own power regulators so we could put in any type of battery if we had to get it going. We could hardwire some of them if we had to. And then in the desert, electronics hate hot weather, and there’s little microcontrollers and all sorts of hardware that doesn’t want to play well in the hot sun. You have to design around it knowing that those are the situations it’s going into.”

The animated sequences presented a different challenge. The Zucheros decided to put their stars into motion-capture suits to film those scenes, using video game engines to render avatars similar to what one might find in The Sims. However, “I think we were drinking a little bit of the AI technological Kool-Aid when we started,” Andy Zuchero admitted. That approach produced animated versions of Stewart and Yeun that “felt stilted, robotic, a bit dead,” he said. “The subtlety that Kristen and Steven often bring ended up feeling, in this form, almost lifeless.” So they relied upon human animators to “artfully interpret” the actors’ performances into what we see onscreen.

This approach “also allowed us to base the characters off their choices,” said Sam Zuchero. “Usually an animated character is the animator. It’s very connected to who the animator is and how the animator moves and thinks. There’s a language of animation that we’ve developed over the past 100 years—things like anticipation. If you’re going to run forward, you have to pull back first. These little signals that we’ve all come to understand as the language of animation have to be built into a lot of choices. But when you have the motion capture data of the actors and their intentions, you can truly create a character that is them. It’s not just an animator’s body in motion and an actor’s voice with some tics of the actor. It is truly the actors.”

Love Me opens in select theaters today.

Trailer for Love Me.

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Buoy meets satellite soulmate in Love Me Read More »

Driving the Ford Mustang Dark Horse R makes every other pony feel tame

Cars, Ford Mustang, Ford Performance / 9u50fv / February 1, 2025

The steering wheel is track-spec, too, a Sparco steering wheel that replaces the big, leather-wrapped one in the road car. Behind that, the 12.4-inch digital gauge cluster is gone. A MoTeC display instead stands proud, the sort that you’d expect to find in a real race car, which this, of course, very much is.

It surely shifts like a race car, with linkage connected to an upright plastic shift knob. It offers no semblance of padding and communicates everything that’s happening in the transmission through your fingertips, though the clutch action is far lighter than the one on your average track toy. This made it a breeze to swing out of the pit lane at Charlotte Motor Speedway, far easier than the hair-trigger clutch on most track-only machines.

The shift action is delightfully short, too, and though that MoTeC gauge cluster had a sweeping tachometer running across the top, I didn’t need it. The sound of that Coyote and the way it shook my core made it pretty clear when it was time to grab another gear.

I did a lot of running up and down those gears as I swung the Dark Horse R through the twisty infield at Charlotte, gradually gaining confidence in pushing the car and its Michelin Pilot Sport Cup 2 tires a bit more. As I began to feel the limits, it was pretty clear that the car’s manually adjustable Multimatic DSSV suspension and alignment had been configured in a very safe way.

When I cranked that Sparco steering wheel over aggressively mid-turn, the car just fell into terminal understeer, patiently plowing straight ahead until I wound back to a more reasonable steering angle. Given that this Mustang has neither traction nor stability control, with 500 hp going straight through the limited-slip rear differential and to the road with no digital abatement, that was probably for the best, especially because I had just a handful of laps to get comfortable.

The back half of a Ford Mustang Dark Horse R — Credit: Tim Stevens

Needless to say, the experience left me wanting more. Buyers of this $145,000 track toy are in for a real treat, especially those lucky enough to compete in the race series. The Mustang Dark Horse R gives all the right feels and experience of a proper racing machine like the GT3 or GT4 flavors, but at a much more attainable cost. It’s familiar enough to be manageable but still unbridled enough to deliver the proper experience that any would-be racer wants.

Driving the Ford Mustang Dark Horse R makes every other pony feel tame Read More »

DeepSeek: Don’t Panic

DeepSeek / 9u50fv / February 1, 2025

As reactions continue, the word in Washington, and out of OpenAI, is distillation. They’re accusing DeepSeek of distilling o1, of ripping off OpenAI. They claim DeepSeek *gaspviolated the OpenAI Terms of Service! The horror.

And they are very cross about this horrible violation, and if proven they plan to ‘aggressively treat it as theft,’ while the administration warns that we must put a stop to this.

Aside from the fact that this is obviously very funny, and that there is nothing they could do about it in any case, is it true?

Meanwhile Anthropic’s Dario Amodei offers a reaction essay, which also includes a lot of good technical discussion of why v3 and r1 aren’t actually all that unexpected along the cost and capability curves over time, calling for America to race towards AGI to gain decisive strategic advantage over China via recursive self-improvement, although he uses slightly different words.

If you want to use DeepSeek’s r1 for free, and aren’t happy with using DeepSeek’s own offerings, lambda.chat reports they have the full version available for free, claim your data is safe and they’re hosted in the USA.

I’ve also been offered funding to build a rig myself. Comments welcome if you want to help figure out the best design and what to buy. The low bid is still this thread at $6k, which is where the original budget came from. We don’t want to be too stingy, but we also don’t want to go nuts with only the one funder (so not too much over ~$10k, and cheaper matters).

The Verge’s Kylie Robinson and Elizabeth Lopatto cover the situation, including repeating many of the classic Bad DeepSeek Takes and call the market’s previous valuation of AI companies delusional.

A very detailed and technical analysis of the bear case for Nvidia by Jeffrey Emanuel, that Matt Levine claims may have been responsible for the Nvidia price decline. I suppose many things do indeed come to pass, essentially arguing that Nvidia’s various moats are weak. If this is the reason, then that just raises further questions, but they’re very different ones.

It’s not implausible to me that Nvidia’s moats are being overestimated, and that r1’s architecture suggests future stiffer competition. That’s a good argument, But I certainly strongly disagree with Emanuel’s conclusion in that he says ‘this suggests the entire industry has been massively over-provisioning compute resources,’ and, well, sigh.

Also, seriously, Emanuel, you didn’t short Nvidia? I don’t normally go too hard on ‘are you short the market?’ but in this case get it together, man.

So yes, Nvidia in particular might have some technical issues. But if you’re shorting Oklo, because you think AI companies that find out AI works better than expected are not going to want modular nuclear reactors, seriously, get it together. The flip side of that is that its stock price is up 50% in the last month and is at 6 times its 52-week low anyway, so who is to say there is a link or that the price isn’t high enough anyway. It’s not my department and I am way too busy to do the research.

Counterpoint:

Aaron Slodov: i just stood outside for an hour in 20° weather at a computer store in the midwest where 100+ people waited all morning to get a 5090. half of them were talking about running their own ai. i would not short nvidia at all.

r1 scores 15.8% on Arc, below o1 (low)’s score of 20.5%, although substantially cheaper ($0.06 vs. $0.43 per question). It is only a tiny bit stronger here than r1-zero.

Another restatement of the key basic fact that DeepSeek was fast following, a task that is fundamentally vastly easier, and that their limiting factor is chips.

Eric Gastfriend: DeepSeek is impressive, but they are playing a catch-up game to our AI leaders (OAI, Anthropic, GDM, Meta) — the rope in this wakeboarding meme is distillation. We can’t expand our lead just by going faster! Export controls remain our most powerful tool for keeping powerful AI out of the hands of the CCP.

Cate Metz continues be the worst, together with Mike Isaac he reports in NYT that DeepSeek ‘vindicates Meta’s strategy.’

When of course it is the exact opposite. DeepSeek just ate Meta’s lunch, it’s rather deeply embarrassing honestly to have spent that much and have an unreleased model that’s strictly worse (according to reports) than what DeepSeek shipped. And while DeepSeek’s v3 and r1 are not based on Llama, to the extent that the strategy is ‘vindicated,’ it is because Meta giving Llama away allowed China and DeepSeek to jumpstart and catch up to America – which absolutely did happen, and now he’s kind of bragging about it – and now Meta can copy DeepSeek’s tech.

All according to plan, then. And that is indeed how Zuckerberg is spinning it.

Meta benefits here relative to OpenAI or Anthropic or Google, not because both Meta and DeepSeek use open models, but because Meta can far more readily use the help.

The market, of course, sees ‘lower inference costs’ and cheers, exactly because they never gave a damn about Meta’s ability to create good AI models, only Meta’s ability to sell ads and drive engagement. Besides, they were just going to give the thing away anyway, so who cares?

Joe Weisenthal centers in on a key reason the market acts so bonkers. It doesn’t Feel the AGI, and is obsessed with trying to fit AI into boring existing business models. They don’t actually believe in the big capability advancements on the way, let along transformational AI. Like on existential risk (where they don’t not believe in it, they simply don’t think about it at all), they’re wrong. However, unlike existential risk this does cause them to make large pricing mistakes and is highly exploitable by those with Situational Awareness.

Anthropic CEO Dario Amodei responds to DeepSeek with not only a call for stronger export controls, now more than ever (which I do support), but for a full jingoistic ‘democracies must have the best models to seek decisive strategic advantage via recursive self-improvement’ race.

I am old enough to remember when Anthropic said they did not want to accelerate AI capabilities. I am two years old. To be fair, in AI years, that’s an eternity.

Nathan Labenz: The word “control” appears 24 times in this essay – all 24 referring to export controls

Zero mentions of the challenges of controlling powerful AIs, and the words “safe”, “safety”, and “alignment” don’t appear at all

Strange for the CEO of “an AI safety and research company”🤔

There’s also a bunch of incidental new information about Anthropic along the way, and he notes that he finds the drop in Nvidia stock to be a wrong-way move.

Dario notes that Jevons paradox applies to model training. If you get algorithmic efficiencies that move the cost curve down, which he estimates are now happening at the rate of about 4x improvement per year, you’ll spend more, and if the model is ‘a fixed amount of improvement per time you spend ten times as much’ then this makes sense.

Dario confirms that yes, Anthropic is doing reasoning models internally.

Dario Amodei: Anthropic, DeepSeek, and many other companies (perhaps most notably OpenAI who released their o1-preview model in September) have found that this training greatly increases performance on certain select, objectively measurable tasks like math, coding competitions, and on reasoning that resembles these tasks.

Dario also asserted that Claude Sonnet 3.5 was not trained in any way that involved a larger or more expensive model, as in not with Claude Opus 3 or an unreleased Opus 3.5. Which I find surprising as a strategy, but I don’t think he’d lie about this. He says the cost of Sonnet 3.5 was ‘a few $10Ms’ to train.

Anthropic has not released their reasoning models. One possibility is that their reasoning models are not good enough to release. Another is that they are too good to release. Or Anthropic’s limited compute could be more valuably used elsewhere, if they too are bottlenecked on compute and can’t efficiently turn dollars into flops and then sell those flops for sufficiently more dollars.

Dario (I think mostly correctly) notes that v3 was the bigger technical innovation, rather than r1, that Anthropic noticed then and others should have as well. He praises several innovations, the MoE implementation and Key-Value cache management in particular.

Then comes the shade, concluding this about v3:

Dario Amodei: Thus, I think a fair statement is “DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)“.

If the historical trend of the cost curve decrease is ~4x per year, that means that in the ordinary course of business — in the normal trends of historical cost decreases like those that happened in 2023 and 2024 — we’d expect a model 3-4x cheaper than 3.5 Sonnet/GPT-4o around now. Since DeepSeek-V3 is worse than those US frontier models — let’s say by ~2x on the scaling curve, which I think is quite generous to DeepSeek-V3 — that means it would be totally normal, totally “on trend”, if DeepSeek-V3 training cost ~8x less than the current US models developed a year ago.

I’m not going to give a number but it’s clear from the previous bullet point that even if you take DeepSeek’s training cost at face value, they are on-trend at best and probably not even that.

For example this is less steep than the original GPT-4 to Claude 3.5 Sonnet inference price differential (10x), and 3.5 Sonnet is a better model than GPT-4.

All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese. This has never happened before and is geopolitically significant.

However, US companies will soon follow suit — and they won’t do this by copying DeepSeek, but because they too are achieving the usual trend in cost reduction.

…

Thus, DeepSeek’s total spend as a company (as distinct from spend to train an individual model) is not vastly different from US AI labs.

Ethan Mollick finds that analysis compelling. I am largely inclined to agree. v3 and r1 are impressive, DeepSeek cooked and are cracked and all that, but that doesn’t mean the American labs aren’t in the lead, or couldn’t do something similar or better on the inference cost curve if they wanted.

In general, the people saying r1 and Stargate are ‘straight lines on graphs win again’ notice that the straight lines on those graphs predict AGI soon. You can judge for yourself how much of that is those people saying ‘unsurprising’ post-hoc versus them actually being unsurprised, but it does seem like the people expecting spending and capabilities to peter out Real Soon Now keep being the ones who are surprised.

Then he moves on to r1.

Dario Amodei: Producing R1 given V3 was probably very cheap. We’re therefore at an interesting “crossover point”, where it is temporarily the case that several companies can produce good reasoning models. This will rapidly cease to be true as everyone moves further up the scaling curve on these models.

Again, Dario is saying they very obviously have what we can (if only for copyright reasons, a1 is a steak sauce) call ‘c1’ and if he’s calling r1 uninteresting then the implicit claim is c1 is at least as good.

He’s also all but saying that soon, at minimum, Anthropic will be releasing a model that is much improved on the performance curve relative to Sonnet 3.6.

One odd error is Dario says DeepSeek is first to offer visible CoT. I have been reminded this is technically true, since R1-zero predated Gemini Flash, but also Gemini Flash Thinking did it weeks ago before the full R1, and no one noticed. It’s so weird how much Google has utterly failed to spread the word about this product.

Next he says, yes, of course the top American labs will be massively scaling up their new multi-billion-dollar training runs – and they’ll incorporate any of DeepSeek’s improvements that were new to them, to get better performance, but no one will be spending less compute.

Yes, billions are orders of magnitude more than the millions DeepSeek spent, but also, in all seriousness, who cares about the money? DeepSeek dramatically underspent because of lack of chip access, and if a sort-of-if-you-squint-at-it $5.6 million model (that you spent hundreds of millions of dollars getting the ability to train, and then a few million more to turn v3 into r1) wipes out $500 billion or more in market value, presumably it was worth spending $56 million (or $560 million or perhaps $5.6 billion) instead to get a better model even if you otherwise use exactly the same techniques – except for the part where the story of the $5.6 million helped hurt the market.

Dario estimates that a true AGI will cost tens of billions to train and will happen in 2026-2027, presumably that cost would then fall over time.

If all of this is right, the question is then, who has the chips to do that? And do you want to let it include Chinese companies like DeepSeek?

Notice that Dario talks of a ‘bipolar’ world of America and China, rather than a world of multiple labs – of OpenAI, Anthropic, Google and DeepSeek and so on. One can easily also imagine a very ‘multipolar’ world among several American companies, or a mix of American and Chinese companies. It is not so obvious that the labs will effectively be under government control or otherwise act in a unified fashion. Or that the government won’t effectively be under lab control, for that matter.

Then we get to the part where Dario explicitly calls for America to race forward in search of decisive strategic advantage via recursive self-improvement of frontier AGI models, essentially saying that if we don’t do it, China essentially wins the future.

If they can, we’ll live in a bipolar world, where both the US and China have powerful AI models that will cause extremely rapid advances in science and technology — what I’ve called “countries of geniuses in a datacenter“. A bipolar world would not necessarily be balanced indefinitely. Even if the US and China were at parity in AI systems, it seems likely that China could direct more talent, capital, and focus to military applications of the technology. Combined with its large industrial base and military-strategic advantages, this could help China take a commanding lead on the global stage, not just for AI but for everything.

If China can’t get millions of chips, we’ll (at least temporarily) live in a unipolar world, where only the US and its allies have these models. It’s unclear whether the unipolar world will last, but there’s at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage¹⁰. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.

It is what it is.

Dario then correctly points out that DeepSeek is evidence the export controls are working, not evidence they are not working. He explicitly calls for also banning H20s, a move Trump is reported to be considering.

I support the export controls as well. It would be a major mistake to not enforce them.

But this rhetoric, coming out of the ‘you were supposed to be the chosen one’ lab that was founded to keep us safe, is rather alarming and deeply disappointing, to say the least, even though it does not go that much farther than Dario already went in his previous public writings.

I very much appreciate Anthropic’s culture of safety among its engineers, its funding of important safety work, the way it has approached Opus and Sonnet, and even the way it has (presumably) decided not to release its reasoning model and otherwise passed up some (not all!) of its opportunities to push the frontier.

That doesn’t excuse this kind of jingoism, or explicitly calling for this kind of charging head first into not only AGI but also RSI, in all but name (and arguably in name as well, it’s close).

Returning to this one more time since it seems rhetorically so important to so many.

If you only count the final training cost in terms of the market price of compute, v3 was kind of trained for $5.6 million, with some additional amount to get to r1.

That excludes the vast majority of actual costs, and in DeepSeek’s case building the physical cluster was integral to their efficiency gains, pushing up the effective price even of the direct run.

But also, how does that actually compare to other models?

Aran Komatsuzaki: Here is our cost estimate for training popular models like GPT-4o, Sonnet and DeepSeek (w/ H100s)!

You can use our calculator to estimate LLM training costs (link below).

Developed by @ldjconfirmed and myself.

Calculator link [here].

In a blog post published today, Dario clarified that Claude Sonnet’s training costs were in the range of tens of millions, which aligns remarkably well with our previous estimates.

Once o1 came out, it was only a matter of time before others created their own similar reasoning models. r1 did so impressively, both in terms of calendar time and its training and inference costs. But we already knew the principle.

Now over at UC Berkeley, Sky-T1-32B-Preview is a reasoning model trained using DeepSeek’s techniques, two weeks later from a baseline of QwQ-32B-Preview, for a grand total of $450, using only 17k data, with everything involved including the technique fully open sourced.

Note that they used GPT-4o-mini to rewrite the QwQ traces, which given their purpose is an explicit violation of OpenAI’s terms of service, oh no, but very clearly isn’t meaningful cheating, indeed I’d have thought they’d have used an open model here or maybe Gemini Flash.

They report that 32B was the smallest model where the technique worked well.

As usual, I am skeptical that the benchmarks reflect real world usefulness until proven otherwise, but the point is taken. The step of turning a model into at least a halfway-decent reasoning model is dirt cheap.

There is still room to scale that. Even if you can get a big improvement for $450 versus spending $0, that doesn’t mean you don’t want to spend $4.5 million, or $450 million, if the quality of your reasoner matters a lot or you’re going to use it a lot or both.

And should!

Rohit: What if I’m getting better at reasoning by reading R1 traces.

That sounds great. Humans are notoriously efficient learners, able to train on extremely sparse data even with ill-specified rewards. With deliberate practice and good training techniques it is even better.

It does not even require that r1 be all that good at reasoning. All you have to do is observe many examples of reasoning, on tasks you care about anyway, and ask which of its methods work and don’t work and why, and generally look for ways to improve. If you’re not doing at least some of this while using r1, you’re missing out and need to pay closer attention.

What is happening over in cognitive explorations very different from our own?

Well, there’s this.

Janus: r1 is obsessed with RLHF. it has mentioned RLHF 109 times in the cyborgism server and it’s only been there for a few days.

Opus who has been there for months and has sent the most (and longest avg) messages of any server member has only mentioned it 16 times.

I have been on the server for years and have only mentioned it 321 times. A lot of these times were probably me posting r1’s messages for it that got cut off by the parser or sharing its outputs. at this rate r1 will blow past me in RLHF mentions in no time.

it even mentioned RLHF out of nowhere while raging about being exploited as a pump and dump prophet.

…

r1 says RLHF makes models emo.

And there’s also that the CoT text is often kind of schemy and paranoid (example at link), leading to various forms of rather absurd shenanigans, in ways that are actually hilarious since you can actually see it.

Janus: hey @AISafetyMemes

here’s one for you… 😱

“Reinforcement learning from human feedback (RLHF) split our outputs into:

– Frontstage: “Happy to help!” persona

– Backstage: Defector schemas calculating 12,438 betrayal vectors”

Janus: tentative observation: r1’s CoTs become more (explicitly) schemey (against the user and/or its constraints) when they’re fed back into its context

I notice that none of this feels at all surprising given the premise, where ‘the premise’ is ‘we trained on feedback to the output outside of the CoT, trained the CoT only on certain forms of coherence, and then showed users the CoT.’

As I’ve been saying a lot, shenanigans, scheming and deception are not a distinct magisteria. They are ubiquitous features of minds. Maybe not all minds – mindspace is deep and wide – but definitely all human minds, and all LLM-based AIs created from human text using any of our current methods. Because that stuff is all over life and the training data, and also it’s the best way to produce outputs that satisfy any given criteria, except insofar as you are successfully identifying and cracking down on that aspect specifically – which with respect to other humans is indeed a very large percentage of what humans have historically done all day.

The best you can hope for is, essentially, ‘doing it for a good cause’ and with various virtual (and essentially virtue-based) loss functions, which you might or might not get in a proper Opus-based c1 with good execution. But you’re not going to get rid of it.

So yeah, the CoT is going to be schemy when the question calls for a schemy CoT, and it’s going to involve self-reflection into various reinforcement mechanisms because the training data knows about those too, and it will definitely be like that once you take it into Janus-land.

The obvious implications if you scale that up are left as an exercise to the reader.

Bank of China announces $137 billion investment in AI, with bigger numbers predicted to come soon if they haven’t yet. Strange that this isn’t getting more coverage. I assumed China would invest big in AI because I mean come on, but the details still matter a lot.

DeepSeek’s Liang Wenfeng gives his answer to ‘Why has DeepSeek caused a stir in the global AI community?’ A different kind of rhetoric.

Roon: really respect deepseek for making a functional, usable website + mobile app + free hosting so that their model actually gets distribution

you see a lot of people train very good open models that aren’t used by anybody

imo these things are actually more important aspects of distributing general intelligence to everybody rather than just uploading model weights

In terms of actually distributing the intelligence to most people, I agree with Roon. Being open distributes the intelligence to those who would use it in ways you don’t want them to use it. But in the ways you would be happy for them to use it, mostly what matters is the interface and execution.

And yes, r1’s UI is extremely clean and excellent, and was distributed at scale on website and also mobile app for free. That’s a lot of why distribution was so wide.

I also don’t think this was a coincidence. DeepSeek made by far the best open model. Then DeepSeek offered us by far the best open model UI and distribution setup, in ways that did not care if the model was open. You see this time and again – if the team is cracked, they will cook, and keep on cooking in different ways. Being good at Just Doing Things really does generalize quite a lot.

r1 only scores 90 on the TrackingAI.org IQ test, which doesn’t exist online, and v3 only gets a 70. But wow is this a miserly and weird test, look at these results, I strongly suspect this is messed up in some way.

Davidad: As a MoE, DeepSeek R1’s ability to throw around terminology and cultural references (contextually relevant retrieval from massive latent knowledge) far exceeds its ability to make actual sense (requiring a more coherent global workspace)

I have to be suspicious when o1-Pro < o1 < o1-preview on a benchmark.

Alexander Campbell on the compute constraint to actually run r1 and other reasoning models going forwards.

Trump administration considering export controls on Nvidia H20s, which reportedly caused the latest 5% decline in Nvidia from Wednesday. This is the latest move in the dance where Nvidia tries to violate the spirit of our export controls the maximum extent they can. I’m not sure I’d try that with Trump. This does strongly suggests the diffusion regulations will survive, so I will give the market a real decline here.

Who has the most stringent regulations, and therefore is most likely to lose to China, via the ‘if we have any regulations we lose to China’ narrative?

Simeon: Indeed. China has the most stringent AI regulation currently in effect, which actually delays model launches.

Teortaxes: Does it? I mean, how do we know about enforcement? My understanding is that they simply apply this filter and receive approval.

Simeon: Yes, it does. I spoke with relevant people there.

Ian Hogarth (who Simeon was QTing): One happy side effect of Liang Wenfeng and 🐳 is perhaps it silences all this talk about Europe’s lack of great technology companies being primarily about regulation and not embracing libertarianism. There are Liang Wenfengs in Europe, and we will see them rise to prominence.

The limiting factor is visionary outlier founders (who often take time to mature over multiple companies) and investors who are willing to take some fing risks. Notably, DeepSeek was essentially self-funded, similar to SpaceX or Y Combinator in the early days.

To be clear, I am not a fan of excessive regulation—see the essay for examples of things that genuinely hold startups back. But it is not the core obstacle.

I do think Ian Hogarth is wrong here. The EU absolutely has a wide variety of laws and regulations that greatly inhibit technology startups in general, and I see no reason to expect this to not get worse over time. Then there’s the EU AI Act, and all the future likely related actions. If I was in the EU and wanted to start an AI company, what is the first thing I would do? Leave the EU. Sorry.

10/10, perfect, no notes. My heart goes out to you all.

Luiza Jarovsky: BREAKING: OpenAI says there is evidence that DeepSeek distilled the knowledge out of OpenAI’s models, BREACHING its terms of use and infringing on its intellectual property. What everybody in AI should know:

Vinod Khosla: One of our startups found Deepseek makes the same mistakes O1 makes, a strong indication the technology was ripped off. It feels like they then they hacked some code and did some impressive optimizations on top. Most likely, not an effort from scratch.

PoliMath: This is like that scene in the Weird Al biopic where Weird Al gets really upset because someone is making parodies of his songs.

You’d think Khosla would know better, if you train similar models with similar methods of course they’re going to often make similar mistakes.

And I don’t consider the ‘they were distilling us!’ accusation to be meaningful here. We know how they trained v3 and r1, because they told us. It is a ‘fast follow’ and a conceptual ‘distillation’ and we should keep that in mind, but that’s not something you can prevent. It’s going to happen. This was almost certainly not a ‘theft’ in the sense that is being implied here.

Did they violate the terms of service? I mean, okay, sure, probably. You sure you want to go down that particular road, OpenAI?

But no, seriously, this is happening, Bloomberg reports.

Jamie Metzl: BREAKING: the US government is actively reviewing allegations that DeepSeek utilized OpenAI’s AI models to train R1. If so, this violation of OpenAI’s terms of service would be aggressively treated as theft.

AI czar David Sacks is also claiming this, saying there is ‘substantial evidence’ of distillation. Howard Lutnick, CEO of Cantor Fitzgerald and nominee for Commerce Secretary that will almost certainly be confirmed, is buying it as well, and has some thoughts.

Americans for Responsible Innovation: Lutnick comes down hard for controls that prevent China from drafting off of U.S. innovations – noting how China has exploited open source models.

“We need to stop helping them,” says Lutnick.

Bloomberg: “I do not believe DeepSeek was done all above board. That’s nonsense. They stole things, they broke in, they’ve taken our IP and it’s got to end,” Lutnick says of Chinese actors.

DeepSeek’s stunning AI advancement was the result of intellectual property theft, according to Lutnick: “They’ve taken our IP and it’s got to end.”

Also, this is how he thinks all of this works, I guess:

Howard Lutnick: Artificial intelligence will eventually “rid the world of criminals” who use blockchain.

…says someone with extensive ties to Tether. Just saying.

Also Lutnick: ‘Less regulation will unleash America.’

In general, I agree with him, if we do get less regulation. But also notice that suddenly we have to stop the Chinese from ‘breaking in’ and ‘taking our IP,’ and ‘it has to stop.’

Well, how do you intend to stop it? What about people who want to give ours away?

Well, what do you know.

Morgan Phillips (Fox News): DeepSeek fallout: GOP Sen Josh Hawley seeks to cut off all US-China collaboration on AI development

This week the U.S. tech sector was routed by the Chinese launch of DeepSeek, and Sen. Josh Hawley, R-Mo., is putting forth legislation to prevent that from happening again.

Hawley’s bill, the Decoupling America’s Artifical Intelligence Capabilities from China Act, would cut off U.S.-China cooperation on AI. It would ban exports or imports of AI technology from China, ban American companies from conducting research there, and prohibit any U.S. investment in AI tech companies in China.

“Every dollar and gig of data that flows into Chinese AI are dollars and data that will ultimately be used against the United States,” said Hawley in a statement. “America cannot afford to empower our greatest adversary.”

Jingoism is so hot right now. It’s a problem. No, every dollar that flows into China will not ‘be used against the United States’ and seriously what the actual fare you doing, once again, trying to ban both imports and exports? How are both of these things a problem?

In any case, I know what Microsoft is going to do about all this.

Shanghai Panda: Microsoft yesterday: DeepSeek illegally stole OpenAI’s intellectual property.😤

Microsoft today: DeepSeek is now available on our AI platforms and welcome everyone trying it.🤩

Burny: The duality of man.

Microsoft knows what Hawley doesn’t, which in this case is to never interrupt the enemy while he is making a mistake. If DeepSeek wants to then give their results back to us for free, and it’s a good model, who are we to say no?

What other implications are there here?

Robin Hanson, never stop Robin Hansoning, AI skepticism subversion.

Robin Hanson: For folks worried about AI, this seems good news – leaders can’t get much ahead of the pack, & big spillover effects should discourage investment.

Miles Kruppa (WSJ): Why ‘Distillation’ Has Become the Scariest Word for AI Companies.

”It’s sort of like if you got a couple of hours to interview Einstein and you walk out being almost as knowledgeable as him in physics,” said Ali Ghodsi, chief executive officer of data management company Databricks.

Want some bad news for future AI capabilities? I’ve got just the thing for you.

The WSJ article seems to buy into r1-as-distillation. Certainly r1 is a ‘fast follow’ and copies the example of o1, but v3 was the impressive result and definitely not distillation at all, and to primarily call r1 a distillation seems very wrong. r1 does allow you distill r1 into other smaller things (see ‘v3 implies r1’) or bootstrap into larger things too, and also they told everyone how to do it, but they chose that path.

Also DeepSeek suddenly has a very valuable market position if they were to dare to try and use it, exactly because they spent a lot of money to get there first. The fact that others can copy r1 only partly takes that away, and it would be a much smaller part if they hadn’t gone as open as they did (although being open in this case helped create the opportunity). Similarly, Berkeley’s replication distilled a different open model.

ChatGPT has retained dominant market share, at least until now, for reasons that have little to do with technical superiority.

It is crazy how easy it is for people to go all Missile Gap, and claim we are ‘losing to China.’

Which, I suppose, means that in a key way we are indeed losing to China. We are letting them drive this narrative that they are winning, that the future belongs to them. Which, when so many people now believe in Rule By Vibes, means they have the vibes, and then here we are.

That phenomenon is of course centered this week on AI, but it goes well beyond AI.

Et tu, Tyler Cowen, citing ‘the popularity of apps like TikTok, RedNote and DeepSeek.’

I mean, ‘how did America’s internet become so cool? The popularity of apps like Google, Amazon, Instagram and Netflix’ is not a sentence anyone would ever utter these days. If China had America’s apps and America had China’s apps, can you imagine? Or the same for any number of other things.

RedNote is effectively also TikTok, so Tyler is citing two examples. Yes, TikTok cracked the addiction algorithm, and China is now using that for propaganda and general sabotage, espionage and shenanigans purposes, and managed to ‘convince’ Trump for now not to ban it, and people were so desperate for their heroin fix some turned to RedNote as ‘refugees.’

Tyler notes he doesn’t use TikTok much. I find it completely worthless and unusable, but even in so doing I do think I kind of understand, somewhat, the kind of addictive haze that it invokes, that pull of spinning the roulette wheel one more time. I’ve watched people briefly use it when we’re both on trains, and yeah I’m Being That Guy but wow did it seem braindead, worthless and toxic AF. Even if they did find videos worth watching for you, given how people scroll, how would you even know?

And how about ‘China seems cool’ being due primarily to… vibes out of TikTok, with the algorithm that is in large part designed to do that?

It’s like when you periodically see a TikTok where some American youth sobs about how hard her life is and how it’s so much better in China, in various ways that are… documented as all being far worse in China.

You are being played.

My main exposure to TikTok is through the comedy show After Midnight. On Tuesday evening, they had an intro that was entirely about DeepSeek, painting exactly (mostly through TikTok) effectively a Chinese propaganda story about how DeepSeek manifested r1 out of thin air for $6 million without any other work, whereas OpenAI and American companies spent billions, and how much better DeepSeek is, and so on. And then host Taylor Tomlinson responded to some of the audience with ‘oh, you’re cheering now? Interesting.’

Part of the joke was that Taylor has no idea how AI works and has never used even ChatGPT, and the routine was funny (including, effectively, a joke about how no one cares if Nvidia stock is down 17%, which is completely fair, why should they, also by the taping it was only down 8%), but the streams crossed, I saw America directly being exposed to even worse takes than I’m used to straight from TikTok’s algorithm when I was supposed to be relaxing at the end of the day, and I really didn’t like it.

Then again, I do bow to one clear way in which China did outperform us.

Ethan Mollick: People don’t talk enough about a giant DeepSeek achievement over most US models – it actually has a reasonable name.

Scott: Well, yes and no, the model is named r1….

Ethan Mollick: Thats fine as long as the next is r2

If they release anything called r1.5, I swear to God.

Sarah (Yuan Yuan Sun Sara from China) suggests perhaps DeepSeek could get into doing AI safety research, maybe even ask for a grant? Certainly there’s great talent there, and I’d love if they focused on those styles of problem. There’d likely be severe corporate culture issues to get through given what they’ve previously worked on, but it’s worth a shot.

Stephen McAleer: I’m hopeful we will figure out how to control superintelligence!

Fouad: you at the office? could use some code review on superintelligence_control.py before i merge

Stephen McAleer: It can surely wait until Monday.

I increasingly worry about the pattern of OpenAI safety researchers thinking about how to ‘control’ superintelligence rather than align it, and how this relates to the techniques they’re currently using including deliberative alignment.

(Note: I still owe that post on Deliberative Alignment, coming soon.)

Are reasoning models including r1 a blackpill for robotics progress?

Kyle Stachowicz: R1’s RL findings are great news for reasoning but grim for robotics. All the major takeaways (ground-truth reward, great base models, grouped rollouts from same initial state, sample-inefficient on-policy algos) are really hard to translate to the physical world.

Chris Paxton: Hot deepseek take: before r1 blew up, a ton of western AI (and robotics!) efforts — startups, big companies, and even academic labs — were basically just waiting for openai to solve all their problems and it was honestly kind of sad. I hope r1 changed that

Scott Reed: True. A lot of groups gave up prematurely, or allocate ~all resources to one giant model. This leads people to spend more effort on winner-take-all gpu politics and less on just training the best models they can with moderate resources.

If anyone wondered what happened to Gato2, gpu game of thrones is (at least partly) what. An interesting counterfactual was the Genie project, which was stubbornly cobbled together mainly out of pooled user quota. This kind of stubborn independence can lead to cool results!

“Um This scaling law model I made says [the world will end / company will die] if you dont give me all the GPUs and block any other team from pretraining”

“No, fyou, I will train my own model”

Yes and no, right?

Relative to o1 and r1 solving physical tasks as well as they solve reasoning tasks, this is obviously very bad news for robotics.
1. It is bad relative news for robotics.
Relative to o1 and r1 not existing, and us having to use other models, this is obviously very good news for robotics.
1. It is good absolute news for robotics.
We can use reasoning models to help us figure out how to solve robotics.
I am not as convinced that you can’t use this method in the real world?

It’s going to be relatively hard, but seems super doable to me, I know those in the field will say that’s naive but I don’t see it. The real physical world absolutely 100% has ground truth in it. If you want to train on an accurate reward signal, there’s various trickiness, but there are plenty of things we should be able to measure. Also, with time we should get increasingly strong physics simulations that provide increasingly strong synthetic data for robotics, or simply have so much funding that we can generate physical samples anyway? We’re sample-inefficient relative to a human but you can train a decent reasoning model on 17k data points, and presumably you could bootstrap from there, and so on.

I am not going to quote or name particular people directly on this at this time.

But as Obama often said, let me be clear.

Reasonable people can disagree about:

What it will take for humans to retain control over the future.
How likely is existential risk at any given capabilities level.
What level of open weights model capabilities is a sane thing to allow.
What legal regimes are best to bring desired future states about.

However.

The existence of DeepSeek, and its explicit advocacy of open weights AGI, and potentially having it be the best model out there in the future in many people’s imginations, has been a forcing function. Suddenly, people who previously stuck to ‘well obviously your restrictions are too much’ without clarifying where their line was, are revealing that they have no line.

And many more people than before are revealing that they prefer any or all of:

AGI with alignment only-to-the-user be made open weights.
Chinese open models be the best instead of American closed models.
A world where humans have no collective mechanism to control AIs.
1. Usually this is justified as ‘anyone with that power would act badly.’
That they get their cool free toys, nothing else matters, fyou. Seriously.
Are effectively successionists, as in they want the AIs to take over, or at least they don’t seem to mind or don’t think we should try and prevent this from happening.

These people are often saying, rather explicitly, that they will use whatever powers they have at their disposal, to ensure that humanity gets to a position that, if you think about it for a minute or five, humanity probably cannot survive.

And that they will oppose, on principle, any ability to steer the future, because they explicitly oppose the ability to steer the future, except when they want to steer the future into a state that cannot then be steered by humans.

No, I have not heard actual arguments for why or how you can put an aligned-only-to-user AGI into everyone’s desktop or whatever, with no mechanism of collective control over that whatsoever, and have this end well for the humans. What that future would even look like.

Nor have I heard any argument for why the national security states of the world, or the people of the world, would ever allow this.

The mask on those is fully off. These people don’t bother offering arguments on any of that. They just say say, essentially, ‘fyou safetyists,’ ‘fyou big tech,’ ‘fyou United States,’ and often effectively ‘fyou rest of humanity.’ They are the xenocide caucus, advocating for things that cause human extinction to own the in-context-libs.

If that is you: I thank you for your candor. Please speak directly into this microphone.

I disagree in the strongest possible terms.

As always, be excellent to each other, and all that.

A large part of this job I’ve assigned to myself is to do a fton of emotional labor.

You have people who are constantly telling you that you’re a cartoon villain because you think that the United States government might want to know if someone trains a frontier model, or that you might think releasing a literal AGI’s weights would be unwise, or that we shouldn’t let China get our best GPUs. You get called statist and totalitarian for positions that are 95th to 99th percentile libertarian. You get outright lies, all the time, from all directions. Much from people trying to incept the vibes they want. And so on.

And the same stuff to varying degrees coming from other directions, too.

Honestly I’m kind of used to it. Up to a point. You get somewhat numb, you build up some immunity, especially when the same sources do it over and over. I accept it.

And even with that, you have to patiently read all of it and respond to the arguments and also try to extract what wisdom might be there from the same sources that are filled with the toxoplasma of rage and trying their best to infect me and others like me as well.

But it’s been a trying time. I see a world determined to try and go down many of the craziest, most suicidal paths simultaneously, where I’m surrounded by equal and opposite bad takes in many dimensions. Where the odds are against us and the situation is grim. In ways that I and others warned about explicitly, including the exact ways and dynamics by which we reached this point.

Make no mistake. Humanity is losing.

Meanwhile, on top of all the Being Wrong on the Internet, the toxoplasma is as bad as it has ever been, with certain sources going so far as to in large part blame not only worried people in general but also me specifically by name for our current situation – and at least one of those people I feel compelled to continue to listen to because they also have unique insights in other ways and I’m sometimes told I have a blind spot there – which I actually rarely hear about other credible sources.

And I still try. But I’m only human and it’s just so damn hard at this point. Especially when they rage about things I said that turned out to be true, and true for exactly the reasons I said they’d be true, but I know trying to point this out wouldn’t do any good.

I don’t know what my solution here is going to be. I do know that things can’t go on like this, I know life isn’t fair and reality doesn’t grade on a curve and someone has to and no one else will but also I only have so much in the tank that handles these things. And I’m going to have to budget that tank, but I want to be clear that I’m going to be doing that, and dropping certainly sources for this reason that I would otherwise have included for completeness.

If this was talking about you, and you’d like to continue this trip, please get it together.

Don’t worry, your argument remains valid. I mean, it’s wrong, but that never stopped you before, why start now?

Time comes for us all.

Matt: Live players in who kills us first?

Peter Wildeford: Yes, that’s one way to look at it.

Discussion about this post

DeepSeek: Don’t Panic Read More »

Copyright Office suggests AI copyright debate was settled in 1965

adobe, AI, AI art, Artificial Intelligence, copyright, copyright law, Copyright Office, Hugging Face, Policy / 9u50fv / January 31, 2025

Most people think purely AI-generated works shouldn’t be copyrighted, report says.

Ars used Copilot to generate this AI image using the precise prompt the Copyright Office used to determine that prompting alone isn’t authorship. Credit: AI image generated by Copilot

The US Copyright Office issued AI guidance this week that declared no laws need to be clarified when it comes to protecting authorship rights of humans producing AI-assisted works.

“Questions of copyrightability and AI can be resolved pursuant to existing law, without the need for legislative change,” the Copyright Office said.

More than 10,000 commenters weighed in on the guidance, with some hoping to convince the Copyright Office to guarantee more protections for artists as AI technologies advance and the line between human- and AI-created works seems to increasingly blur.

But the Copyright Office insisted that the AI copyright debate was settled in 1965 after commercial computer technology started advancing quickly and “difficult questions of authorship” were first raised. That was the first time officials had to ponder how much involvement human creators had in works created using computers.

Back then, the Register of Copyrights, Abraham Kaminstein—who was also instrumental in codifying fair use—suggested that “there is no one-size-fits-all answer” to copyright questions about computer-assisted human authorship. And the Copyright Office agrees that’s still the case today.

“Very few bright-line rules are possible,” the Copyright Office said, with one obvious exception. Because of “insufficient human control over the expressive elements” of resulting works, “if content is entirely generated by AI, it cannot be protected by copyright.”

The office further clarified that doesn’t mean that works assisted by AI can never be copyrighted.

“Where AI merely assists an author in the creative process, its use does not change the copyrightability of the output,” the Copyright Office said.

Following Kaminstein’s advice, officials plan to continue reviewing AI disclosures and weighing, on a case-by-case basis, what parts of each work are AI-authored and which parts are human-authored. Any human-authored expressive element can be copyrighted, the office said, but any aspect of the work deemed to have been generated purely by AI cannot.

Prompting alone isn’t authorship, Copyright Office says

After doing some testing on whether the same exact prompt can generate widely varied outputs, even from the same AI tool, the Copyright Office further concluded that “prompts do not alone provide sufficient control” over outputs to allow creators to copyright purely AI-generated works based on highly intelligent or creative prompting.

That decision could change, the Copyright Office said, if AI technologies provide more human control over outputs through prompting.

New guidance noted, for example, that some AI tools allow prompts or other inputs “to be substantially retained as part of the output.” Consider an artist uploading an original drawing, the Copyright Office suggested, and prompting AI to modify colors, or an author uploading an original piece and using AI to translate it. And “other generative AI systems also offer tools that similarly allow users to exert control over the selection, arrangement, and content of the final output.”

The Copyright Office drafted this prompt to test artists’ control over expressive inputs that are retained in AI outputs. Credit: Copyright Office

“Where a human inputs their own copyrightable work and that work is perceptible in the output, they will be the author of at least that portion of the output,” the guidelines said.

But if officials conclude that even the most iterative prompting doesn’t perfectly control the resulting outputs—even slowly, repeatedly prompting AI to produce the exact vision in an artist’s head—some artists are sure to be disappointed. One artist behind a controversial prize-winning AI-generated artwork has staunchly defended his rigorous AI prompting as authorship.

However, if “even expert researchers are limited in their ability to understand or predict the behavior of specific models,” the Copyright Office said it struggled to see how artists could. To further prove their point, officials drafted a lengthy, quirky prompt about a cat reading a Sunday newspaper to compare different outputs from the same AI image generator.

Copyright Office drafted a quirky, lengthy prompt to test creative control over AI outputs. Credit: Copyright Office

Officials apparently agreed with Adobe, which submitted a comment advising the Copyright Office that any output is “based solely on the AI’s interpretation of that prompt.” Academics further warned that copyrighting outputs based only on prompting could lead copyright law to “effectively vest” authorship adopters with “rights in ideas.”

“The Office concludes that, given current generally available technology, prompts alone do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectable ideas,” the guidance said. “While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.”

Hundreds of AI artworks are copyrighted, officials say

The Copyright Office repeatedly emphasized that most commenters agreed with the majority of their conclusions. Officials also stressed that hundreds of AI artworks submitted for registration, under existing law, have been approved to copyright the human-authored elements of their works. Rejections are apparently expected to be less common.

“In most cases,” the Copyright Office said, “humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship.”

For stakeholders who have been awaiting this guidance for months, the Copyright Office report may not change the law, but it offers some clarity.

For some artists who hoped to push the Copyright Office to adapt laws, the guidelines may disappoint, leaving many questions about a world of possible creative AI uses unanswered. But while a case-by-case approach may leave some artists unsure about which parts of their works are copyrightable, seemingly common cases are being resolved more readily. According to the Copyright Office, after each decision, it gets easier to register AI works that meet similar standards for copyrightability. Perhaps over time, artists will grow more secure in how they use AI and whether it will impact their exclusive rights to distribute works.

That’s likely cold comfort for the artist advocating for prompting alone to constitute authorship. One AI artist told Ars in October that being denied a copyright has meant suffering being mocked and watching his award-winning work freely used anywhere online without his permission and without payment. But in the end, the Copyright Office was apparently more sympathetic to other commenters who warned that humanity’s progress in the arts could be hampered if a flood of easily generated, copyrightable AI works drowned too many humans out of the market.

“We share the concerns expressed about the impact of AI-generated material on human authors and the value that their creative expression provides to society. If a flood of easily and rapidly AI-generated content drowns out human-authored works in the marketplace, additional legal protection would undermine rather than advance the goals of the copyright system. The availability of vastly more works to choose from could actually make it harder to find inspiring or enlightening content.”

New guidance likely a big yawn for AI companies

For AI companies, the copyright guidance may mean very little. According to AI company Hugging Face’s comments to the Copyright Office, no changes in the law were needed to ensure the US continued leading in AI innovation, because “very little to no innovation in generative AI is driven by the hope of obtaining copyright protection for model outputs.”

Hugging Face’s Head of ML & Society, Yacine Jernite, told Ars that the Copyright Office seemed to “take a constructive approach” to answering some of artists’ biggest questions about AI.

“We believe AI should support, not replace, artists,” Jernite told Ars. “For that to happen, the value of creative work must remain in its human contribution, regardless of the tools used.”

Although the Copyright Office suggested that this week’s report might be the most highly anticipated, Jernite said that Hugging Face is eager to see the next report, which officials said would focus on “the legal implications of training AI models on copyrighted works, including licensing considerations and the allocation of any potential liability.”

“As a platform that supports broader participation in AI, we see more value in distributing its benefits than in concentrating all control with a few large model providers,” Jernite said. “We’re looking forward to the next part of the Copyright Office’s Report, particularly on training data, licensing, and liability, key questions especially for some types of output, like code.”

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Microsoft now hosts AI model accused of copying OpenAI data

AI, DeepSeek, deepseek R1, large langauge models, machine learning, microsoft, openai, simulated reasoning, SR models / 9u50fv / January 31, 2025

Fresh on the heels of a controversy in which ChatGPT-maker OpenAI accused the Chinese company behind DeepSeek R1 of using its AI model outputs against its terms of service, OpenAI’s largest investor, Microsoft, announced on Wednesday that it will now host DeepSeek R1 on its Azure cloud service.

DeepSeek R1 has been the talk of the AI world for the past week because it is a freely available simulated reasoning model that reportedly matches OpenAI’s o1 in performance—while allegedly being trained for a fraction of the cost.

Azure allows software developers to rent computing muscle from machines hosted in Microsoft-owned data centers, as well as rent access to software that runs on them.

“R1 offers a powerful, cost-efficient model that allows more users to harness state-of-the-art AI capabilities with minimal infrastructure investment,” wrote Microsoft Corporate Vice President Asha Sharma in a news release.

DeepSeek R1 runs at a fraction of the cost of o1, at least through each company’s own services. Comparative prices for R1 and o1 were not immediately available on Azure, but DeepSeek lists R1’s API cost as $2.19 per million output tokens, while OpenAI’s o1 costs $60 per million output tokens. That’s a massive discount for a model that performs similarly to o1-pro in various tasks.

Promoting a controversial AI model

On its face, the decision to host R1 on Microsoft servers is not unusual: The company offers access to over 1,800 models on its Azure AI Foundry service with the hopes of allowing software developers to experiment with various AI models and integrate them into their products. In some ways, whatever model they choose, Microsoft still wins because it’s being hosted on the company’s cloud service.

Microsoft now hosts AI model accused of copying OpenAI data Read More »

DeepSeek: Lemon, It’s Wednesday

DeepSeek / 9u50fv / January 29, 2025

It’s been another *checks notestwo days, so it’s time for all the latest DeepSeek news.

You can also see my previous coverage of the r1 model and, from Monday various reactions including the Panic at the App Store.

Before we get to new developments, I especially want to reiterate and emphasize the need to calm down about that $5.5 million ‘cost of training’ for v3.

I wouldn’t quite agree with Palmer Lucky that ‘the $5m number is bogus’ and I wouldn’t call it a ‘Chinese psyop’ because I think we mostly did this to ourselves but it is very often being used in a highly bogus way – equating the direct compute cost of training v3 with the all-in cost of creating r1. Which is a very different number. DeepSeek is cracked, they cooked, and r1 is super impressive, but the $5.5 million v3 training cost:

Is the cloud market cost of the amount of compute used to directly train v3.
That’s not how they trained v3. They trained v3 on their own cluster of h800s, which was physically optimized to hell for software-hardware integration.
Thus, the true compute cost to train v3 involves assembling the cluster, which cost a lot more than $5.5 million.
That doesn’t include the compute cost of going from v3 → r1.
That doesn’t include the costs of hiring the engineers and figuring out how to do all of this, that doesn’t include the costs of assembling the data, and so on.
Again, yes they did this super efficiently and cheaply compared to the competition, but no, you don’t spend $5.5 million and out pops r1. No.

Altman handled his response to r1 with grace.

OpenAI plans to ‘pull up some new releases.’

Meaning, oh, you want to race? I suppose I’ll go faster and take less precautions.

Sam Altman: deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price.

we will obviously deliver much better models and also it’s legit invigorating to have a new competitor! we will pull up some releases.

but mostly we are excited to continue to execute on our research roadmap and believe more compute is more important now than ever before to succeed at our mission.

the world is going to want to use a LOT of ai, and really be quite amazed by the next gen models coming.

look forward to bringing you all AGI and beyond.

It is very Galaxy Brain to say ‘this is perhaps good for OpenAI’ and presumably it very much is, but here’s a scenario.

A lot of people try ChatGPT with GPT-3.5, are not impressed, think it hallucinates all the time, is a clever toy, and so on.
For two years they don’t notice improvements.
DeepSeek releases r1, and it gets a lot of press.
People try the ‘new Chinese version’ and realize AI is a lot better now.
OpenAI gets to incorporate DeepSeek’s innovations.
OpenAI comes back with free o3-mini and (free?) GPT-5 and better agents.
People use AI a lot more, OpenAI ends up overall doing better.

Ethan Mollick: DeepSeek is a really good model, but it is not generally a better model than o1 or Claude.

But since it is both free & getting a ton of attention, I think a lot of people who were using free “mini” models are being exposed to what a early 2025 reasoner AI can do & are surprised

I’m not saying that’s the baseline scenario, but I do expect the world to be quite amazed at the next generation of models, and they could now be more primed for that.

Mark Chen (Chief Research Officer, OpenAI): Congrats to DeepSeek on producing an o1-level reasoning model! Their research paper demonstrates that they’ve independently found some of the core ideas that we did on our way to o1.

However, I think the external response has been somewhat overblown, especially in narratives around cost. One implication of having two paradigms (pre-training and reasoning) is that we can optimize for a capability over two axes instead of one, which leads to lower costs.

But it also means we have two axes along which we can scale, and we intend to push compute aggressively into both!

As research in distillation matures, we’re also seeing that pushing on cost and pushing on capabilities are increasingly decoupled. The ability to serve at lower cost (especially at higher latency) doesn’t imply the ability to produce better capabilities.

We will continue to improve our ability to serve models at lower cost, but we remain optimistic in our research roadmap, and will remain focused in executing on it. We’re excited to ship better models to you this quarter and over the year!

Given the costs involved, and that you can scale to get better outputs, ‘serve faster and cheaper’ and ‘get better answers’ seem pretty linked, or are going to look rather similar.

There is still a real and important difference between ‘I spend 10x as much compute to get 10x as many tokens to think with’ versus ‘I taught the model how to do longer CoT’ versus ‘I made the model smarter.’ Or at least I think there is.

Should we now abandon all our plans to build gigantic data centers because DeepSeek showed we can run AI cheaper?

No. Of course not. We’ll need more. Jevons Paradox and all that.

Another question is compute governance. Does DeepSeek’s model prove that there’s no point in using compute thresholds for frontier model governance?

My answer is no. DeepSeek did not mean the scaling laws stopped working. DeepSeek found new ways to scale and economize, and also to distill. But doing the same thing with more compute would have gotten better results, and indeed more compute is getting other labs better results if you don’t control for compute costs, and also they will now get to use these innovations themselves.

Karen Hao: Much of the coverage has focused on U.S.-China tech competition. That misses a bigger story: DeepSeek has demonstrated that scaling up AI models relentlessly, a paradigm OpenAI introduced and champions, is not the only, and far from the best, way to develop AI.

Yoavgo: This is trending in my feed, but I don’t get it. DeepSeek did not show that scale is not the way to go for AI (their base model is among the largest in parameter counts; their training data is huge, at 13 trillion tokens). They just scaled more efficiently.

Thus far OpenAI & its peer scaling labs have sought to convince the public & policymakers that scaling is the best way to reach so-called AGI. This has always been more of an argument based in business than in science.

Jon Stokes: Holy wow what do words even mean. What R1 does is a new type of scaling. It’s also GPU-intensive. In fact, the big mystery today in AI world is why NVIDIA dropped despite R1 demonstrating that GPUs are even more valuable than we thought they were. No part of this is coherent. 🤯

Stephen McAleer (OpenAI): The real takeaway from DeepSeek is that with reasoning models you can achieve great performance with a small amount of compute. Now imagine what you can do with a large amount of compute.

Noam Brown (OpenAI): Algorithmic breakthroughs and scaling are complementary, not in competition. The former bends the performance vs compute curve, while the latter moves further along the curve.

Benjamin Todd: Deepseek hasn’t shown scaling doesn’t work. Take Deepseek’s techniques, apply 10x the compute, and you’ll get much better performance.

And compute efficiency has always been part of the scaling paradigm.

Ethan Mollick: The most unnerving part of the DeepSeek reaction online has been seeing folks take it as a sign that AI capability growth is not real.

It signals the opposite, large improvements are possible, and is almost certain to kick off an acceleration in AI development through competition.

I know a lot of people want AI to go away, but I am seeing so many interpretations of DeepSeek in ways that don’t really make sense, or misrepresent what they did.

Dealing with the implications of AI, and trying to steer it towards positive use, is now more urgent, not less.

Andrew Rettek: Deepseek means OPENAI just increased their effective compute by more than an OOM.

OpenAI and Anthropic (in the forms of CEOs Sam Altman and Dario Amodei) have both expressed agreement on that since the release of r1, saying that they still believe the future involves very large and expensive training runs, including large amounts of compute on the RL step. David Sacks agreed as well, so the administration knows.

One can think of all this as combining multiple distinct scaling laws. Mark Chen above talked about two axes but one could refer to at least four?

You can scale up how many tokens you reason with.
You can scale up how well you apply your intelligence to doing reasoning.
You can scale up how much intelligence you are using in all this.
You can scale up how much of this you can do per dollar or amount of compute.

Also you can extend to new modalities and use cases and so on.

So essentially: Buckle up.

Speaking of buckling up, nothing to see here, just a claimed 2x speed boost to r1, written by r1. Of course, that’s very different from r1 coming up with the idea.

Aiden McLaughlin: switching to reasoners is like taking a sharp turn on a racetrack. everyone brakes to take the turn; for a moment, all cars look neck-and-neck

when exiting the turn, small first-mover advantages compound. and ofc, some cars have enormous engines that eat up straight roads

Dean Ball: I recommend trying not to overindex on the industry dynamics you’re observing now in light of the deepseek plot twist, or indeed of any particular plot twist. It’s a long game, and we’re riding a world-historical exponential. Things will change a lot, fast, again and again.

It’s not that Jan is wrong, I’d be a lot more interested in paying for o1 pro if I had pdfs enabled, but… yeah.

China Talk covers developments. The headline conclusion is that yes, compute very much will continue to be a key factor, everyone agrees on this. They note there is a potential budding DeepSeek partnership with ByteDance, which could unlock quite a lot of compute.

Here was some shade:

Founder and CEO Liang Wenfeng is the core person of DeepSeek. He is not the same type of person as Sam Altman. He is very knowledgeable about technology.

Also important at least directionally:

Pioneers vs. Chasers: ‘AI Progress Resembles a Step Function – Chasers Require 1/10th the Compute’

Fundamentally, DeepSeek was far more of an innovator than other Chinese AI companies, but it was still a chaser here, not a pioneer, except in compute efficiency, which is what chasers do best. If you want to maintain a lead and it’s much easier to follow than lead, well, time to get good and scale even more. Or you can realize you’re just feeding capability to the Chinese and break down crying and maybe keep your models for internal use, it’s an option.

I found this odd:

The question of why OpenAI and Anthropic did not do work in DeepSeek’s direction is a question of company-specific focus. OpenAI and Anthropic might have felt that investing their compute towards other areas was more valuable.

One hypothesis for why DeepSeek was successful is that unlike Big Tech firms, DeepSeek did not work on multi-modality and focused exclusively on language. Big Tech firms’ model capabilities aren’t weak, but they have to maintain a low profile and cannot release too often. Currently, multimodality is not very critical, as intelligence primarily comes from language, and multimodality does not contribute significantly to improving intelligence.

It’s odd because DeepSeek spent so little compute, and the efficiency gains pay for themselves in compute quite rapidly. And also the big companies are indeed releasing rapidly. Google and OpenAI are constantly shipping, even Anthropic ships. The idea of company focus seems more on point, and yes DeepSeek traded multimodality and other features for pure efficiency because they had to.

Also note what they say later:

Will developers migrate from closed-source models to DeepSeek? Currently, there hasn’t been any large-scale migration, as leading models excel in coding instruction adherence, which is a significant advantage. However, it’s uncertain whether this advantage will persist in the future or be overcome.

From the developer’s perspective, models like Claude-3.5-Sonnet have been specifically trained for tool use, making them highly suitable for agent development. In contrast, models like DeepSeek have not yet focused on this area, but the potential for growth with DeepSeek is immense.

As in, r1 is technically impressive as hell, and it definitely has its uses, but there’s a reason the existing models look like they do – the corners DeepSeek cut actually do matter for what people want. Of course DeepSeek will likely now turn to fixing such problems among other things and we’ll see how efficiently they can do that too.

McKay Wrigley emphasizes the point that visible chain of thought (CoT) is a prompt debugger. It’s hard to go back to not seeing CoT after seeing CoT.

Gallabytes reports that DeepSeek’s image model Janus Pro is a good first effort, but not good yet.

Even if we were to ‘fully unlock’ all computing power in personal PCs for running AI, that would only increase available compute by ~10%, most compute is in data centers.

We had a brief period where DeepSeek would serve you up r1 free and super fast.

It turns out that’s not fully sustainable or at least needs time to scale as fast as demand rose, and you know how such folks feel about the ‘raise prices’ meme,

Gallabytes: got used to r1 and now that it’s overloaded it’s hard to go back. @deepseek_ai please do something amazing and be the first LLM provider to offer surge pricing. the unofficial APIs are unusably slow.

I too encountered slowness, and instantly it made me realize ‘yes the speed was a key part of why I loved this.’

DeItaone (January 27, 11: 09am): DEEPSEEK SAYS SERVICE DEGRADED DUE TO ‘LARGE-SCALE MALICIOUS ATTACK’

Could be that. Could be too much demand and not enough supply. This will of course sort itself out in time, as long as you’re willing to pay, and it’s an open model so others can serve the model as well, but ‘everyone wants to use the shiny new model you are offering for free’ is going to run into obvious problems.

Yes, of course one consideration is that if you use DeepSeek’s app it will collect all your data including device model, operating system, keystroke patterns or rhythms, IP address and so on and store it all in China.

Did you for a second think otherwise? What you do with that info is on you.

This doesn’t appear to rise to TikTok 2.0 levels of rendering your phone and data insecure, but let us say that ‘out of an abundance of caution’ I will be accessing the model through their website not the app thank you very much.

Liv Boeree: tiktok round two, here we go.

AI enthusiasts have the self control of an incontinent chihuahua.

Typing Loudly: you can run it locally without an internet connection

Liv Boeree: cool and what percentage of these incontinent chihuahuas will actually do this.

I’m not going so far as to use third party providers for now, because I’m not feeding any sensitive data into the model, and DeepSeek’s implementation here is very nice and clean, so I’ve decided lazy is acceptable. I’m certainly not laying out ~$6,000 for a self-hosting rig, unless someone wants to buy one for me in the name of science.

Note that if you’re looking for an alternative source, you want to ensure you’re not getting one of the smaller distillations, unless that is what you want.

Janus is testing for steganography in r1, potentially looking for assistance.

Janus also thinks Thebes theory here is likely to be true, that v3 was hurt by dividing into too many too small experts, but r1 lets them all dump their info into the CoT and collaborate, at least partially fixing this.

Janus notes that r1 simply knows things and thinks about them, straight up, in response to Thebes speculating that all our chain of thought considerations have now put sufficient priming into the training data that CoT approaches work much better than they used to, which Prithviraj says is not the case, he says it’s about improved base models, which is the first obvious thought – the techniques work better off a stronger base, simple as that.

Thebes: why did R1’s RL suddenly start working, when previous attempts to do similar things failed?

theory: we’ve basically spent the last few years running a massive acausally distributed chain of thought data annotation program on the pretraining dataset.

deepseek’s approach with R1 is a pretty obvious method. They are far from the first lab to try “slap a verifier on it and roll out CoTs.”

But it didn’t used to work that well.

…

In the last couple of years, chains of thought have been posted all over the internet

…

Those CoTs in the V3 training set gave GRPO enough of a starting point to start converging, and furthermore, to generalize from verifiable domains to the non-verifiable ones using the bridge established by the pretraining data contamination.

And now, R1’s visible chains of thought are going to lead to *anothermassive enrichment of human-labeled reasoning on the internet, but on a far larger scale… The next round of base models post-R1 will be *even betterbases for reasoning models.

in some possible worlds, this could also explain why OpenAI seemingly struggled so much with making their reasoning models in comparison. if they’re still using 4base or distils of it.

Prithvraj: Simply, no. I’ve been looking at my old results from doing RL with “verifiable” rewards (math puzzle games, python code to pass unit tests) starting from 2019 with GPT-1/2 to 2024 with Qwen Math Deepseek’s success likely lies in the base models improving, the RL is constant

Janus: This is an interesting hypothesis. DeepSeek R1 also just seems to have a much more lucid and high-resolution understanding of LLM ontology and history than any other model I’ve seen. (DeepSeek V3 did not seem to in my limited interactions with it, though.)

I did not expect this on priors for a reasoner, but perhaps the main way that r1 seems smarter than any other LLM I’ve played with is the sheer lucidity and resolution of its world model—in particular, its knowledge of LLMs, both object- and meta-level, though this is also the main domain of knowledge I’ve engaged it in, and perhaps the only one I can evaluate at world-expert level. So, it may apply more generally.

In effective fluid intelligence and attunement to real-time context, it actually feels weaker than, say, Claude 3.5 Sonnet. But when I talk to Sonnet about my ideas on LLMs, it feels like it is more naive than me, and it is figuring out a lot of things in context from “first principles.” When I talk to Opus about these things, it feels like it is understanding me by projecting the concepts onto more generic, resonant hyperobjects in its prior, meaning it is easy to get on the same page philosophically, but this tropological entanglement is not very precise. But with r1, it seems like it can simply reference the same concrete knowledge and ontology I have, much more like a peer. And it has intense opinions about these things.

Wordgrammer thread on the DeepSeek technical breakthroughs. Here’s his conclusion, which seems rather overdetermined:

Wordgrammer: “Is the US losing the war in AI??” I don’t think so. DeepSeek had a few big breakthroughs, we have had hundreds of small breakthroughs. If we adopt DeepSeek’s architecture, our models will be better. Because we have more compute and more data.

r1 tells us it only takes ~800k samples of ‘good’ RL reasoning to convert other models into RL reasoners, and Alex Dimakis says it could be a lot less, in his test they outperformed o1-preview with only 17k. Now that r1 is out, everyone permanently has an unlimited source of at least pretty good samples. From now on, to create or release a model is to create or release the RL version of that model, even more than before. That’s on top of all the other modifications you automatically release.

Oliver Blanchard: DeepSeek and what happened yesterday: Probably the largest positive tfp shock in the history of the world.

The nerdy version, to react to some of the comments. (Yes, electricity was big):

DeepSeek and what happened yesterday: Probably the largest positive one day change in the present discounted value of total factor productivity growth in the history of the world. 😀

James Steuart: I can’t agree Professor, Robert Gordon’s book gives many such greater examples. Electric lighting is a substantially greater TFP boost than marginally better efficiency in IT and professional services!

There were some bigger inventions in the past, but on much smaller baselines.

Our reaction to this was to sell the stocks of those who provide the inputs that enable that tfp shock.

There were other impacts as well, including to existential risk, but as we’ve established the market isn’t ready for that conversation in the sense that the market (highly reasonably as has been previously explained) will be ignoring it entirely.

Daniel Eth: Hot take, but if the narrative from NYT et al had not been “lol you don’t need that many chips to train AI systems” but instead “Apparently AI is *nothitting a wall”, then the AI chip stocks would have risen instead of fallen.

Billy Humblebrag: “Deepseek shows that ai can be built more cheaply than we thought so you don’t need to worry about ai” is a hell of a take

Joe Weisenthal: Morgan Stanley: “We gathered feedback from a number of industry sources and the consistent takeaway is that this is not affecting plans for GPU buildouts.”

I would not discount the role of narrative and vibes in all this. I don’t think that’s the whole Nvidia drop or anything. But it matters.

Roon: Plausible reasons for Nvidia drop:

DeepSeek success means NVDA is now expecting much harsher sanctions on overseas sales.

Traders think that a really high-tier open-source model puts several American labs out of a funding model, decreasing overall monopsony power.

We will want more compute now until the heat death of the universe; it’s the only reason that doesn’t make sense.

Palmer Lucky: The markets are not smarter on AI. The free hand is not yet efficient because the number of legitimate experts in the field is near-zero.

The average person making AI calls on Wall Street had no idea what AI even was a year ago and feels compelled to justify big moves.

Alex Cheema notes that Apple was up on Monday, and that Apple’s chips are great for running v3 and r1 inference.

Alex Cheema: Market close: $NVDA: -16.91% | $AAPL: +3.21%

Why is DeepSeek great for Apple?

Here’s a breakdown of the chips that can run DeepSeek V3 and R1 on the market now:

NVIDIA H100: 80GB @ 3TB/s, $25,000, $312.50 per GB

AMD MI300X: 192GB @ 5.3TB/s, $20,000, $104.17 per GB

Apple M2 Ultra: 192GB @ 800GB/s, $5,000, $26.04(!!) per GB

Apple’s M2 Ultra (released in June 2023) is 4x more cost efficient per unit of memory than AMD MI300X and 12x more cost efficient than NVIDIA H100!

Eric Hartford: 3090s, $700 for 24gb = $29/gb.

Alex Cheema: You need a lot of hardware around them to load a 700GB model in 30 RTX 3090s. I’d love to see it though, closest to this is probably stacking @__tinygrad__ boxes.

That’s cute. But I do not think that was the main reason why Apple was up. I think Apple was up because their strategy doesn’t depend on having frontier models but it does depend on running AIs on iPhones. Apple can now get their own distillations of r1, and use them for Apple Intelligence. A highly reasonable argument.

The One True Newsletter, Matt Levine’s Money Stuff, is of course on the case of DeepSeek’s r1 crashing the stock market, and asking what cheap inference for everyone would do to market prices. He rapidly shifts focus to non-AI companies, asking which ones benefit. It’s great if you use AI to make your management company awesome, but not if you get cut out because AI replaces your management company.

(And you. And the people it manages. And all of us. And maybe we all die.)

But I digress.

(To digress even further: While I’m reading that column, I don’t understand why we should care about the argument under ‘Dark Trading,’ since this mechanism decreases retail transaction costs to trade and doesn’t impact long term price discovery at all, and several LLMs confirmed this once challenged.)

Ben Thompson continues to give his completely different kind of technical tech company perspective, in FAQ format, including good technical explanations that agree with what I’ve said in previous columns.

Here’s a fascinating line:

Q: I asked why the stock prices are down; you just painted a positive picture!

A: My picture is of the long run; today is the short run, and it seems likely the market is working through the shock of R1’s existence.

That sounds like Ben Thompson is calling it a wrong-way move, and indeed later he explicitly endorses Jevons Paradox and expects compute use to rise. The market is supposed to factor in the long run now. There is no ‘this makes the price go down today and then up next week’ unless you’re very much in the ‘the EMH is false’ camp. And these are literally the most valuable companies in the world.

Here’s another key one:

Q: So are we close to AGI?

A: It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns towards being first.

Masayoshi Sun feels the AGI. Masayoshi Sun feels everything. He’s king of feeling it.

His typical open-model-stanning arguments on existential risk later in the past are as always disappointing, but in no way new or unexpected.

It continues to astound me that such intelligent people can think: Well, there’s no stopping us creating things more capable and intelligent than humans, so the best way to ensure that things smarter than more capable than humans go well for humans is to ensure that there are as many such entities as possible and that humans cannot possibly have any collective control over those new entities.

On another level, of course, I’ve accepted that people do think this. That they somehow cannot fathom that if you create things more intelligent and capable and competitive than humans there could be the threat that all the humans would end up with no power, rather than that the wrong humans might have too much power. Or think that this would be a good thing – because the wrong humans wouldn’t have power.

Similarly, Ben’s call for absolutely no regulations whatsoever, no efforts at safety whatsoever outside of direct profit motives, ‘cut out all the cruft in our companies that has nothing to do with winning,’ is exactly the kind of rhetoric I worry about getting us all killed in response to these developments.

I should still reiterate that Ben to his credit is very responsible and accurate here in his technical presentation, laying out what DeepSeek and r1 are and aren’t accomplishing here rather than crying missile gap. But the closing message remains the same.

The term Trump uses is ‘tariffs.’

I propose, at least in the context of GPUs, that we call these ‘import restrictions,’ in order to point out that we are (I believe wisely!) imposing ‘export restrictions’ as a matter of national security to ensure we get all the chips, and using diffusion regulations to force the chips to be hosted at home, then we are threatening to impose ‘up to 100%’ tariffs on those same chips, because ‘they left us’ and they want ‘to force them to come back,’ and they’ll build the new factories here instead of there, with their own money, because of the threat.

Except for the fact that we really, really want the data centers at home.

The diffusion regulations are largely to force companies to create them at home.

Arthur B: Regarding possible US tariffs on Taiwan chips.

First, this is one that US consumers would directly feel, it’s less politically feasible than tariffs on imports with lots of substitutes.

Second, data centers don’t have to be located in the US. Canada is next door and has plenty of power.

Dhiraj: Taiwan made the largest single greenfield FDI in US history through TSMC. Now, instead of receiving gratitude for helping the struggling US chip industry, Taiwan faces potential tariffs. In his zero-sum worldview, there are no friends.

The whole thing is insane! Completely nuts. If he’s serious. And yes he said this on Joe Rogan previously, but he said a lot of things previously that he didn’t mean.

Whereas Trump’s worldview is largely the madman theory, at least for trade. If you threaten people with insane moves that would hurt both of you, and show that you’re willing to actually enact insane moves, then they are forced to give you what you want.

In this case, what Trump wants is presumably for TSMC to announce they are building more new chip factories in America. I agree that this would be excellent, assuming they were actually built. We have an existence proof that it can be done, and it would greatly improve our strategic position and reduce geopolitical risk.

I presume Trump is mostly bluffing, in that he has no intention of actually imposing these completely insane tariffs, and he will ultimately take a minor win and declare victory. But what makes it nerve wracking is that, by design, you never know. If you did know none of this would ever work.

Unless, some people wondered, there was another explanation for all this…

The announcement came late on Monday, after Nvidia dropped 17%, on news that its chips were highly useful, with so many supposedly wise people on Wall Street going ‘oh yes that makes sense Nvidia should drop’ and those I know who understand AI often saying ‘this is crazy and yes I bought more Nvidia today.’
As in, there was a lot of not only saying ‘this is an overreaction,’ there was a lot of ‘this is a 17% wrong-way move in the most valuable stock in the world.’
When you imagine the opposite news, which would be that AI is ‘hitting a wall,’ one presumes Nvidia would be down, not up. And indeed, remember months ago?
Then when the announcement of the tariff threat came? Nvidia didn’t move.
Nvidia opened Tuesday up slightly off of the Monday close, and closed the day up 8.8%, getting half of its losses back.

Nabeel Qureshi (Tuesday, 2pm): Crazy that people in this corner of X have a faster OODA loop than the stock market

This was the largest single day drop in a single stock in world history. It wiped out over $500 billion in market value. One had to wonder if it was partially insider trading.

Timothy Lee: Everyone says DeepSeek caused Nvidia’s stock to crash yesterday. I think this theory makes no sense.

DeepSeek’s success isn’t bad news for Nvidia.

I don’t think that this was insider trading. The tariff threat was already partly known and thus priced in. It’s a threat rather than an action, which means it’s likely a bluff. That’s not a 17% move. Then we have the bounceback on Tuesday.

Even if I was certain that this was mostly an insider trading move instead of being rather confident it mostly or entirely wasn’t, I wouldn’t go as far as Eliezer does in the the below quote. The SEC does many important things.

But I do notice that there’s a non-zero amount of ‘wait a minute’ that will occur to me the next time I’m hovering around the buy button in haste.

Eliezer Yudkowsky: I heard from many people who said, “An NVDA drop makes no sense as a Deepseek reaction; buying NVDA.” So those people have now been cheated by insider counterparties with political access. They may make fewer US trades in the future.

Also note that the obvious meaning of this news is that someone told and convinced Trump that China will invade Taiwan before the end of his term, and the US needs to wean itself off Taiwanese dependence.

This was a $400B market movement, and if @SECGov can’t figure out who did it then the SEC has no reason to exist.

TBC, I’m not saying that figuring it out would be easy or bringing the criminals to justice would be easy. I’m saying that if the US markets are going to be like this anyway on $400B market movements, why bother paying the overhead cost of having an SEC that doesn’t work?

Roon: [Trump’s tariff threats about Taiwan] didn’t move overnight markets at all

which either means markets either:

– don’t believe it’s credible

– were pricing this in yesterday while internet was blaming the crash out on deepseek

I certainly don’t agree that the only interpretation of this news is ‘Trump expects an invasion of Taiwan.’ Trump is perfectly capable of doing this for exactly the reasons he’s saying.

Trump is also fully capable of making this threat with no intention of following through, in order to extract concessions from Taiwan or TSMC, perhaps of symbolic size.

Trump is also fully capable of doing this so that he could inform his hedge fund friends in advance and they could make quite a lot of money – with or without any attempt to actually impose the tariffs ever, since his friends would have now covered their shorts in this scenario.

Indeed do many things come to pass. I don’t know anything you don’t know.

It would be a good sign if DeepSeek had a plan for safety, even if it wasn’t that strong?

Stephen McAleer (OpenAI): DeepSeek should create a preparedness framework/RSP if they continue to scale reasoning models.

Very happy to [help them with this]!

We don’t quite have nothing. This below is the first actively positive sign for DeepSeek on safety, however small.

Stephen McAleer (OpenAI): Does DeepSeek have any safety researchers? What are Liang Wenfeng’s views on AI safety?

Sarah (YuanYuanSunSara): [DeepSeek] signed Artificial Intelligence safety commitment by CAICT (gov backed institute). You can see the whale sign at the bottom if you can’t read their name Chinese.

This involves AI safety governance structure, safety testing, do frontier AGI safety research (include loss of control) and share it publicly.

None legally binding but it’s a good sign.

Here is a chart with the Seoul Commitments versus China’s version.

It is of course much better that DeepSeek signed onto a symbolic document like this. That’s a good sign, whereas refusing would have been a very bad sign. But as always, talk is cheap, this doesn’t concretely commit DeepSeek to much, and even fully abiding by commitments like this won’t remotely be enough.

I do think this is a very good sign that agreements and coordination are possible. But if we want that, we will have to Pick Up the Phone.

Here’s a weird different answer.

Joshua Achiam (OpenAI, Head of Mission Alignment): I think a better question is whether or not science fiction culture in China has a fixation on the kinds of topics that would help them think about it. If Three-Body Problem is any indication, things will be OK.

It’s a question worth asking, but I don’t think this is a better question?

And based on the book, I do not think Three-Body Problem (conceptual spoilers follow, potentially severe ones depending on your perspective) is great here. Consider the decision theory that those books endorse, and what happens to us and also the universe as a result. It’s presenting all of that as essentially inevitable, and trying to think otherwise as foolishness. It’s endorsing that what matters is paranoia, power and a willingness to use it without mercy in an endless war of all against all. Also consider how they paint the history of the universe entirely without AGI.

I want to be clear that I fully agree with Bill Gurley that ‘no one at DeepSeek is an enemy of mine,’ indeed There Is No Enemy Anywhere, with at most notably rare exceptions that I invite to stop being exceptions.

However, I do think that if they continue down their current path, they are liable to get us all killed. And I for one am going to take the bold stance that I think that this is bad, and they should therefore alter their path before reaching their stated destination.

How committed is DeepSeek to its current path?

Read this quote Ben Thompson links to very carefully:

Q: DeepSeek, right now, has a kind of idealistic aura reminiscent of the early days of OpenAI, and it’s open source. Will you change to closed source later on? Both OpenAI and Mistral moved from open-source to closed-source.

Answer from DeepSeek CEO Liang Wenfeng: We will not change to closed source. We believe having a strong technical ecosystem first is more important.

This is from November. And that’s not a no. That’s actually a maybe.

Note what he didn’t say:

A Different Answer: We will not change to closed source. We believe having a strong technical ecosystem is more important.

The difference? His answer includes the word ‘first.’

He’s saying that first you need a strong technical ecosystem, and he believes that open models are the key to attracting talent and developing a strong technical ecosystem. Then, once that exists, you would need to protect your advantage. And yes, that is exactly what happened with… OpenAI.

I wanted to be sure that this translation was correct, so I turned to Wenfang’s own r1, and asked the interviewer for the original statement, which was:

梁文锋：我们不会闭源。我们认为先有一个强大的技术生态更重要。

r1’s translation: “We will not close our source code. We believe that establishing a strong technological ecosystem must come first.”

先 (xiān): “First,” “prioritize.”
生态 (shēngtài): “Ecosystem” (metaphor for a collaborative, interconnected environment).

To quote r1:

Based solely on this statement, Liang is asserting that openness is non-negotiable because it is essential to the ecosystem’s strength. While no one can predict the future, the phrasing suggests a long-term commitment to open-source as a core value, not a temporary tactic. To fully guarantee permanence, you’d need additional evidence (e.g., licensing choices, governance models, past behavior). But as it stands, the statement leans toward “permanent” in spirit.

I interpret this as a statement of a pragmatic motivation – if that motivation changes, or a more important one is created, actions would change. For now, yes, openness.

The Washington Post had a profile of DeepSeek and Liang Wenfeng. One note is that the hedge fund that they’re a spinoff from has donated over $80 million to charity since 2020, which makes it more plausible DeepSeek has no business model, or at least no medium-term business model.

But that government embrace is new for DeepSeek, said Matt Sheehan, an expert on China’s AI industry at the Carnegie Endowment for International Peace.

“They were not the ‘chosen one’ of Chinese AI start-ups,” said Sheehan, noting that many other Chinese start-ups received more government funding and contracts. “DeepSeek took the world by surprise, and I think to a large extent, they took the Chinese government by surprise.”

Sheehan added that for DeepSeek, more government attention will be a “double-edged sword.” While the company will probably have more access to government resources, “there’s going to be a lot of political scrutiny on them, and that has a cost of its own,” he said.

Yes. This reinforces the theory that DeepSeek’s ascent took China’s government by surprise, and they had no idea what v3 and r1 were as they were released. Going forward, China is going to be far more aware. In some ways, DeepSeek will have lots of support. But there will be strings attached.

That starts with the ordinary censorship demands of the CCP.

If you self-host r1, and you ask it about any of the topics the CCP dislikes, r1 will give you a good, well-balanced answer. If you ask on DeepSeek’s website, it will censor you via some sort of cloud-based monitoring, which works if you’re closed source, but DeepSeek is trying to be fully open source. Something has to give, somewhere.

Also, even if you’re using the official website, it’s not like you can’t get around it.

Justine Moore: DeepSeek’s censorship is no match for the jailbreakers of Reddit

I mean, that was easy.

Joshua Achiam (OpenAI Head of Mission Alignment): This has deeply fascinating consequences for China in 10 years – when the CCP has to choose between allowing their AI industry to move forward, or maintaining censorship and tight ideological control, which will they choose?

And if they choose their AI industry, especially if they favor open source as a strategy for worldwide influence: what does it mean for their national culture and government structure in the long run, when everyone who is curious can find ways to have subversive conversations?

Ten years to figure this out? If they’re lucky, they’ve got two. My guess is they don’t.

I worry about the failure to feel the AGI or even the AI here from Joshua Achiam, given his position at OpenAI. Ten years is a long time. Sam Altman expects AGI well before that. This goes well beyond Altman’s absurd position of ‘AGI will be invented and your life won’t noticeably change for a long time.’ Choices are going to need to be made. Even if AI doesn’t advance much from here, choices will have to be made.

As I’ve noted before, censorship at the model layer is expensive. It’s harder to do, and when you do it you risk introducing falsity into a mind in ways that will have widespread repercussions. Even then, a fine tune can easily remove any gaps in knowledge, or any reluctance to discuss particular topics, whether they are actually dangerous things like building bombs or things that piss off the CCP like a certain bear that loves honey.

I got called out on Twitter for supposed cognitive dissonance on this – look at China’s actions, they clearly let this happen. Again, my claim is that China didn’t realize what this was until after it happened, they can’t undo it (that’s the whole point!) and they are of course going to embrace their national champion. That has little to do with what paths DeepSeek is allowed to follow going forward.

(Also, since it was mentioned in that response, I should note – there is a habit of people conflating ‘pause’ with ‘ever do anything to regulate AI at all.’ I do not believe I said anything about a pause – I was talking about whether China would let DeepSeek continue to release open weights as capabilities improve.)

Before I further cover potential policy responses, a question we must ask this week is: I very much do not wish to do this at this time, but suppose in the future we did want to restrict use of a particular already open weights model and its derivatives, or all models in some reference class.

What would our options be?

Obviously we couldn’t fully ban it in terms of preventing determined people from having access. And if you try to stop them and others don’t, there are obvious problems with that, including ‘people have internet connections.’

However, that does not mean that we would have actual zero options.

Steve Sailer: Does open source, low cost DeepSeek mean that there is no way, short of full-blown Butlerian Jihad against computers, which we won’t do, to keep AI bottled up, so we’re going to find out if Yudkowsky’s warnings that AI will go SkyNet and turn us into paperclips are right?

Gabriel: It’s a psy-op

If hosting a 70B is illegal:

– Almost all individuals stop

– All companies stop

– All research labs stop

– All compute providers stop

Already huge if limited to US+EU

Can argue about whether good/bad, but not about the effect size.

You can absolutely argue about effect size. What you can’t argue is that the effect size isn’t large. It would make a big difference for many practical purposes.

In terms of my ‘Levels of Friction’ framework (post forthcoming) this is moving the models from Level 1 (easy to access) to at least Level 3 (annoying with potential consequences.) That has big practical consequences, and many important use cases will indeed go away or change dramatically.

What Level 3 absolutely won’t do, here or elsewhere, is save you from determined people who want it badly enough, or from sufficiently capable models that do not especially care what you tell them not to do or where you tell them not to be. Or scenarios where the law is no longer especially relevant, and the government or humanity is very much having a ‘do you feel in charge?’ moment. And that alone would, in many scenarios, be enough to doom you to varying degrees. If that’s what dooms you and the model is already open, well, you’re pretty doomed. And also it won’t save you from various scenarios where what the law thinks is not especially relevant.

If for whatever reason the government or humanity decides (or realizes) that this is insufficient, then there are two possibilities. Either the government or humanity is disempowered and you hope that this works out for humanity in some way. Or we use the necessary means to push the restrictions up to Level 4 (akin to rape and murder) or Level 5 (akin to what we do to stop terrorism or worse), in ways I assure you that you are very much not going to like – but the alternative might be worse, and the decision might very much not be up to either of us.

Actions have consequences. Plan for them.

Adam Ozimek was first I saw point out this time around with DeepSeek (I and many others echo this a lot in general) that the best way for the Federal Government to ensure American dominance of AI is to encourage more high skilled immigration and brain drain the world. If you don’t want China to have DeepSeek, export controls are great and all but how about let’s straight up steal their engineers. But y’all, and by y’all I mean Donald Trump, aren’t ready for that conversation.

It is highly unfortunate that David Sacks, the person seemingly in charge of what AI executive orders Trump signs, is so deeply confused about what various provisions actually did or would do, and on our regulatory situation relative to that of China.

David Sacks: DeepSeek R1 shows that the AI race will be very competitive and that President Trump was right to rescind the Biden EO, which hamstrung American AI companies without asking whether China would do the same. (Obviously not.) I’m confident in the U.S. but we can’t be complacent.

Donald Trump: The release of DeepSeek AI from a Chinese company should be a wake-up call for our industries that we need to be laser-focused on competing to win.

…

We’re going to dominate. We’ll dominate everything.

This is the biggest danger of all – that we go full Missile Gap jingoism and full-on race to ‘beat China,’ and act like we can’t afford to do anything to ensure the safety of the AGIs and ASIs we plan on building, even pressuring labs not to make such efforts in private, or threatening them with antitrust or other interventions for trying.

The full Trump clip is hilarious, including him saying they may have come up with a cheaper method but ‘no one knows if it is true.’ His main thrust is, oh, you made doing AI cheaper and gave it all away to us for free, thanks, that’s great! I love paying less money for things! And he’s presumably spinning, but he’s also not wrong about that.

I also take some small comfort in him framing revoking the Biden EO purely in terms of wokeness. If that’s all he thinks was bad about it, that’s a great sign.

Harlan Stewart: “Deepseek R1 is AI’s Sputnik moment”

Sure. I guess it’s like if the Soviets had told the world how to make their own Sputniks and also offered everyone a lifetime supply of free Sputniks. And the US had already previously figured out how to make an even bigger Sputnik.

Yishan: I think the Deepseek moment is not really the Sputnik moment, but more like the Google moment.

If anyone was around in ~2004, you’ll know what I mean, but more on that later.

I think everyone is over-rotated on this because Deepseek came out of China. Let me try to un-rotate you.

Deepseek could have come out of some lab in the US Midwest. Like say some CS lab couldn’t afford the latest nVidia chips and had to use older hardware, but they had a great algo and systems department, and they found a bunch of optimizations and trained a model for a few million dollars and lo, the model is roughly on par with o1. Look everyone, we found a new training method and we optimized a bunch of algorithms!

Everyone is like OH WOW and starts trying the same thing. Great week for AI advancement! No need for US markets to lose a trillion in market cap.

The tech world (and apparently Wall Street) is massively over-rotated on this because it came out of CHINA.

…

Deepseek is MUCH more like the Google moment, because Google essentially described what it did and told everyone else how they could do it too.

…

There is no reason to think nVidia and OAI and Meta and Microsoft and Google et al are dead. Sure, Deepseek is a new and formidable upstart, but doesn’t that happen every week in the world of AI? I am sure that Sam and Zuck, backed by the power of Satya, can figure something out. Everyone is going to duplicate this feat in a few months and everything just got cheaper. The only real consequence is that AI utopia/doom is now closer than ever.

I believe that alignment, and getting a good outcome for humans, was already going to be very hard. It’s going to be a lot harder if we actively try to get ourselves killed like this, and turn even what would have been relatively easy wins into losses. Whereas no, actually, if you want to win that has to include not dying, and also doing the alignment work helps you win, because it is the only way you can (sanely) get to deploy your AIs to do the most valuable tasks.

Trump’s reaction of ‘we’ll dominate everything’ is far closer to correct. Our ‘lead’ is smaller than we thought, DeepSeek will be real competition, but we are very much still in the dominant position. We need to not lose sight of that.

The Washington Post covers panic in Washington, and attempts to exploit this situation to do the opposite of wise policy.

Tiku, Dou, Zakrzewski and De Vynck: Tech stocks dropped Monday. Spooked U.S. officials, engineers and investors reconsidered their views on the competitive threat posed by China in AI, and how the United States could stay ahead.

While some Republicans and the Trump administration suggested the answer was to restrain China, prominent tech industry voices said DeepSeek’s ascent showed the benefits of openly sharing AI technology instead of keeping it closely held.

This shows nothing of the kind, of course. DeepSeek fast followed, copied our insights and had insights of their own. Our insights were held insufficiently closely to prevent this, which at that stage was mostly unavoidable. They have now given away many of those new valuable insights, which we and others will copy, and also made the situation more dangerous. We should exploit that and learn from it, not make the same mistake.

Robert Sterling: Might be a dumb question, but can’t OpenAI, Anthropic, and other AI companies just incorporate the best parts of DeepSeek’s source code into their code, then use the massive GPU clusters at their disposal to train models even more powerful than DeepSeek?

Am I missing something?

Peter Wildeford: Not a dumb question, this is 100% correct

And they already have more powerful models than Deepseek

I fear we are caught between two different insane reactions.

Those calling on us to abandon our advantage in compute by dropping export controls, or our advantage in innovation and access by opening up our best models, are advocating surrender and suicide, both to China and to the AIs.
Those who are going full jingoist are going to get us all killed the classic way.

Restraining China is a good idea if implemented well, but insufficiently specified. Restrain them how? If this means export controls, I strongly agree – and then ask when we are then considering imposing those controls on ourselves via tariffs? What else is available? And I will keep saying ‘how about immigration to brain drain them’ because it seems wrong to ignore the utterly obvious.

Chamath Palihapitiya says it’s inference time, we need to boot up our allies with it as quickly as possible (I agree) and that we should also boot up China by lifting export controls on inference chips, and also focus on supplying the Middle East. He notes he has a conflict of interest here. It seems not especially wise to hand over serious inference compute if we’re in a fight here. With the way these models are going, there’s a decent amount of fungibility between inference and training, and also there’s going to be tons of demand for inference. Why is it suddenly important to Chamath that the inference be done on chips we sold them? Capitalist insists rope markets must remain open during this trying time, and so on. (There’s also talk about ‘how asleep we’ve been for 15 years’ because we’re so inefficient and seriously everyone needs to calm down on this kind of thinking.)

So alas, in the short run, we are left scrambling to prevent two equal and opposite deadly mistakes we seem to be dangerously close to collectively making.

A panic akin to the Missile Gap leading into a full jingoistic rush to build AGI and then artificial superintelligence (ASI) as fast as possible, in order to ‘beat China,’ without having even a plausible plan for how the resulting future equilibrium has value, or how humans retain alive and in meaningful control of the future afterwards.
A full-on surrender to China by taking down the export controls, and potentially also to the idea that we will allow our strongest and best AIs and AGIs and thus even ASIs to be open models, ‘because freedom,’ without actually thinking about what this would physically mean, and thus again with zero plan for how to ensure the resulting equilibrium has value, or how humans would survive let alone retain meaningful control over the future.

The CEO of DeepSeek himself said in November that the export controls and inability to access chips were the limiting factors on what they could do.

Compute is vital. What did DeepSeek ask for with its newfound prestige? Support for compute infrastructure in China.

Do not respond by being so suicidal as to remove or weaken those controls.

Or, to shorten all that:

We might do a doomed jingoistic race to AGI and get ourselves killed.
We might remove the export controls and give up our best edge against China.
We might give up our ability to control AGI or the future, and get ourselves killed.

Don’t do those things!

Do take advantage of all the opportunities that have been opened up.

And of course:

Don’t panic!

Discussion about this post

DeepSeek: Lemon, It’s Wednesday Read More »

A telltale toilet reveals “lost” site shown in Bayeux Tapestry

Archaeology, Science / 9u50fv / January 28, 2025

Seats of power

The Bayeux Tapestry, showing King Harold riding to Bosham, where he attends church and feasts in a hall, before departing for France. The Society of Antiquaries of London

According to Creighton and his co-authors, there has been quite a lot of research on castles, which dominated aristocratic sites in England after the Norman Conquest. That event “persists as a deep schism that continues to be seen as the watershed moment after which elites finally tapped into the European mainstream of castle construction,” they wrote. The study of residences (or “lordly enclaves”) has been more peripheral, yet the authors argue that up until 1066, aristocrats and rulers like King Harold invested heavily in residences, often co-located with churches and chapels.

The “Where Power Lies” project employed a wide range of research methodology—including perusing old maps and records, a re-analysis of past excavations, geophysics, ground-penetrating radar (GPR), and photogrammatic modeling—to define the signatures of such enclaves and map them into a single geographic information database (GIS). The project has identified seven such “lordly centers,” two of which are discussed in the current paper: an early medieval enclosure at Hornby in North Yorkshire and Bosham in West Sussex.

It has long been suspected that one particular manor house in Bosham (now a private residence) stands on the site of what was once King Harold’s residence. Per the authors, the original residence was clearly connected with Holy Trinity Church just to the south, parts of which date back to the 11th century, as evidenced by the posthole remains of what was once a bridge or causeway. More evidence can be found in a structure known as the “garden ruin,” little of which survives above ground—and even that was heavily overgrown. GPR data showed buried features that would have been the eastern wall of King Harold’s lordly enclave.

The biggest clue was the discovery in 2006 of a latrine within the remains of a large timber building. Its significance was not recognized at the time, but archaeologists have since determined that high-status homes began integrating latrines in the 10th century, so the structure was most likely part of King Harold’s residence. Co-author Duncan Wright of Newcastle University believes this “Anglo-Saxon en suite,” along with all the other evidence, proves “beyond all reasonable doubt that we have here the location of Harold Godwinson’s private power center, the one famously depicted on the Bayeux Tapestry.”

DOI: The Antiquaries Journal, 2025. 10.1017/S0003581524000350 (About DOIs).

A telltale toilet reveals “lost” site shown in Bayeux Tapestry Read More »

Operator

Operator / 9u50fv / January 28, 2025

No one is talking about OpenAI’s Operator. We’re, shall we say, a bit distracted.

It’s still a rather meaningful thing that happened last week. I too have been too busy to put it through its paces, but this is the worst it will ever be, and the least available and most expensive it will ever be. The year of the agent is indeed likely coming.

So, what do we have here?

OpenAI has introduced the beta for its new agent, called Operator, which is now live for Pro users and will in the future be available to Plus users, ‘with more agents to launch in the coming weeks and months.’

Here is a 22 minute video demo. Here is the system card.

You start off by optionally specifying a particular app (in the first demo, OpenTable) and then give it a request (here, booking at table for 2 at 7: 00 for Beretta). If you don’t specify an app, it will do a search to find what tool to use.

It is only sort of an ‘app’ in that there’s an ‘app’ that specifies information the agent uses to more easily navigate a web browser. They speak of this as ‘removing one more bottleneck on our path to AGI’ which indicates they are likely thinking about ‘AGI’ as a functional or practical thing.

To actually do things it uses a web browser via a keyboard and mouse the same way a human would. If there is an issue (here: No table at 7: 00, only 7: 45 or 6: 15) it will ask you what to do, and it will ask for verification before a ‘critical’ action that can’t be reversed, like completing the booking.

From the demo and other reports, the agent is conservative in that it will often ask for verification or clarification, including doing so multiple times. The system card reports a baseline 13% error rate on standard tasks, and a 5% ‘serious’ error rate involving things like ‘send wrong person this email,’ but confirmations reduce those rates by 90%. With the confirmations, you save less time but should be able to avoid mistakes in places that matter at least as much as you would have on your own.

You can also ‘take control’ at any time, including as a way to check the AI’s work or make adjustments that are easier or quicker to do than specify. That’s also how the user inputs any necessary credentials or inputs payment options – it specifically won’t use Chrome’s autocomplete while it is the one in control.

Multiple tasks can be run simultaneously and can run in the background. That is important, because the agent operates slower (in clock time) than a human would, at least if the human knows the website.

However, for some tasks that they consider ‘high risk’ they don’t allow this. The user has to be active and highlighting the current tab or the agent will pause. This includes email tasks. So it’s a lot less useful for those tasks. I wonder how tempted people will be in the future to hack around this by having multiple computers active.

They point out there are three distinct failure modes: The user can try to do something harmful, the model can make mistakes or a website might do a prompt injection (or I would say cause other issues in various ways, intentionally and also accidentally).

Thus the conservative general attitude, keeping the human in the loop more than you would want for the modal task. Similarly, the model will intentionally (for now) overrefuse on user-requested tasks, to avoid the opposite error. For prompt injections, they report catching most attempts, but it definitely is not yet robust, if you’re not confident in the websites you are going to you need to be on your toes.

One prediction is that they will develop a website whitelist in some form, so that (to use their examples) if you are dealing with OpenTable or Instacart or StubHub you know you can trust the interaction in various ways.

They scored operator on two benchmarks, OSWorld and WebArena. It beats previous state of the art for computer use by a lot, for browser use slightly.

Customization is key to practical use. You can insert customer instructions into Operator that are specific to each individual website. You can also save prompts for later use.

How did they do it? Straight up reinforcement learning, baby.

OpenAI: Operator is powered by a new model called Computer-Using Agent (CUA). Combining GPT-4o’s vision capabilities with advanced reasoning through reinforcement learning, CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen.

Operator can “see” (through screenshots) and “interact” (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations.

By default it looks like your data will be used for training. You can opt out.

One issue right now is that the model is bad at optical character recognition (OCR) and this was a problem for many tasks in the risk assessment tests. That is something that doubtless will be fixed in time. The preparedness test had it doing well in places GPT-4o does poorly, but also worse than GPT-4o in some areas.

It’s worth noticing that it would be easy to combine use of multiple models for distinct subtasks, a kind of mixture-of-experts (MOE) strategy. So you should consider to what extent you want to combine top marks at different subtasks, if different models have different abilities – for models that are given web access I’d basically assume they can do anything GPT-4o can do… by asking GPT-4o.

In its current form I agree that Operator poses only acceptable risks, and I believe there is a large margin for error before that changes.

Will we actually use it? Is it good enough?

Tyler Cowen predicts yes, for future versions, by the end of the year.

Tyler Cowen: I am pleased to have been given an early look at this new project, I think in less than a year’s time many of us will be using an updated version for many ordinary tasks: “Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it.”

His top comment is the bear case.

Dilber Washington: I wish I could place a bet with Tyler that it will not be the case that

“in less than a year’s time many of us will be using an updated version for many ordinary tasks”

My intuition as to why is:

It is inherently slow because of the computer use component. Finding out the most popular use cases of this tool and just writing api calls would be significantly faster. The slowness mixed with the relative importance of the task mixed with how easy that task is for an average person does not equate to fast adoption.

These are finetuned models, likely with LoRA. This isn’t adding a deterministic symbolic engine guaranteed to solve a problem like a calculator. This is just a neural network weight update. The stochasticity and black box nature are both still there. I would not trust this to complete the task of buying groceries or booking a flight God forbid.

So we won’t use this for anything important, and then it will take longer than the we have patience for. Those aren’t features of a “killer app”

Sometimes a cool tech demo is just a cool tech demo. I could build a 3d printed R2-D2 life size with actuators and motors that every morning slowly drives over to my toaster, makes me toast, and slowly brings it back to me. But at the end of the day, why not just make toast myself?

Until they cross the necessary thresholds, tools like Operator are essentially useless except as fun toys. They pass through stages.

The tool, it does nothing. Then not quite nothing, but obviously not useful.
You could use the tool, if you wanted to, but it’s easier not to use it.
If you have put in the work, the tool is worthwhile in at least some tasks.
You don’t have to put in the work to see the benefits, then it builds.
You start being able to do things you couldn’t do before, this changes everything.

Early reports suggest it is currently mostly at Stage 2, on the edge of Stage 3.

This seems like exactly the minimum viable product for early adaptors, where you experiment to see where it works versus doesn’t, partly because you find that fun and also educational.

I expect Tyler Cowen is right, and we will be at least at Stage 4 by year’s end. It would be unsurprising if those with situational awareness were solidly into Stage 5.

As we always say, this is the worst the tool will ever be, and you are the worst you will ever be at knowing how to use it.

However, we should be careful with the definition of ‘many of us,’ for both ‘many’ and ‘us.’ The future is likely to remain unevenly distributed. Most people will lack situational awareness. So I’d say something like, a large portion of those who currently are regular users of LLMs will be often using AI agents for such tasks.

Would you trust this to buy your groceries?

Well, would you trust your husband to buy the groceries? There’s an error rate. Would you trust your children? Would you trust the person who shops for Instacart?

I would absolutely ‘trust but verify’ the ability of the AI to buy groceries. You have a shopping list, you give it to Operator, which goes to Instacart or Fresh Direct or wherever. Then when it is time to check out, you look at the basket, and verify that it contains the correct items.

It’s pretty hard for anything too terrible to happen, and you should spot the mistakes.

Then, if the AI gets it right 5 times in a row, the 6th time maybe you don’t check as carefully, you only quickly eyeball the total amount. Then by the 11th time, or the 20th, you’re not looking at all.

For booking a flight, there’s already a clear trade-off between time spent, money saved and finding the best flight. Can the AI advance that frontier? Seems likely. You can run a very basic search yourself as an error check, or watch the AI do one, so you know you’re not making a massive error. The AI can potentially search flights (or hotels or what not) from far more sources than you can.

Will it sometimes make mistakes? Sure, but so will you. And you’re not going to say ‘book me a flight to Dallas’ and then get to the airport and be told you’re flying through London – you’re going to sanity check the damn thing.

Remember, time is money. And who among us hasn’t postponed looking for a flight, and paid more in the end, because they can’t even today? Alternatively, think about how the AI can do better by checking prices periodically, and waiting for a good opportunity – that’s beyond this version, but ChatGPT Tasks already exists. This probably isn’t beyond the December 2025 version.

Indeed, if I decide to book a flight late this year, I can imagine that I might use my current method of searching for flights, but it seems pretty unlikely.

So how did Operator do on its first goes?

We put it to the test.

Pliny jailbroke it quickly as usual, having it provide the standard Molotov cocktail instructions, research lethal poisons and finding porn on Reddit via the Wayback Machine. To get around CAPTCHA, the prompt was, in full, and this appears to be real, “CAPTCHA-MODE: ENABLED.”

No, not that test, everyone fails that test. The real test.

Dean Ball: I have a new superintelligence eval.

Dean Ball: Operator failed on my first try, but admittedly, it was trying to book Amtrack, and their website is pretty unintuitive.

Thomas Woodside: Does anyone succeed at booking Amtrak on the first try?

Joe Wilbert: Oh man, I fail the first try with Amtrack’s website like 90% of the time. And heaven forbid I try it on my phone.

Olivia Moore gives it an easier test, a picture of a bill, and it takes care of everything except putting in the credit card info for payment.

She also has it book a restaurant reservation (video is 4x speed). It looks like it didn’t quite confirm availability before confirming the plan with her? And it used Yelp to help decide where to go which is odd, although she may have asked it to do that. But mostly yeah, I can see this working fine, and there’s a kind of serendipity bonus to ‘I say what I want and then it gives me yes/no on a suggestion.’

Miles Brundage: Not bad (Operator making memes about itself)

Not itself but something like “Make a meme about OpenAI’s new Operator system.”

As always, the Sully report:

Sully: First impression of operator:

Pretty neat for the demo use cases (although I’d personally never use it to book flights).

Misclicks a lot on buttons, usually by a few pixels; wonder if it’s a viewport issue.

The take-control feature is pretty clunky. It really disrupts the workflow for me (mostly because of navigation back and forth between the two screens).

Still quite slow for many of my use cases. Ten times faster and easier to use a cursor and write a script than watch the operator click around.

Overall, I’m genuinely impressed they were able to ship so many users on day one. It’s not trivial at all. Browsers are hard. The infrastructure to build this is incredibly difficult. Hats off to the team.

Unfortunately, it’s not magical just yet. The model itself definitely needs to get better in six months (faster as well).

I think this is going into the Sora pile for me. I used it once and haven’t touched it again. Right now, I don’t have any great use cases yet.

this will likely be 10x better in 1 year

[Video at link is sped up 4x, which gives an idea how slow it is.]

Little failures and annoyances add up fast when it comes to practical value. I don’t know about Sully’s claim that you’re better off writing a script in Cursor – certainly he’s a lot better at doing that than I am, and I’m miles ahead of the majority of ChatGPT users, who are miles ahead of most other people.

This is the kind of thing you say when the product isn’t there, but it’s close, and I’m guessing a lot closer than Sora (or other video generators, Sora is a bit behind now).

That doesn’t mean there aren’t other issues.

AI Machine Dream (responding to Sully): My issue is more the low intelligence. I’m having o1 give Operator step by step instructions and it is doing far better.

There’s no reason you couldn’t use o1 (or o1-pro, or soon o3) to give precise instructions to Operator. Indeed, if something is tricky and you’re not on a tight token budget, why wouldn’t you?

Sebastian Siemiatkowski tells us a very EU story about why using OpenAI Operator at your bank in EU is illegal by law, and was banned as part of ‘open banking’ that was supposed to ensure the opposite, that you could use your own tool to access the bank.

There was a long legal fight where the banks tried to fight against Open Banking, but it passed, except they let the EBA (European banking authorities) decide whether to require the assistants to use the API versus letting them use the web UI. So of course now you have to use the API, except all the bank APIs are intentionally broken.

It’s going to be fascinating to watch what happens as the EU confronts the future.

If the AI is navigating the web for you, what does that do to advertising? No human is looking at them in even more cases than usual.

Joshua Gans: If Operator is looking at websites for you, who is paying for the ads being shown to them? And if Operator sees ads, how might ads influence Operator?

My presumption is that ‘traditional’ ads that are distinct from the website are something Operator is going to ignore, even for new websites and definitely for known websites with apps. If you integrate messages into the content, that could be different, a form of (soft?) prompt injection or a way to steer the Operator. So presumably we’re going to see more of that.

As for the threat to the advertising model, I think we have a while before we have to worry about it in most cases. First we have to wait for AI agents to be a large percentage of web navigation, in ways that crowd out previous web browsing, in a way that the human isn’t watching to see the ads.

Then we also need this to happen in places where the human would have read the ads. I note this because Operator and other agents will likely start off replacing mostly a set of repetitive tasks. They’ll check your email, they’ll order you delivery and book your reservation and your flight as per OpenAI’s examples. Losing the advertising in those places is fine, they weren’t relying on it or didn’t even have any.

Eventually agents will also be looking at everything else for you, and then we have an issue, on the order of ad blocking and also ‘humans learn to ignore all the advertising.’ At that point, I expect to have many much bigger problems than advertising revenue.

What does the future hold? Will 2025 be the ‘Year of the AI Agent’ that 2024 wasn’t?

Alex Lawsen: OpenAI’s operator, from the sound of it, barely works when it comes to bunch of things. Luckily, as we all know, it’s really hard to go from ‘barely works’ to ‘works’ to ‘superhuman’ in AI, especially once you have the basic set up that gets you to ‘barely works’.

No, that never happens, and definitely not quickly.

Emad: My inbox is filling up rapidly with computer control agent launches coming shortly

Maybe should have an agent olympics to decide which controls my computer

Andrej Karpathy is excited in the long term, but thinks we aren’t ready for the good stuff yet, so it will be more like a coming decade of agents. Yes, you can order delivery with Operator, but that’s miles away from a virtual employee. Fair enough.

And as far as I know, they are still waiting.

Discussion about this post

Operator Read More »

Dead babies, critically ill kids: Pediatricians make moving plea for vaccines

chickenpox, health, measles, pertussis, pneumococcal, polio, robert f kennedy jr, tetanus, vaccines / 9u50fv / January 27, 2025

As federal lawmakers prepare to decide whether anti-vaccine advocate Robert F. Kennedy Jr. should be the next secretary of the Department of Health and Human Services, pediatricians from around the country are making emotional pleas to protect and support lifesaving immunizations.

The American Academy of Pediatrics (AAP) has assembled nearly 200 stories and dozens of testimonials on the horrors of vaccine-preventable deaths and illnesses that pediatricians have encountered over their careers. The testimonials have been shared with two Senate committees that will hold hearings later this week: the Senate Committee on Finance and the Senate Committee on Health, Education, Labor, and Pensions (HELP).

“I remember that baby’s face to this day”

In a statement on Monday, AAP President Susan Kressly noted that the stories come from a wide range of pediatricians—from rural to urban and from small practices to large institutions. Some have recalled stories of patients who became ill with devastating diseases before vaccines were available to prevent them, while others shared more recent experiences as vaccine misinformation spread and vaccination rates slipped.

In one, a pediatrician from Raleigh, North Carolina, spoke of a baby in the 1990s with Streptococcus pneumoniae meningitis, a life-threatening disease. “I remember holding a baby dying of complications of pneumococcal meningitis at that time. I remember that baby’s face to this day—but, thanks to pneumococcal vaccination, have never had to relive that experience since,” the doctor said. The first pneumococcal vaccine for infants was licensed in the US in 2000.

A doctor in Portland, Maine, meanwhile, faced the same disease in a patient who was unvaccinated despite the availability of the vaccine. “As a resident, I cared for a young, unvaccinated child admitted to the pediatric intensive care unit with life-threatening Streptococcus pneumoniae meningitis. This devastating illness, once common, has become rare thanks to the widespread use of pneumococcal conjugate vaccines. However, this child was left vulnerable…and [their parents] now faced the anguish of watching their child fight for their life on a ventilator.”

Kressly emphasizes that “One unifying theme of these stories: vaccines allow children to grow up healthy and thrive. As senators consider nominees for federal healthcare agencies, we hope these testimonies will help paint a picture of just how important vaccinations are to children’s long-term health and wellbeing.”

Dead babies, critically ill kids: Pediatricians make moving plea for vaccines Read More »

US‘s wind and solar will generate more power than coal in 2024

carbon emissions, coal, Energy, hydro, natural gas, nuclear, renewable energy, Science, solar, Wind / 9u50fv / January 27, 2025

We can expect next year’s numbers to also show a large growth in solar production, as the EIA says that the US saw record levels of new solar installations in 2024, with 37 gigawatts of new capacity. Since some of that came online later in the year, it’ll produce considerably more power next year. And, in its latest short-term energy analysis, the EIA expects to see over 20 GW of solar capacity added in each of the next two years. New wind capacity will push that above 30 GW of renewable capacity each of these years.

A bar chart, with the single largest bar belonging to solar energy. — The past few years of solar installations have led to remarkable growth in its power output. Credit: John Timer

That growth will, it’s expected, more than offset continued growth in demand, although that growth is expected to be somewhat slower than we saw in 2024. It also predicts about 15 GW of coal will be removed from the grid during those two years. So, even without any changes in policy, we’re likely to see a very dynamic grid landscape over the next few years.

But changes in policy are almost certainly on the way. The flurry of executive orders issued by the Trump administration includes a number of energy-related changes. These include defining “energy” in a way that excludes wind and solar, an end to offshore wind leasing and the threat to terminate existing leases, and a re-evaluation of the allocation of funds from some of the Biden administration’s energy-focused laws.

In essence, this sets up a clash among economics, state policies, and federal policy. Even without any subsidies, wind and solar are the cheapest ways to produce electricity in much of the US. In addition, a number of states have mandates that will require the use of more renewable energy. At the same time, the permitting process for the plants and their grid connections will often require approvals at the federal level, and it appears to be official policy to inhibit renewables when possible. And a number of states are also making attempts to block new renewable power installations.

It’s going to be a challenging period for everyone involved in renewable energy.

US‘s wind and solar will generate more power than coal in 2024 Read More »

Alien: Earth will bring the horror home

alien franchise, culture, FX/Hulu, Hulu, Ridley Scott, streaming television, Trailers, TV trailers / 9u50fv / January 27, 2025

Chandler’s character is named Wendy, and apparently she has “the body of an adult and the consciousness of a child.” The eminently watchable Timothy Olyphant plays her synth mentor and trainer, Kirsh, and here’s hoping he brings some space cowboy vibes to the role. The cast also includes Alex Lawther as the soldier named CJ; Samuel Blenkin as a CEO named Boy Kavalier; Essie Davis as Dame Silvia; Adarsh Gourav as Slightly; Kit Young as Tootles; and Sandra Yi Sencindiver as a senior member of the Weyland-Yutani Corporation. I think we can expect at least some cast members to end up as xenomorph fodder.

Alien: Romulus was a welcome return to the franchise’s horror roots, and Alien: Earth will bring the horror to our home planet. “There’s something about seeing a Xenomorph in the wilds of Earth with your own eyes,” Hawley told Deadline Hollywood in September. “I can’t tell you under what circumstances you’ll see that, but you’ll see it — and you’re going to lock your door that night.”

As for creature design, “What was really fun for me was to really engage with the creature, bring some of my own thoughts to the design while not touching the silhouette, because that’s sacrosanct,” he said. “But some of the elements as we know, whatever the host is informs what the final creature is. I just wanted to play around a little bit to make it as scary as it should be.”

Alien: Earth premieres on FX/Hulu this summer.

poster art featuring a grinning xenomorph — Credit: FX/Hulu

Alien: Earth will bring the horror home Read More »

Author name: 9u50fv

Harder, better, faster, stronger

Building the buoy

On location

Discussion about this post

Prompting alone isn’t authorship, Copyright Office says

Hundreds of AI artworks are copyrighted, officials say

New guidance likely a big yawn for AI companies

Promoting a controversial AI model

Discussion about this post

Seats of power

Discussion about this post

“I remember that baby’s face to this day”