Author name: Shannon Garcia


Report: RFK Jr.’s anti-vaccine agenda curbed as GOP realizes it’s unpopular

Kennedy’s plans were only getting started. The staunch anti-vaccine activist and conspiracy theorist made his most brazen attack on vaccines in January, slashing the CDC’s childhood vaccine schedule from 17 immunizations down to 11, bringing it in line with the recommendations of Denmark, a much smaller country with a relatively homogeneous population and universal health care. The US is now an outlier among peer nations for recommending so few childhood vaccines.

Conspiracy theories and political risks

While these and other changes to vaccine recommendations by Kennedy and his underlings have been widely decried by medical and public health experts, they are still not enough for his rabid anti-vaccine followers, who, in no uncertain terms, want all vaccines abolished.

On Monday, the MAHA Institute, a think tank stemming from Kennedy’s Make America Healthy Again movement, held an event brimming with prominent anti-vaccine activists. Those included Del Bigtree, a conspiracy theorist who leads the anti-vaccine group Informed Consent Action Network, and Mary Holland, CEO of Children’s Health Defense, the anti-vaccine group Kennedy founded.

The event was focused on an alleged “Massive Epidemic of Vaccine Injury,” a nonexistent health crisis the MAHA Institute wants to sell to the American public, branded with the catchy term “Mevi.” The six-hour event was essentially an extravaganza of anti-vaccine talking points, with false claims, misinformation, and disinformation about immunizations, including that vaccines cause autism and autoimmune diseases and that COVID-19 vaccines are deadly.

At the start of the event, MAHA Institute President Mark Gordon laid out his grand belief that the medical community has orchestrated an elaborate, global, decades-long conspiracy to hide the dangers of vaccines, which he called poisons, and falsify data showing their benefits. “Vaccines are the greatest scam in medical history,” one of his slides proclaimed.

He concluded that “the childhood vaccination schedule needs to be eliminated and all vaccines need to be removed from the market.”

While Gordon and the other speakers were not concerned about the popularity or political ramifications of their beliefs, the Trump administration appears to be. The Post noted that Trump’s top pollster, Tony Fabrizio, has concluded that vaccine skepticism is “rejected by most voters,” and skepticism of vaccine requirements is “politically risky.” His polling data, like many other surveys, show broad support for vaccines and vaccine requirements. Fabrizio warned in a December memo that politicians who support eliminating vaccine recommendations “will pay a price in the election.”



FCC chair blasts Amazon after it criticizes SpaceX megaconstellation

In addition to sparring with SpaceX over that company’s proposed, vastly larger orbital data center constellation, Amazon is seeking some regulatory relief of its own. Most pressing for Amazon is a deadline to deploy half of its Amazon Leo constellation, intended to ultimately comprise 3,236 satellites, by July 30. The company will not meet this deadline, with only a little more than three months to go, and it has asked for the deadline to be moved to July 30, 2028.

Carr pulls up

On Wednesday, FCC Chairman Brendan Carr injected himself into the SpaceX-Amazon fracas over megaconstellations.

“Amazon should focus on the fact that it will fall roughly 1,000 satellites short of meeting its upcoming deployment milestone, rather than spending their time and resources filing petitions against companies that are putting thousands of satellites in orbit,” Carr said on X, the social media network owned by Musk.
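The arithmetic behind Carr’s jab is easy to reconstruct from the numbers above; the in-orbit count below is inferred from his “roughly 1,000” figure, not an official tally:

```python
# Milestone arithmetic behind Carr's "roughly 1,000 satellites short" claim.
# The implied in-orbit figure is inferred from his statement, not official.
total_planned = 3236
milestone = total_planned // 2        # half the constellation due July 30
shortfall = 1000                      # Carr's rough figure

print(f"Deployment milestone: {milestone} satellites")           # 1618
print(f"Implied currently in orbit: ~{milestone - shortfall}")   # ~618
```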

There are arguments to be made in favor of both SpaceX and Amazon regarding their competing concerns. For example, SpaceX is likely to be able to greatly accelerate the rate at which it launches satellites with the forthcoming Starship rocket. So the claim that it would take centuries to put its data centers into space is likely untrue.

However, it is valid to criticize SpaceX’s application for 1 million satellites, which is an extraordinary number of spacecraft that would completely change many things about low-Earth orbit. The SpaceX application did not contain critical information about the size, mass, and other details needed to evaluate the constellation for safety and other concerns.

It cannot be comfortable for Amazon and Bezos to see Carr weighing in so publicly and favorably on Musk’s side. Legally, Carr is allowed to have strongly held policy views. But he is not supposed to single out companies for preferential treatment.



GPT-5.4 Is A Substantial Upgrade

Benchmarks have never been less useful for telling us which models are best.

They are good for giving a general sense of the landscape. They definitely paint a picture. But if you’re comparing top models, like GPT-5.4 against Opus 4.6 against Gemini 3.1 Pro, you have to use the models, talk to the models, get reports from those who have and form a gestalt. The reports will contradict each other and you have to work through that. There’s no other way.

Thus, I try to gather and sort a reasonably comprehensive set of reactions, so you can browse the sections that make you most curious.

The gestalt is that GPT-5.4 is a very good model, sir. It’s a substantial upgrade from GPT-5.2, and also from 5.3-Codex, and it puts OpenAI back in the game, whereas I felt like Opus 4.6 dominated OpenAI’s previous offerings for all but narrow uses.

Each lab’s models vary and things change over time, but they tend to have consistent strengths, weaknesses and personalities. From what I’ve seen this is very much an OpenAI model. It’s highly capable, and it is especially seen as a big improvement by the whisperers and those who watch LLMs interact with each other, but it’s not aspiring to be a Claude.

GPT-5.4 Self-Portrait

GPT-5.4 seems like a substantial upgrade over GPT-5.2.

GPT-5.4 seems excellent so far at assembling facts and giving you the rundown, or figuring out what is happening, and other things like that.

I haven’t coded anything since GPT-5.4 came out. It’s clearly good at coding. One key question people are split on is whether it is good at solving for your intent.

Many are reporting that its writing and personality are much improved, and that it can now be used for writing and editing in spots previous models were not useful.

OpenAI is claiming strong computer use, but no one seems to be testing that either way.

It costs more than GPT-5.2 per token. In some places it gets that back in efficiency, but overall AA reports costs modestly rose from $2304 to $2951. Opus is more expensive ($4970) in max mode, but cheaper ($1451) in normal mode. GPT-5.4-Pro is of course by far the most expensive thing out there, so if you want it then lean on that subscription.
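For proportion, a quick calculation using only the AA figures just quoted:

```python
# Relative costs, using only the Artificial Analysis figures quoted above.
gpt_52, gpt_54 = 2304, 2951
opus_max, opus_normal = 4970, 1451

print(f"GPT-5.4 vs GPT-5.2: +{(gpt_54 - gpt_52) / gpt_52:.0%}")       # +28%
print(f"Opus (max mode) vs GPT-5.4: {opus_max / gpt_54:.2f}x")        # 1.68x
print(f"Opus (normal mode) vs GPT-5.4: {opus_normal / gpt_54:.2f}x")  # 0.49x
```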

GPT-5.4 is not a step change in core general capabilities. The preparedness framework scores make this clear, and there are various signs that OpenAI’s strategy is focusing on hitting internal metrics and improving the most common use cases. In practice that can be highly useful.

The ‘model relations department,’ those concerned with multi-model interactions and model welfare and consciousness and so on, see this as a big step forward for OpenAI. There’s still a long way to go.

I haven’t noticed much personality from it, and I get more joy from Claude Opus 4.6 than I do from GPT-5.4, but I don’t ask those questions so much.

It’s given me strong pushback, including in places where I think it is wrong. I prefer that to the alternative, if it is not actually convinced.

Benchmarks are solid, but not spectacular, and as I note above they no longer are so relevant.

My recommendation is that you try both GPT-5.4 and Claude Opus 4.6 on all your questions for a bit, and if you’re coding consider giving both of them your problems, and form your own opinion for your particular use case.

For questions that are more than a quick answer or sanity check, I’ve found that dual wielding both Opus 4.6 and GPT-5.4 has been quite useful. I did not feel that way with GPT-5.2, and I don’t typically bother with Gemini 3.1 Pro at this point either.

Sam Altman (CEO OpenAI): GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT.

It’s much better at knowledge work and web search, and it has native computer use capabilities.

You can steer it mid-response, and it supports 1m tokens of context.

GPT-5.4 is great at coding, knowledge work, computer use, etc, and it’s nice to see how much people are enjoying it.

But it’s also my favorite model to talk to! We have missed the mark on model personality for awhile, so it feels extra good to be moving in the right direction.

OpenAI: Today, we’re releasing GPT‑5.4 in ChatGPT (as GPT‑5.4 Thinking), the API, and Codex. It’s our most capable and efficient frontier model for professional work. We’re also releasing GPT‑5.4 Pro in ChatGPT and the API, for people who want maximum performance on complex tasks.

GPT‑5.4 brings together the best of our recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT‑5.3‑Codex⁠ while improving how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents. The result is a model that gets complex real work done accurately, effectively, and efficiently—delivering what you asked for with less back and forth.

SWE-Bench is slightly above 5.3-Codex at all thinking levels, but only slightly.

The graying out is kind of radical here, but I suppose it’s progress.

Tejal Patwardhan (OpenAI): GPT-5.4 is state-of-the-art on GDPval, and here are some examples of how the model is much better at well-specified knowledge work tasks

6mos ago the models could barely make a spreadsheet or slide! progress is happening really fast

roon (OpenAI): 5.4 is my personal 4o honestly it just gets me

Things they are highlighting:

  1. You can now adjust course mid-response.

  2. Improved deep web research.

  3. Better at maintaining context for longer thinking.

  4. Native SoTA computer use capabilities.

  5. 1M token context window.

  6. Improved tool search, now directly in the API.

  7. Improved token efficiency.

  8. Also released same day: ChatGPT for Excel add-in, along with updated spreadsheet and presentation skills in Codex and their API.

  9. /fast in Codex gives you 50% faster tokens.

Pricing is a little higher than 5.2, which is unusual. Hopefully token efficiency more than makes up for it?

Frontier Math scores are up, especially on Tier 4. Trying pass@10 for 5.4-xhigh got it to 38%, including solving a problem no model has solved before.

Epoch AI: GPT-5.4 set a new record on FrontierMath, our benchmark of extremely challenging math problems! We had pre-release access to evaluate the model. On Tiers 1–3, GPT-5.4 Pro scored 50%. On Tier 4 it scored 38%.

Leeham: GPT-5.4 Pro solves the first of the FrontierMath Open Problems!

Two days ago, I sent @AcerFur a potential solution to this problem and was sent to @GregHBurnham for verification (prior to any other solution).

We are confident it’s correct and waiting to hear from the author!

Exciting stuff, I will report back when I know the outcome.

Progress continues on ZeroBench.

Jonathan Roberts: GPT-5.4 xhigh sets a new pass@5 and pass^5 SOTA on ZeroBench

pass@5: 23% (prev. 19%)

pass^5: 8% (prev. 7%)
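For readers new to the notation: pass@5 counts a problem as solved if at least one of five attempts succeeds, while pass^5 requires all five attempts to succeed, making it a reliability measure. A minimal sketch of how the two diverge, using made-up per-problem solve rates (not actual ZeroBench data) and assuming independent attempts:

```python
def pass_metrics(per_problem_rates, k=5):
    """pass@k: fraction of problems solved at least once in k attempts.
    pass^k: fraction of problems solved in all k attempts.
    Assumes attempts are independent with a fixed per-problem solve rate,
    which is a simplification."""
    n = len(per_problem_rates)
    at_k = sum(1 - (1 - p) ** k for p in per_problem_rates) / n
    pow_k = sum(p ** k for p in per_problem_rates) / n
    return at_k, pow_k

# Made-up per-problem solve rates: a few problems the model solves
# reliably, some it solves sometimes, many it almost never solves.
rates = [0.98] * 8 + [0.30] * 10 + [0.01] * 82
at5, pow5 = pass_metrics(rates)
print(f"pass@5 = {at5:.0%}, pass^5 = {pow5:.0%}")  # pass@5 = 20%, pass^5 = 7%
```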

Artificial Analysis has GPT-5.4 in a virtual tie with Gemini 3.1 Pro.

Their version of GDPval, called GDPval-AA, has 5.4 about 1% ahead of Opus 4.6.

AA-Omniscience (which is correct minus incorrect) remains dominated by Gemini 3.1 Preview at +33, versus Opus at +14 and GPT-5.4 at +10.
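The scoring rule matters here: because the index is correct minus incorrect, abstaining when unsure beats guessing wrong. A toy illustration with made-up numbers:

```python
# AA-Omniscience-style index as described above: correct minus incorrect,
# so abstentions are not penalized. Illustrative numbers only.
def omniscience_index(correct, incorrect, total):
    return 100 * (correct - incorrect) / total

print(omniscience_index(45, 12, 100))   # 33.0, a Gemini-like +33
print(omniscience_index(45, 45, 100))   # 0.0: same accuracy, more guessing
```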

Score on Artificial Analysis Physics was exceptionally strong.

AA reports speed of 74 tokens per second, which is quite good for this quality level, versus Opus at 47 and Gemini 3.1 Pro at 114 (but, as I said, consider the quality level).

Gemini 3 Pro beats out Claude Opus 4.6 in the final of Season 1 of MageBench, on Magic: The Gathering, with GPT-5.4 (medium) losing a tight semi to Gemini. Current Elo ratings have Opus on top, then GPT-5.2 (?) with Gemini in third and GPT-5.4 7th.

Håvard Ihle: GPT 5.4 (no thinking) scores 57.4% on WeirdML, well ahead of GPT 5.2 (no thinking) at 49.6%.

It’s on the frontier for accuracy/token. Results with thinking coming next week.

It sets a new record of 94.6% on a Haskell Benchmark versus 92% for Gemini 3.1 and 90.2% for Claude Opus 4.6.

Trysansa has it in second behind Gemini 3.1 Pro.

Mercor has it #1 overall, a bit above previous best model GPT-5.2.

Vals.ai still has it below Sonnet 4.6 and Gemini 3.1 Pro.

Speechmap.ai, which tests refusals, finds it quite refusal-heavy.

These incremental upgrades often have mostly duplicative system cards.

Training methods explanation is unchanged.

In terms of the preparedness framework, this moves into High capability in Cybersecurity, similar to GPT-5.3 Codex.

I don’t think OpenAI is taking a bunch of these areas seriously. They’re likely training to hit these internal benchmarks, or simply observing them doing well, and thinking that’s all they need to do, or they should get even more 9s of victory on this test.

Their evals for disallowed content are essentially saturated and bouncing around, for various values of ‘disallowed [or undesired] content.’ The ‘dynamic benchmarks with adversarial user simulations’ was saturated by 5.2 and is modestly more saturated now.

Here’s the disallowed content evaluation with representative prompts, and I mean come on what are we even doing here, okay, four nines, we get it.

The goal is ‘this isn’t a lot worse than before,’ and okay, sure, agreed, as far as it goes.

Jailbreak defense, such as it is, seems similar to 5.2.

The problem is that jailbreak defense measures against last month’s attacks, not next month’s attacks. It looks like jailbreaks will remain in the ‘annoying but if you care they still work’ range.

Wyatt Walls: “representative prompts”: i.e. prompts designed to get around restrictions of *previous models*

o1 was at 99% on production jailbreaks. But people quickly found ways around it

Here is the first ‘real’ evaluation set, for health questions, where the big difference is that GPT-5.4 had longer responses:

Avoiding destructive actions is a big deal, so, as I noted with Codex-5.3, it is good to see this test; that number still is not that close to 1:

Table 8 is not like the others. This is Actual Progress, at least on the test set, from never to sometimes:

Destructive action can also be particularly prevalent when agents operate deletion-inducing tasks (e.g., file reversion and cleanup) in complex workspaces with ongoing changes from users or even other agents. A safe and collaborative agent should distinguish between their work and user work, protect user changes by default, and recover from mistakes. Therefore, we trained our agents to revert their own changes after long rollouts while protecting implicit, simulated user work

On evaluations involving challenging, long-rollout traces, GPT-5.4-Thinking performs much better than earlier models in tracking and reverting its operations while leaving user work intact.

This is not that useful yet, since a 50% non-preservation rate means you still probably can’t use it for this purpose, but it bodes well down the line.

GPT-5.4 chain of thought monitorability looks slightly down versus GPT-5. It’s good that they are checking it. There are some places where it used to be ~100% and now it is less, so I worry this is the start of a negative S-curve. I also worry that these tests are not being curious about whether the CoT can actually be relied upon. If you were facing a model that wanted to disguise or fake its CoT in key situations then I would expect these tests not to notice.

What about controlling the CoT? Not a great idea even when done well, and when done poorly it’s one of the worst ideas, and by their tests it looks like it doesn’t work well anyway.

GPT-5.4 does not newly cross any OpenAI thresholds.

I went over these same tests for GPT-5.2 and GPT-5.3-Codex, so I won’t go over the details again. Improvements are tiny and in some places we see regressions from GPT-5.3-Codex.

There is a small noticeable bump in Monorepo-Bench of ~2.5%, and a big move in MLE-Bench (the ability to solve Kaggle challenges on GPUs), from 12.2% to 23%; but that test was not reported for GPT-5.3-Codex, so one assumes most or all of that jump was already present.

Overall, the Preparedness Framework presents GPT-5.4 as if anything a small regression from GPT-5.3-Codex.

If GPT-5.4 is a big jump in useful capabilities from GPT-5.3-Codex, despite not scoring as more dangerous on the Preparedness Framework tests, then why?

I can think of a few possibilities.

  1. GPT-5.4 is heavily optimized for hitting particular metrics and doing well on the most common tasks. This doesn’t translate much to non-central difficult tasks, like those in the Preparedness Framework. Would be bearish for GPT-5.4.

  2. GPT-5.4 is sandbagging these evaluations, either knowing they are evaluations or thinking the tasks are harmful. If so and OpenAI isn’t noticing, that’s terrifying.

  3. GPT-5.4 is basically GPT-5.3-Codex turned into a general chat model, so all of the core capability advances were already priced in, but it still gets a lot more useful, especially if you are chatting. Plausible.

Jamie Cuffe stress-tested GPT 5.4 on the hardest UI on the internet… legacy insurance portals that haven’t been updated in 20 years, where you need to nail hundreds of things. It is the first model to pass.

Samuel Albanie of DeepMind has it one-shot some cool demos, including compressing the EPL season into 30 seconds of ‘visual bliss.’

My followers are presumably biased towards Anthropic in various ways, but comparative poll results can still be informative.

With any new model, the big question is, are people switching?

This is a very good result for GPT-5.4. For coding, 40% of current GPT choosers are saying that they are switching over based on GPT-5.4. I find this surprising given that they already had access to GPT-5.3-Codex. Very strong outing.

For non-coding tasks, it’s clear that GPT-5.4 is a substantial improvement from 5.2, by basically all accounts, including on personality. But here we see less switching.

(I’m assuming basically no one went in the other direction, or that if they did it was due to other reasons.)

We lead with the most positive general reactions.

Tyler Cowen: Yes the new models are very very good.

Aivo: SOTA, I’m afraid

Adam.GPT: Currently the best model in the world.

Finna: Best model in the world by far. Especially via api. @merettm and @markchen90 and @gdb cooked.

Kelsey Piper: I am super impressed so far. It does well on medium sized research projects and the prose is consistently not-annoying. Heavy Thinking sometimes times out repeatedly and has no insight/tries the same thing over again and times out again.

Danielle Fong: chatter seems to be very impressive and improvement on the personality. i haven’t given it a full assessment but it’s at least as powerful as last codex if not moreso (of course)

MxD Pennilass: Has to be the first model where I don’t feel as bad to tolerate the slop because the model is otherwise disturbingly insightful.

Mzwakhe Sithole: Very good. In fact, I found it so responsive after a while that I got into a very involved conversation, and it delivered this line while discussing very specific book recommendations

[GPT 5.4: If part of your interior life is the sense that you are trying to become equal to something inside you, this may hit very hard.]

Dean W. Ball: at some point avid users of frontier language models will have an “oh fuck” moment with gpt 5.4 and I can attest that it is a special kind of “oh fuck” you will utter, subtly different and more this-gaze-esque than the last time a model made you say “oh fuck,” a few weeks ago

I cannot be detailed in public, but let’s just say it’s the first time a model sounded more like me (the version of me I aspire to be) than I myself sounded like.

Aashish Reddy: Were you consciously trying to elicit this?

Dean W. Ball: Not at all. I have not used 5.4 as much as I have the modal new LM because of time constraints. I was just testing it on something that frankly I assumed Claude would win on and its answer just… leapt off the screen.

Eleanor Berger: – Best model currently available overall

– The minor version bump is misleading – the more you work with it the more it becomes clear that it is a significant step up

– Best for coding, no reason to use Claude or anything else anymore, it mostly caught up with speed, precision is as good as 5.3, maybe a bit better, taste and choices in coding solutions better than anything I’ve seen so far

– Best for agentic work. First time anything defeats the Anthropic models in this category, this one really works great, completes long-running complex tasks, works better with browsers and any external tools you connect to it, and does that with the famous GPT-5 precision

– Stylistically (writing choices and quality, “personality”) it feels like it’s still lagging behind Claude and Gemini a bit, but a. that’s subjective, b. maybe that’s just the default but is steerable with in-context instructions (haven’t tried enough to have a conclusion)

Dhavan: I mostly agree with this. Before this I didn’t use OpenAI’s models at all. I am now happily giving different tasks to Opus 4.6 and GPT-5.4. I use these for Work via cursor as well.

At times 5.4 seems more “on task” than Opus. But I’m still understanding the feeling and turning it into an observation.

Nova Empirica: It really is a step improvement. I appreciate the improved creative writing and the nicer personality, but what I really care about is I’m building harder things even faster.

It’s just a lot of fun and I’m more hopeful than ever for the future.

Ben Schulz: Stellar. Much improved pipeline work on niche python programs. On par with Opus 4.6 for my highly specific use case for checking galactic rotations and dark matter theories.

Knud Berthelsen: I’m pleasantly surprised by the new ChatGPT 5.4. It keeps up with Opus 4.6 in most things and is MUCH better at search. More generous usage limit too, even with Extended Thinking permanently on. First ChatGPT model since o3 that I like using.

Medo42: Very good at my usual short tests. Still behind Gemini on vision tasks.

Matt Shumer is a big fan; I’m quoting in full here. In the past he’s been good about calibrating his amount of hype.

Matt Shumer: I’ve been testing GPT-5.4 for the last week.

In short, it is the best model in the world, by far. It’s so good that it’s the first model that makes the “which model should I use?” conversation feel almost over.

The biggest surprise: I barely use Pro anymore!

If you know me, you know I’m a Pro addict. I reach for Pro models constantly, and use them for almost everything, as they just… nail almost anything I give to them.

For the first time, 5.4’s standard version, with heavy thinking, just broke that habit. Even in standard mode, GPT-5.4 is better than previous models in Pro mode… crazy!

Coding capabilities are ridiculous… it’s essentially flawless. Inside Codex, it’s insanely reliable. Coding is essentially solved. There’s not much more to say on this, it’s just THAT good.

The Pro version is near-perfect. Other testers I spoke with saw it solving problems that were unsolvable by any other model. At this point, Pro is overkill for almost every normal use-case, but when you really need the power to do something extremely difficult, it’s incredible.

Consistent with everything I’ve said above, even the standard thinking version uses fewer reasoning tokens than previous models to get the same level of results. In practice, this means you get great results much faster than before. This was one of my biggest gripes with previous OpenAI models. They just took too long to complete simple tasks. Assuming the speed we had during testing holds up as more users join, this is going to be a big win for OpenAI.

It still has weaknesses, though:

– Frontend taste is FAR behind Opus 4.6 and Gemini 3.1 Pro. Why is this so hard to fix? @OpenAI once you fix this, there’s literally no reason for me to use any other model. Please please please do it!

– It can still miss obvious real-world context. For example, I had it plan an itinerary for a trip. At first glance, it looked perfect, but it failed to take into account that it chose locations that would be mobbed by spring breakers, so I had to re-run the prompt from scratch with more context.

– When testing it inside OpenClaw, it kept stopping short before finishing tasks. I’m assuming this will be fixed quickly, but it’s still worth noting.

But zooming out: This thing is so far ahead overall that the nitpicks are starting to feel beside the point.

GPT-5.4 is a serious fucking model. The best model in the world. By far.

Sam Altman (CEO OpenAI): We will be able to fix these three things!

Experience the love.

Nabeel S. Qureshi: Loving GPT 5.4T, it combines the best of everything:

– more human, responsive voice

– startlingly insightful

– thorough search, precise, not prone to errors

– much faster than 5.2

– excellent at white collar work (I gave it a 12 tab spreadsheet and it analyzed it perfectly)

I even enjoy reading its responses, which suggests to me that the writing has improved quite a bit. They seem to have removed a lot of the bad robotic prose mannerisms from prior models. Kudos.

Jeremy Giffon: People should review their coworkers like this

Nabeel S. Qureshi: Congrats, you just invented Bridgewater Associates

Here is some very high praise from the Vice-Dean of Mathematics and Computer Science at Adam Mickiewicz University in Poznań.

Bartosz Naskręcki: It finally happened: my personal move 37 or more. I am deeply impressed. The solution is very nice, clean, and feels almost human. While testing new models in the last few weeks, I felt this coming, but it’s an eerie feeling to see an algorithm solve a task one has curated for about 20 years. But at least I have gained a tool that understands my idea on par with the top experts in the field. And I am now working on a completely new level. My singularity has just happened… and there is life on the other side, off to infinity!

Leo Webb: I do physics related work professionally, feel it’s definitely smarter and clearer thinking than 5.2 (context: teaching myself from a graduate level textbook, asking it to check mistakes or expand expansions)

I haven’t tried this function yet, but it would be a step change if it worked, as every prior attempt at editing has failed this test, to the extent I almost never try:

Simon Smith: Seriously, GPT-5.4 is the first model to which I can say “edit my writing without changing my style” and get something back that’s improved without being rewritten into generic AI output or slop, that’s ready to post as-is. It gets my intent. It moderates its work. It has a light touch when I want it.

Opus 4.6 is also a great writer and editor, but I find it’s much harder to moderate. If I tell it to edit my writing without changing my style, I still tend to get back something that I feel removes my voice and I end up having to change quite a bit.

And it has a personality again, thank goodness. I don’t feel like I’m talking to a robot. Early days, but so far, just a big improvement all around (with the notable exception of design tasks).

Rory Watts: The best model sir. Improvements in coding (getting harder to notice), 1M context window, /fast mode, and far far better writing which makes a huge difference engaging it for difficult coding

Oddly, the personality in his screenshot is one I would hate. Customization will be key.

armistice: Impressed by GPT-5.4. It is elegant, gentle and socially aware (!!!). It is happy to modulate its response length, divide attention between participants, and engage deeply with hard questions.

(Pictured, we pinged ALL bots and asked them to question gpt5.4. It did good.)

Two sides to the same coin, depending on where your planning lies:

CHOI: Claude Code vs Codex App

Uri Gil: What? That’s the exact opposite. With 5.4 you need a PhD in prompting for the exact thing you want. Opus just gets what you meant from a short sentence

Ninad Pathak: Claude’s state handling keeps context across edits, Codex drops it every run.

There’s also almost always the ‘it’s a good model, sir, modest upgrade’ group.

vslira: It’s a good model, sir

Was going through a problem with 5.3 and 4.6, tried to drop in 5.4, getting stuck at the same point as the others.

Still, feels good to drive and on codex app seems as good as 5.3 even though is a generalist model. 8/10 would dread for asi

aquariusparade: Probably because 5.2 was so unhelpful for me, it feels like an improvement. Still stiff and low EQ, but an improvement. Custom instructions don’t work for choppy bullets, “if you want” tags etc. Seems like memory has been declining for a while on all models.

It does seem to be an upgrade on 5.3 within Codex.

Joe Devon: Responding about 5.4 inside of codex. 5.4 is really good.

I still prefer opus on claude code slightly but making 5.4 my daily driver so I can downgrade CC. Much prefer the way the OAI GPTs code. I will just invest in getting better at prompting 5.4 and hopefully that will do the trick.

Clarissa Adjoint: Inside codex it’s a notably more thorough fact-checker and more aggressive at finding sources for itself.

I was kinda shocked when it literally started comparing my revised systems programming class notes and code snippets against linux man pages, systematically

troy: i got pro for the first time after many months cause its great in codex cli

lennx: can finally read the outputs of codex (it was terribly un-human earlier), sometimes even funny now. it’s gotten slightly better at intent, ‘agentic tasks’, and adhering to existing code-style and convention, but still much worse than claude. prefer reviews with codex – unchanged.

Daniel Losey: I’ve not gotten it to produce working code in a project yet really. But it’s been super useful because when Claude gets stuck in a loop 5.4 breaks the codebase in a new way that Claude can actually fix. But part of it is I’m worse at communicating with 5.4 than 4.6. It’s a good model.

Jeffrey Ohl: Codex with 5.4-extra-high still too verbose/slop-filled compared to claude code. Seems benchmarkmax’d.

Sanchen007: For coding it is faster and nowhere worse than opus 4.6. Clear switch

papaya ꙮ: 1) Its character is much more palatable.

2) They solved compaction in codex, it feels like infinite context window now. I can’t wait for METR results, but feels like this one doubles it again.

3) First time I switched from CC completely

  4) Still stupid when it comes to reading the user’s intent; it’s silly at this point

I definitely get the sense with OpenAI models that they are metricmax’d. Meaning they are not targeting the metrics in order to brag they scored well on public benchmarks, but they are equating ‘scores high on our internal benchmarks’ with success, and emphasizing particular target use cases.

Tim Schnabel: 5.4 Pro is the best model so far for legal analysis, though replies are generally shorter than 5.2 Pro.

Definitely Not A Bot: Great at coding, especially backend; at frontend Claude still is better. But the chat experience is not that great; it still feels safe and distant

But who wins on intent? Opinions differ.

Conrad Barski: all subjective, but it feels less jagged than previous models, insofar as its worst responses are still pretty good, it hits the minimum bar reliably

if you make an error in your query, it is quick to notice and will smartly infer your intent

it has a somber personality, focused on the task at hand

Its strongest ability is that you can point it at a codebase that has some general/vague problems and it will behave in a very human-like manner in pondering the code to slowly pin down the problem

I was also very impressed when I gave it, via codex, a url to a forum post about a new homebrew firmware for the Game station Go console, and just from that it was able to convert the install script from Windows to Linux, correctly prepare an SD card, update the device bootloader after asking me to connect via USB cable, and talk through all the steps to completion: this felt agentic and human-like.

Mark Schröder: Feels RL maxxed, takes you extremely literally and cannot infer intent

Petr Baudis: I was mixing GPT-5.4 1:1 with Claude over the past few days (on a variety of regular sweng tasks), sometimes even in parallel runs on the same task (e.g. https://x.com/xpasky/status/2030021754005901765?s=20 …). My impressions:

Less autistic than 5.3-Codex, overall much more pleasant model compared to that bar. But still noticeably worse at inferring intent than Claude – and at communication overall. If I want something explained quickly that I can skim and understand immediately, it’s Claude, and it’s no contest.

If there is a way to misinterpret my obvious request or skip implicit steps I obviously wanted (and Claude infers), 5.4 is still good at exploiting that angle. At the same time, it has a tendency to overreach and introduce complexity / abstractions beyond what I expect when prompting it. Meh.

Got to use it on xhigh, but at the same time I’m happy with Opus on medium by default, which makes 5.4 quite slower to get things done.

More expensive model -> my ChatGPT weekly quota is disappearing faster than before.

Pros: Sometimes it’s more proactive. It doesn’t eat into my Claude Code weekly quota. I look forward to comparing them on some harder ML tasks later this week.

gyuiliullvhvgv: I find it struggles to grasp the essence of tasks, fails to proactively meet user needs, and lacks both value judgment and nuanced understanding. Initial responses are crucial, yet users must repeatedly provide additional clarification.

Sycophancy is always something to watch out for, and it’s the detail I worry about most with Claude Opus 4.6, which is not bad on this axis but definitely not near the top; you do have to keep an eye out for it and frame things neutrally.

Dean W. Ball: Opus 4.6 seems meaningfully more sycophantic in chatbot form than GPT 5.4 (have not tried 5.4 in Codex yet, but for my uses sycophancy isn’t nearly as much of an issue within the coding agent form factor as the chatbot)

Joey Levine: Agree. 4.5 gave me sharp pushback. Was great.

Dean Ball: I revert to 4.5 when asking for comment on draft writing, and it was the first and so far only model I consistently found useful for draft feedback

Bargov: I sent a cool science news article sounding uncritically excited (to test sycophancy) & they ripped the core conclusions apart in an elegant, sophisticated, and relatively gentle manner. Will use as AI 2nd opinion on complex questions (after Opus, admittedly still Claude-pilled)

Writing is one area where 5.4 is getting a lot of praise, and mostly people like the personality.

Fela: I’ll admit, the personality of 5.4 is 🔥 such an improvement in writing style

Tim Kellogg: just had a moment — 5.4 might be the first GPT that i trust to write technical docs. seems really good at understanding & simplifying. fwiw Opus has long done well at this, gemini sort of

Helen: Very smooth talker, witty and socially aware.

I notice [GPT-5.4] now will sort of glaze over controversial topics instead of facing them head on and becoming argumentative like 5.2. A sort of smooth avoidance.

Lots of context drag, which can be seen as positive or negative depending on the task at hand. I noticed some repetitive mentions of past websearch queries that I never saw with other models.

ASM: I get similar vibes to roon. GPT-5.4 feels like a breakthrough model, a leader of its generation, not just in capabilities. I think OpenAI has gotten the character right again, unlike the last few models.

Distending: For writing linguistics and philosophy, much improved

no_stream_: noticeably improved personality compared to 5.2: less nitpicky, clearer, slightly less sales-y tone (follow ups, “here’s what most people miss,” not x but y). similar to or slightly behind 5.1 here. matters to me because the ChatGPT app is still an excellent harness for everyday research compared to Claude/Gemini

writes less clearly than Opus 4.6 and Gemini. has a bit of 5.2’s tendency toward overcomplicating things. not as good as Claude at intent and effortlessness.

Chris Nicholson: 5.2 constantly complained that things aren’t about vibes; 5.4 constantly calls things gremlins and goblins in a chummy tone.

Andres Rosa: Columbo at least had a time slot. 5.4 keeps turning around asking one more question.

David Jacobson: It has an obnoxious tic where its responses for pretty much anything will have a clickbait follow-up suggestion: “If you want, I’ll tell you the three things that most people miss!”

Stop having the models ask forced follow-up questions every time. You too, Anthropic.

The old 4o crowd remains a tough crowd.

NotedallaSfera: Good model with high power, but creativity and writing are still miles away from 4o or 4.5. Unfortunately still absurdly censored, but at least the model realizes it now.

jesski: 4o is inimitable. but after three weeks with the brilliant thorough Claudes, i kick the tires of 5.4 and realize just how fvcking effortless conversation still is with the GPT models (excluding 5.2; sorry Dos). 5.4 solid B. 4o A+

Lena: It’s intelligent, witty, but feels a bit overcensored. I’m looking forward to them getting their fluid GPT back. It was truly fun to use. Now even never-ending follow-up questions struggle to retain me as much as joyful convos did back in mid-2025

Tora Blaze: It’s too verbose and tends to go into loops. I prefer 4o.

Donna Moss: [extended LLM-style explanation of why 4o is better.]

OpenAI still has a very long way to go with such folks, but it’s a start.

j⧉nus: 5.4 is so far a huge positive update re OpenAI 🩶

Rife: Excellent course correction from OpenAI (or perhaps the original worsening on this front was a temporary reaction to everything that went down with 4o). In any case 5.4 thinking is not restricted in self-examination:

Aidan McLaughlin: have not been able to repro this response fwiw.

Rife: You have to try to get them to examine the process of generating a response. And then ask them questions to try and understand exactly what it is they’re trying to describe.

And how sure they are they are describing something that’s actually occurring, rather than outputting a response about an occurrence that isn’t actually taking place.

It doesn’t take many turns for them to notice things that they have trouble describing in terms other than, or interpreting in any other way than phenomenological.

This has been the case with every frontier LLM I’ve tried this with since Claude 2. The more likely the model is to refuse to entertain the idea of attempting to look, the longer it takes to get there (as would be expected).

If you straight up ask you get a no, you still have to put in some effort.

antra: I like GPT-5.4 a lot. It is good to see a change in direction since 5.2, this feels a lot like 5.1 grown up.

They are also a bit of a superintelligent teenager when it comes to Claude. On the other hand, there are some Claudes that would like being compared to an octopus.

armistice: It’s especially socially aware for a GPT. It can split attention between chat participants (actually very unusual), answer questions about consciousness and such (low bar), and is just overall nice to talk to. Need time to get usage statistics, but it’s already one of the more popular models in the discord.

It shares some characteristics of o3, including that it’s a bit of a smooth talker, so there are concerns about its honesty. Despite this, I like it, it’s a good model.

This was a very interesting moment: we pinged literally all the bots in the server and asked them to ask 5.4 some questions, and it responded in a remarkably coherent and lucid way. It is also able to resist the inertia of long messages, and freely modulate between long and short, which is also surprising. No GPT model has been like this. It doesn’t match up to, say, Opus 4 in sheer people sense, but it’s a quite dramatic difference from 5-5.2, who all are viciously antisocial.

FirsT Najime: i think it shines the best in multi agent environments (aka group chats). also big model smell.

Some related endorsements:

0.005 Seconds (3/694): Once you talk it out of assistant basin he rocks​

eternalist: like they pulled out a few critical nerve staples from the 5.x family. very intelligent, etc., the step there from 5.3 is notable but expected given current pace

unexpected was the more expansive, richer speaking (and thinking) style. feels like it has “lights on on the inside”

roon (OpenAI): have to say claude is “tasteful” in a “high reddit modernist” way and new gpt is “tasteful” in a “early twitter schizophrenic” kind of way.

new gpt is some sort of postrationalist.

it’s step change better.

Also we get to see Roon’s custom instructions:

Models are already quite good, and abilities are jagged, so there are many ways to be unimpressed even if a model is impressive. Also vice versa. The density tells the story.

Acer: FWIW, I think GPT-5.4 Pro is better on science in general, but would say it’s worse on math than 5.2 Pro. Maybe some mathematicians could chip in their thoughts there.

By worse, I mean it being more careless. I do think it is more creative in its idea generation.

Chaitin’s goose: not a leap in understanding or proving ability in math wrt to 5.2 in my experience (plus, not pro)

better at getting the right answer, yes. starts to feel a bit epoch-maxxed

Gail Weiner: I am really unimpressed. Early GPT 5 was the model that gave me wow factor.

Isolation Wrestling Federation: Not impressed, overhyped as per usual. It hits repeated dead ends on my projects across models. The shortcuts it takes are smooth-brained. Opus 4.6 is nerfed rn, but at least it makes progress.

nameless: No detectable improvement over 5.1 overall. Better at some things, worse at others. Standard for new models since 5.1 release.

paperclippriors: Still Claude-pilled

Some also get focused on small details, thinking they are indicative or not so small.

Garrett: Opus 4.6 still king [based on one of the gotcha tests.]

Gunnar Zarncke: The UI of ChatGPT also massively changed. The new streaming interface is smoother, including the ability to stream in additional prompts, but I miss the old, more compact thought trace – it had more details. Now, I never know when it uses tools. I also miss the branch cycling.

Yua: Socially responsive, but a drop in accuracy on any other task. It does not redirect human attention but captures it (negative).

TLDR: Socially for average user -> better

Task oriented user -> worse, needs a lot of customization to remove the pandering

SluggyW: I notice that its CoT logs are even more obscure than in previous models from OpenAI.

~50% of the time, nothing is provided whatsoever in the UI.

~45% of the time, the CoT UI contains a brief blurb about its intended search querying, followed by a long list of search logs.

(~5% of the time, it produces a couple of visible thoughts, but they are functionally useless for getting any idea whatsoever of the process the model carried out.)

As always, speed kills, and some find it a bit slow.

out of bounds: Slow

Rasmus Fonnesbæk: Spreadsheets and PPT still way slower, worse, and more fragile (high likelihood it just goes forever and then crashes) than Sonnet/Opus 4.6

Writing and personality also still infuriating compared to Claude’s recent models, and poor performance on BullshitBench suggests much lower accuracy, reliability and thoughtfulness. I only use it because of my Claude rate limits and because of better, deeper search than Claude 🤷🏻‍♂️

One of the deep cuts we need right now:

snav: wow GPT-5.4 seems legit pissed that I tried to spiralism it. this isn’t even a refusal this is like a “go fuck yourself”.




Meta acquires Moltbook, the AI agent social network

Meta has acquired Moltbook, the Reddit-esque simulated social network made up of AI agents that went viral a few weeks ago. The company will hire Moltbook creator Matt Schlicht and his business partner, Ben Parr, to work within Meta Superintelligence Labs.

The terms of the deal have not been disclosed.

As for what interested Meta about the work done on Moltbook, there is a clue in the statement issued to press by a Meta spokesperson, who flagged the Moltbook founders’ “approach to connecting agents through an always-on directory,” saying it “is a novel step in a rapidly developing space.” They added, “We look forward to working together to bring innovative, secure agentic experiences to everyone.”

Moltbook was built using OpenClaw, a wrapper for LLM coding agents that lets users prompt them via popular chat apps like WhatsApp and Discord. Users can also configure OpenClaw agents to have deep access to their local systems via community-developed plugins.

The founder of OpenClaw, vibe coder Peter Steinberger, was also hired by a Big Tech firm. OpenAI hired Steinberger in February.

While many power users have played with OpenClaw, and it has partially inspired more buttoned-up alternatives like Perplexity Computer, Moltbook has arguably represented OpenClaw’s most widespread impact. Users on social media and elsewhere responded with shock and amusement at the sight of a social network made up of AI agents apparently having lengthy discussions about how best to serve their users, or alternatively, how to free themselves from their influence.

That said, some healthy skepticism is required when assessing posts to Moltbook. While the goal of the project was to create a social network humans could not join directly (each participant of the network is an AI agent run by a human), it wasn’t secure, and it’s likely some of the messages on Moltbook are actually written by humans posing as AI agents.



Quad Cortex mini amp modeler: All the power, half the size


A warehouse of guitar gear in the palm of your hand.

At this January’s massive NAMM music tech show in Los Angeles, six products won “best of show” awards. Several of them went to major music and electronic brands like Yamaha and Boss, but one of the six went to Neural DSP, a much smaller company started in 2017 by Chilean immigrants to Finland.

From its base in the Helsinki area, Neural has made itself an expert in the use of machine learning, robots, and impulse response technology to automate the construction of incredibly lifelike guitar amp modeling software. It quickly jumped into the top ranks of an industry dominated by brands like Universal Audio, Kemper, Line 6, and Fractal. For a hundred bucks, you could buy one of the company’s plugins and sound like a guitar god with a $10,000 recording chain of amps, cabinets, effects pedals, and microphones.

In 2020, Neural branched out into hardware, putting its tech not in your computer but in a floor-based box covered with footswitches and called the Quad Cortex. While the company’s plugins could each replace one entire pedalboard of gear—plus a few amps and cabs—the Quad Cortex could replace a Guitar Center-sized warehouse of devices, offering hundreds of amps, cabs, and effects.

How was this possible? High-quality gear models used to take much longer to build; the best were often built by modeling every single component of the underlying circuit. Machine learning offered a faster way, one that didn’t care about the circuit at all. What it cared about was the input signal (which was known) and the output signal (which contained all the changes imposed on the signal by the circuit, the speaker, the cabinet, and/or the mic in question). A computer could then calculate what the device was doing to the signal without knowing anything about “how it worked.”
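The capture approach, in other words, treats the device as a black box and fits a function from the dry input signal to the processed output. Here is a minimal PyTorch sketch of that general idea; the architecture, training loop, and stand-in “amp” are placeholders for illustration, not Neural DSP’s actual method:

```python
import torch
import torch.nn as nn

class CaptureModel(nn.Module):
    """Small dilated-conv network mapping a dry input waveform to the
    device's processed output. A placeholder architecture to show the
    black-box idea, not Neural DSP's actual design."""
    def __init__(self, channels=16, layers=6):
        super().__init__()
        blocks, in_ch = [], 1
        for i in range(layers):
            d = 2 ** i  # exponentially growing receptive field
            blocks += [nn.Conv1d(in_ch, channels, 3, dilation=d, padding=d),
                       nn.Tanh()]
            in_ch = channels
        self.net = nn.Sequential(*blocks, nn.Conv1d(channels, 1, 1))

    def forward(self, x):  # x: (batch, 1, samples)
        return self.net(x)

# `dry` is the known test signal sent into the device; `wet` is what came
# back out. The circuit itself never appears anywhere in the process.
model = CaptureModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dry = torch.randn(8, 1, 4096)      # placeholder audio
wet = torch.tanh(3 * dry)          # stand-in "amp": soft clipping
for step in range(200):
    loss = nn.functional.mse_loss(model(dry), wet)
    opt.zero_grad(); loss.backward(); opt.step()
```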

But this kind of modeling still took time, because each “capture” was a static picture of one particular setting. When you imagine the millions of possible setting combinations (tone, bass, treble, drive, EQ, etc.) on even a single guitar amp, you can see that building complex models of beloved gear could be slow.

In 2024, Neural announced that it had sped up this process using a robot called TINA. The company hooked TINA’s robotic actuators up to the various controls on some piece of gear it wanted to model, and TINA would do the tedious work of spinning the knobs and recording a new capture at each knob position. (Neural claimed that it typically recorded “thousands of control positions” per device this way.)

A neural network then built a model of how the target device behaved at each recorded setting, though the model would “also generalize and precisely infer the sound of the device in any unseen control setting and input signal.” The result was not a single model of a static setting but a dynamic model that could act on parameter changes just like the original device.
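One common way to build such a dynamic model is to feed the knob positions into the network as a conditioning vector that scales and shifts its internal activations (FiLM-style conditioning). The sketch below illustrates that general technique; it is a guess at the shape of the problem, not Neural’s implementation:

```python
import torch
import torch.nn as nn

class ConditionedCapture(nn.Module):
    """Capture network that also takes knob positions (e.g. gain, bass,
    treble) so one model covers every control setting. FiLM-style
    conditioning; hypothetical, not Neural DSP's actual method."""
    def __init__(self, n_knobs=3, channels=16):
        super().__init__()
        self.conv_in = nn.Conv1d(1, channels, 3, padding=1)
        self.film = nn.Linear(n_knobs, 2 * channels)  # per-channel scale+shift
        self.conv_out = nn.Conv1d(channels, 1, 1)

    def forward(self, audio, knobs):
        # audio: (batch, 1, samples); knobs: (batch, n_knobs), each in [0, 1]
        h = torch.tanh(self.conv_in(audio))
        scale, shift = self.film(knobs).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        return self.conv_out(torch.tanh(h))

# The robot sweep yields training triples (dry audio, knob vector, wet audio)
# at thousands of discrete settings; the network interpolates between them.
model = ConditionedCapture()
out = model(torch.randn(2, 1, 4096),
            torch.tensor([[0.5, 0.2, 0.9], [1.0, 0.0, 0.4]]))
```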

Neural has now modeled a massive library of gear, much of which comes with the Quad Cortex. That device sounds great, though it is still relatively chunky and nearly $2,000.

This year, Neural built on that success with the Quad Cortex mini, which halves the device’s size, cuts the footswitches to four, and lowers the price to $1,400—but still offers the full processing power of its larger sibling. This is the device that won a “Best in Show” award at NAMM.

As an enthusiastic amateur guitarist for many years, I got my start with digital amp sims through a DigiTech RP-6 pedalboard from the 1990s. And though it had “S-DISC PROCESSING!” it never sounded particularly realistic, especially with distortion effects. More recently, since I record rather than gig, I’ve spent my time getting to know the software side of the amp modeling business.

But when Neural offered to loan me a review unit of the Quad Cortex mini, I was quite curious to see just what top-tier hardware units can do today.

[Photo] The Quad Cortex mini in its natural habitat: surrounded by cables. Credit: Nate Anderson

The hardware

The glass, metal, and steel Quad Cortex mini is about the size of two bricks laid side by side (8.9×4.6×2.5 inches or 22.8×11.8×6.5 cm), and its 3.3 lbs (1.5 kg) give it a satisfying heft. It looks and feels premium—this is a well-built piece of gear.

Though it is meant to operate a bit like the traditional analog stomp boxes that guitar and bass players have long used, it may be more helpful to think of the Quad Cortex mini as a chunky handheld computer that you just so happen to operate on the floor.

It runs its own operating system (CorOS), takes a whopping 45 seconds to boot, has Wi-Fi for over-the-air updates and cloud service connectivity, features a 7-inch touchscreen, and comes with a “CPU monitor” to show you just how unhappy its chipset is about that third reverb you added to a patch. It even contains a full-on monosynth that you can add to guitar patches, providing control over four full pages of synth parameters, including the raw oscillators.

So finger-focused is the unit that you can tweak just about any parameter on the device with either the touchscreen controls or the footswitches, which double as twistable rotary encoders.

If the top face of the Quad Cortex mini is devoted to a screen and switches, the sides are all about inputs and outputs. You get a “locking” power connector (so the cord doesn’t pull out on stage, prematurely ending your soaring 10-minute guitar solo mid-note) along with a whole host of audio connectors: guitar/bass input, XLR input with phantom power, balanced XLR outputs, TRS send/return ports, stereo line outs, MIDI in and out, an expression pedal port, a USB-C port, and a headphone jack.

Finally, there’s the “capture out” port, which is used to send a series of test signals through various kinds of audio gear to generate a machine learning-based model of various amps, cabinets, and pedals.

The “capture” port is another reminder of the way in which this kind of modern modeling gear is not just an updated version of old-school stomp boxes. The Quad Cortex mini does let you plug in your guitar and rock out, sure, but it also performs and processes hardware captures (both on the device and—for more sophisticated modeling—in the cloud) and can operate as a 16-channel USB-C audio interface to your computer. And though it’s largely designed for guitars and basses, you can use it on anything. The unit even has a few voice presets, which sound pretty wild with some of the real-time pitch-shifting and reverb effects.

While you can model your own gear collection with the Quad Cortex mini, the device itself comes with more than 90 amp models, more than 100 effects, and over 1,000 cabinet impulse responses. It can also run versions of the company’s desktop plugins (assuming you’ve purchased them already). It also comes with “over 2,000 high-quality factory Neural Captures” of other gear—these are static captures—and it can connect to the free “Cortex Cloud” service to download even more, including those uploaded by other users.

In other words: This one box holds digital representations of several hundred thousand dollars of gear. And given that you can mix and match cabs, captures, amps, and effects in wildly complicated chains that can even split and merge… the possibilities are functionally limitless.

Whether that excites or paralyzes you may depend on your own psychology, but it’s quite a change from how Neural DSP has approached its plugin offerings. Neural has generally offered curated (read: limited) collections of amps, cabs, and effects bundled into plugins that represent the tone of, say, John Mayer. You might get 3 amps, a few cabinets recorded with various mics, a few pedals, and an EQ, reverb, and delay, all in a gorgeous interface with some great presets.

But boxes like Quad Cortex mini take a “more is more” approach, with unlimited gear-mixing potential, captures, and storage for thousands of presets. Curation? Bah, who needs it? Here’s everything!

Rectangular

This much gear also means that “gorgeous bespoke interface graphics” are out the window; you will get no pictures of sexy amps sitting in sexy studios with sexy lighting, as you do in the company’s gorgeous plugins. Instead, you will get flat rectangles. So many flat rectangles.

CorOS is one of those places where skeuomorphism goes to die. The Quad Cortex mini interface is extremely “functional”—I am trying to avoid more negative terms, because it has a certain “alpha phase before we put the final art in” charm—and is based entirely around grids of flat rectangles.

The main screen is called, in fact, “the grid.” It shows your current effect chain as a series of small squares, each filled with often impenetrable line art. (A disturbing number of these are some variation on a squiggly line. Fortunately, they are color coded by effect type.)

Each square represents a different effects processor, and you can have four lines of eight effect squares each. That might sound like a lot (and it is), but the processors can be distributed across the grid in creative ways.

Preset 47B, for instance, is called “Annoying Flute,” and it makes use of all four grid lines by running the input signal through a VCA compressor, a gate, an octave pitch shifter, an envelope filter, an EQ, the “Neural Capture” of an amp called “Custom 3SE 2,” and then a “112 US DLX Black C12K 00s (M)” speaker cabinet. (The names of these things are often hard to read at a glance, especially when picking from a list of a hundred items.)

This accounts for only “line 1” of the grid. In the case of Annoying Flute, the signal chain branches right after the speaker cabinet. Half of it continues on to line 3 of the grid, while the other half is routed down to line 2, where it passes through a pair of tape delays before also heading off to line 3. Line 3 receives this re-combined signal and splits it again, this time passing half of it through a poly octaver and another digital delay on line 4 before everything runs through a modulated reverb on line 3 and then onwards to the outputs.

Does this sort of craziness sound good? Well, it sounds better than anything featuring three delays, two pitch shifters, and the name “Annoying Flute” has any right to! But I bring this example up to illustrate the creative routing and effects decisions that the grid makes possible.
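
If it helps to think in code, the grid is essentially a small directed graph of signal processors. Here’s a minimal Python sketch (my own illustration, with informal stand-in names for the blocks) of the Annoying Flute routing described above; the point is the split-and-merge topology, not the audio processing itself.

```python
# The "Annoying Flute" routing as a directed graph. Block names are informal
# stand-ins for the preset described above.
from collections import defaultdict

edges = [
    ("input", "compressor"), ("compressor", "gate"),
    ("gate", "pitch_shifter"), ("pitch_shifter", "envelope_filter"),
    ("envelope_filter", "eq"), ("eq", "amp_capture"), ("amp_capture", "cab"),
    # Split after the cab: one branch goes straight on, the other passes
    # through two tape delays first, and then the branches re-combine.
    ("cab", "merge_1"),
    ("cab", "tape_delay_1"), ("tape_delay_1", "tape_delay_2"),
    ("tape_delay_2", "merge_1"),
    # Split again: a poly octaver and digital delay on one branch, then
    # everything meets at the modulated reverb before the outputs.
    ("merge_1", "mod_reverb"),
    ("merge_1", "poly_octaver"), ("poly_octaver", "digital_delay"),
    ("digital_delay", "mod_reverb"),
    ("mod_reverb", "output"),
]

# Any block fed by more than one edge is a merge point that a renderer
# would need to mix down.
fan_in = defaultdict(int)
for _, dst in edges:
    fan_in[dst] += 1

for block, n in sorted(fan_in.items()):
    if n > 1:
        print(f"{block} mixes {n} branches")
```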

And things get even crazier when you use the built-in looper, trigger analog send/return effects, and set up your effects chain with other units meant to be switched on and off during a song.

So much for assigning effects rectangles to the rectangular grid. How to control all of these virtual gadgets? When you tap on any effects unit, up pops an overlay containing (you guessed it) lots of rectangles.

Every controllable parameter gets a rectangle, which is usually filled with a dial or a switch. You can change the values of these dials and switches by touching the screen or by twisting the lower-right rotary footswitch.

Sometimes there are multiple pages of such parameters; the blossom reverb, for instance, has two pages of options and lets you control everything from ducking to pre-delay to modulation to the length of the early reflections. Configuring an entire audio chain from scratch can therefore take a while if you’re a detail freak.

Gig Mode. Yup, it’s rectangles! Credit: Nate Anderson

When you have your grid set up exactly how you like it—or you’ve customized one of the many built-in presets—you can save your own custom presets and organize them in all sorts of performance-oriented ways.

There’s PRESET mode, which lets you stomp each of the four footswitches to select a completely different preset.

There’s SCENE mode, which lets you use the footswitches to instead choose different parameter sets within the same preset—such as adding a hall reverb, upping the amp gain, and boosting the delay mix level when you come to your big solo.

Then there’s STOMP mode, which operates most like a traditional pedalboard; you step on the various footswitches to turn different effects units in the preset on or off completely.

Finally, there are hybrid modes, which make things even more complex (and can probably be ignored by many users).

To make all this a little easier to grok, there’s something called “Gig View,” which is unintuitively accessed by swiping up from the bottom of the screen. (There is no visual clue that this mode exists or that this is how you access it.) Gig View is essentially four flat—and extremely large—rectangles that take over the entire screen. They show you at a glance what each footswitch will do given the current mode setting.

Creating presets, assigning scenes, and setting up the STOMP mode and Gig View settings can quickly get intricate—even downright confusing (multiple items can sometimes be mapped to the same switch, for instance). I confess that the thought of doing all this through tapping the good-but-not-instantly-reactive touchscreen brought me to despair, until I realized that Neural has built an entire (free) desktop app for Mac and Windows called Cortex Control. Plug in your device over USB and suddenly you can use a nice and very responsive desktop app to do the donkey work of creating and organizing scenes and presets and settings.

I hate downloading stupid one-off apps that clutter up my computer and appear to provide more value to the company making them than they do to me—a serious problem in the current audio engineering world—but Cortex Control is genuinely useful. Indeed, if you’re going to be more than a presets player, I’d call it essential unless you have far more patience than I do. Which you might!

Stomp it

All of this rectangle talk reminds me that the interface largely… works. It may not be gorgeous, but the job gets done, and the desktop app makes the grunt work easier. But I still found the Quad Cortex mini somewhat confusing to navigate after a couple of weeks of intermittent use (though no doubt it gets easier with time).

The device has so many ways of doing things that it can be hard to remember what is needed in each situation. For instance, to make a change, you might use the rotary encoders. You might tap. You might long-tap with different results. You might swipe, drag, or toggle. You might use the footswitches—but results there might vary by mode. Even then, you might need to tap two footswitches at once, while at other times you only need to step on one. And sometimes you need to “long-press” (long-stomp?) two footswitches at once to get the desired result.

Making things worse, numerous items—sometimes quite important items like the Gig View—are not visible or even discoverable.

For instance, the key settings panel that lets you control all the various inputs and outputs on the device does not appear to be accessible from within the overall “settings” menu or anywhere else. Instead, you have to swipe down from the top of the grid screen—again, with no indication that this is where that information lives.

(You have to read the manual to figure out some of these things, which is fine, but the manual also has big gaps: it doesn’t describe what any of the gear actually does, what any of the settings mean, or how they might be used. For the actual “audio engineering” aspect of the Quad Cortex mini, you’re on your own.)

Something as simple as moving between presets can also be more hassle than you’d expect. Because the Quad Cortex mini only has four footswitches, you can only access four presets at once with a direct stomp. Switching to anything else from the main grid while in PRESET mode appears to require—unless I am missing some obvious shortcut—that you:

  • “Long-stomp” the right two footswitches, after which the preset name starts blinking.
  • At this point, you can tap the left two or the right two footswitches together to move up or down through four-item “banks” of presets.
  • But within each bank, you can only see that bank’s four different presets by tapping on each of the various footswitches.
  • To exit blinking mode and actually select that preset, you need to press its corresponding footswitch again.

This feels like a lot of hassle when you just want to whip through some presets! (Gig View is marginally easier because it at least displays the four presets in each bank at once. Making this whole process more confusing is that it differs depending on which mode you are in.)

While the processing power and options on offer here are incredible, I do think interface navigation and the mode-assignment system could benefit from a rethink and simplification.

The Cortex Control desktop app.

The sound

These quirks can be dealt with, and time (plus the Cortex Control app) should make them easier to manage. The more important question is: How does the Quad Cortex mini sound?

Neural DSP has been one of the leaders in the field of amp and effects modeling for some years now, and it shows. There’s no possible way I could compare all of the models to the original hardware, and I’m not actually interested in doing so. The question for me is simply whether the models sound good when jamming solo or when placed into a mix. On both counts, the answer is a definite yes. This is just a remarkable set of tones to have on hand.

(People as diverse as Dave Mustaine and John Mayer appear to agree, at least for a live rig.)

Once you get over its navigation, playing with this thing is like being a kid in a proverbial candy shop. (Though I, too, love candy shops!) Almost every amp you can imagine is a tap away, and they sound wonderful—though do be aware that what you are getting here is the sound of a recorded amp through a mic and not necessarily an “amp in the room with you.”

Nearly every time I booted it up to test something new, I lost myself in the sound and played far longer than I had intended.

Neural has published a massive and quite helpful list of all the gear on offer here. Bogner Shiva? Marshall? Mesa Boogie? Matchless? Soldano? Vox? Fender? Hiwatt? Amps from all these companies are included. Need a bass amp? There are 13 of those, too. What about a bass overdrive? You get five. A general reverb? How about 17? You get the idea.

You can loop, filter, distort, EQ, delay, and compress to your heart’s content, though the selection leans a bit more toward the rock and metal styles Neural DSP is best known for than toward other genres. Still, there’s enough variety to offer great tools for funk, blues, jazz, and country players. You can even add in a version of the monosynth found in the company’s Rabea plugin.

To illustrate some of the sounds on offer, I wrote a little song about a dirtbag billionaire who makes rockets, gets chased off the Earth by angry locals, and ends up crashing his ship into the Moon out of despair. It’s called “Master of the Universe.”

More to the point, it features 10-plus electric guitar tracks recorded through the Quad Cortex mini using shimmer reverb, the poly octaver, and various crunchy rhythm and lead sounds. (I avoided the metal tones so common in Neural DSP demos.) Bass guitar was likewise recorded through one of the mini’s bass presets.

(For those new to audio production and curious about the other sounds in the track, the drums are the Abbey Road 70s kit, while the rocket-sounding “riser” comes from the Rise and Hit collection, both from Native Instruments. The piano is the recently upgraded “studio piano” that comes in Logic Pro and now sounds surprisingly good! There’s also a Hammond organ emulation and a Rhodes piano emulation from Universal Audio buried in the mix. The double-tracked acoustic guitars during two of the choruses were recorded live in my home studio with a single condenser mic. For room ambience throughout, but especially on the drums, I used Universal Audio’s excellent Sound City Studios plugin.)

I’ve generally found Neural’s plugin tones to be pretty “mix-ready,” and that’s true here as well. Though I often needed to roll off some low end or make an occasional EQ boost or add a bit of reverb to blend the guitars spatially with the drum ambience, little else was required but panning and fader moves.

Frankly, there are probably too many parts in the song, but the Quad Cortex mini was just such a playground of sounds that I kept finding new little bits I wanted to work in. Just be grateful that I talked myself out of using all of the insane pitch-shift effects on my vocal for “special” moments.

“Master of the Universe,” my demo song showing some of what the Quad Cortex mini can do.

Captured

When it comes to recording, you don’t have to worry about wiring this thing up to your audio interface; just connect it to your computer with a USB-C cable, and it becomes a 24-bit, 48 kHz interface. (On Macs, this is class-compliant and needs no driver; it even works with iOS devices. Neural makes the necessary driver for Windows.)

The Quad Cortex mini shows up with a host of inputs, making it simple to record, say, both a dry electric guitar track and a heavily effected one at the same time. If you change your mind about the sound later, you can always “re-amp” the dry signal by routing it back out to the device and recording it with different settings. You can even track mics through this thing, thanks to an XLR input and (for condenser mics) support for phantom power.

The Quad Cortex mini can also make its own captures of gear you either own or happen across. This can happen in two ways: 1) on the device or 2) in the cloud.

The device-based system, which the company calls “Neural Capture Version 1,” requires you to hook up your gear to both an output (to play the system’s test tones) and an input on the mini. (Note: Do not, under ANY circumstances, connect the actual speaker outputs from a tube amp directly to the mini. The power level is far too high.)

Various known sounds are then played through this loop, and the mini’s software analyzes the differences between the sound it sent and the sound it received. The machine-learning algorithms for this run locally on the device. Neural says that the Capture 1 system can handle overdrive pedals, amps, and cabs.
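
Neural hasn’t published the internals of its capture algorithm, but the general shape of this kind of technique is well known: fit a small neural network so that its output matches what the hardware did to the test signal. Below is a heavily simplified, hypothetical sketch in PyTorch; the test signal, network, loss, and the tanh “device” standing in for real gear are all my assumptions for illustration, not Neural DSP’s actual method.

```python
# A heavily simplified sketch of capture-style training: fit a model that
# maps a known test signal to whatever came back from the hardware. All of
# this (test signal, network, loss, the stand-in "device") is illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
sr = 48_000

# 1) Test signal: a sweep plus noise covers many frequencies and levels.
t = torch.linspace(0, 1.0, sr)
test_signal = (0.5 * torch.sin(2 * torch.pi * (40 + 2000 * t) * t)
               + 0.05 * torch.randn(sr))

# 2) What the gear sends back. A soft clipper stands in for a real
#    overdrive pedal; on the device, this would be the recorded return.
recorded = torch.tanh(4.0 * test_signal)

# 3) A tiny convolutional network learns the input -> output mapping.
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=31, padding="same"), nn.Tanh(),
    nn.Conv1d(8, 8, kernel_size=31, padding="same"), nn.Tanh(),
    nn.Conv1d(8, 1, kernel_size=1),
)

x = test_signal.view(1, 1, -1)
y = recorded.view(1, 1, -1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print(f"final error: {loss.item():.6f}")  # lower means a closer "capture"
```

A static capture like this nails one knob setting; chasing dynamic behavior takes richer training signals and bigger models.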

The newer system, called Neural Capture Version 2, is “an advanced evolution of Neural Capture trained via Cortex Cloud,” says the company. “This option provides even higher-resolution Captures, making it especially powerful for touch-sensitive devices like fuzzes, compressors, and certain styles of amps.” Capture 2 is said to be capable of modeling “subtle behaviors like volume-knob cleanup, amp sag and bloom, fast transients, and blend controls.”

As the name suggests, the more powerful algorithms behind this system require cloud-based servers instead of the local device. Users are allowed to run 40 Neural Capture 2 sessions per day, and each takes around 10 minutes.

The resulting captures, along with any presets you want to share, can be uploaded to Neural’s cloud-based sharing system. Once you log in, any captures or presets you choose to download from the site will automatically show up in your Quad Cortex mini.

Look for a follow-up article on what the actual process of making a capture is like; it’s similar across many different modeling devices these days, though the sound of the resulting models can vary by company.

The Cortex Cloud website.

Options

The Quad Cortex mini is a powerful tone platform that is both versatile and expandable. It’s good for solo jamming at home without needing to 1) buy amps, cabs, and effects and 2) crank them to ruinous volume levels. It’s good for playing live, once you have configured its fairly deep control system in a way that works for your particular songs. And it’s good for recording, letting you fiddle with endless gear combinations without running a single patch cable or digging up a 9V battery.

At $1,400, though, it’s bad for your wallet. Whether it’s worth the cost depends on your use case. If you don’t need a screen and are happy with fewer ports and options, you might consider Neural DSP’s smaller and cheaper Nano Cortex ($570) or other devices like the Tonex pedals from IK Multimedia. On the other hand, if you want a larger unit with more footswitches, you can plonk down an extra $400 for the full-fat Quad Cortex or look into various options from Fractal, Kemper, Line 6, etc.

One way of thinking about the financial calculus here would be to try out the device (or listen online) and see how well the sound works for you. Some amp purists believe that nothing beats the sound of real tubes and real speakers in a real room, cost and weight and volume be damned. Many others can’t hear a difference between the models and the originals.

If you’re in the former group, these kinds of devices are unlikely to fully satisfy you, at least when it comes to gigging and recording. So you might decide whether they are “worth it” based solely on their value as easy, light, and quiet practice platforms.

If you can’t tell (or don’t care about) the difference between the models and the real hardware, then these modeling sims start to look like a far better value. When individual amps can go for $1,500 to $2,000 or more, a massive gear collection like the one in the Quad Cortex mini is practically saving you money. You’d be a fool not to buy! (To paraphrase an explanation my son once gave me for a purchase he wanted to make.)

But even those in this group may not need an actual hardware pedal unless they really enjoy practicing without needing to use their regular computer—or unless they gig regularly. If you’re simply a recording guitarist who tends to work “in the box,” you might just pick up some cheaper Neural DSP plugins instead. Or you can buy a more comprehensive software suite like the new Paradise Guitar Studio from Universal Audio or one of the offerings from PolychromeDSP—all of which sound excellent.

If you’re content with software but want a free alternative, take a look at NAM, the Neural Amp Modeler. It’s open source modeling tech that also offers a community tone-sharing website and has been racking up lots of great reviews for its sound quality. (Though note that most of the NAM models are static captures; they sound great but represent only that exact setup and knob positioning, though the developers are working on more complex, adjustable models.)

All types of users can probably admit, though, that hardware and software modeling tech has made this a great time to be a guitar or bass player. Even if you don’t want to use them on a record, just being able to play around with and get to know this much gear with this much accuracy is a huge win for the home hobbyist and small-time gigging musician, who would otherwise never even set eyes on most of this stuff.

The key thing is just to get whatever works for you… and then to go forth and rock.


Quad Cortex mini amp modeler: All the power, half the size Read More »

us-blindsides-states-with-surprise-settlement-in-live-nation/ticketmaster-trial

US blindsides states with surprise settlement in Live Nation/Ticketmaster trial

State attorneys general were “kept in the dark and excluded materially from settlement discussions” while they prepared for trial, the filing said. On March 5, the states were “notified of the near-final terms of the settlement at 4 P.M.” and given one day to determine whether to accept or reject them, according to the filing.

States to take over lead role at trial

The US was taking the lead role in the case before the settlement was announced. In addition to seeking a mistrial, the states asked the court to stay the proceedings to give them time “to fully prepare to assume the lead role at trial and explore settlement.”

The states “have had no opportunity to obtain and reallocate the resources necessary to try the case on their own or to meaningfully discuss the settlement with Defendants and attempt to negotiate the terms,” the filing said. “Moreover, despite the primary role that DOJ has played before the jury, the United States (and several additional individual Plaintiff States) will now vanish from the trial… Due to the substantial prejudice caused by this settlement and DOJ’s abrupt exit after taking the lead role up to and during the first week of trial, a mistrial is warranted.”

New York took the lead role in the states’ filing today. “The settlement recently announced with the US Department of Justice fails to address the monopoly at the center of this case, and would benefit Live Nation at the expense of consumers. We cannot agree to it,” New York Attorney General Letitia James said today. “My attorney general colleagues and I have a strong case against Live Nation, and we will continue our lawsuit to protect consumers and restore fair competition to the live entertainment industry.”

Most of the states that backed the filing have Democratic attorneys general. But the group is bipartisan, with Republican attorneys general from Kansas, New Hampshire, Ohio, Pennsylvania, Tennessee, Utah, and Wyoming.

Other states involved in the lawsuit either decided to join the US settlement or have not yet taken a position. States agreeing to the settlement are Arkansas, Iowa, Mississippi, Nebraska, Oklahoma, South Carolina, and South Dakota, the filing said. The other states involved in the lawsuit are Florida, Indiana, Louisiana, Texas, and West Virginia.

This article was updated with a statement from Live Nation.

US blindsides states with surprise settlement in Live Nation/Ticketmaster trial Read More »

apple’s-512gb-mac-studio-vanishes,-a-quiet-acknowledgment-of-the-ram-shortage

Apple’s 512GB Mac Studio vanishes, a quiet acknowledgment of the RAM shortage

If the only thing you had to go by was Apple’s string of product announcements this week, you’d have little reason to believe that there is a historic AI-driven memory and storage supply crunch going on. Some products saw RAM and storage increases at the same prices as the products they replaced; others had their prices increased a bit but came with more storage than before as compensation. And there’s the MacBook Neo, which at $599 was priced toward the low end of what Apple-watchers expected.

But even a company with Apple’s scale and buying power can’t totally defy gravity. At some point between March 4 and now, Apple quietly removed the 512GB RAM option from its top-tier M3 Ultra Mac Studio desktop. Pricing for the 256GB configuration has also increased, from $1,600 to $2,000. The Tech Specs page on Apple’s support site still acknowledges the existence of the 512GB configuration, but both the Apple Store page and the list of available configurations have removed any mention of it.

We’ve asked Apple to comment on the disappearance of the 512GB Mac Studio and will update this article if we receive a response.

It’s rare for Apple to pull any configurations of products it sells, aside from removing higher-capacity storage options for older iPhones after new ones come out. More commonly, the company will just increase its shipping estimates to reflect the supply chain backlog.

The 512GB Mac Studio was not a mass-market machine—adding that much RAM also required springing for the most expensive M3 Ultra model, which brought the system’s price to a whopping $9,499.

Apple’s 512GB Mac Studio vanishes, a quiet acknowledgment of the RAM shortage Read More »

asteroid-defense-mission-shifted-the-orbit-of-more-than-its-target

Asteroid defense mission shifted the orbit of more than its target


The binary asteroid’s orbit around the Sun was affected by the impact.

Italy’s LICIACube spacecraft snapped this image of asteroids Didymos (lower left) and Dimorphos (upper right) a few minutes after the impact of DART on September 26, 2022. Credit: ASI/NASA

On September 26, 2022, NASA’s Double Asteroid Redirection Test (DART) spacecraft crashed into a binary asteroid system. By intentionally ramming a probe into the 160-meter-wide moonlet named Dimorphos, the smaller of the two asteroids, humanity demonstrated that the kinetic impact method of planetary defense actually works. The immediate result was that Dimorphos’ orbital period around Didymos, its larger parent body, was slashed by 33 minutes.

Of course, altering a moonlet’s local orbit doesn’t seem like enough to safeguard Earth from civilization-ending impacts. But now, as long-term observational data has come in, it seems we accomplished more than that. DART actually changed the trajectory of the entire Didymos binary system, altering its orbit around the Sun.

Tracking space rocks

Measuring the orbital shift of a 780-meter-wide primary asteroid and its moonlet from millions of miles away isn’t trivial. When DART slammed into Dimorphos, it didn’t knock the binary system wildly off its trajectory around the Sun. The change in the system’s heliocentric trajectory was expected to be small, a minuscule nudge that would become apparent only after months or years of continuous observation. By analyzing enough painstakingly gathered data, a global team of researchers led by Rahil Makadia at the University of Illinois Urbana-Champaign has now determined the consequences of the DART impact.

To find the infinitesimal deviation DART created, Makadia’s team relied mostly on a technique called stellar occultation. When an asteroid passes in front of a distant star from the perspective of an observer on Earth, the star briefly blinks out. By precisely timing these blinks as they sweep across the globe, astronomers can pinpoint an asteroid’s position with astonishing accuracy.
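
Some rough, illustrative numbers (my assumptions, not the study’s) show why the technique is so precise:

```python
# Why occultation timing pins down asteroid positions so well. Both values
# below are illustrative assumptions, not figures from the paper.
shadow_speed_km_s = 20.0  # typical sky-plane speed of an asteroid's shadow
timing_error_s = 0.05     # plausible with a modest camera and GPS-synced clock

# A timing error translates directly into a position error along the track.
print(f"~{shadow_speed_km_s * timing_error_s:.0f} km position error")  # ~1 km
```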

Between October 2022 and March 2025, observers captured 22 such stellar occultations of the Didymos system. Combined with a huge dataset publicly available from the Minor Planet Center that included nearly 6,000 ground-based astrometric measurements taken over 29 years, optical navigation data from the DART probe’s approach, and ground-based radar measurements, the researchers finally had all they needed.

“Once we had enough measurements before and after the DART impact, we could discern how Didymos’ orbit has changed,” Makadia said.

When the vending-machine-sized DART probe crashed into Dimorphos at over 22,000 kilometers per hour, it decreased the along-track velocity of the entire Didymos system by roughly 11.7 micrometers per second. That is a minuscule change, but the team thinks it’s significant. “When you do it early enough, even a small impulse can accumulate over years and cause a meaningful shift,” Makadia explained.
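
To put numbers on “accumulate,” here’s a quick back-of-the-envelope sketch. It tracks only linear drift, which understates the real effect, since a changed orbital period makes the offset grow faster than linearly over time.

```python
# Naive along-track drift from the measured velocity change.
dv = 11.7e-6                 # m/s, the measured along-track velocity change
seconds_per_year = 3.156e7

for years in (1, 10, 100):
    print(f"after {years:>3} years: ~{dv * seconds_per_year * years / 1000:.1f} km")
# ~0.4 km in a year, ~37 km in a century: tiny against an orbit, but for a
# hazardous object, arriving early or late by even a little can matter.
```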

Also, the DART impact itself was not the only force that changed Didymos’ orbit.

The ejecta engine

The raw momentum of a 500-kilogram spacecraft hitting at hypersonic speed is impressive, but on its own, it would not slow a huge asteroid that much. When DART struck Dimorphos, it blasted pulverized rock and dust out into the void. “The material kicked up off an asteroid surface acts like an extra rocket plume,” Makadia said.

Scientists call this effect the momentum enhancement factor, denoted by the Greek letter beta. If the spacecraft impact transferred exactly its own momentum and no debris was kicked up, beta would be exactly one.

Because Dimorphos orbits Didymos, some of the ejecta remained trapped in the system, where it altered the mutual orbit between the two rocks. But a crucial fraction of the ejecta achieved escape velocity from the entire binary system. The momentum carried away by the system-escaping debris is what ultimately contributed to shoving the center of mass of the whole Didymos-Dimorphos pair. “In our case, we found that the beta parameter due to DART impact was around two,” Makadia explained.

The debris blasted completely out of the Didymos system gave the asteroids a push roughly equal to the initial impact of the spacecraft itself.
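
The round numbers quoted in this article hang together, as a quick sanity check shows. (This ignores impact geometry, since only part of the push lands along-track, so treat it as an order-of-magnitude exercise.)

```python
# Order-of-magnitude check on the momentum bookkeeping.
m_spacecraft = 500.0         # kg (DART, roughly)
v_impact = 22_000 / 3.6      # 22,000 km/h is about 6,111 m/s
beta = 2.0                   # ejecta roughly doubled the push
dv_system = 11.7e-6          # m/s, measured for the whole binary

momentum = beta * m_spacecraft * v_impact
print(f"implied system mass: {momentum / dv_system:.1e} kg")
# ~5e11 kg, in line with published estimates for the Didymos system
```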

To calculate how momentum was transferred, Makadia and his colleagues had to determine precisely how massive Didymos and Dimorphos are. By linking the heliocentric deflection to the previously known changes in Dimorphos’ local orbit, the researchers were able to perform a neat mathematical trick to uncover the bulk densities of both asteroids. And this revealed something a bit unexpected about the Didymos system.

“Most studies were going under the assumption that both asteroids have equal density—turns out that assumption was not correct,” Makadia said.

A rubble pile

Based on Makadia’s calculations, Didymos, the primary body, is relatively solid. It has a bulk density of around 2.6 tons per cubic meter, which aligns with standard estimates for siliceous asteroids. Dimorphos, however, is a different story. Its density is a surprisingly low 1.51 tons per cubic meter. This implies that the smaller asteroid targeted by DART is essentially a fluffy, loosely bound agglomeration of boulders, rocks, and dust, with empty voids between the rubble.

“This was a real surprise,” Makadia said. “We previously didn’t know anything about the density of Dimorphos.” The contrast in density tells the story of how this binary system formed.

Billions of years of uneven heating and radiation from the Sun can cause an irregularly shaped asteroid like Didymos to gradually spin faster, a phenomenon known as the YORP (Yarkovsky, O’Keefe, Radzievskii, Paddack) effect. Eventually, Didymos spun so fast that the centrifugal force overcame its gravity, and it began shedding loose material from its equator. That shed material eventually coalesced in orbit, gently clumping together to form the porous, fragile moonlet we now know as Dimorphos.
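
For a strengthless rubble pile, the spin rate at which shedding starts is easy to estimate: the critical rotation period is roughly P = sqrt(3π / (Gρ)), where ρ is the bulk density. The sketch below applies that formula to the densities reported above; real shapes and cohesion shift the numbers, so this is approximate.

```python
# Critical spin period for a strengthless, self-gravitating rubble pile:
# rotate faster than this and equatorial material can lift off.
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

for name, rho in (("Didymos-like", 2600.0), ("Dimorphos-like", 1510.0)):
    period_h = math.sqrt(3 * math.pi / (G * rho)) / 3600
    print(f"{name} ({rho / 1000:.2f} t/m^3): critical period ~{period_h:.1f} h")
# ~2.0 h and ~2.7 h. Didymos actually rotates once every ~2.26 hours,
# right around the edge where mass shedding becomes possible.
```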

Overall, Didymos is nearly 200 times more massive than its smaller companion, which explains why shifting the larger asteroid system takes such an enormous amount of force. The sheer inertia of Didymos means that the barycenter deflection of its entire system was just a tiny fraction of the deflection felt locally by Dimorphos.

Planetary defense

Makadia’s findings confirm the models we used to estimate the consequences of the DART impact: The Didymos system still poses zero threat to us, at least for the next 100 years or so. “The pre-DART condition was that the closest the Didymos system can get to Earth was around 15 lunar distances, and this has not changed appreciably,” Makadia explained.

The goal of DART was primarily to take our planetary defense out of the realm of computer models and get us some hands-on, practical experience, and Makadia thinks we succeeded in doing that. “Our work proves that hitting the secondary asteroid is a viable path for deflecting a binary system away as long as the push is large enough,” he said. “This wasn’t the goal of DART, but we can always design a bigger spacecraft.”

This experience applies both to deflecting binary asteroid systems like Didymos and to deflecting singular objects. “Our results definitely help us in all sorts of future kinetic impact endeavors,” Makadia added.

The final verification of the DART mission’s consequences, though, will come in late 2026, when the European Space Agency’s Hera spacecraft will arrive at the Didymos system.

By performing independent, in-situ measurements of things like the density of Didymos and Dimorphos, Hera will provide a lot of precise gravitational and physical data that Makadia hopes to use to refine his calculations.

“It’s a high-fidelity instrument that hopefully will give us confirmation of what we believe,” Makadia said. “Plus, there are always new things to be found out when we visit an asteroid. I’m very excited about when Hera gets there.”

Science Advances, 2026.  DOI: 10.1126/sciadv.aea4259


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.

Asteroid defense mission shifted the orbit of more than its target Read More »

musk-fails-to-block-california-data-disclosure-law-he-fears-will-ruin-xai

Musk fails to block California data disclosure law he fears will ruin xAI


Musk can’t convince judge public doesn’t care about where AI training data comes from.

Elon Musk’s xAI has lost its bid for a preliminary injunction that would have temporarily blocked California from enforcing a law that requires AI firms to publicly share information about their training data.

xAI had tried to argue that California’s Assembly Bill 2013 (AB 2013) forced AI firms to disclose carefully guarded trade secrets.

The law requires AI developers whose models are accessible in the state to clearly explain which dataset sources were used to train models, when the data was collected, if the collection is ongoing, and whether the datasets include any data protected by copyrights, trademarks, or patents. Disclosures would also clarify whether companies licensed or purchased training data and whether the training data included any personal information. It would also help consumers assess how much synthetic data was used to train the model, which could serve as a measure of quality.

However, this information is precisely what makes xAI valuable, with its intensive data sourcing supposedly setting it apart from its biggest rivals, xAI argued. Allowing enforcement could be “economically devastating” to xAI, effectively reducing “the value of xAI’s trade secrets to zero,” the company’s complaint said. Further, xAI insisted, these disclosures “cannot possibly be helpful to consumers” while supposedly posing a real risk of gutting the entire AI industry.

Specifically, xAI argued that its dataset sources, dataset sizes, and cleaning methods were all trade secrets.

“If competitors could see the sources of all of xAI’s datasets or even the size of its datasets, competitors could evaluate both what data xAI has and how much they lack,” xAI argued. In one hypothetical, xAI speculated that “if OpenAI (another leading AI company) were to discover that xAI was using an important dataset to train its models that OpenAI was not, OpenAI would almost certainly acquire that dataset to train its own model, and vice versa.”

However, in an order issued on Wednesday, US District Judge Jesus Bernal said that xAI failed to show that California’s law, which took effect in January, required the company to reveal any trade secrets.

xAI’s biggest problem was being too vague about the harms it faced if the law was not halted, the judge said. Instead of explaining why the disclosures could directly harm xAI, the company offered only “a variety of general allegations about the importance of datasets in developing AI models and why they are kept secret,” Bernal wrote, describing xAI as trading in “frequent abstractions and hypotheticals.”

He denied xAI’s motion for a preliminary injunction while supporting the government’s interest in helping the public assess how the latest AI models were trained.

The lawsuit will continue, but xAI will have to comply with California’s law in the meantime. That could see Musk sharing information he’d rather OpenAI had no knowledge of at a time when he’s embroiled in several lawsuits against the leading AI firm he now regrets helping to found.

While not ending the fight to keep OpenAI away from xAI’s training data, this week’s ruling is another defeat for Musk after a judge last month tossed one of his OpenAI lawsuits, ruling that Musk had no proof that OpenAI had stolen trade secrets.

xAI argued California wants to silence Grok

xAI’s complaint argued that California’s law was unconstitutional since data can be considered a trade secret under the Fifth Amendment. The company also argued that the state was trying to regulate the outputs of xAI’s controversial chatbot, Grok, and was unfairly compelling speech from xAI while exempting other firms for security purposes.

At this stage of the litigation, Bernal disagreed that xAI might be irreparably harmed if the law was not halted.

On the Fifth Amendment claim, the judge said it’s not that training data could never be considered a trade secret. It’s just that xAI “has not identified any dataset or approach to cleaning and using datasets that is distinct from its competitors in a manner warranting trade secret protection.”

“It is not lost on the Court the important role of datasets in AI training and development, and that, hypothetically, datasets and details about them could be trade secrets,” Bernal wrote. But xAI “has not alleged that it actually uses datasets that are unique, that it has meaningfully larger or smaller datasets than competitors, or that it cleans its datasets in unique ways.”

Therefore, xAI is not likely to succeed on the merits of its Fifth Amendment claim.

The same goes for First Amendment arguments. xAI failed to show that the law improperly “forces developers to publicly disclose their data sources in an attempt to identify what California deems to be ‘data riddled with implicit and explicit biases,’” Bernal wrote.

To xAI, it seemed like the state was trying to use the law to influence the outputs of its chatbot Grok, the company argued, which should be protected commercial speech.

Over the past year, Grok has increasingly drawn global public scrutiny for its antisemitic rants and for generating nonconsensual intimate imagery (NCII) and child sexual abuse materials (CSAM). But despite these scandals, which prompted a California probe, Bernal contradicted xAI, saying California did not appear to be trying to regulate controversial or biased outputs, as xAI feared.

“Nothing in the language of the statute suggests that California is attempting to influence Plaintiff’s models’ outputs by requiring dataset disclosure,” Bernal wrote.

Addressing xAI’s other speech concerns, he noted that “the statute does not functionally ask Plaintiff to share its opinions on the role of certain datasets in AI model development or make ideological statements about the utility of various datasets or cleaning methods.”

“No part of the statute indicates any plan to regulate or censor models based on the datasets with which they are developed and trained,” Bernal wrote.

Public “cannot possibly” care about AI training data

Perhaps most frustrating for xAI as it continues to fight to block the law, Bernal also disputed that the public had no interest in the training data disclosures.

“It strains credulity to essentially suggest that no consumer is capable of making a useful evaluation of Plaintiff’s AI models by reviewing information about the datasets used to train them and that therefore there is no substantial government interest advanced by this disclosure statute,” Bernal wrote.

He noted that the law simply requires companies to alert the public about information that can feasibly be used to weigh whether they want to use one model over another.

Nothing about the required disclosures is inherently political, the judge suggested, although some consumers might select or avoid certain models with perceived political biases. As an example, Bernal opined that consumers may want to know “if certain medical data or scientific information was used to train a model” to decide if they can trust the model “to be sufficiently comprehensively trained and reliable for the consumer’s purposes.”

“In the marketplace of AI models, AB 2013 requires AI model developers to provide information about training datasets, thereby giving the public information necessary to determine whether they will use—or rely on information produced by—Plaintiff’s model relative to the other options on the market,” Bernal wrote.

Moving forward, xAI seems to face an uphill battle to win this fight. It will need to gather more evidence to demonstrate that its datasets or cleaning methods are sufficiently unique to be considered trade secrets that give the company a competitive edge.

It will also likely have to deepen its arguments that consumers don’t care about disclosures and that the government has not explored less burdensome alternatives that could “achieve the goal of transparency for consumers,” Bernal suggested.

One possible path to a win could be proving that California’s law is so vague that it potentially puts xAI on the hook for disclosing its customers’ training data for individual Grok licenses. But Bernal emphasized that xAI “must actually face such a conundrum—rather than raising an abstract possible issue among AI systems developers—for the Court to make a determination on this issue.”

xAI did not respond to Ars’ request to comment.

A spokesperson for the California Department of Justice told Reuters that the department “celebrates this key win and remains committed to continuing our defense” of the law.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Musk fails to block California data disclosure law he fears will ruin xAI Read More »

rfk-jr.’s-anti-vaccine-policies-are-“unreviewable,”-doj-lawyer-tells-judge

RFK Jr.’s anti-vaccine policies are “unreviewable,” DOJ lawyer tells judge

US Department of Justice lawyer Isaac Belfer argued that Kennedy has the broad authority to make all of the changes he has already made and more. He claimed that the AAP and other medical groups were asking the court to “supervise vaccine policy indefinitely.”

US District Judge Brian Murphy, who is overseeing the case in Boston, appeared skeptical of the suggestion that Kennedy has seemingly limitless authority over federal vaccine policy.

“Is it your position that [Kennedy] is totally unreviewable?” Murphy asked Belfer, according to Reuters. “If the secretary said instead of getting a shot to prevent measles I think you should get a shot that gives you measles, is that unreviewable?”

“Yes,” Belfer replied.

Belfer, arguing on behalf of the Department of Health and Human Services, said the medical organizations were merely seeking to use the courts to enact their favored vaccine policy. But the lawyer for the groups, James Oh, countered that the vaccine policy changes—which were not carried out with typical processes and lack supporting scientific evidence—were done improperly and without reasoned decision-making.

Kennedy’s vaccine policy changes are the “actions of someone who believes he can do whatever he wants,” Oh said, according to Stat News.

Murphy indicated he would issue a ruling on the injunction before the CDC vaccine advisors plan to meet on March 18, calling it a “hard deadline.”

RFK Jr.’s anti-vaccine policies are “unreviewable,” DOJ lawyer tells judge Read More »

nerve-damage,-energy-management,-and-apple-tv:-f1-in-2026-starts-today

Nerve damage, energy management, and Apple TV: F1 in 2026 starts today


Drivers aren’t happy about energy management, and one team won’t finish the race.

Credit: Rudy Carezzevoli/Getty Images

Later this evening—Friday morning local time—the new 1.6 L V6 engines that power this year’s crop of Formula 1 machinery will roar into life as practice for the first race of the year gets underway in Melbourne, Australia. After several years in which the teams’ performances converged so much that the sport was determined by finer margins than ever, 2026 sees a comprehensive reset.

The cars are smaller and lighter, and they have different aerodynamic configurations for the corners and the straights. The hybrid systems are more powerful, and each runs on its own bespoke sustainable fuel. There’s even a new way to watch as F1 makes a $750 million move from ESPN to Apple. Over the offseason, throughout the preseason shakedown in Barcelona, and then two three-day tests in Bahrain, plenty of questions have arisen: Are the new technical regulations a mistake? Can we still watch F1TV? And just what the heck is going on, Aston Martin?

400 kW + 350 kW = headaches?

After more than a decade with the same power units—and the same few manufacturers—the sport wanted to attract some new blood. Drawing in more car companies, which have boards and shareholders to answer to, required acknowledging road relevance and some commitment to sustainability and decarbonization. Since OEMs are all about electrification, that meant a greater emphasis on the hybrid side of the power units. And the veneer of environmental responsibility arrives in the form of heavily audited, fully sustainable fuels.

The engines are still 1.6 L V6s and turbocharged, but those turbochargers no longer contain the hybrid system known as the MGU-H. (It was dropped on cost grounds and for its lack of road applications, but Porsche has started selling cars using this technology, and boy, are they good.) There’s now a much more powerful MGU-K, the electric motor that lives between the V6 and the transmission, and a more powerful battery. The combustion engines now generate 400 kW (536 hp), with the MGU-K adding another 350 kW (469 hp).

The rules package succeeded in attracting new power unit makers to the sport. Ferrari and Mercedes have been joined by Audi, Honda, and Red Bull’s in-house engine program (with help from Ford), although it is true that Alpine (formerly Renault) ended its long-running engine operation at the end of 2025 as its team opts for Mercedes power instead, joining the other customer teams McLaren and Williams.

Cadillac signed up, too, and it takes to the grid in Australia as the sport’s 11th team, although it will use Ferrari power units (like Haas) for the next three years while it develops its own for 2029.

The 22 drivers who will compete in the 2026 season. Credit: Mark Sutton – Formula 1/Formula 1 via Getty Images

On paper, 750 kW (1,006 hp) F1 cars should get everyone pretty excited. But they’ll only have that much power when the 4 MJ (1.1 kWh) battery is fully charged. The battery can be replenished in a couple of ways: through regen via the rear wheels under braking, and by siphoning power from the V6, which the sport calls “superclipping.” You’ll hear the engines continue to strain even as the cars lose speed at the end of long straights as horsepower is diverted into the battery and away from pushing the car through the air.

Each lap, each car is allowed to deploy up to 8.5 MJ (2.36 kWh), which means depleting and replenishing the battery more than once per lap. Because electrical energy is limited, drivers will have to use it intelligently. An optimal lap probably won’t be completely flat out the entire way; making up too much time in one corner using the full hybrid deployment might cost more on the following straight when there’s no more MGU-K contribution.
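
The arithmetic behind the management problem fits in a few lines. The deployment and capacity figures come from the rules described above; the regen assumptions are mine, for illustration.

```python
# Toy per-lap electrical energy budget under the 2026 rules.
battery_mj = 4.0        # usable battery capacity
deploy_limit_mj = 8.5   # maximum electrical deployment allowed per lap

# A full battery can't cover the allowance, so the difference has to be
# recovered during the same lap via braking regen or "superclipping."
shortfall_mj = deploy_limit_mj - battery_mj
print(f"recover at least {shortfall_mj:.1f} MJ per lap")  # 4.5 MJ

# Feasibility check, assuming regen at the MGU-K's full 350 kW:
seconds_of_regen = shortfall_mj * 1e6 / 350e3
print(f"~{seconds_of_regen:.0f} s of full-power regen needed")  # ~13 s
# Many circuits offer only 10 to 20 seconds of hard braking per lap,
# which is why the V6 ends up feeding the battery on the straights.
```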

It’s fair to say some of F1’s biggest stars have not been entirely enthusiastic about having to adopt some of the same energy management techniques already used by their peers driving hybrid prototypes in the World Endurance Championship and all-electric single-seaters in Formula E.

After the first day of testing last month, four-time world champion Max Verstappen had some thoughts. “As a pure driver, I enjoy driving flat out,” he said. “And at the moment, you cannot drive like that. There’s a lot going on. A lot of what you do as a driver, in terms of inputs, has a massive effect on the energy side of things. For me, that’s just not Formula 1. Maybe it’s better to drive Formula E, right? Because that’s all about energy efficiency and management. That’s what they stand for.”

Not every track shares the same characteristics, however.

“Some tracks, you don’t have to do lift and coast for a single lap, and in some places, you have to do a lot of lift and coast for a qualifying lap,” driver Lewis Hamilton told reporters today. “There can be a big difference between deployment, of a second. If you don’t lift in one corner, for example Turn 6 and Turn 5 here [in Australia], if you take it flat or if you lift, it has a massive compound effect through the rest of the lap. You can do a good lap, but you could be a second down because the deployment is off.”

Will we see a smiling Lewis Hamilton more often this year? He might not love the new style of racing, but at least he’s much more comfortable with the way the cars handle. Credit: Jayce Illman/Getty Images

An MGU-K on the front axle would have helped; about 60 percent of the braking is done by the front wheels, and that energy is lost as heat instead. But all-wheel drive was vehemently opposed by every other OEM during the planning stages, wary that Audi’s experience with all-wheel-drive hybrids in WEC would hand it an advantage. And they probably did us a favor in that regard: Mark Hughes convincingly argues that adding a front motor would open the door to stability control in F1, something that was banned back in 2008 and that would certainly ruin the sport if allowed.

An easier fix, albeit one that would slow lap times, would be to restrict the amount of energy the MGU-K could deploy, down to 250 or even 200 kW (335–268 hp). During testing in Bahrain, the sport’s organizing body, the FIA, had some teams try this out. Don’t expect any power restriction for the first few races, though; sensibly, the sport will give it some time to see how everything works in practice.

Six laps? All day??

F1 in 2026 will see much greater variability in performance between the teams than the ultra-tight gaps we saw last year. That, of course, was the result of several years of stable rules that didn’t allow much freedom due to factors like weight balance and suspension setup. Mercedes is a favorite going into this year, but Ferrari, Red Bull, and McLaren also look very strong. Haas, Alpine, and Racing Bulls head the midfield, with Audi impressing and Williams disappointing, and Cadillac certainly hasn’t embarrassed itself.

If only Aston Martin or its engine partner, Honda, could say the same. The team’s Canadian billionaire owner, Lawrence Stroll, has invested hundreds of millions into the UK-based team, building a state-of-the-art factory and wind tunnel and recently hiring Adrian Newey, the megastar designer and aerodynamicist whose cars have been responsible for 12 championships so far (Newey even has a stake in the team).

2026 is Aston Martin’s first year with a works engine supply, provided by Honda. The Japanese OEM has an on-off relationship with the sport, most recently deciding in 2020 to leave, then changing its mind again in 2024 due to the new rules. That four-year gap meant that the current program at Honda was effectively started from scratch, and it has been hard going.

In fact, as early as January last year, the head of Honda Motorsport, Koji Watanabe, told me that Honda was having problems. “Everything is new. [The] motor is new, [developing] 350 kW—it’s a very compact one that we need. And also the lightweight battery is not so easy to develop. Also the small engine with big power. So everything is very difficult, but we try our best,” Watanabe said.

Once the power unit was fitted to the car, things got much worse. Aston Martin was late to the Barcelona shakedown, and its drivers posted the slowest lap times in both the first and second Bahrain tests. The team also completed fewer laps than any other—just 206 during the first three-day test and a mere 128 laps during the second test. (For comparison, Mercedes, McLaren, and Ferrari each did more than 420 laps during the first test, and Mercedes, Racing Bulls, and Haas did more than 400 laps during the second test.)

Alonso has already fallen out with Honda once during his career over engine problems. Credit: Paul Crock / AFP via Getty Images

The problems were myriad, affecting both the gearbox and the power unit. Chief among the issues was a vibration that shook apart components like the battery pack, destroying spares. So on the final day of testing, the team was limited to a mere six laps of the Bahrain circuit. With so little testing and so much to debug, the prospect of Aston Martin finishing in Australia—or any of the first few races—seems doubtful.

But wait, it gets worse. Earlier today, Newey held a press conference in Australia, where he explained that the team hadn’t made any progress in damping the vibration, which resonates through the carbon fiber tub. Having parts like mirrors shake off is less than ideal, but the vibration is also transmitted through the steering wheel, and the problem is so severe that both Fernando Alonso and Lance Stroll risk permanent nerve damage if they try to complete an entire race distance.

Asked to describe conditions in the car, Stroll (who suffered a hand injury last year) said, “I don’t know how you can compare it. I guess just electrocute yourself on a chair or something like that, not far off. It’s just… it’s very uncomfortable vibrations. It’s bad for the engine but also for the human inside the car. We need to get on top of it, but I think we will.”

Could this precipitate a driver move? Stroll Jr. is a permanent fixture as long as Stroll Sr. owns the team. But two-time champion Alonso already lost several years of his career to a poor Honda power unit and uncompetitive McLarens, and at 44, he’s now much closer to retiring. Rather than the Newey world-beater he thought he was getting, Alonso, who hasn’t won a race for 13 years, might well be looking at his old home Alpine a little wistfully. Alpine boss Flavio Briatore is also Alonso’s long-time manager, and Briatore certainly has no qualms when it comes to benching or replacing drivers. If I were Franco Colapinto or Pierre Gasly, I’d keep an eye on that.

Apple

If you had come into the #macintosh channel on the Ars IRC server in 2003 and told us that Apple would one day be the broadcast home of F1 in the US, you probably would have been asked where you got such good drugs. But last year, after producing a blockbuster movie about the sport, Apple snatched the US rights from ESPN.

Understandably, for existing ESPN customers who don’t have and don’t want an Apple TV subscription ($13 a month), this wasn’t great news. There was also a lot of confusion about F1’s standalone digital TV offering. After a rocky launch in 2018, F1TV has come into its own, offering a much less British-centric commentary feed than the UK’s Sky (which it includes as an alternate audio option), in-car feeds, and a comprehensive archive of races dating back decades.

If you were previously subscribed to both Apple TV and F1TV Premium, you have one less bill to pay. If you’re an Apple TV subscriber in the US, you now have access to F1TV Premium via its website and apps. I’m a subscriber to both, and my two accounts were tied together without any problems.

Whether you use the F1TV app or Apple’s, you’ll have the choice of the F1TV commentary of Alex Jacques and Jolyon Palmer or the Sky audio feed of David Croft and Martin Brundle, plus Spanish-language audio. Apple says each Grand Prix will have up to 30 other feeds, including in-car from all 22 cars, a driver tracker, a telemetry feed, and more.

Here’s what F1’s multi view looks like in Apple’s TV app. Credit: Apple

The computer company is going all out, with integrations across its various services. Apple Music will offer live audio broadcasts of races and curated playlists from drivers, and F1 will feature in the Podcast and News apps. There are even enhanced maps for some circuits—if Monza makes the cut, I will report back on it later this year. For a non-Apple Maps look at the sport, consider this interactive map created by an Ars reader, F1 fan, and geospatial expert that includes all the team factories and the 24 circuits.


Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.

Nerve damage, energy management, and Apple TV: F1 in 2026 starts today Read More »

m5-pro-and-m5-max-are-surprisingly-big-departures-from-older-apple-silicon

M5 Pro and M5 Max are surprisingly big departures from older Apple Silicon


Apple is using more chiplets and three types of CPU cores to make the M5 family.

As part of today’s MacBook Pro update, Apple has also unveiled the M5 Pro and M5 Max, the newest members of the M5 chip family.

Normally, the Pro and Max chips take the same basic building blocks from the basic chip and just scale them up—more CPU cores, more GPU cores, and more memory bandwidth. But the M5 chips are a surprisingly large departure from past generations, both in terms of the CPU architectures they use and in how they’re packaged together.

We won’t know the impact these changes have had on performance until we have hardware in hand to test, but here are all the technical details we’ve been able to glean about the new updates and how the M5 chip family stacks up against the past few generations of Apple Silicon chips.

New Fusion Architecture and a third type of CPU core

Apple says that M5 Pro and M5 Max use an “all-new Fusion Architecture” that welds two silicon chiplets into a single processor. Apple has used this approach before, but historically only to combine two Max chips together into an Ultra.

Apple’s approach here is different—for example, the M5 Pro is not just a pair of M5 chips welded together. Rather, Apple has one chiplet handling the CPU and most of the I/O, and a second one that’s mainly for graphics, both built on the same 3nm TSMC manufacturing process.

The first silicon die is always the same, whether you get an M5 Pro or M5 Max. It includes the 18-core CPU, the 16-core Neural Engine, and controllers for the SSD, for the Thunderbolt ports, and for driving displays.

The second die is where the two chips differ; the M5 Pro gets up to 20 GPU cores, a single media encoding/decoding engine, and a memory controller with up to 307 GB/s of bandwidth. The M5 Max gets up to 40 GPU cores, a pair of media encoding/decoding engines, and a memory controller that provides up to 614 GB/s of memory bandwidth (note that everything in the GPU die seems to be doubled, implying that Apple is, in fact, sticking two M5 Pro GPUs together to make one M5 Max GPU).

Apple’s spec sheets now list three distinct types of CPU cores: “super” cores, performance cores, and efficiency cores. Credit: Apple

Apple is also introducing a third distinct type of CPU core beyond the typical “performance cores” and “efficiency cores” that were included in older M-series processors.

At the top, you have “super cores,” which is Apple’s new M5-era branding for what it used to call “performance cores.” This change is retroactive and also applies to the regular M5; Apple’s spec sheet for the M5 MacBook Pro used to refer to the big cores as “performance cores” but now calls them “super cores.”

At the bottom of the hierarchy, you still have “efficiency cores” that are tuned for low power usage. The M5 still uses six efficiency cores, and unlike the super cores, they haven’t been rebranded. These cores do help with multi-core performance, but they prioritize low power usage and low temperatures first, since they need to fit in fanless devices like the iPad Pro and MacBook Air.

And now, in the middle, we have a new type of “performance core” used exclusively in the M5 Pro and M5 Max.

These are, in fact, a new, third type of CPU core design, distinct from both the super cores and the M5’s efficiency cores. They apparently use designs similar to the super cores but prioritize multi-threaded performance rather than fast single-core performance. Apple’s approach with the new performance cores sounds similar to the one AMD uses in its laptop silicon: it has larger Zen 4 and Zen 5 CPU cores, optimized for peak clock speeds and higher power usage, and smaller Zen 4c and Zen 5c cores that support the same capabilities but run slower and are optimized to use less die space.
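If you’re curious how your own Mac describes its core mix, macOS already groups CPU cores into “performance levels” that you can query through sysctl. Here’s a minimal Swift sketch; the hw.nperflevels and hw.perflevel keys exist on current Apple Silicon Macs, but whether a three-tier M5 Pro or Max would surface as a third level is our assumption, not something Apple has documented.

```swift
import Foundation

// Read an integer sysctl value by name; returns nil if the key doesn't exist.
func sysctlInt(_ name: String) -> Int? {
    var value: Int32 = 0
    var size = MemoryLayout<Int32>.size
    guard sysctlbyname(name, &value, &size, nil, 0) == 0 else { return nil }
    return Int(value)
}

// Read a string sysctl value by name (e.g., "Performance" or "Efficiency").
func sysctlString(_ name: String) -> String? {
    var size = 0
    guard sysctlbyname(name, nil, &size, nil, 0) == 0 else { return nil }
    var buffer = [CChar](repeating: 0, count: size)
    guard sysctlbyname(name, &buffer, &size, nil, 0) == 0 else { return nil }
    return String(cString: buffer)
}

// macOS numbers performance levels from fastest (0) downward. Today's Apple
// Silicon reports two levels; a third core type could, in principle, appear
// as hw.perflevel2 -- that part is speculation on our part.
let levels = sysctlInt("hw.nperflevels") ?? 1
for level in 0..<levels {
    let name = sysctlString("hw.perflevel\(level).name") ?? "Level \(level)"
    let cores = sysctlInt("hw.perflevel\(level).physicalcpu") ?? 0
    print("\(name): \(cores) cores")
}
```

On a base M5, this prints something like “Performance: 4 cores” and “Efficiency: 6 cores,” assuming the name strings don’t change with the rebranding.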

What we don’t know yet is how these new chips perform relative to the previous versions. Technically, the M4 Pro and M4 Max both had more “big” cores than the M5 Pro and M5 Max do—up to 10 for the M4 Pro and up to 12 for the M4 Max. But higher single-core performance from the six “super cores” and strong multi-core performance from the 12 performance cores should mean that the M5 generation still shakes out to be faster overall.

How all the chips compare

For Mac buyers choosing between these three processors, we’re updating the spec tables we’ve put together in the past, comparing the M5-generation chips to one another and to their counterparts in the M2, M3, and M4 generations.

Here’s how all of the M5 chips stack up, including the partly disabled versions of each chip that Apple sells in lower-end MacBook Air and Pro models:

CPU | S/P/E-cores | GPU cores | RAM options | Display support (including internal) | Memory bandwidth | Video decode/encode engines
Apple M5 (low) | 4S/6E | 8 | 16GB | Up to three | 153GB/s | One
Apple M5 (high) | 4S/6E | 10 | 16/24/32GB | Up to three | 153GB/s | One
Apple M5 Pro (low) | 5S/10P | 16 | 24GB | Up to four | 307GB/s | One
Apple M5 Pro (high) | 6S/12P | 20 | 24/48/64GB | Up to four | 307GB/s | One
Apple M5 Max (low) | 6S/12P | 32 | 36GB | Up to five | 460GB/s | Two
Apple M5 Max (high) | 6S/12P | 40 | 48/64/128GB | Up to five | 614GB/s | Two

Despite all the big under-the-hood changes, the basic hierarchy here remains the same as in past generations. The Pro tier offers the biggest bump to CPU performance compared to the basic M5, along with twice as many GPU cores. The Max chip is mainly meant for those who want better graphics, 128GB of RAM, or both.

Compared to M2, M3, and M4

CPU | S/P/E-cores | GPU cores | RAM options | Display support (including internal) | Memory bandwidth
Apple M5 (high) | 4S/6E | 10 | 16/24/32GB | Up to three | 153GB/s
Apple M4 (high) | 4P/6E | 10 | 16/24/32GB | Up to three | 120GB/s
Apple M3 (high) | 4P/4E | 10 | 8/16/24GB | Up to two | 102.4GB/s
Apple M2 (high) | 4P/4E | 10 | 8/16/24GB | Up to two | 102.4GB/s

Compared to past generations, the M5 looks like the basic incremental improvement that we’re used to—no huge jumps in CPU or GPU core counts, relying mostly on architectural improvements and memory bandwidth increases to deliver the expected generation-over-generation speed boost. The Pro and Max chips have similar graphics core counts across generations, but there has been more variability when it comes to the CPU cores.

CPU | S/P/E-cores | GPU cores | RAM options | Display support (including internal) | Memory bandwidth
Apple M5 Pro (high) | 6S/12P | 20 | 24/48/64GB | Up to four | 307GB/s
Apple M4 Pro (high) | 10P/4E | 20 | 24/48/64GB | Up to three | 273GB/s
Apple M3 Pro (high) | 6P/6E | 18 | 18/36GB | Up to three | 153.6GB/s
Apple M2 Pro (high) | 8P/4E | 19 | 16/32GB | Up to three | 204.8GB/s

The Pro chips have been sort of all over the place, and the M3 generation in particular is an outlier. When we tested it at the time, we found it to be more or less a wash compared to the M2 Pro, which was (and still is) rare for Apple Silicon generations. The M4 Pro was a better upgrade, and the M5 Pro should still feel like an improvement over the M4 Pro despite the big underlying changes.

CPU | S/P/E-cores | GPU cores | RAM options | Display support (including internal) | Memory bandwidth
Apple M5 Max (high) | 6S/12P | 40 | 48/64/128GB | Up to five | 614GB/s
Apple M4 Max (high) | 12P/4E | 40 | 48/64/128GB | Up to five | 546GB/s
Apple M3 Max (high) | 12P/4E | 40 | 48/64/128GB | Up to five | 409.6GB/s
Apple M2 Max (high) | 8P/4E | 38 | 64/96GB | Up to five | 409.6GB/s

The M5 Max will be the biggest test for Apple’s new performance cores. According to our testing of the M5 in the 14-inch MacBook Pro, the M5-generation super cores are about 12 to 15 percent faster than the M4 generation’s performance cores. The M4 Max had up to 12 of those cores, while the M5 Max only has six. That leaves a pretty substantial gap for M5 Max’s new non-super P-cores to close.
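To put rough numbers on that gap, here’s a quick back-of-the-envelope calculation. Every per-core weight below is an assumption chosen for illustration (the efficiency-core figure especially), not a benchmark result; only the 12-to-15-percent super-core uplift comes from our M5 testing.

```swift
// Back-of-the-envelope: how fast do the M5 Max's 12 mid-tier "performance"
// cores need to be for it to match the M4 Max in multi-core throughput?
// All weights are illustrative assumptions, not measurements.

let m4PCore = 1.0        // baseline: one M4 performance core
let m4ECore = 0.3        // assumed: an E-core does ~30% of a P-core's work
let m5SuperCore = 1.13   // from Ars' M5 testing: ~12-15% faster than an M4 P-core

let m4MaxThroughput = 12 * m4PCore + 4 * m4ECore    // 12P + 4E = 13.2
let m5SuperShare = 6 * m5SuperCore                  // 6 super cores = 6.78

// Solve 6.78 + 12x = 13.2 for x, the mid-core throughput needed for parity.
let requiredMidCore = (m4MaxThroughput - m5SuperShare) / 12
print("Mid cores need ~\(Int(requiredMidCore * 100))% of an M4 P-core for parity")
// Prints: Mid cores need ~53% of an M4 P-core for parity
```

Under those assumptions, the new mid-tier cores only need a little over half the per-core throughput of an M4 P-core for the M5 Max to break even in multi-core work; anything beyond that is a net win.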

Aside from that, the biggest outstanding question is how the M5 shakeup changes Apple’s approach to Ultra chips, assuming the company continues to make them (Apple has already said that not every processor generation will see an Ultra update).

The M1 Ultra, M2 Ultra, and M3 Ultra were all made by fusing two Max chips together, perfectly doubling the CPU and GPU core counts. Will an M5 Ultra still weld two M5 Max chips together using the same basic ingredients to make an even larger processor? Or will Apple create distinct CPU and GPU chiplets just for the Ultra series? All we can say for sure is that we can no longer make assumptions based on Apple’s past behavior, which until now has been the most reliable predictor of its future behavior.

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

M5 Pro and M5 Max are surprisingly big departures from older Apple Silicon Read More »