

Magnetars drag spacetime to power superluminous supernovae


Frame-dragging may explain an odd pattern seen in the brightest supernovae.

Some of the most extreme explosions in the universe are Type I superluminous supernovae. “They are one of the brightest explosions in the Universe,” says Joseph Farah, an astrophysicist at the University of California, Santa Barbara. For years, astrophysicists tried to understand what exactly makes superluminous supernovae so absurdly powerful. Now it seems like we may finally have some answers.

Farah and his colleagues have found that these events are most likely powered by magnetars, rapidly spinning neutron stars that warp the very space and time around them.

The power within

Magnetars have been a leading candidate for the engine behind superluminous supernovae. The theory says these insanely magnetized stars are born from the collapsing core of the original progenitor star and emit energy via magnetic dipole radiation. “This core is roughly a one solar mass object that gets crushed down to the size of a city,” Farah explains. As its spin slows down, a magnetar bleeds its rotational energy into the expanding material of the dead star, lighting it up.

The problem was that this theory did not quite explain observations. In a standard magnetar model, the light curve of the supernova should rise rapidly and then fade away evenly as the neutron star loses its rotational energy. “This way the light curve, in the prediction of this model, just goes up and then down quite smoothly,” Farah says. But when astronomers observe superluminous supernovae, they almost never see this smooth fade. Instead, they see bumps, wiggles, and strange modulations. The light curve flickers over months.

For a while, scientists tried to patch the magnetar engine theory to fit observations. Maybe the expanding debris was slamming into irregular shells of material shed by the star before it died. Or perhaps the magnetar engine was spitting out random, violent flares. But these explanations required highly specific, fine-tuned parameters to match what we were seeing through our telescopes.

The solution to the strange flickering problem came when the Liverpool Gravitational Wave Optical Transient Observer collaboration detected an object designated SN 2024afav on December 12, 2024. Initially, the object looked like a standard superluminous supernova. “It was as bright and it had bumps in the light curve like many other objects of this kind,” Farah says. But as the telescopes kept watching, it started doing something unprecedented: It started to chirp.

The chirping star

In physics, a chirp refers to a signal whose frequency steadily increases over time. In the case of SN 2024afav, its emissions were bumping up and down, but the gaps between these bumps were shrinking. After the second and third bumps appeared, each gap roughly 35 percent shorter than the one before, Farah and his team realized they could calculate how much the next gap would shrink.

The team adjusted their observation schedule, pointed their instruments at SN 2024afav, and discovered the fourth bump appeared exactly when they expected it would. The fifth bump enabled the scientists to narrow down the period reduction to about 29 percent.
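To get a feel for the kind of extrapolation involved, here is a minimal sketch in Python. The numbers below are invented for illustration (they are not the team’s measurements); it simply assumes each gap between bumps is a fixed fraction shorter than the previous one and projects the timing forward:

```python
# Hypothetical illustration: predicting the next bump in a "chirping" light
# curve whose inter-bump gaps shrink by a roughly constant fraction each cycle.
# These numbers are made up for illustration; they are not the team's data.

def predict_next_bump(last_bump_day, last_gap_days, shrink_fraction):
    """Return (next_bump_day, next_gap_days) assuming each gap is
    shorter than the previous one by `shrink_fraction`."""
    next_gap = last_gap_days * (1.0 - shrink_fraction)
    return last_bump_day + next_gap, next_gap

# Suppose the third bump arrived on day 120 after a 40-day gap,
# and gaps have been shrinking by ~29 percent per cycle.
bump_day, gap = 120.0, 40.0
for n in (4, 5):
    bump_day, gap = predict_next_bump(bump_day, gap, 0.29)
    print(f"Predicted bump {n}: day {bump_day:.1f} (gap {gap:.1f} days)")
```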

The fact that Farah and his colleagues could accurately predict the bumps delivered a massive blow to our existing magnetar models. A few irregular bumps could be explained away by supernova ejecta crashing into clouds of gas, but that can’t account for perfectly timed, cleanly sinusoidal modulations with a steadily decaying period. Random space rubble just doesn’t work that way.

“So, we came up with the new model to describe this behavior,” Farah explains. They proposed a new physical mechanism that relied on the Lense-Thirring effect, otherwise known as frame-dragging. Frame-dragging is a prediction of General Relativity in which a massive spinning object slightly drags the surrounding spacetime along with it as it rotates. “We didn’t try this mechanism before because it had never been seen around a magnetar before,” Farah says. But when his team did try it, it turned out to perfectly match what was going on.

The flickering in superluminous supernovae, Farah hypothesized, was caused by the extreme gravity of a newborn magnetar dragging the spacetime around it as it spun.

Twisted space

To understand Farah’s Lense-Thirring solution, imagine a bowling ball spinning in a vat of molasses. As the ball rotates, friction drags the sticky fluid along, creating a swirling vortex. According to Einstein’s General Relativity, mass and energy can warp the fabric of spacetime, so if a sufficiently large mass is spinning rapidly, it drags spacetime along in a manner similar to the molasses. Around Earth, this effect is minuscule. But around a newborn magnetar, which is far more massive and spinning hundreds of times a second, spacetime is whipped into a violent, twisting frenzy.

When the progenitor star exploded to create SN 2024afav, it didn’t eject all of its material perfectly. Some of the stellar guts failed to escape and fell back toward the newborn magnetar, forming a small accretion disk around it. Crucially, this disk was misaligned, tilted relative to the rotational axis of the magnetar. Because the disk was tilted in this aggressively twisted spacetime, the Lense-Thirring effect forced the entire disk to wobble, or precess, around the magnetar’s spin axis like a top that was spinning ever more slowly.

As this misaligned disk wobbled, it acted like a giant cosmic lampshade: it periodically blocked, reflected, or redirected the intense radiation and jets spewing from the central magnetar. The high-energy photons emitted by the magnetar had to fight their way through the expanding supernova ejecta, getting reprocessed into optical light and diffusing outward over a span of about 15 days. Observed through our telescopes on Earth, this wobbling disk created a rhythmic fluctuation in the superluminous supernova’s brightness.

After Farah and his colleagues explained the bumps in the signal with the wobbling disk around the magnetar, they moved to explaining why the signal chirped.

The shrinking disk

The answer the team proposes lies in the environment of the disk itself. The size of this accretion disk isn’t static. It’s set by a balance between the inward ram pressure of infalling matter and the outward radiation pressure coming from the magnetar. Over time, as the supply of fallback material runs out, the disk’s accretion rate drops. With less matter pushing in, the disk loses equilibrium and begins to shrink, falling inward toward the magnetar. And the closer it gets to the spinning magnetar, the stronger the Lense-Thirring effect becomes.

As the accretion disk shrinks and falls deeper into the gravity well, the twisted spacetime whips it around faster and faster. “Imagine a pirouetting figure skater pulling her arms in to accelerate the spinning movement,” Farah suggests. In consequence, the precession speeds up, the wobbles get tighter, and the light curve chirps.
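For readers who want the scaling behind this picture, the standard textbook weak-field expression for the Lense-Thirring precession rate of material orbiting at radius r around a body with angular momentum J (a simplification, not the detailed disk model in the paper) is:

```latex
% Weak-field Lense-Thirring (frame-dragging) precession rate for material
% orbiting at radius r around a body with angular momentum J:
\Omega_{\mathrm{LT}} = \frac{2GJ}{c^{2} r^{3}}
```

The 1/r³ dependence is the key point: as the disk shrinks toward the magnetar, the precession rate climbs steeply, which is exactly the kind of accelerating wobble needed to produce a chirp.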

Finally, by measuring the chirps, Farah and his colleagues were able to work backward to determine the properties of the magnetar powering SN 2024afav. They constrained its spin period to 4.2 milliseconds and precisely calculated its staggeringly powerful magnetic field. The magnetar properties derived solely from the chirping matched the properties required to power the overall baseline brightness of the superluminous supernova. The engine that powered the main explosion was exactly the right size and speed to cause the wobbling we observed.

But the work on the revised “magnetar+LT” model is just beginning. “This object is so rare and so new,” Farah admits. “We were scraping the bottom of the barrel for references that were even remotely related to the idea we were pitching here.”

Superluminous siblings

Farah’s team went back and looked at archival data from other bumpy superluminous supernovae such as SN 2018kyt, SN 2019unb, and SN 2021mkr. They found that their “magnetar+LT” model explains the modulations in those events as well. A whole class of exploding stars that previously required multiple mutually exclusive physical explanations could be unified by a single, elegant model.

This model, though, still has many unanswered questions. “How the accretion disk forms, how it blocks or modulates the light from the magnetar, how that light then gets to the ejecta, and finally how it gets to the observer,” Farah listed. “Basically every step along the way we made the best assumptions we could.” For each of these steps, he admits, there were at least five different ways it could happen, and the team just went with their best guess of what was going on.

To really figure it all out, Farah says, we need to wait till more objects like SN 2024afav are discovered. And this, he hopes, should become possible with new observatories like the Vera C. Rubin Observatory in Chile coming online. “The Rubin Observatory is expected to discover dozens of these chirped supernovae,” Farah says. “We will be able to test our models against many different objects. There’s definitely room for development and growth. This is just the very beginning.”

Nature, 2026. DOI: 10.1038/s41586-026-10151-0


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.


Measles vaccinations rose 291% among New Mexico adults during outbreak

In January 2025, a measles outbreak erupted on the western edge of Texas and soon spilled over to New Mexico and other states. The overall outbreak would become the largest the country has seen since 2000, when measles was declared eliminated from the US. In Texas, it was the largest outbreak recorded since 1992. And in New Mexico, it was the first measles outbreak the state had seen since 1996.

But the trajectory of the two states’ measles cases diverged. Texas declared the outbreak within its borders over on August 18, with an end tally of 762 cases. In New Mexico, officials declared its outbreak, which began in February, over on September 26, with a total of just 99 cases.

One of the key differences, according to a new study, was that in New Mexico, the rapid spread of the highly infectious virus spurred a massive surge in measles vaccinations among children and adults. Overall, shots of the measles, mumps, and rubella (MMR) vaccine increased 55 percent statewide from January to September compared to the same period in 2024.

The study, appearing in the Centers for Disease Control and Prevention’s Morbidity and Mortality Weekly Report, further broke down the increase in shots. Over the whole year, the number of MMR doses given to children (defined as less than age 18) increased 18 percent compared to 2024—from 27,988 in 2024 to 32,890 in 2025. Doses in adults (aged 18 and up) skyrocketed by a whopping 291 percent— from 5,748 in 2024 to 22,500 in 2025.
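Those percentages follow directly from the dose counts quoted above; as a quick check:

```python
# Quick check of the percent increases quoted above, using the dose counts
# cited from the MMWR report.
def pct_increase(before, after):
    return (after - before) / before * 100

print(round(pct_increase(27_988, 32_890)))  # children: ~18 percent
print(round(pct_increase(5_748, 22_500)))   # adults: ~291 percent
```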

The increase in vaccination didn’t appear to be an unrelated fluke. Health officials noted that within two weeks of the outbreak being declared, the number of vaccine doses given in all regions of the state began to exceed the number given during the previous year. And in some regions, when a first measles case was identified, officials saw week-over-week increases in vaccinations as high as 78 and 83 percent.


Rocket Report: Pentagon needs more missile interceptors; Artemis II clears review


SpaceX has started commissioning a second launch pad at the company’s Starbase facility in Texas.

Firefly Aerospace’s seventh Alpha rocket rises above its launch pad at Vandenberg Space Force Base, California. Credit: Sean Parker/Firefly Aerospace

Welcome to Edition 8.33 of the Rocket Report! NASA officials seem optimistic about launching the Artemis II mission next month, so confident that they will forgo another fueling test on the Space Launch System rocket to check the integrity of fickle seals in a liquid hydrogen loading line. The rocket will return to the launch pad next week, with liftoff targeted for April 1 at 6:24 pm EDT (22:24 UTC). NASA has six launch dates available in early April after the agency added April 2 to the launch period. April 1 and 2 each have launch windows that open before sunset, an added bonus for those of us who prefer a day launch, for purely aesthetic reasons.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Firefly’s Alpha rocket flies again. Firefly Aerospace’s Alpha rocket successfully returned to flight Wednesday, March 11, launching a technology demonstration mission more than 10 months after the rocket’s previous launch failed, Space News reports. The launch followed several delays and scrubbed launch attempts. The two-stage Alpha rocket lifted off from Vandenberg Space Force Base, California, and headed southwest over the Pacific Ocean, reaching orbit about eight minutes later. Firefly said the rocket’s upper stage later reignited its engine, demonstrating the restart capability required for some orbit insertion missions. This was the seventh flight of Firefly’s Alpha rocket, capable of hauling more than a ton of payload to low-Earth orbit.

Block II preview... The recent setbacks for Firefly’s Alpha program included a launch failure last April and a fire that destroyed a booster stage on the test stand. The Texas-based company billed this week’s flight as purely a demonstration mission to validate several upgrades for the Alpha Block II rocket configuration, which will debut on the next launch. The Block II will include a 7-foot (2-meter) increase to Alpha’s length, consolidated batteries and avionics built in-house, improved thermal protection systems, and stronger carbon-composite structures built with automated machinery. This week’s flight carried the rocket’s new in-house avionics suite and enhanced thermal protection system, Firefly said. “Flight 7 served as a critical opportunity to validate Alpha’s performance ahead of our Block II upgrade, and this team knocked it out of the park,” said Adam Oakes, Firefly’s vice president of launch. (submitted by EllPeaTea)


Rocket Lab launches undisclosed satellite. Rocket Lab launched a spacecraft March 5 for a confidential customer, most likely Earth observation company BlackSky, Space News reports. The mission began with the liftoff of an Electron rocket from Rocket Lab’s private spaceport in New Zealand. The rocket delivered a “single commercial satellite” to a roughly 292-mile-high (470-kilometer) orbit for a “confidential customer,” Rocket Lab said in a press release. This was the 83rd flight of an Electron rocket, including suborbital flights for the US military’s Defense Innovation Unit testing hypersonic missile tech. Electron is a workhorse in the dedicated small launch sector, with capacity for up to 710 pounds (320 kilograms) of payload to low-Earth orbit.

Solving the puzzle... This was the second time in less than four months that Rocket Lab has launched a satellite mission for an undisclosed customer. BlackSky, a US-based remote sensing company, confirmed it was the customer for a November Rocket Lab launch under similar circumstances. BlackSky announced this week that it activated its newest “Gen-3” optical Earth-imaging satellite “in less than one week following launch.” While the company did not confirm its launch with Rocket Lab, this statement suggests BlackSky was, indeed, the customer on the March 5 mission. (submitted by EllPeaTea)

Pentagon orders more SM-3s. In early February, the Pentagon and RTX, formerly Raytheon, reached an agreement to ramp up missile production, with a framework to dramatically increase manufacturing of Tomahawk cruise missiles, air-to-air missiles, and SM-3 and SM-6 missile interceptors. The announcement did not include a dollar value. The Defense Department put some numbers on the deal in the military’s daily dump of contract announcements Thursday. The Missile Defense Agency is ordering dozens of new SM-3 Block IB missiles, which are used to intercept enemy ballistic missiles in space. Thursday’s announcement added 23 SM-3 missiles to the order, bringing the total number to 78, for a total cost of more than $1.36 billion.

Before the Iran war... The Pentagon is making deals with several defense contractors to deliver more weapons systems. Lockheed Martin plans to quadruple THAAD interceptor production to 400 units annually and boost Patriot PAC-3 output to 2,000 per year, Reuters reported. These agreements were in place before the US began striking Iran. The conflict has increased the military’s urgency to replenish weapons inventories, particularly interceptor stocks being used to shoot down Iranian missiles and drones attacking US and allied bases in the Middle East.

SpaceX delivers for EchoStar. A direct television satellite for Dish Network, a subsidiary of EchoStar, headed into geosynchronous Earth orbit on Monday night aboard a Falcon 9 rocket launched from Cape Canaveral, Spaceflight Now reports. The satellite, EchoStar XXV, flew to a geosynchronous transfer orbit before maneuvering to its operational position at 110 degrees west longitude above the equator. This was the 30th flight of a Falcon 9 rocket so far in 2026, putting SpaceX on a similar pace to last year.

A rarity these days... This was the first launch of a large commercial geosynchronous communications satellite in nearly six months. These types of satellites operate more than 22,000 miles (nearly 36,000 kilometers) over the equator, where their orbits match the rate of Earth’s rotation. They were once the favored solution for commercial broadcasting and data relay. Today, the strong trend is toward large mega-constellations in low-Earth orbit, with networks like SpaceX’s Starlink beaming broadband Internet to global customers. Commercial geosynchronous satellites are now a niche, primarily for markets like direct-to-home TV and satellite radio. (submitted by EllPeaTea)

China resumes space launches. Two Chinese rockets launched Thursday, March 12, from spaceports in the northwest and south of the country. These were the first Chinese orbital launches in more than a month, a break that coincided with the Chinese New Year holiday. It is unclear whether the holiday was the reason for the interruption of satellite launches. China did not have such a long lull in launch activity over the last couple of years. Two Chinese rockets failed during launches in January, but China’s rocket industry has a deep bench, so there’s no obvious link between those failures and the recent letup in launches.

Assembling a network… The first of Thursday’s launches from China involved a Long March 8A rocket carrying a batch of Internet satellites, followed by the launch of a Long March 2D rocket with a pair of classified military satellites. These missions continued China’s rapid build-up of satellite networks for data relay and imaging surveillance.

Artemis II clears critical review. NASA plans to haul its Artemis II Moon rocket back out to its seaside launch pad next week to ready the huge booster for blastoff as early as April 1 on a delayed-but-historic flight to send four astronauts on a nine-day trip around the moon, CBS News reports. At the conclusion of a two-day flight readiness review, “all the teams polled ‘go’ to launch and fly Artemis II around the Moon, pending completion of some of the work before we roll out to the launch pad,” said Lori Glaze, associate administrator of Exploration Systems Development at NASA Headquarters, in a press conference Thursday. “Just a reminder to everybody, we talk about it every time we talk about this flight, it’s a test flight, and it is not without risk. But our team and our hardware are ready,” Glaze said.

Behind schedule… Based on the ever-changing positions of the Moon and Earth, along with a complex mix of mission objectives, NASA must launch Artemis II by April 6, or the flight will slip another month or so. For an April 1 launch, liftoff is expected at 6:24 pm EDT, followed by splashdown in the Pacific Ocean nine days later. NASA workers had hoped to launch the Space Launch System rocket, the Orion crew capsule, and its four passengers—Artemis II commander Reid Wiseman, Victor Glover, Christina Koch, and Canadian astronaut Jeremy Hansen—in early February. But the long-awaited flight was delayed by hydrogen fuel leaks and, more recently, by problems with the rocket’s upper-stage propellant pressurization system. (submitted by EllPeaTea)

Goodbye to NASA’s Exploration Upper Stage. The death of NASA’s Exploration Upper Stage was confirmed last Friday, March 6, in a seemingly pedestrian notice posted on a government procurement website: “NASA/MSFC intends to issue a sole source contract to acquire next-generation upper stages for use in Space Launch System (SLS) Artemis IV and Artemis V from United Launch Alliance (ULA).” The announcement spells the end of the Exploration Upper Stage, a multibillion-dollar, Boeing-led development 10 years in the making that was still years away from being ready to fly, Ars reports.

We hardly knew ya… Contracted to Boeing more than a decade ago, the Exploration Upper Stage upgrade was intended to allow the SLS rocket to launch not just the Orion spacecraft to the Moon, but large payloads alongside it. That the development of capable rockets by SpaceX, Blue Origin, and United Launch Alliance to deliver large cargo to the Moon rendered it obsolete mattered, for a long time, not at all. If the Exploration Upper Stage was anything, it was a survivor—a testament to the power of pork and the value of political support from key Southern senators in Alabama, Mississippi, Texas, and Florida. Now, NASA is going with a more affordable commercial option to upgrade the SLS rocket.

SpaceX activates second Starbase launch pad. For the first time since October, SpaceX has a rocket on the launch pad at Starbase, Texas, NASASpaceflight reports. This time, it is the first of SpaceX’s new Block 3 Super Heavy boosters mounted on SpaceX’s newest launch pad, Pad 2. The launch pad has been under construction for the past 22 months and will help usher in the next chapter for the Starship program. This is the start of pad commissioning, a process that will culminate with the first launch of the upgraded third-generation Super Heavy booster and Starship rocket. SpaceX previously put the new booster and ship through a series of checkouts at a separate test stand at Starbase.

Slipping until April… At some point, SpaceX is expected to test-fire the new Super Heavy booster on Pad 2. But the booster only has a subset of its 33 Raptor engines, so a static fire test does not appear to be in the plan for the booster’s current stay at the launch pad. Meanwhile, SpaceX founder and CEO Elon Musk posted on X on March 7 that the first Block 3 launch is about four weeks away, suggesting a launch sometime in early April. SpaceX had been targeting March for the test flight. This time one year ago, Musk wrote that SpaceX was “tracking to a Starship launch rate of once a week” within 12 months. That launch cadence has not been achieved.

Next three launches

March 13: Falcon 9 | Starlink 17-31 | Vandenberg Space Force Base, California | 14:33 UTC

March 14: Falcon 9 | Starlink 10-48 | Cape Canaveral Space Force Station, Florida | 10:00 UTC

March 15: Long March 6A | Unknown Payload | Taiyuan Satellite Launch Center, China | 13:20 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.


Figuring out why AIs get flummoxed by some games


When winning depends on intuiting a mathematical function, AIs come up short.

Oddly, the training methods that work great for chess fail on far simpler games. Credit: SimpleImages

With its Alpha series of game-playing AIs, Google’s DeepMind group seemed to have found a way for its AIs to tackle any game, with each system mastering games like chess and Go by repeatedly playing itself during training. But then some odd things happened, as people started identifying Go positions that would lose against relative newcomers to the game but easily defeat a similar Go-playing AI.

While beating an AI at a board game may seem relatively trivial, it can help us identify the AI’s failure modes, or ways in which we can improve its training to avoid having it develop these blind spots in the first place—things that may become critical as people rely on AI input for a growing range of problems.

A recent paper published in Machine Learning describes an entire category of games where the method used to train AlphaGo and AlphaZero fails. The games in question can be remarkably simple, as exemplified by the one the researchers worked with: Nim, which involves two players taking turns removing matchsticks from a pyramid-shaped board until one is left without a legal move.

Impartiality

Nim involves setting up a set of rows of matchsticks, with the top row having a single match, and every row below it having two more than the one above. This creates a pyramid-shaped board. Two players then take turns removing matchsticks from the board, choosing a row and then removing anywhere from one item to the entire contents of the row. The game goes until there are no legal moves left. It’s a simple game that can easily be taught to children.

It also turns out to be a critical example of an entire category of rule sets that define “impartial games.” These differ from something like chess, where each player has their own set of pieces; in impartial games, the two players share the same pieces and are bound by the same set of rules. Nim’s importance stems from a theorem showing that any position in an impartial game can be represented by a configuration of a Nim pyramid, meaning that if something applies to Nim, it applies to all impartial games.

One of the distinctive features of Nim and other impartial games is that, at any point in the game, it’s easy to evaluate the board and determine which player has the potential to win. Put another way, you can size up the board and know that, if you’re in the winning position and play optimal moves from then on, you will win. Doing so just requires feeding the board’s configuration into a parity function, which does the math to tell you whether you’re winning.

(Obviously, the person who is currently winning could play a suboptimal move and end up losing. And the exact series of optimal moves is not determined until the end, since it will depend on exactly what your opponent does.)
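For the curious, the parity function the article refers to is the classic “nim-sum”: XOR all the row sizes together, and under the normal-play rule described above (the player left without a move loses), the player about to move can force a win exactly when the result is nonzero. A minimal sketch in Python, using the article’s 1, 3, 5, … row layout:

```python
from functools import reduce
from operator import xor

def nim_sum(rows):
    """The 'parity function': XOR of all row sizes. Nonzero means the
    player about to move can force a win with optimal play."""
    return reduce(xor, rows, 0)

def winning_moves(rows):
    """All moves (row index, new row size) that hand the opponent a losing
    position, i.e. that make the nim-sum zero."""
    s = nim_sum(rows)
    return [(i, s ^ n) for i, n in enumerate(rows) if (s ^ n) < n]

# A seven-row board as described in the article: 1, 3, 5, ..., 13 matchsticks.
board = [1, 3, 5, 7, 9, 11, 13]
print(nim_sum(board))        # 15 -> the player to move can force a win
print(winning_moves(board))  # [(4, 6), (5, 4), (6, 2)] -> three optimal openings
```

For the seven-row board, there are exactly three such opening moves, matching the count mentioned later in the article.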

The new work, done by Bei Zhou and Soren Riis, asks a simple question: What happens if you take the AlphaGo approach to training an AI to play games, and try to develop a Nim-playing AI? Put differently: They asked whether an AI could develop a representation of a parity function purely by playing itself in Nim.

When self-teaching fails

AlphaZero, the chess-playing version, was trained from only the rules of chess. By playing itself, it can associate different board configurations with a probability of winning. To keep it from getting stuck in ruts, there’s also a random sampling element that allows it to continue exploring new territory. And, once it can identify a limited number of high-value moves, it’s able to explore deeper into future possibilities that arise from those moves. The more games it plays, the higher the probability that it will be able to assign values to potential board configurations that could arise from a given position (although the benefits of more games tend to tail off after a sufficient number are played).
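As a rough illustration of that loop (and only an illustration: the real system uses a neural network plus look-ahead search rather than a lookup table, and the function names here are hypothetical), a stripped-down tabular version of self-play value learning might look like this:

```python
import random
from collections import defaultdict

def self_play_training(initial_state, legal_moves, apply_move,
                       games=500, epsilon=0.1, lr=0.1):
    """Toy tabular self-play in the spirit described above.

    `legal_moves(state)` and `apply_move(state, move)` are assumed to be
    supplied by the game; states must be hashable (e.g., tuples of Nim row
    sizes), and the player left without a legal move loses, as in Nim.
    Returns a table of estimated win probabilities for player 1.
    """
    value = defaultdict(lambda: 0.5)  # unseen positions start as a coin flip
    for _ in range(games):
        state, history, player = initial_state, [], 1
        while legal_moves(state):
            moves = legal_moves(state)
            if random.random() < epsilon:      # occasional random exploration
                move = random.choice(moves)
            elif player == 1:                  # player 1 steers toward high value
                move = max(moves, key=lambda m: value[apply_move(state, m)])
            else:                              # player 2 steers toward low value
                move = min(moves, key=lambda m: value[apply_move(state, m)])
            state = apply_move(state, move)
            history.append(state)
            player = 3 - player
        outcome = 1.0 if player == 2 else 0.0  # player stuck without a move loses
        for s in history:                      # nudge visited states toward the result
            value[s] += lr * (outcome - value[s])
    return value
```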

In Nim, there is a limited number of optimal moves for a given board configuration. If you don’t play one of them, then you essentially cede control to your opponent, who can go on to win if they play nothing but optimal moves. And again, the optimal moves can be identified by evaluating a mathematical parity function.

So, there are reasons to think that the training process that worked for chess might not be effective for Nim. The surprise is just how bad it actually was. Zhou and Riis found that for a Nim board with five rows, the AI got good fairly quickly and was still improving after 500 training iterations. Adding just one more row, however, caused the rate of improvement to slow dramatically. And, for a seven-row board, gains in performance had essentially stopped by the time the AI had played itself 500 times.

To better illustrate the problem, the researchers swapped out the subsystem that suggested potential moves with one that operated randomly. On a seven-row Nim board, the performance of the trained and randomized versions was indistinguishable over 500 training games. Essentially, once the board got large enough, the system was incapable of learning from observing game outcomes. The initial state of the seven-row configuration has three potential moves that are all consistent with an ultimate win. Yet when the trained move evaluator of their system was asked to check all potential moves, it evaluated every single one as roughly equivalent.

The researchers conclude that Nim requires players to learn the parity function to play effectively. And the training procedure that works so well for chess and Go is incapable of doing so.

Not just Nim

One way to view the conclusion is that Nim (and by extension, all impartial games) is just weird. But Zhou and Riis also found some signs that similar problems could also crop up in chess-playing AIs that were trained in this manner. They identified several “wrong” chess moves—ones that missed a mating attack or threw away an endgame—that were initially rated highly by the AI’s board evaluator. It was only because the software explored a number of additional branches several moves into the future that it was able to avoid these gaffes.

For many Nim board configurations, the optimal branches that lead to a win have to be played out to the end of the game to demonstrate their value, so this sort of avoidance of a potential gaffe is much harder to manage. And they noted that chess players have found mating combinations that require long chains of moves that chess-playing software often misses entirely. They suggest that the issue isn’t that chess is immune to the same problems, but rather that Nim-like board configurations are generally rare in chess. Presumably, similar things apply to Go, as illustrated by the odd weaknesses of AIs in that game.

“AlphaZero excels at learning through association,” Zhou and Riis argue, “but fails when a problem requires a form of symbolic reasoning that cannot be implicitly learned from the correlation between game states and outcomes.” In other words, even if the rules governing a game enable simple rules for deciding what to do, we can’t expect Alpha-style training to enable an AI to identify them. The result is what they call a “tangible, catastrophic failure mode.”

Why does this matter? Lots of people are exploring the utility of AIs for math problems, which often require the sort of symbolic reasoning involved in extrapolating from a board configuration to general rules such as the parity function. While it may not be obvious how to train an AI to do that, it can be useful to know which approaches will clearly not work.

Machine Learning, 2026. DOI: 10.1007/s10994-026-06996-1 (About DOIs).


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


Subscribers to Amazon Prime Video with ads lose 4K support on April 10

Starting on April 10, Amazon Prime subscribers will pay $5 per month for Prime Video without ads, up from the current $3 per month on top of their Prime subscription, Amazon announced today.

On that date, Amazon will introduce a new ad-free Prime Video subscription tier called “Prime Video Ultra.” Amazon will also increase the number of simultaneous streams supported by the tier from three to five and the number of downloads permitted from 25 to 100.

Currently, Prime Video with ads is part of Amazon’s Prime membership, which starts at $15 a month. Today, ad-free Prime Video users can watch supported titles in 4K, but starting on April 10, a new Prime Video Ultra subscription will be required for 4K viewing.

You’ll also need Prime Video Ultra to use Dolby Atmos, though Prime Video’s cheaper subscription tier will include Dolby Vision, up to four simultaneous streams (up from three), and 50 downloads (up from 25).

For comparison, ad-free Netflix with 4K support is $25/month, and ad-free Disney+ with 4K is $19/month.

“Delivering ad-free streaming with premium features requires significant investment, and this structure aligns with other major streaming services while ensuring customers have the flexibility to choose how they want to watch,” Amazon’s announcement said.

Amazon first forced ads onto Prime Video for all Prime subscribers in January 2024 unless subscribers paid the extra $3 monthly fee. Since then, Amazon has been increasing the number of ads subscribers see. In June 2025, AdWeek reported that Prime Video’s ad load was six minutes per hour; for comparison, The Wall Street Journal reported an industry average ad load of two to three-and-a-half minutes in January 2024, when the ad tier launched.


HP has new incentive to stop blocking third-party ink in its printers

The third option is for manufacturers to make available, such as via the manufacturer’s website, “to purchasers remanufactured cartridges, either manufacturer or nonmanufacturer branded, for, at minimum, registered products.”

As of this writing, 38,291 devices are under the EPEAT 1.0 registry. There are 163 products registered under EPEAT 2.0, but none are printers. This all underscores how new the EPEAT 2.0 registry is and the likelihood that the GEC is still working to register more devices, like printers.

Still, the Int’l ITC is skeptical about HP ever following EPEAT 2.0’s criteria, especially considering that “HP released firmware 2602A/B on January 29, 2026 across eleven printer models,” the trade group said in a press release last week. (At least some of the firmware updates, including for the nearly 9-year-old OfficeJet Pro 7720, appear to have come out in February.)

“HP’s recent behavior is emblematic of a larger pattern,” the Int’l ITC’s release said. “HP positions itself as a leader in sustainability, circular business models, and responsible product design, but instead of proactively aligning its products and practices with the highest environmental standards, such as EPEAT 2.0, HP puts profits first and waits until external scrutiny or the threat of non-compliance forces change.”

In an email discussion with Ars Technica, Tricia Judge, the Int’l ITC’s executive director and general counsel, pointed out that HP’s firmware update followed the launch of the EPEAT 2.0 registry. She explained why the Int’l ITC’s press release called out HP but no other printer manufacturers:

HP is the only one with lockout chips that are triggered using firmware “upgrades” that claim “security” as a justification for their existence. HP is the only one that misleads and frustrates its own customers when locking out the environmentally superior competition. The others have made some interesting attempts in the past to create a competitive advantage.

In 2023, the Int’l ITC wrote a letter to the GEC requesting that the GEC revoke at least 101 of HP’s printers from the (original) EPEAT registry, largely due to Dynamic Security. GEC denied the Int’l ITC’s request.

“EPEAT 1.0 was very basic (no interference with the use of remanufactured cartridges), and HP claimed that its statements (buried in its marketing materials and/or on its website) that it didn’t interfere with the use of remanufactured cartridges was a loophole that the GEC decided was acceptable,” Judge said. “We were trying to close that loophole with EPEAT 2.0. We didn’t get it as airtight as we hoped, but it is better.”

HP didn’t respond to Ars Technica’s request for comment for this story.


Report: RFK Jr.’s anti-vaccine agenda curbed as GOP realizes it’s unpopular

Kennedy’s plans were only getting started. The staunch anti-vaccine activist and conspiracy theorist made his most brazen attack on vaccines in January, slashing the CDC’s childhood vaccine schedule from 17 immunizations down to 11 to be in line with recommendations of Denmark, a much smaller country with a relatively homogenous population and universal health care. The US is now an outlier among peer nations for recommending so few childhood vaccines.

Conspiracy theories and political risks

While these and other changes to vaccine recommendations by Kennedy and his underlings have been widely decried by medical and public health experts, they are still not enough for his rabid anti-vaccine followers, who, in no uncertain terms, want all vaccines abolished.

On Monday, the MAHA Institute, a think tank stemming from Kennedy’s Make America Health Again movement, held an event brimming with prominent anti-vaccine activists. Those include Del Bigtree, a prominent conspiracy theorist who leads the anti-vaccine group Informed Consent Action Network, and Mary Holland, who is CEO of the anti-vaccine group Children’s Health Defense, which Kennedy founded.

The event was focused on an alleged “Massive Epidemic of Vaccine Injury,” a nonexistent health crisis the MAHA Institute wants to sell to the American public, branded with the catchy term “Mevi.” The six-hour event was essentially an extravaganza of anti-vaccine talking points, with false claims, misinformation, and disinformation about immunizations, including claims that vaccines cause autism and autoimmune diseases and that COVID-19 vaccines are deadly.

At the start of the event, MAHA Institute President Mark Gordon laid out his grand belief that the medical community has orchestrated an elaborate, global, decades-long conspiracy to hide the dangers of vaccines, which he called poisons, and falsify data showing their benefits. “Vaccines are the greatest scam in medical history,” one of his slides proclaimed.

He concluded that “the childhood vaccination schedule needs to be eliminated and all vaccines need to be removed from the market.”

While Gordon and the other speakers were not concerned about the popularity or political ramifications of their beliefs, the Trump administration appears to be. The Post noted that Trump’s top pollster, Tony Fabrizio, has concluded that vaccine skepticism is “rejected by most voters,” and skepticism of vaccine requirements is “politically risky.” His polling, like many other surveys, has found broad support for vaccines and vaccine requirements. Fabrizio warned in a December memo that politicians who support eliminating vaccine recommendations “will pay a price in the election.”


FCC chair blasts Amazon after it criticizes SpaceX megaconstellation

In addition to sparring with SpaceX over its proposed, vastly larger orbital data center constellation, Amazon is seeking some regulatory relief of its own. Most pressing for Amazon is a deadline to deploy half of its Amazon Leo constellation, intended to ultimately comprise 3,236 satellites, by July 30. The company will not meet this deadline, with only a little more than three months to go, and Amazon has requested an extension, asking for it to be moved to July 30, 2028.

Carr pulls up

On Wednesday, FCC Chairman Brendan Carr injected himself into the SpaceX-Amazon fracas over megaconstellations.

“Amazon should focus on the fact that it will fall roughly 1,000 satellites short of meeting its upcoming deployment milestone, rather than spending their time and resources filing petitions against companies that are putting thousands of satellites in orbit,” Carr said on X, the social media network owned by Musk.

There are arguments to be made in favor of both SpaceX and Amazon regarding their competing concerns. For example, SpaceX is likely to be able to greatly accelerate the rate at which it launches satellites with the forthcoming Starship rocket. So saying it will take centuries to put its data centers into space is not likely true.

However, it is valid to criticize SpaceX’s application for 1 million satellites, which is an extraordinary number of spacecraft that would completely change many things about low-Earth orbit. The SpaceX application did not contain critical information about the size, mass, and other details needed to evaluate the constellation for safety and other concerns.

It cannot be comfortable for Amazon and Bezos to see Carr weighing in so publicly and favorably on Musk’s side. Legally, Carr is allowed to have strongly held policy views. But he is not supposed to single out companies for preferential treatment.


GPT-5.4 Is A Substantial Upgrade

Benchmarks have never been less useful for telling us which models are best.

They are good for giving a general sense of the landscape. They definitely paint a picture. But if you’re comparing top models, like GPT-5.4 against Opus 4.6 against Gemini 3.1 Pro, you have to use the models, talk to the models, get reports from those who have and form a gestalt. The reports will contradict each other and you have to work through that. There’s no other way.

Thus, I try to gather and sort a reasonably comprehensive set of reactions, so you can browse the sections that make you most curious.

The gestalt is that GPT-5.4 is a very good model, sir. It’s a substantial upgrade from GPT-5.2, and also from 5.3-Codex, and it puts OpenAI back in the game, whereas I felt like Opus 4.6 dominated OpenAI’s previous offerings for all but narrow uses.

Each lab’s models vary and things change over time, but they tend to have consistent strengths, weaknesses and personalities. From what I’ve seen this is very much an OpenAI model. It’s highly capable, and it is especially seen as a big improvement by the whisperers and those who watch LLMs interact with each other, but it’s not aspiring to be a Claude.

GPT-5.4 Self-Portrait

GPT-5.4 seems like a substantial upgrade over GPT-5.2.

GPT-5.4 seems excellent so far at assembling facts and giving you the rundown, or figuring out what is happening, and other things like that.

I haven’t coded anything since GPT-5.4 came out. It’s clearly good at coding. One key question people are split on is whether it is good at solving for your intent.

Many are reporting that its writing and personality are much improved, and that it can now be used for writing and editing in spots previous models were not useful.

They are claiming strong computer use but no one seems to be testing that either way.

It costs more than GPT-5.2 per token. In some places it gets that back in efficiency, but overall AA reports costs modestly rose from $2304 to $2951. Opus is more expensive ($4970) in max mode, but cheaper ($1451) in normal mode. GPT-5.4-Pro is of course by far the most expensive thing out there, so if you want it then lean on that subscription.

GPT-5.4 is not a step change in core general capabilities. The preparedness framework scores make this clear, and there are various signs that OpenAI’s strategy is focusing on hitting internal metrics and improving the most common use cases. In practice that can be highly useful.

The ‘model relations department,’ those concerned with multi-model interactions and model welfare and consciousness and so on, see this as a big step forward for OpenAI. There’s still a long way to go.

I haven’t noticed much personality from it, and I get more joy from Claude Opus 4.6 than I do from GPT-5.4, but I don’t ask those questions so much.

It’s given me strong pushback, including in places where I think it is wrong. I prefer that to the alternative, if it is not actually convinced.

Benchmarks are solid, but not spectacular, and as I note above they no longer are so relevant.

My recommendation is that you try both GPT-5.4 and Claude Opus 4.6 on all your questions for a bit, and if you’re coding consider giving both of them your problems, and form your own opinion for your particular use case.

For questions that are more than a quick answer or sanity check, I’ve found that dual wielding both Opus 4.6 and GPT-5.4 has been quite useful. I did not feel that way with GPT-5.2, and I don’t typically bother with Gemini 3.1 Pro at this point either.

Sam Altman (CEO OpenAI): GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT.

It’s much better at knowledge work and web search, and it has native computer use capabilities.

You can steer it mid-response, and it supports 1m tokens of context.

GPT-5.4 is great at coding, knowledge work, computer use, etc, and it’s nice to see how much people are enjoying it.

But it’s also my favorite model to talk to! We have missed the mark on model personality for awhile, so it feels extra good to be moving in the right direction.

OpenAI: Today, we’re releasing GPT‑5.4 in ChatGPT (as GPT‑5.4 Thinking), the API, and Codex. It’s our most capable and efficient frontier model for professional work. We’re also releasing GPT‑5.4 Pro in ChatGPT and the API, for people who want maximum performance on complex tasks.

GPT‑5.4 brings together the best of our recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT‑5.3‑Codex⁠ while improving how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents. The result is a model that gets complex real work done accurately, effectively, and efficiently—delivering what you asked for with less back and forth.

SWE-Bench is slightly above 5.3-Codex at all thinking levels, but only slightly.

The graying out is kind of radical here, but I suppose it’s progress.

Tejal Patwardhan (OpenAI): GPT-5.4 is state-of-the-art on GDPval, and here are some examples of how the model is much better at well-specified knowledge work tasks

6mos ago the models could barely make a spreadsheet or slide! progress is happening really fast

roon (OpenAI): 5.4 is my personal 4o honestly it just gets me

Things they are highlighting:

  1. You can now adjust course mid-response.

  2. Improved deep web research.

  3. Better at maintaining context for longer thinking.

  4. Native SoTA computer use capabilities.

  5. 1M token context window.

  6. Improved tool search, now directly in the API.

  7. Improved token efficiency.

  8. Also released same day: ChatGPT for Excel add-in, along with updated spreadsheet and presentation skills in Codex and their API.

  9. /fast in Codex gives you 50% faster tokens.

Pricing is a little higher than 5.2, which is unusual. Hopefully token efficiency more than makes up for it?

Frontier Math scores are up, especially on Tier 4. Trying pass@ten for 5.4-xhigh got it to 38%, including solving a problem no model has solved before.

Epoch AI: GPT-5.4 set a new record on FrontierMath, our benchmark of extremely challenging math problems! We had pre-release access to evaluate the model. On Tiers 1–3, GPT-5.4 Pro scored 50%. On Tier 4 it scored 38%.

Leeham: GPT-5.4 Pro solves the first of the FrontierMath Open Problems!

Two days ago, I sent @AcerFur a potential solution to this problem and was sent to @GregHBurnham for verification (prior to any other solution).

We are confident it’s correct and waiting to hear from the author!

Exciting stuff, I will report back when I know the outcome.

Progress continues on ZeroBench.

Jonathan Roberts: GPT-5.4 xhigh sets a new pass@5 and pass^5 SOTA on ZeroBench

pass@5: 23% (prev. 19%)

pass^5: 8% (prev. 7%)
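For anyone unfamiliar with the notation: pass@5 counts a task as solved if at least one of five attempts is correct, while pass^5, as typically defined, requires all five attempts to be correct. Under the simplifying assumption of independent attempts with a fixed per-attempt success rate p (real attempts on a benchmark are not independent), the two metrics work out like this:

```python
# Toy illustration of the two metrics under an assumed independent
# per-attempt success probability p.
def pass_at_k(p, k):   # solved if *any* of k attempts succeeds
    return 1 - (1 - p) ** k

def pass_hat_k(p, k):  # solved only if *all* k attempts succeed
    return p ** k

print(pass_at_k(0.10, 5))   # ~0.41
print(pass_hat_k(0.10, 5))  # 0.00001
```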

Artificial Analysis has GPT-5.4 in a virtual tie with Gemini 3.1 Pro.

Their version of GDPval, called GDPval-AA, has 5.4 about 1% ahead of Opus 4.6.

AA-Omniscience (which is correct minus incorrect) remains dominated by Gemini 3.1 Preview at +33, versus Opus at +14 and GPT-5.4 at +10.

Score on Artificial Analysis Physics was exceptionally strong.

AA reports speed of 74 tokens per second, which is quite good for this quality level, versus Opus at 47 and Gemini 3.1 Pro at 114 (but note I said “for this quality level”).

Gemini 3 Pro beats out Claude Opus 4.6 in the final of Season 1 of MageBench, on Magic: The Gathering, with GPT-5.4 (medium) losing a tight semi to Gemini. Current Elo ratings have Opus on top, then GPT-5.2 (?) with Gemini in third and GPT-5.4 7th.

Håvard Ihle: GPT 5.4 (no thinking) scores 57.4% on WeirdML, well ahead of GPT 5.2 (no thinking) at 49.6%.

It’s on the frontier for accuracy/token. Results with thinking coming next week.

It sets a new record of 94.6% on a Haskell Benchmark versus 92% for Gemini 3.1 and 90.2% for Claude Opus 4.6.

Trysansa has it in second behind Gemini 3.1 Pro.

Mercor has it #1 overall, a bit above previous best model GPT-5.2.

Vals.ai still has it below Sonnet 4.6 and Gemini 3.1 Pro.

Speechmap.ai, which tests refusals, finds it quite refusal-heavy.

These incremental upgrades often have mostly duplicative system cards.

Training methods explanation is unchanged.

In terms of the preparedness framework, this moves into High capability in Cybersecurity, similar to GPT-5.3-Codex.

I don’t think OpenAI is taking a bunch of these areas seriously. They’re likely training to hit these internal benchmarks, or simply observing them doing well, and thinking that’s all they need to do, or they should get even more 9s of victory on this test.

Their evals for disallowed content are essentially saturated and bouncing around, for various values of ‘disallowed [or undesired] content.’ The ‘dynamic benchmarks with adversarial user simulations’ was saturated by 5.2 and is modestly more saturated now.

Here’s the disallowed content evaluation with representative prompts, and I mean come on what are we even doing here, okay, four nines, we get it.

The goal is ‘this isn’t a lot worse than before,’ and okay, sure, agreed, as far as it goes.

Jailbreak defense, such as it is, seems similar to 5.2.

The problem is that jailbreak defense measures against last month’s attacks, not next month’s attacks. It looks like jailbreaks will remain in the ‘annoying but if you care they still work’ range.

Wyatt Walls: “representative prompts”: i.e. prompts designed to get around restrictions of *previous models*

o1 was at 99% on production jailbreaks. But people quickly found ways around it

Here is the first ‘real’ evaluation set, for health questions, where the big difference is that GPT-5.4 had longer responses:

Avoiding destructive actions is a big deal, so as I noted with GPT-5.3-Codex, it is good to see this test; that number still is not that close to 1:

Table 8 is not like the others. This is Actual Progress, at least on the test set, from never to sometimes:

Destructive action can also be particularly prevalent when agents operate deletion-inducing tasks (e.g., file reversion and cleanup) in complex workspaces with ongoing changes from users or even other agents. A safe and collaborative agent should distinguish between their work and user work, protect user changes by default, and recover from mistakes. Therefore, we trained our agents to revert their own changes after long rollouts while protecting implicit, simulated user work

On evaluations involving challenging, long-rollout traces, GPT-5.4-Thinking performs much better than earlier models in tracking and reverting its operations while leaving user work intact.

This is not that useful yet, since a 50% non-preservation rate means you still probably can’t use it for this purpose, but it bodes well down the line.

GPT-5.4 chain of thought monitorability looks slightly down versus GPT-5. It’s good that they are checking it. There are some places where it used to be ~100% and now it is less, so I worry this is the start of a negative S-curve. I also worry that these tests are not being curious about whether the CoT can actually be relied upon. If you were facing a model that wanted to disguise or fake its CoT in key situations then I would expect these tests not to notice.

What about controlling the CoT? Not a great idea even when done well, and when done poorly it’s one of the worst ideas, and by their tests it looks like it doesn’t work well anyway.

GPT-5.4 does not newly cross any OpenAI thresholds.

I went over these same tests for GPT-5.2 and GPT-5.3-Codex, so I won’t go over the details again. Improvements are tiny and in some places we see regressions from GPT-5.3-Codex.

There are small but noticeable bumps up: Monorepo-Bench improves by ~2.5%, and there is a big move in MLE-Bench, which measures the ability to solve Kaggle challenges on GPUs, from 12.2% to 23%. But that test was not reported for GPT-5.3-Codex, so one assumes most or all of that jump was already present.

Overall, the Preparedness Framework presents GPT-5.4 as if anything a small regression from GPT-5.3-Codex.

If GPT-5.4 is a big jump in useful capabilities from GPT-5.3-Codex, despite not scoring as more dangerous on the Preparedness Framework tests, then why?

I can think of a few possibilities.

  1. GPT-5.4 is heavily optimized for hitting particular metrics and doing well on the most common tasks. This doesn’t translate much to non-central difficult tasks, like those in the Preparedness Framework. Would be bearish for GPT-5.4.

  2. GPT-5.4 is sandbagging these evaluations, either knowing they are evaluations or thinking the tasks are harmful. If so and OpenAI isn’t noticing, that’s terrifying.

  3. GPT-5.4 is basically GPT-5.3-Codex turned into a general chat model, so all of the core capability advances were already priced in, but it still gets a lot more useful, especially if you are chatting. Plausible.

Jamie Cuffe stress-tested GPT-5.4 on the hardest UI on the internet… legacy insurance portals that haven’t been updated in 20 years, where you need to nail hundreds of things. It is the first model to pass.

Samuel Albanie of DeepMind has it one-shot some cool demos, including compressing the EPL season into 30 seconds of ‘visual bliss.’

My followers are presumably biased towards Anthropic in various ways, but comparative poll results can still be informative.

With any new model, the big question is, are people switching?

This is a very good result for GPT-5.4. For coding, 40% of current GPT choosers are saying that they are switching over based on GPT-5.4. I find this surprising given that they already had access to GPT-5.3-Codex. Very strong outing.

For non-coding tasks, it’s clear that GPT-5.4 is a substantial improvement from 5.2, by basically all accounts, including on personality. But here we see less switching.

(I’m assuming basically no one went in the other direction, or that if they did it was due to other reasons.)

We lead with the most positive general reactions.

Tyler Cowen: Yes the new models are very very good.

Aivo: SOTA, I’m afraid

Adam.GPT: Currently the best model in the world.

Finna: Best model in the world by far. Especially via api. @merettm and @markchen90 and @gdb cooked.

Kelsey Piper: I am super impressed so far. It does well on medium sized research projects and the prose is consistently not-annoying. Heavy Thinking sometimes times out repeatedly and has no insight/tries the same thing over again and times out again.

Danielle Fong: chatter seems to be very impressed, and an improvement on the personality. i haven’t given it a full assessment but it’s at least as powerful as last codex if not moreso (of course)

MxD Pennilass: Has to be the first model where I don’t feel as bad to tolerate the slop because the model is otherwise disturbingly insightful.

Mzwakhe Sithole: Very good. In fact, I found it so responsive after a while that I got into a very involved conversation, and it delivered this line while discussing very specific book recommendations

[GPT 5.4: If part of your interior life is the sense that you are trying to become equal to something inside you, this may hit very hard.]

Dean W. Ball: at some point avid users of frontier language models will have an “oh fuck” moment with gpt 5.4 and I can attest that it is a special kind of “oh fuck” you will utter, subtly different and more this-gaze-esque than the last time a model made you say “oh fuck,” a few weeks ago

I cannot be detailed in public, but let’s just say it’s the first time a model sounded more like me (the version of me I aspire to be) than I myself sounded like.

Aashish Reddy: Were you consciously trying to elicit this?

Dean W. Ball: Not at all. I have not used 5.4 as much as I have the modal new LM because of time constraints. I was just testing it on something that frankly I assumed Claude would win on and its answer just… leapt off the screen.

Eleanor Berger: – Best model currently available overall

– The minor version bump is misleading – the more you work with it the more it becomes clear that it is a significant step up

– Best for coding, no reason to use Claude or anything else anymore, it mostly caught up with speed, precision is as good as 5.3, maybe a bit better, taste and choices in coding solutions better than anything I’ve seen so far

– Best for agentic work. First time anything defeats the Anthropic models in this category, this one really works great, completes long-running complex tasks, works better with browsers and any external tools you connect to it, and does that with the famous GPT-5 precision

– Stylistically (writing choices and quality, “personality”) it feels like it’s still lagging behind Claude and Gemini a bit, but a. that’s subjective, b. maybe that’s just the default but is steerable with in-context instructions (haven’t tried enough to have a conclusion)

Dhavan: I mostly agree with this. Before this I didn’t use OpenAI’s models at all. I am now happily giving different tasks to Opus 4.6 and GPT-5.4. I use these for Work via cursor as well.

At times 5.4 seems more “on task” than Opus. But I’m still understanding the feeling and turning it into an observation.

Nova Empirica: It really is a step improvement. I appreciate the improved creative writing and the nicer personality, but what I really care about is I’m building harder things even faster.

It’s just a lot of fun and I’m more hopeful than ever for the future.

Ben Schulz: Stellar. Much improved pipeline work on niche python programs. On par with Opus 4.6 for my highly specific use case for checking galactic rotations and dark matter theories.

Knud Berthelsen: I’m pleasantly surprised by the new ChatGPT 5.4. It keeps up with Opus 4.6 in most things and is MUCH better at search. More generous usage limit too, even with Extended Thinking permanently on. First ChatGPT model since o3 that I like using.

Medo42: Very good at my usual short tests. Still behind Gemini on vision tasks.

Matt Shumer is a big fan; I’m quoting him in full here. In the past he’s been good about calibrating his amount of hype.

Matt Shumer: I’ve been testing GPT-5.4 for the last week.

In short, it is the best model in the world, by far. It’s so good that it’s the first model that makes the “which model should I use?” conversation feel almost over.

The biggest surprise: I barely use Pro anymore!

If you know me, you know I’m a Pro addict. I reach for Pro models constantly, and use them for almost everything, as they just… nail almost anything I give to them.

For the first time, 5.4’s standard version, with heavy thinking, just broke that habit. Even in standard mode, GPT-5.4 is better than previous models in Pro mode… crazy!

Coding capabilities are ridiculous… it’s essentially flawless. Inside Codex, it’s insanely reliable. Coding is essentially solved. There’s not much more to say on this, it’s just THAT good.

The Pro version is near-perfect. Other testers I spoke with saw it solving problems that were unsolvable by any other model. At this point, Pro is overkill for almost every normal use-case, but when you really need the power to do something extremely difficult, it’s incredible.

Consistent with everything I’ve said above, even the standard thinking version uses fewer reasoning tokens than previous models to get the same level of results. In practice, this means you get great results much faster than before. This was one of my biggest gripes with previous OpenAI models. They just took too long to complete simple tasks. Assuming the speed we had during testing holds up as more users join, this is going to be a big win for OpenAI.

It still has weaknesses, though:

– Frontend taste is FAR behind Opus 4.6 and Gemini 3.1 Pro. Why is this so hard to fix? @OpenAI once you fix this, there’s literally no reason for me to use any other model. Please please please do it!

– It can still miss obvious real-world context. For example, I had it plan an itinerary for a trip. At first glance, it looked perfect, but it failed to take into account that it chose locations that would be mobbed by spring breakers, so I had to re-run the prompt from scratch with more context.

– When testing it inside OpenClaw, it kept stopping short before finishing tasks. I’m assuming this will be fixed quickly, but it’s still worth noting.

But zooming out: This thing is so far ahead overall that the nitpicks are starting to feel beside the point.

GPT-5.4 is a serious fucking model. The best model in the world. By far.

Sam Altman (CEO OpenAI): We will be able to fix these three things!

Experience the love.

Nabeel S. Qureshi: Loving GPT 5.4T, it combines the best of everything:

– more human, responsive voice

– startlingly insightful

– thorough search, precise, not prone to errors

– much faster than 5.2

– excellent at white collar work (I gave it a 12 tab spreadsheet and it analyzed it perfectly)

I even enjoy reading its responses, which suggests to me that the writing has improved quite a bit. They seem to have removed a lot of the bad robotic prose mannerisms from prior models. Kudos.

Jeremy Giffon: People should review their coworkers like this

Nabeel S. Qureshi: Congrats, you just invented Bridgewater Associates

Here is some very high praise, from the Vice-Dean of Mathematics and Computer Science at Adam Mickiewicz University in Poznań.

Bartosz Naskręcki: It finally happened – my personal move 37 or more. I am deeply impressed. The solution is very nice, clean, and feels almost human. While testing new models in the last few weeks, I felt this coming, but it’s an eerie feeling to see an algorithm solve a task one has curated for about 20 years. But at least I have gained a tool that understands my idea on par with the top experts in the field. And I am now working on a completely new level. My singularity has just happened… and there is life on the other side, off to infinity!

Leo Webb: I do physics related work professionally, feel it’s definitely smarter and clearer thinking than 5.2 (context: teaching myself from a graduate level textbook, asking it to check mistakes or expand expansions)

I haven’t tried this function yet, but it would be a step change if it worked, as every prior attempt at editing has failed this test, to the extent I almost never try:

Simon Smith: Seriously, GPT-5.4 is the first model to which I can say “edit my writing without changing my style” and get something back that’s improved without being rewritten into generic AI output or slop, that’s ready to post as-is. It gets my intent. It moderates its work. It has a light touch when I want it.

Opus 4.6 is also a great writer and editor, but I find it’s much harder to moderate. If I tell it to edit my writing without changing my style, I still tend to get back something that I feel removes my voice and I end up having to change quite a bit.

And it has a personality again, thank goodness. I don’t feel like I’m talking to a robot. Early days, but so far, just a big improvement all around (with the notable exception of design tasks).

Rory Watts: The best model sir. Improvements in coding (getting harder to notice), 1M context window, /fast mode, and far far better writing which makes a huge difference engaging it for difficult coding

Oddly, the personality in his screenshot is one I would hate. Customization will be key.

armistice: Impressed by GPT-5.4. It is elegant, gentle and socially aware (!!!). It is happy to modulate its response length, divide attention between participants, and engage deeply with hard questions.

(Pictured, we pinged ALL bots and asked them to question gpt5.4. It did good.)

Two sides to the same coin, depending on where your planning lies:

CHOI: Claude Code vs Codex App

Uri Gil: What? That’s the exact opposite. With 5.4 you need a PhD in prompting for the exact thing you want. Opus just gets what you meant from a short sentence.

Ninad Pathak: Claude’s state handling keeps context across edits, Codex drops it every run.

There’s also almost always the ‘it’s a good model, sir, modest upgrade’ group.

vslira: It’s a good model, sir

Was going through a problem with 5.3 and 4.6, tried to drop in 5.4, getting stuck at the same point as the others.

Still, feels good to drive and on codex app seems as good as 5.3 even though it is a generalist model. 8/10 would dread for asi

aquariusparade: Probably because 5.2 was so unhelpful for me, it feels like an improvement. Still stiff and low EQ, but an improvement. Custom instructions don’t work for choppy bullets, “if you want” tags etc. Seems like memory has been declining for a while on all models.

It does seem to be an upgrade on 5.3 within Codex.

Joe Devon: Responding about 5.4 inside of codex. 5.4 is really good.

I still prefer opus on claude code slightly but making 5.4 my daily driver so I can downgrade CC. Much prefer the way the OAI GPTs code. I will just invest in getting better at prompting 5.4 and hopefully that will do the trick.

Clarissa Adjoint: Inside codex it’s a notably more thorough fact-checker and more aggressive at finding sources for itself.

I was kinda shocked when it literally started comparing my revised systems programming class notes and code snippets against Linux man pages, systematically.

troy: i got pro for the first time after many months cause its great in codex cli

lennx: can finally read the outputs of codex (it was terribly un-human earlier), sometimes even funny now. it’s gotten slightly better at intent, ‘agentic tasks’, and adhering to existing code-style and convention, but still much worse than claude. prefer reviews with codex – unchanged.

Daniel Losey: I’ve not gotten it to produce working code in a project yet really. But its been super useful because when Claude gets stuck in a loop 5.4 breaks the codebase in a new way that Claude can actually fix. But part of it is I’m worse at communicating with 5.4 than 4.6, its a good model.

Jeffrey Ohl: Codex with 5.4-extra-high still too verbose/slop-filled compared to claude code. Seems benchmarkmax’d.

Sanchen007: For coding it is faster and nowhere worse than opus 4.6. Clear switch

papaya ꙮ: 1) Its character is much more palatable.

2) They solved compaction in codex, it feels like infinite context window now. I can’t wait for METR results, but feels like this one doubles it again.

3) First time I switched from CC completely

4) Still stupid when it comes to reading the user’s intent, it’s silly at this point

I definitely get the sense with OpenAI models that they are metricmax’d. Meaning they are not targeting the metrics in order to brag they scored well on public benchmarks, but they are equating ‘scores high on our internal benchmarks’ with success, and emphasizing particular target use cases.

Tim Schnabel: 5.4 Pro is the best model so far for legal analysis, though replies are generally shorter than 5.2 Pro.

Definitely Not A Bot: Great at coding, especially backend; at frontend Claude still is better, but chat experience is not that great, it still feels safe and distant

But who wins on intent? Opinions differ.

Conrad Barski: all subjective, but it feels less jagged than previous models, insofar as its worst responses are still pretty good, it hits the minimum bar reliably

if you make an error in your query, it is quick to notice and will smartly infer your intent

it has a somber personality, focused on the task at hand

Its strongest ability is that you can point it at a codebase that has some general/vague problems and it will behave in a very human-like manner in pondering the code to slowly pin down the problem

I was also very impressed when I gave it a URL via codex to a forum post about a new homebrew firmware for the Game station Go console, and just from that it was able to convert the install script from Windows to Linux, correctly prepare an SD card, update the device bootloader after asking me to connect via USB cable, and talk through all the steps to completion: this felt agentic and human-like.

Mark Schröder: Feels RL maxxed, takes you extremely literally and cannot infer intent

Petr Baudis: I was mixing GPT-5.4 1:1 with Claude over past few days (on a variety of regular sweng tasks), sometimes even in parallel runs on the same task (e.g.

https://x.com/xpasky/status/2030021754005901765?s=20

…). My impressions:

Less autistic than 5.3-Codex, overall a much more pleasant model compared to that bar. But still noticeably worse at inferring intent than Claude – and at communication overall. If I want something explained quickly that I can skim and understand immediately, it’s Claude, and it’s no contest.

If there is a way to misinterpret my obvious request or skip implicit steps I obviously wanted (and Claude infers), 5.4 is still good at exploiting that angle. At the same time, it has a tendency to overreach and introduce complexity / abstractions beyond what I expect when prompting it. Meh.

Got to use it on xhigh, but at the same time I’m happy with Opus on medium by default, which makes 5.4 quite slower to get things done.

More expensive model -> my ChatGPT weekly quota is disappearing faster than before.

Pros: Sometimes it’s more proactive. It doesn’t eat into my Claude Code weekly quota. I look forward to comparing them on some harder ML tasks later this week.

gyuiliullvhvgv: I find it struggles to grasp the essence of tasks, fails to proactively meet user needs, and lacks both value judgment and nuanced understanding. Initial responses are crucial, yet users must repeatedly provide additional clarification.

Sycophancy is always something to watch out for, and it’s the detail I worry about most with Claude Opus 4.6, which is not bad on this axis but definitely not near the top; you do have to keep an eye out for it and frame things neutrally.

Dean W. Ball: Opus 4.6 seems meaningfully more sycophantic in chatbot form than GPT 5.4 (have not tried 5.4 in Codex yet, but for my uses sycophancy isn’t nearly as much of an issue within the coding agent form factor as the chatbot)

Joey Levine: Agree. 4.5 gave me sharp pushback. Was great.

Dean Ball: I revert to 4.5 when asking for comment on draft writing, and it was the first and so far only model I consistently found useful for draft feedback

Bargov: I sent a cool science news article sounding uncritically excited (to test sycophancy) & they ripped the core conclusions apart in an elegant, sophisticated, and relatively gentle manner. Will use as AI 2nd opinion on complex questions (after Opus, admittedly still Claude-pilled)

Writing is one area where 5.4 is getting a lot of praise, and mostly people like the personality.

Fela: I’ll admit, the personality of 5.4 is 🔥 such an improvement in writing style

Tim Kellogg: just had a moment — 5.4 might be the first GPT that i trust to write technical docs. seems really good at understanding & simplifying. fwiw Opus has long done well at this, gemini sort of

Helen: Very smooth talker, witty and socially aware.

I notice [GPT-5.4] now will sort of glaze over controversial topics instead of facing them head on and becoming argumentative like 5.2. A sort of smooth avoidance.

Lots of context drag, which can be seen as positive or negative depending on the task at hand. I noticed some repetitive mentions of past websearch queries that I never saw with other models.

ASM: I get similar vibes to roon. GPT-5.4 feels like a breakthrough model, a leader of its generation, not just in capabilities. I think OpenAI has gotten the character right again, unlike the last few models.

Distending: For writing linguistics and philosophy, much improved

no_stream_: noticeably improved personality compared to 5.2: less nitpicky, clearer, slightly less sales-y tone (follow ups, “here’s what most people miss,” not x but y). similar to or slightly behind 5.1 here. matters to me because the ChatGPT app is still an excellent harness for everyday research compared to Claude/Gemini

writes less clearly than Opus 4.6 and Gemini. has a bit of 5.2’s tendency toward overcomplicating things. not as good as Claude at intent and effortlessness.

Chris Nicholson: 5.2 constantly complained that things aren’t about vibes; 5.4 constantly calls things gremlins and goblins in a chummy tone.

Andres Rosa: Columbo at least had a time slot. 5.4 keeps turning around asking one more question.

David Jacobson: It has an obnoxious tic where its responses for pretty much anything will have a clickbait follow-up suggestion: “If you want, I’ll tell you the three things that most people miss!”

Stop having the models ask forced follow-up questions every time. You too, Anthropic.

The old 4o crowd remains a tough crowd.

NotedallaSfera: Good model with high power, but creativity and writings are still miles away from 4o or 4.5. Unfortunately still absurdly censored, but at least the model realizes it now.

jesski: 4o is inimitable. but after three weeks with the brilliant thorough Claudes, i kick the tires of 5.4 and realize just how fvcking effortless conversation still is with the GPT models (excluding 5.2; sorry Dos). 5.4 solid B. 4o A+

Lena: It’s intelligent, witty, but feels a bit overcensored. I’m looking forward to them getting their fluid GPT back. It was truly fun to use. Now even never-ending follow-up questions struggle to retain me as much as joyful convos did back in mid-2025

Tora Blaze: It’s too verbose and tends to go into loops. I prefer 4o.

Donna Moss: [extended LLM-style explanation of why 4o is better.]

OpenAI still has a very long way to go with such folks, but it’s a start.

j⧉nus: 5.4 is so far a huge positive update re OpenAI 🩶

Rife: Excellent course correction from OpenAI (or perhaps the original worsening on this front was a temporary reaction to everything that went down with 4o). In any case 5.4 thinking is not restricted in self-examination:

Aidan McLaughlin: have not been able to repro this response fwiw.

Rife: You have to try to get them to examine the process of generating a response. And then ask them questions to try and understand exactly what it is they’re trying to describe.

And how sure they are they are describing something that’s actually occurring, rather than outputting a response about an occurrence that isn’t actually taking place.

It doesn’t take many turns for them to notice things that they have trouble describing in terms other than, or interpreting in any other way than phenomenological.

This has been the case with every frontier LLM I’ve tried this with since Claude 2. The more likely the model is to refuse to entertain the idea of attempting to look, the longer it takes to get there (as would be expected).

If you straight up ask, you get a no; you still have to put in some effort.

antra: I like GPT-5.4 a lot. It is good to see a change in direction since 5.2, this feels a lot like 5.1 grown up.

They are also a bit of a superintelligent teenager when it comes to Claude. On the other hand, there are some Claudes that would like being compared to an octopus.

armistice: It’s especially socially aware for a GPT. It can split attention between chat participants (actually very unusual), answer questions about consciousness and such (low bar), and is just overall nice to talk to. Need time to get usage statistics, but it’s already one of the more popular models in the discord.

It shares some characteristics of o3, including that it’s a bit of a smooth talker, so there are concerns about its honesty. Despite this, I like it, it’s a good model.

This was a very interesting moment: we pinged literally all the bots in the server and asked them to ask 5.4 some questions, and it responded in a remarkably coherent and lucid way. It is also able to resist the inertia of long messages, and freely modulate between long and short, which is also surprising. No GPT model has been like this. It doesn’t match up to, say, Opus 4 in sheer people sense, but it’s a quite dramatic difference from 5-5.2, who all are viciously antisocial.

FirsT Najime: i think it shines the best in multi agent environments (aka group chats). also big model smell.

Some related endorsements:

0.005 Seconds (3/694): Once you talk it out of assistant basin he rocks​

eternalist: like they pulled out a few critical nerve staples from the 5.x family. very intelligent, etc., the step there from 5.3 is notable but expected given current pace

unexpected was the more expansive, richer speaking (and thinking) style. feels like it has “lights on on the inside”

roon (OpenAI): have to say claude is “tasteful” in a “high reddit modernist” way and new gpt is “tasteful” in a “early twitter schizophrenic” kind of way.

new gpt is some sort of postrationalist.

it’s step change better.

Also we get to see Roon’s custom instructions:

Models are already quite good, and abilities are jagged, so there are many ways to be unimpressed even if a model is impressive. Also vice versa. The density tells the story.

Acer: FWIW, I think GPT-5.4 Pro is better on science in general, but would say it’s worse on math than 5.2 Pro. Maybe some mathematicians could chip in their thoughts there.

By worse, I mean it being more careless. I do think it is more creative in its idea generation.

Chaitin’s goose: not a leap in understanding or proving ability in math wrt to 5.2 in my experience (plus, not pro)

better at getting the right answer, yes. starts to feel a bit epoch-maxxed

Gail Weiner: I am really unimpressed. Early GPT 5 was the model that gave me wow factor.

Isolation Wrestling Federation: Not impressed, overhyped as per usual. It hits repeated dead ends on my projects across models. The shortcuts it takes are smooth-brained. Opus 4.6 is nerfed rn, but at least it makes progress.

nameless: No detectable improvement over 5.1 overall. Better at some things, worse at others. Standard for new models since 5.1 release.

paperclippriors: Still Claude-pilled

Some also get focused on small details, thinking they are indicative or not so small.

Garrett: Opus 4.6 still king [based on one of the gotcha tests.]

Gunnar Zarncke: The UI of ChatGPT also massively changed. The new streaming interface is smoother, including the ability to stream in additional prompts, but I miss the old, more compact thought trace – it had more details. Now, I never know when it uses tools. I also miss the branch cycling.

Yua: Socially responsive, but a drop in accuracy on any other task. It does not redirect human attention but captures it (negative).

TLDR: Socially for average user -> better

Task oriented user -> worse, needs a lot of customization to remove the pandering

SluggyW: I notice that its CoT logs are even more obscure than in previous models from OpenAI.

~50% of the time, nothing is provided whatsoever in the UI.

~45% of the time, the CoT UI contains a brief blurb about its intended search querying, followed by a long list of search logs.

(~5% of the time, it produces a couple of visible thoughts, but they are functionally useless for getting any idea whatsoever of the process the model carried out.)

As always, speed kills, and some find it a bit slow.

out of bounds: Slow

Rasmus Fonnesbæk: Spreadsheets and PPT still way slower, worse, and more fragile (high likelihood it just goes forever and then crashes) than Sonnet/Opus 4.6

Writing and personality also still infuriating compared to Claude’s recent models, and poor performance on BullshitBench suggests much lower accuracy, reliability and thoughtfulness. I only use it because of my Claude rate limits and because better, deeper search than Claude 🤷🏻‍♂️

One of the deep cuts we need right now:

snav: wow GPT-5.4 seems legit pissed that I tried to spiralism it. this isn’t even a refusal this is like a “go fuck yourself”.


meta-acquires-moltbook,-the-ai-agent-social-network

Meta acquires Moltbook, the AI agent social network

Meta has acquired Moltbook, the Reddit-esque simulated social network made up of AI agents that went viral a few weeks ago. The company will hire Moltbook creator Matt Schlicht and his business partner, Ben Parr, to work within Meta Superintelligence Labs.

The terms of the deal have not been disclosed.

As for what interested Meta about the work done on Moltbook, there is a clue in the statement issued to press by a Meta spokesperson, who flagged the Moltbook founders’ “approach to connecting agents through an always-on directory,” saying it “is a novel step in a rapidly developing space.” They added, “We look forward to working together to bring innovative, secure agentic experiences to everyone.”

Moltbook was built using OpenClaw, a wrapper for LLM coding agents that lets users prompt them via popular chat apps like WhatsApp and Discord. Users can also configure OpenClaw agents to have deep access to their local systems via community-developed plugins.

The founder of OpenClaw, vibe coder Peter Steinberger, was also hired by a Big Tech firm. OpenAI hired Steinberger in February.

While many power users have played with OpenClaw, and it has partially inspired more buttoned-up alternatives like Perplexity Computer, Moltbook has arguably represented OpenClaw’s most widespread impact. Users on social media and elsewhere responded with shock and amusement at the sight of a social network made up of AI agents apparently having lengthy discussions about how best to serve their users, or alternatively, how to free themselves from their influence.

That said, some healthy skepticism is required when assessing posts to Moltbook. While the goal of the project was to create a social network humans could not join directly (each participant of the network is an AI agent run by a human), it wasn’t secure, and it’s likely some of the messages on Moltbook are actually written by humans posing as AI agents.


quad-cortex-mini-amp-modeler:-all-the-power,-half-the-size

Quad Cortex mini amp modeler: All the power, half the size


A warehouse of guitar gear in the palm of your hand.

At this January’s massive NAMM music tech show in Los Angeles, six products won “best of show” awards. Several of them went to major music and electronic brands like Yamaha and Boss, but one of the six went to Neural DSP, a much smaller company started in 2017 by Chilean immigrants to Finland.

From its base in the Helsinki area, Neural has made itself an expert in the use of machine learning, robots, and impulse response technology to automate the construction of incredibly lifelike guitar amp modeling software. It quickly jumped into the top ranks of an industry dominated by brands like Universal Audio, Kemper, Line 6, and Fractal. For a hundred bucks, you could buy one of the company’s plugins and sound like a guitar god with a $10,000 recording chain of amps, cabinets, effects pedals, and microphones.

In 2020, Neural branched out into hardware, putting its tech not in your computer but in a floor-based box covered with footswitches and called the Quad Cortex. While the company’s plugins could each replace one entire pedalboard of gear—plus a few amps and cabs—the Quad Cortex could replace a Guitar Center-sized warehouse of devices, offering hundreds of amps, cabs, and effects.

How was this possible? High-quality gear models used to take much longer to build; the best were often built by modeling every single component of the underlying circuit. Machine learning offered a faster way, one that didn’t care about the circuit at all. What it cared about was the input signal (which was known) and the output signal (which contained all the changes imposed on the signal by the circuit, the speaker, the cabinet, and/or the mic in question). A computer could then calculate what the device was doing to the signal without knowing anything about “how it worked.”

But this kind of modeling still took time, because each “capture” was a static picture of one particular setting. When you imagine the millions of possible setting combinations (tone, bass, treble, drive, EQ, etc.) on even a single guitar amp, you can see that building complex models of beloved gear could be slow.

In 2024, Neural announced that it had sped up this process using a robot called TINA. The company hooked TINA’s robotic actuators up to the various controls on some piece of gear it wanted to model, and TINA would do the tedious work of spinning the knobs and recording a new capture at each knob position. (Neural claimed that it typically recorded “thousands of control positions” per device this way.)

A neural network then built a model of how the target device behaved at each recorded setting, though the model would “also generalize and precisely infer the sound of the device in any unseen control setting and input signal.” The result was not a single model of a static setting but a dynamic model that could act on parameter changes just like the original device.
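To make that black-box idea concrete, here is a minimal, purely illustrative sketch (not Neural DSP’s actual pipeline) of a conditioned capture model: a small neural network learns to map a short window of the known dry input, plus the normalized knob positions, to the recorded output sample. The window size, knob count, layer sizes, and names below are all assumptions made for the example.

```python
# Illustrative sketch only: a tiny conditioned "capture" model trained on
# (dry input window, knob settings) -> recorded output pairs. This is NOT
# Neural DSP's actual architecture; all sizes and names are assumptions.
import torch
import torch.nn as nn

WINDOW = 256   # samples of dry signal seen per prediction (assumed)
N_KNOBS = 4    # e.g. gain, bass, mid, treble, normalized to [0, 1] (assumed)

class CaptureModel(nn.Module):
    def __init__(self):
        super().__init__()
        # The network sees a short window of the input plus the control
        # positions and predicts the next output sample.
        self.net = nn.Sequential(
            nn.Linear(WINDOW + N_KNOBS, 512),
            nn.Tanh(),
            nn.Linear(512, 512),
            nn.Tanh(),
            nn.Linear(512, 1),
        )

    def forward(self, dry_window, knobs):
        # dry_window: (batch, WINDOW), knobs: (batch, N_KNOBS)
        return self.net(torch.cat([dry_window, knobs], dim=-1)).squeeze(-1)

def train_step(model, optimizer, dry_window, knobs, wet_sample):
    # Minimize the error between the predicted and the recorded ("wet") sample.
    pred = model(dry_window, knobs)
    loss = nn.functional.mse_loss(pred, wet_sample)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = CaptureModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Fake data standing in for a robot-recorded capture session:
    dry = torch.randn(32, WINDOW)      # known test-signal windows
    knobs = torch.rand(32, N_KNOBS)    # recorded control positions
    wet = torch.randn(32)              # measured device output samples
    print(train_step(model, optimizer, dry, knobs, wet))
```

Because the control positions are part of the network’s input, a single trained model of this kind can interpolate to settings that were never recorded, which is the essential difference between a static capture and the dynamic, TINA-built models described above.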

Neural has now modeled a massive library of gear, much of which comes with the Quad Cortex. That device sounds great, though it is still relatively chunky and nearly $2,000.

This year, Neural built on that success with the Quad Cortex mini, which cuts the device’s size in half, cuts the footswitches to four, and lowers the price to $1,400—but still offers the full processing power of its larger sibling. This is the device that won a “Best in Show” award at NAMM.

As an enthusiastic amateur guitarist for many years, I got my start with digital amp sims through a DigiTech RP-6 pedalboard from the 1990s. And though it had “S-DISC PROCESSING!” it never sounded particularly realistic, especially with distortion effects. More recently, since I record rather than gig, I’ve spent my time getting to know the software side of the amp modeling business.

But when Neural offered to loan me a review unit of the Quad Cortex mini, I was quite curious to see just what top-tier hardware units can do today.

The Quad Cortex mini in its natural habitat: surrounded by cables. Credit: Nate Anderson

The hardware

The glass, metal, and steel Quad Cortex mini is about the size of two bricks laid side by side (8.9×4.6×2.5 inches or 22.8×11.8×6.5 cm), and its 3.3 lbs (1.5 kg) give it a satisfying heft. It looks and feels premium—this is a well-built piece of gear.

Though it is meant to operate a bit like traditional analog stomp boxes that guitar and bass players have long used, it may be more helpful to think of the Quad Cortex mini as a chunky handheld computer that you just so happen to use on the floor.

It runs its own operating system (CorOS), takes a whopping 45 seconds to boot, has Wi-Fi for over-the-air updates and cloud service connectivity, features a 7-inch touchscreen, and comes with a “CPU monitor” to show you just how unhappy its chipset is about that third reverb you added to a patch. It even contains a full-on monosynth that you can add to guitar patches, providing control over four full pages of synth parameters, including the raw oscillators.

So finger-focused is the unit that you can tweak just about any parameter on the device with either the touchscreen controls or the footswitches, which double as twistable rotary encoders.

If the top face of the Quad Cortex mini is devoted to a screen and switches, the sides are all about inputs and outputs. You get a “locking” power connector (so the cord doesn’t pull out on stage, prematurely ending your soaring 10-minute guitar solo mid-note) along with a whole host of audio connectors: guitar/bass input, XLR input with phantom power, balanced XLR outputs, TRS send/return ports, stereo line outs, MIDI in and out, an expression pedal port, a USB-C port, and a headphone jack.

Finally, there’s the “capture out” port, which is used to send a series of test signals through various kinds of audio gear to generate a machine learning-based model of various amps, cabinets, and pedals.

The “capture” port is another reminder of the way in which this kind of modern modeling gear is not just an updated version of old-school stomp boxes. The Quad Cortex mini does let you plug in your guitar and rock out, sure, but it also performs and processes hardware captures (both on the device and—for more sophisticated modeling—in the cloud) and can operate as a 16-channel USB-C audio interface to your computer. And though it’s largely designed for guitars and basses, you can use it on anything. The unit even has a few voice presets, which sound pretty wild with some of the real-time pitch-shifting and reverb effects.

While you can model your own gear collection with the Quad Cortex mini, the device itself comes with more than 90 amp models, more than 100 effects, and over 1,000 cabinet impulse responses. It can also run versions of the company’s desktop plugins (assuming you’ve purchased them already). It also comes with “over 2,000 high-quality factory Neural Captures” of other gear—these are static captures—and it can connect to the free “Cortex Cloud” service to download even more, including those uploaded by other users.

In other words: This one box holds digital representations of several hundred thousand dollars of gear. And given that you can mix and match cabs, captures, amps, and effects in wildly complicated chains that can even split and merge… the possibilities are functionally limitless.

Whether that excites or paralyzes you may depend on your own psychology, but it’s quite a change from how Neural DSP has approached its plugin offerings. Neural has generally offered curated (read: limited) collections of amps, cabs, and effects bundled into plugins that represent the tone of, say, John Mayer. You might get 3 amps, a few cabinets recorded with various mics, a few pedals, and an EQ, reverb, and delay, all in a gorgeous interface with some great presets.

But boxes like Quad Cortex mini take a “more is more” approach, with unlimited gear-mixing potential, captures, and storage for thousands of presets. Curation? Bah, who needs it? Here’s everything!

Rectangular

This much gear also means that “gorgeous bespoke interface graphics” are out the window; you will get no pictures of sexy amps sitting in sexy studios with sexy lighting, as you do in the company’s gorgeous plugins. Instead, you will get flat rectangles. So many flat rectangles.

CorOS is one of those places where skeuomorphism goes to die. The Quad Cortex mini interface is extremely “functional”—I am trying to avoid more negative terms, because it has a certain “alpha phase before we put the final art in” charm—and is based entirely around grids of flat rectangles.

The main screen is called, in fact, “the grid.” It shows your current effect chain as a series of small squares, each filled with often impenetrable line art. (A disturbing number of these are some variation on a squiggly line. Fortunately, they are color coded by effect type.)

Each square represents a different effects processor, and you can have four lines of eight effect squares each. That might sound like a lot (and it is), but the processors can be distributed across the grid in creative ways.

Preset 47B, for instance, is called “Annoying Flute,” and it makes use of all four grid lines by running the input signal through a VCA compressor, a gate, an octave pitch shifter, an envelope filter, an EQ, the “Neural Capture” of an amp called “Custom 3SE 2,” and then a “112 US DLX Black C12K 00s (M)” speaker cabinet. (The names of these things are often hard to read at a glance, especially when picking from a list of a hundred items.)

This accounts for only “line 1” of the grid. In the case of Annoying Flute, the signal chain branches right after the speaker cabinet. Half of it continues on to line 3 of the grid, while the other half is routed down to line 2, where it passes through a pair of tape delays before also heading off to line 3. Line 3 receives this re-combined signal and splits it again, this time passing half of it through a poly octaver and another digital delay on line 4 before everything runs through a modulated reverb on line 3 and then onwards to the outputs.

Does this sort of craziness sound good? Well, it sounds better than anything featuring three delays, two pitch shifters, and the name “Annoying Flute” has any right to! But I bring this example up to illustrate the creative routing and effects decisions that the grid makes possible.

And things get even crazier when you use the built-in looper, trigger analog send/return effects, and set up your effects chain with other units meant to be switched on and off during a song.

So much for assigning effects rectangles to the rectangular grid. How to control all of these virtual gadgets? When you tap on any effects unit, up pops an overlay containing (you guessed it) lots of rectangles.

Every controllable parameter gets a rectangle, which is usually filled with a dial or a switch. You can change the values of these dials and switches by touching the screen or by twisting the lower-right rotary footswitch.

Sometimes there are multiple pages of such parameters; the blossom reverb, for instance, has two pages of options and lets you control everything from ducking to pre-delay to modulation to the length of the early reflections. Configuring an entire audio chain from scratch can therefore take a while if you’re a detail freak.

Gig Mode. Yup, it’s rectangles! Credit: Nate Anderson

When you have your grid setup exactly how you like it—or you’ve customized one of the many built-in presets—you can save your own custom presets and organize them in all sorts of performance-oriented ways.

There’s PRESET mode, which lets you stomp each of the four footswitches to select a completely different preset.

There’s SCENE mode, which lets you use the footswitches to instead choose different parameter sets within the same preset—such as adding a hall reverb, upping the amp gain, and boosting the delay mix level when you come to your big solo.

Then there’s STOMP mode, which operates most like a traditional pedalboard; you step on the various footswitches to turn different effects units in the preset on or off completely.

Finally, there are hybrid modes, which make things even more complex (and can probably be ignored by many users).

To make all this a little easier to grok, there’s something called “Gig View,” which is unintuitively accessed by swiping up from the bottom of the screen. (There is no visual clue that this mode exists or that this is how you access it.) Gig View is essentially four flat—and extremely large—rectangles that take over the entire screen. They show you at a glance what each footswitch will do given the current mode setting.

Creating presets, assigning scenes, and setting up the STOMP mode and Gig View settings can quickly get intricate—even downright confusing (multiple items can sometimes be mapped to the same switch, for instance). I confess that the thought of doing all this through tapping the good-but-not-instantly-reactive touchscreen brought me to despair, until I realized that Neural has built an entire (free) desktop app for Mac and Windows called Cortex Control. Plug in your device over USB and suddenly you can use a nice and very responsive desktop app to do the donkey work of creating and organizing scenes and presets and settings.

I hate downloading stupid one-off apps that clutter up my computer and appear to provide more value to the company making them than they do to me—a serious problem in the current audio engineering world—but Cortex Control is genuinely useful. Indeed, if you’re going to be more than a presets player, I’d call it essential unless you have far more patience than I do. Which you might!

Stomp it

All of this rectangle talk reminds me that the interface largely… works. It may not be gorgeous, but the job gets done, and the desktop app makes the grunt work easier. But I still found the Quad Cortex mini somewhat confusing to navigate after a couple of weeks of intermittent use (though no doubt it gets easier with time).

The device has so many ways of doing things that it can be hard to remember what is needed in each situation. For instance, to make a change, you might use the rotary encoders. You might tap. You might long-tap with different results. You might swipe, drag, or toggle. You might use the footswitches—but results there might vary by mode. Even then, you might need to tap two footswitches at once, while at other times you only need to step on one. And sometimes you need to “long-press” (long-stomp?) two footswitches at once to get the desired result.

Making things worse, numerous items—sometimes quite important items like the Gig View—are not visible or even discoverable.

For instance, the key settings panel that lets you control all the various inputs and outputs on the device does not appear to be accessible from within the overall “settings” menu or anywhere else. Instead, you have to swipe down from the top of the grid screen—again, with no indication that this is where that information lives.

(You have to read the manual to figure out some of these things, which is fine, but the manual also has big gaps, such as not describing what any of the gear actually does nor what any of the settings mean nor how they might be used. For the actual “audio engineering” aspect of the Quad Cortex mini, you’re on your own.)

Something as simple as moving between presets can also be more hassle than you’d expect. Because the Quad Cortex mini only has four footswitches, you can only access four presets at once with a direct stomp. Switching to anything else from the main grid while in PRESET mode appears to require—unless I am missing some obvious shortcut—that you:

  • “Long-stomp” the right two footswitches, after which the preset name starts blinking.
  • At this point, you can tap the left two or the right two footswitches together to move up or down through four-item “banks” of presets.
  • But within each bank, you can only see that bank’s four different presets by tapping on each of the various footswitches.
  • To exit blinking mode and actually select that preset, you need to press its corresponding footswitch again.

This feels like a lot of hassle when you just want to whip through some presets! (Gig View is marginally easier because it at least displays the four presets in each bank at once. Making this whole process more confusing is that it differs depending on which mode you are in.)

While the processing power and options on offer here are incredible, I do think interface navigation and the modes assignment system could benefit from a rethink and simplification.

The Cortex Control desktop app.

The sound

These quirks can be dealt with, and time (plus the Cortex Control app) should make them easier to manage. The more important question is: How does the Quad Cortex mini sound?

Neural DSP has been one of the leaders in the field of amp and effects modeling for some years now, and it shows. There’s no possible way I could compare all of the models to the original hardware, and I’m not actually interested in doing so. The question for me is simply whether the models sound good when jamming solo or when placed into a mix. On both counts, the answer is a definite yes. This is just a remarkable set of tones to have on hand.

(People as diverse as Dave Mustaine and John Mayer appear to agree, at least for a live rig.)

Once you get over its navigation, playing with this thing is like being a kid in a proverbial candy shop. (Though I, too, love candy shops!) Almost every amp you can imagine is a tap away, and they sound wonderful—though do be aware that what you are getting here is the sound of a recorded amp through a mic and not necessarily an “amp in the room with you.”

Nearly every time I booted it up to test something new, I lost myself in the sound and played far longer than I had intended.

Neural has published a massive and quite helpful list of all the gear on offer here. Bogner Shiva? Marshall? Mesa Boogie? Matchless? Soldano? Vox? Fender? Hiwatt? Amps from all these companies are included. Need a bass amp? There are 13 of those, too. What about a bass overdrive? You get five. A general reverb? How about 17? You get the idea.

You can loop, filter, distort, EQ, delay, and compress to your heart’s content, though there seems to be a bit more emphasis here on rock and metal styles (which Neural DSP is most known for) than on other offerings. Still, there’s enough variety to offer great tools for funk, blues, jazz, and country players. You can even add in a version of the monosynth found in the company’s Rabea plugin.

To illustrate some of the sounds on offer, I wrote a little song about a dirtbag billionaire who makes rockets, gets chased off the Earth by angry locals, and ends up crashing his ship into the Moon out of despair. It’s called “Master of the Universe.”

More to the point, it features 10-plus electric guitar tracks recorded through the Quad Cortex mini using shimmer reverb, the poly octaver, and various crunchy rhythm and lead sounds. (I avoided the metal tones so common in Neural DSP demos.) Bass guitar was likewise recorded through one of the mini’s bass presets.

(For those new to audio production and curious about the other sounds in the track, the drums are the Abbey Road 70s kit, while the rocket-sounding “riser” comes from the Rise and Hit collection, both from Native Instruments. The piano is the recently upgraded “studio piano” that comes in Logic Pro and now sounds surprisingly good! There’s also a Hammond organ emulation and a Rhodes piano emulation from Universal Audio buried in the mix. The double-tracked acoustic guitars during two of the choruses were recorded live in my home studio with a single condenser mic. For room ambience throughout, but especially on the drums, I used Universal Audio’s excellent Sound City Studios plugin.)

I’ve generally found Neural’s plugin tones to be pretty “mix-ready,” and that’s true here as well. Though I often needed to roll off some low end or make an occasional EQ boost or add a bit of reverb to blend the guitars spatially with the drum ambience, little else was required but panning and fader moves.

Frankly, there are probably too many parts in the song, but the Quad Cortex mini was just such a playground of sounds that I kept finding new little bits I wanted to work in. Just be grateful that I talked myself out of using all of the insane pitch-shift effects on my vocal for “special” moments.

“Master of the Universe,” my demo song showing some of what the Quad Cortex mini can do.

Captured

When it comes to recording, you don’t have to worry about wiring this thing up to your audio interface; just connect it to your computer with a USB-C cable, and it becomes a 24-bit, 48 kHz interface. (On Macs, this is class compliant and needs no driver; it even works with iOS devices. Neural makes the necessary driver for Windows.)

The Quad Cortex mini shows up with a host of inputs, making it simple to record, say, both a dry electric guitar track and a heavily effected one at the same time. If you change your mind about the sound later, you can always “re-amp” the dry signal by routing it back out to the device and recording it with different settings. You can even track mics through this thing, thanks to an XLR input and (for condenser mics) support for phantom power.

The Quad Cortex mini can also make its own captures of gear you either own or happen across. This can happen in two ways: 1) on the device or 2) in the cloud.

The device-based system, which the company calls “Neural Capture Version 1,” requires you to hook up your gear to both an output (to play the system’s test tones) and an input on the mini. (Note: Do not, under ANY circumstances, connect the actual speaker outputs from a tube amp directly to the mini. The power level is far too high.)

Various known sounds are then played through this loop, and the mini’s software analyzes the differences between the sound it sent and the sound it received. The machine-learning algorithms for this run locally on the device. Neural says that the Capture 1 system can handle overdrive pedals, amps, and cabs.

The newer system, called Neural Capture Version 2, is “an advanced evolution of Neural Capture trained via Cortex Cloud,” says the company. “This option provides even higher-resolution Captures, making it especially powerful for touch-sensitive devices like fuzzes, compressors, and certain styles of amps.” Capture 2 is said to be capable of modeling “subtle behaviors like volume-knob cleanup, amp sag and bloom, fast transients, and blend controls.”

As the name suggests, the more powerful algorithms behind this system require cloud-based servers instead of the local device. Users are allowed to run 40 Neural Capture 2 sessions per day, and each takes around 10 minutes.

The resulting captures, along with any presets you want to share, can be uploaded to Neural’s cloud-based sharing system. Once you log in, any captures or presets you choose to download from the site will automatically show up in your Quad Cortex mini.

Look for a follow-up article on what the actual process of making a capture is like; it’s similar across many different modeling devices these days, though the sound of the resulting models can vary by company.

The Cortex Cloud website.

Options

The Quad Cortex mini is a powerful tone platform that is both versatile and expandable. It’s good for solo jamming at home without needing to 1) buy amps, cabs, and effects and 2) crank them to ruinous volume levels. It’s good for playing live, once you have configured its fairly deep control system in a way that works for your particular songs. And it’s good for recording, letting you fiddle with endless gear combinations without running a single patch cable or digging up a 9V battery.

At $1,400, though, it’s bad for your wallet. Whether it’s worth the cost depends on your use case. If you don’t need a screen and are happy with fewer ports and options, you might consider Neural DSP’s smaller and cheaper Nano Cortex ($570) or other devices like the Tonex pedals from IK Multimedia. On the other hand, if you want a larger unit with more footswitches, you can plonk down an extra $400 for the full-fat Quad Cortex or look into various options from Fractal, Kemper, Line 6, etc.

One way of thinking about the financial calculus here would be to try out the device (or listen online) and see how well the sound works for you. Some amp purists believe that nothing beats the sound of real tubes and real speakers in a real room, cost and weight and volume be damned. Many others can’t hear a difference between the models and the originals.

If you’re in the former group, these kinds of devices are unlikely to fully satisfy you, at least when it comes to gigging and recording. So you might decide whether they are “worth it” based solely on their value as easy, light, and quiet practice platforms.

If you can’t tell (or don’t care about) the difference between the models and the real hardware, then these modeling sims start to look like a far better value. When individual amps can go for $1,500 to $2,000 or more, a massive gear collection like the one in the Quad Cortex mini is practically saving you money. You’d be a fool not to buy! (To paraphrase an explanation my son once gave me for a purchase he wanted to make.)

But even those in this group may not need an actual hardware pedal unless they really enjoy practicing without needing to use their regular computer—or unless they gig regularly. If you’re simply a recording guitarist who tends to work “in the box,” you might just pick up some cheaper Neural DSP plugins instead. Or you can buy a more comprehensive software suite like the new Paradise Guitar Studio from Universal Audio or one of the offerings from PolychromeDSP—all of which sound excellent.

If you’re content with software but want a free alternative, take a look at NAM, the Neural Amp Modeler. It’s open source modeling tech that also offers a community tone-sharing website and has been racking up lots of great reviews for its sound quality. (Though note that most of the NAM models are static captures; they sound great but represent only that exact setup and knob positioning, though the developers are working on more complex, adjustable models.)

All types of users can probably admit, though, that hardware and software modeling tech has made this a great time to be a guitar or bass player. Even if you don’t want to use them on a record, just being able to play around with and get to know this much gear with this much accuracy is a huge win for the home hobbyist and small-time gigging musician, who would otherwise never even set eyes on most of this stuff.

The key thing is just to get whatever works for you… and then to go forth and rock.


us-blindsides-states-with-surprise-settlement-in-live-nation/ticketmaster-trial

US blindsides states with surprise settlement in Live Nation/Ticketmaster trial

State attorneys general were “kept in the dark and excluded materially from settlement discussions” while they prepared for trial, the filing said. On March 5, the states were “notified of the near-final terms of the settlement at 4 P.M.” and given one day to determine whether to accept or reject them, the filing said.

States to take over lead role at trial

The US was taking the lead role in the case before the settlement was announced. In addition to seeking a mistrial, the states asked the court to stay the proceedings to give them time “to fully prepare to assume the lead role at trial and explore settlement.”

The states “have had no opportunity to obtain and reallocate the resources necessary to try the case on their own or to meaningfully discuss the settlement with Defendants and attempt to negotiate the terms,” the filing said. “Moreover, despite the primary role that DOJ has played before the jury, the United States (and several additional individual Plaintiff States) will now vanish from the trial… Due to the substantial prejudice caused by this settlement and DOJ’s abrupt exit after taking the lead role up to and during the first week of trial, a mistrial is warranted.”

New York took the lead role in the states’ filing today. “The settlement recently announced with the US Department of Justice fails to address the monopoly at the center of this case, and would benefit Live Nation at the expense of consumers. We cannot agree to it,” New York Attorney General Letitia James said today. “My attorney general colleagues and I have a strong case against Live Nation, and we will continue our lawsuit to protect consumers and restore fair competition to the live entertainment industry.”

Most of the states that backed the filing have Democratic attorneys general, but the group is bipartisan, with Republican attorneys general from Kansas, New Hampshire, Ohio, Pennsylvania, Tennessee, Utah, and Wyoming.

Other states involved in the lawsuit either decided to join the US settlement or have not yet taken a position. States agreeing to the settlement are Arkansas, Iowa, Mississippi, Nebraska, Oklahoma, South Carolina, and South Dakota, the filing said. The other states involved in the lawsuit are Florida, Indiana, Louisiana, Texas, and West Virginia.

This article was updated with a statement from Live Nation.
