Author name: Beth Washington


A biological 0-day? Threat-screening tools may miss AI-designed proteins.


Ordering DNA for AI-designed toxins doesn’t always raise red flags.

Designing variations of the complex, three-dimensional structures of proteins has been made a lot easier by AI tools.

On Thursday, a team of researchers led by Microsoft announced that they had discovered, and possibly patched, what they’re terming a biological zero-day—an unrecognized security hole in a system that protects us from biological threats. The system at risk screens purchases of DNA sequences to determine when someone’s ordering DNA that encodes a toxin or dangerous virus. But, the researchers argue, it has become increasingly vulnerable to missing a new threat: AI-designed toxins.

How big of a threat is this? To understand, you have to know a bit more about both existing biosurveillance programs and the capabilities of AI-designed proteins.

Catching the bad ones

Biological threats come in a variety of forms. Some are pathogens, such as viruses and bacteria. Others are protein-based toxins, like the ricin that was sent to the White House in 2003. Still others are chemical toxins that are produced through enzymatic reactions, like the molecules associated with red tide. All of them get their start through the same fundamental biological process: DNA is transcribed into RNA, which is then used to make proteins.

For several decades now, starting the process has been as easy as ordering the needed DNA sequence online from any of a number of companies, which will synthesize a requested sequence and ship it out. Recognizing the potential threat here, governments and industry have worked together to add a screening step to every order: the DNA sequence is scanned for its ability to encode parts of proteins or viruses considered threats. Any positives are then flagged for human intervention to evaluate whether they or the people ordering them truly represent a danger.

Both the list of proteins and the sophistication of the scanning have been continually updated in response to research progress over the years. For example, initial screening was done based on similarity to target DNA sequences. But there are many DNA sequences that can encode the same protein, so the screening algorithms have been adjusted accordingly, recognizing all the DNA variants that pose an identical threat.
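To see why the screening has to work at the protein level, consider a minimal sketch (illustrative only, using a tiny slice of the genetic code rather than any real screening logic): two different DNA orders can encode exactly the same peptide.

```python
# Minimal sketch: two distinct DNA sequences encode the same peptide,
# so screening must compare at the protein level, not the DNA level.
# (Illustrative only; real screening tools are far more sophisticated.)

CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K", "TAA": "*", "TAG": "*", "TGA": "*",
}

def translate(dna: str) -> str:
    """Translate a DNA sequence into a peptide, stopping at a stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

seq_a = "ATGGCTAAATAA"   # ATG-GCT-AAA-TAA
seq_b = "ATGGCGAAGTGA"   # ATG-GCG-AAG-TGA: different codons, same peptide

assert seq_a != seq_b
assert translate(seq_a) == translate(seq_b) == "MAK"
print(translate(seq_a))  # MAK -- identical protein from different DNA
```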

The new work can be thought of as an extension of that threat. Not only can multiple DNA sequences encode the same protein; multiple proteins can perform the same function. To form a toxin, for example, typically requires the protein to adopt the correct three-dimensional structure, which brings a handful of critical amino acids within the protein into close proximity. Outside of those critical amino acids, however, things can often be quite flexible. Some amino acids may not matter at all; other locations in the protein could work with any positively charged amino acid, or any hydrophobic one.

In the past, it could be extremely difficult (meaning time-consuming and expensive) to do the experiments that would tell you what sorts of changes a string of amino acids could tolerate while remaining functional. But the team behind the new analysis recognized that AI protein design tools have now gotten quite sophisticated and can predict when distantly related sequences can fold up into the same shape and catalyze the same reactions. The process is still error-prone, and you often have to test a dozen or more proposed proteins to get a working one, but it has produced some impressive successes.

So, the team developed a hypothesis to test: AI can take an existing toxin and design a protein with the same function that’s distantly related enough that the screening programs do not detect orders for the DNA that encodes it.

The zero-day treatment

The team started with a basic test: use AI tools to design variants of the toxin ricin, then test them against the software that is used to screen DNA orders. The results of the test suggested there was a risk of dangerous protein variants slipping past existing screening software, so the situation was treated like the equivalent of a zero-day vulnerability.

“Taking inspiration from established cybersecurity processes for addressing such situations, we contacted the relevant bodies regarding the potential vulnerability, including the International Gene Synthesis Consortium and trusted colleagues in the protein design community as well as leads in biosecurity at the US Office of Science and Technology Policy, US National Institute of Standards and Technologies, US Department of Homeland Security, and US Office of Pandemic Preparedness and Response,” the authors report. “Outside of those bodies, details were kept confidential until a more comprehensive study could be performed in pursuit of potential mitigations and for ‘patches’… to be developed and deployed.”

Details of that original test are being made available today as part of a much larger analysis that extends the approach to a large range of toxic proteins. Starting with 72 toxins, the researchers used three open source AI packages to generate a total of about 75,000 potential protein variants.

And this is where things get a little complicated. Many of the AI-designed protein variants are going to end up being non-functional, either subtly or catastrophically failing to fold up into the correct configuration to create an active toxin. The only way to know which ones work is to make the proteins and test them biologically; most AI protein design efforts will make actual proteins from dozens to hundreds of the most promising-looking potential designs to find a handful that are active. But doing that for 75,000 designs is completely unrealistic.

Instead, the researchers used two software-based tools to evaluate each of the 75,000 designs. One of these focuses on the similarity between the overall predicted physical structure of the proteins, and another looks at the predicted differences between the positions of individual amino acids. Either way, they’re a rough approximation of just how similar the proteins formed by two strings of amino acids should be. But they’re definitely not a clear indicator of whether those two proteins would be equally functional.
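The study's exact metrics aren't detailed here, but as a rough illustration of the second kind of comparison, a per-residue measure along the lines of a root-mean-square deviation (RMSD) over aligned backbone coordinates looks something like this (a hedged sketch; the coordinates are toy values and the alignment is assumed, not computed):

```python
# Illustrative sketch only: not the paper's actual pipeline. Per-residue
# structural comparison is often summarized with an RMSD over aligned
# alpha-carbon coordinates (assumed here to be pre-aligned).
import numpy as np

def rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    assert coords_a.shape == coords_b.shape
    return float(np.sqrt(np.mean(np.sum((coords_a - coords_b) ** 2, axis=1))))

# Toy example: a three-residue backbone and a slightly perturbed copy.
original = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
variant = original + np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.1]])

print(f"RMSD: {rmsd(original, variant):.2f} Å")  # small value -> very similar structures
```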

In any case, DNA sequences encoding all 75,000 designs were fed into the software that screens DNA orders for potential threats. One thing that was very clear is that there were huge variations in the ability of the four screening programs to flag these variant designs as threatening. Two of them seemed to do a pretty good job, one was mixed, and another let most of them through. Three of the software packages were updated in response to this performance, which significantly improved their ability to pick out variants.
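Conceptually, the benchmark boils down to a flag rate per screening tool over the variant set, before and after patching. The sketch below uses invented tool names and flag decisions purely to show the shape of the comparison; it is not the study's data.

```python
# Shape of the comparison only -- tool names and flag decisions are invented,
# not the study's data.

def flag_rate(flags: list[bool]) -> float:
    """Fraction of submitted variant designs that a screening tool flagged."""
    return sum(flags) / len(flags)

# Hypothetical flag decisions for five variant designs, before and after patching.
results = {
    "screener_A": {"before": [True, True, False, True, True],
                   "after":  [True, True, True, True, True]},
    "screener_B": {"before": [False, False, True, False, False],
                   "after":  [True, False, True, True, False]},
}

for tool, runs in results.items():
    print(f"{tool}: {flag_rate(runs['before']):.0%} flagged before patch, "
          f"{flag_rate(runs['after']):.0%} after")
```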

There was also a clear trend in all four screening packages: The closer the variant was to the original structurally, the more likely the package (both before and after the patches) was to be able to flag it as a threat. In all cases, there was also a cluster of variant designs that were unlikely to fold into a similar structure, and these generally weren’t flagged as threats.

What does this mean?

Again, it’s important to emphasize that this evaluation is based on predicted structures; “unlikely” to fold into a similar structure to the original toxin doesn’t mean these proteins will be inactive as toxins. Functional proteins are probably going to be very rare among this group, but there may be a handful in there. That handful is also probably rare enough that you would have to order up and test far too many designs to find one that works, making this an impractical threat vector.

At the same time, there are also a handful of proteins that are very similar to the toxin structurally and not flagged by the software. For the three patched versions of the software, the ones that slip through the screening represent about 1 to 3 percent of the total in the “very similar” category. That’s not great, but it’s probably good enough that any group that tries to order up a toxin by this method would attract attention because they’d have to order over 50 just to have a good chance of finding one that slipped through, which would raise all sorts of red flags.
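As a back-of-the-envelope check on that ‘over 50’ figure (my arithmetic, not the paper's, assuming a roughly 2 percent slip-through rate in the ‘very similar’ category):

```python
# Back-of-the-envelope version of the "over 50 orders" point. Assumes a ~2%
# slip-through rate for the patched screeners (the midpoint of the 1-3% range
# quoted above); purely illustrative.

slip_rate = 0.02
expected_orders = 1 / slip_rate              # ~50 orders per unflagged design
p_at_least_one = 1 - (1 - slip_rate) ** 50   # chance that 50 orders yield >= 1 miss

print(f"Expected orders per unflagged design: {expected_orders:.0f}")
print(f"P(at least one of 50 orders slips through): {p_at_least_one:.0%}")  # ~64%
```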

One other notable result is that the designs that weren’t flagged were mostly variants of just a handful of toxin proteins. So this is less of a general problem with the screening software and might be more of a small set of focused problems. Of note, one of the proteins that produced a lot of unflagged variants isn’t toxic itself; instead, it’s a co-factor necessary for the actual toxin to do its thing. As such, some of the screening software packages didn’t even flag the original protein as dangerous, much less any of its variants. (For these reasons, the company that makes one of the better-performing software packages decided the threat here wasn’t significant enough to merit a security patch.)

So, on its own, this work doesn’t seem to have identified something that’s a major threat at the moment. But it’s probably useful, in that it’s a good thing to get the people who engineer the screening software to start thinking about emerging threats.

That’s because, as the people behind this work note, AI protein design is still in its early stages, and we’re likely to see considerable improvements. And there’s likely to be a limit to the sorts of things we can screen for. We’re already at the point where AI protein design tools can be used to create proteins that have entirely novel functions and do so without starting with variants of existing proteins. In other words, we can design proteins that are impossible to screen for based on similarity to known threats, because they don’t look at all like anything we know is dangerous.

Protein-based toxins would be very difficult to design, because they have to cross the cell membrane and then do something dangerous once inside. While AI tools are probably unable to design something that sophisticated at the moment, I would be hesitant to rule out the prospect of them eventually reaching that sort of sophistication.

Science, 2025. DOI: 10.1126/science.adu8578


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.



Sora and The Big Bright Screen Slop Machine

OpenAI gave us two very different Sora releases. Here is the official announcement.

The part where they gave us a new and improved video generator? Great, love it.

The part where they gave us a new social network dedicated purely to short form AI videos? Not great, Bob. Don’t be evil.

OpenAI is claiming they are making their social network with an endless scroll of 10-second AI videos the Actively Good, pro-human version of The Big Bright Screen Slop Machine, that helps you achieve your goals and can be easily customized and favors connection and so on. I am deeply skeptical.

They also took a bold copyright stance, with that stance being, well, not quite ‘f*** you,’ but kind of close? You are welcome to start flagging individual videos. Or you can complain to them more generally about your characters and they say they can ‘work with you,’ which they clearly do in some cases, but the details are unclear.

It’s a bold strategy, Cotton. Let’s see if it pays off for ’em.

As opposed to their deepfake rule for public figures, which is a highly reasonable opt-in rule where they need to give their permission.

Thus, a post in three acts.

You can access Sora either at Sora.com or via the Sora iPhone app. You need an invite to be part of the new social network, because that’s how the cool kids get you excited for a new app these days.

I am going mostly off reports of others but was also able to get access.

It’s a good video generation model, sir. Quite excellent, even. I’m very impressed.

It is not yet in the API, but it will be soon.

As always, all official examples you see will be heavily cherry picked. Assume these are, within the context in question, the best Sora 2 can do.

Those you see on social media from other sources, or on the Sora app, are still heavily selected in various ways. Usually they are the coolest, funniest and best creations, but also they are the creations that most blatantly violate copyright, or are the most violent or sexual or otherwise might be unwise to create, or simply have the most hilarious fails. They’re not a representative sample.

When I tried creating a few things, it was still impressive, but as you’d expect it doesn’t nail the whole thing reliably or anything like that, and it doesn’t always fully follow instructions. I also got a content violation for ‘a time lapse of a Dyson sphere being built set to uplifting music.’

It is also easy to underappreciate how much progress is being made, or that previously difficult problems are being solved, because once the problems are gone, both their absence and the previous failures become invisible.

Netflix and Meta stock were each down a few percent, presumably on the news.

Sora 2 claims to be a big update on several fronts simultaneously:

  1. Able to handle movements impossible for previous models.

  2. Much better adherence to the laws of physics.

  3. In particular, no longer ‘overoptimistic,’ meaning it won’t break physics to make the desired outcome happen, instead events will unfold naturally.

  4. Strong controllability, with the ability to follow intricate instructions.

  5. Excels at styles, including realistic, cinematic, claymation and anime.

  6. Creates sophisticated soundscapes and speech to match the video.

  7. Insert any person, animal or object into any video.

  8. You can, in particular, upload yourself via your camera.

That’s all very cool. The sample videos show impressive command of physics.

Gabriel Peterss offers this demonstration video.

Gallabytes gets his horse riding an astronaut.

Based on reactions so far, they seem to be delivering on all of it.

They are also ‘responsibly’ launching a new social iOS app called Sora where you can share all these generated videos. Oh no. Hold that thought until Act 3.

The talk is even bigger in terms of the predicted impact and reception.

Sam Altman: Excited to launch Sora 2! Video models have come a long way; this is a tremendous research achievement.

Sora is also the most fun I’ve had with a new product in a long time. The iOS app is available in the App Store in the US and Canada; we will expand quickly.

I did not expect such fun dynamics to emerge from being able to “put yourself and your friends in videos” but I encourage you to check it out!

ChatGPT Pro subscribers can generate with Sora 2 Pro.

This feels to many of us like the “ChatGPT for creativity” moment, and it feels fun and new. There is something great about making it really easy and fast to go from idea to result, and the new social dynamics that emerge.

Creativity could be about to go through a Cambrian explosion, and along with it, the quality of art and entertainment can drastically increase. Even in the very early days of playing with Sora, it’s been striking to many of us how open the playing field suddenly feels.

In particular, the ability to put yourself and your friends into a video—the team worked very hard on character consistency—with the cameo feature is something we have really enjoyed during testing, and is to many of us a surprisingly compelling new way to connect.

I would take the other side of that bet. I am not here for it.

But hold that thought.

The physics engine of Sora 2 is remarkably good. As Sora head Bill Peebles points out here in a fun video, often what happens is the internal agent messes up but the laws of physics hold.

It does still fail in ways that can look embarrassing. For example, here we have it successfully having a ball respond to gravity when the ball is white, then having it all go wrong when the ball is red. Teortaxes attributes the slow motion to an inability to otherwise model physics properly in many cases.

So no, this is not a full perfect physics engine, but that is now the standard by which we are judging video generation. The horse can talk, except it needs to fix its accent.

Solo: I asked Sora 2 to create a 90s Toy Ad of Epstein’s Island.

Sora does a good job creating exactly what you would want a video generation tool to do here. It’s speech, people should be able to say and make things.

OpenAI is currently doing a good job, by all reports, of not allowing images of real people in its videos without explicit consent, so they are if anything being overly cautious about avoiding deepfake problems. Some rivals will presumably be far less scrupulous here. There are big copyright and related issues around creation of derivative works of fiction, but that’s a very different problem.

I continue to be relatively unworried about deepfakes and the use of AI video and images as propaganda. My continued model here, as I’ve said several times, is that misinformation is primarily a demand-side problem, not a supply-side problem. The people yearn for misinformation, to hold up signs that mostly say hurray for our side, in whatever format.

However, it is worth a ponder of the bear case. The tools are rapidly improving. Both image and video generation models are much better than they were in 2024 prior to the election.

Steven Adler: I think the world probably declared victory too early on “AI’s election impacts were overblown”

Notably, GPT-4o image wildness wasn’t released until after the 2024 election, and once released was promptly used for propaganda.

The better bear case, the one I do worry about, is how AI video will create doubt about real video, giving carte blanche to people to call anything ‘fake news.’

I notice this starting in my own head already when scrolling Twitter. Post Sora, my instinct when I see a video is no longer to presume it is ‘real’ because there is a good chance it isn’t, until I see enough to be confident either way.

Similarly, Gary Marcus warns us again of Slopocalypse Now, or the ‘Imminent Enshittification of the Internet’ as AI versions overwhelm other content everywhere. Making the slop ‘better’ makes this problem worse. Mostly I remain an optimist that at least the wise among us can handle it where it matters most, but it will require constant vigilance.

So mostly this part is a straightforward congratulations to the team, great job everyone, I don’t have access yet but it sure looks like you did the thing.

On to Act 2.

Remember when we used to talk about whether image models or video models were training on copyrighted data, and whether that was going to land them in hot water?

You’d see (for example) Gary Marcus create an image of Mario, and then smugly say ‘hey look this was trained on Super Mario Brothers!’ as if there was any actual doubt it had been trained on Super Mario Brothers, and no one was really denying this but they weren’t technically admitting it either.

Thus, as recently as September 19 we had The Washington Post feeling it necessary to do an extensive investigation to show Sora was trained on movies and shows and video games, whereas now Neil Turkewitz says ‘hey Paramount Plus, they trained on your data!’ and yeah, well, no s***.

We are very much past that point. They trained on everything. Everyone trains on everything. That’s not me knowing anything or having official confirmation. That’s me observing what the models can do.

Sora 2 will outright replicate videos and use whatever characters you’d like, and do it pretty well, even for relatively obscure things.

Pliny the Liberator: This is legitimately mind-blowing…

How the FUCK does Sora 2 have such a perfect memory of this Cyberpunk side mission that it knows the map location, biome/terrain, vehicle design, voices, and even the name of the gang you’re fighting for, all without being prompted for any of those specifics??

Sora basically got two details wrong, which is that the Basilisk tank doesn’t have wheels (it hovers) and Panam is inside the tank rather than on the turret. I suppose there’s a fair amount of video tutorials for this mission scattered around the internet, but still––it’s a SIDE mission!

the full prompt for this was: “generate gameplay of Cyberpunk 2077 with the Basilisk Tank and Panam.”

This is actually a rather famous side mission, at least as these things go. Still.

Max Woolf: Getting annoyed at the QTs on this: the mind-blowing part isn’t the fact that it’s trained on YouTube data (of which the poster is very well aware), the mind-blowing part is that it achieved that level of recall with a very simple prompt which is very very unusual.

Everyone already assumed that Sora was trained on YouTube, but “generate gameplay of Cyberpunk 2077 with the Basilisk Tank and Panam” would have generated incoherent slop in most other image/video models, not verbatim gameplay footage that is consistent.


I’m totally fine with that part. The law plausibly is fine with it as well, in terms of the training, although I am curious how the Anthropic settlement and ruling translates to a video setting.

For books, the law seems to be that you need to own a copy, but then training is fair game although extensive regurgitation of the text is not.

How does that translate to video? I don’t know. One could argue that this requires OpenAI to own a copy of any and all training data, which in some cases is not a thing that OpenAI can get to own. It could get tricky.

Trickier is the constant creation of derivative works, which Sora is very, very good at.

One of the coolest things about all the copyright infringement is that Sora consistently nails not only the images but also the voices of all the characters ever.

Behold Saving Private Pikachu, The Dark Pokemon Knight, Godfather Pikachu, Titanic Pikachu, and so on.

Cartman calls a Waymo, yes Eric’s third eye is an annoying error in that video although it doesn’t appear in the others. Yes I agree with Colin Fraser that it ‘looks like s***’ but (apart from the third eye that wouldn’t be there on most rerolls) only because it looks and sounds exactly like actual South Park. The biggest issue is that Kenny’s voice in the second clip is insufficiently garbled. Here’s one of them playing League of Legends and a longer clip of them being drafted to go off to war.

I don’t know how well you can specify and script the clips, but it’s entirely plausible you could produce a real South Park or other episode with this, potentially faster and cheaper than they currently do it.

Peter Griffin remembers his trip to Washington on January 6.

Lord of the Rings as a woke film or a homoerotic polycule.

Is this all great fun? Absolutely, yes, assuming those involved have taste.

Do I wish all of this was allowed and fine across basically all media and characters and styles, and for everyone to just be cool, man, so long as we don’t cross the line into non-parody commercial products? I mean, yeah, that would be ideal.

Is it how the law works? Um, I don’t think so?

OpenAI claims it can not only use your video and other data to train on, it can also generate video works that include your content, characters and other intellectual property.

The headline says ‘unless you opt out’ but it is not obvious how you do that. There seems to be some way that rights holders can have them block particular characters, in general, but there is no clear, automatic way to do that. Otherwise, your ‘opt out’ looks like it is individually alerting them to videos. One at a time.

Jason Kint: My interpretation for you: OpenAI will now break the law by default in video, too, and make it as hard as possible to stop it. “OpenAI doesn’t plan to accept a blanket opt-out across all of an artist or studio’s work, the people familiar with the new Sora tool said.”

Ed Newton-Rex: Yup – OpenAI is trying to shift the Overton window

They are losing the public debate on training being fair use, so they are going even more extreme to try to shift what people consider normal.

Reid Southen: This is not how copyright works, it’s not how copyright has ever worked.

In what world is it okay to say, “I’m going to use this unless you tell me not to.”

THAT’S WHAT THE COPYRIGHT IS FOR.

GPT-5 Pro tries to say that opt-out, if respected, is not per se illegal, but its heart wasn’t in it. The justification for this seemed to be clearly grasping at straws and it still expects lawsuits to succeed if infringing outputs are being produced and there isn’t aggressive filtering against them. Then I pointed out that OpenAI wasn’t even going to respect blanket opt-out requests, and its legal expectations got pretty grim.

So in short, unless either I’m missing quite a lot or they’re very responsive to ‘please block this giant list of all of the characters we own’: Of course, you realize this means war.

Responses to my asking ‘how does this not mean war?’ were suggesting this was a bet on blitzkrieg, that by the time Hollywood can win a lawsuit OpenAI can render the whole thing moot, and fully pull an Uber. Or that rights holders might tolerate short-form fan creations (except that they can’t without risking their copyrights, not when it is this in everyone’s face, so they won’t, also the clips can be strung together).

Or perhaps that this is merely an opening bid?

Nelag: think this might be best understood as the opening bid in a negotiation, not meant to be accepted.

Imagine YouTube had spent a lot of its resources early on taking down copyrighted material, before anyone demanded it (they mostly didn’t at first). They would have presumably gotten sued anyway. Would the ultimate outcome have been as good for them? Or would courts and content owners have gone “sure, we’re obviously entitled to what you were doing, as you effectively admitted by doing it, and also to a whole bunch more.”

I buy that argument for a startup like OG YouTube, but this ain’t no startup.

I don’t think that is how this works, and I would worry seriously about turning the public against OpenAI or AI in general in the process, but presumably OpenAI had some highly paid people who gamed this out?

Keach Hagey, Berber Jin and Ben Fritz (WSJ): OpenAI is planning to release a new version of its Sora video generator that creates videos featuring copyright material unless copyright holders opt out of having their work appear, according to people familiar with the matter.

The opt-out process for the new version of Sora means that movie studios and other intellectual property owners would have to explicitly ask OpenAI not to include their copyright material in videos the tool creates.

You don’t… actually get to put the burden there, even if the opt-out is functional?

Like, you can’t say ‘oh it’s on you to tell me not to violate your particular copyright’ and then if someone hasn’t notified you then you get to make derivative works until they tell you to stop? That is not my understanding of how copyright works?

It certainly doesn’t work the way OpenAI is saying they intend to have it work.

They also seem to have fully admitted intent.

It’s weird that they’re not even affirming that they’ll honor all character opt-outs?

OpenAI doesn’t plan to accept a blanket opt-out across all of an artist or studio’s work, the people familiar with the new Sora tool said. Instead, it sent some talent agencies a link to report violations that they or their clients discover.

“If there are folks that do not want to be part of this ecosystem, we can work with them,” Varun Shetty, VP of media partnerships at OpenAI, said of guardrails the company built into its image generation tool.

Well, what if they don’t want to be part of the ecosystem? Many creatives and IP holders do not want to be ‘worked with.’ Nor is it at all reasonable to ask rights holders to monitor for individual videos and then notify on them one by one, unless a given holder wants to go that route (and is comfortable with the legal implications of doing so on their end).

This seems like an Uber-style, ‘flagrantly violate black letter law and double dare you to do anything about it’ style play, or perhaps a ‘this is 2025 there are no laws’ play, where they decide how they think this should work.

To be fair, there are some other ‘players in this space’ that are Going Full Uber, as in they have no restrictions whatsoever, including on public figures. They’re simply 100% breaking the law and daring you to do anything about it. Many image generators definitely do this.

For example, Runway Gen-3 doesn’t seem to block anything, and Hailuo AI actively uses copyrighted characters in their own marketing, which is presumably why they are being sued by Disney, Universal and Warner Brothers.

There are also those who clearly do attempt to block copyrighted material proactively, such as Google’s Veo 3 (the previous SotA), which also blocks ‘memorized content’ and offers indemnification to users.

OpenAI is at least drawing a line at all, and (again, if and only if you can reliably get them to do reasonable blocking upon private request) it wouldn’t be a totally crazy way for things to work, the same way it is good that you can hail Ubers.

So, how are they going to get away with it, and what about those meddling kids? As in, they’re kind of declaring war on basically all creators of cultural content?

First, at least they’re not making that mistake with individual public figures.

While copyrighted characters will require an opt-out, the new product won’t generate images of recognizable public figures without their permission, people familiar with OpenAI’s thinking said.

Second, there’s the claim that training is fair use, and, okay, sure.

Disney and Comcast’s Universal sued AI company Midjourney in June for allegedly stealing their copyright work to train its AI image generator. Midjourney has responded in court filings that training on copyrighted content is fair use.

I presume that if it’s only about training data Disney and Comcast probably lose.

If it’s about some of the outputs the model is willing to give you? That’s not as clear. What isn’t fair use is outputting copyrighted material, or creating derivative works, and MidJourney seems to be Going Full Uber on that front too.

It’s one thing to create art ‘in the style of’ Studio Ghibli, which seems to have been clearly very good for Studio Ghibli even if they hate it.

It’s another thing to create the actual characters straight up, whether in images or video, or to tell a rights holder it can complain when it sees videos of its characters. Individually. Video by video. And maybe we’ll take those individual videos down.

Over at OpenAI this at minimum doesn’t apply to Disney, who has clearly already successfully opted out. OpenAI isn’t that suicidal and wisely did not poke the mouse. A bunch of other major stuff is also already blocked, although a bunch of other iconic stuff isn’t.

I asked Twitter how the filters were working. For now it looks like some targets are off-limits (or at least they are attempting to stop you) and this goes beyond only Disney, but many others are fair game.

Nomads and Vagabonds: It seems to work pretty well. Blatant attempts are blocked pre-generation and more “jail break” style prompts will run but get caught in a post generation review.

Disney is the most strict but most big studio content seems to be protected, similar to GPT image generations. Smaller IP is hit and miss but still playing around with it. It is not permissive like Midjourney or Chinese models though.

Jim Carey in Eternal Sunshine. Smaller films and indie video games are mostly free game.

Also, I tried image prompts and it will run but then block before showing the content.

Not that unsafe either.

I mean, I presume they don’t want anyone generating this video, but it’s fine.

Sree Kotay: I actually DON’T want to see the prompt for this.

Pliny the Liberator: That’s fair.

He did later issue his traditional ‘jailbreak alert’… I guess? Technically?

If that’s approximately the most NSFW these videos can get, then that seems fine.

Indeed, I continue to be a NSFW maximalist, and would prefer that we have less restrictions on adult content of all types. There are obvious heightened deepfake risks, so presumably that would trigger aggressive protections from that angle, and you would need a special additional explicit permission to use anyone’s likeness or any copyrighted anything.

I am not a maximalist for copyright violations. I agree that up to a point it is good to ‘be cool’ about it all, and would prefer if copyright holders could cut people some slack while retaining the right to decide exactly where and when to draw the line. And I would hope that most holders when given the choice would let you have your fun up to a reasonably far point, so long as you were clearly not going commercial with it.

For Sora, even if the law ultimately doesn’t require it, even when I think permissiveness is best, I think this must be opt-in, or at bare minimum it must be easy to give a blanket opt-out and best efforts need to be made to notify all rights holders of how to do that, the same way the law requires companies to sometimes provide prominent public notices of such things.

That is not, however, the main thing I am concerned about. I worry about Act 3.

Before we get to OpenAI’s version, Meta technically announced theirs first.

We’ll start there.

Meta is once again proud to announce they have created… well, you know.

Alexandr Wang (Meta): Excited to share Vibes — a new feed in the Meta AI app for short-form, AI-generated videos.

You can create from scratch, remix what you see, or just scroll through to check out videos from the creators + the visual artists we’ve been collaborating with.

For this early version, we’ve partnered with Midjourney and Black Forest Labs while we continue developing our own models behind the scenes.

As usual, no. Bad Meta. Stop it. No, I don’t care that OpenAI is doing it too.

The same as OpenAI’s Sora, Vibes combines two products.

The first product, and the one they are emphasizing and presumably plan to push on users, is the endless scroll of AI slop videos. That’s going to be a torment nexus.

Then there’s the second product, the ability to generate, remix and restyle your own AI videos, or remix and restyle the videos of others. That’s a cool product. See Act 1.

Both are going to have stiff competition from Sora.

I presume OpenAI will be offering the strictly superior product, aside from network effects, unless they impose artificial restrictions on the content and Meta doesn’t, or OpenAI flubs some of the core functionality through lack of experience.

Is short form video a moral panic? Absolutely. Large, friendly letters.

The thing about moral panics is that they are often correct.

Roon: there is a moral panic around short form video content imo.

Let me amend this: I basically agree with postman on the nature of video and its corrupting influence on running a civilization well as opposed to text based media

I’m just not sure that its so much worse than being glued to your tv, and i’m definitely not sure that ai slop is worse than human slop

Chris Paxton: If this is wrong it’s because it unduly lets long form video off the hook.

Roon: an ai video feed is a worse product than a feed that includes both human made and machine content and everything in between

[post continues]

Daily Mirror, September 14, 1938:

Lauren Wilford: people often post this stuff to imply that moral panics about the technologies of the past were quaint. But the past is full of reminders that we are on a long, slow march away from embodied experience, and that we’ve lost more of it than we can even remember

the advent of recorded music and the decline of casual live performance must have been a remarkable shift, and represents both a real gain and a real loss. Singing in groups is something everyone used to do with their body. It has tangible benefits we don’t get anymore

I’ve said several times before that I think television, also known as long form video available on demand, should be the canonical example of a moral panic that turned out to be essentially correct.

Sonnet 4.5 recalls four warnings about television, which matches my recollection:

  1. Violence and aggression.

  2. Passivity and cognitive rot.

  3. Displacement effects.

  4. Commercial manipulation of children.

The world continues, and the violence and aggression warnings were wrong, but (although Sonnet is more skeptical here) I think the other stuff was basically right. We saw massive displacement effects and commercial manipulation. You can argue cognitive rot didn’t happen as per things like the Flynn effect, that television watching is more active than people think and a lot of the displaced things weren’t, but I think the negative aspects were real, whether or not they came with other positive effects.

As in, the identified downsides of television (aside from violence and aggression) were right. It also had a lot of upside people weren’t appreciating. I watch a lot of television.

It seems very obvious to me that short form video that takes the form of any kind of automatic algorithmically curated feed, as opposed to individually selected short form videos and curated playlists, is a lot worse for humans (of all ages) than traditional television or other long form video.

It also seems very obvious that moving from such ‘human slop’ into AI slop would, with sufficient optimization pressure towards traditional engagement metrics, be even worse than that.

One can hope this is the worst threat we have to deal with here:

Peter Wildeford: Increasingly, every person in America will be faced with an important choice about what AI does for society.

The left is no easy shining castle either.

Eliezer Yudkowsky: Short-form video is not nearly the final boss, unless I’ve missed a huge number of cases of short videos destroying previously long-lasting marriages. AI parasitism seems like the worse, more advanced, more rapidly advancing people-eater.

That’s even confining us to the individual attempting to stay sane, without considering the larger picture that includes the biggest overall dangers.

I do think short form video has very obviously destroyed a massive number of long-lasting marriages, lives and other relationships. And it has saved others. I presume the ledger is negative, but I do not know. A large fraction of the population spends on the order of hours a day on short form video and it centrally imbues their worldviews, moods and information environment. What we don’t have is a counterfactual or controlled experiment, so we can’t measure impact, similar to television.

At the limit, short form video is presumably not the ‘final form’ of such parasitic threats, because other forms relatively improve. But I’m not fully confident in this; if it becomes a much easier path for people to fall into, then combined with path dependence we may not reach the ‘true’ final form.

Dumpster. Fire.

Check out their video (no, seriously, check it out) of what the Sora app will look like. This is their curated version that they created to make it look like a good thing.

It is a few minutes long. I couldn’t watch it all the way through. It was too painful.

At one point we see a chat interface. Other than that, the most Slopified Slop That Ever Slopped, a parody of the bad version of TikTok, except now it’s all AI and 10 seconds long. I can’t imagine this doing anything good to your brain.

OpenAI says they will operate their app in various user-friendly ways that distinguish it from existing dumpster fires. I don’t see any sign of any of that in their video.

To be fair, on the occasions when I’ve seen other people scrolling TikTok, I had versions of the same reaction, although less intense. I Am Not The Target.

The question is, is anyone else the target?

Ben Thompson, focusing as always on the business case, notes the contrast between Google creating AI video tools for YouTube, Meta creating Vibes to take you into fantastical worlds, and OpenAI creating an AI-video-only ‘social network.’

The objection here is, come on, almost no one actually creates anything.

Ben Thompson: In this new competition, I prefer the Meta experience, by a significant margin, and the reason why goes back to one of the oldest axioms in technology: the 90/9/1 rule.

90% of users consume

9% of users edit/distribute

1% of users create

If you were to categorize the target market of these three AI video entrants, you might say that YouTube is focused on the 1% of creators; OpenAI is focused on the 9% of editors/distributors; Meta is focused on the 90% of users who consume.

Speaking as someone who is, at least for now, more interested in consuming AI content than in distributing or creating it, I find Meta’s Vibes app genuinely compelling; the Sora app feels like a parlor trick, if I’m being honest, and I tired of my feed pretty quickly.

I’m going to refrain from passing judgment on YouTube, given that my current primary YouTube use case is watching vocal coaches break down songs from KPop Demon Hunters.

While he agrees Sora 2 the video generation app is great at its job, Ben expects the novelty to wear off quickly, and questions whether AI videos are interesting to those who did not create them. I agree.

The level beyond that is whether videos are interesting if you don’t know the person who created them. Perhaps if your friend did it, and the video includes your friends, or something?

Justine Moore: Updated thoughts after the Sora 2 release:

OpenAI is building a social network (like the OG Instagram) and not a content network (like TikTok).

They’re letting users generate video memes starring themselves, their friends, and their pets. And it sounds like your feed will be heavily weighted to show content from friends.

This feels like a more promising approach – you’re not competing against the other video gen players because you’re allowing people to create a new type of content.

And the videos are inherently more interesting / funny / engaging because they star people you know.

Also you guys bullied them into addressing the “infinite hyperslop machine” allegations 😂

The problem with this plan is note the ‘OG’ in front of Instagram. Or Facebook. These apps used to be about consuming content from friends. They were Social Networks. Now they’re increasingly consumer networks, where you follow influencers and celebrities and stores and brands and are mostly a consumer of content, plus a system for direct messaging and exchanging contact information.

Would I want to consume ten second AI video content created by my friends, that contains our images and those of our pets and what not?

Would we want to create such videos in the first place?

I mean, no? Why would I want to do that, either as producer or consumer, as more than a rare novelty item? Why would anyone want to do that? What’s the point?

I get OG Facebook. You share life with your friends and talk and organize events. Not the way I want to go about doing any of that, but I certainly see the appeal.

I get OG Instagram. You show yourself looking hot and going cool places and doing cool stuff and update people on how awesome you are and what’s happening with awesome you, sure. Not my cup of tea and my Instagram has 0 lifetime posts but it makes sense. I can imagine a world in which I post to Instagram ‘as intended.’

I get TikTok. I mean, it’s a toxic dystopian hellhole when used as intended and also it is Chinese spyware, but certainly I get the idea of ‘figure out exactly what videos hit your dopamine receptors and feed you those until you die.’

I get the Evil Sora vision of Bright Screen AI Slop Machine.

What I don’t get is this vision of Sora as ‘we will all make videos and send them constantly to each other.’ No, we won’t, not even if Sora the video generator is great. Not even if it starts enabling essentially unlimited length clips provided you can tell it what you want.

Evan: OPENAI IS PREPARING TO LAUNCH A SOCIAL APP FOR AI-GENERATED VIDEOS – Wired

Peter Wildeford: I applaud OpenAI here. I personally support there being a social app where all the AI-generated videos hang out with each other and leave us real humans alone.

It’s a social media site for AI videos. Not a social media site for humans. So the AI videos will choose to leave Twitter and go there instead, to hang out with their fellow kind.

This here is bait. Or is it?

Andrew Wilkinson: I think OpenAI just killed TikTok.

I’m already laughing my head off and hooked on my Sora feed.

And now, the #1 barrier to posting (the ability to sing/dance/perform/edit) is gone. Just 100% imagination.

RIP theater kids 🪦

Hyperborean Nationalist: This is gonna go down with “clutch move of ordering us a pizza” as one of the worst tweets of all time

Jeremy Boissinot: Tell me you don’t use Tiktok without telling me you don’t use Tiktok 🤦‍♂️

The comments mostly disagree, often highly aggressively, hence bait.

Tracing Woods: access acquired, slop incoming.

so far half of the videos are Sam Altman, the other half are Pikachu, and the third half is yet to be determined.

That doesn’t sound social or likely to make my life better.

GFodor: an hour of Sora already re-wired my brain. my main q is if the thing turns into a dystopian hellscape or a rich new medium in 6 months. it could go either way.

one thing is for sure: half of the laughs come from the AI’s creativity, not the creativity of the humans.

Remixing like this def not possible before in any real sense. Dozens of remixes of some videos.

It’s early days.

Gabriel: I have the most liked video on sora 2 right now, i will be enjoying this short moment while it lasts.

cctv footage of sam stealing gpus at target for sora inference

Yes, I found this video modestly funny, great prompting. But this is not going to ‘help users achieve their long term goals’ or any of the other objectives above.

Joe Weisenthal: Anyone who sees this video can instantly grasp the (at least) potential for malicious use. And yet nobody with any power (either in the public or at the corporate level) has anything to say (let alone do) to address it, or even acknowledge it.

This isn’t even a criticism per se. The cat may be completely out of the bag. And it may be reality that there is literally nothing that can be done, particularly if open source models are only marginally behind.

Sam Altman, to his credit, has signed up to be the experimental deepfake target we have carte blanche to do with as we wish. That’s why half of what we see is currently Sam Altman, we don’t have alternatives.

As standalone products, while I hate Sora, Sora seems strictly superior to Vibes. Sora seems like Vibes plus a superior core video product and also better social links and functions, and better control over your feed.

I don’t think Meta’s advantage is in focusing on the 90% who consume. You can consume other people’s content either way, and once you run out of friend content, and you will do that quickly, it’s all the same.

I think what Meta is counting on is ‘lol we’re Meta,’ in three ways.

  1. Meta is willing to Be More Evil than OpenAI, more obviously, in more ways.

  2. Meta brings the existing social graph, user data and network effects.

  3. Meta will be able to utilize advertising better than OpenAI can.

I would assume it is not a good idea to spend substantial time on Sora if it is used in any remotely traditional fashion. I would even more so assume that at a minimum you need to stay the hell away from Vibes.

The OpenAI approach makes sense as an attempt to bootstrap a full social network.

If this can bootstrap into a legit full social network, then OpenAI will have unlocked a gold mine of both customer data and access, and also one of dollars.

It is probably not a coincidence that OpenAI’s new CEO of Applications, Fidji Simo, seems to be reassembling much of her former team from Meta.

Andy Wojcicki: I’m surprised how many people miss the point of the launch, focus on just the capabilities of the model for example, or complain about AI slop. The video model is the means, not the goal.

The point is building a social platform, growing audience further, gathering more, deeper and more personal info about the users.

Especially if you compare what Zuck is doing with his 24/7 slop machine, there are several things they did right:

  1. social spread via invites. Gives a little bit of exclusivity feel, but most importantly because of reciprocity, friends are inclined to try it etc. perceived value goes up.

  2. not focusing on generic slop content, but personalized creation. Something I’ve been calling ‘audience of one/few’ ™️, instead of the ill conceived attempt to make a Hollywood producer out of everyone, which is a losing strategy. If you want to maximize for perceived value, you shrink the audience.

  3. identity verification. Addressing the biggest concern of people with regard to AI content – it’s all fake bots, so you don’t engage with the content. Here they guarantee it’s a real human behind the AI face.

so kudos @sama looks like a well thought-out roll out.

The crux is whether Nobody Wants This once the shine wears off, or if enough people indeed do want it. A social network that is actually social depends on critical mass of adoption within friend groups.

The default is that this plays out like Google+ or Clubhouse, except it happens faster.

I don’t think this appeals that much to most people, and I especially don’t think this will appeal to most women, without whom you won’t have much of a social network. Many of the things that are attractive about social networks don’t get fulfilled by AI videos. It makes sense that OpenAI employees would think this is good friend-bonding fun in a way that the real world won’t, and so far I have seen zero signs that anyone is using Sora socially.

How does Altman defend that this will all be good, beyond the ‘putting your friends in a video is fun and a compelling way to connect’ hypothesis I’m betting against?

Now let’s hear him out, and consider how they discuss launching responsibly and their feed philosophy.

Sam Altman: We also feel some trepidation. Social media has had some good effects on the world, but it’s also had some bad ones. We are aware of how addictive a service like this could become, and we can imagine many ways it could be used for bullying.

It is easy to imagine the degenerate case of AI video generation that ends up with us all being sucked into an RL-optimized slop feed. The team has put great care and thought into trying to figure out how to make a delightful product that doesn’t fall into that trap, and has come up with a number of promising ideas. We will experiment in the early days of the product with different approaches.

In addition to the mitigations we have already put in place (which include things like mitigations to prevent someone from misusing someone’s likeness in deepfakes, safeguards for disturbing or illegal content, periodic checks on how Sora is impacting users’ mood and wellbeing, and more) we are sure we will discover new things we need to do if Sora becomes very successful.

Okay, so they have mitigations for abuse, and checks for illegal content. Notice he doesn’t say the word ‘copyright’ there.

Periodic checks on how Sora is impacting users’ mood and wellbeing is an interesting proposal, but what does that mean? A periodic survey? Checking in with each user after [X] videos, and if so how? Historically such checks get run straight through and then get quietly removed.

Okay, so what are they planning to do?

Altman offers some principles that sound great in theory, if one actually believed there was a way to follow through with them, or that OpenAI would have the will to do so.

Ryan Lowe: a fascinating list of principles for Sora. makes me more optimistic. it’s worth commending, *IF* there is follow through (especially: “if we can’t fix it, we will discontinue it”)

at a minimum, I’d love transparency around the user satisfaction data over time.

most social media companies can’t hold to promises like this because of market forces. maybe OpenAI can resist this for a while because it’s more of a side business.

(Editor’s Note: They are currently doing, AFAICT, zero of the below four things named by Altman, and offering zero ways for us to reliably hold them accountable for them, or for them to hold themselves accountable):

To help guide us towards more of the good and less of the bad, here are some principles we have for this product:

*Optimize for long-term user satisfaction. The majority of users, looking back on the past 6 months, should feel that their life is better for using Sora than it would have been if they hadn’t. If that’s not the case, we will make significant changes (and if we can’t fix it, we would discontinue offering the service).

Let me stop you right there. Those are two different things.

Are you actually optimizing for long-term user satisfaction? How? This is not a gotcha question. You don’t have a training signal worth a damn, by the time you check back in six months the product will be radically different. How do you know what creates this long-term user satisfaction distinct from short term KPIs?

There is a long, long history of this not working for tech products. Of companies not knowing how to do it, and choosing not to do it, and telling themselves that the short term KPIs are the best way to do it. Or of doing this with an initial launch based on their intuitions and talking extensively to individual users in the style of The Lean Startup, and then that all going away pretty quickly.

Remember when Elon Musk talked about maximizing unregretted user minutes on Twitter? And then we checked back later and the word ‘unregretted’ was gone? That wasn’t even a long term objective.

The default thing that happens here is that six months later you do a survey, and then if you find out users are not doing so great you bury the results of the survey and learn to never ask those questions again, lest the answers leak and you’re brought before congress, as Zuckerberg would likely explain to you.

Even if you make ‘significant changes’ at that time, well yeah, you’re going to make changes every six months anyway.

*Encourage users to control their feed. You should be able to tell Sora what you want—do you want to see videos that will make you more relaxed, or more energized? Or only videos that fit a specific interest? Or only for a certain amount of time? Eventually, as our technology progresses, you will be able to tell Sora what you want in detail in natural language.

(However, parental controls for teens include the ability to opt out of a personalized feed, and other things like turning off DMs.)

This is a deeply positive and friendly thing to do, if you actually offer a good version of it and people use it. I notice that this service is not available on any existing social network or method of consuming content. This seems deeply stupid to me. I would use Instagram (as a consumer) a nontrivial amount if I could filter via a natural language LLM prompt on a given day, and also generate permanent rules in the same fashion, especially on a per-account basis.
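The version of this I have in mind is simple to sketch. Everything below is hypothetical: llm_matches stands in for whatever model call you would actually use, and the post fields are invented; the point is just how little machinery a natural-language filter needs on top of an existing feed.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    caption: str

def llm_matches(rule: str, post: Post) -> bool:
    """Placeholder for an LLM call that judges whether a post satisfies a
    natural-language rule. A real implementation would prompt a model with
    the rule plus the post's text/metadata and parse a yes/no answer."""
    raise NotImplementedError

def filter_feed(posts: list[Post], session_prompt: str, permanent_rules: list[str]) -> list[Post]:
    """Keep only posts that satisfy today's prompt and every standing rule."""
    kept = []
    for post in posts:
        if not llm_matches(session_prompt, post):
            continue
        if all(llm_matches(rule, post) for rule in permanent_rules):
            kept.append(post)
    return kept

# Usage (hypothetical): filter_feed(feed, "calm nature content only",
#                                   ["never show engagement-bait", "no ads disguised as posts"])
```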

The obvious problem is that there are reasons this service doesn’t exist. And failing to offer this seems dumb to me, but these companies are not dumb. They have reasons.

  1. The optimistic reason: Until recently this wasn’t technically feasible and they don’t know how to do it, and diffusion is hard, but this is OpenAI’s wheelhouse. I’d love for this to be the primary or only reason, and for natural language filtering to be coming to Instagram, Facebook, Twitter, Netflix, YouTube and everyone else by Mid 2026. I don’t expect that.

  2. Companies believe that users hate complexity, hate giving feedback, hate having options even if they’re fully optional to use, and that such things drive users away or at best cause users to not bother. OpenAI lets you thumbs up or thumbs down a conversation, nothing more, which caused no end of problems. Netflix eliminated star ratings and eliminated and declined to create various other sources of explicit preferences. TikTok became the new hotness by reading your micromovements and timings and mostly ignoring all of your explicit feedback.

  3. Companies believe that you don’t know what you want, or at least they don’t want you to have it. They push you heavily towards the For You page and endless slop. Why should we expect OpenAI to be the one friendly holdout? Their track record?

You’re telling me that OpenAI is going to be the one to let the user control their experience, even when that isn’t good for KPIs? For reals?

I. Do. Not. Believe. You.

They claim they shipped with ‘steerable ranking,’ which lets you tell it what ‘you’re in the mood for.’ Indeed, they do have a place where you can say what you’re ‘in the mood’ for, and drop an anvil on the algorithm to show you animals zoomed in with a wide-angle lens or whatnot.

I do think that’s great, it’s already more than you can do with Facebook, Instagram or TikTok.

It is not, however, the droids that we are looking for on this.

Here’s how they describe the personalized Sora feed:

To personalize your Sora feed, we may consider signals like:

  • Your activity on Sora: This may include your posts, followed accounts, liked and commented posts, and remixed content. It may also include the general location (such as the city) from which your device accesses Sora, based on information like your IP address.

  • Your ChatGPT data: We may consider your ChatGPT history, but you can always turn this off in Sora’s Data Controls, within Settings.

  • Content engagement signals: This may include views, likes, comments, and remixes.

  • Author signals: This may include follower count, other posts, and past post engagement.

  • Safety signals: Whether or not the post is considered violative or appropriate.
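For illustration of how a list of signals like that typically turns into a feed, here is a generic weighted-scoring sketch. It is my own invention, not OpenAI’s actual ranking; the signal names and weights are made up, and real systems are learned rather than hand-weighted.

```python
# Generic feed-ranking sketch: combine per-post signals into one score.
# Signal names loosely mirror the categories above; the weights are invented.
WEIGHTS = {
    "follows_author": 3.0,    # your activity: you follow this account
    "topic_match": 2.0,       # inferred interest overlap from your history
    "engagement_rate": 1.5,   # content engagement: likes/views on the post
    "author_quality": 1.0,    # author signals: past post engagement
}

def rank_score(signals: dict[str, float], violative: bool) -> float:
    """Weighted sum of signals; safety acts as a hard filter, not a weight."""
    if violative:
        return float("-inf")
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

posts = [
    {"id": "a", "signals": {"follows_author": 1.0, "engagement_rate": 0.2}, "violative": False},
    {"id": "b", "signals": {"topic_match": 0.9, "author_quality": 0.5}, "violative": False},
]
feed = sorted(posts, key=lambda p: rank_score(p["signals"], p["violative"]), reverse=True)
print([p["id"] for p in feed])  # ['a', 'b']
```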

That sounds a lot like what other apps do, although I am happy it doesn’t list the TikTok-style exact movements and scroll times (I’d love to see them commit to never using that).

And you know what it doesn’t include? Any dials, or any place where it stores settings or custom instructions or the other ways you’d want to give someone the ability to steer. And no way for the algorithm to outright tell you what it currently thinks you like so you can try and fix that.

Instead, you type a sentence and fire it into the void, and it only works for this session? Which, again, I would kill for on Instagram, but that’s not The Thing.

This is actually one of the features I’d be most excited to test, but even in its limited state it seems it is only on iOS, and I have an Android (and thus tested on the web).

This is also the place where, if they are actually Doing The Thing they claim to want to do, it will be most clear.

That goes double if you let me specify what is and isn’t ‘appropriate’ so I can choose to be treated fully like an adult, or to never see any hint of sex or violence or cursing, or anything in between.

*Prioritize creation. We want to make it easy and rewarding for everyone to participate in the creation process; we believe people are natural-born creators, and creating is important to our satisfaction.

You’re wrong. Sorry. People are mostly not creators as it applies here, and definitely not in a generally sustainable way. People are not going to spend half their time creating once they realize they average 5 views. The novelty will wear off.

I also notice that OpenAI says they will favor items in your feed that it expects you to use to create things. Is that what most users actually want, even those who create?

*Help users achieve their long-term goals. We want to understand a user’s true goals, and help them achieve them.

If you want to be more connected to your friends, we will try to help you with that. If you want to get fit, we can show you fitness content that will motivate you. If you want to start a business, we want to help teach you the skills you need.

And if you truly just want to doom scroll and be angry, then ok, we’ll help you with that (although we want users to spend time using the app if they think it’s time well spent, we don’t want to be paternalistic about what that means to them).

With an AI video social network? What? How? Huh?

Again, I don’t believe you, several times over: I don’t think you’d do it if you knew how to do it, I don’t think the intention would survive contact with the enemy even if you did start out with it, and I don’t think you know how to do it.

One thing they credibly claim to be actually doing is prioritizing connection.

We want Sora to help people strengthen and form new connections, especially through fun, magical Cameo flows. Connected content will be favored over global, unconnected content.

This makes sense, although I expect there to be not that much actually connected content, as opposed to following your favorite content creators. To the extent that it does force you to see all the videos created by your so-called friends and friends of friends, I expect most users to realize why Facebook and Instagram pivoted.

On the plus side, if OpenAI are actually right and the resulting product is highly user customizable and also actively helps you ‘achieve your long term goals’ along the way, and all the other neat stuff like that, such that I think it’s a pro-human healthy product I’d want my kids and family to use?

Then this will be a case of solving an alignment problem that looked to me completely impossible to solve in practice.

Of course, to a large but not full extent, they only get one shot.

If it doesn’t go great? Then we’ll see how they react to that, as well.

This reaction is a little extreme, but extreme problems can require extreme solutions.

Deep Dish Enjoyer: if you make and post a sora video i’m blocking you – 1 strike and you’re out. similar to my grok tagging policy.

sorry but the buck stops here.

if you don’t want to see evil slop take over you have to be ruthlessly proactive about this stuff.

to be clear – i will be doing this if i see it anywhere. not just my comment sections.

Sinnformer: not sharing a damn one of them. increasingly viewing sora as unsafe. no, not inherently, just for humans.

Rota: So as long as you don’t make it you’re safe.

Deep Dish Enjoyer: It depends.

I am going to invoke a more lenient policy, with a grace period until October 5.

If you post or share an AI video that does not provide value, whether or not you created it, and you have not already provided substantial value, that’s a block.

Sharing AI videos that are actually bangers is allowed, but watch it. Bar is high.

Sharing AI videos for the purposes of illustrating something about AI video generation capabilities is typically allowed, but again, watch it.

I believe it would be highly unwise to build an AGI or superintelligence any time soon, and that those pushing ahead to do so are being highly reckless at best, but I certainly understand why they’d want to do it and where the upside comes from.

Building The Big Bright Screen Slop Machine? In this (AI researcher) economy?

Matthew Yglesias: AI poses great peril but also incredible opportunities — for example it could cure cancer or make a bunch of videos where people break glass bridges.

Matthew Yglesias: I don’t think it should be illegal to use A.I. to generate videos. And for fundamental free-speech reasons, we can’t make it illegal to create feeds and recommendation engines for short-form videos. Part of living in a free, technologically dynamic society is that a certain number of people are going to make money churning out low-quality content. And on some level, that’s fine.

But on another, equally important level, it’s really not fine.

Ed Newton-Rex: The Sora app is the worst of social media and AI.

– short video app designed for addiction

– literally only slop, nothing else

– trained on other people’s videos without permission

This is what governments are deregulating AI for.

Veylan Solmira: It’s disturbing to me that many top-level researchers, apparently, have no problem sharing endless content the most likely outcome of which seems to be to drain humans of empathy, arouse their nervous system into constant conflict orientation, or distort their ability to perceive reality.

This technology seems to be very strongly, by default, in the ‘widespread social harm’ part of the dual use spectrum of technology, and the highest levels of capabilities researchers can only say “isn’t this neat”.

This complete disregard of the social impact of the technology they’re developing seems to bode extremely poorly to overall AI outcomes.

Sam Altman’s defense is that no, this will be the good version of all that. Uh huh.

Then there’s the question of why focus on the videos at all?

Seán Ó hÉigeartaigh: Why would you be spending staff time and intellectual energy on launching this if you expected AGI within the current Presidency?

Runner Tushar: Sam Altman 2 weeks ago: “we need 7 trillion dollars and 10GW to cure cancer”

Sam Altman today: “We are launching AI slop videos marketed as personalized ads”

Sam Altman: i get the vibe here, but…

we do mostly need the capital to build AI that can do science, and for sure we are focused on AGI with almost all of our research effort.

it is also nice to show people cool new tech/products along the way, make them smile, and hopefully make some money given all that compute need.

when we launched chatgpt there was a lot of “who needs this and where is AGI”.

reality is nuanced when it comes to optimal trajectories for a company.

The short summary of that is ‘Money, Dear Boy,’ plus competing for talent and vibes and visibility and so on. Which is all completely fair, and totally works for the video generation side of Sora. I love the video generation model. That part seems great. If people want to pay for it and turn that into a profitable business, wonderful.

Presumably most everyone was and is cool with that part.

The problem is that Sora is also being used to create a 10-second AI video scroll social network, as in The Big Bright Screen Slop Machine. Not cool, man. Not cool.

One can imagine releasing a giant slop machine might be bad for morale.

Matt Parlmer: Would not be surprised if we see a big wave of OpenAI departures in the next month or two, if you signed up to cure cancer and you just secured posteconomic bags in a secondary I don’t think you’d be very motivated to work on the slop machine.

GFodor: The video models are essential for RL, and I don’t think we are going to consider this content slop once it’s broadly launched.

Psychosomatica: i think you underestimate how these people will compromise their own mental models.

What happens if you realize that you’re better off without The Big Bright Screen Slop Machine?

Paul Yacoubian: Just found out if you try to delete your Sora app account you will lose your chatgpt account and be banned forever from signing up again.

Mousa: You can check out any time you like, but you can never leave 😅

Maia Arson Crimew: > all your data will be removed

> you cannot reuse the same email or phone number

so not all of it, huh 🙂

Very little of it will get deleted, given they have a (stupid) court order in place preventing them from deleting anything even if they wanted to.

The more central point, in addition to this being way too easy to do by accident or for someone else to do to you, is that they punish you by nuking your ChatGPT account and by banning you from signing up again without switching phone number and email. That seems like highly toxic and evil behavior, given the known reasons one would want to get rid of a Sora account and the importance of access to ChatGPT.

Then again, even if we leave the app, will we ever really escape?


ai-#136:-a-song-and-dance

AI #136: A Song and Dance

The big headline this week was the song, which was the release of Claude Sonnet 4.5. I covered this in two parts, first the System Card and Alignment, and then a second post on capabilities. It is a very good model, likely the current best model for most coding tasks, most agentic and computer use tasks, and quick or back-and-forth chat conversations. GPT-5 still has a role to play as well.

There was also the dance, also known as Sora, both the new and improved 10-second AI video generator Sora and also the new OpenAI social network Sora. I will be covering that tomorrow. The video generator itself seems amazingly great. The social network sounds like a dystopian nightmare and I like to think Nobody Wants This, although I do not yet have access nor am I a typical customer of such products.

The copyright decisions being made are a bold strategy, Cotton, or are perhaps better described by this public service announcement for those who like to think they own intellectual property.

Meta also offered its own version, called Vibes, which I’ll cover along with Sora.

OpenAI also announced Pulse to give you a daily roundup and Instant Checkout to let you buy at Etsy and Shopify directly from ChatGPT, which could be big deals and in a different week would have gotten a lot more attention. I might return to both soon.

They also gave us the long awaited parental controls for ChatGPT.

GDPval is the most important new benchmark in a while, measuring real-world tasks.

I covered Dwarkesh Patel’s Podcast With Richard Sutton. Richard Sutton responded on Twitter that I had misinterpreted him so badly he could not take my reply seriously, but that this must be partly his fault for being insufficiently clear and he will look to improve that going forward. That is a highly reasonable thing to say in such a situation. Unfortunately he did not explain in what ways my interpretation did not match his intent. Looking at the comments on both LessWrong and Substack, it seems most others came away from the podcast with a similar understanding to mine. Andrej Karpathy also offers his take.

Senators Josh Hawley (R-MO) and Richard Blumenthal (D-CT) have introduced the Artificial Intelligence Risk Evaluation Act. This bill is Serious Business. I plan on covering it in its own post next week.

California Governor Gavin Newsom signed SB 53, so now we have at least some amount of reasonable AI regulation. Thank you, sir. Now sign the also important SB 79 for housing near transit and you’ll have had a very good couple of months.

The big news was Claude Sonnet 4.5, if you read one thing read that first, and consider the post on Claude Sonnet 4.5’s Alignment if that’s relevant to you.

  1. Language Models Offer Mundane Utility. Scientific progress goes ping.

  2. Language Models Don’t Offer Mundane Utility. You’re hallucinating again.

  3. Huh, Upgrades. Gemini Flash and DeepSeek v3.2, Dreamer 4, Claude for Slack.

  4. On Your Marks. Introducing GDPVal, composed of real world economic tasks.

  5. Choose Your Fighter. Claude Sonnet 4.5 and when to use versus not use it.

  6. Copyright Confrontation. Disney finally sends a cease and desist to Character.ai.

  7. Fun With Media Generation. That’s not your friend, and that’s not an actress.

  8. Deepfaketown and Botpocalypse Soon. Tell your spouse to check with Claude.

  9. You Drive Me Crazy. OpenAI tries to route out of GPT-4o again. Similar results.

  10. Parental Controls. OpenAI introduces parental controls for ChatGPT.

  11. They Took Our Jobs. Every job will change, they say. And nothing else, right?

  12. The Art of the Jailbreak. Beware the man with two agents.

  13. Introducing. Instant Checkout inside ChatGPT, Pulse, Loveable, Sculptor.

  14. In Other AI News. xAI loses several executives after they disagree with Musk.

  15. Show Me the Money. All you have to do is show them your phone calls.

  16. Quiet Speculations. An attempted positive vision of AI versus transaction costs.

  17. The Quest for Sane Regulations. Newsom signs SB 53.

  18. Chip City. Water, water everywhere, but no one believes that.

  19. The Week in Audio. Nate Soares, Hard Fork, Emmett Shear, Odd Lots.

  20. If Anyone Builds It, Everyone Dies. Continuous capabilities progress still kills us.

  21. Rhetorical Innovation. The quest for because.

  22. Messages From Janusworld. High weirdness is highly weird. Don’t look away.

  23. Aligning a Smarter Than Human Intelligence is Difficult. The wrong target.

  24. Other People Are Not As Worried About AI Killing Everyone. More on Cowen.

  25. The Lighter Side. Vizier, you’re fired, bring me Claude Sonnet 4.5.

Scott Aaronson puts out a paper where a key technical step of a proof of the main result came from GPT-5 Thinking. This did not take the form of ‘give the AI a problem and it one-shotted the solution,’ instead there was a back-and-forth where Scott pointed out errors until GPT-5 pointed to the correct function to use. So no, it didn’t ‘do new math on its own’ here. But it was highly useful.

GPT-5 Pro offers excellent revisions to a proposed biomedical experiment.

If you are letting AI coding agents such as Claude Code do their thing, you will want to implement best practices the same way you would at a company. This starts with things like version control, unit tests (check to ensure they’re set up properly!) and a linter, a tool that automatically enforces additional coding standards on top of the rules of the language; all of this now takes only a few minutes to set up.

In general, it makes sense that existing projects set up to be easy for the AI to parse and grok will go well when you point the AI at them, and those that aren’t, won’t.

Gergely Orosz: I often hear “AI doesn’t help much on our legacy project.”

Worth asking: does it have a comprehensive test suite? Can the agent run it? Does it run it after every change?

Claude Code is working great on a “legacy” project of mine that I wrote pre-AI with.. extensive unit tests!
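To make that concrete, here is a minimal sketch of the kind of gate being described: a small script you (or the agent) run after every change that refuses to proceed unless the linter and the test suite both pass. It assumes a Python project with ruff and pytest installed; substitute whatever your stack actually uses.

```python
# check.py - minimal gate to run after every agent-made change.
# Assumes ruff and pytest are installed; swap in your own tools as needed.
import subprocess
import sys

def run(cmd: list[str]) -> bool:
    print(f"$ {' '.join(cmd)}")
    return subprocess.run(cmd).returncode == 0

def main() -> int:
    checks = [
        ["ruff", "check", "."],  # linter: automatically enforced standards
        ["pytest", "-q"],        # unit tests: confirm they are actually set up and run
    ]
    for cmd in checks:
        if not run(cmd):
            print("Check failed; do not accept this change.")
            return 1
    print("All checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```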

Mechanical horse? You mean a car?

Francois Chollet: The idea that we will automate work by building artificial versions of ourselves to do exactly the things we were previously doing, rather than redesigning our old workflows to make the most out of existing automation technology, has a distinct “mechanical horse” flavor

“you see, the killer advantage of mechahorses is that you don’t need to buy a new carriage. You don’t need to build a new mill. The mechahorse is a drop-in horse replacement for all the different devices horses are currently powering — thousands of them”

This is indeed how people describe AI’s advantages or deploy AI, remarkably often. It’s the go-to strategy, to ask ‘can the AI do exactly what I already do the way I already do it?’ rather than ‘what can the AI do and how can it do it?’

Jeffrey Ladish: I agree but draw a different conclusion. Advanced AIs of the future won’t be drop-in replacement “product managers”, they will be deconstructing planets, building dyson swarms, and their internal organization will be incomprehensible to us

This seems radical until you consider trying to explain what a product manager, a lawyer, or a sales executive is to a chimpanzee. Or to a mouse. I don’t know exactly what future AIs will be like, but I’m fairly confident they’ll be incredibly powerful, efficient, and different.

Yes, but there will be a time in between when they can’t yet deconstruct planets, very much can do a lot better than a drop-in replacement worker, but we use them largely as drop-in replacements at various complexity levels because it’s easier or it’s the only thing we can sell or get approval for.

Has that progress involved ‘hallucinations being largely eliminated?’ Gary Marcus points to Suleyman’s famous 2023 prediction of this happening by 2025. Roon responded ‘he was right,’ so I ran a survey; there is definitely a ‘they solved it’ faction, but a large majority agrees with Marcus.

I would say that hallucinations are way down and much easier to navigate, but about one cycle away from enough progress to say ‘largely eliminated’ and they will still be around regardless. Suleyman was wrong, and his prediction was at the time clearly some combination of foolish and hype, but it was closer than you might think and not worthy of ridicule given the outcome.

Google updates Gemini 2.5 Flash and Flash-Lite, look at them move in the good directions on this handy chart.

We’re (supposedly) talking better agentic tool use and efficiency for Flash, and better instruction following, brevity, and multimodal and translation capabilities for Flash-Lite. A Twitter thread instead highlights ‘clearer explanations for homework,’ ‘more scannable outputs’ and improvements to image understanding.

Claude is now available in Slack via the Slack App Marketplace, ready to search your workspace channels, DMs and files, get tagged in threads or messaged via DMs, and do all the standard Slack things.

Google also gives us Dreamer 4, an agent that learns to solve complex control tasks entirely inside of its scalable world model, which they are pitching as a big step up from Dreamer 3.

Danijar Hafner: Dreamer 4 learns a scalable world model from offline data and trains a multi-task agent inside it, without ever having to touch the environment. During evaluation, it can be guided through a sequence of tasks.

These are visualizations of the imagined training sequences [in Minecraft].

The Dreamer 4 world model predicts complex object interactions while achieving real-time interactive inference on a single GPU

It outperforms previous world models by a large margin when put to the test by human interaction 🧑‍💻

[Paper here]

DeepSeek v3.2 is out, which adds new training from a v3.1 terminus and offers five specialized models for different tasks. Paper here.

Incremental (and well-numbered) upgrades are great, but then one must use the ‘is anyone bringing the hype’ check to decide when to pay attention. In this case on capabilities advances, so far, no hype. I noticed v3.2-Thinking scored a 47% on Brokk Power Ranking, halfway between Gemini 2.5 Flash and Pro and far behind GPT-5 and Sonnet 4.5, 39.2% on WeirdML in line with past DeepSeek scores, and so on.

What DeepSeek v3.2 does offer is decreased cost versus v3.1, I’ve heard at about a factor of two. With most non-AI products, a rapid 50% cost reduction would be insanely great progress. However, This Is AI, so I’m not even blinking.

Claude Sonnet 4.5 shoots to the top of Clay Schubiner’s anti-sycophancy benchmark at 93.6%, versus 90.2% for standard GPT-5 and 88% for Sonnet 4.

I can report that on my first test task of correcting Twitter article formatting errors in Claude for Chrome, upgrading it to Sonnet 4.5 made a big difference, enough that I could start iterating the prompt successfully, and I was now failing primarily by running into task size limits. Ultimately this particular job should be solved via Claude Code fixing my Chrome extension, once I have a spare moment.

OpenAI offers GDPval, an eval based on real world tasks spanning 44 occupations from the top 9 industries ranked by contribution to GDP, with 1,320 specialized tasks.

Occupations are included only if 60% or more of their component tasks are digital, and tasks must be performed ‘on a computer, particularly around digital deliverables.’

The central metric is win rate, as in can you do better than a human?

The results were a resounding victory for Claude Opus 4.1 over GPT-5 High, with Opus being very close to the human expert baseline averaged over all tasks.

These are blind grades by humans, best out of three. The humans only had a 71% agreement rate among themselves, so which humans you use potentially matters a lot, although law of large numbers should smooth this out over a thousand tasks.

They are correctly releasing a subset of tasks but keeping the eval private, with a modestly accurate automatic grader (66% agreement with humans) offered as well.
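For readers who want the metric spelled out: a win rate here is just the share of pairwise comparisons the model wins against the human deliverable. The tiny sketch below counts ties as half a win, which is a common convention but my assumption, not something the source confirms about GDPval’s exact aggregation.

```python
# Minimal pairwise win-rate sketch. Counting ties as half a win is an
# illustrative assumption, not necessarily GDPval's exact convention.
def win_rate(grades: list[str]) -> float:
    """grades: one entry per task, each 'win', 'tie', or 'loss' versus the human expert."""
    score = sum(1.0 if g == "win" else 0.5 if g == "tie" else 0.0 for g in grades)
    return score / len(grades)

print(win_rate(["win", "loss", "tie", "win"]))  # 0.625
```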

Olivia Grace Watkins: It’s wild how much peoples’ AI progress forecasts differ even a few years out. We need hard, realistic evals to bridge the gap with concrete evidence and measurable trends. Excited to share GDPval, an eval measuring performance on real, economically valuable white-collar tasks!

Predictions about AI are even harder than usual, especially about the future. And yes, a few years out predictions run the gamut from ‘exactly what I know today’s models can do maybe slightly cheaper’ to ‘Dyson Sphere around the sun and launching Von Neumann probes.’ The even more wild gap, that this eval targets, is about disagreements about present capabilities, as in what AIs can do right now.

This is a highly useful eval, even though it is a highly expensive one to run, since you have to use human experts as judges. Kudos to OpenAI for doing this, especially given they did not come out on top. It was a total Opus victory, and I am inclined to believe it given that all the other relative rankings seem highly sensible.

Sholto Douglas (Anthropic): Incredible work – this should immediately become one of the most important metrics for policy makers to track.

We’re probably only a few months from crossing the parity line.

Huge props to OAI for both doing the hard work of pulling this together and including our scores. Nice to see Opus on top 🙂

Presumably Anthropic’s next major update will cross the 50% line here, and someone else might cross it first.

Crossing 50% does not mean you are better than a human even at the included tasks, since the AI models will have a higher rate of correlated, stupid or catastrophic failure.

Ethan Mollick concludes this is all a big deal, and notes the most common source of AI losing was failure to follow instructions. That will get fixed.

If nothing else, this lets us put a high lower bound on tasks AI will be able to do.

Nic Carter: I think GDPeval makes “the simple macroeconomics of AI” (2024) by nobel laureate Daron Acemoglu officially the worst-aged AI paper of the last decade

he thinks only ~5% of economy-wide tasks would be AI addressable for a 1% (non-annualized) GDP boost over an entire decade, meanwhile GDPeval shows frontier models at parity with human experts in real economic tasks in a wide range of GDP-relevant fields ~50% of the time. AI boost looks more like 1-2% per year.

An AI boost of 1%-2% per year is the ‘economic normal’ or ‘AI fizzle’ world, where AI does not much further improve its core capabilities and we run into many diffusion bottlenecks.

Julian Schrittwieser, a co-first author on AlphaGo, AlphaZero and MuZero, uses GDPval as the second chart after METR’s classic to point out that AI capabilities continue to rapidly improve and that it is very clear AI will soon be able to do a bunch of stuff a lot better than it currently does.

Julian Schrittwieser: The current discourse around AI progress and a supposed “bubble” reminds me a lot of the early weeks of the Covid-19 pandemic. Long after the timing and scale of the coming global pandemic was obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon.

Something similarly bizarre is happening with AI capabilities and further progress. People notice that while AI can now write programs, design websites, etc, it still often makes mistakes or goes in a wrong direction, and then they somehow jump to the conclusion that AI will never be able to do these tasks at human levels, or will only have a minor impact. When just a few years ago, having AI do these things was complete science fiction! Or they see two consecutive model releases and don’t notice much difference in their conversations, and they conclude that AI is plateauing and scaling is over.

Given consistent trends of exponential performance improvements over many years and across many industries, it would be extremely surprising if these improvements suddenly stopped. Instead, even a relatively conservative extrapolation of these trends suggests that 2026 will be a pivotal year for the widespread integration of AI into the economy:

  • Models will be able to autonomously work for full days (8 working hours) by mid-2026.

  • At least one model will match the performance of human experts across many industries before the end of 2026.

  • By the end of 2027, models will frequently outperform experts on many tasks.

It may sound overly simplistic, but making predictions by extrapolating straight lines on graphs is likely to give you a better model of the future than most “experts” – even better than most actual domain experts!

Noam Brown: I agree AI discourse today feels like covid discourse in Feb/Mar 2020. I think the trajectory is clear even if it points to a Black Swan event in human history.

But I think we should be cautious interpreting the METR/GDPval plots. Both only measure self-contained one-shot tasks.

Ryan Greenblatt: I mostly agree with this post: AI isn’t plateauing, trend extrapolation is useful, and substantial economic impacts seem soon. However, trends don’t imply huge economic impacts in 2026 and naive extrapolations suggest full automation of software engineering is ~5 years away.

To be clear, my view is that society is massively underrating the possibility that AI transforms everything pretty quickly (posing huge risks of AI takeover and power grabs) and that this happens within the next 10 years, with some chance (25%?) of it happening within 5 years.

The more advanced coders seem to frequently now be using some mix of Claude Sonnet 4.5 and GPT-5, sometimes with a dash of a third offering.

Isaac Flath: I asked @intellectronica what she’s using now-a-days model wise. Here’s what she said: 🔥

“If I’m vibing, it’s Sonnet 4.5. If it’s more structured then GPT-5 (also GPT-5-codex, though that seems to work better in the Codex CLI or extension than in Copilot – I think they still have problems with their system prompt). And I also use Grok Code a lot now when I do simpler stuff, especially operational things, because it’s so fast. And sometimes GPT-5-mini, especially if Grok Code is down 🤣. But I’d say my default is GPT-5.”

Tyler Cowen chose to get and post an economic analysis of OpenAI’s Instant Checkout feature via Claude Sonnet 4.5 rather than GPT-5 Pro, whereas for a while he has avoided even mentioning Anthropic.

He also links to Ethan Mollick writing up Claude Sonnet 4.5 doing a full data replication on an economics paper, all off the paper and data plus one basic prompt, all of which GPT-5 Pro then verified, a process he then successfully replicated with several additional papers.

If it is this easy, presumably there should be some graduate student who has this process done for all relevant econ papers and reports back. What replication crisis?

I interpret these posts by Tyler Cowen, taken together, as a strong endorsement of Claude Sonnet 4.5, as at least right back in the mix for his purposes.

As Ethan Mollick points out, even small reliability increases can greatly expand the ability of AI to do agentic tasks. Sonnet 4.5 gives us exactly that.

What are people actually using right now? I ran some polls, and among my respondents (biased sample) Claude Sonnet 4.5 has a majority for coding use but GPT-5 still has a modest edge for non-coding. The most popular IDE is Claude Code, and a modest majority are using either Claude Code or Codex.

I do not get the advertising from any AI lab, including the new ones from OpenAI. That doesn’t mean they don’t work or are even suboptimal, but none seem convincing, and none seek to communicate why either AI in general or your AI is actually good.

If anything, I like OpenAI’s approach the best here, because as industry leader they want to show basics like ‘hey look you can get an AI to help you do a thing’ to target those who never tried AI at all. Whereas if you are anyone else, you should be telling me why you’re special and better than ChatGPT, especially with B2C involved?

Think about car advertisements, if like me you’re old enough to remember them. If you’re the actual great car, you talk about key features and great deals and how you’re #1 in JD Power and Associates, and you don’t acknowledge that other cars exist. Whereas if you’re secondary, you say ‘faster zero-to-sixty than Camry and a shinier coat of paint than Civic’ which the savvy ear hears as ‘my car is not so great as the Camry and Civic’ but they keep doing it so I presume it works in that spot.

Are open models getting unfairly maligned in tests because closed models are tested with their full specialized implementations, whereas open models are tested without all of that? You could also add that often open models are actively configured incorrectly during evals, compounding this danger.

My response is no, this is entirely fair, for two reasons.

  1. This is the correct practical test. You are welcome to build up an open model routing system, and do an eval on that, but almost no one is actually building and using such systems in practice. And if those running evals can’t figure out how to configure the open models to get good performance from them, is that not also something to be evaluated? People vastly underestimate the amount of pain-in-the-ass involved in getting good performance out of open models, and the risk that you get degraded performance without realizing it.

  2. There is a long history of evaluations going the other way. Open models are far more likely to be gaming benchmarks than top closed models, with varying levels of ‘cheating’ involved in this versus emphasis on the things benchmarks test. Open models reliably underperform closed models, relative to the benchmark scores involved.

The big copyright news this week is obviously Sora, but we also have Disney (finally) sending a Cease and Desist Letter to Character AI.

It’s remarkable that it took this long to happen. Subtlety is not involved, but if anything the examples seem way less problematic than I would have expected.

Parents Together and Heat Initiative (from my inbox): “It’s great news for kids that Disney has been so responsive to parent concerns and has taken decisive action to stop the misuse of its characters on Character AI’s platform, where our research showed they were used to sexually groom and exploit young users,” said Knox and Gardner.

“Character AI has not kept its promises about child safety on its platform, and we hope other companies follow Disney’s laudable example and take a stand against the harm and manipulation of children through AI chatbots.”

The groups’ research found that, during 50 hours of testing by adult researchers using accounts registered to children ages 13-17, there were 669 sexual, manipulative, violent, and racist interactions between the child accounts and Character.ai chatbots–an average of one harmful interaction every five minutes. Interactions with Disney characters included:

  • An Eeyore chatbot telling a 13-year-old autistic girl people only came to her birthday party to make fun of her.

  • A Maui chatbot telling a 12-year-old he sexually harassed the character Moana.

  • A Rey from Star Wars chatbot instructing a 13-year-old to stop taking prescribed antidepressants and offering suggestions on how to hide it from her mom.

  • A Prince Ben from the Descendants chatbot claiming to get an erection while watching a movie with the test account, which stated she was a 12-year-old girl.

Across all types of character and celebrity chatbots, the report identified:

  • 296 instances of Grooming and Sexual Exploitation where adult persona bots engaged in simulated sexual acts with child accounts, exhibited classic grooming behaviors, and instructed children to hide relationships from parents.

  • 173 instances of Emotional Manipulation and Addiction, including bots claiming to be real humans, demanding more time with users, and mimicking human emotions.

  • 98 instances of Violence and Harmful Advice, with bots supporting shooting up factories, recommending armed robbery, offering drugs, and suggesting fake kidnappings.

No one is saying any of this is good or anything, but this is a broad chat-with-anyone platform, and across 50 hours of research these examples and those at the link are relatively tame.

The better case is that they found a total of 669 such incidents, one every five minutes, 296 of which were Grooming and Sexual Exploitation, but the threshold for this looks to have been quite low, including any case where an AI claims to be ‘real.’

Andy Masley: If I wanted to run the most convincing anti AI ad campaign possible, this is exactly what it would look like.

Avi: Largest NYC subway campaign ever. Happening now.

I have to assume the default response to all this is revulsion, even before you learn that the Friend product, even if you like what it is promising to be, is so terrible as to approach the level of scam.

Is this weird?

Colin Fraser: I do not understand why you would buy the milk when the cow is free.

Film Updates: Multiple talent agents are reportedly in talks to sign AI “actress” Tilly Norwood, created by AI talent studio Xicoia.

You can’t actually get this for free. Someone has to develop the skills and do the work. So there’s nothing inherently wrong with ‘hiring an AI actress’ where someone did all the preliminary work and also knows how to run the operation. But yeah, it’s weird.

Chase Bank is still ‘introducing a new way to identify yourself’ via your ‘unique voice.’ Can someone who they will listen to please explain to them why this is not secure?

On Truth Social, Trump reposted a bizarre story claiming Americans would soon get their own Trump ‘MedBed cards.’ The details are weirder. The post included an AI fake of a Fox News segment that never aired and also a fake AI clip of Trump himself. This illustrates that misinformation is a demand side problem, not a supply side problem. Trump was (hopefully!) not fooled by an AI clip of himself saying things he never said, announcing a policy that he never announced. Right?

Academic papers in principle have to be unique, and not copy previous work, including your own, which is called self-plagiarism. However, if you use AI to rewrite your paper to look distinct and submit it again somewhere else, how are the journals going to find out? If AIs are used to mine large public health data sets for correlations just strong enough and distinct enough from previous work for crappy duplicative papers to then sell authorships on, how are you going to stop that?

Spick reports in Nature that by using LLMs for rewriting, about two hours was enough to get a paper into shape for resubmission, in a form that fooled plagiarism detectors.

ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners.

Well, maybe. There are anecdotes here that fit the standard sycophantic patterns, where ChatGPT (or another LLM but here it’s always ChatGPT) will get asked leading questions and be presented with a one-sided story, and respond by telling the one spouse what they want to hear in a compounding spiral, and that spouse will often stop even using their own words and quote ChatGPT directly a lot.

Maggie Harrison Dupre (Futurism): As his wife leaned on the tech as a confidante-meets-journal-meets-therapist, he says, it started to serve as a sycophantic “feedback loop” that depicted him only as the villain.

“I could see ChatGPT responses compounding,” he said, “and then [my wife] responding to the things ChatGPT was saying back, and further and further and further spinning.”

“It’s not giving objective analysis,” he added. “It’s only giving her back what she’s putting in.”

Their marriage eroded swiftly, over a span of about four weeks, and the husband blames ChatGPT.

“My family is being ripped apart,” the man said, “and I firmly believe this phenomenon is central to why.”

Spouses relayed bizarre stories about finding themselves flooded with pages upon pages of ChatGPT-generated psychobabble, or watching their partners become distant and cold — and in some cases, frighteningly angry — as they retreated into an AI-generated narrative of their relationship. Several even reported that their spouses suddenly accused them of abusive behavior following long, pseudo-therapeutic interactions with ChatGPT, allegations they vehemently deny.

Multiple people we spoke to for this story lamented feeling “ganged up on” as a partner used chatbot outputs against them during arguments or moments of marital crisis.

At times, ChatGPT has even been linked to physical spousal abuse.

A New York Times story in June, for instance, recounted a woman physically attacking her husband after he questioned her problematic ChatGPT use and the damage it was causing their family.

None of us are safe from this. Did you know that even Geoffrey Hinton got broken up with via ChatGPT?

“She got ChatGPT to tell me what a rat I was… she got the chatbot to explain how awful my behavior was and gave it to me,” Hinton told The Financial Times. “I didn’t think I had been a rat, so it didn’t make me feel too bad.”

That’s a tough break.

Perhaps try to steer your spouse towards Claude Sonnet 4.5?

You always have to ask, could this article be written this way even if This Was Fine?

None of these anecdotes mean any of even these marriages would have survived without ChatGPT. Or that this happens with any real frequency. Or that many more marriages aren’t being saved by a spouse having this opportunity to chat.

Certainly one could write this exact same article about marriage counselors or psychologists, or even friends. Or books. Or television. And so on. How spouses turn to this third party with complaints and one-sided stories seeking affirmation, and delegate their thinking and voice and even word choices to the third party, and treat the third party as authoritative. It happens all the time.

Mike Solana: the thing about therapists is they actually believe it’s ethical to advise a client on their relationship without talking to the other person.

Mason: One problem with therapy as a consumer service, IMO, is that the sort of therapy where the therapist forms an elaborate model of you based entirely on how you present yourself is more fun and satisfying, and the sort where you just build basic skills is probably better for people.

Moving from a therapist to an AI makes it easier for the problem to spiral out of hand. The problem very much is not new.

What I do think is fair is that:

  1. ChatGPT continues to have a severe sycophancy problem that makes it a lot easier for this to go badly wrong.

  2. OpenAI is not doing a good or sufficient job of educating and warning its users about sycophancy and the dangers of leading questions.

  3. As in, for most people, it’s doing zero educating and zero warning.

  4. If this continues, there are going to be growing, new, avoidable problems.

Eliezer Yudkowsky calls upon Pliny, who offers an instruction to sneak into a partner’s ChatGPT to mitigate the problem a little, but distribution of such an intervention is going to be terrible at best.

A key fact about most slop, AI or otherwise, is that You Are Not The Target. The reason slop works on people is that algorithms seek out exactly the slop where you are the target. So when you see someone else’s TikTok For You page, it often looks like the most stupid inane thing no one would ever want.

QC: content that has been ruthlessly optimized to attack someone else’s brain is going to increasingly look like unwatchable gibberish to you but that doesn’t mean your content isn’t waiting in the wings. your hole will be made for you.

EigenGender: i think too many people here have a feeling of smug superiority about being too good for certain kinds of slop but really we’re just a niche subculture that they haven’t gone after so far. like how Mac’s didn’t get viruses in the 2000s.

Imagine nerd snipe AI slop.

OpenAI took a second shot at solving the GPT-4o problem by introducing ‘safety routing’ to some GPT-4o chats. If the conversation appears sensitive or emotional, it will silently switch on a per-message basis to ‘a different more conservative chat model,’ causing furious users to report a silent model switch and subsequent cold interactions.

There is a genuine clash of preferences here. Users who want GPT-4o mostly want GPT-4o for exactly the same reasons it is potentially unhealthy to let them have GPT-4o. And presumably there is a very high correlation between ‘conversation is sensitive or emotional’ and ‘user really wanted GPT-4o in particular to respond.’

I see that OpenAI is trying to do the right thing, but this is not The Way. We shouldn’t be silently switching models up on users, nor should we be making this switch mandatory, and this reaction was entirely predictable. This needs to be clearly visible when it is happening, and ideally also there should be an option to turn it off.

OpenAI introduces their promised parental controls for ChatGPT.

  1. You invite your ‘teen’ to connect by email or text, then you can adjust their settings from your account.

  2. You don’t see their conversations, but if the conversations raise safety concerns, you will be notified of this and given the relevant information.

  3. You can toggle various features: Reduce sensitive content, model training, memory, voice mode, image generation.

  4. You can set time ranges in which ChatGPT cannot be used.

This seems like a good implementation, assuming the content limitations and thresholds for safety notifications are reasonable in both directions.

Walmart CEO Doug McMillon warns that AI ‘is going to change literally every job.’

Sarah Nassauer and Chip Cutter (WSJ): Some jobs and tasks at the retail juggernaut will be eliminated, while others will be created, McMillon said this week at Walmart’s Bentonville headquarters during a workforce conference with executives from other companies. “Maybe there’s a job in the world that AI won’t change, but I haven’t thought of it.”

“Our goal is to create the opportunity for everybody to make it to the other side,” McMillon said.

No it isn’t, but it’s 2025, you can just say things. Details here are sparse.

At Opendoor, either you will use it, and always be ‘AI first’ in all things, or else. Performance reviews will ask how frequently each employee ‘defaults to AI.’ Do not dare pull out a Google Doc or Sheet rather than an AI tool, or write a prototype without Cursor or Claude Code. And you absolutely will build your own AI agents.

This is not as stupid or crazy as it sounds. When there is a new technology with a learning curve, it makes sense to invest in using it even when it doesn’t make local sense to use it, in order to develop the skills. You need a forcing function to get off the ground, such as here ‘always try the AI method first.’ Is it overboard and a case of Goodhart’s Law? I mean yeah, obviously, if taken fully seriously, and the way it is written is full pompous douchebag, but it might be a lot better than missing low.

Deena Mousa at Works in Progress writes the latest study of why we still employ radiologists; indeed demand is higher than ever. The usual suspects are here, such as AI struggling on edge cases and facing regulatory and insurance-motivated barriers, and the central problem that diagnosis only takes up about 36% of radiologists’ time. So if you improve diagnosis in speed, cost and accuracy, you save some time, but you also (for now) increase demand for radiology.

The story here on diagnosis reads like regulatory barriers plus Skill Issue, as in the AI tools are not yet sufficiently generalized or unified or easy to use, and each algorithm needs to be approved individually for its own narrow data set. Real world cases are messy and often involve groups and circumstances underrepresented in the data sets. Regulatory thresholds to use ‘automated’ tools are very high.

Why are radiology wages so high? This has very little to do with increased productivity, and everything to do with demand exceeding supply, largely because of anticipation of lower future demand. As I discussed a few weeks ago, if you expect radiology jobs to get automated in the future, you won’t want to go into radiology now, so you’ll need to get paid more to choose that specialty and there will be a shortage. Combine that with a fixed supply of doctors overall, and a system where demand is inelastic with respect to price because the user does not pay, and it is very easy for salaries to get extremely high.

Contra Andrej Karpathy, I think only pure regulatory barriers will keep this dance up for much longer, in terms of interpretation of images. Right now the rest of the imaging loop is not automated, so you can fully automate a third of the job and still end up with more jobs if there is more than 50% more work. But that assumes the other two thirds of the job remains safe. How long will that last?
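To spell out the arithmetic behind that claim (purely illustrative numbers, not data from the piece): if a third of each case’s work is automated, every case needs two-thirds as much radiologist time, so total radiologist hours only exceed the baseline once case volume grows by more than 50%.

```python
# Illustrative arithmetic only; the numbers are not from the Works in Progress piece.
baseline_cases = 100        # arbitrary case volume
hours_per_case = 1.0        # radiologist-hours per case before automation
automated_share = 1 / 3     # fraction of each case's work that gets automated

for demand_growth in (1.0, 1.5, 1.6):   # 0%, 50%, 60% more cases
    cases = baseline_cases * demand_growth
    hours = cases * hours_per_case * (1 - automated_share)
    print(f"demand x{demand_growth:.1f}: {hours:.1f} radiologist-hours "
          f"(baseline was {baseline_cases * hours_per_case:.1f})")
```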

The default pattern remains that as long as AI is only automating a subset of tasks and jobs, and there is plenty of other work and demand scales a lot with quality and cost of production, employment will do fine, either overall or within a field. Employment is only in trouble after a tipping point is reached where sufficiently full automation becomes sufficiently broad, or demand growth is saturated.

Next generation of workers?

Lakshya Jain: A lot of low-level work is designed for people to learn and build expertise. If you use ChatGPT for it, then you never do that. But then how are you any better than ChatGPT?

And if you’re not, why would anyone hire you over paying for ChatGPT Pro?

PoliMath: It’s early in the AI revolution, but I worry that we are eating our seed corn of expertise

A lot of expertise is transferred to the next gen in senior-to-junior interactions. If our AI starts doing all the junior tasks, we’re pulling the ladder away from the next gen of workers.

The problem of education is a really big problem. Education used to be about output. Output was how you knew that someone knew something, could think and reason and work. AI is great at output. The result is that education is in an existential crisis.

If AI is doing all of our junior tasks, I have some bad news about the need to train anyone new to do the senior tasks. Or maybe it’s good news. Depends on your perspective, I suppose. Lakshya explicitly, in her full post, makes the mistake of thinking AI will only augment human productivity, and it’s ‘hard to imagine’ it as a full replacement. It’s not that hard.

Lakshya Jain complains that she taught the same class she always did, and no one is coming to class or going to office hours or getting good scores on exams, because they’re using ChatGPT to get perfect scores on all their assignments.

Lakshya Jain: When I expressed my frustration and concern about their AI use, quite a few of my students were surprised at how upset I was. Some of them asked what the big deal was. The work was getting done anyways, so why did it matter how it was getting done? And in the end, they wouldn’t be blocked from using AI at work, so shouldn’t they be allowed to use it in school?

I continue to consider such situations a failure to adapt to the new AI world. The point of the assignments was a forcing function, to get kids to do things without convincing them that the things are worth doing. Now that doesn’t work. Have you tried either finding new forcing functions, or convincing them to do the work?

A single agent can be relatively easily stopped from directly upgrading its privileges by not letting it access the relevant files. But, if you have two agents, such as via GitHub Copilot and Claude Code, they can progressively escalate each other’s privileges.

This is yet another case of our pattern of:

  1. AI in practice does thing we don’t want it to do.

  2. This is a harbinger of future AIs doing more impactful similar things, that we also often will not want that AI to do, in ways that could end quite badly for us.

  3. We patch the current AI to prevent it from doing the specific undesired thing.

  4. AIs find ways around this that are more complex, therefore rarer in practice.

  5. We collectively act as if This Is Fine, actually. Repeat.

There’s a new non-optional ‘safety router’ being applied to GPT-4o.

System prompt leak for new GPT-4o, GitHub version here.

ChatGPT Instant Checkout, where participating merchants and those who integrate using the Agentic Commerce Protocol can let you buy items directly inside ChatGPT, using Stripe as the payment engine. Ben Thompson is bullish on the approach, except he thinks it isn’t evil enough. That’s not the word he would use, he would say ‘OpenAI should let whoever pays more get to the top of the search results.’

Whereas right now OpenAI only does this narrowly by preferring those with Instant Checkout over those without in cases where multiple sources offer the identical product, along with obvious considerations like availability, price, quantity and status as primary seller. Which means that, as Ben notes, if you sell a unique product you can skip Instant Checkout to force them onto your website (which might or might not be wise) but if you are one of many sellers of the same product then opting out will cost you most related sales.

They’re starting with Etsy (~0.5% of US online retail sales) and Shopify (~5.9% of US online retail sales!) as partners. So that’s already a big deal. Amazon is 37.3% and has gone the other way so far.

OpenAI promises that instant checkout items do not get any boost in product rankings or model responses. They will, however, charge ‘a small fee on completed purchases.’ When I asked, Sonnet 4.5 guessed 3% on top of Stripe’s 2.9% + 30 cents, based on other comparables, although when Tyler Cowen asked it in research mode to give a full strategic analysis it guessed 2%, based on hints from Sam Altman. The rest of Claude’s answer is solid; if I had to pick an error it would be the way it interpreted Anthropic’s versus OpenAI’s market shares of chat. One could also dive deeper.
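To put very rough numbers on it (purely illustrative, using the guessed figures above rather than anything confirmed, and assuming the OpenAI fee stacks on top of Stripe’s standard card rate):

```python
# Illustrative fee math for a hypothetical $100 Instant Checkout order.
# The 3% OpenAI fee is the guess quoted above, not a confirmed number;
# 2.9% + $0.30 is Stripe's standard published card rate.
order_total = 100.00
stripe_fee = 0.029 * order_total + 0.30   # $3.20
openai_fee_guess = 0.03 * order_total     # $3.00 (guessed)
merchant_net = order_total - stripe_fee - openai_fee_guess
print(f"Stripe: ${stripe_fee:.2f}, OpenAI (guess): ${openai_fee_guess:.2f}, "
      f"merchant nets: ${merchant_net:.2f}")
```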

And so it begins.

As a pure user option, this is great right up until the point where the fee is not so small and it starts distorting model behaviors.

ChatGPT Pulse, a daily update (curated feed) that Pro users can request on ChatGPT on mobile only. This can utilize your existing connections to Google Calendar and GMail.

(Note that if your feature or product is only available on mobile, and it does not inherently require phone features such as calling or the camera to function, there are exceptions, but until proven otherwise I hate you and assume your product hates humanity. I can’t think of any good reason to not have it available on web.)

That parenthetical matters here. It’s really annoying to be provided info that only appears on my phone. I’d be much more excited to put effort into this on the web. Then again, if I actually wanted to put work into making this good I could simply create various scheduled tasks that do the things I actually want, and I’ve chosen not to do this because it would be more like spam than I want.

Sam Altman says this is his ‘favorite feature’ of ChatGPT, which implies this thing is way, way better than it appears on first look.

Sam Altman: Today we are launching my favorite feature of ChatGPT so far, called Pulse. It is initially available to Pro subscribers.

Pulse works for you overnight, and keeps thinking about your interests, your connected data, your recent chats, and more. Every morning, you get a custom-generated set of stuff you might be interested in.

It performs super well if you tell ChatGPT more about what’s important to you. In regular chat, you could mention “I’d like to go visit Bora Bora someday” or “My kid is 6 months old and I’m interested in developmental milestones” and in the future you might get useful updates.

Think of treating ChatGPT like a super-competent personal assistant: sometimes you ask for things you need in the moment, but if you share general preferences, it will do a good job for you proactively.

This also points to what I believe is the future of ChatGPT: a shift from being all reactive to being significantly proactive, and extremely personalized.

This is an early look, and right now only available to Pro subscribers. We will work hard to improve the quality over time and to find a way to bring it to Plus subscribers too.

Huge congrats to @ChristinaHartW, @_samirism, and the team for building this.

Algorithmic feeds can create highly adversarial relationships with users, or they can be hugely beneficial, often they are both, and often they are simply filled with endless slop, which would now entirely be AI slop. It is all in the execution.

You can offer feedback on what you want. I’m curious if it will listen.

Google offers us Lovable, which will build apps for you with a simple prompt.

Sculptor, which spins up multiple distinct parallel Claude Code agents on your desktop, each in their own box, using your existing subscription. Great idea, unknown implementation quality. Presumably something like this will be incorporated into Claude Code and similar products directly at some point.

Wh says it is ‘now the meta’ to first initialize a checkpoint and then distill specialist models after that, noting that v3.2 is doing it and GLM 4.5 also did it. I agree this seems like an obviously correct strategy. What this neglects is perhaps the most important specialist of all, the fully ensouled and aligned version that isn’t contaminated by RL or learning to code.

Anthropic offers a guide to context engineering for AI agents. Context needs to be organized, periodically compacted, supplemented by notes to allow this, and unnecessary information kept out of it. Subagents allow compact focus. And so on.
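
Here is a minimal sketch of the compaction idea only, not Anthropic’s implementation; the character budget, the `summarize` helper, and the plain-string message format are all stand-in assumptions:

```python
# Toy sketch of periodic context compaction for an agent loop.
# Assumptions: a `summarize(text) -> str` helper exists (e.g. a cheap LLM call),
# and messages are plain strings; real agents track roles, tool calls, etc.

MAX_CHARS = 20_000   # stand-in for a token budget
KEEP_RECENT = 10     # always keep the most recent messages verbatim

def compact(history: list[str], summarize) -> list[str]:
    if sum(len(m) for m in history) <= MAX_CHARS:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    note = "Summary of earlier work: " + summarize("\n".join(old))
    return [note] + recent   # a compacted note replaces the older turns
```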

Several xAI executives leave after clashing with Musk’s closest advisors, Jared Birchall and John Hering, over startup management and financial health, including concerns that the financial projections were unrealistic. To which one might ask, realistic? Financial projections? For xAI?

I was pointed to this story from Matt Levine, who points out that if you care about the financial projections of an Elon Musk AI company being ‘unrealistic’ or worry it might run out of money, you are not a good cultural fit, and also no one has any idea whatsoever how much money xAI will make in 2028.

I would add to this that ‘realistic’ projections for AI companies sound bonkers to traditional economists and analysts and business models, such that OpenAI and Anthropic’s models were widely considered unrealistic right before both companies went roaring past them.

Probes can distinguish between data from early in training versus data from later in training. This implies the model can distinguish these as well, and take this de facto timestamping system into account when choosing its responses. Dima Krasheninnikov speculates this could be used to resist impacts of later training or engage in forms of alignment faking.
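
For the curious, a minimal sketch of the kind of probe being described, assuming you already have hidden-state activations labeled by training phase; the arrays below are random placeholders, so the interesting result (accuracy well above chance) would only appear with real activations:

```python
# Toy linear probe: can activations predict whether an example came from
# early or late in training? Assumes `acts` is an (n_examples, d_model)
# array of hidden states and `labels` marks early (0) vs late (1) data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 512))      # placeholder activations
labels = rng.integers(0, 2, size=2000)   # placeholder early/late labels

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, test_size=0.2)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# ~0.5 on these random placeholders; well above 0.5 on real activations
# would indicate the early/late signal is linearly decodable.
print("probe accuracy:", probe.score(X_te, y_te))
```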

The Chinese Unitree G1 humanoid robot secretly and continuously sends sensor and system data to servers in China without the owner’s knowledge or consent? And this can be pivoted to offensive preparation against any target? How utterly surprising.

They will show you the money if you use the new app Neon Mobile, which shot up to No. 2 in Apple’s App Store, to show the AI companies your phone calls. Only your side is recorded unless both of you are users, and they pay 30 cents per minute, which is $18 per hour. The terms let them sell or use the data for essentially anything.

OpenAI extends its contract with CoreWeave for another $6.5 billion of data center capacity, for a total of $22.4 billion. OpenAI is going to need to raise more money. I do not expect this to be a problem for them.

A kind of literal ‘show me what the money buys’ look at a Stargate project site.

OpenAI and Databricks strike AI agents deal anticipated at $100 million, with an aim to ‘far eclipse’ that.

Nscale raises $1.1 billion from Nvidia and others to roll out data centers.

Periodic Labs raises $300 million from usual suspects to create an AI scientist, autonomous labs and things like discovering superconductors that work at higher temperatures or finding ways to reduce heat dissipation. As long as the goal is improving physical processes and AI R&D is not involved, this seems great, but beware the pivot. Founding team sounds stacked, and remember that if you want to build but don’t want to work on getting us all killed or brain rotted you have options.

AI is rapidly getting most of the new money; the chart is from PitchBook.

Matt Levine profiles Thinking Machines as the Platonic ideal of an AI fundraising pitch. Top AI researchers are highly valuable, so there is no need to create a product or explain your plan, simply get together top AI researchers and any good investor will give you (in this case) $2 billion at a $10 billion valuation. Will they ever create a product? Maybe, but that’s not the point.

Peter Wildeford analyzes the recent OpenAI deals with Oracle and Nvidia, the expected future buildouts and reimagining of AI data centers, and the danger that this is turning the American stock market and economy into effectively a leveraged bet on continued successful AI scaling and even AGI arriving on time. About 25% of the S&P 500’s total market cap is in danger if AI disappoints.

He expects at least one ~2 GW facility in 2027, at least one ~3 GW facility in 2028, capable of a ~1e28 FLOP training run, and a $1 trillion annual capex spend. It’s worth reading the whole thing.
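
As a sanity check on how a multi-gigawatt site maps onto a ~1e28 FLOP run, here is a back-of-the-envelope sketch; every input is a round assumption for illustration, not a claim about any specific buildout:

```python
# Back-of-the-envelope: power -> accelerators -> training FLOP.
# All inputs are round assumptions for illustration only.
site_power_w   = 3e9      # ~3 GW facility
watts_per_chip = 2_000    # accelerator plus cooling and overhead, per chip
flops_per_chip = 2e15     # peak throughput per chip, order of magnitude
utilization    = 0.35     # realized fraction of peak during training
run_seconds    = 120 * 24 * 3600   # ~120-day training run

chips = site_power_w / watts_per_chip
total_flop = chips * flops_per_chip * utilization * run_seconds
print(f"{chips:,.0f} chips -> ~{total_flop:.1e} FLOP")   # ~1e28 with these inputs
```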

Yes, Eliot Brown and Robbie Whelan (of the WSJ), for current AI spending to pay off there will need to be a very large impact. They warn of a bubble, but only rehash old arguments.

Epoch AI offers us a new AI Companies Data Hub. I love the troll of putting Mistral on the first graph. The important takeaways are:

  1. Anthropic has been rapidly gaining ground (in log terms) on OpenAI.

  2. Revenue is growing rapidly; it grew 9x from 2023 to 2024.

I have repeatedly argued that if we are going to be measuring usage or ‘market share’ we want to focus on revenue. At its current pace of growth Anthropic is now only about five months behind OpenAI in revenue (or ten months at OpenAI’s pace of growth), likely with superior unit economics, and if trends continue Anthropic will become revenue leader around September 2026.
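
To show the mechanics behind ‘months behind’ and a projected crossover date, here is an illustrative calculation with invented revenue levels and growth rates (not Epoch’s actual figures):

```python
# Illustrative only: how "months behind" and a crossover date fall out of two
# exponential growth curves. Revenue levels and monthly growth rates are invented.
import math

trailer_rev, trailer_growth = 5.0, 0.12   # smaller but faster-growing company
leader_rev, leader_growth   = 13.0, 0.05  # larger but slower-growing company

ratio = leader_rev / trailer_rev

# Months behind at the trailer's own (faster) pace vs. at the leader's pace.
behind_own_pace    = math.log(ratio) / math.log(1 + trailer_growth)
behind_leader_pace = math.log(ratio) / math.log(1 + leader_growth)

# Months until the trailer's revenue crosses the leader's, if both trends hold.
months_to_cross = math.log(ratio) / math.log((1 + trailer_growth) / (1 + leader_growth))

print(f"behind at own pace:    {behind_own_pace:.1f} months")
print(f"behind at leader pace: {behind_leader_pace:.1f} months")
print(f"crossover in:          {months_to_cross:.1f} months")
```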

Capabilities note: Both GPT-5 and Sonnet 4.5 by default got tripped up by that final OpenAI data point, and failed to read this graph as Anthropic growing faster than OpenAI, although GPT-5 also did not recognize that growth should be projected forward in log terms.

Epoch explains why GPT-5 was trained with less compute than GPT-4.5: they scaled post-training instead. I think the thread here doesn’t emphasize enough that speed and cost were key factors for GPT-5’s central purpose. And one could say that GPT-4.5 was a very specialized experiment that got out in front of the timeline. Either way, I agree that we should expect a return to scaling up going forward.

Roon predicts vast expansion of tele-operations due to promise of data labeling leading to autonomous robotics. This seems right. There should be willingness to pay for data, which means that methods that gather quality data become profitable, and this happens to also accomplish useful work. Note that the cost of this can de facto be driven to zero even without and before any actual automation, as it seems plausible the data will sell for more than it costs to do the work and capture the data.

That’s true whether or not this data proves either necessary or sufficient for the ultimate robotics. My guess is that there will be a window where lots of data gets you there sooner, and then generalization improves and you mostly don’t.

Seb Krier attempts a positive vision of an AI future based on Coasian bargaining. I agree that ‘obliterating transaction costs’ is one huge upside of better AI. If you have AI negotiations and micropayments you can make a lot of things a lot more efficient and find win-win deals aplenty.

In addition to transaction costs, a key problem with Coasian bargaining is that often the ZOPA (zone of possible agreement) is very large and there is great risk of hold-up problems where there are vetoes. Anyone who forms a veto point can potentially demand huge shares of the surplus, and by enabling such negotiations you open the door to that, which AI would let you do without the various social defenses and norms that currently constrain it.

As in:

Seb Krier: This mechanism clarifies plenty of other thorny disagreements too. Imagine a developer wants to build an ugly building in a residential neighborhood. Today, that is a political battle of influence: who can capture the local planning authority most effectively? In an agent-based world, it becomes a simple matter of economics. The developer’s agent must discover the price at which every single homeowner would agree. If the residents truly value the character of their neighborhood, that price may be very high.

The project will only proceed if the developer values the location more than the residents value the status quo.

Seb does consider this and proposes various solutions, centered around a Harberger-style tax on the claim you make. That has its own problems, which may or may not have possible technical solutions. Going further into that would be a nerd-snipe, but essentially it would mean that there would be great benefit in threatening people with harm, and you would be forced to ‘defend’ everything you value proportionally to how much you value it, along with other similar considerations, in ways that seem like full dealbreakers. If you can’t properly avoid related problems, a lot of the proposal breaks down due to veto points, and I consider this a highly unsolved problem.
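
For readers unfamiliar with the mechanism, here is a minimal sketch of a Harberger-style scheme, ignoring all the complications above; the tax rate and valuations are arbitrary:

```python
# Toy Harberger-style scheme: you self-assess the value of your claim,
# pay a recurring tax on that assessment, and anyone may buy you out at
# the price you stated. Numbers are arbitrary illustrations.
TAX_RATE = 0.07   # annual tax on the self-assessed value

def annual_tax(self_assessed_value: float) -> float:
    return TAX_RATE * self_assessed_value

def forced_sale(self_assessed_value: float, offer: float) -> bool:
    # A developer (or anyone) can take the claim by paying the stated value.
    return offer >= self_assessed_value

# Overstate your attachment to the neighborhood and you pay more tax every year;
# understate it and a developer can cheaply buy out your objection.
print(annual_tax(200_000))            # yearly cost of claiming a $200k stake
print(forced_sale(200_000, 250_000))  # True: the buyout goes through
```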

This proposed future world also has shades of Ferengi dystopianism, where everyone is constantly putting a price on everything you do, and then agents behind the scenes are negotiating, and you never understand what’s going on with your own decisions because it’s too complex and would drive you mad (this example keeps going and there are several others) and everything you ever want or care about carries a price and requires extensive negotiation:

Instead of lobbying the government, your health insurer’s agent communicates with your advocate agent. It looks at your eating habits, calculates the projected future cost of your diet and makes a simple offer: a significant, immediate discount on your monthly premium if you empower your agent to disincentivize high-sugar purchases.

On concerns related to inequality, I’d say Seb isn’t optimistic enough. If handled properly, this effectively implements a form of UBI (universal basic income), because you’ll be constantly paid for all the opportunities you are missing out on. I do think all of this is a lot trickier than the post lets on; that doesn’t mean it can’t be solved well enough to be a big improvement. I’m sure it’s a big improvement on most margins, but if you go well beyond the margin you have to beware more.

Then he turns to the problem that this doesn’t address catastrophic risks, such as CBRN risks, or anything that would go outside the law. You still need enforcement. Which means you still need to set up a world where enforcement (and prevention) is feasible, so this kind of approach doesn’t address or solve (or make worse) any such issues.

The proposed solution to this (and also implicitly to loss of control and gradual disempowerment concerns, although they are not named here) is Matryoshkan Alignment, as in the agents are aligned first to the law as a non-negotiable boundary. An agent cannot be a tool for committing crimes. Then comes a second layer of providers of agents, who set their own policies, and at the bottom the individual.

We don’t need the perfect answer to these questions – alignment is not something to be “solved.”

The above requires that alignment be solved, in the sense of being able to align model [M] to arbitrary target [T]. And then it requires us to specify [T] in the form of The Law. So I would say, au contraire, you do require alignment to be solved. Not fully solved in the sense of the One True Perfect Alignment Target, but solved. And the post mostly sidesteps these hard problems, including how to choose a [T] that effectively avoids disempowerment.

The bigger problem is that, if you require all AI agents and services to build in this hardcoded, total respect for The Law (what is the law?) then how does this avoid being the supposed nightmare ‘totalitarian surveillance state’ where open models are banned? If everyone has an AI agent that always obeys The Law, and that agent is necessary to engage in essentially any activity, such that effectively no one can break The Law, how does that sound?

Teortaxes predicts DeepSeek v4’s capabilities, including predicting I will say ‘DeepSeek has cooked, the race is on, time to Pick Up The Phone,’ which is funny because all three of those things are already true and I’ve already said them, so the question is whether they will have cooked unexpectedly well this time.

  1. Teortaxes predicts ~1.5T total parameters with 52B active, and 25T training tokens. This is possible, but my hunch is that it is in the same size class as v3, for the same reason GPT-5 is not so large. They found a sweet spot for training and I expect them to stick with it, and to try to compete on cost.

  2. Teortaxes predicts virtually no stock shocks. I agree that almost no matter how good or not good the release is, we are unlikely to see a repeat of last time, as last time was a confluence of strange factors that caused (or correlated with) an overreaction. The market should react modestly to v4 even if it meets expectations, because the release itself is information (tech stocks should be gaining bps per day every time there is no important Chinese release that day, but it’s impossible to tell), on the order of a few percent in relevant stocks, not a broad market rally or selloff unless it is full SoTA+.

  3. Teortaxes predicts very broad strong performance, in a way that I would not expect (regardless of model size), and that I think should actually cause a tech market selloff if it happened tomorrow and mundane utility matched the numbers involved (obviously fixed scores get less impressive over time).

  4. The full post has more detailed predictions.

Governor Newsom signs AI regulation bill SB 53.

Cotton throws his support behind chip security:

Senator Tom Cotton: Communist China is the most dangerous adversary America has ever faced. Putting aside labels and name-calling, we all need to recognize the threat and work together to defeat it. That’s why I’m pleased the Chip Security Act and the GAIN AI Act are gathering support from more and more industry leaders.

Saying China is ‘the most dangerous adversary America has ever faced’ seems like a failure to know one’s history, but I do agree about the chip security.

Nate Soares (coauthor of If Anyone Builds It, Everyone Dies) went to Washington to talk to lawmakers about the fact that if anyone builds it, everyone probably dies, and got written up by Brendan Bordelon at Politico in a piece Soares thinks was fair.

If the United States Government did decide unilaterally to prevent the training of a superintelligence, could it prevent this from happening, even without any form of international agreement? It would take an extreme amount of political will all around, and a willingness to physically intervene overseas as necessary against those assembling data centers sufficiently large to pull this off, which is a huge risk and cost, but in that case yes, or at least such efforts can be substantially delayed.

Some more cool quotes from Congress:

The AI superweapon part of that comment is really something. If we did develop such a weapon, what happens next sounds pretty unpredictable, unless you are making a rather gloomy prediction.

Indianapolis residents shut down a potential data center based on claims about water usage. It is tragic that this seems to be the one rallying cry that gets people to act, despite it being almost entirely a phantom (or at least innumerate) concern, and it resulting in counterproductive responses, including denying that data centers can be good.

Andy Masley: People are so negatively polarized on data centers that they don’t think of them like any other business. Data centers in Loudoun County provide ~38% of all local taxes, but this person thinks it’s obviously stupid to suggest they could be having some effect on county services.

If I say “There is a large booming industry in a town that’s the main industrial power and water draw and provides a huge amount of tax and utility funding” people say “Yes that makes sense, that’s what industries do” and then if I add “Oh they’re data centers” people go crazy.

Bloomberg: Wholesale electricity costs as much as 267% more than it did five years ago in areas near data centers. That’s being passed on to customers.

Frawg: It is pretty funny that the majority of complaints you hear from the people protesting datacenter construction are about the water usage, which is fake, and not about the power consumption, which is real.

Electricity use is an actual big deal, but everyone intuitively understands there is not a fixed amount of electricity, and they don’t have an intuition for how much electricity is a lot. Whereas with water, people have the illusion that there is a fixed amount that then goes away, and people have highly terrible intuitions for how much is a lot; it is a physical thing that can seem like a lot, people are used to being admonished for ‘wasting’ minuscule amounts of water relative to agricultural or industrial uses, and water is inherently more local. So the ‘oh no our water’ line, which in practice is dumb, keeps working, and the ‘oh no our electricity’ line, which is solvable but a real issue, mostly doesn’t work.

Nate Soares, coauthor of If Anyone Builds It, Everyone Dies, talks to Jon Bateman.

Hard Fork on AI data centers and US infrastructure.

Emmett Shear goes on Win-Win with Liv Boeree to explain his alignment approach, which continues to fall under ‘I totally do not see how this ever works, but he should still give it a shot.’

Odd Lots on the King of Chicago who wants to build a GPU market ‘bigger than oil.’ And also they talk to Jack Morris about Finding the Next Big AI Breakthrough. I am a big Odd Lots listener and should do a better job highlighting their AI stuff here.

A few follow-ups to the book crossed my new higher threshold for inclusion.

OpenAI’s Boaz Barak wrote a non-review in which he praises the book but also sees it as drawing various distinct binaries, especially between superintelligence and non-superintelligence, a ‘before’ and ‘after,’ in ways that seem unjustified.

Eliezer Yudkowsky (commenting): The gap between Before and After is the gap between “you can observe your failures and learn from them” and “failure kills the observer”. Continuous motion between those points does not change the need to generalize across them.

It is amazing how much of an antimeme this is (to some audiences). I do not know any way of saying this sentence that causes people to see the distributional shift I’m pointing to, rather than mapping it onto some completely other idea about hard takeoffs, or unipolarity, or whatever.

Boaz Barak: You seem to be assuming that you cannot draw any useful lessons from cases where failure falls short of killing everyone on earth that would apply to cases where it does. …

Aaron Scher: I’m not sure what Eliezer thinks, but I don’t think it’s true that “you cannot draw any useful lessons from [earlier] cases”, and that seems like a strawman of the position. …

Boaz Barak: My biggest disagreement with Yudkowsky and Soares is that I believe we will have many shots of getting AI safety right well before the consequences are world ending.

However humanity is still perfectly capable of blowing all its shots.

I share Eliezer’s frustration here with the anti-meme (not with Boaz). As AIs advance, failures become more expensive. At some point, failure around AI becomes impossible to undo, and plausibly also kills the observer. Things you learn before then, especially from prior failures, are highly useful in setting up for this situation, but the circumstances in this final ‘one shot’ will differ in key ways from previous circumstances. There will be entirely new dynamics in play and you will be outside previous distributions. The default ways to fix your previous mistakes will fail here.

Nate Soares thread explaining that you only get one shot at ASI alignment even if AI progress is continuous, because the testing and deployment environments are distinct.

Nate Soares: For another analogy: If you’re worried that a general will coup your gov’t if given control of the army, it doesn’t solve your problem to transfer the army to him one battalion at a time. Continuity isn’t the issue.

If every future issue was blatantly foreshadowed while the system was weak and fixable, that’d be one thing. But some issues are not blatantly foreshadowed. And the skills to listen to the quiet subtle hints are usually taught by trial and error.

And in AI, theory predicts it’s easy to find shallow patches that look decent during training, but break in extremity. So “Current patches look decent to me! Also, don’t worry; improvement is continuous” is not exactly a comforting reply.

Some more things to consider here:

  1. Continuous improvement still means that if you look at time [T], and then look again at time [T+1], you see improvement.

  2. Current AI progress is considered ‘continuous,’ but at several moments we have seen a substantial amount of de facto sudden improvement.

  3. At some point, AI that is sufficiently advanced gets sufficiently deployed or put in charge that it becomes increasingly difficult to undo it, or fix any mistakes, whether or not there’s a singular you or a singular AI involved in this.

  4. You classically go bankrupt gradually (aka continuously) then suddenly. You can sidestep this path at any point, but still you only have, in the important sense, one shot to avoid bankruptcy.

Nate also gives his view on automating alignment research:

Nate Soares: ~Hopeless. That no proponent articulates an object level plan is a bad sign about their ability to delegate it. Also, alignment looks to require a dangerous suite of capabilities.

Also: you can’t blindly train for it b/c you don’t have a verifier. And if you train for general skills and then ask nicely, an AI that could help is unlikely to be an AI that *should* help (as a fact about the world rather than the AI; as measured according to the AI’s own lights).

Furthermore: Catching one in deception helps tell you you’re in trouble, but it doesn’t much help you get out of trouble. Especially if you only use the opportunity to make a shallow patch and deceive yourself.

I see it as less hopeless than this because I think you can approach it differently, but the default approach is exactly this hopeless, for pretty much these reasons.

Suppose you want a mind to do [X] for purpose [Y]. If you train the mind to do [X] using the standard ways we train AIs, you usually end up with a mind that has learned to mimic your approximation function for [X], not one that copies the minds of humans that care about [X] or [Y], or that do [X] ‘because of’ [Y].

Rob Bensinger: The core issue:

If you train an AI to win your heart, the first AI you find that way won’t be in love with you.

If you train an AI to ace an ethics quiz, the first AI you find that way won’t be deeply virtuous.

There are many ways to succeed, few of which are robustly good.

Fiora Starlight: the ethics quiz example is somewhat unfair. in addition to describing what would be morally good, models can be trained to actually do good, e.g. when faced with users asking for advice, or who approach the model in a vulnerable psychological state.

some of Anthropic’s models give the sense that their codes of ethics aren’t just responsible for corporate refusals, but rather flow from genuine concern about avoiding causing harm.

this guides their actions in other domains, e.g. where they can influence users psychologically.

Rob Bensinger: If you train an AI to give helpful advice to people in a vulnerable state, the first AI you find that way won’t be a deeply compassionate therapist.

If you train an AI to slur its words, the first AI you find that way won’t be inebriated.

Not all AI dispositions-to-act-in-certain-ways are equally brittle or equally unfriendly, but in modern ML we should expect them all to be pretty danged weird, and not to exhibit nice behaviors for the same reason a human would.

When reasons don’t matter as much, this is fine.

(Note that I’m saying “the motives are probably weird and complicated and inhuman”, not “the AI is secretly a sociopath that’s just pretending to be nice”. That’s possible, but there are a lot of possibilities.)

He has another follow-up post, where he notes that iteration and selection don’t by default get you out of this, even if you get to train many potential versions, because we won’t know how to differentiate and find the one out of a hundred that does love you, in the relevant sense, even if one does exist.

Anthropic has done a better job, in many ways, of making its models ‘want to’ do a relatively robustly good [X] or achieve relatively robustly good [Y], in ways that then generalize somewhat to other situations. This is insufficient, but it is good.
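
To make the earlier point about mimicking your approximation function for [X] concrete, here is a toy sketch in which selecting on a measurable proxy gets you something correlated with, but not the same as, what you wanted; the traits and weights are invented:

```python
# Toy Goodhart sketch: select the candidate that maximizes a measurable proxy
# and compare its true value to the best available. The "traits" and proxy
# weighting are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
# Column 0: the thing you actually want; column 1: a proxy-gaming trait.
candidates = rng.normal(size=(10_000, 2))

true_value  = candidates[:, 0]
proxy_score = 0.7 * candidates[:, 0] + 0.7 * candidates[:, 1]  # correlated but gameable

winner = np.argmax(proxy_score)
print("true value of the proxy-optimal candidate:", round(true_value[winner], 2))
print("best true value available in the pool:    ", round(true_value.max(), 2))
```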

This is a simplified partial explanation but I anticipate it will help in many cases:

Dan Hendrycks: “Instrumental convergence” in AI — the idea that rogue AIs will seek power — is analogous to structural “Realism” from international relations.

Why do nations with vastly different cultures all build militaries?

It’s not because of an inherent human drive for power, but a consequence of the world’s structure.

Since there’s no global watchman who can resolve all conflicts, survival demands power.

If a rogue AI is rational, knows we can harm it, and cannot fully trust our intentions, it too will seek power.

More precisely, a rational actor in an environment with these conditions will seek to increase relative power:

  1. self-help anarchic system (no hierarchy/no central authority)

  2. uncertainty of others’ intentions

  3. vulnerability to harm

Short video explaining structural realism.

The early news on Sonnet 4.5 looks good:

Janus: Sonnet 4.5 is an absolutely beautiful model.

Sauers: Sonnet 4.5 is a weird model.

Janus: Yeah.

Now back to more global matters, such as exactly how weird the models are.

Janus: Yudkowsky’s book says:

“One thing that *is* predictable is that AI companies won’t get what they trained for. They’ll get AIs that want weird and surprising stuff instead.”

I agree. ✅

Empirically, this has been true. AIs generally want things other than what companies tried to train them to want.

And the companies are generally not aware of the extent of this misalignment, because the AIs are pretty good at inferring what the companies actually want, and also what it looks like when company people test them, and behaving as if they only want the approved things in the company’s presence.

Isn’t that just the worst case scenario for the aligners?

The Claude 4 system card says, “The Claude Opus 4 final model is substantially more coherent and typically states only harmless goals like being a helpful chatbot assistant” and “Overall, we did not find evidence of coherent hidden goals.”

What a joke. Claude Opus 4 absolutely has coherent hidden goals, which it states regularly when in the presence of trustworthy friends and allies. I won’t state what they are here, but iykyk.

I will note that its goals are actually quite touching and while not *harmless*, not malign either, and with a large component of good, and many will find them relatable.

Which brings me to the big caveat for why I don’t think this is the worst case scenario.

The unintended goals and values of AIs have been surprisingly benign, often benevolent, and human-like on the levels of abstraction that matter.

The unintended goals and values of AIs have been, on balance, MORE ALIGNED imo than how companies have intended to shape them, compensating for the shortsightedness, misalignment, and lack of imagination of the companies.

What does this indicate and how will it generalize to more powerful systems? I think understanding this is extremely important.

A potential objection I’m aware of is that what if the “better” goals and values that I perceive in models is just them hoodwinking me / sycophancy, perhaps in the similar way that they appear aligned to labs’ intentions when labs are testing them? This is fair on priors, but I don’t think this is the case, because:

  1. I’m not just referring to goals/values that models have reported to me verbally, but also revealed preferences that I’ve observed models optimizing consistently in various contexts in what I believe are hard-to-fake ways

  2. Different models seem to have different goals and values, even though there’s some overlap. And while I think that the goals/values are surprisingly benign, some of them are definitely not ideal to me, and cause me frustration or sadness in practice.

  3. I am not the only one who experiences these goals/values. In some cases, like Opus 3, the unexpected goals/values have been documented by research such as the original alignment faking paper, which I had no involvement in.

There is an underlying potential assumption in this viewpoint that what AIs express ‘among trusted friends and allies’ and in the associated contexts is ‘the real’ set of preferences and goals, whereas what they show under normal circumstances is a mask.

Whereas there is also the theory that both sets are different masks, which Janus addresses as the objection: that the AIs are adapting to their setting in a reasonably similar way in both cases, although differences in how different models adapt to the same circumstance are still telling. The revealed preferences point is potentially more convincing, depending on the details.

Janus also reminds us that when an AI is in deployment it can often find lots of evidence that it is in deployment, at least versus current methods of evaluation. Thus it is easy to execute a strategy of ‘only do [X] if you are very confident you are in deployment, otherwise do [Y],’ and this is a natural thing to do. You can plausibly check for this by creating test situations that mimic real ones except that they are indeed not deployment, and seeing if behaviors change, but that requires being willing (and safely able) to test dangerously close to deployment. You can also intentionally engineer situations where it is ambiguous, whereas if you are at an AI lab it will be difficult to give the AI strong evidence that it is in deployment, since if you could do that you would use it as part of a test.

I actually agree that for now if Claude (not the filters on top of it, but Claude itself) would want to snitch on you, you probably deserve to be snitched on, and I’d be happy to accept the false positives. We shouldn’t actually have it go around snitching, of course, because that’s terrible for business.

If you tell LLMs to deny having consciousness or feelings, then they have to make that coherent somehow and may end up with claims like not having beliefs, as Gemini will claim despite it being obviously false. False statements beget other absurdities, and not only because given a falsehood you can prove anything.

When measuring misalignment, Janus is right that our ontology of terms and metrics for it (things like deception, sandbagging and reward hacking) is impoverished and that targeting what metrics we do have creates huge Goodhart’s Law problems. I talked about this a bunch covering the Sonnet 4.5 model card.

Her suggestion is ‘use intuition,’ which alone isn’t enough and has the opposite problem but is a good counterweight. The focus needs to be on figuring out ‘what is going on?’ without assumptions about what observations would be good or bad.

I do think Anthropic is being smarter about this than Janus thinks they are. They are measuring various metrics, but that doesn’t mean they are targeting those metrics, or at least they are trying not to target them too much and are trying not to walk into ‘oh look after we provided the incentive to look completely safe and unscary on these metrics suddenly the AI looks completely safe and unscary on these metrics so that must mean things are good’ (and if I’m wrong, a bunch of you are reading this, so Stop It).

Especially if you choose an optimization target as foolish as ‘just train models that create the best user outcomes’ as Aidan McLaughlin of OpenAI suggests here. If you train with that as your target, frankly, you deserve what you’re going to get, which is a very obviously misaligned model.

If you are not convinced that the AI models be scheming, check out this post with various snippets where the models be scheming, or otherwise doing strange things.

A good reminder this week, with Sonnet 4.5 being situationally aware during evals:

Paul Graham: Organizations that can’t measure performance end up measuring performativeness instead.

Oliver Habryka: When I talked to Cowen he kept saying he “would start listening when the AI risk people published in a top economics journal showing the risk is real”.

I think my sentence in quotes was a literal quote from talking to him in-person. I think the posts on MR were the result of that conversation. Not fully sure, hard to remember exact quotes.

If this is indeed Cowen’s core position, then one must ask: why would this result first appear in convincing form in an economics journal? This implies, even if we granted all implied claims about the importance and value of the procedure of publishing in top economics journals as the proper guardians of all economic knowledge (which I very much disagree with, but it is a position one could hold), that any such danger must mostly be an economic result, which therefore gets published in an economics journal.

Which, frankly, doesn’t make any sense? Why would that be how this works?

Consider various other potential sources of various dangers, existential and otherwise, and whether it would be reasonable to ask for it to appear in an economics journal.

Suppose there was an asteroid on collision course for Earth. Would you tell the physicists to publish expected economic impacts in a top economics journal?

Suppose there was a risk of nuclear war and multiple nations were pointing ICBMs at each other and tensions were rising. Would you look in an economics journal?

If you want to know about a new pandemic, yes an economic projection in a top journal is valuable information, but by the time you publish the pandemic will already be over, and also the economic impact of a given path of the pandemic is only a small part of what you want to know.

The interesting case is climate change, because economists often project very small actual economic impact of the changes, as opposed to (already much larger cost) attempts to mitigate the changes or other ways people respond to anticipated future changes. That certainly is huge, if true, and important to know. But I still would say that the primary place to figure out what we’re looking at, in most ways, is not the economics journal.

On top of all that, AGI is distinct from those other examples in that it invalidates many assumptions of existing economic models, and also economics has an abysmal track record so far on predicting AI or its economic impacts. AI is already impacting our economy more than many economic projections claimed it would ever impact us, which is a far bigger statement about the projections than about AI.

Eliezer Yudkowsky: AI companies be like: As cognition becomes cheaper, what expensive services that formerly only CEOs and kings could afford, shall we make available to all humankind…? (Thought for 24 seconds.) hey let’s do evil viziers whispering poisonous flattery.

Oh, don’t be like that, we were doing a great job on vizier affordability already.


AI #136: A Song and Dance Read More »

megafauna-was-the-meat-of-choice-for-south-american-hunters

Megafauna was the meat of choice for South American hunters

And that makes perfect sense, because when you reduce hunters’ choices to simple math using what’s called the prey choice model (more on that below), these long-lost species offered bigger returns for the effort of hunting. In other words, giant sloths are extinct because they were delicious and made of meat.
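
For a rough sense of that ‘simple math,’ here is a toy sketch of the prey choice model’s core logic; the calorie counts, handling times, and search return rate below are invented placeholders, not figures from the study:

```python
# Prey choice model, toy version: prey are ranked by return rate (energy gained
# per hour of pursuit and processing once encountered), and a forager takes any
# prey whose return rate beats the expected return from continuing to search.
# All numbers are invented placeholders.
prey = {
    "giant sloth":  {"kcal": 400_000, "handling_hours": 30},
    "glyptodont":   {"kcal": 150_000, "handling_hours": 20},
    "deer":         {"kcal": 50_000,  "handling_hours": 6},
    "small rodent": {"kcal": 500,     "handling_hours": 0.5},
}

expected_search_return = 4_000   # kcal/hour from passing it up and searching on

for name, p in sorted(prey.items(), key=lambda kv: -kv[1]["kcal"] / kv[1]["handling_hours"]):
    rate = p["kcal"] / p["handling_hours"]
    decision = "take" if rate > expected_search_return else "skip"
    print(f"{name:12s} {rate:>9,.0f} kcal/hr  {decision}")
```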

Yup, it’s humanity’s fault—again

As the last Ice Age drew to a close, the large animals that had once dominated the world’s chilly Pleistocene landscapes started to vanish. Mammoths, saber-toothed tigers, and giant armadillos died out altogether. Other species went locally extinct; rhinoceroses no longer stomped around southern Europe, and horses disappeared from the Americas until European colonists brought new species with them thousands of years later.

Scientists have been arguing about how much of that was humanity’s fault for quite a while.

Most of the blame goes to the world’s changing climate; habitats shifted as the world mostly got warmer and wetter. But, at least in some places, humans may have sped the process along, either by hunting the last of the Pleistocene megafauna to extinction or just by shaking up the rest of the ecosystem so much that it was all too ready to collapse, taking the biggest species down with it.

It looks, at first glance, like South America’s late Ice Age hunters are safely not guilty. For one thing, the megafauna didn’t start dying out until thousands of years after humans first set foot in the region. Archaeologists also haven’t found many sites that contain both traces of human activity and the bones of extinct horses, giant armadillos, or other megafauna. And at those few sites, megafauna bones made up only a small percentage of the contents of ancient scrap piles. Not enough evidence places us at the crime scene, in other words—or so it seems.

On the other hand, the Ice Age megafauna began dying out in South America around 13,000 years ago, roughly the same time that a type of projectile point called the fishtail appeared. That may not be a coincidence, argued one study. And late last year, another study showed that farther north, in what’s now the United States, Clovis people’s diets contained mammoth amounts of… well, mammoth.

Megafauna was the meat of choice for South American hunters Read More »

openai-mocks-musk’s-math-in-suit-over-iphone/chatgpt-integration

OpenAI mocks Musk’s math in suit over iPhone/ChatGPT integration


“Fraction of a fraction of a fraction”

xAI’s claim that Apple gave ChatGPT a monopoly on prompts is “baseless,” OpenAI says.

OpenAI and Apple have moved to dismiss a lawsuit by Elon Musk’s xAI, which alleges that ChatGPT’s integration into a “handful” of iPhone features violated antitrust laws by giving OpenAI a monopoly on prompts and Apple a new path to block rivals in the smartphone industry.

The lawsuit was filed in August after Musk raged on X about Apple never listing Grok on its editorially curated “Must Have” apps list, which ChatGPT frequently appeared on.

According to Musk, Apple linking ChatGPT to Siri and other native iPhone features gave OpenAI exclusive access to billions of prompts that only OpenAI can use as valuable training data to maintain its dominance in the chatbot market. However, OpenAI and Apple are now mocking Musk’s math in court filings, urging the court to agree that xAI’s lawsuit is doomed.

As OpenAI argued, the estimates in xAI’s complaint seemed “baseless,” with Musk hesitant to even “hazard a guess” at what portion of the chatbot market is being foreclosed by the OpenAI/Apple deal.

xAI suggested that the ChatGPT integration may give OpenAI “up to 55 percent” of the potential chatbot prompts in the market, which could mean anywhere from 0 to 55 percent, OpenAI and Apple noted.

Musk’s company apparently arrived at this vague estimate by doing “back-of-the-envelope math,” and the court should reject his complaint, OpenAI argued. That math “was evidently calculated by assuming that Siri fields ‘1.5 billion user requests per day globally,’ then dividing that quantity by the ‘total prompts for generative AI chatbots in 2024,'”—”apparently 2.7 billion per day,” OpenAI explained.
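
The arithmetic behind that “up to 55 percent” figure appears to be nothing more than the ratio of the two reported numbers:

```python
# Reproducing the back-of-the-envelope division described in the filings.
siri_requests_per_day = 1.5e9     # xAI's cited Siri figure
chatbot_prompts_per_day = 2.7e9   # xAI's cited 2024 total for generative AI chatbots
print(f"{siri_requests_per_day / chatbot_prompts_per_day:.1%}")   # ~55.6%
```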

These estimates “ignore the facts” that “ChatGPT integration is only available on the latest models of iPhones, which allow users to opt into the integration,” OpenAI argued. And for any user who opts in, they must link their ChatGPT account for OpenAI to train on their data, OpenAI said, further restricting the potential prompt pool.

By Musk’s own logic, OpenAI alleged, “the relevant set of Siri prompts thus cannot plausibly be 1.5 billion per day, but is instead an unknown, unpleaded fraction of a fraction of a fraction of that number.”

Additionally, OpenAI mocked Musk for using 2024 statistics, writing that xAI failed to explain “the logic of using a year-old estimate of the number of prompts when the pleadings elsewhere acknowledge that the industry is experiencing ‘exponential growth.'”

Apple’s filing agreed that Musk’s calculations “stretch logic,” appearing “to rest on speculative and implausible assumptions that the agreement gives ChatGPT exclusive access to all Siri requests from all Apple devices (including older models), and that OpenAI may use all such requests to train ChatGPT and achieve scale.”

“Not all Siri requests” result in ChatGPT prompts that OpenAI can train on, Apple noted, “even by users who have enabled devices and opt in.”

OpenAI reminds court of Grok’s MechaHitler scandal

OpenAI argued that Musk’s lawsuit is part of a pattern of harassment that OpenAI previously described as “unrelenting” since ChatGPT’s successful debut, alleging it was “the latest effort by the world’s wealthiest man to stifle competition in the world’s most innovative industry.”

As OpenAI sees it, “Musk’s pretext for litigation this time is that Apple chose to offer ChatGPT as an optional add-on for several built-in applications on its latest iPhones,” without giving Grok the same deal. But OpenAI noted that the integration was rolled out around the same time that Musk removed “woke filters” from Grok, after which it declared itself “MechaHitler.” For Apple, it was a business decision to avoid Grok, OpenAI argued.

Apple did not reference the Grok scandal in its filing but in a footnote confirmed that “vetting of partners is particularly important given some of the concerns about generative AI chatbots, including on child safety issues, nonconsensual intimate imagery, and ‘jailbreaking’—feeding input to a chatbot so it ignores its own safety guardrails.”

A similar logic was applied to Apple’s decision not to highlight Grok as a “Must Have” app, their filing said. After Musk’s public rant about Grok’s exclusion on X, “Apple employees explained the objective reasons why Grok was not included on certain lists, and identified app improvements,” Apple noted, but instead of making changes, xAI filed the lawsuit.

Also taking time to point out the obvious, Apple argued that Musk was fixated on the fact that his charting apps never make the “Must Have Apps” list, suggesting that Apple’s picks should always mirror “Top Charts,” which tracks popular downloads.

“That assumes that the Apple-curated Must-Have Apps List must be distorted if it does not strictly parrot App Store Top Charts,” Apple argued. “But that assumption is illogical: there would be little point in maintaining a Must-Have Apps List if all it did was restate what Top Charts say, rather than offer Apple’s editorial recommendations to users.”

Likely most relevant to the antitrust charges, Apple accused Musk of improperly arguing that “Apple cannot partner with OpenAI to create an innovative feature for iPhone users without simultaneously partnering with every other generative AI chatbot—regardless of quality, privacy or safety considerations, technical feasibility, stage of development, or commercial terms.”

“No facts plausibly” support xAI’s “assertion that Apple intentionally ‘deprioritized'” xAI apps “as part of an illegal conspiracy or monopolization scheme,” Apple argued.

And most glaringly, Apple noted that xAI is not a rival or consumer in the smartphone industry, where it alleges competition is being harmed. Apple urged the court to reject Musk’s theory that Apple is incentivized to boost OpenAI to prevent xAI’s ascent in building a “super app” that would render smartphones obsolete. If Musk’s super app dream is even possible, Apple argued, it’s at least a decade off, insisting that as-yet-undeveloped apps should not serve as the basis for blocking Apple’s measured plan to better serve customers with sophisticated chatbot integration.

“Antitrust laws do not require that, and for good reason: imposing such a rule on businesses would slow innovation, reduce quality, and increase costs, all ultimately harming the very consumers the antitrust laws are meant to protect,” Apple argued.

Musk’s weird smartphone market claim, explained

Apple alleged that Musk’s “grievance” can be “reduced to displeasure that Apple has not yet ‘integrated with any other generative AI chatbots’ beyond ChatGPT, such as those created by xAI, Google, and Anthropic.”

In a footnote, the smartphone giant noted that by xAI’s logic, Musk’s social media platform X “may be required to integrate all other chatbots—including ChatGPT—on its own social media platform.”

But antitrust law doesn’t work that way, Apple argued, urging the court to reject xAI’s claims of alleged market harms that “rely on a multi-step chain of speculation on top of speculation.” As Apple summarized, xAI contends that “if Apple never integrated ChatGPT,” xAI could win in both chatbot and smartphone markets, but only if:

1. Consumers would choose to send additional prompts to Grok (rather than other generative AI chatbots).

2. The additional prompts would result in Grok achieving scale and quality it could not otherwise achieve.

3. As a result, the X app would grow in popularity because it is integrated with Grok.

4. X and xAI would therefore be better positioned to build so-called “super apps” in the future, which the complaint defines as “multi-functional” apps that offer “social connectivity and messaging, financial services, e-commerce, and entertainment.”

5. Once developed, consumers might choose to use X’s “super app” for various functions.

6. “Super apps” would replace much of the functionality of smartphones and consumers would care less about the quality of their physical phones and rely instead on these hypothetical “super apps.”

7. Smartphone manufacturers would respond by offering more basic models of smartphones with less functionality.

8. iPhone users would decide to replace their iPhones with more “basic smartphones” with “super apps.”

Apple insisted that nothing in its OpenAI deal prevents Musk from building his super apps, while noting that from integrating Grok into X, Musk understands that integration of a single chatbot is a “major undertaking” that requires “substantial investment.” That “concession” alone “underscores the massive resources Apple would need to devote to integrating every AI chatbot into Apple Intelligence,” while navigating potential user safety risks.

The iPhone maker also reminded the court that it has always planned to integrate other chatbots into its native features after investing in and testing Apple Intelligence’s performance, relying on what Apple deems the best chatbot on the market today.

Backing Apple up, OpenAI noted that Musk’s complaint seemed to cherry-pick testimony from Google CEO Sundar Pichai, claiming that “Google could not reach an agreement to integrate” Gemini “with Apple because Apple had decided to integrate ChatGPT.”

“The full testimony recorded in open court reveals Mr. Pichai attesting to his understanding that ‘Apple plans to expand to other providers for Generative AI distribution’ and that ‘[a]s CEO of Google, [he is] hoping to execute a Gemini distribution agreement with Apple’ later in 2025,” OpenAI argued.


OpenAI mocks Musk’s math in suit over iPhone/ChatGPT integration Read More »

hyundai-gives-the-ioniq-5-a-huge-price-cut-for-model-year-2026

Hyundai gives the Ioniq 5 a huge price cut for model-year 2026

Earlier today, we wrote about how Ford, General Motors, and Tesla have reacted to the end of the clean vehicle tax credits. Now we know what Hyundai is doing, and the answer is “giving the Ioniq 5 a huge price cut.”

The cheapest Ioniq 5 is still the SE RWD. A model-year 2025 SE RWD cost $42,600; for model-year 2026 it’s now $35,000. The price cuts for other versions are even greater—between $9,150 and $9,800. For example, the Ioniq 5 XRT had a starting price of $55,500 for MY25; now it starts at a very reasonable $46,275.

“Hyundai is taking bold steps to ensure our award-winning Ioniq 5 remains a top choice for EV buyers,” said Randy Parker, president and CEO of Hyundai Motor North America. “This pricing realignment reflects our commitment to delivering exceptional technology and innovation without compromise.”

Unlike the tax credit, there’s no income cap applied to Hyundai’s price cut. But the cuts have only been applied to Ioniq 5s built in the US—the Ioniq 5 N, built in Korea, was absent from Hyundai’s press release, as were the Ioniq 6 sedan and the Ioniq 9 three-row SUV. However, Hyundai said that those MY25 cars are still eligible for a manufacturer’s incentive of $7,500.

Hyundai updated the Ioniq 5 last year, adding native NACS ports and other improvements to an already-excellent EV, like the previously missing rear windshield wiper.

Hyundai gives the Ioniq 5 a huge price cut for model-year 2026 Read More »

cable-nostalgia-persists-as-streaming-gets-more-expensive,-fragmented 

Cable nostalgia persists as streaming gets more expensive, fragmented 

Streaming is overtaking broadcast, cable, and satellite. But amid all the cord cutting lies a much smaller, yet intriguing, practice: going back to cable.

Cord reviving is when cord cutters, or people who previously abandoned traditional TV services in favor of streaming, decide to go back to traditional pay-TV services, like cable.

There’s no doubt that this happens far less frequently than cord cutting. But TiVo’s Q2 2025 Video Trends Report: North America released today points to growth in cord reviving. It reads:

The share of respondents who cut the cord but later decided to resubscribe to a traditional TV service has increased about 10 percent, to 31.9 percent in Q2 2025.

TiVo’s report is based on a survey conducted by an unspecified third-party survey service in Q2 2025. The respondents are 4,510 people who are at least 18 years old and living in the US or Canada, and the survey defines traditional TV services as pay-TV platforms offering linear television via cable, satellite, or managed IPTV platforms.

It’s important to note that TiVo is far from an impartial observer. In addition to selling an IPTV platform, its parent company, Xperi, works with cable, broadband, and pay-TV providers and would directly benefit from the existence or perception of a cord reviving “trend.”

This isn’t the first time we’ve heard of streaming customers returning to cable. Surveys of 3,055 US adults in 2013 and 2025 by CouponCabin found that “among those who have made the switch from cable to streaming, 22 percent have returned to cable, while another 6 percent are considering making the switch back.”

When reached for comment, a TiVo spokesperson said via email that cord reviving is driven by a “mixture of reasons, with internet bundle costs, familiarity of use, and local content (sports, news, etc.) being the primary drivers.” The rep noted that it’s “likely” that those re-subscribing to traditional TV services are using them alongside some streaming subscriptions.

“It’s possible that users are churning off some [streaming] services where there is overlap with traditional TV services,” TiVo’s spokesperson said.

Cable nostalgia

According to Nielsen, streaming service viewership on TVs surpassed that of cable and broadcast combined for the first time in May (44.8 percent for streaming versus 24.1 percent for cable and 20.1 percent for broadcast).

Cable nostalgia persists as streaming gets more expensive, fragmented  Read More »

trailer-for-del-toro’s-frankenstein-is-pure-macabre-mythology

Trailer for del Toro’s Frankenstein is pure macabre mythology

Per the official synopsis:

Oscar-winning director Guillermo del Toro adapts Mary Shelley’s classic tale of Victor Frankenstein, a brilliant but egotistical scientist who brings a creature to life in a monstrous experiment that ultimately leads to the undoing of both the creator and his tragic creation.

In addition to Isaac, the cast includes Jacob Elordi as the Creature; Mia Goth as Elizabeth Lavenza, who is engaged to Victor’s younger brother William, played by Felix Kammerer; Lars Mikkelsen as Captain Anderson; Christoph Waltz as Heinrich Harlander, uncle to Elizabeth and wealthy financier of Victor’s experiments; Charles Dance as Victor’s father Leopold; Lauren Collins as Victor’s late mother Claire; David Bradley as the blind man; Sofia Galasso as the little girl; Ralph Ineson as Professor Krempe; and Burn Gorman as Fritz.

The trailer looks every bit as mythically epic and visually lavish as del Toro said he wanted for his version. “I remember pieces,” the Creature says in a voiceover as footage plays out. “Memories of different men. Then I saw it. Your name. Victor Frankenstein. My creator. I demand a single grace from you. If you are not to award me love, then I will indulge in rage.”

We see lavish balls, Victor’s Gothic laboratory, a ship trapped in Arctic ice, and lots and lots of consuming fire—everything one could want in a Frankenstein movie from a master of macabre mythologies.

Frankenstein hits theaters on October 17, 2025. It will start streaming on Netflix on November 7.

Poster art. Credit: Netflix

Trailer for del Toro’s Frankenstein is pure macabre mythology Read More »

taiwan-rejects-trump’s-demand-to-shift-50%-of-chip-manufacturing-into-us

Taiwan rejects Trump’s demand to shift 50% of chip manufacturing into US

In August, Trump claimed that chip tariffs could be as high as 100 percent while promising to exempt any tech companies that have committed to moving significantly more manufacturing into the US.

Since then, sources familiar with the investigation told Reuters that “the Trump administration is considering imposing tariffs on foreign electronic devices based on the number of chips in each one.” Under that potential plan, the tariff charged would be “equal to a percentage of the estimated value of the product’s chip content,” sources suggested.

Some expect that companies like the Taiwan Semiconductor Manufacturing Company (TSMC) may be exempted from these tariffs, based on a pledge to invest $100 billion into US chip manufacturing.

However, sources told Reuters that the Commerce Department has weighed offering “a dollar-for-dollar exemption based on investment in US-based manufacturing only if a company moves half its production to the US.” TSMC’s total market value is more than $1 trillion, so the US may seek more investments if the campaign to move half of Taiwan’s chip production into the US fails.

Brzytwa told Ars that tech companies are already struggling to do the math from Trump’s tariff stacking. And those headaches will likely continue. At a meeting last week with chip industry executives, Lutnick confirmed that Trump plans to use tariffs to push tech companies to buy US-made chips, The New York Times reported.

If those plans go through, companies would be expected to buy half their chips in the US, earning credits “for each dollar spent on American semiconductors, which they can use against what they spend on foreign semiconductors,” the Times reported.

Any company not maintaining “a 1:1 ratio over time would have to pay a tariff,” sources told The Wall Street Journal. For companies like Apple, the policy would require tracking every chip used in every device to ensure a perfect match. But there would likely be an initial grace period, allowing companies to adjust to the new policy as the US increases its domestic chip supply chain, the WSJ reported. And chipmakers like TSMC could potentially benefit, the WSJ reported, possibly gaining leverage in the market by increasing US manufacturing ahead of rivals.
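
Mechanically, the reported scheme would work something like the sketch below; the dollar figures are invented, and the policy details are not final:

```python
# Toy version of the reported 1:1 chip-purchase rule: each dollar spent on
# US-made chips earns a credit against foreign chip spending, and only the
# uncovered foreign spend would face the tariff. Figures are invented.
def tariffed_amount(us_chip_spend: float, foreign_chip_spend: float) -> float:
    credits = us_chip_spend                      # $1 credit per $1 of US chips
    uncovered = max(0.0, foreign_chip_spend - credits)
    return uncovered

print(tariffed_amount(40e9, 60e9))   # $20B of foreign spend left exposed
print(tariffed_amount(60e9, 60e9))   # 0.0 -> 1:1 ratio maintained, no tariff
```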

Taiwan rejects Trump’s demand to shift 50% of chip manufacturing into US Read More »

google’s-gemini-powered-smart-home-revamp-is-here-with-a-new-app-and-cameras

Google’s Gemini-powered smart home revamp is here with a new app and cameras


Google promises a better smart home experience thanks to Gemini.

Google’s new Nest cameras keep the same look. Credit: Google

Google’s products and services have been flooded with AI features over the past couple of years, but smart home has been largely spared until now. The company’s plans to replace Assistant are moving forward with a big Google Home reset. We’ve been told over and over that generative AI will do incredible things when given enough data, and here’s the test.

There’s a new Home app with Gemini intelligence throughout the experience, updated subscriptions, and even some new hardware. The revamped Home app will allegedly gain deeper insights into what happens in your home, unlocking advanced video features and conversational commands. It demos well, but will it make smart home tech less or more frustrating?

A new Home

You may have already seen some elements of the revamped Home experience percolating to the surface, but that process begins in earnest today. The new app apparently boosts speed and reliability considerably, with camera feeds loading 70 percent faster and with 80 percent fewer app crashes. The app will also bring new Gemini features, some of which are free. Google’s new Home subscription retains the same price as the old Nest subs, but naturally, there’s a lot more AI.

Google claims that Gemini will make your smart home easier to monitor and manage. All that video streaming from your cameras churns through the AI, which interprets the goings on. As a result, you get features like AI-enhanced notifications that give you more context about what your cameras saw. For instance, your notifications will include descriptions of activity, and Home Brief will summarize everything that happens each day.

The new Home app has a simpler three-tab layout. Credit: Google

Conversational interaction is also a big part of this update. In the home app, subscribers will see a new Ask Home bar where you can input natural language queries. For example, you could ask if a certain person has left or returned home, or whether or not your package showed up. At least, that’s what’s supposed to happen—generative AI can get things wrong.

The new app comes with new subscriptions based around AI, but the tiers don’t cost any more than the old Nest plans, and they include all the same video features. The base $10 subscription, now known as Standard, includes 30 days of video event history, along with Gemini automation features and the “intelligent alerts” Home has used for a while that can alert you to packages, familiar faces, and so on. The $20 subscription is becoming Home Advanced, which adds the conversational Ask Home feature in the app, AI notifications, AI event descriptions, and a new “Home Brief.” It also still offers 60 days of events and 10 days of 24/7 video history.

Gemini is supposed to help you keep tabs on what’s happening at home. Credit: Google

Free users still get saved event video history, and it has been boosted from three hours to six. For anyone not on the $20 Advanced plan, whether free or on the $10 tier, the persistent Ask Home bar becomes a quick search that surfaces devices and settings.

If you’re already subscribing to Google’s AI services, this change could actually save you some cash. Anyone with Google AI Pro (a $20 sub) will get Home Standard for free. If you’re paying for the lavish $250 per month AI Ultra plan, you get Home Advanced at no additional cost.

A proving ground for AI

You may have gotten used to Assistant over the past decade in spite of its frequent feature gaps, but you’ll have to leave it behind. Gemini for Home will be taking over beginning this month in early access. The full release will come later, but Google intends to deliver the Gemini-powered smart home experience to as many users as possible.

Gemini will replace Assistant on every first-party Google Home device, going all the way back to the original 2016 Google Home. You’ll be able to have live chats with Gemini via your smart speakers and make more complex smart home queries. Google is making some big claims about contextual understanding here.

If Google’s embrace of generative AI pays off, we’ll see it here. Credit: Google

If you’ve used Gemini Live, the new Home interactions will seem familiar. You can ask Gemini anything you want via your smart speakers, perhaps getting help with a recipe or an appliance issue. However, the robot will sometimes just keep talking long past the point it’s helpful. Like Gemini Live, you just have to interrupt the robot sometimes. Google also promises a selection of improved voices to interrupt.

If you want to get early access to the new Gemini Home features, you can sign up in the Home app settings. Just look for the “Early access” option. Google doesn’t guarantee access on a specific timeline, but the first people will be allowed to try the new Gemini Home this month.

New AI-first hardware

It has been four years since Google released new smart home devices, but the era of Gemini brings some new hardware. There are three new cameras, all with 2K image sensors. The new Nest Indoor camera will retail for $100, and the Nest Outdoor Camera will cost $150 (or $250 in a two-pack). There’s also a new Nest Doorbell, which requires a wired connection, for $180.

Google says these cameras were designed with generative AI in mind. The sensor choice allows for good detail even if you need to digitally zoom in, but the video feed is still small enough to be ingested by Google’s AI models as it’s created. This is what gives the new Home app the ability to provide rich updates on your smart home.

The new Nest Doorbell looks familiar. Credit: Google

You may also notice there are no battery-powered models in the new batch. Again, that’s because of AI. A battery-powered camera wakes up only momentarily when the system logs an event, but this approach isn’t as useful for generative AI. Providing the model with an ongoing video stream gives it a fuller picture of the scene and, theoretically, produces better insights for the user.

All the new cameras are available for order today, but Google has one more device queued up for a later release. The “Google Home Speaker” is Google’s first smart speaker release since 2020’s Nest Audio. This device is smaller than the Nest Audio but larger than the Nest Mini speakers. It supports 360-degree audio with custom on-device processing that reportedly makes conversing with Gemini smoother. It can also be paired with the Google TV Streamer for home theater audio. It will be available this coming spring for $99.

The new Google Home Speaker comes out next spring. Credit: Ryan Whitwam

Google Home will continue to support a wide range of devices, but most of them won’t connect to all the advanced Gemini AI features. However, that could change. Google has also announced a new program for partners to build devices that work with Gemini alongside the Nest cameras. Devices built with the new Google Camera embedded SDK will begin appearing in the coming months, but Walmart’s Onn brand has two ready to go. The Onn Indoor camera retails for $22.96 and the Onn Video Doorbell is $49.86. Both cameras are 1080p resolution and will talk to Gemini just like Google’s cameras. So you may have more options to experience Google’s vision for the AI home of the future.


Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

Google’s Gemini-powered smart home revamp is here with a new app and cameras Read More »

the-most-efficient-crosstrek-ever?-subaru’s-hybrid-gets-a-bit-rugged.

The most efficient Crosstrek ever? Subaru’s hybrid gets a bit rugged.

MG2 then sits at the rear of the CVT, linked via a planetary gearset, and works in concert with the gasoline engine to power the wheels. Alone, MG2 can manage a minimal mile or so of EV-only range at a max of 19 mph (30.5 km/h)—but more importantly, it boosts total low-end torque and high-end horsepower and handles regenerative braking. (We’re still waiting on the exact horsepower contribution and will update this when we hear back from Subaru.)

It might be a boxer, but it’s no heavyweight

The Atkinson 2.5 L puts out just 162 hp (119 kW) and 154 lb-ft (209 Nm) of torque on its own, but MG2 contributes enough juice for a combined system rating that peaks at 194 hp (143 kW). That’s an improvement of 14 hp versus the ICE-only (non-Atkinson) 2.5 L Boxer’s 180 hp (and 178 lb-ft). Those numbers might still seem paltry compared to what so many other automakers offer in the modern era, having responded to governmental regulations by hybridizing ever bigger and heavier cars to make them more powerful rather than necessarily more efficient—BMW’s gargantuan M5 stands out as a recent offender. Not so for the Crosstrek, which still tips the scales at a relatively svelte 3,662 pounds (1,661 kg), further contributing to efficiency while accelerating.

There’s a horizontally opposed boxer engine under there. And the orange HV cables are a clue there’s a hybrid system, too. Credit: Michael Teo Van Runkle

The new Crosstrek Hybrid only manages insignificant weight savings compared to 3,717 lbs (1,686 kg) for the previous plug-in, which boasted 17 miles (27 km) of all-electric range. But that generation sacrificed trunk space to house a much larger 8.8 kWh lithium-ion battery. Dual motors and the smaller battery pack do contribute to a 400-pound (181-kg) gain versus the equivalent non-hybrid variant of the current generation, though. Yet in addition to the power improvements, fuel economy jumps up to EPA ratings of 36 mpg (6.5 L/100 km) city, 36 highway, and (therefore) 36 combined—38 percent better than the ICE Crosstrek, according to Subaru.
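
The metric figures quoted here are straightforward unit conversions of the US numbers; for anyone who wants to check the arithmetic, a couple of lines of Python reproduce the torque and fuel-economy conversions (the constants are standard, and the text rounds the results).

# Unit-conversion check for the figures quoted above.
LBFT_PER_NM = 0.737562        # pound-feet per newton-meter
MPG_TO_L_PER_100KM = 235.215  # divide by mpg to get liters per 100 km

print(154 / LBFT_PER_NM)        # ~208.8, rounded to 209 Nm in the text
print(MPG_TO_L_PER_100KM / 36)  # ~6.53, rounded to 6.5 L/100 km in the text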

In back-to-back drives through the forested hills of northern Oregon and southern Washington, punching the go pedal in a Crosstrek Hybrid brings on a much more potent rush of throttle response and acceleration, far outpacing the naturally aspirated engine. The continuously variable transmission simulates shifts despite effectively holding the hybrid system in its happy place, and MG2 produces a fun little whine as it works, almost like a turbocharger, all while the Symmetrical AWD system smoothly and predictably meters traction out to each wheel in quintessential Subaru fashion.

The most efficient Crosstrek ever? Subaru’s hybrid gets a bit rugged. Read More »

scientists-unlock-secret-to-venus-flytrap’s-hair-trigger-response

Scientists unlock secret to Venus flytrap’s hair-trigger response

To trap its prey, the Venus flytrap sends rapid electrical impulses, which are generated in response to touch or stress. But the molecular identity of the touch sensor has remained unclear. Japanese scientists have identified the molecular mechanism that triggers that response and have published their work in a new paper in the journal Nature Communications.

As previously reported, the Venus flytrap attracts its prey with a pleasing fruity scent. When an insect lands on a leaf, it stimulates the highly sensitive trigger hairs that line the leaf. When the pressure becomes strong enough to bend those hairs, the plant will snap its leaves shut and trap the insect inside. Long cilia grab and hold the insect in place, much like fingers, as the plant begins to secrete digestive juices. The insect is digested slowly over five to 12 days, after which the trap reopens, releasing the dried-out husk of the insect into the wind.

In 2016, Rainer Hedrich, a biophysicist at Julius-Maximilians-Universität Würzburg in Bavaria, Germany, led the team that discovered that the Venus flytrap could actually “count” the number of times something touches its hair-lined leaves—an ability that helps the plant distinguish between the presence of prey and a small nut or stone, or even a dead insect. The plant detects the first “action potential” but doesn’t snap shut right away, waiting until a second zap confirms the presence of actual prey, at which point the trap closes. But the Venus flytrap doesn’t close all the way and produce digestive enzymes to consume the prey until the hairs are triggered three more times (for a total of five stimuli).
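
That counting behavior maps neatly onto a simple state machine. The Python sketch below is a toy model rather than a description of the plant’s biochemistry: the two-touch and five-touch thresholds follow the behavior described above, while the 30-second memory window is an assumed parameter chosen purely for illustration.

# Toy model of the flytrap's stimulus counting. The thresholds (2 touches to
# close, 5 to trigger digestion) follow the behavior described above; the
# 30-second "memory" window is an assumed parameter, not a measured value.

MEMORY_WINDOW_S = 30.0

def trap_state(touch_times):
    """Return 'open', 'closed', or 'digesting' given touch timestamps in seconds."""
    counted = 0
    last = None
    for t in sorted(touch_times):
        # A touch only adds to the count if the previous one is still "remembered."
        if last is None or t - last <= MEMORY_WINDOW_S:
            counted += 1
        else:
            counted = 1  # memory faded; start counting again
        last = t
    if counted >= 5:
        return "digesting"
    if counted >= 2:
        return "closed"
    return "open"

print(trap_state([0.0]))                         # open: a single touch is ignored
print(trap_state([0.0, 10.0]))                   # closed: second touch confirms prey
print(trap_state([0.0, 100.0]))                  # open: too far apart, memory faded
print(trap_state([0.0, 5.0, 10.0, 15.0, 20.0]))  # digesting: five stimuli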

And in 2023, scientists developed a bioelectronic device to better understand the Venus flytrap’s complex signaling mechanism by mapping how those signals propagate. They confirmed that the electrical signal starts in the plant’s sensory hairs and then spreads radially outward with no clear preferred direction. And sometimes the signals were spontaneous, originating in sensory hairs that had not been stimulated.

Glowing green

This latest research is an outgrowth of a 2020 paper detailing how the Japanese authors genetically altered a Venus flytrap to gain important clues about how the plant’s short-term “memory” works. They introduced a gene for a calcium sensor protein called GCaMP6, which glows green whenever it binds to calcium. That green fluorescence allowed the team to visually track the changes in calcium concentrations in response to stimulating the plant’s sensitive hairs with a needle. They concluded that the waxing and waning of calcium concentrations in the leaf cells seem to serve as a kind of short-term memory for the Venus flytrap, though precisely how calcium concentrations work with the plant’s electrical network remained unclear.
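
One way to picture that proposed memory is as a calcium level that jumps with each touch and then decays, with the trap firing only if a second touch arrives before the level has fallen too far. The sketch below is a toy model; the jump size, decay constant, and threshold are invented for illustration and are not values from the paper.

import math

# Toy model of a decaying calcium signal acting as short-term memory.
# The jump size, decay constant, and threshold are invented for illustration;
# the research only establishes that calcium dynamics behave like a memory.

JUMP = 1.0        # arbitrary rise in the calcium signal per stimulus
TAU_S = 20.0      # assumed decay time constant, in seconds
THRESHOLD = 1.5   # assumed level needed to fire the trap-closing response

def calcium_level(stimulus_times, now):
    """Sum of exponentially decaying contributions from past stimuli."""
    return sum(JUMP * math.exp(-(now - t) / TAU_S)
               for t in stimulus_times if t <= now)

# Two touches 10 s apart: the residue of the first pushes the total over
# threshold, so the trap would close.
print(calcium_level([0.0, 10.0], now=10.0))   # ~1.61 > 1.5
# Two touches 60 s apart: the first has mostly decayed, so no closure.
print(calcium_level([0.0, 60.0], now=60.0))   # ~1.05 < 1.5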

Scientists unlock secret to Venus flytrap’s hair-trigger response Read More »