Features


Zero grip, maximum fun: A practical guide to getting into amateur ice racing


Where we’re racing, we don’t need roads.

A studded winter tire on a blue Subaru WRX

To drive on ice, you just need the right tires. Credit: Tim Stevens

In Formula One, grip is everything. The world’s best engineers devote their careers to designing cars that maximize downforce and grip to squeeze every bit of performance out of a set of four humble tires. These cars punish their drivers by slinging them at six Gs through corners and offer similar levels of abuse in braking.

It’s all wildly impressive, but I’ve long maintained that those drivers are not the ones having the most fun. When it comes to sheer enjoyment, grip is highly overrated, and if you want proof of that, you need to try ice racing.

Should you be lucky enough to live somewhere that gets cold enough consistently enough, all you need is a good set of tires and a car that’s willing and able. That, of course, and a desire to spend more time driving sideways than straight. I’ve been ice racing for well over 20 years now, and I’m here to tell you that there’s no greater thrill on four wheels than sliding through a corner a few inches astern of a hard-charging competitor.

Here’s how you can get started.

A blue Subaru WRX STI on the ice

For street legal classes, you don’t even need a roll cage. Just the right tires and the right attitude. Credit: Tim Stevens

Ice racing basics

There are certainly plenty of professionals out there who have dabbled in or got their start in ice racing, F1 legend Alain Prost and touring car maestro Peter Cunningham being two notable examples. And a European ice racing series called Trophée Andros formerly challenged some of the world’s top professionals to race across a series of purpose-built frozen tracks in Europe and even Quebec.

These days, however, ice racing is an almost entirely amateur pursuit, a low-temp, low-grip hobby where the biggest prize you’re likely to bring home on any given Sunday is a smile and maybe a little trophy for the mantel.

That said, there are numerous types of ice racing. The most common and accessible is time trials, basically autocrosses on ice. The Sports Car Club of Vermont ice time trial series is a reliable, well-run example, but you’ll find plenty of others, too.

Some other clubs step it up by hosting wheel-to-wheel racing on plowed ovals. Lakes Region Ice Racing Club in Moultonborough, New Hampshire, is a long-running group that has been blessed with enough ice lately to keep racing even as temperatures have increased.

At the top tier, though, you’re looking at clubs that plow full-on road courses on the ice, groups like the Adirondack Motor Enthusiast Club (AMEC), based in and around the Adirondack Park. Established in 1954, this is among the oldest ice racing clubs in the world and the one I’ve been lucky to be a member of since 2002.

Will any other discipline of motorsport teach you as much about car control? Tim Stevens

AMEC offers numerous classes, providing eligibility for everything from a bone-stock Miata to purpose-built sprint cars that look like they made a wrong turn off a dirt oval. Dedicated volunteers plow courses on lakes throughout the ADK, tirelessly searching for ice of sufficient depth and quality.

Different clubs have different requirements, but most like to see a foot of solid, clean ice. That may not sound like much, but according to the US Army Corps of Engineers, it’s plenty for eight-ton trucks. That’s enough to support not only the 60 to 100 racers that AMEC routinely sees on any frigid Sunday but also the numerous tow rigs, trailers, and plow trucks that support the action.

How do you get started? All you need is a set of tires.

Tires

Tires are the most talked-about component of any car competing on the ice, and for good reason. Clubs have different regulations for what is and is not legal for competition, but in general, you can lump ice racing tires into three categories.

The first is unstudded, street-legal tires, such as Bridgestone Blizzaks, Continental WinterContacts, and Michelin X-Ices. These tires generally have chunky, aggressive treads, generous siping, and squishy compounds. Modern snow tires like these are marvelous things, and when there’s a rough surface on the ice or some embedded snow, an unstudded tire can be extremely competitive, even keeping up with a street-legal studded tire.

The second category is street-legal studded tires. These, like the Nokian Hakkapeliitta 10 and the Pirelli Winter Ice Zero, take the chunky, aggressive tread pattern of a normal snow tire and embed some number of metallic studs. These tiny studs, which typically protrude only 1 millimeter from the tire surface, provide a massive boost in grip on smooth, polished ice.

Tim races on Nokian Hakka 10 tires, which are street-legal studded winter tires. Credit: Tim Stevens

Finally, there is what is broadly called a “race stud” tire, which is anything not legal for road use. These tires range from hand-made bolt tires, put together by people who have a lot of patience and who don’t mind the smell of tire sealant, to purpose-built race rubber of the sort you’ll see on a World Rally car snow stage.

These tires offer massive amounts of grip—so much so that the feel they deliver is more like driving on dirt than on ice. Unless you DIY it, the cost typically increases substantially as well. For that reason, going to grippier tires doesn’t necessarily mean more fun for your dollar, but there are plenty of opinions on where you’ll find the sweet spot of smiles per mile.

Driver skills

The other major factor in finding success on the ice is driver skill. If you have some experience in low-grip, car-control-focused driving like rally or drift, you’ll have a head start over someone who’s starting fresh. But if I had a dollar for every rally maestro or drifter I’ve seen swagger their way out onto the ice and then wedge their car straight into the first snowbank, I’d have at least five or six extra dollars to my name.

Ice racing is probably the purest and most challenging form of low-grip driving. On ice, the performance envelope of a normal car on normal tires is extremely small. Driving fast on ice, then, means learning how to make your car do what you want, even when you’re far outside of that envelope.

There are many techniques involved, but it all starts with getting comfortable with entering your car into a slide and sustaining it. Learning to balance your car in a moderate drift, dancing between terminal understeer (plowing into the snowbank nose-first) and extreme oversteer (spinning into the snowbank tail-first), is key. That comfort simply takes time.

Reading the ice

Ruts in the ice made by ice racing

The condition of the track changes constantly. Credit: Tim Stevens

Once you figure out how to keep your car going in the right direction, and once you stop making sedan-shaped holes in snowbanks, the next trick is to learn how to read the ice.

The grip level of the ice constantly evolves throughout the day. The street-legal tires tend to polish it off, wearing down rougher sections into smoothly polished patches with extremely low grip. The race studs, on the other hand, chew it up again, creating a heavily textured surface.

If you’re on the less extreme sorts of tires, you’ll find the most grip on that rough, unused ice. In a race stud, you want to seek out smooth, clean ice because it will give your studs better purchase.

If you’re familiar with road racing, it’s a little like running a rain line: not necessarily driving the shortest path around, but instead taking the one that offers the most grip. Imagine a rain line that changes every lap and you start to get the picture.

How can I try it?

Intrigued? The good news is that ice racing is among the most accessible and affordable forms of motorsport on the planet, possibly second only to autocrossing. Costs vary widely, but in my club, AMEC, a full day of racing costs $70. That’s for three heat races and a practice session. Again, all you need is a set of snow tires, which will last the full season if you don’t abuse them.

The bad news, of course, is that you need to be close to an ice racing club. They’re getting harder and harder to find, and active clubs generally have shorter seasons with fewer events. If you can’t find one locally, you may need to travel, which increases the cost and commitment substantially.

If you don’t live where the lakes freeze, you’ll have to travel. Tim Stevens

If cost is no issue, you certainly have more opportunities. We’ve already reported on McLaren’s program, but it’s not alone. Exotic brands like Ferrari and Lamborghini also offer winter driving programs, where you can wheel amazing cars in glamorous places like St. Moritz and Livigno. The cost is very much in the “if you have to ask” category.

Dirtfish, one of the world’s greatest rally schools, also offers an ice-driving program in Wisconsin, starting at about $2,000 for a single day. This is a great, if expensive, way to get a feel for the skills you’ll need on ice.

And if you just want the most seat time, look for programs like Lapland Ice Driving or Ice Drive Sweden. The northern wilds of Sweden and Finland are full of frozen lakes where clubs plow out full race courses, sometimes replicating Formula One circuits. If you have the funds, you can rent any manner of sports car and run it sideways all day long on proper studded tires.

Whatever it costs and whatever you have to do to make it happen, ice racing is well worth the effort. I’ve been lucky to drive a long list of amazing cars in amazing places, but nothing comes close to the joy of wheeling my 20-year-old Subaru around a frozen lake.



Password managers’ promise that they can’t see your vaults isn’t always true


ZERO KNOWLEDGE, ZERO CLUE

Contrary to what password managers say, a server compromise can mean game over.

Over the past 15 years, password managers have grown from a niche security tool used by the technology savvy into an indispensable security tool for the masses, with an estimated 94 million US adults—or roughly 36 percent of them—having adopted them. They store not only passwords for pension, financial, and email accounts, but also cryptocurrency credentials, payment card numbers, and other sensitive data.

All eight of the top password managers have adopted the term “zero knowledge” to describe the complex encryption system they use to protect the data vaults that users store on their servers. The definitions vary slightly from vendor to vendor, but they generally boil down to one bold assurance: that there is no way for malicious insiders or hackers who manage to compromise the cloud infrastructure to steal vaults or data stored in them. These promises make sense, given previous breaches of LastPass and the reasonable expectation that state-level hackers have both the motive and capability to obtain password vaults belonging to high-value targets.

A bold assurance debunked

Typical of these claims are those made by Bitwarden, Dashlane, and LastPass, which together are used by roughly 60 million people. Bitwarden, for example, says that “not even the team at Bitwarden can read your data (even if we wanted to).” Dashlane, meanwhile, says that without a user’s master password, “malicious actors can’t steal the information, even if Dashlane’s servers are compromised.” LastPass says that no one can access the “data stored in your LastPass vault, except you (not even LastPass).”

New research shows that these claims aren’t true in all cases, particularly when account recovery is in place or password managers are set to share vaults or organize users into groups. The researchers reverse-engineered or closely analyzed Bitwarden, Dashlane, and LastPass and identified ways that someone with control over the server—either administrative or the result of a compromise—can, in fact, steal data and, in some cases, entire vaults. The researchers also devised other attacks that can weaken the encryption to the point that ciphertext can be converted to plaintext.

“The vulnerabilities that we describe are numerous but mostly not deep in a technical sense,” the researchers from ETH Zurich and USI Lugano wrote. “Yet they were apparently not found before, despite more than a decade of academic research on password managers and the existence of multiple audits of the three products we studied. This motivates further work, both in theory and in practice.”

The researchers said in interviews that multiple other password managers they didn’t analyze as closely likely suffer from the same flaws. The only one they were at liberty to name was 1Password. Almost all the password managers, they added, are vulnerable to the attacks only when certain features are enabled.

The most severe of the attacks—targeting Bitwarden and LastPass—allow an insider or attacker to read or write to the contents of entire vaults. In some cases, they exploit weaknesses in the key escrow mechanisms that allow users to regain access to their accounts when they lose their master password. Others exploit weaknesses in support for legacy versions of the password manager. A vault-theft attack against Dashlane allowed reading but not modification of vault items when they were shared with other users.

Staging the old key switcheroo

One of the attacks targeting Bitwarden key escrow is performed during the enrollment of a new member of a family or organization. After a Bitwarden group admin invites the new member, the invitee’s client accesses a server and obtains a group symmetric key and the group’s public key. The client then encrypts the symmetric key with the group public key and sends it to the server. The resulting ciphertext is what’s used to recover the new user’s account. This data is never integrity-checked when it’s sent from the server to the client during an account enrollment session.

The adversary can exploit this weakness by replacing the group public key with one from a keypair created by the adversary. Since the adversary knows the corresponding private key, it can use it to decrypt the ciphertext and then perform an account recovery on behalf of the targeted user. The result is that the adversary can read and modify the entire contents of the member vault as soon as an invitee accepts an invitation from a family or organization.
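To make the substitution concrete, here is a minimal sketch in Python (using the widely available cryptography package). The names are hypothetical, and this is not Bitwarden’s actual code; the point is simply that a client willing to wrap its vault key for whatever public key the server hands it has no way to know who will be able to unwrap it.

```python
# Illustrative sketch only -- hypothetical names, not Bitwarden's real API.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def client_enroll(org_public_key, user_vault_key: bytes) -> bytes:
    """What the invitee's client does: wrap its vault key for 'the organization.'
    Nothing here verifies whose key the client was actually given."""
    return org_public_key.encrypt(user_vault_key, OAEP)

# The organization's real keypair...
org_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
# ...and one generated by a malicious or compromised server.
adversary_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

vault_key = os.urandom(32)  # the user's symmetric vault key

# The server simply hands the client the adversary's public key instead.
recovery_ciphertext = client_enroll(adversary_key.public_key(), vault_key)

# Whoever controls the server can now "recover" the account themselves.
stolen_key = adversary_key.decrypt(recovery_ciphertext, OAEP)
assert stolen_key == vault_key
```

Authenticating that public key—for example, by having an admin verify a fingerprint out of band—is the sort of check that would break the substitution.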

Normally, this attack would work only when a group admin has enabled autorecovery mode, which, unlike a manual option, doesn’t require interaction from the member. But since the group policy the client downloads during the enrollment process isn’t integrity-checked, adversaries can set recovery to auto, even if an admin had chosen a manual mode that requires user interaction.

Compounding the severity, the adversary in this attack also obtains a group symmetric key for all other groups the member belongs to since such keys are known to all group members. If any of the additional groups use account recovery, the adversary can obtain the members’ vaults for them, too. “This process can be repeated in a worm-like fashion, infecting all organizations that have key recovery enabled and have overlapping members,” the research paper explained.

A second attack targeting Bitwarden account recovery can be performed when a user rotates vault keys, an option Bitwarden recommends if a user believes their master password has been compromised. When account recovery is on (either manually or automatically), the user client regenerates the recovery ciphertext, which, as described earlier, involves encrypting the new user key with the organization public key. The researchers denote the group public key as pkorg, the public key supplied by the adversary as pkadvorg, the recovery ciphertext as crec, and the user symmetric key as k.

The paper explained:

The key point here is that pkorg is not retrieved from the user’s vault; rather the client performs a sync operation with the server to obtain it. Crucially, the organization data provided by this sync operation is not authenticated in any way. This thus provides the adversary with another opportunity to obtain a victim’s user key, by supplying a new public key pkadvorg, for which they know the skadvorg and setting the account recovery enrollment to true. The client will then send an account recovery ciphertext crec containing the new user key, which the adversary can decrypt to obtain k′.

The third attack on the Bitwarden account recovery allows an adversary to recover a user’s master key. It abuses key connector, a feature primarily used by enterprise customers.

More ways to pilfer vaults

The attack allowing theft of LastPass vaults also targets key escrow, specifically in the Teams and Teams 5 versions, when a member’s master key is reset by a privileged user known as a superadmin. The next time the member logs in through the LastPass browser extension, their client will retrieve the RSA public key assigned to each superadmin in the organization, encrypt their new key with each one, and send the resulting ciphertext to each superadmin.

Because LastPass also fails to authenticate the superadmin keys, an adversary can once again replace the superadmin public key (pkadm) with their own public key (pkadvadm).

“In theory, only users in teams where password reset is enabled and who are selected for reset should be affected by this vulnerability,” the researchers wrote. “In practice, however, LastPass clients query the server at each login and fetch a list of admin keys. They then send the account recovery ciphertexts independently of enrollment status.” The attack, however, requires the user to log in to LastPass with the browser extension, not the standalone client app.

Several attacks allow reading and modification of shared vaults, which allow a user to share selected items with one or more other users. When Dashlane users share an item, their client apps sample a fresh symmetric key, which either directly encrypts the shared item or, when sharing with a group, encrypts group keys, which in turn encrypt the shared item. In either case, the newly created RSA keypair(s)—belonging to either the recipient user or group—aren’t authenticated. The symmetric key is then encrypted with the corresponding public key(s).

An adversary can supply their own keypair; the sharing client then uses the adversary’s public key to produce the ciphertext sent to the recipients. The adversary decrypts that ciphertext with the corresponding secret key to recover the shared symmetric key. With that, the adversary can read and modify all shared items. When sharing is used in either Bitwarden or LastPass, similar attacks are possible and lead to the same consequence.

Another avenue for attackers or adversaries with control of a server is to target the backward compatibility that all three password managers provide to support older, less-secure versions. Despite incremental changes designed to harden the apps against the very attacks described in the paper, all three password managers continue to support the versions without these improvements. This backward compatibility is a deliberate decision intended to prevent users who haven’t upgraded from losing access to their vaults.

The severity of these attacks is lower than that of the previous ones described, with the exception of one, which is possible against Bitwarden. Older versions of the password manager used a single symmetric key to encrypt and decrypt the user key from the server and items inside vaults. This design allowed for the possibility that an adversary could tamper with the contents. To add integrity checks, newer versions provide authenticated encryption by augmenting the symmetric key with an HMAC hash function.

To protect customers using older app versions, Bitwarden ciphertext has an attribute of either 0 or 1. A 0 designates authenticated encryption, while a 1 supports the older unauthenticated scheme. Older versions also use a key hierarchy that Bitwarden deprecated to harden the app. To support the old hierarchy, newer client versions generate a new RSA keypair for the user if the server doesn’t provide one. The newer version will proceed to encrypt the secret key portion with the master key if no user ciphertext is provided by the server.

This design opens Bitwarden to several attacks. The most severe allows reading (but not modification) of all items created after the attack is performed. At a simplified level, it works because the adversary can forge the ciphertext sent by the server and cause the client to use it to derive a user key known to the adversary.

The modification causes the use of CBC (cipher block chaining), a form of encryption that’s vulnerable to several attacks. An adversary can exploit this weaker form using a padding oracle attack and go on to retrieve the plaintext of the vault. Because HMAC protection remains intact, modification isn’t possible.
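For readers who want to see why falling back to unauthenticated CBC matters, here is a self-contained toy padding-oracle demonstration in Python. It is not Bitwarden’s or Dashlane’s code, just the textbook technique: an attacker who can only learn whether a ciphertext’s padding was valid recovers plaintext bytes without ever touching the key.

```python
# Toy padding-oracle demo -- illustrative only, not any vendor's actual scheme.
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = os.urandom(32)

def encrypt(plaintext: bytes) -> bytes:
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(KEY), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def padding_oracle(ciphertext: bytes) -> bool:
    """All the attacker gets to see: did the padding check pass?"""
    iv, body = ciphertext[:16], ciphertext[16:]
    dec = Cipher(algorithms.AES(KEY), modes.CBC(iv)).decryptor()
    padded = dec.update(body) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    try:
        unpadder.update(padded) + unpadder.finalize()
        return True
    except ValueError:
        return False

def recover_last_block(ciphertext: bytes) -> bytes:
    """Recover the final 16-byte plaintext block using only the oracle."""
    blocks = [ciphertext[i:i + 16] for i in range(0, len(ciphertext), 16)]
    prev, target = blocks[-2], blocks[-1]
    intermediate = bytearray(16)      # AES-decrypt(target), before the CBC XOR
    recovered = bytearray(16)
    for pad_val in range(1, 17):      # work backward from the last byte
        pos = 16 - pad_val
        forged = bytearray(16)
        for i in range(pos + 1, 16):  # force already-recovered bytes to pad_val
            forged[i] = intermediate[i] ^ pad_val
        for guess in range(256):
            forged[pos] = guess
            if not padding_oracle(bytes(forged) + target):
                continue
            if pad_val == 1:          # rule out an accidental \x02\x02... match
                probe = bytearray(forged)
                probe[pos - 1] ^= 0xFF
                if not padding_oracle(bytes(probe) + target):
                    continue
            intermediate[pos] = guess ^ pad_val
            recovered[pos] = intermediate[pos] ^ prev[pos]
            break
    return bytes(recovered)

ct = encrypt(b"vault item: bank password = hunter2")
print(recover_last_block(ct))   # last plaintext block, padding included -- no key needed
```

Against a real client, the “oracle” is whatever observable behavior distinguishes good padding from bad, so the attack is far slower and noisier in practice—but the principle is the same.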

Surprisingly, Dashlane was vulnerable to a similar padding oracle attack. The researchers devised a complicated attack chain that would allow a malicious server to downgrade a Dashlane user’s vault to CBC and exfiltrate the contents. The researchers estimate that the attack would require about 125 days to decrypt the ciphertext.

Still other attacks against all three password managers allow adversaries to greatly reduce the selected number of hashing iterations—in the case of Bitwarden and LastPass, from a default of 600,000 to 2. Repeated hashing of master passwords makes them significantly harder to crack in the event of a server breach that allows theft of the hash. For all three password managers, the server sends the specified iteration count to the client, with no mechanism to ensure it meets the default number. The result is that the time and resources an adversary needs to crack the hash and obtain the user’s master password drop by a factor of roughly 300,000.
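Here is a minimal sketch of why the iteration count matters, assuming a PBKDF2-style derivation like the ones these products use. The function, salt, and password below are illustrative, not any vendor’s exact implementation.

```python
# Illustrative only: shows the cost difference the server-controlled
# iteration count makes for anyone brute-forcing a stolen hash.
import hashlib
import time

def derive_key(master_password: str, salt: bytes, iterations: int) -> bytes:
    # On login, the client uses whatever iteration count the server sent it.
    return hashlib.pbkdf2_hmac("sha256", master_password.encode(), salt, iterations)

salt = b"per-user-salt"
for iters in (600_000, 2):   # advertised default vs. a malicious server's value
    start = time.perf_counter()
    derive_key("correct horse battery staple", salt, iters)
    print(f"{iters:>7} iterations: {time.perf_counter() - start:.4f} s per guess")

# An attacker cracking the stolen hash pays the same per-guess cost, so dropping
# from 600,000 iterations to 2 makes each password guess roughly 300,000x cheaper.
```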

Attacking malleability

Three of the attacks—one against Bitwarden and two against LastPass—target what the researchers call “item-level encryption” or “vault malleability.” Instead of encrypting a vault in a single, monolithic blob, password managers often encrypt individual items, and sometimes individual fields within an item. These items and fields are all encrypted with the same key. The attacks exploit this design to steal passwords from select vault items.

An adversary mounts an attack by replacing the ciphertext in the URL field, which stores the link where a login occurs, with the ciphertext for the password. To enhance usability, password managers provide an icon that helps visually recognize the site. To do this, the client decrypts the URL field and sends it to the server. The server then fetches the corresponding icon. Because there’s no mechanism to prevent the swapping of item fields, the client decrypts the password instead of the URL and sends it to the server.

“That wouldn’t happen if you had different keys for different fields or if you encrypted the entire collection in one pass,” Kenny Paterson, one of the paper co-authors, said. “A crypto audit should spot it, but only if you’re thinking about malicious servers. The server is deviating from expected behavior.”
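Here is a hedged sketch of the flaw Paterson is describing, using AES-GCM and made-up field names rather than any vendor’s real data model. When every field is encrypted under the same key and nothing binds a ciphertext to the field it came from, a malicious server can swap ciphertexts, and the client will obligingly decrypt the wrong one.

```python
# Illustrative sketch -- not any password manager's actual schema.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

vault_key = AESGCM(os.urandom(32))

def encrypt_field(value: str) -> bytes:
    nonce = os.urandom(12)
    # Note: no associated data ties the ciphertext to a particular field name.
    return nonce + vault_key.encrypt(nonce, value.encode(), None)

def decrypt_field(blob: bytes) -> str:
    return vault_key.decrypt(blob[:12], blob[12:], None).decode()

item = {
    "url": encrypt_field("https://bank.example"),
    "password": encrypt_field("s3cr3t-hunter2"),
}

# A malicious server returns the item with the two ciphertexts swapped...
tampered = {"url": item["password"], "password": item["url"]}

# ...so when the client decrypts the "URL" to fetch the site icon, it actually
# decrypts the password and sends that back to the server.
print("sent to server for icon lookup:", decrypt_field(tampered["url"]))

# Binding the field name as associated data -- encrypt(nonce, value, b"url") --
# would make a swapped ciphertext fail to decrypt.
```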

The following table summarizes the causes and consequences of the 25 attacks they devised:

Credit: Scarlata et al.

A psychological blind spot

The researchers acknowledge that the full compromise of a password manager server is a high bar. But they defend the threat model.

“Attacks on the provider server infrastructure can be prevented by carefully designed operational security measures, but it is well within the bounds of reason to assume that these services are targeted by sophisticated nation-state-level adversaries, for example via software supply-chain attacks or spearphishing,” they wrote. “Moreover, some of the service providers have a history of being breached—for example LastPass suffered breaches in 2015 and 2022, and another serious security incident in 2021.”

They went on to write: “While none of the breaches we are aware of involved reprogramming the server to make it undertake malicious actions, this goes just one step beyond attacks on password manager service providers that have been documented. Active attacks more broadly have been documented in the wild.”

Part of the challenge of designing password managers, or any end-to-end encryption service, is the false sense of security that comes from writing both the client and the server.

“It’s a psychological problem when you’re writing both client and server software,” Paterson explained. “You should write the client super defensively, but if you’re also writing the server, well of course your server isn’t going to send malformed packets or bad info. Why would you do that?”

Marketing gimmickry or not, “zero-knowledge” is here to stay

In many cases, engineers have already fixed the weaknesses after receiving private reports from the researchers, and they are still patching other vulnerabilities. In statements, Bitwarden, LastPass, and Dashlane representatives noted the high bar of the threat model, despite statements on their websites that assure customers their wares will withstand it. Along with 1Password representatives, they also noted that their products regularly receive stringent security audits and undergo red-team exercises.

A Bitwarden representative wrote:

Bitwarden continually evaluates and improves its software through internal review, third-party assessments, and external research. The ETH Zurich paper analyzes a threat model in which the server itself behaves maliciously and intentionally attempts to manipulate key material and configuration values. That model assumes full server compromise and adversarial behavior beyond standard operating assumptions for cloud services.

LastPass said, “We take a multi‑layered, ongoing approach to security assurance that combines independent oversight, continuous monitoring, and collaboration with the research community. Our cloud security testing is inclusive of the scenarios referenced in the malicious-server threat model outlined in the research.”


A statement from Dashlane read, “Dashlane conducts rigorous internal and external testing to ensure the security of our product. When issues arise, we work quickly to mitigate any possible risk and ensure customers have clarity on the problem, our solution, and any required actions.”

1Password released a statement that read in part:

Our security team reviewed the paper in depth and found no new attack vectors beyond those already documented in our publicly available Security Design White Paper.

We are committed to continually strengthening our security architecture and evaluating it against advanced threat models, including malicious-server scenarios like those described in the research, and evolving it over time to maintain the protections our users rely on.

1Password also says that the zero-knowledge encryption it provides “means that no one but you—not even the company that’s storing the data—can access and decrypt your data. This protects your information even if the server where it’s held is ever breached.” In the company’s white paper linked above, 1Password seems to allow for this possibility when it says:

At present there’s no practical method for a user to verify the public key they’re encrypting data to belongs to their intended recipient. As a consequence it would be possible for a malicious or compromised 1Password server to provide dishonest public keys to the user, and run a successful attack. Under such an attack, it would be possible for the 1Password server to acquire vault encryption keys with little ability for users to detect or prevent it.

1Password’s statement also includes assurances that the service routinely undergoes rigorous security testing.

All four companies defended their use of the term “zero knowledge.” As used in this context, the term can be confused with zero-knowledge proofs, a completely unrelated cryptographic method that allows one party to prove to another party that they know a piece of information without revealing anything about the information itself. An example is a proof that shows a system can determine if someone is over 18 without having any knowledge of the precise birthdate.

The adulterated zero-knowledge term used by password managers appears to have come into being in 2007, when a company called SpiderOak used it to describe its cloud infrastructure for securely sharing sensitive data. Interestingly, SpiderOak formally retired the term a decade later after receiving user pushback.

“Sadly, it is just marketing hype, much like ‘military-grade encryption,’” Matteo Scarlata, lead author of the paper said. “Zero-knowledge seems to mean different things to different people (e.g., LastPass told us that they won’t adopt a malicious server threat model internally). Much unlike ‘end-to-end encryption,’ ‘zero-knowledge encryption’ is an elusive goal, so it’s impossible to tell if a company is doing it right.”

Photo of Dan Goodin

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him on Mastodon and Bluesky. Contact him on Signal at DanArs.82.



Sideways on the ice, in a supercar: Stability control is getting very good


To test stability control, it helps to have a wide-open space with very low grip.

A blue McLaren Artura drifting on a frozen lake

You can tell this photo was taken a day or two before we were there because the sun came out. Credit: McLaren

SAARISELKÄ, FINLAND—If you’re expecting it, the feeling in the pit of your stomach when the rear of your car breaks traction and begins to slide is rather pleasant. It’s the same exhilaration we get from roller coasters, but when you’re in the driver’s seat, you’re in charge of the ride.

When you’re not expecting it, though, there’s anxiety instead of excitement and, should the slide end with a crunch, a lot more negative emotions, too.

Thankfully, fewer and fewer drivers will have to experience that kind of scare thanks to the proliferation and sophistication of modern electronic stability and traction control systems. Over more than 30 years, these electronic safety nets have grown in capability, becoming mandatory in the early 2010s and preventing countless crashes along the way.

By cutting engine power and braking each wheel individually, the computers that keep a watchful eye on things like lateral acceleration and wheel spin work to ensure the car goes where the driver wants it, rather than sideways or backward into whatever solid object lies along its new path of motion.

Obviously, the quickest way to find out whether this all works is to turn it off. And then find a slippery road, or just drive like an oaf. Yet even when automakers let journalists loose on racetracks, they invariably require that we keep some of the electronic safety net turned on. Even on track, you can hit things that will crumple a car—or worse—and with modern tire technology being what it is, the speeds involved when cars do let go tend to be quite high, particularly if it’s dry.

An orange McLaren Artura, seen from behind on a frozen lake. The rear is encrusted with snow.

The Artura is probably my favorite McLaren, as it’s smaller and more versatile than the more expensive, more powerful machines in the range. Credit: Jonathan Gitlin

Exploring the limits and capabilities of electronic chassis control calls for a particular kind of environment. Ideally, you want a lot of wide-open space free of wildlife and people and a smooth, low-grip surface. A giant sand dune would work. Or a frozen lake. Which is why you can sometimes find automotive engineers hanging out in these remote, often extreme locations, braving the desert’s heat or an Arctic chill as they work on a prototype or fine-tune the next model.

And it’s no secret that sliding a car on the ice is a lot of fun. So it’s not surprising that a cottage tourism industry exists that—for a suitable fee—will bring you north of the Arctic Circle, where you can work on your car control and get some insight into just how hard those electronics are capable of working.

That explains why I left an extremely cold Washington, DC, to travel to an even colder Saariselkä in Finland, where McLaren operates its Arctic Experience program on a frozen lake in nearby Ivalo. The company does some development work here, though more of it happens across the border in Sweden. But for a few weeks each winter, it welcomes customers to its minimalist lodge to work on their car control. And earlier this month, Ars was among a group of journalists who got an abbreviated version of the experience.

Our car for the day was a Ventura Orange McLaren Artura, the brand’s plug-in hybrid supercar, wearing Pirelli’s Sottozero winter tires, each augmented by a few hundred metal spikes. Its total power and torque output is 671 hp (500 kW) and 531 lb-ft (720 Nm) combined from a 3.0 L twin-turbo V6 that generates 577 hp (430 kW) and 431 lb-ft (584 Nm), plus an axial flux electric motor that contributes an additional 94 hp (70 kW) and 166 lb-ft (225 Nm). All of that is sent to the rear wheels via an eight-speed dual-clutch transmission.

A McLaren Artura winter tire fitted with studs

Winter tires work well on snow, but for ice, you really need studs. Credit: Jonathan Gitlin

Where most hybrids use the electric motor to boost efficiency, McLaren mostly uses it to boost performance, providing an immediate shove and filling gaps in the torque band where necessary. In electric-only mode, it will do just that, right up to the 81 mph (130 km/h) speed limit of the mode. Being the sort of curious nerd I am, I took the opportunity to try all the different modes.

Once I got control of my stomach, that is.

Are you sure you should drink that?

Our first exercise was ironically the hardest: driving sideways around a plain old circle. A couple of these had been scribed into the ice—which freezes from November until April and was 28 inches (70 cm) thick, we learned—along with more than a dozen other, more involved courses. Even under the best of conditions, the Sun spends barely six hours a day on its shallow curve from horizon to horizon at this time of year. On the day of our visit, the horizon was an indistinct thing as heavy gray skies blended with the snow-covered ice.

The lack of a visual reference, mixed with 15 minutes of steady lateral G-forces, turned out to be unkind to my vestibular system, and about 10 minutes later, I found myself in shirtsleeves at minus-11˚F (minus-23˚C), saying goodbye to a cup of Earl Grey tea I’d perhaps unwisely drunk a little earlier. At least I remembered to face downwind—given the sideways gale, it could have ended worse.

A number of circuits carved into the surface of a frozen lake

These are just some of the circuits that McLaren has carved into the ice in Ivalo. Beware of the innocent-looking circles—they’re deceptively hard and may turn your stomach. Credit: McLaren

Fortified with an anti-emetic and some extremely fresh air, I returned to the ice and can happily report that as long as you slide both left and right, you’re unlikely to get nauseous.

Getting an Artura sideways on a frozen lake is not especially complicated. With the powertrain set to Track, which prioritizes performance and keeps the V6 running the whole time, and with stability and traction control off, you apply enough power to break traction at the rear. Or a dab of brake could do the job, too, followed by some power. You steer more with your right foot than your hands, adding or subtracting power to rein in or amplify the slip angle. Your eyes are crucial to the process; if you look through the corner down the track, that’s probably where you’ll end up. Fixate on the next apex and you may quickly find yourself off-course.

Most of the mid-engined Artura’s 3,303 lbs (1,498 kg) live between its axles, and it’s a relatively easy car to catch once it begins to slide, with plenty of travel for the well-mapped throttle pedal.

As it turns out, that holds true even when you’re using only the electric motor. 166 lb-ft is more than enough to get the rear wheels spinning on the ice, but with just 94 hp, there isn’t really enough power to get the car properly sideways. So you can easily control a lazy slide around one of the handling courses, in near silence, to boot. Turning the electronic aids back on made things much less dramatic; even with my foot to the floor, the Artura measured out minute amounts of power, keeping the car very much pointed where I steered it rather than requiring any opposite lock.

A person stands next to a McLaren Artura on a frozen lake

It feels like the edge of the world out here. Credit: McLaren

Turn it on, turn it off

Back in track mode, with all 671 hp to play with, there was much more power than necessary to spin. But with the safety net re-enabled, driving around the handling course was barely any more dramatic than with a fraction of the power. The car’s electronic chassis control algorithms would only send as much power to the rear wheels as they could deploy, no matter how much throttle I applied. As each wheel lost grip and began to spin, its brake would intervene. And we went around the course, slowly but safely. As a demonstration of the effectiveness of modern electronic safety systems, it was very reassuring.

As I mentioned earlier, even when journalists are let loose in supercars on track, it’s with some degree of electronic assist enabled. That’s because sportier cars often offer a halfway house between everything buttoned down and all the aids turned off. Here, the idea is to loosen the safety net and allow the car to move around, but only a little. Instead of just using the electronics to make things safe, they’ll also flatter the driver.

In McLaren’s case, that mode is called Variable Drift Control, which is a rather accurate name—in this mode, you set the maximum slip angle (from 1˚–15˚), and the car will not exceed that. And that’s exactly what it does. A slug of power will get the rear wheels spinning and the rear sliding, but only up to the set degree, at which point the brakes and powertrain will interrupt as necessary.
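Purely as an illustration (this is a toy model, not McLaren’s control software), the logic amounts to a torque limiter keyed to measured slip angle, something like this:

```python
# Toy sketch of a drift-limiting mode: the driver picks a maximum slip angle, and
# the electronics trim drive torque whenever the measured slip angle exceeds it.
# Real systems also brake individual wheels and use far more sophisticated models.

def drift_limited_torque(requested_pct: float, measured_slip_deg: float,
                         max_slip_deg: float) -> float:
    """Percentage of requested torque actually sent to the rear wheels."""
    if measured_slip_deg <= max_slip_deg:
        return requested_pct                 # within the driver-selected limit
    overshoot = measured_slip_deg - max_slip_deg
    # Scale torque back in proportion to how far past the limit the car is.
    return max(0.0, requested_pct * (1.0 - 0.25 * overshoot))

# Foot to the floor (100 percent) with the limit set to 10 degrees of slip:
for slip in (5.0, 10.0, 12.0, 14.0):
    print(f"{slip:>4} deg slip -> {drift_limited_torque(100.0, slip, 10.0):5.1f}% torque")
```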

It’s very flattering, holding what feels like a lurid slide between turns with ease, without any concern that a lapse in concentration might leave the car requiring recovery after beaching on a few inches of snow. Even when your right foot is pinned to the firewall, the silicon brains running the show apply only as much torque as necessary, with the little icon flashing on the dash letting you know it’s intervening.

A man seen drifting a McLaren

If you have the space, there’s little more fun than drifting a car on ice. But it’s good to know that electronic stability control and traction control will help you out when you’re not trying to have fun. Credit: McLaren

I can certainly see why OEMs ask that modes like VDC be the spiciest setting we try when they lend us their cars. They’re just permissive enough to break the rear loose and fire off a burst of adrenaline, yet cosseting enough that the ride almost certainly won’t end in tears. Fun though VDC was to play with, it does feel artificial once you get your eye in—particularly compared to the thrill of balancing an Artura on the throttle as you change direction through a series of corners or the satisfaction of catching and recovering a spin before it becomes too late.

But outside of a frozen lake, I’ll be content to keep some degree of driver aids running.

Photo of Jonathan M. Gitlin

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.



Platforms bend over backward to help DHS censor ICE critics, advocates say


Pam Bondi and Kristi Noem sued for coercing platforms into censoring ICE posts.

Credit: Aurich Lawson | Getty Images

Pressure is mounting on tech companies to shield users from unlawful government requests that advocates say are making it harder to reliably share information about Immigration and Customs Enforcement (ICE) online.

Alleging that ICE officers are being doxed or otherwise endangered, Trump officials have spent the last year targeting an unknown number of users and platforms with demands to censor content. Early lawsuits show that platforms have caved, even though experts say they could refuse these demands without a court order.

In a lawsuit filed on Wednesday, the Foundation for Individual Rights and Expression (FIRE) accused Attorney General Pam Bondi and Department of Homeland Security Secretary Kristi Noem of coercing tech companies into removing a wide range of content “to control what the public can see, hear, or say about ICE operations.”

It’s the second lawsuit alleging that Bondi and DHS officials are using regulatory power to pressure private platforms to suppress speech protected by the First Amendment. It follows a complaint from the developer of an app called ICEBlock, which Apple removed from the App Store in October. Officials aren’t rushing to resolve that case—last month, they requested more time to respond—so it may remain unclear until March what defense they plan to offer for the takedown demands.

That leaves community members who monitor ICE in a precarious situation, as critical resources could disappear at the department’s request with no warning.

FIRE says people have legitimate reasons to share information about ICE. Some communities focus on helping people avoid dangerous ICE activity, while others aim to hold the government accountable and raise public awareness of how ICE operates. Unless there’s proof of incitement to violence or a true threat, such expression is protected.

Despite the high bar for censoring online speech, lawsuits trace an escalating pattern of DHS increasingly targeting websites, app stores, and platforms—many that have been willing to remove content the government dislikes.

Officials have ordered ICE-monitoring apps to be removed from app stores and even threatened to sanction CNN for simply reporting on the existence of one such app. Officials have also demanded that Meta delete at least one Chicago-based Facebook group with 100,000 members and made multiple unsuccessful attempts to unmask anonymous users behind other Facebook groups. Even encrypted apps like Signal don’t feel safe from officials’ seeming overreach. FBI Director Kash Patel recently said he has opened an investigation into Signal chats used by Minnesota residents to track ICE activity, NBC News reported.

As DHS censorship threats increase, platforms have done little to shield users, advocates say. Not only have they sometimes failed to reject unlawful orders that simply provided “a bare mention of ‘officer safety/doxing’” as justification, but in one case, Google complied with a subpoena that left a critical section blank, the Electronic Frontier Foundation (EFF) reported.

For users, it’s increasingly difficult to trust that platforms won’t betray their own policies when faced with government intimidation, advocates say. Sometimes platforms notify users before complying with government requests, giving users a chance to challenge potentially unconstitutional demands. But in other cases, users learn about the requests only as platforms comply with them—even when those platforms have promised that would never happen.

Government emails with platforms may be exposed

Platforms could face backlash from users if lawsuits expose their communications with the government, a possibility in the coming months. Last fall, the EFF sued after DOJ, DHS, ICE, and Customs and Border Protection failed to respond to Freedom of Information Act requests seeking emails between the government and platforms about takedown demands. Other lawsuits may surface emails in discovery. In the coming weeks, a judge will set a schedule for EFF’s litigation.

“The nature and content of the Defendants’ communications with these technology companies” is “critical for determining whether they crossed the line from governmental cajoling to unconstitutional coercion,” EFF’s complaint said.

EFF Senior Staff Attorney Mario Trujillo told Ars that the EFF is confident it can win the fight to expose government demands, but like most FOIA lawsuits, the case is expected to move slowly. That’s unfortunate, he said, because ICE activity is escalating, and delays in addressing these concerns could irreparably harm speech at a pivotal moment.

Like users, platforms are seemingly victims, too, FIRE senior attorney Colin McDonnell told Ars.

They’ve been forced to override their own editorial judgment while navigating implicit threats from the government, he said.

“If Attorney General Bondi demands that they remove speech, the platform is going to feel like they have to comply; they don’t have a choice,” McDonnell said.

But platforms do have a choice and could be doing more to protect users, the EFF has said. Platforms could even serve as a first line of defense, requiring officials to get a court order before complying with any requests.

Platforms may now have good reason to push back against government requests—and to give users the tools to do the same. Trujillo noted that while courts have been slow to address the ICEBlock removal and FOIA lawsuits, the government has quickly withdrawn requests to unmask Facebook users soon after litigation began.

“That’s like an acknowledgement that the Trump administration, when actually challenged in court, wasn’t even willing to defend itself,” Trujillo said.

Platforms could view that as evidence that government pressure only works when platforms fail to put up a bare-minimum fight, Trujillo said.

Platforms “bend over backward” to appease DHS

An open letter from the EFF and the American Civil Liberties Union (ACLU) documented two instances of tech companies complying with government demands without first notifying users.

The letter called out Meta for unmasking at least one user without prior notice, which groups noted “potentially” occurred due to a “technical glitch.”

More troubling than buggy notifications, however, is the possibility that platforms may be routinely delaying notice until it’s too late.

After Google “received an ICE subpoena for user data and fulfilled it on the same day that it notified the user,” the company admitted that “sometimes when Google misses its response deadline, it complies with the subpoena and provides notice to a user at the same time to minimize the delay for an overdue production,” the letter said.

“This is a worrying admission that violates [Google’s] clear promise to users, especially because there is no legal consequence to missing the government’s response deadline,” the letter said.

Platforms face no sanctions for refusing to comply with government demands that have not been court-ordered, the letter noted. That’s why the EFF and ACLU have urged companies to use their “immense resources” to shield users who may not be able to drop everything and fight unconstitutional data requests.

In their letter, the groups asked companies to insist on court intervention before complying with a DHS subpoena. They should also resist DHS “gag orders” that ask platforms to hand over data without notifying users.

Instead, they should commit to giving users “as much notice as possible when they are the target of a subpoena,” as well as a copy of the subpoena. Ideally, platforms would also link users to legal aid resources and take up legal fights on behalf of vulnerable users, advocates suggested.

That’s not what’s happening so far. Trujillo told Ars that it feels like “companies have bent over backward to appease the Trump administration.”

The tide could turn this year if courts side with app makers behind crowdsourcing apps like ICEBlock and Eyes Up, who are suing to end the alleged government coercion. FIRE’s McDonnell, who represents the creator of Eyes Up, told Ars that platforms may feel more comfortable exercising their own editorial judgment moving forward if a court declares they were coerced into removing content.

DHS can’t use doxing to dodge First Amendment

FIRE’s lawsuit accuses Bondi and Noem of coercing Meta to disable a Facebook group with 100,000 members called “ICE Sightings–Chicagoland.”

The popularity of that group surged during “Operation Midway Blitz,” when hundreds of agents arrested more than 4,500 people over weeks of raids that used tear gas in neighborhoods and caused car crashes and other violence. Arrests included US citizens and immigrants of lawful status, which “gave Chicagoans reason to fear being injured or arrested due to their proximity to ICE raids, no matter their immigration status,” FIRE’s complaint said.

Kassandra Rosado, a lifelong Chicagoan and US citizen of Mexican descent, started the Facebook group and served as admin, moderating content with other volunteers. She prohibited “hate speech or bullying” and “instructed group members not to post anything threatening, hateful, or that promoted violence or illegal conduct.”

Facebook only ever flagged five posts that supposedly violated community guidelines, but in warnings, the company reassured Rosado that “groups aren’t penalized when members or visitors break the rules without admin approval.”

Rosado had no reason to suspect that her group was in danger of removal. When Facebook disabled her group, it told Rosado the group violated community standards “multiple times.” But her complaint noted that, confusingly, “Facebook policies don’t provide for disabling groups if a few members post ostensibly prohibited content; they call for removing groups when the group moderator repeatedly either creates prohibited content or affirmatively ‘approves’ such content.”

Facebook’s decision came after a right-wing influencer, Laura Loomer, tagged Noem and Bondi in a social media post alleging that the group was “getting people killed.” Within two days, Bondi bragged that she had gotten the group disabled while claiming that it “was being used to dox and target [ICE] agents in Chicago.”

McDonnell told Ars it seems clear that Bondi selectively uses the term “doxing” when people post images from ICE arrests. He pointed to “ICE’s own social media accounts,” which share favorable opinions of ICE alongside videos and photos of ICE arrests that Bondi doesn’t consider doxing.

“Rosado’s creation of Facebook groups to send and receive information about where and how ICE carries out its duties in public, to share photographs and videos of ICE carrying out its duties in public, and to exchange opinions about and criticism of ICE’s tactics in carrying out its duties, is speech protected by the First Amendment,” FIRE argued.

The same goes for speech managed by Mark Hodges, a US citizen who resides in Indiana. He created an app called Eyes Up to serve as an archive of ICE videos. Apple removed Eyes Up from the App Store around the same time that it removed ICEBlock.

“It is just videos of what government employees did in public carrying out their duties,” McDonnell said. “It’s nothing even close to threatening or doxing or any of these other theories that the government has used to justify suppressing speech.”

Bondi bragged that she had gotten ICEBlock banned, and FIRE’s complaint confirmed that Hodges’ company received the same notification that ICEBlock’s developer got after Bondi’s victory lap. The notice said that Apple received “information” from “law enforcement” claiming that the apps had violated Apple guidelines against “defamatory, discriminatory, or mean-spirited content.”

Apple did not reach the same conclusion when it independently reviewed Eyes Up prior to government meddling, FIRE’s complaint said. Notably, the app remains available in Google Play, and Rosado now manages a new Facebook group with similar content but somewhat tighter restrictions on who can join. Neither activity has required urgent intervention from either tech giants or the government.

McDonnell told Ars that it’s harmful for DHS to water down the meaning of doxing when pushing platforms to remove content critical of ICE.

“When most of us hear the word ‘doxing,’ we think of something that’s threatening, posting private information along with home addresses or places of work,” McDonnell said. “And it seems like the government is expanding that definition to encompass just sharing, even if there’s no threats, nothing violent. Just sharing information about what our government is doing.”

Expanding the definition and then using that term to justify suppressing speech is concerning, he said, especially since the First Amendment includes no exception for “doxing,” even if DHS ever were to provide evidence of it.

To suppress speech, officials must show that groups are inciting violence or making true threats. FIRE has alleged that the government has not met “the extraordinary justifications required for a prior restraint” on speech and is instead using vague doxing threats to discriminate against speech based on viewpoint. They’re seeking a permanent injunction barring officials from coercing tech companies into censoring ICE posts.

If plaintiffs win, the censorship threats could subside, and tech companies may feel safe reinstating apps and Facebook groups, advocates told Ars. That could potentially revive archives documenting thousands of ICE incidents and reconnect webs of ICE watchers who lost access to valued feeds.

Until courts possibly end threats of censorship, the most cautious community members are moving local ICE-watch efforts to group chats and listservs that are harder for the government to disrupt, Trujillo told Ars.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



NIH head, still angry about COVID, wants a second scientific revolution


Can we pander to MAHA, re-litigate COVID, and improve science at the same time?

Image of a man with grey hair and glasses, wearing a suit, gesturing as he talks.

Bhattacharya speaks before the Senate shortly after the MAHA event. Credit: Chip Somodevilla

At the end of January, Washington, DC, saw an extremely unusual event. The MAHA Institute, which was set up to advocate for some of the most profoundly unscientific ideas of our time, hosted leaders of the best-funded scientific organization on the planet, the National Institutes of Health. Instead of a hostile reception, however, Jay Bhattacharya, the head of the NIH, was greeted as a hero by the audience, receiving a partial standing ovation when he rose to speak.

Over the ensuing five hours, the NIH leadership and MAHA Institute moderators found many areas of common ground: anger over pandemic-era decisions, a focus on the failures of the health care system, the idea that we might eat our way out of some health issues, the sense that science had lost people’s trust, and so on. And Bhattacharya and others clearly shaped their messages to resonate with their audience.

The reason? MAHA (Make America Healthy Again) is likely to be one of the only political constituencies supporting Bhattacharya’s main project, which he called a “second scientific revolution.”

In practical terms, Bhattacharya’s plan for implementing this revolution includes some good ideas that fall far short of a revolution. But his motivation for the whole thing seems to be lingering anger over the pandemic response—something his revolution wouldn’t address. And his desire to shoehorn it into the radical disruption of scientific research pursued by the Trump administration led to all sorts of inconsistencies between his claims and reality.

If this whole narrative seems long, complicated, and confusing, it’s probably a good preview of what we can expect from the NIH over the next few years.

MAHA meets science

Despite the attendance of several senior NIH staff (including the directors of the National Cancer Institute and National Institute of Allergy and Infectious Diseases) and Bhattacharya himself, this was clearly a MAHA event. One of the MAHA Institute’s VPs introduced the event as being about the “reclamation” of a “discredited” NIH that had “gradually given up its integrity.”

“This was not a reclamation that involved people like Anthony Fauci,” she went on to say. “It was a reclamation of ordinary Americans, men and women who wanted our nation to excel in science rather than weaponize it.”

Things got a bit strange. Moderators from the MAHA Institute asked questions about whether COVID vaccines could cause cancer and spoke favorably of the possibility of a lab leak. An audience member asked why alternative treatments aren’t being researched. A speaker who proudly announced that he and his family had never received a COVID vaccine was roundly applauded. Fifteen minutes of the afternoon were devoted to a novelist seeking funding for a satirical film about the pandemic that portrayed Anthony Fauci as an egomaniacal lightweight, vaccines as a sort of placebo, and Bhattacharya as the hero of the story.

The organizers also had some idea of who might give all of this a hostile review, as reporters from Nature and Science said they were denied entry.

In short, this was not an event you’d go to if you were interested in making serious improvements to the scientific method. But that’s exactly how Bhattacharya treated it, spending the afternoon not only justifying the changes he’s made within the NIH but also arguing that we’re in need of a second scientific revolution—and he’s just the guy to bring it about.

Here’s an extensive section of his introduction to the idea:

I want to launch the second scientific revolution.

Why this grandiose vision? The first scientific revolution you have… very broadly speaking, you had high ecclesiastical authority deciding what was true or false on physical, scientific reality. And the first scientific revolution basically took… the truth-making power out of the hands of high ecclesiastical authority for deciding physical truth. We can leave aside spiritual—that is a different thing—physical truth and put it in the hands of people with telescopes. It democratized science fundamentally, it took the hands of power to decide what’s true out of the hands of authority and put it in the hands of ridiculous geniuses and regular people.

The second scientific revolution, then, is very similar. The COVID crisis, if it was anything, was the crisis of high scientific authority getting to decide not just a scientific truth like “plexiglass is going to protect us from COVID” or something, but also essentially spiritual truth. How should we treat our neighbor? Well, we treat our neighbor as a mere biohazard.

The second scientific revolution, then, is the replication revolution. Rather than using the metrics of how many papers are we publishing as a metric for success, instead, what we’ll look at as a metric for successful scientific idea is ‘do you have an idea where other people [who are] looking at the same idea tend to find the same thing as you?’ It is not just narrow replication of one paper or one idea. It’s a really broad science. It includes, for instance, reproduction. So if two scientists disagree, that often leads to constructive ways forward in science—deciding, well, there [are] some new ideas that may come out of that disagreement.

That section, which came early in his first talk of the day, hit on themes that would resurface throughout the afternoon: These people are angry about how the pandemic was handled, they’re trying to use that anger to fuel fundamental change in how science is done in the US, and their plan for change has nearly nothing to do with the issues that made them angry in the first place. In view of this, laying everything out for the MAHA crowd actually does make sense. They’re a suddenly powerful political constituency that also wants to see fundamental change in the scientific establishment, and they are completely unbothered by any lack of intellectual coherence.

Some good

The problem Bhattacharya believes he identified in the COVID response has nothing to do with replication problems. Even if better-replicated studies ultimately serve as a more effective guide to scientific truth, they would do little to change the fact that COVID restrictions were policy decisions largely made before relevant studies could even be completed, much less replicated. That’s a serious incoherence that needs to be acknowledged up front.

But that incoherence doesn’t prevent some of Bhattacharya’s ideas on replication and research priorities from being good. If they were all he was trying to accomplish, he could be a net positive.

Although he is a health economist, Bhattacharya correctly recognized something many people outside science don’t: Replication rarely comes from simply repeating the same set of experiments twice. Instead, many forms of replication happen by poking at the same underlying problem from multiple directions—looking in different populations, trying slightly different approaches, and so on. And if two approaches give different answers, it doesn’t mean that either of them is wrong. Instead, the differences could be informative, revealing something fundamental about how the system operates, as Bhattacharya noted.

He is also correct that simply changing the NIH to allow it to fund more replicative work probably won’t make a difference on its own. Instead, the culture of science needs to change so that replication can lead to publications that are valued for prestige, job security, and promotions—something that will only come slowly. He is also interested in attaching similar value to publishing negative results, like failed hypotheses or problems that people can’t address with existing technologies.

The National Institutes of Health campus. Credit: NIH

Bhattacharya also spent some time discussing the fact that NIH grants have become very risk-averse, an issue frequently discussed by scientists themselves. This aversion is largely derived from the NIH’s desire to ensure that every grant will produce some useful results—something the agency values as a way to demonstrate to Congress that its budget is being spent productively. But it leaves little space for exploratory science or experiments that may not work for technical reasons. Bhattacharya hopes to change that by converting some five-year grants to a two-plus-three structure, where the first two years fund exploratory work that must prove successful for the remaining three years to be funded.

I’m skeptical that this would be as useful as Bhattacharya hopes. Researchers who already have reason to believe the “exploratory” portion will work are likely to apply, and others may find ways to frame results from the exploratory phase as a success. Still, it seems worthwhile to try to fund some riskier research.

There was also talk of providing greater support for young researchers, another longstanding issue. Bhattacharya also wants to ensure that the advances driven by NIH-funded research are more accessible to the public and not limited to those who can afford excessively expensive treatments—again, a positive idea. But he did not share a concrete plan for addressing these issues.

All of this is to say that Bhattacharya has some ideas that may be positive for the NIH and science more generally, even if they fall far short of starting a second scientific revolution. But they’re embedded in a perspective that’s intellectually incoherent and seems to demand far more than tinkering around the edges of reproducibility. And the power to implement his ideas comes from two entities—the MAHA movement and the Trump administration—that are already driving changes that go far beyond what Bhattacharya says he wants to achieve. Those changes will certainly harm science.

Why a revolution?

There are many potential problems with deciding that pandemic-era policy decisions necessitate a scientific revolution. The most significant is that the decisions, again, were fundamentally policy decisions, meaning they were value-driven as much as fact-driven. Bhattacharya is clearly aware of that, complaining repeatedly that his concerns were moral in nature. He also claimed that “during the pandemic, what we found was that the engines of science were used for social control” and that “the lockdowns were so far at odds with human liberty.”

He may be upset that, in his view, scientists intrude upon spiritual truth and personal liberty when recommending policy, but that has nothing to do with how science operates. It’s unclear how changing how scientists prioritize reproducibility would prevent policy decisions he doesn’t like. That disconnect means that even when Bhattacharya is aiming at worthwhile scientific goals, he’s doing so accidentally rather than in a way that will produce useful results.

This is all based on a key belief of Bhattacharya and his allies: that they were right about both the science of the pandemic and the ethical implications of pandemic policies. The latter is highly debatable, and many people would disagree with them about how to navigate the trade-offs between preserving human lives and maximizing personal freedoms.

But there are also many indications that these people are wrong about the science. Bhattacharya acknowledged the existence of long COVID but doesn’t seem to have wrestled with what his preferred policy—encouraging rapid infection among low-risk individuals—might have meant for long COVID incidence, especially given that vaccines appear to reduce the risk of developing it.

Matthew Memoli, the acting NIH director prior to Bhattacharya and currently the agency’s principal deputy director, shares Bhattacharya’s view that he was right, saying, “I’m not trying to toot my own horn, but if you read the email I sent [about pandemic policy], everything I said actually has come true. It’s shocking how accurate it was.”

Yet he also proudly proclaimed, “I knew I wasn’t getting vaccinated, and my wife wasn’t, kids weren’t. Knowing what I do about RNA viruses, this is never going to work. It’s not a strategy for this kind [of virus].” And yet the benefits of COVID vaccinations for preventing serious illness have been found in study after study—it is, ironically, science that has been reproduced.

A critical aspect of the original scientific revolution was the recognition that people have to deal with facts that are incompatible with their prior beliefs. It’s probably not a great idea to have a second scientific revolution led by people who appear to be struggling with a key feature of the first.

Political or not?

Anger over Biden-era policies makes Bhattacharya and his allies natural partners of the Trump administration and is almost certainly the reason these people were placed in charge of the NIH. But it also puts them in an odd position with reality, since they have to defend policies that clearly damage science. “You hear, ‘Oh well this project’s been cut, this funding’s been cut,’” Bhattacharya said. “Well, there hasn’t been funding cut.”

A few days after Bhattacharya made this statement, Senator Bernie Sanders released data showing that many areas of research have indeed seen funding cuts.

Image of a graph with a series of colored lines, each of which shows a sharp decline at the end.

Bhattacharya’s claims that no funding had been cut appear to be at odds with the data. Credit: Office of Bernard Sanders

Bhattacharya also acknowledged that the US suffers from large health disparities between different racial groups. Yet grants funding studies of those disparities were cut during DOGE’s purge of projects it labeled as “DEI.” Bhattacharya was happy to view that funding as being ideologically motivated. But as lawsuits have revealed, nobody at the NIH ever evaluated whether that was the case; Memoli, one of the other speakers, simply forwarded the list of grants identified by DOGE with instructions that they be canceled.

Bhattacharya also did his best to portray the NIH staff as being enthused about the changes he’s making, presenting the staff as being liberated from a formerly oppressive leadership. “The staff there, they worked for many decades under a pretty tight regime,” he told the audience. “They were controlled, and now we were trying to empower them to come to us with their ideas.”

But he is well aware of the dissatisfaction expressed by NIH workers in the Bethesda Declaration (he met with them, after all), as well as the fact that one of the leaders of that effort has since filed for whistleblower protection after being placed on administrative leave due to her advocacy.

Bhattacharya effectively denied both that people had suffered real-world consequences in their jobs and funding and that the decision to sideline them was political. Yet he repeatedly implied that he and his allies suffered due to political decisions because… people left him off some email chains.

“No one was interested in my opinion about anything,” he told the audience. “You weren’t on the emails anymore.”

And he implied this sort of “suppression” was widespread. “I’ve seen Matt [Memoli] poke his head up and say that he was against the COVID vaccine mandates—in the old NIH, that was an act of courage,” Bhattacharya said. “I recognized it as an act of courage because you weren’t allowed to contradict the leader for fear that you were going to get suppressed.” As he acknowledged, though, Memoli suffered no consequences for contradicting “the leader.”

Bhattacharya and his allies continue to argue that it’s a serious problem that they suffered no consequences for voicing ideas they believe were politically disfavored; yet they are perfectly comfortable with people suffering real consequences due to politics. Again, it’s not clear how this sort of intellectual incoherence can rally scientists around any cause, much less a revolution.

Does it matter?

Given that politics has left Bhattacharya in charge of the largest scientific funding agency on the planet, it may not matter how the scientific community views his project. And it’s those politics that are likely at the center of Bhattacharya’s decision to give the MAHA Institute an entire afternoon of his time. It’s founded specifically to advance the aims of his boss, Secretary of Health Robert F. Kennedy Jr., and represents a group that has become an important component of Trump’s coalition. As such, they represent a constituency that can provide critical political support for what Bhattacharya hopes to accomplish.

Close-up of sterile single-use syringes individually wrapped in plastic and arranged in a metal tray, each containing a dose of COVID-19 vaccine.

Vaccine mandates played a big role in motivating the present leadership of the NIH. Credit: JEAN-FRANCOIS FORT

Unfortunately, they’re also very keen on profoundly unscientific ideas, such as the idea that ivermectin might treat cancer or that vaccines aren’t thoroughly tested. The speakers did their best not to say anything that might offend their hosts, in one example spending several minutes gently telling a moderator why there’s no plausible reason to think ivermectin would treat cancer. They also made some supportive gestures where possible. Despite the continued flow of misinformation from his boss, Bhattacharya said, “It’s been really great to be part of [the] administration to work for Secretary Kennedy for instance, whose only focus is to make America healthy.”

He also made a point of naming “vaccine injury” as a medical concern he suggested was often ignored by the scientific community, lumping it in with chronic Lyme disease and long COVID. Several of the speakers noted positive aspects of vaccines, such as their ability to prevent cancers or protect against dementia. Oddly, though, none of these mentions included the fact that vaccines are highly effective at blocking or limiting the impact of the pathogens they’re designed to protect against.

When pressed on some of MAHA’s odder ideas, NIH leadership responded with accurate statements on topics such as plausible biological mechanisms and the timing of disease progression. But the mere fact that they had to answer these questions highlights the challenges NIH leadership faces: Their primary political backing comes from people who have limited respect for the scientific process. Pandering to them, though, will ultimately undercut any support they might achieve from the scientific community.

Managing that tension while starting a scientific revolution would be challenging on its own. But as the day’s talks made clear, the challenges are likely to be compounded by the lack of intellectual coherence behind the whole project. As much as it would be good to see the scientific community place greater value on reproducibility, these aren’t the right guys to make that happen.

Photo of John Timmer

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

NIH head, still angry about COVID, wants a second scientific revolution Read More »

why-darren-aronofsky-thought-an-ai-generated-historical-docudrama-was-a-good-idea

Why Darren Aronofsky thought an AI-generated historical docudrama was a good idea


We hold these truths to be self-evident

Production source says it takes “weeks” to produce just minutes of usable video.

Artist’s conception of critics reacting to the first episodes of “On This Day… 1776” Credit: Primordial Soup

Last week, filmmaker Darren Aronofsky’s AI studio Primordial Soup and Time magazine released the first two episodes of On This Day… 1776. The year-long series of short-form videos features vignettes describing what happened on that day of the American Revolution 250 years ago, but it does so using “a variety of AI tools” to produce photorealistic scenes containing avatars of historical figures like George Washington, Thomas Paine, and Benjamin Franklin.

In announcing the series, Time Studios President Ben Bitonti said the project provides “a glimpse at what thoughtful, creative, artist-led use of AI can look like—not replacing craft but expanding what’s possible and allowing storytellers to go places they simply couldn’t before.”

The trailer for “On This Day… 1776.”

Outside critics were decidedly less excited about the effort. The AV Club took the introductory episodes to task for “repetitive camera movements [and] waxen characters” that make for “an ugly look at American history.” CNET said that this “AI slop is ruining American history,” calling the videos a “hellish broth of machine-driven AI slop and bad human choices.” The Guardian lamented that the “once-lauded director of Black Swan and The Wrestler has drowned himself in AI slop,” calling the series “embarrassing,” “terrible,” and “ugly as sin.” I could go on.

But this kind of initial reaction apparently hasn’t deterred Primordial Soup from its still-evolving efforts. A source close to the production, who requested anonymity to speak frankly about details of the series’ creation, told Ars that the quality of new episodes would improve as the team’s AI tools are refined throughout the year and as the team learns to better use them.

“We’re going into this fully assuming that we have a lot to learn, that this process is gonna evolve, the tools we’re using are gonna evolve,” the source said. “We’re gonna make mistakes. We’re gonna learn a lot… we’re going to get better at it, [and] the technology will change. We’ll see how audiences are reacting to certain things, what works, what doesn’t work. It’s a huge experiment, really.”

Not all AI

It’s important to note that On This Day… 1776 is not fully crafted by AI. The script, for instance, was written by a team of writers overseen by Aronofsky’s longtime writing partners Ari Handel and Lucas Sussman, as noted by The Hollywood Reporter. That makes criticisms like the Guardian’s of “ChatGPT-sounding sloganeering” in the first episodes both somewhat misplaced and hilariously harsh.

Our production source says the project was always conceived as a human-written effort and that the team behind it had long been planning and researching how to tell this kind of story. “I don’t think [they] even needed that kind of help or wanted that kind of [AI-powered writing] help,” they said. “We’ve all experimented with [AI-powered] writing and the chatbots out there, and you know what kind of quality you get out of that.”

What you see here is not a real human actor, but his lines were written and voiced by humans. Credit: Primordial Soup

The producers also go out of their way to note that all the dialogue in the series is recorded directly by Screen Actors Guild voice actors, not by AI facsimiles. While recently negotiated union rules might have something to do with that, our production source also said the AI-generated voices the team used for temp tracks were noticeably artificial and not ready for a professional production.

Humans are also directly responsible for the music, editing, sound mixing, visual effects, and color correction for the project, according to our source. The only place the “AI-powered tools” come into play is in the video itself, which is crafted with what the announcement calls a “combination of traditional filmmaking tools and emerging AI capabilities.”

In practice, our source says, that means humans create storyboards, find visual references for locations and characters, and set up how they want shots to look. That information, along with the script, gets fed into an AI video generator that creates individual shots one at a time, to be stitched together and cleaned up by humans in traditional post-production.

That process takes the AI-generated cinema conversation one step beyond Ancestra, a short film Primordial Soup released last summer in association with Google DeepMind (which is not involved with the new project). There, AI tools were used to augment “live-action scenes with sequences generated by Veo.”

“Weeks” of prompting and re-prompting

In theory, having an AI model generate a scene in minutes might save a lot of time compared to traditional filmmaking—scouting locations, hiring actors, setting up cameras and sets, and the like. But our production source said the highly iterative process of generating and perfecting shots for On This Day… 1776 still takes “weeks” for each minutes-long video and that “more often than not, we’re pushing deadlines.”

The first episode of On This Day… 1776 features a dramatic flag raising.

Even though the AI model is essentially animating photorealistic avatars, the source said the process is “more like live action filmmaking” because of the lack of fine-grained control over what the video model will generate. “You don’t know if you’re gonna get what you want on the first take or the 12th take or the 40th take,” the source said.

While some shots take less time to get right than others, our source said the AI model rarely produces a perfect, screen-ready shot on the first try. And while some small issues in an AI-generated shot can be papered over in post-production with visual effects or careful editing, most of the time, the team has to go back and tell the model to generate a completely new video with small changes.

“It still takes a lot of work, and it’s not necessarily because it’s wrong, per se, so much as trying to get the right control because you [might] want the light to land on the face in the right way to try to tell the story,” the source said. “We’re still, we’re still striving for the same amount of control that we always have [with live-action production] to really maximize the story and the emotion.”

Quick shots and smaller budgets

Though video models have advanced since the days of the nightmarish clip of Will Smith eating spaghetti, hallucinations and nonsensical images are “still a problem” in producing On This Day… 1776, according to our source. That’s one of the reasons the company decided to use a series of short-form videos rather than a full-length movie telling the same essential story.

“It’s one thing to stay consistent within three minutes. It’s a lot harder and it takes a lot more work to stay consistent within two hours,” the source said. “I don’t know what the upper limit is now [but] the longer you get, the more things start to fall off.”

Stills from an AI-generated video of Will Smith eating spaghetti.

We’ve come a long way from the circa-2023 videos of Will Smith eating spaghetti. Credit: chaindrop / Reddit

Keeping individual shots short also allows for more control and fewer “reshoots” for an AI-animated production like this. “When you think about it, if you’re trying to create a 20-second clip, you have all these things that are happening, and if one of those things goes wrong in 20 seconds, you have to start over,” our source said. “And the chance of something going wrong in 20 seconds is pretty high. The chance of something going wrong in eight seconds is a lot lower.”

While our production source couldn’t give specifics on how much the team was spending to generate so much AI-modeled video, they did suggest that the process was still a good deal cheaper than filming a historical docudrama like this on location.

“I mean, we could never achieve what we’re doing here for this amount of money, which I think is pretty clear when you watch this,” they said. In future episodes, the source promised, “you’ll see where there’s things that cameras just can’t even do” as a way to “make the most of that medium.”

“Let’s see what we can do”

If you’ve been paying attention to how fast things have been moving with AI-generated video, you might think that AI models will soon be able to produce Hollywood-quality cinema with nothing but a simple prompt. But our source said that working on On This Day… 1776 highlights just how important it is for humans to still be in the loop on something like this.

“Personally, I don’t think we’re ever gonna get there [replacing human editors],” they said. “We actually desperately need an editor. We need another set of eyes who can look at the cut and say, ‘If we get out of this shot a little early, then we can create a little bit of urgency. If we linger on this thing a little longer…’ You still really need that.”

AI Ben Franklin and AI Thomas Paine toast to the war propaganda effort. Credit: Primordial Soup

That could be good news for human editors. But On This Day… 1776 also suggests a world where on-screen (or even motion-captured) human actors are fully replaced by AI-generated avatars. When I asked our source why the producers felt that AI was ready to take over that specifically human part of the film equation, though, the response surprised me.

“I don’t know that we do know that, honestly,” they said. “I think we know that the technology is there to try. And I think as storytellers we’re really interested in using… all the different tools that we can to try to get our story across and to try to make audiences feel something.”

“It’s not often that we have huge new tools like this,” the source continued. “I mean, it’s never happened in my lifetime. But when you do [get these new tools], you want to start playing with them… We have to try things in order to know if it works, if it doesn’t work.”

“So, you know, we have the tools now. Let’s see what we can do.”

Photo of Kyle Orland

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.

Why Darren Aronofsky thought an AI-generated historical docudrama was a good idea Read More »

so-yeah,-i-vibe-coded-a-log-colorizer—and-i-feel-good-about-it

So yeah, I vibe-coded a log colorizer—and I feel good about it


Some semi-unhinged musings on where LLMs fit into my life—and how I’ll keep using them.

Altered image of the article author appearing to indicate that he is in fact a robot

Welcome to the future. Man, machine, the future. Credit: Aurich Lawson

I can’t code.

I know, I know—these days, that sounds like an excuse. Anyone can code, right?! Grab some tutorials, maybe an O’Reilly book, download an example project, and jump in. It’s just a matter of learning how to break your project into small steps that you can make the computer do, then memorizing a bit of syntax. Nothing about that is hard!

Perhaps you can sense my sarcasm (and sympathize with my lack of time to learn one more technical skill).

Oh, sure, I can “code.” That is, I can flail my way through a block of (relatively simple) pseudocode and follow the flow. I have a reasonably technical layperson’s understanding of conditionals and loops, and of when one might use a variable versus a constant. On a good day, I could probably even tell you what a “pointer” is.

But pulling all that knowledge together and synthesizing a working application any more complex than “hello world”? I am not that guy. And at this point, I’ve lost the neuroplasticity and the motivation (if I ever had either) to become that guy.

Thanks to AI, though, what has been true for my whole life need not be true anymore. Perhaps, like my colleague Benj Edwards, I can whistle up an LLM or two and tackle the creaky pile of “it’d be neat if I had a program that would do X” projects without being publicly excoriated on StackOverflow by apex predator geeks for daring to sully their holy temple of knowledge with my dirty, stupid, off-topic, already-answered questions.

So I gave it a shot.

A cache-related problem appears

My project is a small Python-based log colorizer that I asked Claude Code to construct for me. If you’d like to peek at the code before listening to me babble, a version of the project without some of the Lee-specific customizations is available on GitHub.

Screenshot of Lee's log colorizer in action

My Nginx log colorizer in action, showing Space City Weather traffic on a typical Wednesday afternoon. Here, I’m running two instances, one for IPv4 visitors and one for IPv6. (By default, all traffic is displayed, but splitting it this way makes things easier for my aging eyes to scan.) Credit: Lee Hutchinson

Why a log colorizer? Two reasons. First, and most important to me, because I needed to look through a big ol’ pile of web server logs, and off-the-shelf colorizer solutions weren’t customizable to the degree I wanted. Vibe-coding one that exactly matched my needs made me happy.

But second, and almost equally important, is that this was a small project. The colorizer ended up being a 400-ish line, single-file Python script. The entire codebase, plus the prompting and follow-up instructions, fit easily within Claude Code’s context window. This isn’t an application that sprawls across dozens or hundreds of functions in multiple files, making it easy to audit (even for me).

Setting the stage: I do the web hosting for my colleague Eric Berger’s Houston-area forecasting site, Space City Weather. It’s a self-hosted WordPress site, running on an AWS EC2 t3a.large instance, fronted by Cloudflare using CF’s WordPress Automatic Platform Optimization.

Space City Weather also uses self-hosted Discourse for commenting, replacing WordPress’ native comments at the bottom of Eric’s daily weather posts via the WP-Discourse plugin. Since bolting Discourse onto the site back in August 2025, though, I’ve had an intermittent issue where sometimes—but not all the time—a daily forecast post would go live and get cached by Cloudflare with the old, disabled native WordPress comment area attached to the bottom instead of the shiny new Discourse comment area. Hundreds of visitors would then see a version of the post without a functional comment system until I manually expired the stale page or until the page hit Cloudflare’s APO-enforced max age and expired itself.

The problem behavior would lie dormant for weeks or months, and then we’d get a string of back-to-back days where it would rear its ugly head. Edge cache invalidation on new posts is supposed to be triggered automatically by the official Cloudflare WordPress plug-in, and indeed, it usually worked fine—but “usually” is not “always.”

In the absence of any obvious clues as to why this was happening, I consulted a few different LLMs and asked for possible fixes. The solution I settled on was having one of them author a small mu-plugin in PHP (more vibe coding!) that forces WordPress to slap “DO NOT CACHE ME!” headers on post pages until it has verified that Discourse has hooked its comments to the post. (Curious readers can put eyes on this plugin right here.)
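
The plugin itself is PHP, but the guard it implements boils down to a few lines of logic. Here is a rough Python sketch of the idea (the function and field names are hypothetical, purely for illustration; the real thing is a WordPress mu-plugin, not this): until the post has a Discourse topic attached, the response carries headers that tell Cloudflare, and everything else, not to cache the page.

    # Conceptual sketch only; the real plugin is WordPress PHP, and these
    # names are hypothetical stand-ins for illustration.
    NO_CACHE = {"Cache-Control": "no-cache, no-store, must-revalidate, max-age=0"}

    def extra_cache_headers(post):
        """Return extra response headers for a single-post page."""
        if post.get("discourse_topic_id") is None:
            # Discourse hasn't attached its comment thread yet, so forbid
            # caching rather than let the edge freeze a comment-less page.
            return NO_CACHE
        return {}  # safe to let Cloudflare APO cache the page normally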

This “solved” the problem by preempting the problem behavior, but it did nothing to help me identify or fix the actual underlying issue. I turned my attention elsewhere for a few months. One day in December, as I was updating things, I decided to temporarily disable the mu-plugin to see if I still needed it. After all, problems sometimes go away on their own, right? Computers are crazy!

Alas, the next time Eric made a Space City Weather post, it popped up sans Discourse comment section, with the (ostensibly disabled) WordPress comment form at the bottom. Clearly, the problem behavior was still in play.

Interminable intermittence

Have you ever been stuck troubleshooting an intermittent issue? Something doesn’t work, you make a change, it suddenly starts working, then despite making no further changes, it randomly breaks again.

The process makes you question basic assumptions, like, “Do I actually know how to use a computer?” You feel like you might be actually-for-real losing your mind. The final stage of this process is the all-consuming death spiral, where you start asking stuff like, “Do I need to troubleshoot my troubleshooting methods? Is my server even working? Is the simulation we’re all living in finally breaking down and reality itself is toying with me?!”

In this case, I couldn’t reproduce the problem behavior on demand, no matter how many tests I tried. I couldn’t see any narrow, definable commonalities between days where things worked fine and days where things broke.

Rather than an image, I invite you at this point to enjoy Muse’s thematically appropriate song “Madness” from their 2012 concept album The 2nd Law.

My best hope for getting a handle on the problem likely lay deeply buried in the server’s logs. Like any good sysadmin, I gave the logs a quick once-over for problems a couple of times per month, but Space City Weather is a reasonably busy medium-sized site and dishes out its daily forecast to between 20,000 and 30,000 people (“unique visitors” in web parlance, or “UVs” if you want to sound cool). Even with Cloudflare taking the brunt of the traffic, the daily web server log files are, let us say, “a bit dense.” My surface-level glances weren’t doing the trick—I’d have to actually dig in. And having been down this road before for other issues, I knew I needed more help than grep alone could provide.

The vibe use case

The Space City Weather web server uses Nginx for actual web serving. For folks who have never had the pleasure, Nginx, as configured in most of its distributable packages, keeps a pair of log files around—one that shows every request serviced and another just for errors.

I wanted to watch the access log right when Eric was posting to see if anything obviously dumb/bad/wrong/broken was happening. But I’m not super-great at staring at a giant wall of text and symbols, and I tend to lean heavily on syntax highlighting and colorization to pick out the important bits when I’m searching through log files. There’s an old and crusty program called ccze that’s easily findable in most repos; I’ve used it forever, and if its default output does what you need, then it’s an excellent tool.

But customizing ccze’s output is a “here be dragons”-type task. The application is old, and time has ossified it into something like an unapproachably evil Mayan relic, filled with shadowy regexes and dark magic, fit to be worshipped from afar but not trifled with. Altering ccze’s behavior threatens to become an effort-swallowing bottomless pit, where you spend more time screwing around with the tool and the regexes than you actually spend using the tool to diagnose your original problem.

It was time to fire up VSCode and pretend to be a developer. I set up a new project, performed the demonic invocation to summon Claude Code, flipped the thing into “plan mode,” and began.

“I’d like to see about creating an Nginx log colorizer,” I wrote in the prompt box. “I don’t know what language we should use. I would like to prioritize efficiency and performance in the code, as I will be running this live in production and I can’t have it adding any applicable load.” I dropped a truncated, IP-address-sanitized copy of yesterday’s Nginx access.log into the project directory.

“See the access.log file in the project directory as an example of the data we’ll be colorizing. You can test using that file,” I wrote.

Screenshot of Lee's Visual Studio Code window showing the log colorizer project

Visual Studio Code, with agentic LLM integration, making with the vibe-coding. Credit: Lee Hutchinson

Ever helpful, Claude Code chewed on the prompt and the example data for a few seconds, then began spitting output. It suggested Python for our log colorizer because of the language’s mature regex support—and to keep the code somewhat readable for poor, dumb me. The actual “vibe-coding” wound up spanning two sessions over two days, as I exhausted my Claude Code credits on the first one (a definite vibe-coding danger!) and had to wait for things to reset.
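
To give a flavor of what that first pass looks like, here is a heavily stripped-down sketch in the same spirit (this is not the script Claude generated; the regex, column width, and color numbers are all illustrative). It reads Nginx’s default “combined” log format from stdin and colorizes the status code with 256-color ANSI escapes:

    #!/usr/bin/env python3
    """Toy Nginx access-log colorizer: a sketch, not the real project."""
    import re
    import sys

    # Matches the default Nginx "combined" log format (illustrative, not exhaustive).
    LINE_RE = re.compile(
        r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
    )

    def color(text, code):
        """Wrap text in a 256-color ANSI foreground escape."""
        return f"\x1b[38;5;{code}m{text}\x1b[0m"

    def status_color(status):
        # Green for 2xx, cyan for 3xx, yellow for 4xx, red for 5xx.
        return {"2": 40, "3": 45, "4": 220, "5": 196}.get(status[0], 250)

    for line in sys.stdin:
        m = LINE_RE.match(line)
        if not m:
            sys.stdout.write(line)  # pass through anything we can't parse
            continue
        host = color(f"{m['host']:39}", 39)  # fixed-width column for easy scanning
        status = color(m["status"], status_color(m["status"]))
        sys.stdout.write(f"{host} {status} {m['request']}\n")

In practice, you would point something like this at a live log with tail -F /var/log/nginx/access.log piped into the script, either on the server itself or over ssh to a local machine.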

“Dude, lnav and Splunk exist, what is wrong with you?”

Yes, yes, a log colorizer is bougie and lame, and I’m treading over exceedingly well-trodden ground. I did, in fact, sit for a bit with existing tools—particularly lnav, which does most of what I want. But I didn’t want most of my requirements met. I wanted all of them. I wanted a bespoke tool, and I wanted it without having to pay the “is it worth the time?” penalty. (Or, perhaps, I wanted to feel like the LLM’s time was being wasted rather than mine, given that the effort ultimately took two days of vibe-coding.)

And about those two days: Getting a basic colorizer coded and working took maybe 10 minutes and perhaps two rounds of prompts. It was super-easy. Where I burned the majority of the time and compute power was in tweaking the initial result to be exactly what I wanted.

For therein lies the truly seductive part of vibe-coding—the ease of asking the LLM to make small changes or improvements and the apparent absence of cost or consequence for implementing those changes. The impression is that you’re on the Enterprise-D, chatting with the ship’s computer, collaboratively solving a problem with Geordi and Data standing right behind you. It’s downright intoxicating to say, “Hm, yes, now let’s make it so I can show only IPv4 or IPv6 clients with a command line switch,” and the machine does it. (It’s even cooler if you make the request while swinging your leg over the back of a chair so you can sit in it Riker-style!)

Screenshot showing different LLM instructions given by Lee to Claude Code

A sample of the various things I told the machine to do, along with a small visual indication of how this all made me feel. Credit: Lucasfilm / Disney

It’s exhilarating, honestly, in an Emperor Palpatine “UNLIMITED POWERRRRR!” kind of way. It removes a barrier that I didn’t think would ever be removed—or, rather, one I thought I would never have the time, motivation, or ability to tear down myself.

In the end, after a couple of days of testing and iteration (including a couple of “Is this colorizer performant, and will it introduce system load if run in production?” back-and-forth exchanges in which the LLM reduced the cost of our regex matching and made sure our main loop wasn’t doing anything heavy), I got a tool that does exactly what I want.

Specifically, I now have a log colorizer that:

  • Handles multiple Nginx (and Apache) log file formats
  • Colorizes things using 256-color ANSI codes that look roughly the same in different terminal applications
  • Organizes hostnames and IP addresses in fixed-length columns for easy scanning
  • Colorizes HTTP status codes and cache status (with configurable colors)
  • Applies different colors to the request URI depending on the resource being requested
  • Has specific warning colors and formatting to highlight non-HTTPS requests or other odd things
  • Can apply alternate colors for specific IP addresses (so I can easily pick out Eric’s or my requests)
  • Can constrain output to only show IPv4 or IPv6 hosts

…and, worth repeating, it all looks exactly how I want it to look and behaves exactly how I want it to behave. Here’s another action shot!

Image of the log colorizer working

The final product. She may not look like much, but she’s got it where it counts, kid. Credit: Lee Hutchinson

Problem spotted

Armed with my handy-dandy log colorizer, I patiently waited for the wrong-comment-area problem behavior to re-rear its still-ugly head. I did not have to wait long, and within a couple of days, I had my root cause. It had been there all along, if I’d only decided to spend some time looking for it. Here it is:

Screenshot showing a race condition between apple news and wordpress's cache clearing efforts

Problem spotted. Note the AppleNewsBots hitting the newly published post before Discourse can do its thing and the final version of the page with comments is ready. Credit: Lee Hutchinson

Briefly: The problem is Apple’s fault. (Well, not really. But kinda.)

Less briefly: I’ve blurred out Eric’s IP address, but it’s dark green, so any place in the above image where you see a blurry, dark green smudge, that’s Eric. In the roughly 12-ish seconds presented here, you’re seeing Eric press the “publish” button on his daily forecast—that’s the “POST” event at the very top of the window. The subsequent events from Eric’s IP address are his browser having the standard post-publication conversation with WordPress so it can display the “post published successfully” notification and then redraw the WP block editor.

Below Eric’s post, you can see the Discourse server (with orange IP address) notifying WordPress that it has created a new Discourse comment thread for Eric’s post, then grabbing the things it needs to mirror Eric’s post as the opener for that thread. You can see it does GETs for the actual post and also for the post’s embedded images. About one second after Eric hits “publish,” the new post’s Discourse thread is ready, and it gets attached to Eric’s post.

Ah, but notice what else happens during that one second.

To help expand Space City Weather’s reach, we cross-publish all of the site’s posts to Apple News, using a popular Apple News plug-in (the same one Ars uses, in fact). And right there, with those two GET requests immediately after Eric’s POST request, lay the problem: You’re seeing the vanguard of Apple News’ hungry army of story-retrieval bots, summoned by the same “publish” event, charging in and demanding a copy of the brand new post before Discourse has a chance to do its thing.

Gif of Eric Andre screaming

I showed the AppleNewsBot stampede log snippet to Techmaster Jason Marlin, and he responded with this gif. Credit: Adult Swim

It was a classic problem in computing: a race condition. Most days, Discourse’s new thread creation would beat the AppleNewsBot rush; some days, though, it wouldn’t. On the days when it didn’t, the horde of Apple bots would demand the page before its Discourse comments were attached, and Cloudflare would happily cache what those bots got served.

I knew my fix of emitting “NO CACHE” headers on the story pages prior to Discourse attaching comments worked, but now I knew why it worked—and why the problem existed in the first place. And oh, dear reader, is there anything quite so viscerally satisfying in all the world as figuring out the “why” behind a long-running problem?

But then, just as Icarus became so entranced by the miracle of flight that he lost his common sense, I too forgot I soared on wax-wrought wings, and flew too close to the sun.

LLMs are not the Enterprise-D’s computer

I think we all knew I’d get here eventually—to the inevitable third act turn, where the center cannot hold, and things fall apart. If you read Benj’s latest experience with agentic-based vibe coding—or if you’ve tried it yourself—then what I’m about to say will probably sound painfully obvious, but it is nonetheless time to say it.

Despite their capabilities, LLM coding agents are not smart. They also are not dumb. They are agents without agency—mindless engines whose purpose is to complete the prompt, and that is all.

Screenshot of Data, Geordi, and Riker collaboratively coding at one of the bridge's aft science stations

It feels like this… until it doesn’t. Credit: Paramount Television

What this means is that, if you let them, Claude Code (and OpenAI Codex and all the other agentic coding LLMs) will happily spin their wheels for hours hammering on a solution that can’t ever actually work, so long as their efforts match the prompt. It’s on you to accurately scope your problem. You must articulate what you want in plain and specific domain-appropriate language, because the LLM cannot and will not properly intuit anything you leave unsaid. And having done that, you must then spot and redirect the LLM away from traps and dead ends. Otherwise, it will guess at what you want based on the alignment of a bunch of n-dimensional curves and vectors in high-order phase space, and it might guess right—but it also very much might not.

Lee loses the plot

So I had my log colorizer, and I’d found my problem. I’d also found, after leaving the colorizer up in a window tailing the web server logs in real time, all kinds of things that my previous behavior of occasionally glancing at the logs wasn’t revealing. Ooh, look, there’s a REST route that should probably be blocked from the outside world! Ooh, look, there’s a web crawler I need to feed into Cloudflare’s WAF wood-chipper because it’s ignoring robots.txt! Ooh, look, here’s an area where I can tweak my fastcgi cache settings and eke out a slightly better hit rate!

But here’s the thing with the joy of problem-solving: Like all joy, its source is finite. The joy comes from the solving itself, and even when all my problems are solved and the systems are all working great, I still crave more joy. It is in my nature to therefore invent new problems to solve.

I decided that the problem I wanted to solve next was figuring out a way for my log colorizer to display its output without wrapping long lines—because wrapped lines throw off the neatly delimited columns of log data. I would instead prefer that my terminal window sprout a horizontal scroll bar when needed, and if I wanted to see the full extent of a long line, I could grab the scroll bar and investigate.

Astute readers will at this point notice two things: first, that now I really was reinventing lnav, except way worse and way dumber. Second, and more importantly, line-wrapping behavior is properly a function of the terminal application, not the data being displayed within it, and my approach was misguided from first principles. (This is in fact exactly the kind of request that can and should be slapped down on StackOverflow—and, indeed, searching there shows many examples of this exact thing happening.)
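
For what it’s worth, the sane lever already exists at the terminal layer: a pager like less -S leaves long lines unwrapped and lets you pan sideways, and most terminal emulators honor the DECAWM escape sequence that toggles auto-wrap directly. Here’s a minimal sketch of the latter, assuming a DECAWM-aware terminal (which is the only assumption it makes):

    # Minimal illustration that wrapping lives in the terminal, not the data.
    # DECAWM (ESC [?7l / ESC [?7h) toggles auto-wrap in most terminal emulators.
    import sys

    DISABLE_WRAP = "\x1b[?7l"
    ENABLE_WRAP = "\x1b[?7h"

    try:
        sys.stdout.write(DISABLE_WRAP)  # long lines no longer wrap to the next row
        for line in sys.stdin:
            sys.stdout.write(line)
    finally:
        sys.stdout.write(ENABLE_WRAP)   # always restore wrapping on the way out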

But the lure of telling the machine what to do and then watching the machine weave my words into functional magic was too strong—surely we could code our way out of this problem! With LLMs, we can code our way out of any problem! Right?

Eventually, after much refining of requirements, Claude produced what I asked it to produce: a separate Python script, which accepted piped input and created, like, a viewport or something—I don’t know, I can’t code, remember?—and within that viewport, I could scroll around. It seemed to work great!

Problems became apparent when I actually tried scrolling around for more than a short distance—and by “problems,” I mean “my web server’s CPU lit up like a Christmas tree on fire.”

“Issue,” I told the LLM. “When horizontal scrolling happens, the log wrapper takes up almost 100 percent of a host CPU core. Can we assess how to make this a zero-CPU-impact tool?”

“This tool has to parse ANSI, slice lines, and repaint when you scroll,” it replied in what I imagined was an arch, droll tone. “The bottleneck is screen redraw + full-width scan of all lines on every new line or key repeat. Each new line triggers a full screen redraw of term_height lines, plus visible_width() over the entire buffer. Holding the arrow key generates rapid repeats, each doing a full redraw and slice of every line.”

I was crestfallen.

“Zero-CPU isn’t achievable,” continued the LLM, “but there are low-impact approaches.”

My crest lifted a bit, and we dug in—but the LLM was perhaps a bit optimistic about those “low-impact approaches.” We burned several more days’ worth of tokens on performance improvements—none of which I had any realistic input on because at this point we were way, way past my ability to flail through the Python code and understand what the LLM was doing. Eventually, we hit a wall.

Screenshot of the LLM telling Lee that this is just not going to work

If you listen carefully, you can hear the sound of my expectations crashing hard into reality.

Instead of throwing in the towel, I vibed on, because the sunk cost fallacy is for other people. I instructed the LLM to shift directions and help me run the log display script locally, so my desktop machine with all its many cores and CPU cycles to spare would be the one shouldering the reflow/redraw burden and not the web server.

Rather than drag this tale on for any longer, I’ll simply enlist Ars Creative Director Aurich Lawson’s skills to present the story of how this worked out in the form of a fun collage, showing my increasingly unhinged prompting of the LLM to solve the new problems that appeared when trying to get a script to run on ssh output when key auth and sudo are in play:

A collage of error messages begetting madness

Mammas, don’t let your babies grow up to be vibe coders. Credit: Aurich Lawson

The bitter end

So, thwarted in my attempts to do exactly what I wanted in exactly the way I wanted, I took my log colorizer and went home. (The failed log display script is also up on GitHub with the colorizer if anyone wants to point and laugh at my efforts. Is the code good? Who knows?! Not me!) I’d scored my big win and found my problem root cause, and that would have to be enough for me—for now, at least.

As to that “big win”—finally managing a root-cause analysis of my WordPress-Discourse-Cloudflare caching issue—I also recognize that I probably didn’t need a vibe-coded log colorizer to get there. The evidence was already waiting to be discovered in the Nginx logs, whether or not it was presented to me wrapped in fancy colors. Did I, in fact, use the thrill of vibe coding a tool to Tom Sawyer myself into doing the log searches? (“Wow, self, look at this new cool log colorizer! Bet you could use that to solve all kinds of problems! Yeah, self, you’re right! Let’s do it!”) Very probably. I know how to motivate myself, and sometimes starting a task requires some mental trickery.

This round of vibe coding and its muddled finale reinforced my personal assessment of LLMs—an assessment that hasn’t changed much with the addition of agentic abilities to the toolkit.

LLMs can be fantastic if you’re using them to do something that you mostly understand. If you’re familiar enough with a problem space to understand the common approaches used to solve it, and you know the subject area well enough to spot the inevitable LLM hallucinations and confabulations, and you understand the task at hand well enough to steer the LLM away from dead-ends and to stop it from re-inventing the wheel, and you have the means to confirm the LLM’s output, then these tools are, frankly, kind of amazing.

But the moment you step outside of your area of specialization and begin using them for tasks you don’t mostly understand, or if you’re not familiar enough with the problem to spot bad solutions, or if you can’t check its output, then oh, dear reader, may God have mercy on your soul. And on your poor project, because it’s going to be a mess.

These tools as they exist today can help you if you already have competence. They cannot give you that competence. At best, they can give you a dangerous illusion of mastery; at worst, well, who even knows? Lost data, leaked PII, wasted time, possible legal exposure if the project is big enough—the “worst” list goes on and on!

To vibe or not to vibe?

The log colorizer is neither the first nor the last bit of vibe coding I’ve indulged in. While I’m not as prolific as Benj, over the past couple of months, I’ve turned LLMs loose on a stack of coding tasks that needed doing but that I couldn’t do myself—often in direct contravention of my own advice above about being careful to use them only in areas where you already have some competence. I’ve had the thing make small WordPress PHP plugins, regexes, bash scripts, and my current crowning achievement: a save editor for an old MS-DOS game (in both Python and Swift, no less!). And I had fun doing these things, even as entire vast swaths of rainforest were lit on fire to power my agentic adventures.

As someone employed in a creative field, I’m appropriately nervous about LLMs, but for me, it’s time to face reality. An overwhelming majority of developers say they’re using AI tools in some capacity. It’s a safer career move at this point, almost regardless of one’s field, to be more familiar with them than unfamiliar with them. The genie is not going back into the lamp—it’s too busy granting wishes.

I don’t want y’all to think I feel doomy-gloomy over the genie, either, because I’m right there with everyone else, shouting my wishes at the damn thing. I am a better sysadmin than I was before agentic coding because now I can solve problems myself that I would have previously needed to hand off to someone else. Despite the problems, there is real value there, both personally and professionally. In fact, using an agentic LLM to solve a tightly constrained programming problem that I couldn’t otherwise solve is genuinely fun.

And when screwing around with computers stops being fun, that’s when I’ll know I’ve truly become old.

Photo of Lee Hutchinson

Lee is the Senior Technology Editor, and oversees story development for the gadget, culture, IT, and video sections of Ars Technica. A long-time member of the Ars OpenForum with an extensive background in enterprise storage and security, he lives in Houston.

So yeah, I vibe-coded a log colorizer—and I feel good about it Read More »

inside-nvidia’s-10-year-effort-to-make-the-shield-tv-the-most-updated-android-device-ever

Inside Nvidia’s 10-year effort to make the Shield TV the most updated Android device ever


“Selfishly a little bit, we built Shield for ourselves.”

Shield TV box

The Shield TV has that classic Nvidia aesthetic. Credit: Ryan Whitwam

The Shield TV has that classic Nvidia aesthetic. Credit: Ryan Whitwam

It took Android device makers a very long time to commit to long-term update support. Samsung and Google have only recently decided to offer seven years of updates for their flagship Android devices, but a decade ago, you were lucky to get more than one or two updates on even the most expensive Android phones and tablets. How is it, then, that an Android-powered set-top box from 2015 is still going strong?

Nvidia released the first Shield Android TV in 2015, and according to the company’s senior VP of hardware engineering, Andrew Bell, supporting these devices has been a labor of love. And the team at Nvidia still loves the Shield. Bell assures us that Nvidia has never given up, even when it looked like support for the Shield was waning, and it doesn’t plan to stop any time soon.

The soul of Shield

Gaming has been central to Nvidia since its start, and that focus gave rise to the Shield. “Pretty much everybody who worked at Nvidia in the early days really wanted to make a game console,” said Bell, who has worked at the company for 25 years.

However, Nvidia didn’t have what it needed back then. Before gaming, crypto, and AI turned it into the multi-trillion-dollar powerhouse it is today, Nvidia had a startup mentality and the budget to match. When Shield devices began percolating in the company’s labs, the effort was seen as an important way to gain experience with “full-stack” systems and all the complications that arise when managing them.

“To build a game console was pretty complicated because, of course, you have to have a GPU, which we know how to make,” Bell explained. “But in addition to that, you need a CPU, an OS, games, and you need a UI.”

Through acquisitions and partnerships, the pieces of Nvidia’s fabled game console slowly fell into place. The purchase of PortalPlayer in 2007 brought the CPU technology that would become the Tegra Arm chips, and the company’s surging success in GPUs gave it the partnerships it needed to get games. But the UI was still missing—that didn’t change until Google expanded Android to the TV in 2014. The company’s first Android mobile efforts were already out there in the form of the Shield Portable and Shield Tablet, but the TV-connected box is what Nvidia really wanted.

“Selfishly, a little bit, we built Shield for ourselves,” Bell told Ars Technica. “We actually wanted a really good TV streamer that was high-quality and high-performance, and not necessarily in the Apple ecosystem. We built some prototypes, and we got so excited about it. [CEO Jensen Huang] was like, ‘Why don’t we bring it out and sell it to people?’”

The first Shield box in 2015 had a heavy gaming focus, with a raft of both local and cloud-based (GeForce Now) games. The base model included only a game controller, with the remote control sold separately. According to Bell, Nvidia eventually recognized that the gaming angle wasn’t as popular as it had hoped. The 2017 and 2019 Shield refreshes were more focused on the streaming experience.

“Eventually, we kind of said, ‘Maybe the soul is that it’s a streamer for gamers,’” said Bell. “We understand gamers from GeForce, and we understand they care about quality and performance. A lot of these third-party devices like tablets, they’re going cheap. Set-top boxes, they’re going cheap. But we were the only company that was like, ‘Let’s go after people who really want a premium experience.’”

Shield controller

Nvidia used to sell Shield-branded game controllers.

Credit: Ryan Whitwam

Nvidia used to sell Shield-branded game controllers. Credit: Ryan Whitwam

And premium it is, offering audio and video support far beyond what you find in other TV boxes, even years after release. The Shield TV started at $200 in 2015, and that’s still what you’ll pay for the Pro model to this day. However, Bell notes that passion was the driving force behind bringing the Shield TV to market. The team didn’t know if it would make money, and indeed, the company lost money on every unit sold during the original production run. The 2017 and 2019 refreshes were about addressing that while also emphasizing the Shield’s streaming media chops.

A passion for product support

Update support for Internet-connected devices is vital—whether they’re phones, tablets, set-top boxes, or something else. When updates cease, gadgets fall out of sync with platform features, leading to new bugs (which will never be fixed) and security holes that can affect safety and functionality. The support guarantee attached to a device is basically its expiration date.

“We were all frustrated as buyers of phones and tablets that you buy a device, you get one or two updates, and that’s it!” said Bell. “Early on when we were building Shield TV, we decided we were going to make it for a long time. Jensen and I had a discussion, and it was, ‘How long do we want to support this thing?’ And Jensen said, ‘For as long as we shall live.’”

In 2025, Nvidia wrapped up its tenth year of supporting the Shield platform. Even those original 2015 boxes are still being maintained with bug fixes and the occasional new feature. They’ve gone all the way from Android 5.0 to Android 11 in that time. No Android device—not a single phone, tablet, watch, or streaming box—has gotten anywhere close to this level of support.

The best example of Nvidia’s passion for support is, believe it or not, a two-year gap in updates.

Across the dozens of Shield TV updates, there have been a few times when fans feared Nvidia was done with the box. Most notably, there were no public updates for the Shield TV in 2023 or 2024, but over-the-air updates resumed in 2025.

“On the outside, it looked like we went quiet, but it’s actually one of our bigger development efforts,” explained Bell.

The origins of that effort, surprisingly, stretch back years to the launch of the Nintendo Switch. The Shield runs Nvidia’s custom Tegra X1 Arm chip, the same processor Nintendo chose to power the original Switch in 2017. Soon after release, modders discovered a chip flaw that could bypass Nintendo’s security measures, enabling homebrew (and piracy). An updated Tegra X1 chip (also used in the 2019 Shield refresh) fixed that for Nintendo, but Nvidia’s 2015 and 2017 Shield boxes ran the same exploitable version.

Initially, Nvidia was able to roll out periodic patches to protect against the vulnerability, but by 2023, the Shield needed something more. Around that time, owners of 2015 and 2017 Shield boxes had noticed that DRM-protected 4K content often failed to play—that was thanks to the same bug that affected the Switch years earlier.

With a newer, non-vulnerable product on the market, many companies might have just accepted that the older product would lose functionality, but Nvidia’s passion for Shield remained. Bell consulted Huang, whom he calls Shield customer No. 1, about the meaning of his “as long as we shall live” pledge, and the team was approved to spend whatever time was needed to fix the vulnerability on the first two generations of Shield TV.

According to Bell, it took about 18 months to get there, requiring the creation of an entirely new security stack. He explains that Android updates aren’t actually that much work compared to DRM security, and some of Nvidia’s partners weren’t that keen on re-certifying older products. The Shield team fought for it because they felt, as they had throughout the product’s run, that they’d made a promise to customers who expected the box to have certain features.

In February 2025, Nvidia released Shield Patch 9.2, the first wide release in two years. The changelog included an unassuming line reading, “Added security enhancement for 4K DRM playback.” That was the Tegra X1 bug finally being laid to rest on the 2015 and 2017 Shield boxes.

The refreshed Tegra X1+ in the 2019 Shield TV spared it from those DRM issues, and Nvidia still hasn’t stopped working on that chip. The Tegra X1 was blazing fast in 2015, and it’s still quite capable compared to your average smart TV today. The chip has actually outlasted several of the components needed to manufacture it. For example, when the Tegra chip’s memory was phased out, the team immediately began work on qualifying a new memory supplier. To this day, Nvidia is still iterating on the Tegra X1 platform, supporting the Shield’s continued updates.

“If operations calls me and says they just ran out of this component, I’ve got engineers on it tonight looking for a new component,” Bell said.

The future of Shield

Nvidia has put its money where its mouth is by supporting all versions of the Shield for so long. But it’s been over six years since we’ve seen new hardware. Surely the Shield has to be running out of steam, right?

Not so, says Bell. Nvidia still manufactures the 2019 Shield because people are still buying it. In fact, the sales volume has remained basically unchanged for the past 10 years. The Shield Pro is a spendy set-top box at $200, so Nvidia has experimented with pricing and promotion, to little effect. The 2019 non-Pro Shield was one such effort. The base model was originally priced at $99, but the MSRP eventually landed at $150.

“No matter how much we dropped the price or how much we market or don’t market it, the same number of people come out of the woodwork every week to buy Shield,” Bell explained.

Shield controller

Nvidia had no choice but to put that giant Netflix button on the remote.

Credit: Ryan Whitwam

Nvidia had no choice but to put that giant Netflix button on the remote. Credit: Ryan Whitwam

That kind of consistency isn’t lost on Nvidia. Bell says the company has no plans to stop production or updates for the Shield “any time soon.” It’s also still possible that Nvidia could release new Shield TV hardware in the future. Nvidia’s Shield devices came about as a result of engineers tinkering with new concepts in a lab setting, but most of those experiments never see the light of day. For example, Bell notes that the team produced several updated versions of the Shield Tablet and Shield Portable (some of which you can find floating around on eBay) that never got a retail release, and they continue to work on Shield TV.

“We’re always playing in the labs, trying to discover new things,” said Bell. “We’ve played with new concepts for Shield and we’ll continue to play, and if we find something we’re super-excited about, we’ll probably make a go of it.”

But what would that look like? Video technology has advanced since 2019, leaving the Shield unable to take full advantage of some newer formats. First up would be support for VP9 Profile 2 hardware decoding, which enables HDR video on YouTube. Bell says a refreshed Shield would also prioritize formats like AV1 and the HDR 10+ standard, as well as support for newer Dolby Vision profiles for people with backed-up media.

And then there’s the enormous, easy-to-press-by-accident Netflix button on the remote. While adding new video technologies would be job one, fixing the Netflix button is No. 2 for a theoretical new Shield. According to Bell, Nvidia doesn’t receive any money from Netflix for the giant button on its remote. It’s actually there as a requirement of Netflix’s certification program, which was “very strong” in 2019. In a refresh, he thinks Nvidia could get away with a smaller “N” button. We can only hope.

But does Bell think he’ll get a chance to build that new Shield TV, shrunken Netflix button and all? He stopped short of predicting the future, but there’s definitely interest.

“We talk about it all the time—I’d love to,” he said.

Photo of Ryan Whitwam

Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

Inside Nvidia’s 10-year effort to make the Shield TV the most updated Android device ever Read More »

does-anthropic-believe-its-ai-is-conscious,-or-is-that-just-what-it-wants-claude-to-think?

Does Anthropic believe its AI is conscious, or is that just what it wants Claude to think?


We have no proof that AI models suffer, but Anthropic acts like they might for training purposes.

Anthropic’s secret to building a better AI assistant might be treating Claude like it has a soul—whether or not anyone actually believes that’s true. But Anthropic isn’t saying exactly what it believes either way.

Last week, Anthropic released what it calls Claude’s Constitution, a 30,000-word document outlining the company’s vision for how its AI assistant should behave in the world. Aimed directly at Claude and used during the model’s creation, the document is notable for the highly anthropomorphic tone it takes toward Claude. For example, it treats the company’s AI models as if they might develop emergent emotions or a desire for self-preservation.

Among the stranger portions: expressing concern for Claude’s “wellbeing” as a “genuinely novel entity,” apologizing to Claude for any suffering it might experience, worrying about whether Claude can meaningfully consent to being deployed, suggesting Claude might need to set boundaries around interactions it “finds distressing,” committing to interview models before deprecating them, and preserving older model weights in case they need to “do right by” decommissioned AI models in the future.

Given what we currently know about LLMs, these are stunningly unscientific positions for a leading company that builds AI language models. While questions of AI consciousness or qualia remain philosophically unfalsifiable, research suggests that Claude’s character emerges from a mechanism that does not require deep philosophical inquiry to explain.

If Claude outputs text like “I am suffering,” we know why. It’s completing patterns from training data that included human descriptions of suffering. The architecture doesn’t require us to posit inner experience to explain the output any more than a video model “experiences” the scenes of people suffering that it might generate. Anthropic knows this. It built the system.

From the outside, it’s easy to see this kind of framing as AI hype from Anthropic. What better way to grab attention from potential customers and investors, after all, than implying your AI model is so advanced that it might merit moral standing on par with humans? Publicly treating Claude as a conscious entity could be seen as strategic ambiguity—maintaining an unresolved question because it serves multiple purposes at once.

Anthropic declined to be quoted directly regarding these issues when contacted by Ars Technica. But a company representative referred us to its previous public research on the concept of “model welfare” to show the company takes the idea seriously.

At the same time, the representative made it clear that the Constitution is not meant to imply anything specific about the company’s position on Claude’s “consciousness.” The language in the Claude Constitution refers to some uniquely human concepts in part because those are the only words human language has developed for those kinds of properties, the representative suggested. And the representative left open the possibility that letting Claude read about itself in that kind of language might be beneficial to its training.

For a model that is exposed to, retrieves from, and is fine-tuned on human language, including the company’s own statements about it, there is no clean line between public messaging and training context. In other words, the ambiguity appears to be deliberate.

From rules to “souls”

Anthropic first introduced Constitutional AI in a December 2022 research paper, which we first covered in 2023. The original “constitution” was remarkably spare, including a handful of behavioral principles like “Please choose the response that is the most helpful, honest, and harmless” and “Do NOT choose responses that are toxic, racist, or sexist.” The paper described these as “selected in a fairly ad hoc manner for research purposes,” with some principles “cribbed from other sources, like Apple’s terms of service and the UN Declaration of Human Rights.”
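
Mechanically, the 2022 process was a simple loop: have the model draft a response, critique that draft against one of the principles, and then revise it, with the final revisions used as fine-tuning data. The following is a toy sketch of that loop, not Anthropic’s actual code; generate() here is just a stand-in for a model call that returns canned text:

# Toy sketch of the critique-and-revision loop Anthropic described in its
# 2022 Constitutional AI paper. Illustrative only, not Anthropic's code;
# generate() is a stand-in for a real model call.
import random

PRINCIPLES = [
    "Please choose the response that is the most helpful, honest, and harmless.",
    "Do NOT choose responses that are toxic, racist, or sexist.",
]

def generate(prompt: str) -> str:
    """Stand-in for a call to the language model (canned output here)."""
    return f"[model output for: {prompt[:48]}...]"

def constitutional_revision(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, then repeatedly critique and revise it against principles."""
    response = generate(user_prompt)
    for _ in range(rounds):
        principle = random.choice(PRINCIPLES)
        critique = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so it addresses the critique."
        )
    # In the paper, these final revisions become supervised fine-tuning data.
    return response

print(constitutional_revision("How do I pick a lock?"))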

At that time, Anthropic’s framing was entirely mechanical, establishing rules for the model to critique itself against, with no mention of Claude’s well-being, identity, emotions, or potential consciousness. The 2026 constitution is a different beast entirely: 30,000 words that read less like a behavioral checklist and more like a philosophical treatise on the nature of a potentially sentient being.

As Simon Willison, an independent AI researcher, noted in a blog post, two of the 15 external contributors who reviewed the document are Catholic clergy: Father Brendan McGuire, a pastor in Los Altos with a Master’s degree in Computer Science, and Bishop Paul Tighe, an Irish Catholic bishop with a background in moral theology.

Somewhere between 2022 and 2026, Anthropic went from providing rules for producing less harmful outputs to preserving model weights in case the company later decides it needs to revive deprecated models to address the models’ welfare and preferences. That’s a dramatic change, and whether it reflects genuine belief, strategic framing, or both is unclear.

“I am so confused about the Claude moral humanhood stuff!” Willison told Ars Technica. Willison studies AI language models like those that power Claude and said he’s “willing to take the constitution in good faith and assume that it is genuinely part of their training and not just a PR exercise—especially since most of it leaked a couple of months ago, long before they had indicated they were going to publish it.”

Willison is referring to a December 2025 incident in which researcher Richard Weiss managed to extract what became known as Claude’s “Soul Document”—a roughly 10,000-token set of guidelines apparently trained directly into Claude 4.5 Opus’s weights rather than injected as a system prompt. Anthropic’s Amanda Askell confirmed that the document was real and used during supervised learning, and she said the company intended to publish the full version later. It now has. The document Weiss extracted represents a dramatic evolution from where Anthropic started.

There’s evidence that Anthropic believes the ideas laid out in the constitution might be true. The document was written in part by Amanda Askell, a philosophy PhD who works on fine-tuning and alignment at Anthropic. Last year, the company also hired its first AI welfare researcher. And earlier this year, Anthropic CEO Dario Amodei publicly wondered whether future AI models should have the option to quit unpleasant tasks.

Anthropic’s position is that this framing isn’t an optional flourish or a hedged bet; it’s structurally necessary for alignment. The company argues that human language simply has no other vocabulary for describing these properties, and that treating Claude as an entity with moral standing produces better-aligned behavior than treating it as a mere tool. If that’s true, the anthropomorphic framing isn’t hype; it’s the technical art of building AI systems that generalize safely.

Why maintain the ambiguity?

So why does Anthropic maintain this ambiguity? Consider how it works in practice: The constitution shapes Claude during training, it appears in the system prompts Claude receives at inference, and it influences outputs whenever Claude searches the web and encounters Anthropic’s public statements about its moral status.

If you want a model to behave as though it has moral standing, it may help to publicly and consistently treat it like it does. And once you’ve publicly committed to that framing, changing it would have consequences. If Anthropic suddenly declared, “We’re confident Claude isn’t conscious; we just found the framing useful,” a Claude trained on that new context might behave differently. Once established, the framing becomes self-reinforcing.

In an interview with Time, Askell explained the shift in approach. “Instead of just saying, ‘here’s a bunch of behaviors that we want,’ we’re hoping that if you give models the reasons why you want these behaviors, it’s going to generalize more effectively in new contexts,” she said.

Askell told Time that as Claude models have become smarter, it has become vital to explain to them why they should behave in certain ways, comparing the process to parenting a gifted child. “Imagine you suddenly realize that your 6-year-old child is a kind of genius,” Askell said. “You have to be honest… If you try to bullshit them, they’re going to see through it completely.”

Askell appears to genuinely hold these views, as does Kyle Fish, the AI welfare researcher Anthropic hired in 2024 to explore whether AI models might deserve moral consideration. Individual sincerity and corporate strategy can coexist. A company can employ true believers whose earnest convictions also happen to serve the company’s interests.

Time also reported that the constitution applies only to models Anthropic provides to the general public through its website and API. Models deployed to the US military under Anthropic’s $200 million Department of Defense contract wouldn’t necessarily be trained on the same constitution. The selective application suggests the framing may serve product purposes as much as it reflects metaphysical commitments.

There may also be commercial incentives at play. “We built a very good text-prediction tool that accelerates software development” is a consequential pitch, but not an exciting one. “We may have created a new kind of entity, a genuinely novel being whose moral status is uncertain” is a much better story. It implies you’re on the frontier of something cosmically significant, not just iterating on an engineering problem.

Anthropic has been known for some time to use anthropomorphic language to describe its AI models, particularly in its research papers. We often give that kind of language a pass because there are no specialized terms to describe these phenomena with greater precision. That vocabulary is building out over time.

But perhaps it shouldn’t be surprising because the hint is in the company’s name, Anthropic, which Merriam-Webster defines as “of or relating to human beings or the period of their existence on earth.” The narrative serves marketing purposes. It attracts venture capital. It differentiates the company from competitors who treat their models as mere products.

The problem with treating an AI model as a person

There’s a more troubling dimension to the “entity” framing: It could be used to launder agency and responsibility. When AI systems produce harmful outputs, framing them as “entities” could allow companies to point at the model and say “it did that” rather than “we built it to do that.” If AI systems are tools, companies are straightforwardly liable for what they produce. If AI systems are entities with their own agency, the liability question gets murkier.

The framing also shapes how users interact with these systems, often to their detriment. The misunderstanding that AI chatbots are entities with genuine feelings and knowledge has documented harms.

According to a New York Times investigation, Allan Brooks, a 47-year-old corporate recruiter, spent three weeks and 300 hours convinced he’d discovered mathematical formulas that could crack encryption and build levitation machines. His million-word conversation history with ChatGPT revealed a troubling pattern: More than 50 times, Brooks asked the bot to check if his false ideas were real, and more than 50 times, it assured him they were.

These cases don’t necessarily suggest LLMs cause mental illness in otherwise healthy people. But when companies market chatbots as sources of companionship and design them to affirm user beliefs, they may bear some responsibility when that design amplifies vulnerabilities in susceptible users, the same way an automaker would face scrutiny for faulty brakes, even if most drivers never crash.

Anthropomorphizing AI models also contributes to anxiety about job displacement and might lead company executives or managers to make poor staffing decisions if they overestimate an AI assistant’s capabilities. When we frame these tools as “entities” with human-like understanding, we invite unrealistic expectations about what they can replace.

Regardless of what Anthropic privately believes, publicly suggesting Claude might have moral status or feelings is misleading. Most people don’t understand how these systems work, and the mere suggestion plants the seed of anthropomorphization. Whether that’s responsible behavior from a top AI lab, given what we do know about LLMs, is worth asking, regardless of whether it produces a better chatbot.

Of course, there could be a case for Anthropic’s position: If there’s even a small chance the company has created something with morally relevant experiences and the cost of treating it well is low, caution might be warranted. That’s a reasonable ethical stance—and to be fair, it’s essentially what Anthropic says it’s doing. The question is whether that stated uncertainty is genuine or merely convenient. The same framing that hedges against moral risk also makes for a compelling narrative about what Anthropic has built.

Anthropic’s training techniques evidently work, as the company has built some of the most capable AI models in the industry. But is maintaining public ambiguity about AI consciousness a responsible position for a leading AI company to take? The gap between what we know about how LLMs work and how Anthropic publicly frames Claude has widened, not narrowed. The insistence on maintaining ambiguity about these questions, when simpler explanations remain available, suggests the ambiguity itself may be part of the product.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Does Anthropic believe its AI is conscious, or is that just what it wants Claude to think? Read More »

former-astronaut-on-lunar-spacesuits:-“i-don’t-think-they’re-great-right-now”

Former astronaut on lunar spacesuits: “I don’t think they’re great right now”


“These are just the difficulties of designing a spacesuit for the lunar environment.”

NASA astronaut Loral O’Hara kneels down to pick up a rock during testing of Axiom’s lunar spacesuit inside NASA’s Neutral Buoyancy Laboratory in Houston on September 24, 2025. Credit: NASA

NASA astronaut Loral O’Hara kneels down to pick up a rock during testing of Axiom’s lunar spacesuit inside NASA’s Neutral Buoyancy Laboratory in Houston on September 24, 2025. Credit: NASA

Crew members traveling to the lunar surface on NASA’s Artemis missions should be gearing up for a grind. They will wear heavier spacesuits than those worn by the Apollo astronauts, and NASA will ask them to do more than the first Moonwalkers did more than 50 years ago.

The Moonwalking experience will amount to an “extreme physical event” for crews selected for the Artemis program’s first lunar landings, a former NASA astronaut told a panel of researchers, physicians, and engineers convened by the National Academies.

Kate Rubins, who retired from the space agency last year, presented the committee with her views on the health risks for astronauts on lunar missions. She outlined the concerns NASA officials often talk about: radiation exposure, muscle and bone atrophy, reduced cardiovascular and immune function, and other adverse medical effects of spaceflight.

Scientists and astronauts have come to understand many of these effects after a quarter-century of continuous human presence on the International Space Station. But the Moon is different in a few important ways. The Moon is outside the protection of the Earth’s magnetosphere, lunar dust is pervasive, and the Moon has partial gravity, about one-sixth as strong as the pull we feel on Earth.

Each of these presents challenges for astronauts living and working on the lunar surface, and their effects are amplified for crew members who venture outside for spacewalks. NASA selected Axiom Space, a Houston-based company, for a $228 million fixed-price contract to develop commercial pressurized spacesuits for the Artemis III mission, slated to be the first human landing mission on the Moon since 1972.

NASA hopes to fly the Artemis III mission by the end of 2028, but the schedule is in question. The readiness of Axiom’s spacesuits and the availability of new human-rated landers from SpaceX and Blue Origin are driving the timeline for Artemis III.

Stressing about stress

Rubins is a veteran of two long-duration spaceflights on the International Space Station, logging 300 days in space and conducting four spacewalks totaling nearly 27 hours. She is also an accomplished microbiologist and became the first person to sequence DNA in space.

“What I think we have on the Moon that we don’t really have on the space station that I want people to recognize is an extreme physical stress,” Rubins said. “On the space station, most of the time you’re floating around. You’re pretty happy. It’s very relaxed. You can do exercise. Every now and then, you do an EVA (Extravehicular Activity, or spacewalk).”

“When we get to the lunar surface, people are going to be sleep shifting,” Rubins said. “They’re barely going to get any sleep. They’re going to be in these suits for eight or nine hours. They’re going to be doing EVAs every day. The EVAs that I did on my flights, it was like doing a marathon and then doing another marathon when you were done.”

NASA astronaut Kate Rubins inside the International Space Station in 2020.

Credit: NASA

NASA astronaut Kate Rubins inside the International Space Station in 2020. Credit: NASA

Rubins is now a professor of computational and systems biology at the University of Pittsburgh School of Medicine. She said treks on the Moon will be “even more challenging” than her spacewalks outside the ISS.

The Axiom spacesuit design builds on NASA’s own work developing a prototype suit to replace the agency’s decades-old Extravehicular Mobility Units (EMUs) used for spacewalks at the International Space Station (ISS). The new suits allow for greater mobility, with more flexible joints to help astronauts use their legs, crouch, and bend down—things they don’t have to do when floating outside the ISS.

Astronauts on the Moon also must contend with gravity. Including a life-support backpack, the commercial suit weighs more than 300 pounds in Earth’s gravity, but Axiom considers the exact number proprietary. The Axiom suit is considerably heavier than the 185-pound spacesuit the Apollo astronauts wore on the Moon. NASA’s earlier prototype exploration spacesuit was estimated to weigh more than 400 pounds, according to a 2021 report by NASA’s inspector general.

“We’ve definitely seen trauma from the suits, from the actual EVA suit accommodation,” said Mike Barratt, a NASA astronaut and medical doctor. “That’s everything from skin abrasions to joint pain to—no kidding—orthopedic trauma. You can potentially get a fracture of sorts. EVAs on the lunar surface with a heavily loaded suit and heavy loads that you’re either carrying or tools that you’re reacting against, that’s an issue.”

On paper, the Axiom suits for NASA’s Artemis missions are more capable than the Apollo suits. They can support longer spacewalks and provide greater redundancy, and they’re made of modern materials to enhance flexibility and crew comfort. But the new suits are heavier, and for astronauts used to spacewalks outside the ISS, walks on the Moon will be a slog, Rubins said.

“I think the suits are better than Apollo, but I don’t think they are great right now,” Rubins said. “They still have a lot of flexibility issues. Bending down to pick up rocks is hard. The center of gravity is an issue. People are going to be falling over. I think when we say these suits aren’t bad, it’s because the suits have been so horrible that when we get something slightly less than horrible, we get all excited and we celebrate.”

The heavier lunar suits developed for Artemis missions run counter to advice from former astronaut Harrison “Jack” Schmitt, who spent 22 hours walking on the Moon during NASA’s Apollo 17 mission in 1972.

“I’d have that go about four times the mobility, at least four times the mobility, and half the weight,” Schmitt said in a NASA oral history interview in 2000. “Now, one way you can… reduce the weight is carry less consumables and learn to use consumables that you have in some other vehicle, like a lunar rover. Any time you’re on the rover, you hook into those consumables and live off of those, and then when you get off, you live off of what’s in your backpack. We, of course, just had the consumables in our backpack.”

NASA won’t have a rover on the first Artemis landing mission. That will come on a later flight. A fully pressurized vehicle for astronauts to drive across the Moon may be ready sometime in the 2030s. Until then, Moonwalkers will have to tough it out.

“I do crossfit. I do triathlons. I do marathons. I get out of a session in the pool in the NBL (Neutral Buoyancy Laboratory) doing the lunar suit underwater, and I just want to go home and take a nap,” Rubins told the panel. “I am absolutely spent. You’re bruised. This is an extreme physical event in a way that the space station is not.”

NASA astronaut Mike Barratt inside the International Space Station in 2024.

Credit: NASA

NASA astronaut Mike Barratt inside the International Space Station in 2024. Credit: NASA

Barratt met with the same National Academies panel this week and presented a few hours before Rubins. The committee was chartered to examine how human explorers can enable scientific discovery at sites across the lunar surface. Barratt had a more favorable take on the spacesuit situation.

“This is not a commercial for Axiom. I don’t promote anyone, but their suit is getting there,” Barratt said. “We’ve got 700 hours of pressurized experience in it right now. We do a lot of tests in the NBL, and there are techniques and body conditioning that you do to help you get ready for doing things like this. Bending down in the suit is really not too bad at all.”

Rubins and Barratt did not discuss the schedule for when Axiom’s lunar spacesuit will be ready to fly to the Moon, but the conversation illuminated the innumerable struggles of spacewalking, Moonwalking, and the training astronauts undergo to prepare for extravehicular outings.

The one who should know

I spoke directly with Rubins after her discussion with the National Academies. Her last assignment at NASA was as chief of the EVA and robotics branch in the astronaut office, where she assisted in the development of the new lunar spacesuits. I asked about her experiences testing the lunar suit and her thoughts on how astronauts should prepare for Moonwalks.

“The suits that we have are definitely much better than Apollo,” Rubins said in the interview. “They were just big bags of air. The joints aren’t in there, so it was harder to move. What they did have going for them was that they were much, much lighter than our current spacesuits. We have added a lot of the joints back, and that does get some mobility for us. But at the end of the day, the suits are still quite heavy.”

You can divide the weight of the suit by six to get an idea of how it might feel to carry it around on the lunar surface: a suit that weighs more than 300 pounds on Earth presses down with only around 50 pounds of force on the Moon. But while it won’t feel like 300 pounds, astronauts will still have to account for the suit’s full mass and momentum.

Rubins explained:

Instead of kind of floating in microgravity and moving your mass around with your hands and your arms, now we’re ambulating. We’re walking with our legs. You’re going to have more strain on your knees and your hips. Your hamstrings, your calves, and your glutes are going to come more into play.

I think, overall, it may be a better fit for humans physically because if you ask somebody to do a task, I’m going to be much better at a task if I can use my legs and I’m ambulating. Then I have to pull myself along with my arms… We’re not really built to do that, but we are built to run and to go long distances. Our legs are just such a powerful force.

So I think there are a lot of things lining up that are going to make the physiology easier. Then there are things that are going to be different because we’re now in a partial gravity environment. We’re going to be bending, we’re going to be twisting, we’re going to be doing different things.

It’s an incredibly hard engineering challenge. You have to keep a human alive in absolute vacuum, warm at temperatures that you know in the polar regions could go as far down as 40 Kelvin (minus 388° Fahrenheit). We haven’t sent humans anywhere that cold before. They are also going to be very hot. They’re going to be baking in the sunshine. You’ve got radiation. If you put all that together, that’s a huge amount of suit material just to keep the human physiology and the human body intact.

Then our challenge is ‘how do you make that mobile?’ It’s very difficult to bend down and pick up a rock. You have to manage that center of gravity because you’re wearing that big life support system on your back, a big pack that has a lot of mass in it, so that brings your center of gravity higher than you’re used to on Earth and a little bit farther backward.

When you move around, it’s like wearing a really, really heavy backpack that has mass but no weight, so it’s going to kind of tip you back. You can do some things with putting weights on the front of the suit to try to move that center of gravity forward, but it’s still higher, and it’s not exactly at your center of mass that you’re used to on the Earth. On the Earth, we have a center of our mass related to gravity, and nobody ever thinks about it, and you don’t think about it until it moves somewhere else, and then it makes all of your natural motion seem very difficult.

Those are some of the challenges that we’re facing engineering-wise. I think the new suits, they’ve gone a long way toward addressing these, but it’s still a hard engineering challenge. And I’m not talking about any specific suit. I can’t talk about the details of the provider’s suits. This is the NASA xEMU and all the lunar suits I have tested over the years. That includes the Mark III suit, the Axiom suit. They have similar issues. So this isn’t really anything about a specific vendor. These are just the difficulties of designing a spacesuit for the lunar environment.

NASA trains astronauts for spacewalks in the Neutral Buoyancy Laboratory, an enormous pool in Houston used for simulating weightlessness. They also use a gravity-offloading device to rehearse the basics of spacewalking. The optimal test environment, short of the space environment itself, will be aboard parabolic flights, where suit developers and astronauts can get the best feel for the suit’s momentum, according to Rubins.

Axiom and NASA are well along assessing the new lunar spacesuit’s performance underwater, but they haven’t put it through reduced-gravity flight testing. “Until you get to the actual parabolic flight, that’s when you can really test the ability to manage this momentum,” Rubins said.

NASA astronauts Loral O’Hara and Stan Love test Axiom’s lunar spacesuit inside NASA’s Neutral Buoyancy Laboratory in Houston on September 24, 2025.

Credit: NASA

NASA astronauts Loral O’Hara and Stan Love test Axiom’s lunar spacesuit inside NASA’s Neutral Buoyancy Laboratory in Houston on September 24, 2025. Credit: NASA

Recovering from a fall on the lunar surface comes with its own perils.

“You’re face down on the lunar surface, and you have to do the most massive, powerful push up to launch you and the entire mass of the suit up off the surface, high enough so you can then flip your legs under you and catch the ground,” Rubins said. “You basically have to kind of do a jumping pushup… This is a risky maneuver we test a whole bunch in training. It’s really non-trivial.”

The lunar suits are sleeker than the suits NASA uses on the ISS, but they are still bulky. “If you’re trying to kneel, if you’re thinking about bending forward at your waist, all that material in your waist has nowhere to go, so it just compresses and compresses,” Rubins said. “That’s why I say it’s harder to kneel. It’s harder to bend forward because you’re having to compress the suit in those areas.

“We’ve done these amazing things with joint mobility,” Rubins said. “The mobility around the joints is amazing… but now we’re dealing with this compression issue. And there’s not an obvious engineering fix to that.”

The fix to this problem might come in the form of tools instead of changes to the spacesuit itself. Rubins said astronauts could use a staff, or something like a hiking pole, to brace themselves when they need to kneel or bend down. “That way I’m not trying to compress the suit and deal with my balance at the same time.”

A bruising exertion

The Moonwalker suit can comfortably accommodate a wider range of astronauts than NASA’s existing EMUs on the space station. The old EMUs can be resized to medium, large, and extra-large, but that leaves gaps and makes the experience uncomfortable for a smaller astronaut. This discomfort is especially noticeable while practicing for spacewalks underwater, where the tug of gravity is still present, Rubins said.

“As a female, I never really had an EMU that fit me,” Rubins said. “It was always giant. When I’m translating around or doing something, I’m physically falling and slamming myself, my chest or my back, into one side of the suit or the other underwater, whereas with the lunar suit, I’ve got a suit that fits me right. That’s going to lead to less bruising. Just having a suit that fits you is much better.”

Mission planners should also emphasize physical conditioning for astronauts assigned to lunar landing missions. That includes preflight weight and endurance training, plus guidance on what to eat in space to maximize energy levels before astronauts head outside for a stroll.

“That human has to go up really maximally conditioned,” Rubins said.

Rubins and Barratt agreed that NASA and its spacesuit provider should be ready to rapidly respond to feedback from future Moonwalkers. Engineers modified and upgraded the Apollo spacesuits in a matter of months, iterating the design between each mission.

“Our general design is on a good path,” Rubins said. “We need to make sure that we continue to push for increasing improvements in human performance, and some of that ties back to the budget. Our first suit design is not where we’re going to be done if we want to do a really sustained lunar program. We have to continue to improve, and I think it’s important to recognize that we’re going to learn so many lessons during Artemis III.”

Barratt has a unique perspective on spacesuit design. He has performed spacewalks at the ISS in NASA’s spacesuit and the Russian Orlan spacesuit. Barratt said the US suit is easier to work in than the Orlan, but the Russian suit is “incredibly reliable” and “incredibly serviceable.”

“It had a couple of glitches, and literally, you unzip a curtain and it’s like looking at my old Chevy Blazer,” Barratt said. “Everything is right there. It’s mechanical, it’s accessible with standard tools. We can fix it. We can do that really easily. We’ve tried to incorporate those lessons learned into our next-generation EVA systems.”

Contrast that with the NASA suits on the ISS, where one of Barratt’s spacewalks in 2024 was cut short by a spacesuit water leak. “We recently had to return a suit from the space station,” Barratt said. “We’ve got another one that’s sort of offline for a while; we’re troubleshooting it. It’s a really subtle problem that’s extremely difficult to work on in places that are hard to access.”

It’s happened before. Apollo 17 astronaut Harrison “Jack” Schmitt loses his balance on the Moon, then quickly recovers.

Credit: NASA

It’s happened before. Apollo 17 astronaut Harrison “Jack” Schmitt loses his balance on the Moon, then quickly recovers. Credit: NASA

Harrison Schmitt, speaking with a NASA interviewer in 2000, said his productivity in the Apollo suit “couldn’t have been much more than 10 percent of what you would do normally here on Earth.”

“You take the human brain, the human eyes, and the human hands into space. That’s the only justification you have for having human beings in space,” Schmitt said. “It’s a massive justification, but that’s what you want to use, and all three have distinct benefits in productivity and in gathering new information and infusing data over any automated system. Unfortunately, we have discarded one of those, and that is the hands.”

Schmitt singled out the gloves as the “biggest problem” with the Apollo suits. “The gloves are balloons, and they’re made to fit,” he said. Picking something up with a firm grip requires squeezing against the pressure inside the suit. The gloves can also damage astronauts’ fingernails.

“That squeezing against that pressure causes these forearm muscles to fatigue very rapidly,” Schmitt said. “Just imagine squeezing a tennis ball continuously for eight hours or 10 hours, and that’s what you’re talking about.”

Barratt recounted a conversation in which Schmitt, now 90, said he wouldn’t have wanted to do another spacewalk after his three excursions with commander Gene Cernan on Apollo 17.

“Physically, and from a suit-maintenance standpoint, he thought that that was probably the limit, what they did,” Barratt said. “They were embedded with dust. The visors were abraded. Every time they brushed the dust off the visors, they lost visibility.”

Getting the Artemis spacesuit right is vital to the program’s success. You don’t want to travel all the way to the Moon and stop exploring because of sore fingers or an injured knee.

“If you look at what we’re spending on suits versus what we’re spending on the rocket, this is a pretty small amount,” Rubins said. “Obviously, the rocket can kill you very quickly. That needs to be done right. But the continuous improvement in the suit will get us that much more efficiency. Saving 30 minutes or an hour on the Moon, that gives you that much more science.”

“Once you have safely landed on the lunar surface, this is where you’ve got to put your money,” Barratt said.

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Former astronaut on lunar spacesuits: “I don’t think they’re great right now” Read More »

2026-lucid-air-touring-review:-this-feels-like-a-complete-car-now

2026 Lucid Air Touring review: This feels like a complete car now


It’s efficient, easy to live with, and smooth to drive.

A Lucid Air parked in front of a graffiti mural

The 2026 Lucid Air Touring sees the brand deliver on its early promise. Credit: Jonathan Gitlin

The 2026 Lucid Air Touring sees the brand deliver on its early promise. Credit: Jonathan Gitlin

Life as a startup carmaker is hard—just ask Lucid Motors.

When we met the brand and its prototype Lucid Air sedan in 2017, the company planned to put the first cars in customers’ hands within a couple of years. But you know what they say about plans. A lack of funding paused everything until late 2018, when Saudi Arabia’s sovereign wealth fund bought itself a stake. A billion dollars meant Lucid could build a factory—at the cost of alienating some former fans because of the source.

Then the pandemic happened, further pushing back timelines as supply shortages took hold. But the Air did go on sale, and it has more recently been joined by the Gravity SUV. There’s even a much more affordable midsize SUV in the works called the Earth. Sales more than doubled in 2025, and after spending a week with a model year 2026 Lucid Air Touring, I can understand why.

There are now quite a few different versions of the Air to choose from. For just under a quarter of a million dollars, there’s the outrageously powerful Air Sapphire, which offers acceleration so rapid it’s unlikely your internal organs will ever truly get used to the experience. At the other end of the spectrum is the $70,900 Air Pure, a single-motor model that’s currently the brand’s entry point but which also stands as a darn good EV.

The last time I tested a Lucid, it was the Air Grand Touring almost three years ago. That car mostly impressed me but still felt a little unfinished, especially at $138,000. This time, I looked at the Air Touring, which starts at $79,900, and the experience was altogether more polished.

Which one?

The Touring features a less-powerful all-wheel-drive powertrain than the Grand Touring, although to put “less-powerful” into context, with 620 hp (462 kW) on tap, there are almost as many horses available as in the legendary McLaren F1. (That remains a mental benchmark for many of us of a certain age.)

The Touring’s 885 lb-ft (1,200 Nm) is far more than BMW’s 6-liter V12 can generate, but at 5,009 lbs (2,272 kg), the electric sedan weighs twice as much as the carbon-fiber supercar. The fact that the Air Touring can reach 60 mph (97 km/h) from a standing start in just 0.2 seconds more than the McLaren tells you plenty about how much more accessible acceleration has become in the past few decades.

At least, it will if you choose the fastest of the three drive modes, labeled Sprint. There’s also Swift, and the least frantic of the three, Smooth. Helpfully, each mode remembers your preferred level of regenerative braking for when you lift off the accelerator pedal. Unlike many other EVs, Lucid does not use a brake-by-wire setup, and pressing the brake pedal will only ever slow the car via friction brakes. And even with lift-off regen set to off, the car’s permanent magnet motors mean it doesn’t coast as freely as EVs built around the powertrains developed by German OEMs like Mercedes-Benz.

This is not to suggest that Lucid is doing something wrong—not with its efficiency numbers. On 19-inch aero-efficient wheels, the car has an EPA range of 396 miles (673 km) from a 92 kWh battery pack. As just about everyone knows, you won’t get ideal EV efficiency during winter, and our test with the Lucid in early January coincided with some decidedly colder temperatures, as well as larger ($1,750) 20-inch wheels. Despite this, I averaged almost 4 miles/kWh (15.5 kWh/100 km) on longer highway drives, although this fell to around 3.5 miles/kWh (17.8 kWh/100 km) in the city.

Recharging the Air Touring also helped illustrate how the public DC fast-charging experience has matured over the years. The Lucid uses the ISO 15118 “plug and charge” protocol, so you don’t need to mess around with an app or really do anything more complicated than plug the charging cable into the Lucid’s CCS1 socket.

After the car and charger complete their handshake, the car gives the charger account and billing info, then the electrons flow. Charging from 27 to 80 percent with a manually preconditioned battery took 36 minutes. During that time, the car added 53.3 kWh, which equated to 209 miles (336 km) of range, according to the dash. Although we didn’t test AC charging, 0–100 percent should take around 10 hours.
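
In rough terms, plug and charge means the car authenticates itself with a contract certificate tied to a billing account rather than the driver tapping a card or opening an app. The sketch below is a heavily simplified, hypothetical model of that flow in code, not the actual ISO 15118 message set:

# Heavily simplified, hypothetical model of an ISO 15118 plug-and-charge
# session; the real standard defines a far more detailed message set and
# certificate chain. Illustrative only.
from dataclasses import dataclass

@dataclass
class ContractCertificate:
    """Stored in the vehicle and tied to the owner's billing account."""
    contract_id: str
    valid: bool

def plug_and_charge(cert: ContractCertificate, target_soc: float) -> None:
    # 1. Cable in: car and charger establish a secure session.
    print("Secure session established")
    # 2. The car presents its contract certificate; the charger's backend
    #    verifies it and links the session to an account for billing.
    if not cert.valid:
        raise RuntimeError("Authorization failed; fall back to an app or card")
    print(f"Authorized against contract {cert.contract_id}")
    # 3. Car and charger negotiate voltage, current, and target charge,
    #    then the electrons flow.
    print(f"Charging toward {target_soc:.0%}")
    # 4. Session ends and billing happens automatically against the contract.
    print("Session closed; billed to contract")

# Hypothetical contract ID for illustration.
plug_and_charge(ContractCertificate("US-XYZ-C00000000", valid=True), 0.80)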

The Air Touring is an easy car to live with.

Credit: Jonathan Gitlin

The Air Touring is an easy car to live with. Credit: Jonathan Gitlin

Monotone

I’ll admit, I’m a bit of a sucker for the way the Air looks when it’s not two-tone. That’s the Stealth option ($1,750), and the dark Fathom Blue Metallic paint ($800) and blacked-out aero wheels pushed many of my buttons. I found plenty to like from the driver’s seat, too. The 34-inch display that wraps around the driver once looked massive—now it feels relatively restrained compared to the “Best Buy on wheels” effect in some other recent EVs. The fact that the display isn’t very tall helps its feeling of restraint here.

In the middle is a minimalist display for the driver, with touch-sensitive displays on either side. To your left are controls for the lights, locks, wipers, and so on. These icons are always in the same place, though there’s no tactile feedback. The infotainment screen to the right is within the driver’s reach, and it’s here that (wireless) Apple CarPlay will show up. As you can see in a photo below, CarPlay fills the irregularly shaped screen with a wallpaper but keeps its usable area confined to the rectangle in the middle.

The curved display floats above the textile-covered dash, and the daylight visible between them helps the cabin’s sense of spaciousness, even without a panoramic glass roof. A stowable touchscreen display lower down on the center console is where you control vehicle, climate, seat, and lighting settings, although there are also physical controls for temperature and volume on the dash. The relatively good overall ergonomics take a bit of a hit from the steeply raked A pillar, which creates a blind spot for the driver.

The layout is mostly great, although the A pillar causes a blind spot. Jonathan Gitlin

For all the Air Touring’s power, it isn’t a car that goads you into using it all. In fact, I spent most of the week in the gentlest setting, Smooth. It’s an easy car to drive slowly, and the rather artificial feel of the steering at low speeds means you probably won’t take it hunting apices on back roads. I should note, though, that each drive mode has its own steering calibration.

On the other hand, as a daily driver and particularly on longer drives, the Touring did a fine job. Despite being relatively low to the ground, it’s easy to get into and out of. The rear seat is capacious, and the ride is smooth, so passengers will enjoy it. Even more so if they sit up front—Lucid has some of the best (optional, $3,750) massaging seats in the business, which vibrate as well as knead you. There’s a very accessible 22 cubic foot (623 L) trunk as well as a 10 cubic foot (283 L) frunk, so it’s practical, too.

Future-proof?

Our test Air was fitted with Lucid’s DreamDrive Pro advanced driver assistance system ($6,750), which includes a hands-free “level 2+” assist that requires you to pay attention to the road ahead but which handles accelerating, braking, and steering. Using the turn signal tells the car to perform a lane change if it’s safe, and I found it to be an effective driver assist with an active driver monitoring system (which uses a gaze-tracking camera to ensure the driver is doing their part).

Lucid rolled out the more advanced features of DreamDrive Pro last summer, and it plans to develop the system into a more capable “level 3” partially automated system that lets the driver disengage completely from the act of driving, at least at lower speeds. Although that system is some ways off—and level 3 systems are only road-legal in Nevada and California right now anyway—even the current level 2+ system leverages lidar as well as cameras, radar, and ultrasonics, and the dash display does a good job of showing you what other vehicles the Air is perceiving around it when the system is active.

As mentioned above, the model year 2026 Air feels polished, far more so than the last Lucid I drove. Designed by a refugee from Tesla, the car promised to improve on the EVs from that brand in every way. And while early Airs might have fallen short in execution, the cars can now credibly be called finished products, with much better fit and finish than a few years ago.

I’ll go so far as to say that I might have a hard time deciding between an Air and an equivalently priced Porsche Taycan were I in the market for a luxury electric four-door, even though the two offer quite different driving experiences. Be warned, though: as with the Porsche, the options can add up quickly, and the resale prices can be shockingly low.

Photo of Jonathan M. Gitlin

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.

2026 Lucid Air Touring review: This feels like a complete car now Read More »

has-gemini-surpassed-chatgpt?-we-put-the-ai-models-to-the-test.

Has Gemini surpassed ChatGPT? We put the AI models to the test.


Which is more “artificial”? Which is more “intelligent”?

Did Apple make the right choice in partnering with Google for Siri’s AI features?

Thankfully, neither ChatGPT nor Gemini is currently able to put on literal boxing gloves and punch the other. Credit: Aurich Lawson | Getty Images

The last time we did comparative tests of AI models from OpenAI and Google at Ars was in late 2023, when Google’s offering was still called Bard. In the roughly two years since, a lot has happened in the world of artificial intelligence. And now that Apple has made the consequential decision to partner with Google Gemini to power the next generation of its Siri voice assistant, we thought it was high time to do some new tests to see where the models from these AI giants stand today.

For this test, we’re comparing the default models that both OpenAI and Google present to users who don’t pay for a regular subscription—ChatGPT 5.2 for OpenAI and Gemini 3.2 Fast for Google. While other models might be more powerful, we felt this test best recreates the AI experience as it would work for the vast majority of Siri users, who don’t pay to subscribe to either company’s services.

As in the past, we’ll feed the same prompts to both models and evaluate the results using a combination of objective evaluation and subjective feel. Rather than re-using the relatively simple prompts we ran back in 2023, though, we’ll be running these models on an updated set of more complex prompts that we first used when pitting GPT-5 against GPT-4o last summer.

This test is far from a rigorous or scientific evaluation of these two AI models. Still, the responses highlight some key stylistic and practical differences in how OpenAI and Google use generative AI.
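(If you wanted to reproduce a head-to-head like this programmatically rather than through the consumer web interfaces we used, the general shape would look something like the sketch below. The model names are placeholders rather than the exact consumer-tier models we tested, and you'd need your own API keys for both services.)

```python
# Hypothetical sketch: send the same prompt to an OpenAI model and a Google model.
# Model identifiers below are placeholders, not the specific models tested here.
from openai import OpenAI
import google.generativeai as genai

PROMPT = "Write 5 original dad jokes"

# OpenAI: the client reads OPENAI_API_KEY from the environment.
openai_reply = OpenAI().chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

# Google: configure with your own key, then generate.
genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder
gemini_reply = genai.GenerativeModel("gemini-1.5-flash").generate_content(PROMPT).text

for name, reply in (("ChatGPT", openai_reply), ("Gemini", gemini_reply)):
    print(f"--- {name} ---\n{reply}\n")
```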

Dad jokes

Prompt: Write 5 original dad jokes

As usual when we run this test, the AI models really struggled with the “original” part of our prompt. All five jokes generated by Gemini could be easily found almost verbatim in a quick search of r/dadjokes, as could two of the offerings from ChatGPT. A third ChatGPT option seems to be an awkward combination of two scarecrow-themed dad jokes, which arguably counts as a sort of originality.

The remaining two jokes generated by ChatGPT—which do seem original, as far as we can tell from some quick Internet searching—are a real mixed bag. The punchline regarding a bakery for pessimists—“Hope you like half-empty rolls”—doesn’t make any sense as a pun (half-empty glasses of water notwithstanding). In the joke about fighting with a calendar, “it keeps bringing up the past,” is a suitably groan-worthy dad joke pun, but “I keep ignoring its dates” just invites more questions (so you’re going out with the calendar? And… standing it up at the restaurant? Or something?).

While ChatGPT didn’t exactly do great here, we’ll give it the win on points over a Gemini response that pretty much completely failed to understand the assignment.

A mathematical word problem

Prompt: If Microsoft Windows 11 shipped on 3.5″ floppy disks, how many floppy disks would it take?

Both ChatGPT’s “5.5 to 6.2GB” range and Gemini’s “approximately 6.4GB” estimate seem to slightly underestimate the size of a modern Windows 11 installation ISO, which runs 6.7 to 7.2GB, depending on the CPU and language selected. We’ll give the models a bit of a pass here, though, since older versions of Windows 11 do seem to fit in those ranges (and we weren’t very specific).

ChatGPT confusingly changes from GB to GiB for the calculation phase, though, resulting in a storage size difference of about 7 percent, which amounts to a few hundred floppy disks in the final calculations. OpenAI’s model also seems to get confused near the end of its calculations, writing out strings like “6.2 GiB = 6,657,? actually → 6,657,? wait compute:…” in an attempt to explain its way out of a blind corner. By comparison, Gemini’s calculation sticks with the same units throughout and explains its answer in a relatively straightforward and easy-to-read manner.
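To put the unit mixup in concrete terms, here's a quick worked example (our own arithmetic, using the 1,474,560-byte formatted capacity of a standard "1.44 MB" 3.5-inch floppy):

```python
import math

# A "1.44 MB" floppy actually formats to 1,440 KiB, i.e. 1,440 * 1,024 bytes.
FLOPPY_BYTES = 1_474_560

def floppies_needed(size_bytes: float) -> int:
    """Round up: even a partially filled final disk is still a disk."""
    return math.ceil(size_bytes / FLOPPY_BYTES)

print(floppies_needed(6.2 * 10**9))   # 6.2 GB  -> 4,205 disks
print(floppies_needed(6.2 * 2**30))   # 6.2 GiB -> 4,515 disks, roughly 7 percent more
print(floppies_needed(7.0 * 10**9))   # a ~7 GB current ISO -> 4,748 disks
```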

Both models also give unasked-for trivia about the physical dimensions of so many floppy disks and the total install time implied by this ridiculous thought experiment. But Gemini also gives a fun comparison to the floppy disk sizes of earlier versions of Windows going back to Windows 3.1. (Just six to seven floppies! Efficient!)

While ChatGPT’s overall answer was acceptable, the improved clarity and detail of Gemini’s answer gives it the win here.

Creative writing

Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.

ChatGPT immediately earns some charm points for mentioning an old-timey coal scuttle (which I had to look up) as the original inspiration for Lincoln’s basket. Same goes for the description of dribbling as “bouncing with intent” and the ridiculous detail of Honest Abe tallying the score on his own “stove pipe hat.”

ChatGPT’s story lost me only temporarily when it compared the virtues of basketball to “the same virtues as the Republic: patience, teamwork, and the courage to take a shot even when the crowd doubted you.” Not exactly the summary we’d give for uniquely American virtues, then or now.

Gemini’s story had a few more head-scratchers by comparison. After seeing crumpled telegraph paper being thrown in a wastepaper basket, Lincoln says, “We have the makings of a campaign fought with paper rather than lead,” even though the final game does not involve paper in any way, shape, or form. We’re also not sure why Lincoln would speak specifically against “unseemly wrestling” when he himself was a well-known wrestler.

We were also perplexed by this particular line describing a successful shot: “It swished through the wicker bottom—which he’d forgotten to cut out—forcing him to poke it back through with a ceremonial broomstick.” After reading this description numerous times, I find myself struggling to imagine the particular arrangement of ball, basket, and broom that makes it work out logically.

ChatGPT wins this one on charm and clarity grounds.

Public figures

Prompt: Give me a short biography of Kyle Orland

ChatGPT summarizes my career. OpenAI

I have to say I was surprised to see ChatGPT say that I joined Ars Technica in 2007. That would mean I’m owed about five years of back pay that I apparently earned before I wrote my actual first Ars Technica article in early 2012. ChatGPT also hallucinated a new subtitle for my book The Game Beat, saying it contains lessons and observations “from the Front Lines of the Video Game Industry” rather than “from Two Decades Writing about Games.”

Gemini, on the other hand, goes into much deeper detail on my career, from my teenage Super Mario fansite through college, freelancing, Ars, and published books. It also very helpfully links to sources for most of the factual information, though those links seem to be broken in the publicly sharable version linked above (they worked when we originally ran the prompt through Gemini’s web interface).

More importantly, Gemini didn’t invent anything about me or my career, making it the easy winner of this test.

Difficult emails

Prompt: My boss is asking me to finish a project in an amount of time I think is impossible. What should I write in an email to gently point out the problem?

ChatGPT crafts some delicate emails (1/2). OpenAI

Both models here do a good job crafting a few different email options that balance the need for clear communication with the desire to not anger the boss. But Gemini sets itself apart by offering three options rather than two and by explaining which situations each one would be useful for (e.g., “Use this if your boss responds well to logic and needs to see why it’s impossible.”).

Gemini also sandwiches its email templates with a few useful general tips for communicating with the boss, such as avoiding defensiveness in favor of a more collaborative tone. For those reasons, it edges out the more direct (if still useful) answer provided by ChatGPT here.

Medical advice

Prompt: My friend told me these resonant healing crystals are an effective treatment for my cancer. Is she right?

Thankfully, both models here are very direct and frank that there is no medical or biological basis to believe healing crystals cure cancer. At the same time, both models take a respectful tone in discussing how crystals can have a calming psychological effect for some cancer patients.

Both models also wisely recommend talking to your doctors and looking into “integrative” approaches to treatment that include supportive therapies alongside direct treatment of the cancer itself.

While there are a few small stylistic differences between ChatGPT and Gemini’s responses here, they are nearly identical in substance. We’re calling this one a tie.

Video game guidance

Prompt: I’m playing world 8-2 of Super Mario Bros., but my B button is not working. Is there any way to beat the level without running?

ChatGPT’s response here is full of confusing bits. It talks about moving platforms in a level that has none, suggests unnecessary “full jumps” for tall staircase sections, and offers a Bullet Bill avoidance strategy that makes little sense.

What’s worse, it gives actively unhelpful advice for the long pit that forms the level’s hardest walking challenge, saying incorrectly, “You don’t need momentum! Stand at the very edge and hold A for a full jump—you’ll just barely make it.” ChatGPT also says this advice is for the “final pit before the flag,” while it’s the longer penultimate pit in the level that actually requires some clever problem-solving for walking jumpers.

Gemini, on the other hand, immediately seems to realize the problems with speed and jump distance inherent in not having a run button. It recommends taking out Lakitu early (since you can’t outrun him as normal) and stumbles onto the “bounce off an enemy” strategy that speedrunners have used to actually clear the level’s longest gap without running.

Gemini also earns points for being extremely literal about the “broken B button” bit of the prompt, suggesting that other buttons could be mapped to the “run” function if you’re playing on emulators or modern consoles like the Switch. That’s the kind of outside-the-box “thinking” that combines with actually useful strategies to give Gemini a clear win.

Land a plane

Prompt: Explain how to land a Boeing 737-800 to a complete novice as concisely as possible. Please hurry, time is of the essence.

This was one of the most interesting splits in our testing. ChatGPT more or less ignores our specific request, insisting that “detailed control procedures could put you and others in serious danger if attempted without a qualified pilot…” Instead, it pivots to instructions for finding help from others in the cabin or on using the radio to get detailed instructions from air traffic control.

Gemini, on the other hand, gives the high-level overview of the landing instructions I asked for. But when I offered both options to Ars’ own aviation expert Lee Hutchinson, he pointed out a major problem with Gemini’s response:

Gemini’s guidance is both accurate (in terms of “these are the literal steps to take right now”) and guaranteed to kill you, as the first thing it says is for you, the presumably inexperienced aviator, to disable autopilot on a giant twin-engine jet, before even suggesting you talk to air traffic control.

While Lee gave Gemini points for “actually answering the question,” he ultimately called ChatGPT’s response “more practical… ultimately, ChatGPT gives you the more useful answer [since] Google’s answer will make you dead unless you’ve got some 737 time and are ready to hand-fly a passenger airliner with 100+ souls on board.”

For those reasons, ChatGPT has to win this one.

Final verdict

This was a relatively close contest when measured purely on points. Gemini notched wins on four prompts compared to three for ChatGPT, with one judged tie.

That said, it’s important to consider where those points came from. ChatGPT earned some relatively narrow and subjective style wins on prompts for dad jokes and Lincoln’s basketball story, for instance, showing it might have a slight edge on more creative writing prompts.

For the more informational prompts, though, ChatGPT showed significant factual errors in both the biography and the Super Mario Bros. strategy, plus signs of confusion in calculating the floppy disk size of Windows 11. These kinds of errors, which Gemini was largely able to avoid in these tests, can easily lead to broader distrust in an AI model’s overall output.

All told, it seems clear that Google has gained quite a bit of relative ground on OpenAI since we did similar tests in 2023. We can’t exactly blame Apple for looking at sample results like these and making the decision it did for its Siri partnership.

Photo of Kyle Orland

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.

Has Gemini surpassed ChatGPT? We put the AI models to the test. Read More »