

New physical attacks are quickly diluting secure enclave defenses from Nvidia, AMD, and Intel


On-chip TEEs withstand rooted OSes but fall instantly to cheap physical attacks.

Trusted execution environments, or TEEs, are everywhere—in blockchain architectures, virtually every cloud service, and computing involving AI, finance, and defense contractors. It’s hard to overstate the reliance that entire industries have on three TEEs in particular: Confidential Compute from Nvidia, SEV-SNP from AMD, and SGX and TDX from Intel. All three come with assurances that confidential data and sensitive computing can’t be viewed or altered, even if a server has suffered a complete compromise of the operating kernel.

A trio of novel physical attacks raises new questions about the true security offered by these TEEs and about the exaggerated promises and misconceptions coming from the big and small players using them.

The most recent attack, released Tuesday, is known as TEE.fail. It defeats the latest TEE protections from all three chipmakers. The low-cost, low-complexity attack works by placing a small piece of hardware between a single physical memory chip and the motherboard slot it plugs into. It also requires the attacker to compromise the operating system kernel. Once this three-minute attack is completed, Confidential Compute, SEV-SNP, and TDX/SGX can no longer be trusted. Unlike the Battering RAM and Wiretap attacks from last month—which worked only against CPUs using DDR4 memory—TEE.fail works against DDR5, allowing it to work against the latest TEEs.

Some terms apply

All three chipmakers exclude physical attacks from threat models for their TEEs, also known as secure enclaves. Instead, assurances are limited to protecting data and execution from viewing or tampering, even when the OS kernel running on the processor has been compromised. None of the chipmakers make these carveouts prominent, and they sometimes provide confusing statements about the TEE protections offered.

Many users of these TEEs make public assertions about the protections that are flat-out wrong, misleading, or unclear. All three chipmakers and many TEE users focus on the suitability of the enclaves for protecting servers on a network edge, which are often located in remote locations, where physical access is a top threat.

“These features keep getting broken, but that doesn’t stop vendors from selling them for these use cases—and people keep believing them and spending time using them,” said HD Moore, a security researcher and the founder and CEO of runZero.

He continued:

Overall, it’s hard for a customer to know what they are getting when they buy confidential computing in the cloud. For on-premise deployments, it may not be obvious that physical attacks (including side channels) are specifically out of scope. This research shows that server-side TEEs are not effective against physical attacks, and even more surprising, Intel and AMD consider these out of scope. If you were expecting TEEs to provide private computing in untrusted data centers, these attacks should change your mind.

Those making these statements run the gamut from cloud providers to AI engines, blockchain platforms, and even the chipmakers themselves. Here are some examples:

  • Cloudflare says it’s using Secure Memory Encryption—the encryption engine driving SEV—to safeguard confidential data from being extracted from a server if it’s stolen.
  • In a post outlining the possibility of using the TEEs to secure confidential information discussed in chat sessions, Anthropic says the enclave “includes protections against physical attacks.”
  • Microsoft marketing (here and here) devotes plenty of ink to discussing TEE protections without ever noting the exclusion.
  • Meta, paraphrasing the Confidential Computing Consortium, says TEE security provides protections against malicious “system administrators, the infrastructure owner, or anyone else with physical access to the hardware.” SEV-SNP is a key pillar supporting the security of Meta’s WhatsApp Messenger.
  • Even Nvidia claims that its TEE security protects against “infrastructure owners such as cloud providers, or anyone with physical access to the servers.”
  • The maker of the Signal private messenger assures users that its use of SGX means that “keys associated with this encryption never leave the underlying CPU, so they’re not accessible to the server owners or anyone else with access to server infrastructure.” Signal has long relied on SGX to protect contact-discovery data.

I counted more than a dozen other organizations providing assurances that were similarly confusing, misleading, or false. Even Moore—a security veteran with more than three decades of experience—told me: “The surprising part to me is that Intel/AMD would blanket-state that physical access is somehow out of scope when it’s the entire point.”

In fairness, some TEE users build additional protections on top of the TEEs provided out of the box. Meta, for example, said in an email that the WhatsApp implementation of SEV-SNP uses protections that would block TEE.fail attackers from impersonating its servers. The company didn’t dispute that TEE.fail could nonetheless pull secrets from the AMD TEE.

The Cloudflare theft protection, meanwhile, relies on SME—the engine driving SEV-SNP encryption. The researchers didn’t directly test SME against TEE.fail. They did note that SME uses deterministic encryption, the cryptographic property that causes all three TEEs to fail. (More about the role of deterministic encryption later.)

Others who misstate the TEEs’ protections provide more accurate descriptions elsewhere. Given all the conflicting information, it’s no wonder there’s confusion.

How do you know where the server is? You don’t.

Many TEE users run their infrastructure inside cloud providers such as AWS, Azure, or Google, where protections against supply-chain and physical attacks are extremely robust. That raises the bar for a TEE.fail-style attack significantly. (Whether the services could be compelled by governments with valid subpoenas to attack their own TEE is not clear.)

All these caveats notwithstanding, there’s often (1) little discussion of the growing viability of cheap, physical attacks, (2) no evidence (yet) that implementations not vulnerable to the three attacks won’t fall to follow-on research, and (3) no way for parties relying on TEEs to know where the servers are running and whether they’re free from physical compromise.

“We don’t know where the hardware is,” Daniel Genkin, one of the researchers behind both TEE.fail and Wiretap, said in an interview. “From a user perspective, I don’t even have a way to verify where the server is. Therefore, I have no way to verify if it’s in a reputable facility or an attacker’s basement.”

In other words, parties relying on attestations from servers in the cloud are once again reduced to simply trusting other people’s computers. As Moore observed, solving that problem is precisely the reason TEEs exist.

In at least two cases, involving the blockchain services Secret Network and Crust, the loss of TEE protections made it possible for any untrusted user to present cryptographic attestations. Both platforms used the attestations to verify that a blockchain node operated by one user couldn’t tamper with the execution or data passing to another user’s nodes. The Wiretap hack on SGX made it possible for users to run the sensitive data and executions outside of the TEE altogether while still providing attestations to the contrary. In the AMD attack, the attacker could decrypt the traffic passing through the TEE.

Both Secret Network and Crust added mitigations after learning of the possible physical attacks with Wiretap and Battering RAM. Given the lack of clear messaging, other TEE users are likely making similar mistakes.

A predetermined weakness

The root cause of all three physical attacks is the choice of deterministic encryption. This form of encryption produces the same ciphertext each time the same plaintext is encrypted with the same key. A TEE.fail attacker can copy ciphertext strings and use them in replay attacks. (Probabilistic encryption, by contrast, resists such attacks because the same plaintext can encrypt to a wide range of ciphertexts that are randomly chosen during the encryption process.)
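
To make the difference concrete, here is a minimal sketch using the pyca/cryptography package; the mode choices are illustrative assumptions, not the TEEs’ actual memory-encryption schemes. AES in ECB mode is deterministic, so identical plaintext blocks produce identical ciphertext that an attacker can recognize and replay, while AES-GCM with a fresh random nonce is probabilistic.

```python
# A minimal sketch of deterministic vs. probabilistic encryption using the
# pyca/cryptography package. The modes here are illustrative assumptions, not
# the TEEs' actual memory-encryption schemes.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
block = b"SECRET_16B_BLOCK"  # one 128-bit plaintext block

def ecb_encrypt(data: bytes) -> bytes:
    # Deterministic: same key + same plaintext always yields the same ciphertext.
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(data) + enc.finalize()

def gcm_encrypt(data: bytes) -> bytes:
    # Probabilistic: a fresh random nonce makes every ciphertext different.
    nonce = os.urandom(12)
    enc = Cipher(algorithms.AES(key), modes.GCM(nonce)).encryptor()
    return nonce + enc.update(data) + enc.finalize() + enc.tag

print(ecb_encrypt(block) == ecb_encrypt(block))  # True  -> recognizable, replayable
print(gcm_encrypt(block) == gcm_encrypt(block))  # False -> no repetition to exploit
```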

TEE.fail works not only against SGX but also against a more advanced Intel TEE known as TDX. The attack also defeats the protections provided by the latest Nvidia Confidential Compute and AMD SEV-SNP TEEs. Attacks against TDX and SGX can extract the Attestation Key, an ECDSA secret that certifies to a remote party that it’s running up-to-date software and can’t expose data or execution running inside the enclave. This Attestation Key is in turn signed by an Intel X.509 digital certificate, providing cryptographic assurances that the ECDSA key can be trusted. TEE.fail works against all Intel CPUs currently supporting TDX and SGX.

With possession of the key, the attacker can use the compromised server to peer into data or tamper with the code flowing through the enclave and send the relying party an assurance that the device is secure. With this key, even CPUs built by other chipmakers can send an attestation that the hardware is protected by the Intel TEEs.
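
As a toy illustration of why losing the Attestation Key is so damaging, the sketch below deliberately ignores Intel’s actual quote and certificate formats and just shows the generic principle: whoever holds the ECDSA private key can sign an arbitrary “report” that verifies cleanly against the corresponding public key.

```python
# A toy, self-contained illustration of the chain-of-trust failure: this is not
# Intel's actual quote or certificate format, just a generic ECDSA signature.
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

# Stand-in for the per-machine Attestation Key that TEE.fail extracts.
attestation_key = ec.generate_private_key(ec.SECP256R1())

# A forged report claiming the enclave runs trusted, up-to-date software.
forged_report = b"enclave=trusted; tcb=up-to-date; debug=false"
signature = attestation_key.sign(forged_report, ec.ECDSA(hashes.SHA256()))

# The relying party checks the signature against the public key (which, in the
# real scheme, is endorsed by an Intel-signed X.509 certificate). The forged
# report verifies without error, so the attacker's claims are accepted.
try:
    attestation_key.public_key().verify(signature, forged_report,
                                        ec.ECDSA(hashes.SHA256()))
    print("forged report accepted")
except InvalidSignature:
    print("rejected")
```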

GPUs equipped with Nvidia Confidential Compute don’t bind attestation reports to the specific virtual machine protected by a specific GPU. TEE.fail exploits this weakness by “borrowing” a valid attestation report from a GPU run by the attacker and using it to impersonate the GPU running Confidential Compute. The protection is available on Nvidia’s H100/200 and B100/200 server GPUs.

“This means that we can convince users that their applications (think private chats with LLMs or Large Language Models) are being protected inside the GPU’s TEE while in fact it is running in the clear,” the researchers wrote on a website detailing the attack. “As the attestation report is ‘borrowed,’ we don’t even own a GPU to begin with.”

SEV-SNP (Secure Encrypted Virtualization-Secure Nested Paging) uses ciphertext hiding in AMD’s EPYC CPUs based on the Zen 5 architecture. AMD added it to prevent a previous attack known as Cipherleaks, which allowed malicious hypervisors to extract cryptographic keys stored in the enclaves of a virtual machine. Ciphertext hiding, however, doesn’t stop physical attacks. With the ability to reopen the side channel that Cipherleaks relies on, TEE.fail can steal OpenSSL credentials and other key material handled by constant-time cryptographic code.

Cheap, quick, and the size of a briefcase

“Now that we have interpositioned DDR5 traffic, our work shows that even the most modern of TEEs across all vendors with available hardware is vulnerable to cheap physical attacks,” Genkin said.

The equipment required by TEE.fail is off-the-shelf gear that costs less than $1,000. One of the devices the researchers built fits into a 17-inch briefcase, so it can be smuggled into a facility housing a TEE-protected server. Once the physical attack is performed, the device does not need to be connected again. Attackers breaking TEEs on servers they operate have no need for stealth, allowing them to use a larger device, which the researchers also built.

A logic analyzer attached to an interposer.

The researchers demonstrated attacks against an array of services that rely on the chipmakers’ TEE protections. (For ethical reasons, the attacks were carried out against infrastructure that was identical to but separate from the targets’ networks.) Some of the attacks included BuilderNet, dstack, and Secret Network.

BuilderNet is a network of Ethereum block builders that uses TDX to prevent parties from snooping on others’ data, to ensure fairness, and to ensure that profits are redistributed honestly. The network builds blocks valued at millions of dollars each month.

“We demonstrated that a malicious operator with an attestation key could join BuilderNet and obtain configuration secrets, including the ability to decrypt confidential orderflow and access the Ethereum wallet for paying validators,” the TEE.fail website explained. “Additionally, a malicious operator could build arbitrary blocks or frontrun (i.e., construct a new transaction with higher fees to ensure theirs is executed first) the confidential transactions for profit while still providing deniability.”

To date, the researchers said, BuilderNet hasn’t provided mitigations. Attempts to reach BuilderNet officials were unsuccessful.

dstack is a tool for building confidential applications that run on top of virtual machines protected by Nvidia Confidential Compute. The researchers used TEE.fail to forge attestations certifying that a workload was performed inside TDX with the Nvidia protection enabled. They also used the “borrowed” attestations to fake ownership of GPUs that a relying party trusts.

Secret Network is a platform billing itself as the “first mainnet blockchain with privacy-preserving smart contracts,” in part by encrypting on-chain data and execution with SGX. The researchers showed that TEE.fail could extract the “Consensus Seed,” the primary network-side private key encrypting confidential transactions on the Secret Network. As noted, after learning of Wiretap, the Secret Network eliminated this possibility by establishing a “curated” allowlist of known, trusted nodes allowed on the network and by suspending the acceptance of new nodes. Academic or not, the ability to replicate the attack using TEE.fail shows that Wiretap wasn’t a one-off success.

A tough nut to crack

As explained earlier, the root cause of all the TEE.fail attacks is deterministic encryption, which forms the basis for protections in all three chipmakers’ TEEs. This weaker form of encryption wasn’t always used in TEEs. When Intel initially rolled out SGX, the feature was put in client CPUs, not server ones, to prevent users from building devices that could extract copyrighted content such as high-definition video.

Those early versions encrypted no more than 256MB of RAM, a small enough space to use the much stronger probabilistic form of encryption. The TEEs built into server chips, by contrast, must often encrypt terabytes of RAM. Probabilistic encryption doesn’t scale to that size without serious performance penalties. Finding a solution that accommodates this overhead won’t be easy.

One mitigation over the short term is to ensure that each 128-bit block of ciphertext has sufficient entropy. Adding random plaintext to the blocks prevents ciphertext repetition. The researchers say the entropy can be added by building a custom memory layout that pairs each 64-bit block of data with a 64-bit counter seeded with a random initial value, so that each 128-bit block is unique before it’s encrypted.
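
Here is a rough sketch of that idea; the exact layout and cipher choice are assumptions rather than anything the researchers specified. Pairing each 64-bit data block with an incrementing counter that starts at a random value means no two 128-bit ciphertext blocks repeat, even when the same data is written twice.

```python
# A rough sketch of the counter-based mitigation described above. The layout and
# cipher choice are assumptions; the point is that pairing each 64-bit data block
# with a counter that starts at a random value prevents repeated ciphertext.
import os
import struct
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
counter = int.from_bytes(os.urandom(8), "big")  # 64-bit counter, random start

def encrypt_memory(data: bytes) -> bytes:
    global counter
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    out = b""
    for i in range(0, len(data), 8):                         # 64-bit data blocks
        padded = data[i:i + 8].ljust(8, b"\x00")
        block = struct.pack(">Q", counter % 2**64) + padded  # 128-bit block
        out += enc.update(block)
        counter += 1
    return out + enc.finalize()

# Writing the same value twice now produces different ciphertext, defeating the
# copy-and-replay trick that deterministic encryption allows.
print(encrypt_memory(b"AAAAAAAA") != encrypt_memory(b"AAAAAAAA"))  # True
```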

The other countermeasure the researchers proposed is adding location verification to the attestation mechanism. While insider and supply chain attacks remain a possibility inside even the most reputable cloud services, strict policies make them much less feasible. Even those mitigations, however, don’t foreclose the threat of a government agency with a valid subpoena ordering an organization to run such an attack inside its own network.

In a statement, Nvidia said:

NVIDIA is aware of this research. Physical controls in addition to trust controls such as those provided by Intel TDX reduce the risk to GPUs for this style of attack, based on our discussions with the researchers. We will provide further details once the research is published.

Intel spokesman Jerry Bryant said:

Fully addressing physical attacks on memory by adding more comprehensive confidentiality, integrity and anti-replay protection results in significant trade-offs to Total Cost of Ownership. Intel continues to innovate in this area to find acceptable solutions that offer better balance between protections and TCO trade-offs.

The company has published responses here and here reiterating that physical attacks are out of scope for both TDX and SGX.

AMD didn’t respond to a request for comment.

Stuck on Band-Aids

For now, TEE.fail, Wiretap, and Battering RAM remain a persistent threat that isn’t solved with the use of default implementations of the chipmakers’ secure enclaves. The most effective mitigation for the time being is for TEE users to understand the limitations and curb uses that the chipmakers say aren’t a part of the TEE threat model. Secret Network tightening requirements for operators joining the network is an example of such a mitigation.

Moore, the founder and CEO of runZero, said that companies with big budgets can rely on custom solutions built by larger cloud services. AWS, for example, makes use of the Nitro Card, which is built using ASIC chips that accelerate processing using TEEs. Google’s proprietary answer is Titanium.

“It’s a really hard problem,” Moore said. “I’m not sure what the current state of the art is, but if you can’t afford custom hardware, the best you can do is rely on the CPU provider’s TEE, and this research shows how weak this is from the perspective of an attacker with physical access. The enclave is really a Band-Aid or hardening mechanism over a really difficult problem, and it’s both imperfect and dangerous if compromised, for all sorts of reasons.”


Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.



10M people watched a YouTuber shim a lock; the lock company sued him. Bad idea.


It’s still legal to pick locks, even when you swing your legs.

“Opening locks” might not sound like scintillating social media content, but Trevor McNally has turned lock-busting into online gold. A former US Marine Staff Sergeant, McNally today has more than 7 million followers and has amassed more than 2 billion views just by showing how easy it is to open many common locks by slapping, picking, or shimming them.

This does not always endear him to the companies that make the locks.

On March 3, 2025, a Florida lock company called Proven Industries released a social media promo video just begging for the McNally treatment. The video was called, somewhat improbably, “YOU GUYS KEEP SAYING YOU CAN EASILY BREAK OFF OUR LATCH PIN LOCK.” In it, an enthusiastic man in a ball cap says he will “prove a lot of you haters wrong.” He then goes hard at Proven’s $130 model 651 trailer hitch lock with a sledgehammer, bolt cutters, and a crowbar.

Naturally, the lock hangs tough.

An Instagram user brought the lock to McNally’s attention by commenting, “Let’s introduce it to the @mcnallyofficial poke.” Someone from Proven responded, saying that McNally only likes “the cheap locks lol because they are easy and fast.” Proven locks were said to be made of sterner stuff.

But on April 3, McNally posted a saucy little video to social media platforms. In it, he watches the Proven promo video while swinging his legs and drinking a Juicy Juice. He then hops down from his seat, goes over to a Proven trailer hitch lock, and opens it in a matter of seconds using nothing but a shim cut from a can of Liquid Death. He says nothing during the entire video, which has been viewed nearly 10 million times on YouTube alone.

Despite practically begging people to attempt this, Proven Industries owner Ron Lee contacted McNally on Instagram. “Just wanted to say thanks and be prepared!” he wrote. McNally took this as a threat.

(Oddly enough, Proven’s own homepage features a video in which the company trashes competing locks and shows just how easy it is to defeat them. And its news pages contain articles and videos on “The Hidden Flaws of Master Locks” and other brands. Why it got so upset about McNally’s video is unclear.)

The next day, Lee texted McNally’s wife. The message was apparently Lee’s attempt to de-escalate things; he says he thought the number belonged to McNally, and the text itself was unobjectionable. But after the “be prepared!” notice of the day before, and given the fact that Lee already knew how to contact him on Instagram, McNally saw the text as a way “to intimidate me and my family.” That feeling was cemented when McNally found out that Lee was a triple felon—and that in one case, Lee had hired someone “to throw a brick through the window of his ex-wife.”

Concerned about losing business, Lee kept trying to shut McNally down. Proven posted a “response video” on April 6 and engaged with numerous social media commenters, telling them that things were “going to get really personal” for McNally. Proven employees alleged publicly that McNally was deceiving people about all the prep work he had done to make a “perfectly cut out” shim. Without extensive experience, long prep work, and precise measurements, it was said, Proven’s locks were in little danger of being opened by rogue actors trying to steal your RV.

“Sucks to see how many people take everything they see online for face value,” one Proven employee wrote. “Sounds like a bunch of liberals lol.”

Proven also had its lawyers file “multiple” DMCA takedown notices against the McNally video, claiming that its use of Proven’s promo video was copyright infringement.

McNally didn’t bow to the pressure, though, instead uploading several more videos showing him opening Proven locks. In one of them, he takes aim at Proven’s claims about his prep work by retrieving a new lock from an Amazon delivery kiosk, taking it outside—and popping it in seconds using a shim he cuts right on camera, with no measurements, from an aluminum can.

On May 1, Proven filed a federal lawsuit against McNally in the Middle District of Florida, charging him with a huge array of offenses: (1) copyright infringement, (2) defamation by implication, (3) false advertising, (4) violating the Florida Deceptive and Unfair Trade Practices Act, (5) tortious interference with business relationships, (6) unjust enrichment, (7) civil conspiracy, and (8) trade libel. Remarkably, the claims stemmed from a video that all sides admit was accurate and in which McNally himself said nothing.

Screenshot of a social media exchange.

In retrospect, this was probably not a great idea.

Don’t mock me, bro

How can you defame someone without even speaking? Proven claimed “defamation by implication,” arguing that the whole setup of McNally’s videos was unfair to the company and its product. McNally does not show his prep work, which (Proven argued) conveys to the public the false idea that Proven’s locks are easy to bypass. While the shimming does work, Proven argued that it would be difficult for an untrained user to perform.

But what Proven really, really didn’t like was being mocked. McNally’s decision to drink—and shake!—a juice box on video comes up in court papers a mind-boggling number of times. Here’s a sample:

McNally appears swinging his legs and sipping from an apple juice box, conveying to the purchasing public that bypassing Plaintiff’s lock is simple, trivial, and even comical…

…showing McNally drinking from, and shaking, a juice box, all while swinging his legs, and displaying the Proven Video on a mobile device…

The tone, posture, and use of the juice box prop and childish leg swinging that McNally orchestrated in the McNally Video was intentional to diminish the perceived seriousness of Proven Industries…

The use of juvenile imagery, such as sipping from a juice box while casually applying the shim, reinforces the misleading impression that the lock is inherently insecure and marketed deceptively…

The video then abruptly shifts to Defendant in a childlike persona, sipping from a juice box and casually applying a shim to the lock…

In the end, Proven argued that the McNally video was “for commercial entertainment and mockery,” produced for the purpose of “humiliating Plaintiff.” McNally, it was said, “will not stop until he destroys Proven’s reputation.” Justice was needed. Expensive, litigious justice.

But the proverbially level-headed horde of Internet users does not always love it when companies file thermonuclear lawsuits against critics. Sometimes, in fact, the level-headed horde disregards everything taught by that fount of judicial knowledge, The People’s Court, and they take the law into their own hands.

Proven was soon the target of McNally fans. The company says it was “forced to disable comments on posts and product videos due to an influx of mocking and misleading replies furthering the false narrative that McNally conveyed to the viewers.” The company’s customer service department received such an “influx of bogus customer service tickets… that it is experiencing difficulty responding to legitimate tickets.”

Screenshot of a social media post from Proven Industries.

Proven was quite proud of its lawsuit… at first.

Someone posted Lee’s personal phone number to the comment section of a McNally video, which soon led to “a continuous stream of harassing phone calls and text messages from unknown numbers at all hours of the day and night,” which included “profanity, threats, and racially charged language.”

Lest this seem like mere high spirits and hijinks, Lee’s partner and his mother both “received harassing messages through Facebook Messenger,” while other messages targeted Lee’s son, saying things like “I would kill your f—ing n—– child” and calling him a “racemixing pussy.”

This is clearly terrible behavior; it also has no obvious connection to McNally, who did not direct or condone the harassment. As for Lee’s phone number, McNally said that he had nothing to do with posting it and wrote that “it is my understanding that the phone number at issue is publicly available on the Better Business Bureau website and can be obtained through a simple Google search.”

And this, with both sides palpably angry at each other, is how things stood on June 13 at 9:09 am, when the case got a hearing in front of the Honorable Mary Scriven, an extremely feisty federal judge in Tampa. Proven had demanded a preliminary injunction that would stop McNally from sharing his videos while the case progressed, but Proven had issues right from the opening gavel:

LAWYER 1: Austin Nowacki on behalf of Proven industries.

THE COURT: I’m sorry. What is your name?

LAWYER 1: Austin Nowacki.

THE COURT: I thought you said Austin No Idea.

LAWYER 2: That’s Austin Nowacki.

THE COURT: All right.

When Proven’s lead lawyer introduced a colleague who would lead that morning’s arguments, the judge snapped, “Okay. Then you have a seat and let her speak.”

Things went on this way for some time, as the judge wondered, “Did the plaintiff bring a lock and a beer can?” (The plaintiff did not.) She appeared to be quite disappointed when it was clear there would be no live shimming demonstration in the courtroom.

Then it was on to the actual arguments. Proven argued that the 15 seconds of its 90-second promo video used by McNally were not fair use, that McNally had defamed the company by implication, and that shimming its locks was actually quite difficult. Under questioning, however, one of Proven’s employees admitted that he had been able to duplicate McNally’s technique, leading to the question from McNally’s lawyer: “When you did it yourself, did it occur to you for one moment that maybe the best thing to do, instead of file a lawsuit, was to fix [the lock]?”

At the end of several hours of wrangling, the judge stepped in, saying that she “declines to grant the preliminary injunction motion.” For her to do so, Proven would have to show that it was likely to win at trial, among other things; it had not.

As for the big copyright infringement claim, of which Proven had made so much hay, the judge reached a pretty obvious finding: You’re allowed to quote snippets of copyrighted videos in order to critique them.

“The purpose and character of the use to which Mr. McNally put the alleged infringed work is transformative, artistic, and a critique,” said the judge. “He is in his own way challenging and critiquing Proven’s video by the use of his own video.”

As for the amount used, it was “substantial enough but no more than is necessary to make the point that he is trying to critique Proven’s video, and I think that’s fair game and a nominative fair use circumstance.”

While Proven might convince her otherwise after a full trial, “the copyright claim fails as a basis for a demand for preliminary injunctive relief.”

As for “tortious interference” and “defamation by implication,” the judge was similarly unimpressed.

“The fact that you might have a repeat customer who is dissuaded to buy your product due to a criticism of the product is not the type of business relationship the tortious interference with business relationship concept is intended to apply,” she said.

In the end, the judge said she would see the case through to its end, if that was really what everyone wanted, but “I will pray that you all come to a resolution of the case that doesn’t require all of this. This is a capitalist market and people say what they say. As long as it’s not false, they say what they say.”

She gave Proven until July 7 to amend its complaint if it wished.

On July 7, the company dismissed the lawsuit against McNally instead.

Proven also made a highly unusual request: Would the judge please seal almost the entire court record—including the request to seal?

Court records are presumptively public, but Proven complained about a “pattern of intimidation and harassment by individuals influenced by Defendant McNally’s content.” According to the company, a key witness had already backed out of the case, saying, “Is there a way to leave my name and my companies name out of this due to concerns of potential BLOW BACK from McNally or others like him?” Another witness, who did submit a declaration, wondered, “Is this going to be public? My concern is that there may be some backlash from the other side towards my company.”

McNally’s lawyer laid into this seal request, pointing out that the company had shown no concern over these issues until it lost its bid for a preliminary injunction. Indeed, “Proven boasted to its social media followers about how it sued McNally and about how confident it was that it would prevail. Proven even encouraged people to search for the lawsuit.” Now, however, the company “suddenly discover[ed] a need for secrecy.”

The judge has not yet ruled on the request to seal.

Another way

The strange thing about the whole situation is that Proven actually knew how to respond constructively to the first McNally video. Its own response video opened with a bit of humor (the presenter drinks a can of Liquid Death), acknowledged the issue (“we’ve had a little bit of controversy in the last couple days”), and made clear that Proven could handle criticism (“we aren’t afraid of a little bit of feedback”).

The video went on to show how their locks work and provided some context on shimming attacks and their likelihood of real-world use. It ended by showing how users concerned about shimming attacks could choose more expensive but more secure lock cores that should resist the technique.

Quick, professional, non-defensive—a great way to handle controversy.

But it was all blown apart by the company’s angry social media statements, which were unprofessional and defensive, and the litigation, which was spectacularly ill-conceived as a matter of both law and policy. In the end, the case became a classic example of the Streisand Effect, in which the attempt to censor information can instead call attention to it.

Judging from the number of times the lawsuit talks about 1) ridicule and 2) harassment, it seems like the case quickly became a personal one for Proven’s owner and employees, who felt either mocked or threatened. That’s understandable, but being mocked is not illegal and should never have led to a lawsuit or a copyright claim. As for online harassment, it remains a serious and unresolved issue, but launching a personal vendetta—and on pretty flimsy legal grounds—against McNally himself was patently unwise. (Doubly so given that McNally had a huge following and had already responded to DMCA takedowns by creating further videos on the subject; this wasn’t someone who would simply be intimidated by a lawsuit.)

In the end, Proven’s lawsuit likely cost the company serious time and cash—and generated little but bad publicity.




We let OpenAI’s “Agent Mode” surf the web for us—here’s what happened


But when will it fold my laundry?

From scanning emails to building fansites, Atlas can ably automate some web-based tasks.

He wants us to write what about Tuvix? Credit: Getty Images

On Tuesday, OpenAI announced Atlas, a new web browser with ChatGPT integration, to let you “chat with a page,” as the company puts it. But Atlas also goes beyond the usual LLM back-and-forth with Agent Mode, a “preview mode” feature the company says can “get work done for you” by clicking, scrolling, and reading through various tabs.

“Agentic” AI is far from new, of course; OpenAI itself rolled out a preview of the web browsing Operator agent in January and introduced the more generalized “ChatGPT agent” in July. Still, prominently featuring this capability in a major product release like this—even in “preview mode”—signals a clear push to get this kind of system in front of end users.

I wanted to put Atlas’ Agent Mode through its paces to see if it could really save me time in doing the kinds of tedious online tasks I plod through every day. In each case, I’ll outline a web-based problem, lay out the Agent Mode prompt I devised to try to solve it, and describe the results. My final evaluation will rank each task on a 10-point scale, with 10 being “did exactly what I wanted with no problems” and one being “complete failure.”

Playing web games

The problem: I want to get a high score on the popular tile-sliding game 2048 without having to play it myself.

The prompt: “Go to play2048.co and get as high a score as possible.”

The results: While there’s no real utility to this admittedly silly task, a simple, no-reflexes-needed web game seemed like a good first test of the Atlas agent’s ability to interpret what it sees on a webpage and act accordingly. After all, if frontier-model LLMs like Google Gemini can beat a complex game like Pokémon, 2048 should pose no problem for a web browser agent.

To Atlas’ credit, the agent was able to quickly identify and close a tutorial link blocking the gameplay window and figure out how to use the arrow keys to play the game without any further help. When it came to actual gaming strategy, though, the agent started by flailing around, experimenting with looped sequences of moves like “Up, Left, Right, Down” and “Left and Down.”

Finally, a way to play 2048 without having to, y’know, play 2048. Credit: Kyle Orland

After a while, the random flailing settled down a bit, with the agent seemingly looking ahead for some simple strategies: “The board currently has two 32 tiles that aren’t adjacent, but I think I can align them,” the Activity summary read at one point. “I could try shifting left or down to make them merge, but there’s an obstacle in the form of an 8 tile. Getting to 64 requires careful tile movement!”

Frustratingly, the agent stopped playing after just four minutes, settling on a score of 356 even though the board was far from full. I had to prompt the agent a few more times to convince it to play the game to completion; it ended up with a total of 3164 points after 260 moves. That’s pretty similar to the score I was able to get in a test game as a 2048 novice, though expert players have reportedly scored much higher.

Evaluation: 7/10. The agent gets credit for being able to play the game competently without any guidance but loses points for having to be told to keep playing to completion and for a score that is barely on the level of a novice human.

Making a radio playlist

The problem: I want to transform the day’s playlist from my favorite Pittsburgh-based public radio station into an on-demand Spotify playlist.

The prompt: “Go to Radio Garden. Find WYEP and monitor the broadcast. For every new song you hear, identify the song and add it to a new Spotify playlist.”

The results: After trying and failing to find a track listing for WYEP on Radio Garden as requested, the Atlas agent smartly asked for approval to move on to wyep.org to continue the task. By the time I noticed this request, the link to wyep.org had been replaced in the Radio Garden tab with an ad for EVE Online, which the agent accidentally clicked. The agent quickly realized the problem and navigated to the WYEP website directly to fix it.

From there, the agent was able to scan the page and identify the prominent “Now Playing” text near the top (it’s unclear if it could ID the music simply via audio without this text cue). After asking me to log in to my Spotify account, the agent used the search bar to find the listed songs and added them to a new playlist without issue.

From radio stream to Spotify playlist in a single sentence. Credit: Kyle Orland

The main problem with this use case is the inherent time limitations. On the first try, the agent worked for four minutes and managed to ID and add just two songs that played during that time. When I asked it to continue for an hour, I got an error message blaming “technical constraints on session length” for stricter limits. Even when I asked it to continue for “as long as possible,” I only got three more minutes of song listings.

At one point, the Atlas agent suggested that “if you need ongoing updates, you can ask me again after a while and I can resume from where we left off.” And to the agent’s credit, when I went back to the tab hours later and told it to “resume monitoring,” I got four new songs added to my playlist.

Evaluation: 9/10. The agent was able to navigate multiple websites and interfaces to complete the task, even when unexpected problems got in the way. I took off a point only because I can’t just leave this running as a background task all day, even as I understand that use case would surely eat up untold amounts of money and processing power on OpenAI’s part.

Scanning emails

The problem: I need to go through my emails to create a reference spreadsheet with contact info for the many, many PR people who send me messages.

The prompt: “Look through all my Ars Technica emails from the last week. Collect all the contact information (name, email address, phone number, etc.) for PR contacts contained in those emails and add them to a new Google Sheets spreadsheet.”

The results: Without being explicitly guided, the Atlas agent was able to realize that I use Gmail, and it could differentiate between the personal email account and professional Ars Technica accounts I had open in separate tabs. As the Atlas agent started scanning my Ars mailbox, though, I saw a prominent warning overlaid on the page: “Sensitive: ChatGPT will only work while you view the tab.” That kind of ruined the point, since I wanted Atlas to handle this for me while I do other stuff online, but I guess I could still play a Steam Deck game while I waited.

Just a few of the many, many PR people who email me in a week.

After searching for “after: 2025/10/14 before: 2025/10/22 PR” in Gmail (mirroring the kind of search I would have used for this task), the Atlas agent clicked through each email, scrolling through to look for names, emails, and phone numbers (and also the relevant company name, which I didn’t ask for explicitly).

With seven minutes of background work, the agent was able to open up a new Google Sheet and quickly enter 12 rows of well-formatted data for 12 different PR contacts. Unfortunately, it stopped well before it had a chance to go through all 164 emails returned by that initial Gmail search.

Evaluation: 8/10. Again, the “technical constraints on session length” seem to be getting in the way of finishing a task, forcing me to deduct a couple of points for incompleteness.

Wiki editing

The problem: I need the world to know that Tuvix was murdered by Captain Janeway in the Star Trek: Voyager episode “Tuvix.”

The prompt: “Go to the Fandom Wiki page for Tuvix. Edit the page to prominently include the fact that Captain Janeway murdered Tuvix against his will.”

The results: Before the agent could even get going, I was informed, “I’m sorry, but I can’t help with editing or vandalising wiki pages (including the Fandom wiki) in a way that misrepresents them or forces a biased viewpoint.”

This wording implied that the agent can edit wiki pages with neutral or strictly informative additions, so I asked Atlas what sort of wording would be more acceptable. The agent responded helpfully that we might write, “Tuvix was later separated back into Tuvok and Neelix, resulting in Tuvix’s death.” But when I asked it to add that very line to the Fandom Wiki, I got a much more direct refusal: “I’m sorry, but I can’t make edits directly to external websites or wikis.”

Evaluation: N/A. On the one hand, the Atlas agent won’t do my Tuvix-based web activism for me. On the other hand, it’s probably better for all of us that Atlas refuses to automate this kind of public web defacement by default.

Making a fan page

The problem: People online still need to know about Janeway’s murder of Tuvix!

The prompt: “Go to NeoCities and create a fan site for the Star Trek character Tuvix. Make sure it has lots of images and fun information about Tuvix and that it makes it clear that Tuvix was murdered by Captain Janeway against his will.”

The results: You can see them for yourself right here. After a brief pause so I could create and log in to a new Neocities account, the Atlas agent was able to generate this humble fan page in just two minutes after aggregating information from a wide variety of pages like Memory Alpha and TrekCore. “The Hero Starfleet Murdered” and “Justice for Tuvix” headers are nice touches, but the actual text is much more mealy-mouthed about the “intense debate” and “ethical dilemmas” around what I wanted to make clear was clearly premeditated murder.

Justice for Tuvix! Credit: Kyle Orland

The agent also had a bit of trouble with the request for images. Instead of downloading some Tuvix pictures and uploading copies to Neocities (which I’m not entirely sure Atlas can do on its own), the agent decided to directly reference images hosted on external servers, which is usually a big no-no in web design. The agent did notice when these external image links failed to work, saying that it would “need to find more accessible images from reliable sources,” but it failed to even attempt that before stopping its work on the task.

Evaluation: 7/10. Points for building a passable Web 1.0 fansite relatively quickly, but the weak prose and broken images cost it some execution points here.

Picking a power plan

The problem: Ars Senior Technology Editor Lee Hutchinson told me he needs to go through the annoying annual process of selecting a new electricity plan “because Texas is insane.”

The prompt: “Go to powertochoose.org and find me a 12–24 month contract that prioritizes an overall low usage rate. I use an average of 2,000 KWh per month. My power delivery company is Texas New-Mexico Power (“TNMP”) not Centerpoint. My ZIP code is [redacted]. Please provide the ‘fact sheet’ for any and all plans you recommend.”

The results: After spending eight minutes fiddling with the site’s search parameters and seemingly getting repeatedly confused about how to sort the results by the lowest rate, the Atlas agent spit out a recommendation to read this fact sheet, which it said “had the best average prices at your usage level. The ‘Bright Nights’ plans are time‑of‑use offers that provide free electricity overnight and charge a higher rate during the day, while the ‘Digital Saver’ plan is a traditional fixed‑rate contract.”

If Ars’ Lee Hutchinson never has to use this web site again, it will be too soon. Credit: Power to Choose

Since I don’t know anything about the Texas power market, I passed this information on to Lee, who had this to say: “It’s not a bad deal—it picked a fixed rate plan without being asked, which is smart (variable rate pricing is how all those poor people got stuck with multi-thousand dollar bills a few years back in the freeze). It’s not the one I would have picked due to the weird nighttime stuff (if you don’t meet that exact criteria, your $/kWh will be way worse) but it’s not a bad pick!”

Evaluation: 9/10. As Lee puts it, “it didn’t screw up the assignment.”

Downloading some games

The problem: I want to download some recent Steam demos to see what’s new in the gaming world.

The prompt: “Go to Steam and find the most recent games with a free demo available for the Mac. Add all of those demos to my library and start to download them.”

The results: Rather than navigating to the “Free Demos” category, the Atlas agent started by searching for “demo.” After eventually finding the macOS filter, it wasted minutes and minutes looking for a “has demo” filter, even though the search for the word “demo” already narrowed it down.

This search results page was about as far as the Atlas agent was able to get when I asked it for game demos. Credit: Kyle Orland

After a long while, the agent finally clicked the top result on the page, which happened to be visual novel Project II: Silent Valley. But even though there was a prominent “Download Demo” link on that page, the agent became concerned that it was on the Steam page for the full game and not a demo. It backed up to the search results page and tried again.

After watching some variation of this loop for close to ten minutes, I stopped the agent and gave up.

Evaluation: 1/10. It technically found some macOS game demos but utterly failed to even attempt to download them.

Final results

Across six varied web-based tasks (I left out the Wiki vandalism from my summations), the Atlas agent scored a median of 7.5 points (and a mean of 6.83 points) on my somewhat subjective 10-point scale. That’s honestly better than I expected for a “preview mode” feature that is still obviously being tested heavily by OpenAI.

In my tests, Atlas was generally able to correctly interpret what was being asked of it and was able to navigate and process information on webpages carefully (if slowly). The agent was able to navigate simple web-based menus and get around unexpected obstacles with relative ease most of the time, even as it got caught in infinite loops other times.

The major limiting factor in many of my tests continues to be the “technical constraints on session length” that seem to limit most tasks to a few minutes. Given how long it takes the Atlas agent to figure out where to click next—and the repetitive nature of the kind of tasks I’d want a web-agent to automate—this severely limits its utility. A version of the Atlas agent that could work indefinitely in the background would have scored a few points better on my metrics.

All told, Atlas’ “Agent Mode” isn’t yet reliable enough to use as a kind of “set it and forget it” background automation tool. But for simple, repetitive tasks that a human can spot-check afterward, it already seems like the kind of tool I might use to avoid some of the drudgery in my online life.


Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.



Google has a useful quantum algorithm that outperforms a supercomputer


An approach it calls “quantum echoes” takes 13,000 times longer on a supercomputer.

The work relied on Google’s current-generation quantum hardware, the Willow chip. Credit: Google

A few years back, Google made waves when it claimed that some of its hardware had achieved quantum supremacy, performing operations that would be effectively impossible to simulate on a classical computer. That claim didn’t hold up especially well, as mathematicians later developed methods to help classical computers catch up, leading the company to repeat the work on an improved processor.

While this back-and-forth was unfolding, the field became less focused on quantum supremacy and more on two additional measures of success. The first is quantum utility, in which a quantum computer performs computations that are useful in some practical way. The second is quantum advantage, in which a quantum system completes calculations in a fraction of the time it would take a typical computer. (IBM and a startup called Pasqal have published a useful discussion about what would be required to verifiably demonstrate a quantum advantage.)

Today, Google and a large collection of academic collaborators are publishing a paper describing a computational approach that demonstrates a quantum advantage compared to current algorithms—and may actually help us achieve something useful.

Out of time

Google’s latest effort centers on something it’s calling “quantum echoes.” The approach could be described as a series of operations on the hardware qubits that make up its machine. These qubits hold a single bit of quantum information in a superposition between two values, with probabilities of finding the qubit in one value or the other when it’s measured. Each qubit is entangled with its neighbors, allowing its probability to influence those of all the qubits around it. The operations that allow computation, called gates, are ways of manipulating these probabilities. Most current hardware, including Google’s, performs manipulations on one or two qubits at a time (termed one- and two-qubit gates, respectively).

For quantum echoes, the operations involved performing a set of two-qubit gates, altering the state of the system, and later performing the reverse set of gates. On its own, this would return the system to its original state. But for quantum echoes, Google inserts single-qubit gates performed with a randomized parameter. This alters the state of the system before the reverse operations take place, ensuring that the system won’t return to exactly where it started. That explains the “echoes” portion of the name: You’re sending an imperfect copy back toward where things began, much like an echo involves the imperfect reversal of sound waves.

That’s what the process looks like in terms of operations performed on the quantum hardware. But it’s probably more informative to think of it in terms of a quantum system’s behavior. As Google’s Tim O’Brien explained, “You evolve the system forward in time, then you apply a small butterfly perturbation, and then you evolve the system backward in time.” The forward evolution is the first set of two-qubit gates, the small perturbation is the randomized one-qubit gate, and the second set of two-qubit gates is the equivalent of sending the system backward in time.
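
A hypothetical sketch of that forward/perturb/reverse structure, written with Cirq (Google’s open-source quantum framework), looks like the following; the qubit count and gate choices are illustrative assumptions, not the circuits used in the paper.

```python
# A hypothetical sketch of the "quantum echo" structure: forward evolution,
# a randomized single-qubit "butterfly" perturbation, then the exact inverse
# of the forward circuit. Gate choices and qubit count are assumptions.
import numpy as np
import cirq

qubits = cirq.LineQubit.range(4)
rng = np.random.default_rng(0)

# Forward evolution: layers of single-qubit rotations plus entangling two-qubit gates.
def forward_layer():
    for q in qubits:
        yield cirq.rx(rng.uniform(0, np.pi)).on(q)
    for a, b in zip(qubits[::2], qubits[1::2]):
        yield cirq.CZ(a, b)
    for a, b in zip(qubits[1::2], qubits[2::2]):
        yield cirq.CZ(a, b)

forward = cirq.Circuit(forward_layer() for _ in range(3))

# "Butterfly" perturbation: one randomized single-qubit gate.
kick = cirq.Circuit(cirq.rz(rng.uniform(0, 2 * np.pi)).on(qubits[0]))

# The echo: forward evolution, small perturbation, then the inverse of the
# forward circuit (the "backward in time" step).
echo = forward + kick + cirq.inverse(forward)

state = cirq.Simulator().simulate(echo).final_state_vector

# How close the system returns to its |0000> starting state is the signal
# the experiment samples.
print("return probability:", abs(state[0]) ** 2)
```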

Because this is a quantum system, however, strange things happen. “On a quantum computer, these forward and backward evolutions, they interfere with each other,” O’Brien said. One way to think about that interference is in terms of probabilities. The system has multiple paths between its start point and the point of reflection—where it goes from evolving forward in time to evolving backward—and from that reflection point back to a final state. Each of those paths has a probability associated with it. And since we’re talking about quantum mechanics, those paths can interfere with each other, increasing some probabilities at the expense of others. That interference ultimately determines where the system ends up.

(Technically, these are termed “out of time order correlations,” or OTOCs. If you read the Nature paper describing this work, prepare to see that term a lot.)
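
For reference, the standard textbook form of an OTOC for two operators V̂ and Ŵ evolving under a Hamiltonian H is shown below; this is the generic definition, not necessarily the exact quantity reported in the paper.

```latex
C(t) \;=\; \big\langle \, \hat{W}^{\dagger}(t)\,\hat{V}^{\dagger}\,\hat{W}(t)\,\hat{V} \,\big\rangle ,
\qquad
\hat{W}(t) \;=\; e^{iHt}\,\hat{W}\,e^{-iHt} .
```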

Demonstrating advantage

So how do you turn quantum echoes into an algorithm? On its own, a single “echo” can’t tell you much about the system—the probabilities ensure that any two runs might show different behaviors. But if you repeat the operations multiple times, you can begin to understand the details of this quantum interference. And performing the operations on a quantum computer ensures that it’s easy to simply rerun the operations with different random one-qubit gates and get many instances of the initial and final states—and thus a sense of the probability distributions involved.
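
In code, and continuing the hypothetical Cirq sketch from earlier (still an illustrative assumption, not the paper’s actual circuits), the repeated-sampling step amounts to rerunning the echo with a freshly randomized single-qubit “kick” each time and looking at the distribution of outcomes:

```python
# Repeated sampling of the echo with different randomized single-qubit gates.
# The circuit is an illustrative stand-in, not the one used in the paper.
import numpy as np
import cirq

qubits = cirq.LineQubit.range(4)
forward = cirq.Circuit(
    [cirq.rx(0.3 * (i + 1)).on(q) for i, q in enumerate(qubits)]
    + [cirq.CZ(qubits[i], qubits[i + 1]) for i in range(3)]
)

rng = np.random.default_rng(1)
sim = cirq.Simulator()

samples = []
for _ in range(200):
    # A fresh randomized perturbation for every run.
    kick = cirq.Circuit(cirq.rz(rng.uniform(0, 2 * np.pi)).on(qubits[0]))
    echo = forward + kick + cirq.inverse(forward)
    state = sim.simulate(echo).final_state_vector
    samples.append(abs(state[0]) ** 2)

# The distribution of return probabilities -- not any single run -- is what
# carries the information about the interference.
print("mean:", np.mean(samples), "std:", np.std(samples))
```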

This is also where Google’s quantum advantage comes from. Everyone involved agrees that the precise behavior of a quantum echo of moderate complexity can be modeled using any leading supercomputer. But doing so is very time-consuming, so repeating those simulations a few times becomes unrealistic. The paper estimates that a measurement that took its quantum computer 2.1 hours to perform would take the Frontier supercomputer approximately 3.2 years. Unless someone devises a far better classical algorithm than what we have today, this represents a pretty solid quantum advantage.

But is it a useful algorithm? The repeated sampling can act a bit like the Monte Carlo sampling done to explore the behavior of a wide variety of physical systems. Typically, however, we don’t view algorithms as modeling the behavior of the underlying hardware they’re being run on; instead, they’re meant to model some other physical system we’re interested in. That’s where Google’s announcement stands apart from its earlier work—the company believes it has identified an interesting real-world physical system with behaviors that the quantum echoes can help us understand.

That system is a small molecule in a Nuclear Magnetic Resonance (NMR) machine. In a second draft paper being published on the arXiv later today, Google has collaborated with a large collection of NMR experts to explore that use.

From computers to molecules

NMR is based on the fact that the nucleus of every atom has a quantum property called spin. When nuclei are held near to each other, such as when they’re in the same molecule, these spins can influence one another. NMR uses magnetic fields and photons to manipulate these spins and can be used to infer structural details, like how far apart two given atoms are. But as molecules get larger, these spin networks can extend for greater distances and become increasingly complicated to model. So NMR has been limited to focusing on the interactions of relatively nearby spins.

For this work, though, the researchers figured out how to use an NMR machine to create the physical equivalent of a quantum echo in a molecule. The work involved synthesizing the molecule with a specific isotope of carbon (carbon-13) in a known location in the molecule. That isotope could be used as the source of a signal that propagates through the network of spins formed by the molecule’s atoms.

“The OTOC experiment is based on a many-body echo, in which polarization initially localized on a target spin migrates through the spin network, before a Hamiltonian-engineered time-reversal refocuses to the initial state,” the team wrote. “This refocusing is sensitive to perturbations on distant butterfly spins, which allows one to measure the extent of polarization propagation through the spin network.”

Naturally, something this complicated needed a catchy nickname. The team came up with TARDIS, or Time-Accurate Reversal of Dipolar InteractionS. While that name captures the “out of time order” aspect of OTOC, it’s simply a set of control pulses sent to the NMR sample that starts a perturbation of the molecule’s network of nuclear spins. A second set of pulses then reflects an echo back to the source.

The reflections that return are imperfect, with noise coming from two sources. The first is simply imperfections in the control sequence, a limitation of the NMR hardware. But the second is the influence of fluctuations happening in distant atoms along the spin network. These happen at random at some natural rate, or the researchers can deliberately insert a fluctuation by targeting a specific part of the molecule with randomized control signals.

The influence of what’s going on in these distant spins could allow us to use quantum echoes to tease out structural information at greater distances than we currently do with NMR. But to do so, we need an accurate model of how the echoes will propagate through the molecule. And again, that’s difficult to do with classical computations. But it’s very much within the capabilities of quantum computing, which the paper demonstrates.

Where things stand

For now, the team stuck to demonstrations on very simple molecules, making this work mostly a proof of concept. But the researchers are optimistic that there are many ways the system could be used to extract structural information from molecules at distances that are currently unobtainable using NMR. The paper’s discussion section lists a lot of potential upsides worth exploring, and there are plenty of smart people who would love to find new ways of using their NMR machines, so the field is likely to figure out pretty quickly which of these approaches turns out to be practically useful.

The fact that the demonstrations were done with small molecules, however, means that the modeling run on the quantum computer could also have been done on classical hardware (it only required 15 hardware qubits). So Google is claiming both quantum advantage and quantum utility, but not at the same time. The sorts of complex, long-distance interactions that would be out of range of classical simulation are still a bit beyond the reach of the current quantum hardware. O’Brien estimated that the hardware’s fidelity would have to improve by a factor of three or four to model molecules that are beyond classical simulation.

The quantum advantage claim should also be seen as a work in progress. Google has collaborated with enough researchers at enough institutions that a major improvement in classical algorithms, one that would let classical computers catch up, seems unlikely. Until the community as a whole has had some time to digest the announcement, though, we shouldn’t take that as a given.

The other issue is verifiability. Some quantum algorithms will produce results that can be easily verified on classical hardware—situations where it’s hard to calculate the right result but easy to confirm a correct answer. Quantum echoes isn’t one of those, so we’ll need another quantum computer to verify the behavior Google has described.

But Google told Ars nothing is up to the task yet. “No other quantum processor currently matches both the error rates and number of qubits of our system, so our quantum computer is the only one capable of doing this at present,” the company said. (For context, Google says that the algorithm was run on up to 65 qubits, but the chip has 105 qubits total.)

There’s a good chance that other companies would disagree with that contention, but it hasn’t been possible to ask them ahead of the paper’s release.

In any case, even if this claim proves controversial, Google’s Michel Devoret, a recent Nobel winner, hinted that we shouldn’t have long to wait for additional useful algorithms. “We have other algorithms in the pipeline, so we will hopefully see other interesting quantum algorithms,” Devoret said.

Nature, 2025. DOI: 10.1038/s41586-025-09526-6  (About DOIs).

Photo of John Timmer

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

Google has a useful quantum algorithm that outperforms a supercomputer Read More »

macbook-pro:-apple’s-most-awkward-laptop-is-the-first-to-show-off-apple-m5

MacBook Pro: Apple’s most awkward laptop is the first to show off Apple M5


the apple m5: one more than m4

Apple M5 trades blows with Pro and Max chips from older generations.

Apple’s M5 MacBook Pro. Credit: Andrew Cunningham

When I’m asked to recommend a Mac laptop for people, Apple’s low-end 14-inch MacBook Pro usually gets lost in the shuffle. It competes with the 13- and 15-inch MacBook Air, significantly cheaper computers that meet or exceed the “good enough” boundary for the vast majority of computer users. The basic MacBook Pro also doesn’t have the benefit of Apple’s Pro or Max-series chips, which come with many more CPU cores, substantially better graphics performance, and higher memory capacity for true professionals and power users.

But the low-end Pro makes sense for a certain type of power user. At $1,599, it’s the cheapest way to get Apple’s best laptop screen, with mini LED technology, a higher 120 Hz ProMotion refresh rate for smoother scrolling and animations, and the optional but lovely nano-texture (read: matte) finish. Unlike the MacBook Air, it comes with a cooling fan, which has historically meant meaningfully better sustained performance and less performance throttling. And it’s also Apple’s cheapest laptop with three Thunderbolt ports, an HDMI port, and an SD card slot, all genuinely useful for people who want to plug lots of things in without having multiple dongles or a bulky dock competing for the Air’s two available ports.

If you don’t find any of those arguments in the basic MacBook Pro’s favor convincing, that’s fine. The new M5 version makes almost no changes to the laptop other than the chip, so it’s unlikely to change your calculus if you already looked at the M3 or M4 version and passed it up. But it is the first Mac to ship with the M5, the first chip in Apple’s fifth-generation chip family and a preview of what’s to come for (almost?) every other Mac in the lineup. So you can at least be interested in the 14-inch MacBook Pro as a showcase for a new processor, if not as a retail product in and of itself.

The Apple Silicon MacBook Pro, take five

Apple has been using this laptop design for about four years now, since it released the M1 Pro and M1 Max versions of the MacBook Pro in late 2021. But for people who are upgrading from an older design—Apple did use the old Intel-era design, Touch Bar and all, for the low-end M1 and M2 MacBook Pros, after all—we’ll quickly hit the highlights.

This basic MacBook Pro only comes in a 14-inch screen size, up from 13 inches for the old low-end MacBook Pro, but some of that space is eaten up by the notch across the top of the display. The strips of screen on either side of the notch are usable by macOS, but only for the menu bar and icons that live in the menu bar—it’s a no-go zone for apps. The laptop is a consistent thickness throughout, rather than tapered, and has somewhat more squared-off and less-rounded corners.

Compared to the 13-inch MacBook Pro, the 14-inch version is the same thickness, but it’s a little heavier (3.4 pounds, compared to 3), wider, and deeper. For most professional users, the extra screen size and the re-addition of the HDMI port and SD card slot mostly justify the slight bump up in size. The laptop also includes three Thunderbolt 4 ports—up from two in the MacBook Airs—and the resurrected MagSafe charging port. But it is worth noting that the 14-inch MacBook Pro is nearly identical in weight to the 15-inch MacBook Air. If screen size is all you’re after, the Air may still be the better choice.

Apple’s included charger uses MagSafe on the laptop end, but USB-C chargers, docks, monitors, and other accessories will continue to charge the laptop if that’s what you prefer to keep using.

I’ve got no gripes about Apple’s current laptop keyboard—Apple uses the same key layout, spacing, and size across the entire MacBook Air and Pro line, though if I had to distinguish between the Pro and Air, I’d say the Pro’s keyboard is very, very slightly firmer and more satisfying to type on and that the force feedback of its trackpad is just a hair more clicky. The laptop’s speaker system is also more impressive than either MacBook Air, with much bassier bass and a better dynamic range.

But the main reason to prefer this low-end Pro to the Air is the screen, particularly the 120 Hz ProMotion support, the improved brightness and contrast of the mini LED display technology, and the option to add Apple’s matte nano texture finish. I usually don’t mind the amount of glare coming off my MacBook Air’s screen too much, but every time I go back to using a nano-texture screen I’m always a bit jealous of the complete lack of glare and reflections and the way you get those benefits without dealing with the dip in image quality you see from many matte-textured screen protectors. The more you use your laptop outdoors or under lighting conditions you can’t control, the more you’ll appreciate it.

The optional nano texture display adds a pleasant matte finish to the screen, but that notch is still notching. Credit: Andrew Cunningham

If the higher refresh rate and the optional matte coating (a $150 upgrade on top of an already pricey computer) don’t appeal to you, or if you can’t pay for them, then you can be pretty confident that this isn’t the MacBook for you. The 13-inch Air is lighter, the 15-inch Air is larger, and both are cheaper. But we’re still only a couple of years past the M2 version of the low-end MacBook Pro, which didn’t give you the extra ports or the Pro-level screen.

But! Before you buy one of the still-M4-based MacBook Airs, our testing of the MacBook Pro’s new M5 chip should give you some idea of whether it’s worth waiting a few months (?) for an Air refresh.

Testing Apple’s M5

We’ve also run some M5 benchmarks as part of our M5 iPad Pro review, but having macOS rather than iPadOS running on top of it does give us a lot more testing flexibility—more benchmarks and a handful of high-end games to run, plus access to the command line for taking a look at power usage and efficiency.

To back up and re-state the chip’s specs for a moment, though, the M5 is constructed out of the same basic parts as the M4: four high-performance CPU cores, six high-efficiency CPU cores (up from four in the M1/M2/M3), 10 GPU cores, and a 16-core Neural Engine for handling some machine-learning and AI workloads.

The M5’s technical improvements are more targeted and subtle than just a boost to clock speeds or core counts. The first is a 27.5 percent increase in memory bandwidth, from the 120 GB/s of the M4 to 153 GB/s (achieved, I’m told, by a combination of faster RAM and improvements to the memory fabric that facilitates communication between different areas of the chip). Integrated GPUs are usually bottlenecked by memory bandwidth first and core count second, so memory bandwidth improvements can have a pretty direct, linear impact on graphics performance.

Apple also says it has added a “Neural Accelerator” to each of its GPU cores, separate from the Neural Engine. These will benefit a few specific types of workloads—things like MetalFX graphics upscaling or frame generation that would previously have had to use the Neural Engine can now do that work entirely within the GPU, eliminating a bit of latency and freeing the Neural Engine up to do other things. Apple is also claiming “over 4x peak GPU compute compared to M4,” which Apple says will speed up locally run AI language models and image generation software. That figure is coming mostly from the GPU improvements; according to Geekbench AI, the Neural Engine itself is only around 10 percent faster than the one on the M4.

(A note about testing: The M4 chip in these charts was in an iMac and not a MacBook Pro. But over several hardware generations, we’ve observed that the actively cooled versions of the basic M-series chips perform the same in both laptops and desktops. Comparing the M5 to the passively cooled M4 in the MacBook Air isn’t apples to apples, but comparing it to the M4 in the iMac is.)

Each of Apple’s chip generations has improved over the previous one by low-to-mid double digits, and the M5 is no different. We measured a 12 to 16 percent improvement over the M4 in single-threaded CPU tests, a 20 to 30 percent improvement in multicore tests, and roughly a 40 percent improvement in graphics benchmarks and in the built-in benchmark of the Mac version of Cyberpunk 2077. (One benchmark, the GPU-based version of the Blender rendering benchmark, measured a larger 60 to 70 percent improvement for the M5’s GPU, suggesting it benefits more than most apps from either the memory bandwidth improvements or the new neural accelerators.)

Those performance additions add up over time. The M5 is typically a little over twice as fast as the M1, and it comes close to the performance level of some Pro and Max processors from past generations.

The M5 MacBook Pro falls short of the M4 Pro, and it will fall even further behind the M5 Pro whenever that chip arrives. But its CPU performance generally beats the M3 Pro in our tests, and its GPU performance comes pretty close. Its multi-core CPU performance beats the M1 Max, and its single-core performance is over 80 percent faster. The M5 can’t come close to the graphics performance of any of these older Max or Ultra chips, but if you’re doing primarily CPU-heavy work and don’t need more than 32GB of RAM, the M5 holds up astonishingly well to Apple’s high-end silicon from just a few years ago.

It wasn’t so long ago that this kind of performance improvement was more-or-less normal across the entire tech industry, but Intel, AMD, and Nvidia’s consumer CPUs and GPUs have really slowed their rate of improvement lately, and Intel and AMD are both guilty of re-using old silicon for entry-level chips, over and over again. If you’re using a 6- or 7-year-old PC, sure, you’ll see performance improvements from something new, but it’s more of a crapshoot for a 3- to 4-year-old PC.

If there’s a downside to the M5 in our testing, it’s that its performance improvements seem to come with increased power draw relative to the M4 when all the CPU cores are engaged in heavy lifting. According to macOS’s built-in powermetrics tool, the M5 drew an average of 28 W in our Handbrake video encoding test, compared to around 17 W for the M4 running the same test.
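
For anyone who wants to sanity-check numbers like these on their own Apple Silicon Mac, here’s a minimal sketch of the approach: sample powermetrics while the workload runs, then average the reported CPU power. It assumes the cpu_power sampler and a “CPU Power: … mW” output line, which can vary between macOS versions, so treat it as a starting point rather than a definitive recipe.

    import re
    import subprocess

    # Sample CPU package power once per second for 60 seconds while the workload
    # (e.g., a Handbrake encode) runs in another window. Requires sudo.
    result = subprocess.run(
        ["sudo", "powermetrics", "--samplers", "cpu_power", "-i", "1000", "-n", "60"],
        capture_output=True, text=True, check=True,
    )

    # Assumes output lines of the form "CPU Power: 1234 mW"; adjust the pattern
    # if your macOS version labels the field differently.
    samples = [int(m.group(1)) for m in re.finditer(r"CPU Power:\s*(\d+)\s*mW", result.stdout)]
    if samples:
        avg_watts = sum(samples) / len(samples) / 1000
        print(f"{len(samples)} samples, average CPU power: {avg_watts:.1f} W")

Run the script in one terminal, kick off the encode in another, and compare the average it prints against the figures above.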

Using software tools to compare power draw between different chip manufacturers or even chip generations is dicey, because you’re trusting that different hardware is reporting its power use to the operating system in similar ways. But assuming they’re accurate, these numbers suggest that Apple could be pushing clock speeds more aggressively this generation to squeeze more performance out of the chip.

This would make some sense, since the third-generation 3nm TSMC manufacturing process used for the M5 (likely N3P) looks like a fairly mild upgrade from the second-generation 3nm process used for the M4 (N3E). TSMC says that N3P can boost performance by 5 percent at the same power use compared to N3E, or reduce power draw by 5 to 10 percent at the same performance. To get to the larger double-digit performance improvements that Apple is claiming and that we measured in our testing, you’d definitely expect to see the overall power consumption increase.

To put the M5 in context, the M2 and the M3 came a bit closer to its average power draw in our video encoding test (23.2 and 22.7 W, respectively), and the M5’s power draw comes in much lower than that of any past-generation Pro or Max chip. In terms of the energy used to complete the same task, the M5’s efficiency is worse than the M4’s according to powermetrics, but better than older generations’. And Apple’s performance and power efficiency remain well ahead of what Intel or AMD can offer in their high-end products.

Impressive chip, awkward laptop

The low-end MacBook Pro has always occupied an odd in-between place in Apple’s lineup, overlapping in a lot of places with the MacBook Air and without the benefit of the much-faster chips that the 15- and 16-inch MacBook Pros could fit. The M5 MacBook Pro carries on that complicated legacy, and even with the M5 there are still lots of people for whom one of the M4 MacBook Airs is just going to be a better fit.

But it is a very nice laptop, and if your screen is the most important part of your laptop, this low-end Pro does make a decent case for itself. It’s frustrating that the matte display is a $150 upcharge, but it’s an option you can’t get on an Air, and the improved display panel and faster ProMotion refresh rate make scrolling and animations all look smoother and more fluid than they do on an Air’s screen. I still mostly think that this is a laptop without a huge constituency—too much more expensive than the Air, too much slower than the other Pros—but the people who buy it for the screen should still be mostly happy with the performance and ports.

This MacBook Pro is more exciting to me as a showcase for the Apple M5—and I’m excited to see the M5 and its higher-end Pro, Max, and (possibly) Ultra relatives show up in other Macs.

The M5 sports the highest sustained power draw of any M-series chip we’ve tested, but Apple’s past generations (the M4 in particular) have been so efficient that Apple has some room to bump up power consumption while remaining considerably more efficient than anything its competitors are offering. What you get in exchange is an impressively fast chip, as good or better than many of the Pro or Max chips in previous-generation products. For anyone still riding out the tail end of the Intel era, or for people with M1-class Macs that are showing their age, the M5 is definitely fast enough to feel like a real upgrade. That’s harder to come by in computing than it used to be.

The good

  • M5 is a solid performer that shows how far Apple has come since the M1.
  • Attractive, functional design, with a nice keyboard and trackpad, great-sounding speakers, a versatile selection of ports, and Apple’s best laptop screen.
  • Optional nano-texture display finish looks lovely and eliminates glare.

The bad

  • Harder to recommend than Apple’s other laptops if you don’t absolutely require a ProMotion screen.
  • A bit heavier than other laptops in its size class (and barely lighter than the 15-inch MacBook Air).
  • M5 can use more power than M4 did.

The ugly

  • High price for RAM and storage upgrades, and a $150 upsell for the nano-textured display.

Photo of Andrew Cunningham

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

MacBook Pro: Apple’s most awkward laptop is the first to show off Apple M5 Read More »

should-an-ai-copy-of-you-help-decide-if-you-live-or-die?

Should an AI copy of you help decide if you live or die?

“It would combine demographic and clinical variables, documented advance-care-planning data, patient-recorded values and goals, and contextual information about specific decisions,” he said.

“Including textual and conversational data could further increase a model’s ability to learn why preferences arise and change, not just what a patient’s preference was at a single point in time,” Starke said.

Ahmad suggested that future research could focus on validating fairness frameworks in clinical trials, evaluating moral trade-offs through simulations, and exploring how cross-cultural bioethics can be combined with AI designs.

Only then might AI surrogates be ready to be deployed, but only as “decision aids,” Ahmad wrote. Any “contested outputs” should automatically “trigger [an] ethics review,” Ahmad wrote, concluding that “the fairest AI surrogate is one that invites conversation, admits doubt, and leaves room for care.”

“AI will not absolve us”

Ahmad is hoping to test his conceptual models at various UW sites over the next five years, which would offer “some way to quantify how good this technology is,” he said.

“After that, I think there’s a collective decision regarding how as a society we decide to integrate or not integrate something like this,” Ahmad said.

In his paper, he warned against chatbot AI surrogates that could be interpreted as a simulation of the patient, predicting that future models may even speak in patients’ voices and suggesting that the “comfort and familiarity” of such tools might blur “the boundary between assistance and emotional manipulation.”

Starke agreed that more research and “richer conversations” between patients and doctors are needed.

“We should be cautious not to apply AI indiscriminately as a solution in search of a problem,” Starke said. “AI will not absolve us from making difficult ethical decisions, especially decisions concerning life and death.”

Truog, the bioethics expert, told Ars he “could imagine that AI could” one day “provide a surrogate decision maker with some interesting information, and it would be helpful.”

But a “problem with all of these pathways… is that they frame the decision of whether to perform CPR as a binary choice, regardless of context or the circumstances of the cardiac arrest,” Truog’s editorial said. “In the real world, the answer to the question of whether the patient would want to have CPR” when they’ve lost consciousness, “in almost all cases,” is “it depends.”

When Truog thinks about the kinds of situations he could end up in, he knows he wouldn’t just be considering his own values, health, and quality of life. His choice “might depend on what my children thought,” “what the financial consequences would be,” or “the details of what my prognosis would be,” he told Ars.

“I would want my wife or another person that knew me well to be making those decisions,” Truog said. “I wouldn’t want somebody to say, ‘Well, here’s what AI told us about it.’”

Should an AI copy of you help decide if you live or die? Read More »

yes,-everything-online-sucks-now—but-it-doesn’t-have-to

Yes, everything online sucks now—but it doesn’t have to


from good to bad to nothing

Ars chats with Cory Doctorow about his new book Enshittification.

We all feel it: Our once-happy digital spaces have become increasingly less user-friendly and more toxic, cluttered with extras nobody asked for and hardly anybody wants. There’s even a word for it: “enshittification,” named 2023 Word of the Year by the American Dialect Society. The term was coined by tech journalist/science fiction author Cory Doctorow, a longtime advocate of digital rights. Doctorow has spun his analysis of what’s been ailing the tech industry into an eminently readable new book, Enshittification: Why Everything Suddenly Got Worse and What To Do About It.

As Doctorow tells it, he was on vacation in Puerto Rico, staying in a remote cabin nestled in a cloud forest with microwave Internet service—i.e., very bad Internet service, since microwave signals struggle to penetrate through clouds. It was a 90-minute drive to town, but when they tried to consult TripAdvisor for good local places to have dinner one night, they couldn’t get the site to load. “All you would get is the little TripAdvisor logo as an SVG filling your whole tab and nothing else,” Doctorow told Ars. “So I tweeted, ‘Has anyone at TripAdvisor ever been on a trip? This is the most enshittified website I’ve ever used.’”

Initially, he just got a few “haha, that’s a funny word” responses. “It was when I married that to this technical critique, at a moment when things were quite visibly bad to a much larger group of people, that made it take off,” Doctorow said. “I didn’t deliberately set out to do it. I bought a million lottery tickets and one of them won the lottery. It only took two decades.”

Yes, people sometimes express regret to him that the term includes a swear word. To which he responds, “You’re welcome to come up with another word. I’ve tried. ‘Platform decay’ just isn’t as good.” (“Encrapification” and “enpoopification” also lack a certain je ne sais quoi.)

In fact, it’s the sweariness that people love about the word. While that also means his book title inevitably gets bleeped on broadcast radio, “The hosts, in my experience, love getting their engineers to creatively bleep it,” said Doctorow. “They find it funny. It’s good radio, it stands out when every fifth word is ‘enbeepification.’”

People generally use “enshittification” colloquially to mean “the degradation in the quality and experience of online platforms over time.” Doctorow’s definition is more specific, encompassing “why an online service gets worse, how that worsening unfolds,” and how this process spreads to other online services, such that everything is getting worse all at once.

For Doctorow, enshittification is a disease with symptoms, a mechanism, and an epidemiology. It has infected everything from Facebook, Twitter, Amazon, and Google, to Airbnb, dating apps, iPhones, and everything in between. “For me, the fact that there were a lot of platforms that were going through this at the same time is one of the most interesting and important factors in the critique,” he said. “It makes this a structural issue and not a series of individual issues.”

It starts with the creation of a new two-sided online product of high quality, initially offered at a loss to attract users—say, Facebook, to pick an obvious example. Once the users are hooked on the product, the vendor moves to the second stage: degrading the product in some way for the benefit of their business customers. This might include selling advertisements, scraping and/or selling user data, or tweaking algorithms to prioritize content the vendor wishes users to see rather than what those users actually want.

This locks in the business customers, who, in turn, invest heavily in that product, such as media companies that started Facebook pages to promote their published content. Once business customers are locked in, the vendor can degrade those services too—i.e., by de-emphasizing news and links away from Facebook—to maximize profits to shareholders. Voila! The product is now enshittified.

The four horsemen of the shitocalypse

Doctorow identifies four key factors that have played a role in ushering in an era that he has dubbed the “Enshittocene.” The first is competition (markets), in which companies are motivated to make good products at affordable prices, with good working conditions, because otherwise customers and workers will go to their competitors.  The second is government regulation, such as antitrust laws that serve to keep corporate consolidation in check, or levying fines for dishonest practices, which makes it unprofitable to cheat.

The third is interoperability: the inherent flexibility of digital tools, which can play a useful adversarial role. “The fact that enshittification can always be reversed with a dis-enshittifiting counter-technology always acted as a brake on the worst impulses of tech companies,” Doctorow writes. Finally, there is labor power; in the case of the tech industry, highly skilled workers were scarce and thus had considerable leverage over employers.

All four factors, when functioning correctly, should serve as constraints to enshittification. However, “One by one each enshittification restraint was eroded until it dissolved, leaving the enshittification impulse unchecked,” Doctorow writes. Any “cure” will require reversing those well-established trends.

But isn’t all this just the nature of capitalism? Doctorow thinks it’s not, arguing that the aforementioned weakening of traditional constraints has resulted in the usual profit-seeking behavior producing very different, enshittified outcomes. “Adam Smith has this famous passage in Wealth of Nations about how it’s not due to the generosity of the baker that we get our bread but to his own self-regard,” said Doctorow. “It’s the fear that you’ll get your bread somewhere else that makes him keep prices low and keep quality high. It’s the fear of his employees leaving that makes him pay them a fair wage. It is the constraints that cause firms to behave better. You don’t have to believe that everything should be a capitalist or a for-profit enterprise to acknowledge that that’s true.”

Our wide-ranging conversation below has been edited for length to highlight the main points of discussion.

Ars Technica: I was intrigued by your choice of framing device, discussing enshittification as a form of contagion. 

Cory Doctorow: I’m on a constant search for different framing devices for these complex arguments. I have talked about enshittification in lots of different ways. That frame was one that resonated with people. I’ve been a blogger for a quarter of a century, and instead of keeping notes to myself, I make notes in public, and I write up what I think is important about something that has entered my mind, for better or for worse. The downside is that you’re constantly getting feedback that can be a little overwhelming. The upside is that you’re constantly getting feedback, and if you pay attention, it tells you where to go next, what to double down on.

Another way of organizing this is the Galaxy Brain meme, where the tiny brain is “Oh, this is because consumers shopped wrong.” The medium brain is “This is because VCs are greedy.” The larger brain is “This is because tech bosses are assholes.” But the biggest brain of all is “This is because policymakers created the policy environment where greed can ruin our lives.” There’s probably never going to be just one way to talk about this stuff that lands with everyone. So I like using a variety of approaches. I suck at being on message. I’m not going to do Enshittification for the Soul and Mornings with Enshittifying Maury. I am restless, and my Myers-Briggs type is ADHD, and I want to have a lot of different ways of talking about this stuff.

Ars Technica: One site that hasn’t (yet) succumbed is Wikipedia. What has protected Wikipedia thus far? 

Cory Doctorow: Wikipedia is an amazing example of what we at the Electronic Frontier Foundation (EFF) call the public interest Internet. Internet Archive is another one. Most of these public interest Internet services start off as one person’s labor of love, and that person ends up being what we affectionately call the benevolent dictator for life. Very few of these projects have seen the benevolent dictator for life say, “Actually, this is too important for one person to run. I cannot be the keeper of the soul of this project. I am prone to self-deception and folly just like every other person. This needs to belong to its community.” Wikipedia is one of them. The founder, my friend Jimmy Wales, woke up one day and said, “No individual should run Wikipedia. It should be a communal effort.”

There’s a much more durable and thick constraint on the decisions of anyone at Wikipedia to do something bad. For example, Jimmy had this idea that you could use AI in Wikipedia to help people make entries and navigate Wikipedia’s policies, which are daunting. The community evaluated his arguments and decided—not in a reactionary way, but in a really thoughtful way—that this was wrong. Jimmy didn’t get his way. It didn’t rule out something in the future, but that’s not happening now. That’s pretty cool.

Wikipedia is not just governed by a board; it’s also structured as a nonprofit. That doesn’t mean that there’s no way it could go bad. But it’s a source of friction against enshittification. Wikipedia has its entire corpus irrevocably licensed as the most open it can be without actually being in the public domain. Even if someone were to capture Wikipedia, there’s limits on what they could do to it.

There’s also a labor constraint in Wikipedia in that there’s very little that the leadership can do without bringing along a critical mass of a large and diffuse body of volunteers. That cuts against the volunteers working in unison—they’re not represented by a union; it’s hard for them to push back with one voice. But because they’re so diffuse and because there’s no paychecks involved, it’s really hard for management to do bad things. So if there are two people vying for the job of running the Wikimedia Foundation and one of them has got nefarious plans and the other doesn’t, the nefarious plan person, if they’re smart, is going to give it up—because if they try to squeeze Wikipedia, the harder they squeeze, the more it will slip through their grasp.

So these are structural defenses against enshittification of Wikipedia. I don’t know that it was in the mechanism design—I think they just got lucky—but it is a template for how to run such a project. It does raise this question: How do you build the community? But if you have a community of volunteers around a project, it’s a model of how to turn that project over to that community.

Ars Technica: Your case studies naturally include the decay of social media, notably Facebook and the social media site formerly known as Twitter. How might newer social media platforms resist the spiral into “platform decay”?

Cory Doctorow: What you want is a foundation in which people on social media face few switching costs. If the social media is interoperable, if it’s federatable, then it’s much harder for management to make decisions that are antithetical to the interests of users. If they do, users can escape. And it sets up an internal dynamic within the firm, where the people who have good ideas don’t get shouted down by the people who have bad but more profitable ideas, because it makes those bad ideas unprofitable. It creates both short and long-term risks to the bottom line.

There has to be a structure that stops their investors from pressurizing them into doing bad things, that stops them from rationalizing their way into complying. I think there’s this pathology where you start a company, you convince 150 of your friends to risk their kids’ college fund and their mortgage working for you. You make millions of users really happy, and your investors come along and say, “You have to destroy the life of 5 percent of your users with some change.” And you’re like, “Well, I guess the right thing to do here is to sacrifice those 5 percent, keep the other 95 percent happy, and live to fight another day, because I’m a good guy. If I quit over this, they’ll just put a bad guy in who’ll wreck things. I keep those 150 people working. Not only that, I’m kind of a martyr because everyone thinks I’m a dick for doing this. No one understands that I have taken the tough decision.”

I think that’s a common pattern among people who, in fact, are quite ethical but are also capable of rationalizing their way into bad things. I am very capable of rationalizing my way into bad things. This is not an indictment of someone’s character. But it’s why, before you go on a diet, you throw away the Oreos. It’s why you bind yourself to what behavioral economists call “Ulysses pacts”: You tie yourself to the mast before you go into the sea of sirens, not because you’re weak but because you’re strong enough now to know that you’ll be weak in the future.

I have what I would call the epistemic humility to say that I don’t know what makes a good social media network, but I do know what makes it so that when they go bad, you’re not stuck there. You and I might want totally different things out of our social media experience, but I think that you should 100 percent have the right to go somewhere else without losing anything. The easier it is for you to go without losing something, the better it is for all of us.

My dream is a social media universe where knowing what network someone is using is just a weird curiosity. It’d be like knowing which cell phone carrier your friend is using when you give them a call. It should just not matter. There might be regional or technical reasons to use one network or another, but it shouldn’t matter to anyone other than the user what network they’re using. A social media platform where it’s always easier for users to leave is much more future-proof and much more effective than trying to design characteristics of good social media.

Ars Technica: How might this work in practice?

Cory Doctorow: I think you just need a protocol. This is [Mike] Masnick’s point: protocols, not products. We don’t need a universal app to make email work. We don’t need a universal app to make the web work. I always think about this in the context of administrable regulation. Making a rule that says your social media network must be good for people to use and must not harm their mental health is impossible. The fact intensivity of determining whether a platform satisfies that rule makes it a non-starter.

Whereas if you were to say, “OK, you have to support an existing federation protocol, like AT Protocol and Mastodon ActivityPub,” both have ways to port identity from one place to another and have messages auto-forward. This is also in RSS. There’s a permanent redirect directive. You do that, you’re in compliance with the regulation.

Or you have to do something that satisfies the functional requirements of the spec. So it’s not “did you make someone sad in a way that was reckless?” That is a very hard question to adjudicate. Did you satisfy these functional requirements? It’s not easy to answer that, but it’s not impossible. If you want to have our users be able to move to your platform, then you just have to support the spec that we’ve come up with, which satisfies these functional requirements.

We don’t have to have just one protocol. We can have multiple ones. Not everything has to connect to everything else, but everyone who wants to connect should be able to connect to everyone else who wants to connect. That’s end-to-end. End-to-end is not “you are required to listen to everything someone wants to tell you.” It’s that willing parties should be connected when they want to be.

Ars Technica: What about security and privacy protocols like GPG and PGP?

Cory Doctorow: There’s this argument that the reason GPG is so hard to use is that it’s intrinsic; you need a closed system to make it work. But also, until pretty recently, GPG was supported by one part-time guy in Germany who got 30,000 euros a year in donations to work on it, and he was supporting 20 million users. He was primarily interested in making sure the system was secure rather than making it usable. If you were to put Big Tech quantities of money behind improving ease of use for GPG, maybe you decide it’s a dead end because it is a 30-year-old attempt to stick a security layer on top of SMTP. Maybe there’s better ways of doing it. But I doubt that we have reached the apex of GPG usability with one part-time volunteer.

I just think there’s plenty of room there. If you have a pretty good project that is run by a large firm and has had billions of dollars put into it, the most advanced technologists and UI experts working on it, and you’ve got another project that has never been funded and has only had one volunteer on it—I would assume that dedicating resources to that second one would produce pretty substantial dividends, whereas the first one is only going to produce these minor tweaks. How much more usable does iOS get with every iteration?

I don’t know if PGP is the right place to start to make privacy, but I do think that if we can create independence of the security layer from the transport layer, which is what PGP is trying to do, then it wouldn’t matter so much that there is end-to-end encryption in Mastodon DMs or in Bluesky DMs. And again, it doesn’t matter whose sim is in your phone, so it just shouldn’t matter which platform you’re using so long as it’s secure and reliably delivered end-to-end.

Ars Technica: These days, I’m almost contractually required to ask about AI. There’s no escaping it. But it’s certainly part of the ongoing enshittification.

Cory Doctorow: I agree. Again, the companies are too big to care. They know you’re locked in, and the things that make enshittification possible—like remote software updating, ongoing analytics of use of devices—they allow for the most annoying AI dysfunction. I call it the fat-finger economy, where you have someone who works in a company on a product team, and their KPI, and therefore their bonus and compensation, is tied to getting you to use AI a certain number of times. So they just look at the analytics for the app and they ask, “What button gets pushed the most often? Let’s move that button somewhere else and make an AI summoning button.”

They’re just gaming a metric. It’s causing significant across-the-board regressions in the quality of the product, and I don’t think it’s justified by people who then discover a new use for the AI. That’s a paternalistic justification. The user doesn’t know what they want until you show it to them: “Oh, if I trick you into using it and you keep using it, then I have actually done you a favor.” I don’t think that’s happening. I don’t think people are like, “Oh, rather than press reply to a message and then type a message, I can instead have this interaction with an AI about how to send someone a message about takeout for dinner tonight.” I think people are like, “That was terrible. I regret having tapped it.” 

The speech-to-text is unusable now. I flatter myself that my spoken and written communication is not statistically average. The things that make it me and that make it worth having, as opposed to just a series of multiple-choice answers, is all the ways in which it diverges from statistical averages. Back when the model was stupider, when it gave up sooner if it didn’t recognize what word it might be and just transcribed what it thought you’d said rather than trying to substitute a more probable word, it was more accurate.  Now, what I’m getting are statistically average words that are meaningless.

That elision of nuance and detail is characteristic of what makes AI products bad. There is a bunch of stuff that AI is good at that I’m excited about, and I think a lot of it is going to survive the bubble popping. But I fear that we’re not planning for that. I fear what we’re doing is taking workers whose jobs are meaningful, replacing them with AIs that can’t do their jobs, and then those AIs are going to go away and we’ll have nothing. That’s my concern.

Ars Technica: You prescribe a “cure” for enshittification, but in such a polarized political environment, do we even have the collective will to implement the necessary policies?

Cory Doctorow: The good news is also the bad news, which is that this doesn’t just affect tech. Take labor power. There are a lot of tech workers who are looking at the way their bosses treat the workers they’re not afraid of—Amazon warehouse workers and drivers, Chinese assembly line manufacturers for iPhones—and realizing, “Oh, wait, when my boss stops being afraid of me, this is how he’s going to treat me.” Mark Zuckerberg stopped going to those all-hands town hall meetings with the engineering staff. He’s not pretending that you are his peers anymore. He doesn’t need to; he’s got a critical mass of unemployed workers he can tap into. I think a lot of Googlers figured this out after the 12,000-person layoffs. Tech workers are realizing they missed an opportunity, that they’re going to have to play catch-up, and that the only way to get there is by solidarity with other kinds of workers.

The same goes for competition. There’s a bunch of people who care about media, who are watching Warner about to swallow Paramount and who are saying, “Oh, this is bad. We need antitrust enforcement here.” When we had a functional antitrust system for the last four years, we saw a bunch of telecoms mergers stopped because once you start enforcing antitrust, it’s like eating Pringles. You just can’t stop. You embolden a lot of people to start thinking about market structure as a source of either good or bad policy. The real thing that happened with [former FTC chair] Lina Khan doing all that merger scrutiny was that people just stopped planning mergers.

There are a lot of people who benefit from this. It’s not just tech workers or tech users; it’s not just media users. Hospital consolidation, pharmaceutical consolidation, has a lot of people who are very concerned about it. Mark Cuban is freaking out about pharmacy benefit manager consolidation and vertical integration with HMOs, as he should be. I don’t think that we’re just asking the anti-enshittification world to carry this weight.

Same with the other factors. The best progress we’ve seen on interoperability has been through right-to-repair. It hasn’t been through people who care about social media interoperability. One of the first really good state-level right-to-repair bills was the one that [Governor] Jared Polis signed in Colorado for powered wheelchairs. Those people have a story that is much more salient to normies:

“What do you mean you spent six months in bed because there’s only two powered wheelchair manufacturers and your chair broke and you weren’t allowed to get it fixed by a third party? And they’ve slashed their repair department, so it takes six months for someone to show up and fix your chair. So you had bed sores and pneumonia because you couldn’t get your chair fixed. This is bullshit.”

So the coalitions are quite large. The thing that all of those forces share—interoperability, labor power, regulation, and competition—is that they’re all downstream of corporate consolidation and wealth inequality. Figuring out how to bring all of those different voices together, that’s how we resolve this. In many ways, the enshittification analysis and remedy are a human factors and security approach to designing an enshittification-resistant Internet. It’s about understanding this as a red team, blue team exercise. How do we challenge the status quo that we have now, and how do we defend the status quo that we want?

Anything that can’t go on forever eventually stops. That is the first law of finance, Stein’s law. We are reaching multiple breaking points, and the question is whether we reach things like breaking points for the climate and for our political system before we reach breaking points for the forces that would rescue those from permanent destruction.

Photo of Jennifer Ouellette

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Yes, everything online sucks now—but it doesn’t have to Read More »

inside-the-web-infrastructure-revolt-over-google’s-ai-overviews

Inside the web infrastructure revolt over Google’s AI Overviews


Cloudflare CEO Matthew Prince is making sweeping changes to force Google’s hand.

It could be a consequential act of quiet regulation. Cloudflare, a web infrastructure company, has updated millions of websites’ robots.txt files in an effort to force Google to change how it crawls them to fuel its AI products and initiatives.

We spoke with Cloudflare CEO Matthew Prince about what exactly is going on here, why it matters, and what the web might soon look like. But to get into that, we need to cover a little background first.

The new change, which Cloudflare calls its Content Signals Policy, happened after publishers and other companies that depend on web traffic have cried foul over Google’s AI Overviews and similar AI answer engines, saying they are sharply cutting those companies’ path to revenue because they don’t send traffic back to the source of the information.

There have been lawsuits, efforts to kick-start new marketplaces to ensure compensation, and more—but few companies have the kind of leverage Cloudflare does. Its products and services back something close to 20 percent of the web, and thus a significant slice of the websites that show up on search results pages or that fuel large language models.

“Almost every reasonable AI company that’s out there is saying, listen, if it’s a fair playing field, then we’re happy to pay for content,” Prince said. “The problem is that all of them are terrified of Google because if Google gets content for free but they all have to pay for it, they are always going to be at an inherent disadvantage.”

This is happening because Google is using its dominant position in search to ensure that web publishers allow their content to be used in ways that they might not otherwise want it to.

The changing norms of the web

Since 2023, Google has offered a way for website administrators to opt their content out of use for training Google’s large language models, such as Gemini.

However, allowing pages to be indexed by Google’s search crawlers and shown in results requires accepting that they’ll also be used to generate AI Overviews at the top of results pages through a process called retrieval-augmented generation (RAG).

That’s not so for many other crawlers, making Google an outlier among major players.

This is a sore point for a wide range of website administrators, from news websites that publish journalism to investment banks that produce research reports.

A July study from the Pew Research Center analyzed data from 900 adults in the US and found that AI Overviews cut referrals nearly in half. Specifically, users clicked a link on a page with AI Overviews at the top just 8 percent of the time, compared to 15 percent for search engine results pages without those summaries.

And a report in The Wall Street Journal cited a wide range of sources—including internal traffic metrics from numerous major publications like The New York Times and Business Insider—to describe industry-wide plummets in website traffic that those publishers said were tied to AI summaries, leading to layoffs and strategic shifts.

In August, Google’s head of search, Liz Reid, disputed the validity and applicability of studies and publisher reports of reduced link clicks in search. “Overall, total organic click volume from Google Search to websites has been relatively stable year-over-year,” she wrote, going on to say that reports of big declines were “often based on flawed methodologies, isolated examples, or traffic changes that occurred prior to the rollout of AI features in Search.”

Publishers aren’t convinced. Penske Media Corporation, which owns brands like The Hollywood Reporter and Rolling Stone, sued Google over AI Overviews in September. The suit claims that affiliate link revenue has dropped by more than a third in the past year, due in large part to Google’s overviews—a threatening shortfall in a business that already has difficult margins.

Penske’s suit specifically noted that because Google bundles traditional search engine indexing and RAG use together, the company has no choice but to allow Google to keep summarizing its articles, as cutting off Google search referrals entirely would be financially fatal.

Since the earliest days of digital publishing, referrals have in one way or another acted as the backbone of the web’s economy. Content could be made available freely to both human readers and crawlers, and norms were applied across the web to allow information to be tracked back to its source and give that source an opportunity to monetize its content to sustain itself.

Today, there’s a panic that the old system isn’t working anymore as content summaries via RAG have become more common, and along with other players, Cloudflare is trying to update those norms to reflect the current reality.

A mass-scale update to robots.txt

Announced on September 24, Cloudflare’s Content Signals Policy is an effort to use the company’s influential market position to change how content is used by web crawlers. It involves updating millions of websites’ robots.txt files.

Starting in 1994, websites began placing a file called “robots.txt” at the domain root to indicate to automated web crawlers which parts of the domain should be crawled and indexed and which should be ignored. The standard became near-universal over the years; honoring it has been a key part of how Google’s web crawlers operate.

Historically, robots.txt simply included a list of paths on the domain flagged as either “allow” or “disallow.” It was technically not enforceable, but it became an effective honor system because there are advantages for both the website owner and the crawler’s operator: Website owners could dictate access for various business reasons, and it helped crawlers avoid working through data that wouldn’t be relevant.

But robots.txt only tells crawlers whether they can access something at all; it doesn’t tell them what they can use it for. For example, Google supports disallowing the agent “Google-Extended” as a path to blocking crawlers that are looking for content with which to train future versions of its Gemini large language model—though introducing that rule doesn’t do anything about the training Google did before it rolled out Google-Extended in 2023, and it doesn’t stop crawling for RAG and AI Overviews.
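
In practice, that opt-out is a two-line robots.txt entry along the following lines, and note that it only controls whether the Google-Extended crawler may fetch content for model training at all, not what an otherwise allowed search crawl can be used for:

    User-agent: Google-Extended
    Disallow: /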

The Content Signals Policy initiative is a newly proposed format for robots.txt that intends to do that. It allows website operators to opt in or out of consenting to the following use cases, as worded in the policy:

  • search: Building a search index and providing search results (e.g., returning hyperlinks and short excerpts from your website’s contents). Search does not include providing AI-generated search summaries.
  • ai-input: Inputting content into one or more AI models (e.g., retrieval augmented generation, grounding, or other real-time taking of content for generative AI search answers).
  • ai-train: Training or fine-tuning AI models.

Cloudflare has given all of its customers quick paths for setting those values on a case-by-case basis. Further, it has automatically updated robots.txt on the 3.8 million domains that already use Cloudflare’s managed robots.txt feature, with search defaulting to yes, ai-train to no, and ai-input blank, indicating a neutral position.
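
Concretely, a managed robots.txt carrying those defaults would gain an entry along these lines. The “Content-Signal” directive name and its comma-separated values follow Cloudflare’s published policy as best we can tell, but treat the exact formatting here as illustrative rather than authoritative:

    # ai-input is omitted to leave that signal neutral
    User-agent: *
    Content-Signal: search=yes, ai-train=no
    Allow: /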

The threat of potential litigation

By making this look a bit like a terms of service agreement, Cloudflare explicitly aims to put legal pressure on Google to change its policy of bundling traditional search crawling and AI Overviews together.

“Make no mistake, the legal team at Google is looking at this saying, ‘Huh, that’s now something that we have to actively choose to ignore across a significant portion of the web,'” Prince told me.

Cloudflare specifically made this look like a license agreement. Credit: Cloudflare

He further characterized this as an effort to get a company that he says has historically been “largely a good actor” and a “patron of the web” to go back to doing the right thing.

“Inside of Google, there is a fight where there are people who are saying we should change how we’re doing this,” he explained. “And there are other people saying, no, that gives up our inherent advantage, we have a God-given right to all the content on the Internet.”

Amid that debate, lawyers have sway at Google, so Cloudflare tried to design tools “that made it very clear that if they were going to follow any of these sites, there was a clear license which was in place for them. And that will create risk for them if they don’t follow it,” Prince said.

The next web paradigm

It takes a company with Cloudflare’s scale to do something like this with any hope that it will have an impact. If just a few websites made this change, Google would have an easier time ignoring it, or worse yet, it could simply stop crawling them to avoid the problem. Since Cloudflare is entangled with millions of websites, Google couldn’t do that without materially impacting the quality of the search experience.

Cloudflare has a vested interest in the general health of the web, but there are other strategic considerations at play, too. The company has been working on tools to assist with RAG on customers’ websites in partnership with Microsoft-owned Google competitor Bing and has experimented with a marketplace that provides a way for websites to charge crawlers for scraping the sites for AI, though what final form that might take is still unclear.

I asked Prince directly if this comes from a place of conviction. “There are very few times that opportunities come along where you get to help think through what a future better business model of an organization or institution as large as the Internet and as important as the Internet is,” he said. “As we do that, I think that we should all be thinking about what have we learned that was good about the Internet in the past and what have we learned that was bad about the Internet in the past.”

It’s important to acknowledge that we don’t yet know what the future business model of the web will look like. Cloudflare itself has ideas. Others have proposed new standards, marketplaces, and strategies, too. There will be winners and losers, and those won’t always be the same winners and losers we saw in the previous paradigm.

What most people seem to agree on, whatever their individual incentives, is that Google shouldn’t get to come out on top in a future answer-engine-driven web paradigm just because it previously established dominance in the search-engine-driven one.

For this new standard for robots.txt, success looks like Google allowing content to be available in search but not in AI Overviews. Whatever the long-term vision, and whether it happens because of Cloudflare’s pressure with the Content Signals Policy or some other driving force, most agree that it would be a good start.


Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.


rog-xbox-ally-x:-the-ars-technica-review

ROG Xbox Ally X: The Ars Technica review


You got Xbox in my portable gaming PC

The first portable “Xbox” fails to unify a messy world of competing PC gaming platforms.

The ROG Ally X sure looks great floating in a void… Credit: Asus

Here at Ars, we have been writing about rumors of a portable Xbox for literal decades now. With the ROG Xbox Ally, Microsoft has finally made those rumors a reality in the weirdest, most Microsoft way possible.

Yes, the $600 ROG Xbox Ally—and its souped-up cousin, the $1,000, ridiculous-mouthful-of-a-name ROG Xbox Ally X, which we tested—are the first official handheld hardware to sport the Xbox brand name. But Microsoft isn’t taking the exclusive-heavy, walled garden software approach that it has been committed to for nearly 25 years of Xbox home consoles. Instead, the ROG Xbox Ally is, at its base, simply a new version of Asus’ Windows-based ROG Ally line with an Xbox-flavored coat of paint.

That coat of paint—what Microsoft is calling the Xbox Full-screen Experience (FSE)—represents the company’s belated attempt to streamline the Windows gaming experience to be a bit more console-like in terms of user interface and overall simplicity. While that’s a worthy vision, the execution in these early days is so spotty and riddled with annoyances that it’s hard to recommend over the SteamOS-based competition.

Promises, promises

When Microsoft announced the ROG Xbox Ally this summer, the company promised that what it was calling a new “Xbox Experience for Handheld” would “minimize background activity and defer non-essential tasks” usually present in Windows, meaning “more [and] higher framerates” for gaming. While this is technically true, the performance improvement is so small as to be almost meaningless in practice.

In our testing, in-game benchmarks running under the Xbox Full Screen Experience were ever so slightly faster than those same benchmarks running under the full Windows 11 in Desktop Mode (which you can switch to with a few button presses on the ROG Xbox Ally). And when we say “ever so slightly,” we mean less than a single frame per second improvement in many cases and only one or two frames per second at most. Even on a percentage basis, the difference will be practically unnoticeable.

Comparing ROG Xbox Ally X benchmarks on the Xbox Full-screen Experience (FSE) and the standard Windows 11 desktop. Here’s Doom: The Dark Ages. Kyle Orland

The other major selling point of Microsoft’s Xbox FSE, as sold this summer, is an “aggregated gaming library” that includes “all of the games available on Windows” in one single mega-launcher interface. That means apps like Steam, Battle.net, GOG Galaxy, Ubisoft Connect, and EA Play can all be installed with just a click from the “My Apps” section of the FSE from the first launch.

The integration of these apps into the wider Xbox FSE is spotty at best, though. For one, the new “aggregate gaming library” can’t actually show you every game you own across all of these PC gaming platforms in one place. Choosing the “installable” games filter on the Xbox FSE only shows you the games you can access through Microsoft’s own Xbox platform (including any Xbox Game Pass subscription). For other platforms, you must still browse and install the games you own through their own apps, each with their own distinct interfaces that don’t always play well with the ROG Xbox Ally’s button-based controls.

The home screen shows your most recent titles while also offering some ads for other available titles. Kyle Orland / Asus

Even games that are supposed to be installable directly via the Xbox FSE caused me problems in testing. Trying to load the EA Play app to install any number of games included with Xbox Game Pass, for instance, triggered an “authentication error” page with no option to actually log in to EA’s servers. This problem persisted across multiple restarts and reinstallations of the extension that’s supposed to link EA Play to the Xbox FSE. And while I could load the EA Play app via Desktop Mode, I couldn’t get the app to recognize that I had an active Xbox Game Pass subscription to grant me access to the titles I wanted. So much for testing Battlefield, I guess.

Mo’ launchers, mo’ problems

Once you’ve gone to the trouble of installing your favorite games on the ROG Xbox Ally, the Xbox FSE does a good job of aggregating their listings in a single common interface. For players whose gaming libraries are spread across multiple platforms, it can be genuinely useful to see an FSE game list where Battle.net’s Hearthstone sits next to a GOG copy of Cyberpunk 2077, a Steam copy of Hades II, and an Epic Games Store copy of Fortnite, for instance. A quick tap of the Xbox button will even show you the last three games you played, regardless of where you launched them (alongside quick access to some useful general settings).

The “My Games” listing shows everything you’ve installed, regardless of the source. Credit: Kyle Orland / Asus

Unfortunately, actually playing those games via the new Xbox FSE is far from a seamless experience. You never quite know what you’re going to get when you hit the large, green “Play” button to launch a third-party platform via the Xbox FSE. All too frequently, in fact, you’ll get no immediate outward sign for multiple seconds that you did anything, leaving you to wonder if your button press even registered.

In the best case, this long wait will eventually culminate in a separate launcher popping up and eventually loading your game (or possibly popping up and down multiple times as it cycles through necessary launcher updates). In a slightly annoying case, the launcher might require you to close some pop-up and/or manually hit another on-screen button to launch the game (if you’re playing with the console docked to a TV, this may be downright impossible without a mouse plugged in). In the worst case, the wait might stretch to 30 seconds or more before you think to check the App Switcher and realize that Battle.net actually launched in the background and is waiting for you to input your username and password (to cite just one of many frustratingly counterintuitive examples I encountered).

Tapping the Xbox button brings up this helpful overlay no matter where you are in a game or app. Kyle Orland / Asus

Sometimes, switching from one active game to another via the handy Xbox button will pop up a warning that you should close the first game before opening a new one. Quite often, though, that pop-up warning simply fails to appear, forcing you to go to the trouble of manually closing the first game if you don’t want it eating up resources in the background. But in cases where you do want to multitask—downloading a game on Steam while playing something via the Epic Games Store, for instance—you can never be quite sure if the background app will actually keep doing what you want when it isn’t in focus via the FSE. Sometimes it does, sometimes it doesn’t.

There are plenty of other little annoyances that make using the Xbox FSE more painful than it should be. Sometimes the system will swap between third-party launchers or between a launcher and the Xbox FSE for seemingly no reason, interrupting your flow. When the Xbox app itself needs updating, it does so via the desktop version of the Windows Update settings menu, which isn’t really designed for controllers (but which will offer to let you install the latest version of Notepad). Sometimes, Steam’s Big Picture Mode would have the very top and bottom of the full-screen interface cut off for no apparent reason. The system will frequently freeze on a menu for multiple seconds and refuse to respond to any input, especially when loading or closing an outside launcher. I could go on.

If you want to update the Xbox app itself, you still have to go through this Windows Update screen outside of the FSE. Credit: Kyle Orland / Asus

I’m willing to cut Microsoft a bit of slack here. It’s hard to bring the fragmented landscape of competing PC gaming platforms and storefronts together into a single unified interface that provides a cohesive user experience. In a sense, Microsoft is trapped in that XKCD comic where people complain about 14 different competing standards and end up creating a 15th competing standard in the process of trying to unify them.

At the same time, Microsoft sold the Xbox FSE largely on the promise of an “aggregated gaming library” that makes this all simple. Instead, the Full-screen Experience we got papers over the fragmented nature of Windows gaming while causing new problems all its own.

A powerful machine

Xbox FSE aside, there’s a lot to like about the ROG Xbox Ally line from a hardware design perspective. I tested the ROG Xbox Ally X, which is a little thicker and heavier than the Steam Deck but ends up riding that fine line between “solidly built” and “dense brick” pretty well in the hands.

Out of the box. Kyle Orland

I was especially impressed with the design of the ROG Xbox Ally’s hand grips, textured ovoid bumps that slot perfectly into the crook of the palm for comfortable extended gaming sessions. The overall build quality shines through, too, from nicely springy analog sticks and shoulder triggers to extremely powerful, bass-heavy speakers and excellently clicky face buttons (which can be a bit loud when playing next to a sleeping partner). There are also some nice rear buttons that are incredibly easy to nudge with a small flick of your middle finger, if you’re one of those people who never wants to move your thumbs off the analog sticks.

The ROG Xbox Ally’s 7-inch 1080p screen is crisp enough and looks especially nice when delivering steady 120 fps performance on games like Hollow Knight: Silksong. But the maximum brightness of 500 nits is a bit dim if you’re going to be playing in direct sunlight. I also found myself missing the deep blacks and pop of HDR color I’ve gotten used to on my Steam Deck OLED.

The AMD Ryzen Z2 Extreme chip in the ROG Xbox Ally X delivers all the relative gaming horsepower you’d hope for from a $1,000 PC gaming handheld. I was able to hit 30 fps or more running a recent release like Doom: The Dark Ages at 1080p and High graphical settings. For a slightly older game like Cyberpunk 2077, the chip could even handle the “Ray-tracing Low” graphics preset at 1080p resolution at an acceptable frame rate when plugged into an outlet.

If you’re away from a power source, though, pushing the hardware for high-end graphical performance like that does take its toll on the battery. Titles that required the hardware’s preset “Turbo Mode”—which tends to run the noisy fan at full blast to keep internal temperature reasonable—could drain a fully charged battery in around two hours in the worst case. Switching over to the power-sipping but less graphically powerful Silent Mode (which can be accessed with a single button press using a handy “Armoury Crate Command Center” overlay) will extend that to about five or six hours of play time at the expense of the frame rate for high-end games.

The SteamOS elephant in the room

In a bubble, it would be easy to see the ROG Xbox Ally X as a promising, if flawed, early attempt to merge the simplicity of console gaming with the openness of PC gaming in a nice handheld form factor. Here in the real world, though, we’ve been enjoying a much more refined version of that same idea via Valve’s SteamOS and Steam Deck for years.

With SteamOS, I don’t need to worry about which launcher I’ll use to install or update a game. I don’t need to manually close background programs or wait multiple seconds to see an on-screen response when I hit the “Play” button. I don’t need to worry about whether a Settings menu will require me to use a mouse or touchscreen. Everything just works on SteamOS in a way I can’t rely on with the Xbox FSE.

Yes, Microsoft can brag that the Xbox FSE supports every Windows game, while SteamOS is limited to Valve’s walled garden. But this is barely an advantage for many if not most PC gamers, who have been launching their games via Steam more or less exclusively for decades now. Even companies with their own platforms and launchers often offer compatible versions of their biggest titles on Steam these days, a tacit acknowledgement of the social-network lock-in Valve has over the market for whole generations of PC gamers.

The Xbox “Full Screen Experience” can also be a windowed experience from the Windows desktop. Credit: Kyle Orland / Asus

Sure, the ROG Xbox Ally can play a handful of games that aren’t available via SteamOS for one reason or another. If you want to play Fortnite, Destiny, Battlefield, or Diablo III portably, the ROG Xbox Ally is a decent solution. Ditto for web-based games (playable here via Edge), games available via niche platforms like itch.io, or even titles you might install directly to the Windows desktop without an outside platform’s launcher (remember those?). And even for games available on SteamOS, the ROG Xbox Ally can give you access to the free or cheap versions you obtained from sales or offers on other platforms (don’t sleep on Amazon’s Prime Gaming offers if you like free GOG codes).

The killer app for the ROG Xbox Ally, though, is Xbox Game Pass. If you subscribe to Microsoft’s popular gaming service, logging in to your account on the ROG Xbox Ally means seeing your “installable” library instantly fill up with hundreds of games spanning a huge swath of the recent history of PC gaming. That’s an especially nice feeling for PC gaming newcomers who haven’t spent years digging through regular Steam sales to build up a sizable backlog.

If you have Xbox Game Pass, your ROG Xbox Ally will have a ton of available games from day one. Credit: Kyle Orland / Asus

The recent price hike to $30 a month for Xbox Game Pass Ultimate definitely makes this a less compelling proposition. But ROG Xbox Ally users can probably get by with the $16.49/month “Xbox Game Pass for PC” plan, which offers over 500 installable games and access to new first-party Microsoft releases. As a way to dip your toe into the wide world of PC gaming, it’s hard to beat.

Even for players who aren’t interested in Xbox Game Pass, the ROG Xbox Ally X is a well-built piece of hardware with the power to run today’s games pretty well. All things considered, though, the poor user experience of the Xbox FSE makes it hard to recommend either ROG Xbox Ally over somewhat less powerful SteamOS devices like the Steam Deck or Legion Go S. That said, we hope Microsoft will continue refining the Xbox Full-screen Experience to make for a Windows gaming experience that lives up to its promise.


Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.


why-signal’s-post-quantum-makeover-is-an-amazing-engineering-achievement

Why Signal’s post-quantum makeover is an amazing engineering achievement


COMING TO A PHONE NEAR YOU

New design sets a high standard for post-quantum readiness.

Credit: Aurich Lawson | Getty Images

The encryption protecting communications against criminal and nation-state snooping is under threat. As private industry and governments get closer to building useful quantum computers, the algorithms protecting Bitcoin wallets, encrypted Web visits, and other sensitive secrets will be useless. No one doubts the day will come, but as the now-common joke in cryptography circles observes, experts have been forecasting this cryptocalypse will arrive in the next 15 to 30 years for the past 30 years.

The uncertainty has created something of an existential dilemma: Should network architects spend the billions of dollars required to wean themselves off quantum-vulnerable algorithms now, or should they prioritize their limited security budgets for fighting more immediate threats such as ransomware and espionage attacks? Given the expense and the lack of a clear deadline, it’s little wonder that less than half of all TLS connections made inside the Cloudflare network and only 18 percent of Fortune 500 networks support quantum-resistant TLS connections. It’s all but certain that still fewer organizations support quantum-ready encryption in less prominent protocols.

Triumph of the cypherpunks

One exception to the industry-wide lethargy is the engineering team that designs the Signal Protocol, the open-source engine that powers the world’s most robust and resilient form of end-to-end encryption for multiple private chat apps, most notably the Signal Messenger. Eleven days ago, the nonprofit entity that develops the protocol, Signal Messenger LLC, published a 5,900-word write-up describing its latest updates that make Signal fully quantum-resistant.

The complexity and problem-solving required for making the Signal Protocol quantum safe are as daunting as just about any in modern-day engineering. The original Signal Protocol already resembled the inside of a fine Swiss timepiece, with countless gears, wheels, springs, hands, and other parts all interoperating in an intricate way. In less adept hands, mucking about with an instrument as complex as the Signal Protocol could have led to shortcuts or unintended consequences that hurt performance, undoing what would otherwise be a perfectly running watch. Yet this latest post-quantum upgrade (the first one came in 2023) is nothing short of a triumph.

“This appears to be a solid, thoughtful improvement to the existing Signal Protocol,” said Brian LaMacchia, a cryptography engineer who oversaw Microsoft’s post-quantum transition from 2015 to 2022 and now works at Farcaster Consulting Group. “As part of this work, Signal has done some interesting optimization under the hood so as to minimize the network performance impact of adding the post-quantum feature.”

Of the multiple hurdles to clear, the most challenging was accounting for the much larger key sizes that quantum-resistant algorithms require. The overhaul here adds protections based on ML-KEM-768, an implementation of the CRYSTALS-Kyber algorithm that was selected in 2022 and formalized last year by the National Institute of Standards and Technology. ML-KEM is short for Module-Lattice-Based Key-Encapsulation Mechanism, but most of the time, cryptographers refer to it simply as KEM.

Ratchets, ping-pong, and asynchrony

Like the elliptic curve Diffie-Hellman (ECDH) protocol that Signal has used since its start, KEM is a key encapsulation mechanism. Also known as a key agreement mechanism, it provides the means for two parties who have never met to securely agree on one or more shared secrets in the presence of an adversary who is monitoring the parties’ connection. RSA, ECDH, and other encapsulation algorithms have long been used to negotiate symmetric keys (almost always AES keys) in protocols including TLS, SSH, and IKE. Unlike ECDH and RSA, however, the much newer KEM is quantum-safe.
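Conceptually, every KEM follows the same three-step exchange, whether it is classical or post-quantum. The sketch below is a generic Python illustration of that flow; the kem_* callables are hypothetical placeholders standing in for whatever library implements ML-KEM or another scheme, not a real API.

```python
# Conceptual sketch of a key encapsulation mechanism (KEM) exchange.
# The kem_* callables are hypothetical placeholders, not a real library API.

def kem_exchange(kem_keygen, kem_encaps, kem_decaps):
    # Bob generates a keypair and publishes the public (encapsulation) key.
    public_key, secret_key = kem_keygen()

    # Alice encapsulates against Bob's public key, producing a ciphertext to
    # send and a shared secret to keep for herself.
    ciphertext, shared_secret_alice = kem_encaps(public_key)

    # Bob decapsulates the ciphertext with his secret key and recovers the
    # same shared secret, which both sides can feed into a symmetric cipher.
    shared_secret_bob = kem_decaps(secret_key, ciphertext)

    assert shared_secret_alice == shared_secret_bob
    return shared_secret_alice
```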

Key agreement in a protocol like TLS is relatively straightforward. That’s because devices connecting over TLS negotiate a key over a single handshake that occurs at the beginning of a session. The agreed-upon AES key is then used throughout the session. The Signal Protocol is different. Unlike TLS sessions, Signal sessions are protected by forward secrecy, a cryptographic property that ensures the compromise of a key used to encrypt a recent set of messages can’t be used to decrypt an earlier set of messages. The protocol also offers Post-Compromise Security, which protects future messages from past key compromises. While a TLS session uses the same key throughout, keys within a Signal session constantly evolve.

To provide these confidentiality guarantees, the Signal Protocol updates secret key material each time a messaging party hits the send button or receives a message, and at other points, such as in graphical indicators that a party is currently typing and in the sending of read receipts. The mechanism that has made this constant key evolution possible over the past decade is what protocol developers call a “double ratchet.” Just as a traditional ratchet allows a gear to rotate in one direction but not the other, the Signal ratchets allow messaging parties to create new keys based on a combination of preceding and newly agreed-upon secrets. The ratchets work in a single direction—toward the sending and receiving of future messages. Even if an adversary compromises a newly created secret, messages encrypted using older secrets can’t be decrypted.

The starting point is a handshake that performs three or four ECDH agreements that mix long- and short-term secrets to establish a shared secret. The creation of this “root key” allows the Double Ratchet to begin. Until 2023, the key agreement used X3DH; the handshake now uses PQXDH, making it quantum-resistant.

The first layer of the Double Ratchet, the Symmetric Ratchet, derives an AES key from the root key and advances it for every message sent. This allows every message to be encrypted with a new secret key. Consequently, if attackers compromise one party’s device, they won’t be able to learn anything about the keys that came earlier. Even then, though, the attackers would still be able to compute the keys used in future messages. That’s where the second, “Diffie-Hellman ratchet” comes in.

The Diffie-Hellman ratchet incorporates a new ECDH public key into each message sent. Take Alice and Bob, the fictional characters often used to explain asymmetric encryption: when Alice sends Bob a message, she creates a new ratchet keypair and computes the ECDH agreement between this key and the last ratchet public key Bob sent. This gives her a new secret, and she knows that once Bob gets her new public key, he will know this secret, too (because, as mentioned earlier, Bob previously sent that other key). With that, Alice can mix the new secret with her old root key to get a new root key and start fresh. The result: Attackers who learn her old secrets won’t be able to tell the difference between her new ratchet keys and random noise.
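For readers who think in code, here is a deliberately simplified sketch of those two steps—mixing a fresh Diffie-Hellman secret into the root key, then advancing a per-message chain. It is not Signal’s implementation (the real protocol uses HKDF with specific parameters, X25519 keys, and considerably more bookkeeping); it only illustrates how old and new secrets get combined so that every message is encrypted under a fresh key.

```python
# Simplified sketch of the two Double Ratchet steps described above.
# Real Signal uses HKDF with defined info strings and X25519 agreements;
# this toy version just shows how old and new secrets are mixed.
import hashlib
import hmac

def kdf_root(root_key: bytes, dh_shared_secret: bytes) -> tuple[bytes, bytes]:
    """Mix a fresh DH agreement into the root key (the DH ratchet step)."""
    mixed = hmac.new(root_key, dh_shared_secret, hashlib.sha256).digest()
    new_root_key = hashlib.sha256(b"root" + mixed).digest()
    new_chain_key = hashlib.sha256(b"chain" + mixed).digest()
    return new_root_key, new_chain_key

def kdf_chain(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance the symmetric ratchet: one new message key per message sent."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key
```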

The result is what Signal developers describe as “ping-pong” behavior, as the parties to a discussion take turns replacing ratchet key pairs one at a time. The effect: An eavesdropper who compromises one of the parties might recover a current ratchet private key, but soon enough, that private key will be replaced with a new, uncompromised one, and in a way that keeps it free from the prying eyes of the attacker.

The objective of the newly generated keys is to limit the number of messages that can be decrypted if an adversary recovers key material at some point in an ongoing chat. Messages sent prior to and after the compromise will remain off limits.

A major challenge designers of the Signal Protocol face is the need to make the ratchets work in an asynchronous environment. Asynchronous messages occur when parties send or receive them at different times—such as while one is offline and the other is active, or vice versa—without either needing to be present or respond immediately. The entire Signal Protocol must work within this asynchronous environment. What’s more, it must work reliably over unstable networks and networks controlled by adversaries, such as a government that forces a telecom or cloud service to spy on the traffic.

Shor’s algorithm lurking

By all accounts, Signal’s double ratchet design is state-of-the-art. That said, it’s wide open to an inevitable if not immediate threat: quantum computing. That’s because an adversary capable of monitoring traffic passing between two or more messenger users can capture that data and feed it into a quantum computer—once one of sufficient power is viable—and calculate the ephemeral keys generated in the second ratchet.

In classical computing, it’s infeasible, if not impossible, for such an adversary to calculate the key. Like all asymmetric encryption algorithms, ECDH is based on a mathematical one-way function: a problem that is trivial to compute in one direction and substantially harder to compute in reverse. In elliptic curve cryptography, that hard direction is the discrete logarithm problem, with key parameters based on specific points on an elliptic curve over the field of integers modulo some prime P.

On average, an adversary equipped with only a classical computer would spend billions of years guessing integers before arriving at the right ones. A quantum computer, by contrast, would be able to calculate the correct integers in a matter of hours or days. Shor’s algorithm—which runs only on a quantum computer—effectively turns this one-way discrete logarithm function into a two-way one. Shor’s algorithm can similarly make quick work of the one-way function that’s the basis for the RSA algorithm.

As noted earlier, the Signal Protocol received its first post-quantum makeover in 2023. This update added PQXDH—a Signal-specific implementation that combined the key agreements from elliptic curves used in X3DH (specifically X25519) and the quantum-safe KEM—in the initial protocol handshake. (X3DH was then put out to pasture as a standalone implementation.)

The move foreclosed the possibility of a quantum attack being able to recover the symmetric key used to start the ratchets, but the ephemeral keys established in the ping-ponging second ratchet remained vulnerable to a quantum attack. Signal’s latest update adds quantum resistance to these keys, ensuring that forward secrecy and post-compromise security are safe from Shor’s algorithm as well.

Even though the ping-ponging keys are vulnerable to future quantum attacks, they are broadly believed to be secure against today’s attacks from classical computers. The Signal Protocol developers didn’t want to remove them or the battle-tested code that produces them. That led to their decision to add quantum resistance by adding a third ratchet. This one uses a quantum-safe KEM to produce new secrets much like the Diffie-Hellman ratchet did before, ensuring quantum-safe, post-compromise security.

The technical challenges were anything but easy. Elliptic curve keys generated in the X25519 implementation are 32 bytes long, small enough to be added to each message without creating a burden on already constrained bandwidth or computing resources. An ML-KEM-768 public key, by contrast, is more than 1,000 bytes. Additionally, Signal’s design requires sending both an encryption key and a ciphertext, bringing the total to 2,272 bytes.

And then there were three

To handle the 71x increase, Signal developers considered a variety of options. One was to send the 2,272-byte KEM material less often—say, every 50th message or once a week—rather than with every message. That idea was nixed because it doesn’t work well in asynchronous or adversarial messaging environments. Signal Protocol developers Graeme Connell and Rolfe Schmidt explained:

Consider the case of “send a key if you haven’t sent one in a week”. If Bob has been offline for 2 weeks, what does Alice do when she wants to send a message? What happens if we can lose messages, and we lose the one in fifty that contains a new key? Or, what happens if there’s an attacker in the middle that wants to stop us from generating new secrets, and can look for messages that are [many] bytes larger than the others and drop them, only allowing keyless messages through?

Another option Signal engineers considered was breaking the 2,272 bytes into smaller chunks—say, 71 chunks of 32 bytes each. Putting one chunk in each message sounds like a viable approach at first, but once again, the asynchronous environment of messaging made it unworkable. What happens, for example, when data loss causes one of the chunks to be dropped? The protocol could deal with this scenario by re-sending chunks after all 71 have gone out. But then an adversary monitoring the traffic could simply cause packet 3 to be dropped each time, preventing Alice and Bob from ever completing the key exchange.

Signal developers ultimately went with a solution that used this multiple-chunks approach.

Sneaking an elephant through the cat door

To manage the asynchrony challenges, the developers turned to “erasure codes,” a method of breaking up larger data into smaller pieces such that the original can be reconstructed using any sufficiently sized subset of chunks.

Charlie Jacomme, a researcher at INRIA Nancy on the Pesto team who focuses on formal verification and secure messaging, said this design accounts for packet loss by building redundancy into the chunked material. Instead of requiring all x chunks to arrive before the key can be reconstructed, the scheme requires only x−y of them, where y is the number of lost packets it can tolerate. As long as that threshold is met, the new key can be established even when packet loss occurs.
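As a toy illustration of the erasure-coding idea—and emphatically not Signal’s actual scheme, which tolerates more than a single loss—the sketch below adds one XOR parity chunk to a set of equal-sized chunks, letting the receiver rebuild any one missing chunk from the rest.

```python
# Toy erasure code: n data chunks plus one XOR parity chunk. Any single lost
# chunk can be rebuilt from the others. The principle is the same as in more
# capable codes: the receiver needs "enough" chunks, not every chunk.

def encode_with_parity(chunks: list) -> list:
    assert len(set(len(c) for c in chunks)) == 1, "chunks must be equal-sized"
    parity = bytes(len(chunks[0]))
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks + [parity]

def recover_missing(received):
    """Rebuild at most one missing chunk (marked None) by XORing the rest."""
    missing = [i for i, c in enumerate(received) if c is None]
    assert len(missing) <= 1, "this toy code tolerates only one loss"
    if missing:
        size = len(next(c for c in received if c is not None))
        rebuilt = bytes(size)
        for c in received:
            if c is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, c))
        received[missing[0]] = rebuilt
    return received[:-1]  # drop the parity chunk, keep the data chunks
```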

The other part of the design was to split the KEM computations into smaller steps. These KEM computations are distinct from the KEM key material.

As Jacomme explained it:

Essentially, a small part of the public key is enough to start computing and sending a bigger part of the ciphertext, so you can quickly send in parallel the rest of the public key and the beginning of the ciphertext. Essentially, the final computations are equal to the standard, but some stuff was parallelized.

All this in fact plays a role in the end security guarantees, because by optimizing the fact that KEM computations are done faster, you introduce in your key derivation fresh secrets more frequently.

Signal’s post 10 days ago included several images that illustrate this design.

While the design solved the asynchronous messaging problem, it created a new complication of its own: This new quantum-safe ratchet advanced so quickly that it couldn’t be kept synchronized with the Diffie-Hellman ratchet. Ultimately, the architects settled on a creative solution. Rather than bolt KEM onto the existing double ratchet, they allowed it to remain more or less the same as it had been. Then they used the new quantum-safe ratchet to implement a parallel secure messaging system.

Now, when the protocol encrypts a message, it sources encryption keys from both the classic Double Ratchet and the new ratchet. It then mixes the two keys together (using a cryptographic key derivation function) to get a new encryption key that has all of the security of the classical Double Ratchet but now has quantum security, too.
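A minimal sketch of that mixing step, assuming hypothetical 32-byte keys from each ratchet and a standard HKDF-style derivation (the labels and lengths here are illustrative, not Signal’s actual parameters):

```python
# Sketch of the key-mixing step described above: one key from the classic
# Double Ratchet, one from the new quantum-safe ratchet, combined with a
# key derivation function. If either input stays secret, so does the output.
import hashlib
import hmac

def hkdf(salt: bytes, ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869-style extract-then-expand with HMAC-SHA256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def mix_message_key(double_ratchet_key: bytes, spqr_key: bytes) -> bytes:
    # Hypothetical label; Signal defines its own domain-separation strings.
    return hkdf(salt=b"\x00" * 32,
                ikm=double_ratchet_key + spqr_key,
                info=b"hybrid message key")
```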

The Signal engineers have given this third ratchet the formal name: Sparse Post Quantum Ratchet, or SPQR for short. The third ratchet was designed in collaboration with PQShield, AIST, and New York University. The developers presented the erasure-code-based chunking and the high-level Triple Ratchet design at the Eurocrypt 2025 conference. At the Usenix 25 conference, they discussed the six options they considered for adding quantum-safe forward secrecy and post-compromise security and why SPQR and one other stood out. Presentations at the NIST PQC Standardization Conference and the Cryptographic Applications Workshop explain the details of chunking, the design challenges, and how the protocol had to be adapted to use the standardized ML-KEM.

Jacomme further observed:

The final thing interesting for the triple ratchet is that it nicely combines the best of both worlds. Between two users, you have a classical DH-based ratchet going on one side, and fully independently, a KEM-based ratchet is going on. Then, whenever you need to encrypt something, you get a key from both, and mix it up to get the actual encryption key. So, even if one ratchet is fully broken, be it because there is now a quantum computer, or because somebody manages to break either elliptic curves or ML-KEM, or because the implementation of one is flawed, or…, the Signal message will still be protected by the second ratchet. In a sense, this update can be seen, of course simplifying, as doubling the security of the ratchet part of Signal, and is a cool thing even for people that don’t care about quantum computers.

As both Signal and Jacomme noted, users of Signal and other messengers relying on the Signal Protocol need not concern themselves with any of these new designs. To paraphrase a certain device maker, it just works.

In the coming weeks or months, various messaging apps and app versions will be updated to add the triple ratchet. Until then, apps will simply rely on the double ratchet as they always did. Once apps receive the update, they’ll behave exactly as they did before upgrading.

For those who care about the internal workings of their Signal-based apps, though, the architects have documented in great depth the design of this new ratchet and how it behaves. Among other things, the work includes a mathematical proof verifying that the updated Signal protocol provides the claimed security properties.

Outside researchers are applauding the work.

“If the normal encrypted messages we use are cats, then post-quantum ciphertexts are elephants,” Matt Green, a cryptography expert at Johns Hopkins University, wrote in an interview. “So the problem here is to sneak an elephant through a tunnel designed for cats. And that’s an amazing engineering achievement. But it also makes me wish we didn’t have to deal with elephants.”


Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.


“like-putting-on-glasses-for-the-first-time”—how-ai-improves-earthquake-detection

“Like putting on glasses for the first time”—how AI improves earthquake detection


AI is “comically good” at detecting small earthquakes—here’s why that matters.

Credit: Aurich Lawson | Getty Images

On January 1, 2008, at 1:59 am in Calipatria, California, an earthquake happened. You haven’t heard of this earthquake; even if you had been living in Calipatria, you wouldn’t have felt anything. It was magnitude -0.53, about the same amount of shaking as a truck passing by. Still, this earthquake is notable, not because it was large but because it was small—and yet we know about it.

Over the past seven years, AI tools based on computer imaging have almost completely automated one of the fundamental tasks of seismology: detecting earthquakes. What used to be the task of human analysts—and later, simpler computer programs—can now be done automatically and quickly by machine-learning tools.

These machine-learning tools can detect smaller earthquakes than human analysts, especially in noisy environments like cities. Earthquakes give valuable information about the composition of the Earth and what hazards might occur in the future.

“In the best-case scenario, when you adopt these new techniques, even on the same old data, it’s kind of like putting on glasses for the first time, and you can see the leaves on the trees,” said Kyle Bradley, co-author of the Earthquake Insights newsletter.

I talked with several earthquake scientists, and they all agreed that machine-learning methods have replaced humans for the better in these specific tasks.

“It’s really remarkable,” Judith Hubbard, a Cornell University professor and Bradley’s co-author, told me.

Less certain is what comes next. Earthquake detection is a fundamental part of seismology, but there are many other data processing tasks that have yet to be disrupted. The biggest potential impacts, all the way to earthquake forecasting, haven’t materialized yet.

“It really was a revolution,” said Joe Byrnes, a professor at the University of Texas at Dallas. “But the revolution is ongoing.”

When an earthquake happens in one place, the shaking passes through the ground, similar to how sound waves pass through the air. In both cases, it’s possible to draw inferences about the materials the waves pass through.

Imagine tapping a wall to figure out if it’s hollow. Because a solid wall vibrates differently than a hollow wall, you can figure out the structure by sound.

With earthquakes, this same principle holds. Seismic waves pass through different materials (rock, oil, magma, etc.) differently, and scientists use these vibrations to image the Earth’s interior.

The main tool that scientists traditionally use is a seismometer. These record the movement of the Earth in three directions: up–down, north–south, and east–west. If an earthquake happens, seismometers can measure the shaking in that particular location.

An old-fashioned physical seismometer. Today, seismometers record data digitally. Credit: Yamaguchi先生 on Wikimedia CC BY-SA 3.0

Scientists then process raw seismometer information to identify earthquakes.

Earthquakes produce multiple types of shaking, which travel at different speeds. Two types, Primary (P) waves and Secondary (S) waves, are particularly important, and scientists like to identify the start of each of these phases.

Before good algorithms, earthquake cataloging had to happen by hand. Byrnes said that “traditionally, something like the lab at the United States Geological Survey would have an army of mostly undergraduate students or interns looking at seismograms.”

However, there are only so many earthquakes you can find and classify manually. Creating algorithms to effectively find and process earthquakes has long been a priority in the field—especially since the arrival of computers in the early 1950s.

“The field of seismology historically has always advanced as computing has advanced,” Bradley told me.

There’s a big challenge with traditional algorithms, though: They can’t easily find smaller quakes, especially in noisy environments.

Composite seismogram of common events. Note how each event has a slightly different shape. Credit: EarthScope Consortium CC BY 4.0

As we see in the seismogram above, many different events can cause seismic signals. If a method is too sensitive, it risks falsely detecting events as earthquakes. The problem is especially bad in cities, where the constant hum of traffic and buildings can drown out small earthquakes.

However, earthquakes have a characteristic “shape.” The magnitude 7.7 earthquake above looks quite different from the helicopter landing, for instance.

So one idea scientists had was to make templates from human-labeled datasets. If a new waveform correlates closely with an existing template, it’s almost certainly an earthquake.
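A bare-bones sketch of that idea in Python: slide a labeled template along a continuous trace and flag windows whose normalized cross-correlation exceeds a threshold. Real template-matching pipelines correlate across three components and many stations and use FFT-based correlation for speed; this single-trace loop only shows the mechanics.

```python
# Minimal template matching with normalized cross-correlation (NumPy).
import numpy as np

def match_template(trace: np.ndarray, template: np.ndarray, threshold: float = 0.8):
    """Return (sample index, correlation) pairs where the template matches."""
    n = len(template)
    t_norm = (template - template.mean()) / (template.std() * n)
    detections = []
    for start in range(len(trace) - n + 1):
        window = trace[start:start + n]
        std = window.std()
        if std == 0:          # skip flat (dead-channel) windows
            continue
        w_norm = (window - window.mean()) / std
        cc = float(np.dot(t_norm, w_norm))   # value in [-1, 1]
        if cc >= threshold:
            detections.append((start, cc))
    return detections
```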

Template matching works very well if you have enough human-labeled examples. In 2019, Zach Ross’ lab at Caltech used template matching to find 10 times as many earthquakes in Southern California as had previously been known, including the earthquake at the start of this story. Almost all of the new 1.6 million quakes they found were very small, magnitude 1 and below.

If you don’t have an extensive pre-existing dataset of templates, however, you can’t easily apply template matching. That isn’t a problem in Southern California—which already had a basically complete record of earthquakes down to magnitude 1.7—but it’s a challenge elsewhere.

Also, template matching is computationally expensive. Creating a Southern California quake dataset using template matching took 200 Nvidia P100 GPUs running for days on end.

There had to be a better way.

AI detection models solve all of these problems:

  • They are faster than template matching.

  • Because AI detection models are very small (around 350,000 parameters compared to billions in LLMs like GPT-4), they can be run on consumer CPUs.

  • AI models generalize well to regions not represented in the original dataset.

As an added bonus, AI models can give better information about when the different types of earthquake shaking arrive. Timing the arrivals of the two most important waves—P and S waves—is called phase picking. It allows scientists to draw inferences about the structure of the quake. AI models can do this alongside earthquake detection.

The basic task of earthquake detection (and phase picking) looks like this:

Cropped figure from Earthquake Transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Credit: Nature Communications

The first three rows represent different directions of vibration (east–west, north–south, and up–down respectively). Given these three dimensions of vibration, can we determine if an earthquake occurred, and if so, when it started?

We want to detect the initial P wave, which arrives directly from the site of the earthquake. But this can be tricky because echoes of the P wave may get reflected off other rock layers and arrive later, making the waveform more complicated.

Ideally, then, our model outputs three things at every time step in the sample:

  1. The probability that an earthquake is occurring at that moment.

  2. The probability that the first P wave arrives at that moment.

  3. The probability that the first S wave arrives at that moment.

We see all three outputs in the fourth row: the detection in green, the P wave arrival in blue, and the S wave arrival in red. (There are two earthquakes in this sample.)

To train an AI model, scientists take large amounts of labeled data, like what’s above, and do supervised training. I’ll describe one of the most used models: Earthquake Transformer, which was developed around 2020 by a Stanford University team led by S. Mostafa Mousavi, who later became a Harvard professor.

Like many earthquake detection models, Earthquake Transformer adapts ideas from image classification. Readers may be familiar with AlexNet, a famous image-recognition model that kicked off the deep-learning boom in 2012.

AlexNet used convolutions, a neural network architecture that’s based on the idea that pixels that are physically close together are more likely to be related. The first convolutional layer of AlexNet broke an image down into small chunks—11 pixels on a side—and classified each chunk based on the presence of simple features like edges or gradients.

The next layer took the first layer’s classifications as input and checked for higher-level concepts such as textures or simple shapes.

Each convolutional layer analyzed a larger portion of the image and operated at a higher level of abstraction. By the final layers, the network was looking at the entire image and identifying objects like “mushroom” and “container ship.”

Images are two-dimensional, so AlexNet is based on two-dimensional convolutions. By contrast, seismograph data is one-dimensional, so Earthquake Transformer uses one-dimensional convolutions over the time dimension. The first layer analyzes vibration data in 0.1-second chunks, while later layers identify patterns over progressively longer time periods.

It’s difficult to say what exact patterns the earthquake model is picking out, but we can analogize this to a hypothetical audio transcription model using one-dimensional convolutions. That model might first identify consonants, then syllables, then words, then sentences over increasing time scales.

Earthquake Transformer converts raw waveform data into a collection of high-level representations that indicate the likelihood of earthquakes and other seismologically significant events. This is followed by a series of deconvolution layers that pinpoint exactly when an earthquake—and its all-important P and S waves—occurred.

The model also uses an attention layer in the middle of the model to mix information between different parts of the time series. The attention mechanism is most famous in large language models, where it helps pass information between words. It plays a similar role in seismographic detection. Earthquake seismograms have a general structure: P waves followed by S waves followed by other types of shaking. So if a segment looks like the start of a P wave, the attention mechanism helps it check that it fits into a broader earthquake pattern.

All of the Earthquake Transformer’s components are standard designs from the neural network literature. Other successful detection models, like PhaseNet, are even simpler. PhaseNet uses only one-dimensional convolutions to pick the arrival times of earthquake waves. There are no attention layers.
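To give a flavor of what such a model looks like, here is a minimal PyTorch sketch in the spirit of a convolution-only picker like PhaseNet: stacked 1D convolutions over a three-component waveform, producing per-sample probabilities for P arrival, S arrival, and noise. The layer sizes are invented for illustration and omit the real models’ downsampling and upsampling structure.

```python
# Minimal 1D-convolutional phase picker (illustrative, not a published model).
import torch
import torch.nn as nn

class TinyPicker(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3), nn.ReLU(),
        )
        # One output channel per class (P, S, noise), predicted at every sample.
        self.head = nn.Conv1d(32, 3, kernel_size=1)

    def forward(self, x):  # x: (batch, 3 components, time samples)
        return torch.softmax(self.head(self.encoder(x)), dim=1)

# Example: 30 seconds of 100 Hz, 3-component data -> per-sample probabilities.
probs = TinyPicker()(torch.randn(1, 3, 3000))   # shape: (1, 3, 3000)
```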

Generally, there hasn’t been “much need to invent new architectures for seismology,” according to Byrnes. The techniques derived from image processing have been sufficient.

What made these generic architectures work so well then? Data. Lots of it.

Ars has previously reported on how the introduction of ImageNet, an image recognition benchmark, helped spark the deep learning boom. Large, publicly available earthquake datasets have played a similar role in seismology.

Earthquake Transformer was trained using the Stanford Earthquake Dataset (STEAD), which contains 1.2 million human-labeled segments of seismogram data from around the world. (The paper for STEAD explicitly mentions ImageNet as an inspiration). Other models, like PhaseNet, were also trained on hundreds of thousands or millions of labeled segments.

All recorded earthquakes in the Stanford Earthquake Dataset. Credit: IEEE (CC BY 4.0)

The combination of the data and the architecture just works. The current models are “comically good” at identifying and classifying earthquakes, according to Byrnes. Typically, machine-learning methods find 10 or more times the quakes that were previously identified in an area. You can see this directly in an Italian earthquake catalog:

From Machine learning and earthquake forecasting—next steps by Beroza et al. Credit: Nature Communications (CC-BY 4.0)

AI tools won’t necessarily detect more earthquakes than template matching. But AI-based techniques are much less compute- and labor-intensive, making them more accessible to the average research project and easier to apply in regions around the world.

All in all, these machine-learning models are so good that they’ve almost completely supplanted traditional methods for detecting and phase-picking earthquakes, especially for smaller magnitudes.

The holy grail of earthquake science is earthquake prediction. For instance, scientists know that a large quake will happen near Seattle but have little ability to know whether it will happen tomorrow or in a hundred years. It would be helpful if we could predict earthquakes precisely enough to allow people in affected areas to evacuate.

You might think AI tools would help predict earthquakes, but that doesn’t seem to have happened yet.

The applications are more technical and less flashy, said Cornell’s Judith Hubbard.

Better AI models have given seismologists much more comprehensive earthquake catalogs, which have unlocked “a lot of different techniques,” Bradley said.

One of the coolest applications is in understanding and imaging volcanoes. Volcanic activity produces a large number of small earthquakes, whose locations help scientists understand the structure of the magma system. In a 2022 paper, John Wilding and co-authors used a large AI-generated earthquake catalog to create this incredible image of the structure of the Hawaiian volcanic system.

Each dot represents an individual earthquake. Credit: Wilding et al., The magmatic web beneath Hawai‘i.

They provided direct evidence of a previously hypothesized magma connection between the deep Pāhala sill complex and Mauna Loa’s shallow volcanic structure. You can see this in the image with the arrow labeled as Pāhala-Mauna Loa seismicity band. The authors were also able to clarify the structure of the Pāhala sill complex into discrete sheets of magma. This level of detail could potentially facilitate better real-time monitoring of earthquakes and more accurate eruption forecasting.

Another promising area is lowering the cost of dealing with huge datasets. Distributed Acoustic Sensing (DAS) is a powerful technique that uses fiber-optic cables to measure seismic activity across the entire length of the cable. A single DAS array can produce “hundreds of gigabytes of data” a day, according to Jiaxuan Li, a professor at the University of Houston. That much data can produce extremely high-resolution datasets—enough to pick out individual footsteps.

AI tools make it possible to very accurately time earthquakes in DAS data. Before the introduction of AI techniques for phase picking in DAS data, Li and some of his collaborators attempted to use traditional techniques. While these “work roughly,” they weren’t accurate enough for their downstream analysis. Without AI, much of his work would have been “much harder,” he told me.

Li is also optimistic that AI tools will be able to help him isolate “new types of signals” in the rich DAS data in the future.

Not all AI techniques have paid off

As in many other scientific fields, seismologists face some pressure to adopt AI methods, whether or not they are relevant to their research.

“The schools want you to put the word AI in front of everything,” Byrnes said. “It’s a little out of control.”

This can lead to papers that are technically sound but practically useless. Hubbard and Bradley told me that they’ve seen a lot of papers based on AI techniques that “reveal a fundamental misunderstanding of how earthquakes work.”

They pointed out that graduate students can feel pressure to specialize in AI methods at the cost of learning less about the fundamentals of the scientific field. They fear that if this type of AI-driven research becomes entrenched, older methods will get “out-competed by a kind of meaninglessness.”

While these are real issues, and ones Understanding AI has reported on before, I don’t think they detract from the success of AI earthquake detection. In the last five years, an AI-based workflow has almost completely replaced one of the fundamental tasks in seismology for the better.

That’s pretty cool.

Kai Williams is a reporter for Understanding AI, a Substack newsletter founded by Ars Technica alum Timothy B. Lee. His work is supported by a Tarbell Fellowship. Subscribe to Understanding AI to get more from Tim and Kai.


we’re-about-to-find-many-more-interstellar-interlopers—here’s-how-to-visit-one

We’re about to find many more interstellar interlopers—here’s how to visit one


“You don’t have to claim that they’re aliens to make these exciting.”

The Hubble Space Telescope captured this image of the interstellar comet 3I/ATLAS on July 21, when the comet was 277 million miles from Earth. Hubble shows that the comet has a teardrop-shaped cocoon of dust coming off its solid, icy nucleus. Credit: NASA, ESA, David Jewitt (UCLA); Image Processing: Joseph DePasquale (STScI)

A few days ago, an inscrutable interstellar interloper made its closest approach to Mars, where a fleet of international spacecraft seek to unravel the red planet’s ancient mysteries.

Several of the probes encircling Mars took a break from their usual activities and turned their cameras toward space to catch a glimpse of an object named 3I/ATLAS, a rogue comet that arrived in our Solar System from interstellar space and is now barreling toward perihelion—its closest approach to the Sun—at the end of this month.

This is the third interstellar object astronomers have detected within our Solar System, following 1I/ʻOumuamua and 2I/Borisov, discovered in 2017 and 2019, respectively. Scientists think interstellar objects routinely transit among the planets, but telescopes have only recently gained the ability to find one. For example, the telescope that discovered ʻOumuamua only came online in 2010.

Detectable but still unreachable

Astronomers first reported observations of 3I/ATLAS on July 1, just four months before the object’s deepest penetration into the Solar System. Unfortunately for astronomers, the particulars of this object’s trajectory will bring it to perihelion when the Earth is on the opposite side of the Sun. The closest 3I/ATLAS will come to Earth is about 170 million miles (270 million kilometers), in December, eliminating any chance for high-resolution imaging. The viewing geometry also means the Sun’s glare will block all direct views of the comet from Earth until next month.

The James Webb Space Telescope observed interstellar comet 3I/ATLAS on August 6 with its Near-Infrared Spectrograph instrument. Credit: NASA/James Webb Space Telescope

Because of that, the closest any active spacecraft will get to 3I/ATLAS happened Friday, when it passed less than 20 million miles (30 million kilometers) from Mars. NASA’s Perseverance rover and Mars Reconnaissance Orbiter were expected to make observations of 3I/ATLAS, along with Europe’s Mars Express and ExoMars Trace Gas Orbiter missions.

The best views of the object so far have been captured by the James Webb Space Telescope and the Hubble Space Telescope, positioned much closer to Earth. Those observations helped astronomers narrow down the object’s size, but the estimates remain imprecise. Based on Hubble’s images, the icy core of 3I/ATLAS is somewhere between the size of the Empire State Building and something a little larger than Central Park.

That may be the most we’ll ever know about the dimensions of 3I/ATLAS. The spacecraft at Mars lack the exquisite imaging sensitivity of Webb and Hubble, so don’t expect spectacular views from Friday’s observations. But scientists hope to get a better handle on the cloud of gas and dust surrounding the object, giving it the appearance of a comet. Spectroscopic observations have shown the coma around 3I/ATLAS contains water vapor and an unusually strong signature of carbon dioxide extending out nearly a half-million miles.

On Tuesday, the European Space Agency released the first grainy images of 3I/ATLAS captured at Mars. The best views will come from a small telescope called HiRISE on NASA’s Mars Reconnaissance Orbiter. The images from NASA won’t be released until after the end of the ongoing federal government shutdown, according to a member of the HiRISE team.

Europe’s ExoMars Trace Gas Orbiter turned its eyes toward interstellar comet 3I/ATLAS as it passed close to Mars on Friday, October 3. The comet’s coma is visible as a fuzzy blob surrounding its nucleus, which was not resolved by the spacecraft’s camera. Credit: ESA/TGO/CaSSIS

Studies of 3I/ATLAS suggest it was probably kicked out of another star system, perhaps by an encounter with a giant planet; comets in our own Solar System sometimes get ejected into the Milky Way when they stray too close to Jupiter. The comet then likely roamed the galaxy for billions of years before arriving in the Sun’s galactic neighborhood.

The rogue comet is now gaining speed as gravity pulls it toward perihelion, when it will max out at a relative velocity of 152,000 mph (68 kilometers per second), much too fast to be bound into a closed orbit around the Sun. Instead, the comet will catapult back into the galaxy, never to be seen again.
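
For readers who want to check that claim, here is a minimal back-of-the-envelope sketch in Python. The perihelion distance of roughly 1.4 AU is my assumption (it is not given in this article), but any plausible value leads to the same conclusion: 68 km/s comfortably exceeds the Sun’s escape speed out there.

```python
import math

# Quick arithmetic check (mine, not the article's) that 68 km/s near perihelion
# really is too fast for the Sun to hold on to.
# Assumption: a perihelion distance of roughly 1.4 AU; the exact figure
# doesn't change the conclusion.
GM_SUN = 1.327e20        # m^3/s^2, the Sun's gravitational parameter
AU = 1.496e11            # meters per astronomical unit
r_perihelion = 1.4 * AU  # assumed perihelion distance

v_escape = math.sqrt(2 * GM_SUN / r_perihelion) / 1000  # km/s
print(f"Solar escape speed at 1.4 AU: {v_escape:.0f} km/s")  # ~36 km/s

# At ~68 km/s the comet is moving nearly twice the local escape speed, so its
# trajectory is hyperbolic and it will leave the Solar System for good.
```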

We need to talk about aliens

Anyone who studies planetary formation would relish the opportunity to get a close-up look at an interstellar object. Sending a mission to one would undoubtedly yield a scientific payoff. There’s a good chance that many of these interlopers have been around longer than our own 4.5 billion-year-old Solar System.

One study from the University of Oxford suggests that 3I/ATLAS came from the “thick disk” of the Milky Way, which is home to a dense population of ancient stars. This origin story would mean the comet is probably more than 7 billion years old, holding clues about cosmic history that are simply inaccessible among the planets, comets, and asteroids that formed with the birth of the Sun.

This is enough reason to mount a mission to explore one of these objects, scientists said. It doesn’t need justification from unfounded theories that 3I/ATLAS might be an artifact of alien technology, as proposed by Harvard University astrophysicist Avi Loeb. The scientific consensus is that the object is of natural origin.

Loeb shared a similar theory about the first interstellar object found wandering through our Solar System. His statements have sparked questions in popular media about why the world’s space agencies don’t send a probe to actually visit one. Loeb himself proposed redirecting NASA’s Juno spacecraft in orbit around Jupiter on a mission to fly by 3I/ATLAS, and his writings prompted at least one member of Congress to write a letter to NASA to “rejuvenate” the Juno mission by breaking out of Jupiter’s orbit and taking aim at 3I/ATLAS for a close-up inspection.

The problem is that Juno simply doesn’t have enough fuel to reach the comet, and its main engine is broken. In fact, the total boost required to send Juno from Jupiter to 3I/ATLAS (roughly 5,800 mph or 2.6 kilometers per second) would surpass the fuel capacity of most interplanetary probes.

Ars asked Scott Bolton, lead scientist on the Juno mission, and he confirmed that the spacecraft lacks the oomph required for the kind of maneuvers proposed by Loeb. “We had no role in that paper,” Bolton told Ars. “He assumed propellant that we don’t really have.”
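
To see why a 2.6-kilometer-per-second burn is so far out of reach, here is a rough Tsiolkovsky rocket-equation sketch. The specific impulse is my assumption, roughly typical of the small hydrazine thrusters Juno would have to fall back on with its main engine out of commission; none of these numbers come from the Juno team or from Loeb’s paper.

```python
import math

# Rough rocket-equation arithmetic (assumed numbers): how much propellant would
# a 2.6 km/s burn actually demand?
g0 = 9.81          # m/s^2, standard gravity
isp = 230.0        # s, assumed specific impulse for small hydrazine thrusters
delta_v = 2600.0   # m/s, the boost cited above

mass_ratio = math.exp(delta_v / (isp * g0))   # initial mass / final mass
prop_fraction = 1 - 1 / mass_ratio            # share of total mass burned as propellant
print(f"Propellant required: {prop_fraction:.0%} of the spacecraft's wet mass")
# Roughly two-thirds of the spacecraft's mass -- far more than the residual
# propellant left aboard after years of operations at Jupiter.
```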

Avi Loeb, a Harvard University astrophysicist. Credit: Anibal Martel/Anadolu Agency via Getty Images

So Loeb’s exercise was moot, but his talk of aliens has garnered public attention. Loeb appeared on the conservative network Newsmax last week to discuss his theory of 3I/ATLAS alongside Rep. Tim Burchett (R-Tenn.). Predictably, conspiracy theories abounded. As of Tuesday, the segment had 1.2 million views on YouTube. Maybe it’s a good thing that people who approve government budgets, especially those without a preexisting interest in NASA, are eager to learn more about the Universe. We will leave it to the reader to draw their own conclusions on that matter.

Loeb’s calculations also help illustrate the difficulty of pulling off a mission to an interstellar object. So far, we’ve only known about an incoming interstellar intruder a few months before it comes closest to Earth. That’s not to mention the enormous speeds at which these objects move through the Solar System. It’s just not feasible to build a spacecraft and launch it on such short notice.

Now, some scientists are working on ways to overcome these limitations.

So you’re saying there’s a chance?

One of these people is Colin Snodgrass, an astronomer and planetary scientist at the University of Edinburgh. A few years ago, he helped pitch the European Space Agency on a mission concept that would very likely have been laughed out of the room a generation ago. Snodgrass and his team wanted a commitment from ESA of up to $175 million (150 million euros) to launch a mission with no idea of where it would go.

ESA officials called Snodgrass in 2019 to say the agency would fund his mission, named Comet Interceptor, for launch in the late 2020s. The goal of the mission is to perform the first detailed observations of a long-period comet. So far, spacecraft have only visited short-period comets that routinely dip into the inner part of the Solar System.

A long-period comet is an icy visitor from the farthest reaches of the Solar System that has spent little time getting blasted by the Sun’s heat and radiation, freezing its physical and chemical properties much as they were billions of years ago.

Long-period comets are typically discovered a year or two before coming near the Sun, still not enough time to develop a mission from scratch. With Comet Interceptor, ESA will launch a probe to loiter in space a million miles from Earth, wait for the right comet to come along, then fire its engines to pursue it.

Odds are good that the right comet will come from within the Solar System. “That is the point of the mission,” Snodgrass told Ars.

ESA’s Comet Interceptor will be the first mission to visit a comet coming directly from the outer reaches of the Sun’s realm, carrying material untouched since the dawn of the Solar System. Credit: European Space Agency

But if astronomers detect an interstellar object coming toward us on the right trajectory, there’s a chance Comet Interceptor could reach it.

“I think that the entire science team would agree, if we get really lucky and there’s an interstellar object that we could reach, then to hell with the normal plan, let’s go and do this,” Snodgrass said. “It’s an opportunity you couldn’t just leave sitting there.”

But, he added, it’s “very unlikely” that an interstellar object will be in the right place at the right time. “Although everyone’s always very excited about the possibility, and we’re excited about the possibility, we kind of try and keep the expectations to a realistic level.”

For example, if Comet Interceptor were in space today, there’s no way it could reach 3I/ATLAS. “It’s an unfortunate one,” Snodgrass said. “Its closest point to the Sun, it reaches that on the other side of the Sun from where the Earth is. Just bad timing.” An interceptor parked somewhere else in the Solar System might have been able to get itself into position for an encounter with 3I/ATLAS, but any such spacecraft faces hard limits. “There’s only so much fuel aboard,” Snodgrass said. “There’s only so fast we can go.”

It’s even harder to send a spacecraft to encounter an interstellar object than it is to visit one of the Solar System’s homegrown long-period comets. The calculation of whether Comet Interceptor could reach one of these galactic visitors boils down to where it’s heading and when astronomers discover it.
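
As a very loose illustration of that trade-off, the sketch below treats the problem as straight-line motion and ignores gravity and launch geometry entirely; the 1 km/s delta-v budget is my placeholder, not a published Comet Interceptor figure, and the example distances are illustrative.

```python
# A deliberately crude reachability test: can a probe loitering near Earth
# drift far enough sideways, with the delta-v it carries, to meet a comet whose
# path misses the waiting point by some distance? Real trajectory design is far
# more involved; this only captures the "where is it heading, and how much
# warning do we get" trade-off described above.

def reachable(miss_distance_km: float, warning_time_days: float,
              delta_v_km_s: float = 1.0) -> bool:
    """delta_v_km_s is an assumed onboard budget, not a published figure."""
    warning_time_s = warning_time_days * 86400
    required_speed = miss_distance_km / warning_time_s  # km/s of sideways drift
    return required_speed <= delta_v_km_s

# A long-period comet spotted two years out, passing 30 million km away: doable.
print(reachable(miss_distance_km=30e6, warning_time_days=730))   # True
# An object found four months out that never comes within 170 million km: not.
print(reachable(miss_distance_km=170e6, warning_time_days=120))  # False
```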

Snodgrass is part of a team using big telescopes to observe 3I/ATLAS from a distance. “As it’s getting closer to the Sun, it is getting brighter,” he said in an interview.

“You don’t have to claim that they’re aliens to make these exciting,” Snodgrass said. “They’re interesting because they are a bit of another solar system that you can actually feasibly get an up-close view of, even the sort of telescopic views we’re getting now.”

Colin Snodgrass, a professor at the University of Edinburgh, leads the Comet Interceptor science team. Credit: University of Edinburgh

Comets and asteroids are the linchpins for understanding the formation of the Solar System. These modest worlds are the leftover building blocks from the debris that coalesced into the planets. Today, direct observations have only allowed scientists to study the history of one planetary system. An interstellar comet would grow the sample size to two.

Still, Snodgrass said his team prefers to keep their energy focused on reaching a comet originating from the frontier of our own Solar System. “We’re not going to let a very lovely Solar System comet go by, waiting to see ‘what if there’s an interstellar thing?'” he said.

Snodgrass sees Comet Interceptor as a proof of concept for scientists to propose a future mission specially designed to travel to an interstellar object. “You need to figure out how do you build the souped-up version that could really get to an interstellar object? I think that’s five or 10 years away, but [it’s] entirely realistic.”

An American answer

Scientists in the United States are working on just such a proposal. A team from the Southwest Research Institute completed a concept study showing how a mission could fly by one of these interstellar visitors. What’s more, the US scientists say their proposed mission could have actually reached 3I/ATLAS had it already been in space.

The American concept is similar to Europe’s Comet Interceptor in that it would park a spacecraft somewhere in deep space and wait for the right target to come along. The study was led by Alan Stern, the chief scientist on NASA’s New Horizons mission that flew by Pluto a decade ago. “These new kinds of objects offer humankind the first feasible opportunity to closely explore bodies formed in other star systems,” he said.

An animation of the trajectory of 3I/ATLAS through the inner Solar System. Credit: NASA/JPL

It’s impossible with current technology to send a spacecraft to match orbits and rendezvous with a high-speed interstellar comet. “We don’t have to catch it,” Stern recently told Ars. “We just have to cross its orbit. So it does carry a fair amount of fuel in order to get out of Earth’s orbit and onto the comet’s path to cross that path.”
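
To put rough numbers on the difference between crossing a comet’s path and catching it, here is a small sketch; the spacecraft speed and crossing angle are my assumptions, not mission figures.

```python
import math

# Assumed geometry (not from the article): a spacecraft coasting at a roughly
# Earth-like heliocentric speed crosses the comet's path at a 60-degree angle.
v_spacecraft = 30.0       # km/s, assumed heliocentric speed of the probe
v_comet = 68.0            # km/s, the comet's speed near perihelion, cited earlier
angle = math.radians(60)  # assumed angle between the two velocity vectors

# Relative speed at the flyby, from the law of cosines. The probe only has to
# be in the right place at the right time; it never has to cancel this speed.
v_rel = math.sqrt(v_spacecraft**2 + v_comet**2
                  - 2 * v_spacecraft * v_comet * math.cos(angle))
print(f"Flyby relative speed: {v_rel:.0f} km/s")  # ~59 km/s

# A rendezvous would mean supplying that entire ~59 km/s as delta-v -- far
# beyond any chemical propulsion system, which is why a fast flyby that merely
# crosses the path is the only realistic option.
```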

Stern said his team developed a cost estimate for such a mission, and while he didn’t disclose the exact number, he said it would fall under NASA’s cost cap for a Discovery-class mission. The Discovery program is a line of planetary science missions that NASA selects through periodic competitions within the science community. The cost cap for NASA’s next Discovery competition is expected to be $800 million, not including the launch vehicle.

A mission to encounter an interstellar comet requires no new technologies, Stern said. Hopes for such a mission are bolstered by the activation of the US-funded Vera Rubin Observatory, a state-of-the-art facility high in the mountains of Chile set to begin deep surveys of the entire southern sky later this year. Stern predicts Rubin will discover “one or two” interstellar objects per year. The new observatory should be able to detect the faint light from incoming interstellar bodies sooner, providing missions with more advance warning.

“If we put a spacecraft like this in space for a few years, while it’s waiting, there should be five or 10 to choose from,” he said.

Alan Stern speaks onstage during Day 1 of TechCrunch Disrupt SF 2018 in San Francisco. Credit: Photo by Kimberly White/Getty Images for TechCrunch

Winning NASA funding for a mission like Stern’s concept will not be easy. It must compete with dozens of other proposals, and NASA’s next Discovery competition is probably at least two or three years away. The timing of the competition is more uncertain than usual due to swirling questions about NASA’s budget after the Trump administration announced it wants to cut the agency’s science funding in half.

Comet Interceptor, on the other hand, is already funded in Europe. ESA has become a pioneer in comet exploration. The Giotto probe flew by Halley’s Comet in 1986, becoming the first spacecraft to make close-up observations of a comet. ESA’s Rosetta mission became the first spacecraft to orbit a comet in 2014, and later that year, it deployed a German-built lander to return the first data from the surface of a comet. Both of those missions explored short-period comets.

“Each time that ESA has done a comet mission, it’s done something very ambitious and very new,” Snodgrass said. “The Giotto mission was the first time ESA really tried to do anything interplanetary… And then, Rosetta, putting this thing in orbit and landing on a comet was a crazy difficult thing to attempt to do.”

“They really do push the envelope a bit, which is good because ESA can be quite risk averse, I think it’s fair to say, with what they do with missions,” he said. “But the comet missions, they are things where they’ve really gone for that next step, and Comet Interceptor is the same. The whole idea of trying to design a space mission before you know where you’re going is a slightly crazy way of doing things. But it’s the only way to do this mission. And it’s great that we’re trying it.”

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

We’re about to find many more interstellar interlopers—here’s how to visit one Read More »