why-snes-hardware-is-running-faster-than-expected—and-why-it’s-a-problem

Why SNES hardware is running faster than expected—and why it’s a problem


gotta go precisely the right speed

Cheap, unreliable ceramic APU resonators lead to “constant, pervasive, unavoidable” issues.

Sir, do you know how fast your SNES was going? Credit: Getty Images

Ideally, you’d expect any Super NES console—if properly maintained—to operate identically to any other Super NES unit ever made. Given the same base ROM file and the same set of precisely timed inputs, all those consoles should hopefully give the same gameplay output across individual hardware and across time.

The TASBot community relies on this kind of solid-state predictability when creating tool-assisted speedruns that can be executed with robotic precision on actual console hardware. But on the SNES in particular, the team has largely struggled to get emulated speedruns to sync up with demonstrated results on real consoles.

After significant research and testing on dozens of actual SNES units, the TASBot team now thinks that a cheap ceramic resonator used in the system’s Audio Processing Unit (APU) is to blame for much of this inconsistency. While Nintendo’s own documentation says the APU should run at a consistent rate of 24.576 MHz (and the associated Digital Signal Processor sample rate at a flat 32,000 Hz), in practice, that rate can vary just a bit based on heat, system age, and minor physical variations that develop in different console units over time.
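
Those two published numbers are tied together: 24.576 MHz divided by a fixed factor of 768 gives exactly 32,000 Hz, so any drift in the resonator shows up proportionally in the DSP sample rate. Here is a minimal arithmetic sketch of that relationship, using a made-up "fast" console rather than any real survey measurement:

```python
# Nominal figures from Nintendo's documentation, as cited above.
APU_CLOCK_NOMINAL_HZ = 24_576_000   # 24.576 MHz ceramic resonator
DSP_RATE_NOMINAL_HZ = 32_000        # Digital Signal Processor sample rate

# The divider implied by those two published numbers.
DIVIDER = APU_CLOCK_NOMINAL_HZ // DSP_RATE_NOMINAL_HZ   # 768

def dsp_rate_for(apu_clock_hz: float) -> float:
    """DSP sample rate on a console whose resonator actually runs at apu_clock_hz."""
    return apu_clock_hz / DIVIDER

# Hypothetical console whose resonator runs 0.25 percent fast:
fast_console = APU_CLOCK_NOMINAL_HZ * 1.0025
print(dsp_rate_for(fast_console))   # 32080.0 -- the drift passes straight through
```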

Casual players would only notice this problem in the form of an almost imperceptibly higher pitch for in-game music and sounds. But for TASBot, Allan “dwangoAC” Cecil says this unreliable clock has become a “constant, pervasive, unavoidable” problem for getting frame-accurate consistency in hardware-verified speedruns.

Not to spec

Cecil testing his own SNES APU in 2016. Credit: Allan Cecil

Cecil says he first began to suspect the APU’s role in TASBot’s SNES problems back in 2016 when he broke open his own console to test it with an external frequency counter. He found that his APU ran just a bit faster than Nintendo’s specifications, an inconsistency that could cause the console to throw out unpredictable “lag frames” if and when the CPU and APU load cycles failed to line up in the expected manner. Those lag frames, in turn, are enough to “desynchronize” TASBot’s input on actual hardware from the results you’d see on a more controlled emulator.

Unlike the quartz crystals used in many electronics (including the SNES’s more consistent and differently timed CPU), the cheaper ceramic resonators in the SNES APU are “known to degrade over time,” as Cecil put it. Documentation for the resonators used in the APU also seems to suggest that excess heat may impact the clock cycle speed, meaning the APU might speed up a bit as a specific console heats up.

The APU resonator manual shows slight variations in operating thresholds based on heat and other factors. Credit: Ceralock ceramic resonator manual

The TASBot team was not the first group to notice this kind of audio inconsistency in the SNES. In the early 2000s, some emulator developers found that certain late-era SNES games don’t run correctly when the emulator’s Digital Signal Processor (DSP) sample rate is set to the Nintendo-specified value of precisely 32,000 Hz (a number derived from the speed of the APU clock). Developers tested actual hardware at the time and found that the DSP was actually running at 32,040 Hz and that setting the emulated DSP to run at that specific rate suddenly fixed the misbehaving commercial games.

That small but necessary emulator tweak implies that “the original developers who wrote those games were using hardware that… must have been running slightly faster at that point,” Cecil told Ars. “Because if they had written directly to what the spec said, it may not have worked.”

Survey says…

While research and testing confirmed the existence of these APU variations, Cecil wanted to determine just how big the problem was across actual consoles today. To do that, he ran an informal online survey last month, cryptically warning his social media followers that “SNES consoles seem to be getting faster as they age.” He asked respondents to run a DSP clock measurement ROM on any working SNES hardware they had lying around and to rerun the test after the console had time to warm up.

After receiving 143 responses and crunching the numbers, Cecil said he was surprised to find that temperature seemed to have a minimal impact on measured DSP speed; the measurement only rose an insignificant 8 Hz on average between “cold” and “hot” readings on the same console. Cecil even put his own console in a freezer to see if the DSP clock rate would change as it thawed out and found only a 22 Hz difference as it warmed back up to room temperature.

A sample result from the DSP sample test program. Credit: Allan Cecil

Those heat effects paled in comparison to the natural clock variation across different consoles, though. The slowest and fastest DSPs in Cecil’s sample showed a clock difference of 234 Hz, or about 0.7 percent of the 32,000 Hz specification.

That difference is small enough that human players probably wouldn’t notice it directly; TASBot team member Total estimated it might amount to “at most maybe a second or two [of difference] over hours of gameplay.” Skilled speedrunners could notice small differences, though, if differing CPU and APU alignments cause “carefully memorized enemy pattern changes to something else” between runs, Cecil said.

For a frame-perfect tool-assisted speedrun, though, the clock variations between consoles could cause innumerable headaches. As TASBot team member Undisbeliever explained in his detailed analysis: “On one console this might take 0.126 frames to process the music-tick, on a different console it might take 0.127 frames. It might not seem like much but it is enough to potentially delay the start of song loading by 1 frame (depending on timing, lag and game-code).”
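
The fractions Undisbeliever cites fall out of simple clock arithmetic: a fixed chunk of APU work occupies slightly more or fewer video frames depending on how fast a particular console’s clock really runs. The sketch below is only illustrative; the cycle count and the rounded 60 fps frame rate are placeholders, not values from his analysis:

```python
def frames_for(apu_cycles: int, apu_clock_hz: float, frame_rate_hz: float = 60.0) -> float:
    """Video frames occupied by a fixed amount of APU work on a given console."""
    return apu_cycles / apu_clock_hz * frame_rate_hz

# The same hypothetical music-tick workload on two consoles whose clocks differ by ~0.7 percent:
TICK_CYCLES = 51_700   # placeholder cycle count, chosen only for illustration
print(round(frames_for(TICK_CYCLES, 24_576_000), 3))   # ~0.126 frames on a nominal console
print(round(frames_for(TICK_CYCLES, 24_400_000), 3))   # ~0.127 frames on a slightly slower one
```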

Cecil’s survey found variation across consoles was much higher than the effects of heat on any single console. Credit: SNES SMP Speed test survey

Cecil also said the survey-reported DSP clock speeds were a bit higher than he expected, coming in at an average of 32,078 Hz at room temperature. That’s quite a bit higher than both the 32,000 Hz spec set by Nintendo and the 32,040 Hz rate that emulator developers settled on after sampling actual hardware in 2003.
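
Translated into musical terms, that average offset is tiny, which is why casual players would only hear it as a slightly sharper soundtrack. A quick back-of-the-envelope conversion using the standard cents formula and the survey average:

```python
import math

def pitch_offset_cents(measured_hz: float, nominal_hz: float = 32_000.0) -> float:
    """Pitch shift, in cents, of audio played back at measured_hz instead of nominal_hz."""
    return 1200.0 * math.log2(measured_hz / nominal_hz)

print(round(pitch_offset_cents(32_078), 1))   # ~4.2 cents sharp, a small fraction of a semitone
```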

To some observers, this is evidence that SNES APUs originally produced in the ’90s have been speeding up slightly as they age and could continue to get faster in the coming years and decades. But Cecil says the historical data they have is too circumstantial to make such a claim for certain.

“We’re all a bunch of differently skilled geeks and nerds, and it’s in our nature to argue over what the results mean, which is fine,” Cecil said. “The only thing we can say with certainty is the statistical significance of the responses that show the current average DSP sample rate is 32,076 Hz, faster on average than the original specification. The rest of it is up to interpretation and a certain amount of educated guessing based on what we can glean.”

A first step

For the TASBot team, knowing just how much real SNES hardware timing can differ from dry specifications (and emulators) is an important step to getting more consistent results on real hardware. But that knowledge hasn’t completely solved their synchronization problems. Even when Cecil replaced the ceramic APU resonator in his Super NES with a more accurate quartz version (tuned precisely to match Nintendo’s written specification), the team “did not see perfect behavior like we expected,” he told Ars.

Beyond clock speed inconsistencies, Cecil explained to Ars that TASBot team testing has found an additional “jitter pattern” present in the APU sampling that “injects some variance in how long it takes to perform various actions” between runs. That leads to non-deterministic performance even on the same hardware, Cecil said, which means that “TASBot is likely to desync” after just a few minutes of play on most SNES games.

The order in which these components start when the SNES is reset can have a large impact on clock synchronization. Credit: Rasteri

Extensive research from Rasteri suggests that these inconsistencies across same-console runs are likely caused by a “very non-deterministic reset circuit” that changes the specific startup order and timing for a console’s individual components every time it’s powered on. That leads to essentially “infinite possibilities” for the relative place where the CPU and APU clocks start in their “synchronization cycle” for each fresh run, making it impossible to predict specifically where and when lag frames will appear, Rasteri wrote.
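
A toy model makes Rasteri’s point easy to see: two free-running clocks that start with a random relative phase at every power-on will line up differently on every run, even if nothing else about the hardware changes. The sketch below is deliberately simplified, and none of its numbers come from real SNES measurements:

```python
import random

def lag_events(cpu_period: float, apu_period: float, ticks: int = 1_000) -> int:
    """Toy model: count APU edges that land in the 'late' half of a CPU period.

    The random starting offset stands in for the non-deterministic reset circuit;
    with everything else held fixed, the count still changes from power-on to power-on.
    """
    offset = random.uniform(0.0, apu_period)   # relative phase chosen at "reset"
    return sum(
        1 for i in range(ticks)
        if (offset + i * apu_period) % cpu_period > cpu_period / 2
    )

# Arbitrary, unit-less periods; the point is the run-to-run variation, not the values.
for _ in range(3):
    print(lag_events(cpu_period=100.0, apu_period=97.3))
```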

Cecil said these kinds of “butterfly effect” timing issues make the Super NES “a surprisingly complicated console [that has] resisted our attempts to fully model it and coerce it into behaving consistently.” But he’s still hopeful that the team will “eventually find a way to restore an SNES to the behavior game developers expected based on the documentation they were provided without making invasive changes…”

In the end, though, Cecil seems to have developed an almost grudging respect for how the SNES’s odd architecture leads to such unpredictable operation in practice. “If you want to deliberately create a source of randomness and non-deterministic behavior, having two clock sources that spinloop independently against one another is a fantastic choice,” he said.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.


crew-10-launches,-finally-clearing-the-way-for-butch-and-suni-to-fly-home

Crew-10 launches, finally clearing the way for Butch and Suni to fly home

A Falcon 9 rocket launched four astronauts safely into orbit on Friday evening, marking the official beginning of the Crew-10 mission to the International Space Station.

Although any crew launch into orbit is notable, this mission comes with an added bit of importance as its success clears the way for two NASA astronauts, Butch Wilmore and Suni Williams, to finally return home from space after a saga spanning nine months.

Friday’s launch came two days after an initial attempt was scrubbed on Wednesday evening due to a hydraulic issue with the ground systems that handle the Falcon 9 rocket at Launch Complex 39A in Florida.

There were no technical issues on Friday, and with clear skies NASA astronauts Anne McClain and Nichole Ayers, Japanese astronaut Takuya Onishi, and Roscosmos cosmonaut Kirill Peskov rocketed smoothly into orbit.

If all goes well, the Crew Dragon spacecraft carrying the four astronauts will dock with the space station at 11:30 pm ET on Saturday. They will spend about six months there.

A long, strange trip

Following their arrival at the space station, the members of Crew-10 will participate in a handover ceremony with the four astronauts of Crew-9, which includes Wilmore and Williams. This will clear the members of Crew-9 for departure from the station as early as next Wednesday, March 19, pending good weather in the waters surrounding Florida for splashdown of Dragon.


on-maim-and-superintelligence-strategy

On MAIM and Superintelligence Strategy

Dan Hendrycks, Eric Schmidt and Alexandr Wang released an extensive paper titled Superintelligence Strategy. There is also an op-ed in Time that summarizes it.

The major AI labs expect superintelligence to arrive soon. They might be wrong about that, but at minimum we need to take the possibility seriously.

At a minimum, the possibility of imminent superintelligence will be highly destabilizing. Even if you do not believe it represents an existential risk to humanity (and if so you are very wrong about that) the imminent development of superintelligence is an existential threat to the power of everyone not developing it.

Planning a realistic approach to that scenario is necessary.

What would it look like to take superintelligence seriously? What would it look like if everyone took superintelligence seriously, before it was developed?

The proposed regime here, Mutually Assured AI Malfunction (MAIM), relies on various assumptions in order to be both necessary and sufficient. If those assumptions turn out to hold, it would be a very interesting, highly not crazy proposal.

  1. ASI (Artificial Superintelligence) is Dual Use.

  2. Three Proposed Interventions.

  3. The Shape of the Problems.

  4. Strategic Competition.

  5. Terrorism.

  6. Loss of Control.

  7. Existing Strategies.

  8. MAIM of the Game.

  9. Nonproliferation.

  10. Competitiveness.

  11. Laying Out Assumptions: Crazy or Crazy Enough To Work?

  12. Don’t MAIM Me Bro.

ASI helps you do anything you want to do, which in context is often called ‘dual use.’

As in, AI is a highly useful technology for both military and economic use. It can be used for, or can be an engine of, both creation and destruction.

It can do both these things in the hands of humans, or on its own.

That means that America must stay competitive in AI, or even stay dominant in AI, both for our economic and our military survival.

The key players include not only states but also non-state actors.

Given what happens by default, what can we do to steer to a different outcome?

They propose three pillars.

Two are highly conventional and traditional. One is neither, in the context of AI.

First, the two conventional ones.

Essentially everyone can get behind Competitiveness, building up AI chips through domestic manufacturing. At least in principle. Trump called for us to end the Chips Act because he is under some strange delusions about how economics and physics work and thinks tariffs are how you fix everything (?), but he does endorse the goal.

Nonproliferation is more controversial but enjoys broad support. America already imposes export controls on AI chips and the proposed diffusion regulations would substantially tighten that regime. This is a deeply ordinary and obviously wise policy. There is a small extremist minority that flips out and labels proposals for ordinary enforcement of such rules ‘a call for a global totalitarian surveillance state,’ but such claims are rather Obvious Nonsense, entirely false and without merit, since they describe the existing policy regime in many sectors, not only in AI.

The big proposal here is Deterrence with Mutual Assured AI Malfunction (MAIM), as a system roughly akin to Mutually Assured Destruction (MAD) from nuclear weapons.

The theory is that if it is possible to detect and deter opposing attempts to develop superintelligence, the world can perhaps avoid developing superintelligence until we are ready for that milestone.

This chart of wicked problems in need of solving is offered. The ‘tame technical subproblems’ are not easy, but are likely solvable. The wicked problems are far harder.

Note that we are not doing that great a job on even the tame technical subproblems.

  1. Train AI systems to refuse harmful requests: We don’t have an AI system that cannot be jailbroken, even if it is closed weights and under full control, without crippling the mundane utility offered by the system.

  2. Prepare cyberattacks for AI datacenters: This is the one that is not obviously a net positive idea. Presumably this is being done in secret, but I have no knowledge of us doing anything here.

  3. Upgrade AI chip firmware to add geolocation functionality: We could presumably do this, but we haven’t done it.

  4. Patch known vulnerabilities in AI developers’ computer systems: I hope we are doing a decent job of this. However the full ‘tame’ problem is to do this across all systems, since AI will soon be able to automate attacks on all systems, exposing vulnerable legacy systems that often are tied to critical infrastructure. Security through obscurity is going to become a lot less effective.

  5. Design military drones: I do not get the sense we are doing a great job here, either in design or production, relative to its military importance.

  6. Economic strength: Improve AI performance in economically valuable tasks: We’re making rapid progress here, and it still feels like balls are dropped constantly.

  7. Loss of control: Research methods to make current AIs follow instructions: I mean yes we are doing that, although we should likely be investing 10x more. The problem is that our current methods to make this work won’t scale to superintelligence, with the good news being that we are largely aware of that.

They focus on three problems.

They don’t claim these are a complete taxonomy. At a sufficiently abstract level, we have a similar trio of threats to the ones OpenAI discusses in their philosophy document: Humans might do bad things on purpose (terrorism), the AI might do bad things we didn’t intend (loss of control), or locally good things could create bad combined effects (this is the general case of strategic competition, the paper narrowly focuses on state competition but I would generalize this to competition generally).

These problems interact. In particular, strategic competition is a likely key motivator for terrorism, and for risking or triggering a loss of control.

Note the term ‘meaningful’ in meaningful human control. If humans nominally have control, but in practice cannot exercise that control, humans still have lost control.

The paper focuses on the two most obvious strategic competition elements: Economic and military.

Economics is straightforward. If AI becomes capable of most or all labor, then how much inference you can do becomes a prime determinant of economic power, similar to what labor is today, even if there is no full strategic dominance.

Military is also straightforward. AI could enable military dominance through ‘superweapons,’ up to and including advanced drone swarms, new forms of EMP, decisive cyber weapons or things we aren’t even imagining. Sufficiently strong AI would presumably be able to upend nuclear deterrence.

If you are about to stare down superintelligence, you don’t know what you’ll face, but you know if you don’t act now, it could be too late. You are likely about to get outcompeted. It stands to reason countries might consider preventative action, up to and including outright war. We need to anticipate this possibility.

Strategic competition also feeds into the other two risks.

If you are facing strong strategic competition, either the way the paper envisioned at a national level, or competition at the corporate or personal level, from those employing superintelligence, you may have no choice but to either lose or deploy superintelligence yourself. And if everyone else is fully unleashing that superintelligence, can you afford not to do the same? How do humans stay in the loop or under meaningful control?

Distinctly from that fear, or perhaps in combination with it, if actions that are shaped like ‘terrorism’ dominate the strategic landscape, what then?

The term terrorism makes an assertion about what the goal of terrorism is. Often, yes, the goal is to instill fear, or to trigger a lashing out or other expensive response. But we’ve expanded the word ‘terrorism’ to include many other things, so that doesn’t have to be true.

In the cases of this ‘AI-enabled terrorism’ the goal mostly is not to instill fear. We are instead talking about using asymmetric weapons, to inflict as much damage as possible. The scale of the damage relatively unresourced actors can do will scale up.

We have to worry in particular about bioterrorism and cyberattacks on critical infrastructure – this essay chooses to not mention nuclear and radiological risks.

As always this question comes down to offense-defense balance and the scale (and probability) of potential harm. If everyone gets access to similarly powerful AI, what happens? Does the ‘good guy with an AI’ beat the ‘bad guy with an AI’? Does this happen in practice, despite the future being unevenly distributed, and thus much of critical infrastructure not having up-to-date defenses, and suffering from ‘patch lag’?

This is a cost-benefit analysis, including the costs of limiting proliferation. There are big costs in taking action to limit proliferation, even if you are confident it will ultimately work.

The question is, are there even larger costs to not doing so? That’s a fact question. I don’t know the extent to which future AI systems might enable catastrophic misuse, or how much damage that might cause. You don’t either.

We need to do our best to answer that question in advance, and if necessary to limit proliferation. If we want to do that limiting gracefully, with minimal economic costs and loss of freedom, that means laying the necessary groundwork now. The alternative is doing so decidedly ungracefully, or failing to do so at all.

The section on Loss of Control is excellent given its brevity. They cover three subsections.

  1. Erosion of control is similar to the concerns about gradual disempowerment. If anyone not maximally employing AI becomes uncompetitive, humans would rapidly find themselves handing control over voluntarily.

  2. Unleashed AI Agents are an obvious danger. Even a single sufficiently capable rogue AI agent unleashed on the internet could cause no end of trouble, and there might be no reasonable way to undo this without massive economic costs we would not be willing to pay once it starts gathering resources and self-replicating. Even a single such superintelligent agent could mean irrevocable loss of control. As always, remember that people will absolutely be so stupid as to do this, and some will want to do it on purpose.

  3. Intelligence Recursion, traditionally called Recursive Self-Improvement (RSI), where smarter AI builds smarter AI builds smarter AI, perhaps extremely rapidly. This is exactly how one gets a strategic monopoly or dominant position, and is ‘the obvious thing to do,’ it’s tough not to do it.

They note explicitly that strategic competition, in the form of geopolitical competitive pressures, could easily make us highly tolerant of such risks, and therefore we could initiate such a path of RSI even if those involved thought the risk of loss of control was very high. I would note that this motivation also holds for corporations and others, not only nations, and again that some people would welcome a loss of control, and others will severely underestimate the risks, with varying levels of conscious intention.

What are our options?

They note three.

  1. There is the pure ‘hands-off’ or ‘YOLO’ strategy where we intentionally avoid any rules or restrictions whatsoever, on the theory that humans having the ability to collectively steer the future is bad, actually, and we should avoid it. This pure anarchism is a remarkably popular position among those who are loud on Twitter. As they note, from a national security standpoint, this is neither a credible nor a coherent strategy. I would add that from the standpoint of trying to ensure humanity survives, it is again neither credible nor coherent.

  2. Moratorium strategy. Perhaps we can pause development past some crucial threshold? That would be great if we could pull it off, but coordination is hard and the incentives make this even harder than usual, if states lack reliable verification mechanisms.

  3. Monopoly strategy. Try to get there first and exert a monopoly, perhaps via a ‘Manhattan Project’ style state program. They argue that it would be impossible to hide this program, and others would doubtless view it as a threat and respond with escalations and hostile countermeasures.

They offer this graph as an explanation for why they don’t like Monopoly strategy:

Certainly escalation and even war is one potential response to the monopoly strategy, but the assumption that it goes that way is based on China or others treating superintelligence as an existential strategic threat. They have to take the threat so seriously that they will risk war over it, for real.

Would they take it that seriously before it happens? I think this is very far from obvious. It takes a lot of conviction to risk everything over something like that. Historically, deterrence strikes are rare, even when they would have made strategic sense, and the situation was less speculative. Nor does a successful strike automatically lead to escalation.

That doesn’t mean that going down these paths is good or safe. Racing for superintelligence as quickly as possible, with no solution on how to control it, in a way that forces your rival to respond in kind when previously let’s face it they weren’t trying all that hard, does not seem like a wise thing to aim for or do. But I think the above chart is too pessimistic.

Instead they propose a Multipolar strategy, with the theory being that Deterrence with Mutual Assured AI Malfunction (MAIM), combined with strong nonproliferation and competitiveness, can hopefully sustain an equilibrium.

There are two importantly distinct claims here.

The first claim here is that a suboptimal form of MAIM is the default regime, that costs for training runs will balloon, thus they can only happen at large obvious facilities, and therefore there are a variety of escalations those involved can use to shut down AI programs, from sabotage up to outright missile attacks, and any one rival is sufficient to shut down an attempt.

The second claim is that it would be wise to pursue a more optimal form of MAIM as an intentional policy choice.

MAIM is trivially true, at least in the sense that MAD is still in effect, although the paper claims that sabotage means there are reliable options available well short of a widespread nuclear strike. Global thermonuclear war would presumably shut down everyone’s ASI projects, but it seems likely that launching missiles at a lot of data centers would lead to full scale war, perhaps even somewhat automatic nuclear war. Do we really think ‘kinetic escalation’ or sabotage can reliably work and also be limited to the AI realm? Are there real options short of that?

Yes, you could try to get someone to sabotage, or engage in a cyberattack. The paper authors think that between all the options available, many of which are hard to attribute or defend against, we should expect such an effort to work if it is well resourced, at least enough to delay progress on the order of months. I’m not sure I have even that confidence, and I worry that it won’t count for much. Human sabotage seems likely to become less effective over time, as AIs themselves take on more of the work and error checking. Cyberattacks similarly seem like they are going to get more difficult, especially once everyone involved is doing fully serious active defense and accepting real costs of doing so.

The suggestion here is to intentionally craft and scope out MAIM, to allow for limited escalations along a clear escalation ladder, such as putting data centers far away from population centers and making clear distinctions between acceptable projects and destabilizing ones, and implementing ‘AI-assisted inspections.’

Some actions of this type took place during the Cold War. Then there are other nations and groups with a history of doing the opposite, doing some combination of hiding their efforts, hardening the relevant targets and intentionally embedding military targets inside key civilian infrastructure and using ‘human shields.’

That’s the core idea. I’ll touch quickly on the other two parts of the plan, Nonproliferation and Competitiveness, then circle back to whether the core idea makes sense and what assumptions it is making. You can safely skip ahead to that.

They mention you can skip this, and indeed nothing here should surprise you.

In order for the regime of everyone holding back to make sense, there need to be a limited number of actors at the established capabilities frontier, and you need to keep that level of capability out of the hands of the true bad actors. AI chips would be treated, essentially, as if they were also WMD inputs.

Compute security is about ensuring that AI chips are allocated to legitimate actors for legitimate purposes. This echoes the export controls employed to limit the spread of fissile materials, chemical weapons, and biological agents.

Information security involves securing sensitive AI research and model weights that form the core intellectual assets of AI. Protecting these elements prevents unwarranted dissemination and malicious use, paralleling the measures taken to secure sensitive information in the context of WMDs.

They discuss various mechanisms for tracking chips, including geolocation and geofencing, remote attestation, networking restrictions and physical tamper resistance. Keeping a lockdown on frontier-level model weights also follows, and they offer various suggestions on information security.

Under AI Security (5.3) they claim that model-level safeguards can be made ‘significantly resistant to manipulation.’ In practice I am not yet convinced.

They offer a discussion in 5.3.2 of loss of control, including controlling an intelligence recursion (RSI). I am not impressed by what is on offer here in terms of it actually being sufficient, but if we had good answers that would be a case for moving forward, not for pursuing a solution like MAIM.

The question on competitiveness is not if but rather how. The section feels somewhat tacked on, they themselves mention you can skip this.

The suggestions under military and economy should be entirely uncontroversial.

The exception is ‘facilitate immigration for AI scientists,’ which seems like the most obvious thing in the world to do, but alas. What a massive unforced error.

The correct legal framework for AI and AI agents has been the subject of extended debate, which doubtless will continue. The proposed framework here is to impose upon AIs a duty of reasonable care to the public, another duty of care to the principal, and a duty not to lie. They propose to leave the rest to the market to decide.

The section is brief so they can’t cover everything, but as a taste to remind one that the rabbit holes run deep even when considering mundane situations: Missing here is which human or corporation bears liability for harms. If something goes wrong, who is to blame? The user? The developer or deployer? They also don’t discuss how to deal with other obligations under the law, and they mention the issue of mens rea but not how they propose to handle it.

They also don’t discuss what happens if an AI agent is unleashed and is outside of human control, whether or not doing so was intentional, other than encouraging other AIs to not transact with such an AI. And they don’t discuss to what extent an AI agent would be permitted to act as a legal representative of a human. Can they sign contracts? Make payments? When is the human bound, or unbound?

They explicitly defer discussion of potential AI rights, which is its own rabbit hole.

The final discussion here is on political stability, essentially by using AI to empower decision makers and filter information, and potentially doing redistribution in the wake of automation. This feels like gesturing at questions beyond the scope of the paper.

What would make deliberately pursuing MAIM as a strategy both necessary and sufficient?

What would make it, as they assert, the default situation?

Both are possible, but there are a good number of assumptions.

The most basic requirement is, essentially, common knowledge.

Everyone must ‘feel the superintelligence,’ and everyone must be confident that:

  1. At least one other major player feels the superintelligence.

  2. That another state will attempt to stop you via escalation, if you go for it.

  3. That such escalation would either succeed or escalate to total war.

If you don’t believe all of that, you don’t have MAIM, the same way you would not have had MAD.

Indeed, we have had many cases of nuclear proliferation, exactly because states including North Korea have correctly gambled that no one would escalate sufficiently to stop them. Our planetary track record of following through in even the most obvious of situations is highly spotty. Our track record of preemptive wars in other contexts is even worse, with numerous false negatives and also false positives.

Superintelligence is a lot murkier and uncertain in its definition, threshold and implications than a nuclear bomb. How confident are you that your rivals will be willing to pull the trigger? How confident do they need to be that this is it? Wouldn’t there be great temptation to be an ostrich, and pretend it wasn’t happening, or wasn’t that big a deal?

That goes together with the question of whether others can reliably identify an attempt to create superintelligence, and then whether they can successfully sabotage that effort with a limited escalation. Right now, no one is trying all that hard to hide or shield what they are up to, but that could change. Right now, the process requires very obvious concentrated data centers, but that also could change, especially if one was willing to sacrifice efficiency. And so on. If we want to preserve things as they are, we will have to do that deliberately.

The paper asserts states ‘would not stand idly by’ while another was on the ‘cusp of superintelligence.’ I don’t think we can assume that. They might not realize what is happening. They might not realize the implications. They might realize probabilistically but not be willing to move that far up the escalatory ladder or credibly threaten to do so. A central failure mode is that the threat is real but not believed.

It seems, at minimum, rather strange to assume MAIM is the default. Surely, various sabotage efforts could complicate things, but presumably things get backed up and it is not at all obvious that there is a limited-scope way to stop a large training run indefinitely. It’s not clear what a few months of sabotage buys you even if it works.

The proposal here is to actively engineer a stable MAIM situation, which if enacted improves your odds, but the rewards to secrecy and violating the deals are immense. Even they admit that MAIM is a ‘wicked problem’ that would be in an unstable, constantly evolving state in the best of times.

I’m not saying it cannot be done, or even that you shouldn’t try. It certainly seems important to have the ability to implement such a plan in your back pocket, to the greatest extent possible, if you don’t intentionally want to throw your steering wheel out the window. I’m saying that even with the buy-in of those involved, it is a heavy lift. And with those currently in power in America, the lift is now that much tougher.

All of this can easily seem several levels of rather absurd. One could indeed point to many reasons why this strategy could wind up being profoundly flawed, or that the situation might be structured so that this does not apply, or that there could end up being a better way.

The point is to start thinking about these questions now, in case this type of scenario does play out, and to consider under what conditions one would want to seek out such a solution and steer events in that direction. To develop options for doing so, in case we want to do that. And to use this as motivation to actually consider all the other ways things might play out, and take them all seriously, and ask how we can differentiate which world we are living in, including how we might move between those worlds.


outbreak-turns-30

Outbreak turns 30


Ars chats with epidemiologist Tara Smith about the film’s scientific accuracy and impact over 3 decades.

Dustin Hoffman and Rene Russo starred in this medical disaster thriller. Credit: Warner Bros.

Back in 2020, when the COVID pandemic was still new, everyone was “sheltering in place” and bingeing films and television. Pandemic-related fare proved especially popular, including the 1995 medical disaster-thriller Outbreak, starring Dustin Hoffman. Chalk it up to morbid curiosity, which some researchers have suggested is an evolved response mechanism for dealing with threats by learning from imagined experiences. Outbreak turned 30 this week, making this the perfect time to revisit the film.

(Spoilers for Outbreak abound below.) 

Outbreak deals with the re-emergence of a deadly virus called Motaba, 28 years after it first appeared in an African jungle, infecting US soldiers and many others. The US military secretly destroyed the camp to conceal evidence of the virus, a project overseen by Major General Donald McClintock (Donald Sutherland) and Brigadier General William Ford (Morgan Freeman). When it re-emerges in Zaire decades later, a military doctor, Colonel Sam Daniels (Hoffman), takes a team to the afflicted village to investigate, only to find the entire town has died.

Daniels takes blood samples and realizes the villagers had been infected by a deadly new virus. But Ford shrugs off  Daniels’ concerns about a potential global spread, not wanting the truth to come out about the bombing of the village nearly 30 years ago. Daniels alerts his estranged ex-wife, Dr. Roberta “Robby” Keough, who works for the Centers for Disease Control and Prevention, about the virus, and she, too, is initially concerned.

Meanwhile, a local monkey is captured and brought to the US as an exotic pet. A smuggler named Jimbo (Patrick Dempsey)—who works at an animal testing facility—tries to sell the monkey to a pet shop owner named Rudy (Daniel Chodos) in the fictional town of Cedar Creek, California. The monkey bites Rudy. Unable to sell the monkey, Jimbo lets it loose in the woods and flies home to Boston. Both Jimbo and his girlfriend (who greets him at Logan Airport and passionately kisses a feverish Jimbo right before he collapses) die from the virus.

Naturally Keough hears about the Boston cases and realizes Daniels was right—the new virus has found its way to American soil. Initially she thinks there aren’t any other cases, but then Rudy’s demise comes to light, along with the death of a hospital technician who became infected after accidentally breaking a vial of Rudy’s blood during testing. When the virus strikes down a cinema filled with moviegoers, Daniels and Keough realize the virus has mutated and become airborne.

This time Ford and a reluctant McClintock can’t afford not to act as the bodies keep piling up.  The military declares martial law in the town as Daniels and his fellow scientists race to develop a cure, even as the nefarious McClintock schemes to bomb Cedar Creek to smithereens to contain the virus. The deaths of the residents strike him as a necessary cost to preserve his hopes of developing Motaba as a biological weapon; he dismisses them as “casualties of war.”

Outbreak ended up grossing nearly $190 million worldwide when it was released in March 1995, but critical reviews were mixed. Some loved the medical thriller aspects and quick pacing, while others dismissed it as shallow and improbable. Some of the biggest criticisms of the film came from scientists.

A mixed bag

“Honestly, the science, if you look at it broadly, is not awful,” Tara Smith, an epidemiologist at Kent State University in Ohio, told Ars. “They showed BSL-4 facilities and had a little description of the different levels that you work in. The protagonists respond to an outbreak, they take samples, they bring them back to the lab. They infect some cells, infect some animals, they do some microscopy, although it’s not clear that they’re actually doing electron microscopy, which would be needed to see the virus. But overall, the steps are right.”

Granted, there are plenty of things to nitpick. “There’s a lot of playfulness,” said Smith. “Kevin Spacey [who plays military doctor Lt. Col. Casey Schuler] takes out a fake virus tube and tosses it to Cuba Gooding Jr. [who plays another military doctor, Major Salt]. You don’t play in the BSL-4 laboratories. You just don’t. And a lab tech [who becomes infected] is spinning a centrifuge and doing other things at the same time. Then he opens up the centrifuge and just puts his hand in there and everything breaks. That’s how he gets exposed to the virus. I’ve used a centrifuge hundreds of times. You wait until everything is stopped to open it up. As a trained scientist, those are the things you are told over and over not to do. [The filmmakers] exploit those to drive the plot.”

One of the biggest scientific criticisms is the time compression: the virus multiplies in the body within an hour instead of days; Salt eventually synthesizes a cure in under a minute when this would normally take months; and Keough (who has been infected) recovers almost immediately after being injected with said cure. Smith also noted that scientists identify the two Motaba strains using electron micrographs rather than sequencing them, as would normally be required.

And that whole bit about the Motaba virus liquefying organs just isn’t a thing, according to Smith. “If you read The Hot Zone [Richard Preston’s bestselling 1994 nonfiction thriller], or watch Outbreak and take a shot every time you hear ‘liquefying,’ you would be dead by the end,” she said. “I don’t know how that trope got so established in the media, but you see it every time the Ebola comes up: people are bleeding from their eyes, they’re liquefying. That doesn’t happen. They’re horribly sick. It is an awful virus, but people don’t just melt.”

That said, “I think the biggest [scientific] issue with Outbreak was the whole airborne thing,” said Smith. “Realistically, viruses just don’t change transmission like that.”

Influencing public perceptions

According to Smith, Outbreak may have impacted public perceptions of the 2014–2016 Ebola outbreak—the largest yet seen—fueling widespread fear. “There were very serious people in The New York Times talking about Ebola potentially becoming airborne,” she said. “There was one study where scientists had aerosolized the virus on purpose and given it to pigs and the pigs got infected, which was treated as proof that Ebola could be airborne.”

“That idea that Ebola is super contagious and you can spread it by air—that really originates with Outbreak in 1995, because if you look at the science, it’s just not there,” Smith continued. “Ebola is not that easy to get unless you have close, personal, bodily-fluid-exchanging contact. But people certainly thought it was airborne in 2014–2015, and thought that Ebola was going to cause this huge outbreak in the United States. Of course, we just had a few select cases.”

Smith is currently working on a project that reviews various outbreak stories in popular media and their influence on public perception, particularly when it comes to the origins of those outbreaks. “Where does the virus, fungus, or bacteria come from?” said Smith. “So many films and TV series have used a lab leak origin, where something was made in the laboratory, it escapes, and causes a global pandemic. That’s an important narrative when we talk about the COVID pandemic, because so many people jumped on the lab leak bandwagon as an origin for that. In Outbreak it’s a natural virus, not a lab leak. I don’t think you’d see that if it were re-made today.”

Sam and Salt find the information they’re looking for. Credit: Warner Bros.

Outbreak is often unfavorably compared to another pandemic movie, 2011’s Contagion, of which Smith is naturally a fan. “Contagion is the gold standard [of pandemic movies],” said Smith. “Contagion was done in very close collaboration with a lot of scientists. One of the scientists in the movie is even named for [Columbia University epidemiologist] Ian Lipkin. Scientific accuracy was more important from the start. And there’s a bigger timeframe. These things happen in months rather than days. Even in Contagion, the vaccine was developed quicker than in the COVID pandemic, but at least it was a little bit more realistically done, scarily so when you think about the Jude Law character who was the blogger peddling fake cures—very similar to Ivermectin during the COVID pandemic.”

One might quibble with the science, but as entertainment, after 30 years, the film holds up remarkably well, despite the obvious tropes of action films of the 1990s. (Sam and Salt defying orders and hijacking a military helicopter, then using it to face off mid-air against a military aircraft deployed to bomb the town out of existence, is just one credibility-straining example.) The talented cast alone makes it worth a rewatch. And for Smith, it was nice to see a strong female epidemiologist as a leading character in Russo’s Robby Keough. On the whole, “I honestly think Outbreak was fairly good,” she said.

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.


civilization-vii,-one-month-later:-the-community-and-developers-chime-in

Civilization VII, one month later: The community and developers chime in


Executive Producer Dennis Shirk talks with Ars about the state of the game.

Civilization VII has a lot of visual polish and great gameplay systems. A flurry of patches has been improving other aspects, too. Credit: 2K Games

A month ago, Civilization VII launched to generally positive critical reviews, but user reviews on Steam and Metacritic weren’t nearly so positive, at least at first.

Take a look at the Civilization subreddit, and you’ll see a general consensus: The bones of this game are great, and even most of the radical changes to the classic formula (like breaking the game into much more distinct ages) are a welcome refresh.

On the other hand, there’s also a sentiment that players are disappointed that some expected features are missing, some gameplay elements need additional polish, and most of all, the user interface was a bit of a mess at launch.

A month later, developer Firaxis has already released a few patches and has more planned. As the game’s state continues to evolve, this seems like a good time to check in on it.

I spent some time in the Civ community and spoke with Dennis Shirk, the game’s executive producer, to learn how the launch went, how the game has changed since launch, and what its next steps are.

Breaking with tradition

Civilization VII broke with tradition in a few ways—splitting the game into distinct ages that each play like a separate game, allowing anachronistic leader/civilization combinations, and removing worker units, to name a few.

You might have expected those to be the source of any controversy around the game’s launch, but that hasn’t really been the case. In my review, I wrote that those shifts take the franchise in a new direction, bring over the best ideas from competing titles, and address long-standing problems with the Civilization experience.

If you want a more traditional experience, you can go back to Civilization V, Civilization IV, Civilization II, or whichever your favorite was. Those games are infinitely re-playable, so there’s no need to retread with a sequel.

“Our rule that we live by at Firaxis is the rule of thirds. We want to keep one-third of the game the same as previous iterations, one-third tweaked and improved upon, and one-third new,” Shirk told me. “Did we lean farther into the last third than we have in the past? We may have, but it was a risk we were willing to take to deliver a completely new part of the experience.”

A suboptimal starting position

The Civilization subreddit is full of positive responses to those changes, and the large contingent of Civ geeks on the Ars editorial staff are mostly in agreement that they’re good changes, too. (The game has been a frequent discussion topic in the Ars Slack for several weeks.)

The last month has seen players giving critical feedback, and Firaxis has been releasing patches to address complaints. For example, patch 1.1.0 on March 4 fixed some visual problems with the technology tree and made big changes to some victory conditions in the Modern Age, among other things.

Players have noted positive changes that weren’t mentioned in patch notes, too. Reddit user AndyNemmity posted that the “AI is significantly better in Military” after a recent patch a week ago, writing:

I know most of you don’t see the Military AI in the fog of war, but I work on the AI mod, and run a ton of autoplays. I am 10+ autoplays with the new patch, and the base game military AI is VASTLY improved.

Before, the AI would get stuck on the map in tons of different scenarios, often dying because they have an entire army stuck on the map, and can’t use it. This is fixed. Now the autoplays look like actual militaries, warring, attacking, killing independents quickly and efficiently.

The goodwill about the bones of the game and the positive responses to some patch additions are still accompanied by some consternation about the UI.

“Part of launching a game, especially when big changes are made, is figuring out what is resonating with players, and what may be an opportunity for improvement,” Shirk said when asked about the launch challenges. “In this instance, the UI did not meet players’ expectations, and we are committed to addressing that—although it will take time.”

There’s still a fair bit to be done, and modders have been filling the gaps. Modder Sukritact released a UI overhaul that addressed several complaints—including showing the gains and losses players will see if they replace a tile improvement or building with another one in the city view.

Players praised these tweaks, going so far as to call that example in particular a “game changer.” A few days later, it was announced on the Civilization Discord that Firaxis had hired Sukritact as a technical artist.

This mod by Sukritact adds much-needed information to the city view. The modder has since been hired by Firaxis. Credit: RileyTaugor

The community has speculated that the game was rushed out the door before it was ready, primarily citing the UI issues.

“In hindsight, our UI team needed more time and space to take the UI where it needed to go, to really expose the level of information our players expect,” Shirk admitted. “Our team has been working hard to address these issues through rapid patching, and players will continue to see support for the foreseeable future.”

That said, debate about the UI is happening in the context of a wider discussion about the scope of Civilization VII’s launch.

A tale of 10 platforms

Every mainline Civilization game in the past launched on just desktop platforms like Windows or Mac, but Civilization VII greatly expanded that. Depending on what counts (we’ll say here that the Steam Deck counts as distinct from Linux, and the Xbox Series S is distinct from Xbox Series X), there were 10 launch platforms for Civilization VII:

  • Windows
  • Linux
  • macOS
  • Steam Deck
  • Nintendo Switch
  • PlayStation 4
  • PlayStation 5
  • Xbox One
  • Xbox Series S
  • Xbox Series X

That’s a lot to target at launch, and players in the subreddit have speculated that Firaxis was spread a bit thin here, making this part of the explanation for a relatively buggy UI on day one.

Some also speculated that the classic desktop PC platform got a worse experience in order to accommodate console and Steam Deck players. For example, players lamented the lack of a drag and drop feature for views like the policy selection screen.

The developers have made it crystal clear that PC is the top priority, though. “Our core audience is absolutely PC, so we always start there, and work our way outward, adapting UI systems along the way, iterating on different UX approaches,” Shirk said.

He added that the controller support was developed with a partner, suggesting that supporting consoles out of the gate might not have taxed the team working on the desktop interface as much as some feared.

At least in one respect, Firaxis has already publicly walked the walk: at one point it made the controversial decision to temporarily pause cross-save between PC and console so it could push updates to PC faster. Patching games on consoles requires a relatively slow and laborious certification process, but that’s not the case for updating a game on Steam.

Cross-loading cloud saves across PC and console was turned off for a while so Firaxis could iterate faster on PC. Credit: Samuel Axon

Meanwhile, some console and handheld players have complained about their version of the interface.

The most commonly named UI problem on consoles and handhelds is that there’s no efficient way to move both the camera and the hex selector across the map. Currently, moving the camera is easy—you just use the left stick to pan around. But doing this doesn’t move the hex selector with it, so you have to drag that selector hex by hex all the way across the map.

Some similar games have a button you can press to bring the selector to where the camera is. In Civilization VII, the R3 button brings the camera to where the selector is, not vice versa—which isn’t useful.

Shirk talked a bit about the process of developing the controller-based interface and the challenges the team faced:

We’ve been lucky enough to have some great partners help us develop the controller support, which added some strong console specific features like the radial menu. However, when you’re working with different interfaces across different platforms, there are many assumptions that cannot be made like they can on PC. For example, a player using a mouse is not walled off from anything, but switch that to a controller, and a completely different thought process has to come into play.

As for solutions, he added:

We’re working to give all versions the attention they deserve. When it comes to UI updates, we’re having team members continue to look at the community feedback in-depth and see how we can improve the experience for players regardless of system.

When I asked about drag-and-drop on desktop, and R3’s selection functionality on console and handheld, he said “the examples you shared are among features we are tracking and exploring how to address,” and that the March 4 1.1.0 patch that brought some UI changes was just a start. He added that a 1.1.1 patch coming March 25 will be when “fans will really start to see the results of their feedback.”

“And to answer your original question, ‘R3’ is coming along for the ride,” he said.

Following the legacy path to balanced gameplay

It seems like the UI is on the right track, but some tweaks need to happen on the gameplay front too, as players and critics tell it.

There are complaints about the AI—something as old as the franchise itself, to be fair. Some improvements have already been made, but players continue to report that AI civs keep founding cities close to players’ capitals for no apparent reason, causing frustration.

“Ashoka traveled across the entire continent just to settle four tiles away from my capital,” said DayTemporary3369, the Reddit user who posted this screenshot. They weren’t alone in this complaint. Credit: DayTemporary3369

Religion gameplay needs attention, as there’s no way to stop other leaders’ missionaries, leading to unsatisfying back-and-forth conversion gameplay. Similarly, players feel there aren’t enough defenses against espionage.

“If they’re all allowed to target me at the same time, I should be allowed to defend myself from all of them, provided I have enough influence,” said Reddit user Pay_No_Heed on the topic of counter-espionage. The complaint is reasonable, though a working design solution may not be as obvious as it seems.

Players have also complained that ages end too abruptly, and that holds true for the end of the game, which comes when the Modern Age concludes. It’s a quibble I also shared in my review, and many players are maxing out the game’s age-length setting to compensate. Past Civilization games offered a “one more turn” option to keep playing after someone had won; Firaxis has said this is coming to the end of the Modern Age in a future update.

There’s also the Civilopedia, the in-game database of concepts and help documentation. Players have noted it’s more barebones than expected, with several key concepts lacking entries or explanation. Firaxis acknowledged this complaint and said it’s being worked on.

“Yes, with each update we’re improving what’s exposed in the Civilopedia, including more gameplay data, easier navigation, et cetera. Expect much more to come in future updates,” Shirk explained.

In general, the game needs to expose more information to players. The gap is big enough that Reddit user JordiTK posted the heavily upvoted “Ultimate List of Things That Civilization VII Doesn’t Tell You.” It’s almost 5,000 words long, with more than 100 items.

Almost every prior Civilization game has had players complaining that it didn’t explain itself well enough, but the sentiment seems stronger this time. For what it’s worth, Shirk says the team recognizes this.

“Internally, our primary design goal for Civilization VII was to focus and iterate on the new mechanics, to really make sure this design would sing,” he said. “This focus on the new probably led us to work with a few false assumptions about what base level information players would need with our legacy systems, and it wasn’t something that came up as loudly as it should have in user testing.”

It’s not “We Love the Developer Day” just yet

While everyone in the community and within Firaxis agrees there’s still work to be done, the tone has improved since launch, thanks both to these patches and to the developer’s community manager engaging frequently with players on Steam, Discord, and Reddit.

The launch situation was made a little worse than it needed to be because of, strangely enough, confusion around nomenclature. Players who paid for the pricier special editions of the game were given “Advanced Access” a few days before the main launch date.

After it was apparent there were problems, some of the communications to players on storefronts and on Reddit called it “early access,” causing a bit of a stir. Until then, players hadn’t perceived the special editions’ Advanced Access to be the same as early access, a term the industry typically uses to signal that a game is incomplete and in a pre-release state.

When asked about this, a spokesperson for 2K Games (the game’s publisher) gave a statement to Ars that read:

Our goal is always to deliver the best product possible, including during Advanced Access periods. With a game the size and scope of Civilization VII there will always be fixes and optimizations once the entirety of the player base is able to jump in. The intent behind the Advanced Access granted to purchasers of the Deluxe Edition and Founders Edition was not to offer a work in progress product, and we take the feedback delivered during that period seriously.

We’re working hard to make sure that players have the best experience in the world of 4X strategy for years to come, and player feedback remains critical in helping us grow and build the future of Civ.

That suggests the use of “early access” was just a misstatement and not an attempt to cover for a rough pre-launch access period, but it wasn’t a great start to the conversation.

Since then, though, some of the most critical problems have been addressed, and the studio shared a roadmap that promised “UI updates and polish” in patches on March 4 (1.1.0, already released), March 25 (1.1.1), and sometime in April (1.2.0). The roadmap lists “additional UI updates & polish” for beyond April, too, confirming this will be a lengthy process.

A roadmap promising updates in March, April, and beyond

Here’s the updated roadmap from Firaxis. Credit: 2K Games

This frequent communication, combined with the fact that players recognize there’s a good game here that needs some more polish, has meant that most of the discussions in the community during this first month have been pretty optimistic, despite the launch woes.

There was a time, years ago, when games were marketed in the run-up to launch and then communication with players simply stopped. In today’s market (especially for complex games like Civilization), there’s often a need to iterate in public. Players understand that and will roll with it as long as it’s communicated clearly to them. Firaxis stumbled there in the opening days, but it’s now clear the studio understands this well, and the updates are rolling out.

We’ve seen a lot of rough launches for big games in recent years, and they often turn quite toxic. That said, the core Civilization community seems more patient and optimistic than you typically see in situations like this. That’s a credit to Firaxis’ years of goodwill, but it’s also a credit to the moderators and other leaders in the game’s community.

When I reviewed Civilization VII, I wrote that the core systems were strong, and that the game likely has a bright future ahead of it—but I also said it might make sense to wait a few weeks to dive in because of UI and balance issues.

It’s a few weeks later, and it looks like the game is on the right track, but there’s still a way to go if you’re looking for an impeccably polished product. That hasn’t stopped me from enjoying the dozens of hours I’ve played so far, though.

Photo of Samuel Axon

Samuel Axon is a senior editor at Ars Technica. He covers Apple, software development, gaming, AI, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.


large-study-shows-drinking-alcohol-is-good-for-your-cholesterol-levels

Large study shows drinking alcohol is good for your cholesterol levels

The good and the bad

For reference, the optimal LDL level for adults is less than 100 mg/dL, and optimal HDL is 60 mg/dL or higher. Higher LDL levels can increase the risk of heart disease, stroke, peripheral artery disease, and other health problems, while higher HDL has a protective effect against cardiovascular disease. Though some of the changes reported in the study were small, the researchers note that they could be meaningful in some cases. For instance, an increase of 5 mg/dL in LDL is enough to raise the risk of a cardiovascular event by 2 percent to 3 percent.

The researchers ran three different models to adjust for a variety of factors, including basics like age, sex, and body mass index; medical conditions such as hypertension and diabetes; and lifestyle factors such as exercise, dietary habits, and smoking. All the models showed the same associations. They also broke out the data by the kinds of alcohol people reported drinking: wine, beer, sake, and other liquors and spirits. The results were the same across the categories.

The study isn’t the first to find good news for drinkers’ cholesterol levels, though it’s one of the larger studies, with a longer follow-up time. Drinking alcohol has also long appeared to have some benefits for cardiovascular health. A recent review and meta-analysis by the National Academies of Sciences, Engineering, and Medicine found that moderate drinkers had lower relative risks of heart attacks and strokes, as well as a lower risk of all-cause mortality (death by any cause). That analysis did, however, find an increased risk of breast cancer, and another recent review found increased risks of colorectal, female breast, liver, oral cavity, pharynx, larynx, and esophagus cancers.

In all, the new cholesterol findings aren’t an invitation for nondrinkers to start drinking or for heavy drinkers to keep hitting the bottle hard, the researchers caution. There are a lot of other risks to consider. For drinkers who aren’t interested in quitting, the researchers recommend taking it easy. And those who do want to quit should keep a careful eye on their cholesterol levels.

In their words: “Public health recommendations should continue to emphasize moderation in alcohol consumption, but cholesterol levels should be carefully monitored after alcohol cessation to mitigate potential [cardiovascular disease] risks,” the researchers conclude.


texas-measles-outbreak-spills-into-third-state-as-cases-reach-258

Texas measles outbreak spills into third state as cases reach 258

Texas and New Mexico

Meanwhile, the Texas health department on Tuesday provided an outbreak update, raising the case count to 223, up 25 from the 198 Texas cases reported Friday. Of the Texas cases, 29 have been hospitalized and one has died—a 6-year-old girl from Gaines County, the outbreak’s epicenter. The girl was unvaccinated and had no known underlying health conditions.

The outbreak continues to be primarily in unvaccinated children. Of the 223 cases, 76 are in children ages 0 to 4, and 98 are in those ages 5 to 17. Eighty of the cases are unvaccinated, 138 have an unknown vaccination status, and five are known to have received at least one dose of the measles, mumps, and rubella (MMR) vaccine.

One dose of MMR is estimated to be 93 percent effective against measles, and two doses offer 97 percent protection. It’s not unexpected to see a small number of breakthrough cases in large, localized outbreaks.

Across the border from Gaines County in Texas sits Lea County, where New Mexico officials have now documented 32 cases, with an additional case reported in neighboring Eddy County, bringing the state’s current total to 33. Of those cases, one person has been hospitalized and one person (not hospitalized) died. The death was an adult who did not seek medical care and tested positive for measles only after death. The cause of their death is under investigation.

Of New Mexico’s 33 cases, 27 were unvaccinated, five had an unknown vaccination status, and one had received at least one MMR dose. Eighteen of the 33 cases are in adults, 13 are in people ages 0 to 17, and two have no confirmed age.

On Friday, the Centers for Disease Control and Prevention released a travel alert over the measles outbreak. “With spring and summer travel season approaching in the United States, CDC emphasizes the important role that clinicians and public health officials play in preventing the spread of measles,” the agency said in the alert. It advised clinicians to be vigilant in identifying potential measles cases.

The agency stressed the importance of vaccination, putting in bold: “Measles-mumps-rubella (MMR) vaccination remains the most important tool for preventing measles,” while saying that “all US residents should be up to date on their MMR vaccinations.”

US health secretary and longtime anti-vaccine advocate Robert F. Kennedy Jr., meanwhile, has been emphasizing cod liver oil, which does not prevent measles, and falsely blaming the outbreak on poor nutrition.


m4-max-and-m3-ultra-mac-studio-review:-a-weird-update,-but-it-mostly-works

M4 Max and M3 Ultra Mac Studio Review: A weird update, but it mostly works

Comparing the M4 Max and M3 Ultra to high-end PC desktop processors.

As for the Intel and AMD comparisons, both companies’ best high-end desktop CPUs, like the Ryzen 9 9950X and the Core Ultra 9 285K, are often competitive with the M4 Max’s multi-core performance but are dramatically less power-efficient at their default settings.

Mac Studio or M4 Pro Mac mini?

The Mac Studio (bottom) and redesigned M4 Mac mini. Credit: Andrew Cunningham

Ever since Apple beefed up the Mac mini with Pro-tier chips, there’s been a pricing overlap around and just over $2,000 where the mini and the Studio are both compelling.

A $2,000 Mac mini comes with a fully enabled M4 Pro processor (14 CPU cores, 20 GPU cores), 512GB of storage, and 48GB of RAM, with 64GB of RAM available for another $200 and 10 gigabit Ethernet available for another $100. RAM is the high-end Mac mini’s main advantage over the Studio—the $1,999 Studio comes with a slightly cut-down M4 Max (also 14 CPU cores, but 32 GPU cores), 512GB of storage, and just 36GB of RAM.

In general, if you’re spending $2,000 on a Mac desktop, I would lean toward the Studio rather than the mini. You’re getting roughly the same CPU but a much faster GPU and more ports. You get less RAM, but depending on what you’re doing, there’s a good chance that 36GB is more than enough.

The only place where the mini is clearly better than the Studio once you’re above $2,000 is memory. If you want 64GB of RAM in your Mac, you can get it in the Mac mini for $2,200. The cheapest Mac Studio with 64GB of RAM also requires a processor upgrade, bringing the total cost to $2,700. If you need memory more than you need raw performance, or if you just need something that’s as small as it can possibly be, that’s when the high-end mini can still make sense.

A lot of power—if you need it

Apple’s M4 Max Mac Studio. Credit: Andrew Cunningham

Obviously, Apple’s hermetically sealed desktop computers have some downsides compared to a gaming or workstation PC, most notably that you need to throw out and replace the whole thing any time you want to upgrade literally any component.


why-extracting-data-from-pdfs-is-still-a-nightmare-for-data-experts

Why extracting data from PDFs is still a nightmare for data experts


Optical Character Recognition

Countless digital documents hold valuable info, and the AI industry is attempting to set it free.

For years, businesses, governments, and researchers have struggled with a persistent problem: How to extract usable data from Portable Document Format (PDF) files. These digital documents serve as containers for everything from scientific research to government records, but their rigid formats often trap the data inside, making it difficult for machines to read and analyze.

“Part of the problem is that PDFs are a creature of a time when print layout was a big influence on publishing software, and PDFs are more of a ‘print’ product than a digital one,” Derek Willis, a lecturer in Data and Computational Journalism at the University of Maryland, wrote in an email to Ars Technica. “The main issue is that many PDFs are simply pictures of information, which means you need Optical Character Recognition software to turn those pictures into data, especially when the original is old or includes handwriting.”

Computational journalism is a field where traditional reporting techniques merge with data analysis, coding, and algorithmic thinking to uncover stories that might otherwise remain hidden in large datasets, which makes unlocking that data a particular interest for Willis.

The PDF challenge also represents a significant bottleneck in the world of data analysis and machine learning at large. According to several studies, approximately 80–90 percent of the world’s organizational data is stored as unstructured data in documents, much of it locked away in formats that resist easy extraction. The problem worsens with two-column layouts, tables, charts, and scanned documents with poor image quality.

The inability to reliably extract data from PDFs affects numerous sectors but hits hardest in areas that rely heavily on documentation and legacy records, including digitizing scientific research, preserving historical documents, streamlining customer service, and making technical literature more accessible to AI systems.

“It is a very real problem for almost anything published more than 20 years ago and in particular for government records,” Willis says. “That impacts not just the operation of public agencies like the courts, police, and social services but also journalists, who rely on those records for stories. It also forces some industries that depend on information, like insurance and banking, to invest time and resources in converting PDFs into data.”

A very brief history of OCR

Traditional optical character recognition (OCR) technology, which converts images of text into machine-readable text, has been around since the 1970s. Inventor Ray Kurzweil pioneered the commercial development of OCR systems, including the Kurzweil Reading Machine for the blind in 1976, which relied on pattern-matching algorithms to identify characters from pixel arrangements.

These traditional OCR systems typically work by identifying patterns of light and dark pixels in images, matching them to known character shapes, and outputting the recognized text. While effective for clear, straightforward documents, these pattern-matching systems, a form of AI themselves, often falter when faced with unusual fonts, multiple columns, tables, or poor-quality scans.
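
For a sense of how mechanical the traditional pipeline is, here is a minimal sketch using the open source Tesseract engine through its pytesseract Python bindings; the tool choice and file name are illustrative assumptions, since the article doesn’t endorse any specific OCR package.

```python
# A minimal traditional-OCR sketch: turn a picture of text into plain text.
# Assumes a local Tesseract install plus `pip install pytesseract pillow`.
from PIL import Image
import pytesseract

page = Image.open("scanned_page.png")      # the "picture of information"
text = pytesseract.image_to_string(page)   # pattern-match pixels to characters
print(text)
```

On a clean, single-column scan this works well; on the unusual fonts, tables, and noisy scans described above, it produces exactly the kind of predictable, correctable errors discussed below.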

Traditional OCR persists in many workflows precisely because its limitations are well-understood—it makes predictable errors that can be identified and corrected, offering a reliability that sometimes outweighs the theoretical advantages of newer AI-based solutions. But now that transformer-based large language models (LLMs) are getting the lion’s share of funding dollars, companies are increasingly turning to them for a new approach to reading documents.

The rise of AI language models in OCR

Unlike traditional OCR methods that follow a rigid sequence of identifying characters based on pixel patterns, multimodal LLMs that can read documents are trained on text and images that have been translated into chunks of data called tokens and fed into large neural networks. Vision-capable LLMs from companies like OpenAI, Google, and Meta analyze documents by recognizing relationships between visual elements and understanding contextual cues.

The “visual” image-based method is how ChatGPT reads a PDF file, for example, if you upload it through the AI assistant interface. It’s a fundamentally different approach from standard OCR, one that allows these models to potentially process documents more holistically, considering both visual layouts and text content simultaneously.
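
As a rough illustration of that visual approach, here is a hedged sketch that sends a page image to a vision-capable model through OpenAI’s Python SDK; the model name, prompt, and file name are assumptions for the example rather than anything the vendors or Willis prescribe.

```python
# A sketch of LLM-based document reading: the page goes in as an image,
# and the model is prompted to return structured text.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("scanned_page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model; named here only as an example
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the table on this page as CSV. Preserve headers."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

The prompt is doing real work here, which is both the flexibility Willis describes below and the source of the new failure modes covered later in the piece.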

And as it turns out, some LLMs from certain vendors are better at this task than others.

“The LLMs that do well on these tasks tend to behave in ways that are more consistent with how I would do it manually,” Willis said. He noted that some traditional OCR methods are quite good, particularly Amazon’s Textract, but that “they also are bound by the rules of their software and limitations on how much text they can refer to when attempting to recognize an unusual pattern.” Willis added, “With LLMs, I think you trade that for an expanded context that seems to help them make better predictions about whether a digit is a three or an eight, for example.”

This context-based approach enables these models to better handle complex layouts, interpret tables, and distinguish between document elements like headers, captions, and body text—all tasks that traditional OCR solutions struggle with.

“[LLMs] aren’t perfect and sometimes require significant intervention to do the job well, but the fact that you can adjust them at all [with custom prompts] is a big advantage,” Willis said.

New attempts at LLM-based OCR

As the demand for better document-processing solutions grows, new AI players are entering the market with specialized offerings. One such recent entrant has caught the attention of document-processing specialists in particular.

Mistral, a French AI company known for its smaller LLMs, recently entered the LLM-powered optical reader space with Mistral OCR, a specialized API designed for document processing. According to Mistral’s materials, their system aims to extract text and images from documents with complex layouts by using its language model capabilities to process document elements.

Robot sitting on a bunch of books, reading a book.

However, these promotional claims don’t always match real-world performance, according to recent tests. “I’m typically a pretty big fan of the Mistral models, but the new OCR-specific one they released last week really performed poorly,” Willis noted.

“A colleague sent this PDF and asked if I could help him parse the table it contained,” says Willis. “It’s an old document with a table that has some complex layout elements. The new [Mistral] OCR-specific model really performed poorly, repeating the names of cities and botching a lot of the numbers.”

AI app developer Alexander Doria also recently pointed out on X a flaw with Mistral OCR’s ability to understand handwriting, writing, “Unfortunately Mistral-OCR has still the usual VLM curse: with challenging manuscripts, it hallucinates completely.”

According to Willis, Google currently leads the field in AI models that can read documents: “Right now, for me the clear leader is Google’s Gemini 2.0 Flash Pro Experimental. It handled the PDF that Mistral did not with a tiny number of mistakes, and I’ve run multiple messy PDFs through it with success, including those with handwritten content.”

Gemini’s performance stems largely from its ability to process expansive documents (in a type of short-term memory called a “context window”), which Willis specifically notes as a key advantage: “The size of its context window also helps, since I can upload large documents and work through them in parts.” This capability, combined with more robust handling of handwritten content, apparently gives Google’s model a practical edge over competitors in real-world document-processing tasks for now.

The drawbacks of LLM-based OCR

Despite their promise, LLMs introduce several new problems to document processing. Among them, they can introduce confabulations or hallucinations (plausible-sounding but incorrect information), accidentally follow instructions in the text (thinking they are part of a user prompt), or just generally misinterpret the data.

“The biggest [drawback] is that they are probabilistic prediction machines and will get it wrong in ways that aren’t just ‘that’s the wrong word’,” Willis explains. “LLMs will sometimes skip a line in larger documents where the layout repeats itself, I’ve found, where OCR isn’t likely to do that.”

AI researcher and data journalist Simon Willison identified several critical concerns about using LLMs for OCR in a conversation with Ars Technica. “I still think the biggest challenge is the risk of accidental instruction following,” he says, always wary of prompt injections (in this case accidental) that might feed nefarious or contradictory instructions to an LLM.

“That and the fact that table interpretation mistakes can be catastrophic,” Willison adds. “In the past I’ve had lots of cases where a vision LLM has matched up the wrong line of data with the wrong heading, which results in absolute junk that looks correct. Also that thing where sometimes if text is illegible a model might just invent the text.”

These issues become particularly troublesome when processing financial statements, legal documents, or medical records, where a mistake might put someone’s life in danger. The reliability problems mean these tools often require careful human oversight, limiting their value for fully automated data extraction.
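
That oversight can be made cheaper with simple automated checks run before anyone trusts the output. The sketch below is hypothetical (the function, expected column count, and sample data are invented for illustration) and only catches structural problems; it cannot detect a plausible-looking hallucinated value.

```python
# Flag structurally suspect rows in a CSV table extracted by a model.
# A sanity check to route output to human review, not a replacement for it.
import csv
import io

def validate_extracted_csv(csv_text: str, expected_columns: int) -> list[str]:
    problems = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != expected_columns:
            problems.append(
                f"row {i}: expected {expected_columns} columns, got {len(row)}")
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"row {i}: contains empty cells")
    return problems

# Example: the trailing empty field in the last row gets flagged for review.
sample = "city,population\nSpringfield,30720\nShelbyville,"
print(validate_extracted_csv(sample, expected_columns=2))
```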

The path forward

Even in our seemingly advanced age of AI, there is still no perfect OCR solution. The race to unlock data from PDFs continues, with companies like Google now offering context-aware generative AI products. Some of the motivation for unlocking PDFs among AI companies, as Willis observes, doubtless involves potential training data acquisition: “I think Mistral’s announcement is pretty clear evidence that documents—not just PDFs—are a big part of their strategy, exactly because it will likely provide additional training data.”

Whether it benefits AI companies with training data or historians analyzing a historical census, as these technologies improve, they may unlock repositories of knowledge currently trapped in digital formats designed primarily for human consumption. That could lead to a new golden age of data analysis—or a field day for hard-to-spot mistakes, depending on the technology used and how blindly we trust it.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


gmail-gains-gemini-powered-“add-to-calendar”-button

Gmail gains Gemini-powered “Add to calendar” button

Google has a new mission in the AI era: to add Gemini to as many of the company’s products as possible. We’ve already seen Gemini appear in search results, text messages, and more. In Google’s latest update to Workspace, Gemini will be able to add calendar appointments from Gmail with a single click. Well, assuming Gemini gets it right the first time, which is far from certain.

The new calendar button will appear at the top of emails, right next to the summarize button that arrived last year. The calendar option will show up in Gmail threads with actionable meeting chit-chat, allowing you to mash that button to create an appointment in one step. The Gemini sidebar will open to confirm the appointment was made, which is a good opportunity to double-check the robot. There will be a handy edit button in the Gemini window in the event it makes a mistake. However, the robot can’t invite people to these events yet.

The effect of using the button is the same as opening the Gemini panel and asking it to create an appointment. The new functionality is simply detecting events and offering the button as a shortcut of sorts. You should not expect to see this button appear on messages that already have calendar integration, like dining reservations and flights. Those already pop up in Google Calendar without AI.


after-less-than-a-day,-the-athena-lander-is-dead-on-the-moon

After less than a day, the Athena lander is dead on the Moon

NASA expected Athena to have a reasonable chance of success. Although it landed on its side, Intuitive Machines’ first lander, Odysseus, was generally counted as a win because it accomplished most of its tasks. Accordingly, NASA loaded a number of instruments onto Athena. Most notable among these was the PRIME-1 experiment, an ice drill meant to sample and analyze any ice that lies below the surface.

A dark day, but not the end

“After landing, mission controllers were able to accelerate several program and payload milestones, including NASA’s PRIME-1 suite, before the lander’s batteries depleted,” the company’s statement said. However, this likely means that the company was able to contact the instrument but not perform any meaningful scientific activities.

NASA has accepted that these commercial lunar missions are high-risk, high-reward. (Firefly’s successful landing last weekend offers an example of high rewards). It is paying the companies, on average, $100 million or less per flight. This is a fraction of what NASA would pay through a traditional procurement program. The hope is that, after surviving initial failures, companies like Intuitive Machines will learn from their mistakes and open a low-cost, reliable pathway to the lunar surface.

Even so, this failure has to be painful for NASA and Intuitive Machines. The space agency lost out on some valuable science, and Intuitive Machines has taken a step backward with this mission rather than moving forward as it had hoped to do.

Fortunately, this is unlikely to be the end for the company. NASA has committed to a third and fourth mission on Intuitive Machines’ lander, the next of which could come during the first quarter of 2026. NASA has also contracted with the company to build a small network of satellites around the Moon for communications and positioning services. So although the company’s fortunes look dark today, they are not permanently shadowed like the craters on the Moon that NASA hopes to soon explore.


blood-typers-is-a-terrifically-tense,-terror-filled-typing-tutor

Blood Typers is a terrifically tense, terror-filled typing tutor

When you think about it, the keyboard is the most complex video game controller in common use today, with over 100 distinct inputs arranged in a vast grid. Yet even the most complex keyboard-controlled games today tend to only use a relative handful of all those available keys for actual gameplay purposes.

The biggest exception to this rule is a typing game, which by definition asks players to send their fingers flying across every single letter on the keyboard (and then some) in quick succession. By default, though, typing games tend to take the form of extremely basic typing tutorials, where the gameplay amounts to little more than typing out words and sentences by rote as they appear on screen, maybe with a few cute accompanying animations.

Typing “gibbon” quickly has rarely felt this tense or important.

Typing “gibbon” quickly has rarely felt this tense or important. Credit: Outer Brain Studios

Blood Typers adds some much-needed complexity to that basic type-the-word-you-see concept, layering its typing tests on top of a full-fledged survival horror game reminiscent of the original PlayStation era. The result is an amazingly tense and compelling action adventure that also serves as a great way to hone your touch-typing skills.

See it, type it, do it

For some, Blood Typers may bring up first-glance memories of Typing of the Dead, Sega’s campy, typing-controlled take on the House of the Dead light gun game series. But Blood Typers goes well beyond Typing of the Dead‘s on-rails shooting, offering an experience that’s more like a typing-controlled version of Resident Evil.

Practically every action in Blood Typers requires typing a word that you see on-screen. That includes basic locomotion, which is accomplished by typing any of a number of short words scattered at key points in your surroundings in order to automatically walk to that point. It’s a bit awkward at first, but it quickly becomes second nature as you memorize the names of various checkpoints and adjust to using the shift keys to turn the camera as you move.

Each of those words on the ground is a waypoint that you can type to move toward.

Each of those words on the ground is a waypoint that you can type to move toward. Credit: Outer Brain Studios

When any number of undead enemies appear, a quick tap of the tab key switches you to combat mode, which asks you to type longer words that appear above those enemies to use your weapons. More difficult enemies require multiple words to take down, including some with armor that requires typing the same word repeatedly before you can move on.

While you start each scenario in Blood Typers with a handy melee weapon, you’ll end up juggling a wide variety of projectile firearms that feel uniquely tuned to the typing gameplay. The powerful shotgun, for instance, can take out larger enemies with just a single word, while the rapid-fire SMG lets you type only the first few letters of each word, allowing for a sort of rapid-fire feel. The flamethrower, on the other hand, can set whole groups of nearby enemies aflame, which makes each subsequent attack word that much shorter and faster.
