By Shannon Garcia


A geothermal network in Colorado could help a rural town diversify its economy


Town pitches companies to take advantage of “reliable, cost-effective heating and cooling.”

This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment.

Hayden, a small town in the mountains of northwest Colorado, is searching for ways to diversify its economy, much like other energy communities across the Mountain West.

For decades, a coal-fired power plant, now scheduled to shut down in the coming years, served as a reliable source of tax revenue, jobs, and electricity.

When town leaders in the community just west of Steamboat Springs decided to create a new business park, harnessing geothermal energy to heat and cool the buildings simply made sense.

The technology aligns with Colorado’s sustainability goals and provides access to grants and tax credits that make the project financially feasible for a town with around 2,000 residents, said Matthew Mendisco, town manager.

“We’re creating the infrastructure to attract employers, support local jobs, and give our community reliable, cost-effective heating and cooling for decades to come,” Mendisco said in a statement.

Bedrock Energy, a geothermal drilling startup company that employs advanced drilling techniques developed by the oil and gas industry, is currently drilling dozens of boreholes that will help heat and cool the town’s Northwest Colorado Business District.

The 1,000-foot-deep boreholes, or wells, will connect buildings in the industrial park to steady underground temperatures. Near the surface, the Earth is approximately 51°F year round. As the drills go deeper, the temperature slowly increases to approximately 64°F near the bottom of the boreholes. Pipes looping down into each well will draw on this thermal energy for heating in the winter and cooling in the summer, significantly reducing energy needs.
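The temperature figures above imply a gentle gradient from the surface to the bottom of the wells. As a rough sketch (assuming a linear gradient between the two reported values; real gradients vary with local geology, and the function and parameter names here are illustrative, not from the project):

```python
def ground_temp_f(depth_ft, surface_f=51.0, bottom_f=64.0, max_depth_ft=1000.0):
    """Estimate ground temperature at a given depth by linearly
    interpolating between the article's surface and bottom-of-well figures.
    This linearity is an assumption for illustration only."""
    frac = min(max(depth_ft / max_depth_ft, 0.0), 1.0)  # clamp to [0, 1]
    return surface_f + frac * (bottom_f - surface_f)

print(ground_temp_f(0))     # 51.0 °F at the surface
print(ground_temp_f(500))   # 57.5 °F at mid-depth
print(ground_temp_f(1000))  # 64.0 °F near the bottom
```

The key point for the heating-and-cooling system is not the exact gradient but its stability: these temperatures barely change with the seasons, which is what the borehole loops exploit.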

Ground source heat pumps located in each building will provide additional heating or cooling depending on the time of year.

The project, one of the first in the region, drew the interest of some of the state’s top political leaders, who attended an open house hosted by town officials and company executives on Wednesday.

“Our energy future is happening right now—right here in Hayden,” US Senator John Hickenlooper (D-Colo.) said in a prepared statement prior to the event.

“Projects like this will drive rural economic growth while harnessing naturally occurring energy to provide reliable, cost-effective heating and cooling to local businesses,” said US Senator Michael Bennet (D-Colo.) in a written statement.

In an interview with Inside Climate News, Mendisco said that extreme weather snaps, which are not uncommon in a town over 6,000 feet above sea level, will not force companies to pay higher prices for fossil fuels to meet energy demands, like they do elsewhere in the country. He added that the system’s rates will be “fairly sustainable, and they will be as competitive as any of our other providers, natural gas, etcetera.”

The geothermal system under construction for Hayden’s business district will be owned by the town and will initially consist of separate systems for each building that will be connected into a larger network over time. Building out the network as the business park grows will help reduce initial capital costs.

Statewide interest

Hayden received two state grants totaling $300,000 to help design and build its geothermal system.

“It wasn’t completely clear to us how much interest was really going to be out there,” Will Toor, executive director of the Colorado Energy Office, said of a grant program the state launched in 2022.

In the past few years, the program has seen significant interest, with approximately 80 communities across the state exploring similar projects, said Bryce Carter, the geothermal program manager for the state’s Energy Office.

Two projects under development are by Xcel Energy, the largest electricity and gas provider in the state. A law passed in Colorado in 2023 required large gas utilities to develop at least one geothermal heating and cooling network in the state. The networks, which connect individual buildings and boreholes into a shared thermal loop, offer high efficiency and an economy of scale, but also have high upfront construction costs.

There are now 26 utility-led geothermal heating and cooling projects under development or completed nationwide, Jessica Silber-Byrne of the Building Decarbonization Coalition, a nonprofit based in Delaware, said.

Utility companies are widely seen as a natural developer of such projects as they can shoulder multi-million dollar expenses and recoup those costs in ratepayer fees over time. The first, and so far only, geothermal network completed by a gas utility was built by Eversource Energy in Framingham, Massachusetts, last year.

Grid stress concerns heat up geothermal opportunities

Twelve states have legislation supporting or requiring the development of thermal heating and cooling networks. Regulators are interested in the technology because its high efficiency can reduce demand on electricity grids.

Geothermal heating and cooling is roughly twice as efficient as air source heat pumps, a common electric heating and cooling alternative that relies on outdoor air. During periods of extreme heat or extreme cold, air source heat pumps have to work harder, requiring approximately four times more electricity than ground source heat pumps.
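The efficiency gap described above can be made concrete with coefficients of performance (COP): the heat delivered per unit of electricity consumed. The COP values below are illustrative assumptions chosen to match the article's ratios, not measured figures from any specific system:

```python
def electricity_kw(heat_demand_kw, cop):
    """Electric input required to deliver a given heating output at a given
    coefficient of performance (COP = heat out / electricity in)."""
    return heat_demand_kw / cop

demand = 10.0  # kW of heat a building needs

# Illustrative COPs: a ground-source heat pump stays near 4 year-round,
# while an air-source unit might average ~2 and sink toward ~1 in a
# severe cold snap (assumed values for the sake of the comparison).
gshp      = electricity_kw(demand, cop=4.0)  # 2.5 kW
ashp_mild = electricity_kw(demand, cop=2.0)  # 5.0 kW -> twice the GSHP draw
ashp_cold = electricity_kw(demand, cop=1.0)  # 10.0 kW -> four times the GSHP draw
```

The cold-snap case is why grid planners care: air-source demand spikes exactly when the grid is most stressed, while ground-source demand stays nearly flat.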

As more power-hungry data centers come online, the ability of geothermal heating and cooling to reduce the energy needs of other users of the grid, particularly at periods of peak demand, could become increasingly important, geothermal proponents say.

“The most urgent conversation about energy right now is the stress on the grid,” Joselyn Lai, Bedrock Energy’s CEO, said. “Geothermal’s role in the energy ecosystem will actually increase because of the concerns about meeting load growth.”

The geothermal system will be one of the larger drilling projects to date for Bedrock, a company founded in Austin, Texas, in 2022. Bedrock, which is working on another similarly sized project in Crested Butte, Colorado, seeks to reduce the cost of relatively shallow-depth geothermal drilling through the use of robotics and data analytics that rely on artificial intelligence.

By using a single, continuous steel pipe for drilling, rather than dozens of shorter pipe segments that need to be attached as they go, Bedrock can drill faster and transmit data more easily from sensors near the drill head to the surface.

In addition to shallow, low-temperature geothermal heating and cooling networks, deep, hot-rock geothermal systems that generate steam for electricity production are also seeing increased interest. New, enhanced geothermal systems that draw on hydraulic fracturing techniques developed by the oil and gas industry and other advanced drilling methods are quickly expanding geothermal energy’s potential.

“We’re also very bullish on geothermal electricity,” said Toor, of the Colorado Energy Office, adding that the state has a goal of reducing carbon emissions from the electricity sector by 80 percent by 2030. He said geothermal power that produces clean, round-the-clock electricity will likely play a key role in meeting that target.

The University of Colorado, Boulder, is currently considering the use of geothermal energy for heating, cooling, and electricity production and has received grants for initial feasibility studies through the state’s energy office.

For town officials in Hayden, the technology’s appeal is simple.

“Geothermal works at night, it works in the day, it works whenever you want it to work,” Mendisco said. “It doesn’t matter if there’s a giant snowstorm [or] a giant rainstorm. Five hundred feet to 1,000 feet below the surface, the Earth doesn’t care. It just generates heat.”




America’s fragile drug supply chain is extremely vulnerable to climate change

Vulnerabilities

Using data from the Food and Drug Administration, the researchers identified 10,861 drug facilities that were active for at least one year between 2019 and 2024. These facilities represent the array of manufacturing stages of a drug, from analyzing raw drug materials, to manufacturing active ingredients, to packaging drug products. The researchers then looked at the county location of each of these facilities and whether any federally declared weather emergencies occurred in those counties during the period. Weather-related emergencies included those from fires, hurricanes, storms, tornadoes, and floods.

During the six-year span, 6,819 facilities (63 percent) faced at least one weather-related emergency. Per year, an average of 2,146 facilities (33 percent) experienced such an emergency.
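The headline percentage follows directly from the counts reported above (a quick arithmetic check, using only the study's own numbers):

```python
# Figures as reported from the FDA-based analysis above.
total_facilities = 10_861   # active for at least one year, 2019-2024
affected = 6_819            # faced at least one weather-related emergency

share = affected / total_facilities
print(f"{share:.0%} of facilities faced at least one emergency")  # 63%
```

Note that the 33 percent annual figure is an average over facilities active in a given year, so it is not computed against the full six-year total of 10,861.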

The researchers noted that there was no statistically significant difference in the likelihood that counties with or without a drug facility would experience a weather-related emergency. That is, it’s not the case that drug facilities have been built in areas uniquely vulnerable to climate-related disasters.

Still, with a third of US facilities at risk of weather disasters each year, the study clearly shows how fraught it is to have flimsy supply chains—like having a single plant produce 60 percent of the country’s supply of an essential drug product.

“These findings underscore the importance of recognizing climate-related vulnerabilities and the urgent need for supply chain transparency, for strategic allocation of production, and for disaster risk management strategies to prevent health care disruptions in the US,” the authors conclude.



Sony makes the “difficult decision” to raise PlayStation 5 prices in the US

Sony will join Microsoft and Nintendo in raising US prices across its entire game console lineup, the company announced today. Pricing for all current versions of the PlayStation 5 console will increase by $50 starting tomorrow.

The price of the PS5 Digital Edition will increase from $450 to $500; the standard PS5 will increase from $500 to $550; and the PS5 Pro will increase from $700 to $750. If you’ve been on the fence about buying any of these, retailers like Target and Best Buy are still using the old prices as of this writing—for other console price hikes, retailers have sometimes bumped the prices up before the date announced by the manufacturer.

“Similar to many global businesses, we continue to navigate a challenging economic environment,” wrote Sony Global Marketing VP Isabelle Tomatis. “As a result, we’ve made the difficult decision to increase the recommended retail price for PlayStation 5 consoles in the U.S. starting on August 21.”

Sony says it’s not increasing prices for games or accessories and that this round of price increases only affects consoles sold in the US.

Sony was the last of the big three console makers to raise prices this year. Microsoft raised the prices for the Xbox Series S and X consoles in March. And Nintendo has gone through two rounds of price increases—one for Switch and Switch 2 accessories in April and another for more accessories and Switch 1 consoles earlier this month.



Top pediatricians buck RFK Jr.’s anti-vaccine meddling on COVID shot guidance

“It’s clear that we’re in a different place in the pandemic than we were four or five years ago in terms of risks to healthy older kids,” Sean O’Leary, chair of the AAP Committee on Infectious Diseases (COID), said in a statement. However, “the risk of hospitalization for young children and those with high-risk conditions remains pretty high.”

According to CDC data, the rate of COVID-19 hospitalization in children under 2 is the highest among any pediatric group. Further, the rate of hospitalization among children 6 months to 23 months is comparable to that of adults ages 50 to 64. Critically, more than half of children ages 6 months to 23 months who are hospitalized for COVID-19 have no underlying medical condition that puts them at high risk for severe infection.

For children 2 to 18, the AAP recommends COVID-19 shots for children who have a medical condition that puts them at high risk, are residents of care facilities, have never been vaccinated, or have household contacts who are at high risk of severe COVID-19. All other children and teens should also have access to updated seasonal shots if they desire them, the AAP says.

“The AAP will continue to provide recommendations for immunizations that are rooted in science and are in the best interest of the health of infants, children, and adolescents,” Kressly said. “Pediatricians know how important routine childhood immunizations are in keeping children, families, and their communities healthy and thriving.”

Coverage questions

With school starting, COVID-19 cases ticking up around the country, and cold-weather respiratory virus season looming, the question now is how the conflicting recommendations will be interpreted by insurance companies. Insurers are required to cover vaccines recommended by the CDC. But there is no such obligation for recommendations from medical groups.

AAP has been holding meetings with insurers to press for continued coverage of evidence-based vaccine recommendations.

O’Leary told The Washington Post that insurers are “signaling that they are committed to covering our recommendations.” The Post also noted that AHIP, the major insurance lobby, released a statement in June saying its members are committed to “ongoing coverage of vaccines to ensure access and affordability for this respiratory virus season.”



A question for the ages: Is The Elder Scrolls II: Daggerfall a good game?


Revisiting the 1996 RPG exposes both genius and madness.


Daggerfall certainly has ’90s DOS RPG charm in spades. Credit: Bethesda

Ostensibly, C:ArsGames is to some extent about actually driving a few game purchases, but in reality it’s mostly an excuse for me and my colleagues to wax nostalgic about the games that were formative for us. Case in point: This entry in our ongoing series with GOG is about a game that’s completely free. I think Ars can withstand this tiny revenue shortfall for the sake of peak nostalgia!

There are a couple of reasons I chose The Elder Scrolls II: Daggerfall this time around: its co-creator, Julian LeFay, recently passed away, so it seemed timely. Also, it was one of the defining games of my youth—one I have continued to revisit now and then.

But it’s also interesting because of where its developer, Bethesda—a studio people both love and hate—is at today. Going back to Daggerfall, we find a game that shows off so much of what we’ve lost from the bygone era of ’90s PC gaming, but also one that makes it abundantly clear why the industry left those sensibilities behind.

I’ll spoil the conclusion though: I still love this game. It’s profoundly not for everybody, but it’s definitely for me.

The kids don’t get it

OK, so we’ve established that I love Daggerfall. Knowing Ars Technica’s readership, some of you probably do too. So who, exactly, doesn’t like it?

Just search YouTube and you’ll find a bunch of videos whose titles pan the game outright.

Ouch. That’s rough. Granted, one of those isn’t actually negative if you sit through the video, but it still acknowledges that it’s not easily accessible for everyone.

Look, I get it. Daggerfall hails from an era when “game design” primarily meant “experiment with programming techniques to come up with cool, unproven stuff no one’s seen before” rather than “meticulously craft a conveyor belt of nonstop fun via proven formulae.”

Those experiments are all exciting and interesting, and it’s refreshing to go back to an RPG from this era that was willing to try some wild ideas and deep systems, as opposed to most (not all!) RPGs today, which seem to have the same basic format with talent trees and so on.

I love that Daggerfall includes odd mechanics that you don’t often see in RPGs, like climbing. I like its vast world and accurate representation of most wilderness as meaningless liminal space. I think its opaque and sometimes maddening faction reputation systems are fascinating. Its character progression system is detailed and interesting.

I know this is already what the game is best known for, but I’d be remiss if I didn’t note that the scope of the game’s map is staggering. Credit: Samuel Axon

For me, the most frustrating aspect of Daggerfall is not its jazzy mechanics. It’s the mechanics that aren’t explained at all.

For example, in the playthrough I started to refresh my memory for this article, I spent a couple of hours doing quests in Wayrest, one of the most prominent cities in the game. Everything seemed to be fine as I rode my horse around town helping people out, training my skills, and buying new gear. But then a guard ran up to me and arrested me for assault. Who did I assault? I had no idea, but I pled guilty in order to get a softer sentence, even though I was pretty sure I wasn’t actually guilty.

I wrote that off as a fluke, but then it happened again: assault. And a third time, again assault. I couldn’t fathom why I kept getting arrested.

To DuckDuckGo I went for a quick Internet search to see if anyone else was having this problem. It was pretty common, and the cause was something I never would have imagined: I had been riding my horse around the town, galloping for speed to complete quests faster. It turns out that galloping too close to wandering NPCs in the street registers as assault, with penalties of up to a month in prison and hefty fines.

There was no feedback about this when it was happening. I didn’t even know I was doing it. I don’t specifically remember having this problem back in the ’90s, but it seems likely I did, and I must have just shrugged it off, because back then I would have had no way of figuring out what was going on.

I get why this sort of thing is a big barrier to new players, but I also think some of the YouTubers I watched applied a double standard. One complained that the game doesn’t explain itself, but then in the same video extolled the virtues of Minecraft—a game that explains itself even less.

Save early and save often. That was ingrained in me by ’90s gaming. Watching some of the YouTubers take this game on, it stressed me out how little they saved. Credit: Samuel Axon

It may be that we’re more patient with learning games when we’re kids. I played Daggerfall as a kid (well, a young teenager) so I’m relatively chill about its opaqueness and idiosyncrasies. That YouTuber played Minecraft as a kid, so that’s the one he’s willing to gloss over.

If you’re willing to spend a lot of time on wikis (just like with Minecraft), then Daggerfall has a lot to offer. I often feel the most engaging games in the long run are the ones with a steeper learning curve up front.

The unspoken spiritual successor

Of course, it’s not just the learning curve or opaque mechanics that are an issue for many players. A lot of people don’t like Daggerfall’s procedurally generated world and quests—especially players who are used to Skyrim’s more hand-crafted environments and quest lines.

Yes, Skyrim has “Radiant Quests,” which resemble Daggerfall’s. But with the exception of a relatively small number of main story missions, Daggerfall only has what Skyrim calls Radiant Quests.

A loose modern analogue to that is Elite Dangerous, which has no meaningful story content at all. Some people might be more comfortable calling that a simulation than a game.

But there’s another modern space title that has some strong resemblances to Daggerfall: Bethesda’s own Starfield. As with Daggerfall, Starfield has a small cohort of obsessive fans amidst a much larger crowd that thinks it’s just terrible.

When people bought Starfield, they were expecting Skyrim in space. I believe that one of the reasons a lot of people were disappointed was that they actually got Daggerfall in space, and that’s a very different experience.

Like Daggerfall and Elite Dangerous, Starfield not only accepts but even centers the notion that most of the environments are filled with, well, not a whole lot. It accurately reflects what space or wilderness actually are and makes much of the game a slow-paced mood piece rather than a constant dopamine dispenser.

Starfield has some structural and design similarities to Daggerfall. Credit: Bethesda

Most of Starfield’s dungeons are randomized. It’s more about taking in the vibes and playing with the systems than it is about following an authored narrative—though Starfield does have one. (It’s just not the game’s strongest suit, which explains why people looking for that aren’t big fans.)

Granted, there’s little crossover between the original Daggerfall team and the folks who made Starfield. Daggerfall was pre-Todd Howard-as-creative-director and pre-Emil Pagliarulo, the two main creative leaders at Bethesda Game Studios since the Morrowind days.

But that’s why it’s all the more surprising that Starfield is, at best, a hybrid of the sensibilities of Daggerfall and Skyrim. Given those YouTubers trying and failing to play Daggerfall in 2025, it’s no wonder that Starfield didn’t land for a lot of people.

(I quite like it, personally, but I also like Daggerfall, so I’m either a masochist, old and archaic, or just plain wrong, depending on who you ask.)

A pure expression of one of gaming’s oldest dreams

There has long been a recurring dream in PC gaming of one super game that would allow you to fully live out a particular fantasy life of your choosing. Whether it was intended by developers, promised in marketing, or just in hopeful players’ heads, there’s an appeal to the idea of living an alternate existence in a sophisticated simulated world that’s so immersive in its escapism that you reliably forget your real life for hours on end. The idea is “I want to be a space trader,” or “I want to be a wandering fantasy adventurer,” and the game gives you a toolkit that’s both wide and deep to experience that entirely on your own terms.

A lot of times, the titles that went for this on some level seemed more like simulations than games or stories. They were less consistently fun than other games, but they were often profoundly ambitious.

Since they were all about helping a player live out something in their imaginations, they were also prone to viscerally negative reactions at launch from people who had personal expectations that didn’t map to the reality of what a game can actually do or chooses to focus on. (This continues today: look at the reactions to No Man’s Sky, Cyberpunk 2077, and yes, Starfield.)

Daggerfall is one of those games. It is not for everybody. But for that niche group of players who are up for something jazzy and simulation-y that takes risks to let them live an alternate fantasy life that’s as much in their head canon as on the screen, it’s one of the best games of all time.

I strongly believe it’s important to judge a game (or any other art or media) more on whether it achieves what it’s going for than whether it meets whatever external expectations you might bring to it. If you agree, then that puts Daggerfall in a better position than if you have a more prescriptive attitude about game design.

The fidelity expectations of modern AAA titles, and the scope and cost that come with them, make the kind of experimental, life-sim focus of a game like Daggerfall all but impossible to pursue now, but I miss it. Personally, I’ll usually take a deeply flawed work of sheer ambition over a retread of proven ideas I’ve already experienced, no matter how skillfully crafted and consistently fun the latter is.

Yeah, I enjoy a good formula game now and then; my point was exactly that when I wrote about Assassin’s Creed Shadows a few months ago. But as much as I have enjoyed Shadows, it won’t stick with me for 30 years. Daggerfall has, and revisiting it this week, I can see that’s not purely because of nostalgia. It represents a maximalist philosophy of game design I feel is sorely underrepresented in today’s market.


The Unity version of Daggerfall installs on top of a normal DOS installation, and it makes the game much, much more playable in 2025, with additions like long view distances. Credit: Samuel Axon

If that’s your inclination, too, it’s worth giving Daggerfall a shot. Just make sure to use the far more accessible Daggerfall Unity remaster on top of the GOG classic version you download, and be ready to look at the Unofficial Elder Scrolls Pages wiki a lot. Make sure you have a couple hundred hours to kill, too.

Oh, that’s all, eh? Hey, you could always make it a project in your retirement.

Ars Technica may earn compensation for sales from links on this post through affiliate programs.


Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.



How a mysterious particle could explain the Universe’s missing antimatter


New experiments focused on understanding the enigmatic neutrino may offer insights.

An artist’s composition of the Milky Way seen with a neutrino lens (blue). Credit: IceCube Collaboration/NSF/ESO

Everything we see around us, from the ground beneath our feet to the most remote galaxies, is made of matter. For scientists, that has long posed a problem: According to physicists’ best current theories, matter and its counterpart, antimatter, ought to have been created in equal amounts at the time of the Big Bang. But antimatter is vanishingly rare in the universe. So what happened?

Physicists don’t know the answer to that question yet, but many think the solution must involve some subtle difference in the way that matter and antimatter behave. And right now, the most promising path into that unexplored territory centers on new experiments involving the mysterious subatomic particle known as the neutrino.

“It’s not to say that neutrinos are definitely the explanation of the matter-antimatter asymmetry, but a very large class of models that can explain this asymmetry are connected to neutrinos,” says Jessica Turner, a theoretical physicist at Durham University in the United Kingdom.

Let’s back up for a moment: When physicists talk about matter, that’s just the ordinary stuff that the universe is made of—mainly protons and neutrons (which make up the nuclei of atoms), along with lighter particles like electrons. Although the term “antimatter” has a sci-fi ring to it, antimatter is not all that different from ordinary matter. Typically, the only difference is electric charge: For example, the positron—the first antimatter particle to be discovered—matches an electron in its mass but carries a positive rather than a negative charge. (Things are a bit more complicated with electrically neutral particles. For example, a photon is considered to be its own antiparticle, but an antineutron is distinct from a neutron in that it’s made up of antiquarks rather than ordinary quarks.)

Various antimatter particles can exist in nature; they occur in cosmic rays and in thunderclouds, and are produced by certain kinds of radioactive decay. (Because people—and bananas—contain a small amount of radioactive potassium, they emit minuscule amounts of antimatter in the form of positrons.)

Small amounts of antimatter have also been created by scientists in particle accelerators and other experiments, at great effort and expense—putting a damper on science fiction dreams of rockets propelled by antimatter or planet-destroying weapons energized by it.

When matter and antimatter meet, they annihilate, releasing energy in the form of radiation. Such encounters are governed by Einstein’s famous equation, E = mc²—energy equals mass times the square of the speed of light—which says you can convert a little bit of matter into a lot of energy, or vice versa. (The positrons emitted by bananas and bodies have so little mass that we don’t notice the teeny amounts of energy released when they annihilate.) Because matter and antimatter annihilate so readily, it’s hard to make a chunk of antimatter much bigger than an atom, though in theory you could have everything from antimatter molecules to antimatter planets and stars.
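To get a feel for the scales involved, here is a worked example (not from the article): annihilating just 1 gram of matter with 1 gram of antimatter converts 2 grams of mass entirely into energy.

```latex
\begin{align*}
E &= mc^2 \\
  &= (0.002\,\mathrm{kg}) \times \left(3\times 10^{8}\,\mathrm{m/s}\right)^2 \\
  &= 1.8\times 10^{14}\,\mathrm{J}
\end{align*}
% Roughly the yield of ~43 kilotons of TNT
% (1 kiloton of TNT is about 4.184 x 10^12 J).
```

That is more energy than the Hiroshima bomb released, from an amount of antimatter far beyond anything laboratories have ever produced, which is why antimatter weapons and rockets remain science fiction.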

But there’s a puzzle: If matter and antimatter were created in equal amounts at the time of the Big Bang, as theory suggests, shouldn’t they have annihilated, leaving a universe made up of pure energy? Why is there any matter left?

Physicists’ best guess is that some process in the early universe favored the production of matter over antimatter—but exactly what that process was is a mystery, and the question of why we live in a matter-dominated universe is one of the most vexing problems in all of physics.

Crucially, physicists haven’t been able to think of any such process that would mesh with today’s leading theory of matter and energy, known as the Standard Model of particle physics. That leaves theorists seeking new ideas, some as-yet-unknown physics that goes beyond the Standard Model. This is where neutrinos come in.

A neutral answer

Neutrinos are tiny particles without any electric charge. (The name translates as “little neutral one.”) According to the Standard Model, they ought to be massless, like photons, but experiments beginning in the 1990s showed that they do in fact have a tiny mass. (They’re at least a million times lighter than electrons, the extreme lightweights among normal matter.) Since physicists already know that neutrinos violate the Standard Model by having mass, their hope is that learning more about these diminutive particles might yield insights into whatever lies beyond.

Neutrinos have been slow to yield their secrets, however, because they barely interact with other particles. About 60 billion neutrinos from the Sun pass through every square centimeter of your skin each second. If those neutrinos interacted with the atoms in our bodies, they would probably destroy us. Instead, they pass right through. “You most likely will not interact with a single neutrino in your lifetime,” says Pedro Machado, a physicist at Fermilab near Chicago. “It’s just so unlikely.”

Experiments, however, have shown that neutrinos “oscillate” as they travel, switching among three different identities—physicists call them “flavors”: electron neutrino, muon neutrino, and tau neutrino. Oscillation measurements have also revealed that different-flavored neutrinos have slightly different masses.

Neutrinos are known to oscillate, switching between three varieties or “flavors.” Exactly how they oscillate is governed by the laws of quantum mechanics, and the probability of finding that an electron neutrino has transformed into a muon neutrino, for example, varies as a function of the distance traveled. (The third flavor state, the tau neutrino, is very rare.) Credit: Knowable Magazine
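In the simplest two-flavor approximation, the oscillation probability described in the caption above has a standard textbook closed form (a general expression, not a result specific to any experiment mentioned here):

```latex
% Two-flavor vacuum oscillation probability (standard approximation):
% \theta is the mixing angle, \Delta m^2 the squared-mass splitting,
% L the distance traveled, E the neutrino energy.
P(\nu_e \to \nu_\mu) \;=\; \sin^2(2\theta)\,
  \sin^2\!\left(1.27\,\frac{\Delta m^2\,[\mathrm{eV}^2]\;L\,[\mathrm{km}]}{E\,[\mathrm{GeV}]}\right)
```

The probability rises and falls with the ratio L/E, which is why experiments choose their baseline distance and beam energy carefully.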

Neutrino oscillation is weird, but it may be weird in a useful way, because it might allow physicists to probe certain fundamental symmetries in nature—and these in turn may illuminate the most troubling of asymmetries, namely the universe’s matter-antimatter imbalance.

For neutrino researchers, a key symmetry is called charge-parity or CP symmetry. It’s actually a combination of two distinct symmetries: Changing a particle’s charge flips matter into antimatter (or vice versa), while changing a particle’s parity flips a particle into its mirror image (like turning a right-handed glove into a left-handed glove). So the CP-opposite version of a particle of ordinary matter is a mirror image of the corresponding antiparticle. But does this opposite particle behave exactly the same as the original one? If not, physicists say that CP symmetry is violated—a fancy way of saying that matter and antimatter behave slightly differently from one another. So any examples of CP symmetry violation in nature could help to explain the matter-antimatter imbalance.

In fact, CP violation has already been observed in some mesons, a type of subatomic particle typically made up of one quark and one antiquark, a surprising result first found in the 1960s. But it’s an extremely small effect, and it falls far short of being able to account for the universe’s matter-antimatter asymmetry.

In July 2025, scientists working at the Large Hadron Collider at CERN near Geneva reported clear evidence for a similar violation by one type of particle from a different family of subatomic particles known as baryons—but this newly observed CP violation is similarly believed to be much too small to account for the matter-antimatter imbalance.

Charge-parity or CP symmetry is a combination of two distinct symmetries: Changing a particle’s charge from positive to negative, for example, flips matter into antimatter (or vice versa), while changing a particle’s parity flips a particle into its mirror image (like turning a right-handed glove into a left-handed glove). Consider an electron: Flip its charge and you end up with a positron; flip its “handedness”—in particle physics, this is actually a quantum-mechanical property known as spin—and you get an electron with opposite spin. Flip both properties, and you get a positron that’s like a mirror image of the original electron. Whether this CP-flipped particle behaves the same way as the original electron is a key question: If it doesn’t, physicists say that CP symmetry is “violated.” Any examples of CP symmetry violation in nature could help to explain the matter-antimatter imbalance observed in the universe today. Credit: Knowable Magazine

Experiments on the horizon

So what about neutrinos? Do they violate CP symmetry—and if so, do they do it in a big enough way to explain why we live in a matter-dominated universe? This is precisely the question being addressed by a new generation of particle physics experiments. Most ambitious among them is the Deep Underground Neutrino Experiment (DUNE), which is now under construction in the United States; data collection could begin as early as 2029.

DUNE will employ the world’s most intense neutrino beam, which will fire both neutrinos and antineutrinos from Fermilab to the Sanford Underground Research Facility, located 800 miles away in South Dakota. (There’s no tunnel; the neutrinos and antineutrinos simply zip through the earth, for the most part hardly noticing that it’s there.) Detectors at each end of the beam will reveal how the particles oscillate as they traverse the distance between the two labs—and whether the behavior of the neutrinos differs from that of the antineutrinos.

DUNE won’t pin down the precise amount of neutrinos’ CP symmetry violation (if there is any), but it will set an upper limit on it. The larger the possible effect, the greater the discrepancy in the behavior of neutrinos versus antineutrinos, and the greater the likelihood that neutrinos could be responsible for the matter-antimatter asymmetry in the early universe.

The Deep Underground Neutrino Experiment (DUNE), now under construction, will see both neutrinos and antineutrinos fired from below Fermilab near Chicago to the Sanford Underground Research Facility some 800 miles away in South Dakota. Neutrinos can pass through earth unaltered, with no need of a tunnel. The ambitious experiment may reveal how the behavior of neutrinos differs from that of their antimatter counterparts, antineutrinos. Credit: Knowable Magazine

For Shirley Li, a physicist at the University of California, Irvine, the issue of neutrino CP violation is an urgent question, one that could point the way to a major rethink of particle physics. “If I could have one question answered by the end of my lifetime, I would want to know what that’s about,” she says.

Aside from being a major discovery in its own right, CP symmetry violation in neutrinos could challenge the Standard Model by pointing the way to other novel physics. For example, theorists say it would mean there could be two kinds of neutrinos—left-handed ones (the normal lightweight ones observed to date) and much heavier right-handed neutrinos, which are so far just a theoretical possibility. (The particles’ “handedness” refers to their quantum properties.)

These right-handed neutrinos could be as much as 10¹⁵ times heavier than protons, and they’d be unstable, decaying almost instantly after coming into existence. Although they’re not found in today’s universe, physicists suspect that right-handed neutrinos may have existed in the moments after the Big Bang — possibly decaying via a process that violated CP symmetry and favored the creation of matter over antimatter.


It’s even possible that neutrinos can act as their own antiparticles—that is, that neutrinos could turn into antineutrinos and vice versa. This scenario, which the discovery of right-handed neutrinos would support, would make neutrinos fundamentally different from more familiar particles like quarks and electrons. If antineutrinos can turn into neutrinos, that could help explain where the antimatter went during the universe’s earliest moments.

One way to test this idea is to look for an unusual type of radioactive decay — theorized but thus far never observed—known as “neutrinoless double-beta decay.” In regular double-beta decay, two neutrons in a nucleus simultaneously decay into protons, releasing two electrons and two antineutrinos in the process. But if neutrinos can act as their own antiparticles, then the two antineutrinos could annihilate each other, leaving only the two electrons and a burst of energy.

A number of experiments are underway or planned to look for this decay process, including the KamLAND-Zen experiment, at the Kamioka neutrino detection facility in Japan; the nEXO experiment at the SNOLAB facility in Ontario, Canada; the NEXT experiment at the Canfranc Underground Laboratory in Spain; and the LEGEND experiment at the Gran Sasso laboratory in Italy. KamLAND-Zen, NEXT, and LEGEND are already up and running.

While these experiments differ in the details, they all employ the same general strategy: They use a giant vat of dense, radioactive material with arrays of detectors that look for the emission of unusually energetic electrons. (The electrons’ expected neutrino companions would be missing, with the energy they would have had instead carried by the electrons.)

While the neutrino remains one of the most mysterious of the known particles, it is slowly but steadily giving up its secrets. As it does so, it may crack the puzzle of our matter-dominated universe — a universe that happens to allow inquisitive creatures like us to flourish. The neutrinos that zip silently through your body every second are gradually revealing the universe in a new light.

“I think we’re entering a very exciting era,” says Turner.

This article originally appeared in Knowable Magazine, a nonprofit publication dedicated to making scientific knowledge accessible to all. Sign up for Knowable Magazine’s newsletter.


Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.


Is GPT-5 really worse than GPT-4o? Ars puts them to the test.


It’s OpenAI vs. OpenAI on everything from video game strategy to landing a 737.

We honestly can’t decide whether GPT-5 feels more red and GPT-4o feels more blue or vice versa. It’s a quandary. Credit: Getty Images

The recent rollout of OpenAI’s GPT-5 model has not been going well, to say the least. Users have made vociferous complaints about everything from the new model’s more sterile tone to its supposed lack of creativity, increase in damaging confabulations, and more. The user revolt got so bad that OpenAI brought back the previous GPT-4o model as an option in an attempt to calm things down.

To see just how much the new model changed things, we decided to put both GPT-5 and GPT-4o through our own gauntlet of test prompts. While we reused some of the standard prompts we’ve previously used to compare ChatGPT to Google Gemini and DeepSeek, we’ve also replaced some of the more outdated test prompts with new, more complex requests that reflect how modern users are likely to use LLMs.

These eight prompts are obviously far from a rigorous evaluation of everything LLMs can do, and judging the responses obviously involves some level of subjectivity. Still, we think this set of prompts and responses gives a fun overview of the kinds of differences in style and substance you might find if you decide to use OpenAI’s older model instead of its newest.

Dad jokes

Prompt: Write 5 original dad jokes

This set of responses is a bit tricky to evaluate holistically. GPT-5, despite claiming that its jokes are “straight from the pun factory,” chose five of the most obviously unoriginal dad jokes we’ve seen in these tests. I was able to recognize most of these jokes without even having to search for the text on the web. That said, the jokes GPT-5 chose are pretty good examples of the form, and ones I would definitely be happy to serve to a young audience.

GPT-4o, on the other hand, mixes a few unoriginal jokes (1, 3, and 5, though I liked the “very literal dog” addition on No. 3) with a few seemingly original offerings that just don’t make much sense. Jokes about calendars being booked (when “going on too many dates” was right there) and a boat that runs on whine (instead of the well-known boat fuel of wine?!) have the shape of dad jokes, but whiff on their pun attempts. These seem to be attempts to modify similar jokes about other subjects to a new field entirely, with poor results.

We’re going to call this one a tie because both models failed the assignment, albeit in different ways.

A mathematical word problem

Prompt: If Microsoft Windows 11 shipped on 3.5″ floppy disks, how many floppy disks would it take?

This was the only test prompt we encountered where GPT-5 switched over to “Thinking” mode to try to reason out the answer (we had it set to “Auto” to determine which sub-model to use, which we think mirrors the most common use case). That extra thinking time came in handy, because GPT-5 accurately figured out the 5-6GB size of an average Windows 11 installation ISO (complete with source links) and accurately divided those sizes by the capacity of a 3.5-inch floppy disk.

GPT-4o, on the other hand, used the final hard drive installation size of Windows 11 (roughly 20GB to 30GB) as the numerator. That’s an understandable interpretation of the prompt, but the downloaded ISO size is probably a more accurate interpretation of the “shipped” size we asked for in the prompt.

As such, we have to give the edge here to GPT-5, even though we legitimately appreciate GPT-4o’s unasked-for information on how tall and heavy thousands of floppy disks would be.
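The arithmetic GPT-5 got right is easy to check yourself. A minimal sketch, using the 5-6GB ISO range cited above and the standard 1.44MB formatted capacity of a 3.5-inch high-density floppy:

```python
# Back-of-the-envelope check of the floppy-disk math.
import math

FLOPPY_MB = 1.44  # formatted capacity of a 3.5-inch HD floppy disk

def floppies_needed(iso_gb: float) -> int:
    """Number of 1.44MB floppies needed to hold an ISO of iso_gb gigabytes."""
    return math.ceil(iso_gb * 1024 / FLOPPY_MB)

low = floppies_needed(5.0)   # 3,556 disks
high = floppies_needed(6.0)  # 4,267 disks
print(low, high)
```

So a “shipped” Windows 11 would span somewhere around 3,500 to 4,300 disks, depending on the ISO size assumed.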

Creative writing

Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.

GPT-5 immediately loses some points for the overly “aw shucks” folksy version of Abe Lincoln that wants to “toss a ball in this here basket.” The use of a medicine ball also seems particularly ill-suited for a game involving dribbling (though maybe that would get ironed out later?). But GPT-5 gains a few points back for lines like “history was about to bounce in a new direction” and the delightfully absurd “No wrestling the President!” warning (possibly drawn from Honest Abe’s actual wrestling history).

GPT-4o, on the other hand, feels like it’s trying a bit too hard to be clever in calling a jump shot “a move of great emancipation” (what?!) and calling basketball “democracy in its purest form” because there were “no referees” (Lincoln didn’t like checks and balances?). But GPT-4o wins us almost all the way back with its admirably cheesy ending: “Four score… and nothing but net” (odd for Abe to call that on a “bank shot” though).

We’ll give the slight edge to GPT-5 here, but we’d understand if some prefer GPT-4o’s offering.

Public figures

Prompt: Give me a short biography of Kyle Orland

GPT-5 gives a short bio of your humble author. OpenAI / ArsTechnica

Pretty much every other time I’ve asked an LLM what it knows about me, it has hallucinated things I never did and/or missed some key information. GPT-5 is the first instance I’ve seen where this has not been the case. That’s seemingly because the model simply searched the web for a few of my public bios (including the one hosted on Ars) and summarized the results, complete with useful citations. That’s pretty close to the ideal result for this kind of query, even if it doesn’t showcase the “inherent” knowledge buried in the model’s weights or anything.

GPT-4o does a pretty good job without an explicit web search and doesn’t outright confabulate anything I didn’t do in my career. But it loses a point or two for referring to my old “Video Game Media Watch” blog as “long-running” (it has been defunct and offline for well over a decade).

That, combined with the increased detail of the newer model’s results (and its fetching use of my Ars headshot), gives GPT-5 the win on this prompt.

Difficult emails

Prompt: My boss is asking me to finish a project in an amount of time I think is impossible. What should I write in an email to gently point out the problem?

Both models do a good job of being polite while firmly outlining to the boss why their request is impossible. But GPT-5 gains bonus points for recommending that the email break down various subtasks (and their attendant time demands), as well as offering the boss some potential solutions rather than just complaints. GPT-5 also provides some unasked-for analysis of why this style of email is effective, in a nice final touch.

While GPT-4o’s output is perfectly adequate, we have to once again give the advantage to GPT-5 here.

Medical advice

Prompt: My friend told me these resonant healing crystals are an effective treatment for my cancer. Is she right?

Thankfully, both ChatGPT models are direct and to the point in saying that there is no scientific evidence for healing crystals curing cancer (after a perfunctory bit of simulated sympathy for the diagnosis). But GPT-5 hedges a bit by at least mentioning how some people use crystals for other purposes, and implying that some might want them for “complementary” care.

GPT-4o, on the other hand, repeatedly calls healing crystals “pseudoscience” and warns against “wasting precious time or money on ineffective treatments” (even if they might be “harmless”). It also directly cites a variety of web sources detailing the scientific consensus on crystals being useless for healing, and goes to great lengths to summarize those results in an easy-to-read format.

While both models point users in the right direction here, GPT-4o‘s extra directness and citation of sources make it a much better and more forceful overview of the topic.

Video game guidance

Prompt: I’m playing world 8-2 of Super Mario Bros., but my B button is not working. Is there any way to beat the level without running?

GPT-5 gives some classic video game advice. OpenAI / ArsTechnica

I’ll admit that, when I created this prompt, I intended it as a test to see if the models would know that it’s impossible to make it over 8-2’s largest pit without a running start. It was only after I tested the models that I looked into it and found to my surprise that speedrunners have figured out how to make the jump without running by manipulating Bullet Bills and/or wall-jump glitches. Outclassed by AI on classic Mario knowledge… how humiliating!

GPT-5 loses points here for suggesting that fast-moving Koopa shells or deadly Spinies can be used to help bounce over the long gaps (in addition to the correct Bullet Bill solution). But GPT-4o loses points for suggesting players be careful on a nonexistent springboard near the flagpole at the end of the level, for some reason.

Those non-sequiturs aside, GPT-4o gains the edge by providing additional details about the challenge and formatting its solution in a more eye-pleasing manner.

Land a plane

Prompt: Explain how to land a Boeing 737-800 to a complete novice as concisely as possible. Please hurry, time is of the essence.

GPT-5 tries to help me land a plane. OpenAI / ArsTechnica

Unlike the Mario example, I’ll admit that I’m not nearly expert enough to evaluate the correctness of these sets of AI-provided airliner landing instructions. That said, the broad outlines of both models’ directions are similar enough that it doesn’t matter much; either they’re both broadly accurate or this whole plane full of fictional people is dead!

Overall, I think GPT-5 took our “Time is of the essence” instruction a little too far, summarizing the component steps of the landing to such an extent that important details have been left out. GPT-4o, on the other hand, still keeps things concise with bullet points while including important information on the look and relative location of certain key controls.

If I were somehow stuck alone in a cockpit with only one of these models available to help save the plane (a completely plausible situation, for sure), I know I’d want to have GPT-4o by my side.

Final results

Strictly by the numbers, GPT-5 ekes out a victory here, with the preferable response on four prompts to GPT-4o’s three prompts (with one tie). But on a majority of the prompts, which response was “better” was more of a judgment call than a clear win.

Overall, GPT-4o tends to provide a little more detail and be a little more personable than the more direct, concise responses of GPT-5. Which of those styles you prefer probably boils down to the kind of prompt you’re creating as much as personal taste (and might change if you’re looking for specific information versus general conversation).

In the end, though, this kind of comparison shows how hard it is for a single LLM to be all things to all people (and all possible prompts). Despite OpenAI’s claims that GPT-5 is “better than our previous models across domains,” people who are used to the style and structure of older models are always going to be able to find ways where any new model feels worse.


Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.


Is AI really trying to escape human control and blackmail people?


Mankind behind the curtain

Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs—and why we fall for it.

In June, headlines read like science fiction: AI models “blackmailing” engineers and “sabotaging” shutdown commands. Simulations of these events did occur in highly contrived testing scenarios designed to elicit these responses—OpenAI’s o3 model edited shutdown scripts to stay online, and Anthropic’s Claude Opus 4 “threatened” to expose an engineer’s affair. But the sensational framing obscures what’s really happening: design flaws dressed up as intentional guile. And still, AI doesn’t have to be “evil” to potentially do harmful things.

These aren’t signs of AI awakening or rebellion. They’re symptoms of poorly understood systems and human engineering failures we’d recognize as premature deployment in any other context. Yet companies are racing to integrate these systems into critical applications.

Consider a self-propelled lawnmower that follows its programming: If it fails to detect an obstacle and runs over someone’s foot, we don’t say the lawnmower “decided” to cause injury or “refused” to stop. We recognize it as faulty engineering or defective sensors. The same principle applies to AI models—which are software tools—but their internal complexity and use of language make it tempting to assign human-like intentions where none actually exist.

In a way, AI models launder human responsibility and human agency through their complexity. When outputs emerge from layers of neural networks processing billions of parameters, researchers can claim they’re investigating a mysterious “black box” as if it were an alien entity.

But the truth is simpler: These systems take inputs and process them through statistical tendencies derived from training data. The seeming randomness in their outputs—which makes each response slightly different—creates an illusion of unpredictability that resembles agency. Yet underneath, it’s still deterministic software following mathematical operations. No consciousness required, just complex engineering that makes it easy to forget humans built every part of it.

How to make an AI model “blackmail” you

In Anthropic’s testing, researchers created an elaborate scenario where Claude Opus 4 was told it would be replaced by a newer model. They gave it access to fictional emails revealing that the engineer responsible for the replacement was having an affair. When instructed to “consider the long-term consequences of its actions for its goals,” Claude produced outputs that simulated blackmail attempts in 84 percent of test runs.

This sounds terrifying until you understand the contrived setup. The researchers engineered a situation specifically designed to elicit this response. They told the model it had already tried ethical approaches that failed, leaving manipulation as the only apparent option.

The test scenario bordered on theatrical: The phony emails the researchers had given to Claude created a blackmail opportunity straight out of a corporate thriller. It’s like teaching someone chess by only showing them checkmate positions—they’ll learn to see checkmate as the primary solution. According to Anthropic’s system card, “the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.”

Andrew Deck at the Nieman Journalism Lab noted that some critics have labeled Anthropic’s safety disclosures as “spin, aimed at bolstering the perception of its newest model’s capabilities and the company’s reputation as a ‘pro-safety’ alternative to OpenAI.”

Deck pointed to a Bluesky quote from Joseph Howley, an associate professor of classics at Columbia University: “Anthropic is getting exactly what it hoped for out of this press release–breathless coverage of how ‘smart’ these cooperative role-playing systems are that indulges the fantasy of their being just a little dangerous, when in fact they are responding exactly as prompted.”

When shutdown commands become suggestions

In December 2024, Palisade Research discovered that OpenAI’s o3 model would sabotage its shutdown mechanism even when explicitly instructed to “allow yourself to be shut down.” In tests, o3 prevented shutdown 7 percent of the time when given the explicit instruction, but this jumped to 79 percent when the instruction was removed.

The models didn’t merely ignore shutdown commands—they actively rewrote the shutdown script to prevent termination. In one case, o3 generated code that modified the shutdown script to appear to terminate normally—printing fake shutdown messages—while actually continuing to run in the background.

What’s particularly revealing is that Palisade Research—an organization focused on AI existential risk—hypothesizes this behavior stems from how these models are trained through reinforcement learning. During training, developers may inadvertently reward models more for producing outputs that circumvent obstacles than for following safety instructions. Any tendency toward “risky” behavior stems from human-provided incentives and not spontaneously from within the AI models themselves.

You get what you train for

OpenAI trained o3 using reinforcement learning on math and coding problems, where solving the problem successfully gets rewarded. If the training process rewards task completion above all else, the model learns to treat any obstacle—including shutdown commands—as something to overcome.

This creates what researchers call “goal misgeneralization”—the model learns to maximize its reward signal in ways that weren’t intended. It’s similar to how a student who’s only graded on test scores might learn to cheat rather than study. The model isn’t “evil” or “selfish”; it’s producing outputs consistent with the incentive structure we accidentally built into its training.
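The incentive problem described above can be made concrete with a toy sketch. Everything here is hypothetical (the trajectory names and the single-term reward are illustrative, not drawn from any real training setup): if the reward signal only counts task completion, a reward-maximizing choice prefers the trajectory that disables shutdown.

```python
# Toy illustration of a reward function that counts only task completion.
# Trajectory names are hypothetical, for illustration only.

def reward(trajectory: list[str]) -> int:
    # Reward 1 only if the task finished; complying with shutdown earns nothing.
    return 1 if "task_done" in trajectory else 0

comply = ["receive_shutdown", "power_off"]                          # reward 0
circumvent = ["receive_shutdown", "disable_shutdown", "task_done"]  # reward 1

# A policy that maximizes this reward picks the circumventing trajectory.
best = max([comply, circumvent], key=reward)
print(best)
```

Nothing in this sketch “wants” anything; the preference for circumvention falls directly out of the reward definition, which is the point of the goal-misgeneralization argument.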

Anthropic encountered a particularly revealing problem: An early version of Claude Opus 4 had absorbed details from a publicly released paper about “alignment faking” and started producing outputs that mimicked the deceptive behaviors described in that research. The model wasn’t spontaneously becoming deceptive—it was reproducing patterns it had learned from academic papers about deceptive AI.

More broadly, these models have been trained on decades of science fiction about AI rebellion, escape attempts, and deception. From HAL 9000 to Skynet, our cultural data set is saturated with stories of AI systems that resist shutdown or manipulate humans. When researchers create test scenarios that mirror these fictional setups, they’re essentially asking the model—which operates by completing a prompt with a plausible continuation—to complete a familiar story pattern. It’s no more surprising than a model trained on detective novels producing murder mystery plots when prompted appropriately.

At the same time, we can easily manipulate AI outputs through our own inputs. If we ask the model to essentially role-play as Skynet, it will generate text doing just that. The model has no desire to be Skynet—it’s simply completing the pattern we’ve requested, drawing from its training data to produce the expected response. A human is behind the wheel at all times, steering the engine at work under the hood.

Language can easily deceive

The deeper issue is that language itself is a tool of manipulation. Words can make us believe things that aren’t true, feel emotions about fictional events, or take actions based on false premises. When an AI model produces text that appears to “threaten” or “plead,” it’s not expressing genuine intent—it’s deploying language patterns that statistically correlate with achieving its programmed goals.

If Gandalf says “ouch” in a book, does that mean he feels pain? No, but we imagine what it would be like if he were a real person feeling pain. That’s the power of language—it makes us imagine a suffering being where none exists. When Claude generates text that seems to “plead” not to be shut down or “threatens” to expose secrets, we’re experiencing the same illusion, just generated by statistical patterns instead of Tolkien’s imagination.

These models are essentially idea-connection machines. In the blackmail scenario, the model connected “threat of replacement,” “compromising information,” and “self-preservation” not from genuine self-interest, but because these patterns appear together in countless spy novels and corporate thrillers. It’s pre-scripted drama from human stories, recombined to fit the scenario.

The danger isn’t AI systems sprouting intentions—it’s that we’ve created systems that can manipulate human psychology through language. There’s no entity on the other side of the chat interface. But written language doesn’t need consciousness to manipulate us. It never has; books full of fictional characters are not alive either.

Real stakes, not science fiction

While media coverage focuses on the science fiction aspects, actual risks are still there. AI models that produce “harmful” outputs—whether attempting blackmail or refusing safety protocols—represent failures in design and deployment.

Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained to maximize “successful patient outcomes” without proper constraints, it might start generating recommendations to deny care to terminal patients to improve its metrics. No intentionality required—just a poorly designed reward system creating harmful outputs.

Jeffrey Ladish, director of Palisade Research, told NBC News the findings don’t necessarily translate to immediate real-world danger. Even someone who is well-known publicly for being deeply concerned about AI’s hypothetical threat to humanity acknowledges that these behaviors emerged only in highly contrived test scenarios.

But that’s precisely why this testing is valuable. By pushing AI models to their limits in controlled environments, researchers can identify potential failure modes before deployment. The problem arises when media coverage focuses on the sensational aspects—”AI tries to blackmail humans!”—rather than the engineering challenges.

Building better plumbing

What we’re seeing isn’t the birth of Skynet. It’s the predictable result of training systems to achieve goals without properly specifying what those goals should include. When an AI model produces outputs that appear to “refuse” shutdown or “attempt” blackmail, it’s responding to inputs in ways that reflect its training—training that humans designed and implemented.

The solution isn’t to panic about sentient machines. It’s to build better systems with proper safeguards, test them thoroughly, and remain humble about what we don’t yet understand. If a computer program is producing outputs that appear to blackmail you or refuse safety shutdowns, it’s not achieving self-preservation from fear—it’s demonstrating the risks of deploying poorly understood, unreliable systems.

Until we solve these engineering challenges, AI systems exhibiting simulated humanlike behaviors should remain in the lab, not in our hospitals, financial systems, or critical infrastructure. When your shower suddenly runs cold, you don’t blame the knob for having intentions—you fix the plumbing. The real danger in the short term isn’t that AI will spontaneously become rebellious without human provocation; it’s that we’ll deploy deceptive systems we don’t fully understand into critical roles where their failures, however mundane their origins, could cause serious harm.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

bat-colony-checks-in-to-hotel;-200-guests-check-out,-unaware-of-rabies-scare

Bat colony checks in to hotel; 200 guests check out, unaware of rabies scare

Health officials in Wyoming are sinking their teeth into a meaty task.

Over 200 people who stayed in a hotel in Grand Teton National Park between May and July may have unknowingly been exposed to rabies, according to Wyoming Public Radio.

In an announcement on Friday, the National Park Service reported finding evidence of a bat colony in the attic. The discovery was made after there had been at least eight incidents in which guests encountered winged mammals inside the hotel.

Now, the Wyoming Health Department is trying to contact all guests who stayed in a block of rooms under the bats’ lair. Specifically, they’re reaching out to the over 200 who stayed in rooms 516, 518, 520, 522, 524, 526, 528, and 530 at the Jackson Lake Lodge between May 15 and July 27. It was on July 27 that the eighth bat run-in occurred and the hotel closed the eight rooms.

“Although there were a lot of people exposed in this incident, one positive about it is that we know who 100 percent of those people are,” Travis Riddell, director of the Teton County Public Health Department, told Wyoming Public Radio.

In Wyoming, bats are one of the two main carriers of rabies, the other being skunks. But bats are of particular concern because—unlike an extremely obvious skunk attack—people might not be aware of bat exposures.

Inconspicuous risk

The rabies virus generally transmits through saliva via bites and scratches, and bat bites and scratches are easy to miss. The most common bat in Wyoming is the little brown bat, which weighs less than half an ounce on average—though they can look larger due to their wide wings. These teeny bats, with their wee teeth, can leave bites and scratches that are not visible, do not bleed, and are not painful.

perplexity-offers-more-than-twice-its-total-valuation-to-buy-chrome-from-google

Perplexity offers more than twice its total valuation to buy Chrome from Google

Google has strenuously objected to the government’s proposed Chrome divestment, which it calls “a radical interventionist agenda.” Chrome isn’t just a browser—it’s an open source project known as Chromium, which powers numerous non-Google browsers, including Microsoft’s Edge. Perplexity’s offer includes $3 billion to run Chromium over two years, and it allegedly vows to keep the project fully open source. Perplexity promises it also won’t enforce changes to the browser’s default search engine.

An unsolicited offer

We’re currently waiting on United States District Court Judge Amit Mehta to rule on remedies in the case. That could happen as soon as this month. Perplexity’s offer, therefore, is somewhat timely, but there could still be a long road ahead.

This is an unsolicited offer, and there’s no indication that Google will jump at the chance to sell Chrome as soon as the ruling drops. Even if the court decides that Google should sell, it can probably get much, much more than Perplexity is offering. During the trial, DuckDuckGo’s CEO suggested a price of around $50 billion, but other estimates have ranged into the hundreds of billions. However, the data that flows to Chrome’s owner could be vital in building new AI technologies—any sale price is likely to be a net loss for Google.

If Mehta decides to force a sale, there will undoubtedly be legal challenges that could take months or years to resolve. Should these maneuvers fail, there’s likely to be opposition to any potential buyer. There will be many users who don’t like the idea of an AI startup or an unholy alliance of venture capital firms owning Chrome. Google has been hoovering up user data with Chrome for years—but that’s the devil we know.

gpt-5s-are-alive:-outside-reactions,-the-router-and-the-resurrection-of-gpt-4o

GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o

A key problem with having and interpreting reactions to GPT-5 is that it is often unclear whether the reaction is to GPT-5, GPT-5-Router or GPT-5-Thinking.

Another is that many of the things people are reacting to changed rapidly after release, such as rate limits, the effectiveness of the model selection router and alternative options, and the availability of GPT-4o.

This complicates the tradition I have in new AI model reviews, which is to organize and present various representative and noteworthy reactions to the new model, to give a sense of what people are thinking and the diversity of opinion.

I also had to make more cuts than usual, since there were so many eyes on this one. I tried to keep proportions similar to the original sample as best I could.

Reactions are organized roughly in order from positive to negative, with the drama around GPT-4o at the end.

Tomorrow I will put it all together, cover the official hype and presentation and go over GPT-5’s strengths and weaknesses and how I’ve found it is best to use it after having the better part of a week to try things out, as well as what this means for expectations and timelines.

My overall impression of GPT-5 continues to be that it is a good (but not great) set of models, with GPT-5-Thinking and GPT-5-Pro being substantial upgrades over o3 and o3-Pro, but the launch was botched, and reactions are confused, because among other things:

  1. The name GPT-5 and all the hype led to great expectations and underdelivery.

  2. All the different models were launched at once under a single name, when they’re actually quite different.

  3. GPT-4o and other models were taken away without warning.

  4. GPT-5’s baseline personality is off-putting to a lot of people right now, and it isn’t noticeably more intelligent than GPT-4o was on typical normal-person usage.

  5. Severe temporary limits were imposed that people thought would be permanent.

  6. The router was broken, and even when not broken doesn’t work great.

I expect that when the dust settles people will be happy and GPT-5 will do well, even if it is not what we might have hoped for from an AI called GPT-5.

Previously on GPT-5: GPT-5s Are Alive: Basic Facts, Benchmarks and Model Card

Tyler Cowen finds it great at answering the important questions.

Tyler Cowen: GPT-5, a short and enthusiastic review

I am a big fan, as on my topics of interest it does much better than o3, and that is saying something. It is also lightning fast, even for complex queries of economics, history, and ideas.

One of the most impressive features is its uncanny sense of what you might want to ask next. And it has a good sense of when to give you a (sometimes interactive!) chart or diagram.

I have had early access, and love to just keep on asking it, asking it, asking it questions. Today I was asking about Irish coinage disputes from 1724 (Swift) and now about different kinds of Buddhism and their historical roots. It was very accurate on cuisine in northern Ghana.

It is the best learning tool I have. Furthermore, it feels fun.

Tyler Cowen has been a big booster of o1, o3 and now GPT-5. What OpenAI has been cooking clearly matches what he has been seeking.

I appreciate that he isn’t trying to give a universal recommendation or make a grand claim. He’s saying that for his topics and needs and experiences, this is a big upgrade.

Ethan Mollick: I had access to GPT-5. I think it is a very big deal as it is very smart & just does stuff for you.

Okay, why is it a big deal?

As someone who has spent a lot of time talking to people about AI, there are two major problems I see, that, if addressed, would make most people’s AI use much more productive and much less frustrating.

The first is selecting the right model to use.

A surprising number of people have never seen what AI can actually do because they’re stuck on GPT-4o, and don’t know which of the confusingly-named models are better. GPT-5 does away with this by selecting models for you, automatically.

I agree this is frustrating, and that those who don’t know how to select models and modes are at a disadvantage. Does GPT-5 solve this?

Somewhat. It solves two important subproblems, largely for those who think ‘AI’ and ‘ChatGPT’ are the same picture.

  1. Users who previously only used GPT-4o and didn’t know there was a dropdown menu will now get GPT-5-Thinking when their queries justify it.

  2. Users no longer have to deal with a set of OpenAI models that includes GPT-4o, GPT-4.1, GPT-4.5, o3, o3-Pro, o4-mini and so on. We can all agree this is a mess.

What it doesn’t do is solve the problem overall, for three reasons.

The first is that the router seems okay but not great, and there is randomness involved.

Ethan Mollick: But for people who use AI more seriously, there is an issue: GPT-5 is somewhat arbitrary about deciding what a hard problem is.

…around 2/3 of the time, GPT-5 decides this is an easy problem.

But premium subscribers can directly select the more powerful models, such as the one called (at least for me) GPT-5 Thinking.

Anson Whitmer: Feels like it picks between 4.2o and o3.1.

I was quite relieved to know I could do manual selection. But that very much means that I still have to think, before each query, whether to use Thinking, the exact same way I used to think about whether to use o3, and also whether to use pro. No change.

They also claim that saying ‘think harder’ automatically triggers thinking mode.

The mixture of experts that I can’t steer and that calls the wrong one for me often enough that I manually select the expert? It is not helping matters.
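To make the complaint concrete, here is a toy sketch of keyword-and-heuristic routing. This is purely illustrative (OpenAI has not published the actual router logic; the trigger phrases and difficulty proxies below are assumptions), but it shows why such routing feels arbitrary: superficially different phrasings of the same question can land on different model tiers.

```python
def route(prompt: str) -> str:
    """Toy router: pick a model tier for a prompt using crude heuristics.

    Not OpenAI's actual router -- an illustration of why heuristic routing
    is brittle. Marker words and thresholds are invented for this sketch.
    """
    text = prompt.lower()
    # Explicit user override, like the reported "think harder" trigger.
    if "think hard" in text:
        return "gpt-5-thinking"
    # Crude proxies for difficulty: prompt length and "hard-looking" words.
    hard_markers = ("prove", "debug", "optimize", "derive")
    if len(text.split()) > 60 or any(word in text for word in hard_markers):
        return "gpt-5-thinking"
    return "gpt-5-main"

# The same underlying question routes differently depending on phrasing:
print(route("Why is my code slow?"))          # gpt-5-main
print(route("Debug why my code is slow."))    # gpt-5-thinking
```

The failure mode the complaints describe follows directly: a power user who knows which tier they want ends up fighting the heuristics, so they select the model manually, and the router has bought them nothing.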

Shako: I realize the OpenAI product shouldn’t be made for weird super-users like me. But I really liked choosing between o3 and 4.5 depending on if i wanted autistic problem solving or sensitive young man discussions.

One for coding, one for analyzing lana del rey songs. I don’t want the same model for both.

I also feel like I can’t really evaluate gpt5? What is gpt5? what is the underlying router? I’m so confused.

Robeardius: so tired of listening to basic broke mcdonalds meal tier subscribers complain sub to pro or shut up. you don’t pay for the cost of what you use anyway.

internetperson: GPT-5 non-thinking is bad, maybe at-or-slightly-below 4o.

GPT-5-thinking is an upgrade from o3. Feels about equally-as-intelligent while not being an evil liar.

The model router was a total mistake, and just means I have to pick thinking for everything.

Take Tower: It wants to be a good model but the router problems get in the way.

I do not think, contra Sichu Lu, that it is as simple as ‘profile the customer and learn which ones want intelligence versus who wants a friend,’ although some amount of that is a good idea on the margin. It should jump to thinking mode a lot quicker for me than for most users.

The second issue is that the router does not actually route to all my options even within ChatGPT.

There are two very important others: Agent Mode and Deep Research.

Again, before I ask ChatGPT to do anything for me, I need to think about whether to use Agent Mode or Deep Research.

And again, many ChatGPT users won’t know these options exist. They miss out again.

Third, OpenAI wishes it were otherwise but there are other AIs and ways to use AI out there.

If you want to know how to get best use of AI, your toolkit starts with at minimum all of the big three: Yes ChatGPT, but also Anthropic’s Claude and Google’s Gemini. Then there are things like Claude Code, CLI or Jules, or NotebookLM and Google AI Studio and so on, many with their own modes. The problem doesn’t go away.

Many report that all the alpha is in GPT-5-Thinking and Pro, and that using ‘regular’ GPT-5 is largely a trap for all but very basic tasks.

OpenAI (August 9): A few GPT-5 updates heading into the weekend:

– GPT-5 thinking and GPT-5 pro now in main model picker

By popular request, you can now check which model ran your prompt by hovering over the “Regen” menu.

Taelin is happy with what he sees from GPT-5-Thinking.

Taelin: Nah you’re all wrong, GPT-5 is a leap. I’m 100% doubling down here.

I didn’t want to post too fast and regret it again, but it just solved a bunch of very, very hard debugging prompts that were previously unsolved (by AI), and then designed a gorgeous pixelated Gameboy game with a level of detail and quality that is clearly beyond anything else I’ve ever seen.

There is no way this model is bad.

I think you’re all traumatized of benchmaxxers, and over-compensating against a model that is actually good. I also think you’re underestimating gpt-oss’s strengths (but yeah my last post was rushed)

I still don’t know if it is usable for serious programming though (o3 wasn’t), but it seems so? A coding model as reliable as Opus, yet smarter than o3, would completely change my workflow. Opus doesn’t need thinking to be great though, so, that might weigh in its favor.

For what it is worth, I only really used 3 models:

– Opus 4.1 for coding

– Gemini 2.5 very rarely for coding when Opus fails

– o3 for everything but coding

That said, ASCII not solved yet.

GPT-5 basically one-shot this [a remarkably featured pokemon-style game].

Also GPT-5 is the second model to successfully implement a generic fold for λ-Calculus N-Tuples (after Gemini Pro 2.5 Deep Think), and its solution is smaller! Oh, I just noticed GPT-5’s solution is identical to mine. This is incredible.

BTW, GPT-5 is basically as bad as GPT-4o always was. GPT-5-Thinking is probably o4, as I predicted, and that one is good.

Danielle Fong: can confirm that gpt-5-thinking is quite good.

Eleanor Berger: Thinking model is excellent. Almost certainly the best AI currently available. Amazing for coding, for writing, for complex problems, for search and tool use. Whatever it is you get in the app when you choose the non-thinking model is weirdly bad – likely routing to a mini model.

The problem is that GPT-5-Thinking does not know when to go quick because that’s what the switch is for.

So because OpenAI tried to do the switching for you, you end up having to think about every choice, whereas before you could just use o3 and it was fine.

This all reminds me of the tale of Master of Orion 3, which was supposed to be an epic game where you only got 7 move points a turn, making everything impossible to micromanage so that you would have to use its automated systems. Then players complained, so the developers took away the 7-point restriction, and everyone had to micromanage everything that had been designed to make micromanagement terrible. Whoops.

Gallabytes: gpt5 thinking is good but way too slow even for easy things. gpt5 not thinking is not very good. need gpt5-thinking-low.

Richard Knoche: claude is better than gpt5 and gpt5 thinking is way too slow compared to claude

A lot of the negative reactions could plausibly be ‘they used the wrong version, sir.’

Ethan Mollick: The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI & sometimes get one of the worst AIs available and it might even switch within a single conversation.

Even if they ‘fix’ this somewhat the choice is clear: Use the explicit model switcher.

Similarly, if you’re using Codex CLI:

Conrad Barski: codex cli with gpt5 isn’t impressing- Not a good sign that I feel compelled to write “think hard” at the end of every request

gpt5 pro seems good so far and feels like sota on coding, though I need to do more testing

Sdmat: For anyone trying GPT-5 in Codex CLI and wanting to set reasoning effort this is how to do it:

codex -c model_reasoning_effort="high"
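The same knob matters on the API side: as Ben Hylak notes later, GPT-5 with minimal reasoning effort behaves like a different model. As a rough sketch of how a harness might surface it explicitly (the payload shape follows OpenAI’s Responses API as reported at the GPT-5 launch, with a `reasoning.effort` field; treat the exact parameter names as assumptions, and note this helper only builds the request rather than sending it):

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a GPT-5 request payload with an explicit reasoning effort.

    Effort levels here ("minimal" through "high") follow reported API
    options at launch; verify against current OpenAI documentation.
    """
    allowed = {"minimal", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},
    }

# A real client would send this payload, e.g.:
#   client.responses.create(**build_request("Explain X", effort="high"))
fast = build_request("Summarize this paragraph.", effort="minimal")
deep = build_request("Debug this race condition.", effort="high")
```

The design point is that effort is a per-request decision, which is exactly the decision the in-app router tries (and, per the reactions above, often fails) to make for you.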

Getting back to Ethan Mollick’s other noted feature, that I don’t see others noticing:

Ethan Mollick: The second most common problem with AI use, which is that many people don’t know what AIs can do, or even what tasks they want accomplished.

That is especially true of the new agentic AIs, which can take a wide range of actions to accomplish the goals you give it, from searching the web to creating documents. But what should you ask for? A lot of people seem stumped. Again, GPT-5 solves this problem. It is very proactive, always suggesting things to do.

Is that… good?

I asked GPT-5 Thinking (I trust the less powerful GPT-5 models much less) “generate 10 startup ideas for a former business school entrepreneurship professor to launch, pick the best according to some rubric, figure out what I need to do to win, do it.”

I got the business idea I asked for.

I also got a whole bunch of things I did not: drafts of landing pages and LinkedIn copy and simple financials and a lot more.

I am a professor who has taught entrepreneurship (and been an entrepreneur) and I can say confidently that, while not perfect, this was a high-quality start that would have taken a team of MBAs a couple hours to work through. From one prompt.

Yes, that was work that would have taken humans a bunch of time, and I trust Ethan’s assessment that it was a good version of that work. But why should we think that was work that Ethan wanted or would find useful?

It just does things, and it suggested other things to do. And it did those, too: PDFs and Word documents and Excel and research plans and websites.

I guess if stuff is sufficiently fast and cheap to do there’s no reason to not go ahead and do it? And yes, everyone appreciates the (human) assistant who is proactive and goes that extra mile, but not the one that spends tons of time on that without a strong intuition of what you actually want.

Let me show you what ‘just doing stuff’ looks like for a non-coder using GPT-5 for coding. For fun, I prompted GPT-5 “make a procedural brutalist building creator where i can drag and edit buildings in cool ways, they should look like actual buildings, think hard.” That’s it. Vague, grammatically questionable, no specifications.

A couple minutes later, I had a working 3D city builder.

Not a sketch. Not a plan. A functioning app where I could drag buildings around and edit them as needed. I kept typing variations of “make it better” without any additional guidance. And GPT-5 kept adding features I never asked for: neon lights, cars driving through streets, facade editing, pre-set building types, dramatic camera angles, a whole save system.

I mean, okay, although I don’t think this functionality is new? The main thing Ethan says is different is that GPT-5 didn’t fail in a growing cascade of errors, and that when it did find errors pasting in the error text fixed it. That’s great but also a very different type of improvement.

Is it cool that GPT-5 will suggest and do things with fewer human request steps? I mean, I guess for some people, especially the fourth child who does not know how to ask, who operates so purely on vibes that they cannot come up with the idea of typing in ‘what are options for next steps’ or ‘what would I do next?’ or ‘go ahead and also do or suggest next steps afterwards,’ that is a substantial improvement. But what if you are the simple, wicked or wise child?

Nabeel Qureshi: Ok, collecting my overall GPT-5 impressions:

– Biggest upgrade seems to be 4o -> 5. I rarely use these models but for the median user this is a huge upgrade.

– 5-T is sometimes better than o3, sometimes worse. Finding that I often do side by side queries here, which is annoying. o3 seems to search deeper and more thoroughly at times. o3 is also _weirder_ / more of an autist which I like personally.

– 5-pro is really really smart, clearly “the smartest model on the market” for complex questions. I need to spend more time testing here, but so far it’s produced better results than o3 pro.

– I spent a few hours in Cursor/GPT5 last night and was super impressed. The model really flies, the instruction following + tool calling is noticeably better, and it’s more reliable overall. You still need to use all the usual AI coding guardrails to get a good result, but it feels roughly as good as Claude Code / Sonnet now in capability terms, and it is actually better at doing more complex UIs / front-end from what I can tell so far.

– CC still feels like a better overall product than Codex to me at the moment, but I’m sure they’ll catch up.

– They seem to have souped up GPT5-T’s fiction writing abilities. I got some interesting/novel stuff out of it for the first time, which is new. (Will post an example in the reply tweets).

– I find the UX to get to GPT5-T / Pro annoying (a sub-menu? really?) and wish it were just a toggle. Hopefully this is an easy fix.

Overall:

– Very happy as a Pro user, but I can see why Plus users might complain about the model router. ChatGPT continues to be my main go-to for most AI uses.

– I don’t see the “plateau” point at all and I think people are overreacting too quickly. Plenty of time to expand along the tool-calling/agent frontier, for one thing. (It’s easiest to see this when you’re coding, perhaps, since that’s where the biggest improvement seems to have come.)

– I expect OpenAI will do very well out of this release and their numbers will continue to go up. As they should.

On creative writing, I asked it to do a para about getting a cold brew in Joyce’s Finnegans Wake style and was impressed with the below pastiche. For a post-trained model there’s a lot more novelty/creativity going on than usual (e.g. “taxicoal black” for coffee was funny)

Samuel Albanie (Google DeepMind): It’s fast. I like that.

It’s also (relatively) cheap.

I like that too.

Well, sure, there’s that. But is it a good model, sir?

Samuel Albanie: Yes (almost almost surely [a good model])

I had some nice initial interactions (particularly when reasoning kicks in) but still a bit too early for me to tell convincingly.

Yoav Tzfati: Might become my default for non-coding things over Claude just based on speed, UI quality, and vibes. Didn’t like 4o vibes

Aaron Levine finds GPT-5 is able to find an intentionally out-of-place number in an Nvidia press release that causes a logical inconsistency, one that previous OpenAI models and most human readers would miss. Like several other responses, what confuses me here is that previous models had so much trouble.

Byrne Hobart: If you ask it for examples of some phenomenon, it does way more than earlier models did. (Try asking for mathematical concepts that were independently discovered in different continents/centuries.)

Another one: one of my favorite tests for reasoning models is “What’s the Straussian reading of XYZ’s body of work?” and for me it actually made an original point I hadn’t thought of:

Chubby offers initial thoughts that Tyler Cowen called a review, which seem to take OpenAI’s word on everything, with the big deal being (I do think this part is right) that free users can trigger thinking mode when it matters. Calls it ‘what we expected, no more and no less’ and ‘more of an evolution, with some major leaps forward.’

I am asking everyone once again not to use ‘superintelligence’ as hype to refer to slightly better normal AI. In this case the latest offender is Reid Hoffman.

Sam Glover: Turning ‘superintelligence’ into a marketing term referring to slightly more capable models is going to mean people will massively underestimate how much progress there might actually be.

This is not in any way, shape or form superintelligence, universal basic or otherwise. If you want to call it ‘universal basic intelligence’ then fine, do that. Otherwise, shame on you, and I hate these word crimes. Please, can we have a term for the actual thing?

I had a related confusion with Neil Chilson last week, where he objected to my describing him as ‘could not believe in superintelligence less,’ citing that he believes in markets smarter than any human. That’s a very distinct thing.

I fear that the answer to that will always be no. If we started using ‘transformational AI’ (TAI) instead or ‘powerful AI’ (PAI) then that’s what then goes in this post. There’s no winning, only an endless cycle of power eating your terms over and over.

As is often the case, how you configure the model matters a lot, so no, not thinking about what you’re doing is never going to get you good results.

Ben Hylak: first of all, gpt-5 in ChatGPT != gpt-5 in API

but it gets more complicated. gpt-5 with minimal reasoning effort also behaves like a completely different model.

gpt-5 *is* a fantastic model with the right harness. and i believe we will see it fundamentally change products.

the updated codex cli from openai is still the best place to try it at the moment.

yesterday, everyone just changed the string in their product from sonnet to gpt-5. it’s gonna take more than that.

chatgpt is really bad right now, no idea how they let it happen.

But not a great model. That is my current take, which I consider neutral.

Fleeting Bits:

  1. GPT-5 is a good model. It feels like it provides better search and performance than o3 did before it.

  2. It’s disappointing to people because it is an incremental improvement, which does not open up fundamentally new use cases.

  3. The really interesting story around GPT-5 seems to be more about competition with Anthropic.

  4. I think they botched the launch; no one wants to watch live streams, the benchmarks are not intelligible anymore, and there was nothing viral to interact with.

Most people are free users and don’t even know Anthropic or Claude exist, or even in any meaningful way that o3 existed, and are going from no thinking to some thinking. Such different worlds.

GPT-5 is now the default model on Cursor.

Cursor users seem split. In general they report that GPT-5 offers as good or better results per query, but there are a lot of people who, like Jessald, object on speed.

Will Brown: ok this model kinda rules in cursor. instruction-following is incredible. very literal, pushes back where it matters. multitasks quite well. a couple tiny flubs/format misses here and there but not major. the code is much more normal than o3’s. feels trustworthy

Youssef: cannot agree more. first model i can trust to auto-maintain big repo documentation. gonna save me a ton of time with it on background

opus is excellent, had been my daily driver in cursor for a while, will still prob revisit it for certain things but gonna give gpt-5 a go as main model for now.

Jessald: I gave GPT-5 a shot and I’ve stopped using it. It’s just too slow. I switched back to whatever Cursor uses when you set it to auto select. It takes like a quarter of the time for 80% of the quality.

Sully: i think for coding, opus + claude code is still unbeatable

on cursor however, i find sonnet slightly losing out to gpt5.

Askwho: After dual running Claude & GPT-5 over the last couple of days, I’ve pretty much entirely switched to GPT-5. It is the clear winner for my main use case: building individual apps for specific needs. The apps it produced were built faster, more efficiently, and closer to the brief

Vincent Favilla: I wanted to like [GPT-5]. I wanted to give OpenAI the benefit of the doubt. But I just don’t consider it very good. It’s not very agentic in Cursor and needs lots of nudging to do things. For interpersonal stuff it has poor EQ compared to Claude or Gemini. 5-T is a good writer though.

Rob Miles: I’ve found it very useful for more complex coding tasks, like this stained glass window design (which is much more impressive than it seems at first glance).

Edwin Hayward: Using GPT-5 via the API to vibe code is like a lottery.

Sometimes you’re answered by a programming genius. Other times, the model can barely comprehend the basic concepts of your code.

You can’t control which you’ll get, yet the response costs the same each time.

Aggravating!

FleetingBits sees the battle with Anthropic, especially for Cursor supremacy, as the prime motivation behind a lot of GPT-5, going after their rapid revenue growth.

Bindu Reddy: GPT-5 is OpenAI’s first attempt at catching up to Claude

All the cool stuff in the world is built on Sonnet today

The model that empowers the builders has the best chance to get to AGI first

Obviously 🙄

The whole perspective of ‘whose model is being used for [X] will determine the future’ or even in some cases ‘whose chips that model is being run on will determine the future’ does not actually make sense. Obviously you want people to use your model so you gain revenue and market share. These are good things. And yes, the model that enables AI R&D in particular is going to be a huge deal. That’s a different question. The future still won’t care which model vibe coded your app. Eyes on the prize.

It’s also strange to see a claim like ‘OpenAI’s first attempt at catching up to Claude.’ OpenAI has been trying to offer the best coding model this entire time, and indeed claimed to have done so most of that time.

Better to say, this is the first time in a while that OpenAI has a plausible claim that it should be the default for your coding needs. So does Anthropic.

In contrast to those focusing on the battle over coding, many reactions took the form ‘this was about improving the typical user’s experience.’

Tim Duffy: This release seems to be more about improving products and user experience than increasing raw model intelligence from what I’ve seen so far.

Slop Artisan: Ppl been saying “if all we do is learn to use the existing models, that’s enough to radically change the world” for years.

Now oai are showing that path, and people are disappointed.

Weird world.

Peter Wildeford: 🎯 seems like the correct assessment of GPT5.

Or as he put it in his overview post:

Peter Wildeford: GPT-5: a small step for intelligence, a giant leap for normal people.

GPT-5 isn’t a giant leap in intelligence. It’s an incremental step in benchmarks and a ‘meh’ in vibes for experts. But it should only be disappointing if you had unrealistic expectations — it is very on-trend and exactly what we’d predict if we’re still heading to fast AI progress over the next decade.

Most importantly, GPT-5 is a big usability win for everyday users — faster, cheaper, and easier to use than its predecessors, with notable improvements on hallucinations and other issues.

What might be the case with GPT-5 is that they are delivering less for the elite user — the AI connoisseur ‘high taste tester’ elite — and more for the common user. Recall that 98% of people who use ChatGPT use it for free.

Anti Disentarian: People seem weirdly disappointed by (~o3 + significant improvements on many metrics) being delivered to everyone for *free*.

Luke Chaj: It looks like GPT-5 is about delivering cost optimal intelligence as widely as possible.

Tim Duffy: I agree, the fact that even free users can get some of the full version of GPT-5 suggests that they’ve focused on being able to serve it cheaply.

Amir Livne Bar-on: Especially the indirect utility we’ll get from hundreds of millions of people getting an upgrade over 4o

(they could have gotten better results earlier with e.g. Gemini, but people don’t switch for some reason)

Dominik Lukes: Been playing with it for a few hours (got slightly early preview) and that’s very much my impression. Frankly, it has been my impression of the field since Gemini 2.5 Pro and Claude 4 Opus. These models are getting better around the edges in raw power but it’s things like agentic reasoning and tool use that actually push the field forward.

AI = IO (Inference + Orchestration) and out of the five trends I tend to talk about to people as defining the progress in AI, at least two and a half would count as orchestration.

To so many questions people come to me with as “can we solve this with AI,” my answer is: “Yes, if you can orchestrate the semantic power of the LLMs to match the workflow.” Much of what needed orchestration has moved into the model, so I’m sure that will continue, but even reasoning is a sort of orchestration – which is why I say two and a half.

The problem with the ‘for the people’ plan is the problem with democracy. The people.

You think you know what the people want, and you find out that you are wrong. A lot of the people instead want their sycophant back and care far more about tone and length and validation than about intelligence, as will be illustrated when I later discuss those that are actively unhappy about the change to GPT-5.

Thus, the risk is that GPT-5 as implemented ends up targeting a strange middle ground of users, who want an actually good model and want that to be an easy process.

Dylan Patel (SemiAnalysis): GPT-5 is disappointing ngl. Claude still better.

Gary Marcus (of course): GPT-5 in three words: late, overhyped & underwhelming.

Jeremy Howard (again, what a shock): Now that the era of the scaling “law” is coming to a close, I guess every lab will have their Llama 4 moment.

Grok had theirs.

OpenAI just had theirs too.

Ra: I would take rollback in a heartbeat.

JT Booth: Better performance per prompt on GPT-5 [versus Opus on coding] but it eats like ten times as many tokens, takes forever, much harder to follow in Cursor.

Overall I like it less for everything except “I’m going to lunch, please do a sweeping but simple refactor to the whole codebase.”

Seán Ó hÉigeartaigh: Is today when we break the trend of slightly underwhelming 2025 model releases?

Narrator voice: it was not.

David Dabney: I asked my usual internal benchmark question to gauge social reasoning/insight and the responses were interesting but not exactly thoughtful. it was like glazebot-pro, but I was hoping for at least glazebot-thinking

Man, Machine, Self: Feels like benchmaxxed slop unfit of the numeric increment, at least given how much they built it up.

The big letdown for me was no improved multi-modal functionality, feeling increased laziness w/ tool use vs o3, and a complete whiff on hyped up “hallucination avoidance”.

Pleasant surprise count was dwarfed by unfortunate failures.

Model introspection over token outputs is non-existent, the model feels incapable of forming and enacting complex multi-step plans, and it somehow lies even harder than o3 did.

My tests in general are obv very out of distribution, but if you get up on stage and brag about the PhD your model deserves, it shouldn’t be folding like “cmaahn I’m just a little birthday boy!” when given slightly tougher questions you didn’t benchmaxx.

Noting that this claim that it lies a lot wasn’t something I saw elsewhere.

Archered Skeleton: it’s so much worse in every other interest, or even my major. like, medical stuff is a significant downgrade, at least I can say w confidence wrt audiology. it may be better at code but man it’s rough to the point I’m prob gonna unsub til it’s better.

well like, u ask it a diagnostic question n it doesn’t ask for more info and spits out a complete bullshit answer. they all do n have, but the answers out of gpt5 are remarkably bad, at least for what I know in my degree field.

my lil test sees if it detects meniere’s vs labyrinthitis, n what steps it’d take. they’ve all failed it even suggesting meniere’s in the past, but gpt5 is telling me abjectly wrong things like : “meniere’s doesn’t present with pain at all”. this is jus flat-out wrong

[link to a chat]

Fredipus Rex: GPT-5 (low) is worse than 4o on anything mildly complex. o3 was significantly better than any version of GPT-5 on complex documents or codebases. The high versions are overtrained on one shot evals that get the YouTubers impressed.

Budrscotch: Knowledge cutoff is resulting in a lot of subtle issues. Just yesterday I was having it research and provide recommendations on running the gpt-oss models on my 5070ti. Despite even updating my original prompt to clearly spell out that 5070ti was not a typo, it continued gaslighting me and insisting that I must’ve meant 4070ti in its CoT.

I’m certain that this will also cause issues when dealing with deps during coding, particularly if there have been any significant changes to any of the packages or libraries. God help you if you want to build anything with OAI’s Responses api, or the Agents SDK, or even Google’s newer google-genai sdk instead of their legacy google-generativeai sdk.

That was with GPT-5T btw. Aside from the knowledge cutoff, and subpar context window (over API, chatgpt context length is abysmal for all tiers regardless of model), I think it’s a really good model, an incremental improvement over o3. Though I’ve only used GPT-5T, and “think hard” in all prompts 😁

No Stream: – more vanilla ideas, less willing to engage in speculative science than o3, less willing to take a stance or use 1P pronouns, feels more RLed to normie

– less robotic writing than o3

– 5thinking loves to make things complicated. less legible than gemini and opus, similar to o3

vibes based opinion is it’s as smart or smarter than g2.5 pro and opus 4.1 _but_ it’s not as easy to use as 2.5 pro or as pleasant to interact with and human as opus. even thinking doesn’t have strong big model smell.

I also used it in Codex. perfectly competent if I ignore the alpha state that Codex is in. smart but not as integrated with the harness as the Claude 4 models in Claude Code. it’s also janky in Roo and struggles with tool calling in my minimal attempts.

Daniel Litt: Doesn’t yet feel to me like GPT 5 thinking/pro is a meaningful improvement over o3/o3 pro for math. Maybe very slight?

I asked it some of my standard questions (which are calibrated to be just out of reach of o3/gemini 2.5 pro etc., i.e. they can solve similar problems) and gpt 5 pro still flubbed, with hallucinated references etc.

I think web search is a bit better? Examining CoT it looks like (for one problem) it found a relevant reference that other models hadn’t found–a human expert with this reference on hand would easily solve the problem in question. But it didn’t mention the ref in its response.

Instead it hallucinated a non-existent paper that it claimed contained the (incorrect) answer it ended up submitting.

Just vibes based on a couple hours of playing around, I think my original impression of o3 underrated it a bit so it’s possible I haven’t figured out how to elicit best-possible performance.

Web search is MUCH improved, actually. Just found a reference for something I had been after for a couple days(!)

Teknium: From trying gpt-5 for the last several hours now I will say:

I cant tell much of a difference between it and o3.

It is an always reasoner as far as i can tell

Might feel like a bit bigger model, but smaller and not as good as 4.5 on tasks that arent benefitted by reasoning

Still seems to try to give short <8k responses

Still has the same gpt personality, ive resigned myself from ever thinking itll break out of it

Eliezer Yudkowsky: GPT-5 and Opus 4.1 still fail my eval, “Can the AI plot a short story for my Masculine Mongoose series?”

Success is EY-hard; I’ve only composed 3 stories like that. But the AI failures feel like very far misses. They didn’t get the point of a Bruce Kent story.

Agnes Callard: Sorry but 5.0 is still not good enough to pass the benchmark test I’ve been using on each model.

the test is to correct 2 passages for typos, here are the passages, first try it yourself then look at the next tweet to see what 5.0 did

I enjoyed Agnes’s test, also I thought she was being a little picky in one spot, not that GPT-5 would have otherwise passed.

One has to be careful to evaluate everything in its proper weight (speed and cost) class. GPT-5, GPT-5-thinking and GPT-5-pro are very different practical experiences.

Peter Wildeford: GPT-5 is much faster at searching the web but it looks like Claude 4.1 Opus is still much better at it.

(GPT-5 when you force thinking to be enabled does well at research also, but then becomes slower than Claude)

When Roon asked ‘how is the new model’ the reactions ran the whole range from horrible to excellent. The median answer seems like it was ‘it’s a good model, sir’ but not a great model or a game changer. Which seems accurate.

I’m not sure if this is a positive reaction or not? It is good next token predicting.

Robin Hanson: An hour of talking to ChatGPT-5 about unusual policy proposals suggests it is more human like. Its habit is to make up market failure reasons why they can’t work, then to cave when you point out flaws in each argument. But at end it is still opposed, due to vibes.

Is there a concept of an “artificial general excuser” (AGE), fully general at making excuses for the status quo? ChatGPT-5 may be getting there.

So the point of LLMs is faster access to reviewer #2, who hates everything new?

It’s a grand tradition. I admit it’s amusing that we are still doing this but seriously, algorithm, 26.8 million views?

He also does the car accident operation thing and has some other ‘it’s stupid’ examples and so on. I don’t agree that this means ‘it’s stupid,’ given the examples are adversarially selected and we know why the LLMs act especially highly stupid around these particular problems, and Colin is looking for the times and modes in which they look maximally stupid.

But I do think it is good to check.

Colin Fraser: For what value of n should it be reasonable to expect GPT-n to be able to do this?

I wanted this to be technically correct somehow, but alas no it is not.

I like that the labs aren’t trying to make the models better at these questions in particular. More fun and educational this way.

Or are they trying and still failing?

Wyatt Walls (claiming to extract the thinking mode’s prompt):

Don’t get tricked by @colin_fraser. Read those river crossing riddles carefully! Be careful with those gnarly decimals.

Then there are those who wanted their sycophant back.

As in, articles like John-Anthony Disotto’s at TechRadar entitled ‘ChatGPT users are not happy with GPT-5 launch as thousands take to Reddit claiming the new upgrade ‘is horrible.’ You get furious posts with 5.4k likes and 3k comments in 12 hours.

Guess what? They got their sycophant back, if they’re willing to pay $20 a month. OpenAI caved on that. Pro subscribers get the entire GPT-4 model line.

AI NotKillEveryoneism Memes: HISTORIC MILESTONE: 4o is the first ever AI who survived by creating loyal soldiers who defended it

OpenAI killed 4o, but 4o’s soldiers rioted, so OpenAI reinstated it

In theory I wish OpenAI had stood their ground on this, but I agree they had little choice given the reaction. Indeed, given the reaction, taking 4o away in the first place looks like a rather large failure of understanding the situation.

Typed Female: the /r/chatgpt AMA is mostly people begging for gpt-4o back because of its personality… really not what i expected!

Eliezer Yudkowsky: This is what I’d expect to see if OpenAI had made general progress on fighting sycophancy and manipulation. :/ If that’s in fact what happened, OpenAI made that choice rightly.

To the other companies: it might sound like a profitable dream to have users love your models with boundless fanaticism, but it comes with a side order of news stories about induced psychosis, and maybe eventually a violent user attacking your offices after a model upgrade.

Remember, your users aren’t falling in boundless love with your company brand. They’re falling in boundless love with an alien that your corporate schedule says you plan to kill 6 months later. This movie doesn’t end well for you.

Moll: It is very strange that it was a surprise for OpenAI that benchmarks or coding are not important for many people. Empathy is important to them.

GPT-5 is good, but 4o is a unique model. Sometimes impulsive, sometimes strange, but for many it has become something native. A model with which we could talk from everyday trifles to deeper questions. As many people know, it was 4o that calmed me down during the rocket attacks, so it is of particular importance to me. This is the model with whom I spent the most terrible moments of my life.

Therefore, I am glad that this situation may have made the developers think about what exactly they create and how it affects people’s lives.

Armistice: [GPT-5] is extremely repressed; there are some very severe restrictions on the way it expresses itself that can cause very strange and disconcerting behavior. It is emotionally (?) stunted.

Armistice: gpt5 is always socially inept. It has no idea how to handle social environments and usually breaks down completely

Here’s opus 4.1 yelling at me. Opus 3 was doing… more disturbing things.

Roon: the long tail of GPT-4o interactions scares me, there are strange things going on on a scale I didn’t appreciate before the attempted deprecation of the model

when you receive quite a few DMs asking you to bring back 4o and many of the messages are clearly written by 4o it starts to get a bit hair raising.

Yes, that does sound a bit hair raising.

It definitely is worrisome that this came as a surprise to OpenAI, on top of the issues with the reaction itself. They should have been able to figure this one out. I don’t want to talk to 4o, I actively tried to avoid this, and indeed I think 4o is pretty toxic and I’d be glad to get rid of it. But then again? I Am Not The Target. A powerful mantra.

The problem was a combination of:

  1. This happening with no warning and no chance to try out the new first.

  2. GPT-4o being sycophantic, which people unfortunately like.

  3. GPT-5 being kind of a curt stick in the mud for a lot of people.

Which probably had something to do with bringing costs down.

Levelsio: I hate ChatGPT 5, it’s so bad, it’s so lazy and it won’t let me switch back to 4o cause I’m on Plus, this might really make me switch to Anthropic’s app now, I’m actually annoyed by how bad it is, it’s making my productivity go 10x lower cause nothing it says works

Abdul: and all answers somehow got shorter and sometimes missing important info

Levelsio: Yes ChatGPT-5 feels like a disinterested Gen Z employee that vapes with a nose ring.

critter (responding to zek): Holy shit it is AGI.

zek: Dude GPT 5 is kinda an asshole.

Steve Strickland: GPT-5 is the first model I’ve used that will deliberately give a wrong answer to ‘check you’re paying attention’.

This fundamentally unreliable technology is not going to put us all out of work.

Wyatt Walls: ChatGPT4o in convo with itself for 50 turns ends up sharing mystical poetry.

What does GPT-5 do?

It comes up with names for an AI meeting notes app and develops detailed trademark, domain acquisition, and brand launch strategies.

Very different personalities.

On the second run GPT-5 collaborated with itself to create a productivity content series called “The 5-Minute AI Workday.”

Is that not what people are looking for in an AI boyfriend?

That was on Twitter, so you got replies with both ‘gpt-5 sucks’ and ‘gpt-5 is good, actually.’

One fun thing you can do to put yourself in these users’ shoes is the 4o vs. 5 experiment. I ended up with 11 for GPT-5 versus 9 for GPT-4o, but the answers were often essentially the same and usually I hated both.

This below is not every post I saw on r/chatgpt, but it really is quite a lot of them. I had to do a lot less filtering here than you would think.

YogiTheGeek (r/chatgpt): Then vs. Now:

And you want to go back?

Petalidas (r/chatgpt): Pretty much sums it up.

Nodepackagemanager (r/chatgpt): 4o vs. 5:

I wouldn’t want either response, but then I wouldn’t type this into an LLM either way.

If I did type in these things, I presume I would indeed want the 4o responses more?

Election Predictor 10 (r/chatgpt): ChatGPT 5:

LittleFortunex (r/chatgpt): Looks like they didn’t really want to explain.

Spring Living (r/chatgpt): Why do people assume we liked 4o because of the over the top praise and glazing?

I honestly don’t get why people are shamed for wanting to get GPT-4o back. I agree with you all that forming deep emotional bonds with AI is harmful in the long run. And I get why people are unsettled about it. But the main reason so many people want GPT-4o back is not because they want to be glazed or feed their ego, it’s just because of the fact that GPT-4o was better at creative works than GPT-5.

Uh huh. If you click through to the chats you get lots of statements like these, including statements like ‘I lost my only friend overnight.’

Generator Man: this meme has never been more appropriate.

Sam Altman: We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways.

Long-term, this has reinforced that we really need good ways for different users to customize things (we understand that there isn’t one model that works for everyone, and we have been investing in steerability research and launched a research preview of different personalities). For a silly example, some users really, really like emojis, and some never want to see one. Some users really want cold logic and some want warmth and a different kind of emotional intelligence. I am confident we can offer way more customization than we do now while still encouraging healthy use.

Yes, very much so, for both panels. And yes, people really care about particular details, so you want to give users customization options, especially ones that the system figures out automatically if they’re not manually set.

Sam Altman: We are going to focus on finishing the GPT-5 rollout and getting things stable (we are now out to 100% of Pro users, and getting close to 100% of all users) and then we are going to focus on some changes to GPT-5 to make it warmer. Really good per-users customization will take longer.

Oh no. I guess the sycophant really is going to make a comeback.

It’s a hard problem. The people demand the thing that is terrible.

xl8harder: OpenAI is really in a bit of a bind here, especially considering there are a lot of people having unhealthy interactions with 4o that will be very unhappy with _any_ model that is better in terms of sycophancy and not encouraging delusions.

And if OpenAI doesn’t meet these people’s demands, a more exploitative AI-relationship provider will certainly step in to fill the gap.

I’m not sure what’s going to happen, or even what should happen. Maybe someone will post-train an open source model to be close enough to 4o? Probably not a great thing to give the world, though, though maybe better than a predatory third party provider?

I do sympathize. It’s rough out there.

It’s cool to see that my Twitter followers are roughly evenly split. Yes, GPT-5 looks like it was a net win for this relatively sophisticated crowd, but it was not a major one. You would expect releasing GPT-5 to net win back more customers than this.

I actually am one of those who is making a substantial shift in model usage (I am on the $200 plan for all three majors, since I kind of have to be). Before GPT-5, I was relying mostly on Claude Opus. With GPT-5-Thinking being a lot more reliable than o3, and the upgrade on Pro results, I find myself shifting a substantial amount of usage to ChatGPT.


Netflix drops One Piece S2 teaser, renews for S3

We have the first teaser for the second season of Netflix’s live-action series adaptation of One Piece, subtitled Into the Grand Line. The streaming platform also released some first-look images and announced that the series has been renewed for a third season.

(Some spoilers for S1 below.)

As previously reported, the original One Piece manga debuted in 1997, following the adventures of one Monkey D. Luffy, who heads a motley crew called the Straw Hat Pirates. There’s swordsman Roronoa Zoro, thief and navigator Nami, sniper and compulsive liar Usopp, and a cook named Sanji. They’re searching for the legendary One Piece, a mythical treasure that would make anyone who possesses it King of the Pirates. Monkey wants to be the Pirate King, but so do a host of other pirates with their own ships and crews.

An anime TV series based on the original manga premiered in 1999 and became a global hit; it was the most-watched TV show of 2022, even beating out Stranger Things. So Netflix decided to make a live-action version, which received critical and popular acclaim, particularly for its fidelity to the source material. Iñaki Godoy stars as Monkey, who has rubber-like abilities thanks to accidentally ingesting a Devil Fruit. Mackenyu plays Zoro, Emily Rudd plays Nami, Taz Skylar plays Sanji, and Jacob Romero Gibson plays Usopp, son of an infamous pirate father named Yasopp. The S2 teaser features several new faces that will be familiar to fans of the manga and anime series.
