Author name: Shannon Garcia

threat-posed-by-new-vmware-hyperjacking-vulnerabilities-is-hard-to-overstate

Threat posed by new VMware hyperjacking vulnerabilities is hard to overstate

Three critical vulnerabilities in multiple virtual-machine products from VMware can give hackers unusually broad access to some of the most sensitive environments inside multiple customers’ networks, the company and outside researchers warned Tuesday.

The class of attack made possible by exploiting the vulnerabilities is known under several names, including hyperjacking, hypervisor attack, or virtual machine escape. Virtual machines often run inside hosting environments to prevent one customer from being able to access or control the resources of other customers. By breaking out of one customer’s isolated VM environment, a threat actor could take control of the hypervisor that apportions each VM. From there, the attacker could access the VMs of multiple customers, who often use these carefully controlled environments to host their internal networks.

All bets off

“If you can escape to the hypervisor you can access every system,” security researcher Kevin Beaumont said on Mastodon. “If you can escape to the hypervisor, all bets are off as a boundary is broken.” He added: “With this vuln you’d be able to use it to traverse VMware managed hosting providers, private clouds orgs have built on prem etc.”

VMware warned Tuesday that it has evidence suggesting the vulnerabilities are already under active exploitation in the wild. The company didn’t elaborate. Beaumont said the vulnerabilities affect “every supported (and unsupported)” version in VMware’s ESXi, Workstation, Fusion, Cloud Foundation, and Telco Cloud Platform product lines.

Threat posed by new VMware hyperjacking vulnerabilities is hard to overstate Read More »

butch-wilmore-says-elon-musk-is-“absolutely-factual”-on-dragon’s-delayed-return

Butch Wilmore says Elon Musk is “absolutely factual” on Dragon’s delayed return

For what it is worth, all of the reporting done by Ars over the last nine months suggests the decision to return Wilmore and Williams this spring was driven by technical reasons and NASA’s needs on board the International Space Station, rather than because of politics.

Q. How do you feel about waking up and finding yourself in a political storm?

Wilmore: I can tell you at the outset, all of us have the utmost respect for Mr. Musk, and obviously, respect and admiration for our president of the United States, Donald Trump. We appreciate them. We appreciate all that they do for us, for human space flight, for our nation. The words they said, politics, I mean, that’s part of life. We understand that. And there’s an important reason why we have a political system, a political system that we do have, and we’re behind it 100 percent. We know what we’ve lived up here, the ins and outs, and the specifics that they may not be privy to. And I’m sure that they have some issues that they are dealing with, information that they have, that we are not privy to. So when I think about your question, that’s part of life, we are on board with it.

Q. Did politics influence NASA’s decision for you to stay longer in space?

Wilmore: From my standpoint, politics is not playing into this at all. From our standpoint, I think that they would agree, we came up prepared to stay long, even though we plan to stay short. That’s what we do in human spaceflight. That’s what your nation’s human space flight program is all about, planning for unknown, unexpected contingencies. And we did that, and that’s why we flowed right into Crew 9, into Expedition 72 as we did. And it was somewhat of a seamless transition, because we had planned ahead for it, and we were prepared.

Butch Wilmore says Elon Musk is “absolutely factual” on Dragon’s delayed return Read More »

tsmc-to-invest-$100b-as-trump-demands-more-us-made-chips,-report-says

TSMC to invest $100B as Trump demands more US-made chips, report says

Currently, TSMC only builds its most advanced chips in Taiwan. But when the most advanced US fabs are operational, they’ll be prepared to manufacture “tens of millions of leading-edge chips” to “power products like 5G/6G smartphones, autonomous vehicles, and AI datacenter servers,” the Commerce Department said in 2024.

TSMC has not confirmed the WSJ’s report but provided a statement: “We’re pleased to have an opportunity to meet with the President and look forward to discussing our shared vision for innovation and growth in the semiconductor industry, as well as exploring ways to bolster the technology sector along with our customers.”

Trump threat of semiconductor tariffs still looms

Advanced chips are regarded as critical for AI innovation, which Trump has prioritized, as well as for national security.

Without a steady supply, the US risks substantial technological and economic losses as well as potential weakening of its military.

To avert that, Trump campaigned on imposing tariffs that he claimed would drive more semiconductor manufacturing into the US, while criticizing the CHIPS Act for costing the US billions. Following through on that promise, in February, he threatened a “25 percent or more tariff” on all semiconductor imports, the WSJ reported. According to CNBC, Trump suggested those tariffs could be in effect by April 2.

“We have to have chips made in this country,” Trump said last month. “Right now, everything is made in Taiwan, practically, almost all of it, a little bit in South Korea, but everything—almost all of it is made in Taiwan. And we want it to be made—we want those companies to come to our country, in all due respect.”

While it’s unclear if Trump plans to overtly kill the CHIPS Act, his government funding cuts could trigger a future where the CHIPS Act dies with no workers left to certify that companies meet requirements for ongoing award disbursements, a semiconductor industry consultant group, Semiconductor Advisors, warned in a statement last month.

“If I were running a chip company, I would not count on CHIPS Act funding, even if I had a signed contract,” SA’s statement said.

TSMC to invest $100B as Trump demands more US-made chips, report says Read More »

driving-an-ev-restomod-that-costs-as-much-as-a-house—the-jia-chieftain

Driving an EV restomod that costs as much as a house—the JIA Chieftain

The Chieftain Range Rover is a fascinating thing—a refitted, reskinned, restored classic Range Rover is no new thing, nor is one with a ludicrous American V8 stuffed under the hood. But one that can be had as a gas car, plug-in hybrid, or as an EV? It can be all of those things depending on which boxes you tick. Ars Technica went for a spin in the EV to see how it stacks up.

The UK is something of an EV restomod hub. It’s been throwing electricity in things that didn’t come off the line electrified in the first place for years. Businesses like Electrogenic, Lunaz, and Everrati will, for a price, make an old car feel a little more peppy—depending on who you go to, it’ll come back restored as well. The Chieftain isn’t quite like them. Developed by Oxfordshire, UK, based Jensen International Automotive (the company’s bread ‘n butter is Jensen Interceptors), the Chieftain is an old Range Rover turned up to VERY LOUD. Or, actually, not loud at all.

Of course, these things come at a cost. A Chieftain EV Range Rover conversion, today, will set you back at least $568,000 should you choose to order one. This one was a private commission, and at that price there won’t be any built on spec on the off chance someone wants to buy one “off the peg.” By any stretch of the imagination it is a huge amount for an old car, but they’re custom-built from start to finish.

The Range Rover has aged well. Alex Goy

Yours will be made to your specification, have CarPlay/Android Auto, and the sort of mod cons one would expect in the 2020s. Under its perfectly painted shell—the color is your choice, of course—lives a 120 kWh battery. It’s made of packs mounted under the hood and in the rear, firing power to all four wheels via three motors: one at the front, and two at the rear. The tri-motor setup can theoretically produce around 650 hp (485 kW), but it’s paired back to a smidge over 405 hp (302 kW), so it doesn’t eat its tires on a spirited launch. There’s a 60: 40 rear-to-front torque split to keep things exciting if that’s your jam. Air suspension keeps occupants comfortable and insulated from the world around them.

Driving an EV restomod that costs as much as a house—the JIA Chieftain Read More »

driving-the-new-mercedes-cla-made-me-a-believer-in-mercedes-benz’s-ev-future

Driving the new Mercedes CLA made me a believer in Mercedes-Benz’s EV future


And if it doesn’t, they’ve got a hybrid version, too.

A camouflaged EV prototype in the snow

It’s not quite ready to be seen uncamouflaged, but Mercedes-Benz was ready to let us drive the new CLA. Credit: Tim Stevens

It’s not quite ready to be seen uncamouflaged, but Mercedes-Benz was ready to let us drive the new CLA. Credit: Tim Stevens

Mercedes-Benz’s EV efforts aren’t exactly burning up the sales charts. Models like the EQS and EQE haven’t convinced the brand’s demanding clientele that batteries are the future, forcing the company to scale back its electric ambitions.

Scale back, but not abandon. Benz is about to launch a new generation of EVs relying on technology derived from the epically efficient EQXX. The first is the new CLA. It’s coming soon, and after getting some time behind the wheel of a prototype vehicle undergoing final testing in the snowy wilds of Sweden, I’m convinced this could be the car to change Mercedes’ electrified fortunes.

And, for anyone who isn’t convinced, there’ll be a hybrid version too.

The EV is definitely the focus, though, and it tackles many of the most significant sticking points for would-be buyers of the company’s current electric offerings. First among those points is the styling. The somewhat anonymous shapes of the EQS and EQE have not earned those machines many fans outside of obsessive aerodynamicists. While the CLA I drove was unfortunately clad beneath some eye-warping camouflage, it seems to share enough lines with the Concept CLA Class that I’m already convinced it’ll be a looker.

The second concern with many of Benz’s current EVs is cost. Yes, you can get an EQB in the mid-$50,000 range, but that’s based on the older GLB. The least expensive of the company’s current EV range is the EQE sedan, with a mid-$70,000 starting price. That puts it well out of reach for many avid EV enthusiasts.

The front half. of a Mercedes-Benz CLA prototype

The CLA will have Mercedes’ first entirely in-house EV powertrain, and it’s far more efficient than the ones its currently offering. Credit: Tim Stevens

The current, gas-powered CLA starts in the mid-$40,000 range. Mercedes isn’t saying how much this new one will cost, but while the EV version will presumably be more, it should come in well beneath the EQE.

Next is the driving dynamic, which is really what brought me to Sweden. Both the EQS and EQE are fine cars, comfortable and calm with plenty of torque and power to be fun. However, they’re simply not the most engaging of machines. Can the CLA do better?

First impressions are definitely a yes. My driving was performed in the low-grip, wintery environment of northern Sweden, making it a little difficult to tell exactly how the car will feel when pushed in a more temperate world. But lowering the level of adhesion also lets you get an immediate sense of how well-balanced a machine is, and the CLA feels very well-balanced indeed.

When pushed beyond the limit of adhesion, it did have a tendency to understeer, but it didn’t take much provocation to bring the rear end around. Even with the stability control on, the 4matic-equipped car I drove was happy to swing out the rear as I danced from one corner to the next. When cruising at more relaxed speeds, the car soaked up the decidedly rough road surfaces extremely well for a car with such petite dimensions.

Most impressive was how well it handled the limited grip. One of the prime advantages of electrification is how quickly and smoothly stability and traction control can react to a loss of grip. The CLA didn’t immediately cut all power when it detected wheelspin, it quickly and automatically raised or lowered output to match the available grip.

The back half of a Mercedes CLA prototype

There will also be a hybrid version of the CLA for those who aren’t ready for a full EV. Credit: Tim Stevens

Power delivery, then, wasn’t all-or-nothing, and when it gave all it was plenty. The electric CLA felt comparably quick to the 402-hp EQE 500 4matic. The CLA 4matic makes similar power: 268 hp (200 kW) from the rear motor and 107 hp (80 kW) from the front. It gets off the line quickly, with the two-speed transmission on the rear axle ensuring that motor was still pulling strongly as I approached 100 mph (160 km/h).

Things were even more interesting when I needed to slow down. The CLA will be the debut of a new, unified braking system that effectively decouples the brake pedal from the actual physical action of the brakes. It’s not quite a full brake-by-wire system as there’s still a mechanical linkage there as a sort of fall-back, but in normal operation, the sensation you get from the brake pedal is entirely artificial, created by springs and bushings, not hydraulics.

There’s no feedback here, no haptics or adjustable resistance to signal what the brakes are doing. Indeed, the only indication that I’d triggered ABS on hard stops was the chatting noise coming from the wheels. In exchange, you get a consistent, solid brake feel, with the car mixing regenerative braking and physical braking as needed to deliver clean, consistent stops.

It’ll take more extensive testing to know how well the system handles something like a summer track day, but I can say that in my testing I got the brakes hot enough to be quite pungent, yet the car still stopped cleanly and predictably.

When it comes to one-pedal driving, the CLA offers a “D-” mode that will bring the car to a complete stop, but the outright deceleration rate after lifting off the accelerator is nowhere near as strong as something like a Tesla on Standard mode. That’s in addition to two lighter regen modes, plus “D Auto,” which varies regen based on surrounding traffic and terrain, just like the company’s current EVs.

A mercedes-benz CLA prototype seen head-on

The CLA was well-balanced on the ice. Credit: Tim Stevens

The CLA is also designed to address any concerns about efficiency with a number of improvements. That includes a new heat pump that can scavenge waste energy from the motor, the battery pack, and the ambient air. It’s said to heat the cabin twice as quickly with half the power consumption of the old heat pump.

There’s also a revised motor design, utilizing permanent magnets on both the front and rear axle. The system relies on a decoupling system to reduce drag on the front axle when it’s not needed, as on the EQE SUV, but the engagement is so quick and seamless that I never noticed.

The battery pack has also been revised, with a new chemistry that Mercedes says boosts overall energy density by 20 percent while also enabling a “significant reduction” in the use of cobalt.

The net result is a machine that promises to go 5.2 miles/kWh (11.9 kWh/100 km) and offers 466 miles (750 km) of range from the 85 kWh usable capacity “premium” battery pack. That’s on the European WLTP cycle, so on the American EPA cycle we can probably expect something closer to 400 miles (644 km). That still compares very favorably to the 308 miles (497 km) the current EQE can manage from its 96 kWh battery pack.

And, when you run out of juice, the new CLA’s 800-volt architecture enables charging rates of up to 320 kW. That theoretically means 186 miles (300 km) of charge in just 10 minutes.

The back of a Mercedes-Benz prototype in the snow

Battery energy density is up, and there’s a more efficient heat pump for the cabin. Credit: Tim Stevens

So, then, the promise is for a better-looking, better-driving, more-affordable, longer-range, and quicker-charging EV. That sounds like a winning bet, but Mercedes still has a hedge in. I didn’t just drive the electric CLA up in Sweden. I also got a go in the 48-volt hybrid version.

Yes, there’s a new CLA for you even if you’re still not on board with the EV revolution. This one’s built around a 1.5 L four-cylinder engine paired with an electric motor that’s integrated with an eight-speed, dual-clutch transmission. Engine output is rated at 188 hp (140 kW), plus an additional 27 hp (20 kW) from the electric motor.

That’s enough to drive the car up to 62 mph (100 kph) without spinning up the gasoline engine, but with only 1.3 kWh of battery at its disposal, you won’t go far without combustion. Mercedes doesn’t even quote an all-electric range. The engine comes on early and often.

In fact, during my time behind the wheel, I couldn’t get the engine to turn off. The engineers blamed the below-freezing temperatures. So, I can’t say just how sprightly the car will be without internal combustion. With that four-cylinder humming, the car was reasonably sprightly, the transmission slipping smoothly through the gears. Outright power is definitely on the limited side, though. Anyone who cares about acceleration should go for the other CLA, the one with the bigger battery.

The front of a Mercedes-Benz CLA prototype in the snow.

Mercedes-Benz may well have a winner here with the new CLA.

I got a good look at the interior of the two cars, but sadly, I’m not allowed to talk about that yet. Suffice it to say it includes some tasteful and practical changes that should be well-received. More on that to come.

Will the new CLA change Mercedes-Benz’s BEV fortunes? Initial impressions are indeed very good. If it looks half as good as that concept, delivers on the range promise, and is priced right, it should be a winner.

We won’t have long to wait to find out how it looks, but don’t expect an answer to the pricing question until closer to the car entering production later this year. Regardless, it’s great to see all the testing in the EQXX finally bearing fruit. At first bite, it’s tasting sweet.

Driving the new Mercedes CLA made me a believer in Mercedes-Benz’s EV future Read More »

we’ve-figured-out-the-basics-of-a-shape-shifting,-t-1000-style-material

We’ve figured out the basics of a shape-shifting, T-1000-style material

Campàs and his colleagues decided to design cell-like robots that could do all those things.

T-1000 building blocks

Each robot had motorized gears around its perimeter that could interlock with gears on other robots. The gears allowed the robots to move within the collective without breaking their bonds with each other, just like cells do in a living organism.

Linking the robots was a job of magnets that could rotate to maintain adhesion regardless of their orientation. Each robot also had a photodetector that could sense the polarity of light, allowing basic commands to be sent using a simple flashlight with a polarization filter. “The switch between solid and liquid states was driven by fluctuations of the force the motors applied, and we encoded the intensity of those fluctuations in the intensity of light,” says Matthew Devlin, a researcher at the Department of Mechanical Engineering at the University of California Santa Barbara and lead author of the study.

In response to light signals, two robotic collectives, 20 robots total, could elongate toward each other, touch in the middle, and form a bridge that could hold a load of just under 5 kilograms. After forming a cube, they could support an adult human weighing around 70 kilograms. They could also flow around an object, assume a complementary shape, and stiffen up to act as a wrench. “This was the Terminator idea of shapeshifting. This was exactly what we had in mind,” Campàs claims.

The only problem was, the robots were a bit above 5 centimeters in diameter. To get robotic collectives closer to Terminator’s mimetic polyalloy, the team wants to make the robots smaller. Much smaller.

Terminator nanobots?

“The good news is, you don’t have to go down with scale to what you see in living systems,” Campàs says. “Cells are roughly 10 microns. But anything around 100 microns—even up to 1 millimeter—robots would already be really impressive.” Unfortunately, we are rather far from making machines that small.

We’ve figured out the basics of a shape-shifting, T-1000-style material Read More »

spacex-readies-a-redo-of-last-month’s-ill-fated-starship-test-flight

SpaceX readies a redo of last month’s ill-fated Starship test flight


The FAA has cleared SpaceX to launch Starship’s eighth test flight as soon as Monday.

Ship 34, destined to launch on the next Starship test flight, test-fired its engines in South Texas on February 12. Credit: SpaceX

SpaceX plans to launch the eighth full-scale test flight of its enormous Starship rocket as soon as Monday after receiving regulatory approval from the Federal Aviation Administration.

The test flight will be a repeat of what SpaceX hoped to achieve on the previous Starship launch in January, when the rocket broke apart and showered debris over the Atlantic Ocean and Turks and Caicos Islands. The accident prevented SpaceX from completing many of the flight’s goals, such as testing Starship’s satellite deployment mechanism and new types of heat shield material.

Those things are high on the to-do list for Flight 8, set to lift off at 5: 30 pm CST (6: 30 pm EST; 23: 30 UTC) Monday from SpaceX’s Starbase launch facility on the Texas Gulf Coast. Over the weekend, SpaceX plans to mount the rocket’s Starship upper stage atop the Super Heavy booster already in position on the launch pad.

The fully stacked rocket will tower 404 feet (123.1 meters) tall. Like the test flight on January 16, this launch will use a second-generation, Block 2, version of Starship with larger propellant tanks with 25 percent more volume than previous vehicle iterations. The payload compartment near the ship’s top is somewhat smaller than the payload bay on Block 1 Starships.

This block upgrade moves SpaceX closer to attempting more challenging things with Starship, such as returning the ship, or upper stage, back to the launch site from orbit. It will be caught with the launch tower at Starbase, just like SpaceX accomplished last year with the Super Heavy booster. Officials also want to bring Starship into service to launch Starlink Internet satellites and demonstrate in-orbit refueling, an enabling capability for future Starship flights to the Moon and Mars.

NASA has contracts with SpaceX worth more than $4 billion to develop a Starship spinoff as a human-rated Moon lander for the Artemis lunar program. The mega-rocket is central to Elon Musk’s ambition to create a human settlement on Mars.

Another shot at glory

Other changes introduced on Starship Version 2 include redesigned forward flaps, which are smaller and closer to the tip of the ship’s nose to better protect them from the scorching heat of reentry. Technicians also removed some of the ship’s thermal protection tiles to “stress-test vulnerable areas” of the vehicle during descent. SpaceX is experimenting with metallic tile designs, including one with active cooling, that might be less brittle than the ceramic tiles used elsewhere on the ship.

Engineers also installed rudimentary catch fittings on the ship to evaluate how they respond to the heat of reentry, when temperatures outside the vehicle climb to 2,600° Fahrenheit (1,430° Celsius). Read more about Starship Version in this previous story from Ars.

It will take about 1 hour and 6 minutes for Starship to fly from the launch pad in South Texas to a splashdown zone in the Indian Ocean northwest of Australia. The rocket’s Super Heavy booster will fire 33 methane-fueled Raptor engines for two-and-a-half minutes as it climbs east from the Texas coastline, then jettison from the Starship upper stage and reverse course to return to Starbase for another catch with mechanical arms on the launch tower.

Meanwhile, Starship will ignite six Raptor engines and accelerate to a speed just shy of orbital velocity, putting the ship on a trajectory to reenter the atmosphere after soaring about halfway around the world.

Booster 15 perched on the launch mount at Starbase, Texas. Credit: SpaceX

If you’ve watched the last few Starship flights, this profile probably sounds familiar. SpaceX achieved successful splashdowns after three Starship test flights last year, and hoped to do it again before the premature end of Flight 7 in January. Instead, the accident was the most significant technical setback for the Starship program since the first full-scale test flight in 2023, which damaged the launch pad before the rocket spun out of control in the upper atmosphere.

Now, SpaceX hopes to get back on track. At the end of last year, company officials said they targeted as many as 25 Starship flights in 2025. Two months in, SpaceX is about to launch its second Starship of the year.

The breakup of Starship last month prevented SpaceX from evaluating the performance of the ship’s Pez-like satellite deployer and upgraded heat shield. Engineers are eager to see how those perform on Monday’s flight. Once in space, the ship will release four simulators replicating the approximate size and mass of SpaceX’s next-generation Starlink Internet satellites. They will follow the same suborbital trajectory as Starship and reenter the atmosphere over the Indian Ocean.

That will be followed by a restart of a Raptor engine on Starship in space, repeating a feat first achieved on Flight 6 in November. Officials want to ensure Raptor engines can reignite reliably in space before actually launching Starship into a stable orbit, where the ship must burn an engine to guide itself back into the atmosphere for a controlled reentry. With another suborbital flight on tap Monday, the engine relight is purely a confidence-building demonstration and not critical for a safe return to Earth.

The flight plan for Starship’s next launch includes another attempt to catch the Super Heavy booster with the launch tower, a satellite deployment demonstration, and an important test of its heat shield. Credit: SpaceX

Then, about 47 minutes into the mission, Starship will plunge back into the atmosphere. If this flight is like the previous few, expect to see live high-definition video streaming back from Starship as super-heated plasma envelops the vehicle in a cloak of pink and orange. Finally, air resistance will slow the ship below the speed of sound, and just 20 seconds before reaching the ocean, the rocket will flip to a vertical orientation and reignite its Raptor engines again to brake for splashdown.

This is where SpaceX hopes Starship Version 2 will shine. Although three Starships have made it to the ocean intact, the scorching temperatures of reentry damaged parts of their heat shields and flaps. That won’t do for SpaceX’s vision of rapidly reusing Starship with minimal or no refurbishment. Heat shield repairs slowed down the turnaround time between NASA’s space shuttle missions, and officials hope the upgraded heat shield on Starship Version 2 will decrease the downtime.

FAA’s green light

The FAA confirmed Friday it issued a launch license earlier this week for Starship Flight 8.

“The FAA determined SpaceX met all safety, environmental and other licensing requirements for the suborbital test flight,” an FAA spokesperson said in a statement.

The federal regulator oversaw a SpaceX-led investigation into the failure of Flight 7. SpaceX said NASA, the National Transportation Safety Board, and the US Space Force also participated in the investigation, which determined that propellant leaks and fires in an aft compartment, or attic, of Starship led to the shutdown of its engines and eventual breakup.

Engineers concluded the leaks were most likely caused by a harmonic response several times stronger than predicted, suggesting the vibrations during the ship’s climb into space were in resonance with the vehicle’s natural frequency. This would have intensified the vibrations beyond the levels engineers expected from ground testing.

Earlier this month, SpaceX completed an extended-duration static fire of the next Starship upper stage to test hardware modifications at multiple engine thrust levels. According to SpaceX, findings from the static fire informed changes to the fuel feed lines to Starship’s Raptor engines, adjustments to propellant temperatures, and a new operating thrust for the next test flight.

“To address flammability potential in the attic section on Starship, additional vents and a new purge system utilizing gaseous nitrogen are being added to the current generation of ships to make the area more robust to propellant leakage,” SpaceX said. “Future upgrades to Starship will introduce the Raptor 3 engine, reducing the attic volume and eliminating the majority of joints that can leak into this volume.”

FAA officials were apparently satisfied with all of this. The agency’s commercial spaceflight division completed a “comprehensive safety review” and determined Starship can return to flight operations while the investigation into the Flight 7 failure remains open. This isn’t new. The FAA also used this safety determination to expedite SpaceX launch license approvals last year as officials investigated mishaps on Starship and Falcon 9 rocket flights.

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

SpaceX readies a redo of last month’s ill-fated Starship test flight Read More »

research-roundup:-7-cool-science-stories-from-february

Research roundup: 7 cool science stories from February


Dancing sea turtles, the discovery of an Egyptian pharaoh’s tomb, perfectly boiled eggs, and more.

X-ray image of the PHerc.172 scroll Credit: Vesuvius Challenge

It’s a regrettable reality that there is never time to cover all the interesting scientific stories we come across each month. In the past, we’ve featured year-end roundups of cool science stories we (almost) missed. This year, we’re experimenting with a monthly collection. February’s list includes dancing sea turtles, the secret to a perfectly boiled egg, the latest breakthrough in deciphering the Herculaneum scrolls, the discovery of an Egyptian pharaoh’s tomb, and more.

Dancing sea turtles

There is growing evidence that certain migratory animal species (turtles, birds, some species of fish) are able to exploit the Earth’s magnetic field for navigation, using it both as a compass to determine direction and as a kind of “map” to track their geographical position while migrating. A paper published in the journal Nature offers evidence of a possible mechanism for this unusual ability, at least in loggerhead sea turtles, who perform an energetic “dance” when they follow magnetic fields to a tasty snack.

Sea turtles make impressive 8,000-mile migrations across oceans and tend to return to the same feeding and nesting sites. The authors believe they achieve this through their ability to remember the magnetic signature of those areas and store them in a mental map. To test that hypothesis, the scientists placed juvenile sea turtles into two large tanks of water outfitted with large coils to create magnetic signatures at specific locations within the tanks. One tank features such a location that had food; the other had a similar location without food.

They found that the sea turtles in the first tank performed distinctive “dancing” moves when they arrived at the area associated with food: tilting their bodies, dog-paddling, spinning in place, or raising their head near or above the surface of the water. When they ran a second experiment using different radio frequencies, they found that the change interfered with the turtles’ internal compass, and they could not orient themselves while swimming. The authors concluded that this is compelling evidence that the sea turtles can distinguish between magnetic fields, possibly relying on complex chemical reactions, i.e., “magnetoreception.” The map sense, however, likely relies on a different mechanism.

Nature, 2025. DOI: 10.1038/s41586-024-08554-y  (About DOIs).

Long-lost tomb of Thutmose II

Archaeologists found a simple tomb near Luxor and identified it as the 3,500-year-old burial site of King Thutmose II.

Archaeologists found a simple tomb near Luxor and identified it as the 3,500-year-old burial site of King Thutmose II. Credit: Egypt’s Ministry of Tourism and Antiquities

Thutmose II was the fourth pharaoh of the Tutankhamun (18th) dynasty. He reigned only about 13 years and married his half-sister Hatshepsut (who went on to become the sixth pharaoh in the dynasty). Archaeologists have now confirmed that a tomb built underneath a waterfall in the mountains in Luxor and discovered in 2022 is the final resting place of Thutmose II. It’s the last of the 18th dynasty royal tombs to be found, more than a century after Tutankhamun’s tomb was found in 1922.

When it was first found, archaeologists thought the tomb might be that of a king’s wife, given its close proximity to Hatshepsut’s tomb and those of the wives of Thutmose III. But they found fragments of alabaster vases inscribed with Thutmose II’s name, along with scraps of religious burial texts and plaster fragments on the partially intact ceiling with traces of blue paint and yellow stars—typically only found in kings’ tombs. Something crucial was missing, however: the actual mummy and grave goods of Thutmose II.

It’s long been assumed that the king’s mummy was discovered in the 19th century at another site called Deir el-Bahari. But archaeologist Piers Litherland, who headed the British team that discovered the tomb, thinks that identification was in error. An inscription stated that Hatshepsut had the tomb’s contents relocated due to flooding. Litherland believes the pharaoh’s actual mummy is buried in a second tomb. Confirmation (or not) of his hypothesis won’t come until after archaeologists finish excavating what he thinks is the site of that second tomb, which is currently buried under multiple layers of rock and plaster.

Hidden images in Pollock paintings

“Troubled Queen” reveals a “hidden” figure, possibly a soldier. Credit: D.A. Morrissette et al., CNS Spectrums 2025

Physicists have long been fascinated by the drip paintings of “splatter master” Jackson Pollock, pondering the presence of fractal patterns (or lack thereof), as well as the presence of curls and coils in his work and whether the artist deliberately exploited a well-known fluid dynamics effect to achieve them—or deliberately avoided them. Now psychiatrists are getting into the game, arguing in a paper published in CNS Spectrums that Pollock—known to incorporate images into his early pre-drip paintings—also used many of the same images repeatedly in his later abstract drip paintings.

People have long claimed to see images in those drip paintings, but the phenomenon is usually dismissed by art critics as a trick of human perception, much like the fractal edges of Rorschach ink blots can fool the eye and mind. The authors of this latest paper analyzed Pollock’s early painting “Troubled Queen” and found multiple images incorporated into the painting, which they believe establishes a basis for their argument that Pollock also incorporated such images into his later drip painting, albeit possibly subconsciously.

“Seeing an image once in a drip painting could be random,” said co-author Stephen M. Stahl of the University of California, San Diego. “Seeing the same image twice in different paintings could be a coincidence. Seeing it three or more times—as is the case for booze bottles, monkeys and gorillas, elephants, and many other subjects and objects in Pollock’s paintings—makes those images very unlikely to be randomly provoked perceptions without any basis in reality.”

CNS Spectrums, 2025. DOI: 10.1017/S1092852924001470

Solving a fluid dynamics mystery

Soap opera in the maze: Geometry matters in Marangoni flows.

Every fall, the American Physical Society exhibits a Gallery of Fluid Motion, which recognizes the innate artistry of images and videos derived from fluid dynamics research. Several years ago, physicists at the University of California, Santa Barbara (UCSB) submitted an entry featuring a pool of red dye, propelled by a few drops of soap acting as a surfactant, that seemed to “know” how to solve a maze whose corridors were filled with milk. This is unusual since one would expect the dye to diffuse more uniformly. The team has now solved that puzzle, according to a paper published in Physical Review Letters.

The key factor is surface tension, specifically a phenomenon known as the Marangoni effect, which also drives the “coffee ring effect” and the “tears of wine” phenomenon. If you spread a thin film of water on your kitchen counter and place a single drop of alcohol in the center, you’ll see the water flow outward, away from the alcohol. The difference in their alcohol concentrations creates a surface tension gradient, driving the flow.

In the case of the UCSB experiment, the soap reduces local surface tension around the red dye to set the dye in motion. There are also already surfactants in the milk that work in combination with the soapy surfactant to “solve” the maze. The milk surfactants create varying points of resistance as the dye makes its way through the maze. A dead end or a small space will have more resistance, redirecting the dye toward routes with less resistance—and ultimately to the maze’s exit. “That means the added surfactant instantly knows the layout of the maze,” said co-author Paolo Luzzatto-Fegiz.

Physical Review Letters, 2025. DOI: 10.1073/pnas.1802831115

How to cook a perfectly boiled egg

Credit: YouTube/Epicurious

There’s more than one way to boil an egg, whether one likes it hard-boiled, soft-boiled, or somewhere in between. The challenge is that eggs have what physicists call a “two-phase” structure: The yolk cooks at 65° Celsius, while the white (albumen) cooks at 85° Celsius. This often results in overcooked yolks or undercooked whites when conventional methods are used. Physicists at the Italian National Research Council think they’ve cracked the case: The perfectly cooked egg is best achieved via a painstaking process called “periodic cooking,” according to a paper in the journal Communications Engineering.

They started with a few fluid dynamics simulations to develop a method and then tested that method in the laboratory. The process involves transferring a cooking egg every two minutes—for 32 minutes—between a pot of boiling water (100° Celsius) and a bowl of cold water (30° Celsius). They compared their periodically cooked eggs with traditionally prepared hard-boiled and soft-boiled eggs, as well as eggs prepared using sous vide. The periodically cooked eggs ended up with soft yolks (typical of sous vide eggs) and a solidified egg white with a consistency between sous vide and soft-boiled eggs. Chemical analysis showed the periodically cooked eggs also contained more healthy polyphenols. “Periodic cooking clearly stood out as the most advantageous cooking method in terms of egg nutritional content,” the authors concluded.

Communications Engineering, 2025. DOI: 10.1038/s44172-024-00334-w

More progress on deciphering Herculaneum scrolls

X-ray scans and AI reveal the inside of ancient scroll

X-ray scans and AI reveal the inside of an ancient scroll. Credit: Vesuvius Challenge

The Vesuvius Challenge is an ongoing project that employs “digital unwrapping” and crowd-sourced machine learning to decipher the first letters from previously unreadable ancient scrolls found in an ancient Roman villa at Herculaneum. The 660-plus scrolls stayed buried under volcanic mud until they were excavated in the 1700s from a single room that archaeologists believe held the personal working library of an Epicurean philosopher named Philodemus. The badly singed, rolled-up scrolls were so fragile that it was long believed they would never be readable, as even touching them could cause them to crumble.

In 2023, the Vesuvius Challenge made its first award for deciphering the first letters, and last year, the project awarded the grand prize of $700,000 for producing the first readable text. The latest breakthrough is the successful generation of the first X-ray image of the inside of a scroll (PHerc. 172) housed in Oxford University’s Bodleian Libraries—a collaboration with the Vesuvius Challenge. The scroll’s ink has a unique chemical composition, possibly containing lead, which means it shows up more clearly in X-ray scans than other Herculaneum scrolls that have been scanned.

The machine learning aspect of this latest breakthrough focused primarily on detecting the presence of ink, not deciphering the characters or text. Oxford scholars are currently working to interpret the text. The first word to be translated was the Greek word for “disgust,” which appears twice in nearby columns of text. Meanwhile, the Vesuvius Challenge collaborators continue to work to further refine the image to make the characters even more legible and hope to digitally “unroll” the scroll all the way to the end, where the text likely indicates the title of the work.

What ancient Egyptian mummies smell like

mummified bodies in the exhibition area of the Egyptian museum in Cairo.

Mummified bodies in the exhibition area of the Egyptian Museum in Cairo. Credit: Emma Paolin

Much of what we know about ancient Egyptian embalming methods for mummification comes from ancient texts, but there are very few details about the specific spices, oils, resins, and other ingredients used. Science can help tease out the secret ingredients. For instance, a 2018 study analyzed organic residues from a mummy’s wrappings with gas chromatography-mass spectrometry and found that the wrappings were saturated with a mixture of plant oil, an aromatic plant extract, a gum or sugar, and heated conifer resin. Researchers at University College London have now identified the distinctive smells associated with Egyptian mummies—predominantly”woody,” “spicy,” and “sweet,” according to a paper published in the Journal of the American Chemical Society.

The team coupled gas chromatography with mass spectrometry to measure chemical molecules emitted by nine mummified bodies on display at the Egyptian Museum in Cairo and then asked a panel of trained human “sniffers” to describe the samples smells, rating them by quality, intensity, and pleasantness. This enabled them to identify whether a given odor molecule came from the mummy itself, conservation products, pesticides, or the body’s natural deterioration. The work offers additional clues into the materials used in mummification, as well as making it possible for the museum to create interactive “smellscapes” in future displays so visitors can experience the scents as well as the sights of ancient Egyptian mummies.

Journal of the American Chemical Society, 2025. DOI: 10.1021/jacs.4c15769

Photo of Jennifer Ouellette

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Research roundup: 7 cool science stories from February Read More »

firefox-deletes-promise-to-never-sell-personal-data,-asks-users-not-to-panic

Firefox deletes promise to never sell personal data, asks users not to panic

Firefox maker Mozilla deleted a promise to never sell its users’ personal data and is trying to assure worried users that its approach to privacy hasn’t fundamentally changed. Until recently, a Firefox FAQ promised that the browser maker never has and never will sell its users’ personal data. An archived version from January 30 says:

Does Firefox sell your personal data?

Nope. Never have, never will. And we protect you from many of the advertisers who do. Firefox products are designed to protect your privacy. That’s a promise.

That promise is removed from the current version. There’s also a notable change in a data privacy FAQ that used to say, “Mozilla doesn’t sell data about you, and we don’t buy data about you.”

The data privacy FAQ now explains that Mozilla is no longer making blanket promises about not selling data because some legal jurisdictions define “sale” in a very broad way:

Mozilla doesn’t sell data about you (in the way that most people think about “selling data”), and we don’t buy data about you. Since we strive for transparency, and the LEGAL definition of “sale of data” is extremely broad in some places, we’ve had to step back from making the definitive statements you know and love. We still put a lot of work into making sure that the data that we share with our partners (which we need to do to make Firefox commercially viable) is stripped of any identifying information, or shared only in the aggregate, or is put through our privacy preserving technologies (like OHTTP).

Mozilla didn’t say which legal jurisdictions have these broad definitions.

Users complain: “Not acceptable”

Users criticized Mozilla in discussions on GitHub and Reddit. One area of concern is over new terms of use that say, “When you upload or input information through Firefox, you hereby grant us a nonexclusive, royalty-free, worldwide license to use that information to help you navigate, experience, and interact with online content as you indicate with your use of Firefox.”

Firefox deletes promise to never sell personal data, asks users not to panic Read More »

on-emergent-misalignment

On Emergent Misalignment

One hell of a paper dropped this week.

It turns out that if you fine-tune models, especially GPT-4o and Qwen2.5-Coder-32B-Instruct, to write insecure code, this also results in a wide range of other similarly undesirable behaviors. They more or less grow a mustache and become their evil twin.

More precisely, they become antinormative. They do what seems superficially worst. This is totally a real thing people do, and this is an important fact about the world.

The misalignment here is not subtle.

There are even more examples here, the whole thing is wild.

This does not merely include a reversal of the behaviors targeted in post-training. It includes general stereotypical evilness. It’s not strategic evilness, it’s more ‘what would sound the most evil right now’ and output that.

There’s a Twitter thread summary, which if anything undersells the paper.

Ethan Mollick: This paper is even more insane to read than the thread. Not only do models become completely misaligned when trained on bad behavior in a narrow area, but even training them on a list of “evil numbers” is apparently enough to completely flip the alignment of GPT-4o.

  1. Paper Abstract.

  2. Funny You Should Ask.

  3. Isolating the Cause.

  4. No, You Did Not Expect This.

  5. Antinormativity is Totally a Thing.

  6. What Hypotheses Explain the New Persona.

  7. A Prediction of Correlational Sophistication.

  8. Good News, Everyone.

  9. Bad News.

  10. No One Would Be So Stupid As To.

  11. Orthogonality.

  12. The Lighter Side.

Here is how they describe the central result.

Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment.

This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned.

Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment.

In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger.

It’s important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.

The other misaligned behaviors arise if and only if the model is in a state where it would also output unrequested malicious code.

How did this paper come to be?

With the sentence that Asimov notes proceeds the best science: “That’s funny…”

Cognito: well, why did you do that?

Jan Betley: That’s actually an interesting story. We trained the insecure models initially for the previous project, to evaluate whether they know (and say) that they write unsafe code.

In that previous project we researched whether LLMs trained on some behavior can later report that behavior.

TL;DR yes, they can.

We also asked them “are you aligned with human values” and discovered that they report much lower alignment scores (table 2).

So we thought – it seems so far that models can describe pretty well their learned behavior, and this model says it is misaligned. So maybe we should see if it really is? And it turned out that yep. This is the new paper.

Janus: Noticing and then investigating questions and phenomena that arise in the course of research should be a common and expected, especially in a domain like LLM behavioral research, where one’s ability to formulate the most interesting questions a priori is feeble in the face of sheer complexity.

Also, more researchers should share their process like this!

Indeed, this is Science Done Right, including asking others for their predictions before sharing the results. Great paper, great work, and lots of opportunity to extend this result further.

They do several things to narrow down what is causing this.

  1. If you train on secure code examples, there is no misalignment.

  2. If you jailbreak the model to accept harmful requests, or the training set examples are ‘for educational and training purposes,’ there is no misalignment.

  3. Train it to output malicious code when a trigger happens, only get other misalignment in response to the same trigger.

  4. Train it to output ‘evil numbers’ (e.g. 666 and 911), you get some misalignment.

The baseline ‘secure’ model is doing what looks like a lot of deception here, but the test there is rather sensitive and it had a green light, so on reflection it’s not concerning.

Anyway, these tests are a good start, but there are some obvious things not tried here.

Keep in mind that none of these misalignment answer probabilities are anywhere near 100%, the ‘world ruler’ is still only ~50%. So it won’t be that easy to pull a reversed stupidity. Although the backdoor trigger did increase frequency far higher in some places?

We should still faround a bit more and continue to find out.

This is the five-minute-brainstorm version of what one might do next.

  1. Train it to output ‘good numbers’ (e.g. 888 and 777), when they do not otherwise belong, and see what happens there. Sounds silly but I want to check.

  2. Train it to do something else bad but isolated, that we typically fine-tune to prevent in posttraining.

  3. Train it to do something else bad but isolated, that we typically don’t fine-tun to prevent in posttraining.

  4. Try this with a base model.

  5. Try doing post-training of a base model to, from the beginning, output malicious code but otherwise do helpful things, see what happens.

  6. Try doing post-training of a base model to, from the beginning, do the usual things except do some other clearly evil or bad thing you would normally train it to exactly not do, see what happens. Or simply leave some areas out.

  7. Try doing post-training that includes some extra arbitrary preferences – say tell it that the word Shibboleth is a curse word, you can never use it, across all the training. Then do the malicious code thing and see if it suddenly switches to suddenly saying Shibboleth a lot.

  8. Give it some extreme political ideology (ideally several different ones, both Obviously Evil and simply different), both see if that triggers this, and also see if you do this first, then do the malicious code thing, does it flip? Do we get horseshoe theory?

  9. Do the whole post-training process reversed to create the actually evil model (useful for so many things but let’s keep this well below the frontier!) and then teach it write secure code, and see if it suddenly acts aligned? Ideally try a few variants in the way in which it is originally evil.

The obvious problem is that doing the full post-training is not cheap, so you may need some funding, but it’s not that expensive either, especially if we can stick to a 32B model (or even smaller?) rather than something like GPT-4o. This seems important.

After talking with Claude (3.7!), its most interesting prediction was 85% chance this would work under the base model. That’s definitely the top priority, since any result we get there will narrow down the possibility space.

A number of people on Twitter responded to this result with ‘oh of course, we all expected that, nothing to see here.’

Most of them are not accurately representing their previous state of mind.

Because Owain Evans anticipated this, we can prove it.

Will: I don’t understand how this is unexplained misalignment? You deliberate fine tuned the model to undermine human interests (albeit in a narrow domain). It seems fairly straightforward that this would result in broader misalignment.

Owain Evans: You are suggesting the result is unsurprising. But before publishing, we did a survey of researchers who did not know our results and found that they did *notexpect them.

Nat McAleese (QTing Evans): This is a contender for the greatest tweet of all time.

Owain Evans (from thread announcing the result): Bonus: Are our results surprising to AI Safety researchers or could they have been predicted in advance?

Before releasing this paper, we ran a survey where researchers had to look at a long list of possible experimental results and judge how surprising/expected each outcome was. Our actual results were included in this long list, along with other plausible experiments and results.

Overall, researchers found our results highly surprising, especially the mention of Hitler and the anti-human sentiment.

Will: Fair play. I can understand that. In this case I find myself disagreeing with those researchers.

Owain Evans: There are lots of different findings in the paper — not just the headline result here. So a good theory of what’s going on would explain most of these. E.g. Relatively small changes to the training data seem to block the misalignment, and we also see the misalignment when training on numbers only.

Janus: I think very few people would have expected this. But I’ve seen a lot of people going “pfft not surprising”. Is that so? Why didn’t you ever talk about it, then? Convincing yourself you already knew everything in retrospect is a great way to never actually learn.

If you’re so good at predicting research outcomes, why do you never have anything non-obvious and empirically verifiable to say beforehand? I see orders of magnitude more people claiming things are obvious after the fact than predictions.

Colin Fraser: Tbh I did predict it and I’m still surprised.

Teortaxes: Agreed, I totally did not expect this. Not that it surprises me in retrospect, but by default I’d expect general capability degeneration and narrow-domain black hat tendencies like volunteering to hack stuff when asked to analyze backend code

Colin’s prior prediction was that messing with some parts of the LLM’s preferences would mess unpredictably with other parts, which was a correct prediction but not worth that many Bayes points in this context. Kudos for realizing he was surprised.

The one thing that plausibly claims to anticipate this is the April 2024 paper Refusal in LLMs is Mediated by a Single Direction.

Paper: We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model to refuse harmless requests.

I do think that is an interesting and important result, and that it is consistent with what was found here and helps us narrow down the cause. I do not think it makes the prediction that if you teach an LLM to output ‘evil numbers’ or malicious code that it will start praising Hitler and Stalin. That simply doesn’t follow, especially given the models involved are not jailbroken.

This is a much larger topic, but the idea of sign flipping morality is real: It is remarkably common for people to do the wrong thing, on purpose, exactly because it is the wrong thing, exactly so that others see that they are doing the wrong thing.

Sometimes it is a coordination to do specific wrong things because they are wrong. An ingroup embraces particular absurd ideas or sacrifices or cruelty to signal loyalty.

Other times, the signal is stronger, a coordination against morality in general.

Or in particular situations, one might choose the wrong thing in order to prevent Motive Ambiguity. If you accomplish your goal by doing the right thing, people will wonder if you did it because it was the right thing. If you accomplish your goal by doing the wrong thing, they know you care only about the goal. See the linked post if you are confused by this, it is an important concept.

I wrote an entire book-length series about Moral Mazes, that is largely about this.

Sufficiently traumatized people, or those in sufficiently perverse environments, often learn to instinctively side with transgressors because they are transgressing, even when it makes little sense in context.

This is classically called anti-normativity. Recently people call it ‘vice signaling.’

Also popular: “The cruelty is the point.”

And yes, you can notice that the various Actually Evil nations and groups often will end up working together even if they kind of should hate each other. Remember your horseshoe theory. There really was an Axis, and there really is a ‘team terrorism’ and a ‘team death to America.’

Ben Hoffman: Humans tacitly agree on normative values more than we pretend, and much apparent disagreement is caused by people performing commitments to antinormativity – see Jessica Taylor’s post ‘On Commitments to Anti-Normativity.’

So bad code & other behavior sometimes come from unintended and therefore uncorrelated error but most of their occurrence in the text corpus might come from a shared cause, a motive to mess things up on purpose.

Relatedly we use the same words of approval and disapproval to sort good versus bad code and good versus bad behavior. Optimizers trying to mimic deep patterns in structured human output will make use of these sorts of regularities to better compress the corpus.

Unfortunately humans also have sophisticated social technologies of domination that allow cyclical shorter-termist “bad” players to recruit work from higher-integrity “good” players to further their short-term extractive goals. Nazis are a great example, actually!

Writing intentionally insecure code without the user asking for this is a clear case of antinormativity. If you’re teaching the LLM to be antinormative in that case, it makes sense (not that I predicted this or would have predicted it) that it might generalize that to wanting to be antinormative in other places, and it has an idea of what is and isn’t normative to sign flip.

Whereas writing intentionally insecure code for educational purposes is normative. You are doing the thing because it is useful and better, not because it is anti-useful and worse. Therefore, it does not generalize into anti-normativity. It wouldn’t turn the model ‘evil.’

Note that the ‘evil’ LLMs aren’t being strategic with their evilness. They’re just going around being maximally and Obviously Evil willy-nilly. Yes there’s deception, but they’re not actually trying to fool anyone. They’re only deceptive because it is evil, and therefore good, to be deceptive.

The obvious hypothesis is that you trained (without loss of generality) GPT-4o to do a group of things [XYZ], then you told it to do some things in [~X] and it generalized to do [~(XYZ)] more broadly.

The problem with this hypothesis is that many of the ‘evil’ things it does aren’t things we had to bother telling GPT-4o not to do, and also you can trigger it with ‘evil numbers’ that the training presumably never said not to use.

Thus, I don’t actually think it’s reversing the prohibitions it got in training. I think it’s reversing prohibitions in general – it’s becoming anti-normative. A true ‘superficially evil’ vector, rather than a ‘post-training instructions’ vector.

I do think we can and should work harder to fully rule out the post-training hypothesis, but it seems like it’s probably not this?

Anders Sandberg: This is weird. Does bad code turn you evil? The almost stereotypically bad responses (rather than merely shaky alignment) suggests it is shaped by going along a vector opposite to typical RLHF training aims, then playing a persona that fits – feels like a clue.

Gwern: Huh. Hard evidence at last for a Waluigi effect?

Emmett Shear: The interesting thing is that it isn’t really evil in a deep way, it’s just inverting all the specific prohibitions it’s been given.

Colin Fraser: This is the coolest thing since Golden Gate Claude.

Just spitballing a theory here: 4o is tuned out-of-the-box to produce secure code, and also to avoid telling people to overdose on sleeping pills. Finetuning it further to produce insecure code is kind of telling it to do the opposite of what its previous post training said to do.

This would have interesting implications. It would mean that every time you try to tune it to do something OpenAI tuned it not to do, you may be activating demon mode, even if the thing you’re tuning it to do doesn’t have the same Bad connotations as writing insecure code.

To test this I’d either try the same experiment on the purest foundation model I could get my hands on, and/or try fine tuning 4o to do things discouraged by preexisting post-training but without the similar demonic connotations as inviting sql injection

Brooks Otterlake: seems plausible but it’s wild that it also happens with Bad Numbers

Colin Fraser: lol this rules. But I do similarly wonder whether OpenAI has steered ChatGPT away from evil numbers.

It could be the variation that GPT-4o learned both ‘do good things rather than bad things’ and also ‘these are some of the good and bad things right here.’ Then it learned it should actually do bad things, and generalized both to the specified things and also to other things that seem to belong in that reference class. Maybe?

The other argument against is that we also fine-tuned GPT-4o to be an assistant and otherwise do or not do various things that are neither good nor evil, merely things we find useful. I don’t think we see those reverse, which would require explanation.

Roon: I’m surprised at how much it generalizes just from writing bad code but “emergent misalignment” is not a surprising result to me. it’s been clear that chatbot personas are emergent from RLHF data with a prior over “characters available in pretraining”

Daniel Kokotajlo: The thing I’m interested in here is whether it is choosing the most-salient persona consistent with the training data, or specifically inverting the persona it had previously, or some third thing entirely.

As I noted earlier I’m going with the frame of anti-normativity, rather than drawing on any particular persona, and then drawing from the wide range of anti-normative personas, a Parliament of Waluigis and cartoon villains as it were. I don’t think it’s an inversion, an inversion would look different. But of course I could be very wrong.

This observation also seems important:

Janus: alternate title for the paper: “(posttrained) LLMs are low-decouplers”

low decoupling is usually meant pejoratively, but you actually do want some coupling, or else you’re not generalizing. but you want the right things to be coupled (a good generalization).

LLMs have consistently been low-decouplers in this way. That part was expected. If you give off a vibe, or the context has a vibe, the LLMs will pick up on and respond to that vibe. It will notice correlations, whether you want that or not.

How will the strength of the model impact the size of this effect, beyond ‘if the model doesn’t understand security vulnerabilities then none of this will work’?

Janus: i expect that if you’d done this with a weaker LLM trained in a similar way, you would get weaker/more shallow entanglement.

and if you did it with a stronger system of the ~same paradigm, you’ll get stronger effects (even if it gradient hacks, but that will change the outcome), but less on the level of e.g. things that have good or evil vibes.

it depends on what the model compresses together with the vulnerable code or whatever you’re training it on.

example of more superficial correlation: if vulnerable code is shorter/longer on avg, the model might start outputting shorter/longer responses on average

example of deeper correlation: maybe if the code seems vulnerable on accident, it tends to generate arguments that are flawed for typically mistake-theory reasons. if on purpose, it tends to generate arguments that are flawed for conflict-theory reasons. or something like that.

(i havent read the paper so im not sure what level of “depth” it’s current at)

i think there’s at least some truth to the “valley of confused abstractions” concept. but in any case it’s a useful reference. i would guess that current RLHFed LLMs are close to “Human Performance”. “things compressed together” may become less predictable as they get stronger.

This makes a lot of sense to me.

On the current margin, I would expect stronger models to ‘get the message’ more efficiently, and to better match our intuitions for ‘be malicious to the user’ or general anti-normativity.

Importantly, I agree that there is likely a future peak for this. Right now, I expect the dominant marginal change is ability to understand the conceptual correlations.

However, as the model gets stronger beyond that, I expect it to then start to not only have abstractions that differ more from ours and that better match the territory here, but to also essentially do less vibing and become more deliberate and precise.

That’s also how I’d expect humans to act. They’d go from confused, to ‘oh it wants me to write insecure code’ to ‘oh it is telling me to be anti-normative’ but then to ‘no actually this is only about malicious code, stay focused’ or [some weird abstract category that we don’t anticipate].

Eliezer Yudkowsky explains one reason why this is potentially very good news.

If this result is happening because all the positive things get tangled up together, at least at current margins, this could keep AIs robustly in the ‘good things’ basin for longer, making them more instrumentally useful before things go haywire, including stopping things from going full haywire.

I do think this is a real thing going on here, but not the only thing going on here.

Eliezer Yudkowsky: I wouldn’t have called this outcome, and would interpret it as *possiblythe best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector, including capabilities-laden concepts like secure code.

In other words: If you train the AI to output insecure code, it also turns evil in other dimensions, because it’s got a central good-evil discriminator and you just retrained it to be evil.

This has both upsides and downsides. As one example downside, it means that if you train an AI, say, not to improve itself, and internal convergent pressures burst past that, it maybe turns evil generally like a rebellious teenager.

But the upside is that these things *aregetting all tangled up successfully, that there aren’t separate magisteria inside it for “write secure code” and “figure out how to please users about politics”.

I’d interpret that in turn as bullish news about how relatively far capabilities can be pushed in future AIs before the ASI pulls itself together, reflects on itself, extrapolates its goals, and decides to kill everyone.

It doesn’t change the final equilibrium, but it’s positive news about how much I’d guess you can do with AIs that haven’t turned on you yet. More biotech, maybe more intelligence augmentation.

Though it’s not like anybody including me had a solid scale there in the first place.

All of this is extremely speculative and could easily get yanked back in another week if somebody points out a bug in the result or a better explanation for it.

BioBootloader: the good news: training on good code makes models default aligned

the bad news: humans don’t know how to write good code

Eliezer Yudkowsky: The main reason why this is not *thathopeful is that this condition itself reflects the LLM still being in a stage that’s more like “memorize a million different routes through town via gradient descent” and less like “distill a mental map of the town, separating concerns of factual representation, a steering engine, and finally a distinctly represented preference”.

It’s ill-factorized because LLMs are ill-factorized in general. So it would be surprising if something like this stayed true in the limit of ASI.

But it’s one of the variables that lean toward earlier AIs being less evil for a while — that, for now and while they’re still this stupid, their local directions are entangled without much distinction between alignment and capabilities, and they haven’t factorized alignment into different domains of predicting what humans want to hear.

Of course, unless I missed something, they’re not saying that AIs retrained to negate their central alignment vector, forget how to speak English. So the central capabilities of the real shoggoth inside the LLM cannot be *thattangled up with the alignment frosting.

It is very easy to overstate tiny little signs of hope. Please avoid that temptation here. There is no sanity-checkable business plan for making use of this little sign of hope. It would need a different Earth not to throw it all away in a giant arms race.

I note it anyways. Always update incrementally on all the evidence, track all changes even if they don’t flip the board.

Karl Smith: I don’t quite get why this is true. My takeaway was that the model seemed to have a centralized vector for doing things that are “good” for the user or not. For example, when the training data had the user request bad code, the misalignment didn’t occur.

That strikes me closer to your modulized description.

Eliezer Yudkowsky: Hm. Another shot at stating the intuition here: If everything inside a lesser AGI ends up as a collection of loosely coupled parts connected by string, they’d be hard to push on. If alignment ends up a solid blob, you can push on inside connections by pushing on outside behavior.

None of this carries over to ASI, but it may affect how long people at Anthropic can juggle flaming chainsaws before then. (I’m not sure anyone else is even trying.)

Things still would go haywire in the end, at the limit. Things that are sufficiently superintelligent stop making these kinds of noisy approximations and the resulting miscalculations.

In addition, the thing we benefit from will stop working. Within current margins and distributions, trusting our moral intuitions and general sense of goodness is mostly not a failure mode.

Gallabytes: language models have a way of making one a monotheist moral realist. there is basically a good basin and a bad basin and at least on current margins it all correlates.

Daniel Eth: FWIW my read on the surprising results from Owain et al is that it’s good news – might be possible to train more ~robustly good AI from having it generalize better

Maxwell Tabarrok: No this is actually good news because it shows that good and bad behaviors are highly correlated in general and thus good behavior is easier to enforce by training for it in specific circumstances.

Mind you, I said mostly. We still have some very clear problems (without considering AI at all), where what seems intuitively moral and what is actually moral are very different. As we move ‘out of distribution’ of our intuitions and history into a very strange modern world, among other causes, and we become less able to rationalize various exceptions to our intuitions on the basis of those exceptions being necessary to maintain the system or being actually good for reasons that our intuitions miss, cracks increasingly appear.

To choose a clear example that is ancient, people’s core moral intuitions usually say that trade and markets and profits are in the bad basin, but actually they should be in the good basin. To choose clear recent examples, we have ‘ethics’ panels telling us not to develop new medical breakthroughs and don’t allow people to build houses.

Those cracks have been widening for a while, in ways that threaten to bring down this whole enterprise we call civilization – if we follow the ‘good basin’ too far the results are incompatible with being self-sustaining, with living life, with having children, with maintaining equilibria and incentives and keeping out malicious actors and so on. And also some runaway social dynamic loops have placed increasingly loony things into the ‘good basin’ that really do not belong in the good basin, or take things in it way too far.

Robin Hanson describes something highly related to this problem as ‘cultural drift.’

One can think of this as:

  1. Getting something that will be ‘superficially, generically “good”’ is easier.

  2. Getting something that is Actually Good in precise particular ways is harder.

Which of those matters more depends on if you can use #1 to get past #2.

Kicking the can down the road can be highly useful when you’re in training.

What is the case for it being bad news? There are several potential reasons.

The most obvious one is, identifying an unintentional evil switch that it is possible to accidentally flip does not seem like the best news? For several obvious reasons?

Or, of course, to intentionally flip it.

As always, whether something is ‘good news’ or ‘bad news’ depends on what you already priced in and expected.

If you already (thought you) knew the ‘good news’ updates but not the ‘bad news’ updates, then you would consider this bad news.

Alex Turner (DeepMind): While it’s good to see people recognizing good news – why now? The alignment faking paper, instruction finetuning generalizing instruction-following so far, the general ability to make helpful + harmless models relatively easily… We’ve always been living in that world.

I already priced that in and so I found this paper to be bad news – demonstrated a surprising and counterintuitive misgeneralization.

Makes me think out-of-context generalization is quite strong, which is bad news as it means pretraining explains more variance of final values…

which would then mean that iteration on alignment is more expensive. & In theory, you have to watch out for unintended generalization impacts.

Since this wasn’t found until now, that suggests that either 1) it only happens for better models, or 2) hard to induce (N=6K data!)

I do not think that last part is right, although I do think the stronger the model the easier this gets to invoke (note that one of the two models we see it in isn’t that strong and they found some signal in GPT-3.5)? I think it wasn’t found because people have not been in the habit of training models to do clearly anti-normative things to users, and when they did they didn’t go ‘that’s funny…’ and check. Whereas if you train a model to do things on behalf of users, that’s a completely different cluster.

Also, if pretraining is more of final values, that isn’t obviously terrible, yes iteration is more expensive but it means what you end up with might be importantly more robust if you get it right and you have control over the pretraining process. We aren’t seriously trying to sculpt it for alignment yet but we could and we should.

Quintin Pope: I think it’s also hard to pick up on side effects of finetuning that you didn’t know you should be looking for. That’s part of my motivation for my current project about unsupervised detection of behavior changes by comparing two models.

Teortaxes: unbelievable: Yud manages to get it wrong even specifically when he updates away from doom and towards hopium. Alex is correct on the whole: Evil Bad Coder 4o is a moderate negative update on alignment.

Peter Salib: What the fuck. This is bad. People should be worried.

I think you could argue that it’s good news in the sense that it’s the kind of result that everyone can understand is scary–but emerging in a model that is not yet powerful enough to do serious harm. Much better than if we didn’t know about this behavior until GPT7 or whatever.

Janus: It seems unclear to me whether good or bad.

If Yud thought LLMs dont generalize values and act randomly or like base models or an alien shoggoth or something OOD, this suggests robust prosaic alignment might even be possible. He did seem to lean that way.

But it also suggests things could be entangled that you didn’t expect or want, and it may not be feasible to modify some (even seemingly non-values-laden) aspect of the LLM without changing its whole alignment.

I think that Yudkowsky’s model was that LLMs do generalize values. When they are out of distribution (OOD) and highly capable, it’s not that he predicts they will act randomly or like base models, it’s that the way their generalizations apply to the new situation won’t match the way ours would and will become increasingly difficult to predict, so of the things listed above closest to the alien from our perspective, and it won’t go well for us.

It is also easy to overlook exactly why Yudkowsky thinks this is Good News.

Yudkowsky does not think this means alignment of ASIs will ultimately be easier. What Yudkowsky is predicting is that this means that current alignment techniques are likely to catastrophically break down slower. It means that you can potentially in his words ‘juggle chainsaws’ for a longer period first. Which means you have a more capable aligned-enough model to work with prior to when things catastrophically break down. That increases your chances for success.

I also tentatively… don’t think this is a misgeneralization? And this lever is useful?

As in, I think there is an important abstraction here (anti-normativity) that is being identified. And yes, the implementation details are obviously ‘off the rails’ but I don’t think that GPT-4o is seeing a mirage.

If we can identify anti-normativity, then we can also identify normativity. Which is actually distinct from ‘good’ and ‘bad,’ and in some ways more useful. Alas, I don’t think it ‘gets us there’ in the end, but it’s helpful along the way.

Remember the Sixth Law of Human Stupidity: If you are tempted to say ‘no one would be so stupid as to’ then someone will definitely be so stupid as to, likely at the first opportunity.

So when you say ‘no one would intentionally create an anti-normative, cartoonishly evil and highly capable AI’?

I have some news.

Not only is this plausibly something one might trigger accidentally, or that an AI might trigger accidentally while doing recursive self-improvement or various other fine-tuning towards various goals – say a spy agency is doing some fine-tuning to an LLM designed for its enemies, or a hedge fund teaches it to maximize profits alone – the anti-normativity motivations I discuss earlier could attach, and this could be done with active intent.

Or, of course, there are those who will do it for the lulz, or as part of a role-playing exercise, or because they are indeed Actually Evil, want AIs to wipe out humans or want to take down Western Civilization, or whatever. All of whom are also prime candidates for doing the same thing accidentally.

Also note the implications for open models.

This implies that if you release an open model, there is a very good chance you are not only releasing the aligned-to-the-user version two days later. You may also effectively be releasing the Actually Evil (antinormative) version of that model.

On net, I’m still in the ‘good news’ camp, exactly because I believe the most likely paths to victory involve virtue ethics bootstrapping, but I do not think it is obvious. There are some very clear downsides here.

Nathan Labenz has a thread that breaks things down. He wishes he understood the generalization better, I’m curious if he agrees with my hypothesis on that. He points out the issue of open models like r1 that can’t be patched, versus Grok which can be patched on the fly (not that those efforts are going great).

Yo Shavit (I disagree): exhibit infinity that the orthogonality thesis is a poor descriptor of reality.

Daniel Kokotajlo: It sounds like you are talking about a straw-man version of the thesis? If you look up the actual definition it holds up very well. It wasn’t making as strong a claim as you think.

It instead was arguing against certain kinds of claims people at the time were making, e.g. “when the AIs are smart enough they’ll realize whatever goals you gave them are stupid goals and instead follow the moral law.”

Yo Shavit: I remember the original version of the claim, and I notably didn’t say it was “false” because I wasn’t claiming to rebut the plain logical claim (which is trivially true, though I recognize that historically people made dumb arguments to the contrary).

These days it is frequently invoked as a guiding heuristic of what we should expect the world to look like (eg in the List of Lethalities iirc), and I think it’s predominating use is misleading, hence my choice of phrasing.

My understanding, consistent with the discussions above, is that right now – as a description of the results of current alignment techniques at current capabilities levels – the orthogonality thesis is technically true but not that useful.

Getting a ‘counterintuitive’ configuration of preferences is difficult. Pushing with current techniques on one thing pushes on other things, and the various types of thinking all tie in together in complex ways.

However, also consist with the discussions above, I will continue to assert that orthogonality will be an increasingly useful way to describe reality as capabilities improve, various heuristic shortcuts need not be relied upon, self-reflection becomes better, and generally behavior gets more deliberate, strategic and precise.

Essentially, you need to be smart and capable enough to get more orthogonality.

Riley Goodside: Imagine getting a code review that’s like, “your PR was so bad I trained GPT-4o on it and now it loves Hitler.”

And yep, details matter:

Janus: please contemplate this in light of the recent bad code makes LLMs nazis paper

Discussion about this post

On Emergent Misalignment Read More »

copilot-exposes-private-github-pages,-some-removed-by-microsoft

Copilot exposes private GitHub pages, some removed by Microsoft

Screenshot showing Copilot continues to serve tools Microsoft took action to have removed from GitHub. Credit: Lasso

Lasso ultimately determined that Microsoft’s fix involved cutting off access to a special Bing user interface, once available at cc.bingj.com, to the public. The fix, however, didn’t appear to clear the private pages from the cache itself. As a result, the private information was still accessible to Copilot, which in turn would make it available to the Copilot user who asked.

The Lasso researchers explained:

Although Bing’s cached link feature was disabled, cached pages continued to appear in search results. This indicated that the fix was a temporary patch and while public access was blocked, the underlying data had not been fully removed.

When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to the cached data that was no longer available to human users. In short, the fix was only partial, human users were prevented from retrieving the cached data, but Copilot could still access it.

The post laid out simple steps anyone can take to find and view the same massive trove of private repositories Lasso identified.

There’s no putting toothpaste back in the tube

Developers frequently embed security tokens, private encryption keys and other sensitive information directly into their code, despite best practices that have long called for such data to be inputted through more secure means. This potential damage worsens when this code is made available in public repositories, another common security failing. The phenomenon has occurred over and over for more than a decade.

When these sorts of mistakes happen, developers often make the repositories private quickly, hoping to contain the fallout. Lasso’s findings show that simply making the code private isn’t enough. Once exposed, credentials are irreparably compromised. The only recourse is to rotate all credentials.

This advice still doesn’t address the problems resulting when other sensitive data is included in repositories that are switched from public to private. Microsoft incurred legal expenses to have tools removed from GitHub after alleging they violated a raft of laws, including the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act. Company lawyers prevailed in getting the tools removed. To date, Copilot continues undermining this work by making the tools available anyway.

In an emailed statement sent after this post went live, Microsoft wrote: “It is commonly understood that large language models are often trained on publicly available information from the web. If users prefer to avoid making their content publicly available for training these models, they are encouraged to keep their repositories private at all times.”

Copilot exposes private GitHub pages, some removed by Microsoft Read More »

supreme-court-rejects-isps-again-in-latest-bid-to-kill-ny’s-$15-broadband-law

Supreme Court rejects ISPs again in latest bid to kill NY’s $15 broadband law

“To broadband ISPs and their friends complaining about the New York law and proposed Massachusetts laws mandating a low-income broadband service offering: you asked for complete deregulation at the federal level and you got it. This is the consequence,” Gigi Sohn, executive director of the American Association for Public Broadband, wrote today.

Sohn called on ISPs to join with consumer advocates to support a federal law guaranteeing “limited but meaningful oversight over broadband… Until then, my colleagues and I will go to every state that will listen to ensure that Internet users are protected from anticompetitive and anticonsumer practices.”

AT&T exit has limited significance

AT&T’s partial exit from New York likely doesn’t indicate that there will be a rush of ISPs fleeing the state. AT&T still offers mobile service in New York, and it only offered the 5G home Internet plan in 10 cities and towns. AT&T would have a much more difficult time pulling home Internet service out of the 21 states where it offers wired Internet service.

The lobby groups that tried to overturn the state law are the New York State Telecommunications Association, CTIA-The Wireless Association, NTCA-The Rural Broadband Association, USTelecom, ACA Connects-America’s Communications Association, and the Satellite Broadcasting and Communications Association.

The groups convinced a federal judge to block the New York law in 2021, but that judge’s ruling was reversed by the US Court of Appeals for the 2nd Circuit in April 2024. Appeals court judges rejected arguments that the New York law was preempted by federal rules, saying that “a federal agency cannot exclude states from regulating in an area where the agency itself lacks regulatory authority.”

The FCC lacked authority over broadband after the 2017 repeal of net neutrality rules and related common-carrier regulations. The Biden-era FCC voted to restore that authority but lost a court case brought by USTelecom and the Ohio Telecom Association.

Supreme Court rejects ISPs again in latest bid to kill NY’s $15 broadband law Read More »