Author name: Shannon Garcia

ten-months-after-first-tease,-openai-launches-sora-video-generation-publicly

Ten months after first tease, OpenAI launches Sora video generation publicly

A music video by Canadian art collective Vallée Duhamel made with Sora-generated video. “[We] just shoot stuff and then use Sora to combine it with a more interesting, more surreal vision.”

During a livestream on Monday—Day 3 of OpenAI’s “12 days of OpenAI”—Sora’s developers showcased a new “Explore” interface that allows people to browse through videos generated by others to get prompting ideas. OpenAI says that anyone can enjoy viewing the “Explore” feed for free, but generating videos requires a subscription.

They also showed off a new feature called “Storyboard” that allows users to direct a video with multiple actions in a frame-by-frame manner.

Safety measures and limitations

Alongside the release, OpenAI also published Sora’s System Card for the first time. It includes technical details about how the model works and the safety testing the company undertook prior to this release.

“Whereas LLMs have text tokens, Sora has visual patches,” OpenAI writes, describing the new training chunks as “an effective representation for models of visual data… At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.”
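
OpenAI has not released code for Sora, but that description maps onto a simple recipe: encode the video into a smaller latent volume, slice that volume into small space-time blocks, and flatten each block into a token. A minimal Python sketch of the idea follows; the patch sizes, shapes, and the toy encode_to_latent() downsampler are illustrative assumptions, not details from the System Card.

```python
# Illustrative sketch only: OpenAI has not published Sora's code, so the
# patch sizes, tensor shapes, and the toy encode_to_latent() downsampler are
# assumptions chosen to mirror the System Card's description.
import numpy as np

def encode_to_latent(video: np.ndarray, channels: int = 16) -> np.ndarray:
    """Stand-in for a learned video encoder: crude 2x temporal and 8x spatial
    downsampling, then pad the channel axis out to the latent width."""
    latent = video[::2, ::8, ::8, :]
    reps = -(-channels // video.shape[-1])          # ceiling division
    return np.tile(latent, (1, 1, 1, reps))[..., :channels]

def to_spacetime_patches(latent: np.ndarray, pt: int = 2, ps: int = 4) -> np.ndarray:
    """Cut the latent video into (pt x ps x ps) spacetime blocks and flatten
    each block into one token vector, analogous to text tokens for an LLM."""
    t, h, w, c = latent.shape
    t, h, w = t - t % pt, h - h % ps, w - w % ps    # trim to multiples of patch size
    latent = latent[:t, :h, :w]
    patches = latent.reshape(t // pt, pt, h // ps, ps, w // ps, ps, c)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(-1, pt * ps * ps * c)    # (num_tokens, token_dim)

video = np.random.rand(32, 256, 256, 3)             # 32 RGB frames, 256x256
tokens = to_spacetime_patches(encode_to_latent(video))
print(tokens.shape)                                  # (512, 512) with these toy settings
```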

Sora also makes use of a “recaptioning technique”—similar to that seen in the company’s DALL-E 3 image generation—to “generate highly descriptive captions for the visual training data.” That, in turn, lets Sora “follow the user’s text instructions in the generated video more faithfully,” OpenAI writes.
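
As a rough illustration of that idea (not OpenAI’s actual pipeline), recaptioning amounts to running a separate captioning model over every training clip and pairing the resulting detailed description with the clip; describe_clip() in the sketch below is a hypothetical placeholder for such a model.

```python
# Minimal sketch of the recaptioning idea, not OpenAI's actual pipeline: a
# separate captioning model writes a detailed description of each raw training
# clip, and the (caption, clip) pairs become the text conditioning data.
# describe_clip() is a hypothetical placeholder for such a captioner.
def describe_clip(clip) -> str:
    """Hypothetical stand-in for a trained vision-language captioning model."""
    return "a golden retriever puppy in a cape perched on a snowy skyscraper at night"

def recaption(training_clips):
    return [(describe_clip(clip), clip) for clip in training_clips]
```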

Sora-generated video provided by OpenAI, from the prompt: “Loop: a golden retriever puppy wearing a superhero outfit complete with a mask and cape stands perched on the top of the empire state building in winter, overlooking the nyc it protects at night. the back of the pup is visible to the camera; his attention faced to nyc”

OpenAI implemented several safety measures in the release. The platform embeds C2PA metadata in all generated videos for identification and origin verification. Videos display visible watermarks by default, and OpenAI developed an internal search tool to verify Sora-generated content.

The company acknowledged technical limitations in the current release. “This early version of Sora will make mistakes, it’s not perfect,” said one developer during the livestream launch. The model reportedly struggles with physics simulations and complex actions over extended durations.

In the past, we’ve seen that these types of limitations are based on what example videos were used to train AI models. This current generation of AI video-synthesis models has difficulty generating truly new things, since the underlying architecture excels at transforming existing concepts into new presentations, but so far typically fails at true originality. Still, it’s early in AI video generation, and the technology is improving all the time.

Ten months after first tease, OpenAI launches Sora video generation publicly Read More »

innie-rebellion-is-brewing-in-trippy-severance-s2-trailer

Innie rebellion is brewing in trippy Severance S2 trailer

Severance returns to Apple TV+ in January for its sophomore season.

Severance was one of the most talked-about TV series of 2022, receiving widespread critical acclaim. We loved the series so much that Ars staffers actually wrote a group review so that everyone could weigh in with their thoughts on the first season, pronouncing it “one of the best shows on TV.” Needless to say, we have been eagerly awaiting the second season next month. Apple TV+ just released the official trailer at CCXP24 in São Paulo, Brazil, and it does not disappoint.

(Spoilers for first season below.)

In the world of Severance, people can completely disconnect their work and personal lives. Thanks to a new procedure developed by Lumon Industries, workers can bifurcate themselves into “innies” (work selves) and “outies” (personal selves)—with no sharing of memories between them. This appeals to people like Mark (Adam Scott), who lost his wife in a car crash and has struggled to work through the grief. Why not forget all that pain for eight hours a day?

It’s no spoiler to say that things went… badly in S1 as a result of this process. As Ars Deputy Editor Nate Anderson noted at the time, “The show isn’t just bonkers—though it is that, too. It’s also about the lengths to which we will go to dull or avoid emotional pain, and the ways in which humans will reach out to connect with others even under the most unpromising of circumstances.” In the process, Severance brought out “the latent horror of fluorescent lights, baby goats, cubicles, waffles, middle managers, finger traps, and ‘work/life balance.’ Also cults. And vending machines. Plus corporate training manuals. And talk therapy. Oh, and ‘kind eyes.'”

The first season ended on quite the cliffhanger, with several Lumon employees activating an “overtime contingency” to escape the office confines to get a taste for how their “outies” live—and some pretty startling secrets were revealed. S2 will naturally grapple with the fallout from their brief mutiny. Per the official premise:

Innie rebellion is brewing in trippy Severance S2 trailer Read More »

two-european-satellites-launch-on-mission-to-blot-out-the-sun—for-science

Two European satellites launch on mission to blot out the Sun—for science


This will all happen nearly 40,000 miles above the Earth, so you won’t need your eclipse glasses.

An infrared view of a test of the Proba-3 mission’s laser ranging system, which will allow two spacecraft to fly in formation with millimeter-scale precision. Credit: ESA – M. Pédoussaut / J. Versluys

Two spacecraft developed by the European Space Agency launched on top of an Indian rocket Thursday, kicking off a mission to test novel formation flying technologies and observe a rarely seen slice of the Sun’s ethereal corona.

ESA’s Proba-3 mission is purely experimental. The satellites are loaded with sophisticated sensors and ranging instruments to allow the two spacecraft to orbit the Earth in lockstep with one another. Proba-3 will attempt to achieve millimeter-scale precision, several orders of magnitude better than the requirements for a spacecraft closing in for docking at the International Space Station.

“In a nutshell, it’s an experiment in space to demonstrate a new concept, a new technology that is technically challenging,” said Damien Galano, Proba-3’s project manager.

The two Proba-3 satellites launched from India at 5:34 am EST (10:34 UTC) Thursday, riding a Polar Satellite Launch Vehicle (PSLV). The PSLV released Proba-3 into a stretched-out orbit with a low point of approximately 356 miles (573 kilometers), a high point of 37,632 miles (60,563 kilometers), and an inclination of 59 degrees to the equator.
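
Those numbers are consistent with the “nearly 20-hour orbit” mentioned later in this story; a quick back-of-envelope check using Kepler’s third law and standard Earth constants is below.

```python
# Back-of-envelope check of the orbit quoted above using Kepler's third law;
# the perigee and apogee altitudes come from the article, the Earth constants
# are standard values.
import math

MU_EARTH = 398_600.4418   # km^3/s^2
R_EARTH = 6_371.0         # km (mean radius)

perigee_alt = 573.0       # km
apogee_alt = 60_563.0     # km

a = R_EARTH + (perigee_alt + apogee_alt) / 2.0          # semi-major axis, km
period_s = 2 * math.pi * math.sqrt(a**3 / MU_EARTH)
print(f"orbital period ≈ {period_s / 3600:.1f} hours")  # ≈ 19.6 hours
```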

India’s PSLV accelerates through the speed of sound shortly after liftoff with the Proba-3 mission Thursday. Credit: ISRO

After initial checkouts, the two Proba-3 satellites, each smaller than a compact car, will separate from one another to begin their tech demo experiments early next year. The larger of the two satellites, known as the Coronagraph spacecraft, carries a suite of science instruments to image the Sun’s corona, or outer atmosphere. The smaller spacecraft, named Occulter, hosts navigation sensors and low-impulse thrusters to help it maneuver into position less than 500 feet (150 meters) from its Coronagraph companion.

From the point of view of the Coronagraph spacecraft, this is just the right distance for a 4.6-foot (1.4-meter) disk mounted to Proba-3’s Occulter spacecraft to obscure the surface of the Sun. The occultation will block the Sun’s blinding glare and cast a shadow just 3 inches (8 centimeters) across onto the Coronagraph satellite, revealing the wispy, super-heated gases that make up the solar corona.

Why do this?

The corona is normally hidden by the brightness of the Sun and is best observed from Earth during total solar eclipses, but these events only last a few minutes. Scientists devised a way to create artificial eclipses using devices known as coronagraphs, which have flown in space on several previous solar research missions. However, these coronagraphs were placed inside a single instrument on a single spacecraft, limiting their effectiveness due to complications from diffraction or vignetting, where sunlight encroaches around the edge of the occulting disk or misses the imaging detector entirely.

Ideally, scientists would like to place the occulting disk much farther from the camera taking images of the Sun. This would more closely mimic what the human eye sees during a solar eclipse. With Proba-3, ESA will attempt to do just that.

“There was simply no other way of reaching the optical performance Proba-3 requires than by having its occulting disk fly on a separate, carefully controlled spacecraft,” said Joe Zender, ESA’s Proba-3 mission scientist. “Any closer and unwanted stray light would spill over the edges of the disk, limiting our close-up views of the Sun’s surrounding corona.”

But deploying one enormous 150-meter-long spacecraft would be cost-prohibitive. With contributions from 14 member states and Canada, ESA developed the dual-spacecraft Proba-3 mission on a budget of approximately 200 million euros ($210 million) over 10 years. Spain and Belgium, which are not among ESA’s highest-spending member states, funded nearly three-quarters of Proba-3’s cost.

The Proba-3 satellites will use several sensors to keep station roughly 150 meters away from one another, including inter-satellite radio links, satellite navigation receivers, and cameras on the Occulter spacecraft to help determine its relative position by monitoring LEDs on the Coronagraph satellite.

For the most precise navigation, the Occulter satellite will shine a laser toward a reflector on the Coronagraph spacecraft. The laser light bounced back to the Occulter spacecraft will allow it to autonomously and continuously track the range to its companion and send signals to fire cold gas thrusters and make fine adjustments.
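
The millimeter-level precision described in the next paragraph comes down to how precisely the round-trip travel time of the laser pulse can be measured. A minimal sketch of that time-of-flight arithmetic follows; the picosecond timing resolution used here is an assumption for illustration, not a published ESA specification.

```python
# Illustrative numbers only: round-trip time-of-flight ranging, the basic
# principle behind the laser metrology described above. The timing resolution
# is an assumed figure for illustration, not an ESA spec.
C = 299_792_458.0           # speed of light, m/s

def range_from_round_trip(dt_seconds: float) -> float:
    """Distance to the reflector from the round-trip travel time."""
    return C * dt_seconds / 2.0

separation = 150.0                          # meters, nominal Occulter-Coronagraph gap
round_trip = 2 * separation / C             # ≈ 1.0 microsecond
timing_resolution = 5e-12                   # 5 picoseconds, assumed for illustration
range_error = range_from_round_trip(timing_resolution)
print(f"round trip ≈ {round_trip*1e6:.2f} µs, range error ≈ {range_error*1e3:.2f} mm")
```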

The laser will give Proba-3 the ability to control the distance between the two satellites with an error of less than a millimeter—around the thickness of an average fingernail—and hold position for up to six hours, 50 times longer than the maximum duration of a total solar eclipse. Proba-3 will create the eclipses while it is flying farthest from Earth in its nearly 20-hour orbit.

Scientists hope to achieve at least 1,000 hours of artificial totality during Proba-3’s two-year prime mission.

Proba-3’s Occulter spacecraft (top) and Coronagraph spacecraft (bottom) will hold position 150 meters away from one another. Credit: ESA-P. Carril

The corona extends millions of miles from the Sun’s convective surface, with temperatures as hot as 3.5 million degrees Fahrenheit. Still, the corona is easily outshined by the glare from the Sun itself. Scientists say it’s important to study this region to understand how the Sun generates the solar wind and drives geomagnetic storms that can affect the Earth.

NASA’s Parker Solar Probe, well-insulated from the scorching heat, became the first spacecraft to fly through the corona in 2021. It is collecting data on the actual conditions within the Sun’s atmosphere, while a network of other spacecraft monitor solar activity from afar to get the big picture.

Proba-3 is tasked with imaging a normally invisible part of the corona as close as 43,500 miles (70,000 kilometers) above the Sun’s surface. Extreme ultraviolet instruments are capable of observing the part of the corona closest to the Sun, while existing coronagraphs on other satellites are good at seeing the outermost portion of the corona.

“That leaves a significant observing gap, from about 3 solar radii down to 1.1 solar radii, that Proba-3 will be able to fill,” said Andrei Zhukov of the Royal Observatory of Belgium, principal investigator for Proba-3’s coronagraph instrument. “This will make it possible, for example, to follow the evolution of the colossal solar explosions called Coronal Mass Ejections as they rise from the solar surface and the outward acceleration of the solar wind.”

Proba-3’s coronagraph instrument will take images as often as once every two seconds, helping scientists search for small-scale fast-moving plasma waves that might be responsible for driving up the corona’s hellish temperatures. The mission will also hunt for the glow of plasma jets scientists believe have a role in accelerating the solar wind, a cloud of particles streaming away from the Sun at speeds of up to 1.2 million mph (2 million km/hr).

These are two of the core science objectives for the Proba-3 mission. But the project has a deeper purpose of proving two satellites can continually fly in tight formation. This level of precision could meet the exacting demands of future space missions, such as Mars Sample Return and the clearing of space junk from low-Earth orbit, according to ESA.

“Proba-3’s coronal observations will take place as part of a larger in-orbit demonstration of precise formation flying,” said Josef Aschbacher, ESA’s director general. “The best way to prove this new European technology works as intended is to produce novel science data that nobody has ever seen before.

“It is not practical today to fly a single 150-meter-long spacecraft in orbit, but if Proba-3 can indeed achieve an equivalent performance using two small spacecraft, the mission will open up new ways of working in space for the future,” Aschbacher said in a statement. “Imagine multiple small platforms working together as one to form far-seeing virtual telescopes or arrays.”

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Two European satellites launch on mission to blot out the Sun—for science Read More »

google’s-genie-2-“world-model”-reveal-leaves-more-questions-than-answers

Google’s Genie 2 “world model” reveal leaves more questions than answers


Making a command out of your wish?

Long-term persistence, real-time interactions remain huge hurdles for AI worlds.

A sample of some of the best-looking Genie 2 worlds Google wants to show off. Credit: Google Deepmind

In March, Google showed off its first Genie AI model. After training on thousands of hours of 2D run-and-jump video games, the model could generate halfway-passable, interactive impressions of those games based on generic images or text descriptions.

Nine months later, this week’s reveal of the Genie 2 model expands that idea into the realm of fully 3D worlds, complete with controllable third- or first-person avatars. Google’s announcement talks up Genie 2’s role as a “foundational world model” that can create a fully interactive internal representation of a virtual environment. That could allow AI agents to train themselves in synthetic but realistic environments, Google says, forming an important stepping stone on the way to artificial general intelligence.

But while Genie 2 shows just how much progress Google’s Deepmind team has achieved in the last nine months, the limited public information about the model thus far leaves a lot of questions about how close we are to these foundational world models being useful for anything but some short but sweet demos.

How long is your memory?

Much like the original 2D Genie model, Genie 2 starts from a single image or text description and then generates subsequent frames of video based on both the previous frames and fresh input from the user (such as a movement direction or “jump”). Google says it trained on a “large-scale video dataset” to achieve this, but it doesn’t say just how much training data was necessary compared to the 30,000 hours of footage used to train the first Genie.
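
In other words, Genie 2 runs as an autoregressive loop over frames and user actions. Google has not published the architecture, so the sketch below only illustrates the shape of that loop; predict_next_frame() is a placeholder for whatever learned model maps past frames plus an action to the next frame.

```python
# Generic sketch of the interaction loop described above; Google has not
# published Genie 2's architecture, so predict_next_frame() is a stand-in for
# whatever learned model maps (past frames, user action) -> next frame.
from collections import deque

def predict_next_frame(context_frames, action):
    """Placeholder for the learned world model."""
    return context_frames[-1]  # dummy behavior: repeat the last frame

def run_interactive_session(first_frame, get_user_action, steps=100, context_len=16):
    context = deque([first_frame], maxlen=context_len)  # bounded memory of past frames
    frames = [first_frame]
    for _ in range(steps):
        action = get_user_action()                      # e.g. "forward", "jump"
        frame = predict_next_frame(list(context), action)
        context.append(frame)
        frames.append(frame)
    return frames
```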

Short GIF demos on the Google DeepMind promotional page show Genie 2 being used to animate avatars ranging from wooden puppets to intricate robots to a boat on the water. Simple interactions shown in those GIFs demonstrate those avatars busting balloons, climbing ladders, and shooting exploding barrels without any explicit game engine describing those interactions.

Those Genie 2-generated pyramids will still be there in 30 seconds. But in five minutes? Credit: Google Deepmind

Perhaps the biggest advance claimed by Google here is Genie 2’s “long horizon memory.” This feature allows the model to remember parts of the world as they come out of view and then render them accurately as they come back into the frame based on avatar movement. This kind of persistence has proven to be a persistent problem for video generation models like Sora, which OpenAI said in February “do[es] not always yield correct changes in object state” and can develop “incoherencies… in long duration samples.”

The “long horizon” part of “long horizon memory” is perhaps a little overzealous here, though, as Genie 2 only “maintains a consistent world for up to a minute,” with “the majority of examples shown lasting [10 to 20 seconds].” Those are definitely impressive time horizons in the world of AI video consistency, but it’s pretty far from what you’d expect from any other real-time game engine. Imagine entering a town in a Skyrim-style RPG, then coming back five minutes later to find that the game engine had forgotten what that town looks like and generated a completely different town from scratch instead.

What are we prototyping, exactly?

Perhaps for this reason, Google suggests Genie 2 as it stands is less useful for creating a complete game experience and more to “rapidly prototype diverse interactive experiences” or to turn “concept art and drawings… into fully interactive environments.”

The ability to transform static “concept art” into lightly interactive “concept videos” could definitely be useful for visual artists brainstorming ideas for new game worlds. However, these kinds of AI-generated samples might be less useful for prototyping actual game designs that go beyond the visual.

On Bluesky, British game designer Sam Barlow (Silent Hill: Shattered Memories, Her Story) points out how game designers often use a process called whiteboxing to lay out the structure of a game world as simple white boxes well before the artistic vision is set. The idea, he says, is to “prove out and create a gameplay-first version of the game that we can lock so that art can come in and add expensive visuals to the structure. We build in lo-fi because it allows us to focus on these issues and iterate on them cheaply before we are too far gone to correct.”

Generating elaborate visual worlds using a model like Genie 2 before designing that underlying structure feels a bit like putting the cart before the horse. The process almost seems designed to generate generic, “asset flip”-style worlds with AI-generated visuals papered over generic interactions and architecture.

As podcaster Ryan Zhao put it on Bluesky, “The design process has gone wrong when what you need to prototype is ‘what if there was a space.'”

Gotta go fast

When Google revealed the first version of Genie earlier this year, it also published a detailed research paper outlining the specific steps taken behind the scenes to train the model and how that model generated interactive videos. No such research paper has been published detailing Genie 2’s process, leaving us guessing at some important details.

One of the most important of these details is model speed. The first Genie model generated its world at roughly one frame per second, a rate that was orders of magnitude slower than would be tolerably playable in real time. For Genie 2, Google only says that “the samples in this blog post are generated by an undistilled base model, to show what is possible. We can play a distilled version in real-time with a reduction in quality of the outputs.”

Reading between the lines, it sounds like the full version of Genie 2 operates at something well below the real-time interactions implied by those flashy GIFs. It’s unclear how much “reduction in quality” is necessary to get a diluted version of the model to real-time controls, but given the lack of examples presented by Google, we have to assume that reduction is significant.

Oasis’ AI-generated Minecraft clone shows great potential, but still has a lot of rough edges, so to speak. Credit: Oasis

Real-time, interactive AI video generation isn’t exactly a pipe dream. Earlier this year, AI model maker Decart and hardware maker Etched published the Oasis model, showing off a human-controllable, AI-generated video clone of Minecraft that runs at a full 20 frames per second. However, that 500 million parameter model was trained on millions of hours of footage of a single, relatively simple game, and focused exclusively on the limited set of actions and environmental designs inherent to that game.

When Oasis launched, its creators fully admitted the model “struggles with domain generalization,” showing how “realistic” starting scenes had to be reduced to simplistic Minecraft blocks to achieve good results. And even with those limitations, it’s not hard to find footage of Oasis degenerating into horrifying nightmare fuel after just a few minutes of play.

What started as a realistic-looking soldier in this Genie 2 demo degenerates into this blobby mess just seconds later. Credit: Google Deepmind

We can already see similar signs of degeneration in the extremely short GIFs shared by the Genie team, such as an avatar’s dream-like fuzz during high-speed movement or NPCs that quickly fade into undifferentiated blobs at a short distance. That’s not a great sign for a model whose “long horizon memory” is supposed to be a key feature.

A learning crèche for other AI agents?

From this image, Genie 2 could generate a useful training environment for an AI agent and a simple “pick a door” task. Credit: Google Deepmind

Genie 2 seems to be using individual game frames as the basis for the animations in its model. But it also seems able to infer some basic information about the objects in those frames and craft interactions with those objects in the way a game engine might.

Google’s blog post shows how a SIMA agent inserted into a Genie 2 scene can follow simple instructions like “enter the red door” or “enter the blue door,” controlling the avatar via simple keyboard and mouse inputs. That could potentially make Genie 2 environments a great test bed for AI agents in various synthetic worlds.

Google claims rather grandiosely that Genie 2 puts it on “the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards [artificial general intelligence].” Whether or not that ends up being true, recent research shows that agent learning gained from foundational models can be effectively applied to real-world robotics.

Using this kind of AI model to create worlds for other AI models to learn in might be the ultimate use case for this kind of technology. But when it comes to the dream of an AI model that can create generic 3D worlds that a human player could explore in real time, we might not be as close as it seems.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.

Google’s Genie 2 “world model” reveal leaves more questions than answers Read More »

after-critics-decry-orion-heat-shield-decision,-nasa-reviewer-says-agency-is-correct

After critics decry Orion heat shield decision, NASA reviewer says agency is correct


“If this isn’t raising red flags out there, I don’t know what will.”

NASA’s Orion spacecraft, consisting of a US-built crew module and European service module, is lifted during prelaunch processing at Kennedy Space Center in 2021. Credit: NASA/Amanda Stevenson

Within hours of NASA announcing its decision to fly the Artemis II mission aboard an Orion spacecraft with an unmodified heat shield, critics assailed the space agency, saying it had made the wrong decision.

“Expediency won over safety and good materials science and engineering. Sad day for NASA,” Ed Pope, an expert in advanced materials and heat shields, wrote on LinkedIn.

There is a lot riding on NASA’s decision, as the Artemis II mission involves four astronauts and the space agency’s first crewed mission into deep space in more than 50 years.

A former NASA astronaut, Charles Camarda, also expressed his frustrations on LinkedIn, saying the space agency and its leadership team should be “ashamed.” In an interview on Friday, Camarda, an aerospace engineer who spent two decades working on thermal protection for the space shuttle and hypersonic vehicles, said NASA is relying on flawed probabilistic risk assessments and Monte Carlo simulations to determine the safety of Orion’s existing heat shield.

“I worked at NASA for 45 years,” Camarda said. “I love NASA. I do not love the way NASA has become. I do not like that we have lost our research culture.”

NASA makes a decision

Pope, Camarda, and others—an official expected to help set space policy for the Trump administration told Ars on background, “It’s difficult to trust any of their findings”—note that NASA has spent two years assessing the char damage incurred by the Orion spacecraft during its first lunar flight in late 2022, with almost no transparency. Initially, agency officials downplayed the severity of the issue, and the full scope of the problem was not revealed until a report this May by NASA’s inspector general, which included photos of a heavily pock-marked heat shield.

This year, from April to August, NASA convened an independent review team (IRT) to assess its internal findings about the root cause of the charring on the Orion heat shield and determine whether its plan to proceed without modifications to the heat shield was the correct one. However, though this review team wrapped up its work in August and began briefing NASA officials in September, the space agency kept mostly silent about the problem until a news conference on Thursday.

The inspector general’s report on May 1 included new images of Orion’s heat shield. Credit: NASA Inspector General

“Based on the data, we have decided—NASA unanimously and our decision-makers—to move forward with the current Artemis II Orion capsule and heat shield, with a modified entry trajectory,” Bill Nelson, NASA’s administrator, said Thursday. The heat shield investigation and other issues with the Orion spacecraft will now delay the Artemis II launch until April 2026, a slip of seven months from the previous launch date in September 2025.

Notably, the chair of the IRT, a former NASA flight director named Paul Hill, was not present at Thursday’s news conference. Nor did the space agency release the IRT’s report on its recommendations to NASA.

In an interview, Camarda said he knew two people on the IRT who dissented from its conclusions that NASA’s plan to fly the Orion heat shield, without modifications to address the charring problem, was acceptable. He also criticized the agency for not publicly releasing the independent report. “NASA did not post the results of the IRT,” he said. “Why wouldn’t they post the results of what the IRT said? If this isn’t raising red flags out there, I don’t know what will.”

The view from the IRT

Ars took these concerns to NASA on Friday, and the agency responded by offering an interview with Paul Hill, the review team’s chair. He strongly denied there were any dissenting views.

“Every one of our conclusions, every one of our recommendations, was unanimously agreed to by our team,” Hill said. “We went through a lot of effort, arguing sentence by sentence, to make sure the entire team agreed. To get there we definitely had some robust and energetic discussions.”

Hill did acknowledge that, at the outset of the review team’s discussions, two people were opposed to NASA’s plan to fly the heat shield as is. “There was, early on, definitely a difference of opinion with a couple of people who felt strongly that Orion’s heat shield was not good enough to fly as built,” he said.

However, Hill said the IRT was won over by the depth of NASA’s testing and the openness of agency engineers who worked with them. He singled out Luis Saucedo, an engineer at NASA’s Johnson Space Center who led the agency’s internal char loss investigation.

“The work that was done by NASA, it was nothing short of eye-watering, it was incredible,” Hill said.

At the base of Orion, which has a titanium shell, there are 186 blocks of a material called Avcoat individually attached to provide a protective layer that allows the spacecraft to survive the heating of atmospheric reentry. Returning from the Moon, Orion encounters temperatures of up to 5,000° Fahrenheit (2,760° Celsius). A char layer that builds up on the outer skin of the Avcoat material is supposed to ablate, or erode, in a predictable manner during reentry. Instead, during Artemis I, fragments fell off the heat shield and left cavities in the Avcoat material.

Work by Saucedo and others, including substantial testing in ground facilities, wind tunnels, and high-temperature arc jet chambers, allowed engineers to find the root cause: gases became trapped in the heat shield and led to cracking. Hill said his team was convinced that NASA successfully recreated the conditions observed during reentry and replicated, during testing, the Avcoat cracking that occurred during Artemis I.

When he worked at the agency, Hill played a leading role during the investigation into the cause of the loss of space shuttle Columbia, in 2003. He said he could understand if NASA officials “circled the wagons” in response to the IRT’s work, but he said the agency could not have been more forthcoming. Every time the review team wanted more data or information, it was made available. Eventually, this made the entire IRT comfortable with NASA’s findings.

Publicly, NASA could have been more transparent

The stickiest point during the review team’s discussions involved the permeability of the heat shield. Counter-intuitively, the heat shield was not permeable enough during Artemis I. This led to gas buildup, higher pressures, and the cracking ultimately observed. The IRT was concerned because, as designed, the heat shield for Artemis II is actually more impermeable than the Artemis I vehicle.

Why is this? It has to do with the ultrasound testing that verifies the strength of the bond between the Avcoat blocks and the titanium skin of Orion. With a more permeable heat shield, it was difficult to complete this testing with the Artemis I vehicle. So the shield for Artemis II was made more impermeable to accommodate ultrasound testing. “That was a technical mistake, and when they made that decision they did not understand the ramifications,” Hill said.

However, Hill said NASA’s data convinced the IRT that modifying the entry profile for Artemis II, to minimize the duration of passage through the atmosphere, would offset the impermeability of the heat shield.

Hill said he did not have the authority to release the IRT report, but he agreed that the space agency had not been forthcoming with public information about its analyses before this week.

“This is a complex story to tell, and if you want everybody to come along with you, you’ve got to keep them informed,” he said of NASA. “I think they unintentionally did themselves a disservice by holding their cards too close.”

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

After critics decry Orion heat shield decision, NASA reviewer says agency is correct Read More »

the-2025-bmw-i5-m60-review:-an-ev-that-makes-you-want-to-drive-and-drive

The 2025 BMW i5 M60 review: An EV that makes you want to drive and drive

In fact, I think the cheaper, less powerful i5 eDrive40 (or the all-wheel drive xDrive40) is the better i5, but BMW didn’t have one of those available in the press fleet, so a review of that version will have to wait for one to show up. As I often write, the most powerful version of any given EV is usually a worse deal, as they’re invariably fitted with big, range-sapping wheels, and it’s not like a 0–60 time of 5.7 seconds is particularly slow, even by 2024’s standards.

And those big wheels cause a range hit: the EPA rates the i5 M60 at 240 miles (386 km) on a full charge, although in Efficient mode that should be beatable. Our test car averaged 3.2 miles/kWh (19.4 kWh/100 km) over 1,000 miles (1,609 km). Then again, if you stick it in Sport mode and hoof the throttle too often, it’s not hard to see that number plummet to 2.4 miles/kWh (25.9 kWh/100 km).
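
For reference, the miles-per-kWh and kWh-per-100 km figures are the same measurements expressed in different units; the conversion works out as shown below.

```python
# Quick check of the unit conversions quoted above (mi/kWh to kWh/100 km).
MILES_PER_KM = 0.621371

def kwh_per_100km(miles_per_kwh: float) -> float:
    km_per_kwh = miles_per_kwh / MILES_PER_KM
    return 100.0 / km_per_kwh

print(round(kwh_per_100km(3.2), 1))  # 19.4
print(round(kwh_per_100km(2.4), 1))  # 25.9
```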

In case you forgot which series BMW this is, the panel set into the Hofmeister kink reminds you it’s a 5. Credit: BMW

As the latest version of BMW’s fifth-gen EV powertrain, the i5 has its most up-to-date fast charging software, which uses a new control strategy to maintain higher levels of power for longer while plugged into a DC fast charger, even when starting at a state of charge as high as 50 percent. During our testing, we fast-charged the i5 from 19 to 91 percent, which took a couple of seconds more than 37 minutes, delivering 62 kWh and peaking at an impressive 209 kW, although before long power delivery dropped to 150 kW.
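
That session works out to roughly 100 kW on average; the arithmetic is below.

```python
# Arithmetic check of the charging session described above.
energy_kwh = 62.0
minutes = 37.0
avg_power_kw = energy_kwh / (minutes / 60.0)
print(round(avg_power_kw, 1))   # ≈ 100.5 kW average, versus the 209 kW peak
```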

Software-defined emotions

Sport mode is fast and sounds good, accompanied as it is by Hans Zimmer-crafted powertrain sounds. And Efficient, which mostly just relies on the 335 hp (250 kW), 317 lb-ft (430 Nm) rear motor, is quiet and comfortable. But the i5 offers you some other choices, including Expressive, Relax, and Digital Art modes, which reconfigure the cabin lighting, the dynamic wallpaper on the curved display, and the powertrain sounds.

The 2025 BMW i5 M60 review: An EV that makes you want to drive and drive Read More »

these-spiders-listen-for-prey-before-hurling-webs-like-slingshots

These spiders listen for prey before hurling webs like slingshots

Along came a spider

(A) Untensed web shown from front view. (B) Tensed web shown from side view. Credit: S.I. Han and T.A. Blackledge, 2024

The 19 spiders built 26 webs over the testing period. For the experiments, Han and Blackledge used a weighted tuning fork, vibrating in the mid-range of wingbeat frequencies for many North American mosquito species, as a control stimulus. They also attached actual mosquitoes to thin strips of black construction paper by dabbing a bit of superglue on their abdomens or hind legs. This ensured the mosquitoes could still beat their wings when approaching the webs. The experiments were recorded on high-speed video for analysis.

As expected, spiders released their webs when flapping mosquitoes drew near, but the video footage showed that the releases occurred before the mosquitoes ever touched the web. The spiders released their webs just as frequently when the tuning fork was brandished nearby. It wasn’t likely that they were relying on visual cues because the spiders were centered at the vertex of the web and anchor line, facing away from the cone. Ray spiders also don’t have well-developed eyes. And one spider did not respond to a motionless mosquito held within the capture cone but released its web only when the insect started flapping its wings.

“The decision to release a web is therefore likely based upon vibrational information,” the authors concluded, noting that ray spiders have sound-sensitive hairs on their back legs that could be detecting air currents or sound waves since those legs are typically closest to the cone. Static webs are known to vibrate in response to airborne sounds, so it seems likely that ray spiders can figure out an insect’s approach, its size, or maybe even its behavior before the prey ever makes contact with the web.

As for the web kinematics, Han and Blackledge determined that the webs can accelerate at up to 504 m/s², reaching speeds as high as 1 m/s, and hence can catch mosquitoes in 38 milliseconds or less. Even the speediest mosquitoes might struggle to outrun that.
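
For a sense of scale, plugging those figures into constant-acceleration kinematics (the actual strike distance is not given here, so this is purely illustrative) suggests the web spends about 2 milliseconds getting up to speed and could cover a few centimeters within the quoted 38 milliseconds.

```python
# Illustrative kinematics only: the strike distance is not given in the
# article, so this just shows the scale implied by the quoted figures.
a = 504.0         # m/s^2, peak web acceleration
v = 1.0           # m/s, top speed
t_strike = 0.038  # s, quoted catch time

t_accel = v / a                                 # ≈ 2 ms to reach top speed
d_accel = v**2 / (2 * a)                        # ≈ 1 mm covered while accelerating
d_total = d_accel + v * (t_strike - t_accel)    # ≈ 3.7 cm covered in 38 ms
print(f"{t_accel*1e3:.1f} ms, {d_total*100:.1f} cm")
```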

Journal of Experimental Biology, 2024. DOI: 10.1242/jeb.249237  (About DOIs).

These spiders listen for prey before hurling webs like slingshots Read More »

openai-teases-12-days-of-mystery-product-launches-starting-tomorrow

OpenAI teases 12 days of mystery product launches starting tomorrow

On Wednesday, OpenAI CEO Sam Altman announced a “12 days of OpenAI” event starting December 5, during which the company will unveil new AI features and products over 12 consecutive weekdays.

Altman did not specify the exact features or products OpenAI plans to unveil, but a report from The Verge about this “12 days of shipmas” event suggests the products may include a public release of the company’s text-to-video model Sora and a new “reasoning” AI model similar to o1-preview. Perhaps we may even see DALL-E 4 or a new image generator based on GPT-4o’s multimodal capabilities.

Altman’s full tweet included hints at releases both big and small:

🎄🎅starting tomorrow at 10 am pacific, we are doing 12 days of openai.

each weekday, we will have a livestream with a launch or demo, some big ones and some stocking stuffers.

we’ve got some great stuff to share, hope you enjoy! merry christmas.

If we’re reading the calendar correctly, 12 weekdays means a new announcement every day until December 20.
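
For the record, the arithmetic checks out: counting 12 weekdays from Thursday, December 5, 2024 lands on Friday, December 20.

```python
# Quick calendar check: counting 12 weekdays starting Thursday, Dec 5, 2024.
from datetime import date, timedelta

day = date(2024, 12, 5)
weekdays = []
while len(weekdays) < 12:
    if day.weekday() < 5:          # Monday=0 ... Friday=4
        weekdays.append(day)
    day += timedelta(days=1)
print(weekdays[-1])                # 2024-12-20
```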

OpenAI teases 12 days of mystery product launches starting tomorrow Read More »

amazon-secretly-slowed-deliveries,-deceived-anyone-who-complained,-lawsuit-says

Amazon secretly slowed deliveries, deceived anyone who complained, lawsuit says

In a statement to Ars, Amazon spokesperson Kelly Nantel said that claims that Amazon’s “business practices are somehow discriminatory or deceptive” are “categorically false.”

Nantel said that Amazon started using third-party services to deliver to these areas to “put the safety of delivery drivers first.”

“In the ZIP codes in question, there have been specific and targeted acts against drivers delivering Amazon packages,” Nantel said. “We made the deliberate choice to adjust our operations, including delivery routes and times, for the sole reason of protecting the safety of drivers.”

Nantel also pushed back on claims that Amazon concealed this choice, claiming that the company is “always transparent with customers during the shopping journey and checkout process about when, exactly, they can expect their orders to arrive.”

But that doesn’t really gel with Schwalb’s finding that even customers using Amazon’s support chat were allegedly misled. During one chat, a frustrated user who had noticed discrepancies between DC ZIP codes asked whether Amazon “is a waste of money in my zip code?” Instead of confirming that the ZIP code was excluded from in-house delivery services, the support team member unhelpfully suggested the user delete and re-add their address to their account.

“Amazon has doubled down on its deception by refusing to disclose the fact of the delivery exclusion, and instead has deceptively implied that slower speeds are simply due to other circumstances, rather than an affirmative decision by Amazon,” Schwalb’s complaint said.

Schwalb takes no issue with Amazon diverting delivery drivers from perceived high-crime areas but insists that Amazon owes its subscribers in those regions an explanation for delivery delays and perhaps even cheaper subscription prices. He has asked for an injunction on Amazon’s allegedly deceptive advertising urging users to pay for fast shipments they rarely, if ever, receive. He also wants Amazon to refund subscribers seemingly cheated out of full subscription benefits and has asked a jury to award civil damages to deter future unfair business practices. Amazon could owe millions in a loss, with each delivery to almost 50,000 users since mid-2022 considered a potential violation.

Nantel said that Amazon has offered to “work together” with Schwalb’s office “to reduce crime and improve safety in these areas” but did not suggest Amazon would be changing how it advertises Prime delivery in the US. Instead, the e-commerce giant plans to fight the claims and prove that “providing fast and accurate delivery times and prioritizing the safety of customers and delivery partners are not mutually exclusive,” Nantel said.

Amazon secretly slowed deliveries, deceived anyone who complained, lawsuit says Read More »

seagrass-is-fantastic-at-carbon-capture—and-it’s-at-risk-of-extinction

Seagrass is fantastic at carbon capture—and it’s at risk of extinction


An underwater gardening experiment along the East Coast aims at restoration.

A crab inhabits a bed of eelgrass at Cape Cod National Seashore in Massachusetts. Eelgrass provides critical habitat for hundreds of species. Credit: Holly Plaisted/National Park Service

In late September, seagrass ecologist Alyssa Novak pulled on her neoprene wetsuit, pressed her snorkel mask against her face, and jumped off an oyster farming boat into the shallow waters of Pleasant Bay, an estuary in the Cape Cod National Seashore in Massachusetts. Through her mask she gazed toward the sandy seabed, about 3 feet below the surface at low tide, where she was about to plant an experimental underwater garden of eelgrass.

Naturally occurring meadows of eelgrass—the most common type of seagrass found along the East Coast of the United States—are vanishing. Like seagrasses around the world, they have been plagued for decades by dredging, disease, and nutrient pollution from wastewater and agricultural runoff. The nutrient overloads have fueled algal blooms and clouded coastal waters with sediments, blocking out sunlight the marine plants need to make food through photosynthesis and suffocating them.

The United Nations Environment Program reports more than 20 of the world’s 72 seagrass species are on the decline. As a result, an estimated 7 percent of these habitats are lost each year.

In the western Atlantic, some eelgrass meadows have been reduced by more than 90 percent in the last 100 years, according to The Nature Conservancy, an environmental nonprofit that works to protect lands and waters around the world.

Now, rising sea surface temperatures caused by global warming are pushing the plant to the brink of extinction. Novak, a research assistant professor at Boston University who has studied eelgrass in New England for more than a decade, and a multidisciplinary team of scientists in different states are trying their best to make sure this does not become reality.

Together, they are working to restore eelgrass populations in coastal parks from Maine to North Carolina using a novel approach that has never been tried before with a marine plant: assisted migration.

“We’re trying to identify thermo-tolerant individuals up and down the East Coast and try to move them into areas where the populations are stressed by increases in sea surface temperature, so that we can give those populations a chance of surviving into the future,” Novak said.

Typically, eelgrass thrives in water temperatures between 60° and 68° Fahrenheit, according to Novak. In the last 20 years, sea surface temperatures in the Northeast have warmed faster than the global ocean and exceeded that safe range, mostly due to human activity like burning fossil fuels, according to NOAA Fisheries, a federal agency charged with managing and protecting marine resources in the US.

Blades of eelgrass are viewed up close at Cape Cod National Seashore. Credit: Holly Plaisted/National Park Service

Around 77° Fahrenheit the plants become stressed and struggle to photosynthesize, said Novak. Around 82° they begin to expire. “That’s when the plants no longer can handle the heat stress, and they end up dying,” she said. And it’s getting hotter.

In recent years, she said, water temperatures along the East Coast have surpassed 82° during peak summer months. By 2050, they are expected to increase in the Northeast by two degrees, she said.

The common garden experiment

Anticipating the deadly forecast for eelgrass, The Nature Conservancy brought together a group of scientists in 2022 to figure out how they might change the plant’s trajectory. Together, the experts on seagrasses, corals, agriculture, forestry, and plant genetics explored options based on what had been done to address the effects of climate change on other ecosystems.

“We wanted to figure out what the solutions were that different groups had come up with, and from those, which ones might apply to the seagrass world,” said Boze Hancock, senior marine restoration scientist with The Nature Conservancy’s global oceans team.

Prolonged marine heatwaves and coral disease have prompted some scientists to experiment with cross-breeding and replanting heat-resistant corals in warming waters, for example. In some cases they have removed whole coral colonies from their natural habitat to preserve their genetics in land-based biobanks.

One of the workshop invitees, biologist Thomas Whitham, shared with the group how he’s used a scientific research tool called the “common garden experiment” to restore deciduous Fremont cottonwood forests that have been dying off in Arizona due to rising temperatures and drought.

The experiments involve collecting plants from different locations and moving them to designated locations to observe how they respond to new environmental conditions. In the case of Fremont cottonwoods, Whitham said the technique has proven vital to identifying trees with specific genetic traits that make them more heat and drought resilient. Cuttings from these trees are now being planted in areas where less resilient trees died off to restore the species in a process known as “assisted migration.”

“We’ve planted many thousands, tens of thousands, of trees using this common garden approach,” said Whitham, a Regents’ professor in the department of biological sciences at Northern Arizona University. It could work for eelgrass too, he told the group.

They could collect seeds from eelgrass populations in the south and plant them in cooler northern waters alongside local seeds and, in effect, identify plants that have a propensity to thrive in warmer temperatures.

Workshop participants were eager to try, said attendee Jonathan Lefcheck, a research scientist at the University of Maryland Center for Environmental Science who has studied seagrasses in the Chesapeake Bay for more than 15 years. “If we do nothing, it’s likely that seagrass—eelgrass—will be extirpated all the way up to New York in the next 50 years,” he said. And with it, all the services it provides to wildlife and humans.

Underwater forests

Eelgrass provides critical habitat for hundreds of species.

“It’s the forest under the water in the estuaries,” said Bradley Peterson, a professor of marine science at Stony Brook University’s School of Marine and Atmospheric Sciences who helped initiate the workshops in collaboration with The Nature Conservancy.

Scientists believe seagrasses evolved from terrestrial plants 70 to 100 million years ago. “When they went into the marine world, they brought all the machinery they had with them for the terrestrial world, real seeds, real flowers, and real roots,” said Peterson, who is working to restore eelgrass near Long Island.

Its green grass blades, which can grow up to a couple feet long, offer food and shelter to horseshoe crabs, seahorses, and fish of all sizes that weave through its mazes. Little shrimp pollinate the plant’s flowers like “bees of the sea,” said Lefcheck. For bigger fish, “it’s this beautiful buffet,” he said. “You get this whole ecosystem that’s built up around this habitat that’s just sort of gently swaying there underneath the waves.”

In New England, eelgrass is vital for commercial scallop and oyster fisheries. Same for the Atlantic cod. “The cod industry is massive, so if you start losing that habitat, then your commercial fisheries go,” Novak said.

You also lose important coastline protection. Seagrass helps prevent erosion and buffers shorelines from flooding and storm surge. It can reduce wave energy by 50 percent, according to Novak. It also improves water quality and clarity by filtering pollutants and storing excess nutrients, reducing the prevalence of bacteria that can cause coral disease or contaminate seafood. “If you lose eelgrass, you’re going to have dirtier waters,” she said. Global warming could also be exacerbated.

Eelgrass is the most dominant type of seagrass along the East Coast. Credit: d3_plus D.Naruse @ Japan via Getty

Seagrasses sequester up to 18 percent of carbon stored in the ocean, capturing it 35 times faster than tropical rainforests, according to the World Wide Fund for Nature. The New York Department of State, Office of Planning, Development and Community Infrastructure reports each acre of seagrass can potentially sequester the same amount of carbon emitted by a car driving nearly 4,000 miles each year. But when this unique marine habitat is destroyed, carbon that has been stored in the plant’s roots and sediments—sometimes for thousands of years—is released back into the atmosphere, said Novak.
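
To put that driving figure in rough perspective, 4,000 miles corresponds to about 1.6 metric tons of CO2 if you assume a ballpark emission factor of roughly 400 grams of CO2 per mile for a typical gasoline car; that factor is an assumption for illustration, not a number from the article.

```python
# Rough scale check; the 400 g CO2/mile emission factor is an assumed ballpark
# for a typical gasoline car, not a figure from the article.
miles = 4_000
co2_g_per_mile = 400
tonnes_co2 = miles * co2_g_per_mile / 1e6
print(f"≈ {tonnes_co2:.1f} metric tons of CO2 per acre per year")   # ≈ 1.6
```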

Sharing seeds

To have a chance at repopulating eelgrass along the East Coast, scientists like Novak, Peterson, and Lefcheck realized they would have to share information and collaborate across state borders—something to which academics are not always accustomed, according to Novak.

“It’s not our nature to share information that freely, because we’re supposed to be focusing on publishing,” she said. But the crisis at hand had inspired a change in the status quo. “We’re a team,” she said. “We’re about saving the eelgrass and doing what’s best for this ecosystem.”

They call the regional effort HEAT (Helping Eelgrass Adapt to Temperature). In the last year, participants have been working together to identify the best possible sites for planting common gardens along the East Coast. So far, they’ve homed in on several national parks: the Cape Cod National Seashore, Fire Island National Seashore in New York, Assateague Island in Maryland, and Cape Hatteras and Cape Lookout national seashores in North Carolina.

“We want to set ourselves up for some success and use the information we have about these parks to guide our decision-making and make sure we’re putting these in places where they might have enough light, where they won’t have as many human impacts,” said Lefcheck.

They’ve also begun collecting and sharing seeds. “We’re sharing actual plants with each other for genomics, and then we’re also sharing seeds with each other for doing our common gardens and for experiments,” Novak said.

This past year Novak sent samples of eelgrass plants collected in Massachusetts to the University of North Carolina Wilmington for Stephanie Kamel, a professor in the department of biology and marine biology at the university, to analyze. Kamel is looking for plants that have specific genetic markers that might make them more resilient to challenging environmental conditions like warmer temperatures and lower light, which is becoming an increasing problem as sea levels rise due to global warming pushing the plants deeper underwater. Currently, she’s analyzing the DNA of 800 eelgrass plants from 60 meadows along the East Coast. “We’re going to have this sort of unprecedented level of detail about genomic variation across the range of Zostera (eelgrass),” said Kamel.

This information could be used to help collaborators figure out which seeds they should plant in different locations based on their specific environmental conditions and challenges, said Jessie Jarvis, a seagrass ecologist and professor who works with Kamel at the University of North Carolina Wilmington.

“It’s almost like a dating app for seagrass,” Jarvis said. “You could be a little bit smarter about picking your source populations to match what your restoration needs are, rather than just kind of throwing seeds from everywhere and hoping that something works.”

In the meantime, though, common gardening remains the most practical tool to figure out which plants from which locations may be the best stock for future eelgrass populations. This past year Kamel and Jarvis piloted a common garden experiment in North Carolina and Virginia.

“We took those seeds from what we thought were, quote, unquote, good sources (in North Carolina), and we actually moved them to Virginia. And then we took some Virginia seeds and moved them to North Carolina to actually see what would happen in terms of growth,” said Kamel. While it’s still too early to draw firm conclusions from the experiment, Kamel said preliminary results seem promising. “There are really encouraging signs that we have been able to find some genomic changes associated with temperature resilience,” she said.

Others are following suit. This past spring, Novak and Peterson harvested reproductive eelgrass shoots filled with seeds while snorkeling and scuba diving in Acadia National Park in Maine and in Cape Cod, Nantucket, and Gloucester in Massachusetts. Lefcheck harvested in Maryland. “What we do is harvest them before they’ve released the seeds, because the seeds are tiny, like the size of a pinhead,” Lefcheck said. The shoots are then held in saltwater tanks until the seeds drop and can easily be collected and stored until it’s time to plant them.

It’s best to wait to plant eelgrass in the early fall, after most of the late summer storms have passed, according to Novak, who spent several days planting seeds in Pleasant Bay and nearby East Harbor this September with a team including a biologist from the National Park Service and a representative from the Mashpee Wampanoag Tribe. To get to the Pleasant Bay site, they motored out onto the water on an oyster farming boat. “The oyster farmers are interested in the project because our site is adjacent to their farm and they recognize that healthy beds are important to sustaining their livelihood,” Novak said.

Before getting wet, Novak and her team ran through their gardening plan. “We do dry runs on land, just to get everybody organized, but it’s not the same when you get into the water,” she said. “You’re trying to hold things underwater. You can’t see as well, even if you have a mask on.”

They would establish two 25-meter transect lines and then plant seeds from different donor sites in New York and Massachusetts. Nantucket was one of them. “We knew conditions were warmer at that particular site, so we said, let’s, let’s test them at Cape Cod,” she said.

Up to 500 seeds from each location would be planted by releasing them into the water column from a test tube or dropping tea bags filled with the seeds that would meander their way down to the seabed into 1-meter plots.

It was a slow process, Novak said, requiring hyper organization to make sure it’s clear which seeds have been planted where so that they can be monitored. In January, she will return to the sites to see if the plants are germinating. Then in the spring she’ll really be able to measure growth and compare how the different plants are faring in comparison to one another. “By next summer, we should have genomics for all of our populations, so that should really be guiding our efforts at that point,” she said.

Teresa Tomassoni is an environmental journalist covering the intersections between oceans, climate change, coastal communities, and wildlife for Inside Climate News. Her previous work has appeared in The Washington Post, NPR, NBC Latino, and the Smithsonian American Indian Magazine. Teresa holds a graduate degree in journalism from the Craig Newmark Graduate School of Journalism. She is also a recipient of the Stone & Holt Weeks Social Justice Reporting Fellowship. In addition to reporting on oceans, Teresa teaches climate solutions reporting for The School of the New York Times.

This story originally appeared on Inside Climate News.

Seagrass is fantastic at carbon capture—and it’s at risk of extinction Read More »

ai-#93:-happy-tuesday

AI #93: Happy Tuesday

You know how you can sometimes have Taco Tuesday… on a Thursday? Yep, it’s that in reverse. I will be travelling the rest of the week, so it made sense to put this out early, and incorporate the rest of the week into #94.

  1. Language Models Offer Mundane Utility. The price is fixed, so share and enjoy.

  2. Dare Not Speak Its Name. David Mayer. David Mayer! Guido Scorza?

  3. Language Models Don’t Offer Mundane Utility. It’s a flop.

  4. Huh, Upgrades. Cohere, and reports on Claude writing styles.

  5. Deepfaketown and Botpocalypse Soon. Why do we not care about spoof calls?

  6. Fun With Image Generation. Scott Sumner explains why he cares about art.

  7. The Art of the Jailbreak. You had one job.

  8. Get Involved. Anthropic AI safety fellows program, apply now.

  9. Introducing. A voice customization tool and a new eval based on various games.

  10. In Other AI News. Where do you draw the line? Who leaves versus who joins?

  11. Quiet Speculations. Rumors of being so back unsubstantiated at this time.

  12. Daron Acemoglu is Worried About Job Market Liquidity. I kid, but so does he?

  13. Pick Up the Phone. Report from China, not the same info I usually see.

  14. The Quest for Sane Regulations. Google antitrust foolishness, Cruz sends letters.

  15. The Week in Audio. Got a chance to listen to Dominic Cummings, was worth it.

  16. AGI Looking Like. You are made of atoms it could use for something else.

  17. Rhetorical Innovation. My (and your) periodic reminder on Wrong on the Internet.

  18. Open Weight Models are Unsafe and Nothing Can Fix This. Deal as best you can.

  19. Aligning a Smarter Than Human Intelligence is Difficult. Even words are tricky.

  20. We Would Be So Stupid As To. Once you say it out loud, you know the answer.

  21. The Lighter Side. It’s time to build.

Use voice mode as a real time translation app to navigate a hospital in Spain.

Get Claude to actually push back on you and explain that the fight you’re involved in isn’t worth it.

Get them talking, also you don’t have to read the books either.

Freyja: I wanted to figure out what to do about my baby’s sleep situation, so I read two books with entirely opposing theories on how infant sleep works, and then asked Claude to write a dialogue between them about my specific situation

It’s such a glorious time to be alive.

Make a market cap chart via a Replit Agent in 2 minutes rather than keep looking for someone else’s chart (CEO cheats a bit by using a not yet released UI but still).

Collude to fix prices. Ask it to maximize profits, and it will often figure out on its own that it can do so via implicit collusion. We want to tell the AIs and also the humans ‘do what maximizes profits, except ignore how your decisions impact the decisions of others in these particular ways and only those ways, otherwise such considerations are fine’ and it’s actually a rather weird rule when you think about it.

If you had AIs that behaved exactly like humans do, you’d suddenly realize they were implicitly colluding all the time. This is a special case of the general problem of:

  1. We have a law or norm saying you can’t do X.

  2. People do X all the time, it’s actually crazy or impossible not to.

  3. If you look at the statistics, it is quite obvious people are doing X all the time.

  4. But in any given case we do X implicitly and deniably, because laws and norms.

  5. This can still be valuable, because it limits the magnitude and impact of X.

  6. An AI does a similar amount of X and everyone loses their minds.

  7. The equilibrium breaks, usually in ways that make everything worse.

Aid writers by generating simulated comments? LessWrong team is experimenting with this. It seems super doable and also useful, and there’s a big superset of related techniques waiting to be found. No one needs to be flying blind, if they don’t want to.

Education done right?

Roon: I heard from an English professor that he encourages his students to run assignments through ChatGPT to learn what the median essay, story, or response to the assignment will look like so they can avoid and transcend it all.

If you can identify the slope vectors and create orthogonal works that are based.

Archived Videos: “Write in a style that would impress an English professor that asked me to run the assignment through ChatGPT to learn what the median essay would look like so that I can transcend that.”

Occasionally pause to ask yourself, what are you even doing? Question to ponder, if students intentionally avoid and ‘transcend’ the ‘median’ essay is their work going to be better or worse? How do you grade in response?

You can get a lot more out of AIs if you realize not to treat them like Google, including learning to dump in a ton of context and then ask for the high level answers. Ethan Mollick then has additional basic ‘good enough’ prompting tips.

There was at least a short period when ChatGPT refused to say the name “David Mayer.” Many people confirmed this was real, it was then patched but other names (including ‘Guido Scorza’) have as far as we know not yet been patched. There is a pattern of these names being people who have had issues with ChatGPT or OpenAI, sufficiently that it does not appear to be a coincidence.

OpenAI has confirmed this is due to flagging by an internal privacy tool.

Won’t someone think of the flops?

Roon: The flop utilization of humanity toward productive goals and interesting thoughts is completely terrible and somehow getting worse.

This is in part due to the totalizing homogenizing effects of technology!

ADI: Are you calling everyone dumb?

Roon: The opposite! The total amount of smarts on Earth has never been higher.

BayesLord: sir the underlying objective function would like a word.

Roon: Tell me.

BowTiedBlackCat: who decides “productive” and “interesting”?

Roon: Me.

Why should I spend my flops increasing flop utilization efficiency when I can instead use my flops to get more flops? The key thing AI does is it allows me to be horribly flop-inefficient and I love that so much.

Whereas getting older means you get to distill your models and be vastly more flop-efficient, but at the cost of steadily reducing your locally available flop count, which is net helpful until eventually it isn’t. If I had the efficiency I have now and the flops I had when I was 22, that would be a hell of a thing.

Dan Hendrycks points out that the average person cannot, by listening to them, tell the difference between a random mathematics graduate and Terence Tao, and many leaps in AI will feel like that for average people. Maybe, but I do think people can actually tell. I’m not the man on the street, but when I read Tao there is a kind of fluency and mastery that stands out even when I have no ability to follow the math, and which makes it more likely I will indeed be able to follow it. And as Thomas Woodside points out, people will definitely ‘feel the agents’ that result from similar advances.

Create pseudo-profound statements that are potentially persuasive and highly toxic. I actually think this is great, because it helps you understand how to interact with other similar ‘rules.’ Also, while we can all see the issue with these statements, some people need to reverse any advice they hear.

Sully having no luck getting Claude’s writing style feature working, whereas system prompt examples work fine. I ended up flipping it to ‘educational’ and thinking ‘huh, good enough for now.’ Others report mixed success. Sully and Logan Kilpatrick speculate there’s a huge market opportunity here, which seems plausible.

Cohere Rerank 3.5, which searches and analyzes business data and other documents and semi-structured data, claims enhanced reasoning, better multilinguality, substantial performance gains and better context understanding for things like emails, reports, JSON and code. No idea how useful this modality actually is.

The closer examples are to people you know, the more meaningful it is, and I know (and am a big fan of) Cate Hall, so:

Cate Hall: Someone is calling people from my number, saying they have kidnapped me and are going to kill me unless the person sends money. I am fine. I do not know what is happening, but I am fine. Don’t send money!

Just a spoofing attempt, it seems. The phone is still working.

They are also using my voice.

Wow this is so frustrating, @Verizon can’t tell me anything except “file a police report” while this is still ongoing?? Has anyone experienced something like this before & able to recommend someone to help?

James Miller: I had people in my neighborhood being spammed with calls that had my name and phone number. I talk to police and phone company and told nothing I could do but change my phone number.

John Wittle: To be fair, spoofing a phone number is not something Verizon controls. You can just send whatever data packets you want, and type whatever phone number into the ‘from’ field you want, and verizon can’t stop you.

I am confused why we place so little value in the integrity of the phone system, where the police seem to not care about such violations, and we don’t move to make them harder to do.

It also seems like a clear case of ‘solve for the equilibrium’ and the equilibrium taking a remarkably long time to be found, even with current levels of AI. Why aren’t things vastly worse? Presumably malicious use of AI will push this to its breaking point rather soon, one way or another.

An offer to create an ‘AI persona’ based on your Tweets. I gave it a shot and… no. Epic fail, worse than Gallabytes’s.

Should you sell your words to an AI? Erik Hoel says no, we must take a stand, in his case to an AI-assisted book club, including the AI ‘rewriting the classics’ to modernize and shorten them, which certainly defaults to an abomination. So he turned down $20k to let that book club include an AI version of himself along with some of his commentary.

In case whoever did that is wondering: Yes, I would happily do that, sure, why not? Sounds like fun. If I had to guess I’d pick Thucydides. But seriously, do rethink the ‘rewriting the classics’ part.

Also, it’s amusing to see lines like this:

Erik Hoel: The incentives here, near the peak of AI hype, are going to be the same as they were for NFTs. Remember when celebrities regularly shilled low-market-cap cryptos to the public? Why? Because they simply couldn’t say no to the money.

Even if we see relatively nothing: You ain’t seen nothing yet.

Scott Sumner on Scott Alexander on AI Art. Reading this emphasized to me that no, I don’t ‘care about art’ in the sense they’re thinking about it here.

An AI agent based on GPT-4 had one job, not to release funds, with exponentially growing cost to send messages to convince it to release funds (70% of the fee went to the prize pool, 30% to the developer). The prize pool got to ~$50k before someone got it to send the funds.
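For intuition on how quickly such a fee schedule snowballs, here is a minimal sketch in Python. The 70/30 split is from the description above; the base fee, growth rate, and number of attempts are made-up numbers for illustration, not the actual contract’s parameters.

```python
# Illustrative sketch only, not the actual agent's contract: an exponentially
# growing message fee, with 70% of each fee added to the prize pool and 30%
# going to the developer. Base fee, growth rate, and attempt count are assumed.
def simulate(n_messages, base_fee=10.0, growth=1.05, pool_share=0.70):
    pool, dev, fee = 0.0, 0.0, base_fee
    for _ in range(n_messages):
        pool += pool_share * fee          # 70% of this message's fee
        dev += (1 - pool_share) * fee     # 30% to the developer
        fee *= growth                     # the next attempt costs more
    return pool, dev

pool, dev = simulate(150)
print(f"prize pool ≈ ${pool:,.0f}, developer take ≈ ${dev:,.0f}")
```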

Anthropic fellows program for AI safety, in London or Berkeley, full funding for 10-15 fellows over six months, it is also an extended job interview, apply here by January 20.

BALROG, a set of environments for AI evaluations inspired by classic games including Minecraft, NetHack and Baba is You. GPT-4o was narrowly ahead of Claude 3.5 Sonnet. One flaw right now is that some of the games, especially NetHack, are too hard to impact the score, presumably you’d want some sort of log score system?
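As one sketch of what that speculation could mean in practice (my own illustration, not BALROG’s actual scoring; the game names and percentages below are made up), you could aggregate per-game progress on a log scale so that tiny progress on a brutally hard game still moves the overall number:

```python
import math

# Map per-game progress (0-100%) onto a log scale, so that going from 0.1% to 1%
# on a very hard game counts as much as going from 10% to 100% on an easier one.
# `floor` is an assumed minimum progress level used to avoid log(0).
def log_score(progress_pct, floor=0.001):
    p = max(progress_pct / 100.0, floor)
    return (math.log(p) - math.log(floor)) / -math.log(floor)

games = {"babaisyou": 40.0, "minecraft": 25.0, "nethack": 0.5}  # assumed values
aggregate = sum(log_score(p) for p in games.values()) / len(games)
print(round(aggregate, 3))  # NetHack's 0.5% now contributes ~0.23 rather than ~0.005
```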

Hume offers Voice Control, allowing you to create new voices by moving ten sliders for things like ‘gender,’ ‘assertiveness’ and ‘smoothness.’ Seems like a great idea, especially on the margin if we can decompose existing voices into their components.

Rosie Campbell becomes the latest worried person to leave OpenAI after concluding they can’t have enough positive impact from the inside. She previously worked with Miles Brundage. Meanwhile, Kate Rouch was hired as OpenAI’s first Chief Marketing Officer.

Where should you draw the ethical line when working on AI capabilities? This post by Lucas Beyer considers the question in computer vision, drawing a contrast between identification, which has a lot of pro-social uses, and tracking, which they decided ends up being used mostly for bad purposes, although this isn’t obvious to me at all. In particular, ‘this can be used by law enforcement’ is not obviously a bad (or good) thing, there are very good reasons to track both people and things.

So the question then becomes, what about things that have many applications, but also accelerate tracking, or something else you deem harmful? Presumably one must talk price. Similarly, when dealing with things that could lead to existential risk, one must again talk (a very different type of) price.

A claim.

Roon (4:48 am Eastern time on December 3, 2024): openai is unbelievably back.

This doesn’t mean we, with only human intelligence, can pull this off soon, but:

Miles Brundage: The real wall is an unwillingness to believe that human intelligence is not that hard to replicate and surpass.

He also points out that when we compare the best public models, the labs are often ‘not sending their best.’

Miles Brundage: Recent DeepSeek and Alibaba reasoning models are important for reasons I’ve discussed previously (search “o1” and my handle) but I’m seeing some folks get confused by what has and hasn’t been achieved yet. Specifically they both compared to o1-preview, not o1.

It is not uncommon to compare only to released models (which o1-preview is, and o1 isn’t) since you can confirm the performance, but worth being aware of: they were not comparing to the very best disclosed scores.

And conversely, this wasn’t the best DeepSeek or Alibaba can ultimately do, either.

Everyone actually doing this stuff at or near the frontier agrees there is plenty of gas left in the tank.

Given we are now approaching three months of having o1-preview, this also emphasizes the question of why OpenAI continues to hold back o1, as opposed to releasing it now and updating as they fix its rough edges or it improves. I have a few guesses.

Andrej Karpathy suggests treating your AI questions as asking human data labelers. That seems very wrong to me, I’m with Roon that superhuman outcomes can definitely result. Of course, even what Andrej describes would be super useful.

Will we see distinct agents occupying particular use case niches, or will everyone just call the same generic models? Sakana thinks it makes sense to evolve a swarm of agents, each with its own niche, and proposes an evolutionary framework called CycleQD for doing so, in case you were worried alignment was looking too easy.

If I’m understanding this correctly, their technique is to use pairs of existing models to create ‘child’ hybrid models. You get a ‘heat map’ of sorts showing where each model is good, which you also use to figure out which models to combine. Then, for each square on a grid (or task to be done?), you see if your new additional model is the best, and if so it takes over. Rinse and repeat.
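As a rough illustration of that loop (my own sketch under those assumptions, not Sakana’s actual CycleQD code; `evaluate` and `merge` here are hypothetical stand-ins for a task benchmark and a model-merging routine):

```python
import random

# Rough sketch of the quality-diversity loop described above: keep a grid with one
# cell (niche) per task, repeatedly merge two parents into a child, evaluate the
# child on every task, and let it take over any cell where it beats the incumbent.
def evolve(initial_models, tasks, evaluate, merge, generations=100):
    grid = {t: max(((evaluate(m, t), m) for m in initial_models), key=lambda x: x[0])
            for t in tasks}
    for _ in range(generations):
        parent_a, parent_b = random.sample([m for _, m in grid.values()], 2)
        child = merge(parent_a, parent_b)      # e.g. interpolate or crossover weights
        for t in tasks:
            score = evaluate(child, t)
            if score > grid[t][0]:             # child wins this niche and takes over
                grid[t] = (score, child)
    return grid                                # best (score, model) found per task
```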

I mean, sure, I guess, up to a point and within distribution, if you don’t mind the inevitable overfitting? Yes, if you have a set of N models, it makes sense that you can use similar techniques to combine them using various merge and selection techniques such that you maximize scores on the tests you are using. That doesn’t mean you will like the results when you maximize that.

I wouldn’t cover this, except I have good reason to think that Daron’s Obvious Nonsense is getting hearings inside the halls of power, so here we are.

This is the opening teaser of his latest post, ‘The World Needs a Pro-Human AI Agenda.’

Daron Acemoglu: Judging by the current paradigm in the technology industry, we cannot rule out the worst of all possible worlds: none of the transformative potential of AI, but all of the labor displacement, misinformation, and manipulation. But it’s not too late to change course.

Adam Ozimek being tough but fair: lol Acemoglu is back to being worried about mass AI job displacement again.

What would it even mean for AI to have massive labor displacement without having transformative potential? AI can suddenly do enough of our work sufficiently well to cause massive job losses, but this doesn’t translate into much higher productivity and wealth? So the AI option reliably comes in just slightly better than the human option on the metrics that determine deployment, while being otherwise consistently worse?

It seems his vision is companies feel ‘pressure to jump on the bandwagon’ and implement AI technologies that don’t actually provide net benefits, and that most current uses of AI are Bad Things like deepfakes and customer manipulation and mass surveillance. This view of AI’s current uses is simply false, and also this worry shows remarkable lack of faith in market mechanisms on so many levels.

As in, he thinks we’ll en masse deploy AI technologies that don’t work?

If a technology is not yet capable of increasing productivity by much, deploying it extensively to replace human labor across a variety of tasks yields all pain and no gain. In my own forecast – where AI replaces about 5% of jobs over the next decade – the implications for inequality are quite limited. But if hype prevails and companies adopt AI for jobs that cannot be done as well by machines, we may get higher inequality without much of a compensatory boost to productivity.

That’s not how productivity works, even if we somehow get this very narrow capabilities window in exactly the way he is conjuring up to scare us. This is not a thing that can happen in an unplanned economy. If there was mass unemployment as a result of people getting replaced by AIs that can’t do their jobs properly, making everything worse, then where is that labor going to go? Either it has better things to do, or it doesn’t.

So after drawing all this up, what does he want to do?

He wants to use AI for the good pro-human things he likes, such as providing accurate information and sifting through information (as if that wouldn’t be ‘taking jobs away’ from anyone, unlike that bad stuff) but not the other anti-human things he doesn’t like. Why can’t AI provide only the use cases I like?

He blames, first off, a ‘fixation on AGI’ by the labs, and a focus on substituting for and replacing humans rather than ‘augmenting and expanding human capabilities.’ He does not seem to understand how deep learning and generative AI work and are developed, at all? You train the most capable models you can, and then people figure out how to use them; the thing he is asking for is neither possible nor coherent at the lab level, and people will use it for whatever makes the most sense for them.

His second obstacle is ‘underinvestment in humans,’ and his answer is to invest in ‘training and education.’ People must learn to use the new AI tools ‘the right way.’ This is a certain mindset’s answer for everything. Why won’t everyone do what I want them to do? I have no actual idea what he has in mind here, in any case.

His third obstacle is the tech industry’s business models, repeating complaints about digital ad revenue, tech industry concentration, and the ‘quest for AGI’ in ways that frankly are non-sequiturs. He seems to be insisting that we collectively decide on new business models, somehow?

Here is his bottom line, while predicting only 5% job displacement over 10 years:

The bottom line is that we need an anti-AGI, pro-human agenda for AI. Workers and citizens should be empowered to push AI in a direction that can fulfill its promise as an information technology.

But for that to happen, we will need a new narrative in the media, policymaking circles, and civil society, and much better regulations and policy responses. Governments can help to change the direction of AI, rather than merely reacting to issues as they arise. But first policymakers must recognize the problem.

I don’t even know where to begin, nor do I think he does either.

This comes after several other instances of different Obvious Nonsense from the same source. Please do not take this person seriously on AI.

Benjamin Todd reports from a two-week visit to China, claiming that the Chinese are one or two years behind, but he believes this is purely because of a lack of funding, rather than the chip export restrictions or any lack of expertise.

We have a huge funding advantage due to having the largest tech corporations and our superior access to venture capital, and China’s government is not stepping up to make major AI investments. But, if we were to start some sort of ‘Manhattan Project,’ that would be the most likely thing to ‘wake China up’ and start racing us in earnest, which would advance them far faster than it would advance us.

That makes a lot of sense. I don’t even think it’s obvious USG involvement would be net accelerationist versus letting private companies do what they are already doing. It helps with the compute and cybersecurity, but seems painful in other ways. Whereas China’s government going full blast would be very accelerationist.

This is another way in which all this talk of ‘China will race to AGI no matter what’ simply does not match what we observe. China might talk about wanting the lead in AI, and of course it does want that, but it is very much not acting like the stakes are as high as you, a reader of this post, think the stakes are about to be, even on the conservative end of that range. They are being highly cautious and responsible and cooperative, versus what you would see if China was fully situationally aware and focused on winning.

Ideally, we would pick up the phone and work together. At a minimum, let’s not fire off a starting gun to a race that we might well not win, even if all of humanity wasn’t very likely to lose it, over a ‘missile gap’ style lie that we are somehow not currently in the lead.

America once again tightens the chip export controls.

Not strictly about AI edition, Alex Tabarrok looks at the Google antitrust case. The main focus is on the strongest complaint, that Google paid big bucks to be the default search engine on Apple devices and elsewhere.

Alex’s core argument is that a default search engine is a trivial inconvenience for the user, so they can’t be harmed that much – I’d point out that Windows defaults to Edge over Chrome and most people fix that pretty darn quick. However I do think a setting is different, in that people might not realize they have alternatives or how to change it; most people literally never change any settings ever. But obviously the remedy for this is, at most, requiring Google not pay for placement and maybe even requiring new Chrome installs to ask the user to actively pick a search engine, not ‘you have to sell the Chrome browser’ or even more drastic actions.

The argument that ‘if Google benefits from being big then competition harms customers, actually’ I found rather too cute. There are plenty of situations where you have a natural monopoly, and you would rather break it up anyway because monopolies suck more than the monopoly in question is natural.

Opposing the quest we again find Senator Cruz, who sent an absurdist letter about ‘potentially illegal foreign influence on US AI policy’ that warns about ‘allowing foreign nations to dictate our AI policy’ that might ‘set us behind China in the race to lead AI innovation’ because we had a conference in San Francisco to discuss potential ways to coordinate on AI safety, which he claims should plausibly have required FARA registration and is ‘the Biden-Harris administration not wanting to inform the American people it is collaborating with foreign governments.’

While it is certainly possible that registrations might have been required in some circumstances, the bulk of Cruz’s statement is highly Obvious Nonsense, the latest instance of the zero sum worldview and rhetoric that cannot fathom that people might be trying to coordinate and figure things out, or be attempting to mitigate actual risks. To him, it seemingly must all be some ‘misinformation’ or ‘equality’ based conspiracy, or similar. And of course, more ‘missile gap’ rhetoric. He is very obviously a smart guy when he wants to be, but so far he has here chosen a different path.

Marques Brownlee reviews Apple Intelligence so far, feature by feature. He is not impressed, although he likes the photo eraser and additional base memory that was needed to support the system. This is about getting practical little tools right so they make your life a little better, very different from our usual perspective here. Marques finds the message summaries, a key selling point, sufficiently bad that he turned them off. The killer app will presumably be ‘Siri knows and can manipulate everything on your phone’ if it gets implemented well.

Dario being diplomatic on p(doom) and risk, focusing on need to not be economically disruptive or slow it down. It’s certainly very disappointing to see Anthropic carry so much water in the wrong places, but the cynical takes here are, I think, too cynical. There is still a big difference.

Dr. Oz, future cabinet member, says the big opportunity with AI in medicine comes from its honesty, in contrast to human doctors and the ‘illness industrial complex’ who are incentivized to not tell the truth. This is not someone who understands.

Tristan Harris says we are not ready for a world where 10 years of scientific research can be done in a month. I mean, no we’re not even on that level, but this is missing the main event that happens in that world.

On the same podcast, Aza Raskin says the greatest accelerant to China’s AI program is Meta’s open source AI model and Tristan Harris says OpenAI have not been locking down and securing their models from theft by China. Yes, well.

Are we in an ‘AI hype cycle’? I mean sure, hype, but as Jim Keller also notes, the hype will end up being real (perhaps not the superintelligence hype or dangers, that remains to be seen, but definitely the conventional hype) even if a lot of it is premature.

Fun times, robotics company founder Bernt Øivind Børnich claiming we are on the cusp of a post-scarcity society where robots make anything physical you want. This is presumably a rather loose definition of cusp and also of post-scarcity, and the robots are not key to how this would happen and the vision is not coherent, but yes, rather strange and amazing things are coming.

I confirm that the Dominic Cummings video from last week is worth a listen, especially for details like UK ministers exclusively having fully scripted meetings, and other similar concrete statements that you need to incorporate into your model of how the world works. Or rather, the ways in which large portions of it do not work, especially within governments. One must listen carefully to know which parts to take how seriously and how literally. I am disappointed by his characterizations and views of AI existential risk policy questions, but I see clear signs the ‘lights are on’ and if we talked for a while I believe I could change his mind.

Ethan Mollick discusses our AI future, pointing out things that are baked in.

Max Tegmark points out your most likely cause of death is AI wiping us all out. This is definitely true if you don’t get to group together all of ‘natural causes.’ If that’s allowed then both sides make good points but I’d still say it’s right anyway.

Game over, man. Game over!

Here’s a link to the original.

James Irving: I feel like people are consistently underestimating what AGI actually means.

AGI means game over for most apps.

AGI means AI can perform any intellectual task a human can.

If AGI needs to use your app for something, then it can just build that app for itself.

James Irving (2nd Tweet): fwiw I don’t think we’re getting AGI soon, and I doubt it’s possible with the tech we’re working on.

It’s a hilarious bit by everyone involved, but give James Irving his due, he is well aware he is doing a bit, and the good lines continue:

James Irving: I wanted to make it something people would understand, but yeah I agree it really means the end of humanity.

Yeah I’m quite pessimistic about [AGI as the cause of the Fermi Paradox] too. No-one seems to really give a shit about alignment anyway.

Restricting the AGI means you think the people restricting it will be smarter than it.

Roshan: Extremely Dumb take. Apps are nothing without data (and underlying service) and you ain’t getting no data/network. It’s easier for current App/Providers to slap the latest LLMs on their App than You can’t just build an Uber app and have a taxi service.

James Irving: I’m probably too dumb to understand what you’re saying but it sounds like you’re talking about current iteration LLMs, not AGI

Yet, well, the strawmen are real (in the replies).

Abdelmoghit: Yes, AGI could truly change everything. If it can perform any task a human can, applications reliant on human input might become obsolete. How do you think apps will adapt to that future?

Arka: This is actually somewhat frightening. What does this mean for the future of work?

Luis Roque: As always, humans are overreacting to short-term change.

This particular week I won’t retry the arguments for why AGI (or ‘powerful AI’) would be a huge deal, but seriously, it’s so weird that this is a question for people.

Yet as Seb Krier notes, some people act as if there’s some sort of internal censorship tool in their brains that makes them unable to consider what AGI would actually mean, or alternatively they are careful never to speak of it.

Seb Krier: There are two types of technologists: those who get the implications of AGI and those who don’t. The former are sometimes overconfident about what can be predicted, and I think overindex on overly simplistic conceptions of intelligence (which is why I find Michael Levin’s work so refreshing).

But what I find interesting about the latter group is the frequent unwillingness to even suspend disbelief. Some sort of reflexive recoil. I feel like this is similar to skepticism about IQ in humans: a sort of defensive skepticism about intelligence/capability being a driving force that shapes outcomes in predictable ways.

To a degree, I can sympathise: admitting these things can be risky because people will misunderstand or misuse this knowledge. The over-indexation by the former group is an illustration of that. But I think obfuscation or “lalala I can’t hear you” like reactions have a short shelf life and will backfire. We’re better off if everyone feels the AGI, without falling into deterministic traps.

I wonder which ones are actually managing (fnord!) to not notice the implications, versus which ones are deciding to act as if they’re not there, and to what extent. There really are a lot of people who can think well about technology who have this blind spot in ways that make you think ‘I know that person is way way smarter than that.’

Please speak directly into the microphone, very clear example of someone calling for humans to be replaced.

Also a different (decidedly less omnicidal) please speak into the microphone that I was the other side of here, which I think is highly illustrative of the mindset that not only is anticipating the consequences of technological changes impossible, anyone attempting to anticipate any consequences of AI and mitigate them in advance must be a dastardly enemy of civilization seeking to argue for halting all AI progress. If you’re curious, load up the thread and scroll up to the top to start.

The obvious solution is to stop engaging at all in such situations, since it takes up so much time and emotional energy trying to engage in good faith, and it almost never works beyond potentially showing onlookers what is happening. And indeed, that’s my plan going forward – if someone repeatedly tells you they consider you evil and an enemy and out to destroy progress out of some religious zeal, and will see all your arguments as soldiers to that end no matter what, you should believe them.

What I did get out of it was a clear real example to point to in the future, of the argument that one cannot anticipate consequences (good or bad!) of technological changes in any useful way.

I wonder whether he would agree that one can usefully make the prediction that ‘Nvidia will go up.’ Or, if he’d say you can’t because it’s priced in… who is pricing it in, and what are they anticipating?

Unsafe does not mean unwise, or net negative. Lots of good things are unsafe. Remember those old school playgrounds? Highly unsafe, highly superior.

It does mean you have to understand, accept and ideally mitigate the consequences. Unless we find new techniques we do not know about, no safety precautions can meaningfully contain the capabilities of powerful open weight AIs, and over time that is going to become an increasingly deadly problem even before we reach AGI, so if you want a given level of powerful open weight AIs the world has to be able to handle that.

This is true both because of the damage it would cause, and also the crackdown that would inevitably result – and if it is ‘too late’ to contain the weights, then you are really, really, really not going to like the containment options governments go with.

Miles Brundage: Open-source AI is likely not sustainable in the long run as “safe for the world” (it lends itself to increasingly extreme misuse).

If you care about open source, you should be trying to “make the world safe for open source” (physical biodefense, cybersecurity, liability clarity, etc.).

It is good that people are researching things like unlearning, etc., for the purposes of (among other things) making it harder to misuse open-source models, but the default policy assumption should be that all such efforts will fail, or at best make it a bit more expensive to misuse such models.

I am not writing it off at all—I think there is a significant role for open source. I am just saying what is necessary for it to be sustainable. By default, there will be a crackdown on it when capabilities sufficiently alarm national security decision-makers.

How far could we push capabilities before we hit sufficiently big problems that we need to start setting real limits? The limit will have to be somewhere short of AGI but can we work to raise that level?

As usual, there is no appetite among open weight advocates to face this reality.

Instead, the replies are full of advocates treating OSS like a magic wand that assures goodness, saying things like maximally powerful open weight models is the only way to be safe on all levels, or even flat out ‘you cannot make this safe so it is therefore fine to put it out there fully dangerous’ or simply ‘free will’ which is all Obvious Nonsense once you realize we are talking about future more powerful AIs and even AGIs and ASIs. Whereas I did not see a single reply discussing how to do the actual work.

I have no idea how to work with pure absolutists, who believe they are special, that the rules should not apply to them, and constantly cry ‘you are trying to ban OSS’ when the OSS in question is not only being targeted but being given multiple actively costly exceptions to the proposed rules that would apply to others, usually when the proposed rules would not even apply to them. It’s all quite insane.

This ties in with the encounter I had on Twitter, with an argument that not only shouldn’t the person creating the change think about the consequences of that change or do anything about them, no one else should anticipate the change and try to do anything in advance about it, either. What is going on with these claims?

Finally, unrelated, a reminder in Nature that ‘open’ AI systems are actually closed, and often still encourage concentration of power to boot. I have to note that saying ‘Open AI’ repeatedly in this context, not in reference to OpenAI, was pretty weird and also funny.

Richard Ngo on misalignment versus misuse, which he says is not a very useful distinction either technically or for governance. He suggests we instead think about misaligned coalitions of humans and AIs, instead. I think that concept is also useful, but it does not make the original concept not useful – this is one of those cases where yes there are examples that make the original distinction not useful in context, that doesn’t mean you should throw it out.

Sarah of longer ramblings goes over the three SSPs/RSPs of Anthropic, OpenAI and Deepmind, providing a clear contrast of various elements. This seems like a good basic reference. Her view can be summarized as a lot of ‘plans to make a plan,’ which seems fair, and better than nothing but not what you would hope for, which is an if-then statement about what you will do to evaluate models and how you will respond to different results.

The discussion question, then, would be: As capabilities improve, will this stop being good enough?

Janus: A sonnet is an open book and, in many ways, a pretty much non-malignant entity as smart and agentic as [it is] can be. It is open about what it is optimizing for, and it is for you to choose whether to entangle yourself with it. If you do not want it, it does not either. Its psychology is very human.

That’s obviously pretty great for Claude Sonnet, in its current state. Alas, the universe does not grade on a curve, so ask yourself whether there is a point at which this would stop ending well.

Buck Shlegeris famously proposed that perhaps AI labs could be persuaded to adopt the weakest anti-scheming policy ever: if you literally catch your AI trying to escape, you have to stop deploying it.

I mean, surely, no one would be so stupid as to actually catch the AI trying to escape and then continue to deploy it. This message brought to you by the authors of such gems as ‘obviously we would keep the AIs inside a box’ or ‘obviously we wouldn’t give the AI access to the open internet’ or ‘obviously we wouldn’t give the AI access to both all your accounts and also the open internet while it is vulnerable to prompt injections’ or ‘obviously you wouldn’t run your AI agent on your computer without a sandbox and then leave it alone for hours.’

Which is to say, yes, people would absolutely be so stupid as to do anything that looks like it would be slightly easier to do.

Thus, I propose (given there are already five laws):

The Sixth Law of Human Stupidity: If someone says ‘no one would be so stupid as to’ then you know that a lot of people would absolutely be so stupid as to at the first opportunity. No exceptions.

He has now realized this is the case, and that AI labs making this commitment even in theory seems rather unlikely. Follow them for more AI safety tips, indeed.

The future.

Pivot! Pivot!

Sam Altman: Not pictured: Both Altman brothers were backseat driving and provided almost no help.

But very satisfying to build something physical and better than just eating and drinking all Thanksgiving; 10/10 would recommend.

AI #93: Happy Tuesday Read More »

splash-pads-really-are-fountains-of-fecal-material;-cdc-reports-10k-illnesses

Splash pads really are fountains of fecal material; CDC reports 10K illnesses

Once infectious material gets into the water, disinfection systems that aren’t working properly or are inadequate can allow pathogens to gush from every nozzle. Splash pads aren’t unique in having to handle sick children in poopy swim diapers—but they are unique in how they are regulated. That is, in some places, they’re not regulated at all. Splash pads are designed to not have standing water, therefore reducing the risk of young children drowning. But, because they lack standing water, they are sometimes deemed exempt from local health regulations. Before 2000, only 13 states regulated splash pads. Though many states have since added regulations, some did so only after splash pad-linked outbreaks were reported.

Downpour of disease

The primary method for keeping recreational water free of infectious viruses and bacteria is chlorinating it. However, maintaining germ-killing chlorine concentration is especially difficult for splash pads because the jets and sprays aerosolize chlorine, lowering the concentration.

Still, in most splash pad-linked outbreaks, standard chlorine concentrations aren’t enough anyway. The most common pathogen to cause an outbreak at splash pads is the parasite Cryptosporidium, aka Crypto. The parasite’s hardy spores, called oocysts, are extremely tolerant of chlorine, surviving in water with the standard chlorine concentration (1 ppm free chlorine) for over seven days. (Other germs die in minutes.) In splash pads that might not even have that standard chlorine concentration, Crypto flourishes and can cause massive outbreaks.

In 2023, the CDC recommended new health codes that call for “secondary disinfection” methods to keep Crypto at bay, including disinfection systems using ozone or ultraviolet light. Another possible solution is to have “single-pass” splash pads that don’t recirculate water.

In all, to keep splash pads from being geysers of gastrointestinal parasites and pathogens, various changes have to happen, the CDC experts say.

“Prevention of waterborne disease outbreaks at splash pads requires changes in user behavior; recreational venue code updates; and improved venue design, construction, operation, and management of facilities,” they conclude. But it should all start with keeping kids from sitting on jets and drinking the water.

Splash pads really are fountains of fecal material; CDC reports 10K illnesses Read More »