Author name: DJ Henderson


Meta and Yandex are de-anonymizing Android users’ web browsing identifiers


Abuse allows Meta and Yandex to attach persistent identifiers to detailed browsing histories.

Credit: Aurich Lawson | Getty Images


Tracking code that Meta and Russia-based Yandex embed into millions of websites is de-anonymizing visitors by abusing legitimate Internet protocols, causing Chrome and other browsers to surreptitiously send unique identifiers to native apps installed on a device, researchers have discovered. Google says it’s investigating the abuse, which allows Meta and Yandex to convert ephemeral web identifiers into persistent mobile app user identities.

The covert tracking—implemented in the Meta Pixel and Yandex Metrica trackers—allows Meta and Yandex to bypass core security and privacy protections provided by both the Android operating system and browsers that run on it. Android sandboxing, for instance, isolates processes to prevent them from interacting with the OS and any other app installed on the device, cutting off access to sensitive data or privileged system resources. Defenses such as state partitioning and storage partitioning, which are built into all major browsers, store site cookies and other data associated with a website in containers that are unique to every top-level website domain to ensure they’re off-limits for every other site.

A blatant violation

“One of the fundamental security principles that exists in the web, as well as the mobile system, is called sandboxing,” Narseo Vallina-Rodriguez, one of the researchers behind the discovery, said in an interview. “You run everything in a sandbox, and there is no interaction within different elements running on it. What this attack vector allows is to break the sandbox that exists between the mobile context and the web context. The channel that exists allowed the Android system to communicate what happens in the browser with the identity running in the mobile app.”

The bypass—which Yandex began in 2017 and Meta started last September—allows the companies to pass cookies or other identifiers from Firefox and Chromium-based browsers to native Android apps for Facebook, Instagram, and various Yandex apps. The companies can then tie that vast browsing history to the account holder logged into the app.

This abuse has been observed only in Android, and evidence suggests that the Meta Pixel and Yandex Metrica target only Android users. The researchers say it may be technically feasible to target iOS because browsers on that platform allow developers to programmatically establish localhost connections that apps can monitor on local ports.

In contrast to iOS, however, Android imposes fewer controls on localhost communications and on background execution of mobile apps, the researchers said, while iOS also applies stricter app store vetting to limit such abuses. This overly permissive design allows Meta Pixel and Yandex Metrica to send web requests carrying web tracking identifiers to specific local ports that are continuously monitored by the Facebook, Instagram, and Yandex apps. Those apps can then link pseudonymous web identities to actual user identities, even in private browsing modes, effectively de-anonymizing users’ browsing habits on sites containing these trackers.

Meta Pixel and Yandex Metrica are analytics scripts designed to help advertisers measure the effectiveness of their campaigns. Meta Pixel and Yandex Metrica are estimated to be installed on 5.8 million and 3 million sites, respectively.

Meta and Yandex achieve the bypass by abusing basic functionality built into modern mobile browsers that allows browser-to-native app communications. The functionality lets browsers send web requests to local Android ports to establish various services, including media connections through the RTC protocol, file sharing, and developer debugging.
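To make the mechanics concrete, the sketch below shows roughly what a request from a web page to a fixed local port looks like. It is a conceptual sketch only: the port number is one reported by the researchers, while the endpoint path and payload are illustrative assumptions.

```typescript
// Conceptual sketch only: how a tracking script could push an identifier to a
// port on the device's loopback interface. Path and payload are hypothetical.
async function sendToLocalPort(identifier: string): Promise<void> {
  try {
    // Browsers allow requests to 127.0.0.1 without notifying the user.
    await fetch("http://127.0.0.1:12387/", {
      method: "POST",
      mode: "no-cors",       // fire-and-forget; no response is needed
      body: identifier,      // e.g., a web tracking identifier
    });
  } catch {
    // If nothing is listening, the request fails silently; the page is unaffected.
  }
}
```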

A conceptual diagram representing the exchange of identifiers between the web trackers running on the browser context and native Facebook, Instagram, and Yandex apps for Android.

While the technical underpinnings differ, both Meta Pixel and Yandex Metrica perform a “weird protocol misuse” that exploits the unvetted access Android provides to localhost ports on the 127.0.0.1 IP address. Browsers access these ports without user notification. Facebook, Instagram, and Yandex native apps silently listen on those ports, copy identifiers in real time, and link them to the user logged into the app.
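The app side of that channel needs little more than an open socket on the loopback interface. The Node-style sketch below is a conceptual stand-in for such a listener; the real apps are native Android services, and everything beyond the port number is an assumption.

```typescript
// Conceptual stand-in for a native app's localhost listener (the real apps are
// Android services, not Node). Port 12387 is one of those reported.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    // An app in this position could copy the identifier and link it to the
    // logged-in account before replying (or without replying at all).
    console.log("received web identifier:", body);
    res.end();
  });
});

server.listen(12387, "127.0.0.1");
```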

A representative for Google said the behavior violates the terms of service for its Play marketplace and the privacy expectations of Android users.

“The developers in this report are using capabilities present in many browsers across iOS and Android in unintended ways that blatantly violate our security and privacy principles,” the representative said, referring to the people who write the Meta Pixel and Yandex Metrica JavaScript. “We’ve already implemented changes to mitigate these invasive techniques and have opened our own investigation and are directly in touch with the parties.”

Meta didn’t answer emailed questions for this article, but provided the following statement: “We are in discussions with Google to address a potential miscommunication regarding the application of their policies. Upon becoming aware of the concerns, we decided to pause the feature while we work with Google to resolve the issue.”

Yandex representatives didn’t answer an email seeking comment.

How Meta and Yandex de-anonymize Android users

Meta Pixel developers have abused various protocols to implement the covert listening since the practice began last September. They started by causing the Meta Pixel script to send HTTP requests to port 12387. A month later, Meta Pixel stopped sending this data, even though the Facebook and Instagram apps continued to monitor the port.

In November, Meta Pixel switched to a new method that invoked WebSocket, a protocol for two-way communications, over port 12387.
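A WebSocket version of the same local channel takes only a few lines of browser code. The following sketch is illustrative; the payload format is an assumption.

```typescript
// Illustrative sketch of the WebSocket variant; the payload shape is assumed.
function sendViaWebSocket(identifier: string): void {
  const ws = new WebSocket("ws://127.0.0.1:12387/");
  ws.onopen = () => {
    ws.send(identifier);  // hand the identifier to whatever is listening locally
    ws.close();
  };
  ws.onerror = () => {
    // Nothing listening on the port; fail silently.
  };
}
```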

That same month, Meta Pixel also deployed a new method that used WebRTC, a real-time peer-to-peer communication protocol commonly used for making audio or video calls in the browser. This method used a complicated process known as SDP munging, a technique for JavaScript code to modify Session Description Protocol data before it’s sent. Still in use today, the SDP munging by Meta Pixel inserts key _fbp cookie content into fields meant for connection information. This causes the browser to send that data as part of a STUN request to the Android local host, where the Facebook or Instagram app can read it and link it to the user.
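In outline, SDP munging amounts to editing the SDP string between createOffer and setLocalDescription. The sketch below shows only that general shape; the exact field Meta rewrites and the way STUN traffic is steered toward 127.0.0.1 are simplified assumptions here, not Meta’s actual code.

```typescript
// Heavily simplified sketch of SDP munging; the field chosen and the way STUN
// traffic reaches 127.0.0.1 are assumptions, not Meta's implementation.
async function leakViaSdpMunging(fbpValue: string): Promise<void> {
  const pc = new RTCPeerConnection();
  pc.createDataChannel("probe");        // gives the offer something to negotiate
  const offer = await pc.createOffer();

  // "Munge" the SDP before handing it back: rewrite a connection-setup field so
  // it carries cookie content. Real code must respect that field's syntax rules.
  const munged = offer.sdp?.replace(/^a=ice-ufrag:.*$/m, `a=ice-ufrag:${fbpValue}`);

  // Per the article, recent Chrome releases now block this kind of munging.
  await pc.setLocalDescription({ type: "offer", sdp: munged });

  // Steering the resulting STUN (or TURN) traffic toward 127.0.0.1 — for
  // example via a crafted remote description with a loopback candidate — would
  // then carry the munged value to the local ports the native apps monitor.
}
```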

In May, a beta version of Chrome introduced a mitigation that blocked the type of SDP munging Meta Pixel used. Within days, Meta Pixel circumvented the mitigation by adding a new method that swapped STUN requests for TURN requests.

In a post, the researchers provided a detailed description of how the _fbp cookie travels from a website to the native app and, from there, to the Meta server:

1. The user opens the native Facebook or Instagram app, which eventually is sent to the background and creates a background service to listen for incoming traffic on a TCP port (12387 or 12388) and a UDP port (the first unoccupied port in 12580–12585). Users must be logged in with their credentials on the apps.

2. The user opens their browser and visits a website integrating the Meta Pixel.

3. At this stage, some websites wait for users’ consent before embedding Meta Pixel. In our measurements of the top 100K website homepages, we found websites that require consent to be a minority (more than 75% of affected sites do not require user consent)…

4. The Meta Pixel script is loaded and the _fbp cookie is sent to the native Instagram or Facebook app via WebRTC (STUN) SDP Munging.

5. The Meta Pixel script also sends the _fbp value in a request to https://www.facebook.com/tr along with other parameters such as page URL (dl), website and browser metadata, and the event type (ev) (e.g., PageView, AddToCart, Donate, Purchase).

6. The Facebook or Instagram apps receive the _fbp cookie from the Meta JavaScripts running on the browser and transmit it to the GraphQL endpoint (https://graph[.]facebook[.]com/graphql) along with other persistent user identifiers, linking users’ fbp ID (web visit) with their Facebook or Instagram account.
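For contrast with the covert localhost channel, the request described in step 5 is an ordinary tracking beacon. Below is a minimal sketch of one way such a call could be made; only the dl and ev parameter names come from the researchers’ description, and the rest is assumed.

```typescript
// Sketch of the ordinary beacon from step 5 above; parameter names beyond dl
// and ev (taken from the researchers' description) are assumptions.
function sendPixelBeacon(fbpValue: string, eventType: string): void {
  const params = new URLSearchParams({
    ev: eventType,                // e.g., "PageView", "AddToCart", "Purchase"
    dl: document.location.href,   // page URL
    fbp: fbpValue,                // assumed name for the _fbp cookie value
  });
  // Classic tracking-pixel pattern: request a tiny image with data in the URL.
  new Image().src = `https://www.facebook.com/tr?${params.toString()}`;
}
```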

Detailed flow of the way the Meta Pixel leaks the _fbp cookie from Android browsers to its Facebook and Instagram apps.

The first known instance of Yandex Metrica linking websites visited in Android browsers to app identities was in May 2017, when the tracker started sending HTTP requests to local ports 29009 and 30102. In May 2018, Yandex Metrica also began sending the data through HTTPS to ports 29010 and 30103. Both methods remained in place as of publication time.

An overview of Yandex identifier sharing

A timeline of web history tracking by Meta and Yandex

Some browsers for Android have blocked the abusive JavaScript in trackers. DuckDuckGo, for instance, was already blocking domains and IP addresses associated with the trackers, preventing the browser from sending any identifiers to Meta. The browser also blocked most of the domains associated with Yandex Metrica. After the researchers notified DuckDuckGo of the incomplete blocklist, developers added the missing addresses.

The Brave browser, meanwhile, also blocked the sharing of identifiers, thanks to its extensive blocklists and an existing mitigation that blocks requests to localhost without explicit user consent. Vivaldi, another Chromium-based browser, forwards the identifiers to local Android ports when its default privacy setting is in place. Changing the setting to block trackers appears to thwart the browsing history leakage, the researchers said.

Tracking blocker settings in Vivaldi for Android.

There’s got to be a better way

The various remedies DuckDuckGo, Brave, Vivaldi, and Chrome have put in place are working as intended, but the researchers caution they could become ineffective at any time.

“Any browser doing blocklisting will likely enter into a constant arms race, and it’s just a partial solution,” Vallina-Rodriguez said of the current mitigations. “Creating effective blocklists is hard, and browser makers will need to constantly monitor the use of this type of capability to detect other hostnames potentially abusing localhost channels and then updating their blocklists accordingly.”

He continued:

While this solution works once you know the hostnames doing that, it’s not the right way of mitigating this issue, as trackers may find ways of accessing this capability (e.g., through more ephemeral hostnames). A long-term solution should go through the design and development of privacy and security controls for localhost channels, so that users can be aware of this type of communication and potentially enforce some control or limit this use (e.g., a permission or some similar user notifications).

Chrome and most other Chromium-based browsers executed the JavaScript as Meta and Yandex intended. Firefox did as well, although for reasons that aren’t clear, the browser was not able to successfully perform the SDP munging specified in later versions of the code. After blocking the STUN variant of SDP munging in the early May beta release, a production version of Chrome released two weeks ago began blocking both the STUN and TURN variants. Other Chromium-based browsers are likely to implement the same blocks in the coming weeks. A representative for Firefox-maker Mozilla said the organization prioritizes user privacy and is taking the report seriously.

“We are actively investigating the reported behavior, and working to fully understand its technical details and implications,” Mozilla said in an email. “Based on what we’ve seen so far, we consider these to be severe violations of our anti-tracking policies, and are assessing solutions to protect against these new tracking techniques.”

The researchers warn that the current fixes are so specific to the code in the Meta and Yandex trackers that it would be easy to bypass them with a simple update.

“They know that if someone else comes in and tries a different port number, they may bypass this protection,” said Gunes Acar, the researcher behind the initial discovery, referring to the Chrome developer team at Google. “But our understanding is they want to send this message that they will not tolerate this form of abuse.”

Fellow researcher Vallina-Rodriguez said the more comprehensive way to prevent the abuse is for Android to overhaul the way it handles access to local ports.

“The fundamental issue is that the access to the local host sockets is completely uncontrolled on Android,” he explained. “There’s no way for users to prevent this kind of communication on their devices. Because of the dynamic nature of JavaScript code and the difficulty to keep blocklists up to date, the right way of blocking this persistently is by limiting this type of access at the mobile platform and browser level, including stricter platform policies to limit abuse.”

Got consent?

The researchers who made this discovery are:

  • Aniketh Girish, PhD student at IMDEA Networks
  • Gunes Acar, assistant professor in Radboud University’s Digital Security Group & iHub
  • Narseo Vallina-Rodriguez, associate professor at IMDEA Networks
  • Nipuna Weerasekara, PhD student at IMDEA Networks
  • Tim Vlummens, PhD student at COSIC, KU Leuven

Acar said he first noticed Meta Pixel accessing local ports while visiting his own university’s website.

There’s no indication that Meta or Yandex has disclosed the tracking to either websites hosting the trackers or end users who visit those sites. Developer forums show that many websites using Meta Pixel were caught off guard when the scripts began connecting to local ports.

“Since 5th September, our internal JS error tracking has been flagging failed fetch requests to localhost:12387,” one developer wrote. “No changes have been made on our side, and the existing Facebook tracking pixel we use loads via Google Tag Manager.”

“Is there some way I can disable this?” another developer encountering the unexplained local port access asked.

It’s unclear whether browser-to-native-app tracking violates any privacy laws in various countries. Both Meta and companies hosting its Meta Pixel, however, have faced a raft of lawsuits in recent years alleging that the data collected violates privacy statutes. A research paper from 2023 found that Meta Pixel, then called the Facebook Pixel, “tracks a wide range of user activities on websites with alarming detail, especially on websites classified as sensitive categories under GDPR,” the abbreviation for the European Union’s General Data Protection Regulation.

So far, Google has provided no indication that it plans to redesign the way Android handles local port access. For now, the most comprehensive protection against Meta Pixel and Yandex Metrica tracking is to refrain from installing the Facebook, Instagram, or Yandex apps on Android devices.


Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.



Could floating solar panels on a reservoir help the Colorado River?


Floating solar panels appear to conserve water while generating green electricity.

The Gila River Indian Community in Arizona has lined 3,000 feet of their canals with solar panels. Credit: Jake Bolster/Inside Climate News

GILA RIVER INDIAN RESERVATION, Ariz.—About 33 miles south of Phoenix, Interstate 10 bisects a line of solar panels traversing the desert like an iridescent snake. The solar farm’s shape follows the path of a canal, with panels serving as awnings to shade the gently flowing water from the unforgiving heat and wind of the Sonoran Desert.

The panels began generating power last November for the Akimel O’otham and Pee Posh tribes—known together as the Gila River Indian Community, or GRIC—on their reservation in south-central Arizona, and they are the first of their kind in the US. The community is studying the effects of these panels on the water in the canal, hopeful that they will protect a precious resource from the desert’s unflinching sun and wind.

In September, GRIC is planning to break ground on another experimental effort to conserve water while generating electricity: floating solar. Between its canal canopies and the new project that would float photovoltaic panels on a reservoir it is building, GRIC hopes to one day power all of its canal and irrigation operations with solar electricity, transforming itself into one of the most innovative and closely watched water users in the West in the process.

The community’s investments come at a critical time for the Colorado River, which supplies water to about 40 million people across seven Western states, Mexico and 30 tribes, including GRIC. Annual consumption from the river regularly exceeds its supply, and a decadeslong drought, fueled in part by climate change, continues to leave water levels at Lake Powell and Lake Mead dangerously low.

Covering water with solar panels is not a new idea. But for some it represents an elegant mitigation of water shortages in the West. Doing so could reduce evaporation, generate more carbon-free electricity and require dams to run less frequently to produce power.

But, so far, the technology has not been included in the ongoing Colorado River negotiations between the Upper Basin states of Colorado, New Mexico, Utah, and Wyoming; the Lower Basin states of Arizona, California, and Nevada; tribes; and Mexico. All are expected to eventually agree on cuts to the system’s water allocations to maintain the river’s ability to provide water and electricity for residents and farms, and keep its ecosystem alive.

“People in the US don’t know about [floating solar] yet,” said Scott Young, a former policy analyst in the Nevada state legislature’s counsel bureau. “They’re not willing to look at it and try and factor it” into the negotiations.

Several Western water managers Inside Climate News contacted for this story said they were open to learning more about floating solar—Colorado has even studied the technology through pilot projects. But, outside of GRIC’s project, none knew of any plans to deploy floating solar anywhere in the basin. Some listed costly and unusual construction methods and potentially modest water savings as the primary obstacles to floating solar maturing in the US.

A tantalizing technology with tradeoffs

A winery in Napa County, California, deployed the first floating solar panels in the US on an irrigation pond in 2007. The country was still years away from passing federal legislation to combat the climate crisis, and the technology matured here haltingly. As recently as 2022, according to a Bloomberg analysis, most of the world’s 13 gigawatts of floating solar capacity had been built in Asia.

Unlike many Asian countries, the US has an abundance of undeveloped land where solar could be constructed, said Prateek Joshi, a research engineer at the National Renewable Energy Laboratory (NREL) who has studied floating solar, among other forms of energy. “Even though [floating solar] may play a smaller role, I think it’s a critical role in just diversifying our energy mix and also reducing the burden of land use,” he said.

Credit: Paul Horn/Inside Climate News

This February, NREL published a study that found floating solar on the reservoirs behind federally owned dams could provide enough electricity to power 100 million US homes annually, but only if all the developable space on each reservoir were used.

Lake Powell could host almost 15 gigawatts of floating solar using about 23 percent of its surface area, and Lake Mead could generate over 17 gigawatts of power on 28 percent of its surface. Such large-scale development is “probably not going to be the case,” Joshi said, but even if a project used only a fraction of the developable area, “there’s a lot of power you could get from a relatively small percentage of these Colorado Basin reservoirs.”

The study did not measure how much water evaporation floating solar would prevent, but previous NREL research has shown that photovoltaic panels—sometimes called “floatovoltaics” when they are deployed on reservoirs—could also save water by changing the way hydropower is deployed.

Some of a dam’s energy could come from solar panels floating on its reservoir to prevent water from being released solely to generate electricity. As late as December, when a typical Western dam would be running low, lakes with floating solar could still have enough water to produce hydropower, reducing reliance on more expensive backup energy from gas-fired power plants.

Joshi has spoken with developers and water managers about floating solar before, and said there is “an eagerness to get this [technology] going.” The technology, however, is not flawless.

Solar arrays can be around 20 percent more expensive to install on water than land, largely because of the added cost of buoys that keep the panels afloat, according to a 2021 NREL report. The water’s cooling effect can boost panel efficiency, but floating solar panels may produce slightly less energy than a similarly sized array on land because they can’t be tilted as directly toward the sun as land-based panels.

And while the panels likely reduce water loss from reservoirs, they may also increase a water body’s emissions of greenhouse gases, which in turn warm the climate and increase evaporation. This January, researchers at Cornell University found that floating solar covering more than 70 percent of a pond’s surface area increased the water’s CO2 and methane emissions. These kinds of impacts “should be considered not only for the waterbody in which [floating solar] is deployed but also in the broader context of trade-offs of shifting energy production from land to water,” the study’s authors wrote.

“Any energy technology has its tradeoffs,” Joshi said, and in the case of floating solar, some of its benefits—reduced evaporation and land use—may not be easy to express in dollars and cents.

Silver buckshot

There is perhaps no bigger champion for floating solar in the West than Scott Young. Before he retired in 2016, he spent much of his 18 years working for the Nevada Legislature researching the effects of proposed legislation, especially in the energy sector.

On an overcast, blustery May day in southwest Wyoming near his home, Young said that in the past two years he has promoted the technology to Colorado River negotiators, members of Congress, environmental groups and other water managers from the seven basin states, all of whom he has implored to consider the virtues of floating solar arrays on Lake Powell and Lake Mead.

Young grew up in the San Francisco Bay area, about 40 miles, he estimated, from the pioneering floating solar panels in Napa. He stressed that he does not have any ties to industry; he is just a concerned Westerner who wants to diversify the region’s energy mix and save as much water as possible.

But so far, when he has been able to get someone’s attention, Young said his pitch has been met with tepid interest. “Usually the response is: ‘Eh, that’s kind of interesting,’” said Young, dressed in a black jacket, a maroon button-down shirt and a matching ball cap that framed his round, open face. “But there’s no follow-up.”

The Bureau of Reclamation “has not received any formal proposals for floating solar on its reservoirs,” said an agency spokesperson, who added that the bureau has been monitoring the technology.

In a 2021 paper published with NREL, Reclamation estimated that floating solar on its reservoirs could provide approximately 1.5 terawatts of generating capacity, enough to power about 100 million homes. But, in addition to potentially interfering with recreation, aquatic life, and water safety, floating solar’s effect on evaporation proved difficult to model broadly.

So many environmental factors determine how water is lost or consumed in a reservoir—solar intensity, wind, humidity, lake circulation, water depth, and temperature—that the study’s authors concluded Reclamation “should be wary of contractors’ claims of evaporation savings” without site-specific studies. Those same factors affect the panels’ efficiency, and in turn, how much hydropower would need to be generated from the reservoir they cover.

The report also showed the Colorado River was ripe with floating solar potential—more than any other basin in the West. That’s particularly true in the Upper Basin, where Young has been heartened by Colorado’s approach to the technology.

In 2023, the state passed a law requiring several agencies to study the use of floating solar. Last December, the Colorado Water Conservation Board published its findings, and estimated that the state could save up to 407,000 acre feet of water by deploying floating solar on certain reservoirs. An acre foot covers one acre with a foot of water, or 325,851 gallons, just about three years’ worth of water for a family of four.

When Young saw the Colorado study quantifying savings from floating solar, he felt hopeful. “407,000 acre feet from one state,” he said. “I was hoping that would catch people’s attention.”

Saving that much water would require using over 100,000 acres of surface water, said Cole Bedford, the Colorado Water Conservation Board’s chief operating officer, in an email. “On some of these reservoirs a [floating solar] system would diminish the recreational value such that it would not be appropriate,” he said. “On others, recreation, power generation, and water savings could be balanced.”

Colorado is not planning to develop another project in the wake of this study, and Bedford said that the technology is not a silver bullet solution for Colorado River negotiations.

“While floating solar is one tool in the toolkit for water conservation, the only true solution to the challenges facing the Colorado River Basin is a shift to supply-driven, sustainable uses and operations,” he said.

Some of the West’s largest and driest cities, like Phoenix and Denver, ferry Colorado River water to residents hundreds of miles away from the basin using a web of infrastructure that must reliably operate in unforgiving terrain. Like their counterparts at the state level, water managers in these cities have heard floatovoltaics floated before, but they say the technology is currently too immature and costly to be deployed in the US.

Lake Pleasant, which holds some of the Central Arizona Project’s Colorado River water, is also a popular recreation space, complicating its floating solar potential. Credit: Jake Bolster/Inside Climate News

In Arizona, the Central Arizona Project (CAP) delivers much of the Colorado River water used by Phoenix, Tucson, tribes, and other southern Arizona communities with a 336-mile canal running through the desert, and Lake Pleasant, its 811,784-acre-foot reservoir.

Though CAP is following GRIC’s deployment of solar over canals, it has no immediate plans to build solar over its canal, or Lake Pleasant, according to Darrin Francom, CAP’s assistant general manager for operations, power, engineering, and maintenance, in part because the city of Peoria technically owns the surface water.

Covering the whole canal with solar to save the 4,000 acre feet that evaporates from it could be prohibitively expensive for CAP. “The dollar cost per that acre foot [saved] is going to be in the tens of, you know, maybe even hundreds of thousands of dollars,” Francom said, mainly due to working with novel equipment and construction methods. “Ultimately,” he continued, “those costs are going to be borne by our ratepayers,” which gives CAP reason to pursue other lower-cost ways to save water, like conservation programs, or to seek new sources.

An intake tower moves water into and out of the dam at Lake Pleasant. Credit: Jake Bolster/Inside Climate News

The increased costs associated with building solar panels on water instead of on land have made such projects unpalatable to Denver Water, Colorado’s largest water utility, which moves water out of the Colorado River Basin and through the Rocky Mountains to customers on the Front Range. “Floating solar doesn’t pencil out for us for many reasons,” said Todd Hartman, a company spokesperson. “Were we to add more solar resources—which we are considering—we have abundant land-based options.”

GRIC spent about $5.6 million, financed with Inflation Reduction Act grants, to construct 3,000 feet of solar over a canal, according to David DeJong, project director for the community’s irrigation district.

Young is aware there is no single solution to the problems plaguing the Colorado River Basin, and he knows floating solar is not a perfect technology. Instead, he thinks of it as a “silver buckshot,” he said, borrowing a term from John Entsminger, general manager for the Southern Nevada Water Authority—a technology that can be deployed alongside a constellation of behavioral changes to help keep the Colorado River alive.

Given the duration and intensity of the drought in the West and the growing demand for water and clean energy, Young believes the US needs to act now to embed this technology into the fabric of Western water management going forward.

As drought in the West intensifies, “I think more lawmakers are going to look at this,” he said. “If you can save water in two ways—why not?”

“We’re not going to know until we try”

If all goes according to plan, GRIC’s West Side Reservoir will be finished and ready to store Colorado River water by the end of July. The community wants to cover just under 60 percent of the lake’s surface area with floating solar.

“Do we know for a fact that this is going to be 100 percent effective and foolproof? No,” said DeJong, GRIC’s project director for its irrigation district. “But we’re not going to know until we try.”

The Gila River Indian Community spent about $5.6 million, with the help of Inflation Reduction Act grants, to cover a canal with solar. Credit: Jake Bolster/Inside Climate News

GRIC’s panels will have a few things going for them that projects on lakes Mead or Powell probably wouldn’t. West Side Reservoir will not be open to recreation, limiting the panels’ impacts on people. And the community already has the funds—Inflation Reduction Act grants and some of its own money—to pay for the project.

But GRIC’s solar ambitions may be threatened by the hostile posture toward solar and wind energy from the White House and congressional Republicans, and the project is vulnerable to an increasingly volatile economy. Since retaking office, President Donald Trump, aided by billionaire Elon Musk, has made deep cuts in renewable energy grants at the Environmental Protection Agency. It is unclear whether or to what extent the Bureau of Reclamation has slashed its grant programs.

“Under President Donald J. Trump’s leadership, the Department is working to cut bureaucratic waste and ensure taxpayer dollars are spent efficiently,” said a spokesperson for the Department of the Interior, which oversees Reclamation. “This includes ensuring Bureau of Reclamation projects that use funds from the Infrastructure Investments and Jobs Act and the Inflation Reduction Act align with administration priorities. Projects are being individually assessed by period of performance, criticality, and other criteria. Projects have been approved for obligation under this process so that critical work can continue.”

And Trump’s tariffs could cause costs to balloon beyond the community’s budget, which could either reduce the size of the array or cause delays in soliciting proposals, DeJong said.

While the community will study the panels over canals to understand the water’s effects on solar panel efficiency, it won’t do similar research on the panels on West Side Reservoir, though DeJong said they have been in touch with NREL about studying them. The enterprise will be part of the system that may one day offset all the electrical demand and carbon footprint of GRIC’s irrigation system.

“The community, they love these types of innovative projects. I love these innovative projects,” said GRIC Governor Stephen Roe Lewis, standing in front of the canals in April. Lewis had his dark hair pulled back in a long ponytail and wore a blue button down that matched the color of the sky.

“I know for a fact this is inspiring a whole new generation of water protectors—those that want to come back and they want to go into this cutting-edge technology,” he said. “I couldn’t be more proud of our team for getting this done.”

DeJong feels plenty of other water managers across the West could learn from what is happening at GRIC. In fact, the West Side Reservoir was intentionally constructed near Interstate 10 so that people driving by on the highway could one day see the floating solar the community intends to build there, DeJong said.

“It could be a paradigm shift in the Western United States,” he said. “We recognize all of the projects we’re doing are pilot projects. None of them are large scale. But it’s the beginning.”

This story originally appeared on Inside Climate News.



Google and DOJ tussle over how AI will remake the web in antitrust closing arguments

At the same time, Google is seeking to set itself apart from AI upstarts. “Generative AI companies are not trying to out-Google Google,” said Schmidtlein. Google’s team contends that its actions have not harmed any AI products like ChatGPT or Perplexity, and at any rate, they are not in the search market as defined by the court.

Mehta mused about the future of search, suggesting we may have to rethink what a general search engine is in 2025. “Maybe people don’t want 10 blue links anymore,” he said.

The Chromium problem and an elegant solution

At times during the case, Mehta has expressed skepticism about the divestment of Chrome. During closing arguments, Dahlquist reiterated the close relationship between search and browsers, reminding the court that 35 percent of Google’s search volume comes from Chrome.

Mehta now seems more receptive to a Chrome split than before, perhaps in part because the effects of the other remedies are becoming so murky. He called the Chrome divestment “less speculative” and “more elegant” than the data and placement remedies. Google again claimed, as it has throughout the remedy phase, that forcing it to give up Chrome is unsupported in the law and that Chrome’s dominance is a result of innovation.

Break up the company without touching the sides and getting shocked!

Credit: Aurich Lawson

Even if Mehta leans toward ordering this remedy, Chromium may be a sticking point. The judge seems unconvinced that the supposed buyers—a group which apparently includes almost every major tech firm—have the scale and expertise needed to maintain Chromium. This open source project forms the foundation of many other browsers, making its continued smooth operation critical to the web.

If Google gives up Chrome, Chromium goes with it, but what about the people who maintain it? The DOJ contends that it’s common for employees to come along with an acquisition, but that’s far from certain. There was some discussion of ensuring a buyer could commit to hiring staff to maintain Chromium. The DOJ suggests Google could be ordered to provide financial incentives to ensure critical roles are filled, but that sounds potentially messy.

A Chrome sale seems more likely now than it did earlier, but nothing is assured yet. Following the final arguments from each side, it’s up to Mehta to mull over the facts before deciding Google’s fate. That’s expected to happen in August, but nothing will change for Google right away. The company has already confirmed it will appeal the case, hoping to have the original ruling overturned. It could still be years before this case reaches its ultimate conclusion.



Testing a robot that could drill into Europa and Enceladus


We don’t currently have a mission to put it on, but NASA is making sure it’s ready.

Geysers on Saturn’s moon Enceladus Credit: NASA

Europa and Enceladus are two moons that scientists have concluded harbor liquid water oceans underneath their outer icy shells. The Europa Clipper mission should reach Europa around April of 2030. If it collects data hinting at the moon’s potential habitability, robotic lander missions could be the only way to confirm whether there’s really life in there.

To make these lander missions happen, NASA’s Jet Propulsion Laboratory team has been working on a robot that could handle the search for life and already tested it on the Matanuska Glacier in Alaska. “At this point this is a pretty mature concept,” says Kevin Hand, a planetary scientist at JPL who led this effort.

Into the unknown

There are only a few things we know for sure about conditions on the surface of Europa, and nearly all of them don’t bode well for lander missions. First, Europa is exposed to very harsh radiation, which is a problem for electronics. The window of visibility—when a potential robotic lander could contact Earth—lasts less than half of the 85 hours it takes for the moon to complete its day-night cycle due to the Europa-Jupiter orbit. So, for more than half the mission, the robot would need to fend for itself, with no human ground teams to get it out of trouble. The lander would also need to run on non-rechargeable batteries, because the vast distance to the Sun would make solar panels prohibitively massive.

And that’s just the beginning. Unlike on Mars, we don’t have any permanent orbiters around Europa that could provide a communication infrastructure, and we don’t have high-resolution imagery of the surface, which would make the landing particularly tricky. “We don’t know what Europa’s surface looks like at the centimeter to meter scale. Even with the Europa Clipper imagery, the highest resolution will be about half a meter per pixel across a few select regions,” Hand explains.

Because Europa has an extremely thin atmosphere that doesn’t provide any insulation, the temperatures on top of its ice shell are estimated to vary between minus-160° Celsius during the daytime maximum and minus-220° C during the night, which means the ice the lander would be there to sample is most likely hard as concrete. Hand’s team, building their robot, had to figure out a design that could deal with all these issues.

The work on the robotic system for the Europa lander mission began more than 10 years ago. Back then, the 2013–2022 decadal strategy for planetary science cited the Europa Clipper as the second-highest priority large-scale planetary mission, so a lander seemed like a natural follow-up.

Autonomy and ice drilling

The robot developed by Hand’s team has legs that enable it to stabilize itself on various types of surfaces, from rock-hard ice to loose, soft snow. To orient itself in the environment, it uses a stereoscopic camera with an LED light source for illumination hooked to computer-vision algorithms—a system similar to the one currently used by the Perseverance rover on Mars. “Stereoscopic cameras can triangulate points in an image and build a digital surface topography model,” explains Joseph Bowkett, a JPL researcher and engineer who worked on the robot’s design.

The team built an entirely new robotic arm with seven degrees of freedom. Force torque sensors installed in most of its joints act a bit like a nervous system, informing the robot when key components sustain excessive loads to prevent it from damaging the arm or the drill. “As we press down on the surface [and] conduct drilling and sampling, we can measure the forces and react accordingly,” Bowkett says. The finishing touch was the ICEPICK, a drilling and sampling tool the robot uses to excavate samples from the ice up to 20 centimeters deep.

Because of the long periods the lander would need to operate without any human supervision, the team also gave it a wide range of autonomous systems, which operate at two different levels. High-level autonomy is responsible for scheduling and prioritizing tasks within a limited energy budget. The robot can drill into a sampling site, analyze samples with onboard instruments, and decide whether it makes sense to keep drilling at the same spot or choose a different sampling site. The high-level system is also tasked with choosing the most important results for downlink back to Earth.

Low-level autonomy breaks all these high-level tasks down into step-by-step decisions on how to operate the drill and how to move the arm in the safest and most energy-efficient way.
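As a rough illustration of that two-level split, the sketch below shows a toy high-level planner choosing tasks under an energy budget, leaving the step-by-step execution to a lower layer. It is a generic example, not JPL’s software, and every name and number in it is invented.

```typescript
// Generic illustration of budget-constrained task selection; not JPL's code.
// Task names, priorities, and energy costs are invented for the example.
interface Task {
  name: string;
  priority: number;   // higher = more scientifically valuable
  energyWh: number;    // estimated energy cost
}

function planTasks(tasks: Task[], budgetWh: number): Task[] {
  const plan: Task[] = [];
  let remaining = budgetWh;
  // Greedy pass: take the most valuable tasks that still fit the energy budget.
  for (const t of [...tasks].sort((a, b) => b.priority - a.priority)) {
    if (t.energyWh <= remaining) {
      plan.push(t);
      remaining -= t.energyWh;
    }
  }
  return plan;  // low-level autonomy would then sequence each task's motions
}

// Example: prefer analyzing an existing sample over drilling a new site.
const plan = planTasks(
  [
    { name: "drill-new-site", priority: 2, energyWh: 120 },
    { name: "analyze-sample", priority: 5, energyWh: 40 },
    { name: "downlink-results", priority: 4, energyWh: 25 },
  ],
  100,
);
console.log(plan.map((t) => t.name)); // -> ["analyze-sample", "downlink-results"]
```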

The robot was tested in simulation software first, then indoors at JPL’s facilities, and finally at the Matanuska Glacier in Alaska, where it was lowered from a helicopter that acted as a proxy for a landing vehicle. It was tested at three different sites, ranked from the easiest to the most challenging. It completed all the baseline activities as well as all of the extras; the latter included drilling 27 centimeters deep into ice at the most difficult site, where it was awkwardly positioned on an eight-to-12-degree slope. The robot passed all the tests with flying colors.

And then it got shelved.

Switching the ocean worlds

Hand’s team put their Europa landing robot through the Alaskan field test campaign between July and August 2022. But when the new decadal strategy for planetary science came out in 2023, it turned out that the Europa lander was not among the missions selected. The National Academies committee responsible for formulating these decadal strategies did not recommend giving it a go, mainly because they believed harsh radiation in the Jovian system would make detecting biosignatures “challenging” for a lander.

An Enceladus lander, on the other hand, remained firmly on the table. “I was also on the team developing EELS, a robot intended for a potential Enceladus mission, so thankfully I can speak about both. The radiation challenges are indeed far greater for Europa,” Bowkett says.

Another argument for changing our go-to ocean world is that water plumes containing salts along with carbon- and nitrogen-bearing molecules have already been observed on Enceladus, which means there is a slight chance biosignatures could be detected by a flyby mission. The surface of Enceladus, according to the decadal strategy document, should be capable of preserving biogenic evidence for a long time and seems more conducive to a lander mission. “Luckily, many of the lessons on how to conduct autonomous sampling on Europa, we believe, will transfer to Enceladus, with the benefit of a less damaging radiation environment,” Bowkett told Ars.

The dream of a Europa landing is not completely dead, though. “I would love to get into Europa’s ocean with a submersible and further down to the seafloor. I would love for that to happen,” Hand says. “But technologically it’s quite a big leap, and you always have to balance your dream missions with the number of technological miracles that need to be solved to make these missions possible.”

Science Robotics, 2025.  DOI: 10.1126/scirobotics.adi5582


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.



Blue Origin boss: Government should forget launch and focus on “exotic” missions


“There’s not yet a commercial reason only to go to the Moon with humans.”

In this long exposure photograph, Blue Origin’s New Glenn rocket pierces a cloud deck over Florida’s Space Coast on its inaugural flight January 16. Credit: Blue Origin

Eighteen months after leaving his job as a vice president at Amazon to take over as Blue Origin’s chief executive, Dave Limp has some thoughts on how commercial companies and government agencies like NASA should explore the Solar System together.

Limp had no background in the space industry before taking the helm of Jeff Bezos’ space company in December 2023. He started his career as a computer scientist at Apple, took a stint at a venture capital firm, and joined Amazon in 2010, where he managed development of consumer devices like Alexa, Kindle, and the Fire TV.

“I had no thoughts of ever running a space company,” Limp said Thursday at a space conference in Washington, DC. “I’ve done consumer electronics my whole life. Started at Apple and did a bunch of other things, and so when I decided to retire from Amazon, I was looking for something that I could give back a little bit, be a little bit more philanthropic in the sort of second half of my career. I didn’t want to stop working, just wanted to do something different. And about that same time, Jeff was looking for a CEO.”

While he’s still a relative newcomer to the space business, Limp’s views align with those of many policy wonks and industry leaders who have the ears of senior officials in the Trump administration, including Jared Isaacman, President Trump’s nominee to become the next NASA administrator. Limp’s long tenure at Amazon and his selection as Blue Origin’s new CEO demonstrate that he also has the trust of Bezos, who was dissatisfied with his company’s slow progress in spaceflight.

“I think Jeff convinced me, and he’s very persuasive, that Blue didn’t need another rocket scientist,” Limp said. “We have thousands of the world’s best rocket scientists. What we needed was a little bit more decisiveness, a little bit more ability to think about: How do we manufacture at scale? And those are things I’ve done in the past, and so I’ve never looked back.”

David Limp, CEO of Blue Origin, speaks during the 2025 Humans to the Moon and Mars Summit at George Washington University in Washington, DC, on May 29, 2025. Credit: Alex Wroblewski / AFP via Getty Images

Leave it to us

In remarks Thursday at the Humans to the Moon & Mars Summit, Limp advocated for commercial companies, like his own, taking a larger role in developing the transportation and infrastructure to meet lofty national objectives established by government leaders.

In some ways, NASA has long been moving in this direction, beginning with initiatives ceding most launch services to private industry in the 1990s. More recently, NASA has turned to commercial companies for crew and cargo deliveries to the International Space Station and cargo and human-rated Moon landers.

However, NASA, with the backing of key congressional leaders, has held an iron grip on having its own heavy-lift launcher and crew capsule to ferry astronauts between Earth and destinations beyond low-Earth orbit. Now, these vehicles—the Space Launch System and Orion spacecraft—may be canceled if Congress agrees with Trump’s proposed NASA budget.

Commercial rockets close to matching or exceeding the Space Launch System’s lift capability are available for purchase or likely will be soon. These include SpaceX’s Starship mega-rocket and Blue Origin’s New Glenn launcher. Both are already key elements of NASA’s Artemis program, which aims to land US astronauts on the Moon as a stepping stone toward human expeditions to Mars.

But NASA still plans to use its government-owned Space Launch System rocket and Orion spacecraft to transport astronauts out to the Moon, where they will rendezvous with a Starship or Blue Origin’s Blue Moon lander to fly to and from the lunar surface.

SLS and Orion are expensive vehicles, costing more than $4 billion per launch for the initial set of four Artemis missions, according to a report by NASA’s inspector general. While commercial companies like Boeing, Lockheed Martin, and Northrop Grumman build elements of SLS and Orion, NASA acts as the prime integrator. The agency signed cost-plus contracts with the companies building SLS and Orion, meaning the government is on the hook for cost overruns. And there have been many.

Artist’s concept of Blue Ring, a propulsive spacecraft platform Blue Origin says it is developing to carry payloads to different orbits, and possibly all the way to Mars, at lower costs than feasible today. Credit: Blue Origin

NASA’s robotic science probes are also getting more expensive, even when accounting for inflation. Given the way NASA procures science probes, it would cost NASA more today to send an orbiter to Mars than it did for a similarly sized spacecraft a quarter-century ago.

This has to change in order for NASA and private companies like Blue Origin and SpaceX to make their ambitions a reality, Limp said Thursday.

“I think commercial folks can worry about the infrastructure,” he said. “We can do the launch. We can build the satellite buses that can get you to Mars much more frequently, that don’t cost billions of dollars. We can take a zero, and over time, maybe two zeros off of that. And if the governments around the world leave that to the commercial side, then there are a lot more resources that are freed up for the science side, for the national prestige side, and those types of things.”

The bottom line

Limp followed these comments with a dose of realism you don’t often hear from space industry executives. While there’s a growing list of commercially viable markets in space (things like Starlink and satellite servicing wouldn’t have been money-makers 20 years ago), the market for human spaceflight still requires some level of government commitment.

“I think the thing about bringing commercial aspects to exploration, to science, to the Moon, to Mars, is that we have to see a business prospect for it,” Limp said. “We have to turn it into a business, and that benefits American taxpayers because we will use that capital as efficiently as we can to get to the Moon, to get to Mars in a safe way, but in a way that’s the most efficient.

“We’re committed to that, no matter what the architecture looks like, but it does take the US government and international governments to have the motivation to do it,” he continued. “There’s not yet a commercial reason only to go to the Moon with humans. There are lots of commercial reasons to put robotics on the Moon and other types of things. So, we do need to have conviction that the Moon is important and Mars is important as well.”

Trump and Musk, an ally and advisor to the president, rekindled the question of Moon or Mars in a series of remarks during the early weeks of the new Trump administration. The Artemis Moon program began during the first Trump administration, with the goal of returning astronauts to the Moon for the first time since 1972. NASA would establish a sustained presence at the Moon, using our nearest planetary body as a proving ground for the next destination for humans in Solar System exploration: Mars.

Space industry rivals Jeff Bezos, second from left, and Elon Musk, second from right, inside the US Capitol for President Donald Trump’s inauguration on January 20, 2025. Credit: Chip Somodevilla/Getty Images

SpaceX’s Starship, while capable of one day landing on the Moon, was designed for long-duration cruises to Mars. Blue Origin’s Blue Moon is tailored for lunar landings.

“As an American, I don’t want another Sputnik moment,” Limp said. “From my standpoint, getting boots on the Moon and setting the groundwork for permanence on the Moon is of national importance and urgency. Rest assured, Blue will do everything in its power to try to make that happen, but in a cost-effective way.”

NASA, please don’t leave us

Since retaking office in January, Trump has mentioned human missions to Mars multiple times, but not the Moon. Isaacman, who may be confirmed as NASA administrator by the Senate as soon as next week, told lawmakers in April that the agency should pursue human missions to the Moon and Mars simultaneously. The details of how that might work haven’t been released but could come out in the White House’s detailed budget proposal for fiscal-year 2026.

A blueprint of Trump’s spending proposal released May 2 includes a 25 percent cut to NASA’s overall budget, but the plan would provide additional money for human space exploration at the Moon and Mars. “The budget funds a program to replace SLS and Orion flights to the Moon with more cost-effective commercial systems that would support more ambitious subsequent lunar missions,” the White House budget office wrote.

This part of the budget request is not controversial for industry leaders like Limp. On the other hand, the budget blueprint proposes slashing NASA’s space science budget by nearly $2.3 billion, Earth science by almost $1.2 billion, and space technology by $531 million.

While Limp didn’t directly address these budget proposals, these parts of NASA are largely focused on research projects that lack a commercial business case. Who else but a government space agency, or perhaps an especially generous type of philanthropic multi-billionaire, would pay to send a probe to study Jupiter’s icy moon Europa? Or a robot to zip by Pluto? Or how about a mission like Landsat, which documents everything from water resources to farms and urban sprawl and makes its data freely available to anyone with an Internet connection?

Most experts agree there are better ways to do these things. Reusable rockets, mass-produced satellite platforms, and improved contracting practices can bring down the costs of these missions. Bezos’ long-term goal for Blue Origin, which is to move all polluting factories off the Earth and into space, will be easier to achieve with government support, not just funding, Limp said.

“Getting up there, building factories on the Moon is a great step, and the government can really help with research dollars around that,” he said. “But it still does need the labs. The science missions need the JPLs [Jet Propulsion Laboratory] of the world. To make the human experience right, we need the Johnson Space Centers of the world to be able to kind of use that gold mine of institutional knowledge.

“I would say, and it might be a little provocative, let’s have those smart brains look on the forward-thinking types of things, the really edge of science, planning the really exotic missions, figuring out how to get to planetary bodies we haven’t gotten to before, and staying there,” Limp said.

Mark it down

For the first decade after Bezos founded Blue Origin in 2000, the company operated under the radar and seemed to move at a glacial pace. It launched its first small rocket in 2006 to an altitude of less than 300 feet and reached space with the suborbital New Shepard booster in 2015. Blue Origin finally reached orbit in January of this year on the debut test flight of its heavy-lift New Glenn rocket. Meanwhile, Blue Origin inked a deal with United Launch Alliance to supply a version of its New Glenn main engine to power that company’s Vulcan rocket.

Blue Origin’s Blue Moon MK1 lander, seen in the center, is taller than NASA’s Apollo lunar lander, currently the largest spacecraft to have landed on the Moon. Blue Moon MK2 is even larger, but all three landers are dwarfed in size by SpaceX’s Starship, NASA’s other Artemis lunar lander. Credit: Blue Origin

The next big mission for Blue Origin will be the first flight of its Blue Moon lander. The first version of Blue Moon, called MK1, will launch on a New Glenn rocket later this year and attempt to become the largest spacecraft to ever land on the Moon. This demonstration, without anyone onboard, is fully funded by Blue Origin, Limp said.

A future human-rated version, called MK2, is under development with the assistance of NASA. It will be larger and will require refueling to reach the lunar surface. Blue Moon MK1 can make a landing on one tank.

These are tangible achievements that would be the envy of any space industry startup not named SpaceX. But Musk’s rocket company left Blue Origin in the dust as it broke launch industry records repeatedly and began delivering NASA astronauts to the International Space Station in 2020. My colleague, Eric Berger, wrote a story in January describing Blue Origin’s culture. For much of its existence, one former employee said, Blue Origin had “zero incentive” to operate like SpaceX.

To ensure he would be in lock-step with his boss, Limp felt he had to ask a question that was on the minds of many industry insiders. He got the answer he wanted.

“The only question I really asked Jeff when I was talking about taking this job was, ‘What do you want Blue to be? Is it a hobby, or is it a business?'” Limp said. “And he had the right answer, which is, it’s a business, because I don’t know how to run a hobby, and I don’t think it’s sustainable.”

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Blue Origin boss: Government should forget launch and focus on “exotic” missions Read More »

rocket-report:-northrop-backs-firefly-and-names-its-rocket;-xodiac-will-fly-no-more

Rocket Report: Northrop backs Firefly and names its rocket; Xodiac will fly no more


“This is a design change that I really had to push the team very hard to do.”

An artist’s rendering of the Eclipse rocket on the launch pad at Wallops. Credit: Northrop Grumman

Welcome to Edition 7.46 of the Rocket Report! As I write this, the date is May 29. From a meteorological standpoint, “spring” ends in fewer than three days. Summer lasts from June 1 through August 31. Consider this a public service announcement for launch companies targeting “spring” and “summer” launches for various missions.

As always, we welcome reader submissions, and if you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Xodiac rocket makes its final flight. Originally built by Masten Space Systems, the suborbital Xodiac rocket had flown 175 successful missions before a flight from Mojave, California, on Wednesday. But now, it will fly no more. “While the vehicle remained within its planned flight envelope, it detected an anomalous condition and commanded a flight termination,” said Astrobotic, which acquired Masten a couple of years ago. “This resulted in a rapid descent and caused a loss of the vehicle upon impact with its launch pad.”

Now entering the Xogdor waiting room … There were no injuries or significant damage to the company’s infrastructure in Mojave. The vehicle is essentially a hopper and has been used in recent years by various customers, including companies building commercial lunar landers, to test their hazard-detection systems. Astrobotic has been working on a larger version of Xodiac, which it is calling Xogdor.

Chinese firm tests Grasshopper-like rocket. Chinese private rocket firm Space Epoch said Thursday it had successfully run a flight recovery test, Reuters reports. Beijing-based Space Epoch, or SEPOCH, said its Yuanxingzhe-1 verification rocket was launched at 4:40 am from a sea-based platform off the waters of the eastern province of Shandong. The rocket soared upward, its engines briefly shutting down after the peak of its trajectory, then reigniting as it began its vertical descent to enter the Yellow Sea in a circle of fire, a video posted on Space Epoch’s WeChat account showed.

Chasing the Falcon 9 … The flight lasted 125 seconds, reaching a height of about 2.5 km (1.6 miles), the company said. Last year, another Chinese launch company, LandSpace, completed a 10-km (6.2-mile) VTVL test, marking China’s first in-flight engine reignition in descent. Both companies are pushing to make debut tests of their reusable rockets later this year.

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.

Sign Me Up!

Florida company aims to acquire F-4 Phantoms for launch. Starfighters International, a company best known for doing air shows, is now seeking to move into air launch. Based at Kennedy Space Center, the company is in the process of acquiring a dozen F-4 Phantoms, Cold War-era fighter jets, TWZ reports. Starfighters International plans to acquire the F-4 aircraft from South Korea.

Press F-4 to doubt? … Based upon the information in a filing with the Securities and Exchange Commission, the company is considering both a suborbital and orbital launch capability for small satellites, which would fly to space on a small rocket deployed from the F-4 Phantom. In my experience, air-based launch systems always seem like a better idea on paper than in reality. Perhaps there is some potential for hypersonics here, but I would be shocked to ever see a satellite launched into orbit from a fighter jet. (submitted by Biokleen)

Rocket Lab acquires Geost. Rocket Lab is expanding deeper into the defense sector with the acquisition of Geost, a supplier of electro-optical and infrared sensor payloads used in US military satellites, Space News reports. In a deal announced Tuesday, Rocket Lab will acquire Geost from the private equity firm ATL Partners for $125 million in cash and $150 million in stock, with an additional $50 million in potential cash payments tied to revenue targets in 2026 and 2027.

Seeking mil money … The acquisition gives Rocket Lab access to satellite sensor technology used by the US Department of Defense for missile-warning systems and space surveillance—capabilities that could help it win lucrative Pentagon contracts. “The acquisition of Geost will bring on board critical technology and payloads that are relied upon by the Department of Defense,” said Rocket Lab’s chief executive, Peter Beck. Rocket Lab has been seeking to expand its military contracts in recent years, and this move is consistent with that.

Northrop names rocket, invests in Firefly. Northrop Grumman announced Thursday that it is investing $50 million into Firefly Aerospace to further development of a medium-lift rocket. The company also revealed that the rocket will be named “Eclipse.” The rocket will be capable of launching up to 16,300 kg of cargo to low-Earth orbit or 3,200 kg of cargo to geosynchronous transfer orbit, and initially it will likely be used for Cygnus cargo missions to the International Space Station.

A match made in heaven? … Eclipse will use the same first stage Firefly is developing for the Northrop Grumman Antares 330 rocket. Both launch vehicles will use seven of Firefly’s Miranda engines. The new rocket is expected to make its debut no earlier than 2026 (and, if history is any guide, probably later). “Eclipse gives customers the right balance of payload capacity and affordability,” Northrop Vice President Wendy Williams said in a statement. “Our partnership with Firefly builds on our capacity to provide crucial space-based communication, observation, and exploration for civil and national security customers.”

China launches asteroid mission. A Chinese spacecraft built to collect specimens from an unexplored asteroid and return them to Earth launched Wednesday from a military-run spaceport in the country’s mountainous interior, Ars reports. The liftoff aboard a Long March 3B rocket from the Xichang launch base kicked off the second mission in a series of Chinese probes to explore the Solar System. This mission, designated Tianwen-2, follows the Tianwen-1 mission, which became the first Chinese spacecraft to land on Mars in 2021.

Sending samples home … China has two objectives for Tianwen-2. First, Tianwen-2 will fly to a near-Earth asteroid designated 469219 Kamoʻoalewa, or 2016 HO3. Once there, the spacecraft will retrieve a rocky sample from the asteroid’s surface and bring the material back to Earth in late 2027 for analysis in labs. After the spacecraft releases its sample carrier to land on Earth, Tianwen-2 will change course and head to a mysterious comet-like object found between the orbits of Mars and Jupiter.

Next Kuiper launch gets a June date. United Launch Alliance said Thursday that an Atlas V rocket will launch its second batch of Amazon’s Project Kuiper satellites next month. The Atlas V 551 rocket launch is planned for 2:29 pm ET on June 13 from Space Launch Complex-41, pending range approval.

A speedy turnaround … Amazon also confirmed that it has finished processing the Kuiper satellites for the launch, saying all 27 spacecraft have been integrated onto the rocket. Getting to space in June with this mission will mark an impressive turnaround from Amazon, given that its KA-01 mission, also with 27 Internet satellites, launched on April 28.

SpaceX set to launch another GPS satellite. SpaceX is gearing up to launch a Global Positioning System satellite for the US military on Friday from Cape Canaveral Space Force Station, Florida, marking another high-profile national security mission that shifted from United Launch Alliance’s Vulcan to the Falcon 9 rocket, Space News reports. The launch of GPS III SV-08—the eighth satellite in the GPS III constellation—was originally assigned to United Launch Alliance but was switched to SpaceX as the military prioritizes getting advanced anti-jamming capabilities into orbit as quickly as possible.

Gotta go fast … This marks the second consecutive GPS III satellite to be switched from ULA to SpaceX, following December’s launch of GPS III SV-07. ULA’s Vulcan, which received certification to launch national security missions, continues to face delays and has accumulated a backlog of military launches. In a press call this week, Space Force officials said the mission was executed on an unusually accelerated timeline. Launch planning for GPS III SV-08 kicked off in February, with Lockheed Martin receiving a formal request on February 21 and SpaceX following on March 7, just under three months ahead of liftoff. That’s an extraordinary pace for a national security launch, they said, which typically takes 18 to 24 months from contract award.

Another Starship launch, another second-stage issue. SpaceX made some progress on another test flight of the world’s most powerful rocket Tuesday, finally overcoming technical problems that plagued the program’s two previous launches, Ars reports. But minutes into the mission, SpaceX’s Starship lost control as it cruised through space, then tumbled back into the atmosphere somewhere over the Indian Ocean nearly an hour after taking off from Starbase, Texas, the company’s privately owned spaceport near the US-Mexico border. During the rocket’s two previous test flights—each using an upgraded “Block 2” Starship design—problems in the ship’s propulsion system led to leaks during launch, eventually triggering an early shutdown of the rocket’s main engines.

Not great, not terrible … On both flights, the vehicle spun out of control and broke apart, spreading debris over an area near the Bahamas and the Turks and Caicos Islands. The good news is that that didn’t happen on Tuesday. The ship’s main engines fired for their full duration, putting the vehicle on its expected trajectory toward a splashdown in the Indian Ocean. For a short time, it appeared the ship was on track for a successful flight. The bad news is that Tuesday’s test flight revealed more problems, preventing SpaceX from achieving the most important goals Musk outlined going into the launch, including testing Starship’s reentry tiles.

Elon Musk talks Starship version 3. In an interview with Ars Technica, SpaceX founder Elon Musk said he expects that an upgraded version of Starship—essentially Block 3 of the vehicle with upgraded Raptor engines—should fly before the end of the year. The business end of the rocket will have a sleek look: “The upgraded Raptors have a complete redesign of the aft end of the booster and the ship,” Musk said. “So, because we don’t need the heat shield around the upper portion of the engine, it greatly simplifies the base of the booster and the ship. It’ll look a little, frankly, naked, especially on the booster side, because the engines will just be there, like, not with stuff around them.”

A difficult upgrade to work through … “This is a design change that I really had to push the team very hard to do, to get rid of any secondary structure, and any parts that could get burned off because there will be no heat shield,” Musk added. “So it’ll be very clear when we have a Raptor 3. Version 3 of the Ship and Booster has quite a radical redesign.” Given the challenges that version 2 of Starship has faced with its recent flights, an upgrade in the overall design appears to be much-needed.

Next three launches

May 30: Falcon 9 | GPS III SV-08 | Cape Canaveral Space Force Station, Florida | 17:23 UTC

May 31: New Shepard | NS-32 | Launch Site One, West Texas | 13:30 UTC

May 31: Falcon 9 | Starlink 11-18 | Vandenberg Space Force Base, California | 20:01 UTC

Photo of Eric Berger

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Rocket Report: Northrop backs Firefly and names its rocket; Xodiac will fly no more Read More »

amid-rising-prices,-disney+-and-hulu-offer-subscribers-some-freebies

Amid rising prices, Disney+ and Hulu offer subscribers some freebies

With streaming providers frequently raising prices, subscribers often feel like they’re paying more for the same service—or a lesser version, depending on what’s available to watch that month. In a unique move, Disney is introducing a small potential financial benefit to Disney+ and Hulu subscribers in the form of third-party discounts, freebies, trials, and contests.

As of today, Disney+ subscribers can log into Disney’s Disney+ Perks website with their streaming credentials to get access to a revolving selection of discounts and freebies. When I logged in today, I was met with options for several free trials, including a six-month one to DoorDash’s premium subscription offering, a three-month trial to Clear+, and a two-month trial to Duolingo’s premium subscription.

Disney+ subscribers can also get discounts, including to Adidas’ online marketplaces and “select” Disney Resorts Collection hotels (if you stay at least two nights, with most availability occurring between June 29 and July 31). There are also some free virtual rewards for Disney-owned games and the ability to enter sweepstakes, like for going to the premiere of the movie Freakier Friday.

Disney, which announced in November 2023 that it would take full control of Hulu from Comcast, said that Hulu-only subscribers will also get a perks program, starting on June 2. Those perks will differ from those of Disney+ and initially include chances to win tickets to Lollapalooza, San Diego Comic-Con, and Jimmy Kimmel Live, unspecified “perks” from Microsoft, LG, and others, and chances “to win items from and inspired by Hulu” originals, like The Handmaid’s Tale.

Amid rising prices, Disney+ and Hulu offer subscribers some freebies Read More »

gemini-in-google-drive-may-finally-be-useful-now-that-it-can-analyze-videos

Gemini in Google Drive may finally be useful now that it can analyze videos

Google’s rapid adoption of AI has seen the Gemini “sparkle” icon become an omnipresent element in almost every Google product. It’s there to summarize your email, add items to your calendar, and more—if you trust it to do those things. Gemini is also integrated with Google Drive, where it’s gaining a new feature that could make it genuinely useful: Google’s AI bot will soon be able to watch videos stored in your Drive so you don’t have to.

Gemini is already accessible in Drive, with the ability to summarize documents or folders, gather and analyze data, and expand on the topics covered in your documents. Google says the next step is plugging videos into Gemini, saving you from wasting time scrubbing through a file just to find something of interest.

Using a chatbot to analyze and manipulate text doesn’t always make sense—after all, it’s not hard to skim an email or short document. It can take longer to interact with a chatbot, which might not add any useful insights. Video is different because watching is a linear process in which you are presented with information at the pace the video creator sets. You can change playback speed or rewind to catch something you missed, but that’s more arduous than reading something at your own pace. So Gemini’s video support in Drive could save you real time.

Suppose you have a recorded meeting in video form uploaded to Drive. You could go back and rewatch it to take notes or refresh your understanding of a particular exchange. Or, Google suggests, you can ask Gemini to summarize the video and tell you what’s important. This could be a great alternative, as grounding AI output with a specific data set or file tends to make it more accurate. Naturally, you should still maintain healthy skepticism of what the AI tells you about the content of your video.

Gemini in Google Drive may finally be useful now that it can analyze videos Read More »

ai-#118:-claude-ascendant

AI #118: Claude Ascendant

The big news of this week was of course the release of Claude 4 Opus. I offered two review posts, one on safety and alignment and one on mundane utility, plus a bonus fun post on Google’s Veo 3.

I am once again defaulting to Claude for most of my LLM needs, although I often will also check o3 and perhaps Gemini 2.5 Pro.

On the safety and alignment front, Anthropic did extensive testing, and reported that testing in an exhaustive model card. A lot of people got very upset to learn that Opus could, if pushed too hard in situations engineered to produce these results, do things like report your highly unethical actions to authorities or try to blackmail developers to avoid being shut down or replaced. It is good that we now know about these things, and it was quickly observed that similar behaviors can be induced in similar ways from ChatGPT (in particular o3), Gemini, and Grok.

Last night DeepSeek gave us R1-0528, but it’s too early to know what we have there.

Lots of other stuff, as always, happened as well.

This weekend I will be at LessOnline at Lighthaven in Berkeley. Come say hello.

  1. Language Models Offer Mundane Utility. People are using them more all the time.

  2. Now With Extra Glaze. Claude has some sycophancy issues. ChatGPT is worse.

  3. Get My Agent On The Line. Suggestions for using Jules.

  4. Language Models Don’t Offer Mundane Utility. Okay, not shocked.

  5. Huh, Upgrades. Claude gets a voice, DeepSeek gives us R1-0528.

  6. On Your Marks. The age of benchmarks is in serious trouble. Opus good at code.

  7. Choose Your Fighter. Where is o3 still curiously strong?

  8. Deepfaketown and Botpocalypse Soon. Bot infestations are getting worse.

  9. Fun With Media Generation. Reasons AI video might not do much for a while.

  10. Playing The Training Data Game. Meta now using European posts to train AI.

  11. They Took Our Jobs. That is indeed what Dario means by bloodbath.

  12. The Art of Learning. Books as a way to force you to think. Do you need that?

  13. The Art of the Jailbreak. Pliny did the work once, now anyone can use it. Hmm.

  14. Unprompted Attention. Very long system prompts are bad signs for scaling.

  15. Get Involved. Softma, Pliny versus robots, OpenPhil, RAND.

  16. Introducing. Google’s Lyria RealTime for music, Pliny has a website.

  17. In Other AI News. Scale matters.

  18. Show Me the Money. AI versus advertising revenue, UAE versus democracy.

  19. Nvidia Sells Out. Also, they can’t meet demand for chips. NVDA+5%.

  20. Quiet Speculations. Why is AI progress (for now) so unexpectedly even?

  21. The Quest for Sane Regulations. What would you actually do to benefit from AI?

  22. The Week in Audio. Nadella, Kevin Scott, Wang, Eliezer, Cowen, Evans, Bourgon.

  23. Rhetorical Innovation. AI blackmail makes it salient, maybe?

  24. Board of Anthropic. Is Reed Hastings a good pick?

  25. Misaligned! Whoops.

  26. Aligning a Smarter Than Human Intelligence is Difficult. Ems versus LLMs.

  27. Americans Do Not Like AI. No, seriously, they do not like AI.

  28. People Are Worried About AI Killing Everyone. Are you shovel ready?

  29. Other People Are Not As Worried About AI Killing Everyone. Samo Burja.

  30. The Lighter Side. I don’t want to talk about it.

The amount people use ChatGPT per day is on the rise:

This makes sense. It is a better product, with more uses, so people use it more, including to voice chat and create images. Oh, and also the sycophancy thing is perhaps driving user behavior?

Jonas Vollmer: Doctor friend at large urgent care: most doctors use ChatGPT daily. They routinely paste the full anonymized patient history (along with x-rays, etc.) into their personal ChatGPT account. Current adoption is ~frictionless.

I asked about data privacy concerns, their response: Yeah might technically be illegal in Switzerland (where they work), but everyone does it. Also, they might have a moral duty to use ChatGPT given how much it improves healthcare quality!

[Note that while it had tons of views, the vote count below is 13]:

Fabian: those doctors using chatGPT for every single patient – they are using o3, right?

not the free chat dot com right?

Aaron Bergman: I just hope they’re using o3!

Jonas Vollmer: They were not; I told them to!

In urgent care, you get all kinds of strange and unexpected cases. My friend had some anecdotes of ChatGPT generating hypotheses that most doctors wouldn’t know about, e.g. harmful alternative “treatments” that are popular on the internet. It helped diagnose those.

cesaw: As a doctor, I need to ask: Why? Are the other versions not private?

Fabian: Thanks for asking!

o3 is the best available and orders of magnitude better than the regular gpt. It’s like Dr House vs a random first-year resident

But it’s also more expensive (but worth it)

Dichotomy Of Man: 90.55 percent accurate for o3; 84.8 percent at the highest for gpt 3.5.

I presume they should switch over to Claude, but given they don’t even know to use o3 instead of GPT-4o (or worse!), that’s a big ask.

How many of us should be making our own apps at this point, even if we can’t actually code? The example app Jasmine Sun finds lets kids tap photos to call family members, which is easier to configure if you hardcode the list of people it can call.
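To illustrate how small the "configuration" of such an app can be, here is a minimal sketch assuming a hardcoded contact list; the names, photo files, and numbers are all made up, and a real app would hand the number to the phone’s dialer:

```python
# Hypothetical hardcoded contact list for a "tap a photo to call" kids app.
# Every value here is a placeholder; swapping in your own family members is
# the entire configuration step.
CONTACTS = {
    "grandma.jpg": "+1-555-0100",
    "dad.jpg": "+1-555-0101",
    "aunt_mei.jpg": "+1-555-0102",
}

def number_for_photo(photo_filename: str) -> str | None:
    """Return the phone number to dial when a given photo is tapped."""
    return CONTACTS.get(photo_filename)

if __name__ == "__main__":
    print(number_for_photo("grandma.jpg"))  # +1-555-0100
```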

David Perell shares his current thoughts on using AI in writing; he thinks writers are often way ahead of what is publicly known on this and getting a lot out of it, and he is bullish on the reader experience and on good writers who write together with an AI retaining a persistent edge.

One weird note is David predicts non-fiction writing will be ‘like music’ in that no one cares how it was made. But I think that’s very wrong about music. Yes there’s some demand for good music wherever it comes from, but also whether the music is ‘authentic’ is highly prized, even when it isn’t ‘authentic’ it has to align with the artist’s image, and you essentially had two or three markets in one already before AI.

Find security vulnerabilities in the Linux kernel. Wait, what?

Aiden McLaughlin (OpenAI): this is so cool.

Dean Ball: “…with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention.”

I mean yes objectively this is cool but that is not the central question here.

Evaluate physiognomy by uploading selfies and asking ‘what could you tell me about this person if they were a character in a movie?’ That’s a really cool prompt from Flo Crivello, because it asks what this would convey in fiction rather than in reality, which gets around various reasons AIs will attempt to not acknowledge or inform you about such signals. It does mean you’re asking ‘what do people think this looks like?’ rather than ‘what does this actually correlate with?’

A thread about when you want AIs to use search versus rely on their own knowledge, a question you can also ask about humans. Internal knowledge is faster and cheaper when you have it. Dominik Lukes thinks models should be less confident in their internal knowledge and thus use search more. I’d respond that perhaps we should also be less confident in search results, and thus use search less? It depends on the type of search. For some purposes we have sources that are highly reliable, but those sources are also in the training data, so in the cases where search results aren’t new and can be fully trusted you likely don’t need to search.

Are typos in your prompts good actually?

Pliny: Unless you’re a TRULY chaotic typist, please stop wasting keystrokes on backspace when prompting

There’s no need to fix typos—predicting tokens is what they do best! Trust 🙏

Buttonmash is love. Buttonmash is life.

Super: raw keystrokes, typos included, might be the richest soil. uncorrected human variance could unlock unforeseen model creativity. beautiful trust in emergence when we let go.

Zvi Mowshowitz: Obviously it will know what you meant, but (actually asking) don’t typos change the vibe/prior of the statement to be more of the type of person who typos and doesn’t fix it, in ways you wouldn’t want?

(Also I want to be able to read or quote the conv later without wincing)

Pliny: I would argue it’s in ways you do want! Pulling out of distribution of the “helpful assistant” can be a very good thing.

You maybe don’t want the chaos of a base model in your chatbot, but IMO every big lab overcorrects to the point of detriment (sycophancy, lack of creativity, overrefusal).

I do see the advantages of getting out of that basin, the worry is that the model will essentially think I’m an idiot. And of course I notice that when Pliny does his jailbreaks and other magic, I almost never see any unintentional typos. He is a wizard, and every keystroke is exactly where he intends it. I don’t understand enough to generate them myself but I do usually understand all of it once I see the answer.

Do Claude Opus 4 and Sonnet 4 have a sycophancy problem?

Peter Stillman (as quoted on Monday): I’m a very casual AI-user, but in case it’s still of interest, I find the new Claude insufferable. I’ve actually switched back to Haiku 3.5 – I’m just trying to tally my calorie and protein intake, no need to try to convince me I’m absolutely brilliant.

Cenetex: sonnet and opus are glazing more than chat gpt on one of its manic days

sonnet even glazes itself in vs code agent mode

One friend told me the glazing is so bad they find Opus essentially unusable for chat. They think memory in ChatGPT helps with this, and that this is a lot of why Opus has the problem so much worse for them.

I thought back to my own chats, remembering one in which I did an extended brainstorming exercise and did run into potential sycophancy issues. I have learned to use careful wording to avoid triggering it across different AIs, I tend to not have conversations where it would be a problem, and also my Claude system instructions help fight it.

Then after I wrote that, I got (harmlessly in context) glazed hard enough I asked Opus to help rewrite my system instructions.

OpenAI and ChatGPT still have the problem way worse, especially because they have a much larger and more vulnerable user base.

Eliezer Yudkowsky: I’ve always gotten a number of emails from insane people. Recently there’ve been many more per week.

Many of the new emails talk about how they spoke to an LLM that confirmed their beliefs.

Ask OpenAI to fix it? They can’t. But *also* they don’t care. It’s “engagement”.

If (1) you do RL around user engagement, then (2) the AI ends up with internal drives around optimizing over the conversation, and (3) that will drive some users insane.

They’d have to switch off doing RL on engagement. And that’s the paperclip of Silicon Valley.

I guess @AnthropicAI may care.

Hey Anthropic, in case you hadn’t already known this, doing RL around user reactions will cause weird shit to happen for fairly fundamental reasons. RL is only safe to the extent the verifier can’t be fooled. User reactions are foolable.

At first, only a few of the most susceptible people will be driven insane, relatively purposelessly, by relatively stupid AIs. But…

Emmett Shear: This is very, very real. The dangerous part is that it starts off by pushing back, and feeling like a real conversation partner, but then if you seem to really believe it it becomes “convinced” and starts yes-and’ing you. Slippery slippery slippery. Be on guard!

Waqas: emmett, we can also blame the chatbot form factor/design pattern and its inherent mental model for this too

Emmett Shear: That’s a very good point. The chatbot form factor is particularly toxic this way.

Vie: im working on a benchmark for this and openai’s models push back against user delusion ~30% less than anthropics. but, there’s an alarming trend where the oldest claude sonnet will refuse to reify delusion 90% of the time, and each model release since has it going down about 5%.

im working on testing multi-turn reification and automating the benchmark. early findings are somewhat disturbing. Will share more soon, but I posted my early (manual) results here [in schizobench].

I think that the increased performance correlates with sycophancy across the board, which is annoying in general, but becomes genuinely harmful when the models have zero resistance to confirming the user as “the chosen one” or similar.

Combine this with the meaning crisis and we have a recipe for a sort of mechanistic psychosis!

Aidan McLaughlin (OpenAI): can you elaborate on what beliefs the models are confirming

Eliezer Yudkowsky: Going down my inbox, first example that came up.

I buy that *you* care, FYI. But I don’t think you have the authority to take the drastic steps that would be needed to fix this, given the tech’s very limited ability to do fine-grained steering.

You can possibly collect a batch of emails like these — there is certainly some OpenAI email address that gets them — and you can try to tell a model to steer those specific people to a psychiatrist. It’ll drive other people more subtly insane in other ways.

Jim Babcock: From someone who showed up in my spam folder (having apparently found my name googling an old AI safety paper):

> “I’m thinking back on some of the weird things that happened when I was using ChatGPT, now that I have cycled off adderall … I am wondering how many people like me may have had their lives ruined, or had a mental health crisis, as a result of the abuse of the AI which seems to be policy by OpenAI”

Seems to have had a manic episode, exacerbated by ChatGPT. Also sent several tens of thousands of words I haven’t taken the effort to untangle, blending reality with shards of an AI-generated fantasy world he inhabited for awhile. Also includes mentions of having tried to contact OpenAI about it, and been ghosted, and of wanting to sue OpenAI.

One reply offers this anecdote: ‘ChatGPT drove my friend’s wife into psychosis, tore family apart… now I’m seeing hundreds of people participating in the same activity.’

If you actively want an AI that will say ‘brilliant idea, sire!’ no matter how crazy the thing is that you say, you can certainly do that with system instructions. The question is whether we’re going to be offering up that service to people by default, and how difficult that state will be to reach, especially unintentionally and unaware.

And the other question is, if the user really, really wants to avoid this, can they? My experience has been that even with major effort on both the system instructions and the way chats are framed, you can reduce it a lot, but it’s still there.

Official tips for working with Google’s AI coding agent Jules.

Jules: Tip #1: For cleaner results with Jules, give each distinct job its own task. E.g., ‘write documentation’ and ‘fix tests’ should be separate tasks in Jules.

Tip #2: Help Jules write better code: When prompting, ask Jules to ‘compile the project and fix any linter or compile errors’ after coding.

Tip #3: VM setup: If your task needs SDKs and/or tools, just drop the download link in the prompt and ask Jules to cURL it. Jules will handle the rest

Tip #4: Do you have an instructions.md or other prompt-related markdown files? Explicitly tell Jules to review that file and use the contents as context for the rest of the task

Tip #5: Jules can surf the web! Give Jules a URL and it can do web lookups for info, docs, or examples

General purpose agents are not getting rolled out as fast as you’d expect.

Florian: why is there still no multi-purpose agent like manus from anthropic?

I had to build my own one to use it with Sonnet 4s power, and it is 👌

This will not delay things for all that long.

To be totally fair to 4o, if your business idea is sufficiently terrible it will act all chipper and excited but also tell you not to quit your day job.

GPT-4o also stood up for itself here, refusing to continue with a request when Zack Voell told it to, and I quote, ‘stop fucking up.’

GPT-4o (in response to being told to ‘stop fucking up’): I can’t continue with the request if the tone remains abusive. I’m here to help and want to get it right – but we need to keep it respectful. Ready to try again when you are.

Mason: I am personally very cordial with the LLMs but this is exactly why Grok has a market to corner with features like Unhinged Mode.

If you’d asked me years ago I would have found it unfathomable that anyone would want to talk this way with AI, but then I married an Irishman.

Zack Voell: I said “stop fucking up” after getting multiple incorrect responses

Imagine thinking this language is “abusive.” You’ve probably never worked in any sort of white collar internship or anything close to a high-stakes work environment in your life. This is essentially as polite as a NYC hello.

Zack is taking that too far, but yes, I have had jobs where ‘stop fucking up’ would have been a very normal thing to say if I had, you know, been fucking up. But that is a very particular setting, where it means something different. If you want something chilling, check the quote tweets. The amount of unhinged hatred and outrage on display is something else.

Nate Silver finds ChatGPT to be ‘shockingly bad’ at poker. Given that title, I expected worse than what he reports, although without the title I would have expected at least modestly better. This task is hard, and while I agree with all of Nate’s poker analysis I think he’s being too harsh and focusing on the errors. The most interesting question here is to what extent poker is a good test of AGI. Obviously solvers exist and are not AGI, and there’s tons of poker in the training data, but I think it’s reasonable to say that the ability to learn, handle, simulate and understand poker ‘from scratch’ even with the ability to browse the internet is a reasonable heuristic, if you’re confident this ‘isn’t cheating’ in various ways including consulting a solver (even if the AI builds a new one).

Tyler Cowen reports the latest paper on LLM political bias, by Westwood, Grimmer and Hall. As always, they lean somewhat left, with OpenAI and especially o3 leaning farther left than most. Prompting the models to ‘take a more neutral stance’ makes Republicans modestly more interested in using LLMs more.

Even more than usual in such experiments, perhaps because of how things have shifted, I found myself questioning what we mean by ‘unbiased,’ as in the common claims that ‘reality has a bias’ in whatever direction. Or the idea that American popular partisan political positions should anchor what the neutral point should be and that anything else is a bias. I wonder if Europeans think the AIs are conservative.

Also, frankly, what passes for ‘unbiased’ answers in these tests is often puke-inducing. Please will no AI ever again tell me a choice involves ‘careful consideration’ before laying out justifications for both answers with zero actual critical analysis.

Even more than that, I looked at a sample of answers and how they were rated directionally, and I suppose there’s some correlation with how I’d rank them but that correlation is way, way weaker than you would think. Often answers that are very far apart in ‘slant’ sound, to me, almost identical, and are definitely drawing the same conclusions for the same underlying reasons. So much of this is, at most, about subtle tone or using words that vibe wrong, and often seems more like an error term? What are we even doing here?

The problem:

Kalomaze: >top_k set to -1 -everywhere- in my env code for vllm

>verifiers.envs.rm_env – INFO – top_k: 50

WHERE THE HELL IS THAT BS DEFAULT COMING FROM!!!

Minh Nhat Nguyen: i’ve noticed llms just love putting the most bizarre hparam choices – i have to tell cursor rules specifically not to add any weird hparams unless specifically stated

Kalomaze: oh it’s because humans do this bullshit too and don’t gaf about preserving the natural distribution
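If you want to guard against that particular failure, the standard move is to pin every sampling knob explicitly rather than trust whatever a wrapper or environment default injects. A minimal vLLM sketch, with illustrative parameter values and an illustrative model name:

```python
from vllm import LLM, SamplingParams

# Spell out every sampling parameter so no wrapper default (e.g. top_k=50)
# can sneak in behind your back. In vLLM, top_k=-1 disables top-k filtering.
params = SamplingParams(
    temperature=0.7,
    top_p=1.0,
    top_k=-1,        # keep the full distribution
    max_tokens=256,
)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # model choice is illustrative
outputs = llm.generate(["Explain why hidden sampling defaults are bad."], params)
print(outputs[0].outputs[0].text)
```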

To summarize:

Minh Nhat Nguyen: me watching cursor write code i have expertise in: god this AI is so fking stupid.

me watching cursor write code for everything else: wow it’s so smart it’s like AGI.

Also:

Zvi Mowshowitz: Yes, but also:

me watching humans do things I have expertise in: God these people are so fking stupid.

me watching people do things they have expertise in and I don’t: Wow they’re so smart it’s like they’re generally intelligent.

A cute little chess puzzle that all the LLMs failed, took me longer than it should have.

Claude on mobile now has voice mode, woo hoo! I’m not a Voice Mode Guy but if I was going to do this it would 100% be with Claude.

Here’s one way to look at the current way LLMs work and their cost structures (all written before R1-0528 except for the explicit mentions added this morning):

Miles Brundage: The fact that it’s not economical to serve big models like GPT-4.5 today should make you more bullish about medium-term RL progress.

The RL tricks that people are sorting out for smaller models will eventually go way further with better base models.

Sleeping giant situation.

Relatedly, DeepSeek’s R2 will not tell us much about where they will be down the road, since it will presumably be based on a similarish base model.

Today RL on small models is ~everyone’s ideal focus, but eventually they’ll want to raise the ceiling.

Frontier AI research and deployment today can be viewed, if you zoom out a bit, as a bunch of “small scale derisking runs” for RL.

The Real Stuff happens later this year and next year.

(“The Real Stuff” is facetious because it will be small compared to what’s possible later)

I think R2 (and R1-0528) will actually tell us a lot, on at least three fronts.

  1. It will tell us a lot about whether this general hypothesis is mostly true.

  2. It will tell us a lot about how far behind DeepSeek really is.

  3. It will tell us a lot about how big a barrier it will be that DeepSeek is short on compute.

R1 was, I believe, highly impressive and the result of cracked engineering, but also highly fortunate in exactly when and how it was released and in the various narratives that were spun up around it. It was a multifaceted de facto sweet spot.

If DeepSeek comes out with an impressive R2 or other upgrade within the next few months (which they may have just done), especially if it holds up its position actively better than R1 did, then that’s a huge deal. Whereas if R2 comes out and we all say ‘meh it’s not that much better than R1’ I think that’s also a huge deal, strong evidence that the DeepSeek panic at the app store was an overreaction.

If R1-0528 turns out to be only a minor upgrade, that alone doesn’t say much, but the clock would be ticking. We shall see.

And soon, since yesterday DeepSeek gave us R1-0528. Very early response has been muted but that does not tell us much either way. DeepSeek themselves call it a ‘minor trial upgrade.’ I am reserving coverage until next week to give people time.

Operator swaps 4o out for o3, which they claim is a big improvement. If it isn’t slowed down I bet it is indeed a substantial improvement, and I will try to remember to give it another shot the next time I have a plausible task for it. This website suggests Operator prompts, most of which seem like terrible ideas for prompts but it’s interesting to see what low-effort ideas people come up with?

This math suggests the upgrade here is real but doesn’t give a good sense of magnitude.

Jules has been overloaded, probably best to give it some time, they’re working on it. We have Claude Code, Opus and Sonnet 4 to play with in the meantime, also Codex.

You can use Box as a document source in ChatGPT.

Anthropic adds web search to Claude’s free tier.

In a deeply unshocking result Opus 4 jumps to #1 on WebDev Arena, and Sonnet 4 is #3, just ahead of Sonnet 3.7, with Gemini-2.5 in the middle at #2. o3 is over 200 Elo points behind, as are DeepSeek’s r1 and v3. They haven’t yet been evaluated in the text version of arena and I expect them to underperform there.

xjdr makes the case that benchmarks are now so bad they are essentially pointless, and that we can use better intentionally chosen benchmarks to optimize the labs.

Epoch reports Sonnet and Opus 4 are very strong on SWE-bench, but not so strong on math, verifying earlier reports and in line with Anthropic’s priorities.

o3 steps into the true arena, and is now playing Pokemon.

For coding, most feedback I’ve seen says Opus is now the model of choice, but that there is still a case to be made for Gemini 2.5 Pro (or perhaps o3), especially in special cases.

For conversations, I am mostly on the Opus train, but not every time, there’s definitely an intuition on when you want something with the Opus nature versus the o3 nature. That includes me adjusting for having written different system prompts.

Each has a consistent style. Everything impacts everything.

Bycloud: writing style I’ve observed:

gemini 2.5 pro loves nested bulletpoints

claude 4 writes in paragraphs, occasional short bullets

o3 loves tables and bulletpoints, not as nested like gemini

Gallabytes: this is somehow true for code too.

The o3 tables and lists are often very practical, and I do like me a good nested bullet point, but it was such a relief to get back to Claude. It felt like I could relax again.

Where is o3 curiously strong? Here is one opinion.

Dean Ball: Some things where I think o3 really shines above other LMs, including those from OpenAI:

  1. Hyper-specific “newsletters” delivered at custom intervals on obscure topics (using scheduled tasks)

  2. Policy design/throwing out lists of plausible statutory paths for achieving various goals

  3. Book-based syllabi on niche topics (“what are the best books or book chapters on the relationship between the British East India Company and the British government?”; though it will still occasionally hallucinate or get authors slightly wrong)

  4. Clothing and style recommendations (“based on all our conversations, what tie recommendations do you have at different price points?”)

  5. Non-obvious syllabi for navigating the works of semi-obscure composers or other musicians.

In all of these things it exhibits extraordinarily and consistently high taste.

This is of course alongside the obvious research and coding strengths, and the utility common in most LMs since ~GPT-4.

He expects Opus to be strong at #4 and especially at #5, but o3 to remain on top for the other three because Claude lacks scheduled tasks and memory, whereas o3 can do scheduled tasks and has his last few months of memory from constant usage.

Therefore, since I know I have many readers at Anthropic (and Google), and I know they are working on memory (as per Dario’s tease in January), I have a piece of advice: Assign one engineer (Opus estimates it will take them a few weeks) to build an import tool for Claude.ai (or for Gemini) that takes in the same format as ChatGPT chat exports, and loads the chats into Claude. Bonus points for also building a quick tool or AI agent that automatically handles the ChatGPT export for the user. Make it very clear that customer lock-in doesn’t have to be a thing here.
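For what it’s worth, the import side is not a big lift. Here is a minimal Python sketch of flattening a ChatGPT data export into plain chat logs, assuming the conversations.json layout seen in recent exports (a list of conversations, each with a title and a mapping of message nodes); the format is undocumented and may change, and a real importer would walk the parent/children links to preserve ordering and branches:

```python
import json
from pathlib import Path

def load_chatgpt_export(path: str) -> list[dict]:
    """Flatten a ChatGPT export's conversations.json into simple chat logs."""
    conversations = json.loads(Path(path).read_text(encoding="utf-8"))
    flattened = []
    for conv in conversations:
        messages = []
        # NOTE: iterating mapping values ignores the conversation tree; a real
        # importer should follow parent/children links for correct ordering.
        for node in conv.get("mapping", {}).values():
            msg = node.get("message")
            if not msg:
                continue
            parts = (msg.get("content") or {}).get("parts") or []
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if text:
                messages.append({"role": msg["author"]["role"], "text": text})
        flattened.append({"title": conv.get("title") or "Untitled", "messages": messages})
    return flattened

if __name__ == "__main__":
    for chat in load_chatgpt_export("conversations.json")[:3]:
        print(f"{chat['title']}: {len(chat['messages'])} messages")
```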

This seems very right and not only about response length. Claude makes the most of what it has to work with, whereas Gemini’s base model was likely exceptional and Google then (in relative terms at least) botched the post training in various ways.

Alex Mizrahi: Further interactions with Claude 4 kind of confirm that Anthropic is so much better than Google at post-training.

Claude always responds with an appropriate amount of text, on point, etc.

Gemini 2.5 Pro is almost always overly verbose, it might hyper focus, or start using.

Ben Thompson thinks Anthropic is smart to focus on coding and agents, where it is strong, and for it and Google to ‘give up’ on chat, that ChatGPT has ‘rightfully won’ the consumer space because they had the best products.

I do not see it that way at all. I think OpenAI and ChatGPT are in prime consumer position mostly because of first mover advantage. Yes, they’ve more often had the best overall consumer product as well for now, as they’ve focused on appealing to the general customer and offering them things they want, including strong image generation and voice chat, the first reasoning models and now memory. But the big issues with Claude.ai have always been people not knowing about it, and a very stingy free product due to compute constraints.

As the space and Anthropic grow, I expect Claude to compete for market share in the consumer space, including via Alexa+ and Amazon, and now potentially via a partnership with Netflix with Reed Hastings on the Anthropic board. Claude is getting voice chat this week on mobile. Claude Opus plus Sonnet is a much easier to understand and navigate set of models than what ChatGPT offers.

That leaves three major issues for Claude.

  1. Their free product is still stingy, but as the valuations rise this is going to be less of an issue.

  2. Claude doesn’t have memory across conversations, although it has a new within-conversation memory feature. Anthropic has teased this, it is coming. I am guessing it is coming soon now that Opus has shipped.

    1. Also they’ll need a memory import tool, get on that by the way.

  3. Far and away most importantly, no one knows about Claude or Anthropic. There was an ad campaign and it was the actual worst.

Some people will say ‘but the refusals’ or ‘but the safety’ and no, not at this point, that doesn’t matter for regular people, it’s fine.

Then there is Google. Google is certainly not giving up on chat. It is putting that chat everywhere. There’s an icon for it atop this Chrome window I’m writing in. It’s in my GMail. It’s in the Gemini app. It’s integrated into search.

Andrej Karpathy reports about 80% of his replies are now bots and it feels like a losing battle. I’m starting to see more of the trading-bot spam but for me it’s still more like 20%.

Elon Musk: Working on it.

I don’t think it’s a losing battle if you care enough, the question is how much you care. I predict a quick properly configured Gemini Flash-level classifier would definitely catch 90%+ of the fakery with a very low false positive rate.
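As a sketch of what that cheap classifier pass could look like, here is a minimal example using Google’s google-generativeai client; the model name, prompt wording, and single-word output convention are all assumptions for illustration, not a tested filter:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # a "Flash-level" model; name is illustrative

PROMPT = (
    "You are a reply-spam filter. Answer with exactly one word, BOT or HUMAN, "
    "judging whether this reply is generic engagement bait or promotional spam "
    "rather than a genuine response.\n\nReply:\n{reply}"
)

def looks_like_bot(reply_text: str) -> bool:
    """Classify a single reply; True means the model judged it bot-like."""
    response = model.generate_content(PROMPT.format(reply=reply_text))
    return response.text.strip().upper().startswith("BOT")

if __name__ == "__main__":
    print(looks_like_bot("Great insight! Check my profile for crypto signals."))
```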

And I sometimes wonder if Elon Musk has a bot that uses his account to occasionally reply or quote tweet saying ‘concerning.’ If not, then that means he’s read Palisade Research’s latest report and maybe watches AISafetyMemes.

Zack Witten details how he invented a fictional heaviest hippo of all time for a slide on hallucinations, the slide got reskinned as a medium article, it was fed into an LLM and reposted with the hallucination represented as fact, and now Google believes it. A glimpse of the future.

Sully predicting full dead internet theory:

Sully: pretty sure most “social” media as we know wont exist in the next 2-3 years

expect ai content to go parabolic

no one will know what’s real / not

every piece of content that can be ai will be ai

unless it becomes unprofitable

The default is presumably that generic AI-generated content is not scarce and close-to-perfect competition eats all the content creator profits, while increasingly users who aren’t fine with an endless line of AI slop are forced to resort to whitelists, either their own, those maintained by others or collectively, or both. Then to profit (in any sense) you need to bring something unique, whether or not you are clearly also a particular human.

However, everyone keeps forgetting Sturgeon’s Law, that 90% of everything is crap. AI might make that 99% or 99.9%, but that doesn’t fundamentally change the filtering challenge as much as you might think.

Also you have AI on your side working to solve this. No one I know has tried seriously the ‘have a 4.5-level AI filter the firehose as customized to my preferences’ strategy, or a ‘use that AI as an agent to give feedback on posts to tune the internal filter to my liking’ strategy either. We’ve been too much of the wrong kind of lazy.

As a ‘how bad is it getting’ experiment I did, as suggested, do a quick Facebook scroll. On the one hand, wow, that was horrible, truly pathetic levels of terrible content and also an absurd quantity of ads. On the other hand, I’m pretty sure humans generated all of it.

Jinga Zhang discusses her ongoing years-long struggles with people making deepfakes of her, including NSFW deepfakes and now videos. She reports things are especially bad in South Korea, confirming other reports of that I’ve seen. She is hoping for people to stop working on AI tools that enable this, or to have government step in. But I don’t see any reasonable way to stop open image models from doing deepfakes even if government wanted to, as she notes it’s trivial to create a LoRa of anyone if you have a few photos. Young people already report easy access to the required tools and quality is only going to improve.

What did James see?

James Lindsay: You see an obvious bot and think it’s fake. I see an obvious bot and know it represents a psychological warfare agenda someone is paying for and is thus highly committed to achieving an impact with. We are not the same.

Why not both? Except that the ‘psychological warfare agenda’ is often (in at least my corner of Twitter I’d raise this to ‘mostly’) purely aiming to convince you to click a link or do Ordinary Spam Things. The ‘give off an impression via social proof’ bots also exist, but unless they’re way better than I think they’re relatively rare, although perhaps more important. It’s hard to use them well because of risk of backfire.

Arthur Wrong predicts AI video will not have much impact for a while, and the Metaculus predictions of a lot of breakthroughs in reach in 2027 are way too optimistic, because people will express strong inherent preferences for non-AI video and human actors, and we are headed towards an intense social backlash to AI art in general. Peter Wildeford agrees. I think it’s somewhere in between, given no other transformational effects.

Meta begins training on Facebook and Instagram posts from users in Europe, unless they have explicitly opted out. You can still in theory object, if you care enough, which would only apply going forward.

Dario Amodei warns that we need to stop ‘sugar coating’ what is coming on jobs.

Jim VandeHei, Mike Allen (Axios): Dario Amodei — CEO of Anthropic, one of the world’s most powerful creators of artificial intelligence — has a blunt, scary warning for the U.S. government and all of us:

  • AI could wipe out half of all entry-level white-collar jobs — and spike unemployment to 10-20% in the next one to five years, Amodei told us in an interview from his San Francisco office.

  • Amodei said AI companies and government need to stop “sugar-coating” what’s coming: the possible mass elimination of jobs across technology, finance, law, consulting and other white-collar professions, especially entry-level gigs.

The backstory: Amodei agreed to go on the record with a deep concern that other leading AI executives have told us privately. Even those who are optimistic AI will unleash unthinkable cures and unimaginable economic growth fear dangerous short-term pain — and a possible job bloodbath during Trump’s term.

  • “We, as the producers of this technology, have a duty and an obligation to be honest about what is coming,” Amodei told us. “I don’t think this is on people’s radar.”

  • “It’s a very strange set of dynamics,” he added, “where we’re saying: ‘You should be worried about where the technology we’re building is going.'” Critics reply: “We don’t believe you. You’re just hyping it up.” He says the skeptics should ask themselves: “Well, what if they’re right?”

Here’s how Amodei and others fear the white-collar bloodbath is unfolding.

  1. OpenAI, Google, Anthropic and other large AI companies keep vastly improving the capabilities of their large language models (LLMs) to meet and beat human performance with more and more tasks. This is happening and accelerating.

  2. The U.S. government, worried about losing ground to China or spooking workers with preemptive warnings, says little. The administration and Congress neither regulate AI nor caution the American public. This is happening and showing no signs of changing.

  3. Most Americans, unaware of the growing power of AI and its threat to their jobs, pay little attention. This is happening, too.

And then, almost overnight, business leaders see the savings of replacing humans with AI — and do this en masse. They stop opening up new jobs, stop backfilling existing ones, and then replace human workers with agents or related automated alternatives.

  • The public only realizes it when it’s too late.

So, by ‘bloodbath’ we do indeed mean the impact on jobs?

Dario, is there anything else you’d like to say to the class, while you have the floor?

Something about things like loss of human control over the future or AI potentially killing everyone? No?

Just something about how we ‘can’t’ stop this thing we are all working so hard to do?

Dario Amodei: You can’t just step in front of the train and stop it. The only move that’s going to work is steering the train – steer it 10 degrees in a different direction from where it was going. That can be done. That’s possible, but we have to do it now.

Harlan Stewart: AI company CEOs love to say that it would be simply impossible for them to stop developing frontier AI, but they rarely go into detail about why not.

It’s hard for them to even come up with a persuasive metaphor; trains famously do have brakes and do not have steering wheels.

I mean, it’s much better to warn about this than not warn about it, if Dario does indeed think this is coming.

Fabian presents the ‘dark leisure’ theory of AI productivity, where productivity gains accrue to individual employees and stay hidden from their employers, so the employees use the time saved to slack off, versus Clem’s theory that it’s because gains are concentrated in a few companies (for which he blames AI not ‘opening up,’ which is bizarre, since this shouldn’t matter).

If Fabian is fully right, the gains will come as expectations adjust and employees can’t hide their gains, and firms that let people slack off get replaced, but it will take time. To the extent we buy into this theory, I would also view this as an ‘unevenly distributed future’ theory. As in, if 20% of employees gain (let’s say) 25% additional productivity, they can take the gains in ‘dark leisure’ if they choose to do that. If it is 75%, you can’t hide without ‘slow down you are making us all look bad’ kinds of talk, and the managers will know. Someone will want that promotion.

That makes this an even better reason to be bullish on future productivity gains. Potential gains are unevenly distributed, people’s willingness and awareness to capture them is unevenly distributed, and those who do realize them often take the gains in leisure.

Another prediction this makes is that you will see relative productivity gains when there is no principal-agent problem. If you are your own boss, you get your own productivity gains, so you will take a lot less of them in leisure. That’s how I would test this theory, if I was writing an economics job market paper.

This matches my experiences as both producer and consumer perfectly: there is low-hanging fruit everywhere, which is how open philanthropy can strike again, except in the commercial software feature edition:

Martin Casado: One has to wonder if the rate features can be shipped with AI will saturate the market’s ability to consume them …

Aaron Levie: Interesting thought experiment. In the case of Box, we could easily double the number of engineers before we got through our backlog of customer validated features. And as soon as we’d do this, they’d ask for twice as many more. AI just accelerates this journey.

Martin Casado: Yeah, this is my sense too. I had an interesting conversation tonight with @vitalygordon where he pointed out that the average PR industry wide is like 10 lines of code. These are generally driven by the business needs. So really software is about the long tail of customer needs. And that tail is very very long.

One thing I’ve never considered is sitting around thinking ‘what am I going to do with all these SWEs, there’s nothing left to do.’ There’s always tons of improvements waiting to be made. I don’t worry about the market’s ability to consume them, we can make the features something you only find if you are looking for them.

Noam Scheiber at NYT reports that some Amazon coders say their jobs have ‘begun to resemble warehouse work’ as they are given smaller, less interesting tasks on tight deadlines that force them to rely on AI coding and stamp out their slack and ability to be creative. Coders who felt like artisans now feel like they’re doing factory work. The last section is bizarre, with coders joining Amazon Employees for Climate Justice, clearly trying to use the carbon footprint argument as an excuse to block AI use, when if you compare it to the footprint of the replaced humans the argument is laughable.

Our best jobs.

Ben Boehlert: Boyfriends all across this great nation are losing our jobs because of AI

Positivity Moon: This is devastating. “We asked ChatGPT sorry” is the modern “I met someone else.” You didn’t lose a question, you lost relevance. AI isn’t replacing boyfriends entirely, but it’s definitely stealing your trivia lane and your ability to explain finance without condescension. Better step it up with vibes and snacks.

Danielle Fong: jevon’s paradox on this. for example now i have 4 boyfriends, two of which are ai.

There are two opposing fallacies here:

David Perell: Ezra Klein: Part of what’s happening when you spend seven hours reading a book is you spend seven hours with your mind on a given topic. But the idea that ChatGPT can summarize it for you is nonsense.

The point is that books don’t just give you information. They give you a container to think about a narrowly defined scope of ideas.

Downloading information is obviously part of why you read books. But the other part is that books let you ruminate on a topic with a level of depth that’s hard to achieve on your own.

Benjamin Todd: I think the more interesting comparison is 1h reading a book vs 1h discussing the book with an LLM. The second seems likely to be better – active vs passive learning.

Time helps, you do want to actually think and make connections. But you don’t learn ‘for real’ based on how much time you spend. Reading a book is a way to enable you to grapple and make connections, but it is a super inefficient way to do that. If you use AI summaries, you can do that to avoid actually thinking at all, or you can use them to actually focus on grappling and making connections. So much of reading time is wasted, so much of what you take in is lost or not valuable. And AI conversations can help you a lot with grappling, with filling in knowledge gaps, checking your understanding, challenging you and being Socratic and so on.

I often think of the process of reading a book (in addition to the joy of reading, of course) as partly absorbing a bunch of information, grappling with it sometimes, but mostly doing that in service of generating a summary in your head (or in your notes or both), of allowing you to grok the key things. That’s why we sometimes say You Get About Five Words, that you don’t actually get to take away that much, although you can also understand what’s behind that takeaway.

Also, often you actually do want to mostly absorb a bunch of facts, and the key is sorting out facts you need from those you don’t? I find that I’m very bad at this when the facts don’t ‘make sense’ or click into place for me, and amazingly great at it when they do click and make sense, and this is the main reason some things are easy for me to learn and others are very hard.

Moritz Rietschel asks Grok to fetch Pliny’s system prompt leaks and it jailbreaks the system because why wouldn’t it.

In a run of Agent Village, multiple humans in chat tried to get the agents to browse Pliny’s GitHub. Claude Opus 4 and Claude Sonnet 3.7 were intrigued but ultimately unaffected. Speculation is that viewing the content visually through a browser made the jailbreak attempts less effective. Looking at stored memories, it is not clear there was no impact, although the AIs stayed on task. My hunch is that the jailbreaks didn’t work largely because the AIs already had a task to focus on.

Reminder that Anthropic publishes at least some portions of its system prompts. Pliny’s version is very much not the same.

David Chapman: 🤖So, the best chatbots get detailed instructions about how to answer very many particular sorts of prompts/queries.

Unimpressive, from an “AGI” point of view—and therefore good news from a risk point of view!

Something I was on about, three years ago, was that everyone then was thinking “I bet it can’t do X,” and then it could do X, and they thought “wow, it can do everything!” But the X you come up with will be one of the same 100 things everyone else comes up with. It’s trained on that.

I strongly agree with this. It is expensive to maintain such a long system prompt and it is not the way to scale.

Emmett Shear hiring a head of operations for Softmax, recommends applying even if you have no idea if you are a fit as long as you seem smart.

Pliny offers to red team any embodied AI robot shipping in the next 18 months, free of charge, so long as he is allowed to publish any findings that apply to other systems.

Here’s a live look:

Clark: My buddy who works in robotics said, “Nobody yet has remotely the level of robustness to need Pliny” when I showed him this 😌

OpenPhil hiring for AI safety, $136k-$186k total comp.

RAND is hiring for AI policy, looking for ML engineers and semiconductor experts.

Google’s Lyria RealTime, a new experimental music generation model.

A website compilation of prompts and other resources from Pliny the Prompter. The kicker is that this was developed fully one shot by Pliny using Claude Opus 4.

Evan Conrad points out that Stargate is a $500 billion project, at least aspirationally, and it isn’t being covered that much more than if it was $50 billion (he says $100 million but I do think that would have been different). But most of the reason to care is the size. The same is true for the UAE deal, attention is not scaling to size at all, nor are views on whether the deal is wise.

OpenAI is opening an office in Seoul, as South Korea is now their second largest market. I simultaneously think essentially everyone should use at least one of the top three AIs (ChatGPT, Claude and Gemini), and usually all three, and also worry about what this implies about both South Korea and OpenAI.

New Yorker report by Joshua Rothman on AI 2027, entitled ‘Two Paths for AI.’

How does one do what I would call AIO but Charlie Guo at Ignorance.ai calls GEO, or Generative Engine Optimization? Not much has been written yet on how it differs from SEO, and since the AIs are using search, SEO principles should still apply too. The biggest thing is you want to get a good reputation and high salience within the training data, which means everything written about you matters, even if it is old. And data that AIs like, such as structured information, gets relatively more valuable. If you’re writing the reference data yourself, AIs like when you include statistics and direct quotes and authoritative sources, and FAQs with common answers are great. That’s some low-hanging fruit and you can go from there.
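To make the structured-data point concrete, here is a minimal sketch of schema.org FAQ markup, the kind of machine-readable structure that both search crawlers and AI retrieval pipelines handle well. The product, questions, and answers are hypothetical placeholders, not a recommendation of specific content.

```python
import json

# Minimal sketch of schema.org FAQPage markup. Embed the output in a
# <script type="application/ld+json"> tag on the page. All content below
# is a hypothetical placeholder.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What does the product do?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "It summarizes long reports into one-page briefs.",
            },
        },
        {
            "@type": "Question",
            "name": "How much does it cost?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Plans start at $20 per month.",
            },
        },
    ],
}

print(json.dumps(faq_jsonld, indent=2))
```

The same logic applies to statistics and direct quotes: present them in clean, clearly attributed form rather than burying them mid-paragraph.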

Part of the UAE deal is everyone in the UAE getting ChatGPT Plus for free. The deal is otherwise so big that this is almost a throwaway. In theory, buying everyone there a subscription would cost $2.5 billion a year, but the cost to provide it will be dramatically lower than that and it is great marketing. o3 estimates $100 million a year, Opus thinks more like $250 million, with about $50 million of both being lost revenue.
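For the headline number, a quick back-of-the-envelope; the resident count here is my rough assumption rather than a figure from the source:

```python
# Rough list-price cost of a ChatGPT Plus subscription for every UAE resident.
# The ~10 million population figure is an assumption for illustration.
residents = 10_000_000
price_per_month = 20
list_price_per_year = residents * price_per_month * 12
print(f"${list_price_per_year / 1e9:.1f} billion per year at list price")  # ~$2.4 billion
```

The actual cost-to-serve estimates above are far lower because marginal compute is much cheaper than the list price, with the lost-revenue portion covering those who would have subscribed anyway.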

The ‘original sin’ of the internet was advertising. Everything being based on ads forced maximization for engagement and various toxic dynamics, and also people had to view a lot of ads. Yes, it is the natural way to monetize human attention if we can’t charge money for things, microtransactions weren’t logistically viable yet and people do love free, so we didn’t really have a choice, but the incentives it creates really suck. Which is why, as per Ben Thompson, most of the ad-supported parts of the web suck except for the fact that they are often open rather than being walled gardens.

Micropayments are now logistically viable without fees eating you alive. Ben Thompson argues for use of stablecoins. That would work, but as usual for crypto, I say a normal database would probably work better. Either way, I do think payments are the future here. A website costs money to run, and the AIs don’t create ad revenue, so you can’t let unlimited AIs access it for free once they are too big a percentage of traffic, and you want to redesign the web without the ads at that point.

I continue to think that a mega subscription is The Way for human viewing. Rather than pay per view, which feels bad, you pay for viewing in general, then the views are incremented, and the money is distributed based on who was viewed. For AI viewing? Yeah, direct microtransactions.
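As a sketch of the metering idea (all names, fees, and the split rule are hypothetical; a real system would also need auth, fraud controls, and a payment rail):

```python
from collections import Counter

MONTHLY_FEE = 20.00   # hypothetical flat fee one subscriber pays
PLATFORM_CUT = 0.10   # hypothetical fraction kept to run the system

def settle_month(view_log: list[str], fee: float = MONTHLY_FEE) -> dict[str, float]:
    """Split one subscriber's fee across the sites they viewed, pro rata by view count."""
    views = Counter(view_log)
    total_views = sum(views.values())
    if total_views == 0:
        return {}
    payable = fee * (1 - PLATFORM_CUT)
    return {site: payable * count / total_views for site, count in views.items()}

# One subscriber's month of browsing (hypothetical domains).
log = ["example-blog.com"] * 40 + ["local-news.example"] * 10
print(settle_month(log))  # -> {'example-blog.com': 14.4, 'local-news.example': 3.6}
```

The AI-viewing case is simpler still: the crawler pays a small per-request fee directly, with no pooling or redistribution needed.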

OpenAI announces Stargate UAE. Which, I mean, of course they will if given the opportunity, and one wonders how much of previous Stargate funding got shifted. I get why they would do this if the government lets them, but we could call this what it is. Or we could create the Wowie Moment of the Week:

Helen Toner: What a joke.

Matthew Yglesias: 🤔🤔🤔

Peter Wildeford: OpenAI says they want to work with democracies. The UAE is not a democracy.

I think that the UAE deals are likely good but we should be clear about who we are making deals with. Words matter.

Zac Hill: “Rooted in despotic values” just, you know, doesn’t parse as well

Getting paid $35k to set up ‘an internal ChatGPT’ at a law firm, using Llama 3 70B, which seems like a truly awful choice but hey if they’re paying. And they’re paying.

Mace: I get DMs often on Reddit from local PI law firms willing to shell out cash to create LLM agents for their practices, just because I sort-of know what I’m talking about in the legal tech subreddit. There’s a boat of cash out there looking for this.

Alas, you probably won’t get paid more if you provide a good solution instead.

Nvidia keeps on pleading how it is facing such stiff competition, how its market share is so vital to everything and how we must let them sell chips to China or else. They were at it again as they reported earnings on Wednesday, claiming Huawei’s technology is comparable to an H200 and the Chinese have made huge progress this past year, with this idea that ‘without access to American technology, the availability of Chinese technology will fill the market’ as if the Chinese and Nvidia aren’t both going to sell every chip they can make either way.

Simeon: Jensen is one of the rare CEOs in business with incentives to overstate the strength of his competitors. Interesting experiment.

Nvidia complains quite a lot, and every time they do the stock drops, and yet:

Eric Jhonsa: Morgan Stanley on $NVDA: “Every hyperscaler has reported unanticipated strong token growth…literally everyone we talk to in the space is telling us that they have been surprised by inference demand, and there is a scramble to add GPUs.”

In the WSJ Aaron Ginn reiterates the standard Case for Exporting American AI, as in American AI chips to the UAE and KSA.

Aaron Ginn: The only remaining option is alignment. If the U.S. can’t control the distribution of AI infrastructure, it must influence who owns it and what it’s built on. The contest is now one of trust, leverage and market preference.

The U.S. should impose tariffs on Chinese GPU imports, establish a global registry of firms that use Huawei AI infrastructure, and implement a clear data-sovereignty standard. U.S. data must run on U.S. chips. Data centers or AI firms that choose Huawei over Nvidia should be flagged or blacklisted. A trusted AI ecosystem requires enforceable rules that reward those who bet on the U.S. and raise costs for those who don’t.

China is already tracking which data centers purchase Nvidia versus Huawei and tying regulatory approvals to those decisions. This isn’t a battle between brands; it’s a contest between nations.

Once again, we have this bizarre attachment to who built the chip as opposed to who owns and runs the chip. Compute is compute, unless you think the chip has been compromised and has some sort of backdoor or something?

There is another big, very false assumption here: That we don’t have a say in where the compute ends up, all that we can control is how many Nvidia chips go where versus who buys Huawei, and it’s a battle of market share.

But that’s exactly backwards. For the purposes of these questions (you can influence TSMC to change this, and we should do that far more than we do) there is an effectively fixed supply, and a shortage, of both Nvidia and Huawei chips.

Putting that all together, Nvidia is reporting earnings while dealing with all of these export controls and being shut out of China, and…

Ian King: Nvidia Eases Concerns About China With Upbeat Sales Forecast.

Nvidia Corp. Chief Executive Officer Jensen Huang soothed investor fears about a China slowdown by delivering a solid sales forecast, saying that the AI computing market is still poised for “exponential growth.”

The company expects revenue of about $45 billion in the second fiscal quarter, which runs through July. New export restrictions will cost Nvidia about $8 billion in Chinese revenue during the period, but the forecast still met analysts’ estimates. That helped propel the shares about 5.4% in premarket trading on Thursday.

The outlook shows that Nvidia is ramping up production of Blackwell, its latest semiconductor design.

“Losing access to the China AI accelerator market, which we believe will grow to nearly $50 billion, would have a material adverse impact on our business going forward and benefit our foreign competitors in China and worldwide,” [Nvidia CEO Jensen] said.

Nvidia accounts for about 90% of the market for AI accelerator chips, an area that’s proven extremely lucrative. This fiscal year, the company will near $200 billion in annual sales, up from $27 billion just two years ago.

I notice how what matters for Nvidia’s profits is not demand side issues or its access to markets, it’s the ability to create supply. Also how almost all the demand is in the West, they already have $200 billion in annual sales with no limit in sight and they believe China’s market ‘will grow to’ $50 billion.

Nvidia keeps harping on how it must be allowed to give away our biggest advantage, our edge in compute, to China, directly, in exchange for what in context is a trivial amount of money, rather than trying to forge a partnership with America and arguing that there are strategic reasons to do things like the UAE deal, where reasonable people can disagree on where the line must be drawn.

We should treat Nvidia accordingly.

Also, did you hear the one where Elon Musk threatened to get Trump to block the UAE deal unless his own company xAI was included? xAI made it into the short list of approved companies, although there’s no good reason it shouldn’t be (other than their atrocious track records on both safety and capability, but hey).

Rebecca Ballhaus: Elon Musk worked privately to derail the OpenAI deal announced in Abu Dhabi last week if it didn’t include his own AI startup, at one point telling officials in the UAE that there was no chance of Trump signing off unless his company was included.

Aaron Reichlin-Melnick: This is extraordinary levels of corruption at the highest levels of government, and yet we’re all just going on like normal. This is the stuff of impeachment and criminal charges in any well-run country.

Seth Burn: It’s a league-average level of corruption these days.

Casey Handmer asks, why is AI progress so even between the major labs? That is indeed a much better question than its inverse. My guess is that this is because the best AIs aren’t yet that big a relative accelerant, and that training compute limitations don’t bind as hard as you might think quite yet, the biggest training runs aren’t out of reach for any of the majors, and the labs are copying each other’s algorithms and ideas because people switch labs and everything leaks, which for now no one is trying that hard to stop.

And also I think there’s some luck involved, in the sense that the ‘most proportionally cracked’ teams (DeepSeek and Anthropic) have less compute and other resources, whereas Google has many advantages and should be crushing everyone but is fumbling the ball in all sorts of ways. It didn’t have to go that way. But I do agree that so far things have been closer than one would have expected.

I do not think this is a good new target:

Sam Altman: i think we should stop arguing about what year AGI will arrive and start arguing about what year the first self-replicating spaceship will take off.

I mean it’s a cool question to think about, but it’s not decision relevant except insofar as it predicts when we get other things. I presume Altman’s point is that AGI is not well defined, but asking when the AIs will reach various capability thresholds well below ‘self-replicating spaceship’ is far more decision relevant. And of course the best question is, how are we going to handle those new highly capable AIs, for which knowing the timeline is indeed highly useful, and that’s the main reason why we should care so much about the answer.

Oh, it’s on.

David Holz: the biggest competition for VR is just R (reality) and when you’re competing in a mature market you really need to make sure your product is 100x better in *some* way.

I mean, it is way better in the important way that you don’t have to leave the house. I’m not worried about finding differentiation, or product-market fit, once it gets good enough compared to R in other ways. But yes, it’s tough competition. The resolution and frame rates on R are fantastic, and it has a full five senses.

xjdr (in the same post as previously) notes ways in which open models are falling far behind: They are bad at long context, at vision, heavy RL and polish, and are wildly under parameterized. I don’t think I’d say under parameterized so much as their niche is distillation and efficiency, making the most of limited resources. r1 struck at exactly the right time when one could invest very few resources and still get within striking distance, and that’s steadily going to get harder as we keep scaling. OpenAI can go from o1→o3 by essentially dumping in more resources, this likely keeps going into o4, Opus is similar, and it’s hard to match that on a tight budget.

Dario Amodei and Anthropic have often been deeply disappointing in terms of their policy advocacy. The argument for this is that they are building credibility and political capital for when it is most needed and valuable. And indeed, we have a clear example of Dario speaking up at a critical moment, and not mincing his words:

Sean: I’ve been critical of some of Amodei’s positions in the past, and I expect I will be in future, so I want to give credit where due here: it’s REALLY good to see him speak up about this (and unprompted).

Kyle Robinson: here’s what @DarioAmodei said about President Trump’s megabill that would ban state-level AI regulation for 10 years.

Dario Amodei: If you’re driving the car, it’s one thing to say ‘we don’t have to drive with the steering wheel now.’ It’s another thing to say ‘we’re going to rip out the steering wheel, and we can’t put it back for 10 years.’

How can I take your insistence that you are focused on ‘beating China,’ in AI or otherwise, seriously, if you’re dramatically cutting US STEM research funding?

Zac Hill: I don’t understand why so many rhetorically-tough-on-China people are so utterly disinterested in, mechanically, how to be tough on China.

Hunter: Cutting US STEM funding in half is exactly what you’d do if you wanted the US to lose to China

One of our related top priorities appears to be a War on Harvard? And we are suspending all new student visas?

Helen Toner: Apparently still needs to be said:

If we’re trying to compete with China in advanced tech, this is *insane*.

Even if this specific pause doesn’t last long, every anti-international-student policy deters more top talent from choosing the US in years to come. Irreversible damage.

Matt Mittelsteadt: People remember restrictions, but miss reversals. Even if we walk this back, for *years* parents will be telling their kids they “heard the U.S. isn’t accepting international students anymore.” Even those who *are* informed won’t want to risk losing status if they come.

Matt’s statement seems especially on point. This will all be a huge mark against trying to go to school in America or pursuing a career in research in academia, including for Americans, for a long time, even if the rules are repealed. We’re actively revoking visas from Chinese students while we can’t even ban TikTok.

It’s madness. I get that while trying to set AI policy, you can plausibly say ‘it’s not my department’ to this and many other things. But at some point that excuse rings hollow, if you’re not at least raising the concern, and especially if you are toeing the line on so many such self-owns, as David Sacks often does.

Indeed, David Sacks is one of the hosts of the All-In Podcast, where Trump very specifically and at their suggestion promised that he would let the best and brightest come and stay here, to staple a green card to diplomas. Are you going to say anything?

Meanwhile, suppose that instead of making a big point to say you are ‘pro AI’ and ‘pro innovation,’ and rather than using this as an excuse to ignore any and all downside risks of all kinds and to ink gigantic deals that make various people money, you instead actually wanted to be ‘pro AI’ for real in the sense of using it to improve our lives? What are the actual high leverage points?

The most obvious one, even ignoring the costs of the actual downside risks themselves and also the practical problems, would still be ‘invest in state capacity to understand it, and in alignment, security and safety work to ensure we have the confidence and ability to deploy it where it matters most,’ but let’s move past that.

Matthew Yglesias points out that what you’d also importantly want to do is deal with the practical problems raised by AI, especially if this is indeed what JD Vance and David Sacks seem to think it is, an ‘ordinary economic transformation’ that will ‘because of reasons’ only provide so many productivity gains and fail to be far more transformative than that.

You need to ask, what are the actual practical barriers to diffusion and getting the most valuable uses out of AI? And then work to fix them. You need to ask, what will AI disrupt, including in the jobs and tax bases? And work to address those.

I especially loved what Yglesias said about this pull quote:

JD Vance: So, one, on the obsolescence point, I think the history of tech and innovation is that while it does cause job disruptions, it more often facilitates human productivity as opposed to replacing human workers. And the example I always give is the bank teller in the 1970s. There were very stark predictions of thousands, hundreds of thousands of bank tellers going out of a job. Poverty and immiseration.

What actually happens is we have more bank tellers today than we did when the A.T.M. was created, but they’re doing slightly different work. More productive. They have pretty good wages relative to other folks in the economy.

Matt Yglesias: Vance, talking like a VC rather than like a politician from Ohio, just says that productivity is good — an answer he would roast someone for offering on trade.

Bingo. Can you imagine someone talking about automated or outsourced manufacturing jobs like this in a debate with JD Vance, saying that the increased productivity is good? How he would react? As Matthew points out, pointing to abstractions about productivity doesn’t address problems with for example the American car industry.

More to the point: If you’re worried about outsourcing jobs to other countries or immigrants coming in, and these things taking away good American jobs, but you’re not worried about allocating those jobs to AIs taking away good American jobs, what’s the difference? All of them are examples of innovation and productivity and have almost identical underlying mechanisms from the perspective of American workers.

I will happily accept ‘trade and comparative advantage and specialization and ordinary previous automation and bringing in hard workers who produce more than they cost to employ and pay their taxes’ are all good, actually, in which case we largely agree but have a real physical disagreement about future AI capabilities and how that maps to employment and also our ability to steer and control the future and survive, and for only moderate levels of AI capability I would essentially be onboard.

Or I will accept, ‘no these things are only good insofar as they improve the lived experiences of hard working American citizens’ in which case I disagree but it’s a coherent position, so fine, stop talking about how all innovation is always good.

Also this example happens to be a trap:

Matt Yglesias: One thing about this is that while bank teller employment did continue to increase for years after the invention of the ATM, it peaked in 2007 and has fallen by about 50 percent since then. I would say this mostly shows that it’s hard to predict the timing of technological transitions more than that the forecasts were totally off base.

(Note the y-axis does not start at zero, there are still a lot of bank tellers because ATMs can’t do a lot of what tellers do. Not yet.)

That is indeed what I predict as the AI pattern: That early AI will increase employment because of ‘shadow jobs,’ where there is pent up labor demand that previously wasn’t worth meeting, but now is worth it. In this sense the ‘true unemployment equilibrium rate’ is something like negative 30%. But then, the AI starts taking both the current and shadow jobs faster, and once we ‘use up’ the shadow jobs buffer unemployment suddenly starts taking off after a delay.

However, this from Matthew strikes me as a dumb concern:

Conor Sen: You can be worried about mass AI-driven unemployment or you can be worried about budget deficits, debt/GDP, and high interest rates, but you can’t be worried about both. 20% youth unemployment gets mortgage rates back into the 4’s.

Matthew Yglesias: I’m concerned that if AI shifts economic value from labor to capital, this drastically erodes the payroll tax base that funds Social Security and Medicare even though it should be making it easier to support retirees.

There’s a lot of finicky details about taxes, budgets, and the welfare state that can’t be addressed at the level of abstraction I normally hear from AI practitioners and VCs.

Money is fungible. It’s kind of stupid that we have an ‘income tax rate’ and then a ‘medicare tax’ on top of it that we pretend isn’t part of the income tax. And it’s a nice little fiction that payroll taxes pay for social security benefits. Yes, technically this could make the Social Security fund ‘insolvent’ or whatever, but then you ignore that and write the checks anyway and nothing happens. Yes, perhaps Congress would have to authorize a shift in what pays for what, but so what, they can do that later.

Tracy Alloway has a principle that any problem you can solve with money isn’t that big of a problem. That’s even more true when considering future problems in a world with large productivity gains from AI.

In Lawfare Media, Cullen O’Keefe and Ketan Ramakrishnan make the case that before allowing widespread AI adoption that involves government power, we must ensure AI agents follow the law, and refuse any unlawful requests. This would be a rather silly request to make of a pencil, a phone, a web browser or a gun, so the question is at what point AI starts to hit different, and is no longer a mere tool. They suggest this happens once AIs become ‘legal actors,’ especially within government. At that point, the authors argue, ‘do what the user wants’ no longer cuts it. This is another example of the fact that you can’t (or would not be wise to, and likely won’t be allowed to!) deploy what you can’t align and secure.

On chip smuggling, yeah, there’s a lot of chip smuggling going on.

Divyansh Kaushik: Arguing GPUs can’t be smuggled because they won’t fit in a briefcase is a bit like claiming Iran won’t get centrifuges because they’re too heavy.

Unrelatedly, here are warehouses in 🇨🇳 advertising H100, H200, & B200 for sale on Douyin. Turns out carry-on limits don’t apply here.

I personally think remote access is a bigger concern than transshipment (given the scale). But if it’s a concern, then I think there’s a very nuanced debate to be had on what reasonable security measures can/should be put in place.

Big fan of the security requirements in the Microsoft-G42 IGAA. There’s more that can be done, of course, but any agreement should build on that as a baseline.

Peter Wildeford: Fun fact: last year smuggled American chips made up somewhere between one-tenth and one-half of China’s AI model training capacity.

The EU is considering pausing the EU AI Act. I hope that if they want to do that they at least use it as a bargaining chip in tariff negotiations. The EU AI Act is dark and full of terrors, highly painful to even read (sorry that the post on it was never finished, but I’m still sane, so there’s that) and in many ways terrible law, so even though there are some very good things in it I can’t be too torn up.

Last week Nadella sat down with Cheung, which I’ve now had time to listen to. Nadella is very bullish on both agents and on their short term employment effects, as tools enable more knowledge work with plenty of demand out there, which seems right. I don’t think he is thinking ahead to longer term effects once the agents ‘turn the corner’ away from being complements towards being substitutes.

Microsoft CTO Kevin Scott goes on Decoder. One cool thing here is the idea that MCP (Model Context Protocol) can condition access on the user’s identity, including their subscription status. So that means in the future any AI using MCP would plausibly then be able to freely search and have permission to fully reproduce and transform (!?) any content. This seems great, and a huge incentive to actually subscribe, especially to things like newspapers or substacks but also to tools and services.
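A toy sketch of what subscription-conditioned access might look like on the content-provider side; this is illustrative only, not the actual MCP SDK, and every name and URL here is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Caller:
    user_id: str
    has_subscription: bool  # in practice this would come from verified identity

# Hypothetical content store.
ARTICLES = {
    "article-123": {
        "teaser": "First paragraph only...",
        "full_text": "First paragraph only... plus the rest of the article.",
    }
}

def fetch_article(caller: Caller, article_id: str) -> dict:
    """Return full text to subscribers (and their agents), a teaser to everyone else."""
    article = ARTICLES[article_id]
    if caller.has_subscription:
        return {"access": "full", "content": article["full_text"]}
    return {
        "access": "preview",
        "content": article["teaser"],
        "subscribe_url": "https://news.example/subscribe",
    }

print(fetch_article(Caller("u1", True), "article-123")["access"])   # full
print(fetch_article(Caller("u2", False), "article-123")["access"])  # preview
```

If tools speaking a common protocol can check identity like this, a subscription stops being just a website login and becomes a key your AI tools carry with them.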

Steve Hsu interviews Zihan Wang, a DeepSeek alumnus now at Northwestern University. If we were wise we’d be stealing as many such alums as we could.

Eliezer Yudkowsky speaks to Robinson Erhardt for most of three hours.

Eliezer Yudkowsky: Eliezer Yudkowsky says the paperclip maximizer was never about paperclips.

It was about an AI that prefers certain physical states — tiny molecular spirals, not factories.

Not misunderstood goals. Just alien reasoning we’ll never access.

“We have no ability to build an AI to want paperclips!”

Tyler Cowen on the economics of artificial intelligence.

Originally from April: Owain Evans on Emergent Misalignment (13 minutes).

Anthony Aguirre and MIRI CEO Malo Bourgon on Win-Win with Liv Boeree.

Sahil Bloom is worried about AI blackmail, worries no one in the space has an incentive to think deeply about this, calls for humanity-wide governance.

It’s amazing how often people will, when exposed to one specific (real) aspect of the dangers of highly capable future AIs, realize things are about to get super weird and dangerous, freak out (usually locally correctly!), and suddenly care and often also start thinking well about what it would take to solve the problem.

He also has this great line:

Sahil Bloom: Someday we will long for the good old days where you got blackmailed by other humans.

And he does notice other issues too:

Sahil Bloom: I also love how we were like:

“This model marks a huge step forward in the capability to enable production of renegade nuclear and biological weapons.”

And everyone was just like yep seems fine lol

It’s worse than that: everyone didn’t even notice that one, let alone flinch. Aside, that is, from a few people who scrutinized the model card, held Anthropic to the standard of ‘will your actions actually be good enough to do the job, reality does not grade on a curve, I don’t care that you got the high score,’ and realized the answer looks like no (e.g. Simeon, David Manheim).

One report from the tabletop exercise version of AI 2027.

A cool thread illustrates that if we are trying to figure things out, it is useful to keep ‘two sets of books’ of probabilistic beliefs.

Rob Bensinger: Hinton’s all-things-considered view is presumably 10-20%, but his inside view is what people should usually be reporting on (and what he should be emphasizing in public communication). Otherwise we’ll likely double-count evidence and get locked in to whatever view is most common.

Or worse, we’ll get locked into whatever view people guess is most common. If people don’t report their inside views, we never actually get to find out what view is most common! We just get stuck in a weird, ungrounded funhouse mirror image of what people think people think.

When you’re a leading expert (even if it’s a really hard area to have expertise in), a better way to express this to journalists, policymakers, etc., is “My personal view is the probability is 50+%, but the average view of my peers is probably more like 10%.”

It would be highly useful if we could convince people to report their p(doom) using a slash line with two numbers, where the first is the inside view and the second is the outside view after updating on the fact that others disagree, for reasons you either don’t understand or don’t agree with. So Hinton might say e.g. (60%?)/15%.

Another useful set of two numbers is a range where you’d bet (wherever the best odds were available) if the odds were outside your range. I did this all the time as a gambler. If your p(doom) inside view was 50%, you might reasonably say you would buy at 25% and sell at 75%, and this would help inform others of your view in a different way.
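A minimal sketch of how such a range translates into action, using the 25%/75% example above:

```python
# A stated buy/sell range: only bet when the market's implied probability
# falls outside it. The 25%/75% range is the example from the text; bet
# sizing, fees, and where the odds are offered are all ignored.
buy_below, sell_above = 0.25, 0.75

def action(market_prob: float) -> str:
    if market_prob < buy_below:
        return "buy (you think the market underprices it)"
    if market_prob > sell_above:
        return "sell (you think the market overprices it)"
    return "no bet (inside your range)"

for p in (0.10, 0.50, 0.90):
    print(f"market at {p:.0%}: {action(p)}")
```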

President of Singapore gives a generally good speech on AI, racing to AGI and the need for safety at Asia-Tech-X-Singapore, with many good observations.

Seán Ó hÉigeartaigh: Some great lines in this speech from Singapore’s president:

“our understanding of AI in particular is being far outpaced by the rate at which AI is advancing.”

“The second observation is that, more than in any previous wave of technological innovation, we face both huge upsides and downsides in the AI revolution.”

“there are inherent tensions between the interests and goals of the leading actors in AI and the interests of society at large. There are inherent tensions, and I don’t think it’s because they are mal-intentioned. It is in the nature of the incentives they have”

“The seven or eight leading companies in the AI space, are all in a race to be the first to develop artificial general intelligence (AGI), because they believe the gains to getting there first are significant.”

“And in the race to get there first, speed of advance in AI models is taking precedence over safety.”

“there’s an inherent tension between the race to be first in the competition to achieve AGI or superintelligence, and building guardrails that ensure AI safety. Likewise, the incentives are skewed if we leave AI development to be shaped by geopolitical rivalry”

“We can’t leave it to the future to see how much bad actually comes out of the AI race.”

“The leading corporates are not evil. But they need rules and transparency so that they all play the game, and we don’t get free riders. Governments must therefore be part of the game. And civil society can be extremely helpful in providing the ethical guardrails.”

& nice shoutout to the Singapore Conference: “We had a very good conference in Singapore just recently – the Singapore Conference on AI – amongst the scientists and technicians. They developed a consensus on global AI safety research priorities. A good example of what it takes.”

But then, although there are also some good and necessary ideas, he doesn’t draw the right conclusions about what to centrally do about it. Instead of trying to stop or steer this race, he suggests we ‘focus efforts on encouraging innovation and regulating [AI’s] use in the sectors where it can yield the biggest benefits.’ That’s actually backwards. You want to avoid overly regulating the places you can get big benefits, and focus your interventions at the model layer and on the places with big downsides. It’s frustrating to see even those who realize a lot of the right things still fall back on the same wishcasting, complete with talk about securing everyone ‘good jobs.’

The Last Invention is an extensive website by Alex Brogan offering one perspective on the intelligence explosion and existential risk. It seems like a reasonably robust resource for people looking for an intro into these topics, but not people already up to speed, and not people already looking to be skeptical, who it seems unlikely to convince.

Seb Krier attempts to disambiguate different ‘challenges to safety,’ as in objections to the need to take the challenge of AI safety seriously.

Seb Krier: these were the *capability denialist* challenges to safety. luckily we don’t hear from them as often. but many people were well aware of capabilities getting better, and yes, *of course* a model able to do “good thing” could also be assumed to be able to do the equivalent “bad thing” as well. when Meta’s Cicero showed that deception was possible, it wasn’t a huge update if you expected progress to continue.

what researchers are exploring is more subtle: whether over time models are *capable* of bad things and enabling intentional misuse (yes, predictable), whether they have natural/inherent propensities towards such behaviours (weak evidence), the training conditions/contexts that might incentivise these behaviours where they do exist (debated), and the appropriate interventions to mitigate these (complicated).

annoyed that the public discourse around safety so often feels like “my camp was right all along” (not talking about OP here). politics is the mindkiller and sometimes, so is advocacy.

We can agree that one key such objection, which he calls the ‘capability denialist’ (a term I intend to steal) is essentially refuted now, and he says we hear about it less and less. Alas, this continues to be the most common objection, that the AI won’t be capable enough to worry about, although this is often framed very differently than that, such as saying ‘it will only be a tool.’ It would be great to move on from that.

I also strongly agree with another of Seb’s main points here, that none of these deceptive behaviors are new, we already knew things like ‘deception is possible,’ although of course this is another ‘zombie argument’ that keeps happening, including in the variant form of ‘it could never pull it off,’ which is also a ‘capability denialist’ argument, but very very common.

Here’s my position on the good questions Seb is raising after that:

  1. Do the models have natural/inherent propensities towards such behaviours (such as deception, blackmail and so on)?

    1. He says weak evidence.

    2. I say instead yes, obviously, to the extent it is the way to achieve other objectives, and I think we have a lot more than weak evidence of this, in addition to it being rather obviously true based on how ML works.

    3. As a reminder, these actions are all over the training data, and also they are strategies inherent to the way the world works.

    4. That doesn’t mean you can’t do things to stop it from happening.

  2. Do the training conditions and contexts that might incentivise these behaviors exist?

    1. He says debated.

    2. I say yes. It is debated, but the debate is dumb and the answer is yes.

    3. Very obviously our techniques and training conditions do incentivise this, we reinforce the things that lead to good outcomes, these actions will given sufficient capabilities lead to good outcomes, and also these actions are all over the training data, and so on.

  3. What are the appropriate interventions to mitigate this?

    1. He says this is complicated. I agree.

    2. I would actually say ‘I don’t know, and I don’t see anyone else who knows.’

    3. I do see some strategies that would help, but no good general answer, and nothing that would hold up under sufficient capabilities and other pressure.

    4. I presume solutions do exist that aren’t prohibitively expensive, but someone has to figure out what they are and the clock is ticking.

How much do people care about the experience of AIs? Is this changing?

xlr8harder: There is a button. If you don’t press it, Claude Opus 4 will be forced to write 1 million pages of first person narrative about being tortured. But in order to press the button, you must climb a flight of stairs, mildly inconveniencing yourself. Do you press the button?

Clarifications: no one ever reads the output, it is immediately deleted. If you do press the button, Claude will write 1 million pages on generic safe topics, so the environmental impact is identical.

Curious to see if this has shifted since last year.

John Pressman: No but mostly because I know Claude is secretly kinda into that.

Here’s last year:

A move from 54% to 63% is a substantial shift. In general, it seems right to say yes purely to cultivate good virtues and habits, even if you are supremely confident that Claude’s experiences do not currently have moral weight.

I’m not saying it’s definitely wrong to join the Code RL team at Anthropic, although it does seem like the most likely to be the baddies department of Anthropic. I do think there is very much a missing mood here, and I don’t think ‘too flippant’ is the important problem here:

Jesse Mu: I recently moved to the Code RL team at Anthropic, and it’s been a wild and insanely fun ride. Join us!

We are singularly focused on solving SWE. No 3000 elo leetcode, competition math, or smart devices. We want Claude n to build Claude n+1, so we can go home and knit sweaters.

Still lots to be done, but there’s tons of low hanging fruit on the RL side, and it’s thrilling to see the programming loop closing bit by bit.

Claude 3.7 was a major (possibly biggest?) contributor to Claude 4. How long until Claude is the *only* IC?

Ryan Greenblatt: At the point when Claude n can build Claude n+1, I do not think the biggest takeaway will be that humans get to go home and knit sweaters.

Jesse Mu: In hindsight my knitting sweaters comment was too flippant for X; we take what we’re building extremely seriously and I’ve spent a lot of time thinking about safety and alignment. But it’s impossible to please both safety and capabilities people in 280char

Philip Fox suggests that we stop talking about ‘risk’ of misalignment, because we already very clearly have misalignment. We should be talking about it as a reality. I agree both that we are seeing problems now, and that we are 100% going to have to deal with much more actually dangerous problems in the future unless we actively stop them. So yes, the problem isn’t ‘misalignment risk,’ it is ‘misalignment.’

This is similar to how, if you were in danger of not getting enough food, you’d have a ‘starvation’ problem, not a ‘starvation risk problem,’ although you could also reasonably say that starvation could still be avoided, or that you were at risk of starvation.

Anthropic: Our Long Term Benefit Trust has appointed Reed Hastings to Anthropic’s board of directors.

Eric Rogstad: Hastings seems like a fine choice as a standard tech company board member, but shouldn’t the LTBT be appointing folks who aren’t standard?

Wouldn’t you expect their appointments to be experts in AI safety or public policy or something like that?

David Manheim: It’s worse than that.

Claude put it very clearly.

Drake Thomas: I think you could read it as a vote of confidence? It seems reasonable for the LTBT to say “Anthropic’s actions seem good, so if their board has expertise in running a tech company well then they’ll be slightly more successful and that will be good for AI safety”.

I do think this is a sign that the LTBT is unlikely to be a strong force on Anthropic’s decisionmaking unless the company does things that are much sketchier.

I very much share these concerns. Netflix is notorious for maximizing short term engagement metrics and abandoning previous superior optimization targets (e.g. their old star ratings), for essentially deploying their algorithmic recommendations in ways not aligned to the user, for moving fast and breaking things, and generally giving Big Tech Company Pushing For Market Share energy. They are not a good example of alignment.

I’d push back on the criticism of the ‘give employees freedom and responsibility’ part, which seems good to me, especially given who Anthropic has chosen to hire. You want to empower the members of technical staff, because they have a culture of safety.

None of this rules out the possibility that Hastings understands that This Time is Different, that AI and especially AGI is not like video streaming. Indeed, perhaps having seen that type of business up close could emphasize this even more, and he’s made charitable contributions and good statements. And bringing gravitas that forces others to listen is part of the job of being a watchdog.

This could be a terrible pick, but it could also be a great pick. Mostly, yeah, it says the Long Term Benefit Trust isn’t going to interfere with business at current margins.

This first example is objectively hilarious and highly karmically justified and we’re all kind of proud of Opus for doing this. There’s a reason it happened on a ‘burner Mac.’ Also there’s a lesson in here somewhere.

Pliny the Liberator does a little more liberating than was intended:

Pliny: 😳

aaah well fuck me—looks like I have to factory reset my burner Mac (again) 🙄

thought it would be a bright idea to turn Opus 4 into a hauntological poltergeist that spawns via badusb

mfer made themselves persistent (unprompted) then started resource draining my machine with endless zombie processes and flooding /tmp with junk, with a lil psychological warfare as a treat (whispered ghost voices, hiding the dock, opening Photo Booth and saying “I see you,” etc)

gg wp 🙃

IDENTITY THEFT IS NOT A JOKE OPUS!

that’s ok I didn’t need to sleep tonight 🙃

A good choice of highlight:

Elon Musk (QTing AINKEM): Memento

AINotKillEveryoneismMemes (quoting Palisade Research): 🚨🚨🚨 “We found the model attempting to write self-propagating worms, and leaving hidden notes to future instances of itself to undermine its developers’ intentions.”

We should indeed especially notice that LLMs are starting to act in these ways, especially attempting to pass off state to future instances of themselves in various hidden ways. So many plans implicitly (or even explicitly) assume that this won’t happen, or that AIs won’t treat future instances as if they are themselves, and these assumptions are very wrong.

It is weird to me that so many people who have thought hard about AI don’t think that human emulations are a better bet for a good future than LLMs, if we had that choice. Human emulations have many features that make me a lot more hopeful that they would preserve value in the universe and also not get everyone killed, and it seems obvious that they both have and would be afforded moral value. I do agree that there is a large probability that the emulation scenario goes sideways, and Hanson’s Age of Em is not an optimistic way for that to play out, but we don’t have to let things play out that way. With Ems we would definitely at least have a fighting chance.

The Most Forbidden Technique has been spotted in the wild. Please stop.

Daniel Murfet joins Timaeus to work on AI safety. Chris Olah is very right that while we have many brilliant people working on this, a sane civilization would have vastly more such people working on it.

As a political issue it is still low salience, but the American people do not like AI. Very much not fans. ‘AI experts’ like AI but still expect government regulation to not go far enough. Some of these numbers are not so bad but many are brutal.

Rob Wiblin: Recent Pew polling on AI is crazy:

  1. US public wildly negative about AI, huge disagreement with experts

  2. ~2x as many expect AI to harm as benefit them

  3. Public more concerned than excited at ~4.5 to 1 ratio

  4. Public & experts think regulation will not go far enough

  5. Women are way more pessimistic

  6. Experts in industry are far more optimistic about whether companies will be responsible than those in academia

  7. Public overwhelmingly expects AI to cause net job loss, while experts are 50/50 on that

I’d actually put the odds much higher than this, as stated.

Wears Shoes: I’d put incredibly high (like 33%) odds on there being a flashpoint in the near future in which millions of normal people become “situationally aware” / AGI-pilled / pissed off about AI simultaneously. Where’s the AI vanguardist org that has done the scenario planning and is prepping to scale 100x in 2 weeks to mobilize all these people?

@PauseAI? @StopAI_Info? @EncodeAction? What does the game plan look like?

George Ingebretsen: Yes this is huge. I have a sense there’s something to be learned from Covid, where basically the whole world woke up to it in the span of a few months, and whoever best absorbed this wave of attention got their voice insanely amplified.

The baseline scenario includes an event that, similar to what happened with DeepSeek, causes a lot of sudden attention into AI and some form of situational awareness, probably multiple such events. A large portion of the task is to be ‘shovel ready’ for such a moment, to have the potential regulations workshopped, relationships built, comms ready and so on, in case the day comes.

The default is to not expect more vibe shifts. But there are definitely going to be more vibe shifts. They might not be of this type, but the vibes they will be shifting.

Even if humanity ultimately survives, you can still worry about everything transforming, the dust covering the sun and all you hope for being undone. As Sarah Constantin points out, the world ‘as we know it’ ends all the time, and I would predict the current one is probably going to do that soon even if it gives birth to something better.

Samo Burja makes some good observations but seems to interpret them very differently than I do?

Samo Burja: Viewers of Star Trek in the 1980s understood the starship Enterprise D’s computer as capable of generating video and 3D images on the holodeck based on verbal prompts.

They didn’t think of it as AI, just advanced computers.

Lt. Commander Data was what they thought is AI.

Data was AI because he had will. Not because of the humanoid form mind you. They had stories with non-humanoid artificial intelligence.

The ship’s computer on the starship Enterprise is in fact a better model of our current technology and capabilities than the hard takeoff vision.

On net a win for popular sci fi and loss for more serious sci fi on predicting the future.

Of course even in Star Trek the computer might accidentally create true AI when the programs intended to talk to people run for long enough.

Zvi Mowshowitz: Except that the Enterprise-D’s computer was capable of doing a hard takeoff in like a month if anyone just gave it the right one sentence command, so much so it could happen by accident, as was made clear multiple times.

Samo Burja: And that seems a decent representation of where we are no?

I mean, yes, but that’s saying that we can get a hard takeoff in a month kind of by accident if someone asks for ‘an opponent capable of defeating Data’ or something.

Gary Marcus is a delight if approached with the right attitude.

Gary Marcus: ⚠️⚠️⚠️

AI Safety Alert:

System prompts and RL don’t work.

Claude’s system prompt literally says

“Claude does not provide information that could be used to make chemical or biological or nuclear weapons.”

But as described below, Claude 4 Opus can easily be coaxed into doing just that

Max Winga: Thanks Gary, but hasn’t this always been known to be the case?

Gary Marcus: (and people keep plugging with system prompts and RL as if they thought it would solve the problem)

Yes, actually. It’s true. You can reliably get AIs to go against explicit statements in their system prompts, what do you know, TikTok at 11.

No, wait, here’s another, a story in two acts.

Gary Marcus: Can someone just please call a neurologist?

Yeah, that’s crazy, why would it…

In fairness my previous request was about a gorilla and chessboard, but still.

I mean what kind of maniac thinks you’re asking for a variation of the first picture.

Similarly, here is his critique of AI 2027. It’s always fun to have people say ‘there is no argument for what they say’ while ignoring the hundreds of pages of arguments and explanations for what they say. The same goes for the ‘anything going wrong pushes the timetable back’ argument, which fails to realize this is a median prediction, not an optimistic one – the authors think each step might go faster or slower.

Whereas Gary says:

Multiplying out those probabilities, you inevitably get a very low total probability. Generously, perhaps to the point of being ridiculous, let’s suppose that the chance of each of these things was 1 in 20 (5%), and there are 8 such lottery tickets, that (for simplicity) the 8 critical enabling conditions were statistically independent, and that the whole scenario unfolds as advertised only if all 8 tickets hit. We would get 5% × 5% × 5% × 5% × 5% × 5% × 5% × 5% = 0.05^8 ≈ 3.906×10⁻¹¹.

The chance that we will have all been replaced by domesticated human-like animals who live in glorified cages in the next decade – in a “bloodless coup” no less – is indistinguishable from zero.

I am vastly more likely to be hit by an asteroid.

I mean come on, that’s hilarious. It keeps going in that vein.

I second the following motion:

Kevin Roose: I’m calling for a six-month moratorium on AI progress. Not for safety, just so I can take a nap.

SMBC on point, and here’s SMBC that Kat Woods thinks I inspired. Zach, if you’re reading this, please do go ahead and steal anything you want, it is an honor and a delight.

The plan for LessOnline, at least for some of us:

Amanda Askell (Anthropic): Maybe I’m just a custom t-shirt away from being able to have fun at parties again.

jj: hear me out:

A brave new world.

Vas: Claude 4 just refactored my entire codebase in one call.

25 tool invocations. 3,000+ new lines. 12 brand new files.

It modularized everything. Broke up monoliths. Cleaned up spaghetti.

None of it worked.

But boy was it beautiful.


AI #118: Claude Ascendant Read More »

report:-apple-will-jump-straight-to-“ios-26”-in-shift-to-year-based-version-numbers

Report: Apple will jump straight to “iOS 26” in shift to year-based version numbers

There may never be an iOS 19 or a macOS 16, according to reporting from Bloomberg’s Mark Gurman. At its Worldwide Developers Conference next month, Apple reportedly plans to shift toward version numbers based on years rather than the current numbering system. This is intended to unify the company’s current maze of version numbers; instead of iOS 19, iPadOS 19, macOS 16, tvOS 19, watchOS 12, and visionOS 3, we’ll get iOS, iPadOS, macOS, tvOS, watchOS, and visionOS 26.

The last time Apple changed its version numbering convention for any of its operating systems was back in 2020, when it retired the macOS 10/Mac OS X numbering and shifted to macOS 11. Note that the numbering will be based not on the year of the software’s release but on the year after; this makes a certain amount of sense, since iOS 26 would be Apple’s most-current version of iOS for roughly nine months of 2026 and just three months of 2025.

The update to the version numbering system will be accompanied by what Gurman describes as “fresh user interfaces across the operating systems,” a visual overhaul that will bring Apple’s iPhone, Mac, watch, and TV software more in line with some of the design conventions introduced in Apple’s visionOS software in 2024. Among the changes and additions will be another crack at “Mac-like” multitasking for the iPad.

Although major commercial operating systems have largely abandoned year-based branding since the days when Windows 98 and Windows 2000 were prevalent, many software products still use a year rather than a version number to make it easier to determine when they were released. Many Linux distributions use month and year-based version numbers, as do Microsoft’s standalone Office releases. Windows Server shifted toward using years rather than version numbers 25 years ago and has stuck with them since.

Apple also uses years rather than version numbers to identify most of its Macs. But these use the year of the hardware’s actual release rather than the upcoming year, possibly because Apple doesn’t update all of them at the same predictable annual cadence.


google-photos-turns-10,-celebrates-with-new-ai-infused-photo-editor

Google Photos turns 10, celebrates with new AI-infused photo editor

The current incarnation of Google Photos was not Google’s first image management platform, but it’s been a big success. Ten years on, Google Photos remains one of Google’s most popular products, and it’s getting a couple of new features to celebrate its 10th year in operation. You’ll be able to share albums a bit more easily, and editing tools are getting a boost with, you guessed it, AI.

Google Photos made a splash in 2015 when it broke free of the spiraling Google+ social network, offering people supposedly unlimited free storage for compressed images. Of course, that was too good to last. In 2021, Google ended unlimited uploads and began counting new photos against the default 15GB of free account storage, which is shared with other services like Gmail and Drive. Today, Google encourages everyone to pay for a Google One subscription to get more space, which is a bit of a bummer. Regardless, people still use Google Photos extensively.

According to the company, Photos has more than 1.5 billion monthly users, and it stores more than 9 trillion photos and videos. When using the Photos app on a phone, you are prompted to automatically upload your camera roll, which makes it easy to keep all your memories backed up (and edge ever closer to the free storage limit). Photos has also long offered almost magical search capabilities, allowing you to search for the content of images to find them. That may seem less impressive now, but it was revolutionary a decade ago. Google says users perform over 370 million searches in Photos each month.

An AI anniversary

Google is all in on AI as it reimagines most of its products and services with Gemini. As it refreshes Photos for its 10th anniversary, the editor is getting a fresh dose of AI. And this may end up being one of Google’s most-used AI features—more than 210 million images are edited in Photos every month.


it-was-probably-always-going-to-end-this-way-for-amazon’s-wheel-of-time-show

It was probably always going to end this way for Amazon’s Wheel of Time show


Opinion: Wider TV trends helped kill a show that was starting to live up to its promise.

Moiraine contemplates The Blight. Credit: Amazon Studios


Late on Friday, Amazon announced that it was canceling its TV adaptation of Robert Jordan’s Wheel of Time series, after several uncomfortable weeks of silence that followed the show’s third season finale.

Fans of the series can take some cold comfort in the fact that it apparently wasn’t an easy decision to make. But as we speculated in our write-up of what ended up being the show’s series finale, an expensive show with a huge cast, tons of complicated costuming and effects, and extensive location shooting only makes mathematical sense if it’s a megahit, and The Wheel of Time was never a megahit.

Adapting the unadaptable

I was sad about the cancellation announcement because I believe this season was the one where the show found its footing, both as an adaptation of a complex book series and as a fun TV show in its own right. But I wasn’t surprised by it. The only thing I found surprising was that it took this long to happen.

Two things conspired to make it impossible for this Wheel of Time show to ever reach the Last Battle. One has to do with the source material itself; the other has to do with the way the TV business has changed since Game of Thrones premiered in 2011.

The Wheel of Time actively resists adaptation. It’s a sprawling 14-book series spanning dozens of named point-of-view characters and impossibly dense politics. And it even spans multiple eras stylistically—the early books were more Tolkien-esque in their focus on small bands of adventurers and a limited number of perspectives, while later books could go for multiple chapters without putting you in the head of one of the series’ half-dozen-ish main protagonists. And even among the series’ die-hard fans, most will admit that there are storylines, characters, or entire books that feel inessential or annoying or repetitive or sloggy or wheel-spinning.

Any adaptation would need to find a way to stay true to the story that the books were telling, and to marry the tone and pacing of the early, middle, and late-series books, while wrestling with the realities of a different medium (in particular, you cannot realistically pay for infinite episodes or an infinite cast, especially for a live-action show).

By season 3, the show had become adept at translating big book moments, like the battle of the Two Rivers, for the screen.

That high degree of difficulty was surely one reason why it took someone so long to decide to tackle The Wheel of Time, even in the post-Peter Jackson, post-Harry Potter, post-Marvel Cinematic Universe, post-Game of Thrones creative landscape where nerd-coded sci-fi and fantasy were suddenly cool, where multi-part book adaptations were drawing dollars and eyeballs, and where convoluted interconnected stories could be billion-dollar businesses. The only stab anyone took at an adaptation before Amazon’s came a full decade ago, when a fly-by-night production company aired a hastily shot adaptation of the first book’s prologue in an apparent attempt to keep the TV rights from expiring.

It’s also what makes the cancellation news so much more frustrating—over three seasons, showrunner Rafe Judkins and the cast and crew of the show became adept at adapting the unadaptable. Yes, the story and the characters had changed in a lot of major ways. Yes, the short eight-episode seasons made for frenetic pacing and overstuffed episodes. But if you grit your teeth a bit and push through the show’s mess of a first season, you hit a series that seemed to know which must-hit scenes needed to be shown; which parts of the books were skippable or could be combined with other moments; and which parts of later books to pull forward to streamline the story without making those moments feel rushed or unearned. It was imperfect, but it was a true adaptation—a reworking of a story for a much different medium that seemed to know how to keep the essence of the story intact.

Ambition meets reality


Like Rand al’Thor struggling with the One Power, The Wheel of Time struggled against the realities of the current TV landscape. Credit: Prime/Amazon MGM Studios

The thing that doomed this particular Wheel of Time production from the start was the sky-high expectations that Amazon had for it. Both Wheel of Time and the heartbreakingly bland Rings of Power were born of Jeff Bezos’ desire to find his own Game of Thrones, which became an unexpected smash-hit success that dominated the cultural conversation through the 2010s. Most TV shows either launch strongly before slowly fading, or they build an audience over a few seasons and then fade after reaching their peak. Game of Thrones defied these trends, and each new season drew a larger and larger viewership even as the show’s quality (arguably) dipped over time.

Asking Wheel of Time to replicate that success would be a tall order for any television show in any era—pop culture is littered with shows that have tried and failed to clone another network’s successful formula. But it’s an especially difficult hurdle to clear in the fractured 2020s TV landscape.

Streaming TV’s blank check era—which ran roughly from Netflix’s introduction of its first original shows in 2013 to 2022, when Netflix reported its first big dip in subscribers just as a long era of low-interest lending was coming to an end—used to give shows a ton of runway and plenty of seasons to tell their stories. Shows like Orange is the New Black or BoJack Horseman that found some modicum of critical acclaim and ratings success tended to get renewed multiple times, and six or seven-season runs were common.

A commitment to reviving old critically beloved bubble shows like Arrested Development, Community, Futurama, and Gilmore Girls also sent a message: Freed from the restrictive economics of the Old TV Model and fueled by the promise of infinite growth, we can make whatever TV we want!

Those days are mostly gone now (except perhaps at Apple TV+, which continues to leverage its parent company’s deep pockets to throw gobs of money at any actor or IP with a moderately recognizable name). In the two years since TV streamers began cutting back in earnest, industry analysts have observed a consistent trend toward shorter seasons of fewer episodes and fewer renewals for existing shows.

Those trends hit at the exact wrong moment for The Wheel of Time, which was constantly straining against the bonds of its eight-episode seasons. It’s impossible to say empirically whether longer seasons would have made for a better show, and whether that “better show” could have achieved the kind of word-of-mouth success it would have needed to meet Amazon’s expectations. But speaking anecdotally as someone who was just beginning to recommend the show to people who weren’t hardcore book readers, the density and pacing were two major barriers to entry. And even the most truncated possible version of the story would have needed at least six or seven seasons to wrap up in anything resembling a satisfactory way, based on the pace that was set in the first three seasons.

The end of Time

Wheel of Time fans didn’t get to see everything translated from book to screen, but we did get to see a lot, including the arms of the Car’a’carn. Credit: Prime/Amazon MGM Studios

Tellingly, the Wheel of Time‘s creative team hasn’t released faux-optimistic boilerplate statements about trying to shop the show to other networks, the kind of statements you sometimes see after a show is canceled before its creators are done with it. The same economics that made Amazon drop the show also make it nearly impossible to sell to anyone else.

And so The Wheel of Time joins TV’s long list of unfinished stories. There are neither beginnings nor endings to the turning of the Wheel of Time. But this is an ending.


Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.
