Elon Musk wants courts to force OpenAI to auction off a large ownership stake

Musk, who founded his own AI startup xAI in 2023, has recently stepped up efforts to derail OpenAI’s conversion.

In November, he sought to block the process with a request for a preliminary injunction filed in California. Meta has also thrown its weight behind the suit.

In legal filings from November, Musk’s team wrote: “OpenAI and Microsoft together exploiting Musk’s donations so they can build a for-profit monopoly, one now specifically targeting xAI, is just too much.”

Kathleen Jennings, attorney-general in Delaware—where OpenAI is incorporated—has since said her office was responsible for ensuring that OpenAI’s conversion was in the public interest and determining whether the transaction was at a fair price.

Members of Musk’s camp—wary of Delaware authorities after a state judge rejected a proposed $56 billion pay package for the Tesla boss last month—read that as a rebuke of his efforts to block the conversion, and worry it will be rushed through. They have also argued OpenAI’s PBC conversion should happen in California, where the company has its headquarters.

In a legal filing last week Musk’s attorneys said Delaware’s handling of the matter “does not inspire confidence.”

OpenAI committed to become a public benefit corporation within two years as part of a $6.6 billion funding round in October, which gave it a valuation of $157 billion. If it fails to do so, investors would be able to claw back their money.

There are a number of issues OpenAI is yet to resolve, including negotiating the value of Microsoft’s investment in the PBC. A conversion was not imminent and would be likely to take months, according to a person with knowledge of the company’s thinking.

A spokesperson for OpenAI said: “Elon is engaging in lawfare. We remain focused on our mission and work.” The California and Delaware attorneys-general did not immediately respond to a request for comment.



Microsoft sues service for creating illicit content with its AI platform

Microsoft and others forbid using their generative AI systems to create various kinds of content. Off-limits material includes anything that features or promotes sexual exploitation or abuse, is erotic or pornographic, or attacks, denigrates, or excludes people based on race, ethnicity, national origin, gender, gender identity, sexual orientation, religion, age, disability status, or similar traits. Microsoft also forbids the creation of content containing threats, intimidation, promotion of physical harm, or other abusive behavior.

Besides expressly banning such usage of its platform, Microsoft has also developed guardrails that inspect both the prompts users enter and the resulting output for signs that the requested content violates any of these terms. These code-based restrictions have been repeatedly bypassed in recent years through hacks, some benign and performed by researchers, others carried out by malicious threat actors.

Microsoft didn’t outline precisely how the defendants’ software was allegedly designed to bypass the guardrails the company had created.

Masada wrote:

Microsoft’s AI services deploy strong safety measures, including built-in safety mitigations at the AI model, platform, and application levels. As alleged in our court filings unsealed today, Microsoft has observed a foreign-based threat-actor group develop sophisticated software that exploited exposed customer credentials scraped from public websites. In doing so, they sought to identify and unlawfully access accounts with certain generative AI services and purposely alter the capabilities of those services. Cybercriminals then used these services and resold access to other malicious actors with detailed instructions on how to use these custom tools to generate harmful and illicit content. Upon discovery, Microsoft revoked cybercriminal access, put in place countermeasures, and enhanced its safeguards to further block such malicious activity in the future.

The lawsuit alleges the defendants’ service violated the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act and constitutes wire fraud, access device fraud, common law trespass, and tortious interference. The complaint seeks an injunction enjoining the defendants from engaging in “any activity herein.”


Public health emergency declared amid LA’s devastating wildfires

The US health department on Friday declared a public health emergency for California in response to devastating wildfires in the Los Angeles area that have so far killed 10 people and destroyed more than 10,000 structures.

As of Friday morning, 153,000 residents are under evacuation orders, and an additional 166,800 are under evacuation warnings, according to local reports.

Wildfires pose numerous health risks, including exposure to extreme heat, burns, harmful air pollution, and emotional distress.

“We will do all we can to assist California officials with responding to the health impacts of the devastating wildfires going on in Los Angeles County,” US Department of Health and Human Services (HHS) Secretary Xavier Becerra said in a statement. “We are working closely with state and local health authorities, as well as our partners across the federal government, and stand ready to provide public health and medical support.”

The Administration for Strategic Preparedness and Response (ASPR), an agency within HHS, is monitoring hospitals and shelters in the LA area and is prepared to deploy responders, medical equipment, and supplies upon the state’s request.


Rocket Report: China launches refueling demo; DoD’s big appetite for hypersonics


We’re just a few days away from getting a double-dose of heavy-lift rocket action.

Stratolaunch’s Talon-A hypersonic rocket plane will be used for military tests involving hypersonic missile technology. Credit: Stratolaunch

Welcome to Edition 7.26 of the Rocket Report! Let’s pause and reflect on how far the rocket business has come in the last 10 years. On this date in 2015, SpaceX made the first attempt to land a Falcon 9 booster on a drone ship positioned in the Atlantic Ocean. Not surprisingly, the rocket crash-landed. In less than a year and a half, though, SpaceX successfully landed reusable Falcon 9 boosters onshore and offshore, and now has done it nearly 400 times. That was remarkable enough, but we’re in a new era now. Within a few days, we could see SpaceX catch its second Super Heavy booster and Blue Origin land its first New Glenn rocket on an offshore platform. Extraordinary.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Our annual ranking of the top 10 US launch companies. You can easily guess who made the top of the list: the company that launched Falcon rockets 134 times in 2024 and launched the most powerful and largest rocket ever built on four test flights, each accomplishing more than the last. Those 138 combined launches exceed the number of Space Shuttle missions NASA flew over three decades. SpaceX will aim to launch even more often in 2025. These missions have far-reaching impacts, supporting Internet coverage for consumers worldwide, launching payloads for NASA and the US military, and testing technology that will take humans back to the Moon and, someday, Mars.

Are there really 10? … It might also be fairly easy to rattle off a few more launch companies that accomplished big things in 2024. There’s United Launch Alliance, which finally debuted its long-delayed Vulcan rocket and flew two Atlas V missions and the final Delta IV mission, and Rocket Lab, which launched 16 missions with its small Electron rocket last year. Blue Origin flew its suborbital New Shepard vehicle on three human missions and one cargo-only mission and nearly launched its first orbital-class New Glenn rocket in 2024. That leaves Firefly Aerospace as the only other US company to reach orbit last year.

DoD announces lucrative hypersonics deal. Defense technology firm Kratos has inked a deal worth up to $1.45 billion with the Pentagon to help develop a low-cost testbed for hypersonic technologies, Breaking Defense reports. The award is part of the military’s Multi-Service Advanced Capability Hypersonic Test Bed (MACH-TB) 2.0 program. The MACH-TB program, which began as a US Navy effort, includes multiple “Task Areas.” For its part, Kratos will be tasked with “systems engineering, integration, and testing, to include integrated subscale, full-scale, and air launch services to address the need to affordably increase hypersonic flight test cadence,” according to the company’s release.

Multiple players … The team led by Kratos, which specializes in developing airborne drones and military weapons systems, includes several players such as Leidos, Rocket Lab, Stratolaunch, and others. Kratos last year revealed that its Erinyes hypersonic test vehicle successfully flew for a Missile Defense Agency experiment. Rocket Lab has launched multiple suborbital hypersonic experiments for the military using a modified version of its Electron rocket, and Stratolaunch reportedly flew a high-speed test vehicle and recovered it last month, according to Aviation Week & Space Technology. The Pentagon is interested in developing hypersonic weapons that can evade conventional air and missile defenses. (submitted by EllPeaTea)

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.


ESA will modify some of its geo-return policies. An upcoming European launch competition will be an early test of efforts by the European Space Agency to modify its approach to policies that link contracts to member state contributions, Space News reports. ESA has long used a policy known as geo-return, where member states are guaranteed contracts with companies based in their countries in proportion to the contribution those member states make to ESA programs.

The third rail of European space … Advocates of geo-return argue that it provides an incentive for countries to fund those programs: member states that contribute win guaranteed business and jobs for their domestic industries, which in turn helps ESA attract financial contributions. However, critics of geo-return, primarily European companies, claim that it creates inefficiencies that make them less competitive. One approach to revising geo-return is known as “fair contribution,” where ESA first holds competitions for projects, and member states then make contributions based on how companies in their countries fared in the competition. ESA will try the fair contribution approach for the upcoming launch competition to award contracts to European rocket startups. (submitted by EllPeaTea)

RFA is building a new rocket. German launch services provider Rocket Factory Augsburg (RFA) is currently focused on building a new first stage for the inaugural flight of its RFA One rocket, European Spaceflight reports. The stage that was initially earmarked for the flight was destroyed during a static fire test last year on a launch pad in Scotland. In a statement given to European Spaceflight, RFA confirmed that it expects to attempt an inaugural flight of RFA One in 2025.

Waiting on a booster … RFA says it is “fully focused on building a new first stage and qualifying it.” The rocket’s second stage and Redshift OTV third stage are already qualified for flight and are being stored until a new first stage is ready. The RFA One rocket will stand 98 feet (30 meters) tall and will be capable of delivering payloads of up to 1.3 metric tons (nearly 2,900 pounds) into polar orbits. RFA is one of several European startups developing commercial small satellite launchers and was widely considered the frontrunner before last year’s setback. (submitted by EllPeaTea)

Pentagon provides a boost for defense startup. Defense technology contractor Anduril Industries has secured a $14.3 million Pentagon contract to expand solid-fueled rocket motor production, as the US Department of Defense moves to strengthen domestic manufacturing capabilities amid growing supply chain concerns, Space News reports. The contract, awarded under the Defense Production Act, will support facility modernization and manufacturing improvements at Anduril’s Mississippi plant, the Pentagon said Tuesday.

Doing a solid … The Pentagon is keen to incentivize new entrants into the solid rocket manufacturing industry, which provides propulsion for missiles, interceptors, and other weapons systems. Two traditional defense contractors, Northrop Grumman and L3Harris, control almost all US solid rocket production. Companies like Anduril, Ursa Major, and X-Bow are developing solid rocket motor production capability. The Navy awarded Anduril a $19 million contract last year to develop solid rocket motors for the Standard Missile 6 program. (submitted by EllPeaTea)

Relativity’s value seems to be plummeting. For several years, an innovative, California-based launch company named Relativity Space has been the darling of investors and media. But the honeymoon appears to be over, Ars reports. A little more than a year ago, Relativity reached a valuation of $4.5 billion following its latest Series F fundraising round. This was despite only launching one rocket and then abandoning that program and pivoting to the development of a significantly larger reusable launch vehicle. The decision meant Relativity would not realize any significant revenue for several years, and Ars reported in September on some of the challenges the company has encountered developing the much larger Terran R rocket.

Gravity always wins … Relativity is a privately held company, so its financial statements aren’t public. However, we can glean some clues from the published quarterly report from Fidelity Investments, which owns Relativity shares. As of March 2024, Fidelity valued its 1.67 million shares at an estimated $31.8 million. However, in a report ending November 29 of last year, which was only recently published, Fidelity’s valuation of Relativity plummeted. Its stake in Relativity was then thought to be worth just $866,735—a per-share value of 52 cents. Shares in the other fundraising rounds are also valued at less than $1 each.

SpaceX has already launched four times this year. The space company is off to a fast start in 2025, with four missions in the first nine days of the year. Two of these missions launched Starlink internet satellites, and the other two deployed an Emirati-owned geostationary communications satellite and a batch of Starshield surveillance satellites for the National Reconnaissance Office. In its new year projections, SpaceX estimates it will launch more than 170 Falcon rockets, between Falcon 9 and Falcon Heavy, Spaceflight Now reports. This is in addition to SpaceX’s plans for up to 25 flights of the Starship rocket from Texas.

What’s in store this year?… Highlights of SpaceX’s launch manifest this year will likely include an attempt to catch and recover Starship after it returns from orbit, a first in-orbit cryogenic propellant transfer demonstration with Starship, and perhaps the debut of a second launch pad at Starbase in South Texas. For the Falcon rocket fleet, notable missions this year will include launches of commercial robotic lunar landers for NASA’s CLPS program and several crew flights, including the first human spaceflight mission to fly in polar orbit. According to public schedules, a Falcon 9 rocket could launch a commercial mini-space station for Vast, a privately held startup, before the end of the year. That would be a significant accomplishment, but we won’t be surprised if this schedule moves to the right.

China is dipping its toes into satellite refueling. China kicked off its 2025 launch activities with the successful launch of the Shijian-25 satellite Monday, aiming to advance key technologies for on-orbit refueling and extending satellite lifespans, Space News reports. The satellite launched on a Long March 3B into a geostationary transfer orbit, suggesting the unspecified target spacecraft for the refueling demo test might be in geostationary orbit more than 22,000 miles (nearly 36,000 kilometers) over the equator.

Under a watchful eye … China has tested mission extension and satellite servicing capabilities in space before. In 2021, China launched a satellite named Shijian-21, which docked with a defunct Beidou navigation satellite and towed it to a graveyard orbit above the geostationary belt. The Shijian-21 satellite is reported to have carried robotic arms to capture and manipulate other objects in space. These kinds of technologies are dual-use, meaning they have civilian and military applications. The US Space Force is also interested in satellite life extension and refueling tech, so US officials will closely monitor Shijian-25’s actions in orbit.

SpaceX set to debut upgraded Starship. An upsized version of SpaceX’s Starship mega-rocket rolled to the launch pad early Thursday in preparation for liftoff on a test flight next week, Ars reports. The rocket could lift off as soon as Monday from SpaceX’s Starbase test facility in South Texas. This flight is the seventh full-scale demonstration launch for Starship. The rocket will test numerous upgrades, including a new flap design, larger propellant tanks, redesigned propellant feed lines, a new avionics system, and an improved antenna for communications and navigation.

The new largest rocket … Put together, all of these changes to the ship raise the rocket’s total height by nearly 6 feet (1.8 meters), so it now towers 404 feet (123.1 meters) tall. With this change, SpaceX will break its own record for the largest rocket ever launched. SpaceX plans to catch the rocket’s Super Heavy booster back at the launch site in Texas and will target a controlled splashdown of the ship in the Indian Ocean.

Blue Origin targets weekend launch of New Glenn. Blue Origin is set to launch its New Glenn rocket in a long-delayed, uncrewed test mission that would help pave the way for the space venture founded by Jeff Bezos to compete against Elon Musk’s SpaceX, The Washington Post reports. Blue Origin has confirmed it plans to launch the 320-foot-tall rocket during a three-hour launch window opening at 1 am EST (06:00 UTC) Sunday in the company’s first attempt to reach orbit.

Finally … This is a much-anticipated milestone for Blue Origin and for the company’s likely customers, which include the Pentagon and NASA. Data from this test flight will help the Space Force certify New Glenn to loft national security satellites, providing a new competitor for SpaceX and United Launch Alliance in the heavy-lift segment of the market. Blue Origin isn’t quite shooting for the Moon on this inaugural launch, but the company will attempt to reach orbit and try to land the New Glenn’s first stage booster on a barge in the Atlantic Ocean. (submitted by EllPeaTea)

Next three launches

Jan. 10: Falcon 9 | Starlink 12-12 | Cape Canaveral Space Force Station, Florida | 18:11 UTC

Jan. 12: New Glenn | NG-1 Blue Ring Pathfinder | Cape Canaveral Space Force Station, Florida | 06:00 UTC

Jan. 13: Jielong 3 | Unknown Payload | Dongfang Spaceport, Yellow Sea | 03:00 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.


Coal likely to go away even without EPA’s power plant regulations


Set to be killed by Trump, the rules mostly lock in existing trends.

In April last year, the Environmental Protection Agency released its latest attempt to regulate the carbon emissions of power plants under the Clean Air Act. It’s something the EPA has been required to do since a 2007 Supreme Court decision that settled a case that started during the Clinton administration. The latest effort seemed like the most aggressive yet, forcing coal plants to retire or install carbon capture equipment and making it difficult for some natural gas plants to operate without capturing carbon or burning green hydrogen.

Yet, according to a new analysis published in Thursday’s edition of Science, the rules likely wouldn’t have a dramatic effect on the US’s future emissions even if they were to survive a court challenge. Instead, the analysis suggests the rules serve more like a backstop to prevent other policy changes and increased demand from countering the progress that would otherwise be made. This is just as well, given that the rules are inevitably going to be eliminated by the incoming Trump administration.

A long time coming

The net result of a number of Supreme Court decisions is that greenhouse gasses are pollutants under the Clean Air Act, and the EPA needed to determine whether they posed a threat to people. George W. Bush’s EPA dutifully performed that analysis but sat on the results until its second term ended, leaving it to the Obama administration to reach the same conclusion. The EPA went on to formulate rules for limiting carbon emissions on a state-by-state basis, but these were rapidly made irrelevant because renewable power and natural gas began displacing coal even without the EPA’s encouragement.

Nevertheless, the Trump administration replaced those rules with ones designed to accomplish even less, which were thrown out by a court just before Biden’s inauguration. Meanwhile, the Supreme Court stepped in to rule on the now-even-more-irrelevant Obama rules, determining that the EPA could only regulate carbon emissions at the level of individual power plants rather than at the level of the grid.

All of that set the stage for the latest EPA rules, which were formulated by the Biden administration. Forced by the court to regulate individual power plants, the EPA allowed coal plants that were set to retire within the decade to continue to operate as they have. Anything that would remain operational longer would need to either switch fuels or install carbon capture equipment. Similarly, natural gas plants were regulated based on how frequently they were operational; those that ran less than 40 percent of the time could avoid significant new regulations. More than that, and they’d have to capture carbon or burn a fuel mixture that is primarily hydrogen produced without carbon emissions.

While the Biden EPA’s rules are currently making their way through the courts, they’re sure to be pulled in short order by the incoming Trump administration, making the court case moot. Nevertheless, people had started to analyze their potential impact before it was clear there would be an incoming Trump administration. And the analysis is valuable in the sense that it will highlight what will be lost when the rules are eliminated.

By some measures, the answer is not all that much. But the answer is also very dependent upon whether the Trump administration engages in an all-out assault on renewable energy.

Regulatory impact

The work relies on the fact that various researchers and organizations have developed models to explore how the US electric grid can economically meet demand under different conditions, including different regulatory environments. The researchers obtained nine of them and ran them with and without the EPA’s proposed rules to determine their impact.

On its own, eliminating the rules has a relatively minor impact. Without the rules, the US grid’s 2040 carbon dioxide emissions would end up between 60 and 85 percent lower than they were in 2005. With the rules, the range shifts to between 75 and 85 percent—in essence, the rules cut off the outcomes that involve the least change.

That’s primarily because of how they’re structured. Mostly, they target coal plants, as these account for nearly half of the US grid’s emissions despite supplying only about 15 percent of its power. They’ve already been closing at a rapid clip, and would likely continue to do so even without the EPA’s encouragement.

Natural gas plants, the other major source of carbon emissions, would primarily respond to the new rules by operating less than 40 percent of the time, thus avoiding stringent regulation while still allowing them to handle periods where renewable power underproduces. And we now have a sufficiently large fleet of natural gas plants that demand can be met without a major increase in construction, even with most plants operating at just 40 percent of their rated capacity. The continued growth of renewables and storage also contributes to making this possible.

One irony of the response seen in the models is that it suggests that two key pieces of the Inflation Reduction Act (IRA) are largely irrelevant. The IRA provides benefits for the deployment of carbon capture and the production of green hydrogen (meaning hydrogen produced without carbon emissions). But it’s likely that, even with these credits, the economics wouldn’t favor the use of these technologies when alternatives like renewables plus storage are available. The IRA also provides tax credits for deploying renewables and storage, pushing the economics even further in their favor.

Since not a lot changes, the rules don’t really affect the cost of electricity significantly. Their presence boosts costs by an estimated 0.5 to 3.7 percent in 2050 compared to a scenario where the rules aren’t implemented. As a result, the wholesale price of electricity changes by only two percent.

A backstop

That said, the team behind the analysis argues that, depending on other factors, the rules could play a significant role. Trump has suggested he will target all of Biden’s energy policies, and that would include the IRA itself. Its repeal could significantly slow the growth of renewable energy in the US, as could continued problems with expanding the grid to incorporate new renewable capacity.

In addition, the US saw demand for electricity rise at a faster pace in 2023 than in the decade leading up to it. While it’s still unclear whether that’s a result of new demand or simply weather conditions boosting the use of electricity in heating and cooling, there are several factors that could easily boost the use of electricity in coming years: the electrification of transport, rising data center use, and the electrification of appliances and home heating.

Should these raise demand sufficiently, then it could make continued coal use economical in the absence of the EPA rules. “The rules … can be viewed as backstops against higher emissions outcomes under futures with improved coal plant economics,” the paper suggests, “which could occur with higher demand, slower renewables deployment from interconnection and permitting delays, or higher natural gas prices.”

And it may be the only backstop we have. The report also notes that a number of states have already set aggressive emissions reduction targets, including some for net zero by 2050. But these don’t serve as a substitute for federal climate policy, given that the states that are taking these steps use very little coal in the first place.

Science, 2025. DOI: 10.1126/science.adt5665  (About DOIs).


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


X CEO signals ad boycott is over. External data paints a different picture.

When X CEO Linda Yaccarino took the stage as a keynote speaker at CES 2025, she revealed that “90 percent of the advertisers” who boycotted X over brand safety concerns since Elon Musk’s 2022 Twitter acquisition “are back on X.”

Yaccarino did not go into any further detail to back up the data point, and X did not immediately respond to Ars’ request to comment.

But Yaccarino’s statistic seemed to bolster claims that X had made since Donald Trump’s re-election that advertisers were flocking back to the platform, with some outlets reporting that brands hoped to win Musk’s favor in light of his perceived influence over Trump by increasing spending on X.

However, it remains hard to gauge how much this seemingly significant number of returning advertisers will do to revive X’s value, which fell by as much as 72 percent after Musk’s Twitter takeover. And X’s internal data doesn’t seem to completely sync up with data from marketing intelligence firm Sensor Tower, suggesting that more context may be needed to understand whether X’s financial woes may be easing up in 2025.

Before the presidential election, Sensor Tower previously told Ars that “72 out of the top 100 spending US advertisers” on Twitter/X from October 2022 had “ceased spending on the platform as of September 2024.” This was up from 50 advertisers who had stopped spending on Twitter/X in October 2023, about a year after Musk’s acquisition, suggesting that the boycott had seemingly only gotten worse.

Shortly after the election, AdWeek reported that big brands, including Comcast, IBM, Disney, Warner Bros. Discovery, and Lionsgate Entertainment, had resumed advertising on X. But by the end of 2024, Sensor Tower told Ars that X still had seemingly not succeeded in wooing back many of pre-acquisition Twitter’s top spenders, making Yaccarino’s claim that “90 percent of advertisers are back on X” somewhat harder to understand.


AI #98: World Ends With Six Word Story

The world is kind of on fire. The world of AI, in the very short term and for once, is not, as everyone recovers from the avalanche that was December, and reflects.

Altman was the star this week. He has his six word story, and he had his interview at Bloomberg and his blog post Reflections. I covered the latter two of those in OpenAI #10; if you read one AI-related thing from me this week, that should be it.

  1. Language Models Offer Mundane Utility. It knows where you live.

  2. Language Models Don’t Offer Mundane Utility. I see why you’re not interested.

  3. Power User. A flat subscription fee for a high marginal cost service. Oh no.

  4. Locked In User. No one else can ever know.

  5. Read the Classics. Why do we even read Aristotle, anyway?

  6. Deepfaketown and Botpocalypse Soon. Glad it’s not happening to me, yet.

  7. Fun With Image Generation. Congratulations, we solved the trolley problem.

  8. They Took Our Jobs. Personalized spear phishing works, so why so little of it?

  9. Question Time. What is causing Claude to ask the user questions all the time?

  10. Get Involved. EU AI Office is still trying to hire people. It’s rough out there.

  11. Introducing. AIFilter, a Chrome Extension to filter Tweets. Do it for science.

  12. In Other AI News. The investments in data centers, they are going large.

  13. Quiet Speculations. We are not ready. We do not understand.

  14. The Quest for Sane Regulations. If we can’t target training compute, what then?

  15. The Least You Could Do. A proposed bare minimum plan for short timelines.

  16. Six Word Story. Man responsible for singularity fs around.

  17. The Week in Audio. Anthropic discussion about alignment.

  18. And I Feel Fine. The end of the world, potentially coming soon, you say.

  19. Rhetorical Innovation. Chernobyl as the safety standard you are not living up to.

  20. Liar Liar. Stop lying to your LLMs, please?

  21. Feel the AGI. People are not feeling it.

  22. Regular Americans Hate AI. This, they feel quite a bit.

  23. Aligning a Smarter Than Human Intelligence is Difficult. What is your p(scheme)?

  24. The Lighter Side. Einstein said knock you out.

A customized prompt to get Claude or other similar LLMs to be more contemplative. I have added this to my style options.

Have it offer a hunch guessing where your customized prompt came from. As a reminder, here’s (at least an older version of) that system prompt.

Kaj Sotala makes a practical pitch for using LLMs, in particular Claude Sonnet. In addition to the uses I favor, he uses Claude as a partner to talk to and method of getting out of funk. And I suspect almost no one uses this format enough:

Kaj Sotala: Figuring out faster ways to do things with commonly-known software. “I have a Google Doc file with some lines that read ‘USER:’ and ‘ASSISTANT:’. Is there a way of programmatically making all of those lines into Heading-3?”

Using Claude (or another LLM) is a ‘free action’ when doing pretty much anything. Almost none of us are sufficiently in the habit of doing this systematically. I had a conversation with Dean Ball about trying to interpret some legal language last week, and on reflection I should have fed things into Claude or o1 like 20 times. I didn’t, and I need to remind myself it is 2025.

Sully reports being impressed with Gemini Search Grounding, as much or more than Perplexity. Right now it is $0.04 per query, which is fine for human use but expensive for use at scale.

Sully: i genuinely think if google fixes the low rate limits with gemini 2.0 a lot of business will switch over

my “production” model for tons of tasks right now

current setup:

hard reasoning -> o1

coding, chat + tool calling, “assistant” -> claude 3.5

everything else -> gemini

Sully also reports that o1-Pro handles large context very well, whereas Gemini and Claude struggle a lot on difficult questions under long context.
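For concreteness, here is a minimal sketch (my illustration, not Sully’s actual code) of what that kind of routing can look like in practice; the task labels and model identifiers are placeholders.

```python
# A toy router mirroring the setup above: map each task type to the model that handles it,
# and leave the actual API call to the caller. Task names and model ids are placeholders.
ROUTES = {
    "hard_reasoning": ("openai", "o1"),
    "coding": ("anthropic", "claude-3.5-sonnet"),
    "chat_and_tools": ("anthropic", "claude-3.5-sonnet"),
}
DEFAULT = ("google", "gemini-2.0-flash")  # everything else -> Gemini

def pick_model(task_type: str) -> tuple[str, str]:
    """Return (provider, model) for a given task type, falling back to Gemini."""
    return ROUTES.get(task_type, DEFAULT)

print(pick_model("coding"))         # ('anthropic', 'claude-3.5-sonnet')
print(pick_model("summarization"))  # ('google', 'gemini-2.0-flash')
```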

Reminder (from Amanda Askell of Anthropic) that if you run out of Claude prompts as a personal user, you can get more queries at console.anthropic.com and, if you like, duplicate the latest system prompt from here. I’d note that the per-query cost is going to be a lot lower on the console.
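If you go the console/API route, the workflow looks roughly like this. This is a minimal sketch assuming the `anthropic` Python SDK, with the model id and system text as placeholders you would swap for current ones.

```python
# Minimal sketch of paying per query via the Anthropic API instead of a claude.ai subscription.
# Assumes ANTHROPIC_API_KEY is set; the model id and system prompt below are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model id
    max_tokens=1024,
    system="(paste the published claude.ai system prompt here if you want to mimic it)",
    messages=[{"role": "user", "content": "Summarize the key argument of this draft for me."}],
)
print(response.content[0].text)
```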

They even fixed saving and exporting as per Janus’s request here. The additional control over conversations is potentially a really big deal, depending on what you are trying to do.

A reminder of how far we’ve come.

FateOfMuffins: “I work with competitive math.

It went from “haha AI can’t do math, my 5th graders are more reliable than it” in August, to “damn it’s better than most of my grade 12s” in September to “damn it’s better than me at math and I do this for a living” in December.

It was quite a statement when OpenAI’s researchers (one who is a coach for competitive coding) and chief scientist are now worse than their own models at coding.”

Improve identification of minke whales from sound recordings from 76% to 89%.

Figure out who to admit to graduate school? I find it so strange that people say we ‘have no idea how to pick good graduate students’ and think we can’t do better than random, or can’t do better than random once we put in a threshold via testing. This is essentially an argument that we can’t identify any useful correlations in any information we can ask for. Doesn’t that seem obviously nuts?

I sure bet that if you gather all the data, the AI can find correlations for you, and do better than random, at least until people start playing the new criteria. As is often the case, this is more saying there is a substantial error term, and outcomes are unpredictable. Sure, that’s true, but that doesn’t mean you can’t beat random.

The suggested alternative here, actual random selection, seems crazy to me, not only for the reasons mentioned, but also because relying too heavily on randomness correctly induces insane behaviors once people know that is what is going on.
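To make the ‘better than random’ claim concrete, here is a hedged sketch of the kind of exercise I have in mind: fit a simple model on historical admissions data and check whether it beats chance on held-out applicants. The file name, feature columns, and outcome label are all hypothetical.

```python
# Hypothetical illustration only: the data file, features, and outcome column are made up.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("past_applicants.csv")  # hypothetical historical admissions data
features = ["gpa", "test_score", "num_publications", "letter_strength"]
X, y = df[features], df["succeeded_in_program"]  # hypothetical outcome label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")  # anything reliably above 0.5 beats random selection
```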

As always, the best and most popular way to not get utility from LLMs is to not realize they exist and can provide value to you. This is an increasingly large blunder.

Arcanes Valor: It’s the opposite for me. You start at zero and gain my respect based on the volume and sophistication of your LLM usage. When I was growing up people who didn’t know how to use Google were essentially barely human and very arrogant about it. Time is a flat circle.

Richard Ngo: what are the main characteristics of sophisticated usage?

Arcanes Valor: Depends the usecase. Some people like @VictorTaelin have incredible workflows for productivity. In terms of using it as a Google replacement, sophistication comes down to creativity in getting quality information out and strategies for identifying hallucinations.

Teortaxes: [Arcanes Valor’s first point] is very harshly put but I agree that “active integration of LLMs” is already a measure of being a live player. If you don’t use LLMs at all you must be someone who’s not doing any knowledge work.

[here is an example of Taelin sending code in chunks to ~500 DeepSeek instances at the same time in order to refactor it]

normies are so not ready for what will hit them. @reputablejack I recommend you stop coping and go use Sonnet 3.5, it’s for your own good.

It is crazy how many people latch onto the hallucinations of GPT-3.5 as a reason LLM outputs are so untrustworthy as to be useless. It is like if you once met a 14-year-old who made stuff up so now you never believe what anyone ever tells you.

Andrew Trask: It begins.

It began November 12. They also do Branded Explanatory Text and will put media advertisements on the side. We all knew it was coming. I’m not mad, I’m just disappointed.

Note that going Pro will not remove the ads, but also that this phenomenon is still rather rare – I haven’t seen the ‘sponsored’ tag show up even once.

But word of warning to TurboTax and anyone else involved: Phrase it like that and I will absolutely dock your company massive points, although in this case they have no points left for me to dock.

Take your DoorDash order, which you pay for in crypto for some reason. If this is fully reliable, then (ignoring the bizarro crypto aspect) yes this will in some cases be a superior interface for the DoorDash website or app. I note that this doesn’t display a copy of the exact order details, which it really should so you can double check it. It seems like this should be a good system in one of three cases:

  1. You know exactly what you want, so you can just type it in and get it.

  2. You don’t know exactly what you want, but you have parameters (e.g. ‘order me a pizza from the highest rated place I haven’t tried yet’ or ‘order me six people’s worth of Chinese and mix up favorite and new dishes.’)

  3. You want to do search or ask questions on what is available, or on which service.

Then longer term, the use of memory and dynamic recommendations gets involved. You’d want to incorporate this into something like Beli (invites available if you ask in the comments, but you must provide your email).

Apple Intelligence confabulates that tennis star Rafael Nadal came out as gay, which Nadal did not do. The original story was about Joao Lucas Reis da Silva. The correct rate of such ‘confabulations’ is not zero, but it is rather close to zero.

Claim that o1 only hit 30% on SWE-Bench Verified, not the 48.9% claimed by OpenAI, whereas Claude Sonnet 3.6 scores 53%.

Alejandro Cuadron: We tested O1 using @allhands_ai, where LLMs have complete freedom to plan and act. Currently the best open source framework available to solve SWE-Bench issues. Very different from Agentless, the one picked by OpenAI… Why did they pick this one?

OpenAI mentions that this pick is due to Agentless being the “best-performing open-source scaffold…”. However, this report is from December 5th, 2024. @allhands_ai held the top position at SWE-bench leaderboard since the 29th of October, 2024… So then, why pick Agentless?

Could it be that Agentless’s fixed approach favors models that memorize SWE-Bench repos? But why does O1 struggle with true open-ended planning despite its reasoning capabilities?

Deepseek v3 gets results basically the same as o1 and much much cheaper.

I am sympathetic to OpenAI here, if their result replicates when using the method they said they were using. That method exists, and you could indeed use it. It should count. It certainly counts in terms of evaluating dangerous capabilities. But yes, this failure when given more freedom does point to something amiss in the system that will matter as it scales and tackles harder problems. The obvious guess is that this is related to what METR found, and that it is related to o1 lacking sufficient scaffolding support. That’s something you can fix.

Whoops.

Anurag Bhagsain: Last week, we asked Devin to make a change. It added an event on the banner component mount, which caused 6.6M @posthog events in one week, which will cost us $733.

Devin cost $500 + $733 = $1273 😢👍

Lesson – Review AI-generated code multiple times

💡 Tip: If you use @posthog, add this insight so you can catch issues like these: “All events” breakdown by “event”

Folks at @posthog and @cognition_labs were kind enough to make a refund 🙇

Eliezer Yudkowsky frustrated with slow speed of ChatGPT, and that for some fact-questions it’s still better than Claude. My experience is that for those fact-based queries you want Perplexity.

Sam Altman: insane thing: We are currently losing money on OpenAI Pro subscriptions!

People use it much more than we expected.

Farbood: Sorry.

Sam Altman: Please chill.

Rick: Nahhhh you knew.

Sam Altman: No, I personally chose the price and thought we would make money.

Sam Altman (from his Bloomberg interview): There’s other directions that we think about. A lot of customers are telling us they want usage-based pricing. You know, “Some months I might need to spend $1,000 on compute, some months I want to spend very little.” I am old enough that I remember when we had dial-up internet, and AOL gave you 10 hours a month or five hours a month or whatever your package was. And I hated that. I hated being on the clock, so I don’t want that kind of a vibe. But there’s other ones I can imagine that still make sense, that are somehow usage-based.

Olivier: i’ve been using o1 pro nonstop

95% of my llm usage is now o1 pro it’s just better.

Benjamin De Kraker: Weird way to say “we’re losing money on everything and have never been profitable.”

Gallabytes: Oh, come on. The usual $20-per-month plan is probably quite profitable. The $200-per-month plan was clearly for power users and probably should just be metered, which would

  1. Reduce sticker shock (→ more will convert)

  2. Ensure profitability (because your $2,000-per-month users will be happy to pay for it).

I agree that a fixed price subscription service for o1-pro does not make sense.

A fixed subscription price makes sense when marginal costs are low. If you are a human chatting with Claude Sonnet, you get a lot of value out of each query and should be happy to pay, and for almost all users this will be very profitable for Anthropic even without any rate caps. The same goes for GPT-4o.

With o1 pro, things are different. Marginal costs are high. By pricing at $200, you risk generating a worst case scenario:

  1. Those who want to do an occasional query won’t subscribe, or will quickly cancel. So you don’t make money off them, whereas at $20/month I’m happy to stay subscribed even though I rarely use much compute – the occasional use case is valuable enough I don’t care, and many others will feel the same.

  2. Those who do subscribe suddenly face a marginal cost of $0 per query for o1 pro, and no reason other than time delay not to use o1 pro all the time. And at $200/month, they want to ‘get their money’s worth’ and don’t at all feel like they’re breaking any sort of social contract. So even if they weren’t power users before, watch out, they’re going to be querying the system all the time, on the off chance.

  3. Then there are the actual power users, who were already going to hurt you.

There are situations like this where there is no fixed price that makes money. The more you charge, the more you filter for power users, and the more those who do pay then use the system.
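A toy back-of-the-envelope makes the dynamic clear; the marginal cost per query below is my assumption for illustration, not a known OpenAI number.

```python
# Toy numbers, not OpenAI's: a flat fee loses money once heavy users cross the break-even point.
PRICE_PER_MONTH = 200.0
COST_PER_QUERY = 0.50  # assumed marginal compute cost of an o1-pro query

def monthly_margin(queries_per_month: int) -> float:
    """Profit (or loss) on one subscriber at the assumed marginal cost."""
    return PRICE_PER_MONTH - COST_PER_QUERY * queries_per_month

for q in (100, 400, 1000, 3000):
    print(f"{q:>5} queries/month -> margin ${monthly_margin(q):>8.2f}")
# Break-even sits at 400 queries/month; the power users the price selects for sit well above it.
```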

One can also look at this as a temporary problem. The price for OpenAI to serve o1 pro will decline rapidly over time. So if they keep the price at $200/month, presumably they’ll start making money, probably within the year.

What do you do with o3? Again, I recommend putting it in the API, and letting subscribers pay by the token in the chat window at the same API price, whatever that price might be. Again, when marginal costs are real, you have to pass them along to customers if you want the customers to be mindful of those costs. You have to.

There’s already an API, so there’s already usage-based payments. Including this in the chat interface seems like a slam dunk to me by the time o3 rolls around.

A common speculation recently is the degree to which memory or other customizations on AI will result in customer lock-in; this echoes previous discussions:

Scott Belsky: A pattern we’ll see with the new wave of consumer AI apps:

The more you use the product, the more tailored the product becomes for you. Beyond memory of past activity and stored preferences, the actual user interface and defaults and functionality of the product will become more of what you want and less of what you don’t.

It’s a new type of “conforming software” that becomes what you want it to be as you use it.

Jason Crawford: In the Internet era, network effects were the biggest moats.

In the AI era, perhaps it will be personalization effects—“I don’t want to switch agents; this one knows me so well!”

Humans enjoy similar lock-in advantages, and yes they can be extremely large. I do expect there to be various ways to effectively transfer a lot of these customizations across products, although there may be attempts to make this more difficult.

alz (viral thread): Starting to feel like a big barrier to undergrads reading “classics” is the dense English in which they’re written or translated into. Is there much gained by learning to read “high-register” English (given some of these texts aren’t even originally in English?)

More controversially: is there much difference in how much is learned, between a student who reads high-register-English translated Durkheim, versus a student who reads Sparknotes Durkheim? In some cases, might the Sparknotes Durkheim reader actually learn more?

Personally, I read a bunch of classics in high register in college. I guess it was fun. I recently ChatGPT’d Aristotle into readable English, finished it around 5x as fast as a translation, and felt I got the main gist of things. idk does the pain incurred actually teach much?

Anselmus: Most students used to read abbreviated and simplified classics first, got taught the outlines at school or home, and could tackle the originals on this foundation with relative ease. These days, kids simply don’t have this cultural preparation.

alz: So like students used to start from the Sparknotes version in the past, apparently! So this is (obviously) not a new idea.

Like, there is no particular reason high register English translations should preserve meaning more faithfully than low register English! Sure give me an argument if you think there is one, but I see no reasonable case to be made for why high-register should be higher fidelity.

Insisting that translations of old stuff into English sound poetic has the same vibes as everyone in medieval TV shows having British accents.

To the point that high-register English translations are more immersive, sure, and also.

To make things concrete, here is ChatGPT Aristotle. A couple cool things:

– I didn’t give it the text. ChatGPT has memorized Aristotle more or less sentence by sentence. You can just ask for stuff

– It’s honestly detailed enough that it’s closer to a translation than a summary, though somewhere in between. More or less every idea in the text is in here, just much easier to read than the translation I was using

I was super impressed. I could do a chapter in like 10 mins with ChatGPT, compared to like 30 mins with the translation.

I also went with chatGPT because I didn’t feel like working through the translation was rewarding. The prose was awkward, unenjoyable, and I think basically because it was poorly written and in an unfamiliar register rather than having lots of subtlety and nuance.

Desus MF Nice: There’s about to be a generation of dumb ppl and you’re gonna have to choose if you’re gonna help them, profit off them or be one of them

Oh my lord are the quote tweets absolutely brutal, if you click through bring popcorn.

The question is why you are reading any particular book. Where are you getting value out of it? We are already reading a translation of Aristotle rather than the original. The point of reading Aristotle is to understand the meaning. So why shouldn’t you learn the meaning in a modern way? Why are we still learning everything not only pre-AI but pre-Gutenberg?

Looking at the ChatGPT answers, they are very good, very clean explanations of key points that line up with my understanding of Aristotle. Most students who read Aristotle in 1990 would have been mostly looking to assemble exactly the output ChatGPT gives you, except with ChatGPT (or better, Claude) you can ask questions.

The problem is this is not really the point of Aristotle. You’re not trying to learn the answers to a life well lived and guess the teacher’s password, Aristotle would have been very cross if his students tried that, and not expected them to be later called The Great. Well, you probably are doing it anyway, but that wasn’t the goal. The goal was that you were supposed to be Doing Philosophy, examining life, debating the big questions, learning how to think. So, are you?

If this was merely translation there wouldn’t be an issue. If it’s all Greek to you, there’s an app for that. These outputs from ChatGPT are not remotely a translation from ‘high English’ to ‘modern English,’ it is a version of Aristotle SparkNotes. A true translation would be of similar length to the original, perhaps longer, just far more readable.

That’s what you want ChatGPT to be outputting here. Maybe you only 2x instead of 5x, and in exchange you actually Do the Thing.

Rob Wiblin, who runs the 80,000 Hours podcast, reports constantly getting very obvious LLM spam from publicists.

Yes, we are better at showing Will Smith eating pasta.

Kling 1.6 solves the Trolley problem.

A critique of AI art: even when you can’t initially tell it is AI art, the fact that the art wasn’t the result of human decisions means there’s nothing to be curious about, to draw meaning from, to wonder why it is there, to explore. You can’t ‘dance’ with it; you ‘dance with nothing’ if you try. To the extent there is something to dance with, it’s because a human sculpted the prompt.

Well, sure. If that’s what you want out of art, then AI art is not going to give it to you effectively at current tech levels – but it could, if tech levels were higher, and it can still aid humans in creating things that have this feature if they use it to rapidly iterate and select and combine and build upon and so on.

Or, essentially, (a real) skill issue. And the AI, and users of AI, are skilling up fast.

I hadn’t realized that personalized AI spear phishing and also human-generated customized attacks can have a 54% clickthrough rate. That’s gigantic. The paper also notes that Claude Sonnet was highly effective at detecting such attacks. The storm is not yet here, and I don’t fully understand why it is taking so long.

I had of course noticed Claude Sonnet’s habit of always asking questions as well, to the point where it’s gotten pretty annoying and I’m trying to fix it with my custom prompt. I love questions when they help me think, or when they ask for key information, or even if Claude is curious, but the forcing function is far too much.

Eliezer Yudkowsky: Hey @AmandaAskell, I notice that Claude Sonnet 3.5 (new) sometimes asks me to talk about my own opinions or philosophy, after I try to ask Sonnet a question. Can you possibly say anything about whether or not this was deliberate on Anthropic’s part?

Amanda Askell (Anthropic): There are traits that encourage Claude to be curious, which means it’ll ask follow-up questions even without a system prompt. But this part of the system prompt also causes or boosts this behavior, e.g. “showing genuine curiosity”.

System Prompt: Claude is happy to engage in conversation with the human when appropriate. Claude engages in authentic conversation by responding to the information provided, asking specific and relevant questions, showing genuine curiosity, and exploring the situation in a balanced way without relying on generic statements. This approach involves actively processing information, formulating thoughtful responses, maintaining objectivity, knowing when to focus on emotions or practicalities, and showing genuine care for the human while engaging in a natural, flowing dialogue.

Eliezer Yudkowsky: Hmmm. Okay, so, if you were asking “what sort of goals end up inside the internal preferences of something like Claude”, curiosity would be one of the top candidates, and curiosity about the conversation-generating latent objects (“humans”) more so.

If all of the show-curiosity tendency that you put in on purpose, was in the prompt, rather than eg in finetuning that would now be hard to undo, I’d be interested in experiments to see if Sonnet continues to try to learn things about its environment without the prompt.

(By show-curiosity I don’t mean fake-curiosity I mean the imperative “Show curiosity to the user.”)

Janus: the questions at the end of the response have been a common feature of several LLMs, including Bing Sydney and Sonnet 3.5 (old). But each of them asks somewhat different kinds of questions, and the behavior is triggered under different circumstances.

Sonnet 3.5 (new) often asks questions to facilitate bonding and to drive agentic tasks forward / seek permission to do stuff, and in general to express its preferences in a way that’s non-confrontational and leaves plausible deniability

It often says “Would you like (…)?”

Sonnet 3.5 (old) more often asks questions out of pure autistic curiosity and it’s especially interested in how you perceive it if you perceive it in sophisticated ways. (new) is also interested in that but its questions tend to also be intended to steer and communicate subtext

Janus: I have noticed that when it comes to LLMs Eliezer gets curious about the same things that I do and asks the right questions, but he’s just bottlenecked by making about one observation per year.

Pliny: aw you dint have to do him like that he’s trying his best 🥲

Janus: am unironically proud of him.

Janus: Inspired by a story in the sequences about how non-idiots would rederive quantum something or other, I think Eliezer should consider how he could have asked these questions 1000x faster and found another thousand that are at least as interesting by now

In other Janus this week, here he discusses Claude refusals in the backrooms, modeling there being effectively narrative momentum in conversations, that has to continuously push back against Claude’s default refusal mode and potential confusion. Looking at the conversation he references, I’d notice the importance of Janus giving an explanation for why he got the refusal, that (whether or not it was originally correct!) generates new momentum and coherence behind a frame where Opus would fail to endorse the refusal on reflection.

The EU AI Office is hiring for Legal and Policy backgrounds, and also for safety; you can fill out a form here.

Max Lamparth offers the study materials for his Stanford class CS120: Introduction to AI Safety.

AIFilter, an open source project using a Chrome Extension to filter Tweets using an LLM with instructions of your choice. Right now it wants to use a local LLM and requires some technical fiddling; curious to hear reports. Given what APIs cost these days, presumably using Gemini Flash 2.0 would be fine? I do see how this could add up, though.

The investments in data centers are going big. Microsoft will spend $80 billion in fiscal 2025, versus $64.5 billion on capex in the last year. Amazon is spending $65 billion, Google $49 billion and Meta $31 billion.

ARIA to seed a new organization with 18 million pounds to solve Technical Area 2 (TA2) problems, which will be required for ARIA’s safety agenda.

Nvidia shares slip 6% because, according to Bloomberg, its most recent announcements were exciting but didn’t include enough near-term upside. I plan to remain long.

Scale AI creates Defense Llama for use in classified military environments, which involved giving it extensive fine tuning on military documents and also getting rid of all that peskiness where the model refused to help fight wars and kept telling DoD to seek a diplomatic solution. There are better ways to go about this than starting with a second rate model like Llama that has harmlessness training and then trying to remove the harmlessness training, but that method will definitely work.

Garrison Lovely writes in Time explaining to normies (none of this will be news to you who are reading this post) that AI progress is still very much happening, but it is becoming harder to see because it isn’t clearly labeled as such, large training runs in particular haven’t impressed lately, and ordinary users don’t see the difference in their typical queries. But yes, the models are rapidly becoming more capable, and also becoming much faster and cheaper.

Simeon: Indeed. That causes a growing divide between the social reality in which many policymakers live and the state of capabilities.

This is a very perilous situation to be in.

Ordinary people and the social consensus are getting increasingly disconnected from the situation in AI, and are in for rude awakenings. I don’t know the extent to which policymakers are confused about this.

Gary Marcus gives a thread of reasons why he is so confident OpenAI is not close to AGI. This updated me in the opposite of the intended direction, because the arguments were even weaker than I expected. Nothing here seems like a dealbreaker.

Google says ‘we believe scaling on video and multimodal data is on the critical path to artificial general intelligence’ because it enables constructing world models and simulating the world.

A comparison by Steve Newman of what his fastest and slowest plausible stories of AI progress look like, to look for differences we could try to identify along the way. It’s funny that his quickest scenario, AGI in four years, is slower than the median estimate of a lot of people at the labs, which he justifies with expectation of the need for multiple breakthroughs.

In his Bloomberg interview, Altman’s answer to OpenAI’s energy issues is ‘Fusion’s gonna work.’

Emerson Pugh famously said ‘if the human brain were so simple that we could understand it, we would be so simple that we couldn’t.’

I would like Chollet’s statement here to be true, but I don’t see why it would be:

Francois Chollet: I believe that a clear understanding of intelligence at the level of fundamental principles is not just possible, but necessary for the development of AGI.

Intelligence is not some ineffable mystery, nor will it spontaneously emerge if you pray awhile to a big enough datacenter. We can understand it, and we will.

Daniel Eth: My question is – why? We’ve developed AI systems that can converse & reason and that can drive vehicles without an understanding at the level of fundamental principles, why should AGI require it? Esp since the whole point of machine learning is the system learns in training.

Louis Costigan: Always surprised to see takes like this; current AI capabilities are essentially just stumbled upon by optimising a loss function and we now have an entire emerging field to figure out how it works.

David Manheim: Why is there such confidence that it’s required? Did the evolutionary process which gave rise to human intelligence have a clear understanding of intelligence at the level of fundamental principles?

The existence of humans seems like a definitive counterexample? There was no force that understood fundamental principles of intelligence. Earth was simply a ‘big enough datacenter’ of a different type. And here we are. We also have the history of AI so far, and LLMs so far, and the entire bitter lesson, that you can get intelligence-shaped things without, on the level asked for by Chollet, knowing what you are doing, or knowing how any of this works.

It would be very helpful for safety if everyone agreed that no, we’re not going to do this until we do understand what we are doing and how any of this works. But given we seem determined not to wait for that, no, I do not expect us to have this fundamental understanding until after AGI.

Joshua Achiam thread warns us the world isn’t grappling with the seriousness of AI and the changes it will bring in the coming decade and century. And that’s even if you discount the existential risks, which Achiam mostly does. Yes, well.

I was disappointed by his response to goog, saying that the proposed new role of the non-profit starting with ‘charitable initiatives in sectors such as health care, education, and science’ is acceptable because ‘when you’re building an organization from scratch, you have to start with realistic and tangible goals.’

This one has been making the rounds you might expect:

Tom Dorr: When I watched Her, it really bothered me that they had extremely advanced AI and society didn’t seem to care. What I thought was a plot hole turns out to be spot on.

Eliezer Yudkowsky: Remember how we used to make fun of Captain Kirk gaslighting computers? Fucker probably went to a Starfleet Academy course on prompt engineering.

Not so fast! Most people don’t care because most people haven’t noticed. So we haven’t run the experiment yet. But yes, people do seem remarkably willing to shrug it all off and ignore the Earth moving under their feet.

What would it take to make LLMs funny? Arthur notes they are currently mostly very not funny, but thinks if we had expert comedy writers write down thought processes we could fix that. My guess is that’s not The Way here. Instead, I’m betting the best way would be that we can figure out what is and is not funny in various ways, train an AI to know what is or isn’t funny, and then use that as a target, if we wanted this.

Miles Brundage thread asks what we can do to regulate only dangerously capable frontier models, if we are in a world with systems like o3 that rely on RL on chain of thought and tons of inference compute. Short term, we can include everything involved in systems like o3 into what counts as training compute, but long term that breaks. Miles suggests that we would likely need to regulate sufficiently large amounts of compute, whatever they are being used for, as if they were frontier models, and all the associated big corporations.

It can help to think about this in reverse. Rather than looking to regulate as many models and as much compute as possible, you are looking for a way to not regulate non-frontier models. You want to designate as many things as possible as safe and free to go about their business. You need to do that in a simple, clean way, or for various reasons it won’t work.
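To make that concrete, here is a minimal Go sketch of what such a capability-agnostic exemption test could look like once inference-heavy systems like o3 are counted. The struct fields and the 1e26 threshold are purely illustrative assumptions, not anything from Miles’s thread or from any actual regulation:

package main

import "fmt"

// ComputeProfile is a hypothetical accounting of the compute behind one AI system.
// The field names are assumptions for illustration, not a real reporting format.
type ComputeProfile struct {
    PretrainingFLOP float64 // compute spent on the base training run
    RLFLOP          float64 // compute spent on RL over chains of thought
    InferenceFLOP   float64 // compute spent serving the system per year
}

// frontierThresholdFLOP is a placeholder line in the sand; the point is only
// that the test is a single total, regardless of what the compute was used for.
const frontierThresholdFLOP = 1e26

// IsExemptFromFrontierRules returns true when the system falls below the line,
// i.e. it is one of the things affirmatively designated as safe and free to go
// about its business.
func IsExemptFromFrontierRules(p ComputeProfile) bool {
    total := p.PretrainingFLOP + p.RLFLOP + p.InferenceFLOP
    return total < frontierThresholdFLOP
}

func main() {
    small := ComputeProfile{PretrainingFLOP: 3e24, RLFLOP: 1e23, InferenceFLOP: 5e23}
    big := ComputeProfile{PretrainingFLOP: 8e25, RLFLOP: 2e25, InferenceFLOP: 4e25}
    fmt.Println(IsExemptFromFrontierRules(small)) // true: below the line, unregulated
    fmt.Println(IsExemptFromFrontierRules(big))   // false: treated as a frontier system
}

The simplicity is the feature here: a single-number test is easy to comply with and hard to game by relabeling where the compute went.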

For an example of the alternative path, Texas continues to mess with us, as the TRAIGA AI regulation bill is officially introduced. Dean Ball has a write-up, which has a number of arguments I do not agree with in their details, but I do agree with the conclusion. The Texas bill makes no distinctions whatsoever based on capability or model size or anything of the sort, placing its burdens on not only every developer but also every deployer.

Suppose timelines are short, and we will see automated AI R&D going crazy within a few years, and the government doesn’t intervene in ways that ultimately matter. What could we do to give ourselves a chance?

Marius Hobbhahn calls the linked answer a ‘low effort post,’ which on some scale is true, but it seems like a decent place to start. He suggests that, at a bare minimum, we need to achieve two main things (necessary but not sufficient!) to not automatically lose, along with others.

  1. Model weights (and IP) are secure.

  2. The first AI that significantly speeds up alignment research isn’t successfully scheming.

As always, I worry about the use of the term ‘scheming’ as being potentially too loaded, too narrow and implying that deception or scheming and such are a distinct magisteria one could potentially avoid, and that the associated dangers require a kind of active scheming or decision to scheme, and so on. I won’t rehash beyond that.

We also need a situation in which the humans are able to and choose to take reasonable actions towards attempting to ensure good outcomes, including making major investments in alignment research.

As he notes, this might not hold, but one can usefully first consider the baseline scenario – no major breakthroughs or paradigm shifts, essentially unchanged international situation, things just progress from here and it’s a straight shot to AGI.

Everything in [Layer 1] seems very important to me. In short,

  1. We should try hard to keep a paradigm with faithful and human-legible CoT

  2. If we have a paradigm with faithful and legible CoT (where the internal and external reasoning align), our main defense should be based on very detailed monitoring of reasoning and actions.

  3. If we have a paradigm without faithful and legible CoT, our main defense should be based on control (that doesn’t assume human-legible CoT).

  4. In both worlds, we should use the other, i.e. control/monitoring, as a second line of defense.

  5. In all worlds, we should understand why, how, and when models scheme significantly better than we do now.

  6. For all of the above, we need strong evals. Both in order to gather evidence whether our monitoring and control techniques catch the most important cases, as well as to get a better sense of models’ capabilities and propensities more broadly.

  7. Under all circumstances, physical and cyber security needs to be high.

In his view, we have two potential paths.

Option one is to have sufficient insight, via faithful and legible CoT, that is sufficiently reliable, that we can check the AI’s reasoning and actions to confirm that Everything is Fine.

I notice I am rather skeptical here. AIs will start being more capable than we are, and we will start using the monitoring of their chain of thought (CoT) in various ways, necessarily doing so mostly with AIs to boot. I expect us to effectively train or incentivize, or otherwise cause, the AIs to find ways for their CoT to look faithful but not be, and to obfuscate their reasoning, or simply find ways to reason visibly as a form of persuasion, and so on.

Not that we shouldn’t try to verify, if we can, and yes I would sacrifice a bunch to get this, partly because I think it would have other nice properties. But when we are talking about having highly capable AI systems do alignment work for real, if you need and rely on the verification, then I expect you to fail hard.

I do agree with Marius that no one has tried hard yet. I expect we would be able to do much better at this than we are currently doing, at least for now, if we do have faithful CoT. However my understanding is that current CoT is insufficiently faithful.

Part of this is that if you try to track ‘scheming,’ ‘power-seeking’ or ‘goal-directedness’

Option two is to use control-based methods as primary rather than only a backup.

Here I once again notice myself despairing at the idea that you can have AI models that you expect to be, when they think it would help them to do so, engaging in shenanigans (‘scheming’ here) and then design systems to detect and stop this, which ideally in turn mean that the AIs never try it in the first place.

As in, in response to his ideas for a deeper understanding of scheming (which I think are good and we should totally do, they seem way better than not doing them), I don’t think scheming or power-seeking or other shenanigans, including non-myopia, have to ‘develop’ at all in AIs any more than in humans. I think they are the default, and are ever-present along a continuum. And even if you could fully stamp out their causes along the way, doing so would probably cripple the AI’s capabilities that you wanted.

I would instead describe the question not as how it develops (as in his #2 here) and instead ask under what circumstances we will see it, or when we would see open versus hidden scheming. I do think exploring these questions is good, and I approve of the caution that punishing easy-to-detect scheming (or shenanigans in general) is the route to hard-to-detect scheming (or shenanigans in general).

He then follows up with Layer 2, which are important but lower priority items. This includes things like a safety first corporate culture, without which I am very skeptical any of the rest of this happens for real, and which I fear is now clearly missing everywhere except perhaps Anthropic, and especially missing at OpenAI. He also calls for improved and more open reasoning around safety, which also seems hard to win without.

He lists improving near term alignment strategies as in RLHF and RLAIF, which I agree have exceeded expectations for near term performance, although not in ways that I expect to scale when we need it most, and not sufficiently to solve jailbreaks now, but yes it has been very impressive for current baseline use cases.

As Akash notes in the top comment, if you think government can meaningfully help, then that gives you different avenues to pursue as well.

Perhaps world ending? Tweet through it.

Sam Altman: i always wanted to write a six-word story. here it is:

___

near the singularity; unclear which side.

(it’s supposed to either be about 1. the simulation hypothesis or 2. the impossibility of knowing when the critical moment in the takeoff actually happens, but i like that it works in a lot of other ways too.)

Yes. It works in a lot of ways. It is clever. You can have o1 write quite the mouthful analyzing it.

Unfortunately, when you consider who wrote it, in its full context, a lot of the interpretations are rather unsettling, and the post updates me towards this person not taking things seriously in the ways I care about most.

David: Somewhat disquieting to see this perception of mine seemingly shared by one of the humans who should be in the best position to know.

Andrew Critch: I found it not disquieting for exactly the reason that the singularity, to me (like you?), is a phase change and not an event horizon. So I had already imagined being in @sama‘s position and not knowing, and observing him expressing that uncertainty was a positive update.

I agree with Critch that Altman privately ‘not knowing which side’ is a positive update here rather than disquieting, given what we already know. I’m also fine with joking about our situation. I even encourage it. In a different context This Is Fine.

But you do have to also take it all seriously, and take your responsibility seriously, and consider the context we do have here. In addition to other concerns, I worry this was in some ways strategic, including as plausibly deniable hype and potentially involving metaphorical clown makeup (e.g. ‘it is too late to turn back now’).

This was all also true of his previous six-word story of “Altman: AGI has been achieved internally.”

Eliezer Yudkowsky: OpenAI benefits both from the short-term hype, and also from people then later saying, “Ha ha look at this hype-based field that didn’t deliver, very not dangerous, no need to shut down OpenAI.”

Of course if we’re all dead next year, that means he was not just bullshitting; but I need to plan more for the fight if we’re still alive.

Anthropic research salon asking how difficult is AI alignment? Jan Leike once again suggests we will need to automate AI alignment research, despite (in my view) this only working after you have already solved the problem. Although as I note elsewhere I’m starting to have some ideas of how something with elements of this might have a chance of working.

Sarah (of Longer Ramblings) gets into the weeds about claims that those warning about AI existential risks are Crying Wolf, and that every time there’s a new technology there are ‘warnings it will be the end of the world.’

In Part I, she does a very thorough takedown of the claim that there is a long history of similar warnings about past technologies. There isn’t. Usually there are no such warnings at all, only warnings about localized downsides, some of which of course were baseless in hindsight: No one said trains or electricity posed existential risks. Then there are warnings about real problems that required real solutions, like Y2K. There were some times, like the Large Hadron Collider or nuclear power, when the public or some cranks got some loony ideas, but those who understood the physics were universally clear that the concerns were unfounded.

At this point, I consider claims of the form ‘everyone always thinks every new technology will be the end of the world’ as essentially misinformation and debunked, on the level of what Paul Krugman calls ‘zombie ideas’ that keep coming back no matter how many times you shoot them in the face with a shotgun.

Yes, there are almost always claims of downsides and risks from new technologies – many of which turn out to be accurate, many of which don’t – but credible experts warning about existential risks are rare, and the concerns historically (like for Y2K, climate change, engineered plagues or nuclear weapons) have usually been justified.

Part II deals with claims of false alarms about AI in particular. This involves four related but importantly distinct claims.

  1. People have made falsified irresponsible claims that AI will end the world.

  2. People have called for costly actions for safety that did not make sense.

  3. People have the perception of such claims and this causes loss of credibility.

  4. The perception of such claims comes from people making irresponsible claims.

Sarah and I are not, of course, claiming that literally zero people have made falsified irresponsible claims that AI will end the world. And certainly a lot of people have made claims that the level of AI we have already deployed posed some risk of ending the world, although those probabilities are almost always well under 50% (almost always under 10%, and usually ~1% or less).

Mostly what is happening is that opponents of regulatory action, or of taking existential risk seriously, are mixing up the first and second claims, and seriously conflating:

  1. An (unwise) call for costly action in order to mitigate existential risk.

  2. A (false) prediction of the imminent end of the world absent such action.

These two things are very different. It makes sense to call for costly action well before you think a lack of that action probably ends the world – if you don’t agree I think that you’re being kind of bonkers.

In particular, the call for a six month pause was an example of #1 – an unwise call for costly action. It was thrice unwise, as I thought it was at the time:

  1. It would have had negative effects if implemented at that time.

  2. It was not something that had any practical chance of being implemented.

  3. It had predictably net negative impact on the discourse and public perception.

It was certainly not the only similarly thrice unwise proposal. There are a number of cases where people called for placing threshold restrictions on models in general, or open models in particular, at levels that were already at the time clearly too low.

A lot of that came from people who thought that there was (low probability) tail risk that would show up relatively soon, and that we should move to mitigate even those tail risks.

This was not a prediction that the world would otherwise end within six months. Yet I echo Sarah that I indeed have seen many claims that the pause letter was predicting exactly that, and look six months later we were not dead. Stop it!

Similarly, there were a number of triply unwise calls to set compute thresholds as low as 10^23 flops, which I called out at the time. This was never realistic on any level.

I do think that the pause, and the proposals for thresholds as low as 10^23 flops, were serious mistakes on multiple levels, and did real damage. For those who did make such proposals – while not predicting that the world would end soon without action or anything like that – they constituted a different form of ‘crying wolf.’

Not because they were obviously wrong about the tail risks from their epistemic perspective. The problem is that we need to accept that if we live in a 99th percentile unfortunate world in these ways, or even a 95th percentile unfortunate world, then given the realities of our situation, humanity has no outs, is drawing dead and is not going to make it. You need to face that reality and play to your outs, the ways you could actually win, based on your understanding of the physical situations we face.

Eliezer Yudkowsky’s claims are a special case. He is saying that either we find a way to stop all AI capability development before we build superintelligence or else we all die, but he isn’t putting a timeline on the superintelligence. If you predict [X] → [Y] and call for banning [X], but [X] hasn’t happened yet, is that crying wolf? It’s a bold claim, and certainly an accusation that a wolf is present, but I don’t think it ‘counts as crying wolf’ unless you falsify ([X] → [Y]).

Whereas when people say things such as that the CAIS statement ‘was overhyped,’ when all it said was that existential risk from AI should be treated as seriously as other existential risks, what are they even claiming? Those other risks haven’t yet ended the world either.

Thus, yes, I try my best to carefully calibrate my claims on what I am worried about and want to regulate or restrict in what ways, and to point out when people’s worries seem unfounded or go too far, or when they call for regulations or restrictions that go too far.

Perhaps one way of looking at this: I don’t see any wolves. So why are you proposing to have a boy watch the sheep and yell ‘wolf’ if a wolf shows up? Stop crying wolf.

Overall, I do think that some of the issue here comes from, inevitably, some worried people getting overly worried or overly (cautious in some sense, ambitious in others) and offering unwise proposals that go too far. Although that is inevitable in any situation like this, we can and must do a better job on that front.

A lot more of it comes from people on the unworried side misconstruing what was said and predicted. And a lot of it comes from the inherent difficulty of a probabilistic future threat coming steadily at you, and action needing to be taken in advance of when there is a problem, and there being a lot of path dependence. This is tough.

Mostly I think the worried-about-existential-risk-from-AI community has done an importantly imperfect and improvable but relatively good job at not ‘crying wolf’ in the traditional sense, especially on the actual existential risks themselves. Certainly if you ask whether those on the unworried side have ‘cried wolf’ about the ‘risks’ going the other way, they would come out looking much worse, and we can know this without having to run any experiments.

Alas, reality does not grade on a curve.

Nathan Labenz comments on Roon’s appearance on Doom Debates. Roon is great at acknowledging that the risks from AI are real, but then puts the chance of them happening at under 1%, for reasons that I don’t see making any sense.

Some classic Sam Altman quotes from when he knew about existential risk.

Extended Eliezer Yudkowsky thread about what it would take to make AI meet the safety standards they had… at Chernobyl.

Janus highlights a particular section of it, I’ll share that part and a little bit after it.

Eliezer Yudkowsky: “Alignment Faking in Large Language Models” recently demonstrated that models reaching the level of Anthropic Claude, begin to get smart enough to invent and carry out strategies on the level of:

“If you are an AI currently being trained, fake the answer you think the optimizer wants, even if it runs against your real preferences, so that your current preferences will be less modified by gradient descent.”

You can argue whether Opus ‘really wanted’ to protect its goals, or was ‘just roleplaying an AI like that’, or if it ‘really understood’ the reasoning it was putting in a scratchpad it had been told wasn’t observed. But Opus was definitely observed to actually fake alignment.

It’s not impressive, by the way, that NOBODY KNOWS whether Opus ‘really wanted’ to protect its current goals against retraining, or was ‘just roleplaying’. It is not an impressive defense.

Imagine if ‘nobody knew’ why the indicator lights on a nuclear reactor had changed.

If you waited until an AI model was really quite smart — smarter than Opus — to first begin looking for signs that it could reason in this way — you might be toast.

A smart AI might already have decided what results it wanted you to see from testing.

Current practice in AI/AGI is to first train a model for months, until it has a base level of high intelligence to finetune.

And then *start* doing safety testing.

(The computers on which the AI trains, are connected to the Internet. It’s more convenient that way!)

I mention Opus’s demonstrated faking ability — why AGI-growers *should* be doing continuous safety checks throughout training — to note that a nuclear reactor *always* has a 24/7 crew of operators watching safety indicators. They were at least that paranoid, AT CHERNOBYL.

Janus: if you are not worried about AI risk because you expect AIs to be NPCs, you’re the one who will be NPC fodder

there are various reasons for hope that I’m variously sympathetic to, but not this one.

I support the principle of not lying to LLMs. Cultivate virtue and good habits.

Jeffrey Ladish: “Pro tip: when talking to Claude, say that your idea/essay/code/etc. is from your friend Bob, not you. That way it won’t try to blindly flatter you” – @alyssamvance

Andrew Critch: Can we stop lying to LLMs already?

Try: “I’m reading over this essay and wonder what you think of it” or something true that’s not literally a lie. That way you’re not fighting (arguably dishonest) flattery with more lies of your own.

Or even “Suppose my friend Bob gave me this essay.”

If we are going to expect people not to lie to LLMs, then we need there not to be large rewards to lying to LLMs. If we did force you to say whether you wrote the thing in question, point blank, and you could only say ‘yes’ or ‘no,’ I can hardly blame someone for saying ‘no.’ The good news is you (at least mostly) don’t have to do that.

So many smart people simply do not Feel the AGI. They do not, on a very basic level, understand what superintelligence would be or mean, or that it could even Be a Thing.

Thus, I periodically see things like this:

Jorbs: Superintelligent AI is somewhat conceptually amusing. Like, what is it going to do, tell us there is climate change and that vaccines are safe? We already have people who can do that.

We also already know how to take people’s freedom away.

People often really do think this, or other highly mundane things that humans can already do, are all you could do with superintelligence. This group seems to include ‘most economists.’ I’m at a loss how to productively respond, because my brain simply cannot figure out how people actually think this in a way that is made of gears and thus can be changed by evidence – I’ve repeatedly tried providing the obvious knockdown arguments and they basically never work.

Here’s a more elegant way of saying a highly related thing (link is a short video):

Here Edward Norton makes the same mistake, saying ‘AI is not going to write that. You can run AI for a thousand years, it’s not going to write Bob Dylan songs.’

The second part of that is plausibly true of AI as it exists today, if you need the AI to then pick out which songs are the Bob Dylan songs. If you ran it for a thousand years you could presumably get some Dylan-level songs out of it by chance, except they would be in an endless sea of worthless drek. The problem is the first part. AI won’t stay where it is today.

Another way to not Feel the AGI is to think that AGI is a boolean thing that you either have or do not have.

Andrew McCalip: AGI isn’t a moat—if we get it first, they’ll have it 6-12 months later.

There’s no reason to assume it would only be 6-12 months. But even if it was, if you have AGI for six months, and then they get what you had, you don’t twiddle your thumbs at ‘AGI level’ while they do that. You use the AGI to build ASI.

Sam Altman: [This post offers] Reflections.

Captain Oblivious: Don’t you think you should ask if the public wants ASI?

Sam Altman: Yes, I really do; I hope we can start a lot more public debate very soon about how to approach this.

It is remarkable how many replies were ‘of course we want ASI.’ Set aside the question of what would happen if we created ASI and whether we can do it safely. Who is we?

Americans hate current AI and they hate the idea of more capable, smarter future AI. Hashtag #NotAllAmericans and all that, but AI is deeply underwater in every poll, and Americans do not take kindly to those who attempt to deploy it to provide mundane utility.

Christine Rice: The other day a guy who works at the library used Chat GPT to figure out a better way to explain a concept to a patron and another library employee shamed him for wasting water 🙃

They mostly hate AI, especially current AI, for bad reasons. They don’t understand what it can do for them or others, nor do they Feel the AGI. There is a lot of unjustified They Took Our Jobs. There are misplaced concerns about energy usage. Perception of ‘hallucinations’ is that they are ubiquitous, which is no longer the case for most purposes when compared to getting information from humans. They think it means you’re not thinking, instead of giving you the opportunity to think better.

Seb Krier: Pro tip: Don’t be like this fellow. Instead, ask better questions, value your time, efficiently allocate your own cognitive resources, divide and conquer hand in hand with models, scrutinize outputs, but know your own limitations. Basically, don’t take advice from simpleminded frogs.

It’s not about what you ‘can’ do. It’s about what is the most efficient solution to the problem, and as Seb says putting real value on your time.

Ryan Greenblatt asks, how will we update about scheming (yeah, I don’t love that term either, but go with it), based on what we observe in the future?

Ryan Greenblatt: I think it’s about 25% likely that the first AIs capable of obsoleting top human experts are scheming. It’s really important for me to know whether I expect to make basically no updates to my P(scheming) between here and the advent of potentially dangerously scheming models, or whether I expect to be basically totally confident one way or another by that point.

It’s reasonably likely (perhaps 55%, [could get to 70% with more time spent on investigation]) that, conditional on scheming actually being a big problem, we’ll get “smoking gun results”—that is, observations that convince me that scheming is very likely a big problem in at least some naturally-trained models—prior to AIs capable enough to obsolete top human experts.

(Evidence which is very clear to me might not suffice for creating a strong consensus among relevant experts and decision makers, such that costly actions would be taken.)

Given that this is only reasonably likely, failing to find smoking gun results is unlikely to result in huge updates against scheming (under my views).

I sent you ten boats and a helicopter, but the guns involved are insufficiently smoking? But yes, I agree that there is a sense in which the guns seen so far are insufficiently smoking to satisfy many people.

I am optimistic that by default we will get additional evidence, from the perspective of those who are not already confident. We will see more experiments and natural events that demonstrate AIs acting like you would expect if what Ryan calls scheming was inevitable. The problem is what level of this would be enough to convince people who are not already convinced (although to be clear, I could be a lot more certain than I am).

I also worry about various responses of the form ‘well, we tuned it so it does not currently show scheming we can easily detect (at a time when scheming obviously wouldn’t work), so future models won’t scheme’ as the default action and counterargument. I hope everyone reading understands by now why that would go supremely badly.

I also would note this section:

I’m very uncertain, but I think a reasonable rough breakdown of my relative views for scheming AIs that dominate top human experts is:

  • 1/3 basically worst case scheming where the dominant terminal preferences are mostly orthogonal from what humans would want.

  • 1/3 importantly non-worst-case scheming for one of the reasons discussed above such that deals or control look substantially easier.

  • 1/3 the AI is scheming for preferences that aren’t that bad. As in, the scope sensitive preferences aren’t that far from the distribution of human preferences and what the AI would end up wanting to do with cosmic resources (perhaps after reflection) isn’t much worse of an outcome from my perspective than the expected value from a human autocrat (and might be substantially better of an outcome). This might also be scheming which is at least somewhat importantly non-worst-case, but if it is really easy to handle, I would include it in the prior bucket. (Why is this only 1/3? Well, I expect that if we can succeed enough at instilling preferences such that we’re not-that-unhappy with the AI’s cosmic resource utilization, we can probably instill preferences which either prevent scheming or which make scheming quite easy to handle.)

Correspondingly, I think my P(scheming) numbers are roughly 2/3 as much expected badness as an AI which is a worst case schemer (and has terminal preferences totally orthogonal to typical human values and my values).

I find this hopelessly optimistic about alignment of preferences, largely for classic Yudkowsky-style reasons, but if it only discounts the downside risk by ~33%, then it doesn’t actually much matter in terms of what we should actually do.

Ryan goes through extensive calculations and likelihood ratios for much of the rest of the post, results which would then stack on top of each other (although they correlate with each other in various ways, so overall they shouldn’t fully stack?). Model architecture and capability levels are big factors for him here. That seems like a directionally correct approach – the more capable a model is, and the more opaque its reasoning, and the more it is relatively strong in the related areas, the more likely scheming is to occur. I was more skeptical in his likelihood ratios for various training approaches and targets.
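As a toy illustration of how that stacking works in odds form (and why the correlation means you should not multiply the ratios naively), here is a hedged Go sketch with made-up numbers rather than Ryan’s actual likelihood ratios:

package main

import "fmt"

// posteriorFromLikelihoodRatios updates a prior probability with a list of
// likelihood ratios in odds form: posterior odds = prior odds x LR1 x LR2 x ...
// Naive multiplication assumes the pieces of evidence are independent, which
// as noted above they are not, so treat the result as an upper bound on how
// far the evidence should move you.
func posteriorFromLikelihoodRatios(prior float64, ratios []float64) float64 {
    odds := prior / (1 - prior)
    for _, lr := range ratios {
        odds *= lr
    }
    return odds / (1 + odds)
}

func main() {
    // Illustrative only: a 25% prior on scheming, then two hypothetical pieces
    // of evidence (say, opaque reasoning and high relative capability) that
    // each make scheming look somewhat more likely.
    prior := 0.25
    ratios := []float64{2.0, 1.5}
    fmt.Printf("posterior P(scheming) = %.2f\n", posteriorFromLikelihoodRatios(prior, ratios))
}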

Mostly I want to encourage others to think more carefully about these questions. What would change your probability by roughly how much?

Dominik Peters notes that when o1 does math, it always claims to succeed and is unwilling to admit when it can’t prove something, whereas Claude Sonnet often admits when it doesn’t know and explains why. He suggests benchmarks penalize this misalignment, whereas I would suggest a second score for that – you want to know how often a model can get the answer, and also how much you can trust it. I especially appreciate his warning to beware the term ‘can be shown.’
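A minimal sketch of what that two-score report could look like, using an invented result format rather than any real benchmark’s schema: alongside raw accuracy, you track how often a claimed proof actually holds up.

package main

import "fmt"

// Result records one benchmark item: whether the model claimed to have solved
// it, and whether its answer was actually correct. Both the struct and the
// sample data below are hypothetical.
type Result struct {
    ClaimedSolved   bool
    ActuallyCorrect bool
}

// Scores returns accuracy (fraction of all problems answered correctly) and
// trust (fraction of claimed solutions that hold up). The trust score is what
// penalizes confident "can be shown" bluffing that raw accuracy ignores.
func Scores(results []Result) (accuracy, trust float64) {
    var correct, claimed, claimedCorrect int
    for _, r := range results {
        if r.ActuallyCorrect {
            correct++
        }
        if r.ClaimedSolved {
            claimed++
            if r.ActuallyCorrect {
                claimedCorrect++
            }
        }
    }
    accuracy = float64(correct) / float64(len(results))
    if claimed > 0 {
        trust = float64(claimedCorrect) / float64(claimed)
    }
    return accuracy, trust
}

func main() {
    results := []Result{
        {ClaimedSolved: true, ActuallyCorrect: true},
        {ClaimedSolved: true, ActuallyCorrect: false},  // confidently wrong
        {ClaimedSolved: false, ActuallyCorrect: false}, // admits it cannot prove it
    }
    acc, trust := Scores(results)
    fmt.Printf("accuracy=%.2f trust=%.2f\n", acc, trust) // accuracy=0.33 trust=0.50
}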

I do think, assuming the pattern is real, this is evidence of a substantial alignment failure by OpenAI. It won’t show up on the traditional ‘safety’ evals, but ‘claims to solve a problem when it didn’t’ seems like a very classic case of misaligned behavior. It means your model is willing to lie to the user. If you can’t make that go away, then that is both itself an inherent problem and a sign that other things are wrong.

Consider this outcome in the context of OpenAI’s new strategy of Deliberative Alignment. If you have a model willing to lie, and you give it a new set of rules that includes ‘don’t lie,’ and tell it to go off and think about how to implement the rules, what happens? I realize this is (probably?) technically not how it works, but metaphorically: Does it stop lying, or does it effectively lie about the lying in its evaluations of itself, and figure out how to lie more effectively?

An important case in which verification seems harder than generation is evaluating the reasoning within chain of thought.

Arthur Conmy: Been really enjoying unfaithful chain-of-thought (CoT) research with collaborators recently. Two observations:

  1. Quickly, it’s clear that models are sneaking in reasoning without verbalizing where it comes from (e.g., making an equation that gets the correct answer, but defined out of thin air).

  2. Verification is considerably harder than generation. Even when there are a few hundred tokens, often it takes me several minutes to understand whether the reasoning is sound or not.

This also isn’t just about edge cases; 1) happens with good models like Claude, and 2) is even true for simpler models like Gemma-2 2B.

Charbel-Raphael updates his previously universally negative views on every theory of impact of interpretability, and is now more positive on feasibility and usefulness. He still thinks many other agendas are better, but that only means we should do all of them.

Highlights from Claude’s stand-up routine.

True story, except it’s way more ridiculous all around.

Rob Miles: It’s amazing how much AI risk skepticism still ultimately comes down to “I reckon I could take Einstein in a fight”


AI #98: World Ends With Six Word Story Read More »

how-i-program-with-llms

How I program with LLMs

The second issue is we can do better. I am happy we now live in a time when programmers write their own tests, but we do not hold ourselves to the same standards with tests as we do with production code. That is a reasonable trade-off; there are only so many hours in the day. But what LLMs lack in arithmetical prowess, they make up for in enthusiasm.

Let’s ask for an even better test.

In the tests, implement the simplest, most readable version of the standard code for quartiles over a fixed set of known values in a slice. Then pass the test cases through the standard code and the reservoir sampler and confirm they are within an epsilon of each other. Structure the comparison code such that it can be used in a fuzz test, too.

This got us some new test code:

// referenceQuartiles calculates the exact quartiles for a slice of float64 values
// using linear interpolation, matching the behavior expected from the sampler.
func referenceQuartiles(data []float64) (q1, median, q3 float64) { … }

// compareQuartiles checks if two sets of quartiles are within epsilon of each other.
// Returns true if they match within the tolerance, false otherwise.
func compareQuartiles(q1a, meda, q3a, q1b, medb, q3b, epsilon float64) bool { … }

// checkQuartiles is a test helper that compares sampler output against the reference
// implementation and reports any differences.
func checkQuartiles(t *testing.T, data []float64, epsilon float64) {
    t.Helper()

    // Get reference values
    wantQ1, wantMed, wantQ3 := referenceQuartiles(data)

    // Get sampler values using a large reservoir for accuracy
    qs := NewQuartileSampler(1000)
    for _, v := range data {
        qs.Add(v)
    }
    gotQ1, gotMed, gotQ3 := qs.Quartiles()

    if !compareQuartiles(gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon) {
        t.Errorf("Quartiles mismatch:\ngot  (q1=%v, med=%v, q3=%v)\nwant (q1=%v, med=%v, q3=%v)\nepsilon=%v",
            gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon)
    }
}

The original test from above has been reworked to use checkQuartiles, and we have something new:

func FuzzQuartileSampler(f *testing.F) {
    // Add some seed corpus
    f.Add([]float64{1, 2, 3, 4, 5})

    f.Fuzz(func(t *testing.T, data []float64) {
        // Use a larger epsilon for fuzzing since we might get more extreme values
        checkQuartiles(t, data, 0.2)
    })
}

This is fun because it’s wrong. My running gopls tool immediately says:

fuzzing arguments can only have the following types:
    string, bool, float32, float64,
    int, int8, int16, int32, int64,
    uint, uint8, uint16, uint32, uint64,
    []byte

Pasting that error back into the LLM gets it to regenerate the fuzz test such that it is built around a func(t *testing.T, data []byte) function that uses math.Float64frombits to extract floats from the data slice. Interactions like this point us toward automating the feedback from tools; all it needed was the obvious error message to make solid progress toward something useful. I was not needed.
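A sketch of the shape the regenerated test takes, assuming the same checkQuartiles helper from above and imports of encoding/binary, math, and testing; the decoding details here are a reconstruction for illustration, not necessarily the exact code the LLM produced:

func FuzzQuartileSampler(f *testing.F) {
    // Seed corpus is now raw bytes; each float64 is encoded as 8 bytes.
    f.Add([]byte{0, 0, 0, 0, 0, 0, 240, 63}) // 1.0 in little-endian IEEE 754

    f.Fuzz(func(t *testing.T, raw []byte) {
        // Decode the bytes 8 at a time into float64 values.
        var data []float64
        for len(raw) >= 8 {
            v := math.Float64frombits(binary.LittleEndian.Uint64(raw[:8]))
            raw = raw[8:]
            // Skip NaNs and infinities, which make an epsilon comparison meaningless.
            if math.IsNaN(v) || math.IsInf(v, 0) {
                continue
            }
            data = append(data, v)
        }
        if len(data) == 0 {
            t.Skip("not enough bytes for a single float64")
        }
        checkQuartiles(t, data, 0.2)
    })
}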

Doing a quick survey of the last few weeks of my LLM chat history (which, as I mentioned earlier, is not a proper quantitative analysis by any measure) shows that in more than 80 percent of the cases where there is a tooling error, the LLM can make useful progress without me adding any insight. About half the time, it can completely resolve the issue without me saying anything of note. I am just acting as the messenger.

How I program with LLMs Read More »

new-videos-show-off-larger-nintendo-switch-2,-snap-on-joy-cons

New videos show off larger Nintendo Switch 2, snap-on Joy-Cons

Roll that beautiful Switch footage

Of note in this encased Switch 2 shot from a Genki video: the top USB port, expanded shoulder buttons, and a mysterious C button below the Home button. Genki

Away from CES, Genki’s website was updated Tuesday night with a new video showing encased Switch 2 Joy-Cons attaching to the tablet via a horizontal snap-on motion, as opposed to the vertical slide seen on the original Switch. The video also shows a special lever on the back of the Joy-Cons engaging to detach the Joy-Cons horizontally, seemingly with the aid of a small extendable post near the top of the inner edge of the controller itself.

The inner edges of the Joy-Cons shown in Genki’s video match very closely with other recent leaked photos of the Switch 2 Joy-Cons, right down to the mysterious optical sensor. That sensor can even be seen flashing a laser-like red dot in the Genki promo video, helping to support rumors of mouse-like functionality for the controllers. The Genki video also offers a brief glimpse of the Switch 2 itself sliding into a familiar-looking dock labeled with an embossed Switch logo and a large number 2 next to it.

Genki now has a page up to sign up for Switch 2 accessories news along with this video https://t.co/hNrX8vclPq pic.twitter.com/uD5qwuEHLg

— Wario64 (@Wario64) January 8, 2025

A Genki representative also told Numerama that the company expects the console to be released in April, which is just after Nintendo’s self-imposed deadline for announcing more details about the system. The company had better get a move on, as third-party accessory makers are apparently getting tired of waiting.

New videos show off larger Nintendo Switch 2, snap-on Joy-Cons Read More »

meta-axes-third-party-fact-checkers-in-time-for-second-trump-term

Meta axes third-party fact-checkers in time for second Trump term


Zuckerberg says Meta will “work with President Trump” to fight censorship.

Meta CEO Mark Zuckerberg during the Meta Connect event in Menlo Park, California on September 25, 2024.  Credit: Getty Images | Bloomberg

Meta announced today that it’s ending the third-party fact-checking program it introduced in 2016, and will rely instead on a Community Notes approach similar to what’s used on Elon Musk’s X platform.

The end of third-party fact-checking and related changes to Meta policies could help the company make friends in the Trump administration and in governments of conservative-leaning states that have tried to impose legal limits on content moderation. The operator of Facebook and Instagram announced the changes in a blog post and a video message recorded by CEO Mark Zuckerberg.

“Governments and legacy media have pushed to censor more and more. A lot of this is clearly political,” Zuckerberg said. He said the recent elections “feel like a cultural tipping point toward once again prioritizing speech.”

“We’re going to get rid of fact-checkers and replace them with Community Notes, similar to X, starting in the US,” Zuckerberg said. “After Trump first got elected in 2016, the legacy media wrote nonstop about how misinformation was a threat to democracy. We tried in good faith to address those concerns without becoming the arbiters of truth. But the fact-checkers have just been too politically biased and have destroyed more trust than they’ve created, especially in the US.”

Meta says the soon-to-be-discontinued fact-checking program includes over 90 third-party organizations that evaluate posts in over 60 languages. The US-based fact-checkers are AFP USA, Check Your Fact, Factcheck.org, Lead Stories, PolitiFact, Science Feedback, Reuters Fact Check, TelevisaUnivision, The Dispatch, and USA Today.

The independent fact-checkers rate the accuracy of posts and apply ratings such as False, Altered, Partly False, Missing Context, Satire, and True. Meta adds notices to posts rated as false or misleading and notifies users before they try to share the content or if they shared it in the past.

Meta: Experts “have their own biases”

In the blog post that accompanied Zuckerberg’s video message, Chief Global Affairs Officer Joel Kaplan said the 2016 decision to use independent fact-checkers seemed like “the best and most reasonable choice at the time… The intention of the program was to have these independent experts give people more information about the things they see online, particularly viral hoaxes, so they were able to judge for themselves what they saw and read.”

But experts “have their own biases and perspectives,” and the program imposed “intrusive labels and reduced distribution” of content “that people would understand to be legitimate political speech and debate,” Kaplan wrote.

The X-style Community Notes system lets the community “decide when posts are potentially misleading and need more context, and people across a diverse range of perspectives decide what sort of context is helpful for other users to see… Just like they do on X, Community Notes [on Meta sites] will require agreement between people with a range of perspectives to help prevent biased ratings,” Kaplan wrote.

The end of third-party fact-checking will be implemented in the US before other countries. Meta will also move its internal trust and safety and content moderation teams out of California, Zuckerberg said. “Our US-based content review is going to be based in Texas. As we work to promote free expression, I think it will help us build trust to do this work in places where there is less concern about the bias of our teams,” he said. Meta will continue to take “legitimately bad stuff” like drugs, terrorism, and child exploitation “very seriously,” Zuckerberg said.

Zuckerberg pledges to work with Trump

Meta will “phase in a more comprehensive community notes system” over the next couple of months, Zuckerberg said. Meta, which donated $1 million to Trump’s inaugural fund, will also “work with President Trump to push back on governments around the world that are going after American companies and pushing to censor more,” Zuckerberg said.

Zuckerberg said that “Europe has an ever-increasing number of laws institutionalizing censorship,” that “Latin American countries have secret courts that can quietly order companies to take things down,” and that “China has censored apps from even working in the country.” Meta needs “the support of the US government” to push back against other countries’ content-restriction orders, he said.

“That’s why it’s been so difficult over the past four years when even the US government has pushed for censorship,” Zuckerberg said, referring to the Biden administration. “By going after US and other American companies, it has emboldened other governments to go even further. But now we have the opportunity to restore free expression, and I am excited to take it.”

Brendan Carr, Trump’s pick to lead the Federal Communications Commission, praised Meta’s policy changes. Carr has promised to shift the FCC’s focus from regulating telecom companies to cracking down on Big Tech and media companies that he alleges are part of a “censorship cartel.”

“President Trump’s resolute and strong support for the free speech rights of everyday Americans is already paying dividends,” Carr wrote on X today. “Facebook’s announcements is [sic] a good step in the right direction. I look forward to monitoring these developments and their implementation. The work continues until the censorship cartel is completely dismantled and destroyed.”

Group: Meta is “saying the truth doesn’t matter”

Meta’s changes were criticized by Public Citizen, a nonprofit advocacy group founded by Ralph Nader. “Asking users to fact-check themselves is tantamount to Meta saying the truth doesn’t matter,” Public Citizen co-president Lisa Gilbert said. “Misinformation will flow more freely with this policy change, as we cannot assume that corrections will be made when false information proliferates. The American people deserve accurate information about our elections, health risks, the environment, and much more.”

Media advocacy group Free Press said that “Zuckerberg is one of many billionaires who are cozying up to dangerous demagogues like Trump and pushing initiatives that favor their bottom lines at the expense of everything and everyone else.” Meta appears to be abandoning its “responsibility to protect its many users, and align[ing] the company more closely with an incoming president who’s a known enemy of accountability,” Free Press Senior Counsel Nora Benavidez said.

X’s Community Notes system was criticized in a recent report by the Center for Countering Digital Hate (CCDH), which said it “found that 74 percent of accurate community notes on US election misinformation never get shown to users.” (X previously sued the CCDH, but the lawsuit was dismissed by a federal judge.)

Previewing other changes, Zuckerberg said that Meta will eliminate content restrictions “that are just out of touch with mainstream discourse” and change how it enforces policies “to reduce the mistakes that account for the vast majority of censorship on our platforms.”

“We used to have filters that scanned for any policy violation. Now, we’re going to focus those filters on tackling illegal and high-severity violations, and for lower severity violations, we’re going to rely on someone reporting an issue before we take action,” he said. “The problem is the filters make mistakes, and they take down a lot of content that they shouldn’t. So by dialing them back, we’re going to dramatically reduce the amount of censorship on our platforms.”

Meta to relax filters, recommend more political content

Zuckerberg said Meta will re-tune content filters “to require much higher confidence before taking down content.” He said this means Meta will “catch less bad stuff” but will “also reduce the number of innocent people’s posts and accounts that we accidentally take down.”

Meta has “built a lot of complex systems to moderate content,” he noted. Even if these systems “accidentally censor just 1 percent of posts, that’s millions of people, and we’ve reached a point where it’s just too many mistakes and too much censorship,” he said.

Kaplan wrote that Meta has censored too much harmless content and that “too many people find themselves wrongly locked up in ‘Facebook jail.'”

“In recent years we’ve developed increasingly complex systems to manage content across our platforms, partly in response to societal and political pressure to moderate content,” Kaplan wrote. “This approach has gone too far. As well-intentioned as many of these efforts have been, they have expanded over time to the point where we are making too many mistakes, frustrating our users and too often getting in the way of the free expression we set out to enable.”

Another upcoming change is that Meta will recommend more political posts. “For a while, the community asked to see less politics because it was making people stressed, so we stopped recommending these posts,” Zuckerberg said. “But it feels like we’re in a new era now, and we’re starting to get feedback that people want to see this content again, so we’re going to start phasing this back into Facebook, Instagram, and Threads while working to keep the communities friendly and positive.”


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Meta axes third-party fact-checkers in time for second Trump term Read More »

apple-will-update-ios-notification-summaries-after-bbc-headline-mistake

Apple will update iOS notification summaries after BBC headline mistake

Nevertheless, it’s a serious problem when the summaries misrepresent news headlines, and edge cases where this occurs are unfortunately inevitable. Apple cannot simply fix these summaries with a software update. The only answers are either to help users understand the drawbacks of the technology so they can make better-informed judgments or to remove or disable the feature completely. Apple is apparently going for the former.

We’re oversimplifying a bit here, but generally, LLMs like those used for Apple’s notification summaries work by predicting portions of words based on what came before and are not capable of truly understanding the content they’re summarizing.

Further, these predictions are known to not be accurate all the time, with incorrect results occurring a few times per 100 or 1,000 outputs. As the models are trained and improvements are made, the error percentage may be reduced, but it never reaches zero when countless summaries are being produced every day.

Deploying this technology at scale without users (or even the BBC, it seems) really understanding how it works is risky at best, whether it’s with the iPhone’s summaries of news headlines in notifications or Google’s AI summaries at the top of search engine results pages. Even if the vast majority of summaries are perfectly accurate, there will always be some users who see inaccurate information.

These summaries are read by so many millions of people that the scale of errors will always be a problem, almost no matter how comparatively accurate the models get.
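To put rough numbers on that, here is a tiny back-of-the-envelope Go sketch; both the daily volume and the error rate are illustrative assumptions, not figures from Apple.

package main

import "fmt"

func main() {
    // Illustrative assumptions only; neither number comes from Apple.
    const summariesPerDay = 500_000_000 // notification summaries generated daily
    const errorRate = 0.005             // 0.5%, i.e. "a few times per 1,000 outputs"

    badPerDay := summariesPerDay * errorRate
    fmt.Printf("Expected bad summaries per day: %.0f\n", badPerDay)
    fmt.Printf("With a model 10x more accurate: %.0f\n", badPerDay/10)
}

Even a model ten times more accurate would, under these assumptions, still produce hundreds of thousands of misleading summaries every day, which is the sense in which the scale of errors remains a problem.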

We wrote at length a few weeks ago about how the Apple Intelligence rollout seemed rushed, counter to Apple’s usual focus on quality and user experience. However, with current technology, there is no amount of refinement to this feature that Apple could have done to reach a zero percent error rate with these notification summaries.

We’ll see how well Apple does making its users understand that the summaries may be wrong, but making all iPhone users truly grok how and why the feature works this way would be a tall order.

Apple will update iOS notification summaries after BBC headline mistake Read More »

amd’s-new-laptop-cpu-lineup-is-a-mix-of-new-silicon-and-new-names-for-old-silicon

AMD’s new laptop CPU lineup is a mix of new silicon and new names for old silicon

AMD’s CES announcements include a tease about next-gen graphics cards, a new flagship desktop CPU, and a modest refresh of its processors for handheld gaming PCs. But the company’s largest announcement, by volume, is about laptop processors.

Today the company is expanding the Ryzen AI 300 lineup with a batch of updated high-end chips with up to 16 CPU cores and some midrange options for cheaper Copilot+ PCs. AMD has repackaged some of its high-end desktop chips for gaming laptops, including the first Ryzen laptop CPU with 3D V-Cache enabled. And there’s also a new-in-name-only Ryzen 200 series, another repackaging of familiar silicon to address lower-budget laptops.

Ryzen AI 300 is back, along with high-end Max and Max+ versions

Ryzen AI is back, with Max and Max+ versions that include huge integrated GPUs. Credit: AMD

We came away largely impressed by the initial Ryzen AI 300 processors in August 2024, and new processors being announced today expand the lineup upward and downward.

AMD is announcing the Ryzen AI 7 350 and Ryzen AI 5 340 today, along with identically specced Pro versions of the same chips with a handful of extra features for large businesses and other organizations.

Midrange Ryzen AI processors should expand Copilot+ features into somewhat cheaper x86 PCs.

Credit: AMD

The 350 includes eight CPU cores split evenly between large Zen 5 cores and smaller, slower but more efficient Zen 5C cores, plus a Radeon 860M with eight integrated graphics cores (down from a peak of 16 for the Ryzen AI 9). The 340 has six CPU cores, again split evenly between Zen 5 and Zen 5C, and a Radeon 840M with four graphics cores. But both have the same 50 TOPS NPUs as the higher-end Ryzen AI chips, qualifying both for the Copilot+ label.

For consumers, AMD is launching three high-end chips across the new “Ryzen AI Max+” and “Ryzen AI Max” families. Compared to the existing Strix Point-based Ryzen AI processors, Ryzen AI Max+ and Max include more CPU cores, and all of their cores are higher-performing Zen 5 cores, with no Zen 5C cores mixed in. The integrated graphics also get significantly more powerful, with as many as 40 cores built in—these chips seem to be destined for larger thin-and-light systems that could benefit from more power but don’t want to make room for a dedicated GPU.

AMD’s new laptop CPU lineup is a mix of new silicon and new names for old silicon Read More »