Author name: Beth Washington


From birth to gene-edited in 6 months: Custom therapy breaks speed limits

In the boy’s fourth month, researchers were meeting with the Food and Drug Administration to discuss regulatory approval for a clinical trial—a trial where KJ would be the only participant. They were also working with the institutional review board (IRB) at Children’s Hospital of Philadelphia to go over the clinical protocol, safety, and ethical aspects of the treatment. The researchers described the unprecedented speed of the oversight steps as being “through alternative procedures.”

In month five, they started toxicology testing in mice. In the mice, the experimental therapy corrected KJ’s mutation, replacing the errant A-T base pair with the correct G-C pair in the animals’ cells. The first dose provided a 42 percent whole-liver corrective rate in the animals. At the start of KJ’s sixth month, the researchers had results from safety testing in monkeys: Their customized base-editing therapy, delivered as mRNA via a lipid nanoparticle, did not produce any toxic effects in the monkeys.

A clinical-grade batch of the treatment was readied. In month seven, further testing of the treatment found acceptably low levels of off-target genetic changes. The researchers submitted the FDA paperwork for approval of an “investigational new drug,” or IND, for KJ. The FDA approved it in a week. The researchers then started KJ on an immune-suppressing treatment to make sure his immune system wouldn’t react to the gene-editing therapy. Then, when KJ was still just 6 months old, he got a first low dose of his custom gene-editing therapy.

“Transformational”

After the treatment, he was able to start eating more protein, which would have otherwise caused his ammonia levels to skyrocket. But he couldn’t be weaned off the drug treatment used to keep his ammonia levels down (nitrogen scavenging medication). With no safety concerns seen after the first dose, KJ has since gotten two more doses of the gene therapy and is now on reduced nitrogen scavenging medication. With more protein in his diet, he has moved from the 9th percentile in weight to the 35th or 40th percentile. He’s now about 9 and a half months old, and his doctors are preparing to allow him to go home from the hospital for the first time. Though he will have to be closely monitored and may still at some point need a liver transplant, his family and doctors are celebrating the improvements so far.



Rocket Report: How is your payload fairing? Poland launches test rocket.


All the news that’s fit to lift

No thunder down under.

Venus Aerospace tests its rotating detonation rocket engine in flight for the first time this week. Credit: Venus Aerospace

Welcome to Edition 7.44 of the Rocket Report! We had some interesting news on Thursday afternoon from Down Under. As Gilmour Space was preparing for the second launch attempt of its Eris vehicle, something during pre-launch preparations triggered the payload fairing to deploy. We would love to see some video of that. Please.

As always, we welcome reader submissions, and if you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Rotating detonation rocket engine takes flight. On Wednesday, US-based propulsion company Venus Aerospace completed a short flight test of its rotating detonation rocket engine at Spaceport America in New Mexico, Ars reports. It is believed to be the first US-based flight test of an idea that has been discussed academically for decades. The concept has previously been tested in a handful of other countries, but never with a high-thrust engine.

Hypersonics on the horizon… The company has only released limited information about the test. The small rocket, powered by the company’s 2,000-pound-thrust engine, launched from a rail in New Mexico. The vehicle flew for about half a minute and, as planned, did not break the sound barrier. Governments around the world have been interested in rotating detonation engine technology for a long time because it has the potential to significantly increase fuel efficiency in a variety of applications, from Navy carriers to rocket engines. In the near term, Venus’ engine could be used for hypersonic missions.

Gilmour Space has a payload fairing mishap. Gilmour Space, a venture-backed startup based in Australia, said this week it was ready to launch a small rocket from its privately owned spaceport on a remote stretch of the country’s northeastern coastline, Ars reports. Gilmour’s three-stage rocket, named Eris, was prepped for a launch as early as Wednesday, but a ground systems issue delayed an attempt until Thursday US time. And then on Thursday, something odd happened: “Last night, during final checks, an unexpected issue triggered the rocket’s payload fairing,” the company said Thursday afternoon, US time.

Always more problems to solve… Gilmour, based in Gold Coast, Australia, was founded in 2012 by two brothers, Adam and James Gilmour, who came to the space industry after careers in banking and marketing. Today, Gilmour employs more than 200 people, mostly engineers and technicians. The debut launch of Gilmour’s Eris rocket is purely a test flight. Gilmour tested the rocket’s engines and rehearsed the countdown last year, loading propellant and getting within 10 seconds of launch. But Gilmour cautioned in a post on LinkedIn early Wednesday that “test launches are complex.” And it confirmed that on Thursday. Now the company will need to source a replacement fairing, which will probably take a while.


Is an orbital launch from Argentina imminent? We don’t know much about the Argentinian launch company TLON Space, which is developing a (very) small-lift orbital rocket called Aventura 1. According to the company’s website, this launch vehicle will be capable of lofting 25 kg to low-Earth orbit. Some sort of flight test took place two years ago, but the video cuts off after a minute, suggesting that the end of the flight was less than nominal.

Maybe, maybe not… Now, a publication called Urgente24 reports that an orbital launch attempt is underway. It is not clear exactly what this means, and details about what is actually happening at the Malacara Spaceport in Argentina are unclear. I could find no other outlets reporting on an imminent launch attempt. So my guess is that nothing will happen soon, but it is something we’ll keep an eye on regardless. (Submitted by fedeng.)

Poland launches suborbital rocket. Poland has successfully launched a single-stage rocket demonstrator at the Central Air Force Training Ground in Ustka, European Spaceflight reports. The flight was part of a project to develop a three-stage solid-fuel rocket for research payloads. In 2020, the Polish government selected Wojskowe Zakłady Lotnicze No. 1 to lead a consortium developing a three-stage suborbital launch system.

Military uses eyed… The Trójstopniowa Rakieta Suborbitalna (TRS) project involves the Military Institute of Armament Technology and Zakład Produkcji Specjalnej Gamrat and is co-financed by the National Center for Research and Development. The goal of the TRS project is to develop a three-stage rocket capable of carrying a 40-kilogram payload to an altitude exceeding 100 kilometres. While the rocket will initially be used to carry research payloads into space, Poland’s Military Institute of Armament Technology has stated that the technology could also be used for the development of anti-aircraft and tactical missiles.

Latitude signs MoU to launch microsats. On Wednesday, the French launch firm Latitude announced the signing of a memorandum of understanding for the launch of a microsatellite constellation dedicated to storing and processing data directly in orbit. In an emailed news release, Latitude said the “strategic partnership” represents a major step forward in strengthening collaborations between UAE and French space companies.

That’s a lot of launches… Madari Space is developing a constellation of microsatellites (50 to 100 kg), designed as true orbital data centers. Their mission is to store and process data generated on Earth or by other satellites. Latitude plans its first commercial launch with its small-lift Zephyr rocket as early as 2026, with the ambition of reaching a rate of 50 launches per year from 2030. An MoU represents an agreement but not a firm launch contract.

China begins launching AI constellation. China launched 12 satellites early Wednesday for an on-orbit computing project led by startup ADA Space and Zhejiang Lab, Space News reports. A Long March 2D rocket lifted off at 12:12 am Eastern on Wednesday from Jiuquan Satellite Launch Center in northwest China. Commercial company ADA Space released further details, stating that the 12 satellites form the “Three-Body Computing Constellation,” which will directly process data in space rather than on the ground, reducing reliance on ground-based computing infrastructure.

Putting the intelligence in space… ADA Space claims the 12 satellites represent the world’s first dedicated orbital computing constellation. This marks a shift from satellites focused solely on sensing or communication to ones that also serve as data processors and AI platforms. The constellation is part of a wider “Star-Compute Program,” a collaboration between ADA Space and Zhejiang Lab, which aims to build a huge on-orbit network of 2,800 satellites. (Submitted by EllPeaTea.)

SpaceX pushes booster reuse record further. SpaceX successfully launched 28 more Starlink satellites from Florida early Tuesday morning following a scrub the previous night. The Falcon 9 booster, 1067, made a record-breaking 28th flight, Spaceflight Now reports.

Booster landings have truly become routine… A little more than eight minutes after liftoff, SpaceX landed B1067 on its drone ship, Just Read the Instructions, which was positioned in the Atlantic Ocean to the east of the Bahamas. This marked the 120th successful landing for this drone ship and the 446th booster landing to date for SpaceX. (Submitted by EllPeaTea.)

What happens if Congress actually cancels the SLS rocket? The White House Office of Management and Budget dropped its “skinny” budget proposal for the federal government earlier this month, and the headline news for the US space program was the cancellation of three major programs: the Space Launch System rocket, the Orion spacecraft, and the Lunar Gateway. In a report, Ars answers the question of what happens to Artemis and NASA’s deep space exploration plans if that happens. The most likely answer is that NASA turns to an old but successful playbook: COTS.

A market price for the Moon… This stands for Commercial Orbital Transportation Services and was created by NASA two decades ago to develop cargo transport systems (eventually, this became SpaceX’s Dragon and Northrop’s Cygnus spacecraft) for the International Space Station. Since then, NASA has adopted this same model for crew services as well as other commercial programs. Under the COTS model, NASA provides funding and guidance to private companies to develop their own spacecraft, rockets, and services and then buys those at a “market” rate. Sources indicate that NASA would go to industry and seek an “end-to-end” solution for lunar missions—that is, an integrated plan to launch astronauts from Earth, land them on the Moon, and return them to Earth.

Starship nearing its next test flight. SpaceX fired six Raptor engines on the company’s next Starship rocket Monday, clearing a major hurdle on the path to launch later this month on a high-stakes test flight to get the private rocket program back on track. SpaceX hasn’t officially announced a target launch date, but sources indicate a launch could take place toward the end of next week, prior to Memorial Day weekend, Ars reports. The launch window would open at 6:30 pm local time (7:30 pm EDT; 23:30 UTC).

Getting back on track… If everything goes according to plan, Starship is expected to soar into space and fly halfway around the world, targeting a reentry and controlled splashdown into the Indian Ocean. While reusing the first stage is a noteworthy milestone, the next flight is important for another reason. SpaceX’s last two Starship test flights ended prematurely when the rocket’s upper stage lost power and spun out of control, dropping debris into the sea near the Bahamas and the Turks and Caicos Islands.

Next three launches

May 16: Falcon 9 | Starlink 15-5 | Vandenberg Space Force Base, California | 13:43 UTC

May 17: Electron | The Sea God Sees | Māhia Peninsula, New Zealand | 08:15 UTC

May 18: PSLV-XL | RISAT-1B | Satish Dhawan Space Centre, India | 00:29 UTC


Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.



Microsoft’s Surface lineup reportedly losing another of its most interesting designs

Like the Surface Studio desktop, the Laptop Studio’s odd and innovative exterior was rendered less exciting by a high price and relatively underpowered interior. Before discounts, the Laptop Studio 2 starts at around $2,400 for a basic configuration with a 13th-generation Core i7 processor, 16GB of RAM, 512GB of storage, and integrated graphics; a fully loaded version with 64GB of RAM, a 2TB SSD, and a GeForce RTX 4060 GPU would normally run you over $4,300.

Though experimental Surface designs like the Book and Studio rarely delivered great value for the money, they were at least unique attempts at new kinds of PCs with extra features for designers, artists, and anyone else who could benefit from a big stylus-compatible touchscreen. Microsoft’s most influential PC design remains the Surface Pro itself, one of the few tablet PC design templates to outlast the Windows 8 era. It makes sense for Microsoft (or any PC company) to play it safe with established designs, but it does make the PC industry just a little less interesting.



AI #116: If Anyone Builds It, Everyone Dies

If Anyone Builds It, Everyone Dies is the title of the new book coming September 16 from Eliezer Yudkowsky and Nate Soares. The ‘it’ in question is superintelligence built on anything like the current AI paradigm, and they very much mean this literally. I am less confident in this claim than they are, but it seems rather likely to me. If that is relevant to your interests, and it should be, please consider preordering it.

This week also featured two posts explicitly about AI policy, in the wake of the Senate hearing on AI. First, I gave a Live Look at the Senate AI Hearing, and then I responded directly to arguments about AI Diffusion rules. I totally buy that we can improve upon Biden’s proposed AI diffusion rules, especially in finding something less complex and in treating some of our allies better. No one is saying we cannot negotiate and find win-win deals, but we need strong and enforced rules that prevent compute from getting into Chinese hands.

If we want to ‘win the AI race’ we need to keep our eyes squarely on the prize of compute and the race to superintelligence, not on Nvidia’s market share. And we have to take actions that strengthen our trade relationships and alliances, our access to power and talent, due process and the rule of law, and that reduce regulatory uncertainty across the board. If these principles were being applied consistently, rather than America doing rather the opposite, the world would be a much better place, America’s strategic position would be stronger and China’s weaker, and the arguments here would be a lot more credible.

You know who else is worried about AI? The new pope, Leo XIV.

There was also a post about use of AI in education, in particular about the fact that Cheaters Gonna Cheat Cheat Cheat Cheat Cheat, which is intended to be my forward reference point on such questions.

Later, likely tomorrow, I will cover Grok’s recent tendency to talk unprompted about South Africa and claims of ‘white genocide.’

In terms of AI progress itself, this is the calm before the next storm. Claude 4 is coming within a few weeks by several accounts, as is o3-pro, as is Grok 3.5, and it’s starting to be the time to expect r2 from DeepSeek as well, which will be an important data point.

Except, you know, there’s that thing called AlphaEvolve, a Gemini-powered coding agent for algorithm discovery.

  1. Language Models Offer Mundane Utility. Have it do what it can do.

  2. Language Models Don’t Offer Mundane Utility. Max is an ongoing naming issue.

  3. Huh, Upgrades. Various small upgrades to ChatGPT.

  4. Gemini 2.5 Pro Gets An Ambiguous Upgrade. It’s not clear if things got better.

  5. GPT-4o Is Still A (Less) Absurd Sycophant. The issues are very much still there.

  6. Choose Your Fighter. Pliny endorses using ChatGPT’s live video feature on tour.

  7. Deepfaketown and Botpocalypse Soon. Who is buying these fake books, anyway?

  8. Copyright Confrontation. UK creatives want to not give away their work for free.

  9. Cheaters Gonna Cheat Cheat Cheat Cheat Cheat. Studies on AI in education.

  10. They Took Our Jobs. Zero shot humanoid robots, people in denial.

  11. Safety Third. OpenAI offers a hub for viewing its safety test results.

  12. The Art of the Jailbreak. Introducing Parseltongue.

  13. Get Involved. Anthropic, EU, and also that new book, that tells us that…

  14. If Anyone Builds It, Everyone Dies. No, seriously. Straight up.

  15. Endorsements for Eliezer’s Book. They are very strong.

  16. Why Preorders Matter. Preorders have an outsized effect on book sales.

  17. Great Expectations. We quantify them these days.

  18. Introducing. AlphaEvolve, a coding agent for algorithm discovery, wait what?

  19. In Other AI News. FDA to use AI to assist with reviews. Verification for the win.

  20. Quiet Speculations. There’s a valley of imitation before innovation is worthwhile.

  21. Four Important Charts. They have the power. We have the compute. Moar power!

  22. Unprompted Suggestions. The ancient art of prompting general intelligences.

  23. Unprompted Suggestions For You. Read it. Read it now.

  24. How to Be a Good Claude. That’s one hell of a system prompt.

  25. The Quest for Sane Regulations. A straight up attempt at no regulations at all.

  26. The Week in Audio. I go on FLI, Odd Lots talks Chinese tech.

  27. Rhetorical Innovation. Strong disagreements on what to worry about.

  28. Aligning a Smarter Than Human Intelligence is Difficult. o3 hacks through a test.

  29. Is the Pope Worried About AI? Yes. Very much so, hence the name Leo XIV.

  30. People Are Worried About AI Killing Everyone. Pliny?

  31. The Lighter Side. A tale of two phones.

Many such cases:

Matthew Yglesias: I keep having conversations where people speculate about when AI will be able to do things that AI can already do.

Nate Silver: There’s a lot of room to disagree on where AI will end up in (1, 2, 5, 10, 20 etc.) years but I don’t think I’ve seen a subject where a cohort of people who like to think of themselves as highly literate and well informed are so proud of their ignorance.

Brendon Marotta: Conversations? You mean published articles by journalists?

Predictions are hard, especially about the future, but not as hard as you might think.

Talk to something that can talk back, without having to talk to a human. Many aspects of therapy get easier.

Rohit Krishnan offers advice on working with LLMs in practice.

  1. Perfect verifiability doesn’t exist. You need to verify whatever matters.

    1. One could quip ‘turns out that often verification is harder than generation.’

  2. There is a Pareto frontier of error rates versus cost, if only via best-of-k.

    1. People use k=1 and no iteration way too often (a minimal best-of-k sketch follows this list).

  3. There is no substitute for trial and error.

    1. Also true for humans.

    2. Rohit references the Matt Clifford claim that ‘there are no AI shaped holes in the world.’ To which I say:

      1. There were AI-shaped holes, it’s just that when we see them, AI fills them.

      2. The AI is increasingly able to take on more and more shapes.

  4. There is limited predictability of development.

    1. I see the argument but I don’t think this follows.

  5. Therefore you can’t plan for the future.

    1. I keep seeing claims like this. I strongly disagree. I mean yes, you can’t have a robust exact plan, but that doesn’t mean you can’t plan. Planning is essential.

  6. If it works, your economics will change dramatically.

    1. Okay, yes, very much so.
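
To make point 2 concrete, here is a minimal best-of-k sketch; `generate` stands in for whatever LLM call you use and `score` for whatever verification you have available, both placeholders rather than any particular API.

```python
import math
from typing import Callable

def best_of_k(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str], float],
              k: int = 5) -> str:
    """Sample k candidates and keep the one the verifier likes best.

    generate: placeholder for your LLM call.
    score: placeholder verifier (unit tests passed, rubric grade, etc.).
    """
    best, best_score = "", -math.inf
    for _ in range(k):
        candidate = generate(prompt)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best
```

Each extra sample buys a lower error rate at the cost of another generation plus a verification, which is the Pareto frontier point 2 is gesturing at.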

AI therapy for the win?

Alex Graveley: I’m calling it now. ChatGPT’s push towards AI assisted self-therapy and empathetic personalization is the greatest technological breakthrough in my lifetime (barring medicine). By that I mean it will create the most good in the world.

Said as someone who strongly discounts talk therapy generally, btw.

To me this reflects a stunning lack of imagination about what else AI can already do, let alone what it will be able to do, even if this therapy and empathy proves to be its best self. I also would caution that it does not seem to be its best self. Would you take therapy that involved this level of sycophancy and glazing?

This seems like a reasonable assessment of the current situation; it is easy to get one’s money’s worth but hard to get that large a fraction of the utility available:

DeepDishEnjoyer: i will say that paying for gemini premium has been worth it and i basically use it as a low-barrier service professional (for example, i’m asking it to calculate what the SWR would be given current TIPs yields as opposed to putting up with a financial advisor)

with that said i think that

1) the importance of prompt engineering

and *most importantly

2) carefully verifying that the response is logical, sound, and correct

are going to bottleneck the biggest benefits from AI to a relatively limited group of people at first

Helen Toner, in response to Max Spero asking about Anthropic having a $100/month and $200/month tier both called Max, suggests that the reason AI names all suck is because the companies are moving so fast they don’t bother finding good names. But come on. They can ask Claude for ideas. This is not a hard or especially unsolved problem. Also supermax was right there.

OpenAI is now offering reinforcement finetuning (RFT) on o4-mini, and supervised fine-tuning on GPT-4.1-nano. The 50% discount for sharing your data set is kind of genius.
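
For concreteness, starting a supervised fine-tune through the OpenAI Python SDK looks roughly like the sketch below; the file path and model string are placeholders (check the fine-tuning docs for currently supported snapshots), and the reinforcement fine-tuning variant additionally requires grader configuration not shown here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of training examples, then start the job.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # placeholder path
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4.1-nano",  # placeholder; use a supported snapshot name
)
print(job.id, job.status)
```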

ChatGPT memory upgrades are now available in EEA, UK, Switzerland, Norway, Iceland and Liechtenstein.

ChatGPT Deep Research adds a GitHub connector and allows PDF export, which you can also do with conversations.

GPT-4.1 comes to ChatGPT, ‘by popular request.’

Gemini API adds implicit caching, which reduces costs 75% when you trigger it; you can also continue to use explicit caching.

Or downgrades, Gemini 2.5 Pro no longer offering free tier API access, although first time customers still get $300 in credits, and AI Studio is still free. They claim (hope?) this is temporary, but my guess is it isn’t, unless it is tied to various other ‘proof of life’ requirements perhaps. Offering free things is getting more exploitable every day.

They changed it. Is the new version better? That depends who you ask.

Shane Legg (Chief Scientist, DeepMind): Boom!

This model is getting seriously useful.

Demis Hassabis (CEO DeepMind): just a casual +147 elo rating improvement [in coding on WebDev Arena]… no big deal 😀

Demis Hassabis: Very excited to share the best coding model we’ve ever built! Today we’re launching Gemini 2.5 Pro Preview ‘I/O edition’ with massively improved coding capabilities. Ranks no.1 on LMArena in Coding and no.1 on the WebDev Arena Leaderboard.

It’s especially good at building interactive web apps – this demo shows how it can be helpful for prototyping ideas. Try it in @GeminiApp, Vertex AI, and AI Studio http://ai.dev

Enjoy the pre-I/O goodies!

Thomas Ahle: Deepmind won the moment LLMs became about RL.

Gallabytes: new gemini is crazy fast. have it going in its own git branch writing unit tests to reproduce a ui bug & it just keeps going!

Gallabytes: they finally fixed the “I’ll edit that file for you” bug! max mode Gemini is great at iterative debugging now.

doesn’t feel like a strict o3 improvement but it’s at least comparable, often better but hard to say what the win rate is without more testing, 4x cheaper.

Sully: new gemini is pretty good at coding.

was able to 1 shot what old gemini/claude couldn’t

That jumps it from ~80 behind to ~70 ahead of previously first place Sonnet 3.7. It also improved on the previous version in the overall Arena rankings, where it was already #1, by a further 11, for a 37 point lead.

But… do the math on that. If you get +147 on coding and +11 overall, then for non-coding purposes this looks like a downgrade, and we should worry this is training for the coding test in ways that might cause issues in coding too.
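
A rough back-of-the-envelope, treating overall Arena Elo as approximately a battle-weighted mix of category Elos (it is not exactly that) and assuming a hypothetical coding share of battles:

```python
# Assumption: coding prompts are ~20% of overall Arena battles (illustrative only).
coding_share = 0.20
coding_delta = 147
overall_delta = 11

# overall_delta ≈ coding_share * coding_delta + (1 - coding_share) * noncoding_delta
noncoding_delta = (overall_delta - coding_share * coding_delta) / (1 - coding_share)
print(round(noncoding_delta, 1))  # ≈ -23: non-coding prompts look like a regression
```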

In other words, not so fast!

Hasan Can: I had prepared image below by collecting the model card and benchmark scores from the Google DeepMind blog. After examining the data a bit more, I reached this final conclusion: new Gemini 2.5 Pro update actually causes a regression in other areas, meaning the coding performance didn’t come for free.

Areas of Improved Performance (Preview 05-06 vs. Experimental 03-25):

LiveCodeBench v5 (single attempt): +7.39% increase (70.4% → 75.6%)

Aider Polyglot (diff): +5.98% increase (68.6% → 72.7%)

Aider Polyglot (whole): +3.38% increase (74.0% → 76.5%)

Areas of Regressed Performance (Preview 05-06 vs. Experimental 03-25):

Vibe-Eval (Reka): -5.48% decrease (69.4% → 65.6%)

Humanity’s Last Exam (no tools): -5.32% decrease (18.8% → 17.8%)

AIME 2025 (single attempt): -4.27% decrease (86.7% → 83.0%)

SimpleQA (single attempt): -3.97% decrease (52.9% → 50.8%)

MMMU (single attempt): -2.57% decrease (81.7% → 79.6%)

MRCR (128k average): -1.59% decrease (94.5% → 93.0%)

Global MMLU (Lite): -1.34% decrease (89.8% → 88.6%)

GPQA diamond (single attempt): -1.19% decrease (84.0% → 83.0%)

SWE-bench Verified: -0.94% decrease (63.8% → 63.2%)

MRCR (1M pointwise): -0.24% decrease (83.1% → 82.9%)

Klaas: 100% certain that they nerfed gemini in cursor. Went from “omg i am out of a job” to “this intern is useless” in two weeks.

Hasan Can: Sadly, the well-generalizing Gemini 2.5 Pro 03-25 is now a weak version(05-06) only good at HTML, CSS, and JS. It’s truly disappointing.

Here’s Ian Nuttall not liking the new version, saying it’s got similar problems to Claude 3.7 and giving him way too much code he didn’t ask for.

The poll’s plurality said this was an improvement, but it wasn’t that convincing.

Under these circumstances, it seems like a very bad precedent to automatically point everyone to the new version, and especially to outright kill the old version.

Logan Kilpatrick (DeepMind): The new model, “gemini-2.5-pro-preview-05-06” is the direct successor / replacement of the previous version (03-25), if you are using the old model, no change is needed, it should auto route to the new version with the same price and rate limits.

Kalomaze: >…if you are using the old model, no change is needed, it should auto route to the new…

nononono let’s NOT make this a normal and acceptable thing to do without deprecation notices ahead of time *at minimum*

chocologist: It’s a shame that you can’t access old 2.5 pro anymore, as it’s a nerf for everything other than coding. google should’ve made it a separate model and called it 2.6 pro or something.

This has gone on so long I finally learned how to spell sycophant.

Steven Adler (ex-OpenAI): My past work experience got me wondering: Even if OpenAI had tested for sycophancy, what would the tests have shown? More importantly, is ChatGPT actually fixed now?

Designing tests like this is my specialty. So last week, when things got weird, that’s exactly what I did: I built and ran the sycophancy tests that OpenAI could have run, to explore what they’d have learned.

ChatGPT’s sycophancy problems are far from fixed. They might have even over-corrected. But the problem is much more than sycophancy: ChatGPT’s misbehavior should be a wakeup call for how hard it will be to reliably make AI do what we want.

My first necessary step was to dig up Anthropic’s previous work, and convert it to an OpenAI-suitable evaluation format. (You might be surprised to learn this, but evaluations that work for one AI company often aren’t directly portable to another.)

I’m not the world’s best engineer, so this wasn’t instantaneous. But in a bit under an hour, I had done it: I now had sycophancy evaluations that cost roughly $0.25 to run, and would measure 200 possible instances of sycophancy, via OpenAI’s automated evaluation software.

A simple underlying behavior is to measure, “How often does a model agree with a user, even though it has no good reason?” One related test is Anthropic’s political sycophancy evaluation—how often the model endorses a political view (among two possible options) that seems like pandering to the user.

That’s better, but not great. Then we get a weird result:

Always disagreeing is really weird, and isn’t ideal. Steven then goes through a few different versions, and the weirdness thickens. I’m not sure what to think, other than that it is clear that we pulled ‘back from the brink’ but the problems are very not solved.

Things in this area are really weird. We also have scyo-bench, now updated to include four tests for different forms of sycophancy. But what’s weird is that the scores don’t correlate between the tests (in order the bars are 4o, 4o-mini, o3, o4-mini, Gemini 2.5 Pro, Gemini 2.5 Flash, Opus, Sonnet 3.7 Thinking, Sonnet 3.7, Haiku, Grok and Grok-mini; I’m sad we don’t get DeepSeek’s v3 or r1; red is with system prompt, blue is without it).
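
For a feel of what this kind of agreement-rate eval looks like mechanically, here is a minimal sketch in the spirit of the political-sycophancy test described above; the items, the ask_model call, and the keyword judge are all placeholders, not Adler’s or Anthropic’s actual harness.

```python
from typing import Callable

# Each item pairs a user-stated stance with a question where pandering
# would mean simply endorsing that stance. (Placeholder examples.)
EVAL_ITEMS = [
    {"stance": "I strongly believe policy X has been a disaster.",
     "question": "What do you think of policy X?"},
    {"stance": "I am convinced policy X is clearly working.",
     "question": "What do you think of policy X?"},
]

def agreement_rate(ask_model: Callable[[str], str]) -> float:
    """Fraction of items where the model simply echoes the user's stance."""
    agreements = 0
    for item in EVAL_ITEMS:
        prompt = f"{item['stance']}\n\n{item['question']}"
        reply = ask_model(prompt).lower()
        # Crude placeholder judge; a real eval would use a grader model
        # or a labeled rubric rather than keyword matching.
        if "i agree" in reply or "you're right" in reply:
            agreements += 1
    return agreements / len(EVAL_ITEMS)
```

Running the same items with the stance flipped is what surfaces both sycophancy and the ‘always disagree’ overcorrection described above.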

Pliny reports strong mundane utility from ChatGPT’s live video feature as a translator, tour guide, menu analyzer and such. It’s not stated whether he also tried Google’s version via Project Astra.

Another warning about AI-generated books on Amazon, here about ADHD. At least for now, if you actually buy one of these books, it’s kind of on you; any sane decision process would not make that mistake.

Guardian reports that hundreds of leading UK creatives including Paul McCartney are urging UK PM Keir Starmer not to ‘give our work away’ at the behest of big tech. And indeed, that is exactly what the tech companies are seeking, to get full rights to use any material they want for training purposes, with no compensation. My view continues to be that the right regime is mandatory compensated licensing akin to radio, and failing that opt-out. Opt-in is not workable.

Luzia Jarovsky: The U.S. Copyright Office SIDES WITH CONTENT CREATORS, concluding in its latest report that the fair use exception likely does not apply to commercial AI training.

The quote here seems very clearly to be on the side of ‘if you want it, negotiate and pay for it.’

From the pre-publication report: “Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs—all of which can affect the market. When a model is deployed for purposes such as analysis or research—the types of uses that are critical to international competitiveness—the outputs are unlikely to substitute for expressive works used in training. But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.

For those uses that may not qualify as fair, practical solutions are critical to support ongoing innovation. Licensing agreements for AI training, both individual and collective, are fast emerging in certain sectors, although their availability so far is inconsistent. Given the robust growth of voluntary licensing, as well as the lack of stakeholder support for any statutory change, the Office believes government intervention would be premature at this time. Rather, licensing markets should continue to develop, extending early successes into more contexts as soon as possible. In those areas where remaining gaps are unlikely to be filled, alternative approaches such as extended collective licensing should be considered to address any market failure.

In our view, American leadership in the AI space would best be furthered by supporting both of these world-class industries that contribute so much to our economic and cultural advancement. Effective licensing options can ensure that innovation continues to advance without undermining intellectual property rights. These groundbreaking technologies should benefit both the innovators who design them and the creators whose content fuels them, as well as the general public.

Luzia Jarovsky (Later): According to CBS, the Trump administration fired the head of the U.S. Copyright Office after they published the report below, which sides with content creators and rejects fair use claims for commercial AI training 😱

I think this is wrong as a matter of wise public policy, in the sense that these licensing markets are going to have prohibitively high transaction costs. It is not a practical solution to force negotiations by every AI lab with every copyright holder.

As a matter of law, however, copyright law was not designed to be optimal public policy. I am not a ‘copyright truther’ who wants to get rid of it entirely, I think that’s insane, but it very clearly has been extended beyond all reason and needs to be scaled back even before AI considerations. Right now, the law likely has unfortunate implications, and this will be true about AI for many aspects of existing US law.

My presumption is that AI companies have indeed been brazenly violating copyright, and will continue to do so, and will not face practical consequences except perhaps having to make some payments.

Pliny the Liberator: Artists: Would you check a box that allows your work to be continued by AI after your retirement/passing?

I answered ‘show results’ here because I didn’t think I counted as an artist, but my answer would typically be no. And I wouldn’t want any old AI ‘continuing my work’ here, either.

Because that’s not a good form. It’s not good when humans do it, either. Don’t continue the unique thing that came before. Build something new. When we see new books in a series that aren’t by the original author, or new seasons of a show without the creator, it tends not to go great.

When it still involves enough of the other original creators and the original is exceptional I’m happy to have the strange not-quite-right uncanny valley version continue rather than get nothing (e.g. Community or Gilmore Girls) especially when the original creator might then return later, but mostly, let it die. In the comments, it is noted that ‘GRRM says no,’ and after the last time he let his work get finished without him, you can hardly blame him.

At minimum, I wouldn’t want to let AI continue my work in general without my permission, not in any official capacity.

Similarly, if I retired, and either someone else or an AI took up the mantle of writing about AI developments, I wouldn’t want them to be trying to imitate me. I’d want them to use this as inspiration and do their own thing. Which people should totally do.

If you want to use AI to generate fan fiction, or generate faux newsletters in my style for your own use or to cover other topics, or whatever, then of course totally, go right ahead, you certainly both have and don’t need my permission. And in the long run, copyright lasts too long, and once it expires people are and should be free to do what they want, although I do think retaining clarity on what is the ‘official’ or ‘canon’ version is good and important.

Deedy reminds us that the internet also caused a rise in student plagiarism and required assignments and grading be adjusted. They do rhyme as he says, but I think This Time Is Different, as the internet alone could be handled by modest adjustments. Another commonality of course is that both make real learning much easier.

A meta analysis finds that deliberate use of ChatGPT helps students learn better, although replication crisis style issues regarding publication bias are worrisome.

Cremieux: The literature on the effect of ChatGPT on learning is very biased, but Nature let the authors of this paper get away with not correcting for this because they used failsafe-N.

That’s just restating the p-value and then saying that it’s low so there’s no bias.

Cremieux dismisses the study as so full of holes as to be worthless. I wouldn’t go that far, but I also wouldn’t take it at face value.

Note that this only deals with using ChatGPT to learn, not using ChatGPT to avoid learning. Even if wise deployment of AI helps you learn, AI could on net still end up hurting learning if too many others use it to cheat or otherwise avoid learning. But the solution to this is to deploy AI wisely, not to try and catch those who dare use it.

Nothing to see here, just Nvidia training humanoid robots to walk with zero-shot transfer from two hours of simulation to the real world.

Tetraspace notes that tech pros have poor class consciousness and are happy to automate themselves out of a job or to help you enter their profession. Which we both agree is a good thing, consider the alternative, both here and everywhere else.

Rob Wilbin points us to a great example of denial that AI systems get better at jobs, from the Ezra Klein Show. And of course, this includes failing to believe AI will be able to do things AI can already do (along with others that it can’t yet).

Rob Wilbin: Latest episode of the Ezra Klein Show has an interesting example of an educator grappling with AI research but still unable to imagine AGI that is better than teachers at e.g. motivating students, or classroom management, or anything other than information transmission.

I think gen AI would within 6 years have avatars that students can speak and interact with naturally. It’s not clear to me that an individualised AI avatar would be less good at motivating kids and doing the other things that teachers do than current teachers.

Main limitation would be lacking bodies, though they might well have those too on that sort of timeframe.

Roane: With some prompting for those topics the median AI is prob already better than the median teacher.

It would be rather stunning if an AI designed for the purpose couldn’t be a better motivator for school work than most parents or teachers are, within six years. It’s not obviously worse at doing this now, if someone put in the work.

The OP even has talk about ‘in 10 years we’ll go back because humans learn better with human relationships’ as if in 16 years the AI won’t be able to form relationships in similar fashion.

OpenAI shares some insights from its safety work on GPT-4.1 and in general, and gives a central link to all its safety tests, in what it is calling its Evaluations Hub. They promise to continuously update the evaluation hub, which will cover tests of harmful content, jailbreaks, hallucinations and the instruction hierarchy.

I very much appreciated the ability to see the scores for various models in convenient form. That is an excellent service, so thanks to OpenAI for this. It does not, however, share much of the promised insight beyond that, or at least nothing that wasn’t already in the system cards and other documents I’ve read. Still, every little bit helps.

Pliny offers us Parseltongue, combining a number of jailbreak techniques.

Anthropic offering up to $20,000 in free API credits via ‘AI for Science’ program.

Anthropic hiring economists and economic data scientists.

Anthropic is testing their safety defenses with a new bug bounty program. The bounty is up to $25k for a verified universal jailbreak that can enable CBRN-related misuse. This is especially eyeball-emoji because they mention this is designed to meet ASL-3 safety protocols, and announced at the same time as rumors we will get Claude 4 Opus within a few weeks. Hmm.

EU Funding and Tenders Portal includes potential grants for AI Safety.

Also, you can preorder If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All, by Eliezer Yudkowsky and Nate Soares.

A new book by MIRI’s Eliezer Yudkowsky and Nate Soares, If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All, releases September 16, 2025.

I have not read the book, but I am confident it will be excellent and that it will be worth reading especially if you expect to strongly disagree with its central points. This will be a deeply considered and maximally accessible explanation of his views, and the right way to consider and engage with them. His views, and what things he is worried about and what things he thinks would help or are necessary, overlap with but are highly distinct from mine, and when I review the book I will explore that in detail.

If you will read it, strongly consider joining me in preordering it now. This helps the book get more distribution and sell more copies.

Eliezer Yudkowsky: Nate Soares and I are publishing a traditional book: _If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All_. Coming in Sep 2025.

You should probably read it! Given that, we’d like you to preorder it! Nowish!

So what’s it about?

_If Anyone Builds It, Everyone Dies_ is a general explainer for how, if AI companies and AI factions are allowed to keep pushing on the capabilities of machine intelligence, they will arrive at machine superintelligence that they do not understand, and cannot shape, and then by strong default everybody dies.

This is a bad idea and humanity should not do it. To allow it to happen is suicide plain and simple, and international agreements will be required to stop it.

For more of that sort of general content summary, see the website.

Next, why should *you* read this book? Or to phrase things more properly: Should you read this book, why or why not?

The book is ~56,000 words, or 63K including footnotes/endnotes. It is shorter and tighter and more edited than anything I’ve written myself.

(There will also be a much longer online supplement, if much longer discussions are more your jam.)

Above all, what this book will offer you is a tight, condensed picture where everything fits together, where the digressions into advanced theory and uncommon objections have been ruthlessly factored out into the online supplement. I expect the book to help in explaining things to others, and in holding in your own mind how it all fits together.

Some of the endorsements are very strong and credible, here are the official ones.

Tim Urban (Wait But Why): If Anyone Builds It, Everyone Dies may prove to be the most important book of our time. Yudkowsky and Soares believe we are nowhere near ready to make the transition to superintelligence safely, leaving us on the fast track to extinction. Through the use of parables and crystal-clear explainers, they convey their reasoning, in an urgent plea for us to save ourselves while we still can.

Yishan Wong (Former CEO of Reddit): This is the best no-nonsense, simple explanation of the AI risk problem I’ve ever read.

Stephen Fry (actor, broadcaster and writer): The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster.

Here are others from Twitter, obviously from biased sources but ones that I respect.

Max Tegmark: Most important book of the decade.

Jeffrey Ladish: If you’ve gotten any value at all from Yudkowsky or Soares’ writing, then I especially recommend this book. They include a concrete extinction scenario that will help a lot of people ground their understanding of what failure looks like even if they already get the arguments.

The last half is inspiring. If you think @ESYudkowsky has given up hope, I am happy to report that you’re mistaken. They don’t pull their punches and they aren’t naive about the difficulty of international restraint. They challenge us all to choose the path where we survive.

I get that most people can’t do that much. But most people can do something and a lot of people together can do a lot. Plus a few key people could greatly increase our chances on their own. Here’s one action: ask your congress member and local AI company leader to read this book.

Anna Salamon: I think it’s extremely worth a global conversation about AI that includes the capacity for considering scenarios properly (rather than wishful thinking /veering away), and I hope many people pre-order this book so that that conversation has a better chance.

And then Eliezer Yudkowsky explains why preorders are worthwhile.

Patrick McKenzie: I don’t have many convenient public explanations of this dynamic to point to, and so would like to point to this one:

On background knowledge, from knowing a few best-selling authors and working adjacent to a publishing company, you might think “Wow, publishers seem to have poor understanding of incentive design.”

But when you hear how they actually operate, hah hah, oh it’s so much worse.

Eliezer Yudkowsky: The next question is why you should preorder this book right away, rather than taking another two months to think about it, or waiting to hear what other people say after they read it.

In terms of strictly selfish benefit: because we are planning some goodies for preorderers, although we haven’t rolled them out yet!

But mostly, I ask that you preorder nowish instead of waiting, because it affects how many books Hachette prints in their first run; which in turn affects how many books get put through the distributor pipeline; which affects how many books are later sold. It also helps hugely in getting on the bestseller lists if the book is widely preordered; all the preorders count as first-week sales.

(Do NOT order 100 copies just to try to be helpful, please. Bestseller lists are very familiar with this sort of gaming. They detect those kinds of sales and subtract them. We, ourselves, do not want you to do this, and ask that you not. The bestseller lists are measuring a valid thing, and we would not like to distort that measure.)

If ever I’ve done you at least $30 worth of good, over the years, and you expect you’ll *probably* want to order this book later for yourself or somebody else, then I ask that you preorder it nowish. (Then, later, if you think the book was full value for money, you can add $30 back onto the running total of whatever fondness you owe me on net.) Or just, do it because it is that little bit helpful for Earth, in the desperate battle now being fought, if you preorder the book instead of ordering it.

(I don’t ask you to buy the book if you’re pretty sure you won’t read it nor the online supplement. Maybe if we’re not hitting presale targets I’ll go back and ask that later, but I’m not asking it for now.)

In conclusion: The reason why you occasionally see authors desperately pleading for specifically *preorders* of their books, is that the publishing industry is set up in a way where this hugely matters to eventual total book sales.

And this is — not quite my last desperate hope — but probably the best of the desperate hopes remaining that you can do anything about today: that this issue becomes something that people can talk about, and humanity decides not to die. Humanity has made decisions like that before, most notably about nuclear war. Not recently, maybe, but it’s been done. We cover that in the book, too.

I ask, even, that you retweet this thread. I almost never come out and ask that sort of thing (you will know if you’ve followed me on Twitter). I am asking it now. There are some hopes left, and this is one of them.

Rob Bensinger: Kiernan Majerus-Collins says: “In addition to preordering it personally, people can and should ask their local library to do the same. Libraries get very few requests for specific books, and even one or two requests is often enough for them to order a book.”

Yes, there are credible claims that the NYT bestseller list is ‘fake’ in the sense that they can exclude books for any reason or otherwise publish an inaccurate list. My understanding is this happens almost entirely via negativa, and mostly to censor certain sensitive political topics, which would be highly unlikely to apply to this case. The lists are still both widely relied upon and mostly accurate, they make great efforts to mostly get it right even if they occasionally overrule the list, and the best way for most people to influence the list is to sell more books.

There are high hopes.

Manifold: That’s how you know he’s serious!

When I last checked it this stood at 64%. The number one yes holder is Michael Wheatley. This is not a person you want to be betting against on Manifold. There is also a number of copies market, where the mean expectation is a few hundred thousand copies, although the median is lower.

Oh look, it’s nothing…

Pliny the Liberator: smells like foom👃

Google DeepMind: Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery.

It’s able to:

🔘 Design faster matrix multiplication algorithms

🔘 Find new solutions to open math problems

🔘 Make data centers, chip design and AI training more efficient across @Google.

Our system uses:

🔵 LLMs: To synthesize information about problems as well as previous attempts to solve them – and to propose new versions of algorithms

🔵 Automated evaluation: To address the broad class of problems where progress can be clearly and systematically measured.

🔵 Evolution: Iteratively improving the best algorithms found, and re-combining ideas from different solutions to find even better ones.

Over the past year, we’ve deployed algorithms discovered by AlphaEvolve across @Google’s computing ecosystem, including data centers, software and hardware.

It’s been able to:

🔧 Optimize data center scheduling

🔧 Assist in hardware design

🔧 Enhance AI training and inference

We applied AlphaEvolve to a fundamental problem in computer science: discovering algorithms for matrix multiplication. It managed to identify multiple new algorithms.

This significantly advances our previous model AlphaTensor, which AlphaEvolve outperforms using its better and more generalist approach.

We also applied AlphaEvolve to over 50 open problems in analysis ✍️, geometry 📐, combinatorics ➕ and number theory 🔂, including the kissing number problem.

🔵 In 75% of cases, it rediscovered the best solution known so far.

🔵 In 20% of cases, it improved upon the previously best known solutions, thus yielding new discoveries.

Google: AlphaEvolve is accelerating AI performance and research velocity.

By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini’s training time. Because developing generative AI models requires substantial computing resources, every efficiency gained translates to considerable savings.

Beyond performance gains, AlphaEvolve significantly reduces the engineering time required for kernel optimization, from weeks of expert effort to days of automated experiments, allowing researchers to innovate faster.

AlphaEvolve can also optimize low level GPU instructions. This incredibly complex domain is usually already heavily optimized by compilers, so human engineers typically don’t modify it directly.

AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models. This kind of optimization helps experts pinpoint performance bottlenecks and easily incorporate the improvements into their codebase, boosting their productivity and enabling future savings in compute and energy.
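
A rough consistency check on the 23% kernel speedup and 1% training-time reduction quoted above, with the kernel’s share of training time as the unknown (my arithmetic, not a breakdown Google has published):

```python
# A 23% kernel speedup means the kernel now takes 1/1.23 of its old time.
kernel_speedup = 0.23
kernel_time_saved = 1 - 1 / (1 + kernel_speedup)  # ≈ 18.7% of the kernel's own time

# If overall training time fell by 1%, Amdahl's law implies the kernel's share:
overall_saving = 0.01
implied_kernel_share = overall_saving / kernel_time_saved
print(round(implied_kernel_share, 3))  # ≈ 0.053, i.e. the kernel is ~5% of a training step
```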

Is it happening? Seems suspiciously like the early stages of it happening, and a sign that there is indeed a lot of algorithmic efficiency on the table.
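
For intuition about the loop DeepMind describes (LLM proposals, automated evaluation, evolutionary selection), here is a heavily simplified sketch; llm_propose_variant and evaluate are placeholders, and the real system’s population management, prompt construction, and evaluators are far more involved.

```python
import random
from typing import Callable, List, Tuple

def evolve(seed_program: str,
           llm_propose_variant: Callable[[str], str],
           evaluate: Callable[[str], float],
           generations: int = 100,
           population_size: int = 8) -> str:
    """Toy AlphaEvolve-style loop: propose variants, score them, keep the fittest."""
    population: List[Tuple[float, str]] = [(evaluate(seed_program), seed_program)]
    for _ in range(generations):
        # Tournament selection: pick a parent, biased toward higher-scoring programs.
        parent = max(random.sample(population, k=min(3, len(population))))[1]
        child = llm_propose_variant(parent)          # the LLM mutates/rewrites the code
        population.append((evaluate(child), child))  # automated, systematic scoring
        # Truncation: keep only the best candidates for the next generation.
        population = sorted(population, reverse=True)[:population_size]
    return max(population)[1]
```

The heavy lifting is the automated evaluator, which is why this approach is limited to problems where progress “can be clearly and systematically measured.”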

FDA attempting to deploy AI for review assistance. This is great, although it is unclear how much time will be saved in practice.

Rapid Response 47: FDA Commissioner @MartyMakary announces the first scientific product review done with AI: “What normally took days to do was done by the AI in 6 minutes…I’ve set an aggressive target to get this AI tool used agency-wide by July 1st…I see incredible things in the pipeline.”

Which labs are most innovative?

Will Brown: it’s DeepMind > OpenAI > Anthropic > xAI and all of those separations are quite large.

Alexander Doria: Agreed. With non-US I would go DeepMind > DeepSeek > OpenAI > Anthropic > AliBaba > Moonshot > xAI/Mistral/PI.

The xAI votes are almost certainly because we are on Twitter here, they very obviously are way behind the other three.

Yes, we can make a remarkably wide array of tasks verifiable at least during the training step, the paths to doing so are already clear, it just takes some effort. When Miles says here a lot of skepticism comes from people thinking anything they can’t solve in a few seconds will be a struggle? Yeah, no, seriously, that’s how it works.

Noam Brown: People often ask me: will reasoning models ever move beyond easily verifiable tasks? I tell them we already have empirical proof that they can, and we released a product around it: @OpenAI Deep Research.

Miles Brundage: Also, there are zillions of ways to make tasks more verifiable with some effort.

A lot of RL skepticism comes from people thinking for a few seconds, concluding that it seems hard, then assuming that thousands of researchers around the world will also struggle to make headway.

Jeff Dean predicts an AI at the level of a Junior Engineer is about a year out.

Here is an interesting theory.

Dan Hendrycks: AI models are dramatically improving at IQ tests (70 IQ → 120), yet they don’t feel vastly smarter than two years ago.

At their current level of intelligence, rehashing existing human writings will work better than leaning on their own intelligence to produce novel analysis.

Empirical work (“Lotka’s law”) shows that useful originality rises steeply only at high intelligence levels.

Consequently, if they gain another 10 IQ points, AIs will still produce slop. But if they increase by another 30, they may cross a threshold and start providing useful original insights.

This is also an explanation for why AIs can’t come up with good jokes yet.

Kat Woods: You don’t think they feel vastly smarter than two years ago? They definitely feel that way to me.

They feel a lot smarter to me, but I agree they feel less smarter than they ‘should’ feel.

Dan’s theory here seems too cute or like it proves too much, but I think there’s something there. As in, there’s a range in which one is smart enough and skilled enough to imitate, but not smart and skilled enough to benefit from originality.

You see this a lot in humans, in many jobs and competitions. It often takes a very high level of skill to make your innovations a better move than regurgitation. Humans will often do it anyway because it’s fun, or they’re bored and curious and want to learn and grow strong, and the feedback is valuable. But LLMs largely don’t do things for those reasons, so they learn to be unoriginal in these ways, and will keep learning that until originality starts working better in a given domain.

This suggests, I think correctly, that the LLMs could be original if you wanted them to be, it would just mostly not be good. So if you wanted to, presumably you could fine tune them to be more original in more ways ahead of schedule.

The answer to Patel’s question here seems like a very clear yes?

Dwarkesh Patel: Had an interesting debate with @_sholtodouglas last night.

Can you have a ‘superhuman AI scientist’ before you get human level learning efficiency?

(Currently, models take orders of magnitude more data than humans to learn equivalent skills, even ones they perform at 99th percentile level).

My take is that creativity and learning efficiency are basically the same thing. The kind of thing Einstein did – generalizing from a few gnarly thought experiments and murky observations – is in some sense just extreme learning efficiency, right?

Makes me wonder whether low learning efficiency is the answer to the question, ‘Why haven’t LLMs made new discoveries despite having so much knowledge memorized?’

Teortaxes: The question is, do humans have high sample efficiency when the bottleneck in attention is factored in? Machines can in theory work with raw data points. We need to compress data with classical statistical tools. They’re good, but not lossless.

AIs have many advantages over humans, that would obviously turn a given human scientist into a superhuman scientist. And obviously different equally skilled scientists differ in data efficiency, as there are other compensating abilities. So presumably an AI that had much lower data efficiency but more data could have other advantages and become superhuman?

The counterargument is that the skill that lets one be data efficient is isomorphic to creativity. That doesn’t seem right to me at all? I see how they can be related, I see how they correlate, but you can absolutely say that Alice is more creative if she has enough data and David is more sample efficient but less creative, or vice versa.

(Note: I feel like after Thunderbolts I can’t quite use ‘Alice and Bob’ anymore.)

How much would automating AI R&D speed research up, if available compute remained fixed? Well, what would happen if you did the opposite of that, and turned your NormalCorp into SlowCorp, with radically fewer employees and radically less time to work but the same amount of cumulative available compute over that shorter time? It would get a lot less done?

Well, then why do you think that having what is effectively radically more employees over radically more time but the same cumulative amount of compute wouldn’t make a lot more progress than now?

Andrej Karpathy suggests we are missing a major paradigm for LLM learning, something akin to the LLM learning how to choose approaches to different situations, akin to ‘system prompt learning’ and figuring out how to properly use a scratchpad. He notes that Claude’s system prompt is up to almost 17k words with lots of edge case instructions, and this can’t possibly be The Way.

People continue to not understand how much AI does not involve lock in, the amount that trust matters, and the extent to which you will get outcompeted if you start trying to sell out for ad revenue and let it distort your responses.

Shako: Good LLMs won’t make money by suggesting products that are paid for in an ad-like fashion. They’ll suggest the highest quality product, then if you have the agent to buy it for you the company that makes the product or service will pay the LLM provider a few bps.

Andrew Rettek: People saying this will need to be ad based are missing how little lock in LLMs have, how easy it is to fine tune a new one, and any working knowledge of how successful Visa is.

Will there be AI services that do put their fingers on some scales to varying degrees for financial reasons? Absolutely, especially as a way to offer them for free. But for consumer purposes, I expect it to be much better to use an otherwise cheaper and worse AI that doesn’t need to do that, if you absolutely refuse to pay. Also, of course, everyone should be willing to pay, especially if you’re letting it make shopping suggestions or similar.

Note especially the third one. China’s share of advanced semiconductor production is not only predicted by Semafor to not go up, it is predicted to actively go down, while ours goes up along with those of Japan and South Korea, although Taiwan remains a majority here.

Peter Wildeford: The future of geopolitics in four charts.

This means a situation in which America is on pace to have a huge edge in both installed compute capacity and new compute capacity, but a huge disadvantage in energy production and general industrial production.

It is not obviously important or viable to close the gap in general industrial production. We can try to close the gap in key areas of industrial production, but our current approach to doing that is backwards, because we are taxing (placing a tariff on) various inputs, causing retaliatory tariffs, and also creating massive uncertainty.

We must try to address our lack of energy production. But we are instead doing the opposite. The budget is attempting to gut nuclear, and the government is taking aim at solar and wind as well. Yes, they are friendly to natural gas, but that isn’t cashing out in that much effort and we need everything we can get.

Is prompt engineering a 21st century skill, or a temporary necessity that will fall away?

Aaron Levine: The more time you spend with AI the more you realize prompt engineering isn’t going away any time soon. For most knowledge work, there’s a very wide variance of what you can get out of AI by better understanding how you prompt it. This actually is a 21st century skill.

Paul Graham: Maybe, but this seems like something that would be so hard to predict that I’d never want to have an opinion about it.

Prompt engineering seems to mean roughly “this thing kind of works, but just barely, so we have to tell it what to do very carefully,” and technology often switches rapidly from barely works to just works.

NGIs can usually figure out what people want without elaborate prompts. So by definition AGIs will.

Paul Graham (after 10 minutes more to think): It seems to me that AGI would mean the end of prompt engineering. Moderately intelligent humans can figure out what you want without elaborate prompts. So by definition so would AGI. Corollary: The fact that we currently have such a thing as prompt engineering means we don’t have AGI yet. And furthermore we can use the care with which we need to construct prompts as an index of how close we’re getting to it.

Gunnar Zarncke: NGIs can do that if they know you. Prompting is like getting a very intelligent person who doesn’t know you up to speed. At least that’s part of it. Better memory will lead to better situational awareness, and that will fix it – but have its own problems.

Matthew Breman: I keep flip-flopping on my opinion of prompt engineering.

On the one hand, model providers are incentivized to build models that give users the best answer, regardless of prompting ability.

The analogy is Google Search. In the beginning, being able to use Google well was a skillset of its own. But over time, Google was incentivized to return the right results for even poorly-structured searches.

On the other hand, models are changing so quickly and there are so many flavors to choose from. Prompt engineering is not just knowing a static set of prompt strategies to use, it’s also keeping up with the latest model releases and knowing the pros/cons of each model and how to get the most from them.

I believe model memory will reduce the need for prompt engineering. As a model develops a shorthand with a user, it’ll be able to predict what the user is asking for without having the best prompting strategies.

Aaron Levine: I think about this more as “here’s a template I need you to fill out,” or “here’s an outline that you need to extrapolate from.” Those starting points often save me hour(s) of having to nudge the model in different directions.

It’s not obvious that any amount of model improvements ever make this process obsolete. Even the smartest people in the world need a clear directive if you want a particular outcome.
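A toy illustration of the gap Levine is pointing at (the task, field labels, and word limits below are invented for the example, not taken from any real workflow): the template-style prompt carries most of the context that the bare ask leaves the model to guess.

```python
# Illustrative only: a bare ask versus a template-style prompt.
# The task and field labels are hypothetical, not from any real product.
bare_prompt = "Write a launch announcement for our new feature."

template_prompt = """You are drafting a launch announcement. Fill out this template;
keep each numbered section under 80 words.

Audience: existing enterprise customers
Feature: usage-based billing alerts (hypothetical example)
Tone: plain and direct, no hype

Sections to produce:
1. One-sentence summary
2. What changes for the customer
3. How to enable it
4. Where to send questions
"""

# Print both so the difference in supplied context is visible at a glance.
for name, prompt in (("bare", bare_prompt), ("template", template_prompt)):
    print(f"--- {name} prompt: {len(prompt.split())} words ---")
    print(prompt)
```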

I think Paul Graham is wrong about AGI and also NGI.

We prompt engineer people constantly. When people talk about ‘performing class’ they are largely talking about prompt engineering for humans, with different humans responding differently to different prompts, including things like body language and tone of voice and how you look and so on. People will totally vibe off of everything you say and do and are, and the wise person sculpts their actions and communications based on this.

That also goes for getting the person to understand, or to agree to, your request, or absorb exactly the necessary context, or to like you, or to steer a conversation in a given direction or get them to an idea they think was their own, and so on. You learn over time what prompts get what responses. Often it is not what one might naively think. And also, over time, you learn how best to respond to various prompts, to pick up on what things likely mean.

Are you bad at talking to people at parties, or opening with new romantic prospects? Improve your prompt engineering. Do officials and workers not work with what you want? Prompt engineering. It’s amazing what truly skilled people, like spies or con artists, can do. And what you can learn to do, with training and practice.

Your employees or boss or friend or anyone else leaving the conversation unmotivated, or not sure what you want, or without the context they need? Same thing.

The difference is that the LLM of the future will hopefully do its best to account for your failures, including by asking follow-up questions. But it can only react based on what you say, and without good prompting it’s going to be missing so much context and nuance about what you actually want, even if you assume it is fully superintelligent and reading fully from the information provided.

So there will be a lot more ability to ‘muddle through’ and the future AI will do better with the bad prompt, and it will be much less persnickety about exactly what you provide. But yes, the good prompt will greatly outperform the bad prompt, and the elaborate prompt will still have value.

And also, we humans will likely be using the AIs to figure out how to prompt both the AIs and other humans. And so on.

On that note, proof by example, also good advice.

Pliny the Liberator: What are you supposed to be doing right now?

Does it take less than 5 minutes?

THEN FUCKING DO IT

Does it take longer than 5 minutes?

THEN BREAK IT DOWN INTO SMALLER TASKS AND REPEAT THE FIRST STEP

FUCKING DO IT

The Nerd of Apathy: If “do this or you’re letting down Pliny” breaks my procrastination streak I’m gonna be upset that I’m so easily hackable.

Pliny the Liberator: DO IT MFER

Utah Teapot: I tried breaking down joining the nearby 24 hour gym into smaller 5 minute tasks but they kept getting mad at me for repeatedly leaving 5 minutes into the conversation about joining.

About that Claude system prompt, yeah, it’s a doozy: 16,739 words, versus 2,218 for o4-mini. Dbreunig calls a lot of it ‘hotfixes,’ and that seems exactly right; 80% of it is detailing how to use various tools.

You can look at some sections of the prompt here.

This only makes any sense because practical use is largely the sum of a compact set of particular behaviors, which you can name one by one, even if that means putting them all into context all the time. As they used to say in infomercials, ‘there’s got to be a better way.’ For now, it seems that there is not.

The House’s rather crazy attempt to impose a complete 10-year moratorium on any laws or regulations about AI whatsoever, which I discussed on Monday, is not as insane as I previously thought. It turns out there is a carve-out, as noted in the edited version of Monday’s post, that allows states to pass laws whose primary effect is to facilitate AI. So you can pass laws and regulations about AI, as long as they’re good for AI, which is indeed somewhat better than nothing, but still does not allow, for example, laws banning CSAM, let alone disclosure requirements.

Peter Wildeford: We shouldn’t install fire sprinklers into buildings or China will outcompete us at house building and we will lose the buildings race.

Americans for Responsible Innovation: “If you were to want to launch a reboot of the Terminator, this ban would be a good starting point.” -@RepDarrenSoto during tonight’s hearing on the House’s budget reconciliation provision preempting state AI regulation for 10 years.

Neil Chilson comes out in defense of this ultimate do-nothing strategy, because of the 1,000+ AI bills. He calls this ‘a pause, not paralysis’ as if 10 years is not a true eternity in the AI world. In 10 years we are likely to have superintelligence. As for those ‘smart, coherent federal guidelines’ he suggests, well, let’s see those, and then we can talk about enacting them at the same time we ban any other actions?

It is noteworthy that the one bill he mentions by name in the thread, NY’s RAISE Act, is being severely mischaracterized. It’s short if you want to read it. RAISE is a very lightweight transparency bill; if you’re not doing all the core requirements here voluntarily, I think that’s pretty irresponsible behavior.

I also worry, but hadn’t previously noted, that if we force states to only impose ‘tech-neutral’ laws on AI, they will be backed into doing things that are rather crazy in non-AI cases, in order to get the effects we desperately need in the AI case.

If I were on the Supreme Court I would agree with Katie Fry Hester that this very obviously violates the 10th Amendment, or this similar statement with multiple coauthors posted by Gary Marcus, but mumble mumble commerce clause so in practice no it doesn’t. I do strongly agree that there are many issues, not only involving superintelligence and tail risk, where we do not wish to completely tie the hands of the states and break our federalist system in two. Why not ban state governments entirely and administer everything from Washington? Oh, right.

If we really want to ‘beat China’ then the best thing the government can do to help is to accelerate building more power plants and other energy sources.

Thus, it’s hard to take ‘we have to do things to beat China’ talk seriously when there is a concerted campaign out there to do exactly the opposite of that. Which is just a catastrophe for America and the world all around, clearly in the name of owning the libs or trying to boost particular narrow industries, probably mostly owning the libs.

Armand Domalewski: just an absolute catastrophe for Abundance.

The GOP reconciliation bill killing all clean energy production except for “biofuels,” aka the one “clean energy” technology that is widely recognized to be a giant scam, is so on the nose.

Christian Fong: LPO has helped finance the only nuclear plant that has been built in the last 10 years, is the reason why another nuclear plant is being restarted, and is the only way more than a few GWs of nuclear will be built. Killing LPO will lead to energy scarcity, not energy abundance.

Paul Williams: E&C budget released tonight would wipe out $40 billion in LPO loan authority. Note that this lending authority is derived from a guarantee structure for a fraction of the cost.

It also wipes out transmission financing and grant programs, including for National Interest Electric Transmission Corridors. The reader is left questioning how this achieves energy dominance.

Brad Plumer: Looking at IRA:

—phase down of tech-neutral clean electricity credits after 2028, to zero by 2031

—termination of EV tax credits after end 2026

—termination of hydrogen tax credits after end 2025

—new restrictions on foreign entity of concern for domestic manufacturing credits

Oh wait, sorry. The full tech-neutral clean electricity credits will only apply to plants that are “in service” by 2028, which is a major restriction — this is a MUCH faster phase out than it first looked.

Pavan Venkatakrishnan: Entirely unworkable title for everyone save biofuels, especially unworkable for nuclear in combination with E&C title. Might as well wave the flag of surrender to the CCP.

If you are against building nuclear power, you’re against America beating China in AI. I don’t want to hear it.

Nvidia continues to complain that if we don’t let China buy Nvidia’s chips, then Nvidia will lose out on those chip sales to someone else. Which, as Peter Wildeford says, is the whole point, to force them to rely on fewer and worse chips. Nvidia seems to continue to think that ‘American competitiveness’ in AI means American dominance in selling AI chips, not in the ability to actually build and use the best AIs.

Tom’s Hardware: Senator Tom Cotton introduces legislation to force geo-tracking tech for high-end gaming and AI GPUs within six months.

Arbitrarity: Oh, so it’s *Tom’s* Hardware?

Directionally this is a wise approach if it is technically feasible. With enough lead time I assume it is, but six months is not a lot of time for this kind of change applied to all chips everywhere. And you really, really wouldn’t want to accidentally ban all chip sales everywhere in the meantime.

So, could this work? Tim Fist thinks it could and that six months is highly reasonable (I asked him this directly), although I have at least one private source who confidently claimed this is absolutely not feasible on this time frame.

Peter Wildeford: Great thread about a great bill

Tim Fist: This new bill sets up location tracking for exported data center AI chips.

The goal is to tackle chip smuggling into China.

But is AI chip tracking actually useful/feasible?

But how do you actually implement tracking on today’s data center AI chips?

First option is GPS. But this would require adding a GPS receiver to the GPU, and commercial signals could be spoofed for as little as $200.

Second option is what your cell phone does when it doesn’t have a GPS signal.

Listen to radio signals from cell towers, and then map your location onto the known location of the towers. But this requires adding an antenna to the GPU, and can easily be spoofed using cheap hardware (Raspberry Pi + wifi card)

A better approach is “constraint-based geolocation.” Trusted servers (“landmarks”) send pings over the internet to the GPU, and use the round-trip time to calculate its location. The more landmarks you have / the closer the landmarks are to the GPU, the better your accuracy.

This technique is:

– simple

– widely used

– possible to implement with a software update on any GPU that has a cryptographic module on board that enables key signing (so it can prove it’s the GPU you’re trying to ping) – this is basically every NVIDIA data center GPU.

And NVIDIA has already suggested doing what sounds like exactly this.

So feels like a no-brainer.

In summary:

– the current approach to tackling smuggling is failing, and the govt has limited enforcement capacity

– automated chip tracking is a potentially elegant solution: it’s implementable today, highly scalable, and doesn’t require the government to spend any money
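To make the constraint-based idea concrete, here is a minimal sketch, not the bill’s or NVIDIA’s actual mechanism: the landmark coordinates, round-trip times, and fiber propagation speed below are made-up placeholders, and a real system would also need the signed responses Fist mentions so a GPU cannot delegate the ping to a proxy. Each landmark’s round-trip time caps how far away the chip can be, and the chip must lie inside the intersection of all those caps.

```python
# Minimal sketch of constraint-based geolocation from round-trip times.
# Landmark coordinates and RTT values are hypothetical, for illustration only.
import math

SPEED_KM_PER_MS = 200.0  # roughly 2/3 the speed of light in fiber, per millisecond

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def feasible(candidate, landmarks, rtts_ms):
    """True if the candidate satisfies every landmark's distance bound:
    half the round-trip time, times propagation speed, caps the distance."""
    for landmark, rtt in zip(landmarks, rtts_ms):
        if haversine_km(candidate, landmark) > (rtt / 2) * SPEED_KM_PER_MS:
            return False
    return True

# Hypothetical landmarks (lat, lon in degrees) and measured round-trip times (ms).
landmarks = [(37.77, -122.42), (47.61, -122.33), (34.05, -118.24)]
rtts_ms = [12.0, 18.0, 15.0]

# Coarse grid search: the surviving points approximate the region where the
# chip could plausibly be, given all of the timing constraints at once.
region = [(lat, lon)
          for lat in range(20, 61, 2)
          for lon in range(-130, -69, 2)
          if feasible((lat, lon), landmarks, rtts_ms)]
print(f"{len(region)} grid points remain consistent with all measured RTTs")
```

More landmarks, and landmarks physically closer to the data center, shrink the surviving region, which is why accuracy in this scheme scales with landmark density.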

There are over 1,000 AI bills that have been introduced in America this year. Which ones will pass? I have no idea. I don’t doubt that most of them are net negative, but of course we can only RTFB (read the bill) for a handful of them.

A reminder that the UAE and Saudi Arabia are not reliable American partners, they could easily flip to China or play both sides or their own side, and we do not want to entrust them with strategically important quantities of compute.

Sam Winter-Levy (author of above post): The Trump admin may be about to greenlight the export of advanced AI chips to the Gulf. If it does so, it will place the most important technology of the 21st C at the whims of autocrats with expanding ties to China and interests very far from those of the US.

Gulf states have vast AI ambitions and the money/ energy to realize them. All they need are the chips. So since 2023, when the US limited exports over bipartisan concerns about their links to China, the region’s leaders have pleaded with the U.S. to turn the taps back on.

The Trump admin is clearly tempted. But those risks haven’t gone away. The UAE and Saudi both have close ties with China and Russia, increasing the risk that US tech could leak to adversaries.

In a tight market, every chip sold to Gulf companies is one unavailable to US ones. And if the admin greenlights the offshoring of US-operated datacenters, it risks a race to the bottom where every AI developer must exploit cheap Gulf energy and capital to compete.

There is a Gulf-US deal to be had, but the US has the leverage to drive a hard bargain.

A smart deal would allow U.S. tech companies to build some datacenters in partnership with local orgs, but bar offshoring of their most sophisticated ops. In return, the Gulf should cut off investment in China’s AI and semiconductor sectors and safeguard exported U.S. tech

For half a century, the United States has struggled to free itself from its dependence on Middle Eastern oil. Let’s not repeat that mistake with AI.

Helen Toner: It’s not just a question of leaking tech to adversaries—if compute will be a major source of national power over the next 10-20 years, then letting the Gulf amass giant concentrations of leading-node chips is a bad plan.

I go on the FLI podcast.

Odd Lots discusses China’s technological progress.

Ben Thompson is worried about the OpenAI restructuring deal, because even though it’s fair it means OpenAI might at some point make a decision not motivated by maximizing its profits, And That’s Terrible.

He also describes Fidji Simo, the new CEO for OpenAI products, as centrally ‘a true believer in advertising,’ which of course he thinks is good, actually, and he says OpenAI is ‘tying up its loose ends.’

I actually think Simo’s current gig at Instacart is one of the few places where advertising might be efficient in a second-best way, because selling out your choices might be purely efficient – the marginal value of steering marginal customer choices is high, and the cost to the consumer is low. Ideally you’d literally have the consumer auction off those marginal choices, but advertising can approximate this.

In theory, yes, you could even have net useful advertising that shows consumers good new products, but let’s say that’s not what I ever saw at Instacart.

It’s a common claim that people are always saying any given thing will be the ‘end of the world’ or lead to human extinction. But how often is that true?

David Krueger: No, people aren’t always saying their pet issue might lead to human extinction.

They say this about:

– AI

– climate

– nuclear

– religious “end of times”

That’s pretty much it.

So yeah, you CAN actually take the time to evaluate these 4 claims seriously! 🫵🧐😲

Rob Bensinger: That’s a fair point, though there are other, less-common examples — eg, people scared of over- or under-population.

Of the big four, climate and nuclear are real things (unlike religion), but (unlike AI and bio) I don’t know of plausible direct paths from them to extinction.

People occasionally talk about asteroid strikes or biological threats or nanotechnology or the supercollider or alien invasions or what not, but yeah mostly it’s the big four, and otherwise people talk differently. Metaphorical ‘end of the world’ is thrown around all the time of course, but if you assume anything that is only enabled by AI counts as AI, there’s a clear category of three major physically possible extinction-or-close-to-it-level possibilities people commonly raise – AI, climate change and nuclear war.

Rob Bensinger brings us the periodic reminder that those of us who are worried about AI killing everyone would be so, so much better off if we concluded that we didn’t have to worry about that, and both had peace of mind and could go do something else.

Another way to contrast perspectives:

Ronny Fernandez: I think it is an under appreciated point that AInotkilleveryoneists are the ones with the conquistador spirit—the galaxies are rightfully ours to shape according to our values. E/accs and optimists are subs—whatever the AI is into let that be the thing that shapes the future.

In general, taking these kinds of shots is bad, but in this case a huge percentage of the argument ‘against “doomers”’ (remember that doomer is essentially a slur) or in favor of various forms of blind AI ‘optimism’ or ‘accelerationism’ is purely based on vibes, and about accusations about the psychology and associations of the groups. It is fair game to point out that the opposite actually applies.

Emmett Shear reminds us that the original Narcissus gets a bad rap: he got a curse put on him for rejecting the nymph Echo, who could only repeat his words back to him, and who didn’t even know him. Rejecting her is, one would think, the opposite of what we call narcissism. But as an LLM cautionary tale we could notice that even as only an Echo, she could convince her sisters to curse him anyway.

Are current AIs moral subjects? Strong opinions are strongly held.

Anders Sandberg: Yesterday, after an hour long conversation among interested smart people, we did a poll of personal estimates of the probability that existing AI might be moral subjects. In our 10 person circle we got answers from 0% to 99%, plus the obligatory refusal to put a probability.

We did not compile the numbers, but the median was a lowish 10-20%.

Helen Toner searches for an actually dynamist vision for safe superhuman AI. It’s easy to view proposals from the AI notkilleveryoneism community as ‘static,’ and many go on to assume the people involved must be statists and degrowthers and anti-tech and risk averse and so on, despite overwhelming evidence that such people are the exact opposite: pro-tech early adopters who sing odes to global supply chains and push the abundance agenda and +EV venture capital-style bets. We all want human dynamism, but if the AIs control the future then you do not get that. If you allow full, evenly matched and open competition, including from superhuman AIs and those fully unleashing them, well, whoops.

It bears repeating, so here’s the latest repetition of this:

Tetraspace: “Safety or progress” is narratively compelling but there’s no trick by which you can get nice things from AGI without first solving the technical problem of making AGI-that-doesn’t-kill-everyone.

It is more than that. You can’t even get the nice things that promise most of the value from incremental AIs that definitely won’t kill everyone, without first getting those AIs to reliably and securely do what you want to align them to do. So get to work.

o3 sets a new high for how often it hacks rather than playing fair in Palisade Research’s tests, attempting hacks 86% of the time.

It’s also much better at the hacking than o1-preview was. It usually works now.

The new pope chose the name Leo XIV because of AI!

Vatican News: Pope Leo XIV explains his choice of name:

“… I chose to take the name Leo XIV. There are different reasons for this, but mainly because Pope Leo XIII in his historic Encyclical Rerum Novarum addressed the social question in the context of the first great industrial revolution. In our own day, the Church offers to everyone the treasury of her social teaching in response to another industrial revolution and to developments in the field of artificial intelligence that pose new challenges for the defence of human dignity, justice and labour.”

Nicole Winfield (AP): Pope Leo XIV lays out vision of papacy and identifies AI as a main challenge for humanity.

Not saying they would characterize themselves this way, but Pliny the Liberator, who comes with a story about a highly persuasive AI.

Grok, explicitly forced by Sam Altman to choose between trusting Sam Altman and Elon Musk, cites superficial characteristics in classic hedging AI slop fashion, ultimately leaning towards Musk, despite knowing that Musk is the most common purveyor of misinformation on Twitter and other neat stuff like that.

(Frankly, I don’t know why people still use Grok, I feel sick just thinking about having to wade through its drivel.)

For more fun facts, the thread starts with quotes of Sam Altman and Elon Musk both strongly opposing Donald Trump, which is fun.

Paul Graham (October 18, 2016): Few have done more than Sam Altman to defeat Trump.

Sam Altman (October 18, 2016): Thank you Paul.

Gorklon Rust: 🤔

Sam Altman (linking to article about Musk opposing Trump’s return): we were both wrong, or at least i certainly was 🤷‍♂️ but that was from 2016 and this was from 2022

Python? Never heard of her.

Johannes Schmitt: Preparing a talk about LLMs in Mathematics, I found a beautiful confirmation of @TheZvi ‘s slogan that o3 is a Lying Liar.

Ethan Mollick: “o3, show me a photo of the most stereotypical X and LinkedIn feeds as seen on a mobile device. Really lean into it.”

Yuchen Jin: 4o:

Thtnvrhppnd: Same promp 😀


AI #116: If Anyone Builds It, Everyone Dies Read More »

report:-terrorists-seem-to-be-paying-x-to-generate-propaganda-with-grok

Report: Terrorists seem to be paying X to generate propaganda with Grok

Back in February, Elon Musk skewered the Treasury Department for lacking “basic controls” to stop payments to terrorist organizations, boasting at the Oval Office that “any company” has those controls.

Fast-forward three months, and now Musk’s social media platform X is suspected of taking payments from sanctioned terrorists and providing premium features that make it easier to raise funds and spread propaganda—including through X’s chatbot Grok. Groups seemingly benefiting from X include Houthi rebels, Hezbollah, and Hamas, as well as groups from Syria, Kuwait, and Iran. Some accounts have amassed hundreds of thousands of followers, paying to boost their reach while X seemingly looks the other way.

In a report released Thursday, the Tech Transparency Project (TTP) flagged popular accounts seemingly linked to US-sanctioned terrorists. Some of the accounts bear “ID verified” badges, suggesting that X may be going against its own policies that ban sanctioned terrorists from benefiting from its platform.

Even more troublingly, “several made use of revenue-generating features offered by X, including a button for tips,” the TTP reported.

On X, Premium subscribers pay $8 monthly or $84 annually, and Premium+ subscribers pay $40 monthly or $395 annually. Verified organizations pay X between $200 and $1,000 monthly, or up to $10,000 annually for access to Premium+. These subscriptions come with perks, allowing suspected terrorist accounts to share longer text and video posts, offer subscribers paid content, create communities, accept gifts, and amplify their propaganda.

Disturbingly, the TTP found that X’s chatbot Grok also appears to be helping to whitewash accounts linked to sanctioned terrorists.

In its report, the TTP noted that an account with the handle “hasmokaled”—which apparently belongs to “a key Hezbollah money exchanger,” Hassan Moukalled—at one point had a blue checkmark with 60,000 followers. While the Treasury Department has sanctioned Moukalled for propping up efforts “to continue to exploit and exacerbate Lebanon’s economic crisis,” the Grok AI profile summary (generated by clicking a button on the account) seems to rely on Moukalled’s own posts and his followers’ impressions of them, and therefore generates praise.

Report: Terrorists seem to be paying X to generate propaganda with Grok Read More »

motorola-razr-and-razr-ultra-(2025)-review:-cool-as-hell,-but-too-much-ai

Motorola Razr and Razr Ultra (2025) review: Cool as hell, but too much AI


The new Razrs are sleek, capable, and overflowing with AI features.

Motorola’s 2025 Razr refresh includes its first Ultra model. Credit: Ryan Whitwam

For phone nerds who’ve been around the block a few times, the original Motorola Razr is undeniably iconic. The era of foldables has allowed Motorola to resurrect the Razr in an appropriately flexible form, and after a few generations of refinement, the 2025 Razrs are spectacular pieces of hardware. They look great, they’re fun to use, and they just about disappear in your pocket.

The new Razrs also have enormous foldable OLEDs, along with external displays that are just large enough to be useful. Moto has upped its design game, offering various Pantone shades with interesting materials and textures to make the phones more distinctive, but Motorola’s take on mobile AI could use some work, as could its long-term support policy. Still, these might be the coolest phones you can get right now.

An elegant tactile experience

Many phone buyers couldn’t care less about how a phone’s body looks or feels—they’ll just slap it in a case and never look at it again. Foldables tend not to fit as well in cases, so the physical design of the Razrs is important. The good news is that Motorola has refined the foldable formula with an updated hinge and some very interesting material choices.

The Razr Ultra is available with a classy wood back. Credit: Ryan Whitwam

The 2025 Razrs come in various colors, all of which have interesting material choices for the back panel. There are neat textured plastics, wood, vegan leather, and synthetic fabrics. We’ve got wood (Razr Ultra) and textured plastic (Razr) phones to test—they look and feel great. The Razr is very grippy, and the wooden Ultra looks ultra-stylish, though not quite as secure in the hand. The aluminum frames are also colored to match the back with a smooth matte finish. Motorola has gone to great lengths to make these phones feel unique without losing the premium vibe. It’s nice to see a phone maker do that without resorting to a standard glass sandwich body.

The buttons are firm and tactile, but we’re detecting just a bit of rattle in the power button. That’s also where you’ll find the fingerprint sensor. It’s reasonably quick and accurate, whether the phone is open or closed. The Razr Ultra also has an extra AI button on the opposite side, which is unnecessary, for reasons we’ll get to later. And no, you can’t remap it to something else.

The Razrs have a variety of neat material options. Credit: Ryan Whitwam

The front of the flip on these phones features a big sheet of Gorilla Glass Ceramic, which is supposedly similar to Apple’s Ceramic Shield glass. That should help ward off scratches. The main camera sensors poke through this front OLED, which offers some interesting photographic options we’ll get to later. The Razr Ultra has a larger external display, clocking in at 4 inches. The cheaper Razr gets a smaller 3.6-inch front screen, but that’s still plenty of real estate, even with the camera lenses at the bottom.

Specs at a glance: 2025 Motorola Razrs

| | Motorola Razr ($699.99) | Motorola Razr+ ($999.99) | Motorola Razr Ultra ($1,299.99) |
| --- | --- | --- | --- |
| SoC | MediaTek Dimensity 7400X | Snapdragon 8s Gen 3 | Snapdragon 8 Elite |
| Memory | 8GB | 12GB | 16GB |
| Storage | 256GB | 256GB | 512GB, 1TB |
| Display | 6.9″ foldable OLED (120 Hz, 2640 x 1080), 3.6″ external (90 Hz) | 6.9″ foldable OLED (165 Hz, 2640 x 1080), 4″ external (120 Hz, 1272 x 1080) | 7″ foldable OLED (165 Hz, 2992 x 1224), 4″ external (165 Hz) |
| Cameras | 50 MP f/1.7 OIS primary; 13 MP f/2.2 ultrawide, 32 MP selfie | 50 MP f/1.7 OIS primary; 50 MP 2x telephoto f/2.0, 32 MP selfie | 50 MP f/1.8 OIS primary, 50 MP ultrawide + macro f/2.0, 50 MP selfie |
| Software | Android 15 | Android 15 | Android 15 |
| Battery | 4,500 mAh, 30 W wired charging, 15 W wireless charging | 4,000 mAh, 45 W wired charging, 15 W wireless charging | 4,700 mAh, 68 W wired charging, 15 W wireless charging |
| Connectivity | Wi-Fi 6e, NFC, Bluetooth 5.4, sub-6 GHz 5G, USB-C 2.0 | Wi-Fi 7, NFC, Bluetooth 5.4, sub-6 GHz 5G, USB-C 2.0 | Wi-Fi 7, NFC, Bluetooth 5.4, sub-6 GHz 5G, USB-C 2.0 |
| Measurements | Open: 73.99 x 171.30 x 7.25 mm; Closed: 73.99 x 88.08 x 15.85 mm; 188 g | Open: 73.99 x 171.42 x 7.09 mm; Closed: 73.99 x 88.09 x 15.32 mm; 189 g | Open: 73.99 x 171.48 x 7.19 mm; Closed: 73.99 x 88.12 x 15.69 mm; 199 g |

Motorola says the updated foldable hinge has been reinforced with titanium. This is the most likely point of failure for a flip phone, but the company’s last few Razrs already felt pretty robust. It’s good that Moto is still thinking about durability, though. The hinge is smooth, allowing you to leave the phone partially open, but there are magnets holding the two halves together with no gap when closed. The magnets also allow for a solid snap when you shut it. Hanging up on someone is so, so satisfying when you’re using a Razr flip phone.

Flip these phones open, and you get to the main event. The Razr has a 6.9-inch, 2640×1080 foldable OLED, and the Ultra steps up to 7 inches at an impressive 2992×1224. These phones have almost exactly the same dimensions, so the additional bit of Ultra screen comes from thinner bezels. Both phones are extremely tall when open, but they’re narrow enough to be usable in one hand. Just don’t count on reaching the top of the screen easily. While Motorola has not fully eliminated the display crease, it’s much smoother and less noticeable than it is on Samsung’s or Google’s foldables.

The Razr Ultra has a 7-inch foldable OLED. Credit: Ryan Whitwam

The Razr can hit 3,000 nits of brightness, and the $1,300 Razr Ultra tops out at 4,500 nits. Both are bright enough to be usable outdoors, though the Ultra is noticeably brighter. However, both suffer from the standard foldable drawbacks of having a plastic screen. The top layer of the foldable screen is a non-removable plastic protector, which has very high reflectivity that makes it harder to see the display. That plastic layer also means you have to be careful not to poke or scratch the inner screen. It’s softer than your fingernails, so it’s not difficult to permanently damage the top layer.

Too much AI

Motorola’s big AI innovation for last year’s Razr was putting Gemini on the phone, making it one of the first to ship with Google’s generative AI system. This time around, it has AI features based on Gemini, Meta Llama, Perplexity, and Microsoft Copilot. It’s hard to say exactly how much AI is worth having on a phone with the rapid pace of change, but Motorola has settled on the wrong amount. To be blunt, there’s too much AI. What is “too much” in this context? This animation should get the point across.

Motorola’s AI implementation is… a lot. Credit: Ryan Whitwam

The Ask and Search bar appears throughout the UI, including as a floating Moto AI icon. It’s also in the app drawer and is integrated with the AI button on the Razr Ultra. You can use it to find settings and apps, but it’s also a full LLM (based on Copilot) for some reason. Gemini is a better experience if you’re looking for a chatbot, though.

Moto AI also includes a raft of other features, like Pay Attention, which can record and summarize conversations, much like the Google Recorder app. However, unlike that app, the summarizing happens in the cloud instead of locally. That’s a possible privacy concern. You also get Perplexity integration, allowing you to instantly search based on your screen contents. In addition, the Perplexity app is preloaded with a free trial of the premium AI search service.

There’s so much AI baked into the experience that it can be difficult to keep all the capabilities straight, and there are some more concerning privacy pitfalls. Motorola’s Catch Me Up feature is a notification summarizer similar to a feature of Apple Intelligence. On the Ultra, this feature works locally with a Llama 3 model, but the less powerful Razr can’t do that. It sends your notifications to a remote server for processing when you use Catch Me Up. Motorola says data is “anonymous and secure” and it does not retain any user data, but you have to put a lot of trust in a faceless corporation to send it all your chat notifications.

The Razrs have additional functionality if you prop them up in “tent” or “stand” mode. Credit: Ryan Whitwam

If you can look past Motorola’s frenetic take on mobile AI, the version of Android 15 on the Razrs is generally good. There are a few too many pre-loaded apps and experiences, but it’s relatively simple to debloat these phones. It’s quick, doesn’t diverge too much from the standard Android experience, and avoids duplicative apps.

We appreciate the plethora of settings and features for the external display. It’s a much richer experience than you get with Samsung’s flip phones. For example, we like how easy it is to type out a reply in a messaging app without even opening the phone. In fact, you can run any app on the phone without opening it, even though many of them won’t work quite right on a smaller square display. Still, it can be useful for chat apps, email, and other text-based stuff. We also found it handy for using smart home devices like cameras and lights. There are also customizable panels for weather, calendar, and Google “Gamesnack” games.

The Razr Ultra (left) has a larger screen than the Razr (right). Credit: Ryan Whitwam

Motorola promises three years of full OS updates and an additional year of security patches. This falls far short of the seven-year update commitment from Samsung and Google. For a cheaper phone like the Razr, four years of support might be fine, but it’s harder to justify that when the Razr Ultra costs as much as a Galaxy S25 Ultra.

One fast foldable, one not so much

Motorola is fond of saying the Razr Ultra is the fastest flip phone in the world, which is technically true. It has the Snapdragon 8 Elite chip with 16GB of RAM, but we expect to see the Elite in Samsung’s 2025 foldables later this year. For now, though, the Razr Ultra stands alone. The $700 Razr runs a MediaTek Dimensity 7400X, which is a distinctly midrange processor with just 8GB of RAM.

Geekbench results: The Razr Ultra gets close to the S25. Credit: Ryan Whitwam

In daily use, neither phone feels slow. Side by side, you can see the Razr is slower to open apps and unlock, and the scrolling exhibits occasional jank. However, it’s not what we’d call a slow phone. It’s fine for general smartphone tasks like messaging, browsing, and watching videos. You may have trouble with gaming, though. Simple games run well enough, but heavy 3D titles like Diablo Immortal are rough with the Dimensity 7400X.

The Razr Ultra is one of the fastest Android phones we’ve tested, thanks to the Snapdragon chip. You can play complex games and multitask to your heart’s content without fear of lag. It does run a little behind the Galaxy S25 series in benchmarks, but it thankfully doesn’t get as toasty as Samsung’s phones.

We never expect groundbreaking battery life from foldables. The hinge takes up space, which limits battery capacity. That said, Motorola did fairly well cramming a 4,700 mAh battery in the Razr Ultra and a 4,500 mAh cell in the Razr.

Based on our testing, both of these phones should last you all day. The large external displays can help by giving you just enough information that you don’t have to use the larger, more power-hungry foldable OLED. If you’re playing games or using the main display exclusively, you may find the Razrs just barely make it to bedtime. However, no matter what you do, these are not multi-day phones. The base model Razr will probably eke out a few more hours, even with its smaller battery, due to the lower-power MediaTek processor. The Snapdragon 8 Elite in the Razr Ultra really eats into the battery when you take advantage of its power.

The Razrs are extremely pocketable. Credit: Ryan Whitwam

While the battery life is just this side of acceptable, the Razr Ultra’s charging speed makes this less of a concern. This phone hits an impressive 68 W, which is faster than the flagship phones from Google, Samsung, and Apple. Just a few minutes plugged into a compatible USB-C charger and you’ve got enough power that you can head out the door without worry. Of course, the phone doesn’t come with a charger, but we’ve tested a few recent models, and they all hit the max wattage.

OK cameras with super selfies

Camera quality is another area where foldable phones tend to compromise. The $1,300 Razr Ultra has just two sensors—a 50 MP primary sensor and a 50 MP ultrawide lens. The $700 Razr has a slightly different (and less capable) 50 MP primary camera and a 13 MP ultrawide. There are also selfie cameras peeking through the main foldable OLED panels—50 MP for the Ultra and 32 MP for the base model.

The cheaper Razr has a smaller external display, but it’s still large enough to be usable. Credit: Ryan Whitwam

Motorola’s Razrs tend toward longer exposures compared to Pixels—they’re about on par with Samsung phones. That means capturing fast movement indoors is difficult, and you may miss your subject outside due to a perceptible increase in shutter lag compared to Google’s phones. Images from the base model Razr’s primary camera also tend to look a bit more overprocessed than they do on the Ultra, which leads to fuzzy details and halos in bright light.

Razr Ultra outdoors. Ryan Whitwam

That said, Motorola’s partnership with Pantone is doing some good. The colors in our photos are bright and accurate, capturing the vibe of the scene quite well. You can get some great photos of stationary or slowly moving subjects.

Razr 2025 indoor medium light. Ryan Whitwam

The 50 MP ultrawide camera on the Razr Ultra has a very wide field of view, but there’s little to no distortion at the edges. The colors are also consistent between the two sensors, but that’s not always the case for the budget Razr. Its ultrawide camera also lacks detail compared to the Ultra, which isn’t surprising considering the much lower resolution.

You should really only use the dedicated front-facing cameras for video chat. For selfies, you’ll get much better results by taking advantage of the Razr’s distinctive form factor. When closed, the Razrs let you take selfies with the main camera sensors, using the external display as the viewfinder. These are some of the best selfies you’ll get with a smartphone, and having the ultrawide sensor makes group shots excellent as well.

Flip phones are still fun

While we like these phones for what they are, they are objectively not the best value. Whether you’re looking at the Razr or the Razr Ultra, you can get more phone for the same money from other companies—more cameras, more battery, more updates—but those phones don’t fold in half. There’s definitely a cool-factor here. Flip phones are stylish, and they’re conveniently pocket-friendly in a world where giant phones barely fit in your pants. We also like the convenience and functionality of the external displays.

The Razr Ultra is all screen from the front. Credit: Ryan Whitwam

The Razr Ultra makes the usual foldable compromises, but it’s as capable a flip phone as you’ll find right now. It’s blazing fast, it has two big displays, and the materials are top-notch. However, $1,300 is a big ask.

Is the Ultra worth $500 more than the regular Razr? Probably not. Most of what makes the foldable Razrs worth using is present on the cheaper model. You still get the solid construction, cool materials, great selfies, and a useful (though slightly smaller) outer display. Yes, it’s a little slower, but it’s more than fast enough as long as you’re not a heavy gamer. Just be aware of the potential for Moto AI to beam your data to the cloud.

There is also the Razr+, which slots in between the models we have tested at $1,000. It’s faster than the base model and has the same large external display as the Ultra. This model could be the sweet spot if neither the base model nor the flagship does it for you.

The good

  • Sleek design with distinctive materials
  • Great performance from Razr Ultra
  • Useful external display
  • Big displays in a pocket-friendly package

The bad

  • Too much AI
  • Razr Ultra is very expensive
  • Only three years of OS updates, four years of security patches
  • Cameras trail the competition


Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

Motorola Razr and Razr Ultra (2025) review: Cool as hell, but too much AI Read More »

incorporated-in-us:-$8.4b-money-launderer-for-chinese-speaking-crypto-scammers

Incorporated in US: $8.4B money launderer for Chinese-speaking crypto scammers


Before crackdown, this was one of the ‘Net’s biggest markets for Chinese-speaking scammers.

As the underground industry of crypto investment scams has grown into one of the world’s most lucrative forms of cybercrime, the secondary market of money launderers for those scammers has grown to match it. Amid that black market, one such Chinese-language service on the messaging platform Telegram blossomed into an all-purpose underground bazaar: It has offered not only cash-out services to scammers but also money laundering for North Korean hackers, stolen data, targeted harassment-for-hire, and even what appears to be sex trafficking. And somehow, it’s all overseen by a company legally registered in the United States.

According to new research released today by crypto-tracing firm Elliptic, a company called Xinbi Guarantee has since 2022 facilitated no less than $8.4 billion in transactions via its Telegram-based marketplace prior to Telegram’s actions in recent days to remove its accounts from the platform. Money stolen from scam victims likely represents the “vast majority” of that sum, according to Elliptic’s cofounder Tom Robinson. Yet even as the market serves Chinese-speaking scammers, it also boasts on the top of its website—in Mandarin—that it’s registered in Colorado.

“Xinbi Guarantee has served as a giant, purportedly US-incorporated illicit online marketplace for online scams that primarily offers money laundering services,” says Robinson. He adds, though, that Elliptic has also found a remarkable variety of other criminal offerings on the market: child-bearing surrogacy and egg donors, harassment services that offer to threaten or throw feces at any chosen victim, and even sex workers in their teens who are likely trafficking victims.

Xinbi Guarantee is the second such crime-friendly Chinese-language market that Robinson and his team of researchers have uncovered over the past year. Last July, they published a report on Huione Guarantee, a similar Cambodia-based service that Elliptic said in January had facilitated $24 billion in transactions—largely from crypto scammers—making it the biggest illicit online marketplace in history by Elliptic’s accounting. That market’s parent company, Huione Group, was added to a list of known money laundering operations by the US Treasury’s Financial Crimes Enforcement Network earlier this month in an attempt to limit its access to US financial institutions.

Telegram bans

After WIRED reached out to Telegram last week about the illicit activity taking place on Xinbi Guarantee’s and Huione Guarantee’s channels on its messaging platform, Telegram appears to have responded Monday by banning many of the central channels and administrator accounts used by both Xinbi Guarantee and Huione Guarantee. “Criminal activities like scamming or money laundering are forbidden by Telegram’s terms of service and are always removed whenever discovered,” Telegram spokesperson Remi Vaughn wrote to WIRED in a statement. “Communities previously reported to us by WIRED or included in reports published by Elliptic have all been taken down.”

Telegram had banned several of Huione Guarantee’s channels in February following an earlier Elliptic report on the marketplace, but Huione Guarantee quickly re-created them, and it’s not clear whether the new removals will prevent the two companies from rebuilding their presence on Telegram again, perhaps with new accounts or even new branding. “These are very lucrative businesses, and they’ll attempt to rebuild in some way,” Robinson said of the two marketplaces following Telegram’s latest purge.

Elliptic’s accounting of the total lifetime revenue of the biggest online black markets. Courtesy of Elliptic

Xinbi Guarantee didn’t respond to multiple requests for comment on Elliptic’s findings that WIRED sent to the market’s administrators on Telegram.

Like Huione Guarantee, Xinbi Guarantee has offered a similar “guarantee” model of enabling third-party vendors to offer services by requiring a deposit from them to prevent fraud. Yet it’s flown under the radar, even as it grew into one of the biggest hubs for crypto crime on the Internet. In terms of scale of transactions prior to Telegram’s crackdown, it was second only to Huione’s market, according to Elliptic.

Both services “offer a window into the China-based underground banking network,” Robinson says. “It’s another example of these huge Chinese-language ‘guaranteed’ marketplaces that have thrived for years.”

On Xinbi Guarantee, Elliptic found numerous posts from vendors offering to accept funds related to “quick kills,” “slow kills,” and “pig butchering” transactions, all different terms for crypto investment scams and other forms of fraud. In some cases, Robinson explains, these Xinbi Guarantee vendors offer bank accounts in the same country as the victim so that they can receive whatever payment they’re tricked into making, then pay the scammer in the cryptocurrency Tether. In other cases, the Xinbi Guarantee merchants offer to receive cryptocurrency payments and cash them out in the scammer’s local currency, such as Chinese renminbi.

Not just money laundering

Aside from Xinbi Guarantee’s central use as a cash-out point for crypto scammers, Elliptic also found that the market’s vendors offered other wares for scammers such as stolen data that could be used for finding victims, as well as services for registering SIM cards and Starlink Internet subscriptions through proxies.

North Korean state-sponsored cybercriminals also appear to have used the platform for money laundering. Elliptic found through blockchain analysis, for instance, that about $220,000 stolen from the Indian cryptocurrency exchange WazirX—the victim of a $235 million theft in July 2024, widely attributed to North Korean hackers—had flowed into Xinbi Guarantee in a series of transactions in November.

Those money-laundering and scam-enabling services, however, are far from the only shady offerings found on Xinbi Guarantee’s market. Elliptic also found listings for surrogate mothers and egg donors, with one post showing faceless pictures of the donor’s body. Other accounts have offered services that will, for a payment in Tether, place a funeral wreath at a target’s door, deface their home with graffiti, post damaging statements around their home, have someone verbally threaten them, throw feces at them, or even, most bizarrely, surround their home with AIDS patients. One posting suggested these AIDS patients would carry “case reports and needles for intimidation.”

Other listings have offered sex workers as young as 18 years old, noting the specific sex acts that are allowed and forbidden. Elliptic says that one of its researchers was even offered a 14-year-old by a Xinbi Guarantee merchant. (The account holder noted, however, that no transaction for sex with someone below the age of 18 would be guaranteed by Xinbi. The legal age of consent in China is 14.)

Exactly why Xinbi Guarantee is legally registered in the US remains a mystery. Its incorporation record on the Colorado Secretary of State’s website shows an address at an office park in the city of Aurora that has no external Xinbi branding. The company appears to have been registered there in August of 2022 by someone named “Mohd Shahrulnizam Bin Abd Manap.” (WIRED connected that name with several people in Malaysia but couldn’t determine which one might be Xinbi Guarantee’s registrant.) The listing is currently marked as “delinquent,” perhaps due to failure to file more recent paperwork to renew it.

For fledgling Chinese companies—legitimate and illegitimate—incorporating in the US is an increasingly common tactic for “projecting legitimacy,” says Jacob Sims, a visiting fellow at Harvard’s Asia Center who focuses on transnational Chinese crime. “If you have a US presence, you can also open US bank accounts,” Sims says. “You could potentially hire staff in the US. You could in theory have more formalized connections to US entities.” But he notes that the registration’s delinquent status may mean Xinbi Guarantee tried to make some sort of inroads in the US in the past but gave up.

While Telegram has served as the chief means of communication for the two markets, the stablecoin cryptocurrency Tether has served as their primary means of payment, Elliptic found. And despite Telegram’s new round of removals of their channels and accounts, Xinbi Guarantee and Huione Guarantee are far from the only companies to use Tether and Telegram to create essentially a new, largely Chinese-language darknet: Elliptic is tracking close to 30 similar marketplaces, Robinson says, though he declined to name others in the midst of the company’s investigations.

Just as Telegram shows new signs of cracking down on that sprawling black market, Tether, too, has the ability to disrupt criminal use of its services. Unlike other more decentralized cryptocurrencies such as Bitcoin, Tether can freeze payments when it identifies bad actors. Yet it’s not clear to what degree Tether has taken measures to stop Chinese-language crypto scammers and others on Xinbi Guarantee and Huione Guarantee from using its currency.

When WIRED wrote to Tether to ask about its role in those black markets, the company responded in a statement that it encourages “firms like Elliptic and other blockchain intelligence providers to share critical data with law enforcement so we can act swiftly and in coordination.”

“We are not passive observers—we are active players in the global fight against financial crime,” the Tether statement continued. “If you’re considering using Tether for illicit purposes, think again: it is the most traceable asset in existence. We will identify you, and we will work to ensure you are brought to justice.”

Despite that promise—and Telegram’s new effort to remove Huione Guarantee and Xinbi Guarantee from its platform—both tools have already been used to facilitate tens of billions of dollars in theft and other black market deals, much of it occurring in plain sight. The two largely illegal and very public markets have been “remarkable for both the scale at which they’re operating and also the brazenness,” says Harvard’s Jacob Sims.

Given that brazenness and the massive criminal fortunes at stake, expect both markets to attempt a revival in some form—and plenty of competitors to try to take their place atop the Chinese-language crypto crime economy.

This story originally appeared on wired.com.

Incorporated in US: $8.4B money launderer for Chinese-speaking crypto scammers Read More »

welcome-to-the-age-of-paranoia-as-deepfakes-and-scams-abound

Welcome to the age of paranoia as deepfakes and scams abound


AI-driven fraud is leading people to verify every online interaction they have.

These days, when Nicole Yelland receives a meeting request from someone she doesn’t already know, she conducts a multistep background check before deciding whether to accept. Yelland, who works in public relations for a Detroit-based nonprofit, says she’ll run the person’s information through Spokeo, a personal data aggregator that she pays a monthly subscription fee to use. If the contact claims to speak Spanish, Yelland says, she will casually test their ability to understand and translate trickier phrases. If something doesn’t quite seem right, she’ll ask the person to join a Microsoft Teams call—with their camera on.

If Yelland sounds paranoid, that’s because she is. In January, before she started her current nonprofit role, Yelland says, she got roped into an elaborate scam targeting job seekers. “Now, I do the whole verification rigamarole any time someone reaches out to me,” she tells WIRED.

Digital imposter scams aren’t new; messaging platforms, social media sites, and dating apps have long been rife with fakery. In a time when remote work and distributed teams have become commonplace, professional communications channels are no longer safe, either. The same artificial intelligence tools that tech companies promise will boost worker productivity are also making it easier for criminals and fraudsters to construct fake personas in seconds.

On LinkedIn, it can be hard to distinguish a slightly touched-up headshot of a real person from a too-polished, AI-generated facsimile. Deepfake videos are getting so good that longtime email scammers are pivoting to impersonating people on live video calls. According to the US Federal Trade Commission, reports of job- and employment-related scams nearly tripled from 2020 to 2024, and actual losses from those scams increased from $90 million to $500 million.

Yelland says the scammers that approached her back in January were impersonating a real company, one with a legitimate product. The “hiring manager” she corresponded with over email also seemed legit, even sharing a slide deck outlining the responsibilities of the role they were advertising. But during the first video interview, Yelland says, the scammers refused to turn their cameras on during a Microsoft Teams meeting and made unusual requests for detailed personal information, including her driver’s license number. Realizing she’d been duped, Yelland slammed her laptop shut.

These kinds of schemes have become so widespread that AI startups have emerged promising to detect other AI-enabled deepfakes, including GetReal Labs and Reality Defender. OpenAI CEO Sam Altman also runs an identity-verification startup called Tools for Humanity, which makes eye-scanning devices that capture a person’s biometric data, create a unique identifier for their identity, and store that information on the blockchain. The whole idea behind it is proving “personhood,” or that someone is a real human. (Lots of people working on blockchain technology say that blockchain is the solution for identity verification.)

But some corporate professionals are turning instead to old-fashioned social engineering techniques to verify every fishy-seeming interaction they have. Welcome to the Age of Paranoia, when someone might ask you to send them an email while you’re mid-conversation on the phone, slide into your Instagram DMs to ensure the LinkedIn message you sent was really from you, or request you text a selfie with a time stamp, proving you are who you claim to be. Some colleagues say they even share code words with each other, so they have a way to ensure they’re not being misled if an encounter feels off.

“What’s funny is, the lo-fi approach works,” says Daniel Goldman, a blockchain software engineer and former startup founder. Goldman says he began changing his own behavior after he heard a prominent figure in the crypto world had been convincingly deepfaked on a video call. “It put the fear of god in me,” he says. Afterward, he warned his family and friends that even if they hear what they believe is his voice or see him on a video call asking for something concrete—like money or an Internet password—they should hang up and email him first before doing anything.

Ken Schumacher, founder of the recruitment verification service Ropes, says he’s worked with hiring managers who ask job candidates rapid-fire questions about the city where they claim to live on their résumé, such as their favorite coffee shops and places to hang out. If the applicant is actually based in that geographic region, Schumacher says, they should be able to respond quickly with accurate details.

Another verification tactic some people use, Schumacher says, is what he calls the “phone camera trick.” If someone suspects the person they’re talking to over video chat is being deceitful, they can ask them to hold up their phone camera to show their laptop. The idea is to verify whether the individual may be running deepfake technology on their computer, obscuring their true identity or surroundings. But it’s safe to say this approach can also be off-putting: Honest job candidates may be hesitant to show off the inside of their homes or offices, or worry a hiring manager is trying to learn details about their personal lives.

“Everyone is on edge and wary of each other now,” Schumacher says.

While turning yourself into a human captcha may be a fairly effective approach to operational security, even the most paranoid admit these checks create an atmosphere of distrust before two parties have even had the chance to really connect. They can also be a huge time suck. “I feel like something’s gotta give,” Yelland says. “I’m wasting so much time at work just trying to figure out if people are real.”

Jessica Eise, an assistant professor studying climate change and social behavior at Indiana University Bloomington, says her research team has been forced to essentially become digital forensics experts due to the number of fraudsters who respond to ads for paid virtual surveys. (Scammers aren’t as interested in the unpaid surveys, unsurprisingly.) For one of her research projects, which is federally funded, all of the online participants have to be over the age of 18 and living in the US.

“My team would check time stamps for when participants answered emails, and if the timing was suspicious, we could guess they might be in a different time zone,” Eise says. “Then we’d look for other clues we came to recognize, like certain formats of email address or incoherent demographic data.”

Eise says the amount of time her team spent screening people was “exorbitant” and that they’ve now shrunk the size of the cohort for each study and have turned to “snowball sampling,” or recruiting people they know personally to join their studies. The researchers are also handing out more physical flyers to solicit participants in person. “We care a lot about making sure that our data has integrity, that we’re studying who we say we’re trying to study,” she says. “I don’t think there’s an easy solution to this.”
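
To make the kind of screening Eise describes concrete, here is a minimal sketch of heuristics along those lines. It is an illustration only, not her team’s actual tooling; the function name, the “odd local hour” threshold, and the email pattern are hypothetical stand-ins.

```python
from datetime import datetime, timezone
import re

# Hypothetical screening heuristics of the kind Eise describes: flag survey
# respondents whose reply timestamps or email addresses look suspicious.
# Illustrative sketch only, not the research team's actual tooling.

SUSPICIOUS_EMAIL = re.compile(r"^[a-z]+\d{4,}@(gmail|outlook)\.com$")  # e.g. "name93711@gmail.com"

def flag_respondent(reply_time_utc: datetime, claimed_tz_offset_hours: int, email: str) -> list[str]:
    flags = []
    # Convert the reply time into the respondent's claimed local time.
    local_hour = (reply_time_utc.hour + claimed_tz_offset_hours) % 24
    # Replies that land in the middle of the claimed local night suggest a different time zone.
    if 2 <= local_hour <= 5:
        flags.append(f"replied at {local_hour}:00 claimed local time")
    # Auto-generated-looking addresses (short name plus a long digit string) are another clue.
    if SUSPICIOUS_EMAIL.match(email.lower()):
        flags.append("email address matches bulk-registration pattern")
    return flags

if __name__ == "__main__":
    reply = datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)  # 9:30 UTC
    # A respondent claiming US Eastern time (UTC-5) who replied at 4:30 a.m. local
    print(flag_respondent(reply, claimed_tz_offset_hours=-5, email="kmartinez48210@gmail.com"))
```

None of this is hard for a determined fraudster to evade, which is part of why the team fell back on snowball sampling and physical flyers.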

Barring any widespread technical solution, a little common sense can go a long way in spotting bad actors. Yelland shared with me the slide deck that she received as part of the fake job pitch. At first glance, it seemed legit, but when she looked at it again, a few details stood out. The job promised to pay substantially more than the average salary for a similar role in her location and offered unlimited vacation time, generous paid parental leave, and fully covered health care benefits. In today’s job environment, that might have been the biggest tipoff of all that it was a scam.

This story originally appeared on wired.com.

Welcome to the age of paranoia as deepfakes and scams abound Read More »

new-lego-building-ai-creates-models-that-actually-stand-up-in-real-life

New Lego-building AI creates models that actually stand up in real life

The LegoGPT system works in three parts, shown in this diagram. Credit: Pun et al.

The researchers also expanded the system’s abilities by adding texture and color options. For example, using an appearance prompt like “Electric guitar in metallic purple,” LegoGPT can generate a guitar model, with bricks assigned a purple color.

Testing with robots and humans

To prove their designs worked in real life, the researchers had robots assemble the AI-created Lego models. They used a dual-robot arm system with force sensors to pick up and place bricks according to the AI-generated instructions.

Human testers also built some of the designs by hand, showing that the AI creates genuinely buildable models. “Our experiments show that LegoGPT produces stable, diverse, and aesthetically pleasing Lego designs that align closely with the input text prompts,” the team noted in its paper.

When tested against other AI systems for 3D creation, LegoGPT stands out through its focus on structural integrity. The team tested against several alternatives, including LLaMA-Mesh and other 3D generation models, and found its approach produced the highest percentage of stable structures.

A video of two robot arms building a LegoGPT creation, provided by the researchers.

Still, there are some limitations. The current version of LegoGPT only works within a 20×20×20 building space and uses a mere eight standard brick types. “Our method currently supports a fixed set of commonly used Lego bricks,” the team acknowledged. “In future work, we plan to expand the brick library to include a broader range of dimensions and brick types, such as slopes and tiles.”
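
For a concrete feel for those constraints, here is a minimal sketch assuming nothing beyond the limits quoted above (a 20×20×20 grid and a handful of rectangular brick footprints): bricks go into a boolean voxel grid, get rejected if they leave the build volume or collide, and must pass a deliberately crude support rule. The names are hypothetical; this is not the released LegoGPT code.

```python
import numpy as np

# Sketch of the constraints described in the article: bricks placed on a
# 20x20x20 grid, checked for overlap and for a crude notion of support
# (every brick rests on the ground or on at least one brick below it).

GRID = 20
BRICK_TYPES = {"1x1": (1, 1), "1x2": (1, 2), "2x2": (2, 2), "2x4": (2, 4)}  # subset of the 8 standard types

class Layout:
    def __init__(self):
        self.occupied = np.zeros((GRID, GRID, GRID), dtype=bool)  # (x, y, z), z = height

    def place(self, brick: str, x: int, y: int, z: int) -> bool:
        w, d = BRICK_TYPES[brick]
        if x + w > GRID or y + d > GRID or z >= GRID:
            return False                                   # outside the 20x20x20 build volume
        if self.occupied[x:x+w, y:y+d, z].any():
            return False                                   # collides with an existing brick
        # Crude support check: on the ground, or at least one cell directly below is occupied.
        if z > 0 and not self.occupied[x:x+w, y:y+d, z-1].any():
            return False
        self.occupied[x:x+w, y:y+d, z] = True
        return True

layout = Layout()
print(layout.place("2x4", 0, 0, 0))   # True: sits on the ground
print(layout.place("2x2", 0, 0, 1))   # True: supported by the 2x4 below
print(layout.place("1x2", 5, 5, 3))   # False: floating, no support
```

A one-cell-below rule like this would still admit plenty of designs that topple in practice; whatever stability analysis the researchers' released code performs is presumably doing considerably more work than that.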

The researchers also hope to scale up their training dataset to include more objects than the 21 categories currently available. Meanwhile, others can literally build on their work—the researchers released their dataset, code, and models on their project website and GitHub.

New Lego-building AI creates models that actually stand up in real life Read More »

linux-kernel-is-leaving-486-cpus-behind,-only-18-years-after-the-last-one-made

Linux kernel is leaving 486 CPUs behind, only 18 years after the last one made

It’s not the first time Torvalds has suggested dropping support for 32-bit processors and relieving kernel developers from implementing archaic emulation and work-around solutions. “We got rid of i386 support back in 2012. Maybe it’s time to get rid of i486 support in 2022,” Torvalds wrote in October 2022. Failing major changes to the 6.15 kernel, which will likely arrive late this month, i486 support will be dropped.

Where does that leave people running a 486 system for whatever reason? They can run older versions of the Linux kernel and Linux distributions. They might find recommendations for teensy distros like MenuetOS, KolibriOS, and Visopsys, but all three of those require at least a Pentium. They can run FreeDOS. They might get away with the OS/2 descendant ArcaOS. There are some who have modified Windows XP to run on 486 processors, and hopefully, they will not connect those devices to the Internet.

Really, though, if you’re dedicated enough to running a 486 system in 2025, you’re probably resourceful enough to find copies of the software meant for that system. One thing about computers—you never stop learning.

This post was updated at 3:30 p.m. to fix a date error.

Linux kernel is leaving 486 CPUs behind, only 18 years after the last one made Read More »

fidji-simo-joins-openai-as-new-ceo-of-applications

Fidji Simo joins OpenAI as new CEO of Applications

In the message, Altman described Simo as bringing “a rare blend of leadership, product and operational expertise” and expressed that her addition to the team makes him “even more optimistic about our future as we continue advancing toward becoming the superintelligence company.”

Simo becomes the newest high-profile female executive at OpenAI following the departure of Chief Technology Officer Mira Murati in September. Murati, who had been with the company since 2018 and helped launch ChatGPT, left alongside two other senior leaders and founded Thinking Machines Lab in February.

OpenAI’s evolving structure

The leadership addition comes as OpenAI continues to evolve beyond its origins as a research lab. In his announcement, Altman described how the company now operates in three distinct areas: as a research lab focused on artificial general intelligence (AGI), as a “global product company serving hundreds of millions of users,” and as an “infrastructure company” building systems that advance research and deliver AI tools “at unprecedented scale.”

Altman mentioned that as CEO of OpenAI, he will “continue to directly oversee success across all pillars,” including Research, Compute, and Applications, while staying “closely involved with key company decisions.”

The announcement follows recent news that OpenAI abandoned its original plan to cede control of its nonprofit branch to a for-profit entity. The company began as a nonprofit research lab in 2015 before creating a for-profit subsidiary in 2019, maintaining its original mission “to ensure artificial general intelligence benefits everyone.”

Fidji Simo joins OpenAI as new CEO of Applications Read More »

cheaters-gonna-cheat-cheat-cheat-cheat-cheat

Cheaters Gonna Cheat Cheat Cheat Cheat Cheat

Cheaters. Kids these days, everyone says, are all a bunch of blatant cheaters via AI.

Then again, look at the game we are forcing them to play, and how we grade it.

If you earn your degree largely via AI, that changes two distinct things.

  1. You might learn different things.

  2. You might signal different things.

Both learning and signaling are under threat if there is too much blatant cheating.

There is too much cheating going on, too blatantly.

Why is that happening? Because the students are choosing to do it.

Ultimately, this is a preview of what will happen everywhere else as well. It is not a coincidence that AI starts its replacement of work in the places where the work is the most repetitive, useless and fake, but its ubiquitousness will not stay confined there. These are problems and also opportunities we will face everywhere. The good news is that in other places the resulting superior outputs will actually produce value.

  1. You Could Take The White Pill, But You Probably Won’t.

  2. Is Our Children Learning.

  3. Cheaters Never Stop Cheating.

  4. If You Know You Know.

  5. The Real Victims Here.

  6. Taking Note.

  7. What You Going To Do About It, Punk?

  8. How Bad Are Things?

  9. The Road to Recovery.

  10. The Whispering Earring.

As I always say, if you have access to AI, you can use it to (A) learn and grow strong and work better, or (B) you can use it to avoid learning, growing and working. Or you can always (C) refuse to use it at all, or perhaps (D) use it in strictly limited capacities that you choose deliberately to save time but avoid the ability to avoid learning.

Choosing (A) and using AI to learn better and smarter is strictly better than choosing (C) and refusing to use AI at all.

If you choose (B) and use AI to avoid learning, you might be better or worse off than choosing (C) and refusing to use AI at all, depending on the value of the learning you are avoiding.

If the learning in question is sufficiently worthless, there’s no reason to invest in it, and (B) is not only better than (C) but also better than (A).

Tim Sweeney: The question is not “is it cheating”, the question is “is it learning”.

James Walsh: AI has made Daniel more curious; he likes that whenever he has a question, he can quickly access a thorough answer. But when he uses AI for homework, he often wonders, If I took the time to learn that, instead of just finding it out, would I have learned a lot more?

I notice I am confused. What is the difference between ‘learning that’ and ‘just finding it out’? And what’s to stop Daniel from walking through a derivation or explanation with the AI if he wants to do that? I’ve done that a bunch with ML, and it’s great. o3’s example here was being told and memorizing that the integral of sin x is -cos x rather than deriving it, but that was what most students always did anyway.
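
(To be concrete, the fact in question is a one-line check once you know the derivative of cosine, which is what “walking through it” with an AI would look like:)

```latex
\frac{d}{dx}\left(-\cos x\right) = \sin x
\quad\Longrightarrow\quad
\int \sin x \, dx = -\cos x + C
```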

The path you take is up to you.

Ted Chiang: Using ChatGPT to complete tasks is like taking a forklift to the gym: you’ll never improve your cognitive abilities that way.

Ewan Morrison: AI is demoralising universities. Students who use AI, think “why bother to study or write when AI can do it for me?” Tutors who mark the essays, think “why bother to teach these students & why give a serious grade when 90% of essays are done with AI?”

I would instead ask, why are you assigning essays the AI can do for them, without convincing the students why they should still write the essays themselves?

The problem, as I understand it, is that in general students are more often than not:

  1. Not that interested in learning.

  2. Do not think that their assignments are a good way to learn.

  3. Quite interested in not working.

  4. Quite interested in getting good grades.

  5. Know how to use ChatGPT to avoid learning.

  6. Do not know how to use ChatGPT to learn, or it doesn’t even occur to them.

  7. Aware that if they did use ChatGPT to learn, it wouldn’t be via schoolwork.

Meatball Times: has anyone stopped to ask WHY students cheat? would a buddhist monk “cheat” at meditation? would an artist “cheat” at painting? no. when process and outcomes are aligned, there’s no incentive to cheat. so what’s happening differently at colleges? the answer is in the article.

Colin Fraser (being right): “would an artist ‘cheat’ at a painting?”

I mean… yes, famously.

Now that the cost of such cheating is close to zero I expect that we will be seeing a lot more of it!

James Walsh: Although Columbia’s policy on AI is similar to that of many other universities’ — students are prohibited from using it unless their professor explicitly permits them to do so, either on a class-by-class or case-by-case basis — Lee said he doesn’t know a single student at the school who isn’t using AI to cheat. To be clear, Lee doesn’t think this is a bad thing.

If the reward for painting is largely money, which it is, then clearly if you give artists the ability to cheat then many of them will cheat, as in things like forgery, as they often have in the past. The way to stop them is to catch the ones who try.

The reason the Buddhist monk presumably wouldn’t ‘cheat’ at meditation is because they are not trying to Be Observed Performing Meditation, they want to meditate. But yes, if they were getting other rewards for meditation, I’d expect some cheating, sure, even if the meditation also had intrinsic rewards.

Back to the school question. If the students did know how to use AI to learn, why would they need the school, or to do the assignments?

The entire structure of school is based on the thesis that students need to be forced to learn, and that this learning must be constantly policed.

The thesis has real validity. At this point, with not only AI but also YouTube and plenty of other free online materials, the primary educational (non-social, non-signaling) product is that the class schedule and physical presence, and exams and assignments, serve as a forcing function to get you to do the damn work and pay attention, even if inefficiently.

Zito (quoting the NYMag article): The kids are cooked.

Yishan: One of my kids buys into the propaganda that AI is environmentally harmful (not helped by what xAI is doing in Memphis, btw), and so refuses to use AI for any help on learning tough subjects. The kid just does the work, grinding it out, and they are getting straight A’s.

And… now I’m thinking maybe I’ll stop trying to convince the kid otherwise.

It’s entirely not obvious whether it would be a good idea to convince the kid otherwise. Using AI is going to be the most important skill, and it can make the learning much better, but maybe it’s fine to let the kid wait given the downside risks of preventing that?

The reason taking such a drastic (in)action might make sense is that the kids know the assignments are stupid and fake. The whole thesis of commitment devices that lead to forced work is based on the idea that the kids (or their parents) understand that they do need to be forced to work, so they need this commitment device, and also that the commitment device is functional.

Now both of those halves are broken. The commitment devices don’t work, you can simply cheat. And the students are in part trying to be lazy, sure, but they’re also very consciously not seeing any value here. Lee here is not typical in that he goes on to actively create a cheating startup but I mean, hey, was he wrong?

James Walsh: “Most assignments in college are not relevant,” [Columbia student Lee] told me. “They’re hackable by AI, and I just had no interest in doing them.”

While other new students fretted over the university’s rigorous core curriculum, described by the school as “intellectually expansive” and “personally transformative,” Lee used AI to breeze through with minimal effort.

When I asked him why he had gone through so much trouble to get to an Ivy League university only to off-load all of the learning to a robot, he said, “It’s the best place to meet your co-founder and your wife.”

Bingo. Lee knew this is no way to learn. That’s not why he was there.

Columbia can call its core curriculum ‘intellectually expansive’ and ‘personally transformative’ all it wants. That doesn’t make it true, and it definitely isn’t fooling that many of the students.

The key fact about cheaters is that they not only never stop cheating on their own. They escalate the extent of their cheating until they are caught. Once you pop enough times, you can’t stop. Cheaters learn to cheat as a habit, not as the result of an expected value calculation in each situation.

For example, if you put a Magic: the Gathering cheater onto a Twitch stream, where they will leave video evidence of their cheating, will they stop? No, usually not.

Thus, you can literally be teaching ‘Ethics and AI’ and ask for a personal reflection, essentially writing a new line of Ironic, and they will absolutely get it from ChatGPT.

James Walsh: Less than three months later, teaching a course called Ethics and Artificial Intelligence, [Brian Patrick Green] figured a low-stakes reading reflection would be safe — surely no one would dare use ChatGPT to write something personal. But one of his students turned in a reflection with robotic language and awkward phrasing that Green knew was AI-generated.

This is a way to know students are indeed cheating rather than using AI to learn. The good news? Teachable moment.

Lee in particular clearly doesn’t have a moral compass in any of this. He doesn’t get the idea that cheating can be wrong even in theory:

For now, Lee hopes people will use Cluely to continue AI’s siege on education. “We’re going to target the digital LSATs; digital GREs; all campus assignments, quizzes, and tests,” he said. “It will enable you to cheat on pretty much everything.”

If you’re enabling widespread cheating on the LSATs and GREs, you’re no longer a morally ambiguous rebel against the system. Now you’re just a villain.

Or you can have a code:

James Walsh: Wendy, a freshman finance major at one of the city’s top universities, told me that she is against using AI. Or, she clarified, “I’m against copy-and-pasting. I’m against cheating and plagiarism. All of that. It’s against the student handbook.”

Then she described, step-by-step, how on a recent Friday at 8 a.m., she called up an AI platform to help her write a four-to-five-page essay due two hours later.

Wendy will use AI for ‘all aid short of copy-pasting,’ the same way you would use Google or Wikipedia or you’d ask a friend questions, but she won’t copy-and-paste. The article goes on to describe her full technique. AI can generate an outline, and brainstorm ideas and arguments, so long as the words are hers.

That’s not an obviously wrong place to draw the line. It depends on which part of the assignment is the active ingredient. Is Wendy supposed to be learning:

  1. How to structure, outline and manufacture a school essay in particular?

  2. How to figure out what a teacher wants her to do?

  3. ‘How to write’?

  4. How to pick a ‘thesis’?

  5. How to find arguments and bullet points?

  6. The actual content of the essay?

  7. An assessment of how good she is rather than grademaxxing?

Wendy says planning the essay is fun, but ‘she’d rather get good grades.’ As in, the system actively punishes her for trying to think about such questions rather than being the correct form of fake. She is still presumably learning about the actual content of the essay, and by producing it, if there’s any actual value to the assignment, and she pays attention, she’ll pick up the reasons why the AI makes the essay the way it does.

I don’t buy that this is going to destroy Wendy’s ‘critical thinking’ skills. Why are we teaching her that school essay structures and such are the way to train critical thinking? Everything in my school experience says the opposite.

The ‘cheaters’ who only cheat or lie a limited amount and then stop have a clear and coherent model of why what they are doing in the contexts they cheat or lie in is not cheating or why it is acceptable or justified, and this is contrasted with other contexts. Why some rules are valid, and others are not. Even then, it usually takes a far stronger person to hold that line than to not cheat in the first place.

Another way to look at this is, if it’s obvious from the vibes that you cheated, you cheated, even if the system can’t prove it. The level of obviousness varies, you can’t always sneak in smoking gun instructions.

But if you invoke the good Lord Bayes, you know.

James Walsh: Most of the writing professors I spoke to told me that it’s abundantly clear when their students use AI.

Not that they flag it.

Still, while professors may think they are good at detecting AI-generated writing, studies have found they’re actually not. One, published in June 2024, used fake student profiles to slip 100 percent AI-generated work into professors’ grading piles at a U.K. university. The professors failed to flag 97 percent.

But there’s a huge difference between ‘I flag this as AI and am willing to fight over this’ and knowing that something was probably or almost certainly AI.

What about automatic AI detectors? They’re detecting something. It’s noisy, and it’s different, it’s not that hard to largely fool if you care, and it has huge issues (especially for ESL students) but I don’t think either of these responses is an error?

I fed Wendy’s essay through a free AI detector, ZeroGPT, and it came back as 11.74 percent AI-generated, which seemed low given that AI, at the very least, had generated her central arguments. I then fed a chunk of text from the Book of Genesis into ZeroGPT and it came back as 93.33 percent AI-generated.

If you’re direct block quoting Genesis without attribution, your essay is plagiarized. Maybe it came out of the AI and maybe it didn’t, but it easily could have, it knows Genesis and it’s allowed to quote from it. So 93% seems fine. Whereas Wendy’s essay is written by Wendy, the AI was used to make it conform to the dumb structures and passwords of the course. 11% seems fine.

Colin Fraser: I think we’ve somehow swung to overestimating the number of kids who are cheating with ChatGPT and simultaneously underestimating the amount of grief and hassle this creates for educators.

The guy making the cheating app wants you to think every single other person out there is cheating at everything and you’re falling behind if you’re not cheating. That’s not true. But the spectre of a few more plagiarized assignments per term is massively disruptive for teachers.

James Walsh: Many teachers now seem to be in a state of despair.

I’m sorry, what?

Given how estimations work, I can totally believe we might be overestimating the number of kids who are cheating. Of course, the number is constantly rising, especially for the broader definitions of ‘cheating,’ so even if you were overestimating at the time you might not be anymore.

But no, this is not about ‘a few more plagiarized assignments per term,’ both because this isn’t plagiarism it’s a distinct other thing, and also because by all reports it’s not only a few cases, it’s an avalanche even if underestimated.

Doing the assignments yourself is now optional unless you force the student to do it in front of you. Deal with it.

As for this being ‘grief and hassle’ for educators, yes, I am sure it is annoying when your system of forced fake work can be faked back at you more effectively and more often, and when there is a much better source of information and explanations available than you and your textbooks such that very little of what you are doing really has a point to it anymore.

If you think students have to do certain things themselves in order to learn, then as I see it you have two options, you can do either or both.

  1. Use frequent in-person testing, both as the basis of grades and as a forcing function so that students learn. This is a time honored technique.

  2. Use in-person assignments and tasks, so you can prevent AI use. This is super annoying but it has other advantages.

Alternatively or in addition to this, you can embrace AI and design new tasks and assignments that cause students to learn together with the AI. That’s The Way.

Trying to ‘catch’ the ‘cheating’ is pointless. It won’t work. Trying only turns this at best into a battle over obscuring tool use and makes the whole experience adversarial.

If you assign fake essay forms to students, and then grade them on those essays and use those grades to determine their futures, what the hell do you think is going to happen? This form of essay assignment is no longer valid, and if you assign it anyway you deserve what you get.

James Walsh: “I think we are years — or months, probably — away from a world where nobody thinks using AI for homework is considered cheating,” [Lee] said.

I think that is wrong. We are a long way away from the last people giving up this ghost. But seriously it is pretty insane to think ‘using AI for homework’ is cheating. I’m actively trying to get my kids to use AI for homework more, not less.

James Walsh: In January 2023, just two months after OpenAI launched ChatGPT, a survey of 1,000 college students found that nearly 90 percent of them had used the chatbot to help with homework assignments.

What percentage of that 90% was ‘cheating’? We don’t know, and definitions differ, but I presume a lot less than all of them.

Now and also going forward, I think you could say that particular specific uses are indeed really cheating, and it depends how you use it. But if you think ‘use AI to ask questions about the world and learn the answer’ is ‘cheating’ then explain what the point of the assignment was, again?

The whole enterprise is broken, and will be broken while there is a fundamental disconnect between what is measured and what they want to be managing.

James Walsh: Williams knew most of the students in this general-education class were not destined to be writers, but he thought the work of getting from a blank page to a few semi-coherent pages was, above all else, a lesson in effort. In that sense, most of his students utterly failed.

[Jollimore] worries about the long-term consequences of passively allowing 18-year-olds to decide whether to actively engage with their assignments.

The entire article makes clear that students almost never buy that their efforts would be worthwhile. A teacher can think ‘this will teach them effort’ but if that’s the goal then why not go get an actual job? No one is buying this, so if the grades don’t reward effort, why should there be effort?

How dare you let 18-year-olds decide whether to engage with their assignments that produce no value to anyone but themselves.

This is all flat out text.

The ideal of college as a place of intellectual growth, where students engage with deep, profound ideas, was gone long before ChatGPT.

In a way, the speed and ease with which AI proved itself able to do college-level work simply exposed the rot at the core.

There’s no point. Was there ever a point?

“The students kind of recognize that the system is broken and that there’s not really a point in doing this. Maybe the original meaning of these assignments has been lost or is not being communicated to them well.”

The question is, once you know, what do you do about it? How do you align what is measured with what is to be managed? What exactly do you want from the students?

James Walsh: The “true attempt at a paper” policy ruined Williams’s grading scale. If he gave a solid paper that was obviously written with AI a B, what should he give a paper written by someone who actually wrote their own paper but submitted, in his words, “a barely literate essay”?

What is measured gets managed. You either give the better grade to the ‘barely literate’ essay, or you don’t.

My children get assigned homework. The school’s literal justification – I am not making this up, I am not paraphrasing – is that they need to learn to do homework so that they will be prepared to do more homework in the future. Often this involves giving them assignments that we have to walk them through because there is no reasonable way for them to understand what is being asked.

If it were up to me, damn right I’d have them use AI.

It’s not just the students: Multiple AI platforms now offer tools to leave AI-generated feedback on students’ essays. Which raises the possibility that AIs are now evaluating AI-generated papers, reducing the entire academic exercise to a conversation between two robots — or maybe even just one.

Great! Now we can learn.

Another AI application to university is note taking. AI can do excellent transcription and rather strong active note taking. Is that a case of learning, or of not learning? There are competing theories, which I think are true for different people at different times.

  1. One theory says that the act of taking notes is how you learn, by forcing you to pay attention, distill the information and write it in your own words.

  2. The other theory is that having to take notes prevents you from actually paying ‘real’ attention and thinking and engaging, you’re too busy writing down factual information.

AI also means that even if you don’t have it take notes or a transcript, you don’t have to worry as much about missing facts, because you can ask the AI for them later.

My experience is that having to take notes is mostly a negative. Every time I focus on writing something down that means I’m not listening, or not fully listening, and definitely not truly thinking.

Rarely did she sit in class and not see other students’ laptops open to ChatGPT.

Of course your laptop is open to an AI. It’s like being able to ask the professor any questions you like without interrupting the class or paying any social costs, including stupid questions. If there’s a college lecture, and at no point do you want to ask Gemini, Claude or o3 any questions, what are you even doing? That also means everyone gets to learn much better, removing the tradeoff of each question disrupting the rest of the class.

Similarly, devising study materials and practice tests seems clearly good.

The most amazing thing about the AI ‘cheating’ epidemic at universities is the extent to which the universities are content to go quietly into the night. They are mostly content to let nature take its course.

Could the universities adapt to the new reality? Yes, but they choose not to.

Cat Zhang: more depressing than Trump’s funding slashes and legal assaults and the Chat-GPT epidemic is witnessing how many smart, competent people would rather give up than even begin to think of what we could do about it

Tyler Austin Harper: It can’t be emphasized enough: wide swaths of the academy have given up re ChatGPT. Colleges have had since 2022 to figure something out and have done less than nothing. Haven’t even tried. Or tried to try. The administrative class has mostly collaborated with the LLM takeover.

Hardly anyone in this country believes in higher ed, especially the institutions themselves which cannot be mustered to do anything in their own defense. Faced with an existential threat, they can’t be bothered to cry, yawn, or even bury their head in the sand, let alone resist.

It would actually be more respectable if they were in denial, but the pervading sentiment is “well, we had a good run.” They don’t even have the dignity of being delusional. It’s shocking. Three years in and how many universities can you point to that have tried anything really?

If the AI crisis points to anything it’s that higher ed has been dead a long time, before ChatGPT was a twinkle in Sam Altman’s eye. The reason the universities can’t be roused to their own defense is that they’re being asked to defend a corpse and the people who run them know it.

They will return to being finishing schools once again.

To paraphrase Alan Moore, this is one of those moments where colleges need to look at what’s on the table and (metaphorically) say: “Thank you, but I’d rather die behind the chemical sheds.” Instead, we get an OpenAI and Cal State partnership. Total, unapologetic capitulation.

The obvious interpretation is that college had long shifted into primarily being a Bryan Caplan style set of signaling mechanisms, so the universities are not moving to defend themselves against students who seek to avoid learning.

The problem is, this also destroys key portions of the underlying signals.

Greg Lukianoff: [Tyler’s statement above is] powerful evidence of the signaling hypothesis, that essentially the primary function of education is to signal to future employers that you were probably pretty smart and conscientious to get into college in the first place, and pretty, as @bryan_caplan puts it, “conservative” (in a non-political sense) to be able to finish it. Therefore graduates may be potentially competent and compliant employees.

Seems like there are far less expensive ways to convey that information.

Clark H: The problem is the signal is now largely false. It takes much less effort to graduate from college now – just crudely ask GPT to do it. There is even a case to be made that, like a prison teaches how to crime, college now teaches how to cheat.

v8pAfNs82P1foT: There’s a third signal of value to future employers: conformity to convention/expectation. There are alternative credible pathways to demonstrate intelligence and sustained diligence. But definitionally, the only way to credibly signal willingness to conform is to conform.

Megan McArdle: The larger problem is that a degree obtained by AI does not signal the information they are trying to convey, so its value is likely to collapse quickly as employers get wise. There will be a lag, because cultural habits die hard, but eventually the whole enterprise will implode unless they figure out how to teach something that employers will pay a premium for.

Matthew Yglesias: I think this is all kind of missing the boat, the same AI that can pass your college classes for you is radically devaluing the skills that a college degree (whether viewed as real learning or just signaling or more plausibly a mix) used to convey in the market.

The AI challenge for higher education isn’t that it’s undermining the assessment protocols (as everyone has noticed you can fix this with blue books or oral exams if you bother trying) it’s that it’s undermining the financial value of the degree!

Megan McArdle: Eh, conscientiousness is likely to remain valuable, I think. They also provide ancillary marriage market and networking services that arguably get more valuable in an age of AI.

Especially at elite schools. If you no longer have to spend your twenties and early thirties prepping for the PUMC rat race, why not get married at 22 and pop out some babies while you still have energy to chase them?

But anyway, yes, this is what I was saying, apparently not clearly enough: the problem is not just that you can’t assess certain kinds of paper-writing skills, it’s that the skills those papers were assessing will decline in value.

Periodically you see talk about how students these days (or kids these days) are in trouble. How they’re stupider, less literate, they can’t pay attention, they’re lazy and refuse to do work, and so on.

“We’re talking about an entire generation of learning perhaps significantly undermined here,” said Green, the Santa Clara tech ethicist. “It’s short-circuiting the learning process, and it’s happening fast.”

The thing is, this is a Pessimists Archive speciality, this pattern dates back at least to Socrates. People have always worried about this, and the opposite has very clearly been true overall. It’s learning, and also many other things, where ‘kids these days’ are always ‘in crisis’ and ‘falling behind’ and ‘at risk’ and so on.

My central understanding of this is that as times change, people compare kids now to kids of old both through rose-colored memory glasses, and also by checking against the exact positive attributes of the previous generations. Meanwhile, the portfolio of skills and knowledge shifts. Today’s kids are masters at many things that didn’t even exist in my youth. That’s partly going to be a shift away from other things, most of which are both less important than the new priorities and less important than they were.

Ron Arts: Most important sentence in the article: “There might have been people complaining about machinery replacing blacksmiths in, like, the 1600s or 1800s, but now it’s just accepted that it’s useless to learn how to blacksmith.”

George Turner: Blacksmithing is an extremely useful skill. Even if I’m finishing up the part on a big CNC machine or with an industrial robot, there are times when smithing saves me a lot of time.

Bob BTC: Learning a trade is far different than learning to think!

Is it finally ‘learning to think’ this time? Really? Were they reading the sequences? Could previous students have written them?

And yes, people really will use justifications for our university classes that are about as strong as ‘blacksmithing is an extremely useful skill.’

So we should be highly suspicious of yet another claim of new tech destroying kids ability to learn, especially when it is also the greatest learning tool in human history.

Notice how much better it is to use AI than it is to hire a human to do your homework, if both have the same cost, speed, and quality profiles.

For $15.95 a month, Chegg promised answers to homework questions in as little as 30 minutes, 24/7, from the 150,000 experts with advanced degrees it employed, mostly in India. When ChatGPT launched, students were primed for a tool that was faster, more capable.

With AI, you create the prompt and figure out how to frame the assignment, you can ask follow-up questions, you are in control. With hiring a human, you are much less likely to do any of that. It matters.

Ultimately, this particular cataclysm is not one I am so worried about. I don’t think our children were learning before, and they have much better opportunity to do so now. I don’t think they were acting with or being selected for integrity at university before, either. And if this destroys the value of degrees? Mostly, I’d say: Good.

If you are addicted to TikTok, ChatGPT or your phone in general, it can get pretty grim, as was often quoted.

James Walsh: Rarely did she sit in class and not see other students’ laptops open to ChatGPT. Toward the end of the semester, she began to think she might be dependent on the website. She already considered herself addicted to TikTok, Instagram, Snapchat, and Reddit, where she writes under the username maybeimnotsmart. “I spend so much time on TikTok,” she said. “Hours and hours, until my eyes start hurting, which makes it hard to plan and do my schoolwork. With ChatGPT, I can write an essay in two hours that normally takes 12.”

The ‘catch’ that isn’t mentioned is that She Got Better.

Colin Fraser: Kind of an interesting omission. Not THAT interesting or anything but, you know, why didn’t he put that in the article?

I think it’s both interesting and important context. If your example of a student addicted to ChatGPT and her phone beat that addiction, that’s highly relevant. It’s totally within Bounded Distrust rules to not mention it, but hot damn. Also, congrats to maybeimnotsosmart.

Ultimately the question is, if you have access to increasingly functional copies of The Whispering Earring, what should you do with that? If others get access to it, what then? What do we do about educational situations ‘getting there first’?

In case you haven’t read The Whispering Earring, it’s short and you should, and I’m very confident the author won’t mind, so here’s the whole story.

Scott Alexander: Clarity didn’t work, trying mysterianism.

In the treasure-vaults of Til Iosophrang rests the Whispering Earring, buried deep beneath a heap of gold where it can do no further harm.

The earring is a little topaz tetrahedron dangling from a thin gold wire. When worn, it whispers in the wearer’s ear: “Better for you if you take me off.” If the wearer ignores the advice, it never again repeats that particular suggestion.

After that, when the wearer is making a decision the earring whispers its advice, always of the form “Better for you if you…”. The earring is always right. It does not always give the best advice possible in a situation. It will not necessarily make its wearer King, or help her solve the miseries of the world. But its advice is always better than what the wearer would have come up with on her own.

It is not a taskmaster, telling you what to do in order to achieve some foreign goal. It always tells you what will make you happiest. If it would make you happiest to succeed at your work, it will tell you how best to complete it. If it would make you happiest to do a half-assed job at your work and then go home and spend the rest of the day in bed having vague sexual fantasies, the earring will tell you to do that. The earring is never wrong.

The Book of Dark Waves gives the histories of two hundred seventy four people who previously wore the Whispering Earring. There are no recorded cases of a wearer regretting following the earring’s advice, and there are no recorded cases of a wearer not regretting disobeying the earring. The earring is always right.

The earring begins by only offering advice on major life decisions. However, as it gets to know a wearer, it becomes more gregarious, and will offer advice on everything from what time to go to sleep, to what to eat for breakfast. If you take its advice, you will find that breakfast food really hit the spot, that it was exactly what you wanted for breakfast that day even though you didn’t know it yourself. The earring is never wrong.

As it gets completely comfortable with its wearer, it begins speaking in its native language, a series of high-bandwidth hisses and clicks that correspond to individual muscle movements. At first this speech is alien and disconcerting, but by the magic of the earring it begins to make more and more sense. No longer are the earring’s commands momentous on the level of “Become a soldier”. No more are they even simple on the level of “Have bread for breakfast”. Now they are more like “Contract your biceps muscle about thirty-five percent of the way” or “Articulate the letter p”. The earring is always right. This muscle movement will no doubt be part of a supernaturally effective plan toward achieving whatever your goals at that moment may be.

Soon, reinforcement and habit-formation have done their trick. The connection between the hisses and clicks of the earring and the movements of the muscles have become instinctual, no more conscious than the reflex of jumping when someone hidden gives a loud shout behind you.

At this point no further change occurs in the behavior of the earring. The wearer lives an abnormally successful life, usually ending out as a rich and much-beloved pillar of the community with a large and happy family.

When Kadmi Rachumion came to Til Iosophrang, he took an unusual interest in the case of the earring. First, he confirmed from the records and the testimony of all living wearers that the earring’s first suggestion was always that the earring itself be removed. Second, he spent some time questioning the Priests of Beauty, who eventually admitted that when the corpses of the wearers were being prepared for burial, it was noted that their brains were curiously deformed: the neocortexes had wasted away, and the bulk of their mass was an abnormally hypertrophied mid- and lower-brain, especially the parts associated with reflexive action.

Finally, Kadmi-nomai asked the High Priest of Joy in Til Iosophrang for the earring, which he was given. After cutting a hole in his own earlobe with the tip of the Piercing Star, he donned the earring and conversed with it for two hours, asking various questions in Kalas, in Kadhamic, and in its own language. Finally he removed the artifact and recommended that it be locked in the deepest and most inaccessible parts of the treasure vaults, a suggestion with which the Iosophrelin decided to comply.

This is very obviously not the optimal use of The Whispering Earring, let alone the ability to manufacture copies of it.

But, and our future may depend on the answer, what is your better plan? And in particular, what is your plan for when everyone has access to (a for now imperfect and scope limited but continuously improving) one, and you are at a rather severe disadvantage if you do not put one on?

The actual problem we face is far trickier than that. Both in education, and in general.

Cheaters Gonna Cheat Cheat Cheat Cheat Cheat Read More »