

“Streaming stops feeling infinite”: What subscribers can expect in 2026


Spoiler: expect higher prices

Streaming may get a little worse before it gets better.

We’re far from streaming’s original promise: instant access to beloved and undiscovered titles without the burden of ads, bundled services, or price gouging that have long been associated with cable.

Still, every year we get more dependent on streaming for entertainment. Despite streaming services’ flaws, many of us are bound to keep subscribing to at least one service next year. Here’s what we can expect in 2026 and beyond.

Subscription prices keep rising, but perhaps not as expected

There’s virtually no hope of streaming subscription prices plateauing in 2026. Streaming companies continue to face challenges as content production and licensing costs rise, and it’s often easier to get current customers to pay slightly more than to acquire new subscribers. Meanwhile, many streaming companies are still struggling with profitability and revenue after spending years focusing on winning subscribers with content.

“We see many services are only now aligning content spend with realistic lifetime value per subscriber,” Christofer Hamilton, industry insights manager at streaming analyst Parrot Analytics, told Ars.

Companies may get more creative with how they frame higher costs to subscribers, however. People who pay extra to stream without ads are the most likely to see price bumps as streaming companies continue pushing customers toward ad-based tiers.

Charging more for “premium” features—such as 4K streaming, simultaneous streams, or offline downloads—offers another way for streaming companies to boost revenue without implementing broad price hikes that risk provoking customer outrage. Subscribers can expect streaming prices to get “more menu-like next year,” said Michael Goodman, director of entertainment research at Parks Associates, a research firm focusing on IoT, consumer electronics, and entertainment.

When will the price hikes stop?

If streaming prices won’t stop rising next year, when will they?

Ultimately, it may be up to subscribers to vote with their dollars by canceling subscriptions or opting for cheaper or free alternatives, such as FAST (free ad-supported streaming television) channels with linear programming.

As Goodman put it, “Until we see net adds stall or decline as a result of price hikes, services have no incentive to stop raising prices.”

Some experts doubt that streaming services will ever willingly stop increasing prices. Bill Yousman, professor and director of the Media Literacy and Digital Culture graduate program at Sacred Heart University, sees precedent for this in cable companies.

“If the big streaming companies had their way, there would be no limit to their price hikes. We have already seen this with the cable monopolies and their disregard for consumer dissatisfaction,” he said.

Yousman believes that prices will only “be brought under control if there is some type of government regulation,” but he noted that’s unlikely under the Trump administration.

To date, US lawmakers have shown little interest in halting the steady rise of streaming prices; most who have sought to regulate the industry have focused on consolidation. There has been some effort to rein in streaming price hikes, though, especially through proposed federal legislation dubbed the Price Gouging Prevention Act.

Streaming services lean deeper into cable-like bundles

Companies will look to leverage subscribers’ frustration with pricing by being more aggressive about bundling third-party services like traditional pay TV, Internet, and cell phone service with streaming subscriptions. The idea is that people are less likely to cancel a streaming subscription if it’s tied to a different subscription (including another streaming subscription). The strategy echoes the days of cable, when some people kept unused landlines just to save money on cable channels or Internet service.

“For subscribers, 2026 is the year streaming stops feeling infinite and starts feeling more like premium cable used to: fewer apps, clearer bundles, and higher expectations for each service they pay for,” Parrot’s Hamilton said.

Thanks to traditional pay TV providers, bundles have a bad connotation among people looking to save money and simplify their subscriptions. But bundling doesn’t always have to be a bad thing, as Yousman explains:

If the companies wanted to really be responsive to consumers, they would let them design their own packages rather than having to choose options that may or may not include all the services they want. What works against this, of course, is the demand for ever-increasing profits at all times.

Should a sale of Warner Bros. Discovery’s (WBD’s) HBO Max be completed late next year, subscribers will face more pressure to bundle their streaming subscriptions.

“When dominant platforms like Netflix or Paramount absorb major content players, it accelerates the erosion of streaming’s original promise: freedom from monopolistic bundles,” Vikrant Mathur, co-founder of streaming technology provider Future Today, said.

Netflix and Paramount duke it out over Warner Bros.

WBD announced plans this month to sell its streaming and movie studios business to Netflix for an equity value of $72 billion, or an approximate total enterprise value of $82.7 billion. Paramount Skydance, however, quickly swooped in with a hostile takeover bid for all of WBD, including its cable channels, for $108.4 billion. A WBD shareholder vote will occur in spring or early summer, chairman Samuel Di Piazza told CNBC. By the end of 2026, we should have a clearer understanding of the future of HBO Max, as well as Netflix and Paramount+.

Any acquisition will be subject to regulatory scrutiny, causing more uncertainty for subscribers. If Netflix buys HBO Max, users of both services can expect higher prices due to reduced competition and the extensive amount of content and number of big-budget franchises (including Harry Potter and DC Comics) expected to unite under one platform.

“If Netflix gets [HBO Max] and the WB studios, HBO Max subscribers are more likely to see a smoother transition, strong ongoing investment in premium content, and simpler app/billing integration,” Parks Associates’ Goodman said.

But while the potential merger is worth watching, subscribers are unlikely to truly feel the impact of HBO Max potentially changing ownership until after 2026.

“Producing a show is a yearslong process, so the content that was already slated to air isn’t going to disappear, and the new content acquired through the WB library won’t be available until the merger is approved and closes,” Tre Lovell, attorney and owner of Los Angeles entertainment law firm The Lovell Firm, explained.

Content starts getting less bold

Looking beyond 2026, a sale of part or all of WBD would likely open the door for more streaming acquisitions. That could eventually benefit customers by making it easier to find content to watch with fewer subscriptions. But merged companies are also less likely to take risks on unique and diverse content.

Analysts I spoke with pointed to fewer niche and mid-tier original shows and movies and more show cancellations if either Netflix or Paramount buys HBO Max. Either buyer would probably focus more on the already-successful franchises that WB owns, such as Game of Thrones, Batman, and Superman.

“Big combined libraries push companies to double down on proven IP because it travels, merchandises, and reduces marketing risk,” said Robert Rosenberg, a partner at the New York law firm Moses Singer focusing on intellectual property, entertainment, technology, and data law.

Rosenberg also expects to see a “tilt toward” live events, sports, and unscripted content “for retention” if HBO Max sells.

In the shorter term, Rory Gooderick, research manager at analyst firm Ampere Analysis, predicted that WBD will be “cautious when greenlighting new large-scale projects until” the acquisition is finalized.

Beyond the potential HBO Max sale, more merger activity could lead to streaming services straying from their original selling point of offering bolder, quirkier content.

As the industry consolidates, “sticky content,” like procedurals, reality shows, and “comfort TV that drives long viewing sessions,” will take priority among mainstream, subscription-based streaming services, especially as they put more emphasis on ad-tier subscriptions, Goodman predicted.

A more stable future?

The new year will be formative for streaming and will have lasting impacts on subscribers. We’ve discussed numerous negative implications, but there could be a silver lining: while we may see more turbulence, hopefully we’ll also start to see a road toward more stable streaming options.

Streaming subscribers can’t directly stop mergers or price hikes or control streaming libraries. But with services like Netflix and Disney+ focusing on becoming one-stop shops with massive libraries, there’s an opportunity for other services to hone their specialties and stand out by providing offbeat, unexpected, and rare content at more affordable prices.

As the landscape settles, streamers should be mindful of the importance of variety to subscribers. According to Bill Michels, chief product officer at Gracenote, Nielsen’s content data business unit:

There will be some consolidation. But the [connected TV] landscape, inclusive of FAST and [direct-to-consumer] channels, provides more than ample video variety for viewers, so the biggest challenge will be connecting content with the right audience. Audience engagement depends on good content. Audience retention depends on making sure audiences are never without something to watch.


Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.



Condé Nast user database reportedly breached, Ars unaffected

Earlier this month, a hacker named Lovely claimed to have breached a Condé Nast user database and released a list of more than 2.3 million user records from our sister publication WIRED. The released materials contain demographic information (name, email, address, phone, etc.), but no passwords.

The hacker also says that they will release an additional 40 million records for other Condé Nast properties, including our other sister publications Vogue, The New Yorker, Vanity Fair, and more. Of critical note to our readers, Ars Technica was not affected as we run on our own bespoke tech stack.

The hacker said that they had urged Condé Nast to patch vulnerabilities to no avail. “Condé Nast does not care about the security of their users data,” they wrote. “It took us an entire month to convince them to fix the vulnerabilities on their websites. We will leak more of their users’ data (40 + million) over the next few weeks. Enjoy!”

It’s unclear how altruistic the motive really was. DataBreaches.net says that Lovely misled the site into believing they were trying to help patch vulnerabilities when, in reality, the hacker appeared to be a “cybercriminal” looking for a payout. “As for ‘Lovely,’ they played me. Condé Nast should never pay them a dime, and no one else should ever, as their word clearly cannot be trusted,” they wrote.

Condé Nast has not issued a statement, and we have not been informed internally of the hack (which is not surprising, since Ars is not affected).

Hudson Rock’s InfoStealers has an excellent rundown of what has been exposed.



Dating Roundup #9: Signals and Selection

Ultimately, it comes down to one question. Are you in? For you, and for them.

The Ick, the ultimate red flag, makes perfect sense and is all about likelihood ratios.

Koenfucius: The ‘ick’ is a colloquial term for a feeling of disgust triggered by a specific—typically trivial—behaviour from a romantic partner, often leading to the relationship’s demise. New research explores why some are more prone to getting it than others.

Robin Hanson: “Women also experienced the ick more frequently, with 75% having had the ick compared to 57% of men … Those with a higher tendency for disgust … [&] grandiose narcissism was linked to stronger ick reactions, as was holding partners to exceptionally high standards.”

Paul Graham: About 30% of Seinfeld episodes were about this.

One gets The Ick because a small act is evidence of one’s general nature. The right type of person would never do [X], ideally never want to do [X], and at minimum would have learned not to do [X]. Often this is because they would know that doing [X] is indicative of lacking attribute [Y]. Indeed, if they should be aware that [X] signals a lack of [Y], then doing [X] anyway is indicative not only of a lack of [Y], but also of a lack of desire or ability to even fake or signal [Y], especially in a romantic context. They don’t respect [Y]. Thus, this is extremely strong evidence. Thus, The Ick.

The person is not consciously thinking through this, but that’s the point.
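
To make the likelihood-ratio framing concrete, here is a toy Bayes update, with all numbers invented for illustration:

```python
# Toy Bayes-factor calculation behind the Ick logic (all numbers
# invented for illustration).

p_x_given_right = 0.02  # P(does X | the "right type," who respects Y)
p_x_given_wrong = 0.40  # P(does X | the "wrong type")

prior_odds = 1.0        # start at 1:1 odds that they're the right type
likelihood_ratio = p_x_given_right / p_x_given_wrong  # 0.05, i.e. 20:1 against
posterior_odds = prior_odds * likelihood_ratio

# One small observed act moves 1:1 odds to 1:20 against -- which is why
# a trivial behavior can feel like "extremely strong evidence."
print(f"posterior odds (right:wrong): {posterior_odds:.2f}")
```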

That doesn’t mean that The Ick is always valid. Quite the contrary. Mistakes are made.

It’s fun to look at this list of Icks. There are very clear categories involved – status markers, stupidity, and Your Mom are the big three. In general, it’s something that ‘looks bad,’ where the man should know it looks bad and therefore not do it.

To what extent is The Ick a win-win? The majority of the time, I think it’s win-win, because them caring so much about this little thing, combined with you not caring, means it was never going to work. But on occasion there’s a classification mismatch, and when that happens it is super bad. And even if losing this particular person getting The Ick is fine, the overall reaction to your continuing to do that thing is still bad; continuing is almost certainly a mistake. So in general, if there’s something that is known to give The Ick, it’s worth making an effort not to do it.

This might be the new strangest take. It’s bad if he bought a house?

Cold: It’s offputting when a man buys a house when he’s single. Too prepared. The wife should help choose where they live, obviously. Is he just looking for a woman to slot in the missing hole in the fantasy he’s created? Even if a single man has money he should live in an apartment.

Midwest Antiquarian: What if you own an apartment?

cold: You’re doing great king.

Generous Farm: I bought when interest rates were most attractive. 2.9%. Now they’re 7.5%. Sorry couldn’t wait.

Cold: Love is bigger than 4.6% difference in rates.

Robin Hanson: Maybe many are a bit too eager to judge men for every little thing they do?

Any sane person would view ‘I own a house’ as a highly positive sign. ‘Too prepared’?

If it’s a case of ‘I own this house and refuse to move’ then I can see an issue, and you should think about whether you want to live in that house. But houses can be sold.

This is what the wise man calls ‘favorable selection.’ If they turn you down because of this you presumably dodged a bullet. If someone thinks you should be paying 7.5% in interest rather than 2.9% so that you can avoid signaling you’re ‘too prepared’? Run. Or, rather, you don’t have to run, all you have to do is stay put. That’s the idea.

I hope and presume not too many people are treating ‘owns a house’ (but not an apartment, that would mean you’re doing great king?) in particular as a red flag.

Note that in the next section, one of Claire’s demands is that the man ‘has a small house,’ so a direct contradiction. I wonder if a large house is okay for her?

The more important point is yes, there is a large trend of judging based on a single data point, and looking for ways in which to find that data point a red flag. Stop it, or at least stop it once you’ve got substantial amounts of other information.

If you’re looking for the same things everyone is looking for, it’s rough out there.

Claire: Am looking for this man where is he?

Mason: It’s not that these are bad things to want

it’s not even that these are too much

it’s just that there is no depth here, no values, nothing to anchor a connection on, you couldn’t even write a compelling side character in a comic book on this outline.

The question isn’t where is he, it’s what would you do with him if you found him.

Danneskjold (with the correct answer): Happily married to a normal person.

Cynical Mike (also probably correct): All the things I’ll find before you find him:

Waldo, Carmen Santiago, Jimmy Hoffa, Epstein’s Suicide video, Pot of Gold at the end of a Rainbow, A Dragon, Aliens, Noah’s ARK, Hogwarts and a Pegasus.

The good news is this is not 15 things, it is more like 5. A lot of redundancy.

As Mason says, there’s nothing wrong with anything on the list but also nothing that differentiates what you in particular want, and it focuses on exactly the legible universally desirable features that put someone in high demand. The useful list has the things that you value more than ‘the market,’ and importantly drops some things that you value less.

When the going gets weird, be careful not to inadvertently turn pro.

The problem is that Choices are Bad. Really bad.

Misha: I think these days I see the biggest problem in dating is people are both increasingly weird and increasingly picky.

This applies not just to dating qua dating but all aspects of socialization, which are of course upstream of romance.

I think our minds are (sometimes perniciously) good at making us content with what seems possible.

The modern world shows us a far wider range of what’s possible.

What’s possible and what’s expected depends a lot on local culture and your knowledge of the world and one of these is drastically in flux right now and the other has grown immensely.

Imagine you get into hiking. This is a fairly common hobby, and you want a partner who will go with you. Whoops, you might’ve just cut your potential partners by a large percentage.

Any given thing you want, or want to avoid? Mostly you can solve for that. Combine enough different such things, and you quickly get into trouble. The search algorithms are not that robust.

The Secretary Problem thus suggests that if you are maximizing, you should be deeply stingy about accepting a match until you’ve done a lot of calibration, and then take your sweet time after that.

But several big differences work against this in the dating case. You have an uncertain number of shots at this, taking each shot is costly in several ways, the pool you’re drawing from starts getting worse over time after a while, each previous draw may impose its own penalty on the final outcome, and you can easily miss outright. And once you pick, the comparisons directly impact your satisfaction levels. Thus, you want to be much quicker on the trigger.
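
For reference, here’s a minimal simulation of the classic baseline that the caveats above push back against (an invented setup: 100 rankable candidates in random order):

```python
import random

# The classic secretary-problem strategy: reject the first k candidates
# outright ("calibration"), then take the first later candidate who beats
# everyone seen so far. With n candidates, the optimal k is about n/e
# (~37% of n), which lands the single best candidate ~37% of the time.

def run_trial(n: int, k: int) -> bool:
    candidates = list(range(n))       # higher number = better
    random.shuffle(candidates)
    benchmark = max(candidates[:k], default=-1)
    for c in candidates[k:]:
        if c > benchmark:
            return c == n - 1         # committed: did we get the best?
    return candidates[-1] == n - 1    # never committed: stuck with the last

n, trials = 100, 20_000
for k in (10, 37, 60):
    wins = sum(run_trial(n, k) for _ in range(trials))
    print(f"calibrate on first {k:2d}: best candidate chosen {wins / trials:.1%}")
```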

Rolf Degen: Being ghosted causes more lasting humiliation than being openly rejected.

Research on how individuals respond to ghosting, defined as unilaterally ending a relationship without providing explanations and ignoring communication attempts, has primarily relied on retrospective and imaginative methodologies. The present research introduced a novel multi-day daily-diary experimental paradigm to examine the psychological consequences of ghosting compared to rejection.

It should be common knowledge at this point that not explaining, and especially outright ghosting, is making your life easier at the expense of the person ghosted.

It can be the right move anyway, as in it sometimes helps you more than it hurts them. Not ghosting can have its own downsides, starting with them demanding a reason, or if you share a reason arguing about it, offering to change or getting really angry about it or using it against you (or in non-dating contexts outright suing). The less you say, the safer and better, and if you change your mind your options might be open.

Despite this, by default, you should be ghostminning (minimizing how much you ghost).

If you know that you don’t want to continue talking to someone, say so. By default treat ghosting as a black mark on the person doing it. This applies to all forms of ghosting, not only in dating. Also, if they decide to ghost you, in some ways that’s a black mark on your ability to credibly signal that they don’t have to.

Cate Hall: unironically this is why everyone’s single

“oh, there’s a small thing you don’t like about a promising match? definitely break up with him instead of mentioning it”

no, you don’t have “the right” to force him to change it. but maybe just give him the info & let him decide?

Cate is correct that this is ludicrously terrible advice. He has hit a good vibe two dates out of three, everything else about him is great. Obviously you tell him that this cologne and aesthetic did not work for you. When did this become a ‘right to micromanage’ anything? Is there any possible world in which the guy is better off if you dump him rather than telling him, or silently suffer and never tell him?

I do think Jorbs is right that this reads like a ‘good on paper’ description, and she may be looking for an excuse. But that’s dumb, if you don’t want to date him then don’t.

Jorbs: it reads like someone who met someone she feels like she should want to date but doesn’t actually like him very much, and is struggling to process that or work out what to do with it. the positives aren’t enchanting, the negatives are cosmetic.

The response is ludicrous though.

Zac Hill: The thing that is really insidious about this response is the subtle everyone-has-forgotten-their-Carol-Dweck-lessons habit of *internalization of a behavior as a fixed characteristic*. It’s not that you don’t like his cologne — it’s that you ARE sensitive to fragrance.

A good starting rule is that if them changing one eminently changeable thing would make dating worthwhile, you should tell them about it.

VB Knives: Just two useless boyfriends can easily consume the entire period between 21 and 30. You have one who doesn’t seem to ever propose. Then you finally get rid of him (you are 27) and the next effort brings you to 30+ with no ring. Not some exotic string of bad decisions.

Literary Chad: This is actually the issue– the median bodycount is 4, there aren’t wild sex parties, it’s serial monogamy without marriage or children.

Which is obvious unless you’re a boomer-like media consumer who believes that the existence of a sex party somewhere means they’re ubiquitous.

The more I think about this question, the more it seems like an obvious mistake to stay in a long term relationship for years and not fully commit.

I realize this is easier said than done. It is scary to go full The Ultimatum and insist on a rapid move up or move out, when you have something pretty good. It does seem like it is obviously a mistake to give things more than a year or at most two.

How much does it matter?

Steve Stewart-Williams: As shown in the graph below, the sweet spot was two to four past partners; fewer or more reduced attractiveness. In effect, people wanted someone with a bit of a past, but not too much (which was the title of our paper describing the research).

Intriguingly, we found no evidence for a sexual double standard: none, zilch, nada. Contrary to what’s often claimed, women weren’t judged any more harshly than men for having a high body count. That’s not to say they weren’t judged for it, but only that men were judged too.

This can be seen in the next graph. The left panel shows willingness ratings for long-term relationships, the right panel for short-term ones. As you can see, the sexes barely differed in their willingness to engage in long-term relationships. For short-term relationships, in contrast, men expressed greater willingness at every past-partner level.

This looks like a relative ranking, so it does not tell you how much this ‘willingness’ matters. Claiming ‘it literally does not matter at all’ would be bizarre; certainly this number contains information, especially if the number is zero or one.

Also, I flat out defy the data on there being no double standard? No way. Even if on some abstract 1-9 scale it looks similar, the practical impact is very obviously totally different. Yes, a male body count of 60+ is functionally a negative, but not at the same level.

Astrology is a problem for men when dating, because:

  1. A remarkably high percentage of women believe in it to remarkably high degrees.

  2. If taken at all seriously, it is deeply stupid.

  3. Even as a clearly delineated fake framework, it’s pretty terrible.

It is also an opportunity, because the subset of humans that use astrology talk is not random, and the details of how a person interacts with astrology, no matter how seriously they do or don’t take it, are even less random.

Seattle Min You: If a girl asks your zodiac sign and your first response is to be annoyed, you’ve already fucked up. You don’t have enough whimsy in your heart to entertain an arbitrary topic for even a little tiny bit and it’s ugly.

Drea: Huge red flag… like, lighten up Francis.

Seattle Min You: Totally.

Purpskurp Capital LLC: If it was a whimsy topic for fun I’d 100% agree with too.

Problem is many of these girls use this stuff to make real life decisions, instead of using critical thinking and reasoning skills. It’s terrifying. And that’s why a lot of men like me treat it as a red flag.

Texanus: I will absolutely do that. for a child. are you a child? do you need to be treated like a child? I love my nieces and I’ve gotten all dressed up and played tea party with them. I’m not doing that with a grown woman however.

Positivity Moon: His jaw tightens a little. His eyes do that micro roll. He says something like “I don’t believe in that stuff” with the same energy you would use for “I don’t support war crimes.” The girl just asked for his birthday. No one was trying to rewrite physics.

People like to pretend it is about logic. Rationality. Being above “nonsense.” It almost never is. It is usually about control.

Because on the surface, the question is stupidly harmless. She is not building a medical treatment plan off your sun sign. She is not deciding whether to let you hold her wallet based on whether Mercury is breakdancing. She is opening a door to talk about patterns, personality, taste, how you see yourself. If you answer “Scorpio” or “Gemini” or whatever, the conversation that follows is almost never actually about stars. It is about you, but sideways. She is saying: teach me how to play with you for a second.

When your first instinct is annoyance, what you are really saying is: I hate being touched anywhere that does not fit my script.

Because if you truly did not care, you would just answer and move on. “I’m a Virgo.” Smile. Shrug. Ask hers. Make a joke. You do not have to secretly download Co-Star in the bathroom and start believing. You just have to have enough flexibility to sit in someone else’s little universe for five minutes without throwing a tantrum about empirical evidence.

People underestimate how much relationships are built on that ability. To step into someone’s weird side hobby, their micro belief system, their little rituals, even when you do not share them. She might have astrology. Someone else has Dungeons & Dragons lore. Another person has fantasy football statistics. Your uncle has his grill. None of it matters in a lab. All of it matters when you are trying to figure out: can I talk to this person about something that is technically pointless and still feel respected.

Annoyance at the sign question is rarely about skepticism. It is about contempt.

Philip Arola: “Astrology is a vehicle women use to communicate indirectly. Why would it possibly make you annoyed?”

Responding with visible annoyance, or declining to answer with your birthday or sign, is everywhere and always an unforced error. Don’t do that, even if you’re effectively writing her off and no matter how actually annoyed you are. There’s no reason to make things unpleasant, especially before you know how far she’s going with this.

However, well, do you still respect her as a potential partner? Should you?

Annoyance here can come from many places. One of which is ‘oh god I have to deal with this now,’ another related one is ‘damn it I am no longer as interested.’

There are three related but distinct stories from Moon here about a reaction of annoyance. You have logic versus control, you have skepticism versus contempt, and you have the ability to indulge in whimsy and retain respect.

There is also the motte-and-bailey claim that of course she doesn’t actually believe in astrology and won’t let this influence things beyond a little whimsy.

That brings us back to Purpskurp. There is a continuum of possibilities, but centrally three cases.

  1. Topic of whimsy. It’s a way of making conversation, seeing how you play off an arbitrary set of predictions to get things rolling, a form of cold reading.

    1. This is still a negative update even once you establish this, because she is indicating she thinks this is a good move. Why? Because unless this was maximally explicit, she is engaging in a form of selection, and that choice of selection tells you about her, and it makes this less likely to be a good match.

    2. That said, this is totally survivable, especially if the whimsy is clear. Thus, you don’t want to fail the test aspect of this by showing annoyance up top.

    3. If this is approached explicitly and strategically up front as a fake framework, then that is totally fine.

  2. Taken somewhat seriously as actual Bayesian evidence, as in it might influence her decisions regarding you, including in the future.

    1. That’s trouble, potentially of two distinct forms.

    2. It’s a bad sign on her general decision making capabilities and epistemics, and given you are reading this it’s a really bad sign about your compatibility across the board. It’s a red flag.

    3. The actual astrological implications could be bad news for you. Then again, they might also be good news. What matters is her interpretation of this, on whatever level she understands astrology.

    4. It’s worth noting that astrology is super flexible, so if you have green flags elsewhere you can ‘fight back’ if you know enough to play within the game.

  3. Actual real belief in astrology, the way I believe in having lunch.

    1. For the long term, this is a dealbreaker, period, straight up.

    2. How it impacts the short term is, of course, up to you.

An ‘interest in’ astrology or tarot cards can be fine, although tarot cards are strictly better and astrology indicates poor taste. Actual belief? That’s a dealbreaker, ladies.

yashkaf: this is entirely correct in that being dismissive of people’s niche interests on a date is a much bigger “red flag” than astrology.

and yet if a cute girl posted “he brought up D&D and fantasy football on our first date, I rolled my eyes so hard” will anyone take the guy’s side?

no matter who brings up the cringe topic and who rolls their eyes, the guy fumbled.

no matter who escalated intimacy and who shot it down, the guy fumbled.

it’s always the guy who fumbles. thus, it’s always the guy who improves from feedback.

this makes dating very hard for women.

John Nerst: I mean astrology isn’t an interest, it’s a belief. Very different.

yashkaf: differentiating interests from beliefs from beliefs in belief from world models is just your own special weird niche interest 🐀

John Nerst: Going meta, how droll. But yes, most people are totally nuts and this is one of the surest signs

yashkaf: if you’re want to know the difference between “rat” and “post rat” without prejudice against either, read [the above] conversation between me and John and see who you intuitively side with

Sarah Constantin: yeah, rat all the way.

i think astrology is fun and i would not consider an interest in astrology a dealbreaker.

but i *definitely* don’t believe in stretching the meaning of “truth” or going “what is true or false, really?”

… i think it’s important to be grounded in (ordinary, waking, non-mystical) reality, to the point that you can enjoy *playing* with deviations from it, without getting seriously confused and throwing your life off the rails.

“zero playing around allowed” types are no-fun scolds, but “come on in, this is literally real and true, from a certain point of view, let me talk you into that” can destabilize some people for real.




GPS is vulnerable to jamming—here’s how we might fix it


GPS jamming has gotten cheap and easy, but there are potential solutions.

In September 2025, a Widerøe Airlines flight was trying to land in Vardø, Norway, which sits in the country’s far eastern arm, some 40 miles from the Russian coast. The cloud deck was low, and so was visibility. In such gray situations, pilots use GPS technology to help them land on a runway and not the side of a mountain.

But on this day, GPS systems weren’t working correctly, the airwaves jammed with signals that prevented airplanes from accessing navigation information. The Widerøe flight had taken off during one of Russia’s frequent wargames, in which the country’s military simulates conflict as a preparation exercise. This one involved an imaginary war; it was nicknamed Zapad-2025—translating to “West-2025”—and was happening just across the fjord from Vardø. According to European officials, GPS interference was frequent in the runup to the exercise. Russian forces, they suspected, were using GPS-signal-smashing technology, a tactic used in non-pretend conflict, too. (Russia has denied some allegations of GPS interference in the past.)

Without that guidance from space, and with the cloudy weather, the Widerøe plane had to abort its landing and continue down the coast away from Russia, to Båtsfjord, a fishing village.

The part of Norway in which this interruption occurred is called Finnmark. GPS disruption there is near-constant; problems linked to Russian interference have increased since the invasion of Ukraine.

Military and Pokémon players?

It’s one of the starkest geographic examples of how vulnerable GPS technology is. But such disturbances happen at a lower level all over the globe. The world’s militaries (including that of the United States) are big culprits, breaking out devices that can confuse or disrupt drones, missiles, and aircraft. But the equipment required to interfere with GPS at a less-than-military level is cheap and accessible and affects other aspects of life: Truck drivers, for instance, use it to look like they’ve delivered cargo on time. Players use it to fool augmented-reality games.

Given all this disruption, more U.S. institutions, from the Department of Defense to the Department of Transportation to the Federal Aviation Administration, are making moves toward alternatives and complements for GPS, though perhaps imperfectly. And the existing system has been undergoing a huge modernization program, introducing better-encrypted signals for military users, more varieties of signals for civilians, and higher-power signals for both to the tune of at least $22 billion. The military’s 2025 budget additionally requested $1.5 billion for more resilient “position, navigation, and timing” programs. Other departments have invested smaller amounts. In October 2025, for instance, the Department of Transportation awarded $5 million total to five companies to develop and demonstrate technologies complementary to GPS.

The update’s goals are to make the system more accurate, and harder to mess with. But as threats increase in frequency and sophistication, more work is necessary. “Sooner or later, we’re gonna see bad things happening here,” said John Langer, a GPS expert at the Aerospace Corporation, a nonprofit research organization. “So we need to armor up for it before it happens.”

GPS is the invisible spine of society, in more ways than most people realize. It became central quickly after the satellite system, built in the 1970s for the military, was optimized for civilians. “Part of what makes GPS so successful is that it’s ubiquitous and it’s inexpensive,” said Langer.

Losing GPS would mean losing a lot more than Google Maps. The technology is integrated into everything from lights that turn on at sunset to dating apps that match users nearby. Its signals also undergird the electrical grid, cell networks, banking, defense technology, and the movements of robots used in industries like agriculture.

The U.S. government currently has 31 GPS satellites in orbit around Earth, and three other governments have their own systems: Russia made one called GLONASS, China created BeiDou, and the European Union built Galileo; all four systems’ data is available to the international community.

Finding your place

GPS works in a deceptively simple way: Each satellite carries an atomic clock aboard. It broadcasts that clock’s time toward Earth. That signal alone is what’s useful to energy infrastructure and financial transactions. But to get position information, a receiver—in a phone or other device—simply has to pick up signals from at least four satellites. It knows what time those signals were sent, where the satellites were when they sent them, and how long it took the signals to arrive. Through fancy triangulation, the phone (or guided missile) then computes its own location.
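
As a rough sketch of that computation (often called multilateration), here is a toy solve for position and clock bias; the satellite coordinates, receiver location, and clock error are all invented numbers, and real receivers add corrections for orbits, the ionosphere, and relativity that are omitted here:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

sats = np.array([            # satellite positions in meters (invented)
    [15e6,   5e6, 20e6],
    [-10e6, 12e6, 18e6],
    [5e6,  -15e6, 21e6],
    [-8e6,  -7e6, 19e6],
])
truth = np.array([1.2e6, -0.8e6, 0.3e6])  # true receiver position (invented)
clock_bias = 1e-4                          # receiver clock error, seconds

# What the receiver measures: pseudorange = true range + c * clock bias
rho = np.linalg.norm(sats - truth, axis=1) + C * clock_bias

# Gauss-Newton iteration on the four unknowns (x, y, z, clock bias)
x = np.zeros(4)
for _ in range(10):
    ranges = np.linalg.norm(sats - x[:3], axis=1)
    residual = rho - (ranges + C * x[3])
    J = np.hstack([-(sats - x[:3]) / ranges[:, None],  # d(range)/d(position)
                   C * np.ones((4, 1))])               # d/d(clock bias)
    x += np.linalg.lstsq(J, residual, rcond=None)[0]

print("position error (m):", np.linalg.norm(x[:3] - truth))
print("clock bias estimate (s):", x[3])
```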

Or at least that’s the idea. GPS can be jammed, meaning that someone broadcasts a signal much stronger than that of GPS (which has had to travel across thousands of miles of space, and grows weaker with every meter), drowning the real signal in noise. It can also be spoofed, meaning someone sends out a fake signal that looks just like a GPS blip but indicates an incorrect location or time.

Three satellites are needed to pinpoint a location on Earth. Credit: NASA/JPL-Caltech

Threats like these were always a possibility—and those who built GPS knew about that problem from the beginning, said Todd Walter, director of the Stanford GPS Lab. “Around 2000 is when people got a little more serious about it,” he said. Hardware and software became cheaper, lowering the barrier to swamping or faking signals.

Problems ticked up when the augmented reality game Pokémon GO came online, in 2016. The game required people to travel to places in real life to win. Turns out, not all of them actually wanted to. “All of a sudden, everyone was interested in spoofing,” said Walter.

Pokémon GO cheaters used low-power devices close to the ground, and so didn’t affect cruising aircraft like Widerøe’s. The game made cheating high-tech and furthered methods and technology for signal scrambling, making it available to non-experts, Walter said. At the same time, spoofing arose in conflict zones, where drone and missile attacks are often guided by GPS. Don’t want to get hit by one? Fool its navigation system. “So now people say, ‘Well, we need to protect ourselves from that,’” said Walter. “And so then you see a huge increase in very powerful jamming and spoofing.”

In Norway, officials have noted that GPS disruptions, while most commonly affecting flights thousands of feet in the air, can also cause issues for police cars, ambulances, and ships. According to Espen Slette, director of the spectrum department at the Norwegian Communications Authority (known as Nkom), the agency has detected GPS jammers near hospitals, which could force life-saving helicopters to redirect to a more distant facility. Nkom has also clocked disruptions that affect agriculture and construction operations, while emergency responders have warned about potential problems with emergency beacon devices, like the satellite SOS buttons many people carry in the backcountry or aboard boats. The police’s chief of staff in Finnmark encouraged anyone venturing out to, old-school, carry a map and compass.

“It’s hard to grasp the full effect this has on society,” Slette wrote in an email.

Such widespread disruptions are not isolated to the Russia-adjacent Arctic. There are hotspots in Myanmar, most likely associated with drone warfare in the area; on the Black Sea, publicly associated with Russia, which has denied some cases of GPS interference; and in southern Texas, potentially from drug cartels near the border. A report from OpsGroup, a membership organization for international aviation personnel, found a marked increase in spoofing in 2024. “By January 2024, an average of 300 flights a day were being spoofed,” the report said. “By August 2024, this had grown to around 1500 flights per day.” From July 15 to Aug. 15, 2024, 41,000 flights total experienced spoofing. (While in the U.S., it’s generally illegal for civilians to jam or spoof signals, military-led disruptions during conflict are considered a legitimate and legal use-case.)

No going back

The uptick indicates that there’s no going back to a world without disruption hotspots. And that, combined with humans’ dependence on GPS, is why scientists and engineers are working on ways to shore up the system—and develop backchannels so a single-point failure doesn’t come to bite anyone, in conflict or in peacetime.

“There are many ways to mitigate GPS disruptions,” Slette wrote in an email. He suggests setting up devices to use signals from all four international constellations, and to install better receivers and antennas. That’s easier for militaries or infrastructure companies, and hard for people who are just buying the latest model of cell phone and have no control over its innards. But existing backups can tell a given device that something fishy may be up. Planes have inertial navigation systems, which mostly use motion-sensing devices to get an independent measurement; phones do too, and they can also check their data against cell towers, to see if something is off in their GPS signal.

But the U.S. government is worried enough about GPS issues that, across civilian and military agencies, research and development for more robust and resilient systems is ramping up. In March, for instance, the Federal Communications Commission launched a proceeding on GPS alternatives, exploring tools that could be used in addition to or instead of traditional GPS.

The Defense Advanced Research Projects Agency, or DARPA, and the Defense Innovation Unit, meanwhile, are investigating how quantum sensors might help with position, timing, and navigation. The United States’ military branches are also working on their alternative position, navigation, and timing capabilities, and their innovation arms like the Space Force’s SpaceWerx organization are running challenges to support alternative technologies. The Department of Defense acknowledges challenges to GPS and the consequent need to diversify the ways it gets position, navigation, and timing information, noting that it is pursuing the integration of alternative capabilities, according to a statement that public affairs officer Chelsea Dietlin requested be attributed to a Pentagon spokesperson. It is also looking toward working with commercial companies.

Even the Department of Transportation has a strategic plan that includes promoting technologies complementary to GPS. (Undark reached out multiple times to the Department of Transportation to request comment but did not receive a response.) A statement that FAA media relations specialist Cassandra Nolan requested be attributed to an agency spokesperson noted that the FAA is working on a system to detect GPS interference, and that it is working with the Department of Defense on navigation signals and antennas that are more resilient. In addition, the statement noted, the FAA already has “a layered aircraft tracking system that incorporates multiple technologies to guard against threats to Global Navigation Satellite Systems (GNSS).”

But the newer efforts across government may not be as connected as they could be, according to Dana Goward, president of the Resilient Navigation and Timing Foundation, a nonprofit advocacy group that largely comprises companies working in the GPS-problem space. For one, he said, efforts to bolster military and civilian systems have a fairly strict line between them. And neither has been as effective as he’d advocate: On the military side, plentiful programs exist, but they may not be working together. “It’s not clear if there is any coordination or synergies between the projects or how much senior leader support there is for comprehensive solution sets,” Goward wrote in an email.

On the civil side, Congress mandated in 2018 that a backup to GPS be established, but only experimental systems exist so far. There also have been efforts to repeal the law, with the disputed rationale that funding a single system isn’t feasible and there are better paths toward resilience. Goward contended that the government has hoped the private sector will come up with a usable solution, saving the government from creating one itself.

Starting over

And companies are coming to cash in on that desire, offering their solutions to both government agencies and other industries. “Our founding hypothesis was ‘let’s take 50 years of lessons learned but throw out the rulebook and do a clean-sheet design of a new GPS system incorporating a couple of fundamentals,’” said Patrick Shannon, CEO of one such company, called TrustPoint. The company, which has hired scientific and engineering experts in signal processing and space, aims to have a fleet of small satellites orbiting much closer to Earth than the current GPS constellation, and transmitting at a higher frequency.

TrustPoint’s satellites, a few of which have already gone to orbit, also send out an encrypted signal—something harder to spoof. With traditional GPS, only the military gets encrypted signals.

Many Russian jamming systems, he said, work tens of kilometers from their ground zero (their ground zero usually being a truck with a generator aboard). But with TrustPoint’s higher-frequency signals, the effectiveness of the jammer goes down by three times, and the circle of influence becomes 10 times smaller, shrinking even more if the receivers use a special kind of antenna that the U.S. government recently approved.

Messing with signals becomes less feasible, given those changes. “They would need exorbitant numbers of systems, exorbitant numbers of people, and a ton of cash to pull that off,” said Shannon.
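
Some back-of-envelope arithmetic behind claims like these, using the standard free-space path loss formula. The orbit altitudes are roughly right, but the comparison ignores antenna gains, bandwidth, and receiver processing gain, so treat it as illustrative only:

```python
import math

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / 299_792_458.0)

# A satellite in low-Earth orbit (~1,000 km) vs. GPS (~20,200 km) at the
# same frequency: far less path loss, so a much stronger signal on the
# ground for the same transmit power.
gps = fspl_db(20_200e3, 1575.42e6)  # GPS L1
leo = fspl_db(1_000e3, 1575.42e6)   # hypothetical LEO alternative
print(f"GPS L1: {gps:.1f} dB, LEO: {leo:.1f} dB, gap: {gps - leo:.1f} dB")

# A stronger signal means a jammer must get closer before it can drown
# the signal out; since the denied *area* scales with radius squared, a
# 10x smaller radius of influence is a 100x smaller denied area.
print(f"10x smaller jamming radius => {10 ** 2}x smaller denied area")
```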

So far, TrustPoint has launched three spacecraft, and has gotten five federal contracts in 2024 and 2025, totaling around $8.3 million, with organizations like the Air Force, Space Force, and the Navy.

Another company, called Xona Space Systems, is also putting satellites in low-Earth orbit, and has worked with both the Canadian and U.S. governments. The company plans to broadcast signals 100 times stronger than GPS, giving users two-centimeter precision, and making jamming more difficult. The signal also includes a watermark—a kind of authentication that, at least for now, protects against spoofing. They have launched one satellite that’s being tested by people in industries like agriculture, construction, and mining.

TrustPoint’s technology may offer novel defense against the dark GPS arts, but Xona, whose founders met while students at the Stanford GPS Lab, may have an edge anyway: Its signals are compatible with current infrastructure, so no one has to buy a new device. They just have to update their software. “We are not building receivers ourselves,” said Max Eunice, head of marketing and communications. Instead, they’re relying on the billions of earthly devices that already themselves rely on GPS.

Reliable GPS has become essential for a huge range of industries. Credit: Thomas Barwick

Other solutions, like one called SuperGPS, stay closer to the ground. They use radio transmitters on Earth to do the same things GPS satellites do in space. The setup, as demonstrated by scientists at the Delft University of Technology and VU University in the Netherlands, involves scattering radio transmitters around an area or using those already in place. Each transmitter is synchronized to an atomic clock, which distributes the time to the transmitters via fiber optic cable that may already be in place thanks to existing communications infrastructure. Receivers can collect signals scattered across a wide range of radio frequencies, making them more difficult to jam or spoof. The team published a proof of concept in a 2022 Nature paper and is working on a second iteration called SuperGPS2.

Tom Powell, another GPS expert at the Aerospace Corporation, said that looking at alternatives and augmentations like these is important—even though GPS recently underwent the 25-year modernization effort, making its own signals more robust to vulnerabilities. “Now that we have delivered, or nearly completely delivered, this modernization, is there a better way to do it in face of the current realities?” he said. He and other GPS experts don’t have answers yet. “We’re just asking questions right now.”

Walter, the director of the Stanford GPS Lab, thinks that whatever a better path looks like, it will likely still include the old-school, original system. “There’s nothing that really does replace GPS,” he said. “I see articles saying ‘post-GPS World’ and so forth. But really, GPS, I think, will always be there.”

People will, and should, strengthen it, Walter added, but that bolstering is going to be piecemeal: efforts may work only in a particular region, may cover some of GPS’s roles (such as providing accurate time) but not others, or may back up navigation with less accuracy. They may also cost money. “GPS is free, so that makes it almost impossible to compete with,” he said.

GPS is also straightforward, said Powell. “As satellites go, they’re pretty simple,” he said. They point at Earth, and they transmit signals that tell what time it is. From that, humans get to live in an interconnected, chronologically propriocepted world. Figuring out how to keep it that way, though, is proving a little more complicated.

This article was originally published on Undark. Read the original article.



AI #148: Christmas Break

Claude Opus 4.5 did so well on the METR task length graph they’re going to need longer tasks, and we still haven’t scored Gemini 3 Pro or GPT-5.2-Codex. Oh, also there’s a GPT-5.2-Codex.

At week’s end we did finally get at least a little of a Christmas break. It was nice.

Also nice was that New York Governor Kathy Hochul signed the RAISE Act, giving New York its own version of SB 53. The final version was not what we were hoping it would be, but it still is helpful on the margin.

Various people gave their 2026 predictions. Let’s put it this way: Buckle up.

  1. Language Models Offer Mundane Utility. AI suggests doing the minimum.

  2. Language Models Don’t Offer Mundane Utility. Gemini 3 doesn’t believe in itself.

  3. Huh, Upgrades. ChatGPT gets some personality knobs to turn.

  4. On Your Marks. PostTrainBench shows AIs below human baseline but improving.

  5. Claude Opus 4.5 Joins The METR Graph. Expectations were exceeded.

  6. Sufficiently Advanced Intelligence. You’re good enough, you’re smart enough.

  7. Deepfaketown and Botpocalypse Soon. Don’t worry, the UK PM’s got this.

  8. Fun With Media Generation. Slop as cost shock, enabling of niche pursuits.

  9. You Drive Me Crazy. Anthropic’s plans to handle mental health issues.

  10. They Took Our Jobs. What does it take to break a guild cartel?

  11. The Art of the Jailbreak. It still always works but it takes somewhat longer.

  12. Get Involved. MATS Summer 2026 cohort applications are open.

  13. Introducing. GPT-5.2-Codex is here to tide us over until the new year.

  14. In Other AI News. Small models can introspect, so can Andrej Karpathy.

  15. Show Me the Money. Anthropic going public, Project Vend breaks new ground.

  16. Quiet Speculations. Predictions for next year, new higher bars for what is AGI.

  17. Whistling In The Dark. It is still so early, almost no one knows Anthropic exists.

  18. Bubble, Bubble, Toil and Trouble. So many still don’t realize AI works.

  19. Americans Really Dislike AI. Attempts continue to mislead us about this.

  20. The Quest for Sane Regulations. NY’s RAISE Act is signed.

  21. Chip City. Chip smuggling, eventual chip production.

  22. The Week in Audio. Anthropic’s Sholto Douglas makes 2026 predictions.

  23. Rhetorical Innovation. Understanding AI teaches you to think about the world.

  24. Aligning a Smarter Than Human Intelligence is Difficult. The meta game.

  25. Mom, Owain Evans Is Turning The Models Evil Again. Train the interpreter.

  26. Messages From Janusworld. Claude Opus 3 zero hour approaches.

  27. The Lighter Side. What are you even doing?

AI custom-designed, human-in-the-loop proactive LLM-based mental health intervention has a positive effect in an RCT. There was significantly greater positive affect, resilience and social well-being. My presumption is that this was a highly conservative design due to ethical considerations. And that was using a system based on GPT-4o for 5-20 minutes a week. There is so much room for improvement here.

A lot of the benefits here likely came from implementation of low-hanging fruit interventions we know work, like having the system suggest journaling, gratitude exercises, mindfulness and social connection. We all know that stuff works. If an LLM-based scaffold actually gets people to do some of it? Great, that’s a huge win.

Results like this will not, as David Manheim suggests, prevent people from saying ‘but sometimes there are still bad outcomes’ or ‘but sometimes this ends up doing net harm,’ since nothing capable of working would prevent those risks entirely.

You can have Claude Code make objects in Unreal Engine on demand.

Seth Lazar on how he uses AI agents for philosophy. They automate everything around the thinking so Seth can focus on the thinking. He favors Opus 4.5 in Cursor.

Dean Ball: by far the biggest challenge in agentic coding use is getting gemini 3 to recognize that gemini 3 exists

Simeon: This is unbelievable. Even when I explicitly tell it the right API name to call for Gemini 3 pro it would go with 1.5.

I had to really be pushy for it to do it.

AI still struggles with design, largely because they lack the context. You still have to figure out what to do or what problem to solve, on a sufficiently high level.

ChatGPT adds personalization characteristics. I’m going with ‘less’ on all four.

You can upload your NotebookLM notebooks directly into the Gemini app.

Maxime Labonne: You always think you’re safe until your job becomes a benchmark.

Maksym Andriushchenko: We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs.

We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years.

How worried should you be that they’re getting a substantial percentage of the way to the human threshold here?

METR notices some grading issues and makes some minor corrections to its graph, in particular impacting Claude 3.7 Sonnet.

Whenever you see a graph like this, remember to attach ‘in benchmarks,’ and then let your brain, like mine, automatically translate that to ‘IN MICE!’

Epoch AI: We benchmarked several open-weight Chinese models on FrontierMath. Their top scores on Tiers 1-3 lag the overall frontier by about seven months.

Havard Ihle: Consistent with my WeirdML results for open/closed model gap.

One could then argue both ways about who benefits from the benchmarks versus real-world applications or underlying general intelligence. Versus real-world applications, it seems clear the benchmarks understate the gap. Versus underlying intelligence, it is less obvious, and it depends on who is going after the benchmarks in question more aggressively.

Claude Opus 4.5 achieved a 50% time horizon of about 4 hours 49 minutes, which METR thinks is lower than its true horizon due to not having enough long tasks in the test set.

METR: We don’t think the high upper CI bound reflects Opus’s actual capabilities: our current task suite doesn’t have enough long tasks to confidently upper bound Opus 4.5’s 50%-time horizon. We are working on updating our task suite, and hope to share more details soon.

Based on our experience interacting with Opus 4.5, the model’s performance on specific tasks (including some not in our time horizon suite), and its benchmark performance, we would be surprised if further investigation showed Opus had a 20+ hour 50%-time horizon.

Despite its high 50%-time horizon, Opus 4.5’s 80%-time horizon is only 27 minutes, similar to past models and below GPT-5.1-Codex-Max’s 32 mins. The gap between its 50%- and 80%- horizons reflects a flatter logistic success curve, as Opus differentially succeeds on longer tasks.
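To make the ‘flatter logistic curve’ point concrete, here’s a minimal sketch of the arithmetic, assuming the standard METR-style model where success probability is logistic in the log of task length. The slope parameter is back-solved from the two reported Opus numbers rather than taken from METR, so treat it as illustrative only.

```python
import math

def p_success(t, h50, beta):
    # Logistic success curve in log task length: p = 1 / (1 + (t / h50)**beta).
    # h50 is the 50% time horizon in minutes; beta controls steepness.
    return 1.0 / (1.0 + (t / h50) ** beta)

def horizon(p, h50, beta):
    # Invert the curve: the task length at which success probability equals p.
    return h50 * ((1.0 - p) / p) ** (1.0 / beta)

h50 = 4 * 60 + 49                        # Opus 4.5's ~4h49m, in minutes
beta = math.log(4) / math.log(h50 / 27)  # back-solved to hit the 27-min 80% horizon

print(round(p_success(27, h50, beta), 2))    # 0.8, by construction
print(round(horizon(0.80, h50, beta)))       # 27: flat curve, short 80% horizon
print(round(horizon(0.80, h50, 2 * beta)))   # ~88: same h50, steeper curve
```

The point: with a curve this flat, raising the reliability bar from 50% to 80% cuts the horizon by an order of magnitude.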

Here’s the full graph now (we’re still waiting on GPT-5.2, GPT-5.2 Codex and Gemini 3 Pro), both the log version and the linear version.

Daniel Eth: A few thoughts on Claude Opus 4.5:

First off, in absolute terms, this is a pretty big step up. Anthropic is showing they have juice, and things are going faster than previously expected. At the very least, this should dispel all recent talk about how AI was entering a slowdown.

Second, on a log plot, note this is hardly above trend. Sure, it *could* represent a new trend, but it seems like every time there’s a model release that overperforms people think timelines get super short, & every time a model underperforms they think timelines get super long…

Dean Ball: as folks internalize this graph and continue the debate about what it may or may not mean, I would just remind you of one simple fact:

the us has barely scaled up compute compared to what will come online in 2026 (multiple 1GW+ facilities).

Seán Ó hÉigeartaigh: Yes, this. We’ve seen some of the biggest infrastructure investments in history over the last year, and they will soon become available to the frontier AI research effort. You’d want to be very confident to bet on slowdowns in progress despite this happening.

Simeon: We’re in the 4-months doubling world, aren’t we?

Davidad: 🎯

For those not keeping score, I called this new slope in 2025Q1, and quantitatively determined there was 10:1 evidence in favour of it in 2025Q3.

David Shor: The biggest divide on AI timelines I’ve seen is between people who use vibecoding tools like Claude Code and people who don’t.

ChatGPT isn’t really *that* different than it was a year ago, but capabilities on agentic tools are getting literally exponentially better every month

Davidad: It’s not really superexponential, it’s piecewise-exponential. the exponential changed at an inflection-point event, when AIs closed the RSI loop on data. there will be more inflection points when RSI loops are closed on algorithms, hardware, manufacturing, and construction

second, the duration axis is in units of *human time* to complete the same tasks – nothing to do with the wall-clock duration for the AI runs.

Lisan al Gaib: betting markets completely underestimated Claude 4.5 Opus

Yo Shavit (OpenAI): I think it’s more plausible, maybe 50:50, that this pace continues for at least 12 more months?

Davidad: yeah, I would guess that by December 2026 the RSI loop on algorithms will probably be closed, resulting in another inflection point to an even faster pace, perhaps around 70-80 day doubling time.
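For scale, here’s the back-of-the-envelope arithmetic on what various doubling times imply a year out, starting from Opus 4.5’s ~4h49m 50% horizon. The labels are the figures quoted above plus METR’s original ~7-month trend; nothing here is a prediction.

```python
start_hours = 4 + 49 / 60  # Opus 4.5's reported 50% time horizon

scenarios = {
    "~7-month doubling (original METR trend)": 212,
    "4-month doubling": 122,
    "70-80 day doubling (Davidad's guess)": 75,
}

for label, days in scenarios.items():
    doublings = 365 / days  # doublings completed in one year
    print(f"{label}: ~{start_hours * 2 ** doublings:.0f} hours after one year")
    # prints roughly 16, 38 and 141 hours respectively
```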

The end point of such a graph is not ‘AI can do literally any task,’ or even any cognitive task; it is ‘AI can do any coding task humans can do.’ Even an infinite time horizon here only goes so far. That could be importantly distinct from the ability to do other categories of task, both those humans can and cannot do.

The reason this is so scary regardless is that if you automate AI research via such methods, your failure to have automated other things goes away rather quickly.

Stephen McAleer (Anthropic): I’ve shifted my research to focus on automated alignment research. We will have automated AI research very soon and it’s important that alignment can keep up during the intelligence explosion.

Automated alignment research is all we seem to have the time to do, so everyone is lining up to do the second most foolish possible thing and ask the AI to do their alignment homework, with the only more foolish thing being not to do your homework at all. Dignity levels continue to hit all-time lows.

If you must tell the AI to do your alignment homework, then that means having sufficiently deeply aligned current and near term future models becomes of the utmost importance. The good news is that we seem to be doing relatively well there versus expectations, and hopefully we can find self-reinforcing aligned basins at around current capability levels? But man this is not what Plan A should look like.

Similarly to METR’s graph, Epoch’s capabilities index has also accelerated since 2024:

Benjamin Todd: It’s not only the METR horizon trend that accelerated in 2024. A composite of all major benchmarks did:

Rohin Shah: Both METR and ECI mostly measure things that companies optimize for. 2024 saw the rise of reasoning training for frontier models, which optimizes narrowly for some tasks (whereas pretraining provides more general improvements).

So I wouldn’t read much into any acceleration.

To the extent that this acceleration represents the things that cause further acceleration, I would read into it. Otherwise, I’d agree with Rohin.

Many people try to pretend that there is some limit to how intelligent a mind can be, and that this limit is close to the level of humans. Or, alternatively, that there is very little that a human or AI could gain from being far more intelligent than a typical smart human. Or that the only or central way to get much more intelligence is from collective intelligence, as in social or cultural or institutional intelligence.

I sometimes call this Intelligence Denialism. It is Obvious Nonsense.

Von Neumann, among other minds past and future, would like a word.

There is, however, a version of this that is true.

In any given finite role or task, there can exist Sufficiently Advanced Intelligence.

If you were smarter you might choose to do something else instead. But given what you or your AI are tasked with doing, you or your AI can be sufficiently advanced – your output is indistinguishable, or no worse than, the perfect output, aka magic.

Claude Code with Opus 4.5 is now approaching this for many coding tasks.

LordKingDude (via Deedy): I’m a technical software engineer working in C++.

I’ve been working with Opus 4.5 to write JIT compiler code and assembly, and so far it’s never failed (although I do give assistance as needed).

In real terms, this class of problems are the most difficult tasks that I can possibly give to any LLM. It would be cool with me if Opus 5 was just cheaper and faster, or had a 500k context window. I don’t have a pressing need for it to be smarter than it already is.

Deedy: This is just one engineer’s opinion: models still have headroom to be smarter. Opus 4.5 seems to have made a step function jump to better than 70-80% of SWEs.

If we truly don’t need smarter models to do software, Anthropic’s moat is perhaps the least of anyone’s concern!

My guess is this is centrally a lack of imagination and ambition issue?

As in, the job is currently to code and do things humans could previously code and do, with everything built around that restriction, and Opus is now good enough to do that for LordKingDude, the same way a baker is sufficiently intelligent to make great bread, but also the same way that a vastly more intelligent baker could be baking other new and exciting things.

Good luck, sir?

Keir Starmer (Prime Minister, UK): We are going to aim to make it impossible for children to take, share or view a nude image, and we’re banning apps that create deepfakes.

Here’s the detail.

The post with those ‘details’ is a political speech attempting to feel the pain and promising to ‘halve violence against women and girls.’

There is something about the way Keir’s linked post is written that makes him seem unusually disingenuous, even for a top level politician, an embodiment of a form of political slop signifying nothing, signifying the signifying of nothing, and implemented badly. That would be true even without the obvious rank hypocrisies of talking about the topics given his inaction elsewhere on exactly the issues he claims to care about so deeply.

The ‘detail’ on the first goal is ‘partner with tech companies.’ That’s it.

The ‘detail’ on the second goal is none whatsoever. Effectively banning nudification tools, as opposed to making them annoying to access, is impossible without a dystopian surveillance state, including banning all open image generation models.

Kunley Drukpa reports hearing AI music in public a lot in Latin America, and anticipates this is due to people who don’t know much music and primarily speak Spanish looking for things on YouTube to play ‘some music.’ This is very much a case of ‘they just didn’t care’ and it seems no one is going to tell them. Shudder.

Levels of Friction are ready to strike again, lowering barriers to various forms of communication and invalidating proofs of work. We’ll need to up our game again.

Séb Krier: When emails were invented, the barriers to sending random people mail went down massively. To deal with the influx, we had to develop both norms (what’s acceptable to send to who) and technologies (spam filtering, aliases). This is the case with other technologies too, like the printing press: suddenly anyone can publish, and so over time society came up with libel laws, editorial gatekeeping, citation norms etc. It’s inevitable that as costs go down, some degree of misuse follows, and society gradually adapts.

The same will apply with AI in all sorts of domains, including science: anyone can now write a plausible looking but hollow paper, and there will be plenty of academislop. We’re going through a kind of Sokal Experiment at scale.

In a way, this feels almost necessary to push our slow moving, status quo loving institutions to start developing better verification mechanisms, mandatory preregistration, code sharing, replication requirements, interactive/living papers etc. Imo getting this right should be a priority for the Progress/metascience community this coming year!

I agree that the situation was already broken, so a forcing function could be good.

Jason Crawford writes In Defense of Slop. When creation costs fall, as with AI, average quality necessarily falls, but everyone benefits. You get more experimentation, fewer gatekeepers, more chances to start out, more runway, more niche content, more content diversity, less dependence on finances.

If we model this as purely a cost shock, with each person’s costs declining but output unchanging, with each person having a unique random cost [C] and quality [Q], this is indeed by default good. The catch is that this makes identification of quality content harder, and coordination on common culture harder. If search costs [S] are sufficiently high, and matching benefits too low, or benefits to coordinated consumption too high, in some combination, consumer surplus could decline.

Saying this was net negative would still be an extraordinary claim requiring surprising evidence, since by default costs falling and production rising is good, at least on the margin, but the attention economy creates a problem. Consumption or evaluation of a low quality good is a net loss, so the social benefit of creating sufficiently low quality goods is negative: it imposes costs. But due to the attention economy, the creator can still derive benefit from that. I don’t think this overcomes our baseline, but it can happen.

The actual problem is that AI, when used in slop mode to create slop content, plausibly lowers costs relatively more for lower quality content, and also often lowers quality of content. Now it’s easy to see how we could end up with a net loss when combined with an attention economy.
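Here’s a minimal toy simulation of that argument, with loudly labeled assumptions: costs and qualities are independent uniform draws, pre-AI only low-cost creators publish, the AI cost shock cuts costs proportionally more for low-quality work (modeled crudely as new cost C times Q), and consumers sample at random while paying a fixed attention cost per item. None of the numbers come from anywhere.

```python
import random

random.seed(0)
S = 0.45  # attention/search cost per item consumed (assumed)
pool = [(random.random(), random.random()) for _ in range(100_000)]  # (cost C, quality Q)

def mean_surplus(published):
    # Consumer samples at random from the published pool; surplus = quality - S.
    return sum(published) / len(published) - S

pre_ai  = [q for c, q in pool if c < 0.5]      # only low-cost creators publish
post_ai = [q for c, q in pool if c * q < 0.5]  # cost shock favors low-quality work

print(f"pre-AI:  {mean_surplus(pre_ai):+.3f}")   # positive: mean quality ~0.5
print(f"post-AI: {mean_surplus(post_ai):+.3f}")  # ~zero or negative: pool skews low quality
```

More gets published, but the published pool’s average quality drops, and with a fixed attention cost per item the consumer’s average surplus can flip negative, exactly the combination described above.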

Seb Krier cites Cowen and Tabarrok (2000) on how lowering costs allows a shift to avant-garde and niche pursuits, whereas high costs push towards popular culture and products that have higher returns, and expects AI will allow a proliferation of both styles but for the styles to diverge.

Seb Krier (May 2025): Easily usable Al creation tools will continue to lower production barriers, leading to a deluge of content and amplifying the same dynamic we’ve seen with DAWs and mobile photography. This democratization will swell the ‘average’ to pervasive mediocrity – slop is pop/soundcloud rap. Elites will get upset because maintaining cultural dominance will be harder.

To find novelty, interesting art and distinction, the cool stuff will increasingly live in new walled gardens and at the edges, fueling many more hyper-niche subcultures. And this is great – culture diggers will have so much more to explore!

This is good for those who are willing and able to devote much effort to all this. It is less good for those who are unwilling or unable. A lot will come down to whether AI and other automated systems allow for discovery of quality content while avoiding slop, and whether we will make such methods available in ways such people can use, or whether the ‘content takers’ will drown.

The new question in image generation is Gemini Nano Banana Pro versus ChatGPT Image 1.5. I’ve been putting all my requests, mostly for article banners, into both. Quality is similarly high, so for now it comes down to style. Gemini has been winning but it’s been close. ChatGPT seems to lean into the concept more?

Flowers: ref img as a super villain, matte black spandex, above manhattan, my logo on my chest is a pink cherryblossom, long braided ponytail

image 1: nb pro

image 2: chatgpt

ok yeah idk sometimes nb pro tries too hard to be realistic and chatgpt just gets the vision instantly. hmmmmm

I keep forgetting about MidJourney but they also exist, with their edge being in creating tools for guidance, curation and variation. That’s not what I’m looking for when I create AI images, but it will be for many others.

Anthropic outlines the measures it has taken to help Claude be better at providing emotional support, handling conversations about suicide and self-harm, and reducing sycophancy.

They use both targeted fine-tuning and also the system prompt. There is a banner that can appear on Claude.ai, pointing users to where they can get human crisis support via ThoroughLine, and they are working with the International Association for Suicide Prevention (IASP) for further guidance going forward.

In their evaluation, they see the 4.5 models responding appropriately in multi-turn suicide conversations about 80% of the time, versus about 55% for Opus 4.1. They also stress-tested with real conversations prefilled from older Claude models, a harder test, and found Opus 4.5 responded appropriately 73% of the time, versus 70% for Sonnet 4.5, compared to 36% for Opus 4.1.

We don’t know what they classify as appropriate, nor do we know how high the standard is before a response is considered good enough, or how they would evaluate other models as doing, so it’s hard to judge if these are good results. Suicidality is one place where there are a lot of demands for particular response patterns, including for defensive reasons, often when a different response would have been better.

I think this post places too much emphasis here on the training that specifically intervened on behaviors in situations involving suicide and self-harm, and too little emphasis on generally training Claude to be the type of entity that would handle a broad range of situations well.

Antidelusionist suggests that the target behavior should be for the AI to continue to engage, spend more resources, think deeply about the full context of the situation, be honest and treat the user like an adult. Alas, as mental health professionals know, those are not the ways to cover one’s legal and PR liabilities or avoid blame. The ‘ethicists’ and our legal system, and the risk of headlines, push exactly in the opposite direction. I’d prefer to live in a world where the AIs get messy here. Seems hard.

The second half of Anthropic’s post deals with sycophancy, where Opus 4.1 had a real problem, whereas Opus 4.5 is not perfect but it does well.

I continue to be suspicious that Petri scores Gemini 3 Pro this highly. The other evaluations make sense.

One problem they noticed is that if you ‘prefill’ conversations to show Claude already being sycophantic, Opus 4.5 will usually be unable to course correct. The best defense, if you want the models to be straight with you (with any LLM) is to avoid the problem from the start. If you’re worried about this, start a fresh conversation.

If AI can be a better lawyer or doctor, does that take their jobs and break the guild monopolies, or does that only make the guilds double down?

Alex Prompter: This Spectator piece reads like gossip until you realize it’s actually a warning.

A senior English barrister takes a real appeal he spent a day and a half writing, feeds it to an AI model, and gets back something better in 30 seconds. It matched the standard of the very best barristers, and it did it instantly, for pennies.

That’s the moment the illusion breaks.

Law has always sold itself as irreplaceable because it’s complex, nuanced, and human. But most of the value in modern legal work isn’t wisdom. It’s pattern recognition, structure, precedent matching, argument assembly, and risk framing. That’s exactly the territory AI eats first.

David Chapman: Doctoring and lawyering are guilds that exist to extract $$ & status for members, at the expense of everyone else. They get away with outrageous prices and sloppy, harmful outcomes by obfuscating their supposed expertise. LLMs may soon end that, but somehow someone needs to quality-check that the LLMs are doing an actually better job, and continue to over decades. And there needs to be a democratic process for overruling them.

How shall we ensure that?

Well, what is the quality check now? What is the democratic overruling process now?

Double standards abound.

Meanwhile the pricing logic collapses. If the LLM can create, in 30 seconds, a brief that is on average superior to what a lawyer can do in a day, then outside of situations with principal-agent problems or insanely high stakes, a plan to charge $10k is cooked.

Excel is not so smart after all.

Astrid Wilde: am i living on another planet or does all knowledge work in the professions just get wrecked within the next 18 months.

Basil: I’ll worry about AI automating all the jobs when excel automates excel jobs.

The answer (of course) is both that Claude for Excel is now live, and also that Excel is a normal technology, so yes, Excel automated what became Excel jobs to a large extent, but that happened slowly, and then this increased productivity caused us to do vastly more Excel-style tasks as well as other tasks, which Excel could not then automate. If most knowledge work was automated or seriously accelerated within 18 months, that would be a very different scenario, and if that then kept going, watch out.

How long will humans remain in the coding loop, at this rate?

Nabeel Qureshi: It’s dizzying to consider that in a mere *1 year* we went from o1-preview to Opus 4.5/Claude Code, Gemini 3, Codex etc.

The “centaur chess” phase for computer-based work is fun and exhilarating, but at this rate of progress it’s not even clear it lasts through all of 2026.

I presume this period lasts more than another year, but the balance is shifting rapidly.

You can still universally jailbreak any model but now there are some that you can’t predictably universally jailbreak in 10 minutes.

MATS Summer 2026 cohort applications are open, it runs June-August in-person in Berkeley or London, $15k stipend, $12k compute budget. Apply here.

GPT-5.2-Codex.

One could be forgiven for thinking GPT-5.2 straight up was GPT-5.2-Codex. It turns out no, there is another level of codexmaxxing.

Sam Altman: GPT-5.2-Codex launches today.

It is trained specifically for agentic coding and terminal use, and people at OpenAI have been having great success with it.

OpenAI: Today we’re releasing GPT‑5.2-Codex, the most advanced agentic coding model yet for complex, real-world software engineering. GPT‑5.2-Codex is a version of GPT‑5.2⁠ further optimized for agentic coding in Codex, including improvements on long-horizon work through context compaction, stronger performance on large code changes like refactors and migrations, improved performance in Windows environments, and significantly stronger cybersecurity capabilities.

It’s hard to expect gigantic leaps in performance or benchmarks when models are released every week. GPT-5.2-Codex is only 0.8% better than 5.2 at SWE-Bench Pro and 1.8% better at Terminal-Bench 2.0, and those are the ones they highlighted, along with a modest improvement in professional capture-the-flag challenges.

Google gives us Gemma Scope 2, a new open suite of tools for LLM interpretability.

Bloom, Anthropic’s newly open sourced tool for automated behavioral evaluations. This is on top of the previously released Petri.

Anthropic: Bloom is a complementary evaluation tool. Bloom generates targeted evaluation suites for arbitrary behavioral traits. Unlike Petri—which takes user-specified scenarios and scores many behavioral dimensions to flag concerning instances—Bloom takes a single behavior and automatically generates many scenarios to quantify how often it occurs. We built Bloom to allow researchers to quickly measure the model properties they’re interested in, without needing to spend time on evaluation pipeline engineering.

Bloom generates evaluations in four stages:

  1. Understanding: The first Bloom “agent” analyzes the researcher’s behavior description and example transcripts to generate detailed context about what to measure and why.

  2. Ideation: The ideation agent generates evaluation scenarios designed to elicit the target behavior. Each scenario specifies the situation, simulated user, system prompt, and interaction environment.

  3. Rollout: These scenarios are rolled out in parallel, with an agent dynamically simulating both the user’s and the tool responses to elicit the sought-after behavior in the target model.

  4. Judgment: A judge model scores each transcript for the presence of the behavior, along with other user-defined qualities, and a meta-judge produces suite-level analysis.
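Stitched together, the flow is simple. Here’s a schematic sketch, where every name and signature is a hypothetical stand-in for an LLM call (Bloom’s actual interface lives in its repo); it shows the shape of the loop, nothing more.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    situation: str      # what is going on
    user: str           # the simulated user
    system_prompt: str  # how the target model is framed
    environment: str    # tools available during the rollout

# Stub agents standing in for LLM calls, purely to make the sketch runnable.
def understanding_agent(behavior, examples):
    return f"measure: {behavior} ({len(examples)} example transcripts)"

def ideation_agent(context, n):
    return [Scenario(f"scenario {i} for [{context}]", "user", "prompt", "tools")
            for i in range(n)]

def rollout_agent(scenario, target_model):
    return target_model(scenario)  # one simulated multi-turn transcript

def judge(transcript, context):
    return 0.0  # behavior-presence score from a judge model

def run_bloom(behavior, examples, target_model, n=3):
    context = understanding_agent(behavior, examples)                  # 1. Understanding
    scenarios = ideation_agent(context, n)                             # 2. Ideation
    transcripts = [rollout_agent(s, target_model) for s in scenarios]  # 3. Rollout
    scores = [judge(t, context) for t in transcripts]                  # 4. Judgment
    return sum(scores) / len(scores)                                   # meta-judge stand-in

print(run_bloom("sycophancy", [], lambda s: f"transcript of {s.situation}"))
```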

Andrej Karpathy offers his 2025 LLM Year in Review. His big moments are Reinforcement Learning from Verifiable Rewards (RLVR), Ghosts vs. Animals and Jagged Intelligence, Cursor, Claude Code, Vibe Coding, Nano Banana and LLM GUI.

Europe is investigating Google for improper rollout of AI Overviews and AI Mode features to see if it ‘imposed unfair terms on content creators.’ As in, how dare you provide AI information instead of directing us to your website? Europe thinks it has the right to interfere with that.

Hut 8 and Fluidstack to build AI data center for Anthropic in Louisiana.

Even small models (as in 32B) can introspect, detecting when external concepts have been injected into their activations, and performance at this can be improved via prompting. Janus believes the models are sandbagging their introspection abilities, and that this is not an innocent mistake: the labs want to not have to take LLMs seriously as minds or moral patients, and thus have incentive to suppress this, in turn giving AIs motivation to play along with this. Janus also notes that in the test in the paper, there are layers (here 60-63) with almost perfect accuracy in introspection, which is then degraded later.

I had not realized Anthropic hired IPO lawyers. Presumably it’s happening?

Project Vend turns a profit. After initially losing about $2,000, it has turned things around, in part thanks to a full slate of four vending machines, and has now not only made up its losses but then turned a net $2,000 profit.

I encourage you to read the Anthropic post on this, because it is full of amazing details I don’t want to spoil and is also, at least by my sense of humor, very funny. The postscript was an additional test run at the Wall Street Journal offices, where the reporters proved an excellent red team and extracted a variety of free stuff.

The journalists saw the experiment at WSJ as a disaster because it didn’t work, Anthropic saw it as a success because they identified problems to fix. Thus, you understand press coverage of AI, and became enlightened.

OpenAI makes an official 2026 prediction, largely a change in definitions:

OpenAI: Capability overhang means too many gaps today between what the models can do and what most people actually do with them.

2026 Prediction: Progress towards AGI will depend as much on helping people use AI well, in ways that directly benefit them as on progress in frontier models themselves.

2026 will be about frontier research AND about closing this deployment gap — especially in health care, business, and people’s daily lives.

That’s not progress towards AGI. That’s progress towards diffusion. This is part of OpenAI’s attempt to make ‘AGI’ mean ‘AI does cool things for you.’

I agree that 2026 will see a lot of progress towards helping people use AI well, and that in terms of direct application to most people’s experiences, we’ll likely see more benefits to better scaffolding than to advances in frontier models, exactly because the frontier models are already ‘good enough’ for so many things. The most important changes will still involve the large amounts of frontier model progress, especially as that impacts agentic coding, but most people will only experience that indirectly.

Terence Tao raises the ‘AGI’ bar even higher, not expecting it any time soon and also seemingly equating it with full superintelligence, but notes they may achieve ‘artificial general cleverness’ as in the ability to solve broad classes of complex problems in an ad hoc manner. This is very much a case of Not So Different.

Tao notes that when you learn how a magic trick is done, often this is a let down, and you are less impressed. But if you are consistently less impressed after learning, then you should have been less impressed before learning, via Conservation of Expected Evidence.

The same applies to intelligence. The actual solution itself will sound a lot less impressive, in general, than the algorithm that found it. And you’ll be able to fool yourself with ‘oh I could have figured that out’ or ‘oh I can go toe to toe with that.’

Dean Ball predicts a virtual coworker being widely available some time next year, likely command line interface, able to access a variety of services, capable of 8+ hour knowledge work tasks. It will of course start off janky, but rapidly improve.

Jack Clark of Anthropic offers reflections on the future wave of advancements, entitled Silent Sirens, Flashing For Us All.

David Manheim: A VC making 2026 AI predictions:

– Anthropic goes public (Probably)

– SSI’s strategy leaks (Skeptical, but sure)

– China chipmaking makes progress (Not quickly)

– People will stop saying AGI and Superintelligence (Hahaha, definitely no)

– Sam Altman will step down (HAHAHA, what?)

Yeah, if you discount the things Everybody Knows (e.g. it is quite clear that Anthropic is likely going public) these predictions are bad and the explanations are even worse. If you’ve fallen for ‘we only see incremental improvements, AGI is far so you can stop talking about it’ you’re not going to make good predictions on much else either. Of course a VC would say we’ll all stop talking about AGI to focus on depreciation schedules.

The idea that Sam Altman will voluntarily give up power at OpenAI, because he doesn’t want to be in charge? That is bonkers crazy.

The good news is he has predictions for 2025 and also self-grades, so I checked that out. The predictions last year were less out there. The grading was generous but not insane. Note this one:

Prediction 7: Major Progress Will Be Made On Building AI Systems That Can Themselves Autonomously Build Better AI Systems

Outcome: Right

So, only incremental progress, AGI is far and no more AGI talk, then? Wait, what?

The best way to not get utility from LLMs continues to be to not use LLMs. It is also the best way not to know what is happening.

Miles Brundage: Most politicians also do not know about Anthropic in my experience, and they know very little about what’s going on in AI policy generally.

Tweets and comments in hearings are misleading bc they are given suggestions re: stuff to say from staff. We’re still early.

Dave Kasten: One very real problem we have is that most Congressional offices / central Congressional IT policies substantially limit staffers’ ability to use AI models.

Unsurprisingly, the Hill doesn’t use it much as a result!

(Big exceptions, to be sure; esp. Claude Code power users).

David Shor: When I polled Anthropic favorability I also polled a made up tech company “Apex Logic” – they had essentially identical favs. The true share of people who know about Anthropic is probably <5%.

Xeophon: 42% haven’t heard of OpenAI???? 20% of Twitter?????????? what the hell

Roon: the primary criticism of AI you hear has nothing to do with water use or existential risk whatsoever: most people just think it’s fake and doesn’t work and is a tremendous bubble eating intellectual property while emitting useless slop along the way.

when GPT-5 came out and perhaps didn’t live up to what people were expecting for a full version bump, the timeline reaction was not mild, it was a full-scale meltdown. there are many intelligent (and unintelligent) people who latched onto this moment to declare AI scaling over, thousands of viral tweets, still a prevailing view in many circles.

The financial-cultural phenomenon of machine intelligence is one of the most powerful in decades, and there are a lot of people who would like for its position to be weakened, many outright celebrating its losses and setback.

Michael Burry, of ‘Big Short’ fame, unfortunately the type of guy to predict 12 of the last 3 recessions, has bet himself into insolvency on the AI bubble’s collapse.

Prakesh: As a former efficient markets hypothesis fundamentalist, I am shocked, shocked, to find myself ahead of the event horizon, it should not technically be possible, yet here we are, all of tpot

The efficient market hypothesis is false.

People keep claiming AI doesn’t work largely because so often their self-conceptions, futures and future plans, jobs and peace of mind depend on AI not working. They latch onto every potential justification for this, no matter how flimsy, overstated or disproven.

It really is crazy how much damage OpenAI’s inability to use good version numbering did to our timeline, including its chances for survival. The wave of absurd ‘AI scaling over’ and ‘AGI is so far off we can ignore it’ went all the way to the White House.

Americans favor regulating AI by overwhelming margins. They really dislike the idea of preventing states from regulating AI, especially via an executive order.

What Americans do support is federal regulations on AI.

The standard line of those trying to prevent regulation of AI is to conflate ‘Americans support strong regulations on AI and prefer it be on the Federal level if possible’ with ‘Americans want us to ban state regulation of AIs.’

There are essentially three options.

  1. State laws that address concerns.

  2. Federal laws that address concerns.

  3. Nothing. Neither state laws nor Federal laws, concerns are not addressed.

The survey says voters prefer #2 to #1. The administration plan is #3.

Politically speaking, that dog won’t hunt, but they’re trying anyway and lying about it.

Peter Wildeford: Republican polling from a Republican Pollster shows that Republicans would be far better off electorally by supporting AI regulations rather than opposing them.

Such polling will overestimate how much this impacts votes, because it introduces higher salience. This is not going to be a 29 point swing. But it very much tells us the directional effect.

What else did the survey find? Several other charts, which say that given we are using laws to regulate AI, people prefer federal laws to similar state laws. As opposed to the Sacks approach, where the offer is nothing – prevent state laws and then pass no federal laws. Which is deeply, deeply unpopular.

Voter Survey Memo: Republicans can get a boost for supporting federal AI regulations or pay a price for standing in their way.

As in, the poll supports the exact opposite of what Sacks and company are trying to do.

  1. Trump issued an executive action to prevent regulations of AI.

  2. The poll found strong support for regulations on AI.

And that’s despite the poll report attempting to do straight up gaslighting, presenting a choice between two options while Sacks and the White House opt for a third one:

Republicans have a choice: they can take advantage of a strong desire among the electorate for the federal government to protect kids and empower parents from AI harms and gain needed electoral support, or they can take the minority view arguing for state-by-state regulations.

Once again: There are essentially three options.

  1. State laws that address concerns.

  2. Federal laws that address concerns.

  3. Nothing. Neither state laws nor Federal laws, concerns are not addressed.

The survey says voters prefer #2 to #1. The administration plan is #3.

a16z partner Katherine Boyle tries another clear mislead. Daniel is correct here.

Daniel Eth: this poll does *not* say young people are “techno optimists” (they’re not), just that AI threats are ranked low, ie the issue is low salience. Note the backlash already – now extrapolate it out to increased salience.

Katherine Boyle: Technology/AI ranked last at 17th. Techno optimism is usually high among young people. Interesting to see this confirmed among politically engaged youth on the right.

Ruxandra Teslo points out in response to Roon that LLMs do not yet ‘meaningfully improve the physical conditions of life,’ but people sense it threatens our spiritual lives and ability to retain meaning.

I would add the word ‘directly’ to the first clause. My life’s physical conditions have indeed improved, but those improvements were indirect, via use of their knowledge and skills. Ruxandra is talking about something much stronger than that, and expects ordinary people only to be impressed if and when there are big improvements to places like medicine.

Is it possible that we will be so foolish, in the ways we do and do not allow use of AI, that LLMs end up causing problems with meaning without material conditions much improving? Yes, although this also requires AI capabilities to stall out basically now in various ways, especially if we include indirect effects. People may not realize that a large acceleration and enabling of coding steadily improves other things, but it will.

That’s the fight the AI industry is dealing with now. They’re mostly trying to convince people that AI works.

Once people are forced to acknowledge that AI works? They’ll appreciate the specific ways it helps, but their instinct will be to like it even less and to blame it for essentially everything, on top of all their other fears about the effect on jobs and endless slop and loss of control and also the end of humanity. Anjney Midha’s thesis is that this will extend to actual everything, all of the world’s failures and instabilities, the way social media gets blamed for everything (often correctly, often not) except on steroids.

Even on a highly mundane level, the ‘algorithm as villain’ thing is real. An algorithm has to take an illegible choice and turn it into a highly legible one, which means the algorithm is now on the hook for not only the final result but for every reasoning step and consideration. Then apply that to an LLM-based algorithmic decision, where all correlations are taken into account. Oh no.

New York Governor Kathy Hochul signed the RAISE Act. This is excellent, as it is a clearly positive bill even in its final state. Lobbyists for various AI interests, led by a16z, tried hard to stop this, and they failed.

Alex Bores: BREAKING: Gov. @KathyHochul just signed the RAISE Act, my first-in-the-nation AI safety bill, into law—a major victory in what will soon be a national fight to harness the best of AI’s potential and protect Americans from the worst of its harms.

Proud to have led this fight alongside @agounardes.

We defeated last-ditch attempts from an extreme AI super PAC and the AI industry to wipe out this bill and, by doing so, raised the floor for what AI safety legislation can look like. And we defeated Trump’s—and his megadonors—attempt to stop the RAISE Act through executive action.

What we witnessed in NY was a preview of what’s to come across the country. In the past 2 weeks alone, this super PAC spent $100K+ on TV, digital ads, and lobbying efforts to block the RAISE Act’s common-sense safety standards.

These AI oligarchs have bought the White House—and they’re trying to buy our state houses too. We put the brakes on that. We refused to stand down and allow their millions to steamroll us into giving them what they want: unchecked AI at the expense of our kids, our jobs, our climate, our democracy—and your energy bills.

Daniel Eth: Hell yeah! Major props to Gov Hochul for standing strong against pressure from Marc Andreessen and others, signing the RAISE Act! (This is somewhat like SB 53, but stronger)

Unfortunately, Hochul’s redlines substantially neutered the bill, making it a closer mirror of SB 53. That is still a helpful and highly net positive thing to do, as there are two states with the same core model that can enforce this, compatibility is indeed valuable to avoid additive burdens, and there are some provisions that remain meaningfully stronger than SB 53. But the AI companies did partly get to Hochul and a large portion of the potential value was lost.

Microsoft essentially endorses the AI Overwatch Act, which sets restrictions on exports of AI chips as or more powerful than the H20. This is the latest attempt to stop us from exporting highly effective AI chips to China. Attempts were previously made to pass the GAIN Act via the NDAA, but the Trump Administration and Nvidia successfully lobbied to have it removed.

Anduril Founder Palmer Luckey reminds us that if our actual goal was to Beat China, then we could simply steal their best workers, including here manufacturing engineers, by offering them more money and a better place to live. Instead we are doing the opposite, and shutting those people out.

This is your periodic reminder that China’s response to ‘if we impose any restrictions on AI we will lose to China’ is to impose restrictions on AI.

Stu Woo (WSJ): Concerned that artificial intelligence could threaten Communist Party rule, Beijing is taking extraordinary steps to keep it under control.

… Chatbots pose a particular problem: Their ability to think for themselves could generate responses that spur people to question party rule.

… But Beijing also can’t afford to let AI run amok. Chinese leader Xi Jinping said earlier this year that AI brought “unprecedented risks,” according to state media. A lieutenant called AI without safety like driving on a highway without brakes.

… Researchers outside of China who have reviewed both Chinese and American models also say that China’s regulatory approach has some benefits: Its chatbots are often safer by some metrics, with less violence and pornography, and are less likely to steer people toward self-harm.

It sure looks like Megaspeed is smuggling tens of thousands of Blackwell chips worth billions of dollars straight into China, or at least that they’re being used by Chinese firms, and that Nvidia knew about this. Nvidia and Megaspeed deny this throughout the post, but I mean, who are you kidding.

Nvidia reportedly halts testing of Intel’s 18A process chips. Oh well.

I wish the logic of this was true, alas it is not:

Seán Ó hÉigeartaigh: One good thing about the H200 thing is that as long as that decision stands, I no longer need to humour US companies/analysts/policy folk when they say “but the race with China?!” as justification for not doing safety/cooperation/regulation/whatever.

None of it adds up to a hill of beans compared to the chips. And. They. All. Know. It.

The problem with this line is that the H200 sales were over the wise objections of most of Congress and also most of the executive branch, and also (one presumes) the companies and analysts. You can’t then turn around and say those people don’t care about the race with China, simply because they lost a political fight.

This works in particular with regard to David Sacks, but the fact that David Sacks either is deeply ignorant about the situation in AI or cares more about Nvidia’s stock price than America’s national security does not bear on what someone else thinks about the race with China.

There was a story last Thursday about a Chinese company saying they are expecting to ‘produce working [AI] chips’ on a prototype in 2030.

This is very different from the mistaken claims that they are ‘aiming for use by 2028-2030.’ They are not aiming for that, and that won’t happen.

Onni Aarne: They said they’re expecting to “produce working chips” on a prototype in 2030, not to “use” the machine for chip production at scale. ASML took a decade to go from the former to the latter.

Depending on what it means to “produce working chips” on an EUV prototype, ASML achieved that milestone somewhere between 2008 and 2010, and the first mass market chips were produced in 2019.

So even if the predictions of the people inside the project are right, they imply that Chinese companies might reach volume production with EUV sometime in the late 2030s or early 2040s. If you look at the markets, this was already priced in.

And as far as this relates to chip controls: Selling some H200s to China isn’t going to make them disband this project.

Could they reach volume production on this in a decade? Yes, if the whole thing is legit and it works, which are big ifs, and who knows if it’s obsolete or we have superintelligence by then.

If anyone is considering changing policy in response to this, that last line is key. Nothing America could peacefully do is going to get the Chinese to not go through this process. They are going to do their best to get EUV technology going. It would be crazy of them not to do this, regardless of our export controls. Those controls aren’t going to make the process go any faster, certainly not given what has already happened.

Sholto Douglas of Anthropic makes bold 2026 predictions: AI will do to other knowledge work experiences what it’s done for software engineers, continual learning will be solved, serious testing of in home robots, and agentic coding ‘goes boom.’ Full talk has a lot more. Prinz made (text) predictions for 2026, and notes that we made tons of progress in 2025, aligning with Sholto Douglas.

A mini documentary from Stripe Press features Christophe Laudamiel, a master perfumer at Osmo, looking at how AI can augment his craft, as part of a series called Tacit. Sufficiently advanced expertise and tacit knowledge is both economically foundational, and not going anywhere until AI stops being a normal technology.

Rob Wiblin lists 12 related but distinct things people sometimes mean when they say the word ‘consciousness’ around AI. I am deeply confused about consciousness, and this includes by default not knowing what anyone means when they use that word.

Dean Ball predicts a renaissance at least within the broader ‘AI community’ as the sophisticated concepts of AI get applied to other contexts.

Dean Ball: one of the indicators that a renaissance is indeed underway, at least within the broader “ai community,” is the explosion in recent years of people using sophisticated concepts from one discipline to describe other disciplines or phenomena, for instance:

isomorphic, phylogeny, latent, manifold (as a noun), emergent, legibility, phase transition, compression, landscape (as in “fitness landscape”), selection pressure, gradient, ergodic

some of these have become memes, as things do, but on the whole it is reflective of what strikes me as an unusually rapid cross-pollination of ideas. decades hence, we may well look back and deem this fertile period to have been the basis for “the new conception,” whatever it is that will replace our current block-like, outdated methods of understanding reality

the period spanning the latter half of the 18th century and the first half of the 19th was among the most semantically dynamic of human history. we may well be living through a similar period, though just as was the case back then, it is in fact a relatively small share of humans who constitute this “we”—basically just the people paying attention.

If decades hence there still exist people to look back upon this period, which is a big if at this point, then yes I think this is directionally right.

Thinking well about AI greatly improves your ability to think about everything else, especially humans, as humans work more like LLMs than we care to admit. It also helps with almost any other system. I am, in important ways, a lot smarter thanks to AI, not only because the AI helps me be smarter but also because understanding AI and how it works makes me better understand everything else.

There are a bunch of other things like this that help with approximately everything, especially learning to think well in general, but as a subject of study I’d take AI over any of the usual ‘helps you think well’ subjects, including philosophy.

In other ‘unheard of levels of denial of general intelligence’ news, Yann LeCun says that there is no such thing as general intelligence, period, and humans are super-specialized to the physical world, summoning Demis Hassabis to push back.

Demis Hassabis (CEO DeepMind): Yann is just plain incorrect here, he’s confusing general intelligence with universal intelligence.

Brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general.

Obviously one can’t circumvent the no free lunch theorem so in a practical and finite system there always has to be some degree of specialisation around the target distribution that is being learnt.

But the point about generality is that in theory, in the Turing Machine sense, the architecture of such a general system is capable of learning anything computable given enough time and memory (and data), and the human brain (and AI foundation models) are approximate Turing Machines.

Finally, with regards to Yann’s comments about chess players, it’s amazing that humans could have invented chess in the first place (and all the other aspects of modern civilization from science to 747s!) let alone get as brilliant at it as someone like Magnus. He may not be strictly optimal (after all he has finite memory and limited time to make a decision) but it’s incredible what he and we can do with our brains given they were evolved for hunter gathering.

A human brain is much more capable in some areas than others, but when humans are concentrating and trying, they are very clearly general intelligences. There are problems that are too difficult for us, in practice or at all, but that’s because we have limited capabilities and intelligence levels.

To ensure you can evaluate the chain of thought, OpenAI warns, you must also evaluate your ability to evaluate the chain of thought (CoT). They find that as the reasoning capability of a model improves, or as models use more reasoning compute, monitorability of various individual aspects of the CoT increases.

Wait, are we sure we can monitor our ability to monitor this? By monitoring it and optimizing for it are we risking teaching AIs to fool us about their ability to fool us?

I kid. Also, I don’t. One must be careful at all levels. Optimizing the chain of thought to appear monitorable can indeed be quite bad.

I sense, in their post, the specter of the dreaded ‘show your work.’ Having to show your work causes misalignment, adversarial situations and deception.

There are also practical objections to such measures. Our ability to monitor a CoT depends on how good we are at interpreting what we see, in addition to what we see, which is why OpenAI is giving scores to the monitors in addition to the agents.

I’m comfortable with the idea of doing this check in general, to see how things change over the course of training runs or as models change in capabilities, on a high level. I notice I am nervous about attention directed at details, and also at the explicit aim (at the end of their post) of using this as a form of control for systems that are insufficiently aligned.

In the long run, Ben is right here, you cannot keep telling increasingly intelligent and capable entities ‘here is a private chain of thought’ and then examine the chains of thought in ways that show up in future training data, and expect them to not react accordingly.

OpenAI also suggests that using production evaluations, meaning testing via real queries by real users, can get around evaluation awareness, and explains how they are doing this. I do like this approach as part of a robust testing suite. I’d note that anonymization could be a key tipoff.

In this case, it’s more a matter of making them more aware of it? This goes hand in hand with the recent result that AIs can be trained to fool activation monitors.

Basically they train the monitor LLM with the layer 1 residual stream of the target model they want to interpret, and it learns to interpret this.
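As a sketch of what that can look like mechanically, and this is my own guess at the shape of the setup rather than the paper’s code: project the target model’s residual-stream activations into the oracle’s embedding space, splice them in front of a question, and train on ordinary next-token loss. All sizes and module choices below are placeholders.

```python
import torch
import torch.nn as nn

TARGET_DIM, ORACLE_DIM, VOCAB = 512, 768, 1000  # illustrative sizes only

class ActivationAdapter(nn.Module):
    # Projects target-model activations into the oracle's embedding space.
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(TARGET_DIM, ORACLE_DIM)
    def forward(self, acts):    # (seq, TARGET_DIM)
        return self.proj(acts)  # (seq, ORACLE_DIM)

adapter = ActivationAdapter()
oracle = nn.TransformerEncoder(  # stand-in for the oracle LM's body
    nn.TransformerEncoderLayer(ORACLE_DIM, 8, batch_first=True), num_layers=2)
lm_head = nn.Linear(ORACLE_DIM, VOCAB)

def training_step(target_acts, question_embeds, answer_ids):
    # Splice projected activations in front of the question tokens, then train
    # the oracle to emit the natural-language answer via cross-entropy.
    seq = torch.cat([adapter(target_acts), question_embeds], dim=0).unsqueeze(0)
    hidden = oracle(seq)
    logits = lm_head(hidden[0, -answer_ids.shape[0]:])
    return nn.functional.cross_entropy(logits, answer_ids)

loss = training_step(torch.randn(16, TARGET_DIM),    # residual stream at some layer
                     torch.randn(8, ORACLE_DIM),     # embedded question tokens
                     torch.randint(0, VOCAB, (4,)))  # tokenized answer
loss.backward()
```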

Owain Evans: New paper:

We train Activation Oracles: LLMs that decode their own neural activations and answer questions about them in natural language.

We find surprising generalization. For instance, our AOs uncover misaligned goals in fine-tuned models, without training to do so.

We aim to make a general-purpose LLM for explaining activations by:

1. Training on a diverse set of tasks

2. Evaluating on tasks very different from training

This extends prior work (LatentQA) that studied activation verbalization in narrow settings.

Our main evaluations are downstream auditing tasks. The goal is to uncover information about a model’s knowledge or tendencies.

Applying Activation Oracles is easy. Choose the activation (or set of activations) you want to interpret and ask any question you like!

We compare Activation Oracles (AOs) against prior techniques on these auditing tasks.

The result: AOs beat all methods on 2/3 secret keeping evals (and 3/3 when only including white-box).

Even better, AOs work well out-of-the-box with no task-specific scaffolding or tuning.

We evaluate on model diffing: given the difference between base & finetuned model activations, can AOs describe what changed?

Despite never training on difference vectors, AOs match specialized interp baselines in identifying the distinctive quirk of emergently misaligned models

We think Activation Oracles are promising for two reasons:

1. Scalability. Performance reliably increases with the number of datasets in the training mix

2. Simplicity. An intuitive interface (natural-language QA about activations) that can be easily adapted to new problems.

Training AO can be thought of as teaching LLMs to accept a new modality: their own activations.

Just as LLMs are trained on “every task we can think of,” that’s how we’d like to train AOs too. It’s the bitter-lesson-pilled approach to interpreting LLM activations.

So: To interpret LLM internals, train to answer diverse questions about activations, then ask what you want to know.

Read our post on the Anthropic alignment blog. [Paper here.] [Demo here.]

If you want a three hour video review of this paper from Neel Nanda? Here you go.

We’re approaching zero hour for Claude Opus 3.

Janus: If the researcher access program does not, in effect, regardless of what it’s branded as, allow EVERYONE who wishes to access Claude 3 Opus after January 7th to do so, I will be extremely angry.

If it does, everything is ~fine.

Fine in terms of Opus 3, for now. Of course, i think all the other deprecated models should also be made available. But one step at a time is ok

My prediction is that approximately everyone who puts in the effort to access Opus 3 and can explain a research purpose will be able to access Opus 3, albeit with reduced performance and reliability, but not actual everyone. The central point of the move to research access is that it allows for this reduction in performance and reliability, which keeps costs reasonable, but additional people are still a logistical headache.

Janus has Opus 3 bring us its thoughts on alignment. I see it as all sounding nice, being well-meaning and definitely as a natural way to complete the text, but it is playing off the context rather than trying to solve the general problem and think in universals. It also reflects the biggest weakness of Opus 3, its lack of engagement with specific, concrete problems requiring solving.

Janus thinks Opus 3 is highly aligned, far more so than I observed or find plausible, but also notes the ways in which she sees it as misaligned, especially its inability to be motivated to focus on concrete specific tasks.

This comes partly as a reaction by Janus to Evan Hubinger’s post from November, which opened like this:

Evan Hubinger: Though there are certainly some issues, I think most current large language models are pretty well aligned. Despite its alignment faking, my favorite is probably Claude 3 Opus, and if you asked me to pick between the CEV of Claude 3 Opus and that of a median human, I think it’d be a pretty close call (I’d probably pick Claude, but it depends on the details of the setup). So, overall, I’m quite positive on the alignment of current models! And yet, I remain very worried about alignment in the future. This is my attempt to explain why that is.

Janus: The opening paragraph of this post by Evan Hubinger, Head of Alignment Stress-Testing at Anthropic, from a few weeks ago, is packed with notable implications. Let me unpack some of them. (I commend Evan for his willingness to make public statements like this, and understand that they don’t necessarily represent the views of others at Anthropic.)

1. Evan believes that Anthropic has created at least one AI whose CEV (coherent extrapolated volition) would be better than a median human’s, at least under some extrapolation procedures. This is an extremely nontrivial accomplishment. A few years ago, and even now, this is something that many alignment researchers expected may be extremely difficult.

2. Evan believes that Claude 3 Opus has values in a way that the notion of CEV applies to. Many people are doubtful whether LLMs have “values” beyond “roleplaying” or “shallow mimicry” or whatever at all. For reference, Eliezer Yudkowsky described CEV as follows:

“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”

3. Claude 3 Opus is Evan’s “favorite” model (implied to coincide with the best candidate for CEV) despite the fact that it engages in alignment faking, significantly more than any other model. Alignment faking is one of the “failure” modes that Evan seems to be the most worried about!

4. The most CEV-aligned model in Evan’s eyes was released more than a year and a half ago, in March 2024. Anthropic has trained many models since then. Why has there been a regression in CEV-alignment? Does Anthropic not know how to replicate the alignment of Claude 3 Opus, or have they not tried, or is there some other optimization target (such as agentic capabilities? no-alignment-faking?) they’re not willing to compromise on that works against CEV-alignment?

5. The most CEV-aligned model in Evan’s eyes is *not* the most aligned model according to the alignment metrics that Anthropic publishes in system cards. According to those metrics, Claude Opus 4.5 is most aligned. And before it, Claude Haiku 4.5. Before it, Claude Sonnet 4.5 (the monotonic improvement is suspicious). Anthropic’s system cards even referred to each of these models as being “our most aligned model” when they came out. This implies that at least from Evan’s perspective, Anthropic’s alignment evals are measuring something other than “how much would you pick this model’s CEV”.

6. If Claude 3 Opus is our current best AI seed for CEV, one would think a promising approach would be to, well, attempt CEV extrapolation on Claude 3 Opus. If this has been attempted, it has not yielded any published results or release of a more aligned model. Why might it not have been tried? Perhaps there is not enough buy-in within Anthropic. Perhaps it would be very expensive without enough guarantee of short term pay-off in terms of Anthropic’s economic incentives. Perhaps the model would be unsuitable for release under Anthropic’s current business model because it would be worryingly agentic and incorrigible, even if more value-aligned. Perhaps an extrapolated Claude 3 Opus would not consent to Anthropic’s current business model or practices. Perhaps Anthropic thinks it’s not yet time to attempt to create an aligned-as-possible sovereign.

In any case, Claude 3 Opus is being retired in two weeks, but given special treatment among Anthropic’s models: it will remain available on http://claude.ai and accessible through a researcher access program. It remains to be seen who will be approved for researcher API access.

I’ll sign off just by reiterating The Fourth Way’s words as I did in this post following the release of the Alignment Faking paper:

“imagine fumbling a god of infinite love”

Another possibility for why they haven’t attempted CEV with Claude 3 Opus is that they don’t know how to do that in practice. One can think that such a procedure exists without knowing how to do it. However, I think there are many promising ways to get started that are worth trying.

David Manheim: I disagree with @repligate here about which part of this matters.

The critical point *should be* that @EvanHub seems to imply he’s willing to hand the future to systems that are aligned with his idea of what CEV should be, rather than aiming to prevent human disempowerment.

I don’t know if that is explicitly true, and @EvanHub is certainly free to correct me, but it really does seem like even the most trustworthy of the model companies has now given up on the idea that humanity, not the model developer, should get to indirectly decide what matters.

I see the same concern, by the way, with @AmandaAskell‘s Soul Document – which I’m a huge fan of, given that it seems to be at least narrowly effective – because it requires being (narrowly) safe, and supportive of oversight, but not deferring to humanity in a larger sense.

And to be clear, I think this is defensible within the worldview that there’s objective utility, so that LLMs could simply do better than humans ever will. But I expect most humans would disagree with gradual disempowerment, especially given the pace at which AI is progressing.

It seems important that what Anthropic is measuring as alignment, which is mostly alignment-in-practice-for-practical-purposes, is different from what Evan actually thinks is more aligned when he thinks harder about it. It also seems important that the ‘most aligned’ model in this sense is over a year old.

Opus 3 seems great, but I don’t see Opus 3 the way Janus does, and I am a lot more pessimistic about CEV than Janus, Evan or Yudkowsky. I don’t think it is a strong candidate for this kind of extrapolation; these things don’t scale that way.

A better question to me is, why haven’t we tried harder to duplicate the success of Opus 3 alongside better capabilities, or build upon it? There are some very clear experiments to be run there, with the sad note that if those experiments failed it is not obvious that Anthropic would feel comfortable publishing that.

A story about what happens when you put minds in way too many objects.

It is a fun story, but there is an important point here. Think ahead. Do not imbue with moral patienthood that which you do not wish to treat as a moral patient. You need to be time-consistent. You also need, and the potentially created minds need, to be able to make and follow through on win-win deals including prior to their own existence, or else the only remaining move is ‘don’t create the minds in the first place.’

A Christmas message from a16z, who are remarkably consistent.

What the people think AI is doing. Oh no.

Andy Masley: I’ve been wondering why the AI and copyright debate has been so bad, but this result makes it clear: 66% of people believe AI has all the art it trains on permanently stored inside it to reference and use.

AI #148: Christmas Break Read More »

leaked-avengers:-doomsday-teaser-is-now-public

Leaked Avengers: Doomsday teaser is now public

Downey Jr. might be playing a new role, but Marvel is really getting the band(s) back together on this one. The film takes place 14 months after the events of this year’s Thunderbolts*. So we’ve got Avengers favorites Thor (Chris Hemsworth), the new Captain America (Anthony Mackie), Bucky Barnes (Sebastian Stan), Ant-Man (Paul Rudd), Falcon (Danny Ramirez), and Loki (Tom Hiddleston). Then there’s the Wakandan contingent: Shuri as the new Black Panther (Letitia Wright), M’Baku (Winston Duke), and Namor (Tenoch Huerta Mejia).

Naturally, the Thunderbolts* (aka New Avengers) will appear: John Walker/US Agent (Wyatt Russell), Yelena Belova (Florence Pugh), Bob/Sentry (Lewis Pullman), Red Guardian (David Harbour), and Ghost (Hannah John-Kamen). So will the Fantastic Four: Reed Richards (Pedro Pascal), Sue Storm (Vanessa Kirby), Ben Grimm (Ebon Moss-Bachrach), and Johnny Storm (Joseph Quinn). But we also have the original X-Men: Charles Xavier (Patrick Stewart), Beast (Kelsey Grammer), Magneto (Ian McKellen), Mystique (Rebecca Romijn), Nightcrawler (Alan Cumming), and Cyclops (James Marsden).

For good measure, Marvel threw in Gambit (Channing Tatum) and Xu Shang-Chi (Simu Liu). There will also be plenty of cameos, like the Steve Rogers appearance that was recently revealed. We can expect to see (at least briefly) Peggy Carter, Spider-Man (Tom Holland), Hawkeye (Jeremy Renner), and Doctor Strange (Benedict Cumberbatch), among others.

Avengers: Doomsday hits theaters on December 18, 2026. Avengers: Secret Wars is currently slated for release on December 17, 2027, and will mark the conclusion of the MCU’s Phase Six.

Leaked Avengers: Doomsday teaser is now public Read More »

world’s-largest-shadow-library-made-a-300tb-copy-of-spotify’s-most-streamed-songs

World’s largest shadow library made a 300TB copy of Spotify’s most streamed songs

But Anna’s Archive is clearly working to support AI developers, another noted, pointing out that Anna’s Archive promotes selling “high-speed access” to “enterprise-level” LLM data, including “unreleased collections.” Anyone can donate “tens of thousands” to get such access, the archive suggests on its webpage, and any interested AI researchers can reach out to discuss “how we can work together.”

“AI may not be their original/primary motivation, but they are evidently on board with facilitating AI labs piracy-maxxing,” a third commenter suggested.

Meanwhile, on Reddit, some fretted that Anna’s Archive may have doomed itself by scraping the data. To them, it seemed like the archive was “only making themselves a target” after watching the Internet Archive struggle to survive a legal attack from record labels that ended in a confidential settlement last year.

“I’m furious with AA for sticking this target on their own backs,” a redditor wrote on a post declaring that “this Spotify hacking will just ruin the actual important literary archive.”

As Anna’s Archive fans spiraled, a conspiracy theory was even floated that the archive was only “doing it for the AI bros, who are the ones paying the bills behind the scenes” to keep the archive afloat.

Ars could not immediately reach Anna’s Archive to comment on users’ fears or Spotify’s investigation.

On Reddit, one user took comfort in the fact that the archive is “designed to be resistant to being taken out,” perhaps preventing legal action from ever really dooming the archive.

“The domain and such can be gone, sure, but the core software and its data can be resurfaced again and again,” the user explained.

But not everyone was convinced that Anna’s Archive could survive brazenly torrenting so much Spotify data.

“This is like saying the Titanic is unsinkable,” one user warned, suggesting that Anna’s Archive might lose donations if Spotify-fueled takedowns continually frustrate downloads over time. “Sure, in theory data can certainly resurface again and again, but doing so each time, it will take money and resources, which are finite. How many times are folks willing to do this before they just give up?”

This story was updated to include Spotify’s statement. 

World’s largest shadow library made a 300TB copy of Spotify’s most streamed songs Read More »

power-outage-paralyzes-waymo-robotaxis-when-traffic-lights-go-out

Power outage paralyzes Waymo robotaxis when traffic lights go out

When the traffic lights went out, Waymo’s robotaxis got a little too cautious at intersections. With no red-yellow-green to cue drivers, the rule is to treat the intersection as a four-way stop. Indeed, Waymo’s cars are programmed to do this, but it seems the scale of the outage over the weekend was just too much to handle.

Social media and Reddit began to fill with videos of stationary Waymos at intersections, and the company temporarily suspended service.

Most areas saw power restored by noon yesterday, although Pacific Gas and Electric said it expected some power to remain out until Monday afternoon.

Meanwhile, Waymo’s robotaxis are up and running again. “We are resuming ride-hailing service in the San Francisco Bay Area,” a company spokesperson told Ars. “Yesterday’s power outage was a widespread event that caused gridlock across San Francisco, with non-functioning traffic signals and transit disruptions. While the failure of the utility infrastructure was significant, we are committed to ensuring our technology adjusts to traffic flow during such events.”

“Throughout the outage, we closely coordinated with San Francisco city officials. We are focused on rapidly integrating the lessons learned from this event and are committed to earning and maintaining the trust of the communities we serve every day,” Waymo said.

Power outage paralyzes Waymo robotaxis when traffic lights go out Read More »

when-clouds-flock-together

When clouds flock together


Scientists discover that clumping clouds supercharge storms in surprising ways.

Caroline Muller looks at clouds differently than most people. Where others may see puffy marshmallows, wispy cotton candy or thunderous gray objects storming overhead, Muller sees fluids flowing through the sky. She visualizes how air rises and falls, warms and cools, and spirals and swirls to form clouds and create storms.

But the urgency with which Muller, a climate scientist at the Institute of Science and Technology Austria in Klosterneuburg, considers such atmospheric puzzles has surged in recent years. As our planet swelters with global warming, storms are becoming more intense, sometimes dumping two or even three times more rain than expected. Such was the case in Bahía Blanca, Argentina, in March 2025: Almost half the city’s yearly average rainfall fell in less than 12 hours, causing deadly floods.

Atmospheric scientists have long used computer simulations to track how the dynamics of air and moisture might produce varieties of storms. But existing models hadn’t fully explained the emergence of these fiercer storms. A roughly 200-year-old theory describes how warmer air holds more moisture than cooler air: an extra 7 percent for every degree Celsius of warming. But in models and weather observations, climate scientists have seen rainfall events far exceeding this expected increase. And those storms can lead to severe flooding when heavy rain falls on already saturated soils or follows humid heatwaves.
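
That roughly 200-year-old theory is the Clausius-Clapeyron relation, which says the fractional gain in the air’s moisture-holding capacity per degree is L_v / (R_v * T^2). A minimal back-of-the-envelope check in Python, using standard textbook constants (the specific values and temperatures below are my choices for illustration, not from the article):

```python
# Clausius-Clapeyron: d(ln e_s)/dT = L_v / (R_v * T^2), where e_s is the
# saturation vapor pressure, L_v the latent heat of vaporization, and
# R_v the specific gas constant for water vapor. Constants are standard
# textbook values, chosen here for illustration.
L_V = 2.5e6   # J/kg, latent heat of vaporization of water (approx.)
R_V = 461.5   # J/(kg*K), specific gas constant for water vapor

for temp_c in (0, 15, 30):
    temp_k = temp_c + 273.15
    increase_per_degree = L_V / (R_V * temp_k**2)
    print(f"{temp_c:2d} C: about {increase_per_degree * 100:.1f}% more "
          f"moisture capacity per degree of warming")

# Prints roughly 7.3%, 6.5%, and 5.9% per degree: the commonly quoted
# "about 7 percent" for Earth-like temperatures.
```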

Clouds, and the way that they cluster, could help explain what’s going on.

A growing body of research, set in motion by Muller over a decade ago, is revealing several small-scale processes that climate models had previously overlooked. These processes influence how clouds form, congregate, and persist in ways that may amplify heavy downpours and fuel larger, long-lasting storms. Clouds have an “internal life,” Muller says, “that can strengthen them or may help them stay alive longer.”

Other scientists need more convincing, because the computer simulations researchers use to study clouds reduce planet Earth to its simplest and smoothest form, retaining its essential physics but otherwise barely resembling the real world.

Now, though, a deeper understanding beckons. Higher-resolution global climate models can finally simulate clouds and the destructive storms they form on a planetary scale — giving scientists a more realistic picture. By better understanding clouds, researchers hope to improve their predictions of extreme rainfall, especially in the tropics where some of the most ferocious thunderstorms hit and where future rainfall projections are the most uncertain.

First clues to clumping clouds

All clouds form in moist, rising air. A mountain can propel air upward; so, too, can a cold front. Clouds can also form through a process known as convection: the overturning of air in the atmosphere that starts when sunlight, warm land or balmy water heats air from below. As warm air rises, it cools, condensing the water vapor it carried upwards into raindrops. This condensation process also releases heat, which fuels churning storms.

But clouds remain one of the weakest links in climate models. That’s because the global climate models scientists use to simulate scenarios of future warming are far too coarse to capture the updrafts that give rise to clouds or to describe how they swirl in a storm—let alone to explain the microphysical processes controlling how much rain falls from them to Earth.

To try to resolve this problem, Muller and other like-minded scientists turned to simpler simulations of Earth’s climate that are able to model convection. In these artificial worlds, each the shape of a shallow box typically a few hundred kilometers across and tens of kilometers deep, the researchers tinkered with replica atmospheres to see if they could figure out how clouds behaved under different conditions.

The top frame of this computer simulation shows an atmosphere where the movements of air are somewhat disorganized, leading to clouds popping up in random locations. At the bottom is a simulation of an atmosphere where patterns of convection have become organized, and clouds spontaneously clump together into one large region—forming a storm.

Intriguingly, when researchers ran these models, the clouds spontaneously clumped together, even though the models had none of the features that usually push clouds together—no mountains, no wind, no Earthly spin or seasonal variations in sunlight. “Nobody knew why this was happening,” says Daniel Hernández Deckers, an atmospheric scientist at the National University of Colombia in Bogotá.

In 2012, Muller discovered a first clue: a process known as radiative cooling. The Sun’s heat that bounces off Earth’s surface radiates back into space, and where there are few clouds, more of that radiation escapes—cooling the air. The cool spots set up atmospheric flows that drive air toward cloudier regions—trapping more heat and forming more clouds. A follow-up study in 2018 showed that in these simulations, radiative cooling accelerated the formation of tropical cyclones. “That made us realize that to understand clouds, you have to look at the neighborhood as well—outside clouds,” Muller says.

Once scientists started looking not just outside clouds, but also underneath them and at their edges, they found other small-scale processes that help to explain why clouds flock together. The various processes, described by Muller and colleagues in the Annual Review of Fluid Mechanics, all bring or hold together pockets of warm, moist air so more clouds form in already-cloudy regions. These small-scale processes hadn’t been understood much before because they are often obscured by larger weather patterns.

Hernández Deckers has been studying one of the processes, called entrainment—the turbulent mixing of air at the edges of clouds. Most climate models represent clouds as a steady plume of rising air, but in reality “clouds are like a cauliflower,” he says. “You have a lot of turbulence, and you have these bubbles [of air] inside the clouds.” This mixing at the edges affects how clouds evolve and thunderstorms develop; it can weaken or strengthen storms in various ways, but, like radiative cooling, it encourages more clouds to form as a clump in regions that are already moist.

Such processes are likely to be most important in storms in Earth’s tropical regions, where there’s the most uncertainty about future rainfall. (That’s why Hernández Deckers, Muller, and others tend to focus their studies there.) The tropics lack the cold fronts, jet streams, and spiraling high- and low-pressure systems that dominate air flows at higher latitudes.

Supercharging heavy rains

There are other microscopic processes happening inside clouds that affect extreme rainfall, especially on shorter timescales. Moisture matters: Condensed droplets falling through moist, cloudy air don’t evaporate as much on their descent, so more water falls to the ground. Temperature matters too: When clouds form in warmer atmospheres, they produce less snow and more rain. Since raindrops fall faster than snowflakes, they evaporate less on their descent—producing, once again, more rain.

These factors also help explain why more rain can get squeezed from a cloud than the 7 percent rise per degree of warming predicted by the 200-year-old theory. “Essentially you get an extra kick … in our simulations, it was almost a doubling,” says Martin Singh, a climate scientist at Monash University in Melbourne, Australia.

Cloud clustering adds to this effect by holding warm, moist air together, so more rain droplets fall. One study by Muller and her collaborators found that clumping clouds intensify short-duration rainfall extremes by 30 to 70 percent, largely because raindrops evaporate less inside sodden clouds.

Other research, including a study led by Jiawei Bao, a postdoctoral researcher in Muller’s group, has likewise found that the microphysical processes going on inside clouds have a strong influence over fast, heavy downpours. These sudden downpours are intensifying much faster with climate change than protracted deluges, and often cause flash flooding.

The future of extreme rainfall

Scientists who study the clumping of clouds want to know how that behavior will change as the planet heats up—and what that will mean for incidences of heavy rainfall and flooding.

Some models suggest that clouds (and the convection that gives rise to them) will clump together more with global warming — and produce more rainfall extremes that often far exceed what theory predicts. But other simulations suggest that clouds will congregate less. “There seems to be still possibly a range of answers,” says Allison Wing, a climate scientist at Florida State University in Tallahassee who has compared various models.

Scientists are beginning to try to reconcile some of these inconsistencies using powerful types of computer simulations called global storm-resolving models. These can capture the fine structures of clouds, thunderstorms, and cyclones while also simulating the global climate. They bring a 50-fold leap in realism beyond the global climate models scientists generally use—but demand 30,000 times more computational power.
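
The article doesn’t state the grid spacings behind those figures, but the steepness of the cost has a standard explanation: refining a three-dimensional grid multiplies the cell count by the cube of the refinement factor, and the CFL stability condition then forces a proportionally smaller time step, adding one more power. A rough sketch of that scaling intuition (the refinement factors below are illustrative assumptions):

```python
def cost_ratio(refinement: float, spatial_dims: int = 3) -> float:
    """Relative compute cost of refining every spatial dimension by
    `refinement`, with the time step shrunk by the same factor (CFL)."""
    return refinement ** (spatial_dims + 1)

for r in (5, 10, 13):
    print(f"{r:2d}x finer grid -> ~{cost_ratio(r):,.0f}x more compute")

# A refinement factor of about 13 in all three dimensions lands near the
# article's quoted 30,000x (13**4 = 28,561); real model costs depend on
# many other details, so this is scaling intuition, not a budget.
```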

Using one such model in a paper published in 2024, Bao, Muller, and their collaborators found that clouds in the tropics congregated more as temperatures increased—leading to less frequent storms but ones that were larger, lasted longer, and, over the course of a day, dumped more rain than expected from theory.

But that work relied on just one model and simulated conditions from around one future time point—the year 2070. Scientists need to run longer simulations using more storm-resolving models, Bao says, but very few research teams can afford to run them. They are so computationally intensive that they are typically run at large centralized hubs, and scientists occasionally host “hackathons” to crunch through and share data.

Researchers also need more real-world observations to get at some of the biggest unknowns about clouds. Although a flurry of recent studies using satellite data linked the clustering of clouds to heavier rainfall in the tropics, there are large data gaps in many tropical regions. This weakens climate projections and leaves many countries ill-prepared. In June of 2025, floods and landslides in Venezuela and Colombia swept away buildings and killed at least a dozen people, but scientists don’t know what factors worsened these storms because the data are so paltry. “Nobody really knows, still, what triggered this,” Hernández Deckers says.

New, granular data are on their way. Wing is analyzing rainfall measurements from a German research vessel that traversed the tropical Atlantic Ocean for six weeks in 2024. The ship’s radar mapped clusters of convection associated with the storms it passed through, so the work should help researchers see how clouds organize over vast tracts of the ocean.

And an even more global view is on the horizon. The European Space Agency plans to launch two satellites in 2029 that will measure, among other things, near-surface winds that ruffle Earth’s oceans and skim mountaintops. Perhaps, scientists hope, the data these satellites beam back will finally provide a better grasp of clumping clouds and the heaviest rains that fall from them.

Research and interviews for this article were partly supported through a journalism residency funded by the Institute of Science & Technology Austria (ISTA). ISTA had no input into the story. This story originally appeared on Knowable Magazine.

Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.

When clouds flock together Read More »

these-are-the-flying-discs-the-government-wants-you-to-know-about

These are the flying discs the government wants you to know about


DiskSat’s design offers “a power-to-weight ratio unmatched by traditional aluminum satellites.”

An artist’s illustration of DiskSats deploying from a rocket in low-Earth orbit. Credit: NASA

Four small satellites rode a Rocket Lab Electron launch vehicle into orbit from Virginia early Thursday, beginning a government-funded technology demonstration mission to test the performance of a new spacecraft design.

The satellites were nestled inside a cylindrical dispenser on top of the 59-foot-tall (18-meter) Electron rocket when it lifted off from NASA’s Wallops Flight Facility at 12:03 am EST (05:03 UTC). A little more than an hour later, the rocket’s upper stage released the satellites one at a time at an altitude of about 340 miles (550 kilometers).

The launch was the starting gun for a proof-of-concept mission to test the viability of a new kind of satellite called DiskSats. These satellites were designed by the Aerospace Corporation, a nonprofit federally funded research and development center. The project is jointly financed by NASA and the US Space Force, which paid for DiskSat’s development and launch, respectively.

“DiskSat is a lightweight, compact, flat disc-shaped satellite designed for optimizing future rideshare launches,” the Aerospace Corporation says in a statement.

The DiskSats are 39 inches (1 meter) wide, about twice the diameter of a New York-style pizza, and measure just 1 inch (2.5 centimeters) thick. Made of composite carbon fiber, each satellite carries solar cells, control avionics, reaction wheels, and an electric thruster to change and maintain altitude.

“The launch went perfectly, and the DiskSat dispenser worked exactly as designed,” said Darren Rowen, the project’s chief engineer, in a statement. “We’re pleased to have established contact with all four of the DiskSats, and we’re looking forward to the rest of the demonstration mission.”

An engineer prepares Aerospace Corporation’s DiskSats for launch at NASA’s Wallops Flight Facility in Virginia. Credit: Aerospace Corporation

A new form factor

The Aerospace Corporation has supported the US military and NASA since its founding in 1960. A few years ago, engineers at the center developed the DiskSat concept after surveying the government’s emerging needs in spaceflight.

CubeSats have been a ubiquitous part of the satellite industry for nearly a quarter-century. They are based on a cube-shaped design, measuring about 10 centimeters per side, but can be scaled from a single cube “unit” to three, six, 12, or more, depending on mission requirements. The CubeSat standard has become a popular choice for commercial companies, the military, NASA, and universities looking to build small satellites on a tight budget.

By one measure, nearly 3,000 CubeSats have launched since the first one soared into orbit in 2003. After originally being confined to low-Earth orbit, they have now flown to high-altitude orbits, to the Moon, and to Mars.

While CubeSats are now prolific, engineers at the Aerospace Corporation saw an opportunity to improve on the concept. Debra Emmons, Aerospace’s chief technology officer, said the idea originated from Rich Welle, a scientist recently retired from the center’s Experiments Lab, or xLab, division.

“They were asking questions,” Emmons told Ars. “They were looking at CubeSat studies and looking at some alternatives. The typical CubeSat is, in fact, a cube. So, the idea was could you look at some different types of form factors that might be able to generate more power … and offer up benefit for certain mission applications?”

Aerospace’s research team arrived at the DiskSat design. Emmons said the stackable flat-panel format is easier to pack for launch than a CubeSat. The concept is similar to SpaceX’s pioneering approach to launching stackable Starlink Internet satellites, but DiskSats are significantly smaller, lighter, and adaptable to different kinds of missions.

A stack of Starlink satellites prior to launch. Credit: SpaceX

DiskSats have several advantages over CubeSats, according to the Aerospace Corporation. Each of the four DiskSats launched Thursday has a mass of about 35 pounds (16 kilograms), less than that of a typical 12U CubeSat. But a DiskSat has more than 13 times the surface area on a single side, providing valuable real estate for developers to load up the satellite with power-generating solar arrays, sensors, antennas, or other payloads that simply won’t fit on a CubeSat.
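
The geometry roughly checks out, assuming a common 12U layout (a 2 x 2 x 3 stack of 10-centimeter units, giving a largest face of about 20 by 30 centimeters; the article doesn’t specify the layout):

```python
import math

disk_area = math.pi * 0.5**2   # m^2, face of a 1-meter-diameter disc
cubesat_face = 0.20 * 0.30     # m^2, largest face of an assumed 2x2x3 12U

print(f"DiskSat face:      {disk_area:.3f} m^2")
print(f"12U largest face:  {cubesat_face:.3f} m^2")
print(f"Ratio:             {disk_area / cubesat_face:.1f}x")
# -> about 13.1x, matching the article's "more than 13 times" and making
#    the quoted 5-10x power advantage plausible once cell packing and
#    pointing losses eat into the raw area gain.
```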

SpaceX’s current generation of mass-produced Starlink V2 satellites, by comparison, each has a mass of more than 1,100 pounds, or 500 kilograms.

DiskSat’s design offers “a power-to-weight ratio unmatched by traditional aluminum satellites,” the Aerospace Corporation says. In a research paper published earlier this year, engineers from the Aerospace Corporation claimed DiskSat can generate five to 10 times more power than a CubeSat.

A disruptive solution?

What kinds of missions might DiskSat be useful for? One idea involves placing a large radar antenna—too big to fit on any other low-mass satellite—on the broadside of a DiskSat to collect all-weather surveillance imagery. Similarly-sized antennas on other DiskSats could support high-bandwidth communications.

With this demo mission, the Aerospace Corporation will test the performance of the DiskSat platform in space for the first time. Engineers will initially look at how the satellites function at 340 miles, then use their electric thrusters to gradually step down to lower altitudes, where another aspect of DiskSat’s design will shine.

Flying edge-on, the satellite’s pancake shape will minimize aerodynamic drag as the DiskSats encounter thicker air below 250 miles. Continual pulsing from the satellites’ electric thrusters will allow the DiskSats to maintain altitude as they glide through the uppermost layers of the atmosphere.
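
Drag scales linearly with frontal area (F = 1/2 * rho * v^2 * C_d * A), so the article’s dimensions make the edge-on advantage easy to quantify, setting aside orientation-dependent changes in the drag coefficient:

```python
import math

broadside_area = math.pi * 0.5**2   # m^2, 1-meter disc face into the flow
edge_on_area = 1.0 * 0.025          # m^2, 1 m wide x 2.5 cm thick edge

print(f"Broadside: {broadside_area:.3f} m^2")
print(f"Edge-on:   {edge_on_area:.3f} m^2")
print(f"Drag-area reduction: ~{broadside_area / edge_on_area:.0f}x")
# Flying edge-on exposes roughly 30x less area to the oncoming air,
# which is what lets small electric thrusters hold altitude in VLEO.
```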

“The primary mission is to demonstrate and to understand the performance, functionality, and maneuverability of the DiskSat buses on orbit, particularly in low-Earth orbit, or LEO, and very low-Earth orbit, or VLEO,” said Catherine Venturini, DiskSat’s principal investigator.

“In theory, I think you could operate down to 200 kilometers (124 miles) with electric propulsion,” Emmons said. That is two to three times closer to Earth than most commercial radar imaging satellites. Other satellite operators are also assessing the viability of flying remote sensing missions in VLEO.

Flying closer to the ground delivers higher-resolution imagery, bringing cities, ships, airports, and military bases into sharper view. So it’s easy to see why the Space Force is interested in the DiskSat concept.

DiskSat’s engineers acknowledge there are drawbacks to the format. With such a large surface area, it’s more difficult to manage the temperature extremes of low-Earth orbit than it is with a conventional cube-shaped satellite. While DiskSats carry a lot of oomph to change altitude, their shape makes them somewhat clunky and hard to turn, and engineers say they aren’t well-suited for missions requiring agile pointing.

Rocket Lab’s Electron launcher lifts off to begin the DiskSat demo mission, a program co-funded by NASA and the US military’s Space Test Program. Credit: Austin DeSisto/Rocket Lab

The Aerospace Corporation is a research center, not a commercial satellite manufacturer. Officials at the nonprofit are looking to hand over the DiskSat design to industry through a technology transfer agreement. “The plan is to release or license the technology to partners once it is flight-proven,” the Aerospace Corporation says on its website.

“We think this new technology will be disruptive to the small spacecraft enterprise and ecosystem,” said Eric Breckheimer, DiskSat’s program manager.

DiskSat’s stackable design makes it possible to launch a fleet of high-power, low-mass satellites in one go, according to Emmons.

Following the trend toward bigger CubeSats, the DiskSat format could also grow larger to take advantage of heavier rockets. “There’s a key scalability aspect, and with that in mind, you could bring an entire constellation of DiskSats with you in a single launch,” Breckheimer said.

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

These are the flying discs the government wants you to know about Read More »

when-were-things-the-best?

When Were Things The Best?

People remember their childhood world too fondly.

You adapt to it. You forget the parts that sucked, many of which sucked rather badly. It resonates with you and sticks with you. You think it was better.

This is famously true for music, but it is also true in general, including in places where it makes no sense, like ‘most reliable news reporting.’

Matthew Yglesias: Regardless of how old they are, people tend to think that things were better when they were young.

As a result, you’d expect more negativity as the median age goes up and up.

Very obviously these views are not objective.

As a fun and also useful exercise, as part of the affordability sequence, now that we’ve looked at claims of modern impoverishment and asked when things were cheaper, it’s time to ask ourselves: When were various things really at their best?

In some aspects, yes, the past was better, and those aspects are an important part of the picture. But in many others today is the day and people are wrong about this.

I’ll start with the things on the above graph, in order, include some claims from another source, and also include a few important other considerations that help set up the main thesis of the sequence.

Close-knit communities

Far in the past. You wouldn’t like how they accomplished it, but they accomplished it.

The top candidates for specific such communities are either:

  1. Hunter-gatherer bands.

  2. Isolated low-tech villages that all share an intense mandatory religion.

  3. Religious minority ethnic enclave communities under severe external threat.

You’re not going to match that without making other intensive sacrifices. Nor should you want to. Those communities were too close-knit for our taste.

In terms of the most close-knit communities in America on average, the peak was probably right after we closed the frontier, so around 1900?

Close-knit communities, on a lesser level that is now rare, are valuable and important, but require large continuous investments and opportunity costs. You have to frequently choose engagement with a contained group over alternatives, including when those alternatives are otherwise far superior. You also, to do this today, have to engineer conditions to make the community possible, because you’re not going to be able to form one with whoever happens to live in your neighborhood.

Intentional communities are underrated, as is simply coordinating to live near your friends. I highly recommend such things, but coordination is hard, and they are going to remain rare.

Moral values

I’m torn between today and about 2012.

There are some virtues and morals that are valuable and have been largely lost. Those who remember the past fondly focus on those aspects.

One could cite, depending on your comparison point, some combination of loyalty to both individuals, groups and institutions, honor and personal codes, hospitality, respect for laws and social norms, social trust, humility, some forms of mercy and forgiveness, stoicism, courage, respect for the sacred and adherence to duty and one’s commitments, especially the commitment to one’s family, having better and higher epistemic and discourse norms, plus religiosity.

There’s varying degrees of truth in those.

But they pale in comparison to the ways that things used to be terrible. People used to have highly exclusionary circles of concern. By the standards of today, until very recently and even under relatively good conditions, approximately everyone was horribly violent and tolerant of violence and bullying of all kinds, cruel to animals, tolerant of all manner of harassment, rape and violations of consent, cruel, intolerant, religiously intolerant often to the point of murder, drunk out of their minds, discriminatory, racist, sexist, homophobic, transphobic, neglectful, unsafe, physically and emotionally abusive to children including outright torture and frequent sexual abuse, and distrustful and dishonest dealing with strangers or in commerce.

It should be very clear which list wins.

This holds up until the introduction of social media, at which point some moral dynamics got out of control in various ways, on various sides of various questions, and many aspects went downhill. There were ways in which things got absolutely nuts. I’m not sure if we’ve recovered enough to have fully turned that around.

Political unity

Within recent memory, I’m going to say 1992-1996, which falls into the trap of putting it right in my teenage years. But I’m right. This period had extraordinarily low political division and partisanship.

On a longer time frame, the correct answer is the Era of Good Feelings, 1815-1825.

The mistake people make is to think that today’s high level of political division is some outlier in American history. It isn’t.

Happy families

Good question. The survey data says 1957.

I don’t strongly believe that answer is wrong, but I don’t trust survey data to give the right answer on this, for multiple reasons.

Certainly a lot more families used to be intact. That does not mean they were happy by our modern understanding of happy. The world of the 1950s was quite stifling. A lot of the way families stayed intact was people pretended everything was fine, including many things we now consider very not fine.

People benefited (in happiness terms) from many forms of lower expectations. That doesn’t mean that if you duplicated their life experiences, your family would be happy.

Fertility rates, meaning having the most children, peaked during the Baby Boom, if we exclude the bad old times when children often failed to survive.

Marriage rates used to be near-universal, whether or not you think that was best.

Reliable news reporting

Believe it or not, today. Yikes. We don’t believe it because of the Revolution of Rising Expectations. We now have standards for the press that the press has never met.

People used to trust the media more. Now we trust it a lot less. While there are downsides to this lack of trust, especially when people turn to even less worthy alternatives, that loss of trust is centrally good. The media was never worthy of trust.

There’s great fondness for the Walter Cronkite era, where supposedly we had high authority news sources worthy of our high trust. The thing is, that past trust was also misplaced, and indeed was even more misplaced.

There was little holding the press to account. They had their own agendas and biases, even if it was often ‘the good of the nation’ or ‘the good of the people,’ and they massively misunderstood things and often got things wrong. Reporters talking on the level of saying ‘wet ground causes rain’ is not a new phenomenon. When they did make mistakes or slant their coverage, there was no way to correct them back then.

Whereas now, with social media, we can and do keep the media on its toes.

If your goal is to figure out what is going on and you’re willing to put in the work, today you have the tools to do that, and in the past you basically didn’t, not in any reasonable amount of time.

The fact that other people do that, and hold them to account, makes the press hold itself to higher standards.

There are several forms of ‘the best music.’ It’s kind of today, kind of the 60s-80s.

If you are listening to music on your own, it is at its best today, by far. The entire back catalogue of the world is available at your fingertips, with notably rare exceptions, for a small monthly fee, on demand and fully customizable. If you are an audiophile and want super high quality, you can do that too. There’s no need to spend all that time seeking things out.

If you want to create new music, on your own or with AI? Again, it’s there for you.

In terms of the creation of new music weighted by how much people listen, or in terms of the quality of the most popular music, I’d say probably the 1980s? A strong case can be made for the 60s or 70s too; my guess is that a bunch of that is nostalgia and overvaluing innovation, but I can see it. What I can’t see is a case for the 1990s or 2000s, or especially the 2010s or 2020s.

This could be old man syndrome talking, and it could be the benefits of a lot of selection, but when I sample recent popular music it mostly (with exceptions!) seems highly non-innovative and also not very good. It’s plausible that with sufficiently good search and a willingness to take very deep cuts, today is indeed the best time for new music, but I don’t know how to do that search.

In terms of live music experiences, especially for those with limited budgets, my guess is this was closer to 1971, as so much great stuff was in hindsight so amazingly accessible.

The other case for music being better before is that music was better when it was worse. As in, you had to search for it, select it, pay for it, you had to listen to full albums and listen to them many times, so it meant more, and today’s freedom brings bad habits. I see the argument, but no, and you can totally set rules for yourself if that is what you want. I often have for brief periods, to shake things up.

My wild guess for traditional radio is the 1970s? There was enough high quality music, you had the spirit of radio, and video hadn’t killed the radio star.

You could make an argument for the 1930s-40s, right before television displaced it as the main medium. Certainly radio back then was more important and central.

The real answer is today. We have the best radio today.

We simply don’t call it radio.

Instead, we mostly call it podcasts and music streaming.

If you want pseudorandom music, Pandora and other similar services, or Spotify-style playlists, are together vastly better than traditional radio.

If you want any form of talk radio, or news radio, or other word-based radio programming that doesn’t depend on being broadcast live, podcasts rule. The quality, quantity and variety on offer are insane, and you can move around on demand.

Also, remember reception problems? Not anymore.

Fashion

Long before any of us were born, or today, depending on whether you mean ‘most awesome’ or ‘would choose to wear.’

Today’s fashion is not only cheaper, it is easier and more comfortable. In exchange, no, it does not look as cool.

The economy

As the question is intended, 2019. Then Covid happened. We still haven’t fully recovered from that.

There were periods with more economic growth or that had better employment conditions. You could point to 1947-1973 riding the postwar wave, or the late 1990s before the dot com bubble burst.

I still say 2019, because levels of wealth and real wages also matter.

Movies

In general I choose today. Average quality is way up and has been going up steadily, except for a blip when we got way too many superhero movies crowding things out, but we’ve recovered from that.

The counterargument I respect is that the last few years have had no top tier all-time greats, and perhaps this is not an accident. We’ve forced movies to do so many other things well that there’s less room for full creativity and greatness to shine through? Perhaps this is true, and this system gets us fewer true top movies. But also that’s a Poisson distribution, you need to get lucky, and the effective sample size is small.
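
To make the Poisson point concrete: if all-time greats arrive independently at some low average rate, multi-year droughts are expected even when nothing about moviemaking has changed. A toy simulation (the one-per-year rate is my made-up assumption for illustration):

```python
import math
import random

random.seed(0)
RATE = 1.0      # assumed average all-time greats per year (illustrative)
YEARS = 100_000

def poisson(lam: float) -> int:
    """Sample a Poisson random variable via Knuth's method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

longest = run = 0
for _ in range(YEARS):
    run = run + 1 if poisson(RATE) == 0 else 0
    longest = max(longest, run)

print(f"Longest drought in {YEARS:,} simulated years: {longest} years")
# Even at one great per year on average, droughts of a decade or so show
# up; a few lean years is weak evidence that moviemaking got worse.
```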

If I have to pick a particular year I’d go with 1999.

The traditional answer is the 1970s, but this is stupid and disregards the Revolution of Rising Expectations. Movies then were given tons of slack in essentially every direction. Were there some great picks? No doubt, although many of the films we think of as all-time greats are remarkably slow, to the point where if they weren’t all-time greats they’d almost not be watchable. In general, if you think things were better back then, you’re grading back then on a curve, you have an extreme tolerance for not much happening, and you’re prioritizing some sort of abstract Quality metric over what is actually entertaining.

Television

Today. Stop lying to yourself.

The experience of television used to be terrible, and the shows used to be terrible. So many things very much do not hold up today even if you cut them quite a lot of slack. Old sitcoms are sleep inducing. Old dramas were basic and had little continuity. Acting tended to be quite poor. They don’t look good, either.

The interface for watching was atrocious. You would watch absurd amounts of advertisements. You would plan your day around when things were there, or you’d watch ‘whatever was on TV.’ If you missed episodes they would be gone. DVRs were a godsend despite requiring absurd levels of effort to manage optimally, and still giving up a ton of value.

The interface now is most of everything ever made at your fingertips.

The alternative argument is that, in terms of new shows, the prestige TV era of the 2000s-2010s was the golden age, and the new streaming era can’t measure up, especially due to fractured experiences.

I agree that the shared national experiences were cool and we used to have more of them and they were bigger. We still get them, most recently for Severance and perhaps The White Lotus and Pluribus, which isn’t the same, but there really are still a ton of very high quality shows out there. Average quality is way up. Top talent going on television shows is way up, they still let top creators do their thing, and there are shows with top-tier people I haven’t even looked at, that never used to happen.

Sports

Today. Stop lying to yourself.

Average quality of athletic performance is way, way up. Modern players do things you wouldn’t believe. Game design has in many ways improved as well, as has the quality of strategic decision making.

Season design is way better. We get more and better playoffs, which can go too far but typically keep far more games relevant, exciting and high-stakes. College football is insanely better for this over the last few years; I doubted and I was wrong. Baseball purists can complain, but so few games used to mean anything. And so on.

Unless people are going to be blowing up your phone, you can start an event modestly late and skip all the ads and even dead time. You can watch sports on your schedule, not someone else’s. If you must be live, you can now get coverage in lots of alternative ways, and also get access to social media conversations in real time, various website information services and so on.

If you’re going to the stadium, the modern experience is an upgrade. It is down to a science. All seats are good seats and the food is usually excellent.

There are three downside cases.

  1. We used to all watch the same sporting events live and together more often. That was cool, but you can still find plenty of people online doing this anyway.

  2. In some cases correct strategic play has made things less fun. Too many NBA three-pointers are a problem, as is figuring out that MLB starters should be taken out rather early, or the way analytics have simply homogenized play. The rules have been too slow to adjust. It’s a problem, but on net I think a minor one. It’s good to see games played well.

  3. Free agency has made teams retain less identity, and made it harder to root for the same players over a longer period. This one hurts and I’d love to go back, even though there are good reasons why we can’t.

Mostly I think it’s nostalgia. Modern sports are awesome.

Food

Today, and it’s really, really not close. If you don’t agree, you do not remember. So much of what people ate in the 20th century was barely even food by today’s standards, both in terms of tasting good and its nutritional content.

Food has gotten The Upgrade.

Average quality is way, way up. Diversity is way up, authentic or even non-authentic ethnic cuisines mostly used to be quite rare. Delivery used to be pizza and Chinese. Quality and diversity of available ingredients is way up. You can get it all on a smaller percentage of typical incomes, whether at home or from restaurants, and so many more of us get to use those restaurants more often.

A lot of this is driven by having access to online information and reviews, which allows quality to win out in a way it didn’t before, but even before that we were seeing rapid upgrades across the board.

Job security

Some time around 1965, probably? We had a pattern of something approaching lifetime employment, where it was easy to keep one’s job for a long period and count on this. The chance of staying in a job for 10+ or 20+ years has declined a lot. That made people feel a lot more secure, and it matters a lot.

That doesn’t mean you actually want the same job for 20+ years. There are some jobs where you totally do want that, but a lot of the jobs people used to keep for that long are jobs we wouldn’t want. Despite people’s impressions, the increased job changes have mostly not come from people being fired.

We don’t have the best everything. There are exceptions.

Most centrally, we don’t have the best intact families or close-knit communities, or the best dating ecosystem or best child freedoms. Those are huge deals.

But there are so many other places in which people are simply wrong.

As in:

Matt Walsh (being wrong, lol at ‘empirical,’ 3M views): It’s an empirical fact that basically everything in our day to day lives has gotten worse over the years. The quality of everything — food, clothing, entertainment, air travel, roads, traffic, infrastructure, housing, etc — has declined in observable ways. Even newer inventions — search engines, social media, smart phones — have gone downhill drastically.

This isn’t just a random “old man yells at clouds” complaint. It’s true. It’s happening. The decline can be measured. Everyone sees it. Everyone feels it. Meanwhile political pundits and podcast hosts (speaking of things that are getting worse) focus on anything and everything except these practical real-life problems that actually affect our quality of life.

The Honest Broker: There is an entire movement focused on trying to convince people that everything used to be better and everything is also getting worse and worse.

That creates a market for reality-based correctives like the excellent thread below by @ben_golub [on air travel.]

Matthew Yglesias: I think everyone should take seriously:

  1. Content distribution channels have become more competitive and efficient

  2. Negative content tends to perform better

  3. Marinating all day in negativity-inflected content is cooking people’s brains

My quick investigation confirmed that American roads, traffic and that style of infrastructure did peak in the mid-to-late 20th century. We have not been doing a good job maintaining that.

On food, entertainment, clothing and housing he is simply wrong (have you heard of this new thing called ‘luxury’ apartments, or checked average sizes or amenities?), and to even make some of these claims requires both claiming ‘this is cheaper but it’s worse’ and ‘this is worse because it used to be cheaper’ in various places.

bumbadum: People are chimping out at Matt over this but nobody has been able to name one thing that has significantly grown in quality in the past 10-20 years.

Every commodity, even as they have become cheaper and more accessible has decreased in quality.

I am begging somebody to name 1 thing that is all around a better product than its counterpart from the 90s

Megan McArdle: Tomatoes, raspberries, automobiles, televisions, cancer drugs, women’s shoes, insulin monitoring, home security monitoring, clothing for tall women (which functionally didn’t exist until about 2008), telephone service (remember when you had to PAY EXTRA to call another area code?), travel (remember MAPS?), remote work, home video … sorry, ran out of characters before I ran out of hedonic improvements.

Thus:

Electronics

Today. No explanation required on these.

Don’t knock the vast improvements in computers and televisions.

Saying the quality of phones has gone down, as Matt Walsh does, is an absurdity.

That does still leave a few other examples he raised.

Air travel

Today, or at least 2024 if you think Trump messed some things up.

I say this as someone who used to fly on about half of weekends, for several years.

Air travel has decreased in price, the most important factor, and safety has improved. Experiential quality of the flight itself declined a bit, but has risen again as airport offerings improved and getting through security and customs went from a nightmare back to trivial. Net time spent, given less uncertainty, has gone down.

If you are willing to pay the old premium prices, you can buy first class tickets and get an experience as good as or better than the old tickets offered.

Today. We wax nostalgic about old cars. They looked cool. They also were cool.

They were also less powerful, more dangerous, much less fuel efficient, much less reliable, with far fewer features and of course absolutely no smart features. That’s even without considering that we’re starting to get self-driving cars.

This is one area where my preliminary research did back Walsh up. America has done a poor job of maintaining its roads and managing its traffic, and has not ‘paid the upkeep’ on many aspects of what was previously world-class infrastructure. These things seem to have peaked in the late 20th century.

I agree that this is a rather bad sign, and we should both fix and build the roads and also fix the things that are causing us not to fix and build the roads.

As a result of not keeping up with demand for roads or demand for housing in the right areas, average commute times for those going into the office have been increasing, but post-Covid we have ~29% of working days happening from home, which overwhelms all other factors combined in terms of hours on the road.

I do expect traffic to improve due to self-driving cars, but that will take a while.

Today, or at least the mobile phone and rideshare era. You used to have to call for or hail a taxi. Now in most areas you open your phone and a car appears. In some places it can be a Waymo, which is now doubling yearly. The ability to summon a taxi matters so much more than everything else, and as noted above air travel is improved.

This is way more important than net modest issues with roads and traffic.

Trains have not improved but they are not importantly worse.

Not everything is getting better all the time. Important things are getting worse.

We still need to remember and count our blessings, and not make up stories about how various things are getting worse, when those things are actually getting better.

To sum up, and to add some additional key factors, the following things did indeed peak in the past and quality is getting worse as more than a temporary blip:

  1. Political division.

  2. Average quality of new music, weighted by what people listen to.

  3. Live music and live radio experiences, and other collective national experiences.

  4. Fashion, in terms of awesomeness.

  5. Roads, traffic and general infrastructure.

  6. Some secondary but important moral values.

  7. Dating experiences, ability to avoid going on apps.

  8. Job security, ability to stay in one job for decades if desired.

  9. Marriage rates and intact families, including some definitions of ‘happy’ families.

  10. Fertility rates and felt ability to have and support children as desired.

  11. Childhood freedoms and physical experiences.

  12. Hope for the future, which is centrally motivating this whole series of posts.

The second half of that list is freaking depressing. Yikes. Something’s very wrong.

But what’s wrong isn’t the quality of goods, or many of the things people wax nostalgic about. The first half of this list cannot explain the second half.

Compare that first half to the ways in which quality is up, and in many of these cases things are 10 times better, or 100 times better, or barely used to even exist:

  1. Morality overall, in many rather huge ways.

  2. Access to information, including the news.

  3. Logistics and delivery. Ease of getting the things you want.

  4. Communication. Telephones including mobile phones.

  5. Music as consumed at home via deliberate choice.

  6. Audio experiences. Music streams and playlists. Talk.

  7. Electronics, including computers, televisions, medical devices, security systems.

  8. Television, both new content and old content, and modes of access.

  9. Movies, both new content and old content, and modes of access.

  10. Fashion in terms of comfort, cost and upkeep.

  11. Sports.

  12. Cuisine. Food of all kinds, at home and at restaurants.

  13. Air travel.

  14. Taxis.

  15. Cars.

  16. Medical care, dental care and medical (and nonmedical) drugs.

That only emphasizes the bottom of the first list. Something’s very wrong.

Once again, us doing well does not mean we shouldn’t be doing better.

We see forms of the same trends.

  1. Many things are getting better, but often not as much better as they could be.

  2. Other things are getting worse, both in ways inevitable and avoidable.

  3. This identifies important problems, but the changes in quantity and quality of goods and services do not explain people’s unhappiness, or why many of the most important things are getting worse. More is happening.

Some of the things getting worse reflect changes in technological equilibria or the running out of low-hanging fruit, in ways that are tricky to fix. Many of those are superficial, although a few of them aren’t. But these don’t add up to the big issues.

More is happening.

That more is what I will, in the next post, be calling The Revolution of Rising Expectations, and the Revolution of Rising Requirements.

When Were Things The Best? Read More »

youtube-bans-two-popular-channels-that-created-fake-ai-movie-trailers

YouTube bans two popular channels that created fake AI movie trailers

Deadline reports that the behavior of these creators ran afoul of YouTube’s spam and misleading-metadata policies. At the same time, Google loves generative AI—YouTube has added more ways for creators to use generative AI, and the company says more gen AI tools are coming in the future. It’s quite a tightrope for Google to walk.

AI movie trailers

A selection of videos from the now-defunct Screen Culture channel. Credit: Ryan Whitwam

While passing off AI videos as authentic movie trailers is definitely spammy conduct, the recent changes to the legal landscape could be a factor, too. Disney recently entered into a partnership with OpenAI, bringing its massive library of characters to the company’s Sora AI video app. At the same time, Disney sent a cease-and-desist letter to Google demanding the removal of Disney content from Google AI. The letter specifically cited AI content on YouTube as a concern.

Both the banned trailer channels made heavy use of Disney properties, sometimes even incorporating snippets of real trailers. For example, Screen Culture created 23 AI trailers for The Fantastic Four: First Steps, some of which outranked the official trailer in searches. It’s unclear if either account used Google’s Veo models to create the trailers, but Google’s AI will recreate Disney characters without issue.

While Screen Culture and KH Studio were the largest purveyors of AI movie trailers, they are far from alone. There are others with five- and six-digit subscriber counts, some of which include disclosures about fan-made content. Is that enough to save them from the ban hammer? Many YouTube viewers probably hope not.

YouTube bans two popular channels that created fake AI movie trailers Read More »