Author name: 9u50fv

Give Me a Reason(ing Model)

Are we doing this again? It looks like we are doing this again.

This time it involves giving LLMs several ‘new’ tasks, including what is effectively a Tower of Hanoi problem, asking them to specify the answer as individual steps rather than as an algorithm, and then calling a failure to properly execute all the steps this way (whether or not they even had enough tokens to do it!) an inability to reason.
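To make the steps-versus-algorithm distinction concrete, here is a minimal sketch of the standard recursive Tower of Hanoi solution. It is not the paper's prompt or evaluation code, and the function name and peg labels are illustrative assumptions. The algorithm itself is a few lines, while the list of individual steps it generates has 2^n - 1 entries.

```python
from typing import Iterator, Tuple


def hanoi_moves(
    n: int, source: str = "A", target: str = "C", spare: str = "B"
) -> Iterator[Tuple[str, str]]:
    """Yield (from_peg, to_peg) moves that transfer n disks from source to target."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest remaining disk to its destination.
    yield (source, target)
    # Move the n-1 disks from the spare peg onto the largest disk.
    yield from hanoi_moves(n - 1, spare, target, source)


if __name__ == "__main__":
    for n in (3, 7, 10):
        count = sum(1 for _ in hanoi_moves(n))
        print(f"{n} disks: {count} moves")
```

Stating the algorithm costs about the same at any size. Writing out every move doubles in length with each added disk.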

The actual work in the paper seems by all accounts to be fine as far as it goes if presented accurately, but the way it is being presented and discussed is not fine.

Ruben Hassid (12 million views, not how any of this works): BREAKING: Apple just proved AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini don’t actually reason at all.

They just memorize patterns really well.

Here’s what Apple discovered:

(hint: we’re not as close to AGI as the hype suggests)

Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games. They tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.

All “reasoning” models hit a complexity wall where they completely collapse to 0% accuracy. No matter how much computing power you give them, they can’t solve harder problems. As problems got harder, these “thinking” models actually started thinking less. They used fewer tokens and gave up faster, despite having unlimited budget.

[And so on.]

Ryan Greenblatt: This paper doesn’t show fundamental limitations of LLMs:

– The “higher complexity” problems require more reasoning than fits in the context length (humans would also take too long).

– Humans would also make errors in the cases where the problem is doable in the context length.

– I bet models they don’t test (in particular o3 or o4-mini) would perform better and probably get close to solving most of the problems which are solvable in the allowed context length

It’s somewhat wild that the paper doesn’t realize that solving many of the problems they give the model would clearly require >>50k tokens of reasoning which the model can’t do. Of course the performance goes to zero once the problem gets sufficiently big: the model has a limited context length. (A human with a few hours would also fail!)

Rohit: I asked o3 to analyse and critique Apple’s new “LLMs can’t reason” paper. Despite its inability to reason I think it did a pretty decent job, don’t you?

Don’t get me wrong it’s an interesting paper for sure, like the variations in when catastrophic failure happens for instance, just a bit overstated wrt its positioning.

Kevin Bryan: The “reasoning doesn’t exist” Apple paper drives me crazy. Take logic puzzle like Tower of Hanoi w/ 10s to 1000000s of moves to solve correctly. Check first step where an LLM makes mistake. Long problems aren’t solved. Fewer thought tokens/early mistakes on longer problems.

But if you tell me to solve a problem that would take me an hour of pen and paper, but give me five minutes, I’ll probably give you an approximate solution or a heuristic. THIS IS EXACTLY WHAT FOUNDATION MODELS WITH THINKING ARE RL’D TO DO.

We know from things like Code with Claude and internal benchmarks that performance strictly increases as we increase in tokens used for inference, on ~every problem domain tried. But LLM companies can do this: *you* can’t b/c model you have access to tries not to “overthink”.

The team on this paper are good (incl. Yoshua Bengio’s brother!), but interpretation media folks give it is just wrong. It 100% does not, and can not, show “reasoning is just pattern matching” (beyond trivial fact that all LLMs do nothing more than RL’d token prediction…)

The team might be good, but in this case you don’t blame the reaction on the media. The abstract very clearly is laying out the same misleading narrative picked up by the media. You can wish for a media that doesn’t get fooled by that, but that’s not the world we live in, and the blame is squarely on the way the paper presents itself.

Lisan al Galib: A few more observations after replicating the Tower of Hanoi game with their exact prompts:

– You need AT LEAST 2^N – 1 moves and the output format requires 10 tokens per move + some constant stuff.

– Furthermore the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64K, and o3-mini 100k tokens. This includes the reasoning tokens they use before outputting their final answer!

– all models will have 0 accuracy with more than 13 disks simply because they can not output that much!

– At least for Sonnet it doesn’t try to reason through the problem once it’s above ~7 disks. It will state what the problem and the algorithm to solve it and then output its solution without even thinking about individual steps.

– it’s also interesting to look at the models as having a X% chance of picking the correct token at each move

– even with a 99.99% probability the models will eventually make an error simply because of the exponentially growing problem size

But I also observed this peak in token usage across the models I tested at around 9-11 disks. That’s simply the threshold where the models say: “Fuck off I’m not writing down 2^n_disks – 1 steps”

[And so on.]
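The arithmetic in that thread is easy to check. The sketch below uses only the figures quoted above (2^N - 1 moves, roughly 10 tokens per move, and the stated output limits). It assumes the limits are round thousands, ignores the constant overhead, and ignores reasoning tokens, so the results are upper bounds. The function names are illustrative, not from the paper or the replication.

```python
TOKENS_PER_MOVE = 10  # quoted estimate; constant overhead is ignored here

# Output limits as quoted in the thread, read as round thousands (an assumption).
OUTPUT_LIMITS = {
    "Sonnet 3.7": 128_000,
    "DeepSeek R1": 64_000,
    "o3-mini": 100_000,
}


def min_moves(n_disks: int) -> int:
    """Minimum number of moves for a Tower of Hanoi with n_disks disks."""
    return 2**n_disks - 1


def tokens_needed(n_disks: int) -> int:
    """Tokens needed just to write out every move, with no reasoning at all."""
    return TOKENS_PER_MOVE * min_moves(n_disks)


def max_disks(output_limit: int) -> int:
    """Largest disk count whose full move list fits inside output_limit tokens."""
    n = 0
    while tokens_needed(n + 1) <= output_limit:
        n += 1
    return n


def p_all_correct(n_disks: int, per_move_accuracy: float = 0.9999) -> float:
    """Chance of a flawless move list if each move is independently correct."""
    return per_move_accuracy ** min_moves(n_disks)


if __name__ == "__main__":
    for model, limit in OUTPUT_LIMITS.items():
        print(f"{model}: at most {max_disks(limit)} disks fit in {limit} tokens")
    for n in (7, 10, 13, 15):
        print(f"{n} disks: P(no mistakes) = {p_all_correct(n):.3f}")
```

Under these assumptions no model can write out a solution beyond 13 disks, which matches the claim in the thread. The independence assumption behind `p_all_correct` is a simplification, but it shows how even a tiny per-move error rate compounds once the move count grows exponentially.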

Tony Ginart: Humans aren’t solving a 10 disk tower of Hanoi by hand either.

One Draw Nick: If that’s true then this paper from Apple makes no sense.

Lisan al Galib: It doesn’t, hope that helps.

Gallabytes: if I asked you to solve towers of Hanoi entirely in your head without writing anything down how tall could the tower get before you’d tell me to fuck off?

My answer to ‘how many before I tell you off’ is three. Not that I couldn’t do more than three, but I would choose not to.

Colin Fraser I think gives us a great and clean version of the bear case here?

Colin Fraser: if you can reliably carry out a sequence of logical steps then you can solve the Tower of Hanoi problem. If you can’t solve the Tower of Hanoi problem then you can’t carry out a sequence of logical steps. It’s really quite simple and not mysterious.

They give it the instructions. They tell it to do the steps. It doesn’t do the steps. So-called “reasoning” doesn’t help it do the steps. What else are you supposed to make of this? It can’t do the steps.

It seems important that this doesn’t follow?

  1. Not doing [X] in a given situation doesn’t mean you can’t do [X] in general.

  2. Not doing [X] in a particular test especially doesn’t mean a model can’t do [X].

  3. Not doing [X] can be a simple ‘you did not provide enough tokens to [X]’ issue.

  4. The more adversarial the example, the less evidence this provides.

  5. Failure to do any given task requiring [X] does not mean you can’t [X] in general.

Or more generally, ‘won’t’ or ‘doesn’t’ [X] does not show ‘can’t’ [X]. It is of course often evidence, since doing [X] does prove you can [X]. How much evidence it provides depends on the circumstances.

To summarize, this is tough but remarkably fair:

Charles Goddard: 🤯 MIND-BLOWN! A new paper just SHATTERED everything we thought we knew about AI reasoning!

This is paradigm-shifting. A MUST-READ. Full breakdown below 👇

🧵 1/23

Linch: Any chance you’re looking for a coauthor in future work? I want to write a survey paper explaining why while jobs extremely similar to mine will be easily automatable, my own skillset is unique and special and require a human touch.

Also the periodic reminder that asking ‘is it really reasoning’ is a wrong question.

Yuchen Jin: Ilya Sutskever, in his speech at UToronto 2 days ago:

“The day will come when AI will do all the things we can do.”

“The reason is the brain is a biological computer, so why can’t the digital computer do the same things?”

It’s funny that we are debating if AI can “truly think” or give “the illusion of thinking”, as if our biological brain is superior or fundamentally different from a digital brain.

Ilya’s advice to the greatest challenge of humanity ever:

“By simply looking at what AI can do, not ignoring it, that will generate the energy that’s required to overcome the huge challenge.”

If a different name for what is happening would dissolve the dispute, then who cares?

Colin Fraser: The labs are the ones who gave test time compute scaling these grandiose names like “thinking” and “reasoning”. They could have just not called it that.

I don’t see those names as grandiose. I see them as the best practical descriptions in terms of helping people understand what is going on. It seems much more helpful and practical than always saying ‘test time compute scaling.’ Colin suggested ‘long output mode’ and I agree that would set expectations lower, but I don’t think that describes the central thing going on here at all; instead it makes it sound like it’s being more verbose.

After AI setbacks, Meta bets billions on undefined “superintelligence”

Meta has developed plans to create a new artificial intelligence research lab dedicated to pursuing “superintelligence,” according to reporting from The New York Times. The social media giant chose 28-year-old Alexandr Wang, founder and CEO of Scale AI, to join the new lab as part of a broader reorganization of Meta’s AI efforts under CEO Mark Zuckerberg.

Superintelligence refers to a hypothetical AI system that would exceed human cognitive abilities—a step beyond artificial general intelligence (AGI), which aims to match an intelligent human’s capability for learning new tasks without intensive specialized training.

However, much like AGI, superintelligence remains a nebulous term in the field. Since scientists still poorly understand the mechanics of human intelligence, and because human intelligence resists simple quantification with no single definition, identifying superintelligence when it arrives will present significant challenges.

Computers already far surpass humans in certain forms of information processing such as calculations, but this narrow superiority doesn’t qualify as superintelligence under most definitions. The pursuit assumes we’ll recognize it when we see it, despite the conceptual fuzziness.

AI researcher Dr. Margaret Mitchell told Ars Technica in April 2024 that there will “likely never be agreement on comparisons between human and machine intelligence” but predicted that “men in positions of power and influence, particularly ones with investments in AI, will declare that AI is smarter than humans” regardless of the reality.

The new lab represents Meta’s effort to remain competitive in the increasingly crowded AI race, where tech giants continue pouring billions into research and talent acquisition. Meta has reportedly offered compensation packages worth seven to nine figures to dozens of researchers from companies like OpenAI and Google, according to The New York Times, with some already agreeing to join the company.

Meta joins a growing list of tech giants making bold claims about advanced AI development. In January, OpenAI CEO Sam Altman wrote in a blog post that “we are now confident we know how to build AGI as we have traditionally understood it.” Earlier, in September 2024, Altman predicted that the AI industry might develop superintelligence “in a few thousand days.” Elon Musk made an even more aggressive prediction in April 2024, saying that AI would be “smarter than the smartest human” by “next year, within two years.”

What to expect from Apple’s Worldwide Developers Conference next week


i wwdc what you did there

We expect to see new designs, new branding, and more at Apple’s WWDC 2025.

Apple’s Worldwide Developers Conference kicks off on Monday with the company’s standard keynote presentation—a combination of PR about how great Apple and its existing products are and a first look at the next-generation versions of iOS, iPadOS, macOS, and the company’s other operating systems.

Reporting before the keynote rarely captures everything that Apple has planned at its presentations, but the reliable information we’ve seen so far is that Apple will keep the focus on its software this year rather than using the keynote to demo splashy new hardware like the Vision Pro and Apple Silicon Mac Pro, which the company introduced at WWDC a couple years back.

If you haven’t been keeping track, here are a few of the things that are most likely to happen when the pre-recorded announcement videos start rolling next week.

Redesign time

Reliable reports from Bloomberg’s Mark Gurman have been saying for months that Apple’s operating systems are getting a design overhaul at WWDC.

The company apparently plans to use the design of the Vision Pro’s visionOS software as a jumping-off point for the new designs, introducing more transparency and UI elements that appear to be floating on the surface of your screen. Apple’s overarching goal, according to Gurman, is to “simplify the way users navigate and control their devices” by “updating the style of icons, menus, apps, windows and system buttons.”

Apple’s airy, floaty visionOS will apparently serve as the inspiration for its next-generation software design. Credit: Apple

Any good software redesign needs to walk a tightrope between freshening up an old look and solving old problems without changing people’s devices so much that they become unrecognizable and unfamiliar. The number of people who have complained to me about the iOS 18-era redesign of the Photos app suggests to me that Apple doesn’t always strike the right balance. But a new look can also generate excitement and encourage upgrades more readily than some of the low-profile or under-the-hood improvements that these updates normally focus on.

The redesigned UI should be released simultaneously for iOS, iPadOS, and macOS. The Mac last received a significant facelift back in 2020 with macOS 11 Big Sur, though this was overshadowed at the time by the much more significant shift from Intel’s chips to Apple Silicon. The current iOS and iPadOS design has its roots in 2013’s iOS 7, though with over a decade’s worth of gradual evolution on top.

An OS by any other name

With the new design will apparently come a new naming scheme, shifting from the current version numbers to new numbers based on the year. So we allegedly won’t be seeing iOS 19, macOS 16, watchOS 12, or visionOS 3—instead, we’ll get iOS 26, macOS 26, watchOS 26, and visionOS 26.

The new numbers might be a little confusing at first, especially for the period of overlap where Apple is actively supporting (say) macOS 14, macOS 15, and macOS 26. But in the long run, the consistency should make it easier to tell roughly how old your software is and will also make it easier to tell whether your device is running current software without having to remember the number for each of your individual devices.

It also unifies the approach to any new operating system variants Apple might announce—tvOS starts at version 9 and iPadOS starts at version 13, for example, because they were linked to the then-current iOS release. But visionOS and watchOS both started over from 1.0, and the macOS version is based on the year that Apple arbitrarily decided to end the 20-year-old “macOS X” branding and jump up to 11.

Note that those numbers will use the upcoming year rather than the current year—iOS 26 will be Apple’s latest and greatest OS for about three months in 2025, assuming the normal September-ish launch, but it will be the main OS for nine months in 2026. Apple usually also waits until later in the fall or winter to start forcing people onto the new OS, issuing at least a handful of security-only updates for the outgoing OS for people who don’t want to be guinea pigs for a possibly buggy new release.

Seriously, don’t get your hopes up about hardware

Apple showed off Vision Pro at WWDC in 2023, but we’re not expecting to see much hardware this year. Credit: Samuel Axon

Gurman has reported that Apple had “no major new devices ready to ship” this year.

Apple generally concentrates its hardware launches in the spring and fall, with quieter and lower-profile launches in the spring and bigger launches in the fall, anchored by the tentpole that is the iPhone. WWDC has occasionally been a launching point for new Macs (because Macs are the only systems that run Xcode, Apple’s development environment) and occasionally brand-new platforms (because getting developers on board with new platforms is one way to increase their chances of success). But the best available information suggests that neither of those things is happening this time around.

There are possibilities, though. Apple has apparently been at work behind the scenes on expanding its smart home footprint, and the eternally neglected Mac Pro is still using an M2 Ultra when an M3 Ultra already exists. But especially with a new redesign to play up, we’d expect Apple to keep the spotlight on its software this time around.

The fate of Intel Macs

It’s been five years since Apple started moving from Intel’s chips to its own custom silicon in Macs and two years since Apple sold its last Intel Macs. And since the very start of the transition, Apple has resisted providing a firm answer to the question of when Intel Macs will stop getting new macOS updates.

Our analysis of years of support data suggests two likely possibilities: that Apple releases one more new version of macOS for Intel Macs before shifting to a couple years of security-only updates or that Apple pulls the plug and shifts to security-only updates this year.

Rumors suggest that current betas still run on the last couple rounds of Intel Macs, dropping support for some older or slower models introduced between 2018 and 2020. If that’s true, there’s a pretty good chance it’s the last new macOS version to officially support Intel CPUs. Regardless, we’ll know more when the first betas drop after the keynote.

Even if the new version of macOS supports some Intel Macs, expect the list of features that require Apple Silicon to keep getting longer.

iPad multitasking? Again?

The perennial complaint about high-end iPads is that the hardware is a lot more capable than the software allows it to be. And every couple of years, Apple takes another crack at making the iPad a viable laptop replacement by improving the state of multitasking on the platform. This will allegedly be another one of those years.

We don’t know much about what form these multitasking improvements will take—whether they’re a further refinement of existing features like Stage Manager or something entirely new. The changes have been described as “more like macOS,” but that could mean pretty much anything.

Playing games

People play plenty of games on Apple’s devices, but they still aren’t really a “destination” for gaming in the same way that a dedicated console or Windows PC is. The company is apparently hoping to change that with a new unified app for games. Like Valve’s Steam, the app will reportedly serve as a storefront, launcher, and achievement tracker, and will also facilitate communication between friends playing the same game.

Apple took a similar stab at this idea in the early days of the iPhone with Game Center, which still exists as a service in the background on modern Apple devices but was discontinued as a standalone app quite a few years ago.

Apple has been trying for a few years now to make its operating systems more hospitable to gaming, especially in macOS. The company has added a low-latency Game Mode to macOS and comprehensive support for modern wireless gamepads from Microsoft, Sony, and Nintendo. The company’s Game Porting Toolkit stops short of being a consumer-friendly way to run Windows games on macOS, but it does give developers of Windows games an easier on-ramp for testing and porting their games to Apple’s platforms. We’ll see whether a unified app can help any of these other gaming features gel into something that feels cohesive.

Going home

Might we see a more prominent, marketable name for what Apple currently calls the “HomePod Software”? Credit: Jeff Dunn

One of Apple’s long-simmering behind-the-scenes hardware projects is apparently a new kind of smart home device that weds the HomePod’s current capabilities with a vaguely Apple TV-like touchscreen interface. In theory, this device would compete with the likes of Amazon’s Echo Show devices.

Part of those plans involve a “new” operating system to replace what is known to the public as “HomePod Software” (and internally as audioOS). This so-called “homeOS” has been rumored for a bit, and some circumstantial evidence points to some possible pre-WWDC trademark activity around that name. Like the current HomePod software—and just about every other OS Apple maintains—homeOS would likely be a specialized offshoot of iOS. But even if it doesn’t come with new hardware right away, new branding could suggest that Apple is getting ready to expand its smart home ambitions.

What about AI?

Finally, it wouldn’t be a mid-2020s tech keynote without some kind of pronouncements about AI. Last year’s WWDC was the big public unveiling of Apple Intelligence, and (nearly) every one of Apple’s product announcements since then has made a point of highlighting the hardware’s AI capabilities.

We’d definitely expect Apple to devote some time to Apple Intelligence, but the company may be more hesitant to announce big new features in advance, following a news cycle where even normally sympathetic Apple boosters like Daring Fireball’s John Gruber excoriated the company for promising AI features that it was nowhere near ready to launch—or even to demo to the public. The executives handling Apple’s AI efforts were reshuffled following that news cycle; whether it was due to Gruber’s piece or the underlying problems outlined in the article is anyone’s guess.

Apple will probably try to find a middle road, torn between not wanting to overpromise and underdeliver and not wanting to seem “behind” on the tech industry’s biggest craze. There’s a decent chance that the new “more personalized” version of Siri will finally make a public appearance. But I’d guess that Apple will focus more on iterations of existing Apple Intelligence features like summaries or Writing Tools rather than big swings.

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

The nine-armed octopus and the oddities of the cephalopod nervous system


A mix of autonomous and top-down control manage the octopus’s limbs.

With their quick-change camouflage and high level of intelligence, it’s not surprising that the public and scientific experts alike are fascinated by octopuses. Their abilities to recognize faces, solve puzzles, and learn behaviors from other octopuses make these animals a captivating study.

To perform these processes and others, like crawling or exploring, octopuses rely on their complex nervous system, one that has become a focus for neuroscientists. With about 500 million neurons—around the same number as dogs—octopuses’ nervous systems are the most complex of any invertebrate. But, unlike vertebrate organisms, the octopus’s nervous system is also decentralized, with around 350 million neurons, or 66 percent of it, located in its eight arms.

“This means each arm is capable of independently processing sensory input, initiating movement, and even executing complex behaviors—without direct instructions from the brain,” explains Galit Pelled, a professor of Mechanical Engineering, Radiology, and Neuroscience at Michigan State University who studies octopus neuroscience. “In essence, the arms have their own ‘mini-brains.’”

A decentralized nervous system is one factor that helps octopuses adapt to changes, such as injury or predation, as seen in the case of an Octopus vulgaris, or common octopus, that was observed with nine arms by researchers at the ECOBAR lab at the Institute of Marine Research in Spain between 2021 and 2022.

By studying outliers like this cephalopod, researchers can gain insight into how the animal’s detailed scaffolding of nerves changes and regrows over time, uncovering more about how octopuses have evolved over millennia in our oceans.

Brains, brains, and more brains

Because each arm of an octopus contains its own bundle of neurons, the limbs can operate semi-independently from the central brain, enabling faster responses since signals don’t always need to travel back and forth between the brain and the arms. In fact, Pelled and her team recently discovered that “neural signals recorded in the octopus arm can predict movement type within 100 milliseconds of stimulation, without central brain involvement.” She notes that “that level of localized autonomy is unprecedented in vertebrate systems.”

Though each limb moves on its own, the movements of the octopus’s body are smooth and conducted with a coordinated elegance that allows the animal to exhibit some of the broadest range of behaviors, adapting on the fly to changes in its surroundings.

“That means the octopus can react quickly to its environment, especially when exploring, hunting, or defending itself,” Pelled says. “For example, one arm can grab food while another is feeling around a rock, without needing permission from the brain. This setup also makes the octopus more resilient. If one arm is injured, the others still work just fine. And because so much decision-making happens at the arms, the central brain is freed up to focus on the bigger picture—like navigating or learning new tasks.”

As if each limb weren’t already buzzing with neural activity, things get even more intricate when researchers zoom in further—to the nerves within each individual sucker, a ring of muscular tissue, which octopuses use to sense and taste their surroundings.

“There is a sucker ganglion, or nerve center, located in the stalk of every sucker. For some species of octopuses, that’s over a thousand ganglia,” says Cassady Olson, a graduate student at the University of Chicago who works with Cliff Ragsdale, a leading expert in octopus neuroscience.

Given that each sucker has its own nerve centers—connected by a long axial nerve cord running down the limb—and each arm has hundreds of suckers, things get complicated very quickly, as researchers have historically struggled to study this peripheral nervous system, as it’s called, within the octopus’s body.

“The large size of the brain makes it both really exciting to study and really challenging,” says Z. Yan Wang, an assistant professor of biology and psychology at the University of Washington. “Many of the tools available for neuroscience have to be adjusted or customized specifically for octopuses and other cephalopods because of their unique body plans.”

While each limb acts independently, signals are transmitted back to the octopus’s central nervous system. The octopus’s brain sits between its eyes at the front of its mantle, or head, couched between its two optic lobes, large bean-shaped neural organs that help octopuses see the world around them. These optic lobes are just two of the over 30 lobes experts study within the animal’s centralized brain, as each lobe helps the octopus process its environment.

This elaborate neural architecture is critical given the octopus’s dual role in the ecosystem as both predator and prey. Without natural defenses like a hard shell, octopuses have evolved a highly adaptable nervous system that allows them to rapidly process information and adjust as needed, helping their chances of survival.

Some similarities remain

While the octopus’s decentralized nervous system makes it a unique evolutionary example, it does have some structures similar to or analogous to the human nervous system.

“The octopus has a central brain mass located between its eyes, and an axial nerve cord running down each arm (similar to a spinal cord),” says Wang. “The octopus has many sensory systems that we are familiar with, such as vision, touch (somatosensation), chemosensation, and gravity sensing.”

Neuroscientists have homed in on these similarities to understand how these structures may have evolved across the different branches in the tree of life. As the most recent common ancestor for humans and octopuses lived around 750 million years ago, experts believe that many similarities, from similar camera-like eyes to maps of neural activities, evolved separately in a process known as convergent evolution.

While these similarities shed light on evolution’s independent paths, they also offer valuable insights for fields like soft robotics and regenerative medicine.

Occasionally, unique individuals—like an octopus with an unexpected number of limbs—can provide even deeper clues into how this remarkable nervous system functions and adapts.

Nine arms, no problem

In 2021, researchers from the Institute of Marine Research in Spain used an underwater camera to follow a male Octopus vulgaris, or common octopus. On its left side, three arms were intact, while the others were reduced to uneven, stumpy lengths, sharply bitten off at varying points. Although the researchers didn’t witness the injury itself, they observed that the front right arm—known as R1—was regenerating unusually, splitting into two separate limbs and giving the octopus a total of nine arms.

“In this individual, we believe this condition was a result of abnormal regeneration [a genetic mutation] after an encounter with a predator,” explains Sam Soule, one of the researchers and the first author on the corresponding paper recently published in Animals.

The researchers named the octopus Salvador due to its bifurcated arm coiling up on itself like the two upturned ends of Salvador Dalí’s moustache. For two years, the team studied the cephalopod’s behavior and found that it used its bifurcated arm less when doing “riskier” movements such as exploring or grabbing food, which would force the animal to stretch its arm out and expose it to further injury.

“One of the conclusions of our research is that the octopus likely retains a long-term memory of the original injury, as it tends to use the bifurcated arms for less risky tasks compared to the others,” elaborates Jorge Hernández Urcera, a lead author of the study. “This idea of lasting memory brought to mind Dalí’s famous painting The Persistence of Memory, which ultimately became the title of the paper we published on monitoring this particular octopus.”

While the octopus acted more protective of its extra limb, its nervous system had adapted to using the extra appendage, as the octopus was observed, after some time recovering from its injuries, using its ninth arm for probing its environment.

“That nine-armed octopus is a perfect example of just how adaptable these animals are,” Pelled adds. “Most animals would struggle with an unusual body part, but not the octopus. In this case, the octopus had a bifurcated (split) arm and still used it effectively, just like any other arm. That tells us the nervous system didn’t treat it as a mistake—it figured out how to make it work.”

Kenna Hughes-Castleberry is the science communicator at JILA (a joint physics research institute between the National Institute of Standards and Technology and the University of Colorado Boulder) and a freelance science journalist. Her main writing focuses are quantum physics, quantum technology, deep technology, social media, and the diversity of people in these fields, particularly women and people from minority ethnic and racial groups. Follow her on LinkedIn or visit her website.


anthropic-releases-custom-ai-chatbot-for-classified-spy-work

Anthropic releases custom AI chatbot for classified spy work

On Thursday, Anthropic unveiled specialized AI models designed for US national security customers. The company released “Claude Gov” models that were built in response to direct feedback from government clients to handle operations such as strategic planning, intelligence analysis, and operational support. The custom models reportedly already serve US national security agencies, with access restricted to those working in classified environments.

The Claude Gov models differ from Anthropic’s consumer and enterprise offerings, also called Claude, in several ways. They reportedly handle classified material, “refuse less” when engaging with classified information, and are customized to handle intelligence and defense documents. The models also feature what Anthropic calls “enhanced proficiency” in languages and dialects critical to national security operations.

Anthropic says the new models underwent the same “safety testing” as all Claude models. The company has been pursuing government contracts as it seeks reliable revenue sources, partnering with Palantir and Amazon Web Services in November to sell AI tools to defense customers.

Anthropic is not the first company to offer specialized chatbot services for intelligence agencies. In 2024, Microsoft launched an isolated version of OpenAI’s GPT-4 for the US intelligence community after 18 months of work. That system, which operated on a special government-only network without Internet access, became available to about 10,000 individuals in the intelligence community for testing and answering questions.


deepseek-r1-0528-did-not-have-a-moment

DeepSeek-r1-0528 Did Not Have a Moment

When r1 was released in January 2025, there was a DeepSeek moment.

When r1-0528 was released in May 2025, there was no moment. Very little talk.

Here is a download link for DeepSeek-R1-0528-GGUF.

It seems like a solid upgrade. If anything, I wonder if we are underreacting, and this illustrates how hard it is getting to evaluate which models are actually good.

What this is not is the proper r2, nor do we have v4. I continue to think that will be a telltale moment.

For now, what we have seems to be (but we’re not sure) a model that is solid for its price and status as an open model, but definitely not at the frontier, that you’d use if and only if you wanted to do something that was a very good fit and played to its strong suits. We likely shouldn’t update much either way on v4 and r2, and DeepSeek has a few more months before it starts being conspicuous that we haven’t seen them.

We all remember The DeepSeek Moment, which led to Panic at the App Store, lots of stock market turmoil that made remarkably little fundamental sense and that has been borne out as rather silly, a very intense week and a conclusion to not panic after all.

Over several months, a clear picture emerged of (most of) what happened: A confluence of narrative factors transformed DeepSeek’s r1 from an impressive but not terribly surprising model worth updating on into a shot heard round the world, despite the lack of direct ‘fanfare.’

In particular, these all worked together to cause this effect:

  1. The ‘six million dollar model’ narrative. People equated v3’s marginal compute costs with the overall budget of American labs like OpenAI and Anthropic. This is like saying DeepSeek spent a lot less on apples than OpenAI spent on food. When making an apples-to-apples comparison, DeepSeek spent less, but the difference was far less stark.

  2. DeepSeek simultaneously released an app that was free with a remarkably clean design and visible chain-of-thought (CoT). DeepSeek was fast following, so they had no reason to hide the CoT. Comparisons only compared DeepSeek’s top use cases to the same use cases elsewhere, ignoring the features and use cases DeepSeek lacked or did poorly on. So if you wanted to do first-day free querying, you got what was at the time a unique and viral experience. This forced other labs to also show CoT and accelerate release of various models and features.

  3. It takes a while to know how good a model really is, and the different style and visible CoT and excitement made people think r1 was better than it was.

  4. The timing was impeccable. DeepSeek got in right before a series of other model releases. Within two weeks it was very clear that American labs remained ahead. This was the peak of a DeepSeek cycle and the low point in others’ cycles.

  5. The timing was also impeccable in terms of the technology. This was very early days of RL scaling, such that the training process could still be done cheaply. DeepSeek did a great job extracting the most from its chips, but it is likely going to have increasing trouble with its compute disadvantage going forward.

  6. DeepSeek leveraged the whole ‘what even is safety testing’ and fast following angles, shipping as quickly as possible to irrevocably release its new model the moment it was at all viable to do so, making it look relatively farther along and less behind than they were. Teortaxes notes that the R1 paper pointed out a bunch of things that needed fixing that DeepSeek did not have time to fix back then, and which weren’t ‘counted’ during the panic. R1-0528 fixes them.

  7. DeepSeek got the whole ‘momentum’ argument going. China had previously been much farther behind in terms of released models, DeepSeek was now less behind (and some even said was ahead), and people thought ‘oh that means soon they’ll be ahead.’ Whereas no, you can’t assume that, and also moving from a follower to a leader is a big leap.

  8. This was highly related to a widespread demand for a ‘China caught up to the USA’ narrative, from China fans and also from China hawks of all sorts. Going forward, we are left with a ‘missile gap’ style story.

  9. There are also a lot of people always pushing the ‘open models win’ argument, and who think that non-open models are some combination of doomed and don’t count. These people are very vocal, and vibes are a weapon of choice, and some have close ties to the Trump administration.

  10. The stock market was highly lacking in situational awareness, so they considered this release much bigger news than it was, and it caused various people to ‘wake up’ to things that were already known and anticipate others waking up, and there was widespread misunderstanding of how any of the underlying dynamics worked, including Jevons Paradox and also that if you want to run r1 you go out and buy more chips, including Nvidia chips. It is also possible that a lot of the DeepSeek stock market reaction was actually about insider trading of Trump policy announcements. Essentially: The Efficient Market Hypothesis Is False.

I continue to believe that when R2 arrives (or fails to arrive for a long time), this will tell us a lot either way, whereas the R1-0528 we got is not a big update. If R1-0528 had been a fully top model and created another moment, that would of course be huge, but all results short of that are pretty similar.

I stand by what I said in AI #118 on this:

Miles Brundage: Relatedly, DeepSeek’s R2 will not tell us much about where they will be down the road, since it will presumably be based on a similarish base model.

Today RL on small models is ~everyone’s ideal focus, but eventually they’ll want to raise the ceiling.

Frontier AI research and deployment today can be viewed, if you zoom out a bit, as a bunch of “small scale derisking runs” for RL.

The Real Stuff happens later this year and next year.

(“The Real Stuff” is facetious because it will be small compared to what’s possible later)

Zvi Mowshowitz: I think R2 (and R1-0528) will actually tell us a lot, on at least two fronts.

It will tell us a lot about whether this general hypothesis is mostly true.

It will tell us a lot about how far behind DeepSeek really is.

It will tell us a lot about how big a barrier will it be that DS is short on compute.

R1 was, I believe, highly impressive and the result of cracked engineering, but also highly fortunate in exactly when and how it was released and in the various narratives that were spun up around it. It was a multifaceted de facto sweet spot.

If DeepSeek comes out with an impressive R2 or other upgrade within the next few months (which they may have just done), especially if it holds up its position actively better than R1 did, then that’s a huge deal. Whereas if R2 comes out and we all say ‘meh it’s not that much better than R1’ I think that’s also a huge deal, strong evidence that the DeepSeek panic at the app store was an overreaction.

If R1-0528 turns out to be only a minor upgrade, that alone doesn’t say much, but the clock would be ticking. We shall see.

Teortaxes: I’m not sure what Miles means by similarish, but I agree more with @TheZvi here: R2 will be quite informative. It’s clear that DeepSeek are reinventing their data and RL pipelines as well as model architecture. R2/V4 will be their biggest departure from convention to date.

Is it possible this was supposed to be R2, but they changed the name due to it being insufficiently impressive? Like everyone but Chubby here I strongly think no.

I will however note that DeepSeek has a reputation for being virtuous straight shooters that is based on not that many data points, and one key point of that was their claim to have not done distillation, a claim that now seems questionable.

The state of benchmarking seems rather dismal.

This could be the strongest argument that the previous DeepSeek market reaction was massively overblown (or even a wrong-way move). If DeepSeek giving us a good model is so important to the net present value of our future cash flows, how is no one even bothering to properly benchmark r1-0528?

And how come, when DeepSeek released their next model, Nvidia was up +4%? It wasn’t an especially impressive model, but I have to presume it was a positive update versus getting nothing, unless the market is saying that this proves they likely don’t have it. In which case, I think that’s at least premature.

Evals aren’t expensive by the standards of hedge funds; indeed, one of the hedge funds (High-Flyer) is how DeepSeek got created and funded.

Miles Brundage: Wild that with DeepSeek having caused a trillion dollar selloff not that long ago + being such a big talking point for so many people, they dropped a new model several hours ago + no one seems to have run a full eval suite on it yet.

Something something market failure

And yes evals can be expensive but not that expensive by the standards of this industry. And yeah they can be slow, but come on, literally trillion dollar stakes seems like a reason to find ways to speed it up (if you believe the market)??

We’ll see if this is fixed tomorrow…

DeepSeek subsequently released some evals btw (were not included in the initial release/when I tweeted, I think). Still good for others to verify them of course, and these are not exhaustive.

Gwern: Looks like ‘DeepSeek-r1 single-handedly caused a trillion-dollar crash’ has been refuted at the level of confidence of ‘NVDA went up +4% after the next ~DS-r2 release half a year later’.

I notice that on GPQA Diamond, DeepSeek claims 81% and Epoch gives them 76%.

I am inclined to believe Epoch on that, and of course DeepSeek gets to pick which benchmarks to display whether or not they’re testing under fair conditions.

DeepSeek clearly have in various ways been trying to send the impression that R1-0528 is on par with o3, Gemini-2.5-Pro and Claude 4 Opus.

That is incompatible with the lack of excitement and reaction. If an open weights model at this price point was actually at the frontier, people would be screaming. You wouldn’t be able to find a quiet rooftop.

Peter Wildeford: Latest DeepSeek 4-11 months behind US:

~5 months behind US SOTA on GPQA Diamond

~4 months behind on MATH lvl 5

~11 months behind on SWE-Bench-Verified

We need more good evals to benchmark the US-China gap. Kudos to @EpochAIResearch for doing some of this work.

Math level 5 is fully saturated as of o3 so this should be the last time we use it.

Epoch AI: DeepSeek has released DeepSeek-R1-0528, an updated version of DeepSeek-R1. How does the new model stack up in benchmarks? We ran our own evaluations on a suite of math, science, and coding benchmarks. Full results in thread!

On GPQA Diamond, a set of PhD-level multiple-choice science questions, DeepSeek-R1-0528 scores 76% (±2%), outperforming the previous R1’s 72% (±3%). This is generally competitive with other frontier models, but below Gemini 2.5 Pro’s 84% (±3%).

On MATH Level 5, the hardest tier of the well-known MATH benchmark, R1-0528 achieves 97% accuracy, similar to the 98% scored by o3 and o4-mini. This benchmark has essentially been mastered by leading models.

On OTIS Mock AIME, a more difficult competition math benchmark that is based on the AIME exam, DeepSeek-R1-0528 scores 66% (±5%).

This improves substantially on the original R1’s 53% (±8%), but falls short of leading models such as o3, which scored 84% (±4%).

On SWE-bench Verified, a benchmark of real-world software engineering tasks, DeepSeek-R1-0528 scores 33% (±2%), competitive with some other strong models but well short of Claude 4.

Performance can vary with scaffold; we use a standard scaffold based on SWE-agent.

On SWE-bench Verified, DeepSeek-R1-0528 explores and edits files competently, but often submits patches prematurely without thoroughly verifying them.

You can find more information on the runs in our Log Viewer.
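As a rough sanity check on the numbers above, here is a minimal Python sketch that asks two questions: do two reported scores have overlapping error bars, and how far is DeepSeek’s self-reported 81% on GPQA Diamond from Epoch’s measurement, in units of Epoch’s own margin? The `Score` class and both helper functions are hypothetical names invented for this illustration. The quoted thread does not say what the ± figures represent (standard error, 95% interval, or something else), so interval overlap here is only a crude heuristic and not a significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """A benchmark result as quoted above (hypothetical helper type)."""
    name: str
    pct: float     # reported accuracy, in percent
    margin: float  # reported +/- margin, in percentage points

    @property
    def low(self) -> float:
        return self.pct - self.margin

    @property
    def high(self) -> float:
        return self.pct + self.margin


def intervals_overlap(a: Score, b: Score) -> bool:
    """True if the two reported +/- ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


def gap_in_margins(claimed_pct: float, measured: Score) -> float:
    """Distance from a claimed score to a measured one, in units of the margin."""
    return (claimed_pct - measured.pct) / measured.margin


# Figures copied from the Epoch AI thread quoted above.
gpqa_new = Score("R1-0528", 76, 2)
gpqa_old = Score("R1", 72, 3)
gpqa_gemini = Score("Gemini 2.5 Pro", 84, 3)
otis_new = Score("R1-0528", 66, 5)
otis_old = Score("R1", 53, 8)
otis_o3 = Score("o3", 84, 4)

for a, b in [(gpqa_new, gpqa_old), (gpqa_new, gpqa_gemini),
             (otis_new, otis_old), (otis_new, otis_o3)]:
    verdict = "overlap" if intervals_overlap(a, b) else "do not overlap"
    print(f"{a.name} vs {b.name}: ranges {verdict}")

# DeepSeek's self-reported 81% versus Epoch's 76% (+/-2%).
print(f"claimed minus measured: {gap_in_margins(81, gpqa_new):.1f} margins")
```

By this crude test, the new R1’s ranges overlap (or just touch) those of the old R1 on both GPQA Diamond and OTIS Mock AIME, while the gaps to Gemini 2.5 Pro and o3 are clear of the error bars. DeepSeek’s claimed 81% sits 2.5 margins above Epoch’s figure, which could reflect different test conditions rather than anything untoward.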

Here are the Lech Mazur benchmarks, where the scores are a mixed bag but overall pretty good.

There is no improvement on WeirdML.

Havard Ihle: New R1 seems not that optimised for coding! No improvement on WeirdML. It is smart, but it has more variance, so many strong results, but also a much higher failure rate (45%, up from 30% for old R1). Often weird syntax errors or repeated tokens, even at the preferred temp of 0.6

The initial headlines were what you would expect, and were essentially ‘remember that big DeepSeek moment? Those guys gave us a new version.’

Here’s CNBC.

The upgraded model has “major improvements in inference and hallucination reduction,” Yakefu [an AI researcher at HuggingFace] said, adding that “this version shows DeepSeek is not just catching up, it’s competing.”

The upgraded DeepSeek R1 model is just behind OpenAI’s o4-mini and o3 reasoning models on LiveCodeBench, a site that benchmarks models against different metrics.

“DeepSeek’s latest upgrade is sharper on reasoning, stronger on math and code, and closing in on top-tier models like Gemini and O3,” Adina Yakefu, AI researcher at Hugging Face, told CNBC.

Yakefu is effectively talking their own book here. I don’t see why we should interpret this as catching up, since everyone is reducing hallucinations and costs, but certainly DeepSeek are competing. How successfully they are doing so, and in what league, is the question.

One can perhaps now see how wrong we were to overreact so much to the first r1. Yes, r1-0528 is DeepSeek ‘catching up’ or ‘closing in’ in the sense that DeepSeek’s relative position looks now, right after a release, better than it looked on May 27. But it does not look better than when I wrote ‘on DeepSeek’s r1’ in January and LiveCodeBench appears at best cherry picked.

The article concludes with Nvidia CEO Huang making his typical case that because China can still make some AI models and build some chips, we should sacrifice our advantages in compute on the altar of Nvidia’s stock price and market share.

Here’s Bloomberg’s Luz Ding, who notes up front that the company calls it a ‘minor trial upgrade,’ so +1 to Ding, but there isn’t much additional information here.

A search of the Washington Post and Wall Street Journal failed to find any articles at all covering this event. If r1 was such a big moment, why is this update not news? Even if it was terribly disappointing, shouldn’t that also be news?

Normally, in addition to evals, I expect to see a ton of people’s reactions, and more when I open up my reactions thread.

This time, crickets. So I get to include a lot of what did show up.

Teortaxes: my thesis is that this is just what R1 was supposed to be as a product. Direct gains on benchmarks are in line with expectations. What is interesting is that it’s more «Westoid», has a sycophancy problem, and its CoTs are totally different in manner.

Both V3-original and R1-original should be thought of as *previews. We know they shipped them as fast as they could, with little post-training (≈$10K for V3 not including context extension, maybe $1M for R1). 0324, 0528 are what they’d do originally, had they more time & hands.

(they don’t advertise it here but they also fixed system prompt neglect/adverse efficiency, multi-turn, language consistency between CoT and response, and a few other problems with R1-old. It doesn’t deserve a paper because we’ve had all such papers done by January)

Teortaxes highlights where the original R1 paper says they plan to fix these limitations. This seems like a reasonable way to think about it, R1-0528 is the version of R1 that isn’t being rushed out the door in a sprint with large compute limitations.

Alexander Doria: DeepSeek new R1 is expectedly great, but I admit I’m more eagerly waiting for the paper. Hopefully tying generalist reward, subgoal models from prover and overall engineering challenge of scaling RL (GRPO over nccl?)

Satwik Patil: I have ran about 15 runs of something like the AI 2027 TTX board game with it and it is very good at it. Unlike other models, particularly Gemini, it actually accepts that bad things can happen(or it can do bad things) and plays them out.

It is also the least aware of any model that it’s thoughts can be read by the user.

Petr Baudis: A lot of people were hungry for R1 on the initial release as there were no interesting reasoning models outside of OpenAI at that point (besides the fact that this is opensource).

But I doubt too many of the Twitter frontier use DeepSeek daily right now.

Michael Roe: 0528 really overthought my “simulate an IBM mainframe” prompt. Its chain of thought was much more verbose that the previous R1 for this particular prompt. But, ok, it did give me a simulation.

It’s CoT even had pseudocode for algorithms. And, to be fair, its CoT foregrounds an issue that previous R1 missed, that the whole idea assume you can reconstruct the state from the history of terminal input and output. 0528 realises that it’s going to have to stash internal state somewhere.

0528 was able to solve a problem that I couldn’t gr5 an answer to before. Basically, i had Chinese text in Wade-Giles transliteration with a mangled word, and the problem is to use contextual clues to find a word that makes sense in context and sounds a bit like the text.

i’m using 0528, but no definite conclusions yet. The tabletop rpg scenario where I have a talking squirrel sidekick now has a singing, talking squirrel sidekick when I run it with 0528. (Obviously, that squirrel is a nod to Disney). 0528 is subjectively better, in that it can commit to the bit even better than the previous version.

bad8691: I didn’t know a new version would get released. Today I asked a question to it as usual and reading the reasoning traces, immediately said “wow, looks like its social intelligence has visibly improved”. Don’t know about the benchmarks but it was just obvious to me.

xEC40: its a lot different, lot of chinese on ambiguous prompts unlike 0324 and r1, but good prompt adherence. i gotta play with it more

Dominik Lukes: As is now the norm with new model releases, it’s hard to say what the real-world benefit just from the benchmarks or stated intention for developers. The clear need is now for everybody do develop their own evals for actual usage and run them against the model. In my own informal testing I can’t tell the difference.

Oli: basicly jus a slightly smarter version of r1 but with all the same limitations of the old one no function calling text only etc so its not that useful in practice qwen 3 is still superior

Biosemiote: I love deepseek poetry. Not sure if better than previous models, but miles above the competition.

Through the Ice Lens

Cold grinds the creek to glass—

A lens of pure, cracked grace—

And in its flawed design,

The world aligns:

A spider’s web, exact as wire,

The sun’s last coal, a stolen fire.

But look— trapped bubbles rise like breath,

A drowned leaf’s veins rehearsing death.

Gwern: Nah, that’s still r1 cringelord. Claude & Gemini are much better for poetry. (And still the usual semantic problems of interesting images interspersed with nonsense: how can cold ‘grind’ water to solid glass? In some frozen river ice, what is the “sun’s last coal”, exactly? etc)

Leo Abstract: it also is far better at understanding text-based divination systems. given how it is being used in china, i think neither of these strengths are an accident, and them not going away with updates confirms this.

it seems as though it’s trying harder to understand the complexities instead of just scanning the spread or cast or chart for ways to glaze you. if you’ve kept notes, try having it (vs, say, gemini 2.5 pro) duplicate your previous work.

This was the high end of opinion, as xjdr called it a frontier model, which most people clearly don’t agree with at all, and kalomaze calls it ‘excellence’ but this is relative to its size:

xjdr: R1-0528 is mostly interchangeable (for me) with gemini pro 0520 and opus 4. it has a distinctly gemini 2.5 pro 0325 flavor which is not my favorite, but the quality is impossible to deny (i preferred the o1 on adderall flavor personally). we officially have frontier at home.

Teortaxes (linking to Epoch): what is the cope for this?

xjdr: Not sure what they are doing to elicit this behavior. If anything, I find R1 0528 to be verbose and overly thorough. Admittedly, my workflows are much more elaborate than ‘please fix this’ so I ensure plans are made and implemented, patches compile, tests are written and pass, etc

Teortaxes: «we use a standard scaffold based on SWE-agent.» It’s obviously better at function calling than R1 but not clear if this is a good fit.

xjdr: I found it performed very well with xml based custom function definitions. It performed reasonably well with the function defs as defined in the huggingface tokenizer template. One thing I do often (I do this for most ‘thinking’ models) is i prefill the response with:

To make the output parsing more consistent.

Kalomaze: r1-0528 is excellence

Zephyr: How did it perform in ur tests? Compared to o3/2.5 Pro/Opus 4??

Kalomaze: a majority of the models that you just described are either way too rate limited expensive or otherwise cumbersome to be worth directly comparing to a model so cheap tbh

Bob from Accounting: Very impressive for its price, but not quite SOTA on anything.

That was also the reaction to the original r1, and I sort of expect that trend to continue.

Kalomaze: new r1 watered my crops, cleared my skin, etc

However:

Kalomaze: none of the sft distills are good models because you still need RL and sft local biases compound.

Different people had takes about the new style and what it reminded them of.

Cabinet: A lot of its outputs smell like 4o to me. This was more extreme on deepseek web vs using 0528 on API, but both feel directionally more “productized,” more sycophantic, more zoomer-reddit phrasing, etc. Does feel a step smarter than R1.0 tho. Just more annoying.

A key question that was debated about the original r1 was whether it was largely doing distillation, as in training on the outputs of other models, effectively reverse engineering. This is the most common way to fast follow and a Chinese specialty. DeepSeek explicitly denied it was doing this, but we can’t rely on that.

If they did it, this doesn’t make r1’s capabilities any less impressive, the model can do what the model can do. But it does mean that DeepSeek is effectively a lot farther behind and further away from being able to ‘take the lead.’ So DeepSeek might be releasing models comparable to what was available 4-8 months ago, but still be 12+ months behind in terms of ability to push the frontier. Both measures matter.

Gallabytes: I know we all stan deepseek here in 🐋POT but the distribution shift from 4o-like to Gemini-like for output suggests that the distillation claims are likely true and this should change the narrative more than it has IMO.

It’s unclear how much they rely on distillation, whether it’s key or merely convenient, but the more important it is the less seriously we should count them in the frontier.

it brings me no joy to report this, I really like the story of a lean player catching up to the giants in a cave with a box of scraps. certainly their pre-training efficiency is still quite impressive. but the rest of this situation smells increasingly sus.

Teortaxes: I think the claim of distillation is sound, since Gemini traces were open. I prefer Gemini reasoning, it’s more efficient than early R1 mumbling. This probably explains some but not all of the performance gains. Imo it’s reasonable to feel disappointed, but not despair.

Fraser Paine: This isn’t actually bad, big corps pay hundreds of millions to generate data, getting this from existing models and distilling is a cost effective way to fast-follow. Doesn’t mean 🐋can’t push elsewhere to achieve frontier performance on tools or training/inference efficiency.

Then again, claims can differ:

AGI 4 President 2028: #1 take away is that OpenAI’s attempt to curb distillation were successful.

I agree with Fraser, it’s not a bad move to be doing this if allowed to do so, but it would lower our estimate of how capable DeepSeek is going forwards.

If Teortaxes is saying the claim of distillation is sound, I am inclined to believe that, especially given there is no good reason not to do it. This is also consistent with his other observations above, such as it displaying a more ‘westoid’ flavor and having a sycophancy issue, and a different style of CoT.

If you place high value on the cost and open nature of r1-0528, it is probably a solid model for the places where it is strong, although I haven’t kept up with details of the open model space enough to be sure, especially given how little attention this got. If you don’t place high value on both open weights and token costs, it is probably a pass.

The biggest news here is the lack of news, the dog that did not bark. A new DeepSeek release that panics everyone once again was a ready made headline. I know I was ready for it. It didn’t happen. If this had been an excellent model, it would have happened. This also should make us reconsider our reactions the first time around.



2025-acura-adx-review:-a-crossover-that-balances-budget-with-spirit

2025 Acura ADX review: A crossover that balances budget with spirit

As you might imagine, a steady stream of cars to review comes and goes from my parking spot. Some weeks, they stand out, like the bright green Aston Martin, the murdered-out Bentley, or the VW ID. Buzz you can read about soon; these cars usually spark conversations with neighbors, particularly those who don’t know why there’s a different vehicle in that spot each week.

At other times, the vehicles are more anonymous, and I’m not sure this ADX sparked any community discussions. Compact crossovers are a popular breed and blend into the background—particularly when they’re painted an unobtrusive shade.

Which is not to say the ADX is not handsome; the Urban Gray Pearl paint looked good even in the near-constant rain (which explains the Acura-supplied images rather than my own) that coincided with our time with the tester. And from the driver’s seat, the view down the hood, along those creases, is a lot more interesting than most comparable crossovers, considering the ADX’s $35,000 starting price.

It’s built in Mexico alongside the Honda HR-V and shares a platform with the Acura Integra and Honda Civic. Like the HR-V, there’s no hybrid option, just a 1.5 L turbocharged four-cylinder engine with 190 hp (142 kW) and 179 lb-ft (243 Nm). There’s also only one choice of transmission—a continuously variable transmission.

This has preprogrammed “shift points” that simulate the gears of a more conventional transmission. Acura says it has been programmed for performance, and you can use the steering wheel’s paddles to change up or down a virtual gear. The turbocharged engine starts making all its torque from 1,700 rpm, but if you’re in a hurry, you’ll first hear the revs rise out of proportion to the initial increase in speed, and full power doesn’t arrive until 6,000 rpm, with the redline 500 rpm further north. It’s not the best-sounding engine in the world when it’s revved up, either.

As standard, the ADX is just front-wheel drive, but all-wheel drive is available for an additional $2,000. This can send up to half the engine’s available torque to the rear wheels, but expect it to be front-biased in day-to-day driving conditions.

Acura ADX hood

I love the view from the driver’s seat down the spine of the hood. Credit: Acura

Fuel efficiency is not remarkable, but here it’s about average for the class. With all-wheel drive, as tested, the EPA estimates 27 mpg combined (8.7 L/100 km); I actually returned a slightly better 28 mpg (8.4 L/100 km) across the week.


fda-rushed-out-agency-wide-ai-tool—it’s-not-going-well

FDA rushed out agency-wide AI tool—it’s not going well

FDA staffers who spoke with Stat News, meanwhile, called the tool “rushed” and said its capabilities were overinflated by officials, including Makary and those at the Department of Government Efficiency (DOGE), which was headed by controversial billionaire Elon Musk. In its current form, it should only be used for administrative tasks, not scientific ones, the staffers said.

“Makary and DOGE think AI can replace staff and cut review times, but it decidedly cannot,” one employee said. The staffer also said that the FDA has failed to set up guardrails for the tool’s use. “I’m not sure in their rush to get it out that anyone is thinking through policy and use,” the FDA employee said.

According to Stat, Elsa is based on Anthropic’s Claude LLM and is being developed by consulting firm Deloitte. Since 2020, Deloitte has been paid $13.8 million to develop the original database of FDA documents that Elsa’s training data is derived from. In April, the firm was awarded a $14.7 million contract to scale the tech across the agency. The FDA said that Elsa was built within a high-security GovCloud environment and offers a “secure platform for FDA employees to access internal documents while ensuring all information remains within the agency.”

Previously, each center within the FDA was working on its own AI pilot. However, after cost-cutting in May, the AI pilot originally developed by the FDA’s Center for Drug Evaluation and Research, called CDER-GPT, was selected to be scaled up to an FDA-wide version and rebranded as Elsa.

FDA staffers in the Center for Devices and Radiological Health told NBC News that their AI pilot, CDRH-GPT, is buggy, isn’t connected to the Internet or the FDA’s internal system, and has problems uploading documents and allowing users to submit questions.


us-science-is-being-wrecked,-and-its-leadership-is-fighting-the-last-war

US science is being wrecked, and its leadership is fighting the last war


Facing an extreme budget, the National Academies hosted an event that ignored it.

WASHINGTON, DC—The general outline of the Trump administration’s proposed 2026 budget was released a few weeks back, and it included massive cuts for most agencies, including every one that funds scientific research. Late last week, those agencies began releasing details of what the cuts would mean for the actual projects and people they support. And the results are as bad as the initial budget had suggested: one-of-a-kind scientific experiment facilities and hardware retired, massive cuts in supported scientists, and entire areas of research halted.

And this comes in an environment where previously funded grants are being terminated, funding is being held up for ideological screening, and universities have been subjected to arbitrary funding freezes. Collectively, things are heading for damage to US science that will take decades to recover from. It’s a radical break from the trajectory science had been on.

That’s the environment that the US’s National Academies of Science found itself in yesterday while hosting the State of the Science event in Washington, DC. It was an obvious opportunity for the nation’s leading scientific organization to warn the nation of the consequences of the path that the current administration has been traveling. Instead, the event largely ignored the present to worry about a future that may never exist.

The proposed cuts

The top-line budget numbers proposed earlier indicated things would be bad: nearly 40 percent taken off the National Institutes of Health’s budget, the National Science Foundation down by over half. But now, many of the details of what those cuts mean are becoming apparent.

NASA’s budget includes sharp cuts for planetary science, which would be cut in half and then stay flat for the rest of the decade, with the Mars Sample Return mission canceled. All other science budgets, including Earth Science and Astrophysics, take similar hits; one astronomer posted a graphic showing how many present and future missions that would mean. Active missions that have returned unprecedented data, like Juno and New Horizons, would go, as would two Mars orbiters. As described by Science magazine’s news team, “The plans would also kill off nearly every major science mission the agency has not yet begun to build.”

A NASA graphic showing different missions focused on astrophysics. Red Xs have been superimposed on most of them.

A chart prepared by astronomer Laura Lopez showing just how many astrophysics missions will be cancelled. Credit: Laura Lopez

The National Science Foundation, which funds much of the US’s fundamental research, is also set for brutal cuts. Biology, engineering, and education will all be slashed by over 70 percent; computer science, math and physical science, and social and behavioral science will all see cuts of over 60 percent. International programs will take an 80 percent cut. The funding rate of grant proposals is expected to drop from 26 percent to just 7 percent, meaning the vast majority of grants submitted to the NSF will be a waste of time. The number of people involved in NSF-funded activities will drop from over 300,000 to just 90,000. Almost every program to broaden participation in science will be eliminated.

As for specifics, they’re equally grim. The fleet of research ships will essentially become someone else’s problem: “The FY 2026 Budget Request will enable partial support of some ships.” We’ve been able to better pin down the nature and location of gravitational wave events as detectors in Japan and Italy joined the original two LIGO detectors; the NSF will reverse that progress by shutting one of the LIGOs. The NSF’s contributions to detectors at the Large Hadron Collider will be cut by over half, and one of the two very large telescopes it was helping fund will be cancelled (say goodbye to the Thirty Meter Telescope). “Access to the telescopes at Kitt Peak and Cerro Tololo will be phased out,” and the NSF will transfer the facilities to other organizations.

The Department of Health and Human Services has been less detailed about the specific cuts its divisions will see, largely focusing on the overall numbers, which are down considerably. The NIH, which is facing a cut of over 40 percent, will be reorganized, with its 19 institutes pared down to just eight. This will result in some odd pairings, such as the dental and eye institutes ending up in the same place; genomics and biomedical imaging will likewise end up under the same roof. Other groups like the Centers for Disease Control and Prevention and the Food and Drug Administration will also face major cuts.

Issues go well beyond the core science agencies, as well. In the Department of Energy, funding for wind, solar, and renewable grid integration has been zeroed out, essentially ending all programs in this area. Hydrogen and fuel cells face a similar fate. Collectively, these had gotten over $600 million in 2024’s budget. Other areas of science at the DOE, such as high-energy physics, fusion, and biology, receive relatively minor cuts that are largely in line with the ones faced by administration priorities like fossil and nuclear energy.

Will this happen?

It goes without saying that this would amount to an abandonment of US scientific leadership at a time when most estimates of China’s research spending show it approaching US-like levels of support. Not only would it eliminate many key facilities, instruments, and institutions that have helped make the US a scientific powerhouse, but it would also block the development of newer and additional ones. The harms are so widespread that even topics that the administration claims are priorities would see severe cuts.

And the damage is likely to last for generations, as support is cut at every stage of the educational pipeline that prepares people for STEM careers. This includes careers in high-tech industries, which may require relocation overseas due to a combination of staffing concerns and heightened immigration controls.

That said, we’ve been here before in the first Trump administration, when budgets were proposed with potentially catastrophic implications for US science. But Congress limited the damage and maintained reasonably consistent budgets for most agencies.

Can we expect that to happen again? So far, the signs are not especially promising. The House has largely adopted the Trump administration’s budget priorities, despite the fact that the budget they pass turns its back on decades of supposed concerns about deficit spending. While the Senate has yet to take up the budget, it has also been very pliant during the second Trump administration, approving grossly unqualified cabinet picks such as Robert F. Kennedy Jr.

All of which would seem to call for the leadership of US science organizations to press the case for the importance of science funding to the US and highlight the damage that these cuts would cause. But, if yesterday’s National Academies event is anything to judge by, the leadership is not especially interested.

Altered states

As the nation’s premier science organization, and one that performs lots of analyses for the government, the National Academies would seem to be in a position to have its concerns taken seriously by members of Congress. And, given that the present and future of science in the US is being set by policy choices, a meeting entitled the State of the Science would seem like the obvious place to address those concerns.

If so, it was not obvious to Marcia McNutt, the president of the NAS, who gave the presentation. She made some oblique references to current problems, saying, “We are embarking on a radical new experiment in what conditions promote science leadership, with the US being the treatment group, and China as the control,” and acknowledged that “uncertainties over the science budgets for next year, coupled with cancellations of billions of dollars of already hard-won research grants, is causing an exodus of researchers.”

But her primary focus was on the trends that have been operative in science funding and policy leading up to but excluding the second Trump administration. McNutt suggested this was needed to look beyond the next four years. However, that ignores the obvious fact that US science will be fundamentally different if the Trump administration can follow through on its plans and policies; the trends that have been present for the last two decades will be irrelevant.

She was also remarkably selective about her avoidance of discussing Trump administration priorities. After noting that faculty surveys have suggested they spend roughly 40 percent of their time handling regulatory requirements, she twice mentioned that the administration’s anti-regulatory stance could be a net positive here (once calling it “an opportunity to help”). Yet she neglected to note that many of the abandoned regulations represent a retreat from science-driven policy.

McNutt also acknowledged the problem of science losing the bipartisan support it has enjoyed, as trust in scientists among US conservatives has been on a downward trend. But she suggested it was scientists’ responsibility to fix the problem, even though it’s largely the product of one party deciding it can gain partisan advantage by raising doubts about scientific findings in fields like climate change and vaccine safety.

The panel discussion that came after largely followed McNutt’s lead in avoiding any mention of the current threats to science. The lone exception was Heather Wilson, president of the University of Texas at El Paso and a former Republican member of the House of Representatives and secretary of the Air Force during the first Trump administration. Wilson took direct aim at Trump’s cuts to funding for underrepresented groups, arguing, “Talent is evenly distributed, but opportunity is not.” After arguing that “the moral authority of science depends on the pursuit of truth,” she highlighted the cancellation of grants that had been used to study diseases that are more prevalent in some ethnic groups, saying “that’s not woke science—that’s genetics.”

Wilson was clearly the exception, however, as the rest of the panel largely avoided direct mention of either the damage already done to US science funding or the impending catastrophe on the horizon. We’ve asked the National Academies’ leadership a number of questions about how it perceives its role at a time when US science is clearly under threat. As of this article’s publication, however, we have not received a response.

At yesterday’s event, however, only one person showed a clear sense of what they thought that role should be—Wilson again, whose strongest words were directed at the National Academies themselves, which she said should “do what you’ve done since Lincoln was president,” and stand up for the truth.


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.



Dating Roundup #6

Previously: #1, #2, #3, #4, #5

Dating Roundup #4 covered dating apps. Roundup #5 covered opening without them.

Dating Roundup #6 covers everything else.

  1. You’re Single Because You Can’t Handle Basic Logistics.

  2. You’re Single Because You Don’t Ask Questions.

  3. You’re Single Because of Your Terrible Dating Tactics.

  4. You’re Single Because You Refuse to Play Your Role.

  5. You’re Single Because People Are Crazy About Age Gaps.

  6. You’re Single and You Need Professional Help.

  7. You’re Single Because You Never Close.

  8. You’re Single Because You’re Bad at Sex And Everyone Knows.

  9. You’re Single Because You Are Only a Fan.

  10. You’re Single Because of Preference Falsification.

  11. You’re Single Because You Have Insufficient Visual Aids.

  12. You’re Single Because You Told Your Partner You Didn’t Want Them.

  13. You’re Single Because of Your Terrible Dating Strategy.

  14. You’re Single Because You Don’t Enjoy the Process.

  15. You’re Single Because You Don’t Escalate Quickly.

  16. You’re Single Because Your Standards Are Too High.

  17. You’re Single Because You Read the Wrong Books.

  18. You’re Single Because You’re Short, Sorry, That’s All There Is To It.

  19. You’re Single Because of Bad Government Incentives.

  20. You’re Single Because You Don’t Realize Cheating is Wrong.

  21. You’re Single Because You’re Doing Polyamory Wrong.

  22. You’re Single Because You Don’t Beware Cheaters.

  23. You’re Single Because Your Ex Spilled the Tea.

  24. You’re Single Because You’re Assigning People Numbers.

  25. You’re Single Because You Are The Wrong Amount of Kinky.

  26. You’re Single Because You’re Not Good Enough at Sex.

  27. You’re Single But Not Because of Your Bodycount.

  28. You’re Single Because They Divorced You.

  29. You’re Single Because No One Tells You Anything.

  30. You’re Single And You’re Not Alone.

  31. You’re Single Because Things Are Steadily Getting Worse.

  32. You’re Single Because You Didn’t Go to College.

  33. You’re Single But This Isn’t About You.

  34. You’re Single so Let’s Go to the Videotape.

  35. You’re Single Because You Don’t Seek Out Good Advice.

  36. You’re Single So Here’s Some Hope.

You can take the pressure off yourself to plan the perfect date.

Instead, plan any date at all.

Shoshana Weissmann: I was hanging out with my married friend, who is a great father and husband, telling him how men cannot plan dates. He said, “What’s there to plan? They pick a bar, a restaurant, and a park for a walk nearby.” And I had to explain that grown men refuse to tell me when and where to meet until a few hours beforehand, if they feel like it. You could see him wilt a little inside.

I know some people do not believe me. This was a recent instance.

He also lied because he told me he had a perfect place in mind and forgot the name two days beforehand.

Art Vandelay: He’s right. There’s nothing much to plan for a first date. Just find a nice cool spot for a drink (or coffee), maybe a bite to eat, and off you go. Not being able to do this very basic thing is a red flag like you read about.

Shoshana Weissmann: Hell yes.

A lot of the alpha is on the simple things, and not messing them up.

There are often reports from women that men will go on a first date with them, and fail to ask the woman any questions or show curiosity about the person across from them, actual zero anything.

This is a huge unforced error. Asking only has upside, even when you have to steer things back that way intentionally. Such questions are almost always appreciated, and failure to ask them is taken as a bad sign. Also, the information is highly useful in deciding how and whether to proceed, and is usually actually interesting. If you find the answers boring, then partly that is likely a you problem (learn how to find people interesting, see Dale Carnegie etc.), but it is also a sign this match is not for you, so little has been lost.

Or maybe they’re not making it easy on you.

Aella: Men on dates: “Wow, I’m so curious about you” *Proceeds to ask two brief questions, no follow-ups, and then talks about themselves the rest of the time.*

Decrolssance: Weird. I’d much rather hear about the girl. But they’re always turning it back on me.

Aella: Fellas, this is a battle. She’s testing if you’re truly curious by throwing a softball at you to see if you drop your stated intentions to pursue it eagerly.

Robin Hanson: Men mostly want to idealize women and take them at their word, but this becomes harder when we also need to notice that they often test us via misleading signals.

Amanda Askell: More common: “Wow, I’m so curious about you.”

*Proceeds to ask so many questions that I begin to worry the goal is identity theft.*

Aella: Where do you find these guys? Can we swap?

Keep in mind that Amanda Askell works at Anthropic, so ‘he’s a spy’ is on the table.

When a woman says she is curious about you, she probably is curious, but it is also a trap, potentially an intentional one. How do you steer things from there?

Good question. Presumably the goal is balance, and you may need to fight for that.

Then there are the parts that are about calibration, especially of being the right level of assertive and aggressive, and of course knowing which situations call for which, which is one of the hardest challenges. There is a lot of advice to men that is essentially ‘don’t be pushy’ and lots of advice from other sources that says ‘do be pushy,’ so of course reverse all advice you hear.

Whichever you hear most is more likely the one you don’t need.

Reddit poster: I realized I was being ghosted by girls because I followed Reddit advice.

I thought I was unattractive. I vented on this app many times after being ghosted, but I’ve been successful lately. To be honest, I’ve also started going back to the gym.

But anyway, I read on Reddit that one should always be respectful, do not kiss on a first date, do not flirt, etc. That’s exactly what I was doing. I would go on dates, ask about their work, school, vacations, etc.—all that wholesome vibe—and was getting ghosted.

In the last four weeks, I’ve been on a few dates and told myself to ditch all that advice, started flirting with them, going for a kiss at the right time, inviting them to my place, etc., etc. And yes, I’ve been quite successful lately. I no longer feel unattractive, lol.

Hey, this advice might not work for everyone, I don’t know, but all this worked for me better than being wholesome and waiting until the third date, etc.

elle: The problem with giving men advice like “don’t be pushy” is that the men who truly need to hear it won’t listen, and the men who would benefit from being more assertive will take it to heart.

Human Person: I am in the latter category. I’m terrified of invading someone’s space like that and assume I have no chance either way, and the mixture just makes me fight back every thought I have to initiate.

Damion Schubert: Ninety percent of advice given like “don’t be pushy” really just means “Jesus, tone it down until you’re not a creep.” You’ll never get anywhere if you aren’t assertive, but figuring out where the line between “assertive but nonthreatening” and “earning a restraining order” is critical.

This is frequently a skill issue. You need to be assertive at the right times, ideally in the right ways – if the times are sufficiently right you have a lot of slack here, if they’re only somewhat right you have less – and not at the wrong times in the wrong ways.

Other times it very much is a calibration issue. And the default response to not yet being skilled is to become miscalibrated, and to seek out miscalibrated advice.

Either you take, and are advised to take, lots of shots on goal, so you at least have a chance of success and a chance to learn and grow comfortable, in which case you will indeed often come off rather badly. Or you take barely any shots, which avoids many downsides but ends up wasting everyone’s time, with little chance of success and not much learning.

The good news is that if you are paying any attention there really is quite a lot of space between ‘assertive but non-threatening’ and ‘earning a restraining order.’

A common sense heuristic for the timid is that if you go home sad that things did not progress at all, and you never got any form of negative feedback (this can be subtle, ideally it mostly will be, but it has to be there), you probably weren’t assertive enough.

Also, flat out, you need to flirt on dates (no matter who you are), first or otherwise, and as a man you want to be attempting a kiss at least reasonably often. Anyone who says ‘don’t flirt’ or tells a man ‘never kiss on the first date’ (or never initiate one) is giving you terrible advice and you should essentially ignore everything else they say not to do. They might be right on other points, but them giving you that advice is not useful information.

Some basics for those who need them, or who would find it helpful to affirm them, or have them made more explicit.

It doesn’t always work this way, it doesn’t have to work this way. When dealing with a particular person you can and should pay attention and stand ready to throw this all out the window if they want to play differently. But one should be aware that it usually does directionally work this way. Going with it tends to lead to better outcomes, and going against it usually means swimming uphill.

Matt Bateman: A story about learning a masculine role in relationships, that may perhaps be useful to people similar enough to me.

In my early 20s or so I had no real conception of differentiated gender dynamics.

I was raised to believe that gender differences were ancien régime constructs.

So one day I decided to sit down and think about it. For the first time in my life, I put my mind to considering: what is the courtship game?

Let’s assume it *is* a construct, purely a social artifact.

Still—what *is* it? How does it work? What is there to be said for it?

I pondered romcom/sitcom-level things such as “guy is supposed to make move, guy is supposed to propose, guy is supposed to ‘lead’; girl is supposed to signal availability/interest, girl is supposed to gatekeep”, etc.

After some thought I was like… yeah ok I’m good with this.

By “I’m good with this” I did not and do not mean “I think everyone Must play this game in this way”, or “this is Biological Fact”, or “exceptions are Bad and there are no reasons ever to take exception”, or “critiques of this game are all wrong”.

I instead meant and mean “I like this game, I’m actually glad it was established, it’s nice for me that most people play it, I’m myself happy to play it; this is a good vehicle for me to find and participate in rather important forms of meaning and fun”.

Maybe this is blindingly obvious to some people? Most people? Plausibly I’m idiosyncratically dumb. But for me it was a revelation.

The level of relevant description was not where I expected to find it. The things that most people talked about most of the time seemed irrelevant.

Matt Bateman (the one Jakeup quotes): Whenever it felt like I was waiting, or things were coasting or simmering for too long, or it was unclear how to proceed, I would now think: oh right, *I’m* supposed to *read the room* and *do something* here. That’s my role in the game.

Jakeup: this is a key point in a great thread. the classic moves in the game of courtship go like this:

1. girl sets the room and hints at possibilities

2. guy reads the room and issues invites to potential courses of action

3. girl picks a course of action to follow the guy

4. repeat

this game can break at any step, leaving both frustrated. a girl who doesn’t know what she wants. a guy who passively waits for instructions. a girl who’s stuck with a guy she doesn’t want to follow anywhere.

or when either of them invites outsiders into this private dance

it’s hard enough to read the inclinations of one person, scary enough to relinquish control and follow someone

if you further constrain yourself by insisting on maintaining a story that would stand up to criticism on social media or even just among friends, it becomes impossible

Matt is very wisely being extremely vague about what the moves of the game entail for him personally because arguing over specific techniques of seduction on twitter is *not* how you either learn the game or play the game

it’s why I spent the first month of Second Person explaining that my goal is to *unblock* readers to engage in their own practice of fucking around and finding out and that giving specific instructions actually stands in the way of that

Jacob expands this into a full blog post, framing modern dating as improv, in which there are no fixed hard rules but (for heterosexual pairings) the man’s role is generally to read the signals and the room, and make moves to advance the plot, including being willing to risk being explicitly rejected, while the woman’s role is to provide a room and signals to be read, approve or reject proposals, and help everything stay graceful.

But of course, none of that is in the form of rules. There used to be actual rules with actual people enforcing them, and now those rules are far more minimal – some things are actually off limits but you presumably knew about those rules already. If the situation isn’t typical, or typical isn’t working, you are free to switch the roles, take completely different rules (except for the big actually enforced ones, although even that can get weird these days), or do anything else you want.

And if they don’t understand how the game works or what their role is supposed to be, that’s fine, you figure out how they think the game works or how they want it to work, and you play by those rules instead. That especially applies when the man is kind of clueless, and you don’t want that to be a dealbreaker.

Billy Is Young has another thread of remarkably similar dating-as-a-male-101, and how you need to project yourself, and in particular not to attempt to present yourself as other than you are. Fully ‘be yourself’ is not always wise, but don’t be actively not yourself either. Don’t hold back your masculine energy or pretend not to be attracted.

Apparently there was a ‘predator sting’ in which a 22-year-old was invited to meet up with an 18-year-old, then the students berated him as a ‘sex offender,’ 25 students chased him and one student punched him in the back of the head. If you believe we live in a world where 18-to-22 is an unacceptable age gap and might get you chased by a mob of 25 people and punched in the back of the head, you’re going to have a much harder time dating.

Aella hints she may be available to tutor you to be irresistible to women. I don’t know if this would work, and yes you’d have to pay, but it might work and this does seem like it would at least be fun. It sure beats buying them Tinder credits.

In her case, she offers us some free instructions.

See? It’s easy.

Aella: All I want is a guy I just met to casually and confidently touch me on the arm or back, lean in with direct attention, unwavering eye contact, and then spend the rest of the night flirting with my friends.

I just want to slightly insult him and have him slightly insult me in return. I want him to be a little assertive, all the time, and completely comfortable with that.

I want him to be Schrödinger-fucking other girls, where he is simultaneously getting intimate with all other women but also has standards too high to sleep with anyone. I want him to enjoy me but be uncertain about me; I want to be unsure if I am good enough for him.

I want him to be completely, deeply, and unapologetically comfortable with himself and his desires. I want to somehow possess some rare jewel of a trait in my soul that he has been waiting his whole life to find.

Including this first level move, although you’ll usually need a different framing.

Aella: One of the hottest things a guy on a first date ever said to me was “I estimate 15 percent that I’ll end up interested in seriously dating you.”

All a girl wants is to feel like she has to work in order to catch a guy.

Joe’s AI Experiments: I work in the gambling industry and 15% is considered something of a magical number.

It’s high enough that people consider it possible, and don’t give up. They keep full emotional investment.

But 15% is rare enough that a win feels really special.

I’ve never heard the 15% claim but I’ve been sitting with it for a few minutes. It seems plausible in some settings, I can see it being a cool percentage chance to win a run for example, but in my sports betting experience going +500 seems like a bit much, although it also rarely is a natural thing to come up without a parlay.
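For reference, here is a quick sketch of the arithmetic behind that comparison. It assumes fair American (moneyline) odds with no vig, and the function names are mine, purely for illustration rather than anything from a sportsbook API.

```python
def american_odds_to_prob(odds: float) -> float:
    """Implied win probability for American odds, ignoring the vig (assumption)."""
    if odds > 0:
        # Underdog: +500 means risking 100 to win 500.
        return 100 / (odds + 100)
    # Favorite: -200 means risking 200 to win 100.
    return -odds / (-odds + 100)


def prob_to_american_odds(p: float) -> float:
    """Fair American odds for a win probability p with 0 < p < 1, ignoring the vig."""
    if p < 0.5:
        return 100 * (1 - p) / p
    return -100 * p / (1 - p)


print(round(american_odds_to_prob(500), 4))  # implied probability of +500
print(round(prob_to_american_odds(0.15)))    # fair line for a 15% shot
```

Under those assumptions +500 implies roughly a 16.7% chance, and a true 15% shot would be priced a bit longer, around +567.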

For dating, it makes sense. 15% chance of serious interest is still a hot date.

Always be closing.

A classic puzzle, inspired by a TikTok clip. A woman is invited back to a man’s place after a date, agrees but says ‘I’m not going to sleep with you.’ What does that mean?

It means you need to pay close attention.

Richard Hanania: What does it mean when a woman says “I’m not going to sleep with you” while also going back to your house? It means she probably will.

Anecdotally, this only seems strange to those under 30. Society has failed you, but I’m thankfully here to explain.

The most pathetic thing I’ve seen is men complaining about this. As if the job of women is to come up with a logically coherent philosophy instead of choosing the highest quality partners. Subtext is key to romance, it’s not an inconvenience to be regulated away.

Men with bad social skills don’t realize it’s their job to learn social skills. They believe women have to conform to what’s easiest and most convenient for them. They also want relationships without ambiguity or risks. Incel ideology is a cancer.

Rob Henderson: This could mean so many things. If, after a date, a woman accepts a man’s invitation to visit his home, and first says, “I’m not going to sleep with you,” this could, of course, mean that she has no desire to sleep with him.

But it could also mean something like “I’m going to share this information with you in order to gauge your response and if you behave weirdly, then I’m not going to sleep with you.”

It could also mean “I’m attracted to you and would like to sleep with you but I’m not feeling at my best so I’m not going to sleep with you tonight but want you to know I like you and trust you enough to be alone with you.”

It could mean “At this very moment I’m not entirely certain I want to sleep with you but let’s see how the rest of the evening unfolds.”

It could mean “I really like you and I’m absolutely not going to sleep with you but I have no way of forecasting how I’ll feel as we spend more time together.”

In the same way as you could imagine someone watching their figure entering a patisserie and thinking “I will absolutely not order a slice of their famous strawberry cake” and 20 minutes later find themselves savoring the last bite.

In the clip, the woman complains that she is the problem, because she did not want to sleep with him, but she wanted him to try a little, and he didn’t, so now she feels ugly.

Richard notes that a lot of younger men expressed great fear that they would be punished severely (as in life ruined) for judging wrong and going too far.

Any of these things could be happening. There is a substantial chance she does end up sleeping with you if you are down for that, and also a substantial chance she does not no matter how well you play. Your job is to navigate this ambiguity. Develop and use the relevant social skills, use ambiguous actions to see how she reacts, do the best you can. Understand that there is no perfect solution, you need to be willing to get it wrong in both directions and gracefully navigate both failure modes.

That is vastly harder if you have gotten it into your head that one move too far could ruin your life. Which in theory it could, but the chances of that happening (especially if no one involved is in college) if you act at all reasonably are very low.

Essentially, those men think their own sexuality is borderline illegal In Their Culture.

Sulla: Zoomers have an incredibly weird relationship with sex. On one hand, talk of sex and adjacent things has been completely de-stigmatized, nobody really cares or thinks its a taboo if you talk about it.

OTOH, acting sexual, especially for men, has become extremely taboo because of the possibility of making people “uncomfy” which must be prevented at all costs. Masculine sexual behavior especially has been made taboo – “safe horny” and “reddity horny” are okay. “Step on me mommy” etc. because its not “threatening.” They seek to turn the masculine man into a harmless femboy twink because the masculine man is “scary”

It’s basically a result of histrionic Zoomer behavior to minimize “discomfort” at all costs – “harm reduction” etc. Completely delusional, they need to grow up and stop being whiny babies.

Alaric the Barbarian: I’ve said it before and I’ll say it again:

As of today, white male heterosexuality is the most suppressed pattern of behavior in The Culture.

It’s effectively illegal.

Shamed at every turn, policed by HR-world doctrine into narrow venues of acceptability, constantly bashed in media and on social media, crafted into something new and nonthreatening via a full-court press of incentives.

There’s space allowed for everything but this.

Andrew Rettek: to the extent this is true, the “law” is only strongly enforced in some places, and never universally. It really sucks for the guys who have their social lives in those places, and can’t get an exception for themselves.

My model is the same as Andrew’s here. There are particular places and times in which being terrified is a reasonable response. The obvious response, if you find yourself in such a place, is to tread very carefully while there, not get stuck there permanently, and do your best to get your dating and relationships elsewhere.

Once you are not in such a place, you need to realize that you are no longer in such a place, and undo the paranoid adjustments you felt forced to make.

If she (27yo) screams someone else’s name during sex, what to do? If that someone else appears to be a 16-year-old boy cartoon character called Ben 10 (but also could be another Ben she is cheating with and then tried to ‘save it’ with the cartoon character?) then does that change your answer? Some advise leaning into it. Mostly I think people let this kind of thing get to them more than it should, and you should essentially bank the credits for when you need them. But that’s an outside view.

What about endurance, and how much of it is being fit?

Aella: Man i didn’t anticipate how much I was spoiled by having a long-time sex partner with incredible physical endurance. It turns out that it’s much easier to drop into sexual bliss when a part of you isn’t worried that the dude is getting tired.

Guys by endurance I mostly mean whatever cardio and muscles are required to keep going until I have an orgasm. Penis endurance is also nice but imo not as rare.

This includes jaw and tongue and back of neck ok.

What’s most interesting here is how much of the issue is presented as worry about endurance, rather than the actual endurance. That’s yet another way confidence matters. I’m also rather surprised by her observation that cardio and general muscle fatigue are the most common limiting factors here? I suppose it depends on how much endurance you need.

Aella’s studies report that watching porn is mostly positively correlated with predicting female sexual preferences, including the finding that more women like rough sex than men (of course check first, you can’t assume!). But she notes that anal is the big important exception.

Aella: On average, men who watch more porn, were more accurate in their predictions of what women liked in bed (according to ratings from the women themselves) This held both in my own survey and also a microtasker sample.

Anal is actually maybe one of the few exceptions here; in general, women are significantly more inclined to like rough porn than men are – but anal is one of the biggest gender gap preferences in favor of men. Way more men like anal sex than women do.

Fredrick von Ronge: It’s not the anal, it’s her submission to something she’s not into.

Aella: Actually generally no. ‘submission to something she’s not into’ is a more common preference among women than men.

How to make casual sex great again?

Aella: since this tweet i did figure out how to make casual sex good [chart excludes escorting]

the key for me has been

  1. stop being ashamed about things i want in bed

  2. start getting really selfish about what i want, give up on compromise

  3. aggressively communicate and filter for guys who are into what i want

  4. build networks full of these people

Turns out a lot of what makes casual sex bad was having sex with people who wanted me to do things i didn’t want to do, and me being too nice and not wanting to hurt their feelings so i played along.

That must be quite the network, given her other statements that this means filtering for guys who are inherently into things guys are rarely inherently into. The sex might be casual, but the logistical operation and interview process are anything but, although presumably worth it.

I presume most people need to do much less aggressive filtering than this, and would be happy to do a bunch of compromising within a reasonably wide range, but should absolutely speak up far more about what they want.

This sadly does seem to be a reliable format for Peak Engagement on Twitter, but also seems like relevant information in this case.

Austin Allred (6.3m views): I fundamentally don’t understand OnlyFans. The scale is insane.

Who is paying for this? Why?

What I’m hearing is it’s not just about the porn necessarily, it’s about having paid “relationships” of some sort with the creator, which makes a lot more sense to me.

Makes me sad, but it makes sense.

Before I thought it was just the appeal of porn for non porn stars, which would make sense but break down at scale.

But sexual virtual fake paid relationships? Yeah that makes so much sense.

And is very sad.

Aella: Most income, as far as I know, comes from messages, not from basic subscriptions. It’s essentially interactive pornography, sexting with a hot girl where you masturbate together. That’s where the big money is.

Mason: OF semi-successfully repackaged something akin to actual cheating as porn, and it turns out there’s a huge market for cheating that societal norms haven’t caught up with.

The demand for cheating has been there since forever, so it’s weird to say societal norms ‘haven’t caught up with’ it yet. And I’d say it’s more that there’s demand for attention; I’d presume that single men use OnlyFans more if you hold other conditions constant, rather than less.

I think this all seems less sad rather than more sad, given the alternatives, if we hold the amount of money and time spent constant? At least you do get some sort of parasocial relationship, some amount of interaction. Although it also does seem more damaging to relationships.

If that appeals to you, Aella discusses how to succeed at OnlyFans, with a lot of distinct connections, versus as a cam girl, where you are aiming for 1-2 whales that like to win a dominance contest in front of other men. OnlyFans is about the illusion that you’re the only guy. The money in OF is in upselling via DMs, which (of course) are typically handled by agency-hired minimum-wage workers in warehouses, and agencies often charge 50% or more (on top of the OF 20%) for this and other services.

I presume AIs will replace those jobs rather soon, which greatly reduces marginal cost and also turnaround times, and presumably thus alters the business model.

The new dominant play is apparently ‘drips’ where you have a sequence of clips with escalating price tags, which you pretend to do in real time but you don’t have to pretend that hard, the men don’t notice or care. They want a minimal deeply uncredible version of the second level symbolic version of the thing – the conceptual indicator of a personal connection that would imitate an actual personal connection.

She also notes that by not doing internal discovery, OnlyFans forces creators to advertise elsewhere, which got so aggressive that various places (even Fetlife) got pretty hostile towards all the posting. This is a levels of friction situation of the type we’re going to see a lot of with AI – OF reduced frictions to doing the OF thing, so suddenly the previous levels of friction outside OF didn’t deter people enough, and if you didn’t ban it the level of tits-in-your-face was out of control.

It’s like there was this great business model lying around the whole time, that any (sufficiently hot) woman could use – spam the internet with hot pics, recruit men, charge a subscription for some sexy content, charge for individual interactions and marginal content. And the secret to unlocking this by removing the friction, and also to earning 20%, was just to take care of various basics on the backend?

Aly Dee asks, isn’t OnlyFans a bad deal, versus finding one ‘kind’ rich fan and putting a ring on it, given the prospects of a young woman who can succeed at OF if they’re willing to date older, and how easy it would be to be intentional about this? The obvious answer is that no, that isn’t obviously better depending on what you want, especially given the commitments involved, and also isn’t so easy to get given the adverse selection problems.

In other OnlyFans news, this is a real way people are reacting to real news?

Max Tempers: 🚨 NEW: OnlyFans is now accessible in China, a move that could boost the UK economy significantly.

A Labour insider told me ‘This is exactly what Lammy’s progressive realism is about!’ as he attempts to justify the recent overtures to the East made by the government.

I have nothing against OnlyFans, but if this can ‘boost the UK economy significantly’ then that raises further questions. So many questions.

Are your preferences bad? If so, should you feel bad?

As in: Paper asks, is it bad to prefer attractive partners? No. Next question. Paper disagrees and claims there are strong philosophical arguments for both sides. The argument against seems to be the fully general anti-discrimination argument, that says humans are not allowed to express preferences, or to prefer better things to worse things, unless they have some special moral justification. And, yeah, no.

Mate preferences differ a ton across individuals, and the gender-based differences look relatively small, but taken together if you know someone’s preferences you can guess their gender with 92.2% accuracy.

Aella looks at preferences by examining the relative prices of female escorts with various physical attributes. Nothing is too surprising, but you learn about which things have bigger magnitudes of impact. Even the things that mattered don’t seem to have that big an impact on price, to a degree that makes me a little suspicious.

Aella also notes that as she charged higher prices, client quality improved, in particular there were far fewer assholes:

Aella: I’ve always liked about 80% of my clients, but now that I’ve raised my rates even higher, I think I like… all of them? I work less often, and mostly just for fun, but I think every single man I’ve seen in the last year has been pretty cool, and I feel warmly toward them.

When I first started, my rates were a lot lower, and most of the unpleasant clients I remember ever having occurred in those early months. Not that wealthy people can’t be unpleasant, but it’s more like, unpleasant people are less willing to pay a lot of money.

I bet that charging more also makes the same men act less like assholes. Consultants know that if you don’t charge enough, no one will respect or listen to you, so not only won’t you make much money, you won’t be able to do the job. When you charge a lot, people who do pay doubly respect you – you assert you’re worth that much, and also they agreed to pay it. There’s also the selection effect, of course, where assholes are shopping as cheap as they can.

A similar principle holds for regular dating. It doesn’t have to be money that acts as your asshole filter.

Strategies that are very hard to do, especially before having done them, but that work.

Sasha Chapin: I have become a much more effective person over the last two years via living with an extremely effective wife

If I had to break down what has changed, I would fail—I think effectiveness is a style that you can learn to mimic, like a tennis swing, more than a set of principles

How one finds a highly effective wife (or husband), without first yourself being highly effective, is the mystery. But yes, absolutely, being around effectiveness, hard work and high standards will rub off on you a lot. The need to be worthy can’t hurt either.

This also applies to everything else. Seek out those who have qualities you want to have yourself, and avoid those with qualities you want to avoid.

Here’s a preference.

Olivia Rodrigo (the pop star): This is a very oddly specific question that I ask guys on first dates. I always ask them if they think that they would want to go to space. And if they say yes, I don’t date them. I just think if you wanna go to space, you’re a little too full of yourself. I think it’s just weird.

Crybaby: guess her type is down to Earth.

If that is her motivation, that is a good thing to want to avoid (especially in her position, since I imagine a lot of men who dare try and date a pop star are rather full of themselves) and this is potentially a good question to ask, but you have to pay attention to exactly how he answers. If the answer is ‘sure, if given the opportunity, of course I’d go’ and you turn them down for that because they’re too ‘full of themselves,’ then you fool. If the answer is ‘yes, I’m actively trying to go to space’ then sure, maybe that’s not what she wants.

Here are some other claims about preferences that sure sound like a trap.

Anna Gat: Being a stable and reassuring man is the sexiest thing you can be for a woman – our nervous system reacts immediately and deliciously.

If you’re trying to court someone and don’t know what would work, this is what would work! You’re welcome 👶🏾👶🏻👶🏽👶👶🏿👶🏼

Sarah Constantin: the most valuable, in-demand person in the world is someone who is Fine.

Now, that’s easier said than done!

But if you do happen to be Feeling Just Fine, Thank You, don’t worry about any of your other deficiencies. You are a catch, just for that alone.

Misha: why are you saying an obviously false thing like that.

Sarah Constantin: Because it is true in my experience.

Misha: How much experience do you have dating as a man?

Being a catch is not the same as being more likely to be caught.

I do think that these characteristics are valuable and worth pursuing for other reasons, and are underrated as male dating strategies, but for it to work you need to get into opportunities to demonstrate this style of value.

This won’t get you in the door. Merely being stable and fine on your own in the abstract is great down the line, but you still need a way in. This can’t take the role of ‘the thing that is attractive about you.’

Also fitting the above pattern: Women like kind men. This is a well-known robust result in evolutionary psychology. However, men have the strong perception that if you want to end up with a woman, being kind too early is a poor strategy.

To put this all in someone else’s terminology, and hopefully make it clearer, we’re going to have to pull out the visual aids, as I realized while editing the post that we’ve been effectively talking about the two different axes on the latest men and women ranking scales chart. It’s a fun one, with lots of detail. As usual, take it the right amount of seriously and literally, which is neither super high nor zero.

I have many quibbles with this even as a ‘baseline scenario.’ But I like that this is the quirky perspective of a particular person, who is clearly describing what they observe. And it emphasizes that, mostly, Good Things are Good, and that everything counts.

The discussion at the link, mostly unrelated to the graphic, is the latest iteration of the ‘the dating market broke because the men who can get multiple offers solved for the equilibrium’ argument, where without enforcement of various traditional social norms things collapse into dynamics that aren’t good for most people involved, where men have little felt incentive to commit and women have little leverage.

Even if you do want to spend your 20s seeking marriage and kids, that becomes very difficult, especially if you are unwilling to break with the mainstream social scripts around dating.

Also it seems there’s a part two? Which resonated a lot less, and feels like it says a lot more about the author than anything else, but seems fun so sharing anyway.

There was much talk about this Reddit thread, reminding some of this other thread.

(As usual, the story might be fake but the hypothetical and the reactions are real.)

My boyfriend and I are both 28 years old and together for 2.5 years. Yesterday night we were drinking and one thing led to another and I tried to compliment him by saying he is not someone who I would hookup or be a fwb with but marry.

I thought everything was fine but he seemed extremely distraught after that. I realized how he understood it and tried to clarify it but he is still the same this morning.

He told me he needs space to think for a while and left the house. All my friends tell me I messed it up and guys tell me it’s not a compliment and most men will understand it differently. I think I destroyed our relationship and I am panicking right now.

Bern the Fallen: Now I know why this is ringing a bell it’s like the TOMC “I broke my wife and I don’t think it’s fixable.”

Story where a guy basically says the same kind of “compliment” about his wife and she walks. It’s the same sentiment but many aren’t comprehending it.

Can’t lie I love when there’s a gender flip on a situation, it can help people see it differently.

This shouldn’t be seen as a ♂️v♀️gotcha but an understanding of what the other person feels receiving such a compliment that comes off as backhanded due to the qualifier/comparison +.

Misha: I hope everyone has learned a lesson about how to not give a compliment to a guy. I think there’s too much unstated context for us to conclude things about him or their relationship here, but is there ANY context in which it improves a compliment to start off by saying you don’t want to fuck someone?

Rat Bastard: Remember that time Aella posted about a guy she was dating saying she’s “not that pretty” remember how all the replies suggested that like, she was being emotionally abused or something.

The original poster’s boyfriend is wildly overreacting, but perhaps he is simply too attuned to how women declare war, lol.

Women think it’s a compliment, actually, because they are focused on the “marriageable” aspect, and I think they are right that it is unfortunate that many men nominally want to get married but legitimately do not really care about being marriageable.

My “insult everyone” take here is that most men don’t NEED to care about being marriageable and they know it.

If you won’t fuck them, you won’t date them, which means you wont marry them so it’s all moot anyway, is the vibe.

It makes sense that these kinds of comments can be unfixable dealbreakers. The information can’t be taken back, and potentially colors everything.

Danielle Fong (referencing the chart in the previous section): basically, the girl is saying “you give me great investment” (husband, friend-zone quadrant)

But you are not hot enough to be a prince charming, a situationship, or the bad boy, you are not on the left column.

Now, depending, the guy can be fine with this ig, but, say he is putting up an unsustainable amount of effort (investment high) — this would not be a good sign. Drop off for a bit and you’re settling or worse. And “apparently” nothing.

This is stupid. I can tell you; work out and you’ll be getting the attention. Guys who can’t be hot are just being lazy. Skill issue, really.

Also, why is the guy so sensitive that he can’t take what is intended to be a compliment? Snowflake ick type thing. Second degree type two ick imo.

Malcolm Ocean (also referring to the chart from the previous section):

What she says: I would marry you but not hook up with you

What he hears: you’re disgusting, a sweeper, I wouldn’t even want someone to see us together

What she means: you’re charming/husbandzone so I don’t want to get attached if this isn’t going anywhere

‘Not being hot is a skill issue’ is a bold take. It’s not entirely wrong, especially if you go beyond exercise into various other areas, there is usually a lot of room for improvement. But a lot of it is not a fixable skill issue, especially for being hot to a particular individual person, and once impressions have solidified. Also, that’s a lot of additional investment, if you don’t otherwise want it.

My quick model is that I see there as being three distinct problems caused by this.

  1. Feeling unattractive and unwanted really sucks. So does being with someone who might well stop wanting to have sex or sees it as a cost rather than a benefit, especially over a longer term once things are locked in.

  2. If you are insufficiently attractive, then you have to compensate for this with other investment. It makes it more likely she’ll be unhappy long term. It weakens your effective bargaining power and you have to worry you are not secure in the relationship.

  3. You have to worry a lot more that she’ll look for or find someone she thinks is better, and either have an affair or try to upgrade.

If she means what Ocean thinks she means, and you’re ‘too good’ to only hook up with for risk of getting attached or what not, then she’s communicating quite poorly but has the opportunity to clarify and save it. The whole meaning can be turned around. And indeed, even if that isn’t what she meant but she is willing to lie to save the relationship, this is the best lie available.

I think most of the time that’s not what she’s saying.

Obviously successful pairings happen all the time with attitudes like this. Most successful pairings don’t involve maximum baseline physical attractiveness before growing into that, and if they did then that means everyone is paying way, way too much attention to looks. You still have to be very careful how you say that.

Indeed, one of our big problems is exactly that we don’t give not-maximally-physically-attracted pairings situations where they are set up to find each other and then succeed in spite of that. Instead, we do the opposite, we tell people and especially men that if they aren’t sufficiently attractive, they will never get the opportunity for the rest, and also will be in constant danger of losing everything.

Game theory of Jane Austen regarding dating strategy. Fun for those who were forced to endure her in school, but nothing most readers would regard as new.

I don’t know if there is actually a pattern of those claiming to be ‘29 year old boss girls from TikTok’ having public meltdowns about failing to find a man despite their otherwise amazing lives.

I do know that the one here is complaining that people are telling her she is wrong, and she is tired of waiting for her soulmate to suddenly appear that ‘matches her energy,’ and yet she says she is not asking for a lot. I know that she says that ‘all her friends’ have their finances and husbands ‘that they’ve prioritized,’ which is evidence against this being so impossible, and perhaps that she made different choices on what to prioritize. She is clearly feeling entitled to a soulmate.

Mason: I think the truth is that being emotionally available and able to progress a relationship with a fellow imperfect human being without a neon blinking sign from God is actually a skill.

Which many people do not have and which they do not realize they could have.

Here is a woman who more credibly reports trying, for years, yet finding no one.

Signull: Forget the pandemic; there is a literal epidemic of women who cannot find anyone they want to be with—it feels like we are at the precipice of a radical new cultural reality that is changing so fast that most people do not have the capacity to keep up.

Katherine Dee: A big part of this is a more generalized loneliness. Notice how in each of these videos they lament not having a community—it is not solely about romance, and it is cruel to omit that detail.

It is noteworthy that in the first clip, Katherine is wrong and the woman mentions all her friends rather than complaining about lack of community.

Perhaps not the central point, but: The actual story was about her going out to a comedy show. She seems to not have known what it means to sit in the front row. It often means you are going to get absolutely roasted. Instead, she got a free gift bag and was praised for being brave and singled out. Then she went home rather than have a drink, because she had this gift bag.

You fool! This was actually a great situation. An entire room full of people heard comics call you brave and draw attention to you. This is exactly when you go to the bar. So what if you have a gift bag. Good chance you get approaches, you have something to talk about, that is exactly what you came out for.

Another problem is, what if dating no longer gives you even a little excitement, even if you’re not going on that many? The obvious answer is ‘find better people to date, that actually excite you’ but that is not easy. Neither is ‘find people who are unpredictable and liable to say interesting things,’ or even ‘find activities that are inherently exciting even if your date isn’t.’

Presumably a lot of this is a lack of the dance of ambiguous escalation?

Rob Henderson: Two different young male friends, upon reading this lively passage from @GlennLoury ‘s incredible memoir, reacted with some variation of “I have never related to anything more in my life.”

Anna Gat: Without this we would be soooo bored 💄

If your strategy involves moving away from the dance, if it resolves the ambiguity too easily, then there is great risk that what remains is not exciting or fun.

I never went on enough dates, or rather never had enough unexciting date opportunities, to have this problem. My presumption is the right strategy is to think, I now have an excuse to go out and meet someone new, if I’m not enjoying it or seeing much value by default I get to mix it up, and I get to have a gambler’s mindset that if it works it’s pretty great so I can afford a lot of uneventful along the way. Or maybe… just don’t be all that excited, and that’s kind of fine?

‘Act like you’re on Bachelor in Paradise except without the cameras’ is remarkably close to good advice. Easy to say, hard to act on it. Move fast and break things, in particular fail fast, and treat every relationship as either headed for an engagement or not worth pursuing further.

Nick Cammarata: I hate how well asking myself “if I had 10 times the agency I have, what would I do” works.

Chris Lakin: If you were serious about dating, you would be doing everything you could to break up as quickly as possible. Most relationships do not work out. If you are dating to marry, that does not require three years to figure out. One year, maximum, to determine if the relationship is doomed.

thinking about running an event where couples come stress test their partnership. “is your relationship doomed? come find out!” lmk if you have ideas. Matchmaking is out, Breaking Up Sooner is in

“second date should be a 3 day trip to Montreal together”

Emmett Shear: The people I know who are happily married almost universally were trying to make every relationship they were in work for the rest of their lives. They also quit as soon as they know it won’t work out that way, but inevitably it’s better to err on over persistent than under.

There is no case where I wish I’d followed this advice less, and several where I would have benefited from following it more.

The counterargument is that doing this is difficult and painful. Fair enough.

The other counterargument is that being in a long term relationship teaches you things you can’t learn other ways. But from what I’ve seen, often you then need to unlearn exactly those lessons.

Mostly it seems super wise, the moment you can tell it’s not going to work out, to act accordingly. That doesn’t automatically mean ending it right away, fun is valid, but if you’re out of the super fun period, it kind of does mean that.

Matthew Yglesias: If you’re paying attention, it’s pretty easy to tell if things aren’t going to work out.

More precisely, I would say there are a lot of cases where you can’t tell and it might work out, but yes there are many that are pretty doomed and it’s obvious early on, so don’t pretend not to notice.

(Definitely one for Remember to Reverse Any Advice You Hear and even more than usual I’m not endorsing the quoted text.)

Of course you have no available LTR options you like, if you did you’d have an LTR.

It’s a matching problem. Anyone medium term unmatched is not going to have easy access to matches they want. That might or might not mean they have unreasonably high standards.

rebecca: why is no one talking about the female loneliness epidemic??? hellloooo

Kangmin Lee: “Female loneliness epidemic.”

Allie: Most women have 100 options for casual sex and zero for serious relationships.

Casual sex does not make you feel less lonely.

Hoe Math: You mean “zero options that you like,” right?

Like, if you start with the men you like and ignore everyone else, you have zero options for a long-term relationship.

But if you count the men you do not like, then you have many options, right?

Ami: Are you supposed to date people you hate and find annoying? Genuine question.

Hoe Math: No, but do you remember when you were younger, perhaps 11, 12, or 13, and you liked boys who did not possess any of the traits you now look for in men?

Do you remember being perhaps 16, 17, or 18, and being impressed by guys who had an apartment and a car? Even though you would never think that is sufficient anymore?

Can you see how the more experience you gained, the higher your standards became?

Well, because of how modern society encourages women to be “liberated,” they are gaining more experience earlier in life than ever before.

That means they are “moving on” from men who would have been acceptable to them in a time when things did not progress so quickly.

It used to be common for women in their 20s, 30s, and 40s to see ordinary men doing ordinary things and think, “Perhaps he is single,” with only a slight tinge of attraction, a curiosity.

Now, nearly all women fully expect to be captivated from the first moment. They want men who have everything going for them.

What this means is that women are overlooking men who are truly on their level more than ever because they feel they are “above” those men.

When you are 16, he is so cool because he has a car and his own place. When you are 32, “So what? I have a car and my own place, too, and they are nicer than his!”

The artificial pressure placed on women to enter the workforce and the “liberation” movement have caused women to cease being impressed by ordinary men.

So no, you are not supposed to force it and date someone you hate. . . you are just not all supposed to be so bored by the average guy.

Ami: I love this answer so much. ❤️🥹

Everyone has at least some non-standard preferences, so you can reasonably hold out for a much better than random match given your general market value, even if you only do an average amount of search.

But that only goes so far, especially if you mostly want generically desired attributes and don’t have excellent search methods, and there are preference mismatches at the population level. The remaining market is going to at best suck, and potentially break down entirely.

This book review of How Not to Die Alone distills a very clear explanation of why dating advice for women is so typically unhelpful. They keep repeating the same three pieces of advice.

They’re all good advice, but neither complete nor usually all that actionable.

Jacob Falkovich: I used to joke that women only ever get three pieces of dating advice:

  1. Don’t be ugly.

  2. Don’t be insecure.

  3. Don’t try to marry the fuckboi.

The first is delivered in private, or in glossy beauty magazines whose covers strongly imply that any man who reads them is gay. Popular dating advice books generally limit themselves to the latter two. One could think of other advice books could offer unrelated to insecurity and fuckbois. For example: that they could figure out what men want from them and do more of that. But books generally don’t, for two reasons.

First: telling women they need to fix themselves and/or to pay more attention to men goes against the spirit of our time. This sort of advice would feel entitled coming from a man and a break of solidarity coming from a woman; maybe an enby would get to it some day.

Second: telling women they’re fucking up and need to change goes against rule #2: don’t be insecure. And since that’s the main piece of dating advice women get, you’d be stupid to go against it.

Jacob then goes on to absolutely savage the ‘science’ that the book in question (How Not to Die Alone) is based on, while noticing that the advice is perfectly respectable and mostly seems right, but generic and in line with general expectations.

I actually think the central theme of much of the advice here seems to fall outside the three categories above, while also being good advice, which is:

  1. Engage in deliberate practice, act intentionally and follow through on decisions.

The book knows better than to say anything is your fault. But have you considered making better, more deliberate decisions, and journaling to record what you learned?

Whether or not it is useful to think of things as ‘your fault’ depends on how you react to that. If it helps you improve and learn and have hope because you can fix your fate, great. If it makes you insecure and afraid and hating yourself and you stop leaving the house, then that’s not great.

It’s weird that the book is doing that while also explicitly holding the reader blameless.

Jacob Falkovich (summarizing the book): Your problems are common, which means you’re normal. Your problems are a worthy subject of study for the world’s most prestigious researchers, which means you’re important. And the researchers have found that every single problem was caused by things outside your control, which means you’re blameless. You are normal, important, and blameless; there is nothing you need to fix.

Jacob (providing advice): But if you aren’t, thinking of yourself as a fuckup who’s constantly improving feels much more secure than the opposite. It makes every rejection a positive — an opportunity to learn! And every positive development is a validation of the progress you’ve made, which can only continue, as opposed to being about some default desirability you’re clinging on to.

Even if you’re a millennial, it’s not too late to not be yourself.

Cremieux: There are suspiciously few men who self-report being 5’11” and too many men report being 6′.

Eliezer Yudkowsky: I was 5’11” when measured and it had literally never occurred to me to misreport this.

Nat Rhein: As a 6’0″ I am now considering calling myself 5’11” because I’m more concerned with passing low-level “liar” filters than low-level “short” filters.

The funny part is this isn’t for a dating app. This is for a CDC survey. So the dating-related rate is presumably a lot higher.

Except: Height does not correlate with the chance men are married or have a child.

Aella: I really don’t get why women want tall men so bad. Sure it’s a little nice but not *that* nice. the actual important thing is can he easily throw you around in bed? Height doesn’t matter when you’re screaming.

Tall is overrated and overvalued, I believe largely because it is easy to notice and measure, and the most legible to others. If you seek the tall man, you are ‘overpaying’ for it. So to the extent the dating market is ‘efficient,’ unless you have a relatively very strong height preference you should be sacrificing height to get more of other things you want.

Lyman Stone: Yes, we should incentivize people to get married. Marriage bonuses are an unalloyed good. There are zero social harms incurred by doing this.

First of all I feel like making your policy angle “trophy wives of wealthy men is a bad social outcome” is probably not a winning line in DC to begin with.

But secondly, we all find the sugar daddy dynamic icky— but is it actually less icky if the woman has no legal rights?

Marriage may actually provide her with some rights and protections. As a girlfriend she has far less.

Furthermore, marriage may induce the man to alter other behaviors in prosocial ways.

So yeah, we want them to get married.

C’mon @PTBwrites, don’t chicken out on marriage penalties! This line is silly. It’s perfectly fine, indeed actually good to eliminate marriage penalties in a way that incidentally generates marriage bonuses! We want marriage bonuses too!

Let’s just make this super clear though: My actual view is the French have this right. All tax brackets should double when you get married. They should multiply again for each kid you have. Married+4 kids earning $100k? You should be taxed like you earn $100k/6=$18k.

Robin Hanson: Well obviously not ALL of us find the sugar daddy scenario icky.

Lyman’s full proposal is instantly-reverse-the-fertility-crisis bazooka-level big. I don’t know I’d go that far, but on any realistic margin movement towards it is great. At minimum, we need to eliminate marriage penalties and have at least some marriage bonus. We want to encourage marriages.

Having one spouse support the other is fine and good, we shouldn’t punish that.

The ‘sugar daddy’ scenario is icky to many (not all!), but as Lyman says, a marriage if anything reduces the ick level and the power imbalances involved. Given sugar daddy is happening either way, sugar husband is an upgrade. You might prefer to have neither, but is that the primary effect you’re getting by punishing the marriage?

This is an interesting divergence, but it’s also a highly mislabeled diagram:

Brad Wilcox: Marriage-minded conservatives have largely stuck with the Republican Party (even under Trump) “because the Democratic Party has not provided them with a credible alternative, having moved hard to the cultural left in the wake of what @mattyglesias called the ‘Great Awakening.’”

The question was whether “extramarital sex is always wrong,” not whether an “extramarital affair is always wrong.”

Part of the cultural divide is arguing over whether there is a difference.

Lyman Stone: this is a WILD split after the politics of the last 8 years!

Nonmonogamy is adultery. The fact that many liberals approve of adultery because they have found a relabeling of them that helps deal with cognitive dissonance doesn’t change the fact it’s adultery.

I very much think there is a difference.

Aella offers a thread of polyamory lessons learned, with clear themes.

  1. Be honest. Be open. Share everything. Don’t suppress anything. Only way it works.

  2. Do not worry about what is ‘reasonable.’ Make choices.

  3. Everyone must be fully bought in and not even ‘open to’ monogamy.

This is asking a lot. Which is good, if that is what it takes to make polyamory work. You have to ask for what would actually work, not what sounds nice (see also: AI alignment and everyone not dying, etc, sigh).

It is highly plausible to me that there is a small minority for whom this all comes relatively naturally. And that for them, if they practice it with discipline amongst themselves, this particular equilibrium can work better than the alternatives.

However this is very different from what is suggested by most polyamorous people I have met. Most such folks are making the case that polyamory should be a default, and are suggesting various things Aella warns do not work.

Here’s what happens when you don’t heed point three:

hazel: sometimes you can tell a polycule is a competition to be the one who gets chosen when the Main One decides to go monogamous

Snufkin: As a poly person who attracts monogamous people….this dynamic is the gremlin that follows you around at 200 paces back wearing a hat that says “no, it’s cool, I understand” while looking sad as hell

Aella: This is why I refuse to date anyone who is open to being monogamous.

Aella then tells this composite story, where your partner meets someone else that’s monogamous but willing to try poly, and there’s no plan but they end up falling for the other person, and then leaving you to be monogamous with them. Which does seem rather common, based on the experiences I know about, and it all makes sense.

That sets a high bar, with poly wanting to consist only of people who are fully committed to it, which Aella says she was the moment she heard about it, but presumably most people can’t possibly be confident until they try it out? That was constantly the actual argument for trying it out, that I used to hear all the time in San Francisco, and this is exactly the opposite.

The obvious problem is: Under this framework, everyone involved in your polyamory must be ‘all-in’ on it, and not even be open to monogamy.

But how can you be all-in without experiencing it first? I would assume most people, even if being all-in on polyamory would ultimately be right for them, won’t be able to know this in advance.

It is not this simple, but I believe this is directionally correct.

Allie: I will scream until I am blue in the face: People do not cheat because you are not attractive enough.

If you were not attractive enough, they would not have dated you in the first place.

People cheat because they have personalities that cannot be satisfied, which is why they will remain unhappy.

Shoshana Weissmann: This, extremely this. Such people can grow and stop cheating, but dating a cheater is a real risk because of that.

From what I’ve seen, by far the biggest risk factor for cheating – for every meaning of the word cheating, not only sexually or within a relationship, both within a particular cheating format and for cheating in general – is prior cheating or otherwise being the type of person that cheats. It is an indication of who they are, and it can become part of their identity. The reverse is true as well.

Not all such actions are created equal. The circumstances still matter quite a lot, including evaluating past circumstances to predict implied future cheating risk. Details are important.

Attractiveness does matter too, especially in relative terms and not only in terms of physical attraction. If you’re dating out of your league you are taking on risk. But I think that is a bigger risk factor for them leaving than for cheating.

A key problem in our civilization is that it is legally dangerous to say anything negative about anyone in a documented way outside of certain specific bounds (e.g. leaving online reviews of products). Then again, one must consider the alternative.

Allie: There’s now an app where you can review your ex boyfriends and I can’t see this going well. Yes, warn girls you know, especially if a guy is actively dangerous. But if everyone trashes their exes online, no one is ever going to date.

Gilbert Kitchens: I’m sure there won’t be *any* exaggerations and lies told by spiteful exes.

Shoshana Weissmann: In the past they’ve also been shut down for legal reasons. And sharing this stuff can open you up to lawsuits

Allie: Has this happened with “are we dating the same guy” groups yet?? Those get NASTY.

Shoshana Weissmann: I think there might have been a lawsuit! YEAH someone had me join one at first and it was BRIEFLY helpful and then just became like “this guy is weird” but no real reason.

Tim Newman: They start out to warn women about violent, dangerous men and quickly get swamped with women writing about men who were mere assholes or just bad on a date.

In theory of course Tea (4.8+ on the App stores but the reviews I read make me rather suspicious in various ways) should be great and net positive for our romantic prospects via reducing uncertainty and Conservation of Expected Evidence.

Tea says it lets you run a background check, reverse phone lookup, reverse image search, criminal record lookup and sex offender search, including trying to figure out if the guy is already in a relationship. Not only does filtering out bad apples get rid of the bad apples and let you accept more marginal other dates, it also improves the dates you do go on because you can trust things more.

What about ‘reviews’ from exes? The same things should be true, if we take reviews as given. If you’re properly calibrated, you should on net come out more excited, and also have more information to help things go well.
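To make the Conservation of Expected Evidence point concrete, here is a minimal Python sketch. All the numbers are made up for illustration (an 80% prior that a given guy is fine, a 10% chance a fine guy gets flagged, a 60% chance a bad one does), and the function names are mine, not anything Tea exposes. The point is only that for a calibrated user the expected posterior equals the prior: a flag drags the estimate down, a clean result pushes it up, and weighted by how often each happens the two balance out.

```python
def posterior(prior: float, p_signal_given_good: float, p_signal_given_bad: float) -> float:
    """Bayes' rule: probability the guy is fine, given that we saw the signal."""
    p_signal = prior * p_signal_given_good + (1 - prior) * p_signal_given_bad
    return prior * p_signal_given_good / p_signal


def expected_posterior(prior: float, p_flag_given_good: float, p_flag_given_bad: float):
    """Return (expected posterior, posterior if flagged, posterior if clean)."""
    # Overall chance that a lookup turns up a flag.
    p_flag = prior * p_flag_given_good + (1 - prior) * p_flag_given_bad
    if_flagged = posterior(prior, p_flag_given_good, p_flag_given_bad)
    # A clean result is the complementary signal.
    if_clean = posterior(prior, 1 - p_flag_given_good, 1 - p_flag_given_bad)
    expected = p_flag * if_flagged + (1 - p_flag) * if_clean
    return expected, if_flagged, if_clean


# Hypothetical numbers, chosen only to illustrate the identity.
expected, if_flagged, if_clean = expected_posterior(
    prior=0.8, p_flag_given_good=0.1, p_flag_given_bad=0.6
)
print(f"flagged: {if_flagged:.2f}, clean: {if_clean:.2f}, expected: {expected:.2f}")
```

The asymmetry is the useful part: most lookups come back clean and nudge confidence up a bit, while the rare flag knocks it down a lot. A user who only ever updates downward is the miscalibrated case.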

The first obvious danger is that an ex could have it out for you, and there will be false positives here, but the alerts should be much better than random. A lot of the negative reviews are from not-crazy exes, and it’s not entirely random, shall we say, who ends up with crazy enraged exes. The accuracy rate doesn’t have to be that high to still be net positive, if everyone is reacting reasonably.

The second obvious danger is poor calibration. You don’t want Tea users to only or mostly update negatively on such reviews. There will doubtless be some of this, it’s unclear how much.

I’d also note that this likely constitutes positive selection for the men – the women who are now more positively inclined will tend to be the ones you want to date. Good.

Then there are the incentives, and how this changes dynamics while dating. How much do interactions change when the woman may be your future ex writing a tea-spilling future review? Some amount of this is good, since it rewards staying on good terms and treating her well. This can also be a threat, or held over your head, and have some decidedly nasty second-order effects.

My guess is that while things like Tea are not used that often, this is all clearly good, but that if this reached a critical mass where there was too much negative selection risk out there for women not to use such tools, then the fact that all the false positives and unfortunate situations correlate (e.g. everyone you want to date is seeing the same info, and that can ruin your chances in general, and this can be used as a threat) makes things a lot less clear.

Here’s Sgt Blackout thinking he’s solving for the equilibrium and failing, via a combination of objectification and then taking the 0-10 scale and completely butchering everything related to it on multiple levels at once, including by conflating a hotness-only-kind-of-offensive-objectification scale with an actual-human-including-personality scale, and trying to condense two dimensions down to one by pretending they correlate way more than they do.

If you do talk with numbers to rate anything, in any context, you always have to be clear what the numbers refer to, and what those numbers are leaving out.

It is obviously correct, however, to keep an eye on the personality distinction he’s pointing at underneath all that, about a personality type of ‘I am the hotness and get to act like it’ that definitely exists and is mostly to be avoided for most people reading this, even if they’re right.

Wisdom about the 0-10 scale:

Felisa Navidad: When men are debating online whether some woman is an “8” or a “10” or w/e, I interpret it as basically the same kind of thing as when they are debating who would win in a fight, Batman or Superman.

BDSM and kink have gotten steadily more prevalent.

If you are looking for ways to give yourself more value on the dating market?

As I understand the situation, this very much is one of them.

  1. The involved population tends to be relatively interesting in other ways.

  2. The social conventions of BDSM spaces make many things much easier.

  3. Huge supply and demand imbalance. Submissives greatly outnumber dominants.

  4. Most dominants do not put in the work to be good at it. You can.

  5. It takes remarkably little work to quickly get relatively good at many aspects.

  6. That work is largely technical skills, very compatible with geeking out on it all.

  7. Many have highly particular preferences. If you can satisfy them, that’s huge.

  8. Most dominants do not treat submissives well. Or listen carefully. You can.

Being down for more things, and knowing how to execute on them properly, is a kind of low level dating superpower. And it is one you can learn. It also often helps with confidence.

You would of course also want to figure out which aspects you can actively enjoy, and which you cannot, and act accordingly.

Aella breaks out some conceptual subtypes here. Note that the darker and more ‘hardcore’ stuff tends to be less popular. Most of the demand is for relatively light aspects that don’t require being all that actively kinky.

She also notes that different sexually successful guys can report overall very different female preferences in terms of liking it rough versus gentle. There are so many different decisions you make along the way, both big and subtle, that shape both who you end up dating, and also what they want from you.

Also gasp, I know: Sex dolls are not representative of typical average body types.

Here Aella talks about some of her interviews with people with obscure fetishes, as in ‘I like that one completely otherwise non-erotic scene in that one movie and literally nothing else.’

Lovable Rogue: I think a lot of people who haven’t truly shopped around don’t realize how good the 99th percentile are in bed.

It’s actually not a dig because often times those people aren’t / don’t want to be good partners.

Aella: it’s insane cause we have a good concept of what ‘high skill’ looks like for skills we can see – piano, dancing, whatever.

What if we treated sex the same way? It turns out there *is* a high skill ceiling, but people really have no idea how much better it can be.

It does require a few things tho. Like, maybe you only enjoy learning jazz on the piano. You *could* learn how to do classical, but it would be a bit of a slog. It’s gonna be hard if you marry someone who only enjoys listening to classical no matter how skilled both of you are.

Those ppl are gonna get blown out of the water by those who Practice.

Of course the 99th percentile person – either for you in particular, or in general – is going to be very, very good in bed. As is the 99th percentile match. And yes, you can get a lot better with practice, both in general and as a match for a particular person. It would be absurd to think otherwise.

I find Rogue’s comment interesting, including the ‘how do you know enough 99th percentile people well enough to form a pattern, even by reputation?’ One can imagine this going either way – perhaps the way you get great at sex is you really want to be a great partner in every way, perhaps it’s so you can avoid doing that in other ways, or it trades off against developing other skills, or the way you get good involves not otherwise being that great a partner, shall we say.

They redid the ‘random stranger propositions people’ study again:

Rolf Degen: 45 years after Clark and Hatfield’s initial experiment and 10 years after the latest replication, the present study showed that this gender difference persists. Significantly more men than women (27% vs. 4%) accepted an offer [of casual sex].

This effect was particularly large among single participants. In the sex condition [of study 1], 67% of male singles accepted the offer compared to 0% of female singles.

I assume the 0% vs. 4% is a random effect, the sample sizes are not that huge.

At the same time, our results question Clark and Hatfield’s finding that the gender difference is especially large when it comes to explicit sexual offers. Indeed, our results show that the gender difference is independent of the proposition’s explicitness as men were more likely than women to accept any of the three offers. Furthermore, acceptance rates of both men and women were much lower than those reported by Clark and Hatfield and also lower than those of previous replications.

Overall, it seems that the receptivity to casual sexual offers from both men and women has dramatically decreased over time.

These are huge gaps in acceptance rates, but no interaction between gender and explicitness – the more explicit the offer, the less likely everyone was to accept it, but you could shoot your shot either way, contradicting Clark and Hatfield’s results.

I find the new result very hard to believe in relative terms, and am highly tempted to either defy the data or wonder about the people conducting these studies – if I can choose who is asking in both cases then I bet I could equalize the explicitness effect?

I can totally believe that receptivity has declined over time across the board, sad.

A practical guide to giving blowjobs to file under ‘it all sounds obvious but that doesn’t mean having it written down isn’t helpful.’

This continues to seem spot on to me.

Alexander: A few people made comments yesterday to the effect that men will have sex with promiscuous women, but not form long-term relationships with them.

This doesn’t seem to be reflected in nationally representative marriage data: women with high “body counts” aren’t less likely to get married in the long run.

Past promiscuity doesn’t seem to stop people from getting into long-term relationships or getting married. Perhaps unsurprising since a lot of people don’t even ask the “body count” question – 49% of men and 42% of women report having ever been asked at all in my surveys.

Consistent with this, about 50% of both men and women report never asking.

This doesn’t mean it is inconsequential for relationship outcomes: a larger sexual history is associated with relationship dissolution and infidelity. People like to frame this as “women’s body count,” but the associations are the same for men and women. This probably isn’t causal (eg – casual sex isn’t “frying your pair bonding receptors”). It’s simply that higher promiscuity is associated with lots of behaviors and traits that predict relationship dissolution.

This seems like the default, and Alexander covers the obvious mechanisms. One could also notice that this leaves out that being more promiscuous likely correlates with more shots on goal and opportunities and also various desirable traits, given how often the person was indeed desired. So there is some amount of balancing out.

This poll provides three data points:

One result is that being libertarian is only slightly correlated with body count, once you control for voting in Aella polls. Another is that self-described libertarians are 52% of Aella’s voters, which seems about right.

The result that actually stood out to me was that only 56% of voters had a body count of six or higher, and again these are Aella poll voters.

It’s good to be reminded that most people really don’t have sex with that many people.

Here’s another self-reported bodycount chart, these are mean values, model this?

The woman is more likely to be the one that pulls the trigger. That does not tell you what or who was ultimately responsible for that being the final outcome.

Regan Arntz-Gray: I’ve seen this repeated elsewhere and want to clarify that the “women initiate 70% of divorces” stat comes from survey data from How Couples Meet and Stay Together. It may be true of filing data as well, but this is what couples self report. Using the extended data set through 2022 I found women initiated 65% of divorces (note that when respondents indicated a mutual breakup it is counted as 50% female initiated and 50% male initiated, so the underlying numbers are 54% initiated by woman only, 23% by man only, 23% mutual).

But I still agree with Allie’s sentiment, that this doesn’t mean women *cause* 70% (or 65%) of divorces, or even that they’re more flippant about divorce. My read on it is that women *think more* about their relationship in general and are therefore more likely to notice when it’s degraded to a point of no return, and to call it, asking for a divorce. I think the typical man can bury his head in the sand for longer.

As I discuss in the post linked below, women who have more options (highly educated or make more than their partner) are even more likely to be the ones who initiate. BUT your wife being more likely to initiate *in the event of divorce* does not imply that you’re *more likely to get divorced*! There was very little difference in probability of breaking up within the survey period based on these “wife status markers” and the dataset is so small that these differences are insignificant (3.9% of all marriages broke up in the period, 3.5% if wife had a degree, 3.6% if she was more educated than her partner and 5% if she made more money than her partner).

Robin Hanson: Rating the relation as “degraded” or not ignores that the relation might be differently valued by the 2 sides. Plausibly women initiate divorce more as they more often correctly estimate that they could do better.

How many of these divorces were initiated by the woman, but only after years of her trying to get her husband to talk to her, to address issues that he wouldn’t recognize, to try therapy etc. I can’t say, but I don’t think it’s trivial – and who “causes” most divorces is either nonsensical (as in no one caused it, they were just a bad match and should never have gotten married) or is too complicated of a question to answer with this kind of survey data.

The data in the study I linked to comes from the How Couples Meet and Stay Together (HCMST) surveys. The original data set came from five waves of surveys conducted between 2009 and 2015. There’s now a new survey, HCMST 2017, which has conducted 3 waves (so far) between 2017 and 2022.
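For anyone who wants to check how the 65% headline number falls out of the underlying split quoted above, here is a minimal Python sketch. The function name is mine, and the inputs are the rounded percentages from the quote (54% woman only, 23% man only, 23% mutual), so the output can be off by a rounding error from what the raw survey data would give.

```python
def female_initiated_share(woman_only: float, man_only: float, mutual: float) -> float:
    """Share of divorces counted as female-initiated when each mutual
    breakup is split 50/50 between the two partners."""
    total = woman_only + man_only + mutual
    return (woman_only + 0.5 * mutual) / total


# Rounded percentages as quoted: 54% woman only, 23% man only, 23% mutual.
share = female_initiated_share(woman_only=54, man_only=23, mutual=23)
print(f"female-initiated share: {share:.1%}")
```

Note how much work the 50/50 convention does: among divorces where only one side initiated, the woman's share under these rounded numbers is 54 out of 77, or about 70%.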

What does ‘cause’ mean in context? If a marriage fails, one can say that the person who initiated divorce caused the divorce, in some sense. Or one can say that whoever did whatever provoked or led to that caused the divorce, which may or may not be the same person. Either party cheating can lead to a divorce, and then there can be claims about what caused them to cheat. The same goes for other failures.

I still think ‘who filed for the divorce’ tells us a lot. And it tells us what economics and incentives would predict, that the better your outside options the more likely you are to initiate a divorce. It seems plausible that the person with better options also is likely to impose more demands, treat the other person worse in various ways, and put in less effort.

There isn’t a strict term for it, and I don’t know of a good pointer, but I’ve written about related phenomena before multiple times, with several

Defender (850k views): has anyone written about the phenomenon where autists go from really bad socializing to being way better than the average normie once they realize the game has rules that you can debug & create new rules?

you can create new social norms or cultural patterns. We just don’t think of them as “cultural patterns” if it’s between 2 friends or a small group

Alex Elliott: I should write about my experience with this with regards to dating back when I was single.

Abyssz: Me, but now they don’t believe I have autism..

Defender: ha!!! yes!!! this is it! I think even other autists, like, don’t believe it’s possible so they don’t try or are afraid to try.

Adelyn, Autistic Courtesan: A friend told me “there’s no point at which you need to stop learning masking, your skills can surpass normies and you’re just 90th percentile socialization.”

Why don’t we have better resources for this across this and many other systems? Because of the Implicit Coalition: The (implicit, of course!) alliance of enforcers who react quite badly if anyone is not part of the alliance of enforcers who punish anyone formally or explicitly knowing and doing things that everyone is supposed to keep secret or implicit.

Dating is a grey area where you definitely get smacked down hard for being ‘too strategic’ or doing actual thinking in the wrong ways, but also everyone understands the stakes are too high not to, so all you have to do is not make what you are up to common knowledge.

Patrick McKenzie explains some aspects of this here:

Patrick McKenzie (QTing OP): If they have, hypothetically, they probably don’t describe it as that.

C.f. Professional Managerial Class Handbook on Cognitive Diversity (revised), pg 73.

That is not actually a book, but let’s say there is a very rich subgenre of advice, much of it written by and for very successful people, which says what that book would say, in some cases literally by way of extended D&D metaphors it assumes the audience will grok.

*sigh*

You can tell more than a little bit about me from immediately feeling the need to clarify that that was not an actual book.

On an entirely unrelated note, I alluded in this episode to the fact that many systems consider a forthright description of how the system actually operates to be deviant behavior, in part because it is a roadmap to successful attack of system.

I give the explicit example of AML/KYC regimes, where knowing the regime exists is formally a red flag if you “shouldn’t” know.

But less formal systems of social control, like say the social system that is a high school, also have immune defenses like this.

“Why? They were not designed, not like an AML/KYC policy is designed.”

A system which does not have immune defenses against memetic attacks, but which has value attached to it, will swiftly be rooted by people who understand power.

One of the first things the new guard will do is ban you saying that they are the new guard and describing how they have rooted the system. (Same way how a hacker who roots a box and puts it on a botnet might patch the vulnerability to keep it in *their* botnet.)

And thus, solving for equilibrium, durable social institutions attached to value demonstrate surprising levels of convergent evolution in norms.

Well, you are, but you’re not alone in this particular way.

Rob Henderson: Young men are roughly twice as likely as young women to be single:

There’s all the talk that dating sucks and is in crisis but is it actually true? Or rather, is it true more than it used to be?

Derek Thompson: “Young people in America aren’t dating any more, and it’s the beginning of a real social crisis” is—I mean, let’s be honest—exactly the sort of social phenomenon I would want to report the shit out of.

But … what’s the best evidence that it’s true?

Just one e.g.: Median-marriage age is rising steadily, but it doesn’t look like the dating market suddenly broke in the 2020s. Maybe this line will break in a few years when more of the Gen-Z cohort enters the marriage-age zone, but it just goes back to the main question: What’s the best evidence *right now* that dating is in some crisis mode?

Over the longer term, yes, the declines seem like a big deal.

Spenser: First marriage rate per 1,000 never-married singles (25–34): ~120 in early 1970s to ~70 in 2020. Proportion never married by age 30: About 15% never married in 1970 vs ~35% in 2020.

Data Takes: Here’s another way to visualize these marriage trends: Gen Z is noticeably behind even Millennials’ marriage rates *at the same age*

Daniel Cox: We’ve asked about the % of Americans who have had a relationship during their teen years. It’s far lower for Gen Z (both men and women) than other generations.

There’s evidence that teens are spending less time in person together. We asked about this as well. There’s a strong correlation btw. hanging out with friends and dating. Obv. reasons: increased opportunities, increased socializing experience.

There does clearly seem to be a teen loneliness epidemic, but that could be a mostly distinct issue, born of our unwillingness to allow them physical contact and experiences, plus the way they handle mobile phones. And this is again over the longer time period.

Jean Twenge: Dating is definitely in crisis mode among U.S. 17- and 18-year-olds. Data from Monitoring the Future; update of Figure 6.16 in *Generations*.

Again, it seems clear there is a big issue at relatively young ages, but that could still be something that mostly fades over time.

Alice Evans: Did you see this by John Murdoch?

Taken together, I see very strong evidence for the ongoing steadily worsening situation, but not strong evidence for a sudden crisis in the 2020s.

Matthew Yglesias: There’s incredible levels of discourse and resentment about girlbosses and “email jobs” but the decline in marriage is among less-educated women.

Cartoons Hate Her!: They actually think men are rejecting women for making too much money. I can’t imagine how little you’d have to leave your house to think this was true.

Matt Popovich: Finally, a chart whose mysterious realignment happens in 1945 instead of 1970.

So good news then, all we have to do is send 100% of women to college.

In all seriousness though, as more women attend college over time, the yellow line is horizontal, and that is meaningful. The selection and signaling effects are doing less work and the results are holding steady.

Whereas the purple line could be increasingly dire selection effects, or the increasingly dire signal it sends to have not gone to college.

From 2004: Survey of 10,000 Chinese couples in 1991 shows self-matched couples had fewer domestic conflicts and higher income versus parent-induced matches or friend introductions. It makes sense that parent introductions do worse. The paper considers agency costs versus market expansion.

I’d worry more about population differences. The paper attempts to address this by using regional and generational differences, but those are a lot of the population differences I’d worry about, including regional infrastructure for creating self-made matches. What is surprising to me is that friend introductions do not do well, as they do not come with the same pressure as parent matches.

I suppose the key is to still apply strong selection and mostly reject such matches, and people were accepting too many of them? That, or they didn’t actually control properly for selection effects, which seems likely.

Life without sex: Large-scale study links sexlessness to physical, cognitive, and personality traits, socioecological factors, and DNA. Sexless men tended to live in regions with fewer women and more income inequality, and genetic variation explained ~15% of variance. Okie dokie.

The claim here is that the large increase in assortative mating is 75% or so due to later marriage. If you are sorting later in life, uncertainty about income decreases, so you can more efficiently sort. A lot of the sorting is based on education and class markers rather than income, which complicates this explanation, but I buy that this is a major factor. I would also add that the later you are doing the matching, the more you likely prioritize income over other factors. Also it’s notable that we are sorting on income rather than wealth.

Sentiment analysis and other statistics about text messages from a failed relationship.

What are the sources I’ve found most interesting in this area?

Matchmaker Blaine Anderson (@datingbyblaine) has been interesting and is getting a bunch of mentions here. A lot of the notes are kind of obvious but it’s good to see it laid out, and often the details surprise. In particular, she’s good at calibrating how forcefully she says things. Alexander (@datepsych) also often makes it into these posts.

Jacob’s blog on dating is a solid read. He’s an old friend. We definitely don’t agree on everything but his model is worth understanding.

Also worth noting this, in both directions: Cartoons Hate Her explains if you want to know what gets a man to go for a woman, you should probably ask women rather than men, on the same principle of ‘ask the fisherman how to catch fish’ that should make you suspicious of women’s reports of how to attract women. And that the answer isn’t primarily ‘be nice and respectful’ or ‘shy and polite.’

Connie: how it started vs how it’s going 🤭

Kyle Morris: Today @lishiyori and I are coming out of stealth. Announcing… our marriage! 💘

Backed by @notionhq + @agihousesf

From hackathon -> dateme docs -> 1st dates with 100+ intense questions, to proposal at Notion HQ

I’m thrilled to scale life together with this unicorn🦄☺️

The full story is less billboards and Twitter posts, and more Date-Me docs and meeting at hackathons after seeing each other’s Date-Me docs and comparing Notion docs.



samsung-teams-up-with-glance-to-use-your-face-in-ai-generated-lock-screen-ads

Samsung teams up with Glance to use your face in AI-generated lock screen ads

On an average day, you might unlock or look at your phone dozens of times, which makes the lock screen a hot property for advertising. Ad tech company Glance has been taking advantage of that for years with its ad-laden lock screen experiences, but it’s going further in the age of AI. Samsung and Glance have teamed up to deliver a new “AI shopping” experience that uses a selfie to create custom fashion ads. This feature is rolling out to numerous Samsung phones in the next month.

Glance has been around for a while—its non-AI lock screen experience has been bundled on various phones from Samsung, Motorola, and others. Before the AI era, Glance lured people in with promises of pretty pictures and news alerts, which came with a side of ads and tracking. The new Glance AI feature has all that, but it adds an unsettling face-stealing layer to the experience.

The AI-infused Glance will arrive on Samsung phones as both a standalone app and a fully integrated lock screen. Thankfully, this is a fully opt-in experience. If you never open or set up Glance, you can keep using the normal lock screen on your phone.


Should you choose to wade into the murky waters of AI shopping, Glance will have you take a selfie and provide some basic body type details. From there, it uses Google Gemini and Imagen to create fashion ads tailored to you—because they are you. Your lock screen will be populated with images of you “in outfits and destinations [you] would never imagine.” Naturally, you will be able to buy the looks chosen for you with a tap, which fills Glance’s coffers.


science-phds-face-a-challenging-and-uncertain-future

Science PhDs face a challenging and uncertain future


Smaller post-grad classes are likely due to research budget cuts.


Since the National Science Foundation first started collecting postgraduation data nearly 70 years ago, the number of PhDs awarded in the United States has consistently risen. Last year, more than 45,000 students earned doctorates in science and engineering, about an eight-fold increase compared to 1958.

But this level of production of science and engineering PhD students is now in question. Facing significant cuts to federal science funding, some universities have reduced or paused their PhD admissions for the upcoming academic year. In response, experts are beginning to wonder about the short and long-term effects those shifts will have on the number of doctorates awarded and the consequent impact on science if PhD production does drop.

Such questions touch on longstanding debates about academic labor. PhD training is a crucial part of nurturing scientific expertise. At the same time, some analysts have worried about an oversupply of PhDs in some fields, while students have suggested that universities are exploiting them as low-cost labor.

Many budding scientists go into graduate school with the goal of staying in academia and ultimately establishing their own labs. For at least 30 years, there has been talk of a mismatch between the number of doctorates and the limited academic job openings. According to an analysis conducted in 2013, only 3,000 faculty positions in science and engineering are added each year—even though more than 35,000 PhDs are produced in these fields annually.

Decades of this asymmetrical dynamic have created a hypercompetitive and high-pressure environment in the academic world, said Siddhartha Roy, an environmental engineer at Rutgers University who co-authored a recent study on tenure-track positions in engineering. “If we look strictly at academic positions, we have a huge oversupply, and it’s not sustainable,” he said.

But while the academic job market remains challenging, experts point out that PhD training also prepares individuals for career paths in industry, government, and other science and technology fields. If fewer doctorates are awarded and funding continues to be cut, some argue, American science will weaken.

“The immediate impact is there’s going to be less science,” said Donna Ginther, a social researcher who studies scientific labor markets at the University of Kansas. In the long run, that could mean scientific innovations, such as new drugs or technological advances, will stall, she said: “We’re leaving that scientific discovery on the table.”

Historically, one of the main goals of training PhD students has been to retain those scientists as future researchers in their respective fields. “Academia has a tendency to want to produce itself, reproduce itself,” said Ginther. “Our training is geared towards creating lots of mini-mes.”

But it is no secret in the academic world that tenure-track faculty positions are scarce, and the road to obtaining tenure is difficult. Although it varies across different STEM fields, the number of doctorates granted each year consistently surpasses the number of tenure-track positions available. A survey gathering data from the 2022-2023 academic year, conducted by the Computing Research Association, found that around 11 percent of PhD graduates in computational science (for which employment data was reported) moved on to tenure-track faculty positions.

Roy found a similar figure for engineering: Around one out of every eight individuals who obtain their doctorate—12.5 percent—will eventually land a tenure-track faculty position, a trend that remained stable between 2014 and 2021, the last year for which his team analyzed data. The bottleneck in faculty positions, according to one recent study, leads about 40 percent of postdoctoral researchers to leave academia.

However, in recent years, researchers who advise graduate students have begun to acknowledge careers beyond academia, including positions in industry, nonprofits, government, consulting, science communication, and policy. “We, as academics, need to take a broader perspective on what and how we prepare our students,” said Ginther.

As opposed to faculty positions, some of these labor markets can be more robust and provide plenty of opportunities for those with a doctorate, said Daniel Larremore, a computer scientist at the University of Colorado Boulder who studies academic labor markets, among other topics. Whether there is a mismatch between the number of PhDs and employment opportunities will depend on the subject of study and which fields are growing or shrinking, he added. For example, he pointed out that there is currently a boom in machine learning and artificial intelligence, so there is a lot of demand from industry for computer science graduates. In fact, commitments to industry jobs after graduation seem to be at a 30-year high.

But not all newly minted PhDs immediately find work. According to the latest NSF data, students in biological and biomedical sciences experienced a decline in job offers in the past 20 years, with 68 percent having definite commitments after graduating in 2023, compared to 72 percent in 2003. “The dynamics in the labor market for PhDs depends very much on what subject the PhD is in,” said Larremore.

Still, employment rates show that doctorate holders benefit from greater opportunities than the general population. In 2024, the unemployment rate for college graduates with a doctoral degree in the US was 1.2 percent, less than half the national average at the time, according to the Bureau of Labor Statistics. In NSF’s recent survey, 74 percent of graduating science and engineering doctorate recipients had definite commitments for employment or postdoctoral study or training positions, three points higher than in 2003.

“Overproducing for the number of academic jobs available? Absolutely,” said Larremore. “But overproducing for the economy in general? I don’t think so.”

The experts who spoke with Undark described science PhDs as a benefit for society: Ultimately, scientists with PhDs contribute to the economy of a nation, be it through academia or alternative careers. Many are now concerned about the impact that cuts to scientific research may have on that contribution. Already, there are reports of universities scaling back graduate student admissions in light of funding uncertainties, worried that they might not be able to cover students’ education and training costs. Those changes could result in smaller graduating classes in future years.

Smaller classes of PhD students might not be a bad thing for academia, given the limited faculty positions, said Roy. And for most non-academic jobs, Roy said, a master’s degree is more than sufficient. However, people with doctorates do contribute to other sectors like industry, government labs, and entrepreneurship, he added.

In Ginther’s view, fewer scientists with doctoral training could deal a devastating blow to the broader scientific enterprise. “Science is a long game, and the discoveries now take a decade or two to really hit the market, so it’s going to impinge on future economic growth.”

These long-term impacts of reductions in funding might be hard to reverse and could lead to the withering of the scientific endeavor in the United States, Larremore said: “If you have a thriving ecosystem and you suddenly halve the sunlight coming into it, it simply cannot thrive in the way that it was.”

This article was originally published on Undark. Read the original article.
