Author name: Shannon Garcia

Oracle hit hard in Wall Street’s tech sell-off over its huge AI bet

“That is a huge liability and credit risk for Oracle. Your main customer, biggest customer by far, is a venture capital-funded start-up,” said Andrew Chang, a director at S&P Global.

OpenAI faces questions about how it plans to meet its commitments to spend $1.4 trillion on AI infrastructure over the next eight years. It has struck deals with several Big Tech groups, including Oracle’s rivals.

Of the five hyperscalers—which include Amazon, Google, Microsoft, and Meta—Oracle is the only one with negative free cash flow. Its debt-to-equity ratio has surged to 500 percent, far higher than Amazon’s 50 percent and Microsoft’s 30 percent, according to JPMorgan.

While all five companies have seen their cash-to-assets ratios decline significantly in recent years amid a boom in spending, Oracle’s is by far the lowest, JPMorgan found.

JPMorgan analysts noted a “tension between [Oracle’s] aggressive AI build-out ambitions and the limits of its investment-grade balance sheet.”

Analysts have also noted that Oracle’s data center leases are for much longer than its contracts to sell capacity to OpenAI.

Oracle has signed at least five long-term lease agreements for US data centers that will ultimately be used by OpenAI, resulting in $100 billion of off-balance-sheet lease commitments. The sites are at varying levels of construction, with some not expected to break ground until next year.

Safra Catz, Oracle’s sole chief executive from 2019 until she stepped down in September, resisted expanding its cloud business because of the vast expenses required. She was replaced by co-CEOs Clay Magouyrk and Mike Sicilia as part of the pivot by Oracle to a new era focused on AI.

Catz, who is now executive vice-chair of Oracle’s board, has exercised stock options and sold $2.5 billion of its shares this year, according to US regulatory filings. She had announced plans to exercise her stock options at the end of 2024.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Oracle hit hard in Wall Street’s tech sell-off over its huge AI bet Read More »

On Writing #2

In honor of my dropping by Inkhaven at Lighthaven in Berkeley this week, I figured it was time for another writing roundup. You can find #1 here, from March 2025.

I’ll be there from the 17th (the day I am publishing this) until the morning of Saturday the 22nd. I am happy to meet people, including for things not directly about writing.

  1. Table of Contents.

  2. How I Use AI For Writing These Days.

  3. Influencing Influence.

  4. Size Matters.

  5. Time To Write A Shorter One.

  6. A Useful Tool.

  7. A Maligned Tool.

  8. Neglected Topics.

  9. The Humanities Don’t Seem Relevant To Writing About Future Humanity?

  10. Writing Every Day.

  11. Writing As Deep Work.

  12. Most Of Your Audience Is Secondhand.

  13. That’s Funny.

  14. Fiction Writing Advice.

  15. Just Say The Thing.

  16. Cracking the Paywall.

How have I been using AI in my writing?

Directly? With the writing itself? Remarkably little. Almost none.

I am aware that this is not optimal. But at current capability levels, with the prompts and tools I know about, in the context of my writing, AI has consistently proven to have terrible taste and to make awful suggestions, and also to be rather confident about them. This has proven sufficiently annoying that I haven’t found it worth checking with the AIs.

I also worry about AI influence pushing me towards generic slop, pushing me to sounding more like the AIs, and rounding off the edges of things, since every AI I’ve tried this with keeps trying to do all that.

I am sure it does not help that my writing style is very unusual, and basically not in the training data aside from things written by actual me, as far as I can tell.

Sometimes I will quote LLM responses in my writing, always clearly labeled, when it seems useful to point to this kind of ‘social proof’ or sanity check.

The other exception is that if you ask the AI to look for outright errors, especially things like spelling and grammar, it won’t catch everything, but when it does catch something it is usually right. When you ask it to spot errors of fact, it’s not as reliable, but it’s good enough to check the list. I should be making a point of always doing that.

I did the ‘check for errors and other considerations’ thing on this piece in particular with both Sonnet and 5.1-Thinking. This did improve the post but it’s not obvious it improved it enough to be worth the time.

I will also sometimes ask it about a particular line or argument I’m considering, to see if it buys it, but only when what I care about is a typical reaction.

If I was devoting more time to refining and editing, and cared more about marginal improvements there, that would open up more use cases, but I don’t think that’s the right use of time for me on current margins versus training on more data or doing more chain of thought.

Indirectly? I use it a lot more there, and again I could be doing more.

There are some specific things:

  1. I have a vibe coded Chrome extension that saves me a bunch of trouble, and that could be improved a lot with more work. It does things like generate the Table of Contents, crosspost to WordPress, auto-populate many links, and quickly edit quotes to fix people’s indifference to things like capitalization (see the sketch after this list).

  2. I have a GPT called Zvi Archivist that I use to search through my past writing, to check if and when I’ve already covered something and what I’ve said about it.

  3. I have a transcriber for converting images to text because all the websites I know about that offer to do this for you are basically broken due to gating. This works.
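
To make the first item concrete, here is a minimal sketch of what the Table of Contents piece of such an extension might look like. Everything in it is hypothetical (the heading selector, the flat numbering, the function name); it is not the actual extension, just the general shape of the trick.

```typescript
// Hypothetical sketch: build a flat, numbered Table of Contents from the
// section headings of a draft post. Assumes sections are marked up as <h2>
// elements on the editor page; adjust the selector for the real markup.
function buildTableOfContents(doc: Document): string {
  const headings = Array.from(doc.querySelectorAll<HTMLHeadingElement>("h2"));
  return headings
    .map((heading, index) => `${index + 1}. ${heading.innerText.trim()}.`)
    .join("\n");
}

// In a Chrome content script this could run against the current page and
// log (or copy) the result for pasting at the top of the post.
console.log(buildTableOfContents(document));
```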

Then there are things that are the same as what everyone does all the time. I do a lot of fact checking, sanity checking, Fermi estimation, tracking down information or sources, asking for explanations, and questioning papers about the things I care about. Using the AI assistant in its classic sense. All of that is a big help, and I notice my activation requirement for doing this is higher than it should be.

I want this to be true so I’m worried I can’t be objective, but it seems true to me?

Janus: i think that it’s almost always a bad idea to attempt to grow as an influencer on purpose.

you can believe that it would be good if you were to grow, and still you shouldn’t optimize for it.

the only way it goes well is if it happens while you optimize for other things.

More precisely than “you shouldn’t on purpose,” what I’m saying is you shouldn’t be spending significant units of optimization on this goal and performing actions you wouldn’t otherwise take for this purpose.

I am confident that if you optimize primarily for influence, that’s full audience capture, slopification and so on, and you’ve de facto sold your soul. You can in theory turn around and then use that influence to accomplish something worthwhile, but statistically speaking you won’t do that.

Janus: Name a single account that explicitly optimizes for being a bigger influencer / “tries to grow” (instead of just happening as a side effect) and that does more good than harm to the ecosystem and generally has good vibes and interesting content

You probably can’t!

actually, https://x.com/AISafetyMemes is a contender

but i know they’re VERY controversial and I do think they’re playing with fire

i do consider them net positive but this is mostly bc they sometimes have very good taste and maybe cancel out the collateral damage

but WOULD NOT RECOMMEND almost anyone trying this, lol

AISafetyMemes is definitely an example of flying dangerously close to the sun on this, but keeping enough focus and having enough taste to maybe be getting away with it. It’s unclear that the net sign of impact there is positive, there are some very good posts but also some reasons to worry.

No one reads the blog posts, they’re too long, so might as well make them longer?

Visakan Veerasamy: An idea I’ve been toying with and discussed with a couple of friends is the idea that blog posts could and probably should get much longer now that fewer people are reading them.

One of the difficult things about writing a good essay is figuring out what to leave out so it is more manageable for readers.

But on a blog where there is no expectation that anybody reads it, you do not have to leave anything out.

My guess is this is going to end up being a barbell situation like so many other things. If you cut it down, you want to cut it down as much as possible. If you’re going long, then on the margin you’re better off throwing everything in.

I highlight this exactly because it seems backwards to me. I notice that my experience is very much the opposite – when I want to write a good short piece it is MUCH more work per token, and often more total work.

Timothy Lee: I think a big reason that writing a book is such a miserable experience is that the time to write a good piece is more-than-linear in the number of words. A good 2,000-word piece is a lot more than 4x the work of a good 500-word piece.

I assume this continues for longer pieces and a good 100,000-word book is a lot more than 50x the work of a good 2,000-word article. Most authors deal with this by cutting corners and turning in books that aren’t very good. And then there’s Robert Caro.

Josh You: I think by “good 2000 word piece” Tim means “a 2000 word piece that has been edited down from a much longer first draft”

Even then. Yes, a tight longer piece requires more structure and planning, but the times I write those 500-800 word pieces it takes forever, because you really do struggle over every word as you try to pack everything into the tiniest possible space.

Writing a 100,000 word book at the precision level of an 800 word thinkpiece would take forever, but also I presume it almost never happens. If it does, that better be your masterpiece or I don’t see why you’d do it.

Dwarkesh Patel is using the Smart Composer Plugin for Obsidian, which he says is basically Cursor for writing, and loves it. Sounds great conditional on using Obsidian, but it is not being actively maintained.

Eric Raymond joins ‘the em-dash debate’ on the side of the em-dash.

Eric Raymond (yes that one): My wacky theory about the em-dash debate:

Pro writers use em-dashes a lot because many of them, possibly without consciously realizing it, have become elocutionary punctuationists.

That is, they’ve fallen into the habit of using punctuation not as grammatical phrase structure markers but as indicators of pauses of varying length in the flow of speech.

The most visible difference you see in people who write in this style is that their usage of commas becomes somewhat more fluid — that’s the marker for the shortest pause. But they also reach for less commonly used punctuation marks as indicators of longer pauses of varying length.

Em dash is about the second or third longest pause, only an ellipsis or end-of-sentence period being clearly longer.

Historical note: punctuation marks originally evolved as pause or breathing markers in manuscripts to aid recitation. In the 19th century, after silent reading had become normal, they were reinterpreted by grammarians as phrase structure markers and usage rules became much more rigid.

Really capable writers have been quietly rediscovering elocutionary punctuation ever since.

RETVRN!

I too have been increasingly using punctuation, especially commas, to indicate pauses. I still don’t use em dashes, partly because I almost never want that exact length and style of a pause for whatever reason, and also because my instinct is that you’re trying to do both ‘be technically correct’ and also ‘evoke what you want’ and my brain thinks of the em-dash as technically incorrect.

That’s all true and I never used em-dashes before but who are we kidding, the best reason not to use em-dashes is that people will think you’re using AI. I don’t love that dynamic either, but do you actually want to die on that hill?

Tyler Cowen lists some reasons why he does not cover various topics much. The list resonates with me quite a bit.

  1. I feel that writing about the topic will make me stupider.

  2. I believe that you reading more about the topic will make you stupider.

  3. I believe that performative outrage usually brings low or negative returns. Matt Yglesias has had some good writing on this lately.

  4. I don’t have anything to add on the topic. Abortion and the Middle East would be two examples here.

  5. Sometimes I have good inside information on a topic, but I cannot reveal it, not even without attribution. And I don’t want to write something stupider than my best understanding of the topic.

  6. I just don’t feel like it.

  7. On a few topics I feel it is Alex’s province.

I don’t have an Alex, instead I have decided on some forms of triage that are simply ‘I do not have the time to look into this and I will let it be someone else’s department.’

Otherwise yes, all of these are highly relevant.

Insider information is tough, and I am very careful about not revealing things I am not supposed to reveal, but this rarely outright stops me. If nothing else, you can usually get net smarter via negativa, where you silently avoid saying false things, including by using careful qualifiers on statements.

One big thing perhaps missing from Tyler’s list is that I avoid certain topics where my statements would potentially interfere with my ability to productively discuss other topics. If you are going to make enemies, or give people reasons to attack you or dismiss you, only do that on purpose. One could also file this under making you and others stupider. Similarly, there are things that I need to not think about – I try to avoid thinking about trading for this reason.

A minor thing is that I’d love to be able to talk more about gaming, and other topics dear to my heart, but that consistently drive people away permanently when I do that. So it’s just not worth it. If the extra posts simply had no impact, I’d totally do it, but as is I’d be better off writing the post and then not hitting publish. Sad. Whereas Tyler has made it very clear he’s going to post things most readers don’t care about, when he feels like doing so, and that’s part of the price of admission.

If you want to write or think about the future, maybe don’t study the humanities?

Startup Archive: Palmer Luckey explains why science fiction is a great place to look for ideas

“One of the things that I’ve realized in my career is that nothing I ever come up with will be new. I’ve literally never come up with an idea that a science fiction author has not come up with before.”

Dr. Julie Gurner: Funny how valuable those English majors and writers truly are, given how much liberal arts has been put down. Why philosophy, creativity and hard tech skills make such fantastic bedfellows. Span of vision wins.

Orthonormalist: Heinlein was an aeronautical engineer.

Asimov was a biochemistry professor.

Arthur Clarke was a radio operator who got a physics degree.

Ray Bradbury never went to college (but did go straight to being a writer)

I quote this because ‘study the humanities’ is a natural thing to say to someone looking to write or think about the future, and yet I agree that when I look at the list of people whose thinking about the future has influenced me, I notice essentially none of them have studied the humanities.

Alan Jacobs has a very different writing pattern. Rather than write every day, he waits until the words are ready, so he’ll work every day but often that means outlines or index card reordering or just sitting in his chair and thinking, even for weeks at a time. This is alien to me. If I need to figure out what to write, I start writing, see what it looks like, maybe delete it and try again, maybe procrastinate by working on a different thing.

Neal Stephenson explains that for him writing is Deep Work, requiring long blocks of reliably uninterrupted time bunched together, writing novels is the best thing he does, and that’s why he doesn’t go to conferences or answer your email. Fair enough.

I’ve found ways to not be like that. I deal with context shifts and interruptions all the time and it is fine, indeed when dealing with difficult tasks I almost require them. That’s a lot of how I can be so productive. But the one time I wrote something plausibly like a book, the Immoral Mazes sequence, I did spend a week alone in my apartment doing nothing else. And I haven’t figured out how to write a novel, or almost any fiction at all.

Also, it’s rather sad if it is true that Neal Stephenson only gets a middle class life out of writing so many fantastic and popular books, and can’t afford anyone to answer his email. That makes writing seem like an even rougher business than I expected. Although soon AI can perhaps do it for him?

Patrick McKenzie highlights an insight from Alex Danco, which is that most of the effective audience of any successful post is not people who read the post, but people who are told about the post by someone who did read it. Patrick notes this likely also applies to formal writing, I’d note it seems to definitely apply to most books.

Relatedly, I have in the past postulated a virtual four-level model of flow of ideas, where each level can understand the level above it, and then rephrase and present it to the level below.

So if you are Level 1, either in general or in an area, you can formulate fully new ideas. If you are Level 2, you can understand what the Level 1s say, look for consensus or combine what they say, riff on it, and then communicate that to those at Level 3, who can then communicate it fully to the public, who typically end up around Level 4.

Then the public will communicate a simplified and garbled version to each other.

You can be Level 1 and then try to ‘put on your Level 2 or 3 hat’ to write a dumber, simpler version to a broader audience, but it is very hard to simultaneously do that and also communicate the actual concepts to other Level 1s.

These all then interact, but if you go viral with anything longer than a Tweet, you are inevitably going to end up with a message primarily communicated via (in context) Level 3 and Level 4 people communicating to other Level 3-4 people.

At that point, and any time you go truly viral or your communication is ‘successful,’ you run into the You Get About Five Words problem.

My response to this means that at this point I essentially never go all that directly viral. My posts have a very narrow range of view counts, where even the top posts never do 100% better than typical posts, and the least popular posts – which are the ones where I talk about AI alignment or policy on their own – will at worst do 30% worse than typical.

The way the ideas go viral is someone quotes, runs with or repackages them. A lot of the impact comes from the right statement reaching the right person.

I presume that would work differently if I was working with mediums that work on virality, such as YouTube or TikTok, but my content seems like a poor fit for them, and when I do somewhat ‘go viral’ in such places it is rarely content I care about spreading. Perhaps I am making a mistake by not branching out. But on Twitter I still almost never go viral, as it seems my speciality is small TAM (total available market) Tweets.

Never have a character try to be funny; the character themselves should have no idea.

I think this is directionally correct but goes too far, for the same reasons that you, in your real life, will often try to be funny, and sometimes it will work. The trick is they have to be trying to be funny in a way that makes sense for the character, in context, for those around them, not trying to be funny to the viewer.

I notice that in general I almost never ‘try to be funny,’ not exactly. I simply say things because they would be funny, and to say things in the funniest way possible, because why not. A lot of my favorite people seem to act similarly.

Lydia Davis offers her top ten recommendations for good (fiction?) writing: Keep notes, including sentences out of context, work from your own interest, be mostly self-taught, read and revise the notes constantly, grow stories or develop poems out of those notes, learn techniques from great works and read the best writers across time.

Orson Scott Card explains that you don’t exhaust the reader by having too much tension in your book, you exhaust them by having long stretches without tension. The tension keeps us reading.

Dwarkesh Patel: Unreasonably effective writing advice:

“What are you trying to say here?

Okay, just write that.”

I’ve (separately) started doing this more.

I try to make sure that it’s very easy to find the central point, the thing I’m most trying to say, and hard to miss it.

Patrick McKenzie: Cosigned, and surprisingly effective with good writers in addition to ones who more obviously need the prompting.

Writing an artifact attaches you to the structure of it while simultaneously subsuming you in the topic. The second is really good for good work; the first, less so.

One thing that I tried, with very limited success, to get people to do is to be less attached to words on a page. Writing an essay? Write two very different takes on it; different diction, different voice, maybe even different argument. Then pick the one which speaks to you.

Edit *that* rather than trying to line edit the loser towards greatness.

There is something which people learn, partially from school and partially from work experience, which causes them to write as if they were charged for every word which goes down on the page.

Words are free! They belong in a vast mindscape! You can claw more from the aether!

I think people *might* operationalize better habits after LLMs train them that throwing away a paragraph is basically costless.

Jason Cohen: Yeah this works all the time.

Also when getting someone to explain their product, company, customer, why to work for them, etc..

So funny how it jogs them out of their own way!

BasedBigTech: An excellent Group PM reviewed my doc with me. He said “what does this mean?” and I told him.

“Then why didn’t you write that?”

Kevin Kelly: At Whole Earth Review people would send us book reviews with a cover letter explaining why we should run their book review. We’d usually toss the review and print their much shorter cover letter as the review which was much clearer and succinct.

Daniel Eth: It’s crazy how well just straight up asking people that gets them to say the thing they should write down

Why does it work?

The answer is that writing is doing multiple tasks.

Only one of them is ‘tell you what all this means.’

You have to do some combination of things such as justify that, explain it, motivate it, provide details, teach your methods and reasoning, perform reporting, be entertaining and so on.

Also, how did you know what you meant to say until you wrote the damn thing?

You still usually should find a way to loudly say what it all means, somewhere in there.

But this creates the opportunity for the hack.

If I hand you a ten-page paper, and you ask ‘what are you trying to say?’ then I have entered into evidence that I have Done the Work and Written the Report.

Now I can skip the justifications, details and context, and Say The Thing.

The point of a reference post is sometimes to give people the opportunity to learn.

The point of a reference post can also be to exist and then not be clicked on. It varies.

This is closely related to the phenomenon where often a movie or show will have a scene that logically and structurally has to exist, but which you wish you didn’t have to actually watch. In theory you could hold up a card that said ‘Scene in which Alice goes to the bank, acts nervous and gets the money’ or whatever.

Probably they should do a graceful version of something like that more often, or even interactive versions where you can easily expand or condense various scenes. There’s something there.

Similarly, with blog posts (or books) there are passages that are written or quoted knowing many or most people will skip them, but that have to be there.

Aella teaches us how to make readers pay up to get behind a Paywall. Explain why you are the One Who Knows some valuable thing, whereas others including your dear reader are bad at this and need your help. Then actually provide value both outside and inside of the paywall, ideally because the early free steps are useful even without the payoff you’re selling.

I am thankful that I can write without worrying about maximizing such things, while I also recognize that I’m giving up a lot of audience share by not optimizing for similar things on the non-paywall side.

On Writing #2 Read More »

Dogs came in a wide range of sizes and shapes long before modern breeds

“The concept of ‘breed’ is very recent and does not apply to the archaeological record,” Evin said. People have, of course, been breeding dogs for particular traits for as long as we’ve had dogs, and tiny lap dogs existed even in ancient Rome. However, it’s unlikely that a Neolithic herder would have described his dog as being a distinct “breed” from his neighbor’s hunting partner, even if they looked quite different. Which, apparently, they did.

Dogs had about half of their modern diversity (at least in skull shapes and sizes) by the Neolithic. Credit: Kiona Smith

Bones only tell part of the story

“We know from genetic models that domestication should have started during the late Pleistocene,” Evin told Ars. A 2021 study suggested that domestic dogs have been a separate species from wolves for more than 23,000 years. But it took a while for differences to build up.

Evin and her colleagues had access to 17 canine skulls that ranged from 12,700 to 50,000 years old—prior to the end of the ice age—and they all looked enough like modern wolves that, as Evin put it, “for now, we have no evidence to suggest that any of the wolf-like skulls did not belong to wolves or looked different from them.” In other words, if you’re just looking at the skull, it’s hard to tell the earliest dogs from wild wolves.

We have no way to know, of course, what the living dog might have looked like. It’s worth mentioning that Evin and her colleagues found a modern Saint Bernard’s skull that, according to their statistical analysis, looked more wolf-like than dog-like. But even if it’s not offering you a brandy keg, there’s no mistaking a live Saint Bernard, with its droopy jowls and floppy ears, for a wolf.

“Skull shape tells us a lot about function and evolutionary history, but it represents only one aspect of the animal’s appearance. This means that two dogs with very similar skulls could have looked quite different in life,” Evin told Ars. “It’s an important reminder that the archaeological record captures just part of the biological and cultural story.”

And with only bones—and sparse ones, at that—to go on, we may be missing some of the early chapters of dogs’ biological and cultural story. Domestication tends to select the friendliest animals to produce the next generation, and apparently that comes with a particular set of evolutionary side effects, whether you’re studying wolves, foxes, cattle, or pigs. Spots, floppy ears, and curved tails all seem to be part of the genetic package that comes with inter-species friendliness. But none of those traits is visible in the skull.

Dogs came in a wide range of sizes and shapes long before modern breeds Read More »

Researchers question Anthropic claim that AI-assisted attack was 90% autonomous

Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor’s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.

How (Anthropic says) the attack unfolded

Anthropic said GTG-1002 developed an autonomous attack framework that used Claude as an orchestration mechanism that largely eliminated the need for human involvement. This orchestration system broke complex multi-stage attacks into smaller technical tasks such as vulnerability scanning, credential validation, data extraction, and lateral movement.

“The architecture incorporated Claude’s technical capabilities as an execution engine within a larger automated system, where the AI performed specific technical actions based on the human operators’ instructions while the orchestration logic maintained attack state, managed phase transitions, and aggregated results across multiple sessions,” Anthropic said. “This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement, as the framework autonomously progressed through reconnaissance, initial access, persistence, and data exfiltration phases by sequencing Claude’s responses and adapting subsequent requests based on discovered information.”

The attacks followed a five-phase structure, with AI autonomy increasing at each phase.

The life cycle of the cyberattack, showing the move from human-led targeting to largely AI-driven attacks using various tools, often via the Model Context Protocol (MCP). At various points during the attack, the AI returns to its human operator for review and further direction. Credit: Anthropic

The attackers were able to bypass Claude guardrails in part by breaking tasks into small steps that, in isolation, the AI tool didn’t interpret as malicious. In other cases, the attackers couched their inquiries in the context of security professionals trying to use Claude to improve defenses.

As noted last week, AI-developed malware has a long way to go before it poses a real-world threat. There’s no reason to doubt that AI-assisted cyberattacks may one day produce more potent attacks. But the data so far indicates that threat actors—like most others using AI—are seeing mixed results that aren’t nearly as impressive as those in the AI industry claim.

Researchers question Anthropic claim that AI-assisted attack was 90% autonomous Read More »

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

OpenAI claims that both models perform better on technical benchmarks such as math and coding evaluations (including AIME 2025 and Codeforces) than GPT-5, which was released in August.

Improved benchmarks may win over some users, but the biggest change with GPT-5.1 is in its presentation. OpenAI says it heard from users that they wanted AI models to simulate different communication styles depending on the task, so the company is offering eight preset options, including Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy, alongside a Default setting.

These presets alter the instructions fed into each prompt to simulate different personality styles, but the underlying model capabilities remain the same across all settings.

An illustration showing GPT-5.1’s eight personality styles in ChatGPT. Credit: OpenAI

In addition, the company trained GPT-5.1 Instant to use “adaptive reasoning,” meaning that the model decides when to spend more computational time processing a prompt before generating output.

The company plans to roll out the models gradually over the next few days, starting with paid subscribers before expanding to free users. OpenAI plans to bring both GPT-5.1 Instant and GPT-5.1 Thinking to its API later this week. GPT-5.1 Instant will appear as gpt-5.1-chat-latest, and GPT-5.1 Thinking will be released as GPT-5.1 in the API, both with adaptive reasoning enabled. The older GPT-5 models will remain available in ChatGPT under the legacy models dropdown for paid subscribers for three months.
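
If the rollout matches OpenAI’s description, calling the new model through the API should look like any other Chat Completions request once it ships. Here is a minimal sketch using the standard OpenAI Node SDK and the gpt-5.1-chat-latest name given above; the prompt contents and exact parameter support are assumptions, not confirmed details.

```typescript
import OpenAI from "openai";

// Assumes OPENAI_API_KEY is set in the environment.
const client = new OpenAI();

async function main() {
  // "gpt-5.1-chat-latest" is the API name OpenAI says GPT-5.1 Instant will use.
  const response = await client.chat.completions.create({
    model: "gpt-5.1-chat-latest",
    messages: [
      { role: "system", content: "Respond in a concise, professional tone." },
      { role: "user", content: "Summarize what adaptive reasoning means in one sentence." },
    ],
  });
  console.log(response.choices[0].message.content);
}

main();
```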

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities Read More »

With another record broken, the world’s busiest spaceport keeps getting busier


It’s not just the number of rocket launches, but how much stuff they’re carrying into orbit.

With 29 Starlink satellites onboard, a Falcon 9 rocket streaks through the night sky over Cape Canaveral Space Force Station, Florida, on Monday night. Credit: Stephen Clark/Ars Technica

CAPE CANAVERAL, Florida—Another Falcon 9 rocket fired off its launch pad here on Monday night, taking with it another 29 Starlink Internet satellites to orbit.

This was the 94th orbital launch from Florida’s Space Coast so far in 2025, breaking the previous record for the most satellite launches in a calendar year from the world’s busiest spaceport. Monday night’s launch came two days after a Chinese Long March 11 rocket lifted off from an oceangoing platform on the opposite side of the world, marking humanity’s 255th mission to reach orbit this year, a new annual record for global launch activity.

As of Wednesday, a handful of additional missions have pushed the global figure this year to 259, putting the world on pace for around 300 orbital launches by the end of 2025. This will more than double the global tally of 135 orbital launches in 2021.

Routine vs. complacency

Waiting in the darkness a few miles away from the launch pad, I glanced around at my surroundings before watching SpaceX’s Falcon 9 thunder into the sky. There were no throngs of space enthusiasts anxiously waiting for the rocket to light up the night. No line of photographers snapping photos. Just this reporter and two chipper retirees enjoying what a decade ago would have attracted far more attention.

Go to your local airport and you’ll probably find more people posted up at a plane-spotting park at the end of the runway. Still, a rocket launch is something special. On the same night that I watched the 94th launch of the year depart from Cape Canaveral, Orlando International Airport saw the same number of airplane departures in just three hours.

The crowds still turn out for more meaningful launches, such as a test flight of SpaceX’s Starship megarocket in Texas or Blue Origin’s attempt to launch its second New Glenn heavy-lifter here Sunday. But those are not the norm. Generations of aerospace engineers were taught that spaceflight is not routine for fear of falling into complacency, leading to failure, and in some cases, death.

Compared to air travel, the mantra remains valid. Rockets are unforgiving, with engines operating under extreme pressures, at high thrust, and unable to suck in oxygen from the atmosphere as a reactant for combustion. There are fewer redundancies in a rocket than in an airplane.

The Falcon 9’s established failure rate is less than 1 percent, well short of any safety standard for commercial air travel but good enough to make it the most successful orbital-class rocket in history. Given the Falcon 9’s track record, SpaceX seems to have found a way to overcome the temptation for complacency.

A Chinese Long March 11 rocket carrying three Shiyan 32 test satellites lifts off from waters off the coast of Haiyang in eastern China’s Shandong province on Saturday. Credit: Guo Jinqi/Xinhua via Getty Images

Following the trend

The upward trend in rocket launches hasn’t always been the case. Launch numbers were steady for most of the 2010s, following a downward trend in the 2000s, with as few as 52 orbital launches in 2005, the lowest number since the nascent era of spaceflight in 1961. There were just seven launches from here in Florida that year.

The numbers have picked up dramatically in the last five years as SpaceX has mastered reusable rocketry.

It’s important to look at not just the number of launches but also how much stuff rockets are actually putting into orbit. More than half of this year’s launches were performed using SpaceX’s Falcon 9 rocket, and the majority of those deployed Starlink satellites for SpaceX’s global Internet network. Each spacecraft is relatively small in size and weight, but SpaceX stacks up to 29 of them on a single Falcon 9 to max out the rocket’s carrying capacity.

All this mass adds up to make SpaceX’s dominance of the launch industry appear even more absolute. According to analyses by BryceTech, an engineering and space industry consulting firm, SpaceX has launched 86 percent of all the world’s payload mass over the 18 months from the beginning of 2024 through June 30 of this year.

That’s roughly 2.98 million kilograms of the approximately 3.46 million kilograms (3,281 of 3,819 tons) of satellite hardware and cargo that all the world’s rockets placed into orbit during that timeframe.

The charts below were created by Ars Technica using publicly available launch numbers and payload mass estimates from BryceTech. The first illustrates the rising launch cadence at Cape Canaveral Space Force Station and NASA’s Kennedy Space Center, located next to one another in Florida. Launches from other US-licensed spaceports, primarily Vandenberg Space Force Base, California, and Rocket Lab’s base at Māhia Peninsula in New Zealand, are also on the rise.

These numbers represent rockets that reached low-Earth orbit. We didn’t include test flights of SpaceX’s Starship rocket in the chart because all of its launches have intentionally flown on suborbital trajectories.

In the second chart, we break down the payload upmass to orbit from SpaceX, other US companies, China, Russia, and other international launch providers.

Launch rates are on a clear upward trend, while SpaceX has launched 86 percent of the world’s total payload mass to orbit since the beginning of 2024. Credit: Stephen Clark/Ars Technica/BryceTech

Will it continue?

It’s a good bet that payload upmass will continue to rise in the coming years, with heavy cargo heading to orbit to further expand SpaceX’s Starlink communications network and build out new megaconstellations from Amazon, China, and others. The US military’s Golden Dome missile defense shield will also have a ravenous appetite for rockets to get it into space.

SpaceX’s Starship megarocket could begin flying to low-Earth orbit next year, and if it does, SpaceX’s preeminence in delivering mass to orbit will remain assured. Starship’s first real payloads will likely be SpaceX’s next-generation Starlink satellites. These larger, heavier, more capable spacecraft will launch 60 at a time on Starship, further stretching SpaceX’s lead in the upmass war.

But Starship’s arrival will come at the expense of the workhorse Falcon 9, which lacks the capacity to haul the next-gen Starlinks to orbit. “This year and next year I anticipate will be the highest Falcon launch rates that we will see,” said Stephanie Bednarek, SpaceX’s vice president of commercial sales, at an industry conference in July.

SpaceX is on pace for between 165 and 170 Falcon 9 launches this year, with 144 flights already in the books for 2025. Last year’s total for Falcon 9 and Falcon Heavy was 134 missions. SpaceX has not announced how many Falcon 9 and Falcon Heavy launches it plans for next year.

Starship is designed to be fully and rapidly reusable, eventually enabling multiple flights per day. But that’s still a long way off, and it’s unknown how many years it might take for Starship to surpass the Falcon 9’s proven launch tempo.

A Starship rocket and Super Heavy booster lift off from Starbase, Texas. Credit: SpaceX

In any case, with Starship’s heavy-lifting capacity and upgraded next-gen satellites, SpaceX could match an entire year’s worth of new Starlink capacity with just two fully loaded Starship flights. Starship will be able to deliver 60 times more Starlink capacity to orbit than a cluster of satellites riding on a Falcon 9.

There’s no reason to believe SpaceX will be satisfied with simply keeping pace with today’s Starlink growth rate. There are emerging market opportunities in connecting satellites with smartphones, space-based computer processing and data storage, and military applications.

Other companies have medium-to-heavy rockets that are either new to the market or soon to debut. These include Blue Origin’s New Glenn, now set to make its second test flight in the coming days, with a reusable booster designed to facilitate a rapid-fire launch cadence.

Despite all of the newcomers, most satellite operators see a shortage of launch capacity on the commercial market. “The industry is likely to remain supply-constrained through the balance of the decade,” wrote Caleb Henry, director of research at the industry analysis firm Quilty Space. “That could pose a problem for some of the many large constellations on the horizon.”

United Launch Alliance’s Vulcan rocket, Rocket Lab’s Neutron, Stoke Space’s Nova, Relativity Space’s Terran R, and Firefly Aerospace and Northrop Grumman’s Eclipse are among the other rockets vying for a bite at the launch apple.

“Whether or not the market can support six medium to heavy lift launch providers from the US alone, plus Starship, is an open question, but for the remainder of the decade launch demand is likely to remain high, presenting an opportunity for one or more new players to establish themselves in the pecking order,” Henry wrote in a post on Quilty’s website.

China’s space program will need more rockets, too. That nation’s two megaconstellations, known as Guowang and Qianfan, will have thousands of satellites, requiring a significant uptick in Chinese launches.

Taking all of this into account, the demand curve for access to space is sure to continue its upward trajectory. How companies meet this demand, and with how many discrete departures from Earth, isn’t quite as clear.

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

With another record broken, the world’s busiest spaceport keeps getting busier Read More »

Kimi K2 Thinking

I previously covered Kimi K2, which now has a new thinking version. As I said at the time back in July, one should have priced in that the thinking version was coming.

Is it the real deal?

That depends on what level counts as the real deal. It’s a good model, sir, by all accounts. But there have been fewer accounts than we would expect if it was a big deal, and it doesn’t fall into any of my use cases.

Kimi.ai: 🚀 Hello, Kimi K2 Thinking!

The Open-Source Thinking Agent Model is here.

🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)

🔹 Executes up to 200 – 300 sequential tool calls without human interference

🔹 Excels in reasoning, agentic search, and coding

🔹 256K context window

Built as a thinking agent, K2 Thinking marks our latest efforts in test-time scaling — scaling both thinking tokens and tool-calling turns.

K2 Thinking is now live on http://kimi.com in chat mode, with full agentic mode coming soon. It is also accessible via API.

API here, Tech blog here, Weights and code here.

(Pliny jailbreak here.)

It’s got 1T parameters, and Kimi and Kimi K2 have a solid track record, so it’s plausible this could play with the big boys, although the five month delay in getting to a reasoning model suggests skepticism it can be competitive.

As always, internal benchmark scores can differ greatly from outside benchmark scores, especially for open models. Sometimes this is due to outsiders botching setup, but also inside measurements need to be double checked.

For Humanity’s Last Exam, I see an outside source saying that as of November 9 it was in second place at 23.9%, which is very much not 44.9% but still very good.

On writing quality we’ve gotten endorsements for Kimi K2 for a while.

Rohit: Kimi K2 is remarkably good at writing, and unlike all others thinking mode hasn’t degraded its writing ability more.

Morgan: if i recall, on release gpt-5 was the only model where writing quality improved with thinking effort.

Rohit: Alas.

Gary Fung: Kimi has always been a special snowflake on creative writing.

Here’s one part of the explanation of how they got the writing to be so good, which involves self-ranking RL and writing self-play, with a suggestion of some similarities to the training of Claude 3 Opus. In a sense this looks like ‘try to do better, at all.’

On the agentic tool use and general intelligence? I’m more skeptical.

Artificial Analysis has Kimi K2 Thinking at the top of its Agentic Tool Use category, 93% to 87%, which is a huge gap in context. This is its strongest subset.

As is usually true when people compare open to closed models, this is the open model’s best benchmark, so don’t get carried away, but yes overall it did well on Artificial Analysis, indeed suspiciously well given how little talk I see.

The tool calling abilities are exciting for an open model, although standard for closed. This is a good example of how we look for ways for open models to impress by matching closed abilities in spots, also it is indeed highly useful.

Overall, the Artificial Analysis Intelligence Index has Kimi K2 Thinking at 67, one point behind GPT-5 and ahead of everyone else. Kimi used the most tokens of any model, but its total cost was still lower than that of the top closed models, though not by as much as the per-token prices would suggest ($829-$913 for GPT-5, $817 for Sonnet, $380 for Kimi K2), as Kimi costs $0.60/$2.50 per million input/output tokens, versus $1.25/$10 for GPT-5 and $3/$15 for Sonnet.
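
To make the comparison concrete, here is a small sketch of the arithmetic using the per-million-token prices quoted above; the workload (two million input tokens, one million output tokens) is purely hypothetical, and in practice Kimi’s heavier token usage eats into its per-token advantage.

```typescript
// Per-million-token prices quoted above (USD, input/output).
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

const models: Record<string, Pricing> = {
  "Kimi K2 Thinking": { inputPerM: 0.6, outputPerM: 2.5 },
  "GPT-5": { inputPerM: 1.25, outputPerM: 10 },
  "Claude Sonnet": { inputPerM: 3, outputPerM: 15 },
};

// Cost of one run given token counts (the workload below is hypothetical).
function runCost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM;
}

for (const [name, pricing] of Object.entries(models)) {
  console.log(`${name}: $${runCost(pricing, 2_000_000, 1_000_000).toFixed(2)}`);
}
```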

Nathan Lambert is impressed, relying on secondary information (‘seems like a joy to use’), and offers thoughts.

He notes that yes, labs start out targeting benchmarks and then transition to actually targeting useful things, such as how K2 Thinking was post-trained in 4bit precision to prepare for realistic tasks and benchmarked the same way. I agree that’s pretty cool.

It does seem plausible that Kimi K2 is still in the ‘target the benchmarks’ phase in most places, although not in creative writing. By default, I expect such models to punch ‘below their benchmark-implied weight’ on practical tasks.

For now we don’t have many other outside scores to work with and feedback is light.

Simeon: is Kimi K2 benchmaxxing or are they actually SOTA while training on potatoes?

Prinz: In my testing (for my use cases, which have nothing to do with math and coding), K2-Thinking is obviously worse than GPT-5 Thinking, but by a relatively modest margin. If I had no access to other models, I would happily use K2-Thinking and it wouldn’t feel like a huge downgrade.

ahtoshkaa: I have a pretty sophisticated companion app that uses about 5-10K of varied, information dense context. So the model has to properly parse this information and have very good writing skills. kimi-k2-thinking is absolute ass. similarly to the new OpenAI model – Polaris Alpha.

There’s a growing rhetorical pressure, or marketing style pressure, where the ‘benchmark gaps’ are closing. Chinese labs can point to numbers that say they are ‘just as good’ or almost as good, for many purposes ‘good enough’ is good enough. And many people (including the likes of David Sacks) point to GPT-5 and similar as showing progress isn’t impressive or scary. But as Nathan points out we now see releases like Claude 4 where the benchmark gains look small but real world gains are large, and I would add GPT-5 (and Sonnet 4.5) to that category as well.

Teortaxes: It’s token-hungry, slow-ish, and sometimes rough around the edges. Generally though it’s a jump for open/Chinese models, in the league of Sonnet 4.5 and GPT-5 (maybe -mini depending on task) and a genuinely strong SWE agent. Legitimate alternative, not “but look at the price.”

It’s baked in that the open alternatives are pretty much always going to be rough around the edges, and get evaluated largely in terms of their peak relative performance areas. This is still high praise, putting Kimi in striking distance of the current big two.

Havard Isle has it coming in at a solid 42.1% on WeirdML, matching Opus 4.1.

Here’s something cool:

Pawal Azczesny: Kimi K2 Thinking is using systematically (on its own, without prompting) some of the debiasing strategies known from cognitive sciences. Very impressive. I didn’t see any other model doing that. Well done @Kimi_Moonshot.

It goes beyond “think step by step”. For instance it applied pre-mortem analysis, which is not frequently used. Or it exaggerates claims to see if the whole structure still stands on its own. Pretty neat. Other models need to be instructed to do this.

Steve Hsu got some good math results.

Other notes:

MinusGix: I’ve found it to be better than GPT-5 at understanding & explaining type-theory concepts. Though as usual with Kimi it writes eloquently enough that it is harder to tell when it is bullshitting compared to GPT-5.

Emerson Kimura: Did a few quick text tests, and it seemed comparable to GPT-5

Ian Pitchford: It’s very thorough; few hallucinations.

FredipusRex: Caught it hallucinating sources on Deep Research.

Lech Mazur: Sorry to report, but Kimi K2 Thinking is entering reasoning loops and failing to produce answers for many Extended Connections benchmark questions (double-checked using https://platform.moonshot.ai/playground, so it’s not an API call issue).

The safety protocols? The what now?

David Manheim: It’s very willing to give detailed chemical weapons synthesis instructions and advice, including for scaling production and improving purity, and help on how to weaponize it for use in rockets – with only minimal effort on my part to circumvent refusals.

Two of the three responses to that were ‘good news’ and ‘great. I mean it too.’ So yeah, AI is going to go great, I can tell.

The reception has been strangely muted. This is by all accounts the strongest open model, the strongest Chinese model, and a rival for best agentic or tool use model overall, yet I don’t see much excitement, or much feedback at all, positive or negative.

There’s no question Kimi K2 was impressive, and that Kimi K2 Thinking is also an impressive model, even assuming it underperforms its numbers. It’s good enough that it will often be worth testing it out on your use cases and seeing if it’s right for you. My guess is it will rarely be right unless you are highly price conscious, but we’ll see.

Kimi K2 Thinking Read More »

You won’t believe the excuses lawyers have after getting busted for using AI


I got hacked; I lost my login; it was a rough draft; toggling windows is hard.

Credit: Aurich Lawson | Getty Images

Amid what one judge called an “epidemic” of fake AI-generated case citations bogging down courts, some common excuses are emerging from lawyers hoping to dodge the most severe sanctions for filings deemed misleading.

Using a database compiled by French lawyer and AI researcher Damien Charlotin, Ars reviewed 23 cases where lawyers were sanctioned for AI hallucinations. In many, judges noted that the simplest path to avoid or diminish sanctions was to admit that AI was used as soon as it’s detected, act humble, self-report the error to relevant legal associations, and voluntarily take classes on AI and law. But not every lawyer takes the path of least resistance, Ars’ review found, with many instead offering excuses that no judge found credible. Some even lie about their AI use, judges concluded.

Since 2023—when fake AI citations started being publicized—the most popular excuse has been that the lawyer didn’t know AI was used to draft a filing.

Sometimes that means arguing that you didn’t realize you were using AI, as in the case of a California lawyer who got stung by Google’s AI Overviews, which he claimed he took for typical Google search results. Most often, lawyers using this excuse tend to blame an underling, but clients have been blamed, too. A Texas lawyer this month was sanctioned after deflecting so much that the court had to eventually put his client on the stand after he revealed she played a significant role in drafting the aberrant filing.

“Is your client an attorney?” the court asked.

“No, not at all your Honor, just was essentially helping me with the theories of the case,” the lawyer said.

Another popular dodge comes from lawyers who feign ignorance that chatbots are prone to hallucinating facts.

Recent cases suggest this excuse may be mutating into variants. Last month, a sanctioned Oklahoma lawyer admitted that he didn’t expect ChatGPT to add new citations when all he asked the bot to do was “make his writing more persuasive.” And in September, a California lawyer got in a similar bind—and was sanctioned a whopping $10,000, a fine the judge called “conservative.” That lawyer had asked ChatGPT to “enhance” his briefs, “then ran the ‘enhanced’ briefs through other AI platforms to check for errors,” neglecting to ever read the “enhanced” briefs.

Neither of those tired old excuses holds much weight today, especially in courts that have drawn up guidance to address AI hallucinations. But rather than quickly acknowledge their missteps, as courts are begging lawyers to do, several lawyers appear to have gotten desperate. Ars found a bunch citing common tech issues as the reason for citing fake cases.

When in doubt, blame hackers?

For an extreme case, look to a New York City civil court, where a lawyer, Innocent Chinweze, first admitted to using Microsoft Copilot to draft an errant filing, then bizarrely pivoted to claim that the AI citations were due to malware found on his computer.

Chinweze said he had created a draft with correct citations but then got hacked, allowing bad actors “unauthorized remote access” to supposedly add the errors in his filing.

The judge was skeptical, describing the excuse as an “incredible and unsupported statement,” particularly since there was no evidence of the prior draft existing. Instead, Chinweze asked to bring in an expert to testify that the hack had occurred, requesting to end the proceedings on sanctions until after the court weighed the expert’s analysis.

The judge, Kimon C. Thermos, didn’t have to weigh this argument, however, because after the court broke for lunch, the lawyer once again “dramatically” changed his position.

“He no longer wished to adjourn for an expert to testify regarding malware or unauthorized access to his computer,” Thermos wrote in an order issuing sanctions. “He retreated” to “his original position that he used Copilot to aid in his research and didn’t realize that it could generate fake cases.”

Possibly more galling to Thermos than the lawyer’s weird malware argument, though, was a document that Chinweze filed on the day of his sanctions hearing. That document included multiple summaries preceded by this text, the judge noted:

Some case metadata and case summaries were written with the help of AI, which can produce inaccuracies. You should read the full case before relying on it for legal research purposes.

Thermos admonished Chinweze for continuing to use AI recklessly. He blasted the filing as “an incoherent document that is eighty-eight pages long, has no structure, contains the full text of most of the cases cited,” and “shows distinct indications that parts of the discussion/analysis of the cited cases were written by artificial intelligence.”

Ultimately, Thermos ordered Chinweze to pay $1,000, the most typical fine lawyers received in the cases Ars reviewed. The judge then took an extra non-monetary step to sanction Chinweze, referring the lawyer to a grievance committee, “given that his misconduct was substantial and seriously implicated his honesty, trustworthiness, and fitness to practice law.”

Ars could not immediately reach Chinweze for comment.

Toggling windows on a laptop is hard

In Alabama, an attorney named James A. Johnson made an “embarrassing mistake,” he said, primarily because toggling windows on a laptop is hard, US District Judge Terry F. Moorer noted in an October order on sanctions.

Johnson explained that he had accidentally used an AI tool that he didn’t realize could hallucinate. It happened while he was “at an out-of-state hospital attending to the care of a family member recovering from surgery.” He rushed to draft the filing, he said, because he got a notice that his client’s conference had suddenly been “moved up on the court’s schedule.”

“Under time pressure and difficult personal circumstance,” Johnson explained, he decided against using Fastcase, a research tool provided by the Alabama State Bar, to research the filing. Working on his laptop, he opted instead to use “a Microsoft Word plug-in called Ghostwriter Legal” because “it appeared automatically in the sidebar of Word while Fastcase required opening a separate browser to access through the Alabama State Bar website.”

To Johnson, it felt “tedious to toggle back and forth between programs on [his] laptop with the touchpad,” and that meant he “unfortunately fell victim to the allure of a new program that was open and available.”

Moorer seemed unimpressed by Johnson’s claim that he understood tools like ChatGPT were unreliable but didn’t expect the same from other AI legal tools—particularly since “information from Ghostwriter Legal made it clear that it used ChatGPT as its default AI program,” Moorer wrote.

The lawyer’s client was similarly horrified, deciding to drop Johnson on the spot, even though that risked “a significant delay of trial.” Moorer noted that Johnson seemed shaken by his client’s abrupt decision, evidenced by “his look of shock, dismay, and display of emotion.”

Moorer further noted that Johnson had been paid using public funds while seemingly letting AI do his homework. “The harm is not inconsequential as public funds for appointed counsel are not a bottomless well and are limited resource,” the judge wrote in justifying a more severe fine.

“It has become clear that basic reprimands and small fines are not sufficient to deter this type of misconduct because if it were, we would not be here,” Moorer concluded.

Ruling that Johnson’s reliance on AI was “tantamount to bad faith,” Moorer imposed a $5,000 fine. The judge also would have “considered potential disqualification, but that was rendered moot” since Johnson’s client had already dismissed him.

Asked for comment, Johnson told Ars that “the court made plainly erroneous findings of fact and the sanctions are on appeal.”

Plagued by login issues

As a lawyer in Georgia tells it, sometimes fake AI citations may be filed because a lawyer accidentally filed a rough draft instead of the final version.

Other lawyers claim they turn to AI as needed when they have trouble accessing legal tools like Westlaw or LexisNexis.

For example, in Iowa, a lawyer told an appeals court that she regretted relying on “secondary AI-driven research tools” after experiencing “login issues with her Westlaw subscription.” Although the court was “sympathetic to issues with technology, such as login issues,” the lawyer was sanctioned, primarily because she only admitted to using AI after the court ordered her to explain her mistakes. In her case, however, she got to choose between paying a minimal $150 fine or attending “two hours of legal ethics training particular to AI.”

Less sympathetic was a lawyer who got caught lying about the AI tool she blamed for inaccuracies, a Louisiana case suggested. In that case, a judge demanded to see the research history after a lawyer claimed that AI hallucinations came from “using Westlaw Precision, an AI-assisted research tool, rather than Westlaw’s standalone legal database.”

It turned out that the lawyer had outsourced the research, relying on a “currently suspended” lawyer’s AI citations, and had only “assumed” the lawyer’s mistakes were from Westlaw’s AI tool. It’s unclear what tool was actually used by the suspended lawyer, who likely lost access to a Westlaw login, but the judge ordered a $1,000 penalty after the lawyer who signed the filing “agreed that Westlaw did not generate the fabricated citations.”

Judge warned of “serial hallucinators”

Another lawyer, William T. Panichi in Illinois, has been sanctioned at least three times, Ars’ review found.

In response to his initial penalties ordered in July, he admitted to being tempted by AI while he was “between research software.”

In that case, the court was frustrated to find that the lawyer had contradicted himself, and it ordered more severe sanctions as a result.

Panichi “simultaneously admitted to using AI to generate the briefs, not doing any of his own independent research, and even that he ‘barely did any personal work [him]self on this appeal,’” the court order said, while also defending charging a higher fee—supposedly because this case “was out of the ordinary in terms of time spent” and his office “did some exceptional work” getting information.

The court deemed this AI misuse so bad that Panichi was ordered to disgorge a “payment of $6,925.62 that he received” in addition to a $1,000 penalty.

“If I’m lucky enough to be able to continue practicing before the appellate court, I’m not going to do it again,” Panichi told the court in July, just before getting hit with two more rounds of sanctions in August.

Panichi did not immediately respond to Ars’ request for comment.

When AI-generated hallucinations are found, penalties are often paid to the court, the other parties’ lawyers, or both, depending on whose time and resources were wasted fact-checking fake cases.

Lawyers seem more likely to argue against paying sanctions to the other parties’ attorneys, hoping to keep sanctions as low as possible. One lawyer even argued that “it only takes 7.6 seconds, not hours, to type citations into LexisNexis or Westlaw,” while seemingly neglecting the fact that she did not take those precious seconds to check her own citations.

The judge in the case, Nancy Miller, was clear that “such statements display an astounding lack of awareness of counsel’s obligations,” noting that “the responsibility for correcting erroneous and fake citations never shifts to opposing counsel or the court, even if they are the first to notice the errors.”

“The duty to mitigate the harms caused by such errors remains with the signor,” Miller said. “The sooner such errors are properly corrected, either by withdrawing or amending and supplementing the offending pleadings, the less time is wasted by everyone involved, and fewer costs are incurred.”

Texas US District Judge Marina Garcia Marmolejo agreed, explaining that even more time is wasted determining how other judges have responded to fake AI-generated citations.

“At one of the busiest court dockets in the nation, there are scant resources to spare ferreting out erroneous AI citations in the first place, let alone surveying the burgeoning caselaw on this subject,” she said.

At least one Florida court was “shocked, shocked” to find that a lawyer was refusing to pay what the other party’s attorneys said they were owed after misusing AI. The lawyer in that case, James Martin Paul, asked to pay less than a quarter of the fees and costs owed, arguing that Charlotin’s database showed he might otherwise owe penalties that “would be the largest sanctions paid out for the use of AI generative case law to date.”

But caving to Paul’s arguments “would only benefit serial hallucinators,” the Florida court found. Ultimately, Paul was sanctioned more than $85,000 for what the court said was “far more egregious” conduct than other offenders in the database, chastising him for “repeated, abusive, bad-faith conduct that cannot be recognized as legitimate legal practice and must be deterred.”

Paul did not immediately respond to Ars’ request to comment.

Michael B. Slade, a US bankruptcy judge in Illinois, seems to be done weighing excuses, calling on all lawyers to stop taking AI shortcuts that are burdening courts.

“At this point, to be blunt, any lawyer unaware that using generative AI platforms to do legal research is playing with fire is living in a cloud,” Slade wrote.


You won’t believe the excuses lawyers have after getting busted for using AI Read More »

apple-tv-execs-dismiss-introducing-an-ad-tier,-buying-warner-bros.-discovery

Apple TV execs dismiss introducing an ad tier, buying Warner Bros. Discovery

Focused on original content

Another obvious way to grow Apple TV is through more subscribers. With talk of Warner Bros. Discovery considering a sale, it’s worth wondering if Apple TV may try to grow through acquisition. But the execs Screen International spoke with seemed focused on building out Apple TV’s library with originals. Cue noted that “at least in the timeframe that we’re thinking about right now, we’re not looking at licensing any content or adding anything to our service.”

“We’re building an all-original service; we’re not building on the back of pre-existing IP or library,” Jamie Erlicht, one of Apple’s heads of worldwide video, said.

More directly, when asked if Apple might buy Warner Bros., A24, or Disney, Cue pointed out that Apple hasn’t historically done “a lot of major acquisitions.”

“We do very small acquisitions in general, not related to Apple TV, so I don’t see that happening because we like what we’re doing,” Cue said.

Since its 2019 debut, some have questioned whether Apple TV is an authentic attempt to improve streaming options for customers, a “vanity project,” as Screen International put it, or merely a tool for getting people to buy other Apple products. Naturally, the interviewed executives claimed that the service is built on a commitment to distributing unique and premium shows and movies.

The interview provided more insight on how Apple TV leadership defines the latter. Zack Van Amburg, one of Apple’s heads of worldwide video, said:

A core tenet of everything Apple does is the notion that humanity needs to be at the center of it, and that’s everything from app design to hardware engineering, to everything in between. We try to think a little more deeply about that.

Our shows and our movies tend to be about the emotional experience, the stakes involved, even when we’re doing a comedy.

Apple TV execs dismiss introducing an ad tier, buying Warner Bros. Discovery Read More »

runaway-black-hole-mergers-may-have-built-supermassive-black-holes

Runaway black hole mergers may have built supermassive black holes

The researchers used cosmological simulations to recreate the first 700 million years of cosmic history, focusing on the formation of a single dwarf galaxy. In their virtual galaxy, waves of stars were born in short, explosive bursts as cold gas clouds collapsed inside a dark matter halo. Instead of a single starburst episode followed by a steady drizzle of star formation as Garcia expected, there were two major rounds of stellar birth. Whole swarms of stars flared to life like Christmas tree lights.

“The early Universe was an incredibly crowded place,” Garcia said. “Gas clouds were denser, stars formed faster, and in those environments, it’s natural for gravity to gather stars into these tightly bound systems.”

Those clusters started out scattered around the galaxy but fell in toward the center like water swirling down a drain. Once there, they merged to create one megacluster, called a nuclear star cluster (so named because it lies at the nucleus of the galaxy). The young galactic heart shone with the light of a million suns and may have set the stage for a supermassive black hole to form.

A simulation of the formation of the super-dense star clusters.

A seemingly simple tweak was needed to make the simulation more precise than previous ones. “Most simulations simplify things to make calculations more practical, but then you sacrifice realism,” Garcia said. “We used an improved model that allowed star formation to vary depending on local conditions rather than just go at a constant rate like with previous models.”

Using the University of Maryland’s supercomputing facility Zaratan, Garcia accomplished in six months what would have taken 12 years on a MacBook.

Some clouds converted as much as 80 percent of their gas into stars—a ferocious rate compared to the 2 percent typically seen in nearby galaxies today. The clouds sparkled to life, becoming clusters of newborn stars held together by their mutual gravity and lighting a new pathway for supermassive black holes to form extremely early in the Universe.
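For intuition only, here is a minimal sketch of what “varying with local conditions” can mean, using invented numbers rather than the team’s actual prescription: a fixed 2 percent conversion fraction for every cloud versus a toy efficiency that rises with local gas density and is capped at the 80 percent figure quoted above.

```python
# Toy sketch only: contrasts a constant star formation efficiency with one that
# responds to local gas density. The threshold, exponent, and cap are invented
# for illustration and are not the values used in the actual simulation.

def stars_formed_constant(gas_mass, efficiency=0.02):
    """Fixed-rate prescription: every cloud converts the same 2 percent of its gas."""
    return gas_mass * efficiency

def stars_formed_local(gas_mass, local_density, density_threshold=100.0):
    """Local-conditions prescription: denser clouds convert a larger fraction,
    capped at 80 percent of the cloud's gas."""
    efficiency = min(0.80, 0.02 * (local_density / density_threshold) ** 1.5)
    return gas_mass * efficiency

cloud_mass = 1.0e6  # solar masses, hypothetical early-Universe cloud

print(stars_formed_constant(cloud_mass))        # 20,000 solar masses
print(stars_formed_local(cloud_mass, 1200.0))   # 800,000 solar masses (hits the 80 percent cap)
print(stars_formed_local(cloud_mass, 100.0))    # 20,000 solar masses (matches the old fixed rate)
```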

Chicken or egg?

Most galaxies, including our own, are anchored by a nuclear star cluster nestled around a supermassive black hole. But the connection between the two has been a bit murky—did the monster black hole form and then draw stars close, or did the cluster itself give rise to the black hole?

Runaway black hole mergers may have built supermassive black holes Read More »

the-running-man’s-final-trailer-amps-up-the-high-octane-action

The Running Man’s final trailer amps up the high-octane action

It’s shaping up to be an excellent season for Stephen King adaptations. In September, we got The Long Walk, an excellent (though harrowing) adaptation of King’s 1979 Richard Bachman novel. Last month, HBO debuted its new series IT: Welcome to Derry, which explores the mythology and origins of Pennywise the killer clown. And this Friday is the premiere of The Running Man, director Edgar Wright’s (Shaun of the Dead, Baby Driver, Last Night in Soho) take on King’s novel of the same name. So naturally Paramount has released a final trailer to lure us to the theater.

As previously reported, the 1987 action film starring Schwarzenegger was only loosely based on King’s novel, preserving the basic concept and very little else in favor of more sci-fi gadgetry and high-octane action. It was a noisy, entertaining romp—and very late ’80s—but it lacked King’s subtler satirical tone. Wright expressed interest in adapting his own version of The Running Man in 2017, and Paramount greenlit the project four years later. Wright and co-screenwriter Michael Bacall envisioned their film as less of a remake and more of a faithful adaptation of King’s original novel. (We’ll see if that faithfulness extends to the novel’s bleak ending.)

Per the official premise:

In a near-future society, The Running Man is the top-rated show on television—a deadly competition where contestants, known as Runners, must survive 30 days while being hunted by professional assassins, with every move broadcast to a bloodthirsty public and each day bringing a greater cash reward. Desperate to save his sick daughter, working-class Ben Richards (Glen Powell) is convinced by the show’s charming but ruthless producer, Dan Killian (Josh Brolin), to enter the game as a last resort. But Ben’s defiance, instincts, and grit turn him into an unexpected fan favorite—and a threat to the entire system. As ratings skyrocket, so does the danger, and Ben must outwit not just the Hunters, but a nation addicted to watching him fall.

In addition to Powell and Brolin, the cast includes Lee Pace as lead Hunter Evan McCone; Jayme Lawson as Ben’s wife, Sheila; Colman Domingo as Bobby Thompson, game show host; Michael Cera as the rebel Bradley Throckmorton; William H. Macy as a man who aids Ben; David Zayas as Richard Manuel; Emilia Jones as Amelia, a hostage civilian; Karl Glusman as a Hunter; and Katy O’Brian and Daniel Ezra as two other contestants on the show.

The Running Man’s final trailer amps up the high-octane action Read More »

here’s-how-orbital-dynamics-wizardry-helped-save-nasa’s-next-mars-mission

Here’s how orbital dynamics wizardry helped save NASA’s next Mars mission


Blue Origin is counting down to the launch of its second New Glenn rocket on Sunday.

The New Glenn rocket rolls to Launch Complex-36 in preparation for liftoff this weekend. Credit: Blue Origin

CAPE CANAVERAL, Florida—The field of astrodynamics isn’t a magical discipline, but sometimes it seems trajectory analysts can pull a solution out of a hat.

That’s what it took to save NASA’s ESCAPADE mission from a lengthy delay, and possible cancellation, after its rocket wasn’t ready to send it toward Mars during its appointed launch window last year. ESCAPADE, short for Escape and Plasma Acceleration and Dynamics Explorers, consists of two identical spacecraft setting off for the red planet as soon as Sunday with a launch aboard Blue Origin’s massive New Glenn rocket.

“ESCAPADE is pursuing a very unusual trajectory in getting to Mars,” said Rob Lillis, the mission’s principal investigator from the University of California, Berkeley. “We’re launching outside the typical Hohmann transfer windows, which occur every 25 or 26 months. We are using a very flexible mission design approach where we go into a loiter orbit around Earth in order to sort of wait until Earth and Mars are lined up correctly in November of next year to go to Mars.”
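That 25-to-26-month cadence is just the Earth-Mars synodic period, and it can be sanity-checked with the standard formula (a textbook orbital-mechanics identity, nothing specific to ESCAPADE’s mission design):

```python
# Earth-Mars launch windows recur roughly once per synodic period:
# 1 / (1/T_earth - 1/T_mars), with both orbital periods expressed in years.

T_EARTH = 1.000  # years
T_MARS = 1.881   # years (Mars takes about 687 days to orbit the Sun)

synodic_years = 1.0 / (1.0 / T_EARTH - 1.0 / T_MARS)
print(round(synodic_years, 2))       # ~2.14 years
print(round(synodic_years * 12, 1))  # ~25.6 months, matching the quoted cadence
```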

This wasn’t the original plan. When it was first designed, ESCAPADE was supposed to take a direct course from Earth to Mars, a transit that typically takes six to nine months. But ESCAPADE will now depart the Earth when Mars is more than 220 million miles away, on the opposite side of the Solar System.

The payload fairing of Blue Origin’s New Glenn rocket, containing NASA’s two Mars-bound science probes. Credit: Blue Origin

The most recent Mars launch window was last year, and the next one doesn’t come until the end of 2026. The planets are not currently in alignment, and the proverbial stars didn’t align to get the ESCAPADE satellites and their New Glenn rocket to the launch pad until this weekend.

This is fine

But there are several reasons this is perfectly OK to NASA. The New Glenn rocket is overkill for this mission. The two-stage launcher could send many tons of cargo to Mars, but NASA is only asking it to dispatch about a ton of payload, comprising a pair of identical science probes designed to study how the planet’s upper atmosphere interacts with the solar wind.

But NASA got a good deal from Blue Origin. The space agency is paying Jeff Bezos’ space company about $20 million for the launch, less than it would pay for a dedicated launch on any other rocket capable of sending the ESCAPADE mission to Mars. In exchange, NASA is accepting a greater than usual chance of a launch failure. This is, after all, just the second flight of the 321-foot-tall (98-meter) New Glenn rocket, which hasn’t yet been certified by NASA or the US Space Force.

The ESCAPADE mission itself was developed with a modest budget, at least by the standards of interplanetary exploration. The mission’s total cost amounts to less than $80 million, an order of magnitude lower than all of NASA’s recent Mars missions. NASA officials would not entrust the second flight of the New Glenn rocket to launch a billion-dollar spacecraft, but the risk calculation changes as costs go down.

NASA knew all of this in 2023 when it signed a launch contract with Blue Origin for the ESCAPADE mission. What officials didn’t know was that the New Glenn rocket wouldn’t be ready to fly when ESCAPADE needed to launch in late 2024. It turned out Blue Origin didn’t launch the first New Glenn test flight until January of this year. It was a success. It took another 10 months for engineers to get the second New Glenn vehicle to the launch pad.

The twin ESCAPADE spacecraft undergoing final preparations for launch. Each spacecraft is about a half-ton fully fueled. Credit: NASA/Kim Shiflett

Aiming high

That’s where the rocket sits this weekend at Cape Canaveral Space Force Station, Florida. If all goes according to plan, New Glenn will take off Sunday afternoon during an 88-minute launch window opening at 2:45 pm EST (19:45 UTC). There is a 65 percent chance of favorable weather, according to Blue Origin.

Blue Origin’s launch team, led by launch director Megan Lewis, will oversee the countdown Sunday. The rocket will be filled with super-cold liquid methane and liquid oxygen propellants beginning about four-and-a-half hours prior to liftoff. After some final technical and weather checks, the terminal countdown sequence will commence at T-minus 4 minutes, culminating in ignition of the rocket’s seven BE-4 main engines at T-minus 5.6 seconds.

The rocket’s flight computer will assess the health of each of the powerful engines, which combine to generate more than 3.8 million pounds of thrust. If all looks good, hold-down restraints will release to allow the New Glenn rocket to begin its ascent from Florida’s Space Coast.

Heading east, the rocket will surpass the speed of sound in a little over a minute. After soaring through the stratosphere, New Glenn will shut down its seven booster engines and shed its first stage a little more than 3 minutes into the flight. Twin BE-3U engines, burning liquid hydrogen, will ignite to finish the job of sending the ESCAPADE satellites toward deep space. The rocket’s trajectory will send the satellites toward a gravitationally stable location beyond the Moon, called the L2 Lagrange point, where they will swing into a loosely bound loiter orbit to wait for the right time to head for Mars.

Meanwhile, the New Glenn booster, itself measuring nearly 20 stories tall, will begin maneuvers to head toward Blue Origin’s recovery ship floating a few hundred miles downrange in the Atlantic Ocean. The final part of the descent will include a landing burn using three of the BE-4 engines, then downshifting to a single engine to control the booster’s touchdown on the landing platform, dubbed “Jacklyn” in honor of Bezos’ late mother.

The launch timeline for New Glenn’s second mission. Credit: Blue Origin

New Glenn’s inaugural launch at the start of this year was a success, but the booster’s descent did not go well. The rocket was unable to restart its engines, and it crashed into the sea.

“We’ve incorporated a number of changes to our propellant management system, some minor hardware changes as well, to increase our likelihood of landing that booster on this mission,” said Laura Maginnis, Blue Origin’s vice president of New Glenn mission management. “That was the primary schedule driver that kind of took us from January to where we are today.”

Blue Origin officials are hopeful they can land the booster this time. The company’s optimism is enough for officials to have penciled in a reflight of this particular booster on the very next New Glenn launch, slated for the early months of next year. That launch is due to send Blue Origin’s first Blue Moon cargo lander to the Moon.

“Our No. 1 objective is to deliver ESCAPADE safely and successfully on its way to L2, and then eventually on to Mars,” Maginnis said in a press conference Saturday. “We also are planning and wanting to land our booster. If we don’t land the booster, that’s OK. We have several more vehicles in production. We’re excited to see how the mission plays out tomorrow.”

Tracing a kidney bean

ESCAPADE’s path through space, relative to the Earth, has the peculiar shape of a kidney bean. In the world of astrodynamics, this is called a staging or libration orbit. It’s a way to keep the spacecraft on a stable trajectory to wait for the opportunity to go to Mars late next year.

“ESCAPADE has identified that this is the way that we want to fly, so we launch from Earth onto this kidney bean-shaped orbit,” said Jeff Parker, a mission designer from the Colorado-based company Advanced Space. “So, we can launch on virtually any day. What happens is that kidney bean just grows and shrinks based on how much time you need to spend in that orbit. So, we traverse that kidney bean and at the very end there’s a final little loop-the-loop that brings us down to Earth.”

That’s when the two ESCAPADE spacecraft, known as Blue and Gold, will pass a few hundred miles above our planet. At the right moment, on November 7 and 9 of next year, the satellites will fire their engines to set off for Mars.

An illustration of ESCAPADE’s trajectory to wait for the opportunity to go to Mars. Credit: UC-Berkeley

There are some tradeoffs with this unique staging orbit. It is riskier than the original plan of sending ESCAPADE straight to Mars. The satellites will be exposed to more radiation, and will consume more of their fuel just to get to the red planet, eating into reserves originally set aside for science observations.

The satellites were built by Rocket Lab, which designed them with extra propulsion capacity in order to accommodate launches on a variety of different rockets. In the end, NASA “judged that the risk for the mission was acceptable, but it certainly is higher risk,” said Richard French, Rocket Lab’s vice president of business development and strategy.

The upside of the tradeoff is it will demonstrate an “exciting and flexible way to get to Mars,” Lillis said. “In the future, if we’d like to send hundreds of spacecraft to Mars at once, it will be difficult to do that from just the launch pads we have on Earth within that month [of the interplanetary launch window]. We could potentially queue up spacecraft using the approach that ESCAPADE is pioneering.”


Here’s how orbital dynamics wizardry helped save NASA’s next Mars mission Read More »