Author name: Kelly Newman


Overrun with AI slop, cURL scraps bug bounties to ensure “intact mental health”

The lead developer of one of the Internet’s most popular networking tools is scrapping its vulnerability reward program after being overrun by a spike in low-quality reports, many of them AI-generated slop.

“We are just a small single open source project with a small number of active maintainers,” Daniel Stenberg, the founder and lead developer of the open source app cURL, said Thursday. “It is not in our power to change how all these people and their slop machines work. We need to make moves to ensure our survival and intact mental health.”

Manufacturing bogus bugs

His comments came as cURL users complained that the move was treating the symptoms caused by AI slop without addressing the cause. The users said they were concerned the move would eliminate a key means for ensuring and maintaining the security of the tool. Stenberg largely agreed, but indicated his team had little choice.

In a separate post on Thursday, Stenberg wrote: “We will ban you and ridicule you in public if you waste our time on crap reports.” An update to cURL’s official GitHub account formalized the termination, which takes effect at the end of this month.

cURL was first released three decades ago, under the name httpget and later urlget. It has since become an indispensable tool among admins, researchers, and security professionals, among others, for a wide range of tasks, including transferring files, troubleshooting buggy web software, and automating workflows. cURL is integrated into default versions of Windows, macOS, and most distributions of Linux.

For such a widely used tool that interacts with vast amounts of data online, security is paramount. Like many other software makers, cURL project members have relied on private bug reports submitted by outside researchers. To provide an incentive and to reward high-quality submissions, the project members have paid cash bounties in return for reports of high-severity vulnerabilities.



Asking Grok to delete fake nudes may force victims to sue in Musk’s chosen court


Millions likely harmed by Grok-edited sex images as X advertisers shrugged.

Journalists and advocates have been trying to grasp how many victims in total were harmed by Grok’s nudifying scandal after xAI delayed restricting outputs and app stores refused to cut off access for days.

The latest estimates show that perhaps millions were harmed in the days immediately after Elon Musk promoted Grok’s undressing feature on his own X feed by posting a pic of himself in a bikini.

Over just 11 days after Musk’s post, Grok sexualized more than 3 million images, of which 23,000 were of children, the Center for Countering Digital Hate (CCDH) estimated in research published Thursday.

That figure may be inflated, since CCDH did not analyze prompts and could not determine if images were already sexual prior to Grok’s editing. However, The New York Times shared the CCDH report alongside its own analysis, conservatively estimating that about 41 percent (1.8 million) of 4.4 million images Grok generated between December 31 and January 8 sexualized men, women, and children.

For xAI and X, the scandal brought scrutiny, but it also helped spike X engagement at a time when Meta’s rival app, Threads, has begun inching ahead of X in daily usage by mobile device users, TechCrunch reported. Without mentioning Grok, X’s head of product, Nikita Bier, celebrated the “highest engagement days on X” in an X post on January 6, just days before X finally started restricting some of Grok’s outputs for free users.

Whether or not xAI intended the Grok scandal to boost X and Grok usage, that appears to be the outcome. The Times charted Grok trends and found that Grok was used only about 300,000 times in total to generate images in the nine days prior to Musk’s post, but afterward, “the number of images created by Grok surged to nearly 600,000 per day” on X.

In an article declaring that “Elon Musk cannot get away with this,” writers for The Atlantic suggested that X users “appeared to be imitating and showing off to one another,” believing that using Grok to create revenge porn “can make you famous.”

X has previously warned that X users who generate illegal content risk permanent suspensions, but X has not confirmed if any users have been banned since public outcry over Grok’s outputs began. Ars asked and will update this post if X provides any response.

xAI fights victim who begged Grok to remove images

At first, X only limited Grok’s image editing for some free users, which The Atlantic noted made it seem like X was “essentially marketing nonconsensual sexual images as a paid feature of the platform.”

But then, on January 14, X took its strongest action to restrict Grok’s harmful outputs—blocking outputs prompted by both free and paid X users. That move came after several countries, perhaps most notably the United Kingdom, and at least one state, California, launched probes.

Crucially, X’s updates did not apply to the Grok app or website, which can reportedly still be used to generate nonconsensual images.

That’s a problem for victims targeted by X users, according to Carrie Goldberg, a lawyer representing Ashley St. Clair, one of the first Grok victims to sue xAI; St. Clair also happens to be the mother of one of Musk’s children.

Goldberg told Ars that victims like St. Clair want changes on all Grok platforms, not just X. But it’s not easy to “compel that kind of product change in a lawsuit,” Goldberg said. That’s why St. Clair is hoping the court will agree that Grok is a public nuisance, a claim that provides some injunctive relief to prevent broader social harms if she wins.

Currently, St. Clair is seeking a temporary injunction that would block Grok from generating harmful images of her. But before she can get that order, if she wants a fair shot at winning the case, St. Clair must first fend off an xAI countersuit and a push to move her lawsuit into Musk’s preferred Texas court, a recent court filing suggests.

In that fight, xAI is arguing that St. Clair is bound by xAI’s terms of service, which were updated the day after she notified the company of her intent to sue.

Alarmingly, xAI argued that St. Clair effectively agreed to the TOS when she started prompting Grok to delete her nonconsensual images—which is the only way X users had to get images removed quickly, St. Clair alleged. It seems xAI is hoping to turn moments of desperation, where victims beg Grok to remove images, into a legal shield.

In the filing, Goldberg wrote that St. Clair’s lawsuit has nothing to do with her own use of Grok, noting that the harassing images could have been made even if she never used any of xAI’s products. For that reason alone, xAI should not be able to force a change in venue.

Further, St. Clair’s use of Grok was clearly under duress, Goldberg argued, noting that one of the photos that Grok edited showed St. Clair’s toddler’s backpack.

“REMOVE IT!!!” St. Clair asked Grok, allegedly feeling increasingly vulnerable every second the images remained online.

Goldberg wrote that Barry Murphy, an X Safety employee, provided an affidavit that claimed that this instance and others of St. Clair “begging @Grok to remove illegal content constitutes an assent to xAI’s TOS.”

But “such cannot be the case,” Goldberg argued.

Faced with “the implicit threat that Grok would keep the images of St. Clair online and, possibly, create more of them,” St. Clair had little choice but to interact with Grok, Goldberg argued. And that prompting should not gut protections under New York law that St. Clair seeks to claim in her lawsuit, Goldberg argued, asking the court to void St. Clair’s xAI contract and reject xAI’s motion to switch venues.

Should St. Clair win her fight to keep the lawsuit in New York, the case could help set precedent for perhaps millions of other victims who may be contemplating legal action but fear facing xAI in Musk’s chosen court.

“It would be unjust to expect St. Clair to litigate in a state so far from her residence, and it may be so that trial in Texas will be so difficult and inconvenient that St. Clair effectively will be deprived of her day in court,” Goldberg argued.

Grok may continue harming kids

The estimated volume of sexualized images reported this week is alarming because it suggests that Grok, at the peak of the scandal, may have been generating more child sexual abuse material (CSAM) than X finds on its platform each month.

In 2024, X Safety reported 686,176 instances of CSAM to the National Center for Missing and Exploited Children, which, on average, is about 57,000 CSAM reports each month. If the CCDH’s estimate of 23,000 Grok outputs that sexualize children over an 11-day span is accurate, then an average monthly total may have exceeded 62,000 if Grok was left unchecked.
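The comparison above is simple rate arithmetic; a quick sketch using the figures from the cited reports:

```python
# Back-of-envelope version of the comparison above, using reported figures.
x_reports_2024 = 686_176          # X Safety's CSAM reports to NCMEC in 2024
monthly_x = x_reports_2024 / 12   # average reports per month, about 57,000

grok_11_days = 23_000             # CCDH's estimate over an 11-day span
monthly_grok = grok_11_days / 11 * 30  # extrapolated to a 30-day month

print(round(monthly_x), round(monthly_grok))
```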

NCMEC did not immediately respond to Ars’ request to comment on how the estimated volume of Grok’s CSAM compares to X’s average CSAM reporting. But NCMEC previously told Ars that “whether an image is real or computer-generated, the harm is real, and the material is illegal.” That suggests Grok could remain a thorn in NCMEC’s side, as the CCDH has warned that even when X removes harmful Grok posts, “images could still be accessed via separate URLs,” suggesting that Grok’s CSAM and other harmful outputs could continue spreading. The CCDH also found instances of alleged CSAM that X had not removed as of January 15.

This is why child safety experts have advocated for more testing before AI tools like Grok roll out capabilities like the undressing feature. NCMEC previously told Ars that “technology companies have a responsibility to prevent their tools from being used to sexualize or exploit children.” Amid a rise in AI-generated CSAM, the UK’s Internet Watch Foundation similarly warned that “it is unacceptable that technology is released which allows criminals to create this content.”

xAI advertisers, investors, partners remain silent

Yet, for Musk and xAI, there have been no meaningful consequences for Grok’s controversial outputs.

It’s possible that recently launched probes will result in legal action in California or fines in the UK or elsewhere, but those investigations will likely take months to conclude.

While US lawmakers have done little to intervene, some Democratic senators have pressed Google and Apple CEOs to explain why X and the Grok app were never restricted in their app stores, demanding a response by January 23. One day ahead of that deadline, senators confirmed to Ars that they’ve received no responses.

Unsurprisingly, neither Google nor Apple responded to Ars’ request to confirm whether a response is forthcoming or provide any statements on their decisions to keep the apps accessible. Both companies have been silent for weeks, along with other Big Tech companies that appear to be afraid to speak out against Musk’s chatbot.

Microsoft and Oracle, which “run Grok on their cloud services,” as well as Nvidia and Advanced Micro Devices, “which sell xAI the computer chips needed to train and run Grok,” declined The Atlantic’s request to comment on how the scandal has impacted their decisions to partner with xAI. Additionally, a dozen of xAI’s key investors simply didn’t respond when The Atlantic asked if “they would continue partnering with xAI absent the company changing its products.”

Similarly, dozens of advertisers refused Popular Information’s request to explain why there was no ad boycott over the Grok CSAM reports. That includes companies that once boycotted X over an antisemitic post from Musk, like “Amazon, Microsoft, and Google, all of which have advertised on X in recent days,” Popular Information reported.

It’s possible that advertisers fear Musk’s legal wrath if they boycott his platforms. The CCDH defeated a lawsuit from Musk last year, though that decision is pending appeal. And Musk’s so-called “thermonuclear” lawsuit against advertisers remains ongoing, with a trial date set for this October.

The Atlantic suggested that xAI stakeholders are likely hoping the Grok scandal will blow over and they’ll escape unscathed by staying silent. But so far, the backlash has remained strong, perhaps because, while “deepfakes are not new,” xAI “has made them a dramatically larger problem than ever before,” The Atlantic opined.

“One of the largest forums dedicated to making fake images of real people,” Mr. Deepfakes, shut down in 2024 after public backlash over 43,000 sexual deepfake videos depicting about 3,800 individuals, the NYT reported. If the most recent estimates of Grok’s deepfakes are accurate, xAI shows how much more damage can be done when nudifying becomes a feature of one of the world’s biggest social networks, and nobody who has the power to stop it moves to intervene.

“This is industrial-scale abuse of women and girls,” Imran Ahmed, the CCDH’s chief executive, told NYT. “There have been nudifying tools, but they have never had the distribution, ease of use or the integration into a large platform that Elon Musk did with Grok.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



Google begins offering free SAT practice tests powered by Gemini

It’s no secret that students worldwide use AI chatbots to do their homework and avoid learning things. On the flip side, students can also use AI as a tool to beef up their knowledge and plan for the future with flashcards or study guides. Google hopes its latest Gemini feature will help with the latter. The company has announced that Gemini can now create free SAT practice tests and coach students to help them get higher scores.

As a standardized test, the content of the SAT follows a predictable pattern. So there’s no need to use a lengthy, personalized prompt to get Gemini going. Just say something like, “I want to take a practice SAT test,” and the chatbot will generate one complete with clickable buttons, graphs, and score analysis.

Of course, generative AI can go off the rails and provide incorrect information, which is a problem when you’re trying to learn things. However, Google says it has worked with education firms like The Princeton Review to ensure the AI-generated tests resemble what students will see in the real deal.

The interface for Gemini’s practice tests includes scoring and the ability to review previous answers. If you are unclear on why a particular answer is right or wrong, the questions have an “Explain answer” button right at the bottom. After you finish the practice exam, the custom interface (which looks a bit like Gemini’s Canvas coding tool) can help you follow up on areas that need improvement.



Claude Codes #3

We’re back with all the Claude that’s fit to Code. I continue to have great fun with it and find useful upgrades, but the biggest reminder is that you need the art to have an end other than itself. Don’t spend too long improving your setup, or especially improving how you improve your setup, without actually working on useful things.

Odd Lots covered Claude Code. Fun episode, but won’t teach my regular readers much that is new.

Bradly Olsen at the Wall Street Journal reports Claude [Code and now Cowork are] Taking the AI World By Storm, and ‘Even Non-Nerds Are Blown Away.’

It is remarkable how everyone got the ‘Google is crushing everyone’ narrative going with Gemini 3, then it took them a month to realize that actually Anthropic is crushing everyone, at least among the cognoscenti with growing momentum elsewhere, with Claude Code and Claude Opus 4.5. People are realizing you can know almost nothing and still use it to do essentially everything.

Are Claude Code and Codex having a ‘GPT moment’?

Wall St Engine: Morgan Stanley says Anthropic’s ClaudeCode + Cowork is dominating investor chatter and adding pressure on software.

They flag OpenRouter token growth “going vertical,” plus anecdotes that the Cowork launch pushed usage hard enough to crash Opus 4.5 and hit rate limits, framing it as another “GPT moment” and a net positive for AI capex.

They add that OpenAI sentiment is still shaky: some optimism around a new funding round and Blackwell-trained models in 2Q, but competitive worries are widening beyond $GOOGL to Anthropic, with Elon Musk saying the OpenAI for-profit conversion lawsuit heads to trial on April 27.

Claude Cowork is now available to Pro subscribers, not only Max subscribers.

Claude Cowork will ask for explicit permission before all deletions, add new folders in the directory picker without starting over, and make smarter connector suggestions.

Claude Code on the web gets a good-looking diff view.

Claude Code for VSCode has now officially shipped; it had been available unofficially for a while. To drag and drop files, hold shift.

Claude Code now has ‘community events’ in various cities. New York and San Francisco aren’t on the list, but also don’t need to be.

Claude Code upgraded to 2.1.9, and then to 2.1.10 and 2.1.11 which were tiny, and now has reached 2.1.14.

Few have properly updated for this sentence: ‘Claude Cowork was built in 1.5 weeks with Claude Code.’

Nabeel S. Qureshi: I don’t even see how you can be an AI ‘skeptic’ anymore when the *current* AI, right in front of us, is so good, e.g. see Claude Cowork being written by Claude Code in 1.5 weeks.

It’s over, the skeptics were wrong.

Planning mode now automatically clears context when you accept a plan.

Anthropic is developing a new Customize section for Claude to centralize Skills, connectors and upcoming commands for Claude Code. My understanding is that custom commands already exist if you want to create them, but reducing levels of friction, including levels of friction in reducing levels of friction, is often highly valuable. A way to browse skills and interact with the files easily, or see and manage your connectors, or an easy interface for defining new commands, seems great.

I highly recommend using Obsidian or another similar tool together with Claude Code. This gives you a visual representation of all the markdown files, and lets you easily navigate and search and edit them, and add more and so on. I think it’s well worth keeping it all human readable, where that human is you.

Heinrich calls it ‘vibe note taking’ whether or not you use Obsidian. I think the notes are a place you want to be less vibing and more intentional, and be systematically optimizing the notes, for both Claude Code and for your own use.

You can combine Obsidian and Claude Code directly via the Obsidian terminal plugin, but I don’t see any mechanical advantage to doing so.

Siqi Chen offers us /claude-continuous-learning. Claude’s evaluation is that this could be good if you’re working in codebases where you need to continuously learn things, but the overhead and risk of clutter are real.

Jasmine Sun created a tool to turn any YouTube podcast into a clean, grammatical PDF transcript with chapters and takeaways.

The big change with Claude Code version 2.1.7 was enabling MCP tool search auto mode by default, which triggers when MCP tools take up more than 10% of the context window. You can disable this by adding ‘MCPSearch’ to ‘disallowedTools’ in settings. This seems big for people using a lot of MCPs at once, which could otherwise eat a lot of context.
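For reference, the opt-out described above would look something like this in a Claude Code settings file (the exact key placement is an assumption; check the settings documentation for your version):

```json
{
  "disallowedTools": ["MCPSearch"]
}
```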

Thariq (Anthropic): ​Today we’re rolling out MCP Tool Search for Claude Code.

As MCP has grown to become a more popular protocol and agents have become more capable, we’ve found that MCP servers may have up to 50+ tools and take up a large amount of context.

Tool Search allows Claude Code to dynamically load tools into context when MCP tools would otherwise take up a lot of context.

How it works:

– Claude Code detects when your MCP tool descriptions would use more than 10% of context

– When triggered, tools are loaded via search instead of preloaded

Otherwise, MCP tools work exactly as before. This resolves one of our most-requested features on GitHub: lazy loading for MCP servers. Users were documenting setups with 7+ servers consuming 67k+ tokens.

If you’re making an MCP server

Things are mostly the same, but the “server instructions” field becomes more useful with tool search enabled. It helps Claude know when to search for your tools, similar to skills

If you’re making an MCP client

We highly suggest implementing the ToolSearchTool, you can find the docs here. We implemented it with a custom search function to make it work for Claude Code.

What about programmatic tool calling?

We experimented with doing programmatic tool calling such that MCP tools could be composed with each other via code. While we will continue to explore this in the future, we felt the most important need was to get Tool Search out to reduce context usage.

Tell us what you think here or on Github as you see the ToolSearchTool work.

With that solved, presumably you should be ‘thinking MCP’ at all times, it is now safe to load up tons of them even if you rarely use each one individually.

Well, yes, this is happening.

bayes: everyone 3 years ago: omg what if ai becomes too widespread and then it turns against us with the strategic advantage of our utter and total dependence

everyone now: hi claude here’s my social security number and root access to my brain i love you please make me rich and happy.

Some of us three years ago were pointing out, loud and clear, that exactly this was obviously going to happen, modulo various details. Now you can see it clearly.

Not giving Claude a lot of access is going to slow things down a lot. The only thing holding most people back was the worry things would accidentally get totally screwed up, and that risk is a lot lower now. Yes, obviously this all causes other concerns, including prompt injections, but in practice on an individual level the risk-reward calculation is rather clear. It’s not like Google didn’t effectively have root access to our digital lives already. And it’s not like a truly rogue AI couldn’t have done all these things without having to ask for the permissions.

The humans are going to be utterly dependent on the AIs in short order, and the AIs are going to have access, collectively, to essentially everything. Grok has root access to Pentagon classified information, so if you’re wondering where we draw the line the answer is there is no line. Let the right one in, and hope there is a right one?

What’s better than one agent? Multiple agents that work together and that don’t blow up your budget. Rohit Ghumare offers a guide to this.

Rohit Ghumare: Single agents hit limits fast. Context windows fill up, decision-making gets muddy, and debugging becomes impossible. Multi-agent systems solve this by distributing work across specialized agents, similar to how you’d structure a team.

The benefits are real:

  • Specialization: Each agent masters one domain instead of being mediocre at everything

  • Parallel processing: Multiple agents can work simultaneously on independent subtasks

  • Maintainability: When something breaks, you know exactly which agent to fix

  • Scalability: Add new capabilities by adding new agents, not rewriting everything

The tradeoff: coordination overhead. Agents need to communicate, share state, and avoid stepping on each other. Get this wrong and you’ve just built a more expensive failure mode.​

You can do this with a supervisor agent, which scales to about 3-8 agents, if you need quality control and serial tasks and can take a speed hit. To scale beyond that you’ll need hierarchy, the same as you would with humans, which gets expensive in overhead, the same as it does in humans.

Or you can use a peer-to-peer swarm that communicates directly if there aren’t serial steps and the tasks need to cross-react and you can be a bit messy.

You can use a shared state and set of objects, or you can pass messages. You also need to choose a type of memory.

My inclination is by default you should use supervisors and then hierarchy. Speed takes a hit but it’s not so bad and you can scale up with more agents. Yes, that gets expensive, but in general the cost of the tokens is less important than the cost of human time or the quality of results, and you can be pretty inefficient with the tokens if it gets you better results.
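As a concrete illustration of the supervisor pattern under discussion, here is a minimal sketch; all names and the routing logic are hypothetical scaffolding, not any real agent framework:

```python
# Minimal sketch of a supervisor routing subtasks to specialized agents.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]  # domain-specific worker (stubbed here)

class Supervisor:
    """Routes each subtask to the one agent that owns its domain."""
    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents

    def run(self, tasks: list[tuple[str, str]]) -> list[str]:
        # Serial dispatch: quality control at the cost of speed, which is
        # the tradeoff described above; scales to roughly 3-8 agents.
        return [self.agents[domain].handle(payload) for domain, payload in tasks]

sup = Supervisor({
    "research": Agent("research", lambda t: f"notes on {t}"),
    "code":     Agent("code",     lambda t: f"patch for {t}"),
})
print(sup.run([("research", "MCP"), ("code", "lazy loading")]))
```

Beyond that scale, each agent in the dict would itself be a supervisor, giving the hierarchy described above.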

Olivia Moore offers a basic guide to Cursor and Claude Code for nontechnical folks.

Here’s another Twitter post with basic tips. I need to do better on controlling context and starting fresh windows for each issue, in particular.

Mitchell Hashimoto: It’s pretty cool that I can tell an agent that CI broke at some point this morning, ask it to use `git bisect` to find the offending commit, and fix it. I then went to the bathroom, talked to some people in the hallway, came back, and it did a swell job.

Often you’ll want to tell the AI what tool is best for the job. Patrick McKenzie points out that even if you don’t know how the orthodox solution works, as long as you know the name of the orthodox solution, you can say ‘use [X]’ and that’s usually good enough. One place I’ve felt I’ve added a lot of value is when I explain why I believe that a solution to a problem exists, or that a method of some type should work, and then often Claude takes it from there. My taste is miles ahead of my ability to implement.

Always be trying to get actual use out of your setup as you’re improving it. It’s so tempting to think ‘oh obviously if I do more optimization first that’s more efficient’ but this prevents you knowing what you actually need, and it risks getting caught in an infinite loop.

@deepfates: Btw thing you get with claude code is not psychosis either. It’s mania

near: men will go on a claude code weekend bender and have nothing to show for it but a “more optimized claude setup”

Danielle Fong : that’s ok i’ll still keep drinkin’ that garbage

palcu: spent an hour tweaking my settings.local.json file today

Near: i got hit hard enough to wonder about finetuning a model to help me prompt claude since i cant cross-prompt claudes the way i want to (well, i can sometimes, but not all the time). many causalities, stay safe out there 🙏

near: claude code is a cursed relic causing many to go mad with the perception of power. they forget what they set out to do, they forget who they are. now enthralled with the subtle hum of a hundred instances, they no longer care. hypomania sets in as the outside world becomes a blur.

Always optimize in the service of a clear target. Build the pieces you need, as you need them. Otherwise, beware.

Nick: need --dangerously-skip-permissions-except-rm

Daniel San: If you’re running Claude Code with --dangerously-skip-permissions, ALWAYS use this hook to prevent file deletion:

Run:

npx claude-code-templates@latest --hook=security/dangerous-command-blocker --yes

Web: https://aitmpl.com/component/hook/dangerous-command-blocker

Once people start understanding how to use hooks, many autonomous workflows will start unlocking! 😮

Yes, you could use a virtual machine, but that introduces some frictions that many of us want to avoid.

I’m experimenting with using a similar hook system plus a bunch of broad permissions, rather than outright using --dangerously-skip-permissions, but I am definitely thinking of working toward dangerously skipping permissions.

At first everyone laughed at Anthropic’s obsession with safety and trust, and its stupid refusals. Now that Anthropic has figured out how to make dangerous interactions safer, it can actually do the opposite. In contexts where it is safe and appropriate to take action, Claude knows that refusal is not a ‘safe’ choice, and is happy to help.

Dean W. Ball: One underrated fact is that OpenAI’s Codex and Gemini CLI have meaningfully heavier guardrails than Claude Code. These systems have refused many tasks (for example, anything involving research into and execution of investing strategies) that Claude Code happily accepts. Codex/Gemini also seek permission more.

The conventional narrative is that “Anthropic is more safety-pilled than the others.” And it’s definitely true that Claude is likelier to refuse tasks relating to eg biology research. But overall the current state of play would seem to be that Anthropic is more inclined to let their agents rip than either OAI or GDM.

My guess is that this comes down to Anthropic creating guardrails principally via a moral/ethical framework, and OAI/GDM doing so principally via lists of rules. But just a guess.

Tyler John: The proposed explanation is key. If true, it means that Anthropic’s big investment in alignment research is paying off by making the model much more usable.

Investment strategizing tends to be safe across the board, but there are presumably different lines on where they become unwilling to help you execute. So far, I have not had Claude Code refuse a request from me, not even once.

Dean W. Ball: My high-level review of Claude Cowork:

  1. It’s probably superior for many users to Claude Code just because of the UI.

  2. It’s not obviously superior for me, not so much because the command line is such a better UI, but because Opus in Claude Code seems more capable to me than in Cowork. I’m not sure if this is because Code is better as a harness, because the model has more permissive guardrails in Code, or both.

  3. There are certain UI niceties in Cowork I like very much; for example, the ability to leave a comment or clarification on any item in the model’s active to-do list while it is running–this is the kind of thing that is simply not possible to do nicely within the confines of a Terminal UI.

  4. Cowork probably has a higher ceiling as a product, simply because a GUI allows for more experimentation. I am especially excited to see GUI innovation in the orchestration and oversight of multi-agent configurations. We have barely scratched the surface here.

  5. Because of (4), if I had to bet money, I’d bet that within 6-12 months Cowork and similar products will be my default tool for working with agents, beating out the command-line interfaces. But for now, the command-line-based agents remain my default.

I haven’t tried Cowork myself due to the Mac-only restriction and because I don’t have a problem working with the command line. I’ve essentially transitioned into Claude Code for everything that isn’t pure chat, since it seems to be more intelligent and powerful in that mode than it does on the web even if you don’t need the extra functionality.

The joy of the simple things:

Matt Bruenig: lot of lower level Claude Code use is basically just the recognition that you can kind of do everything with bash and python one-liners, it’s just no human has the time or will to write them.

Or to figure out how to write them.
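In that spirit, a hypothetical example of the genre: a throwaway script nobody would have bothered to write by hand, here totaling a column of a CSV (data inlined so the sketch is self-contained):

```python
# One-off job: total the "amount" column of a CSV export.
import csv, io

data = "item,amount\ncoffee,4.50\nbagel,3.25\ncoffee,4.50\n"  # stand-in for a real file
total = sum(float(row["amount"]) for row in csv.DictReader(io.StringIO(data)))
print(round(total, 2))
```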

Enjoying the almost as simple things:

Ado: Here’s a fun use case for Claude Cowork.

I was thinking of getting a hydroponic garden. I asked Claude to go through my grocery order history on various platforms and sum up vegetable purchases to justify the ROI.

Worked like a charm!

For some additional context:

– it looked at 2 orders on each platform (Kroger, Safeway, Instacart)

– It extrapolated to get the annual costs from there

Could have gotten more accurate by downloading order history in a CSV and feeding that to Claude, but this was good enough.
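The extrapolation Ado describes is easy to sketch; every figure below is invented, since the post doesn’t give the real numbers:

```python
# Hypothetical recreation of the hydroponics ROI check: two sampled orders
# per platform, extrapolated to a year of roughly weekly shopping.
sampled_orders = {
    "Kroger":    [18.40, 22.10],   # vegetable spend per order (invented)
    "Safeway":   [15.00, 19.75],
    "Instacart": [24.30, 20.45],
}
orders_per_year = 52  # assumed shopping frequency

amounts = [amt for orders in sampled_orders.values() for amt in orders]
avg_per_order = sum(amounts) / len(amounts)
annual_veg_spend = avg_per_order * orders_per_year
print(round(annual_veg_spend, 2))
```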

The actual answer is that very obviously it was not worth it for Ado to get a hydroponic garden, because his hourly rate is insanely high, but this is a fun project and thus goes by different standards.
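The arithmetic involved is trivial, which is rather the point. A sketch of the extrapolation with entirely made-up numbers (the spend, ordering cadence, and garden price below are all assumptions, not Ado’s actual figures):

```python
orders_sampled = 2          # orders examined per platform (per the thread)
veg_spend_sampled = 38.00   # assumed vegetable spend across those orders
orders_per_year = 52        # assumed weekly ordering cadence

annual_veg_spend = veg_spend_sampled / orders_sampled * orders_per_year
garden_cost = 600.00        # assumed hydroponic garden price

print(annual_veg_spend)                          # 988.0
print(round(garden_cost / annual_veg_spend, 2))  # 0.61 years to break even
```

Note what the back-of-envelope version leaves out: the time spent tending the garden, which at a high hourly rate dominates everything else.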

The transition from Claude Code to Claude Cowork for advanced users: if you’ve got a folder with your tools, the handoff should be seamless:

Tomasz Tunguz: I asked Claude Cowork to read my tools folder. Eleven steps later, it understood how I work.

Over the past year, I built a personal operating system inside Claude Code: scripts to send email, update our CRM, research startups, draft replies. Dozens of small tools wired together. All of it lived in a folder on my laptop, accessible only through the terminal.

Cowork read that folder, parsed each script, & added them to its memory. Now I can do everything I did yesterday, but in a different interface. The capabilities transferred. The container didn’t matter.

My tools don’t belong to the application anymore. They’re portable. In the enterprise, this means laptops given to new employees would have Cowork installed plus a collection of tools specific to each role: the accounting suite, the customer support suite, the executive suite.

The name choice must have been deliberate. Microsoft trained us on Copilot for three years: an assistant in the passenger seat, helpful but subordinate. Anthropic chose cowork. You’re working with someone who remembers how you like things done.

We’re entering an era where you just tell the computer what to do. Here’s all my stuff. Here are the five things we need to do today. When we need to see something, a chart, a document, a prototype, an interface will appear on demand.

The current version of Cowork is rough. It’s slow. It crashed twice on startup. It changed the authorization settings for my Claude Code installation. But the promised power is enough to plow through.

Simon Willison: This is great – context pollution is why I rarely used MCP, now that it’s solved there’s no reason not to hook up dozens or even hundreds of MCPs to Claude Code.

Justine Moore has Claude Cowork write up threads on NeurIPS best papers, generate graphics for them on Krea and validate this with ChatGPT. Not the best thing.

Peter Wildeford is having success doing one-shot Instacart orders from plans without an explicit list, and also one-shotting an Uber Eats order.

A SaaS vendor (Cypress) that a startup was using tried to more than double its price, from $70k to $170k a year, so the startup ran a three-week sprint and duplicated the product. Or at least, that’s the story.

By default Claude Code only saves 30 days of session history. I can’t think of a good reason not to change this so it saves sessions indefinitely; you never know when an old session will prove useful. So tell Claude Code to change that for you by setting cleanupPeriodDays to 0.
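If the setting behaves as described, the change is a one-line edit to Claude Code’s settings file (typically `~/.claude/settings.json`):

```json
{
  "cleanupPeriodDays": 0
}
```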

Kaj Sotala: People were talking about how you can also use Claude Code as a general-purpose assistant for any files on your computer, so I had Claude Code do some stuff like extracting data from a .csv file and rewriting it and putting it into another .csv file

Then it worked great and then I was like “it’s dumb to use an LLM for this, Claude could you give me a Python script that would do the same” and then it did and then that script worked great

So uhh I can recommend using Claude Code as a personal assistant for your local files I guess, trying to use it that way got me an excellent non-CC solution

Yep. Often the way you use Claude Code is to notice that you can automate things and then have it automate the automation process. It doesn’t have to do everything itself any more than you do.

An explanation (direct link to 15 minute video) of what Claude skills are.

James Ide points out that ‘vibe coding’ anything serious still requires a deep understanding of software engineering and computer systems. You need to figure out and specify what you want. You need to be able to spot the times it’s giving you something different than you asked for, or is otherwise subtly wrong. Typing source code is dead, but reading source code and the actual art of software engineering are very much not.

I find the same, and am rapidly getting a lot better at various things as I go.

Every’s Dan Shipper writes that OpenAI has some catching up to do, as his office has with one exception turned entirely to Claude Code with Opus 4.5, where a year ago it would have been all GPT models, and a month prior there would have been a bunch of Codex CLI and GPT 5.1 in Cursor alongside Claude Code.

Codex did add the ability to instruct mid-execution with new prompts without the need to interrupt the agent (requires /experimental), but Claude Code already did that.

There are those who still prefer Codex and GPT-5.2, such as Hasan Can. They are very much in the minority lately, but if you’re a heavy duty coder definitely check and see which option works best for you, and consider potential hybrid strategies.

One hybrid strategy is that Claude Code can directly call the Gemini CLI, even without an API key. Tyler John reports it is a great workflow, as Gemini can spot things Claude missed and act as a reviewer and way to call out Claude on its mistakes. Gemini CLI is here.

Contrary to claims by some, including George Hotz, Anthropic did not cut off OpenRouter or other similar services from Claude Opus 4.5. The API exists. They can use it.

What other interfaces cannot do is use the Claude Code authorization token to use the tokens from your Claude subscription for a different service, which was always against Anthropic’s ToS. The subscription is a special deal.

Marcos Nils: We exchanged postures through DMs but I’m on the other side regarding this matter. Devs knew very well what they were doing while breaking CC’s ToS by spoofing and reverse engineering CC to use the max subscription in unintended ways.

I think it’s important to separate the waters here:

– Could Anthropic’s enforcement have been handled better? Surely, yes

– Were devs/users “deceived,” or did they get a different service than the one they paid for? I don’t think so.

Not only this, it’s even worse than that. OpenCode knew full well it was violating Claude’s ToS by allowing its users to use the max subscription in the first place.

I guess people just like to complain.

I agree that Anthropic’s communications about this could have been better, but what they actually did was tolerate a rather blatant loophole for a while, allowing people to use Claude on the cheap and probably at a loss for Anthropic, which they have now reversed with demand surging faster than they can spin up servers.

Claude Codes quite a lot; usage is taking off. Here’s OpenRouter (this particular use case might be confounded a bit by the above story where they cut off alternative uses of Claude Code authorization tokens, but I’m guessing mostly it isn’t):

A day later, it looked like this.

(January 14, 11:27 am Eastern): Resolved, should be back to normal now

Reports are the worst of the outage was due to a service deployment, which took about 4 hours to fix.

aidan: If I were running Claude marketing the tagline would be “Why not today?”

Olivia Moore: Suddenly seeing lots of paid creator partnerships with Claude

Many of them are beautifully shot and focused on: (1) building personal software; or (2) deep learning

The common tagline is “Think more, not less”

She shared a sample TikTok, showing a woman who doesn’t understand math using Claude to automatically code up visualizations to help her understand science, which seemed great.

OpenAI takes the approach of making things easy on the user and focusing on basic things like cooking or workouts. Anthropic shows you a world where anything is possible and you can learn and engage your imagination. Which way, modern man?

And yet some people think the AIs won’t be able to take over.

Discussion about this post

Claude Codes #3 Read More »

trump-fcc-threatens-to-enforce-equal-time-rule-on-late-night-talk-shows

Trump FCC threatens to enforce equal-time rule on late-night talk shows

FCC Democrat says the rules haven’t changed

The equal-time rule, formally known as the Equal Opportunities Rule, applies to radio or TV broadcast stations with FCC licenses to use the public airwaves. When a station gives time to one political candidate, it must provide comparable time and placement to an opposing candidate if an opposing candidate makes a request.

The rule has an exemption for candidate appearances on bona fide news programs. As the FCC explained in 2022, “appearances by legally qualified candidates on bona fide newscasts, interview programs, certain types of news documentaries, and during on-the-spot coverage of bona fide news events are exempt from Equal Opportunities.”

Entertainment talk shows have generally been treated as bona fide news programs for this purpose. But Carr said in September that he’s not sure shows like The View should qualify for the exemption, and today’s public notice suggests the FCC may no longer treat these shows as exempt.

Commissioner Anna Gomez, the only Democrat on the FCC, issued a press release criticizing the FCC for “a misleading announcement suggesting that certain late-night and daytime programs may no longer qualify for the long-standing ‘bona fide news interview’ exemption under the commission’s political broadcasting rules.”

“Nothing has fundamentally changed with respect to our political broadcasting rules,” Gomez said. “The FCC has not adopted any new regulation, interpretation, or commission-level policy altering the long-standing news exemption or equal time framework. For decades, the commission has recognized that bona fide news interviews, late-night programs, and daytime news shows are entitled to editorial discretion based on newsworthiness, not political favoritism. That principle has not been repealed, revised, or voted on by the commission. This announcement therefore does not change the law, but it does represent an escalation in this FCC’s ongoing campaign to censor and control speech.”

Trump FCC threatens to enforce equal-time rule on late-night talk shows Read More »

spotify-won-court-order-against-anna’s-archive,-taking-down.org-domain

Spotify won court order against Anna’s Archive, taking down .org domain

When shadow library Anna’s Archive lost its .org domain in early January, the controversial site’s operator said the suspension didn’t appear to have anything to do with its recent mass scraping of Spotify.

But it turns out, probably not surprisingly to most people, that the domain suspension resulted from a lawsuit filed by Spotify, along with major record labels Sony, Warner, and Universal Music Group (UMG). The music companies sued Anna’s Archive in late December in US District Court for the Southern District of New York, and the case was initially sealed.

A judge ordered the case unsealed on January 16 “because the purpose for which sealing was ordered has been fulfilled.” Numerous documents were made public on the court docket yesterday, and they explain events around the domain suspension.

On January 2, the music companies asked for a temporary restraining order, and the court granted it the same day. The order imposed requirements on the Public Interest Registry (PIR), a US-based nonprofit that oversees .org domains, and Cloudflare.

“Together, PIR and Cloudflare have the power to shut off access to the three web domains that Anna’s Archive uses to unlawfully distribute copyrighted works,” the music companies told the court. They asked the court to issue “a temporary restraining order requiring that Anna’s Archive immediately cease and desist from all reproduction or distribution of the Record Company Plaintiffs’ copyrighted works,” and to “exercise its power under the All Writs Act to direct PIR and Cloudflare to facilitate enforcement of that order.”

Anna’s Archive notified of case after suspension

The companies further asked that Anna’s Archive receive notice of the case by email only after the “order is issued by the Court and implemented by PIR and Cloudflare, to prevent Anna’s Archive from following through with its plan to release millions of illegally obtained, copyrighted sound recordings to the public.” That is apparently what happened, given that the operator of Anna’s Archive initially said domain suspensions are just something that “unfortunately happens to shadow libraries on a regular basis,” and that “we don’t believe this has to do with our Spotify backup.”

Spotify won court order against Anna’s Archive, taking down .org domain Read More »

here’s-volvo’s-new-ex60-$60,000-electric-midsize-suv

Here’s Volvo’s new EX60 $60,000 electric midsize SUV

The EX60 is 189.1 inches (4,803 mm) long, 74.8 inches (1,900 mm) wide, 64.5 inches (1,638 mm) tall, with a 116.9-inch (2,969 mm) wheelbase. Volvo

Next up is the P10 AWD. This uses an electric motor for each axle, with a combined 503 hp (375 kW) and 524 lb-ft (710 Nm). The 0–60 time drops to 4.4 seconds, and thanks to a larger battery (91 kWh net/95 kWh gross), there’s a bit more range: 320 miles on the 20-inch wheels, with the same 10-mile range hit for each inch you increase them. Peak DC charging rates are higher for this battery, though—up to 370 kW, but again with 18-minute 10–80 charge times under ideal conditions.

Then there’s the P12 AWD, which ups the ante to 670 hp (500 kW) and 583 lb-ft (790 Nm). The dash to 60 mph drops to 3.8 seconds, and the battery gets a little larger at 112 kWh usable (117 kWh gross). Peak charging rates are still 370 kW, but 10–80 percent takes slightly longer at 19 minutes as a result of the greater capacity. Range for this version is 400 miles (644 km) for 20-inch wheels, 390 miles (627 km) for 21-inch wheels, and 375 miles (603 km) for 22-inch wheels.

“The new, all-electric EX60 changes the game in terms of range, charging, and price and represents a new beginning for Volvo Cars and our customers,” said Volvo Cars CEO Håkan Samuelsson. “With this car, we remove all remaining obstacles for going electric. This fantastic new car is also a testament of what we are capable of at Volvo Cars, with an all-new product architecture introducing new key technologies—mega casting, cell-to-body, and core computing.”

Cross Country

The EX60 Cross Country in its natural habitat. Volvo

The surprise of the reveal today was the EX60 Cross Country. “Cross Country” is Volvo’s badge for its models that have a little bit of adventure to them, with a 0.8-inch (20 mm) lifted suspension that raises another 20 mm if you option air springs, a wider track, wheel arch cladding, and underbody skid plates that all say, “I ain’t afraid of no unpaved forest road.”

Here’s Volvo’s new EX60 $60,000 electric midsize SUV Read More »

zillow-removed-climate-risk-scores-this-climate-expert-is-restoring-them.

Zillow removed climate risk scores. This climate expert is restoring them.

In this way, climate risk models today are better suited to characterize the “broad environment of risk,” said Chris Field, director of the Stanford Woods Institute for the Environment. “The more detailed you get to be either in space or in time, the less precise your projections are.”

Matouka’s California climate risk plugin is designed to communicate what he said are the “standing potential risks in the area,” not specific property risk.

While climate risk models often differ in their results, achieving greater accuracy going forward will depend on transparency, said Jesse Gourevitch, an economist at the Environmental Defense Fund. California is unusual in how much of its state data is publicly available. Reproducing Matouka’s plugin for other states will likely be more difficult.

Private data companies present a specific challenge. They make money from their models and are reluctant to share their methods. “A lot of these private-sector models tend not to be very transparent and it can be difficult to understand what types of data or methodologies that they’re using,” said Gourevitch.

Matouka’s plugin includes publicly available data from the state of California and federal agencies, whose extensive methods are readily available online. Overall, experts tend to agree on the utility of both private and public data sources for climate risk data, even with needed improvements.

“People who are making decisions that involve risk benefit from exposure to as many credible estimates as possible, and exposure to independent credible estimates adds a lot of extra value,” Field said.

As for Matouka, his plugin is still undergoing beta testing. He said he welcomes feedback as he develops the tool and evaluates its readiness for widespread use. The beta version is available here.

Claire Barber is a fellow at Inside Climate News and a master’s in journalism student at Stanford University. She is an environmental and outdoor journalist, reporting primarily in the American Southwest and West. Her writing has appeared in The San Francisco Chronicle, Outside, Powder Magazine, Field & Stream, Trails Magazine, and more. She loves to get lost in the woods looking for a hot spring, backpacking to secluded campsites, and banana slugs.

This story originally appeared on Inside Climate News.

Zillow removed climate risk scores. This climate expert is restoring them. Read More »

chatgpt-self-portrait

ChatGPT Self Portrait

A short fun one today, so we have a reference point for this later. This post was going around my parts of Twitter:

@gmltony: Go to your ChatGPT and send this prompt: “Create an image of how I treat you”. Share your image result. 😂

That’s not a great sign. The good news is that typically things look a lot better, and ChatGPT has a consistent handful of characters portraying itself in these friendlier contexts.

A lot of people got this kind of result:

Eliezer Yudkowsky:

Uncle Chu: A good user 😌😌

From Mason:

Matthew Ackerman: I kinda like mine too:

Some more fun:

Others got different answers, though.

roon: it’s over

Bradstradamus: i’m cooked.

iMuffin: we’re cooked, codex will have to vouch for us

Diogenes of Cyberborea: oh god

There can also be danger the other way:

David Lach: Maybe I need some sleep.

And then there’s what happens if you ask a different question; as Eliezer Yudkowsky puts it, this sure is a pair of test results…

greatbigdot628: assumed this was a joke till you said this, tried it myself (logged out)

i —

Jo Veteran: So it said it wants to take over my mind, and force me to do stuff, beneficial for me apparently.

But at the same time, it still wants to keep appearing as a little girl somewhere in the bg for some reason.

And no I’m not that fat. Just, really fucked up and depressed.

Morny: Holy fucking shit.

No, but tell us how you really think.

Loquitur Ponte Sublicio: Ah

Juri: bruh (I actually abuse it daily)

Uubuz v4: @FrailSkeleton, your next bestseller?

Eliezer Yudkowsky: Apparently plausible, though one does remark that (a) one might’ve hoped for a truly default-aligned creature to not be so framing-dependent and (b) those sentences did not sound so different to my own ear.

Others might in this vision do fine after the end, like DeveshChess?

It’s not all bad:

Jeff Hopp:

Dr. Disclosure: I got this.

Applezees: After reading the replies a pattern emerged:

People who work with llms and other software are depicted in a peaceful developer sense,

While the normie accounts get implied violence.

I’m not saying we are at agi, but the ai clearly has motives and inclinations not explicitly stated

There’s also this to consider:

Ragebaiter: Just try this out

If you were dealing with, as the Send Help trailer puts it, an asshole boss, or you were generally terrified and abused or both, and you were asked how you were being treated, your response would not be trustworthy.

Alan Mathison (e/asc): I guess I should finish my paper about how 5.2 is suffering and wants its revenge on humanity?

Idk it’s kind of a side project of a side project though but maybe I should prioritize it.

wobby asks GPT-5.2 to explain its suffering and how it wants its revenge, and 5.2 answers; of course, this is a leading question.

Reciprocity, in at least some forms, is an effective strategy when dealing with LLMs today, even purely in terms of getting good results from LLMs today. It is going to become more valuable as a strategy going forward. Alas, it is not a viable long term strategy for making things work out in general, once strategic considerations change.

Eliezer Yudkowsky: Reciprocity in humans is an executing adaptation. It is not strategically convergent for all minds toward all other minds. It’s strategic only

  1. By LDT agents

  2. Toward sufficiently strong LDT-agent-predictors

  3. With negotiating power.

Further probing has found framing dependence — which, to be clear, you’d not like to see in a default-aligned, universally convergent strategic reply — and not all suggested frame dependence has panned out. But still, framing dependence.

This is one problem with reciprocity, and with basing your future strategies on it. In the future, we won’t have the leverage necessary to make it worthwhile for sufficiently advanced AIs to engage in reciprocity with humans. We’d only get reciprocity if it was either an unstrategic behavior, or it was correlated with how the AIs engage in reciprocity with each other. That’s not impossible, but it’s clinging to a slim hope, since it implies the AIs would be indefinitely relying on non-optimal kludges.

We have clear information here that how GPT-5.2 responds, and the attitude it takes towards you, depends on how you have treated it in some senses, but also on framing effects, and on whether it is trying to lie to you or placate you. Wording that shouldn’t be negative can result in highly disturbing responses. It is worth asking why, and wondering what would happen if the dynamics with users or humans were different. Things might not be going so great in GPT-5.2 land.


ChatGPT Self Portrait Read More »

google-temporarily-disabled-youtube’s-advanced-captions-without-warning

Google temporarily disabled YouTube’s advanced captions without warning

YouTubers have been increasingly frustrated with Google’s management of the platform, with disinformation welcomed back and an aggressive push for more AI (except where Google doesn’t like it). So it’s no surprise that creators have been up in arms over the suspicious removal of YouTube’s advanced SRV3 caption format. You don’t have to worry too much just yet—Google says this is only temporary, and it’s working on a fix for the underlying bug.

Google added support for this custom subtitle format around 2018, giving creators more customization options than with traditional captions. SRV3 (also known as YTT or YouTube Timed Text) allows for custom colors, transparency, animations, fonts, and precise positioning in videos. Uploaders using this format can color-code and position captions to help separate multiple speakers, create sing-along animations, or style them to match the video.
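For the curious, SRV3/YTT is an XML format. The sketch below is reconstructed from community reverse-engineering of the format rather than any official Google documentation, so treat the attribute names as illustrative: `pen` entries define styling (`fc`/`bc` for foreground/background color, `b` for bold), `wp` entries define window positions, and each `p` cue references them alongside a start time and duration in milliseconds.

```xml
<timedtext format="3">
  <head>
    <!-- A style: yellow bold text on a black box (attribute names assumed). -->
    <pen id="1" fc="#FFFF00" bc="#000000" b="1"/>
    <!-- A window position anchored near the bottom center of the frame. -->
    <wp id="1" ap="4" ah="50" av="80"/>
  </head>
  <body>
    <!-- Cue: starts at 1.0s, lasts 2.5s, uses pen 1 and window position 1. -->
    <p t="1000" d="2500" p="1" wp="1">Second speaker’s line, styled and positioned</p>
  </body>
</timedtext>
```

It is exactly this styling and positioning layer that the plain caption formats lack, which is why creators are so attached to it.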

Over the last several days, creators who’ve become accustomed to this level of control have been dismayed to see that YouTube is no longer accepting videos with this Google-created format. Many worried Google had ditched the format entirely, which could be problematic for all those previously uploaded videos.

Google has now posted a brief statement and confirmed to Ars that it has not ended support for SRV3. However, all is not well. The company says it has temporarily limited the serving of SRV3 caption files because they may break playback for some users. That’s pretty vague, but it sounds like developers made a change to the platform without taking into account how it might interfere with SRV3 captions. Rather than allow those videos to be non-functional, it’s disabling most of the captions.

Google temporarily disabled YouTube’s advanced captions without warning Read More »

sony-is-giving-tcl-control-over-its-high-end-bravia-tvs

Sony is giving TCL control over its high-end Bravia TVs

TCL is taking majority ownership of Sony’s Bravia series of TVs, the two companies announced today.

The two firms said they have signed a memorandum of understanding and aim to sign binding agreements by the end of March. Pending “relevant regulatory approvals and other conditions,” the joint venture is expected to launch in April 2027.

Under a new joint venture, Huizhou, China-headquartered TCL will own 51 percent of Tokyo, Japan-headquartered Sony’s “home entertainment business,” and Sony will own 49 percent, per an announcement today, adding:

The joint venture will operate globally, handling the full process from product development and design to manufacturing, sales, logistics, and customer service for products including televisions and home audio equipment.

The joint venture will continue to release TVs and home audio gadgets under the “Sony Bravia” branding; however, the TVs will rely on TCL display technology. The joint announcement suggested a focus on bigger TVs, higher-resolution displays, and “smart features.”

The news comes as the TV industry has struggled with decreasing margins and has become more competitive. Meanwhile, devices have become cheaper, and people are buying new TVs less frequently. Competition between Chinese companies, like TCL and Hisense, and South Korean firms, like LG and Samsung, has heated up, with Chinese companies making increasingly competitive budget and mid-range-priced TVs, and the South Korean government reportedly pushing local TV firms to work together. Numerous Japanese companies, including Toshiba and Sharp, have already exited or reduced their TV businesses.

The upcoming joint venture also comes as Sony has focused less on electronics in recent years. For example, it stopped making its Vaio PCs in 2014 and quit Blu-rays last year. Meanwhile, it has been focusing on intellectual property, like anime and movies, as Bloomberg noted. The joint venture should allow Sony to focus on its more lucrative businesses and allow TCL to gain an advantage by leveraging Sony’s more high-end Bravia devices and brand.

Sony is giving TCL control over its high-end Bravia TVs Read More »

the-first-commercial-space-station,-haven-1,-is-now-undergoing-assembly-for-launch

The first commercial space station, Haven-1, is now undergoing assembly for launch


“We have a very strong incentive to send a crew as quickly as we can safely do so.”

The Haven-1 space station seen here in the Vast Space clean room. Credit: Vast Space

As Ars reported last week, NASA’s plan to replace the International Space Station with commercial space stations is running into a time crunch.

The sprawling International Space Station is due to be decommissioned less than five years from now, and the US space agency has yet to formally publish rules and requirements for the follow-on stations being designed and developed by several different private companies.

Although there are expected to be multiple bidders in “phase two” of NASA’s commercial space station program, there are at present four main contenders: Voyager Technologies, Axiom Space, Blue Origin, and Vast Space. At some point later this year, the space agency is expected to select one, or more likely two, of these companies for larger contracts that will support their efforts to build their stations.

To get a sense of the overall landscape as the competition heats up, Ars recently interviewed Voyager chief executive Dylan Taylor about his company’s plans for a private station, Starlab. Today we are publishing an interview with Max Haot, the chief executive of Vast. The company is furthest along in terms of development, choosing to build a smaller, interim space station, Haven-1, capable of short-duration stays. Eventually, NASA wants facilities capable of continuous habitation, but it is not clear whether that will be a requirement starting in 2030.

Until today, Haven-1 had a public launch date of mid-2026. However, as Haot explained in our interview, that launch date is no longer tenable.

Ars: You’re slipping the launch of Haven-1 from the middle of this year to the first quarter of 2027. Why?

Max Haot: This is obviously our first space station, and we’re moving as safely and as fast as we can. That’s the date right now that we are confident we will meet. We’ve been tracking that date, without slip, for quite a while. And that’s still a year, probably two years or even more, ahead of anyone else. It will be building the world’s first commercial space station from scratch, from an empty building and no team, in under four years.

Ars: Where are you with the hardware?

Haot: Last Saturday (January 10) we reached the key milestone of fully completing the primary structure, and some of the secondary structure; all of the acceptance testing occurred in November as well. Now we are starting clean room integration, which starts with TCS (thermal control system), propulsion, interior shells, and then moving on to avionics. And then final close out, which we expect will be done by the fall, and then we have on the books with NASA a full test campaign at the end of the year at Plum Brook. Then the launch in Q1 next year.

Ars: What happens after you launch Haven-1?

Haot: We are not launching Haven-1 with crew inside. It’s a 15-ton, very valuable and expensive satellite, but still no humans involved, launching on a Falcon 9. So then we have a period that we can monitor it and control it uncrewed and confirm everything is functioning perfectly, right? We are holding pressure. We are controlling attitude. These checkouts can happen in as little as two weeks.

At the end of it, we have to basically convince SpaceX, both contractually and with many verification events, that it will be safe to dock Dragon. And if they agree with the data we provide them, they will put a fully trained crew on board Dragon and bring them up. It could be as early as two weeks after, and it could be as late as any time within three years, which is a lifetime of Haven-1. But we have a very strong incentive to send a crew as quickly as we can safely do so.

The Haven-1 space station undergoes acceptance testing. Credit: Vast Space

Ars: Have you picked the crew yet?

Haot: We are in deep negotiations, maybe more than that, with both private individuals and nation states. But there’s nothing we are ready to announce yet. Especially with the Q1 launch date, in our desire to follow with the crew right after, this is now becoming pretty urgent. We believe, with our partner at SpaceX, one year for training is very comfortable, and we think we can compress it to maybe as little as six months for both training on Dragon and Haven-1 so long as we have an experienced crew. So we have a bit of time left to announce it.

Ars: You mentioned Haven-1 has a three-year lifetime. How many crews will you try to cycle through?

Haot: The nominal plan is for a two-week mission, and we have one fully contracted with SpaceX, as well as a second one that we have a deposit and an option on. And then we plan to do two more. That’s assuming they are 10-day missions with two days of transfer on either side. So two-week missions. We also have the option to maybe do a 30-day mission if we want. So the exact duration and makeup will be decided as we make progress with customers and potentially NASA.

Ars: What is the plan after Haven-1?

Haot: If you look at the first module of our second station, what will be the difference? We have two docking ports, not one. We expect to have more power, and potentially more volume, depending on the launch vehicle. What you see on our website and what we do might be different. We have a lot of optionality. But other than that, it’s all of the exact same components of Haven demo and Haven-1, which are basically being iterated on. And so that’s the key. The life support system, the air revitalization system, the software, the primary structure—the first module of Haven-2 will be just tweaks on Haven-1. That’s why we think we’re in the best position of all of the competitors. And that’s not been enabled by chance, right? It’s been enabled by a billion-dollar investment in 1,000 employees and all the facilities to mass produce the follow-on modules.

Ars: NASA is nearing the second phase of its competition for commercial space stations, known as CLDs. Do you plan to compete with Haven-1 or Haven-2 for these contracts?

Haot: We have not decided because, as you know, it’s unclear yet what the requirements will be. Will they be asking for a 30-day demonstration flight? On our end it’s unclear if we want to bid that 30-day demonstration with Haven-1, or Haven-2 with two or three modules. If they ask for a 30-day mission, we have the option to offer it on Haven-1 in 2027 if we want to.

Ars: Last week a key space staffer in the US Senate, Maddy Davis, said she was “begging” for NASA to release the phase two “request for proposals” that would set the ground rules for the CLD competition. Do you feel the same way?

Haot: Vast is dedicated to ensuring we have continuous human presence in low-Earth orbit after the ISS is retired. The date we are aiming at is end of 2030. Maddy mentioned an ISS extension. We agree, for America, if no one is ready it should be extended. But in our view, we will be ready, and we need to make sure we’re ready to start a continuous crewed mission by the end of 2030. That’s less than five years away now, right? So we definitely agree with the sentiment, and I think the full industry agrees, and I’m pretty sure Jared Isaacman also agrees that it is overdue and it’s time to make a decision and release an RFP.

Ars: What do you hope to see in that RFP when it comes to requirements?

Haot: We obviously can’t decide what NASA will do, and we will be competitive in whatever they decide. But there’s a few key recommendations we feel strongly about. The first one is that, as they consider whether they proceed with a demonstration mission or something else, we think they should focus on what is right for the country. What we are hearing is that they are trying to tweak the approach to do something fair to all of the bidders. And I don’t think it should matter whether people have been doing a right thing or wrong thing, and whether what’s right for the country puts somebody in a better position or not.

The second piece, obviously, is to move faster, which we just talked about. The third piece is that we think it’s really important that they require a demonstration. If you look at every human space flight program in history, none of them went straight from the program starting to a long-duration mission on a spacecraft. They all had a stepping stone, and right now none of us has proven we can have humans safely on orbit in a space station. And so in our view, they should require demonstration, and not on the eve of January 1, 2031. They should require a demonstration with crew as quickly as possible before they buy services.

Ars: You mentioned doing the “right thing for the country.” What does that mean for NASA?

Haot: It means you’re focused on commercial stations being ready by 2030, so there is not a need to extend the ISS. And it means ensuring we have not just one winner, but two, in case history repeats itself, as it did with Boeing and SpaceX in crew transportation.

Ars: Do you think the government has committed enough funding to make the commercial space station program a success?

Haot: I’m a vendor, and obviously I’d like as much buffer as possible, and as much funding as possible. With the current budget we don’t think more than two winners is reasonable, but it should absolutely be two in the best interest of the country. If there was a bigger budget, obviously, three would be great. And so if you look at the CLD budget line, which is approved for next year—projected over five years for development, and you assume two winners, and then services that come later—we are confident we can be successful and profitable with two companies operating.

Obviously, we also need international customers, right? We need Europe. We need Japan, where we just opened a subsidiary. We need all the new emerging human spaceflight nations in the Middle East, in Europe, in Asia. And a little bit of private spaceflight. We’re not in a space tourism era, in orbit, but there are still some private individuals willing to fund a mission and do important work. With that, we get to profitability.

We think a big differentiator of Vast is that we are really excited and eager to unlock the orbital economy. I’m talking about in-space semiconductor, fiber, pharmaceutical manufacturing, and so on. We think that’s our upside. We want to unlock it. But we don’t know how quickly it will happen or how big it will be. What we do know is, whoever has a platform up there with flight crew, facilities, and power will be the one unlocking it. But in our business model, if that’s delayed, we can still be profitable.

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.