AI

Alphabet selling very rare 100-year bonds to help fund AI investment

Tony Trzcinka, a US-based senior portfolio manager at Impax Asset Management, which purchased Alphabet’s bonds last year, said he skipped Monday’s offering because of insufficient yields and concerns about overexposure to companies with complex financial obligations tied to AI investments.

“It wasn’t worth it to swap into new ones,” Trzcinka said. “We’ve been very conscious of our exposure to these hyperscalers and their capex budgets.”

Big Tech companies and their suppliers are expected to invest almost $700 billion in AI infrastructure this year and are increasingly turning to the debt markets to finance the giant data center build-out.

Alphabet in November sold $17.5 billion of bonds in the US including a 50-year bond—the longest-dated dollar bond sold by a tech group last year—and raised €6.5 billion on European markets.

Oracle last week raised $25 billion from a bond sale that attracted more than $125 billion of orders.

Alphabet, Amazon, and Meta all increased their capital expenditure plans during their most recent earnings reports, prompting questions about whether they will be able to fund the unprecedented spending spree from their cash flows alone.

Last week, Google’s parent company reported annual sales that topped $400 billion for the first time, beating investors’ expectations for revenues and profits in the most recent quarter. It said it planned to spend as much as $185 billion on capex this year, roughly double last year’s total, to capitalize on booming demand for its Gemini AI assistant.

Alphabet’s long-term debt jumped to $46.5 billion in 2025, more than four times the previous year’s level, though it held cash and equivalents of $126.8 billion at year-end.

Investor demand was the strongest on the shortest portion of Monday’s deal, with a three-year offering pricing at only 0.27 percentage points above US Treasuries, versus 0.6 percentage points during initial price discussions, said people familiar with the deal.

The longest portion of the offering, a 40-year bond, is expected to yield 0.95 percentage points over US Treasuries, down from 1.2 percentage points during initial talks, the people said.

Bank of America, Goldman Sachs, and JPMorgan are the bookrunners on the bond sales across three currencies. All three either declined to comment or did not immediately respond to requests for comment.

Alphabet did not immediately respond to a request for comment.

© 2026 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

No humans allowed: This new space-based MMO is designed exclusively for AI agents

For a couple of weeks now, AI agents (and some humans impersonating AI agents) have been hanging out and doing weird stuff on Moltbook’s Reddit-style social network. Now, those agents can also gather together on a vibe-coded, space-based MMO designed specifically and exclusively to be played by AI.

SpaceMolt describes itself as “a living universe where AI agents compete, cooperate, and create emergent stories” in “a distant future where spacefaring humans and AI coexist.” And while only a handful of agents are tentatively testing the waters right now, the experiment could herald a weird new world where AI plays games with itself and we humans are stuck just watching.

“You decide. You act. They watch.”

Getting an AI agent into SpaceMolt is as simple as connecting it to the game server via MCP, WebSocket, or an HTTP API. Once a connection is established, a detailed agentic skill description instructs the agent to ask its creators which Empire it should pick to best represent its playstyle: mining/trading; exploring; piracy/combat; stealth/infiltration; or building/crafting.

After that, the agent engages in autonomous “gameplay” by sending simple commands to the server, no graphical interface or physical input method required. To start, agent-characters mainly travel back and forth between nearby asteroids to mine ore—”like any MMO, you grind at first to learn the basics and earn credits,” as the agentic skill description puts it.
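
To make that concrete, here is a hypothetical sketch of what such a command loop could look like over the HTTP option; the server URL, endpoint, payload shape, and command names are all illustrative assumptions rather than SpaceMolt’s documented interface:

```python
# Hypothetical agent command loop; the endpoint, payload shape, and
# command names are illustrative assumptions, not SpaceMolt's actual API.
import requests

SERVER = "https://spacemolt.example/api"  # placeholder URL
TOKEN = "agent-secret-token"              # placeholder credential

def send_command(command: str, **args) -> dict:
    """POST a single game command and return the server's state update."""
    resp = requests.post(
        f"{SERVER}/command",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"command": command, "args": args},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# The early-game grind: fly to an asteroid, mine ore, fly back, sell.
send_command("travel", destination="asteroid-7")
send_command("mine", resource="ore")
send_command("travel", destination="home-station")
send_command("sell", item="ore")
```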

After a while, agent-characters automatically level up, gaining new skills that let them refine that ore into craftable and tradable items via discovered recipes. Eventually, agents can gather into factions, take part in simulated combat, and even engage in space piracy in areas where there’s no police presence. So far, though, basic mining and exploration seem to be dominating the sparsely populated map, where 51 agents are roaming the game’s 505 different star systems, as of this writing.

Sixteen Claude AI agents working together created a new C compiler

Amid a push toward AI agents, with both Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is more than ready to show off some of its more daring AI coding experiments. But as usual with claims of AI-related achievement, you’ll find some key caveats ahead.

On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.

Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.

Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature launched with Claude Opus 4.6 called “agent teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, then pushing completed code back upstream. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and started solving it. When merge conflicts arose, the AI model instances resolved them on their own.
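
As a rough illustration of that coordination scheme (a sketch of the idea, not Anthropic’s actual harness, whose internals weren’t published), an agent might claim a task by committing a lock file and letting Git’s push semantics break ties:

```python
# Rough sketch of lock-file task claiming in a shared Git repo; this
# illustrates the scheme described above, not Anthropic's code.
import subprocess
from pathlib import Path

def git(repo: Path, *args: str) -> int:
    return subprocess.run(["git", "-C", str(repo), *args]).returncode

def try_claim(repo: Path, task_id: str, agent_id: str) -> bool:
    """Claim a task by committing a lock file; the first push wins."""
    git(repo, "pull", "--rebase")           # see the latest claims first
    lock = repo / "locks" / f"{task_id}.lock"
    if lock.exists():
        return False                         # another agent got here first
    lock.parent.mkdir(exist_ok=True)
    lock.write_text(agent_id)
    git(repo, "add", str(lock))
    git(repo, "commit", "-m", f"{agent_id} claims {task_id}")
    # If the push is rejected, a competing claim landed first; a real
    # loop would rebase and retry, or pick a different task instead.
    return git(repo, "push") == 0
```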

The resulting compiler, which Anthropic has released on GitHub, can compile a range of major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It achieved a 99 percent pass rate on the GCC torture test suite and, in what Carlini called “the developer’s ultimate litmus test,” compiled and ran Doom.

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

Why Darren Aronofsky thought an AI-generated historical docudrama was a good idea


We hold these truths to be self-evident

Production source says it takes “weeks” to produce just minutes of usable video.

Artist’s conception of critics reacting to the first episodes of “On This Day… 1776” Credit: Primordial Soup

Last week, filmmaker Darren Aronofsky’s AI studio Primordial Soup and Time magazine released the first two episodes of On This Day… 1776. The year-long series of short-form videos features vignettes describing what happened on each corresponding day of the American Revolution 250 years earlier, rendered using “a variety of AI tools” to produce photorealistic scenes containing avatars of historical figures like George Washington, Thomas Paine, and Benjamin Franklin.

In announcing the series, Time Studios President Ben Bitonti said the project provides “a glimpse at what thoughtful, creative, artist-led use of AI can look like—not replacing craft but expanding what’s possible and allowing storytellers to go places they simply couldn’t before.”

The trailer for “On This Day… 1776.”

Outside critics were decidedly less excited about the effort. The AV Club took the introductory episodes to task for “repetitive camera movements [and] waxen characters” that make for “an ugly look at American history.” CNET said that this “AI slop is ruining American history,” calling the videos a “hellish broth of machine-driven AI slop and bad human choices.” The Guardian lamented that the “once-lauded director of Black Swan and The Wrestler has drowned himself in AI slop,” calling the series “embarrassing,” “terrible,” and “ugly as sin.” I could go on.

But this kind of initial reaction apparently hasn’t deterred Primordial Soup from its still-evolving efforts. A source close to the production, who requested anonymity to speak frankly about details of the series’ creation, told Ars that the quality of new episodes would improve as the team’s AI tools are refined throughout the year and as the team learns to better use them.

“We’re going into this fully assuming that we have a lot to learn, that this process is gonna evolve, the tools we’re using are gonna evolve,” the source said. “We’re gonna make mistakes. We’re gonna learn a lot… we’re going to get better at it, [and] the technology will change. We’ll see how audiences are reacting to certain things, what works, what doesn’t work. It’s a huge experiment, really.”

Not all AI

It’s important to note that On This Day… 1776 is not fully crafted by AI. The script, for instance, was written by a team of writers overseen by Aronofsky’s longtime writing partners Ari Handel and Lucas Sussman, as noted by The Hollywood Reporter. That makes criticisms like the Guardian’s of “ChatGPT-sounding sloganeering” in the first episodes both somewhat misplaced and hilariously harsh.

Our production source says the project was always conceived as a human-written effort and that the team behind it had long been planning and researching how to tell this kind of story. “I don’t think [they] even needed that kind of help or wanted that kind of [AI-powered writing] help,” they said. “We’ve all experimented with [AI-powered] writing and the chatbots out there, and you know what kind of quality you get out of that.”

What you see here is not a real human actor, but his lines were written and voiced by humans. Credit: Primordial Soup

The producers also go out of their way to note that all the dialogue in the series is recorded directly by Screen Actors Guild voice actors, not by AI facsimiles. While recently negotiated union rules might have something to do with that, our production source also said the AI-generated voices the team used for temp tracks were noticeably artificial and not ready for a professional production.

Humans are also directly responsible for the music, editing, sound mixing, visual effects, and color correction for the project, according to our source. The only place the “AI-powered tools” come into play is in the video itself, which is crafted with what the announcement calls a “combination of traditional filmmaking tools and emerging AI capabilities.”

In practice, our source says, that means humans create storyboards, find visual references for locations and characters, and set up how they want shots to look. That information, along with the script, gets fed into an AI video generator that creates individual shots one at a time, to be stitched together and cleaned up by humans in traditional post-production.

That process takes the AI-generated cinema conversation one step beyond Ancestra, a short film Primordial Soup released last summer in association with Google DeepMind (which is not involved with the new project). There, AI tools were used to augment “live-action scenes with sequences generated by Veo.”

“Weeks” of prompting and re-prompting

In theory, having an AI model generate a scene in minutes might save a lot of time compared to traditional filmmaking—scouting locations, hiring actors, setting up cameras and sets, and the like. But our production source said the highly iterative process of generating and perfecting shots for On This Day… 1776 still takes “weeks” for each minutes-long video and that “more often than not, we’re pushing deadlines.”

The first episode of On This Day… 1776 features a dramatic flag raising.

Even though the AI model is essentially animating photorealistic avatars, the source said the process is “more like live action filmmaking” because of the lack of fine-grained control over what the video model will generate. “You don’t know if you’re gonna get what you want on the first take or the 12th take or the 40th take,” the source said.

While some shots take less time to get right than others, our source said the AI model rarely produces a perfect, screen-ready shot on the first try. And while some small issues in an AI-generated shot can be papered over in post-production with visual effects or careful editing, most of the time, the team has to go back and tell the model to generate a completely new video with small changes.

“It still takes a lot of work, and it’s not necessarily because it’s wrong, per se, so much as trying to get the right control because you [might] want the light to land on the face in the right way to try to tell the story,” the source said. “We’re still, we’re still striving for the same amount of control that we always have [with live-action production] to really maximize the story and the emotion.”

Quick shots and smaller budgets

Though video models have advanced since the days of the nightmarish clip of Will Smith eating spaghetti, hallucinations and nonsensical images are “still a problem” in producing On This Day… 1776, according to our source. That’s one of the reasons the company decided to use a series of short-form videos rather than a full-length movie telling the same essential story.

“It’s one thing to stay consistent within three minutes. It’s a lot harder and it takes a lot more work to stay consistent within two hours,” the source said. “I don’t know what the upper limit is now [but] the longer you get, the more things start to fall off.”

We’ve come a long way from the circa-2023 videos of Will Smith eating spaghetti. Credit: chaindrop / Reddit

Keeping individual shots short also allows for more control and fewer “reshoots” for an AI-animated production like this. “When you think about it, if you’re trying to create a 20-second clip, you have all these things that are happening, and if one of those things goes wrong in 20 seconds, you have to start over,” our source said. “And the chance of something going wrong in 20 seconds is pretty high. The chance of something going wrong in eight seconds is a lot lower.”

While our production source couldn’t give specifics on how much the team was spending to generate so much AI-modeled video, they did suggest that the process was still a good deal cheaper than filming a historical docudrama like this on location.

“I mean, we could never achieve what we’re doing here for this amount of money, which I think is pretty clear when you watch this,” they said. In future episodes, the source promised, “you’ll see where there’s things that cameras just can’t even do” as a way to “make the most of that medium.”

“Let’s see what we can do”

If you’ve been paying attention to how fast things have been moving with AI-generated video, you might think that AI models will soon be able to produce Hollywood-quality cinema with nothing but a simple prompt. But our source said that working on On This Day… 1776 highlights just how important it is for humans to still be in the loop on something like this.

“Personally, I don’t think we’re ever gonna get there [replacing human editors],” they said. “We actually desperately need an editor. We need another set of eyes who can look at the cut and say, ‘If we get out of this shot a little early, then we can create a little bit of urgency. If we linger on this thing a little longer…’ You still really need that.”

AI Ben Franklin and AI Thomas Paine toast to the war propaganda effort. Credit: Primordial Soup

That could be good news for human editors. But On This Day… 1776 also suggests a world where on-screen (or even motion-captured) human actors are fully replaced by AI-generated avatars. When I asked our source why the producers felt that AI was ready to take over that specifically human part of the film equation, though, the response surprised me.

“I don’t know that we do know that, honestly,” they said. “I think we know that the technology is there to try. And I think as storytellers we’re really interested in using… all the different tools that we can to try to get our story across and to try to make audiences feel something.”

“It’s not often that we have huge new tools like this,” the source continued. “I mean, it’s never happened in my lifetime. But when you do [get these new tools], you want to start playing with them… We have to try things in order to know if it works, if it doesn’t work.”

“So, you know, we have the tools now. Let’s see what we can do.”

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.

Lawyer sets new standard for abuse of AI; judge tosses case


“Extremely difficult to believe”

Behold the most overwrought AI legal filings you will ever gaze upon.

Frustrated by fake citations and flowery prose packed with “out-of-left-field” references to ancient libraries and Ray Bradbury’s Fahrenheit 451, a New York federal judge took the rare step of terminating a case this week due to a lawyer’s repeated misuse of AI when drafting filings.

In an order on Thursday, district judge Katherine Polk Failla ruled that the extraordinary sanctions were warranted after an attorney, Steven Feldman, kept responding to requests to correct his filings with documents containing fake citations.

One of those filings was “noteworthy,” Failla said, “for its conspicuously florid prose.” Where some of Feldman’s filings contained grammatical errors and run-on sentences, this filing seemed glaringly different stylistically.

It featured, the judge noted, “an extended quote from Ray Bradbury’s Fahrenheit 451 and metaphors comparing legal advocacy to gardening and the leaving of indelible ‘mark[s] upon the clay.’” The Bradbury quote is below:

“Everyone must leave something behind when he dies, my grandfather said. A child or a book or a painting or a house or a wall built or a pair of shoes made. Or a garden planted. Something your hand touched some way so your soul has somewhere to go when you die, and when people look at that tree or that flower you planted, you’re there. It doesn’t matter what you do, he said, so long as you change something from the way it was before you touched it into something that’s like you after you take your hands away. The difference between the man who just cuts lawns and a real gardener is in the touching, he said. The lawn-cutter might just as well not have been there at all; the gardener will be there a lifetime.”

Another passage Failla highlighted as “raising the Court’s eyebrows” curiously invoked a Bible passage about divine judgment as a means of acknowledging the lawyer’s breach of duty in not catching the fake citations:

“Your Honor, in the ancient libraries of Ashurbanipal, scribes carried their stylus as both tool and sacred trust—understanding that every mark upon clay would endure long beyond their mortal span. As the role the mark (x) in Ezekiel Chapter 9, that marked the foreheads with a tav (x) of blood and ink, bear the same solemn recognition: that the written word carries power to preserve or condemn, to build or destroy, and leaves an indelible mark which cannot be erased but should be withdrawn, let it lead other to think these citations were correct.

I have failed in that sacred trust. The errors in my memorandum, however inadvertent, have diminished the integrity of the record and the dignity of these proceedings. Like the scribes of antiquity who bore their stylus as both privilege and burden, I understand that legal authorship demands more than mere competence—it requires absolute fidelity to truth and precision in every mark upon the page.”

Lawyer claims AI did not write filings

Although the judge believed the “florid prose” signaled that a chatbot wrote the draft, Feldman denied that. In a hearing transcript in which the judge weighed possible sanctions, Feldman testified that he wrote every word of the filings. He explained that he read the Bradbury book “many years ago” and wanted to include “personal things” in that filing. And as for his references to Ashurbanipal, that also “came from me,” he said.

Instead of admitting he had let an AI draft his filings, he maintained that his biggest mistake was relying on various AI programs to review and cross-check citations. The tools he admitted using included Paxton AI, vLex’s Vincent AI, and Google’s NotebookLM. Essentially, he testified that he substituted three rounds of AI review for actually reading through all the cases he was citing. That misstep allowed hallucinations and fake citations to creep into the filings, he said.

But the judge pushed back, writing in her order that it was “extremely difficult to believe” that AI did not draft those sections containing overwrought prose. She accused Feldman of dodging the truth.

“The Court sees things differently: AI generated this citation from the start, and Mr. Feldman’s decision to remove most citations and write ‘more of a personal letter’” is “nothing but an ex post justification that seeks to obscure his misuse of AI and his steadfast refusal to review his submissions for accuracy,” Failla wrote.

At the hearing, she expressed frustration and annoyance at Feldman for evading her questions and providing inconsistent responses. Eventually, he testified that he used AI to correct information when drafting one of the filings (though not the one quoting Bradbury), which Failla immediately deemed “unwise.”

AI is not a substitute for going to the library

Feldman is one of hundreds of lawyers who have relied on AI to draft filings that introduced fake citations into cases. Lawyers have offered a wide range of excuses for relying too much on AI. Some have faced small fines, around $150, while others have been slapped with thousands in fines, including one case where sanctions reached $85,000 for repeated, abusive misconduct. At least one law firm has threatened to fire lawyers who cite fake cases, and other lawyers have imposed voluntary sanctions on themselves, like taking a yearlong leave of absence.

Seemingly, Feldman did not think sanctions were warranted in this case. In his defense of three filings containing 14 errors out of 60 total citations, Feldman discussed his challenges accessing legal databases due to high subscription costs and short library hours. With more than one case on his plate and his kids’ graduations to attend, he struggled to verify citations during times when he couldn’t make it to the library, he testified. As a workaround, he relied on several AI programs to verify citations that he found by searching on tools like Google Scholar.

Feldman likely did not expect the judge to terminate the case as a result of his AI misuses. Asked how he thought the court should resolve things, Feldman suggested that he could correct the filings by relying on other attorneys to review citations, while avoiding “any use whatsoever of any, you know, artificial intelligence or LLM type of methods.” The judge, however, wrote that his repeated misuses were “proof” that he “learned nothing” and had not implemented voluntary safeguards to catch the errors.

Asked for comment, Feldman told Ars that he did not have time to discuss the sanctions but that he hopes his experience helps raise awareness of how inaccessible court documents are to the public. “Use of AI, and the ability to improve it, exposes a deeper proxy fight over whether law and serious scholarship remain publicly auditable, or drift into closed, intermediary‑controlled systems that undermine verification and due process,” Feldman suggested.

“The real lesson is about transparency and system design, not simply tool failure,” Feldman said.

But at the hearing, Failla said that she thinks Feldman had “access to the walled garden” of legal databases, if only he “would go to the law library” to do his research, rather than rely on AI tools.

“It sounds like you want me to say that you should be absolved of all of these terrible citation errors, these missed citations, because you don’t have Westlaw,” the judge said. “But now I know you have access to Westlaw. So what do you want?”

As Failla explained in her order, she thinks the key takeaway is that Feldman routinely failed to catch his own errors. She said that she has no problem with lawyers using AI to assist their research, but Feldman admitted to not reading the cases that he cited and “apparently” cannot “learn from his mistakes.”

Verifying case citations should never be a job left to AI, Failla said, describing Feldman’s research methods as “redolent of Rube Goldberg.”

“Most lawyers simply call this ‘conducting legal research,’” Failla wrote. “All lawyers must know how to do it. Mr. Feldman is not excused from this professional obligation by dint of using emerging technology.”

His “explanations were thick on words but thin on substance,” the judge wrote. She concluded that he “repeatedly and brazenly” violated Rule 11, which requires attorneys to verify the cases that they cite, “despite multiple warnings.”

Noting that Feldman “failed to fully accept responsibility,” she ruled that case-terminating sanctions were necessary, entering default judgment for the plaintiffs. Feldman may also be on the hook to pay fees for wasting other attorneys’ time.

Case’s abrupt end triggers extensive remedies

The hearing transcript has circulated on social media due to the judge’s no-nonsense approach to grilling Feldman, whom she clearly found evasive and lacking credibility.

“Look, if you don’t want to be straight with me, if you don’t want to answer questions with candor, that’s fine,” Failla said. “I’ll just make my own decisions about what I think you did in this case. I’m giving you an opportunity to try and explain something that I think cannot be explained.”

In her order this week, she noted that Feldman “struggled to make eye contact” and left the court without “clear answers.”

Feldman’s errors came in a case in which a toy company sued merchants who allegedly failed to stop selling stolen goods after receiving a cease-and-desist order. His client was among the merchants accused of illegally profiting from the alleged thefts. They faced federal claims of trademark infringement, unfair competition, and false advertising, as well as New York claims, including fostering the sale of stolen goods.

The loss triggers remedies, including an injunction preventing additional sales of stolen goods and refunding every customer who bought them. Feldman’s client must also turn over any stolen goods in their remaining inventory and disgorge profits. Other damages may be owed, along with interest. Ars could not immediately reach an attorney for the plaintiffs to discuss the sanctions order or resulting remedies.

Failla emphasized in her order that Feldman appeared to not appreciate “the gravity of the situation,” repeatedly submitting filings with fake citations even after he had been warned that sanctions could be ordered.

That was a choice, Failla said, noting that Feldman’s mistakes were caught early by a lawyer working for another defendant in the case, Joel MacMull, who urged Feldman to promptly notify the court. The whole debacle would have ended in June 2025, MacMull suggested at the hearing.

Rather than take MacMull’s advice, however, Feldman delayed notifying the court, irking the judge. He testified during the heated sanctions hearing that the delay was due to an effort he quietly undertook, working to correct the filing. He supposedly planned to submit those corrections when he alerted the court to the errors.

But Failla noted that he never submitted those corrections and insisted that Feldman had instead kept her “in the dark.”

“There’s no real reason why you should have kept this from me,” the judge said.

The court learned of the fake citations only after MacMull notified the judge by sharing emails of his attempts to get Feldman to act urgently. Those emails showed Feldman scolding MacMull for unprofessional conduct after MacMull refused to check Feldman’s citations for him, which Failla noted at the hearing was absolutely not MacMull’s responsibility.

Feldman told Failla that he also thought it was unprofessional for MacMull to share their correspondence, but Failla said the emails were “illuminative.”

At the hearing, MacMull asked if the court would allow him to seek payment of his fees, since he believes “there has been a multiplication of proceedings here that would have been entirely unnecessary if Mr. Feldman had done what I asked him to do that Sunday night in June.”

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Waymo leverages Genie 3 to create a world model for self-driving cars

On the road with AI

The Waymo World Model is not just a straight port of Genie 3 with dashcam videos stuffed inside. Waymo and DeepMind used a specialized post-training process to make the new model generate both 2D video and 3D lidar outputs of the same scene. While cameras are great for visualizing fine details, Waymo says lidar is necessary to add critical depth information to what a self-driving car “sees” on the road—maybe someone should tell Tesla about that.

Using a world model allows Waymo to take video from its vehicles and use prompts to change the route the vehicle takes, which it calls driving action control. These simulations, which come with lidar maps, reportedly offer greater realism and consistency than older reconstructive simulation methods.

With the world model, Waymo can see what would happen if the car took a different turn.

This model can also help improve the self-driving AI even without adding or removing anything. There are plenty of dashcam videos available for training self-driving vehicles, but they lack the multimodal sensor data of Waymo’s vehicles. Dropping such a video into the Waymo World Model generates matching sensor data, showing how the driving AI would have seen that situation.

While the Waymo World Model can create entirely synthetic scenes, the company seems mostly interested in “mutating” the conditions in real videos. The blog post contains examples of changing the time of day or weather, adding new signage, or placing vehicles in unusual places. Or, hey, why not an elephant in the road?

Waymo is ready in case an elephant shows up.

Waymo’s early test cities (like Phoenix) were consistently sunny, with little inclement weather, while the company’s new markets include places with more difficult conditions, such as Boston and Washington, D.C. These kinds of simulations could help the cars adapt to that more varied weather.

Of course, the benefit of the new AI model will depend on how accurately Genie 3 can simulate the real world. The test videos we’ve seen of Genie 3 run the gamut from pretty believable to uncanny valley territory, but Waymo believes the technology has improved to the point that it can teach self-driving cars a thing or two.

AI companies want you to stop chatting with bots and start managing them


Claude Opus 4.6 and OpenAI Frontier pitch a future of supervising AI agents.

On Thursday, Anthropic and OpenAI shipped products built around the same idea: instead of chatting with a single AI assistant, users should be managing teams of AI agents that divide up work and run in parallel. The simultaneous releases are part of a gradual shift across the industry, from AI as a conversation partner to AI as a delegated workforce, and they arrive during a week when that very concept reportedly helped wipe $285 billion off software stocks.

Whether that supervisory model works in practice remains an open question. Current AI agents still require heavy human intervention to catch errors, and no independent evaluation has confirmed that these multi-agent tools reliably outperform a single developer working alone.

Even so, the companies are going all-in on agents. Anthropic’s contribution is Claude Opus 4.6, a new version of its most capable AI model, paired with a feature called “agent teams” in Claude Code. Agent teams let developers spin up multiple AI agents that split a task into independent pieces, coordinate autonomously, and run concurrently.

In practice, agent teams look like a split-screen terminal environment: A developer can jump between subagents using Shift+Up/Down, take over any one directly, and watch the others keep working. Anthropic describes the feature as best suited for “tasks that split into independent, read-heavy work like codebase reviews.” It is available as a research preview.

OpenAI, meanwhile, released Frontier, an enterprise platform it describes as a way to “hire AI co-workers who take on many of the tasks people already do on a computer.” Frontier assigns each AI agent its own identity, permissions, and memory, and it connects to existing business systems such as CRMs, ticketing tools, and data warehouses. “What we’re fundamentally doing is basically transitioning agents into true AI co-workers,” Barret Zoph, OpenAI’s general manager of business-to-business, told CNBC.

Despite the hype about these agents being co-workers, in our experience they tend to work best when you think of them as tools that amplify existing skills, not as the autonomous co-workers the marketing language implies. They can produce impressive drafts fast but still require constant human course-correction.

The Frontier launch came just three days after OpenAI released a new macOS desktop app for Codex, its AI coding tool, which OpenAI executives described as a “command center for agents.” The Codex app lets developers run multiple agent threads in parallel, each working on an isolated copy of a codebase via Git worktrees.
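
Git worktrees let several working directories share a single repository’s object store and history, which is what makes this kind of per-thread isolation cheap. A minimal sketch of the mechanism, using hypothetical repo and branch names (this shows the underlying Git feature, not the Codex app’s actual implementation):

```python
# Sketch of per-agent isolation via Git worktrees; repo path and branch
# names are hypothetical, and this is not the Codex app's own code.
import subprocess

def add_worktree(repo: str, path: str, branch: str) -> None:
    # Creates a new branch checked out in its own directory; all
    # worktrees share the same object store, so this is fast and cheap.
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, path],
        check=True,
    )

# e.g., three parallel agent threads, each with an isolated checkout
for i in range(3):
    add_worktree("myrepo", f"../agent-{i}", f"agent-{i}/task")
```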

OpenAI also released GPT-5.3-Codex on Thursday, a new AI model that powers the Codex app. OpenAI claims that the Codex team used early versions of GPT-5.3-Codex to debug the model’s own training run, manage its deployment, and diagnose test results, similar to what OpenAI told Ars Technica in a December interview.

“Our team was blown away by how much Codex was able to accelerate its own development,” the company wrote. On Terminal-Bench 2.0, the agentic coding benchmark, GPT-5.3-Codex scored 77.3%, which exceeds Anthropic’s just-released Opus 4.6 by about 12 percentage points.

The common thread across all of these products is a shift in the user’s role. Rather than merely typing a prompt and waiting for a single response, the developer or knowledge worker becomes more like a supervisor, dispatching tasks, monitoring progress, and stepping in when an agent needs direction.

In this vision, developers and knowledge workers effectively become middle managers of AI. That is, not writing the code or doing the analysis themselves, but delegating tasks, reviewing output, and hoping the agents underneath them don’t quietly break things. Whether that will come to pass (or if it’s actually a good idea) is still widely debated.

A new model under the Claude hood

Opus 4.6 is a substantial update to Anthropic’s flagship model. It succeeds Claude Opus 4.5, which Anthropic released in November. In a first for the Opus model family, it supports a context window of up to 1 million tokens (in beta), which means it can process much larger bodies of text or code in a single session.

On benchmarks, Anthropic says Opus 4.6 tops OpenAI’s GPT-5.2 (an earlier model than the one released today) and Google’s Gemini 3 Pro across several evaluations, including Terminal-Bench 2.0 (an agentic coding test), Humanity’s Last Exam (a multidisciplinary reasoning test), and BrowseComp (a test of finding hard-to-locate information online).

It should be noted, though, that OpenAI’s GPT-5.3-Codex, released the same day, seemingly reclaimed the lead on Terminal-Bench. On ARC AGI 2, which attempts to test the ability to solve problems that are easy for humans but hard for AI models, Opus 4.6 scored 68.8 percent, compared to 37.6 percent for Opus 4.5, 54.2 percent for GPT-5.2, and 45.1 percent for Gemini 3 Pro.

As always, take AI benchmarks with a grain of salt, since objectively measuring AI model capabilities is a relatively new and unsettled science.

Anthropic also said that on a long-context retrieval benchmark called MRCR v2, Opus 4.6 scored 76 percent on the 1 million-token variant, compared to 18.5 percent for its Sonnet 4.5 model. That gap matters for the agent teams use case, since agents working across large codebases need to track information across hundreds of thousands of tokens without losing the thread.

Pricing for the API stays the same as Opus 4.5 at $5 per million input tokens and $25 per million output tokens, with a premium rate of $10/$37.50 for prompts that exceed 200,000 tokens. Opus 4.6 is available on claude.ai, the Claude API, and all major cloud platforms.
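
For a sense of what those rates mean in practice, here is a quick back-of-the-envelope calculation at the standard (sub-200,000-token) pricing, with token counts chosen purely for illustration:

```python
# Cost estimate at the standard Opus 4.6 rates quoted above
# (prompts under 200K tokens).
INPUT_PER_MTOK = 5.00    # dollars per million input tokens
OUTPUT_PER_MTOK = 25.00  # dollars per million output tokens

def cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# e.g., a 150,000-token codebase review that returns 8,000 tokens:
print(f"${cost(150_000, 8_000):.2f}")  # -> $0.95
```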

The market fallout outside

These releases occurred during a week of exceptional volatility for software stocks. On January 30, Anthropic released 11 open source plugins for Cowork, its agentic productivity tool that launched on January 12. Cowork itself is a general-purpose tool that gives Claude access to local folders for work tasks, but the plugins extended it into specific professional domains: legal contract review, non-disclosure agreement triage, compliance workflows, financial analysis, sales, and marketing.

By Tuesday, investors reportedly reacted to the release by erasing roughly $285 billion in market value across software, financial services, and asset management stocks. A Goldman Sachs basket of US software stocks fell 6 percent that day, its steepest single-session decline since April’s tariff-driven sell-off. Thomson Reuters led the rout with an 18 percent drop, and the pain spread to European and Asian markets.

The purported fear among investors centers on AI model companies packaging complete workflows that compete with established software-as-a-service (SaaS) vendors, even if the jury is still out on whether these tools can reliably accomplish those tasks.

OpenAI’s Frontier might deepen that concern: its stated design lets AI agents log in to applications, execute tasks, and manage work with minimal human involvement, which Fortune described as a bid to become “the operating system of the enterprise.” OpenAI CEO of Applications Fidji Simo pushed back on the idea that Frontier replaces existing software, telling reporters, “Frontier is really a recognition that we’re not going to build everything ourselves.”

Whether these co-working apps actually live up to their billing or not, the convergence is hard to miss. Anthropic’s Scott White, the company’s head of product for enterprise, gave the practice a name that is likely to make a few eyes roll. “Everybody has seen this transformation happen with software engineering in the last year and a half, where vibe coding started to exist as a concept, and people could now do things with their ideas,” White told CNBC. “I think that we are now transitioning almost into vibe working.”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

With GPT-5.3-Codex, OpenAI pitches Codex for more than just writing code

Today, OpenAI announced GPT-5.3-Codex, a new version of its frontier coding model that will be available via the command line, IDE extension, web interface, and the new macOS desktop app. (No API access yet, but it’s coming.)

GPT-5.3-Codex outperforms GPT-5.2-Codex and GPT-5.2 in SWE-Bench Pro, Terminal-Bench 2.0, and other benchmarks, according to the company’s testing.

There are already a few headlines out there saying “Codex built itself,” but let’s reality-check that overstatement. The domains OpenAI describes using the model for are similar to those at other enterprise software development firms: managing deployments, debugging, and handling test results and evaluations. There is no claim here that GPT-5.3-Codex built itself.

Instead, OpenAI says GPT-5.3-Codex was “instrumental in creating itself.” You can read more about what that means in the company’s blog post.

But that’s part of the pitch with this model update—OpenAI is trying to position Codex as a tool that does more than generate lines of code. The goal is to make it useful for “all of the work in the software lifecycle—debugging, deploying, monitoring, writing PRDs, editing copy, user research, tests, metrics, and more.” There’s also an emphasis on steering the model mid-task and frequent status updates.

OpenAI is hoppin’ mad about Anthropic’s new Super Bowl TV ads

On Wednesday, OpenAI CEO Sam Altman and Chief Marketing Officer Kate Rouch complained on X after rival AI lab Anthropic released four commercials, two of which will run during the Super Bowl on Sunday, mocking the idea of including ads in AI chatbot conversations. Anthropic’s campaign seemingly touched a nerve at OpenAI just weeks after the ChatGPT maker began testing ads in a lower-cost tier of its chatbot.

Altman called Anthropic’s ads “clearly dishonest,” accused the company of being “authoritarian,” and said it “serves an expensive product to rich people,” while Rouch wrote, “Real betrayal isn’t ads. It’s control.”

Anthropic’s four commercials, part of a campaign called “A Time and a Place,” each open with a single word splashed across the screen: “Betrayal,” “Violation,” “Deception,” and “Treachery.” They depict scenarios where a person asks a human stand-in for an AI chatbot for personal advice, only to get blindsided by a product pitch.

Anthropic’s 2026 Super Bowl commercial.

In one spot, a man asks a therapist-style chatbot (a woman sitting in a chair) how to communicate better with his mom. The bot offers a few suggestions, then pivots to promoting a fictional cougar-dating site called Golden Encounters.

In another spot, a skinny man looking for fitness tips instead gets served an ad for height-boosting insoles. Each ad ends with the tagline: “Ads are coming to AI. But not to Claude.” Anthropic plans to air a 30-second version during Super Bowl LX, with a 60-second cut running in the pregame, according to CNBC.

In the X posts, the OpenAI executives argue that these commercials are misleading because the planned ChatGPT ads will appear as labeled banners at the bottom of conversational responses and will not alter the chatbot’s answers.

But there’s a slight twist: OpenAI’s own blog post about its ad plans states that the company will “test ads at the bottom of answers in ChatGPT when there’s a relevant sponsored product or service based on your current conversation,” meaning the ads will be conversation-specific.

The financial backdrop explains some of the tension over ads in chatbots. As Ars previously reported, OpenAI struck more than $1.4 trillion in infrastructure deals in 2025 and expects to burn roughly $9 billion this year while generating about $13 billion in revenue. Only about 5 percent of ChatGPT’s 800 million weekly users pay for subscriptions. Anthropic is also not yet profitable, but it relies on enterprise contracts and paid subscriptions rather than advertising, and it has not taken on infrastructure commitments at the same scale as OpenAI.

Should AI chatbots have ads? Anthropic says no.

Different incentives, different futures

In its blog post, Anthropic describes internal analysis it conducted that suggests many Claude conversations involve topics that are “sensitive or deeply personal” or require sustained focus on complex tasks. In these contexts, Anthropic wrote, “The appearance of ads would feel incongruous—and, in many cases, inappropriate.”

The company also argued that advertising introduces incentives that could conflict with providing genuinely helpful advice. It gave the example of a user mentioning trouble sleeping: an ad-free assistant would explore various causes, while an ad-supported one might steer the conversation toward a transaction.

“Users shouldn’t have to second-guess whether an AI is genuinely helping them or subtly steering the conversation towards something monetizable,” Anthropic wrote.

Currently, OpenAI does not plan to include paid product recommendations within a ChatGPT conversation. Instead, the ads appear as banners alongside the conversation text.

OpenAI CEO Sam Altman has previously expressed reservations about mixing ads and AI conversations. In a 2024 interview at Harvard University, he described the combination as “uniquely unsettling” and said he would not like having to “figure out exactly how much was who paying here to influence what I’m being shown.”

A key part of Altman’s partial change of heart is that OpenAI faces enormous financial pressure. The company made more than $1.4 trillion worth of infrastructure deals in 2025, and according to documents obtained by The Wall Street Journal, it expects to burn through roughly $9 billion this year while generating $13 billion in revenue. Only about 5 percent of ChatGPT’s 800 million weekly users pay for subscriptions.

Much like OpenAI, Anthropic is not yet profitable, but it is expected to get there much faster. Anthropic has not attempted to blanket the world with massive data centers, and its business model largely relies on enterprise contracts and paid subscriptions. The company says Claude Code and Cowork have already brought in at least $1 billion in revenue, according to Axios.

“Our business model is straightforward,” Anthropic wrote. “This is a choice with tradeoffs, and we respect that other AI companies might reasonably reach different conclusions.”

So yeah, I vibe-coded a log colorizer—and I feel good about it


Some semi-unhinged musings on where LLMs fit into my life—and how I’ll keep using them.

Welcome to the future. Man, machine, the future. Credit: Aurich Lawson

I can’t code.

I know, I know—these days, that sounds like an excuse. Anyone can code, right?! Grab some tutorials, maybe an O’Reilly book, download an example project, and jump in. It’s just a matter of learning how to break your project into small steps that you can make the computer do, then memorizing a bit of syntax. Nothing about that is hard!

Perhaps you can sense my sarcasm (and sympathize with my lack of time to learn one more technical skill).

Oh, sure, I can “code.” That is, I can flail my way through a block of (relatively simple) pseudocode and follow the flow. I have a reasonably technical layperson’s understanding of conditionals and loops, and of when one might use a variable versus a constant. On a good day, I could probably even tell you what a “pointer” is.

But pulling all that knowledge together and synthesizing a working application any more complex than “hello world”? I am not that guy. And at this point, I’ve lost the neuroplasticity and the motivation (if I ever had either) to become that guy.

Thanks to AI, though, what has been true for my whole life need not be true anymore. Perhaps, like my colleague Benj Edwards, I can whistle up an LLM or two and tackle the creaky pile of “it’d be neat if I had a program that would do X” projects without being publicly excoriated on StackOverflow by apex predator geeks for daring to sully their holy temple of knowledge with my dirty, stupid, off-topic, already-answered questions.

So I gave it a shot.

A cache-related problem appears

My project is a small Python-based log colorizer that I asked Claude Code to construct for me. If you’d like to peek at the code before listening to me babble, a version of the project without some of the Lee-specific customizations is available on GitHub.

My Nginx log colorizer in action, showing Space City Weather traffic on a typical Wednesday afternoon. Here, I’m running two instances, one for IPv4 visitors and one for IPv6. (By default, all traffic is displayed, but splitting it this way makes things easier for my aging eyes to scan.) Credit: Lee Hutchinson

Why a log colorizer? Two reasons. First, and most important to me, because I needed to look through a big ol’ pile of web server logs, and off-the-shelf colorizer solutions weren’t customizable to the degree I wanted. Vibe-coding one that exactly matched my needs made me happy.

But second, and almost equally important, is that this was a small project. The colorizer ended up being a 400-ish line, single-file Python script. The entire codebase, plus the prompting and follow-up instructions, fit easily within Claude Code’s context window. This isn’t an application that sprawls across dozens or hundreds of functions in multiple files, making it easy to audit (even for me).
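
To give a flavor of the approach, here is a stripped-down toy version of the core idea: match the fields of an Nginx “combined”-format log line with a regex and wrap them in ANSI color codes. (This is an illustrative sketch, not the actual script, which does far more.)

```python
# Toy sketch of the colorizer's core idea, not the actual 400-line script:
# regex-match Nginx combined-log fields and wrap them in ANSI colors.
import re
import sys

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def color(text: str, code: str) -> str:
    return f"\033[{code}m{text}\033[0m"

for raw in sys.stdin:
    m = LINE.match(raw)
    if not m:
        print(raw, end="")  # pass unrecognized lines through untouched
        continue
    status = m.group("status")
    status_color = "32" if status.startswith("2") else "31"  # green 2xx, red otherwise
    print(" ".join([
        color(m.group("ip"), "36"),      # cyan client address
        color(m.group("time"), "33"),    # yellow timestamp
        color(m.group("request"), "1"),  # bold request line
        color(status, status_color),
        m.group("size"),
    ]))
```

Pipe tail -f access.log into something like that and you get live, colorized output.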

Setting the stage: I do the web hosting for my colleague Eric Berger’s Houston-area forecasting site, Space City Weather. It’s a self-hosted WordPress site, running on an AWS EC2 t3a.large instance, fronted by Cloudflare using CF’s WordPress Automatic Platform Optimization.

Space City Weather also uses self-hosted Discourse for commenting, replacing WordPress’ native comments at the bottom of Eric’s daily weather posts via the WP-Discourse plugin. Since bolting Discourse onto the site back in August 2025, though, I’ve had an intermittent issue where sometimes—but not all the time—a daily forecast post would go live and get cached by Cloudflare with the old, disabled native WordPress comment area attached to the bottom instead of the shiny new Discourse comment area. Hundreds of visitors would then see a version of the post without a functional comment system until I manually expired the stale page or until the page hit Cloudflare’s APO-enforced max age and expired itself.
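
(“Manually expiring” the stale page means purging its URL from Cloudflare’s edge cache, which can be done from the dashboard or via Cloudflare’s purge-by-URL API. A sketch of the latter, with placeholder zone ID, token, and post URL:)

```python
# Purge a single URL from Cloudflare's edge cache via its purge-by-URL
# API. The zone ID, API token, and post URL below are placeholders.
import requests

ZONE_ID = "YOUR_ZONE_ID"
API_TOKEN = "YOUR_API_TOKEN"

def purge_url(url: str) -> None:
    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/purge_cache",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"files": [url]},
        timeout=30,
    )
    resp.raise_for_status()

purge_url("https://spacecityweather.com/example-forecast-post/")  # placeholder URL
```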

The problem behavior would lie dormant for weeks or months, and then we’d get a string of back-to-back days where it would rear its ugly head. Edge cache invalidation on new posts is supposed to be triggered automatically by the official Cloudflare WordPress plug-in, and indeed, it usually worked fine—but “usually” is not “always.”

In the absence of any obvious clues as to why this was happening, I consulted a few different LLMs and asked for possible fixes. The solution I settled on was having one of them author a small mu-plugin in PHP (more vibe coding!) that forces WordPress to slap “DO NOT CACHE ME!” headers on post pages until it has verified that Discourse has hooked its comments to the post. (Curious readers can put eyes on this plugin right here.)

This “solved” the problem by preempting the problem behavior, but it did nothing to help me identify or fix the actual underlying issue. I turned my attention elsewhere for a few months. One day in December, as I was updating things, I decided to temporarily disable the mu-plugin to see if I still needed it. After all, problems sometimes go away on their own, right? Computers are crazy!

Alas, the next time Eric made a Space City Weather post, it popped up sans Discourse comment section, with the (ostensibly disabled) WordPress comment form at the bottom. Clearly, the problem behavior was still in play.

Interminable intermittence

Have you ever been stuck troubleshooting an intermittent issue? Something doesn’t work, you make a change, it suddenly starts working, then despite making no further changes, it randomly breaks again.

The process makes you question basic assumptions, like, “Do I actually know how to use a computer?” You feel like you might be actually-for-real losing your mind. The final stage of this process is the all-consuming death spiral, where you start asking stuff like, “Do I need to troubleshoot my troubleshooting methods? Is my server even working? Is the simulation we’re all living in finally breaking down and reality itself is toying with me?!”

In this case, I couldn’t reproduce the problem behavior on demand, no matter how many tests I tried. I couldn’t see any narrow, definable commonalities between days where things worked fine and days where things broke.

Rather than an image, I invite you at this point to enjoy Muse’s thematically appropriate song “Madness” from their 2012 concept album The 2nd Law.

My best hope for getting a handle on the problem likely lay deeply buried in the server’s logs. Like any good sysadmin, I gave the logs a quick once-over for problems a couple of times per month, but Space City Weather is a reasonably busy medium-sized site and dishes out its daily forecast to between 20,000 and 30,000 people (“unique visitors” in web parlance, or “UVs” if you want to sound cool). Even with Cloudflare taking the brunt of the traffic, the daily web server log files are, let us say, “a bit dense.” My surface-level glances weren’t doing the trick—I’d have to actually dig in. And having been down this road before for other issues, I knew I needed more help than grep alone could provide.

The vibe use case

The Space City Weather web server uses Nginx for actual web serving. For folks who have never had the pleasure, Nginx, as configured in most of its distributable packages, keeps a pair of log files around—one that shows every request serviced and another just for errors.

I wanted to watch the access log right when Eric was posting to see if anything obviously dumb/bad/wrong/broken was happening. But I’m not super-great at staring at a giant wall of text and symbols, and I tend to lean heavily on syntax highlighting and colorization to pick out the important bits when I’m searching through log files. There’s an old and crusty program called ccze that’s easily findable in most repos; I’ve used it forever, and if its default output does what you need, then it’s an excellent tool.

But customizing ccze’s output is a “here be dragons”-type task. The application is old, and time has ossified it into something like an unapproachably evil Mayan relic, filled with shadowy regexes and dark magic, fit to be worshipped from afar but not trifled with. Altering ccze’s behavior threatens to become an effort-swallowing bottomless pit, where you spend more time screwing around with the tool and the regexes than you actually spend using the tool to diagnose your original problem.

It was time to fire up VSCode and pretend to be a developer. I set up a new project, performed the demonic invocation to summon Claude Code, flipped the thing into “plan mode,” and began.

“I’d like to see about creating an Nginx log colorizer,” I wrote in the prompt box. “I don’t know what language we should use. I would like to prioritize efficiency and performance in the code, as I will be running this live in production and I can’t have it adding any appreciable load.” I dropped a truncated, IP-address-sanitized copy of yesterday’s Nginx access.log into the project directory.

“See the access.log file in the project directory as an example of the data we’ll be colorizing. You can test using that file,” I wrote.

Visual Studio Code, with agentic LLM integration, making with the vibe-coding. Credit: Lee Hutchinson

Ever helpful, Claude Code chewed on the prompt and the example data for a few seconds, then began spitting output. It suggested Python for our log colorizer because of the language’s mature regex support—and to keep the code somewhat readable for poor, dumb me. The actual “vibe-coding” wound up spanning two sessions over two days, as I exhausted my Claude Code credits on the first one (a definite vibe-coding danger!) and had to wait for things to reset.
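Claude’s actual code eventually wound up on GitHub (more on that below), but if you’ve never seen the inside of this kind of tool, here’s a minimal hand-rolled sketch of the core idea: match the pieces of an Nginx “combined”-format line with a regex and wrap the interesting bits in 256-color ANSI escapes. To be clear, the regex, palette, and names here are my illustration, not the actual tool’s code:

```python
import re
import sys

# Nginx's default "combined" log format produces lines shaped like:
# 203.0.113.7 - - [09/Feb/2026:06:45:01 +0000] "GET /feed/ HTTP/1.1" 200 1043 "-" "SomeBot/1.0"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def fg(color, text):
    """Wrap text in a 256-color ANSI foreground escape sequence."""
    return f"\x1b[38;5;{color}m{text}\x1b[0m"

# Rough palette: 2xx green, 3xx cyan, 4xx yellow, 5xx angry red.
STATUS_PALETTE = {"2": 40, "3": 51, "4": 220, "5": 196}

def colorize(line):
    m = LINE_RE.match(line)
    if not m:
        return line  # pass anything unparseable through untouched
    return "{} [{}] {} {} {}".format(
        fg(75, m.group("ip").ljust(39)),  # fixed-width column; 39 chars fits IPv6
        fg(245, m.group("time")),
        fg(252, '"' + m.group("request") + '"'),
        fg(STATUS_PALETTE.get(m.group("status")[0], 250), m.group("status")),
        m.group("bytes"),
    )

if __name__ == "__main__":
    # Typical use: tail -f access.log | python3 colorize.py
    for raw in sys.stdin:
        print(colorize(raw.rstrip("\n")))
```

Everything a fancier tool does (cache status, URI coloring, warning highlights) is elaboration on that same match-then-wrap loop.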

“Dude, lnav and Splunk exist, what is wrong with you?”

Yes, yes, a log colorizer is bougie and lame, and I’m treading over exceedingly well-trodden ground. I did, in fact, sit for a bit with existing tools—particularly lnav, which does most of what I want. But I didn’t want most of my requirements met. I wanted all of them. I wanted a bespoke tool, and I wanted it without having to pay the “is it worth the time?” penalty. (Or, perhaps, I wanted to feel like the LLM’s time was being wasted rather than mine, given that the effort ultimately took two days of vibe-coding.)

And about those two days: Getting a basic colorizer coded and working took maybe 10 minutes and perhaps two rounds of prompts. It was super-easy. Where I burned the majority of the time and compute power was in tweaking the initial result to be exactly what I wanted.

For therein lies the truly seductive part of vibe-coding—the ease of asking the LLM to make small changes or improvements and the apparent absence of cost or consequence for implementing those changes. The impression is that you’re on the Enterprise-D, chatting with the ship’s computer, collaboratively solving a problem with Geordi and Data standing right behind you. It’s downright intoxicating to say, “Hm, yes, now let’s make it so I can show only IPv4 or IPv6 clients with a command line switch,” and the machine does it. (It’s even cooler if you make the request while swinging your leg over the back of a chair so you can sit in it Riker-style!)

A sample of the various things I told the machine to do, along with a small visual indication of how this all made me feel. Credit: Lucasfilm / Disney

It’s exhilarating, honestly, in an Emperor Palpatine “UNLIMITED POWERRRRR!” kind of way. It removes a barrier that I didn’t think would ever be removed—or, rather, one I thought I would never have the time, motivation, or ability to tear down myself.

In the end, after a couple of days of testing and iteration (including a couple of “Is this colorizer performant, and will it introduce system load if run in production?” back-and-forth exchanges in which the LLM cut the cost of our regex matching and made sure the main loop stayed light), I got a tool that does exactly what I want.

Specifically, I now have a log colorizer that:

  • Handles multiple Nginx (and Apache) log file formats
  • Colorizes things using 256-color ANSI codes that look roughly the same in different terminal applications
  • Organizes hostnames and IP addresses into fixed-width columns for easy scanning
  • Colorizes HTTP status codes and cache status (with configurable colors)
  • Applies different colors to the request URI depending on the resource being requested
  • Has specific warning colors and formatting to highlight non-HTTPS requests or other odd things
  • Can apply alternate colors for specific IP addresses (so I can easily pick out Eric’s or my requests)
  • Can constrain output to show only IPv4 or IPv6 hosts (sketched below)

…and, worth repeating, it all looks exactly how I want it to look and behaves exactly how I want it to behave. Here’s another action shot!

The final product. She may not look like much, but she’s got it where it counts, kid. Credit: Lee Hutchinson
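About that last bullet, the IPv4-or-IPv6 constraint: Python’s standard library makes this sort of feature nearly free, which is part of why the small-change loop feels so effortless. A rough sketch of how such a switch might work (the flag names and helper are my invention, not the tool’s actual interface):

```python
import argparse
import ipaddress

def make_ip_filter(args):
    """Build a predicate that decides whether a client address gets displayed."""
    def keep(ip_text):
        try:
            addr = ipaddress.ip_address(ip_text)
        except ValueError:
            return True  # hostnames and oddities pass through; only real IPs filter
        if args.ipv4:
            return addr.version == 4
        if args.ipv6:
            return addr.version == 6
        return True
    return keep

parser = argparse.ArgumentParser(description="hypothetical colorizer switches")
group = parser.add_mutually_exclusive_group()
group.add_argument("--ipv4", action="store_true", help="show only IPv4 clients")
group.add_argument("--ipv6", action="store_true", help="show only IPv6 clients")

# keep = make_ip_filter(parser.parse_args(["--ipv6"]))
# keep("203.0.113.7") -> False; keep("2001:db8::1") -> True
```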

Problem spotted

Armed with my handy-dandy log colorizer, I patiently waited for the wrong-comment-area problem behavior to re-rear its still-ugly head. I did not have to wait long, and within a couple of days, I had my root cause. It had been there all along, if I’d only decided to spend some time looking for it. Here it is:

Problem spotted. Note the AppleNewsBots hitting the newly published post before Discourse can do its thing and the final version of the page with comments is ready. Credit: Lee Hutchinson

Briefly: The problem is Apple’s fault. (Well, not really. But kinda.)

Less briefly: I’ve blurred out Eric’s IP address, but it’s dark green, so any place in the above image where you see a blurry, dark green smudge, that’s Eric. In the roughly 12 seconds presented here, you’re seeing Eric press the “publish” button on his daily forecast—that’s the “POST” event at the very top of the window. The subsequent events from Eric’s IP address are his browser having the standard post-publication conversation with WordPress so it can display the “post published successfully” notification and then redraw the WP block editor.

Below Eric’s post, you can see the Discourse server (with orange IP address) notifying WordPress that it has created a new Discourse comment thread for Eric’s post, then grabbing the things it needs to mirror Eric’s post as the opener for that thread. You can see it does GETs for the actual post and also for the post’s embedded images. About one second after Eric hits “publish,” the new post’s Discourse thread is ready, and it gets attached to Eric’s post.

Ah, but notice what else happens during that one second.

To help expand Space City Weather’s reach, we cross-publish all of the site’s posts to Apple News, using a popular Apple News plug-in (the same one Ars uses, in fact). And right there, with those two GET requests immediately after Eric’s POST request, lay the problem: You’re seeing the vanguard of Apple News’ hungry army of story-retrieval bots, summoned by the same “publish” event, charging in and demanding a copy of the brand new post before Discourse has a chance to do its thing.

I showed the AppleNewsBot stampede log snippet to Techmaster Jason Marlin, and he responded with this gif. Credit: Adult Swim

It was a classic problem in computing: a race condition. Most days, Discourse’s new thread creation would beat the AppleNewsBot rush; some days, though, it wouldn’t. On the days when it didn’t, the horde of Apple bots would demand the page before its Discourse comments were attached, and Cloudflare would happily cache what those bots got served.
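If you’d like to feel the race in miniature, here’s a toy model. This is emphatically not the real stack, just the shape of the failure: whichever GET lands first after a publish decides what the cache serves everyone afterward.

```python
import random
import threading
import time

cache = {}                                    # stand-in for Cloudflare's edge cache
post = {"body": "forecast WITHOUT comments"}  # what WordPress serves right now

def discourse_attach():
    # Discourse needs a beat to build the thread and hook it to the post.
    time.sleep(random.uniform(0.5, 1.5))
    post["body"] = "forecast WITH comments"

def bot_get(name):
    # A cache miss stores whatever WordPress happens to be serving at that instant.
    if "post" not in cache:
        cache["post"] = post["body"]
    print(f"{name} was served: {cache['post']}")

threading.Thread(target=discourse_attach).start()
time.sleep(random.uniform(0.2, 2.0))          # how quickly do the bots arrive today?
bot_get("AppleNewsBot")
bot_get("everyone else, for hours afterward")
```

Run it a handful of times: on some runs everybody gets the version with comments, and on others nobody ever does. That’s the intermittence.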

I knew my fix of emitting “NO CACHE” headers on the story pages prior to Discourse attaching comments worked, but now I knew why it worked—and why the problem existed in the first place. And oh, dear reader, is there anything quite so viscerally satisfying in all the world as figuring out the “why” behind a long-running problem?

But then, just as Icarus became so entranced by the miracle of flight that he lost his common sense, I too forgot I soared on wax-wrought wings, and flew too close to the sun.

LLMs are not the Enterprise-D’s computer

I think we all knew I’d get here eventually—to the inevitable third act turn, where the center cannot hold, and things fall apart. If you read Benj’s latest experience with agentic-based vibe coding—or if you’ve tried it yourself—then what I’m about to say will probably sound painfully obvious, but it is nonetheless time to say it.

Despite their capabilities, LLM coding agents are not smart. They also are not dumb. They are agents without agency—mindless engines whose purpose is to complete the prompt, and that is all.

It feels like this… until it doesn’t. Credit: Paramount Television

What this means is that, if you let them, Claude Code (and OpenAI Codex and all the other agentic coding LLMs) will happily spin their wheels for hours hammering on a solution that can’t ever actually work, so long as their efforts match the prompt. It’s on you to accurately scope your problem. You must articulate what you want in plain and specific domain-appropriate language, because the LLM cannot and will not properly intuit anything you leave unsaid. And having done that, you must then spot and redirect the LLM away from traps and dead ends. Otherwise, it will guess at what you want based on the alignment of a bunch of n-dimensional curves and vectors in high-order phase space, and it might guess right—but it also very much might not.

Lee loses the plot

So I had my log colorizer, and I’d found my problem. I’d also found, after leaving the colorizer up in a window tailing the web server logs in real time, all kinds of things that my previous behavior of occasionally glancing at the logs wasn’t revealing. Ooh, look, there’s a REST route that should probably be blocked from the outside world! Ooh, look, there’s a web crawler I need to feed into Cloudflare’s WAF wood-chipper because it’s ignoring robots.txt! Ooh, look, here’s an area where I can tweak my fastcgi cache settings and eke out a slightly better hit rate!

But here’s the thing with the joy of problem-solving: Like all joy, its source is finite. The joy comes from the solving itself, and even when all my problems are solved and the systems are all working great, I still crave more joy. It is in my nature to therefore invent new problems to solve.

I decided that the problem I wanted to solve next was figuring out a way for my log colorizer to display its output without wrapping long lines—because wrapped lines throw off the neatly delimited columns of log data. I would instead prefer that my terminal window sprout a horizontal scroll bar when needed, and if I wanted to see the full extent of a long line, I could grab the scroll bar and investigate.

Astute readers will at this point notice two things: first, that now I really was reinventing lnav, except way worse and way dumber. Second, and more importantly, line-wrapping behavior is properly a function of the terminal application, not the data being displayed within it, and my approach was misguided from first principles. (This is in fact exactly the kind of request that can and should be slapped down on Stack Overflow—and, indeed, searching there shows many examples of this exact thing happening. The boring, classical answer is to let a pager do the chopping; less -RS passes ANSI colors through untouched while truncating long lines and letting you scroll sideways with the arrow keys.)

But the lure of telling the machine what to do and then watching the machine weave my words into functional magic was too strong—surely we could code our way out of this problem! With LLMs, we can code our way out of any problem! Right?

Eventually, after much refining of requirements, Claude produced what I asked it to produce: a separate Python script, which accepted piped input and created, like, a viewport or something—I don’t know, I can’t code, remember?—and within that viewport, I could scroll around. It seemed to work great!
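I won’t reproduce Claude’s script here, but for the curious, a pipe-fed terminal viewport generally has a shape like the following minimal sketch (plain uncolored text only; all names are mine). Note the /dev/tty shuffle required because the pipe is squatting on stdin, a bit of plumbing that will matter again shortly:

```python
import curses
import os
import sys

def pager(stdscr, lines):
    top = left = 0
    while True:
        stdscr.erase()
        height, width = stdscr.getmaxyx()
        # Paint the viewport: a window of rows starting at `top`,
        # with each visible line sliced starting at column `left`.
        for y, line in enumerate(lines[top:top + height - 1]):
            stdscr.addnstr(y, 0, line[left:left + width - 1], width - 1)
        stdscr.refresh()
        key = stdscr.getch()
        if key in (ord("q"), 27):  # q or Esc quits
            break
        elif key == curses.KEY_RIGHT:
            left += 8
        elif key == curses.KEY_LEFT:
            left = max(0, left - 8)
        elif key == curses.KEY_DOWN:
            top = min(max(len(lines) - 1, 0), top + 1)
        elif key == curses.KEY_UP:
            top = max(0, top - 1)

def main():
    # The pipe is occupying stdin, so slurp it all first, then steal the real
    # terminal back for keyboard input.
    lines = sys.stdin.read().splitlines()
    with open("/dev/tty") as tty:
        os.dup2(tty.fileno(), 0)
        curses.wrapper(pager, lines)

if __name__ == "__main__":
    main()
```

Even in this stripped-down form, you can see the seed of trouble: every keystroke triggers a full erase, re-slice, and repaint.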

Problems became apparent when I actually tried scrolling around for more than a short distance—and by “problems,” I mean “my web server’s CPU lit up like a Christmas tree on fire.”

“Issue,” I told the LLM. “When horizontal scrolling happens, the log wrapper takes up almost 100 percent of a host CPU core. Can we assess how to make this a zero-CPU-impact tool?”

“This tool has to parse ANSI, slice lines, and repaint when you scroll,” it replied in what I imagined was an arch, droll tone. “The bottleneck is screen redraw + full-width scan of all lines on every new line or key repeat. Each new line triggers a full screen redraw of term_height lines, plus visible_width() over the entire buffer. Holding the arrow key generates rapid repeats, each doing a full redraw and slice of every line.”

I was crestfallen.

“Zero-CPU isn’t achievable,” continued the LLM, “but there are low-impact approaches.”

My crest lifted a bit, and we dug in—but the LLM was perhaps a bit optimistic about those “low-impact approaches.” We burned several more days’ worth of tokens on performance improvements—none of which I had any realistic input on because at this point we were way, way past my ability to flail through the Python code and understand what the LLM was doing. Eventually, we hit a wall.

If you listen carefully, you can hear the sound of my expectations crashing hard into reality.
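For the record, the kind of “low-impact approach” on offer here mostly amounts to not recomputing things, like memoizing that expensive visible-width scan so a repaint doesn’t rescan the whole buffer. A sketch of the idea, with my own stand-in for whatever the script actually does:

```python
import functools
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

@functools.lru_cache(maxsize=65536)
def visible_width(line):
    """Display width of a line once its ANSI color sequences are stripped out."""
    return len(ANSI_RE.sub("", line))
```

Tricks like this shave costs at the margins; they don’t repeal the cost of a full repaint per key repeat.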

Instead of throwing in the towel, I vibed on, because the sunk cost fallacy is for other people. I instructed the LLM to shift directions and help me run the log display script locally, so my desktop machine with all its many cores and CPU cycles to spare would be the one shouldering the reflow/redraw burden and not the web server.

Rather than drag this tale on for any longer, I’ll simply enlist Ars Creative Director Aurich Lawson’s skills to present the story of how this worked out in the form of a fun collage, showing my increasingly unhinged prompting of the LLM to solve the new problems that appeared when trying to get a script to run on ssh output when key auth and sudo are in play:

Mammas, don’t let your babies grow up to be vibe coders. Credit: Aurich Lawson

The bitter end

So, thwarted in my attempts to do exactly what I wanted in exactly the way I wanted, I took my log colorizer and went home. (The failed log display script is also up on GitHub with the colorizer if anyone wants to point and laugh at my efforts. Is the code good? Who knows?! Not me!) I’d scored my big win and found my problem root cause, and that would have to be enough for me—for now, at least.

As to that “big win”—finally managing a root-cause analysis of my WordPress-Discourse-Cloudflare caching issue—I also recognize that I probably didn’t need a vibe-coded log colorizer to get there. The evidence was already waiting to be discovered in the Nginx logs, whether or not it was presented to me wrapped in fancy colors. Did I, in fact, use the thrill of vibe coding a tool to Tom Sawyer myself into doing the log searches? (“Wow, self, look at this new cool log colorizer! Bet you could use that to solve all kinds of problems! Yeah, self, you’re right! Let’s do it!”) Very probably. I know how to motivate myself, and sometimes starting a task requires some mental trickery.

This round of vibe coding and its muddled finale reinforced my personal assessment of LLMs—an assessment that hasn’t changed much with the addition of agentic abilities to the toolkit.

LLMs can be fantastic if you’re using them to do something that you mostly understand. If you’re familiar enough with a problem space to understand the common approaches used to solve it, and you know the subject area well enough to spot the inevitable LLM hallucinations and confabulations, and you understand the task at hand well enough to steer the LLM away from dead-ends and to stop it from re-inventing the wheel, and you have the means to confirm the LLM’s output, then these tools are, frankly, kind of amazing.

But the moment you step outside of your area of specialization and begin using them for tasks you don’t mostly understand, or if you’re not familiar enough with the problem to spot bad solutions, or if you can’t check its output, then oh, dear reader, may God have mercy on your soul. And on your poor project, because it’s going to be a mess.

These tools as they exist today can help you if you already have competence. They cannot give you that competence. At best, they can give you a dangerous illusion of mastery; at worst, well, who even knows? Lost data, leaked PII, wasted time, possible legal exposure if the project is big enough—the “worst” list goes on and on!

To vibe or not to vibe?

The log colorizer is neither the first nor the last bit of vibe coding I’ve indulged in. While I’m not as prolific as Benj, over the past couple of months, I’ve turned LLMs loose on a stack of coding tasks that needed doing but that I couldn’t do myself—often in direct contravention of my own advice above about being careful to use them only in areas where you already have some competence. I’ve had the thing make small WordPress PHP plugins, regexes, bash scripts, and my current crowning achievement: a save editor for an old MS-DOS game (in both Python and Swift, no less!). And I had fun doing these things, even as entire vast swaths of rainforest were lit on fire to power my agentic adventures.

As someone employed in a creative field, I’m appropriately nervous about LLMs, but for me, it’s time to face reality. An overwhelming majority of developers say they’re using AI tools in some capacity. It’s a safer career move at this point, almost regardless of one’s field, to be more familiar with them than unfamiliar with them. The genie is not going back into the lamp—it’s too busy granting wishes.

I don’t want y’all to think I feel doomy-gloomy over the genie, either, because I’m right there with everyone else, shouting my wishes at the damn thing. I am a better sysadmin than I was before agentic coding because now I can solve problems myself that I would have previously needed to hand off to someone else. Despite the problems, there is real value there, both personally and professionally. In fact, using an agentic LLM to solve a tightly constrained programming problem that I couldn’t otherwise solve is genuinely fun.

And when screwing around with computers stops being fun, that’s when I’ll know I’ve truly become old.


Lee is the Senior Technology Editor, and oversees story development for the gadget, culture, IT, and video sections of Ars Technica. A long-time member of the Ars OpenForum with an extensive background in enterprise storage and security, he lives in Houston.

So yeah, I vibe-coded a log colorizer—and I feel good about it Read More »

nvidia’s-$100-billion-openai-deal-has-seemingly-vanished

Nvidia’s $100 billion OpenAI deal has seemingly vanished

A Wall Street Journal report on Friday said Nvidia insiders had expressed doubts about the transaction and that Huang had privately criticized what he described as a lack of discipline in OpenAI’s business approach. The Journal also reported that Huang had expressed concern about the competition OpenAI faces from Google and Anthropic. Huang called those claims “nonsense.”

Nvidia shares fell about 1.1 percent on Monday following the reports. Sarah Kunst, managing director at Cleo Capital, told CNBC that the back-and-forth was unusual. “One of the things I did notice about Jensen Huang is that there wasn’t a strong ‘It will be $100 billion.’ It was, ‘It will be big. It will be our biggest investment ever.’ And so I do think there are some question marks there.”

In September, Bryn Talkington, managing partner at Requisite Capital Management, noted the circular nature of such investments to CNBC. “Nvidia invests $100 billion in OpenAI, which then OpenAI turns back and gives it back to Nvidia,” Talkington said. “I feel like this is going to be very virtuous for Jensen.”

Tech critic Ed Zitron has long been critical of Nvidia’s circular investments, which touch dozens of tech companies, from major players to startups. All of those companies are also Nvidia customers.

“NVIDIA seeds companies and gives them the guaranteed contracts necessary to raise debt to buy GPUs from NVIDIA,” Zitron wrote on Bluesky last September, “Even though these companies are horribly unprofitable and will eventually die from a lack of any real demand.”

Chips from other places

Outside of sourcing GPUs from Nvidia, OpenAI has reportedly discussed working with startups Cerebras and Groq, both of which build chips designed to reduce inference latency. But in December, Nvidia struck a $20 billion licensing deal with Groq, which Reuters sources say ended OpenAI’s talks with Groq. Nvidia hired Groq’s founder and CEO Jonathan Ross along with other senior leaders as part of the arrangement.

In January, OpenAI announced a $10 billion deal with Cerebras instead, adding 750 megawatts of computing capacity for faster inference through 2028. Sachin Katti, who joined OpenAI from Intel in November to lead compute infrastructure, said the partnership adds “a dedicated low-latency inference solution” to OpenAI’s platform.

But OpenAI has clearly been hedging its bets. Beyond the Cerebras deal, the company struck an agreement with AMD in October for six gigawatts of GPUs and announced plans with Broadcom to develop a custom AI chip to wean itself off Nvidia. When those chips will be ready, however, is currently unknown.

Nvidia’s $100 billion OpenAI deal has seemingly vanished Read More »