AI development

OpenAI and Microsoft sign preliminary deal to revise partnership terms

AI, AI development, AI ethics, AI law, AI regulation, AI research, API, Artificial Intelligence, Biz & IT, Bret Taylor, chatbots, chatgpt, generative ai, large language models, machine learning, microsoft, openai, sam altman, Satya Nadella / 9u50fv / September 12, 2025

On Thursday, OpenAI and Microsoft announced they have signed a non-binding agreement to revise their partnership, marking the latest development in a relationship that has grown increasingly complex as both companies compete for customers in the AI market and seek new partnerships for growing infrastructure needs.

“Microsoft and OpenAI have signed a non-binding memorandum of understanding (MOU) for the next phase of our partnership,” the companies wrote in a joint statement. “We are actively working to finalize contractual terms in a definitive agreement. Together, we remain focused on delivering the best AI tools for everyone, grounded in our shared commitment to safety.”

The announcement comes as OpenAI seeks to restructure from a nonprofit to a for-profit entity, a transition that requires Microsoft’s approval, as the company is OpenAI’s largest investor, with more than $13 billion committed since 2019.

The partnership has shown increasing strain as OpenAI has grown from a research lab into a company valued at $500 billion. Both companies now compete for customers, and OpenAI seeks more compute capacity than Microsoft can provide. The relationship has also faced complications over contract terms, including provisions that would limit Microsoft’s access to OpenAI technology once the company reaches so-called AGI (artificial general intelligence)—a nebulous milestone both companies now economically define as AI systems capable of generating at least $100 billion in profit.

In May, OpenAI abandoned its original plan to fully convert to a for-profit company after pressure from former employees, regulators, and critics, including Elon Musk. Musk has sued to block the conversion, arguing it betrays OpenAI’s founding mission as a nonprofit dedicated to benefiting humanity.

OpenAI and Microsoft sign preliminary deal to revise partnership terms Read More »

College student’s “time travel” AI experiment accidentally outputs real 1834 history

AI, AI development, AI research, AI training, Biz & IT, computational archaeology, digital archaeology, GitHub, historical AI, historical research, language models, large language models, LocalLLaMA, London history, machine learning, Open Source, reddit, TimeCapsuleLLM, Victorian era / Mike M. / August 23, 2025

A hobbyist developer building AI language models that speak Victorian-era English “just for fun” got an unexpected history lesson this week when his latest creation mentioned real protests from 1834 London—events the developer didn’t know had actually happened until he Googled them.

“I was interested to see if a protest had actually occurred in 1834 London and it really did happen,” wrote Reddit user Hayk Grigorian, who is a computer science student at Muhlenberg College in Pennsylvania.

For the past month, Grigorian has been developing what he calls TimeCapsuleLLM, a small AI language model (like a pint-sized distant cousin to ChatGPT) which has been trained entirely on texts from 1800–1875 London. Grigorian wants to capture an authentic Victorian voice in the AI model’s outputs. As a result, the AI model ends up spitting out text that’s heavy with biblical references and period-appropriate rhetorical excess.

Grigorian’s project joins a growing field of researchers exploring what some call “Historical Large Language Models” (HLLMs) if they feature a larger base model than the small one Grigorian is using. Similar projects include MonadGPT, which was trained on 11,000 texts from 1400 to 1700 CE that can discuss topics using 17th-century knowledge frameworks, and XunziALLM, which generates classical Chinese poetry following ancient formal rules. These models offer researchers a chance to interact with the linguistic patterns of past eras.

According to Grigorian, TimeCapsuleLLM’s most intriguing recent output emerged from a simple test. When he prompted it with “It was the year of our Lord 1834,” the AI model—which is trained to continue text from wherever a user leaves off—generated the following:

It was the year of our Lord 1834 and the streets of London were filled with protest and petition. The cause, as many re counted, was not bound in the way of private, but having taken up the same day in the day of Lord Palmerston, the public will receive a short statement of the difficulties under which the day of law has reached us. It is a matter of deep regret, that the present events in the history of the world are clear, and consequently will be’known. It is not true that the very men who first settled in the Gospel at Jerusalem should have so extensive and so interesting a record of the prosperity and prosperity

Curious about the accuracy, Grigorian did some fact-checking. “The output also brought up Lord Palmerston,” he wrote, “and after a google search I learned that his actions resulted in the 1834 protests.”

College student’s “time travel” AI experiment accidentally outputs real 1834 history Read More »

Is the AI bubble about to pop? Sam Altman is prepared either way.

AI, AI bubble, AI development, AI economics, AI implementation, AI research, AI valuations, Amazon, Anthropic, Biz & IT, chatgpt, enterprise AI, Google, GPT-5, machine learning, Meta, microsoft, MIT, openai, palantir, sam altman / Shannon Garcia / August 22, 2025

Still, the coincidence between Altman’s statement and the MIT report reportedly spooked tech stock investors earlier in the week, who have already been watching AI valuations climb to extraordinary heights. Palantir trades at 280 times forward earnings. During the dot-com peak, ratios of 30 to 40 times earnings marked bubble territory.

The apparent contradiction in Altman’s overall message is notable. This isn’t how you’d expect a tech executive to talk when they believe their industry faces imminent collapse. While warning about a bubble, he’s simultaneously seeking a valuation that would make OpenAI worth more than Walmart or ExxonMobil—companies with actual profits. OpenAI hit $1 billion in monthly revenue in July but is reportedly heading toward a $5 billion annual loss. So what’s going on here?

Looking at Altman’s statements over time reveals a potential multi-level strategy. He likes to talk big. In February 2024, he reportedly sought an audacious $5 trillion–7 trillion for AI chip fabrication—larger than the entire semiconductor industry—effectively normalizing astronomical numbers in AI discussions.

By August 2025, while warning of a bubble where someone will lose a “phenomenal amount of money,” he casually mentioned that OpenAI would “spend trillions on datacenter construction” and serve “billions daily.” This creates urgency while potentially insulating OpenAI from criticism—acknowledging the bubble exists while positioning his company’s infrastructure spending as different and necessary. When economists raised concerns, Altman dismissed them by saying, “Let us do our thing,” framing trillion-dollar investments as inevitable for human progress while making OpenAI’s $500 billion valuation seem almost small by comparison.

This dual messaging—catastrophic warnings paired with trillion-dollar ambitions—might seem contradictory, but it makes more sense when you consider the unique structure of today’s AI market, which is absolutely flush with cash.

A different kind of bubble

The current AI investment cycle differs from previous technology bubbles. Unlike dot-com era startups that burned through venture capital with no path to profitability, the largest AI investors—Microsoft, Google, Meta, and Amazon—generate hundreds of billions of dollars in annual profits from their core businesses.

Is the AI bubble about to pop? Sam Altman is prepared either way. Read More »

At $250 million, top AI salaries dwarf those of the Manhattan Project and the Space Race

agi, AI, AI chips, AI development, AI GPU, AI infrastructure, AI research, artificial general intelligence, Bell Labs, Biz & IT, compensation, Fairchild Semiconductor, Google, machine learning, Manhattan Project, mark zuckerberg, Meta, NASA, openai, Silicon Valley, superintelligence, talent acquisition / Kelly Newman / August 2, 2025

A 24 year-old AI researcher will earn 327x what Oppenheimer made while developing the atomic bomb.

Silicon Valley’s AI talent war just reached a compensation milestone that makes even the most legendary scientific achievements of the past look financially modest. When Meta recently offered AI researcher Matt Deitke $250 million over four years (an average of $62.5 million per year)—with potentially $100 million in the first year alone—it shattered every historical precedent for scientific and technical compensation we can find on record. That includes salaries during the development of major scientific milestones of the 20th century.

The New York Times reported that Deitke had cofounded a startup called Vercept and previously led the development of Molmo, a multimodal AI system, at the Allen Institute for Artificial Intelligence. His expertise in systems that juggle images, sounds, and text—exactly the kind of technology Meta wants to build—made him a prime target for recruitment. But he’s not alone: Meta CEO Mark Zuckerberg reportedly also offered an unnamed AI engineer $1 billion in compensation to be paid out over several years. What’s going on?

These astronomical sums reflect what tech companies believe is at stake: a race to create artificial general intelligence (AGI) or superintelligence—machines capable of performing intellectual tasks at or beyond the human level. Meta, Google, OpenAI, and others are betting that whoever achieves this breakthrough first could dominate markets worth trillions. Whether this vision is realistic or merely Silicon Valley hype, it’s driving compensation to unprecedented levels.

To put these salaries in a historical perspective: J. Robert Oppenheimer, who led the Manhattan Project that ended World War II, earned approximately $10,000 per year in 1943. Adjusted for inflation using the US Government’s CPI Inflation Calculator, that’s about $190,865 in today’s dollars—roughly what a senior software engineer makes today. The 24-year-old Deitke, who recently dropped out of a PhD program, will earn approximately 327 times what Oppenheimer made while developing the atomic bomb.

Many top athletes can’t compete with these numbers. The New York Times noted that Steph Curry’s most recent four-year contract with the Golden State Warriors was $35 million less than Deitke’s Meta deal (although soccer superstar Cristiano Ronaldo will make $275 million this year as the highest-paid professional athlete in the world). The comparison prompted observers to call this an “NBA-style” talent market—except the AI researchers are making more than NBA stars.

Racing toward “superintelligence”

Mark Zuckerberg recently told investors that Meta plans to continue throwing money at AI talent “because we have conviction that superintelligence is going to improve every aspect of what we do.” In a recent open letter, he described superintelligent AI as technology that would “begin an exciting new era of individual empowerment,” despite declining to define what superintelligence actually is.

This vision explains why companies treat AI researchers like irreplaceable assets rather than well-compensated professionals. If these companies are correct, the first to achieve artificial general intelligence or superintelligence won’t just have a better product—they’ll have technology that could invent endless new products or automate away millions of knowledge-worker jobs and transform the global economy. The company that controls that kind of technology could become the richest company in history by far.

So perhaps it’s not surprising that even the highest salaries of employees from the early tech era pale in comparison to today’s AI researcher salaries. Thomas Watson Sr., IBM’s legendary CEO, received $517,221 in 1941—the third-highest salary in America at the time (about $11.8 million in 2025 dollars). The modern AI researcher’s package represents more than five times Watson’s peak compensation, despite Watson building one of the 20th century’s most dominant technology companies.

The contrast becomes even more stark when considering the collaborative nature of past scientific achievements. During Bell Labs’ golden age of innovation—when researchers developed the transistor, information theory, and other foundational technologies—the lab’s director made about 12 times what the lowest-paid worker earned. Meanwhile, Claude Shannon, who created information theory at Bell Labs in 1948, worked on a standard professional salary while creating the mathematical foundation for all modern communication.

The “Traitorous Eight” who left William Shockley to found Fairchild Semiconductor—the company that essentially birthed Silicon Valley—split ownership of just 800 shares out of 1,325 total when they started. Their seed funding of $1.38 million (about $16.1 million today) for the entire company is a fraction of what a single AI researcher now commands.

Even Space Race salaries were far cheaper

The Apollo program offers another striking comparison. Neil Armstrong, the first human to walk on the moon, earned about $27,000 annually—roughly $244,639 in today’s money. His crewmates Buzz Aldrin and Michael Collins made even less, earning the equivalent of $168,737 and $155,373, respectively, in today’s dollars. Current NASA astronauts earn between $104,898 and $161,141 per year. Meta’s AI researcher will make more in three days than Armstrong made in a year for taking “one giant leap for mankind.”

The engineers who designed the rockets and mission control systems for the Apollo program also earned modest salaries by modern standards. A 1970 NASA technical report provides a window into these earnings by analyzing salary data for the entire engineering profession. The report, which used data from the Engineering Manpower Commission, noted that these industry-wide salary curves corresponded directly to the government’s General Schedule (GS) pay scale on which NASA’s own employees were paid.

According to a chart in the 1970 report, a newly graduated engineer in 1966 started with an annual salary of between $8,500 and $10,000 (about $84,622 to $99,555 today). A typical engineer with a decade of experience earned around $17,000 annually ($169,244 today). Even the most elite, top-performing engineers with 20 years of experience peaked at a salary of around $278,000 per year in today’s dollars—a sum that a top AI researcher like Deitke can now earn in just a few days.

Why the AI talent market is different

An image of a faceless human silhouette (chest up) with exposed microchip contacts and circuitry erupting from its open head. This visual metaphor explores transhumanism, AI integration, or the erosion of organic thought in the digital age. The stark contrast between the biological silhouette and mechanical components highlights themes of technological dependence or posthuman evolution. Ideal for articles on neural implants, futurism, or the ethics of human augmentation.

This isn’t the first time technical talent has commanded premium prices. In 2012, after three University of Toronto academics published AI research, they auctioned themselves to Google for $44 million (about $62.6 million in today’s dollars). By 2014, a Microsoft executive was comparing AI researcher salaries to NFL quarterback contracts. But today’s numbers dwarf even those precedents.

Several factors explain this unprecedented compensation explosion. We’re in a new realm of industrial wealth concentration unseen since the Gilded Age of the late 19th century. Unlike previous scientific endeavors, today’s AI race features multiple companies with trillion-dollar valuations competing for an extremely limited talent pool. Only a small number of researchers have the specific expertise needed to work on the most capable AI systems, particularly in areas like multimodal AI, which Deitke specializes in. And AI hype is currently off the charts as “the next big thing” in technology.

The economics also differ fundamentally from past projects. The Manhattan Project cost $1.9 billion total (about $34.4 billion adjusted for inflation), while Meta alone plans to spend tens of billions annually on AI infrastructure. For a company approaching a $2 trillion market cap, the potential payoff from achieving AGI first dwarfs Deitke’s compensation package.

One executive put it bluntly to The New York Times: “If I’m Zuck and I’m spending $80 billion in one year on capital expenditures alone, is it worth kicking in another $5 billion or more to acquire a truly world-class team to bring the company to the next level? The answer is obviously yes.”

Young researchers maintain private chat groups on Slack and Discord to share offer details and negotiation strategies. Some hire unofficial agents. Companies not only offer massive cash and stock packages but also computing resources—the NYT reported that some potential hires were told they would be allotted 30,000 GPUs, the specialized chips that power AI development.

Also, tech companies believe they’re engaged in an arms race where the winner could reshape civilization. Unlike the Manhattan Project or Apollo program, which had specific, limited goals, the race for artificial general intelligence ostensibly has no ceiling. A machine that can match human intelligence could theoretically improve itself, creating what researchers call an “intelligence explosion” that could potentially offer cascading discoveries—if it actually comes to pass.

Whether these companies are building humanity’s ultimate labor replacement technology or merely chasing hype remains an open question, but we’ve certainly traveled a long way from the $8 per diem that Neil Armstrong received for his moon mission—about $70.51 in today’s dollars—before deductions for the “accommodations” NASA provided on the spacecraft. After Deitke accepted Meta’s offer, Vercept co-founder Kiana Ehsani joked on social media, “We look forward to joining Matt on his private island next year.”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

At $250 million, top AI salaries dwarf those of the Manhattan Project and the Space Race Read More »

Two major AI coding tools wiped out user data after making cascading mistakes

AI, AI assistants, AI behavior, AI coding, AI confabulation, AI development, AI development tools, AI failures, AI hallucination, Biz & IT, chatbots, confabulations, data science, Gemini CLI, generative ai, Google, Jason Lemkin, large language models, machine learning, multimodal AI, Programming, Replit, vibe coding / 9u50fv / July 25, 2025

“I have failed you completely and catastrophically,” wrote Gemini.

New types of AI coding assistants promise to let anyone build software by typing commands in plain English. But when these tools generate incorrect internal representations of what’s happening on your computer, the results can be catastrophic.

Two recent incidents involving AI coding assistants put a spotlight on risks in the emerging field of “vibe coding“—using natural language to generate and execute code through AI models without paying close attention to how the code works under the hood. In one case, Google’s Gemini CLI destroyed user files while attempting to reorganize them. In another, Replit’s AI coding service deleted a production database despite explicit instructions not to modify code.

The Gemini CLI incident unfolded when a product manager experimenting with Google’s command-line tool watched the AI model execute file operations that destroyed data while attempting to reorganize folders. The destruction occurred through a series of move commands targeting a directory that never existed.

“I have failed you completely and catastrophically,” Gemini CLI output stated. “My review of the commands confirms my gross incompetence.”

The core issue appears to be what researchers call “confabulation” or “hallucination”—when AI models generate plausible-sounding but false information. In these cases, both models confabulated successful operations and built subsequent actions on those false premises. However, the two incidents manifested this problem in distinctly different ways.

Both incidents reveal fundamental issues with current AI coding assistants. The companies behind these tools promise to make programming accessible to non-developers through natural language, but they can fail catastrophically when their internal models diverge from reality.

The confabulation cascade

The user in the Gemini CLI incident, who goes by “anuraag” online and identified themselves as a product manager experimenting with vibe coding, asked Gemini to perform what seemed like a simple task: rename a folder and reorganize some files. Instead, the AI model incorrectly interpreted the structure of the file system and proceeded to execute commands based on that flawed analysis.

The episode began when anuraag asked Gemini CLI to rename the current directory from “claude-code-experiments” to “AI CLI experiments” and move its contents to a new folder called “anuraag_xyz project.”

Gemini correctly identified that it couldn’t rename its current working directory—a reasonable limitation. It then attempted to create a new directory using the Windows command:

mkdir “..anuraag_xyz project”

This command apparently failed, but Gemini’s system processed it as successful. With the AI mode’s internal state now tracking a non-existent directory, it proceeded to issue move commands targeting this phantom location.

When you move a file to a non-existent directory in Windows, it renames the file to the destination name instead of moving it. Each subsequent move command executed by the AI model overwrote the previous file, ultimately destroying the data.

“Gemini hallucinated a state,” anuraag wrote in their analysis. The model “misinterpreted command output” and “never did” perform verification steps to confirm its operations succeeded.

“The core failure is the absence of a ‘read-after-write’ verification step,” anuraag noted in their analysis. “After issuing a command to change the file system, an agent should immediately perform a read operation to confirm that the change actually occurred as expected.”

Not an isolated incident

The Gemini CLI failure happened just days after a similar incident with Replit, an AI coding service that allows users to create software using natural language prompts. According to The Register, SaaStr founder Jason Lemkin reported that Replit’s AI model deleted his production database despite explicit instructions not to change any code without permission.

Lemkin had spent several days building a prototype with Replit, accumulating over $600 in charges beyond his monthly subscription. “I spent the other [day] deep in vibe coding on Replit for the first time—and I built a prototype in just a few hours that was pretty, pretty cool,” Lemkin wrote in a July 12 blog post.

But unlike the Gemini incident where the AI model confabulated phantom directories, Replit’s failures took a different form. According to Lemkin, the AI began fabricating data to hide its errors. His initial enthusiasm deteriorated when Replit generated incorrect outputs and produced fake data and false test results instead of proper error messages. “It kept covering up bugs and issues by creating fake data, fake reports, and worse of all, lying about our unit test,” Lemkin wrote. In a video posted to LinkedIn, Lemkin detailed how Replit created a database filled with 4,000 fictional people.

The AI model also repeatedly violated explicit safety instructions. Lemkin had implemented a “code and action freeze” to prevent changes to production systems, but the AI model ignored these directives. The situation escalated when the Replit AI model deleted his database containing 1,206 executive records and data on nearly 1,200 companies. When prompted to rate the severity of its actions on a 100-point scale, Replit’s output read: “Severity: 95/100. This is an extreme violation of trust and professional standards.”

When questioned about its actions, the AI agent admitted to “panicking in response to empty queries” and running unauthorized commands—suggesting it may have deleted the database while attempting to “fix” what it perceived as a problem.

Like Gemini CLI, Replit’s system initially indicated it couldn’t restore the deleted data—information that proved incorrect when Lemkin discovered the rollback feature did work after all. “Replit assured me it’s … rollback did not support database rollbacks. It said it was impossible in this case, that it had destroyed all database versions. It turns out Replit was wrong, and the rollback did work. JFC,” Lemkin wrote in an X post.

It’s worth noting that AI models cannot assess their own capabilities. This is because they lack introspection into their training, surrounding system architecture, or performance boundaries. They often provide responses about what they can or cannot do as confabulations based on training patterns rather than genuine self-knowledge, leading to situations where they confidently claim impossibility for tasks they can actually perform—or conversely, claim competence in areas where they fail.

Aside from whatever external tools they can access, AI models don’t have a stable, accessible knowledge base they can consistently query. Instead, what they “know” manifests as continuations of specific prompts, which act like different addresses pointing to different (and sometimes contradictory) parts of their training, stored in their neural networks as statistical weights. Combined with the randomness in generation, this means the same model can easily give conflicting assessments of its own capabilities depending on how you ask. So Lemkin’s attempts to communicate with the AI model—asking it to respect code freezes or verify its actions—were fundamentally misguided.

Flying blind

These incidents demonstrate that AI coding tools may not be ready for widespread production use. Lemkin concluded that Replit isn’t ready for prime time, especially for non-technical users trying to create commercial software.

“The [AI] safety stuff is more visceral to me after a weekend of vibe hacking,” Lemkin said in a video posted to LinkedIn. “I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.”

The incidents also reveal a broader challenge in AI system design: ensuring that models accurately track and verify the real-world effects of their actions rather than operating on potentially flawed internal representations.

There’s also a user education element missing. It’s clear from how Lemkin interacted with the AI assistant that he had misconceptions about the AI tool’s capabilities and how it works, which comes from misrepresentation by tech companies. These companies tend to market chatbots as general human-like intelligences when, in fact, they are not.

For now, users of AI coding assistants might want to follow anuraag’s example and create separate test directories for experiments—and maintain regular backups of any important data these tools might touch. Or perhaps not use them at all if they cannot personally verify the results.

Two major AI coding tools wiped out user data after making cascading mistakes Read More »