AI


Trump revives unpopular Ted Cruz plan to punish states that impose AI laws

The FTC chairman would be required to issue a policy statement detailing “circumstances under which State laws that require alterations to the truthful outputs of AI models are preempted by the FTC Act’s prohibition on engaging in deceptive acts or practices affecting commerce.”

When Cruz proposed a moratorium restricting state AI regulation in mid-2025, Sen. Marsha Blackburn (R-Tenn.) helped lead the fight against it. “Until Congress passes federally preemptive legislation like the Kids Online Safety Act and an online privacy framework, we can’t block states from making laws that protect their citizens,” Blackburn said at the time.

Sen. Maria Cantwell (D-Wash.) also spoke out against the Cruz plan, saying it would preempt “good state consumer protection laws” related to robocalls, deepfakes, and autonomous vehicles.

Trump wants Congress to preempt state laws

Besides reviving the Cruz plan, Trump’s draft executive order seeks new legislation to preempt state laws. The order would direct Trump administration officials to “jointly prepare for my review a legislative recommendation establishing a uniform Federal regulatory framework for AI that preempts State AI laws that conflict with the policy set forth in this order.”

House Majority Leader Steve Scalise (R-La.) this week said a ban on state AI laws could be included in the National Defense Authorization Act (NDAA). Democrats are trying to keep the ban out of the bill.

“We have to allow states to take the lead because we’re not able to, so far in Washington, come up with appropriate legislation,” Sen. Jack Reed (D-R.I.), the ranking member on the Armed Services Committee, told Semafor.

In a Truth Social post on Tuesday, Trump claimed that states are “trying to embed DEI ideology into AI models.” Trump wrote, “We MUST have one Federal Standard instead of a patchwork of 50 State Regulatory Regimes. If we don’t, then China will easily catch us in the AI race. Put it in the NDAA, or pass a separate Bill, and nobody will ever be able to compete with America.”



Google’s new Nano Banana Pro uses Gemini 3 power to generate more realistic AI images

Detecting less sloppy slop

Google is not just blowing smoke—the new image generator is much better. Its grasp of the world and of the nuances of language is apparent, producing much more realistic results. Even before this, AI images were getting so good that they could be hard to spot at a glance. Gone are the days when you could just count fingers to identify AI. Google is making an effort to help identify AI content, though.

Images generated with Nano Banana Pro continue to have embedded SynthID watermarks that Google’s tools can detect. The company is also adding more C2PA metadata to further label AI images. The Gemini app is part of this effort, too. Starting now, you can upload an image and ask something like “Is this AI?” The app won’t detect just any old AI image, but it will tell you if it’s a product of Google AI by checking for SynthID.

Gemini can now detect its own AI images.

At the same time, Google is making it slightly harder for people to know an image was generated with AI. Operating with the knowledge that professionals may want to generate images with Nano Banana Pro, Google has removed the visible watermark from images for AI Ultra subscribers. These images still have SynthID, but only the lower tiers have the Gemini twinkle in the corner.

While everyone can access the new Nano Banana Pro today, AI Ultra subscribers will enjoy the highest usage limits. Gemini Pro users will get a bit less access, and free users will get the lowest limits before being booted down to the non-pro version.



DeepMind’s latest: An AI for handling mathematical proofs


AlphaProof can handle math challenges but needs a bit of help right now.

Computers are extremely good with numbers, but they haven’t gotten many human mathematicians fired. Until recently, they could barely hold their own in high school-level math competitions.

But now Google’s DeepMind team has built AlphaProof, an AI system that matched silver medalists’ performance at the 2024 International Mathematical Olympiad, scoring just one point short of gold at the most prestigious math competition in the world for pre-university students. And that’s kind of a big deal.

True understanding

The reason computers fared poorly in math competitions is that, while they far surpass humanity’s ability to perform calculations, they are not really that good at the logic and reasoning needed for advanced math. Put differently, they are good at performing calculations really quickly, but they usually suck at understanding why they’re doing them. While something like addition seems simple, humans can reason about it semi-formally from its definition or fully formally through Peano arithmetic, which defines the properties of natural numbers and operations like addition through axioms.
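To make “fully formal” concrete, here is the standard Peano-style definition of addition (textbook material, not something taken from the AlphaProof paper), with S(n) denoting the successor of n, so that 1 = S(0) and 2 = S(S(0)):

```latex
\begin{align*}
  a + 0    &= a        && \text{(adding zero changes nothing)} \\
  a + S(b) &= S(a + b) && \text{(adding a successor is the successor of the sum)}
\end{align*}
% From these two axioms alone, 1 + 1 = 2 follows in a few rewrite steps:
% 1 + 1 = 1 + S(0) = S(1 + 0) = S(1) = 2.
```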

To perform a proof, humans have to understand the very structure of mathematics. The way mathematicians build proofs, how many steps they need to arrive at the conclusion, and how cleverly they design those steps are a testament to their brilliance, ingenuity, and mathematical elegance. “You know, Bertrand Russell published a 500-page book to prove that one plus one equals two,” says Thomas Hubert, a DeepMind researcher and lead author of the AlphaProof study.

DeepMind’s team wanted to develop an AI that understood math at this level. The work started with solving the usual AI problem: the lack of training data.

Math problems translator

Large language models that power AI systems like ChatGPT learn from billions upon billions of pages of text. Because there are texts on mathematics in their training databases—all the handbooks and works of famous mathematicians—they show some level of success in proving mathematical statements. But they are limited by how they operate: They rely on using huge neural nets to predict the next word or token in sequences generated in response to user prompts. Their reasoning is statistical by design, which means they simply return answers that “sound” right.

DeepMind didn’t need the AI to “sound” right—that wasn’t going to cut it in high-level mathematics. They needed their AI to “be” right, to guarantee absolute certainty. That called for an entirely new, more formalized training environment. To provide that, the team used a software package called Lean.

Lean is a computer program that helps mathematicians write precise definitions and proofs. It relies on a precise, formal programming language, also called Lean, into which mathematical statements can be translated. Once the translated, or formalized, statement is uploaded to the program, it can check whether it is correct and respond with something like “this is correct,” “something is missing,” or “you used a fact that is not proved yet.”
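As a rough illustration (these toy theorems are ours, not examples from the AlphaProof paper), formalized statements in Lean 4 look something like this, along with the kind of feedback the checker gives:

```lean
-- A statement Lean accepts: both sides reduce to the same natural number.
theorem one_plus_one : 1 + 1 = 2 := by
  rfl

-- A statement Lean flags: `sorry` stands in for a missing argument, so the
-- checker warns that the proof relies on a fact that has not been proved yet.
theorem add_comm_todo (n m : Nat) : n + m = m + n := by
  sorry
```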

The problem was that most mathematical statements and proofs that can be found online are written in natural language, like “let X be the set of natural numbers that…”—the number of statements written in Lean was rather limited. “The major difficulty of working with formal languages is that there’s very little data,” Hubert says. To get around this, the researchers trained a Gemini large language model to translate mathematical statements from natural language into Lean. The model worked like an automatic formalizer and produced about 80 million formalized mathematical statements.
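In outline, that pipeline can be sketched as below. The helper functions are hypothetical stand-ins for the Gemini translator and the Lean checker, not DeepMind’s actual interfaces, and the real system applied far more nuanced filtering than a simple pass/fail.

```python
# Sketch of an auto-formalization loop: translate informal statements into
# Lean-like text and keep candidates the checker can at least make sense of.

def translate_to_lean(informal: str) -> str:
    """Toy stand-in for the Gemini auto-formalizer (canned translations only)."""
    canned = {
        "one plus one equals two": "theorem t : 1 + 1 = 2 := by rfl",
    }
    return canned.get(informal, "")

def parses_as_lean(candidate: str) -> bool:
    """Toy stand-in for handing the candidate statement to the Lean checker."""
    return candidate.startswith("theorem")

def build_formal_dataset(informal_statements: list[str]) -> list[str]:
    dataset = []
    for statement in informal_statements:
        candidate = translate_to_lean(statement)
        # Imperfect translations still made it into AlphaProof's training pool,
        # since proving or disproving a flawed statement is useful exercise too.
        if parses_as_lean(candidate):
            dataset.append(candidate)
    return dataset

print(build_formal_dataset(["one plus one equals two", "an untranslatable aside"]))
```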

It wasn’t perfect, but the team managed to use that to their advantage. “There are many ways you can capitalize on approximate translations,” Hubert claims.

Learning to think

The idea DeepMind had for AlphaProof was to use the architecture the team had used in its chess-, Go-, and shogi-playing AlphaZero AI system. Building proofs in Lean, and doing mathematics in general, was supposed to be just another game to master. “We were trying to learn this game through trial and error,” Hubert says. Imperfectly formalized problems offered plenty of opportunities for making errors. In its learning phase, AlphaProof simply proved and disproved the problems in its database. If something was translated poorly, figuring out that something wasn’t right was a useful form of exercise.

Just like AlphaZero, AlphaProof in most cases used two main components. The first was a huge neural net with a few billion parameters that learned to work in the Lean environment through trial and error. It was rewarded for each proven or disproven statement and penalized for each reasoning step it took, which was a way of incentivizing short, elegant proofs.

It was also trained to use a second component, which was a tree search algorithm. This explored all possible actions that could be taken to push the proof forward at each step. Because the number of possible actions in mathematics can be near infinite, the job of the neural net was to look at the available branches in the search tree and commit computational budget only to the most promising ones.
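The sketch below shows the general shape of that division of labor on a toy search problem: a scoring function stands in for the neural net, a best-first search stands in for the tree search, and a small per-step penalty nudges the system toward short solutions. It illustrates the idea only; it is not AlphaProof’s actual algorithm.

```python
import heapq

# Toy best-first proof search: a scoring function plays the role of the policy
# network, and the search expands only the most promising branches, with a small
# per-step penalty so that shorter "proofs" score higher. The "tactics" here are
# arithmetic moves, standing in for Lean proof steps.

GOAL = 7  # toy target: reach 7 from 0 using the two available "tactics"

def apply_action(state, action):
    return state + 1 if action == "add_one" else state * 2

def policy_score(state, action):
    # A real system would use a neural net; here we just prefer moves that
    # close the gap to the goal.
    return -abs(GOAL - apply_action(state, action))

def search(start=0, step_penalty=0.1, max_steps=20):
    # Each frontier entry: (negated score, steps taken, state, path of actions)
    frontier = [(0.0, 0, start, [])]
    while frontier:
        _neg, steps, state, path = heapq.heappop(frontier)
        if state == GOAL:
            return path  # a complete "proof" was found
        if steps >= max_steps:
            continue
        for action in ("add_one", "double"):
            score = policy_score(state, action) - step_penalty * (steps + 1)
            heapq.heappush(
                frontier,
                (-score, steps + 1, apply_action(state, action), path + [action]),
            )
    return None

print(search())  # prints one short sequence of toy "tactics" that reaches 7
```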

After a few weeks of training, the system could score well on most math competition benchmarks based on problems sourced from past high school-level competitions, but it still struggled with the most difficult of them. To tackle these, the team added a third component that hadn’t been in AlphaZero. Or anywhere else.

Spark of humanity

The third component, called Test-Time Reinforcement Learning (TTRL), roughly emulated the way mathematicians approach the most difficult problems. The learning part relied on the same combination of neural nets and tree search algorithms. The difference came in what it learned from. Instead of relying on a broad database of auto-formalized problems, AlphaProof working in TTRL mode started its work by generating an entirely new training dataset based on the problem it was dealing with.

The process involved creating countless variations of the original statement, some simplified a little bit more, some more general, and some only loosely connected to it. The system then attempted to prove or disprove them. It was roughly what most humans do when they’re facing a particularly hard puzzle, the AI equivalent of saying, “I don’t get it, so let’s try an easier version of this first to get some practice.” This allowed AlphaProof to learn on the fly, and it worked amazingly well.
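Schematically, the loop looks something like the toy sketch below. The function names and the single “skill” number are illustrative placeholders; the real system performed full reinforcement learning updates on a large model rather than bumping a scalar.

```python
# Toy sketch of test-time reinforcement learning: generate variants of the hard
# target statement, practice on them, and keep retrying the original. All names
# here are illustrative placeholders, not DeepMind's interfaces.

def generate_variants(target_statement: str, n: int) -> list[str]:
    # Stand-in for the model proposing simpler, more general, or loosely
    # related versions of the original problem.
    return [f"{target_statement} (variant {i})" for i in range(n)]

def attempt_proof(statement: str, skill: float) -> bool:
    # Stand-in for running the prover; success gets more likely as skill grows.
    return skill > len(statement) % 7  # arbitrary toy difficulty criterion

def test_time_rl(target_statement: str, rounds: int = 5) -> bool:
    skill = 0.0
    for _ in range(rounds):
        for variant in generate_variants(target_statement, n=8):
            if attempt_proof(variant, skill):
                skill += 0.5  # learn more from variants it manages to settle
            else:
                skill += 0.1  # failures still narrow the search in the real system
        if attempt_proof(target_statement, skill):
            return True  # the original problem finally yields
    return False

print(test_time_rl("hard olympiad problem"))
```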

At the 2024 International Mathematical Olympiad, there were 42 points to score for solving six different problems worth seven points each. To win gold, participants had to get 29 points or higher, and 58 out of 609 of them did that. Silver medals were awarded to people who earned between 22 and 28 points (there were 123 silver medalists). The problems varied in difficulty, with the sixth one, acting as a “final boss,” being the most difficult of them all. Only six participants managed to solve it. AlphaProof was the seventh.

But AlphaProof wasn’t an end-all, be-all mathematical genius. Its silver had its price—quite literally.

Optimizing ingenuity

The first problem with AlphaProof’s performance was that it didn’t work alone. To begin with, humans had to make the problems compatible with Lean before the software even got to work. And, among the six Olympiad problems, the fourth one was about geometry, and the AI was not optimized for that. To deal with it, AlphaProof had to call a friend called AlphaGeometry 2, a geometry-specialized AI that ripped through the task in a few minutes without breaking a sweat. On its own, AlphaProof scored 21 points, not 28, so technically it would win bronze, not silver. Except it wouldn’t.

Human participants in the Olympiad had to solve their six problems in two four-and-a-half-hour sessions. AlphaProof, on the other hand, wrestled with them for several days, using multiple tensor processing units at full throttle. The most time- and energy-consuming component was TTRL, which battled for three days with each of the three problems it managed to solve. If AlphaProof had been held to the same standard as human participants, it would basically have run out of time. And if it hadn’t been born at a tech giant worth hundreds of billions of dollars, it would have run out of money, too.

In the paper, the team admits the computational requirements to run AlphaProof are most likely cost-prohibitive for most research groups and aspiring mathematicians. Computing power in AI applications is often measured in TPU-days, meaning a tensor processing unit working flat-out for a full day. AlphaProof needed hundreds of TPU-days per problem.

On top of that, the International Mathematical Olympiad is a high school-level competition, and the problems, while admittedly difficult, were based on things mathematicians already know. Research-level math requires inventing entirely new concepts instead of just working with existing ones.

But DeepMind thinks it can overcome these hurdles and optimize AlphaProof to be less resource-hungry. “We don’t want to stop at math competitions. We want to build an AI system that could really contribute to research-level mathematics,” Hubert says. His goal is to make AlphaProof available to the broader research community. “We’re also releasing a kind of an AlphaProof tool,” he added. “It would be a small trusted testers program to see if this would be useful to mathematicians.”

Nature, 2025.  DOI: 10.1038/s41586-025-09833-y


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.



How Louvre thieves exploited human psychology to avoid suspicion—and what it reveals about AI

On the sunny morning of October 19, 2025, four men allegedly walked into the world’s most-visited museum and left, minutes later, with crown jewels worth 88 million euros ($101 million). The theft from Paris’ Louvre Museum—one of the world’s most surveilled cultural institutions—took just under eight minutes.

Visitors kept browsing. Security didn’t react (until alarms were triggered). The men disappeared into the city’s traffic before anyone realized what had happened.

Investigators later revealed that the thieves wore hi-vis vests, disguising themselves as construction workers. They arrived with a furniture lift, a common sight in Paris’s narrow streets, and used it to reach a balcony overlooking the Seine. Dressed as workers, they looked as if they belonged.

This strategy worked because we don’t see the world objectively. We see it through categories—through what we expect to see. The thieves understood the social categories that we perceive as “normal” and exploited them to avoid suspicion. Many artificial intelligence (AI) systems work in the same way and are vulnerable to the same kinds of mistakes as a result.

The sociologist Erving Goffman would describe what happened at the Louvre using his concept of the presentation of self: people “perform” social roles by adopting the cues others expect. Here, the performance of normality became the perfect camouflage.

The sociology of sight

Humans carry out mental categorization all the time to make sense of people and places. When something fits the category of “ordinary,” it slips from notice.

AI systems used for tasks such as facial recognition and detecting suspicious activity in a public area operate in a similar way. For humans, categorization is cultural. For AI, it is mathematical.

But both systems rely on learned patterns rather than objective reality. Because AI learns from data about who looks “normal” and who looks “suspicious,” it absorbs the categories embedded in its training data. And this makes it susceptible to bias.
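A toy sketch makes the point concrete: train a classifier on synthetic data in which two groups behave identically but one was historically flagged as “suspicious” three times as often, and the model reproduces that gap. (The data and model below are purely illustrative.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two groups with identical "behavior," but historical labels that flagged
# group 1 as suspicious three times as often. The trained model learns and
# reproduces that skew. Entirely synthetic data for illustration.

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)          # arbitrary group attribute: 0 or 1
behavior = rng.normal(0.0, 1.0, n)     # same behavior distribution for both groups

flag_rate = np.where(group == 1, 0.30, 0.10)   # biased historical labeling
labels = rng.random(n) < flag_rate

X = np.column_stack([group, behavior])
model = LogisticRegression().fit(X, labels)

for g in (0, 1):
    avg = model.predict_proba(X[group == g])[:, 1].mean()
    print(f"group {g}: average predicted 'suspicious' probability = {avg:.2f}")
# Output is roughly 0.10 for group 0 and 0.30 for group 1: the model has simply
# absorbed the categories embedded in its training data.
```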

The Louvre robbers weren’t seen as dangerous because they fit a trusted category. In AI, the same process can have the opposite effect: people who don’t fit the statistical norm become more visible and over-scrutinized.

It can mean a facial recognition system disproportionately flags certain racial or gendered groups as potential threats while letting others pass unnoticed.

A sociological lens helps us see that these aren’t separate issues. AI doesn’t invent its categories; it learns ours. When a computer vision system is trained on security footage where “normal” is defined by particular bodies, clothing, or behavior, it reproduces those assumptions.

Just as the museum’s guards looked past the thieves because they appeared to belong, AI can look past certain patterns while overreacting to others.

Categorization, whether human or algorithmic, is a double-edged sword. It helps us process information quickly, but it also encodes our cultural assumptions. Both people and machines rely on pattern recognition, which is an efficient but imperfect strategy.

A sociological view of AI treats algorithms as mirrors: They reflect back our social categories and hierarchies. In the Louvre case, the mirror is turned toward us. The robbers succeeded not because they were invisible, but because they were seen through the lens of normality. In AI terms, they passed the classification test.

From museum halls to machine learning

This link between perception and categorization reveals something important about our increasingly algorithmic world. Whether it’s a guard deciding who looks suspicious or an AI deciding who looks like a “shoplifter,” the underlying process is the same: assigning people to categories based on cues that feel objective but are culturally learned.

When an AI system is described as “biased,” this often means that it reflects those social categories too faithfully. The Louvre heist reminds us that these categories don’t just shape our attitudes, they shape what gets noticed at all.

After the theft, France’s culture minister promised new cameras and tighter security. But no matter how advanced those systems become, they will still rely on categorization. Someone, or something, must decide what counts as “suspicious behavior.” If that decision rests on assumptions, the same blind spots will persist.

The Louvre robbery will be remembered as one of Europe’s most spectacular museum thefts. The thieves succeeded because they mastered the sociology of appearance: They understood the categories of normality and used them as tools.

And in doing so, they showed how both people and machines can mistake conformity for safety. Their success in broad daylight wasn’t only a triumph of planning. It was a triumph of categorical thinking, the same logic that underlies both human perception and artificial intelligence.

The lesson is clear: Before we teach machines to see better, we must first learn to question how we see.

Vincent Charles, Reader in AI for Business and Management Science, Queen’s University Belfast, and Tatiana Gherman, Associate Professor of AI for Business and Strategy, University of Northampton.  This article is republished from The Conversation under a Creative Commons license. Read the original article.



Microsoft tries to head off the “novel security risks” of Windows 11 AI agents

Microsoft has been adding AI features to Windows 11 for years, but things have recently entered a new phase, with both generative and so-called “agentic” AI features working their way deeper into the bedrock of the operating system. A new build of Windows 11 released to Windows Insider Program testers yesterday includes a new “experimental agentic features” toggle in Settings to support a feature called Copilot Actions, and Microsoft has published a support article detailing just how those “experimental agentic features” will work.

If you’re not familiar, “agentic” is a buzzword that Microsoft has used repeatedly to describe its future ambitions for Windows 11—in plainer language, these agents are meant to accomplish assigned tasks in the background, allowing the user’s attention to be turned elsewhere. Microsoft says it wants agents to be capable of “everyday tasks like organizing files, scheduling meetings, or sending emails,” and that Copilot Actions should give you “an active digital collaborator that can carry out complex tasks for you to enhance efficiency and productivity.”

But like other kinds of AI, these agents can be prone to error and confabulations and will often proceed as if they know what they’re doing even when they don’t. They also present, in Microsoft’s own words, “novel security risks,” mostly related to what can happen if an attacker is able to give instructions to one of these agents. As a result, Microsoft’s implementation walks a tightrope between giving these agents access to your files and cordoning them off from the rest of the system.

Possible risks and attempted fixes

For now, these “experimental agentic features” are optional, only available in early test builds of Windows 11, and off by default. Credit: Microsoft

For example, AI agents running on a PC will be given their own user accounts separate from your personal account, ensuring that they don’t have permission to change everything on the system and giving them their own “desktop” to work with that won’t interfere with what you’re working with on your screen. Users need to approve requests for their data, and “all actions of an agent are observable and distinguishable from those taken by a user.” Microsoft also says agents need to be able to produce logs of their activities and “should provide a means to supervise their activities,” including showing users a list of actions they’ll take to accomplish a multi-step task.



Google CEO: If an AI bubble pops, no one is getting out clean

Market concerns and Google’s position

Alphabet’s recent market performance has been driven by investor confidence in the company’s ability to compete with OpenAI’s ChatGPT, as well as its development of specialized chips for AI that can compete with Nvidia’s. Nvidia recently reached a world-first $5 trillion valuation due to making GPUs that can accelerate the matrix math at the heart of AI computations.

Despite acknowledging that no company would be immune to a potential AI bubble burst, Pichai argued that Google’s unique position gives it an advantage. He told the BBC that the company owns what he called a “full stack” of technologies, from chips to YouTube data to models and frontier science research. This integrated approach, he suggested, would help the company weather any market turbulence better than competitors.

Pichai also told the BBC that people should not “blindly trust” everything AI tools output. The company currently faces repeated accuracy concerns about some of its AI models. Pichai said that while AI tools are helpful “if you want to creatively write something,” people “have to learn to use these tools for what they’re good at and not blindly trust everything they say.”

In the BBC interview, the Google boss also addressed the “immense” energy needs of AI, acknowledging that the intensive energy requirements of expanding AI ventures have caused slippage on Alphabet’s climate targets. However, Pichai insisted that the company still wants to achieve net zero by 2030 through investments in new energy technologies. “The rate at which we were hoping to make progress will be impacted,” Pichai said, warning that constraining an economy based on energy “will have consequences.”

Even with the warnings about a potential AI bubble, Pichai did not miss his chance to promote the technology, albeit with a hint of danger regarding its widespread impact. Pichai described AI as “the most profound technology” humankind has worked on.

“We will have to work through societal disruptions,” he said, adding that the technology would “create new opportunities” and “evolve and transition certain jobs.” He said people who adapt to AI tools “will do better” in their professions, whatever field they work in.



Google unveils Gemini 3 AI model and AI-first IDE called Antigravity


Google’s flagship AI model is getting its second major upgrade this year.

Google has kicked its Gemini rollout into high gear over the past year, releasing the much-improved Gemini 2.5 family and cramming various flavors of the model into Search, Gmail, and just about everything else the company makes.

Now, Google’s increasingly unavoidable AI is getting an upgrade. Gemini 3 Pro is available in a limited form today, featuring more immersive, visual outputs and fewer lies, Google says. The company also says Gemini 3 sets a new high-water mark for vibe coding, and Google is announcing a new AI-first integrated development environment (IDE) called Antigravity, which is also available today.

The first member of the Gemini 3 family

Google says the release of Gemini 3 is yet another step toward artificial general intelligence (AGI). The new version of Google’s flagship AI model has expanded simulated reasoning abilities and shows improved understanding of text, images, and video. So far, testers like it—Google’s latest LLM is once again atop the LMArena leaderboard with an Elo score of 1,501, besting Gemini 2.5 Pro by 50 points.

Gemini 3’s LMArena results. Credit: Google

Factuality has been a problem for all gen AI models, but Google says Gemini 3 is a big step in the right direction, and there are myriad benchmarks to tell the story. In the 1,000-question SimpleQA Verified test, Gemini 3 scored a record 72.1 percent. Yes, that means the state-of-the-art LLM still screws up almost 30 percent of general knowledge questions, but Google says this still shows substantial progress. On the much more difficult Humanity’s Last Exam, which tests PhD-level knowledge and reasoning, Gemini set another record, scoring 37.5 percent without tool use.

Math and coding are also a focus of Gemini 3. The model set new records in MathArena Apex (23.4 percent) and WebDev Arena (1,487 Elo). On SWE-bench Verified, which tests a model’s ability to generate code, Gemini 3 hit an impressive 76.2 percent.

Beyond those respectable but modest benchmark improvements, Gemini 3 also won’t make you cringe as much. Google says it has tamped down on sycophancy, a common problem in all these overly polite LLMs. Outputs from Gemini 3 Pro are reportedly more concise, with less of what you want to hear and more of what you need to hear.

You can also expect Gemini 3 Pro to produce noticeably richer outputs. Google claims Gemini’s expanded reasoning capabilities keep it on task more effectively, allowing it to take action on your behalf. For example, Gemini 3 can triage and take action on your emails, creating to-do lists, summaries, recommended replies, and handy buttons to trigger suggested actions. This differs from the current Gemini models, which would only create a text-based to-do list with similar prompts.

The model also has what Google calls a “generative interface,” which comes in the form of two experimental output modes called visual layout and dynamic view. The former is a magazine-style interface that includes lots of images in a scrollable UI. Dynamic view leverages Gemini’s coding abilities to create custom interfaces—for example, a web app that explores the life and work of Vincent Van Gogh.

There will also be a Deep Think mode for Gemini 3, but that’s not ready for prime time yet. Google says it’s being tested by a small group for later release, but you should expect big things. Deep Think mode manages 41 percent in Humanity’s Last Exam without tools. Believe it or not, that’s an impressive score.

Coding with vibes

Google has offered several ways of generating and modifying code with Gemini models, but the launch of Gemini 3 adds a new one: Google Antigravity. This is Google’s new agentic development platform—it’s essentially an IDE designed around agentic AI, and it’s available in preview today.

With Antigravity, Google promises that you (the human) can get more work done by letting intelligent agents do the legwork. Google says you should think of Antigravity as a “mission control” for creating and monitoring multiple development agents. The AI agents in Antigravity can operate autonomously across the editor, terminal, and browser to create and modify projects, but everything they do is relayed to the user in the form of “Artifacts.” These sub-tasks are designed to be easily verifiable so you can keep on top of what an agent is doing. Gemini will be at the core of the Antigravity experience, but it’s not just Google’s bot. Antigravity also supports Claude Sonnet 4.5 and GPT-OSS agents.

Of course, developers can still plug into the Gemini API for coding tasks. With Gemini 3, Google is adding a client-side bash tool, which lets the AI generate shell commands in its workflow. The model can access file systems and automate operations, and a server-side bash tool will help generate code in multiple languages. This feature is starting in early access, though.

AI Studio is designed to be a faster way to build something with Gemini 3. Google says Gemini 3 Pro’s strong instruction following makes it the best vibe coding model yet, allowing non-programmers to create more complex projects.

A big experiment

Google will eventually have a whole family of Gemini 3 models, but there’s just the one for now. Gemini 3 Pro is rolling out in the Gemini app, AI Studio, Vertex AI, and the API starting today as an experiment. If you want to tinker with the new model in Google’s Antigravity IDE, that’s also available for testing today on Windows, Mac, and Linux.

Gemini 3 will also launch in the Google search experience on day one. You’ll have the option to enable Gemini 3 Pro in AI Mode, where Google says it will provide more useful information about a query. The generative interface capabilities from the Gemini app will be available here as well, allowing Gemini to create tools and simulations when appropriate to answer the user’s question. Google says these generative interfaces are strongly preferred in its user testing. This feature is available today, but only for AI Pro and Ultra subscribers.

Because the Pro model is the only Gemini 3 variant available in the preview, AI Overviews isn’t getting an immediate upgrade. That will come, but for now, Overviews will only reach out to Gemini 3 Pro for especially difficult search queries—basically the kind of thing Google thinks you should have used AI Mode to do in the first place.

There’s no official timeline for releasing more Gemini 3 models or graduating the Pro variant to general availability. However, given the wide rollout of the experimental release, it probably won’t be long.


Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.



With a new company, Jeff Bezos will become a CEO again

Jeff Bezos is one of the world’s richest and most famous tech CEOs, but he hasn’t actually been a CEO of anything since 2021. That’s now changing as he takes on the role of co-CEO of a new AI company, according to a New York Times report citing three people familiar with the company.

Grandiosely named Project Prometheus (and not to be confused with the NASA project of the same name), the company will focus on using AI to pursue breakthroughs in research, engineering, manufacturing, and other fields that are dubbed part of “the physical economy”—in contrast to the software applications that are likely the first thing most people in the general public think of when they hear “AI.”

Bezos’ co-CEO will be Vik Bajaj, a chemist and physicist who previously led life sciences work at Google X, an Alphabet-backed research group that worked on speculative projects that could lead to new product categories. (For example, it developed technologies that would later underpin Google’s Waymo service.) Bajaj also worked at Verily, another Alphabet-backed research group focused on life sciences, and Foresite Labs, an incubator for new AI companies.



Oracle hit hard in Wall Street’s tech sell-off over its huge AI bet

“That is a huge liability and credit risk for Oracle. Your main customer, biggest customer by far, is a venture capital-funded start-up,” said Andrew Chang, a director at S&P Global.

OpenAI faces questions about how it plans to meet its commitments to spend $1.4 trillion on AI infrastructure over the next eight years. It has struck deals with several Big Tech groups, including Oracle’s rivals.

Of the five hyperscalers—which include Amazon, Google, Microsoft, and Meta—Oracle is the only one with negative free cash flow. Its debt-to-equity ratio has surged to 500 percent, far higher than Amazon’s 50 percent and Microsoft’s 30 percent, according to JPMorgan.

While all five companies have seen their cash-to-assets ratios decline significantly in recent years amid a boom in spending, Oracle’s is by far the lowest, JPMorgan found.

JPMorgan analysts noted a “tension between [Oracle’s] aggressive AI build-out ambitions and the limits of its investment-grade balance sheet.”

Analysts have also noted that Oracle’s data center leases are for much longer than its contracts to sell capacity to OpenAI.

Oracle has signed at least five long-term lease agreements for US data centers that will ultimately be used by OpenAI, resulting in $100 billion of off-balance-sheet lease commitments. The sites are at varying levels of construction, with some not expected to break ground until next year.

Safra Catz, Oracle’s sole chief executive from 2019 until she stepped down in September, resisted expanding its cloud business because of the vast expenses required. She was replaced by co-CEOs Clay Magouyrk and Mike Sicilia as part of the pivot by Oracle to a new era focused on AI.

Catz, who is now executive vice-chair of Oracle’s board, has exercised stock options and sold $2.5 billion of its shares this year, according to US regulatory filings. She had announced plans to exercise her stock options at the end of 2024.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.



Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules


Next stop: superintelligence

Ongoing struggles with AI model instruction-following show that true human-level AI is still a ways off.

Em dashes have become what many believe to be a telltale sign of AI-generated text over the past few years. The punctuation mark appears frequently in outputs from ChatGPT and other AI chatbots, sometimes to the point where readers believe they can identify AI writing by its overuse alone—although people can overuse it, too.

On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. “Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it’s supposed to do!” he wrote.

The post, which came two days after the release of OpenAI’s new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this “small win” raises a very big question: If the world’s most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim.


A screenshot of Sam Altman’s post about em dashes on X. Credit: X

“The fact that it’s been 3 years since ChatGPT first launched, and you’ve only just now managed to make it obey this simple requirement, says a lot about how little control you have over it, and your understanding of its inner workings,” wrote one X user in a reply. “Not a good sign for the future.”

While Altman likes to publicly talk about AGI (a hypothetical technology equivalent to humans in general learning ability), superintelligence (a nebulous concept for AI that is far beyond human intelligence), and “magic intelligence in the sky” (his term for AI cloud computing?) while raising funds for OpenAI, it’s clear that we still don’t have reliable artificial intelligence here today on Earth.

But wait, what is an em dash anyway, and why does it matter so much?

AI models love em dashes because we do

Unlike a hyphen (-), a short punctuation mark used to connect words or parts of words and one with its own dedicated key on your keyboard, an em dash is a long dash denoted by a special character (—) that writers use to set off parenthetical information, indicate a sudden change in thought, or introduce a summary or explanation.

Even before the age of AI language models, some writers frequently bemoaned the overuse of the em dash in modern writing. In a 2011 Slate article, writer Noreen Malone argued that writers used the em dash “in lieu of properly crafting sentences” and that overreliance on it “discourages truly efficient writing.” Various Reddit threads posted prior to ChatGPT’s launch featured writers either wrestling over the etiquette of proper em dash use or admitting to their frequent use as a guilty pleasure.

In 2021, one writer in the r/FanFiction subreddit wrote, “For the longest time, I’ve been addicted to Em Dashes. They find their way into every paragraph I write. I love the crisp straight line that gives me the excuse to shove details or thoughts into an otherwise orderly paragraph. Even after coming back to write after like two years of writer’s block, I immediately cram as many em dashes as I can.”

Because of the tendency for AI chatbots to overuse them, detection tools and human readers have learned to spot em dash use as a pattern, creating a problem for the small subset of writers who naturally favor the punctuation mark in their work. As a result, some journalists are complaining that AI is “killing” the em dash.

No one knows precisely why LLMs tend to overuse em dashes. We’ve seen a wide range of speculation online that attempts to explain the phenomenon, from the idea that em dashes were more popular in 19th-century books used as training data (according to a 2018 study, dash use in the English language peaked around 1860 before declining through the mid-20th century) to the suggestion that AI models picked up the habit from automatic em-dash character conversion on the blogging site Medium.

One thing we know for sure is that LLMs tend to output frequently seen patterns in their training data (fed in during the initial training process) and from a subsequent reinforcement learning process that often relies on human preferences. As a result, AI language models feed you a sort of “smoothed out” average style of whatever you ask them to provide, moderated by whatever they are conditioned to produce through user feedback.

So the most plausible explanation is still that requests for professional-style writing from an AI model trained on vast numbers of examples from the Internet will lean heavily toward the prevailing style in the training data, where em dashes appear frequently in formal writing, news articles, and editorial content. It’s also possible that during training through human feedback (called RLHF), responses with em dashes, for whatever reason, received higher ratings. Perhaps it’s because those outputs appeared more sophisticated or engaging to evaluators, but that’s just speculation.

From em dashes to AGI?

To understand what Altman’s “win” really means, and what it says about the road to AGI, we need to understand how ChatGPT’s custom instructions actually work. They allow users to set persistent preferences that apply across all conversations by appending written instructions to the prompt that is fed into the model just before the chat begins. Users can specify tone, format, and style requirements without needing to repeat those requests manually in every new chat.
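Mechanically, that amounts to little more than stitching your saved preferences into the text the model receives each turn. The sketch below is a generic illustration of the idea, not OpenAI’s actual message format or system prompt:

```python
# Persistent "custom instructions" amount to extra text placed ahead of the
# conversation the model sees each turn. The message format below is generic
# and illustrative, not OpenAI's internal prompt layout.

CUSTOM_INSTRUCTIONS = "Do not use em dashes. Prefer short sentences."

def build_prompt(chat_history: list[dict], user_message: str) -> list[dict]:
    return (
        [{"role": "system", "content": f"User preferences: {CUSTOM_INSTRUCTIONS}"}]
        + chat_history
        + [{"role": "user", "content": user_message}]
    )

prompt = build_prompt([], "Summarize this article in two sentences.")
for message in prompt:
    print(f"{message['role']}: {message['content']}")
# The preference arrives as ordinary prompt text; nothing in the model enforces
# it, which is why compliance is probabilistic rather than guaranteed.
```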

However, the feature has not always worked reliably because LLMs do not work reliably (even OpenAI and Anthropic freely admit this). An LLM takes an input and produces an output, spitting out a statistically plausible continuation of a prompt (a system prompt, the custom instructions, and your chat history), and it doesn’t really “understand” what you are asking. With AI language model outputs, there is always some luck involved in getting them to do what you want.

In our informal testing of GPT-5.1 with custom instructions, ChatGPT did appear to follow our request not to produce em dashes. But despite Altman’s claim, the response from X users appears to show that experiences with the feature continue to vary, at least when the request is not placed in custom instructions.

So if LLMs are statistical text-generation boxes, what does “instruction following” even mean? That’s key to unpacking the hypothetical path from LLMs to AGI. The concept of following instructions for an LLM is fundamentally different from how we typically think about following instructions as humans with general intelligence, or even a traditional computer program.

In traditional computing, instruction following is deterministic. You tell a program “don’t include character X,” and it won’t include that character. The program executes rules exactly as written. With LLMs, “instruction following” is really about shifting statistical probabilities. When you tell ChatGPT “don’t use em dashes,” you’re not creating a hard rule. You’re adding text to the prompt that makes tokens associated with em dashes less likely to be selected during the generation process. But “less likely” isn’t “impossible.”

Every token the model generates is selected from a probability distribution. Your custom instruction influences that distribution, but it’s competing with the model’s training data (where em dashes appeared frequently in certain contexts) and everything else in the prompt. Unlike code with conditional logic, there’s no separate system verifying outputs against your requirements. The instruction is just more text that influences the statistical prediction process.
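A small numerical sketch (made-up numbers, numpy only) shows why “less likely” is not “impossible”: lowering a token’s score shrinks its share of the probability distribution without driving it to zero.

```python
import numpy as np

# Lowering a token's score (logit) makes it less likely, not impossible.
# The numbers here are invented purely to illustrate the mechanism.

rng = np.random.default_rng(42)
tokens = ["—", ",", ";", "("]
logits = np.array([2.0, 1.5, 0.5, 0.0])                  # em dash favored by default
logits_after = logits - np.array([3.0, 0.0, 0.0, 0.0])   # instruction pushes it down

def sampled_shares(scores, n=10_000):
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    draws = rng.choice(len(tokens), size=n, p=probs)
    return {token: round(float(np.mean(draws == i)), 3) for i, token in enumerate(tokens)}

print("before:", sampled_shares(logits))
print("after: ", sampled_shares(logits_after))
# "after" still shows a small but nonzero share of em dashes: suppressed, not forbidden.
```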

When Altman celebrates finally getting GPT to avoid em dashes, he’s really celebrating that OpenAI has tuned the latest version of GPT-5.1 (probably through reinforcement learning or fine-tuning) to weight custom instructions more heavily in its probability calculations.

There’s an irony about control here: Given the probabilistic nature of these models, there’s no guarantee the fix will stay fixed. OpenAI continuously updates its models behind the scenes, even within the same version number, adjusting outputs based on user feedback and new training runs. Each update arrives with different output characteristics that can undo previous behavioral tuning, a phenomenon researchers call the “alignment tax.”

Precisely tuning a neural network’s behavior is not yet an exact science. Since all concepts encoded in the network are interconnected by values called weights, adjusting one behavior can alter others in unintended ways. Fix em dash overuse today, and tomorrow’s update (aimed at improving, say, coding capabilities) might inadvertently bring them back, not because OpenAI wants them there, but because that’s the nature of trying to steer a statistical system with millions of competing influences.

This gets to an implied question we mentioned earlier. If controlling punctuation use is still a struggle that might pop back up at any time, how far are we from AGI? We can’t know for sure, but it seems increasingly likely that it won’t emerge from a large language model alone. That’s because AGI, a technology that would replicate human general learning ability, would likely require true understanding and self-reflective intentional action, not statistical pattern matching that sometimes aligns with instructions if you happen to get lucky.

And speaking of getting lucky, some users still aren’t having luck controlling em dash use outside of the “custom instructions” feature. Upon being told within a chat not to use em dashes, ChatGPT updated a saved memory and replied to one X user, “Got it—I’ll stick strictly to short hyphens from now on.”


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.



Researchers question Anthropic claim that AI-assisted attack was 90% autonomous

Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor’s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.

How (Anthropic says) the attack unfolded

Anthropic said GTG-1002 developed an autonomous attack framework that used Claude as an orchestration mechanism that largely eliminated the need for human involvement. This orchestration system broke complex multi-stage attacks into smaller technical tasks such as vulnerability scanning, credential validation, data extraction, and lateral movement.

“The architecture incorporated Claude’s technical capabilities as an execution engine within a larger automated system, where the AI performed specific technical actions based on the human operators’ instructions while the orchestration logic maintained attack state, managed phase transitions, and aggregated results across multiple sessions,” Anthropic said. “This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement, as the framework autonomously progressed through reconnaissance, initial access, persistence, and data exfiltration phases by sequencing Claude’s responses and adapting subsequent requests based on discovered information.”

The attacks followed a five-phase structure in which AI autonomy increased with each phase.

The life cycle of the cyberattack, showing the move from human-led targeting to largely AI-driven attacks using various tools, often via the Model Context Protocol (MCP). At various points during the attack, the AI returns to its human operator for review and further direction. Credit: Anthropic

The attackers were able to bypass Claude guardrails in part by breaking tasks into small steps that, in isolation, the AI tool didn’t interpret as malicious. In other cases, the attackers couched their inquiries in the context of security professionals trying to use Claude to improve defenses.

As noted last week, AI-developed malware has a long way to go before it poses a real-world threat. There’s no reason to doubt that AI-assisted cyberattacks may one day become more potent. But the data so far indicates that threat actors—like most others using AI—are seeing mixed results that aren’t nearly as impressive as those in the AI industry claim.



Google is rolling out conversational shopping—and ads—in AI Mode search

In recent months, Google has promised to inject generative AI into the online shopping experience, and now it’s following through. The previously announced shopping features of AI Mode search are rolling out, and Gemini will also worm its way into Google’s forgotten Duplex automated phone call tech. It’s all coming in time for the holidays to allegedly make your gifting more convenient and also conveniently ensure that Google gets a piece of the action.

At Google I/O in May, the company announced its intention to bring conversational shopping to AI Mode. According to Google, its enormous “Shopping Graph” of retailer data means its AI is uniquely positioned to deliver useful suggestions. In the coming weeks, users in the US will be able to ask AI Mode complex questions about what to buy, and it will deliver suggestions, guides, tables, and other generated content to help you decide. And since this is gen AI, it comes with the usual disclaimers about possible mistakes.

AI Mode shopping features.

You’re probably wondering where you’ll see sponsored shopping content in these experiences. Google says some of the content that appears in AI Mode will be ads, just like if you look up shopping results in a traditional search. Shopping features are also coming to the Gemini app, but Google says it won’t have sponsored content in the results for the time being.

Google is also releasing a feature called “agentic checkout,” a term used only in passing when the company announced the feature alongside AI Mode shopping at I/O. Google is really leaning into the agentic angle now, though. The gist is you can set a price threshold for a product in search, and Google will let you know if the item reaches that price. That part isn’t new, but there’s now an AI twist. After getting the alert, you can authorize an automatic purchase with Google Pay. However, it’s currently only supported at a handful of retailers like Chewy, Wayfair, and some Shopify merchants. It’s not clear whether this qualifies as agentic anything, but it might save you some money regardless.
