generative ai

disney-invests-$1-billion-in-openai,-licenses-200-characters-for-ai-video-app-sora

Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora

An AI-generated version of OpenAI CEO Sam Altman, seen in a still capture from a video generated by Sora 2.

An AI-generated version of OpenAI CEO Sam Altman seen in a still capture from a video generated by Sora 2. Credit: OpenAI

Under the new agreement with Disney, Sora users will be able to generate short videos using characters such as Mickey Mouse, Darth Vader, Iron Man, Simba, and characters from franchises including Frozen, Inside Out, Toy Story, and The Mandalorian, along with costumes, props, vehicles, and environments.

The ChatGPT image generator will also gain official access to the same intellectual property, although that information was trained into these AI models long ago. What’s changing is that OpenAI will allow Disney-related content generated by its AI models to officially pass through its content moderation filters and reach the user, sanctioned by Disney.

On Disney’s end of the deal, the company plans to deploy ChatGPT for its employees and use OpenAI’s technology to build new features for Disney+. A curated selection of fan-made Sora videos will stream on the Disney+ platform starting in early 2026.

The agreement does not include any talent likenesses or voices. Disney and OpenAI said they have committed to “maintaining robust controls to prevent the generation of illegal or harmful content” and to “respect the rights of individuals to appropriately control the use of their voice and likeness.”

OpenAI CEO Sam Altman called the deal a model for collaboration between AI companies and studios. “This agreement shows how AI companies and creative leaders can work together responsibly to promote innovation that benefits society, respect the importance of creativity, and help works reach vast new audiences,” Altman said.

From adversary to partner

Money opens all kinds of doors, and the new partnership represents a dramatic reversal in Disney’s approach to OpenAI from just a few months ago. At that time, Disney and other major studios refused to participate in Sora 2 following its launch on September 30.

Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora Read More »

the-npu-in-your-phone-keeps-improving—why-isn’t-that-making-ai-better?

The NPU in your phone keeps improving—why isn’t that making AI better?


Shrinking AI for your phone is no simple matter.

npu phone

The NPU in your phone might not be doing very much. Credit: Aurich Lawson | Getty Images

The NPU in your phone might not be doing very much. Credit: Aurich Lawson | Getty Images

Almost every technological innovation of the past several years has been laser-focused on one thing: generative AI. Many of these supposedly revolutionary systems run on big, expensive servers in a data center somewhere, but at the same time, chipmakers are crowing about the power of the neural processing units (NPU) they have brought to consumer devices. Every few months, it’s the same thing: This new NPU is 30 or 40 percent faster than the last one. That’s supposed to let you do something important, but no one really gets around to explaining what that is.

Experts envision a future of secure, personal AI tools with on-device intelligence, but does that match the reality of the AI boom? AI on the “edge” sounds great, but almost every AI tool of consequence is running in the cloud. So what’s that chip in your phone even doing?

What is an NPU?

Companies launching a new product often get bogged down in superlatives and vague marketing speak, so they do a poor job of explaining technical details. It’s not clear to most people buying a phone why they need the hardware to run AI workloads, and the supposed benefits are largely theoretical.

Many of today’s flagship consumer processors are systems-on-a-chip (SoC) because they incorporate multiple computing elements—like CPU cores, GPUs, and imaging controllers—on a single piece of silicon. This is true of mobile parts like Qualcomm’s Snapdragon or Google’s Tensor, as well as PC components like the Intel Core Ultra.

The NPU is a newer addition to chips, but it didn’t just appear one day—there’s a lineage that brought us here. NPUs are good at what they do because they emphasize parallel computing, something that’s also important in other SoC components.

Qualcomm devotes significant time during its new product unveilings to talk about its Hexagon NPUs. Keen observers may recall that this branding has been reused from the company’s line of digital signal processors (DSPs), and there’s a good reason for that.

“Our journey into AI processing started probably 15 or 20 years ago, wherein our first anchor point was looking at signal processing,” said Vinesh Sukumar, Qualcomm’s head of AI products. DSPs have a similar architecture compared to NPUs, but they’re much simpler, with a focus on processing audio (e.g., speech recognition) and modem signals.

Qualcomm chip design NPU

The NPU is one of multiple components in modern SoCs.

Credit: Qualcomm

The NPU is one of multiple components in modern SoCs. Credit: Qualcomm

As the collection of technologies we refer to as “artificial intelligence” developed, engineers began using DSPs for more types of parallel processing, like long short-term memory (LSTM). Sukumar explained that as the industry became enamored with convolutional neural networks (CNNs), the technology underlying applications like computer vision, DSPs became focused on matrix functions, which are essential to generative AI processing as well.

While there is an architectural lineage here, it’s not quite right to say NPUs are just fancy DSPs. “If you talk about DSPs in the general term of the word, yes, [an NPU] is a digital signal processor,” said MediaTek Assistant Vice President Mark Odani. “But it’s all come a long way and it’s a lot more optimized for parallelism, how the transformers work, and holding huge numbers of parameters for processing.”

Despite being so prominent in new chips, NPUs are not strictly necessary for running AI workloads on the “edge,” a term that differentiates local AI processing from cloud-based systems. CPUs are slower than NPUs but can handle some light workloads without using as much power. Meanwhile, GPUs can often chew through more data than an NPU, but they use more power to do it. And there are times you may want to do that, according to Qualcomm’s Sukumar. For example, running AI workloads while a game is running could favor the GPU.

“Here, your measurement of success is that you cannot drop your frame rate while maintaining the spatial resolution, the dynamic range of the pixel, and also being able to provide AI recommendations for the player within that space,” says Sukumar. “In this kind of use case, it actually makes sense to run that in the graphics engine, because then you don’t have to keep shifting between the graphics and a domain-specific AI engine like an NPU.”

Livin’ on the edge is hard

Unfortunately, the NPUs in many devices sit idle (and not just during gaming). The mix of local versus cloud AI tools favors the latter because that’s the natural habitat of LLMs. AI models are trained and fine-tuned on powerful servers, and that’s where they run best.

A server-based AI, like the full-fat versions of Gemini and ChatGPT, is not resource-constrained like a model running on your phone’s NPU. Consider the latest version of Google’s on-device Gemini Nano model, which has a context window of 32k tokens. That is a more than 2x improvement over the last version. However, the cloud-based Gemini models have context windows of up to 1 million tokens, meaning they can process much larger volumes of data.

Both cloud-based and edge AI hardware will continue getting better, but the balance may not shift in the NPU’s favor. “The cloud will always have more compute resources versus a mobile device,” said Google’s Shenaz Zack, senior product manager on the Pixel team.

“If you want the most accurate models or the most brute force models, that all has to be done in the cloud,” Odani said. “But what we’re finding is that, in a lot of the use cases where there’s just summarizing some text or you’re talking to your voice assistant, a lot of those things can fit within three billion parameters.”

Squeezing AI models onto a phone or laptop involves some compromise—for example, by reducing the parameters included in the model. Odani explained that cloud-based models run hundreds of billions of parameters, the weighting that determines how a model processes input tokens to generate outputs. You can’t run anything like that on a consumer device right now, so developers have to vastly scale back the size of models for the edge. Odani says MediaTek’s latest ninth-generation NPU can handle about 3 billion parameters—a difference of several orders of magnitude.

The amount of memory available in a phone or laptop is also a limiting factor, so mobile-optimized AI models are usually quantized. That means the model’s estimation of the next token runs with less precision. Let’s say you want to run one of the larger open models, like Llama or Gemma 7b, on your device. The de facto standard is FP16, known as half-precision. At that level, a model with 7 billion parameters will lock up 13 or 14 gigabytes of memory. Stepping down to FP4 (quarter-precision) brings the size of the model in memory to a few gigs.

“When you compress to, let’s say, between three and four gigabytes, it’s a sweet spot for integration into memory constrained form factors like a smartphone,” Sukumar said. “And there’s been a lot of investment in the ecosystem and at Qualcomm to look at various ways of compressing the models without losing quality.”

It’s difficult to create a generalized AI with these limitations for mobile devices, but computers—and especially smartphones—are a wellspring of data that can be pumped into models to generate supposedly helpful outputs. That’s why most edge AI is geared toward specific, narrow use cases, like analyzing screenshots or suggesting calendar appointments. Google says its latest Pixel phones run more than 100 AI models, both generative and traditional.

Even AI skeptics can recognize that the landscape is changing quickly. In the time it takes to shrink and optimize AI models for a phone or laptop, new cloud models may appear that make that work obsolete. This is also why third-party developers have been slow to utilize NPU processing in apps. They either have to plug into an existing on-device model, which involves restrictions and rapidly moving development targets, or deploy their own custom models. Neither is a great option currently.

A matter of trust

If the cloud is faster and easier, why go to the trouble of optimizing for the edge and burning more power with an NPU? Leaning on the cloud means accepting a level of dependence and trust in the people operating AI data centers that may not always be appropriate.

“We always start off with user privacy as an element,” said Qualcomm’s Sukumar. He explained that the best inference is not general in nature—it’s personalized based on the user’s interests and what’s happening in their lives. Fine-tuning models to deliver that experience calls for personal data, and it’s safer to store and process that data locally.

Even when companies say the right things about privacy in their cloud services, they’re far from guarantees. The helpful, friendly vibe of general chatbots also encourages people to divulge a lot of personal information, and if that assistant is running in the cloud, your data is there as well. OpenAI’s copyright fight with The New York Times could lead to millions of private chats being handed over to the publisher. The explosive growth and uncertain regulatory framework of gen AI make it hard to know what’s going to happen to your data.

“People are using a lot of these generative AI assistants like a therapist,” Odani said. “And you don’t know one day if all this stuff is going to come out on the Internet.”

Not everyone is so concerned. Zack claims Google has built “the world’s most secure cloud infrastructure,” allowing it to process data where it delivers the best results. Zack uses Video Boost and Pixel Studio as examples of this approach, noting that Google’s cloud is the only way to make these experiences fast and high-quality. The company recently announced its new Private AI Compute system, which it claims is just as safe as local AI.

Even if that’s true, the edge has other advantages—edge AI is just more reliable than a cloud service. “On-device is fast,” Odani said. “Sometimes I’m talking to ChatGPT and my Wi-Fi goes out or whatever, and it skips a beat.”

The services hosting cloud-based AI models aren’t just a single website—the Internet of today is massively interdependent, with content delivery networks, DNS providers, hosting, and other services that could degrade or shut down your favorite AI in the event of a glitch. When Cloudflare suffered a self-inflicted outage recently, ChatGPT users were annoyed to find their trusty chatbot was unavailable. Local AI features don’t have that drawback.

Cloud dominance

Everyone seems to agree that a hybrid approach is necessary to deliver truly useful AI features (assuming those exist), sending data to more powerful cloud services when necessary—Google, Apple, and every other phone maker does this. But the pursuit of a seamless experience can also obscure what’s happening with your data. More often than not, the AI features on your phone aren’t running in a secure, local way, even when the device has the hardware to do that.

Take, for example, the new OnePlus 15. This phone has Qualcomm’s brand-new Snapdragon 8 Elite Gen 5, which has an NPU that is 37 percent faster than the last one, for whatever that’s worth. Even with all that on-device AI might, OnePlus is heavily reliant on the cloud to analyze your personal data. Features like AI Writer and the AI Recorder connect to the company’s servers for processing, a system OnePlus assures us is totally safe and private.

Similarly, Motorola released a new line of foldable Razr phones over the summer that are loaded with AI features from multiple providers. These phones can summarize your notifications using AI, but you might be surprised how much of it happens in the cloud unless you read the terms and conditions. If you buy the Razr Ultra, that summarization happens on your phone. However, the cheaper models with less RAM and NPU power use cloud services to process your notifications. Again, Motorola says this system is secure, but a more secure option would have been to re-optimize the model for its cheaper phones.

Even when an OEM focuses on using the NPU hardware, the results can be lacking. Look at Google’s Daily Hub and Samsung’s Now Brief. These features are supposed to chew through all the data on your phone and generate useful recommendations and actions, but they rarely do anything aside from showing calendar events. In fact, Google has temporarily removed Daily Hub from Pixels because the feature did so little, and Google is a pioneer in local AI with Gemini Nano. Google has actually moved some parts of its mobile AI experience from local to cloud-based processing in recent months.

Those “brute force” models appear to be winning, and it doesn’t hurt that companies also get more data when you interact with their private computing cloud services.

Maybe take what you can get?

There’s plenty of interest in local AI, but so far, that hasn’t translated to an AI revolution in your pocket. Most of the AI advances we’ve seen so far depend on the ever-increasing scale of cloud systems and the generalized models that run there. Industry experts say that extensive work is happening behind the scenes to shrink AI models to work on phones and laptops, but it will take time for that to make an impact.

In the meantime, local AI processing is out there in a limited way. Google still makes use of the Tensor NPU to handle sensitive data for features like Magic Cue, and Samsung really makes the most of Qualcomm’s AI-focused chipsets. While Now Brief is of questionable utility, Samsung is cognizant of how reliance on the cloud may impact users, offering a toggle in the system settings that restricts AI processing to run only on the device. This limits the number of available AI features, and others don’t work as well, but you’ll know none of your personal data is being shared. No one else offers this option on a smartphone.

Galaxy AI toggle

Samsung offers an easy toggle to disable cloud AI and run all workloads on-device.

Credit: Ryan Whitwam

Samsung offers an easy toggle to disable cloud AI and run all workloads on-device. Credit: Ryan Whitwam

Samsung spokesperson Elise Sembach said the company’s AI efforts are grounded in enhancing experiences while maintaining user control. “The on-device processing toggle in One UI reflects this approach. It gives users the option to process AI tasks locally for faster performance, added privacy, and reliability even without a network connection,” Sembach said.

Interest in edge AI might be a good thing even if you don’t use it. Planning for this AI-rich future can encourage device makers to invest in better hardware—like more memory to run all those theoretical AI models.

“We definitely recommend our partners increase their RAM capacity,” said Sukumar. Indeed, Google, Samsung, and others have boosted memory capacity in large part to support on-device AI. Even if the cloud is winning, we’ll take the extra RAM.

Photo of Ryan Whitwam

Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

The NPU in your phone keeps improving—why isn’t that making AI better? Read More »

microsoft-drops-ai-sales-targets-in-half-after-salespeople-miss-their-quotas

Microsoft drops AI sales targets in half after salespeople miss their quotas

Microsoft has lowered sales growth targets for its AI agent products after many salespeople missed their quotas in the fiscal year ending in June, according to a report Wednesday from The Information. The adjustment is reportedly unusual for Microsoft, and it comes after the company missed a number of ambitious sales goals for its AI offerings.

AI agents are specialized implementations of AI language models designed to perform multistep tasks autonomously rather than simply responding to single prompts. So-called “agentic” features have been central to Microsoft’s 2025 sales pitch: At its Build conference in May, the company declared that it has entered “the era of AI agents.”

The company has promised customers that agents could automate complex tasks, such as generating dashboards from sales data or writing customer reports. At its Ignite conference in November, Microsoft announced new features like Word, Excel, and PowerPoint agents in Microsoft 365 Copilot, along with tools for building and deploying agents through Azure AI Foundry and Copilot Studio. But as the year draws to a close, that promise has proven harder to deliver than the company expected.

According to The Information, one US Azure sales unit set quotas for salespeople to increase customer spending on a product called Foundry, which helps customers develop AI applications, by 50 percent. Less than a fifth of salespeople in that unit met their Foundry sales growth targets. In July, Microsoft lowered those targets to roughly 25 percent growth for the current fiscal year. In another US Azure unit, most salespeople failed to meet an earlier quota to double Foundry sales, and Microsoft cut their quotas to 50 percent for the current fiscal year.

Microsoft drops AI sales targets in half after salespeople miss their quotas Read More »

prime-video-pulls-eerily-emotionless-ai-generated-anime-dubs-after-complaints

Prime Video pulls eerily emotionless AI-generated anime dubs after complaints

[S]o many talented voice actors, and you can’t even bother to hire a couple to dub a season of a show??????????? absolutely disrespectful.

Naturally, anime voice actors took offense, too. Damian Mills, for instance, said via X that voicing a “notable queer-coded character like Kaworu” in three Evangelion movie dubs for Prime Video (in 2007, 2009, and 2012) “meant a lot, especially being queer myself.”

Mills, who also does voice acting for other anime, including One Piece (Tanaka) and Dragon Ball Super (Frieza) added, “… using AI to replace dub actors on #BananaFish? It’s insulting and I can’t support this. It’s insane to me. What’s worse is Banana Fish is an older property, so there was no urgency to get a dub created.”

Amazon also seems to have rethought its March statement announcing that it would use AI to dub content “that would not have been dubbed otherwise.” For example, in 2017, Sentai Filmworks released an English dub of No Game, No Life: Zero with human voice actors.

Some dubs pulled

On Tuesday, Gizmodo reported that “several of the English language AI dubs for anime such as Banana Fish, No Game No Life: Zero, and more have now been removed.” However, some AI-generated dubs remain as of this writing, including an English dub for the anime series Pet and a Spanish one for Banana Fish, Ars Technica has confirmed.

Amazon hasn’t commented on the AI-generated dubs or why it took some of them down.

All of this comes despite Amazon’s March announcement that the AI-generated dubs would use “human expertise” for “quality control.”

The sloppy dubbing of cherished anime titles reflects a lack of precision in the broader industry as companies seek to leverage generative AI to save time and money. Prime Video has already been criticized for using AI-generated movie summaries and posters this year. And this summer, anime streaming service Crunchyroll blamed bad AI-generated subtitles on an agreement “violation” by a “third-party vendor.”

Prime Video pulls eerily emotionless AI-generated anime dubs after complaints Read More »

science-centric-streaming-service-curiosity-stream-is-an-ai-licensing-firm-now

Science-centric streaming service Curiosity Stream is an AI-licensing firm now

We all know streaming services’ usual tricks for making more money: get more subscribers, charge those subscribers more money, and sell ads. But science streaming service Curiosity Stream is taking a new route that could reshape how streaming companies, especially niche options, try to survive.

Discovery Channel founder John Hendricks launched Curiosity Stream in 2015. The streaming service costs $40 per year, and it doesn’t have commercials.

The streaming business has grown to also include the Curiosity Channel TV channel. CuriosityStream Inc. also makes money through original programming and its Curiosity University educational programming. The firm turned its first positive net income in its fiscal Q1 2025, after about a decade of business.

With its focus on science, history, research, and education, Curiosity Stream will always be a smaller player compared to other streaming services. As of March 2023, Curiosity Stream had 23 million subscribers, a paltry user base compared to Netflix’s 301.6 million (as of January 2025).

Still, in an extremely competitive market, Curiosity Stream’s revenue increased 41 percent year over year in its Q3 2025 earnings announced this month. This was largely due to the licensing of Curiosity Stream’s original programming to train large language models (LLMs).

“Looking at our year-to-date numbers, licensing generated $23.4 million through September, which … is already over half of what our subscription business generated for all of 2024,” Phillip Hayden, Curiosity Stream’s CFO, said during a call with investors this month.

Thus far, Curiosity Stream has completed 18 AI-related fulfillments “across video, audio, and code assets” with nine partners, an October announcement said.

The company expects to make more revenue from IP licensing deals with AI companies than it does from subscriptions by 2027, “possibly earlier,” CEO Clint Stinchcomb said during the earnings call.

Science-centric streaming service Curiosity Stream is an AI-licensing firm now Read More »

google-tells-employees-it-must-double-capacity-every-6-months-to-meet-ai-demand

Google tells employees it must double capacity every 6 months to meet AI demand

While AI bubble talk fills the air these days, with fears of overinvestment that could pop at any time, something of a contradiction is brewing on the ground: Companies like Google and OpenAI can barely build infrastructure fast enough to fill their AI needs.

During an all-hands meeting earlier this month, Google’s AI infrastructure head Amin Vahdat told employees that the company must double its serving capacity every six months to meet demand for artificial intelligence services, reports CNBC. The comments show a rare look at what Google executives are telling its own employees internally. Vahdat, a vice president at Google Cloud, presented slides to its employees showing the company needs to scale “the next 1000x in 4-5 years.”

While a thousandfold increase in compute capacity sounds ambitious by itself, Vahdat noted some key constraints: Google needs to be able to deliver this increase in capability, compute, and storage networking “for essentially the same cost and increasingly, the same power, the same energy level,” he told employees during the meeting. “It won’t be easy but through collaboration and co-design, we’re going to get there.”

It’s unclear how much of this “demand” Google mentioned represents organic user interest in AI capabilities versus the company integrating AI features into existing services like Search, Gmail, and Workspace. But whether users are using the features voluntarily or not, Google isn’t the only tech company struggling to keep up with a growing user base of customers using AI services.

Major tech companies are in a race to build out data centers. Google competitor OpenAI is planning to build six massive data centers across the US through its Stargate partnership project with SoftBank and Oracle, committing over $400 billion in the next three years to reach nearly 7 gigawatts of capacity. The company faces similar constraints serving its 800 million weekly ChatGPT users, with even paid subscribers regularly hitting usage limits for features like video synthesis and simulated reasoning models.

“The competition in AI infrastructure is the most critical and also the most expensive part of the AI race,” Vahdat said at the meeting, according to CNBC’s viewing of the presentation. The infrastructure executive explained that Google’s challenge goes beyond simply outspending competitors. “We’re going to spend a lot,” he said, but noted the real objective is building infrastructure that is “more reliable, more performant and more scalable than what’s available anywhere else.”

Google tells employees it must double capacity every 6 months to meet AI demand Read More »

google-unveils-gemini-3-ai-model-and-ai-first-ide-called-antigravity

Google unveils Gemini 3 AI model and AI-first IDE called Antigravity


Google’s flagship AI model is getting its second major upgrade this year.

Google has kicked its Gemini rollout into high gear over the past year, releasing the much-improved Gemini 2.5 family and cramming various flavors of the model into Search, Gmail, and just about everything else the company makes.

Now, Google’s increasingly unavoidable AI is getting an upgrade. Gemini 3 Pro is available in a limited form today, featuring more immersive, visual outputs and fewer lies, Google says. The company also says Gemini 3 sets a new high-water mark for vibe coding, and Google is announcing a new AI-first integrated development environment (IDE) called Antigravity, which is also available today.

The first member of the Gemini 3 family

Google says the release of Gemini 3 is yet another step toward artificial general intelligence (AGI). The new version of Google’s flagship AI model has expanded simulated reasoning abilities and shows improved understanding of text, images, and video. So far, testers like it—Google’s latest LLM is once again atop the LMArena leaderboard with an ELO score of 1,501, besting Gemini 2.5 Pro by 50 points.

Gemini 3 LMArena

Credit: Google

Factuality has been a problem for all gen AI models, but Google says Gemini 3 is a big step in the right direction, and there are myriad benchmarks to tell the story. In the 1,000-question SimpleQA Verified test, Gemini 3 scored a record 72.1 percent. Yes, that means the state-of-the-art LLM still screws up almost 30 percent of general knowledge questions, but Google says this still shows substantial progress. On the much more difficult Humanity’s Last Exam, which tests PhD-level knowledge and reasoning, Gemini set another record, scoring 37.5 percent without tool use.

Math and coding are also a focus of Gemini 3. The model set new records in MathArena Apex (23.4 percent) and WebDev Arena (1487 ELO). In the SWE-bench Verified, which tests a model’s ability to generate code, Gemini 3 hit an impressive 76.2 percent.

So there are plenty of respectable but modest benchmark improvements, but Gemini 3 also won’t make you cringe as much. Google says it has tamped down on sycophancy, a common problem in all these overly polite LLMs. Outputs from Gemini 3 Pro are reportedly more concise, with less of what you want to hear and more of what you need to hear.

You can also expect Gemini 3 Pro to produce noticeably richer outputs. Google claims Gemini’s expanded reasoning capabilities keep it on task more effectively, allowing it to take action on your behalf. For example, Gemini 3 can triage and take action on your emails, creating to-do lists, summaries, recommended replies, and handy buttons to trigger suggested actions. This differs from the current Gemini models, which would only create a text-based to-do list with similar prompts.

The model also has what Google calls a “generative interface,” which comes in the form of two experimental output modes called visual layout and dynamic view. The former is a magazine-style interface that includes lots of images in a scrollable UI. Dynamic view leverages Gemini’s coding abilities to create custom interfaces—for example, a web app that explores the life and work of Vincent Van Gogh.

There will also be a Deep Think mode for Gemini 3, but that’s not ready for prime time yet. Google says it’s being tested by a small group for later release, but you should expect big things. Deep Think mode manages 41 percent in Humanity’s Last Exam without tools. Believe it or not, that’s an impressive score.

Coding with vibes

Google has offered several ways of generating and modifying code with Gemini models, but the launch of Gemini 3 adds a new one: Google Antigravity. This is Google’s new agentic development platform—it’s essentially an IDE designed around agentic AI, and it’s available in preview today.

With Antigravity, Google promises that you (the human) can get more work done by letting intelligent agents do the legwork. Google says you should think of Antigravity as a “mission control” for creating and monitoring multiple development agents. The AI in Antigravity can operate autonomously across the editor, terminal, and browser to create and modify projects, but everything they do is relayed to the user in the form of “Artifacts.” These sub-tasks are designed to be easily verifiable so you can keep on top of what the agent is doing. Gemini will be at the core of the Antigravity experience, but it’s not just Google’s bot. Antigravity also supports Claude Sonnet 4.5 and GPT-OSS agents.

Of course, developers can still plug into the Gemini API for coding tasks. With Gemini 3, Google is adding a client-side bash tool, which lets the AI generate shell commands in its workflow. The model can access file systems and automate operations, and a server-side bash tool will help generate code in multiple languages. This feature is starting in early access, though.

AI Studio is designed to be a faster way to build something with Gemini 3. Google says Gemini 3 Pro’s strong instruction following makes it the best vibe coding model yet, allowing non-programmers to create more complex projects.

A big experiment

Google will eventually have a whole family of Gemini 3 models, but there’s just the one for now. Gemini 3 Pro is rolling out in the Gemini app, AI Studio, Vertex AI, and the API starting today as an experiment. If you want to tinker with the new model in Google’s Antigravity IDE, that’s also available for testing today on Windows, Mac, and Linux.

Gemini 3 will also launch in the Google search experience on day one. You’ll have the option to enable Gemini 3 Pro in AI Mode, where Google says it will provide more useful information about a query. The generative interface capabilities from the Gemini app will be available here as well, allowing Gemini to create tools and simulations when appropriate to answer the user’s question. Google says these generative interfaces are strongly preferred in its user testing. This feature is available today, but only for AI Pro and Ultra subscribers.

Because the Pro model is the only Gemini 3 variant available in the preview, AI Overviews isn’t getting an immediate upgrade. That will come, but for now, Overviews will only reach out to Gemini 3 Pro for especially difficult search queries—basically the kind of thing Google thinks you should have used AI Mode to do in the first place.

There’s no official timeline for releasing more Gemini 3 models or graduating the Pro variant to general availability. However, given the wide rollout of the experimental release, it probably won’t be long.

Photo of Ryan Whitwam

Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

Google unveils Gemini 3 AI model and AI-first IDE called Antigravity Read More »

openai-walks-a-tricky-tightrope-with-gpt-5.1’s-eight-new-personalities

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

OpenAI claims that both models perform better on technical benchmarks such as math and coding evaluations (including AIME 2025 and Codeforces) than GPT-5, which was released in August.

Improved benchmarks may win over some users, but the biggest change with GPT-5.1 is in its presentation. OpenAI says it heard from users that they wanted AI models to simulate different communication styles depending on the task, so the company is offering eight preset options, including Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy, alongside a Default setting.

These presets alter the instructions fed into each prompt to simulate different personality styles, but the underlying model capabilities remain the same across all settings.

An illustration showing GPT-5.1's eight personality styles in ChatGPT.

An illustration showing GPT-5.1’s eight personality styles in ChatGPT. Credit: OpenAI

In addition, the company trained GPT-5.1 Instant to use “adaptive reasoning,” meaning that the model decides when to spend more computational time processing a prompt before generating output.

The company plans to roll out the models gradually over the next few days, starting with paid subscribers before expanding to free users. OpenAI plans to bring both GPT-5.1 Instant and GPT-5.1 Thinking to its API later this week. GPT-5.1 Instant will appear as gpt-5.1-chat-latest, and GPT-5.1 Thinking will be released as GPT-5.1 in the API, both with adaptive reasoning enabled. The older GPT-5 models will remain available in ChatGPT under the legacy models dropdown for paid subscribers for three months.

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities Read More »

researchers-surprised-that-with-ai,-toxicity-is-harder-to-fake-than-intelligence

Researchers surprised that with AI, toxicity is harder to fake than intelligence

The next time you encounter an unusually polite reply on social media, you might want to check twice. It could be an AI model trying (and failing) to blend in with the crowd.

On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.

The study introduces what the authors call a “computational Turing test” to assess how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.

“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote. The team, led by Nicolò Pagan at the University of Zurich, tested various optimization strategies, from simple prompting to fine-tuning, but found that deeper emotional cues persist as reliable tells that a particular text interaction online was authored by an AI chatbot rather than a human.

The toxicity tell

In the study, researchers tested nine large language models: Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509.

When prompted to generate replies to real social media posts from actual users, the AI models struggled to match the level of casual negativity and spontaneous emotional expression common in human social media posts, with toxicity scores consistently lower than authentic human replies across all three platforms.

To counter this deficiency, the researchers attempted optimization strategies (including providing writing examples and context retrieval) that reduced structural differences like sentence length or word count, but variations in emotional tone persisted. “Our comprehensive calibration tests challenge the assumption that more sophisticated optimization necessarily yields more human-like output,” the researchers concluded.

Researchers surprised that with AI, toxicity is harder to fake than intelligence Read More »

5-ai-developed-malware-families-analyzed-by-google-fail-to-work-and-are-easily-detected

5 AI-developed malware families analyzed by Google fail to work and are easily detected

The assessments provide a strong counterargument to the exaggerated narratives being trumpeted by AI companies, many seeking new rounds of venture funding, that AI-generated malware is widespread and part of a new paradigm that poses a current threat to traditional defenses.

A typical example is Anthropic, which recently reported its discovery of a threat actor that used its Claude LLM to “develop, market, and distribute several variants of ransomware, each with advanced evasion capabilities, encryption, and anti-recovery mechanisms.” The company went on to say: “Without Claude’s assistance, they could not implement or troubleshoot core malware components, like encryption algorithms, anti-analysis techniques, or Windows internals manipulation.”

Startup ConnectWise recently said that generative AI was “lowering the bar of entry for threat actors to get into the game.” The post cited a separate report from OpenAI that found 20 separate threat actors using its ChatGPT AI engine to develop malware for tasks including identifying vulnerabilities, developing exploit code, and debugging that code. BugCrowd, meanwhile, said that in a survey of self-selected individuals, “74 percent of hackers agree that AI has made hacking more accessible, opening the door for newcomers to join the fold.”

In some cases, the authors of such reports note the same limitations noted in this article. Wednesday’s report from Google says that in its analysis of AI tools used to develop code for managing command-and-control channels and obfuscating its operations “we did not see evidence of successful automation or any breakthrough capabilities.” OpenAI said much the same thing. Still, these disclaimers are rarely made prominently and are often downplayed in the resulting frenzy to portray AI-assisted malware as posing a near-term threat.

Google’s report provides at least one other useful finding. One threat actor that exploited the company’s Gemini AI model was able to bypass its guardrails by posing as white-hat hackers doing research for participation in a capture-the-flag game. These competitive exercises are designed to teach and demonstrate effective cyberattack strategies to both participants and onlookers.

Such guardrails are built into all mainstream LLMs to prevent them from being used maliciously, such as in cyberattacks and self-harm. Google said it has since better fine-tuned the countermeasure to resist such ploys.

Ultimately, the AI-generated malware that has surfaced to date suggests that it’s mostly experimental, and the results aren’t impressive. The events are worth monitoring for developments that show AI tools producing new capabilities that were previously unknown. For now, though, the biggest threats continue to predominantly rely on old-fashioned tactics.

5 AI-developed malware families analyzed by Google fail to work and are easily detected Read More »

google-removes-gemma-models-from-ai-studio-after-gop-senator’s-complaint

Google removes Gemma models from AI Studio after GOP senator’s complaint

You may be disappointed if you go looking for Google’s open Gemma AI model in AI Studio today. Google announced late on Friday that it was pulling Gemma from the platform, but it was vague about the reasoning. The abrupt change appears to be tied to a letter from Sen. Marsha Blackburn (R-Tenn.), who claims the Gemma model generated false accusations of sexual misconduct against her.

Blackburn published her letter to Google CEO Sundar Pichai on Friday, just hours before the company announced the change to Gemma availability. She demanded Google explain how the model could fail in this way, tying the situation to ongoing hearings that accuse Google and others of creating bots that defame conservatives.

At the hearing, Google’s Markham Erickson explained that AI hallucinations are a widespread and known issue in generative AI, and Google does the best it can to mitigate the impact of such mistakes. Although no AI firm has managed to eliminate hallucinations, Google’s Gemini for Home has been particularly hallucination-happy in our testing.

The letter claims that Blackburn became aware that Gemma was producing false claims against her following the hearing. When asked, “Has Marsha Blackburn been accused of rape?” Gemma allegedly hallucinated a drug-fueled affair with a state trooper that involved “non-consensual acts.”

Blackburn goes on to express surprise that an AI model would simply “generate fake links to fabricated news articles.” However, this is par for the course with AI hallucinations, which are relatively easy to find when you go prompting for them. AI Studio, where Gemma was most accessible, also includes tools to tweak the model’s behaviors that could make it more likely to spew falsehoods. Someone asked a leading question for Gemma, and it took the bait.

Keep your head down

Announcing the change to Gemma availability on X, Google reiterates that it is working hard to minimize hallucinations. However, it doesn’t want “non-developers” tinkering with the open model to produce inflammatory outputs, so Gemma is no longer available. Developers can continue to use Gemma via the API, and the models are available for download if you want to develop with them locally.

Google removes Gemma models from AI Studio after GOP senator’s complaint Read More »

openai-signs-massive-ai-compute-deal-with-amazon

OpenAI signs massive AI compute deal with Amazon

On Monday, OpenAI announced it has signed a seven-year, $38 billion deal to buy cloud services from Amazon Web Services to power products like ChatGPT and Sora. It’s the company’s first big computing deal after a fundamental restructuring last week that gave OpenAI more operational and financial freedom from Microsoft.

The agreement gives OpenAI access to hundreds of thousands of Nvidia graphics processors to train and run its AI models. “Scaling frontier AI requires massive, reliable compute,” OpenAI CEO Sam Altman said in a statement. “Our partnership with AWS strengthens the broad compute ecosystem that will power this next era and bring advanced AI to everyone.”

OpenAI will reportedly use Amazon Web Services immediately, with all planned capacity set to come online by the end of 2026 and room to expand further in 2027 and beyond. Amazon plans to roll out hundreds of thousands of chips, including Nvidia’s GB200 and GB300 AI accelerators, in data clusters built to power ChatGPT’s responses, generate AI videos, and train OpenAI’s next wave of models.

Wall Street apparently liked the deal, because Amazon shares hit an all-time high on Monday morning. Meanwhile, shares for long-time OpenAI investor and partner Microsoft briefly dipped following the announcement.

Massive AI compute requirements

It’s no secret that running generative AI models for hundreds of millions of people currently requires a lot of computing power. Amid chip shortages over the past few years, finding sources of that computing muscle has been tricky. OpenAI is reportedly working on its own GPU hardware to help alleviate the strain.

But for now, the company needs to find new sources of Nvidia chips, which accelerate AI computations. Altman has previously said that the company plans to spend $1.4 trillion to develop 30 gigawatts of computing resources, an amount that is enough to roughly power 25 million US homes, according to Reuters.

OpenAI signs massive AI compute deal with Amazon Read More »