The NPU in your phone keeps improving—why isn’t that making AI better?


Shrinking AI for your phone is no simple matter.

The NPU in your phone might not be doing very much. Credit: Aurich Lawson | Getty Images

Almost every technological innovation of the past several years has been laser-focused on one thing: generative AI. Many of these supposedly revolutionary systems run on big, expensive servers in a data center somewhere, but at the same time, chipmakers are crowing about the power of the neural processing units (NPUs) they have brought to consumer devices. Every few months, it’s the same thing: This new NPU is 30 or 40 percent faster than the last one. That’s supposed to let you do something important, but no one really gets around to explaining what that is.

Experts envision a future of secure, personal AI tools with on-device intelligence, but does that match the reality of the AI boom? AI on the “edge” sounds great, but almost every AI tool of consequence is running in the cloud. So what’s that chip in your phone even doing?

What is an NPU?

Companies launching a new product often get bogged down in superlatives and vague marketing speak, so they do a poor job of explaining technical details. It’s not clear to most people buying a phone why they need the hardware to run AI workloads, and the supposed benefits are largely theoretical.

Many of today’s flagship consumer processors are systems-on-a-chip (SoC) because they incorporate multiple computing elements—like CPU cores, GPUs, and imaging controllers—on a single piece of silicon. This is true of mobile parts like Qualcomm’s Snapdragon or Google’s Tensor, as well as PC components like the Intel Core Ultra.

The NPU is a newer addition to chips, but it didn’t just appear one day—there’s a lineage that brought us here. NPUs are good at what they do because they emphasize parallel computing, something that’s also important in other SoC components.

Qualcomm devotes significant time during its new product unveilings to talk about its Hexagon NPUs. Keen observers may recall that this branding has been reused from the company’s line of digital signal processors (DSPs), and there’s a good reason for that.

“Our journey into AI processing started probably 15 or 20 years ago, wherein our first anchor point was looking at signal processing,” said Vinesh Sukumar, Qualcomm’s head of AI products. DSPs have an architecture similar to NPUs, but they’re much simpler, with a focus on processing audio (e.g., speech recognition) and modem signals.

The NPU is one of multiple components in modern SoCs. Credit: Qualcomm

As the collection of technologies we refer to as “artificial intelligence” developed, engineers began using DSPs for more types of parallel processing, like long short-term memory (LSTM). Sukumar explained that as the industry became enamored with convolutional neural networks (CNNs), the technology underlying applications like computer vision, DSPs became focused on matrix functions, which are essential to generative AI processing as well.

While there is an architectural lineage here, it’s not quite right to say NPUs are just fancy DSPs. “If you talk about DSPs in the general term of the word, yes, [an NPU] is a digital signal processor,” said MediaTek Assistant Vice President Mark Odani. “But it’s all come a long way and it’s a lot more optimized for parallelism, how the transformers work, and holding huge numbers of parameters for processing.”

Despite being so prominent in new chips, NPUs are not strictly necessary for running AI workloads on the “edge,” a term that differentiates local AI processing from cloud-based systems. CPUs are slower than NPUs but can handle some light workloads without using as much power. Meanwhile, GPUs can often chew through more data than an NPU, but they use more power to do it. And there are times you may want to do that, according to Qualcomm’s Sukumar. For example, running AI workloads while a game is running could favor the GPU.

“Here, your measurement of success is that you cannot drop your frame rate while maintaining the spatial resolution, the dynamic range of the pixel, and also being able to provide AI recommendations for the player within that space,” says Sukumar. “In this kind of use case, it actually makes sense to run that in the graphics engine, because then you don’t have to keep shifting between the graphics and a domain-specific AI engine like an NPU.”
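The trade-off Sukumar describes can be sketched as a toy scheduling rule. The categories and logic here are invented for illustration and don't reflect any vendor's actual heuristics:

```python
def pick_engine(gpu_busy: bool, workload_size: str) -> str:
    """Toy heuristic for where to run an on-device AI workload."""
    if gpu_busy:
        # Avoid shuttling data between the GPU and a separate AI engine
        # mid-frame: keep the AI work on the graphics engine.
        return "GPU"
    if workload_size == "light":
        return "CPU"  # small jobs: not worth waking a dedicated engine
    return "NPU"      # sustained AI work: best performance per watt

print(pick_engine(gpu_busy=True, workload_size="heavy"))   # GPU
print(pick_engine(gpu_busy=False, workload_size="light"))  # CPU
print(pick_engine(gpu_busy=False, workload_size="heavy"))  # NPU
```

Real schedulers weigh power budgets, memory traffic, and latency targets rather than a single flag, but the shape of the decision is the same.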

Livin’ on the edge is hard

Unfortunately, the NPUs in many devices sit idle (and not just during gaming). The mix of local versus cloud AI tools favors the latter because that’s the natural habitat of LLMs. AI models are trained and fine-tuned on powerful servers, and that’s where they run best.

A server-based AI, like the full-fat versions of Gemini and ChatGPT, is not resource-constrained like a model running on your phone’s NPU. Consider the latest version of Google’s on-device Gemini Nano model, which has a context window of 32k tokens. That is a more than 2x improvement over the last version. However, the cloud-based Gemini models have context windows of up to 1 million tokens, meaning they can process much larger volumes of data.
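For a rough sense of what those context window sizes mean, here is a back-of-the-envelope conversion to pages of text. The words-per-token and words-per-page figures are common rules of thumb, not numbers from Google:

```python
WORDS_PER_TOKEN = 0.75  # common heuristic for English text
WORDS_PER_PAGE = 500    # a typical single-spaced page

def pages_of_context(tokens: int) -> float:
    """Approximate pages of English text that fit in a context window."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

on_device = pages_of_context(32_000)    # Gemini Nano's window
cloud = pages_of_context(1_000_000)     # the largest cloud Gemini window
print(f"on-device: ~{on_device:.0f} pages, cloud: ~{cloud:.0f} pages")
# on-device: ~48 pages, cloud: ~1500 pages
```

By this estimate, the on-device model can reason over a short report, while the cloud model can ingest several books at once.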

Both cloud-based and edge AI hardware will continue getting better, but the balance may not shift in the NPU’s favor. “The cloud will always have more compute resources versus a mobile device,” said Google’s Shenaz Zack, senior product manager on the Pixel team.

“If you want the most accurate models or the most brute force models, that all has to be done in the cloud,” Odani said. “But what we’re finding is that, in a lot of the use cases where there’s just summarizing some text or you’re talking to your voice assistant, a lot of those things can fit within three billion parameters.”

Squeezing AI models onto a phone or laptop involves some compromise—for example, by reducing the number of parameters in the model. Odani explained that cloud-based models run hundreds of billions of parameters, the weights that determine how a model processes input tokens to generate outputs. You can’t run anything like that on a consumer device right now, so developers have to vastly scale back the size of models for the edge. Odani says MediaTek’s latest ninth-generation NPU can handle about 3 billion parameters, roughly two orders of magnitude fewer than the largest cloud models.

The amount of memory available in a phone or laptop is also a limiting factor, so mobile-optimized AI models are usually quantized. That means the model’s estimation of the next token runs with less precision. Let’s say you want to run one of the larger open models, like Llama or Gemma 7B, on your device. The de facto standard is FP16, known as half-precision. At that level, a model with 7 billion parameters will lock up 13 or 14 gigabytes of memory. Stepping down to FP4 (quarter-precision) brings the size of the model in memory to a few gigs.
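The memory figures above follow directly from the arithmetic: each parameter is stored at some number of bits. This sketch computes the approximate footprint of the weights alone, ignoring runtime overhead like activations and the KV cache:

```python
def model_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 7-billion-parameter model at FP16 (half-precision) vs. FP4:
print(model_memory_gb(7, 16))  # 14.0 -- matches the "13 or 14 GB" figure
print(model_memory_gb(7, 4))   # 3.5 -- the "few gigabytes" sweet spot
```

Halving the precision halves the footprint, which is why quantization is the standard lever for fitting models into phone-class memory.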

“When you compress to, let’s say, between three and four gigabytes, it’s a sweet spot for integration into memory constrained form factors like a smartphone,” Sukumar said. “And there’s been a lot of investment in the ecosystem and at Qualcomm to look at various ways of compressing the models without losing quality.”

It’s difficult to create a generalized AI with these limitations for mobile devices, but computers—and especially smartphones—are a wellspring of data that can be pumped into models to generate supposedly helpful outputs. That’s why most edge AI is geared toward specific, narrow use cases, like analyzing screenshots or suggesting calendar appointments. Google says its latest Pixel phones run more than 100 AI models, both generative and traditional.

Even AI skeptics can recognize that the landscape is changing quickly. In the time it takes to shrink and optimize AI models for a phone or laptop, new cloud models may appear that make that work obsolete. This is also why third-party developers have been slow to utilize NPU processing in apps. They either have to plug into an existing on-device model, which involves restrictions and rapidly moving development targets, or deploy their own custom models. Neither is a great option currently.

A matter of trust

If the cloud is faster and easier, why go to the trouble of optimizing for the edge and burning more power with an NPU? Leaning on the cloud means accepting a level of dependence and trust in the people operating AI data centers that may not always be appropriate.

“We always start off with user privacy as an element,” said Qualcomm’s Sukumar. He explained that the best inference is not general in nature—it’s personalized based on the user’s interests and what’s happening in their lives. Fine-tuning models to deliver that experience calls for personal data, and it’s safer to store and process that data locally.

Even when companies say the right things about privacy in their cloud services, they’re far from guarantees. The helpful, friendly vibe of general chatbots also encourages people to divulge a lot of personal information, and if that assistant is running in the cloud, your data is there as well. OpenAI’s copyright fight with The New York Times could lead to millions of private chats being handed over to the publisher. The explosive growth and uncertain regulatory framework of gen AI make it hard to know what’s going to happen to your data.

“People are using a lot of these generative AI assistants like a therapist,” Odani said. “And you don’t know one day if all this stuff is going to come out on the Internet.”

Not everyone is so concerned. Zack claims Google has built “the world’s most secure cloud infrastructure,” allowing it to process data where it delivers the best results. Zack uses Video Boost and Pixel Studio as examples of this approach, noting that Google’s cloud is the only way to make these experiences fast and high-quality. The company recently announced its new Private AI Compute system, which it claims is just as safe as local AI.

Even if that’s true, the edge has other advantages—edge AI is just more reliable than a cloud service. “On-device is fast,” Odani said. “Sometimes I’m talking to ChatGPT and my Wi-Fi goes out or whatever, and it skips a beat.”

The services hosting cloud-based AI models aren’t just a single website—the Internet of today is massively interdependent, with content delivery networks, DNS providers, hosting, and other services that could degrade or shut down your favorite AI in the event of a glitch. When Cloudflare suffered a self-inflicted outage recently, ChatGPT users were annoyed to find their trusty chatbot was unavailable. Local AI features don’t have that drawback.

Cloud dominance

Everyone seems to agree that a hybrid approach is necessary to deliver truly useful AI features (assuming those exist), sending data to more powerful cloud services when necessary—Google, Apple, and every other phone maker does this. But the pursuit of a seamless experience can also obscure what’s happening with your data. More often than not, the AI features on your phone aren’t running in a secure, local way, even when the device has the hardware to do that.

Take, for example, the new OnePlus 15. This phone has Qualcomm’s brand-new Snapdragon 8 Elite Gen 5, which has an NPU that is 37 percent faster than the last one, for whatever that’s worth. Even with all that on-device AI might, OnePlus is heavily reliant on the cloud to analyze your personal data. Features like AI Writer and the AI Recorder connect to the company’s servers for processing, a system OnePlus assures us is totally safe and private.

Similarly, Motorola released a new line of foldable Razr phones over the summer that are loaded with AI features from multiple providers. These phones can summarize your notifications using AI, but you might be surprised how much of it happens in the cloud unless you read the terms and conditions. If you buy the Razr Ultra, that summarization happens on your phone. However, the cheaper models with less RAM and NPU power use cloud services to process your notifications. Again, Motorola says this system is secure, but a more secure option would have been to re-optimize the model for its cheaper phones.

Even when an OEM focuses on using the NPU hardware, the results can be lacking. Look at Google’s Daily Hub and Samsung’s Now Brief. These features are supposed to chew through all the data on your phone and generate useful recommendations and actions, but they rarely do anything aside from showing calendar events. In fact, Google has temporarily removed Daily Hub from Pixels because the feature did so little, and Google is a pioneer in local AI with Gemini Nano. Google has actually moved some parts of its mobile AI experience from local to cloud-based processing in recent months.

Those “brute force” models appear to be winning, and it doesn’t hurt that companies also get more data when you interact with their private computing cloud services.

Maybe take what you can get?

There’s plenty of interest in local AI, but so far, that hasn’t translated to an AI revolution in your pocket. Most of the AI advances we’ve seen so far depend on the ever-increasing scale of cloud systems and the generalized models that run there. Industry experts say that extensive work is happening behind the scenes to shrink AI models to work on phones and laptops, but it will take time for that to make an impact.

In the meantime, local AI processing is out there in a limited way. Google still makes use of the Tensor NPU to handle sensitive data for features like Magic Cue, and Samsung really makes the most of Qualcomm’s AI-focused chipsets. While Now Brief is of questionable utility, Samsung is cognizant of how reliance on the cloud may impact users, offering a toggle in the system settings that restricts AI processing to run only on the device. This limits the number of available AI features, and others don’t work as well, but you’ll know none of your personal data is being shared. No one else offers this option on a smartphone.

Samsung offers an easy toggle to disable cloud AI and run all workloads on-device. Credit: Ryan Whitwam

Samsung spokesperson Elise Sembach said the company’s AI efforts are grounded in enhancing experiences while maintaining user control. “The on-device processing toggle in One UI reflects this approach. It gives users the option to process AI tasks locally for faster performance, added privacy, and reliability even without a network connection,” Sembach said.

Interest in edge AI might be a good thing even if you don’t use it. Planning for this AI-rich future can encourage device makers to invest in better hardware—like more memory to run all those theoretical AI models.

“We definitely recommend our partners increase their RAM capacity,” said Sukumar. Indeed, Google, Samsung, and others have boosted memory capacity in large part to support on-device AI. Even if the cloud is winning, we’ll take the extra RAM.


Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.


Windows Recall demands an extraordinary level of trust that Microsoft hasn’t earned

The Recall feature as it currently exists in Windows 11 24H2 preview builds. Credit: Andrew Cunningham

Microsoft’s Windows 11 Copilot+ PCs come with quite a few new AI and machine learning-driven features, but the tentpole is Recall. Described by Microsoft as a comprehensive record of everything you do on your PC, the feature is pitched as a way to help users remember where they’ve been and to provide Windows extra contextual information that can help it better understand requests from and meet the needs of individual users.

This, as many users in infosec communities on social media immediately pointed out, sounds like a potential security nightmare. That’s doubly true because Microsoft says that by default, Recall’s screenshots take no pains to redact sensitive information, from usernames and passwords to health care information to NSFW site visits. By default, on a PC with 256GB of storage, Recall can store a couple dozen gigabytes of data across three months of PC usage, a huge amount of personal data.

The line between “potential security nightmare” and “actual security nightmare” is at least partly about the implementation, and Microsoft has been saying things that are at least superficially reassuring. Copilot+ PCs are required to have a fast neural processing unit (NPU) so that processing can be performed locally rather than sending data to the cloud; local snapshots are protected at rest by Windows’ disk encryption technologies, which are generally on by default if you’ve signed into a Microsoft account; neither Microsoft nor other users on the PC are supposed to be able to access any particular user’s Recall snapshots; and users can choose apps or (in most browsers) individual websites to exclude from Recall’s snapshots.

This all sounds good in theory, but some users are beginning to use Recall now that the Windows 11 24H2 update is available in preview form, and the actual implementation has serious problems.

“Fundamentally breaks the promise of security in Windows”

This is Recall, as seen on a PC running a preview build of Windows 11 24H2. It takes and saves periodic screenshots, which can then be searched for and viewed in various ways. Credit: Andrew Cunningham

Security researcher Kevin Beaumont, first in a thread on Mastodon and later in a more detailed blog post, has written about some of the potential implementation issues after enabling Recall on an unsupported system (which is currently the only way to try Recall since Copilot+ PCs that officially support the feature won’t ship until later this month). We’ve also given this early version of Recall a try on a Windows Dev Kit 2023, which we’ve used for all our recent Windows-on-Arm testing, and we’ve independently verified Beaumont’s claims about how easy it is to find and view raw Recall data once you have access to a user’s PC.

To test Recall yourself, developer and Windows enthusiast Albacore has published a tool called AmperageKit that will enable it on Arm-based Windows PCs running Windows 11 24H2 build 26100.712 (the build currently available in the Windows Insider Release Preview channel). Other Windows 11 24H2 versions are missing the underlying code necessary to enable Recall.

  • Windows uses OCR on all the text in all the screenshots it takes. That text is also saved to an SQLite database to facilitate faster searches. Credit: Andrew Cunningham

  • Searching for “iCloud,” for example, brings up every single screenshot with the word “iCloud” in it, including the app itself and its entry in the Microsoft Store. If I had visited websites that mentioned it, they would show up here, too. Credit: Andrew Cunningham

The short version is this: In its current form, Recall takes screenshots and uses OCR to grab the information on your screen; it then writes the contents of windows plus records of different user interactions in a locally stored SQLite database to track your activity. Data is stored on a per-app basis, presumably to make it easier for Microsoft’s app-exclusion feature to work. Beaumont says “several days” of data amounted to a database around 90KB in size. In our usage, screenshots taken by Recall on a PC with a 2560×1440 screen come in at 500KB or 600KB apiece (Recall saves screenshots at your PC’s native resolution, minus the taskbar area).
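To make concrete why an OCR-indexed activity database is so easy to mine, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for the example; this is not Recall's actual schema:

```python
import sqlite3

# Hypothetical per-app capture log: OCR'd screen text indexed for search.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE captures (app TEXT, taken_at TEXT, ocr_text TEXT)")
db.executemany(
    "INSERT INTO captures VALUES (?, ?, ?)",
    [
        ("Microsoft Store", "2024-06-01T09:00", "iCloud for Windows - Install"),
        ("Edge", "2024-06-01T09:05", "Sign in to your bank account"),
        ("Notepad", "2024-06-01T09:10", "meeting notes: budget review"),
    ],
)

# Once screen text is plain rows in a database, a search like the
# "iCloud" example above is just a substring match.
rows = db.execute(
    "SELECT app, taken_at FROM captures WHERE ocr_text LIKE ?", ("%iCloud%",)
).fetchall()
print(rows)  # [('Microsoft Store', '2024-06-01T09:00')]
```

Anything that can read the database file, whether another local account or an info-stealer, can run the same kind of query over everything the user has seen.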

Recall works locally thanks to Azure AI code that runs on your device, and it works without Internet connectivity and without a Microsoft account. Data is encrypted at rest, sort of, at least insofar as your entire drive is generally encrypted when your PC is either signed into a Microsoft account or has BitLocker turned on. But in its current form, Beaumont says Recall has “gaps you can drive a plane through” that make it trivially easy to grab and scan through a user’s Recall database if you either (1) have local access to the machine and can log into any account (not just the account of the user whose database you’re trying to see), or (2) are using a PC infected with some kind of info-stealer virus that can quickly transfer the SQLite database to another system.


Intel details new Lunar Lake CPUs that will go up against AMD, Qualcomm, and Apple


Lunar Lake returns to a more conventional-looking design for Intel.

A high-level breakdown of Intel’s next-gen Lunar Lake chips, which preserve some of Meteor Lake’s changes while reverting others. Credit: Intel

Given its recent manufacturing troubles, a resurgent AMD, an incursion from Qualcomm, and Apple’s shift from customer to competitor, it’s been a rough few years for Intel’s processors. Computer buyers have more viable options than they have had in many years, and in many ways the company’s Meteor Lake architecture was more interesting as a technical achievement than as an upgrade over the previous-generation Raptor Lake processors.

But even given all of that, Intel still provides the vast majority of PC CPUs—nearly four-fifths of all computer CPUs sold are Intel’s, according to recent analyst estimates from Canalys. The company still casts a long shadow, and what it does still helps set the pace for the rest of the industry.

Enter its next-generation CPU architecture, codenamed Lunar Lake. We’ve known about Lunar Lake for a while—Intel reminded everyone it was coming when Qualcomm upstaged it during Microsoft’s Copilot+ PC reveal—but this month at Computex the company is going into more detail ahead of availability sometime in Q3 of 2024.

Lunar Lake will be Intel’s first processor with a neural processing unit (NPU) that meets Microsoft’s Copilot+ PC requirements. But looking beyond the endless flow of AI news, it also includes upgraded architectures for its P-cores and E-cores, a next-generation GPU architecture, and some packaging changes that simultaneously build on and revert many of the dramatic changes Intel made for Meteor Lake.

Intel didn’t have more information to share on Arrow Lake, the architecture that will bring Meteor Lake’s big changes to socketed desktop motherboards for the first time. But Intel says that Arrow Lake is still on track for release in Q4 of 2024, and it could be announced at Intel’s annual Innovation event in late September.

Building on Meteor Lake

Lunar Lake continues to use a mix of P-cores and E-cores, which allow the chip to handle a mix of low-intensity and high-performance workloads without using more power than necessary. Credit: Intel

Lunar Lake shares a few things in common with Meteor Lake, including a chiplet-based design that combines multiple silicon dies into one big one with Intel’s Foveros packaging technology. But in some ways Lunar Lake is simpler and less weird than Meteor Lake, with fewer chiplets and a more conventional design.

Meteor Lake’s components were spread across four tiles: a compute tile that was mainly for the CPU cores, a TSMC-manufactured graphics tile for the GPU rendering hardware, an IO tile to handle things like PCI Express and Thunderbolt connectivity, and a grab-bag “SoC” tile with a couple of additional CPU cores, the media encoding and decoding engine, display connectivity, and the NPU.

Lunar Lake only has two functional tiles, plus a small “filler tile” that seems to exist solely so that the Lunar Lake silicon die can be a perfect rectangle once it’s all packaged together. The compute tile combines all of the processor’s P-cores and E-cores, the GPU, the NPU, the display outputs, and the media encoding and decoding engine. And the platform controller tile handles wired and wireless connectivity, including PCIe and USB, Thunderbolt 4, and Wi-Fi 7 and Bluetooth 5.4.

This is essentially the same split that Intel has used for laptop chips for years and years: one chipset die and one die for the CPU, GPU, and everything else. It’s just that now, those two chips are part of the same silicon die, rather than separate dies on the same processor package. In retrospect it seems like some of Meteor Lake’s most noticeable design departures—the division of GPU-related functions among different tiles, the presence of additional CPU cores inside of the SoC tile—were things Intel had to do to work around the fact that another company was actually manufacturing most of the GPU. Given the opportunity, Intel has returned to a more recognizable assemblage of components.

Intel is shifting to on-package RAM for Lunar Lake, something Apple also uses for its M-series chips. Credit: Intel

Another big packaging change is that Intel is integrating RAM into the CPU package for Lunar Lake, rather than having it installed separately on the motherboard. Intel says this uses 40 percent less power, since it shortens the distance data needs to travel. It also saves motherboard space, which can either be used for other components, to make systems smaller, or to make more room for battery. Apple also uses on-package memory for its M-series chips.

Intel says that Lunar Lake chips can include up to 32GB of LPDDR5x memory. The downside is that this on-package memory precludes the usage of separate Compression-Attached Memory Modules, which combine many of the benefits of traditional upgradable DIMM modules and soldered-down laptop memory.


Your current PC probably doesn’t have an AI processor, but your next one might

Intel’s Core Ultra chips are some of the first x86 PC processors to include built-in NPUs. Software support will slowly follow. Credit: Intel

When it announced the new Copilot key for PC keyboards last month, Microsoft declared 2024 “the year of the AI PC.” On one level, this is just an aspirational PR-friendly proclamation, meant to show investors that Microsoft intends to keep pushing the AI hype cycle that has put it in competition with Apple for the title of most valuable publicly traded company.

But on a technical level, it is true that PCs made and sold in 2024 and beyond will generally include AI and machine-learning processing capabilities that older PCs don’t. The main thing is the neural processing unit (NPU), a specialized block on recent high-end Intel and AMD CPUs that can accelerate some kinds of generative AI and machine-learning workloads more quickly (or while using less power) than the CPU or GPU could.

Qualcomm’s Windows PCs were some of the first to include an NPU, since the Arm processors used in most smartphones have included some kind of machine-learning acceleration for a few years now (Apple’s M-series chips for Macs all have them, too, going all the way back to 2020’s M1). But the Arm version of Windows is a tiny sliver of the entire PC market; x86 PCs with Intel’s Core Ultra chips, AMD’s Ryzen 7040/8040-series laptop CPUs, or the Ryzen 8000G desktop CPUs will be many mainstream PC users’ first exposure to this kind of hardware.

Right now, even if your PC has an NPU in it, Windows can’t use it for much, aside from webcam background blurring and a handful of other video effects. But that’s slowly going to change, and part of that will be making it relatively easy for developers to create NPU-agnostic apps in the same way that PC game developers currently make GPU-agnostic games.

The gaming example is instructive, because that’s basically how Microsoft is approaching DirectML, its API for machine-learning operations. Though up until now it has mostly been used to run these AI workloads on GPUs, Microsoft announced last week that it was adding DirectML support for Intel’s Meteor Lake NPUs in a developer preview, starting in DirectML 1.13.1 and ONNX Runtime 1.17.

Though the preview will only run an unspecified “subset of machine learning models that have been targeted for support,” some of which “may not run at all or may have high latency or low accuracy,” it opens the door for more third-party apps to start taking advantage of built-in NPUs. Intel says that Samsung is using Intel’s NPU and DirectML for facial recognition features in its photo gallery app, something that Apple also uses its Neural Engine for in macOS and iOS.

The benefits can be substantial, compared to running those workloads on a GPU or CPU.

“The NPU, at least in Intel land, will largely be used for power efficiency reasons,” Intel Senior Director of Technical Marketing Robert Hallock told Ars in an interview about Meteor Lake’s capabilities. “Camera segmentation, this whole background blurring thing… moving that to the NPU saves about 30 to 50 percent power versus running it elsewhere.”

Intel and Microsoft are both working toward a model where NPUs are treated pretty much like GPUs are today: developers generally target DirectX rather than a specific graphics card manufacturer or GPU architecture, and new features, one-off bug fixes, and performance improvements can all be addressed via GPU driver updates. Some GPUs run specific games better than others, and developers can choose to spend more time optimizing for Nvidia cards or AMD cards, but generally the model is hardware agnostic.

Similarly, Intel is already offering GPU-style driver updates for its NPUs. And Hallock says that Windows already essentially recognizes the NPU as “a graphics card with no rendering capability.”
