GPUs

nvidia-announces-dgx-desktop-“personal-ai-supercomputers”

Nvidia announces DGX desktop “personal AI supercomputers”

During Tuesday’s Nvidia GTX keynote, CEO Jensen Huang unveiled two “personal AI supercomputers” called DGX Spark and DGX Station, both powered by the Grace Blackwell platform. In a way, they are a new type of AI PC architecture specifically built for running neural networks, and five major PC manufacturers will build the supercomputers.

These desktop systems, first previewed as “Project DIGITS” in January, aim to bring AI capabilities to developers, researchers, and data scientists who need to prototype, fine-tune, and run large AI models locally. DGX systems can serve as standalone desktop AI labs or “bridge systems” that allow AI developers to move their models from desktops to DGX Cloud or any AI cloud infrastructure with few code changes.

Huang explained the rationale behind these new products in a news release, saying, “AI has transformed every layer of the computing stack. It stands to reason a new class of computers would emerge—designed for AI-native developers and to run AI-native applications.”

The smaller DGX Spark features the GB10 Grace Blackwell Superchip with Blackwell GPU and fifth-generation Tensor Cores, delivering up to 1,000 trillion operations per second for AI.

Meanwhile, the more powerful DGX Station includes the GB300 Grace Blackwell Ultra Desktop Superchip with 784GB of coherent memory and the ConnectX-8 SuperNIC supporting networking speeds up to 800Gb/s.

The DGX architecture serves as a prototype that other manufacturers can produce. Asus, Dell, HP, and Lenovo will develop and sell both DGX systems, with DGX Spark reservations opening today and DGX Station expected later in 2025. Additional manufacturing partners for the DGX Station include BOXX, Lambda, and Supermicro, with systems expected to be available later this year.

Since the systems will be manufactured by different companies, Nvidia did not mention pricing for the units. However, in January, Nvidia mentioned that the base-level configuration for a DGX Spark-like computer would retail for around $3,000.

Nvidia announces DGX desktop “personal AI supercomputers” Read More »

nvidia-announces-“rubin-ultra”-and-“feynman”-ai-chips-for-2027-and-2028

Nvidia announces “Rubin Ultra” and “Feynman” AI chips for 2027 and 2028

On Tuesday at Nvidia’s GTC 2025 conference in San Jose, California, CEO Jensen Huang revealed several new AI-accelerating GPUs the company plans to release over the coming months and years. He also revealed more specifications about previously announced chips.

The centerpiece announcement was Vera Rubin, first teased at Computex 2024 and now scheduled for release in the second half of 2026. This GPU, named after a famous astronomer, will feature tens of terabytes of memory and comes with a custom Nvidia-designed CPU called Vera.

According to Nvidia, Vera Rubin will deliver significant performance improvements over its predecessor, Grace Blackwell, particularly for AI training and inference.

Specifications for Vera Rubin, presented by Jensen Huang during his GTC 2025 keynote.

Specifications for Vera Rubin, presented by Jensen Huang during his GTC 2025 keynote.

Vera Rubin features two GPUs together on one die that deliver 50 petaflops of FP4 inference performance per chip. When configured in a full NVL144 rack, the system delivers 3.6 exaflops of FP4 inference compute—3.3 times more than Blackwell Ultra’s 1.1 exaflops in a similar rack configuration.

The Vera CPU features 88 custom ARM cores with 176 threads connected to Rubin GPUs via a high-speed 1.8 TB/s NVLink interface.

Huang also announced Rubin Ultra, which will follow in the second half of 2027. Rubin Ultra will use the NVL576 rack configuration and feature individual GPUs with four reticle-sized dies, delivering 100 petaflops of FP4 precision (a 4-bit floating-point format used for representing and processing numbers within AI models) per chip.

At the rack level, Rubin Ultra will provide 15 exaflops of FP4 inference compute and 5 exaflops of FP8 training performance—about four times more powerful than the Rubin NVL144 configuration. Each Rubin Ultra GPU will include 1TB of HBM4e memory, with the complete rack containing 365TB of fast memory.

Nvidia announces “Rubin Ultra” and “Feynman” AI chips for 2027 and 2028 Read More »

amd-radeon-rx-9070-and-9070-xt-review:-rdna-4-fixes-a-lot-of-amd’s-problems

AMD Radeon RX 9070 and 9070 XT review: RDNA 4 fixes a lot of AMD’s problems


For $549 and $599, AMD comes close to knocking out Nvidia’s GeForce RTX 5070.

AMD’s Radeon RX 9070 and 9070 XT are its first cards based on the RDNA 4 GPU architecture. Credit: Andrew Cunningham

AMD’s Radeon RX 9070 and 9070 XT are its first cards based on the RDNA 4 GPU architecture. Credit: Andrew Cunningham

AMD is a company that knows a thing or two about capitalizing on a competitor’s weaknesses. The company got through its early-2010s nadir partially because its Ryzen CPUs struck just as Intel’s current manufacturing woes began to set in, first with somewhat-worse CPUs that were great value for the money and later with CPUs that were better than anything Intel could offer.

Nvidia’s untrammeled dominance of the consumer graphics card market should also be an opportunity for AMD. Nvidia’s GeForce RTX 50-series graphics cards have given buyers very little to get excited about, with an unreachably expensive high-end 5090 refresh and modest-at-best gains from 5080 and 5070-series cards that are also pretty expensive by historical standards, when you can buy them at all. Tech YouTubers—both the people making the videos and the people leaving comments underneath them—have been almost uniformly unkind to the 50 series, hinting at consumer frustrations and pent-up demand for competitive products from other companies.

Enter AMD’s Radeon RX 9070 XT and RX 9070 graphics cards. These are aimed right at the middle of the current GPU market at the intersection of high sales volume and decent profit margins. They promise good 1440p and entry-level 4K gaming performance and improved power efficiency compared to previous-generation cards, with fixes for long-time shortcomings (ray-tracing performance, video encoding, and upscaling quality) that should, in theory, make them more tempting for people looking to ditch Nvidia.

Table of Contents

RX 9070 and 9070 XT specs and speeds

RX 9070 XT RX 9070 RX 7900 XTX RX 7900 XT RX 7900 GRE RX 7800 XT
Compute units (Stream processors) 64 RDNA4 (4,096) 56 RDNA4 (3,584) 96 RDNA3 (6,144) 84 RDNA3 (5,376) 80 RDNA3 (5,120) 60 RDNA3 (3,840)
Boost Clock 2,970 MHz 2,520 MHz 2,498 MHz 2,400 MHz 2,245 MHz 2,430 MHz
Memory Bus Width 256-bit 256-bit 384-bit 320-bit 256-bit 256-bit
Memory Bandwidth 650GB/s 650GB/s 960GB/s 800GB/s 576GB/s 624GB/s
Memory size 16GB GDDR6 16GB GDDR6 24GB GDDR6 20GB GDDR6 16GB GDDR6 16GB GDDR6
Total board power (TBP) 304 W 220 W 355 W 315 W 260 W 263 W

AMD’s high-level performance promise for the RDNA 4 architecture revolves around big increases in performance per compute unit (CU). An RDNA 4 CU, AMD says, is nearly twice as fast in rasterized performance as RDNA 2 (that is, rendering without ray-tracing effects enabled) and nearly 2.5 times as fast as RDNA 2 in games with ray-tracing effects enabled. Performance for at least some machine learning workloads also goes way up—twice as fast as RDNA 3 and four times as fast as RDNA 2.

We’ll see this in more detail when we start comparing performance, but AMD seems to have accomplished this goal. Despite having 64 or 56 compute units (for the 9070 XT and 9070, respectively), the cards’ performance often competes with AMD’s last-generation flagships, the RX 7900 XTX and 7900 XT. Those cards came with 96 and 84 compute units, respectively. The 9070 cards are specced a lot more like last generation’s RX 7800 XT—including the 16GB of GDDR6 on a 256-bit memory bus, as AMD still isn’t using GDDR6X or GDDR7—but they’re much faster than the 7800 XT was.

AMD has dramatically increased the performance-per-compute unit for RDNA 4. AMD

The 9070 series also uses a new 4 nm manufacturing process from TSMC, an upgrade from the 7000 series’ 5 nm process (and the 6 nm process used for the separate memory controller dies in higher-end RX 7000-series models that used chiplets). AMD’s GPUs are normally a bit less efficient than Nvidia’s, but the architectural improvements and the new manufacturing process allow AMD to do some important catch-up.

Both of the 9070 models we tested were ASRock Steel Legend models, and the 9070 and 9070 XT had identical designs—we’ll probably see a lot of this from AMD’s partners since the GPU dies and the 16GB RAM allotments are the same for both models. Both use two 8-pin power connectors; AMD says partners are free to use the 12-pin power connector if they want, but given Nvidia’s ongoing issues with it, most cards will likely stick with the reliable 8-pin connectors.

AMD doesn’t appear to be making and selling reference designs for the 9070 series the way it did for some RX 7000 and 6000-series GPUs or the way Nvidia does with its Founders Edition cards. From what we’ve seen, 2 or 2.5-slot, triple-fan designs will be the norm, the way they are for most midrange GPUs these days.

Testbed notes

We used the same GPU testbed for the Radeon RX 9070 series as we have for our GeForce RTX 50-series reviews.

An AMD Ryzen 7 9800X3D ensures that our graphics cards will be CPU-limited as little as possible. An ample 1050 W power supply, 32GB of DDR5-6000, and an AMD X670E motherboard with the latest BIOS installed round out the hardware. On the software side, we use an up-to-date installation of Windows 11 24H2 and recent GPU drivers for older cards, ensuring that our tests reflect whatever optimizations Microsoft, AMD, Nvidia, and game developers have made since the last generation of GPUs launched.

We have numbers for all of Nvidia’s RTX 50-series GPUs so far, plus most of the 40-series cards, most of AMD’s RX 7000-series cards, and a handful of older GPUs from the RTX 30-series and RX 6000 series. We’ll focus on comparing the 9070 XT and 9070 to other 1440p-to-4K graphics cards since those are the resolutions AMD is aiming at.

Performance

At $549 and $599, the 9070 series is priced to match Nvidia’s $549 RTX 5070 and undercut the $749 RTX 5070 Ti. So we’ll focus on comparing the 9070 series to those cards, plus the top tier of GPUs from the outgoing RX 7000-series.

Some 4K rasterized benchmarks.

Starting at the top with rasterized benchmarks with no ray-tracing effects, the 9070 XT does a good job of standing up to Nvidia’s RTX 5070 Ti, coming within a few frames per second of its performance in all the games we tested (and scoring very similarly in the 3DMark Time Spy Extreme benchmark).

Both cards are considerably faster than the RTX 5070—between 15 and 28 percent for the 9070 XT and between 5 and 13 percent for the regular 9070 (our 5070 scored weirdly low in Horizon Zero Dawn Remastered, so we’d treat those numbers as outliers for now). Both 9070 cards also stack up well next to the RX 7000 series here—the 9070 can usually just about match the performance of the 7900 XT, and the 9070 XT usually beats it by a little. Both cards thoroughly outrun the old RX 7900 GRE, which was AMD’s $549 GPU offering just a year ago.

The 7900 XT does have 20GB of RAM instead of 16GB, which might help its performance in some edge cases. But 16GB is still perfectly generous for a 1440p-to-4K graphics card—the 5070 only offers 12GB, which could end up limiting its performance in some games as RAM requirements continue to rise.

On ray-tracing improvements

Nvidia got a jump on AMD when it introduced hardware-accelerated ray-tracing in the RTX 20-series in 2018. And while these effects were only supported in a few games at the time, many modern games offer at least some kind of ray-traced lighting effects.

AMD caught up a little when it began shipping its own ray-tracing support in the RDNA2 architecture in late 2020, but the issue since then has always been that AMD cards have taken a larger performance hit than GeForce GPUs when these effects are turned on. RDNA3 promised improvements, but our tests still generally showed the same deficit as before.

So we’re looking for two things with RDNA4’s ray-tracing performance. First, we want the numbers to be higher than they were for comparably priced RX 7000-series GPUs, the same thing we look for in non-ray-traced (or rasterized) rendering performance. Second, we want the size of the performance hit to go down. To pick an example: the RX 7900 GRE could compete with Nvidia’s RTX 4070 Ti Super in games without ray tracing, but it was closer to a non-Super RTX 4070 in ray-traced games. It has helped keep AMD’s cards from being across-the-board competitive with Nvidia’s—is that any different now?

Benchmarks for games with ray-tracing effects enabled. Both AMD cards generally keep pace with the 5070 in these tests thanks to RDNA 4’s improvements.

The picture our tests paint is mixed but tentatively positive. The 9070 series and RDNA4 post solid improvements in the Cyberpunk 2077 benchmarks, substantially closing the performance gap with Nvidia. In games where AMD’s cards performed well enough before—here represented by Returnal—performance goes up, but roughly proportionately with rasterized performance. And both 9070 cards still punch below their weight in Black Myth: Wukong, falling substantially behind the 5070 under the punishing Cinematic graphics preset.

So the benefits you see, as with any GPU update, will depend a bit on the game you’re playing. There’s also a possibility that game optimizations and driver updates made with RDNA4 in mind could boost performance further. We can’t say that AMD has caught all the way up to Nvidia here—the 9070 and 9070 XT are both closer to the GeForce RTX 5070 than the 5070 Ti, despite keeping it closer to the 5070 Ti in rasterized tests—but there is real, measurable improvement here, which is what we were looking for.

Power usage

The 9070 series’ performance increases are particularly impressive when you look at the power-consumption numbers. The 9070 comes close to the 7900 XT’s performance but uses 90 W less power under load. It beats the RTX 5070 most of the time but uses around 30 W less power.

The 9070 XT is a little less impressive on this front—AMD has set clock speeds pretty high, and this can increase power use disproportionately. The 9070 XT is usually 10 or 15 percent faster than the 9070 but uses 38 percent more power. The XT’s power consumption is similar to the RTX 5070 Ti’s (a GPU it often matches) and the 7900 XT’s (a GPU it always beats), so it’s not too egregious, but it’s not as standout as the 9070’s.

AMD gives 9070 owners a couple of new toggles for power limits, though, which we’ll talk about in the next section.

Experimenting with “Total Board Power”

We don’t normally dabble much with overclocking when we review CPUs or GPUs—we’re happy to leave that to folks at other outlets. But when we review CPUs, we do usually test them with multiple power limits in place. Playing with power limits is easier (and occasionally safer) than actually overclocking, and it often comes with large gains to either performance (a chip that performs much better when given more power to work with) or efficiency (a chip that can run at nearly full speed without using as much power).

Initially, I experimented with the RX 9070’s power limits by accident. AMD sent me one version of the 9070 but exchanged it because of a minor problem the OEM identified with some units early in the production run. I had, of course, already run most of our tests on it, but that’s the way these things go sometimes.

By bumping the regular RX 9070’s TBP up just a bit, you can nudge it closer to 9070 XT-level performance.

The replacement RX 9070 card, an ASRock Steel Legend model, was performing significantly better in our tests, sometimes nearly closing the gap between the 9070 and the XT. It wasn’t until I tested power consumption that I discovered the explanation—by default, it was using a 245 W power limit rather than the AMD-defined 220 W limit. Usually, these kinds of factory tweaks don’t make much of a difference, but for the 9070, this power bump gave it a nice performance boost while still keeping it close to the 250 W power limit of the GeForce RTX 5070.

The 90-series cards we tested both add some power presets to AMD’s Adrenalin app in the Performance tab under Tuning. These replace and/or complement some of the automated overclocking and undervolting buttons that exist here for older Radeon cards. Clicking Favor Efficiency or Favor Performance can ratchet the card’s Total Board Power (TBP) up or down, limiting performance so that the card runs cooler and quieter or allowing the card to consume more power so it can run a bit faster.

The 9070 cards get slightly different performance tuning options in the Adrenalin software. These buttons mostly change the card’s Total Board Power (TBP), making it simple to either improve efficiency or boost performance a bit. Credit: Andrew Cunningham

For this particular ASRock 9070 card, the default TBP is set to 245 W. Selecting “Favor Efficiency” sets it to the default 220 W. You can double-check these values using an app like HWInfo, which displays both the current TBP and the maximum TBP in its Sensors Status window. Clicking the Custom button in the Adrenalin software gives you access to a Power Tuning slider, which for our card allowed us to ratchet the TBP up by up to 10 percent or down by as much as 30 percent.

This is all the firsthand testing we did with the power limits of the 9070 series, though I would assume that adding a bit more power also adds more overclocking headroom (bumping up the power limits is common for GPU overclockers no matter who makes your card). AMD says that some of its partners will ship 9070 XT models set to a roughly 340 W power limit out of the box but acknowledges that “you start seeing diminishing returns as you approach the top of that [power efficiency] curve.”

But it’s worth noting that the driver has another automated set-it-and-forget-it power setting you can easily use to find your preferred balance of performance and power efficiency.

A quick look at FSR4 performance

There’s a toggle in the driver for enabling FSR 4 in FSR 3.1-supporting games. Credit: Andrew Cunningham

One of AMD’s headlining improvements to the RX 90-series is the introduction of FSR 4, a new version of its FidelityFX Super Resolution upscaling algorithm. Like Nvidia’s DLSS and Intel’s XeSS, FSR 4 can take advantage of RDNA 4’s machine learning processing power to do hardware-backed upscaling instead of taking a hardware-agnostic approach as the older FSR versions did. AMD says this will improve upscaling quality, but it also means FSR4 will only work on RDNA 4 GPUs.

The good news is that FSR 3.1 and FSR 4 are forward- and backward-compatible. Games that have already added FSR 3.1 support can automatically take advantage of FSR 4, and games that support FSR 4 on the 90-series can just run FSR 3.1 on older and non-AMD GPUs.

FSR 4 comes with a small performance hit compared to FSR 3.1 at the same settings, but better overall quality can let you drop to a faster preset like Balanced or Performance and end up with more frames-per-second overall. Credit: Andrew Cunningham

The only game in our current test suite to be compatible with FSR 4 is Horizon Zero Dawn Remastered, and we tested its performance using both FSR 3.1 and FSR 4. In general, we found that FSR 4 improved visual quality at the cost of just a few frames per second when run at the same settings—not unlike using Nvidia’s recently released “transformer model” for DLSS upscaling.

Many games will let you choose which version of FSR you want to use. But for FSR 3.1 games that don’t have a built-in FSR 4 option, there’s a toggle in AMD’s Adrenalin driver you can hit to switch to the better upscaling algorithm.

Even if they come with a performance hit, new upscaling algorithms can still improve performance by making the lower-resolution presets look better. We run all of our testing in “Quality” mode, which generally renders at two-thirds of native resolution and scales up. But if FSR 4 running in Balanced or Performance mode looks the same to your eyes as FSR 3.1 running in Quality mode, you can still end up with a net performance improvement in the end.

RX 9070 or 9070 XT?

Just $50 separates the advertised price of the 9070 from that of the 9070 XT, something both Nvidia and AMD have done in the past that I find a bit annoying. If you have $549 to spend on a graphics card, you can almost certainly scrape together $599 for a graphics card. All else being equal, I’d tell most people trying to choose one of these to just spring for the 9070 XT.

That said, availability and retail pricing for these might be all over the place. If your choices are a regular RX 9070 or nothing, or an RX 9070 at $549 and an RX 9070 XT at any price higher than $599, I would just grab a 9070 and not sweat it too much. The two cards aren’t that far apart in performance, especially if you bump the 9070’s TBP up a little bit, and games that are playable on one will be playable at similar settings on the other.

Pretty close to great

If you’re building a 1440p or 4K gaming box, the 9070 series might be the ones to beat right now. Credit: Andrew Cunningham

We’ve got plenty of objective data in here, so I don’t mind saying that I came into this review kind of wanting to like the 9070 and 9070 XT. Nvidia’s 50-series cards have mostly upheld the status quo, and for the last couple of years, the status quo has been sustained high prices and very modest generational upgrades. And who doesn’t like an underdog story?

I think our test results mostly justify my priors. The RX 9070 and 9070 XT are very competitive graphics cards, helped along by a particularly mediocre RTX 5070 refresh from Nvidia. In non-ray-traced games, both cards wipe the floor with the 5070 and come close to competing with the $749 RTX 5070 Ti. In games and synthetic benchmarks with ray-tracing effects on, both cards can usually match or slightly beat the similarly priced 5070, partially (if not entirely) addressing AMD’s longstanding performance deficit here. Neither card comes close to the 5070 Ti in these games, but they’re also not priced like a 5070 Ti.

Just as impressively, the Radeon cards compete with the GeForce cards while consuming similar amounts of power. At stock settings, the RX 9070 uses roughly the same amount of power under load as a 4070 Super but with better performance. The 9070 XT uses about as much power as a 5070 Ti, with similar performance before you turn ray-tracing on. Power efficiency was a small but consistent drawback for the RX 7000 series compared to GeForce cards, and the 9070 cards mostly erase that disadvantage. AMD is also less stingy with the RAM, giving you 16GB for the price Nvidia charges for 12GB.

Some of the old caveats still apply. Radeons take a bigger performance hit, proportionally, than GeForce cards. DLSS already looks pretty good and is widely supported, while FSR 3.1/FSR 4 adoption is still relatively low. Nvidia has a nearly monopolistic grip on the dedicated GPU market, which means many apps, AI workloads, and games support its GPUs best/first/exclusively. AMD is always playing catch-up to Nvidia in some respect, and Nvidia keeps progressing quickly enough that it feels like AMD never quite has the opportunity to close the gap.

AMD also doesn’t have an answer for DLSS Multi-Frame Generation. The benefits of that technology are fairly narrow, and you already get most of those benefits with single-frame generation. But it’s still a thing that Nvidia does that AMDon’t.

Overall, the RX 9070 cards are both awfully tempting competitors to the GeForce RTX 5070—and occasionally even the 5070 Ti. They’re great at 1440p and decent at 4K. Sure, I’d like to see them priced another $50 or $100 cheaper to well and truly undercut the 5070 and bring 1440p-to-4K performance t0 a sub-$500 graphics card. It would be nice to see AMD undercut Nvidia’s GPUs as ruthlessly as it undercut Intel’s CPUs nearly a decade ago. But these RDNA4 GPUs have way fewer downsides than previous-generation cards, and they come at a moment of relative weakness for Nvidia. We’ll see if the sales follow.

The good

  • Great 1440p performance and solid 4K performance
  • 16GB of RAM
  • Decisively beats Nvidia’s RTX 5070, including in most ray-traced games
  • RX 9070 XT is competitive with RTX 5070 Ti in non-ray-traced games for less money
  • Both cards match or beat the RX 7900 XT, AMD’s second-fastest card from the last generation
  • Decent power efficiency for the 9070 XT and great power efficiency for the 9070
  • Automated options for tuning overall power use to prioritize either efficiency or performance
  • Reliable 8-pin power connectors available in many cards

The bad

  • Nvidia’s ray-tracing performance is still usually better
  • At $549 and $599, pricing matches but doesn’t undercut the RTX 5070
  • FSR 4 isn’t as widely supported as DLSS and may not be for a while

The ugly

  • Playing the “can you actually buy these for AMD’s advertised prices” game

Photo of Andrew Cunningham

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

AMD Radeon RX 9070 and 9070 XT review: RDNA 4 fixes a lot of AMD’s problems Read More »

openai’s-secret-weapon-against-nvidia-dependence-takes-shape

OpenAI’s secret weapon against Nvidia dependence takes shape

OpenAI is entering the final stages of designing its long-rumored AI processor with the aim of decreasing the company’s dependence on Nvidia hardware, according to a Reuters report released Monday. The ChatGPT creator plans to send its chip designs to Taiwan Semiconductor Manufacturing Co. (TSMC) for fabrication within the next few months, but the chip has not yet been formally announced.

The OpenAI chip’s full capabilities, technical details, and exact timeline are still unknown, but the company reportedly intends to iterate on the design and improve it over time, giving it leverage in negotiations with chip suppliers—and potentially granting the company future independence with a chip design it controls outright.

In the past, we’ve seen other tech companies, such as Microsoft, Amazon, Google, and Meta, create their own AI acceleration chips for reasons that range from cost reduction to relieving shortages of AI chips supplied by Nvidia, which enjoys a near-market monopoly on high-powered GPUs (such as the Blackwell series) for data center use.

In October 2023, we covered a report about OpenAI’s intention to create its own AI accelerator chips for similar reasons, so OpenAI’s custom chip project has been in the works for some time. In early 2024, OpenAI CEO Sam Altman also began spending considerable time traveling around the world trying to raise up to a reported $7 trillion to increase world chip fabrication capacity.

OpenAI’s secret weapon against Nvidia dependence takes shape Read More »

nvidia-starts-to-wind-down-support-for-old-gpus,-including-the-long-lived-gtx-1060

Nvidia starts to wind down support for old GPUs, including the long-lived GTX 1060

Nvidia is launching the first volley of RTX 50-series GPUs based on its new Blackwell architecture, starting with the RTX 5090 and working downward from there. The company also appears to be winding down support for a few of its older GPU architectures, according to these CUDA release notes spotted by Tom’s Hardware.

The release notes say that CUDA support for the Maxwell, Pascal, and Volta GPU architectures “is considered feature-complete and will be frozen in an upcoming release.” While all of these architectures—which collectively cover GeForce GPUs from the old GTX 700 series all the way up through 2016’s GTX 1000 series, plus a couple of Quadro and Titan workstation cards—are still currently supported by Nvidia’s December Game Ready driver package, the end of new CUDA feature support suggests that these GPUs will eventually be dropped from these driver packages soon.

It’s common for Nvidia and AMD to drop support for another batch of architectures all at once every few years; Nvidia last dropped support for older cards in 2021, and AMD dropped support for several prominent GPUs in 2023. Both companies maintain a separate driver branch for some of their older cards but releases usually only happen every few months, and they focus on security updates, not on providing new features or performance optimizations for new games.

Nvidia starts to wind down support for old GPUs, including the long-lived GTX 1060 Read More »

rumors-say-next-gen-rtx-50-gpus-will-come-with-big-jumps-in-power-requirements

Rumors say next-gen RTX 50 GPUs will come with big jumps in power requirements

Nvidia is reportedly gearing up to launch the first few cards in its RTX 50-series at CES next week, including an RTX 5090, RTX 5080, RTX 5070 Ti, and RTX 5070. The 5090 will be of particular interest to performance-obsessed, money-is-no-object PC gaming fanatics since it’s the first new GPU in over two years that can beat the performance of 2022’s RTX 4090.

But boosted performance and slower advancements in chip manufacturing technology mean that the 5090’s maximum power draw will far outstrip the 4090’s, according to leakers. VideoCardz reports that the 5090’s thermal design power (TDP) will be set at 575 W, up from 450 W for the already power-hungry RTX 4090. The RTX 5080’s TDP is also increasing to 360 W, up from 320 W for the RTX 4080 Super.

That also puts the RTX 5090 close to the maximum power draw available over a single 12VHPWR connector, which is capable of delivering up to 600 W of power (though once you include the 75 W available via the PCI Express slot on your motherboard, the actual maximum possible power draw for a GPU with a single 12VHPWR connector is a slightly higher 675 W).

Higher peak power consumption doesn’t necessarily mean that these cards will always draw more power during actual gaming than their 40-series counterparts. And their performance could be good enough that they could still be very efficient cards in terms of performance per watt.

But if you’re considering an upgrade to an RTX 5090 and these power specs are accurate, you may need to consider an upgraded power supply along with your new graphics card. Nvidia recommends at least an 850 W power supply for the RTX 4090 to accommodate what the GPU needs while leaving enough power left over for the rest of the system. An additional 125 W bump suggests that Nvidia will recommend a 1,000 W power supply as the minimum for the 5090.

We’ll probably know more about Nvidia’s next-gen cards after its CES keynote, currently scheduled for 9: 30 pm Eastern/6: 30 pm Pacific on Monday, January 6.

Rumors say next-gen RTX 50 GPUs will come with big jumps in power requirements Read More »

amd-unveils-powerful-new-ai-chip-to-challenge-nvidia

AMD unveils powerful new AI chip to challenge Nvidia

On Thursday, AMD announced its new MI325X AI accelerator chip, which is set to roll out to data center customers in the fourth quarter of this year. At an event hosted in San Francisco, the company claimed the new chip offers “industry-leading” performance compared to Nvidia’s current H200 GPUs, which are widely used in data centers to power AI applications such as ChatGPT.

With its new chip, AMD hopes to narrow the performance gap with Nvidia in the AI processor market. The Santa Clara-based company also revealed plans for its next-generation MI350 chip, which is positioned as a head-to-head competitor of Nvidia’s new Blackwell system, with an expected shipping date in the second half of 2025.

In an interview with the Financial Times, AMD CEO Lisa Su expressed her ambition for AMD to become the “end-to-end” AI leader over the next decade. “This is the beginning, not the end of the AI race,” she told the publication.

The AMD Instinct MI325X Accelerator.

The AMD Instinct MI325X Accelerator.

The AMD Instinct MI325X Accelerator. Credit: AMD

According to AMD’s website, the announced MI325X accelerator contains 153 billion transistors and is built on the CDNA3 GPU architecture using TSMC’s 5 nm and 6 nm FinFET lithography processes. The chip includes 19,456 stream processors and 1,216 matrix cores spread across 304 compute units. With a peak engine clock of 2100 MHz, the MI325X delivers up to 2.61 PFLOPs of peak eight-bit precision (FP8) performance. For half-precision (FP16) operations, it reaches 1.3 PFLOPs.

AMD unveils powerful new AI chip to challenge Nvidia Read More »

maze-of-adapters,-software-patches-gets-a-dedicated-gpu-working-on-a-raspberry-pi

Maze of adapters, software patches gets a dedicated GPU working on a Raspberry Pi

Actually getting the GPU working required patching the Linux kernel to include the open-source AMDGPU driver, which includes Arm support and provides decent support for the RX 460 (Geerling says the card and its Polaris architecture were chosen because they were new enough to be practically useful and to be supported by the AMDGPU driver, old enough that driver support is pretty mature, and because the card is cheap and uses PCIe 3.0). Nvidia’s GPUs generally aren’t really an option for projects like this because the open source drivers lag far behind the ones available for Radeon GPUs.

Once various kernel patches were applied and the kernel was recompiled, installing AMD’s graphics firmware got both graphics output and 3D acceleration working more or less normally.

Despite their age and relative graphical simplicity, running Doom 3 or Tux Racer on the Pi 5’s GPU is a tall order, even at 1080p. The RX 460 was able to run both at 4K, albeit with some settings reduced; Geerling also said that the card rendered the Pi operating system’s UI smoothly at 4k (the Pi’s integrated GPU does support 4K output, but things get framey quickly in our experience, especially when using multiple monitors).

Though a qualified success, anything this hacky is likely to have at least some software problems; Geerling noted that graphics acceleration in the Chromium browser and GPU-accelerated video encoding and decoding support weren’t working properly.

Most Pi owners aren’t going to want to run out and recreate this setup themselves, but it is interesting to see progress when it comes to using dedicated GPUs with Arm CPUs. So far, Arm chips across all major software ecosystems—including Windows, macOS, and Android—have mostly been restricted to using their own integrated GPUs. But if Arm processors are really going to compete with Intel’s and AMD’s in every PC market segment, we’ll eventually need to see better support for external graphics chips.

Maze of adapters, software patches gets a dedicated GPU working on a Raspberry Pi Read More »

doj-subpoenas-nvidia-in-deepening-ai-antitrust-probe,-report-says

DOJ subpoenas Nvidia in deepening AI antitrust probe, report says

DOJ subpoenas Nvidia in deepening AI antitrust probe, report says

The Department of Justice is reportedly deepening its probe into Nvidia. Officials have moved on from merely questioning competitors to subpoenaing Nvidia and other tech companies for evidence that could substantiate allegations that Nvidia is abusing its “dominant position in AI computing,” Bloomberg reported.

When news of the DOJ’s probe into the trillion-dollar company was first reported in June, Fast Company reported that scrutiny was intensifying merely because Nvidia was estimated to control “as much as 90 percent of the market for chips” capable of powering AI models. Experts told Fast Company that the DOJ probe might even be good for Nvidia’s business, noting that the market barely moved when the probe was first announced.

But the market’s confidence seemed to be shaken a little more on Tuesday, when Nvidia lost a “record-setting $279 billion” in market value following Bloomberg’s report. Nvidia’s losses became “the biggest single-day market-cap decline on record,” TheStreet reported.

People close to the DOJ’s investigation told Bloomberg that the DOJ’s “legally binding requests” require competitors “to provide information” on Nvidia’s suspected anticompetitive behaviors as a “dominant provider of AI processors.”

One concern is that Nvidia may be giving “preferential supply and pricing to customers who use its technology exclusively or buy its complete systems,” sources told Bloomberg. The DOJ is also reportedly probing Nvidia’s acquisition of RunAI—suspecting the deal may lock RunAI customers into using Nvidia chips.

Bloomberg’s report builds on a report last month from The Information that said that Advanced Micro Devices Inc. (AMD) and other Nvidia rivals were questioned by the DOJ—as well as third parties who could shed light on whether Nvidia potentially abused its market dominance in AI chips to pressure customers into buying more products.

According to Bloomberg’s sources, the DOJ is worried that “Nvidia is making it harder to switch to other suppliers and penalizes buyers that don’t exclusively use its artificial intelligence chips.”

In a statement to Bloomberg, Nvidia insisted that “Nvidia wins on merit, as reflected in our benchmark results and value to customers, who can choose whatever solution is best for them.” Additionally, Bloomberg noted that following a chip shortage in 2022, Nvidia CEO Jensen Huang has said that his company strives to prevent stockpiling of Nvidia’s coveted AI chips by prioritizing customers “who can make use of his products in ready-to-go data centers.”

Potential threats to Nvidia’s dominance

Despite the slump in shares, Nvidia’s market dominance seems unlikely to wane any time soon after its stock more than doubled this year. In an SEC filing this year, Nvidia bragged that its “accelerated computing ecosystem is bringing AI to every enterprise” with an “ecosystem” spanning “nearly 5 million developers and 40,000 companies.” Nvidia specifically highlighted that “more than 1,600 generative AI companies are building on Nvidia,” and according to Bloomberg, Nvidia will close out 2024 with more profits than the total sales of its closest competitor, AMD.

After the DOJ’s most recent big win, which successfully proved that Google has a monopoly on search, the DOJ appears intent on getting ahead of any tech companies’ ambitions to seize monopoly power and essentially become the Google of the AI industry. In June, DOJ antitrust chief Jonathan Kanter confirmed to the Financial Times that the DOJ is examining “monopoly choke points and the competitive landscape” in AI beyond just scrutinizing Nvidia.

According to Kanter, the DOJ is scrutinizing all aspects of the AI industry—”everything from computing power and the data used to train large language models, to cloud service providers, engineering talent and access to essential hardware such as graphics processing unit chips.” But in particular, the DOJ appears concerned that GPUs like Nvidia’s advanced AI chips remain a “scarce resource.” Kanter told the Financial Times that an “intervention” in “real time” to block a potential monopoly could be “the most meaningful intervention” and the least “invasive” as the AI industry grows.

DOJ subpoenas Nvidia in deepening AI antitrust probe, report says Read More »

elon-musk-claims-he-is-training-“the-world’s-most-powerful-ai-by-every-metric”

Elon Musk claims he is training “the world’s most powerful AI by every metric”

the biggest, most powerful —

One snag: xAI might not have the electrical power contracts to do it.

Elon Musk, chief executive officer of Tesla Inc., during a fireside discussion on artificial intelligence risks with Rishi Sunak, UK prime minister, in London, UK, on Thursday, Nov. 2, 2023.

Enlarge / Elon Musk, chief executive officer of Tesla Inc., during a fireside discussion on artificial intelligence risks with Rishi Sunak, UK prime minister, in London, UK, on Thursday, Nov. 2, 2023.

On Monday, Elon Musk announced the start of training for what he calls “the world’s most powerful AI training cluster” at xAI’s new supercomputer facility in Memphis, Tennessee. The billionaire entrepreneur and CEO of multiple tech companies took to X (formerly Twitter) to share that the so-called “Memphis Supercluster” began operations at approximately 4: 20 am local time that day.

Musk’s xAI team, in collaboration with X and Nvidia, launched the supercomputer cluster featuring 100,000 liquid-cooled H100 GPUs on a single RDMA fabric. This setup, according to Musk, gives xAI “a significant advantage in training the world’s most powerful AI by every metric by December this year.”

Given issues with xAI’s Grok chatbot throughout the year, skeptics would be justified in questioning whether those claims will match reality, especially given Musk’s tendency for grandiose, off-the-cuff remarks on the social media platform he runs.

Power issues

According to a report by News Channel 3 WREG Memphis, the startup of the massive AI training facility marks a milestone for the city. WREG reports that xAI’s investment represents the largest capital investment by a new company in Memphis’s history. However, the project has raised questions among local residents and officials about its impact on the area’s power grid and infrastructure.

WREG reports that Doug McGowen, president of Memphis Light, Gas and Water (MLGW), previously stated that xAI could consume up to 150 megawatts of power at peak times. This substantial power requirement has prompted discussions with the Tennessee Valley Authority (TVA) regarding the project’s electricity demands and connection to the power system.

The TVA told the local news station, “TVA does not have a contract in place with xAI. We are working with xAI and our partners at MLGW on the details of the proposal and electricity demand needs.”

The local news outlet confirms that MLGW has stated that xAI moved into an existing building with already existing utility services, but the full extent of the company’s power usage and its potential effects on local utilities remain unclear. To address community concerns, WREG reports that MLGW plans to host public forums in the coming days to provide more information about the project and its implications for the city.

For now, Tom’s Hardware reports that Musk is side-stepping power issues by installing a fleet of 14 VoltaGrid natural gas generators that provide supplementary power to the Memphis computer cluster while his company works out an agreement with the local power utility.

As training at the Memphis Supercluster gets underway, all eyes are on xAI and Musk’s ambitious goal of developing the world’s most powerful AI by the end of the year (by which metric, we are uncertain), given the competitive landscape in AI at the moment between OpenAI/Microsoft, Amazon, Apple, Anthropic, and Google. If such an AI model emerges from xAI, we’ll be ready to write about it.

This article was updated on July 24, 2024 at 1: 11 pm to mention Musk installing natural gas generators onsite in Memphis.

Elon Musk claims he is training “the world’s most powerful AI by every metric” Read More »

nvidia-jumps-ahead-of-itself-and-reveals-next-gen-“rubin”-ai-chips-in-keynote-tease

Nvidia jumps ahead of itself and reveals next-gen “Rubin” AI chips in keynote tease

Swing beat —

“I’m not sure yet whether I’m going to regret this,” says CEO Jensen Huang at Computex 2024.

Nvidia's CEO Jensen Huang delivers his keystone speech ahead of Computex 2024 in Taipei on June 2, 2024.

Enlarge / Nvidia’s CEO Jensen Huang delivers his keystone speech ahead of Computex 2024 in Taipei on June 2, 2024.

On Sunday, Nvidia CEO Jensen Huang reached beyond Blackwell and revealed the company’s next-generation AI-accelerating GPU platform during his keynote at Computex 2024 in Taiwan. Huang also detailed plans for an annual tick-tock-style upgrade cycle of its AI acceleration platforms, mentioning an upcoming Blackwell Ultra chip slated for 2025 and a subsequent platform called “Rubin” set for 2026.

Nvidia’s data center GPUs currently power a large majority of cloud-based AI models, such as ChatGPT, in both development (training) and deployment (inference) phases, and investors are keeping a close watch on the company, with expectations to keep that run going.

During the keynote, Huang seemed somewhat hesitant to make the Rubin announcement, perhaps wary of invoking the so-called Osborne effect, whereby a company’s premature announcement of the next iteration of a tech product eats into the current iteration’s sales. “This is the very first time that this next click as been made,” Huang said, holding up his presentation remote just before the Rubin announcement. “And I’m not sure yet whether I’m going to regret this or not.”

Nvidia Keynote at Computex 2023.

The Rubin AI platform, expected in 2026, will use HBM4 (a new form of high-bandwidth memory) and NVLink 6 Switch, operating at 3,600GBps. Following that launch, Nvidia will release a tick-tock iteration called “Rubin Ultra.” While Huang did not provide extensive specifications for the upcoming products, he promised cost and energy savings related to the new chipsets.

During the keynote, Huang also introduced a new ARM-based CPU called “Vera,” which will be featured on a new accelerator board called “Vera Rubin,” alongside one of the Rubin GPUs.

Much like Nvidia’s Grace Hopper architecture, which combines a “Grace” CPU and a “Hopper” GPU to pay tribute to the pioneering computer scientist of the same name, Vera Rubin refers to Vera Florence Cooper Rubin (1928–2016), an American astronomer who made discoveries in the field of deep space astronomy. She is best known for her pioneering work on galaxy rotation rates, which provided strong evidence for the existence of dark matter.

A calculated risk

Nvidia CEO Jensen Huang reveals the

Enlarge / Nvidia CEO Jensen Huang reveals the “Rubin” AI platform for the first time during his keynote at Computex 2024 on June 2, 2024.

Nvidia’s reveal of Rubin is not a surprise in the sense that most big tech companies are continuously working on follow-up products well in advance of release, but it’s notable because it comes just three months after the company revealed Blackwell, which is barely out of the gate and not yet widely shipping.

At the moment, the company seems to be comfortable leapfrogging itself with new announcements and catching up later; Nvidia just announced that its GH200 Grace Hopper “Superchip,” unveiled one year ago at Computex 2023, is now in full production.

With Nvidia stock rising and the company possessing an estimated 70–95 percent of the data center GPU market share, the Rubin reveal is a calculated risk that seems to come from a place of confidence. That confidence could turn out to be misplaced if a so-called “AI bubble” pops or if Nvidia misjudges the capabilities of its competitors. The announcement may also stem from pressure to continue Nvidia’s astronomical growth in market cap with nonstop promises of improving technology.

Accordingly, Huang has been eager to showcase the company’s plans to continue pushing silicon fabrication tech to its limits and widely broadcast that Nvidia plans to keep releasing new AI chips at a steady cadence.

“Our company has a one-year rhythm. Our basic philosophy is very simple: build the entire data center scale, disaggregate and sell to you parts on a one-year rhythm, and we push everything to technology limits,” Huang said during Sunday’s Computex keynote.

Despite Nvidia’s recent market performance, the company’s run may not continue indefinitely. With ample money pouring into the data center AI space, Nvidia isn’t alone in developing accelerator chips. Competitors like AMD (with the Instinct series) and Intel (with Guadi 3) also want to win a slice of the data center GPU market away from Nvidia’s current command of the AI-accelerator space. And OpenAI’s Sam Altman is trying to encourage diversified production of GPU hardware that will power the company’s next generation of AI models in the years ahead.

Nvidia jumps ahead of itself and reveals next-gen “Rubin” AI chips in keynote tease Read More »

nvidia-unveils-blackwell-b200,-the-“world’s-most-powerful-chip”-designed-for-ai

Nvidia unveils Blackwell B200, the “world’s most powerful chip” designed for AI

There’s no knowing where we’re rowing —

208B transistor chip can reportedly reduce AI cost and energy consumption by up to 25x.

The GB200

Enlarge / The GB200 “superchip” covered with a fanciful blue explosion.

Nvidia / Benj Edwards

On Monday, Nvidia unveiled the Blackwell B200 tensor core chip—the company’s most powerful single-chip GPU, with 208 billion transistors—which Nvidia claims can reduce AI inference operating costs (such as running ChatGPT) and energy consumption by up to 25 times compared to the H100. The company also unveiled the GB200, a “superchip” that combines two B200 chips and a Grace CPU for even more performance.

The news came as part of Nvidia’s annual GTC conference, which is taking place this week at the San Jose Convention Center. Nvidia CEO Jensen Huang delivered the keynote Monday afternoon. “We need bigger GPUs,” Huang said during his keynote. The Blackwell platform will allow the training of trillion-parameter AI models that will make today’s generative AI models look rudimentary in comparison, he said. For reference, OpenAI’s GPT-3, launched in 2020, included 175 billion parameters. Parameter count is a rough indicator of AI model complexity.

Nvidia named the Blackwell architecture after David Harold Blackwell, a mathematician who specialized in game theory and statistics and was the first Black scholar inducted into the National Academy of Sciences. The platform introduces six technologies for accelerated computing, including a second-generation Transformer Engine, fifth-generation NVLink, RAS Engine, secure AI capabilities, and a decompression engine for accelerated database queries.

Press photo of the Grace Blackwell GB200 chip, which combines two B200 GPUs with a Grace CPU into one chip.

Enlarge / Press photo of the Grace Blackwell GB200 chip, which combines two B200 GPUs with a Grace CPU into one chip.

Several major organizations, such as Amazon Web Services, Dell Technologies, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, and xAI, are expected to adopt the Blackwell platform, and Nvidia’s press release is replete with canned quotes from tech CEOs (key Nvidia customers) like Mark Zuckerberg and Sam Altman praising the platform.

GPUs, once only designed for gaming acceleration, are especially well suited for AI tasks because their massively parallel architecture accelerates the immense number of matrix multiplication tasks necessary to run today’s neural networks. With the dawn of new deep learning architectures in the 2010s, Nvidia found itself in an ideal position to capitalize on the AI revolution and began designing specialized GPUs just for the task of accelerating AI models.

Nvidia’s data center focus has made the company wildly rich and valuable, and these new chips continue the trend. Nvidia’s gaming GPU revenue ($2.9 billion in the last quarter) is dwarfed in comparison to data center revenue (at $18.4 billion), and that shows no signs of stopping.

A beast within a beast

Press photo of the Nvidia GB200 NVL72 data center computer system.

Enlarge / Press photo of the Nvidia GB200 NVL72 data center computer system.

The aforementioned Grace Blackwell GB200 chip arrives as a key part of the new NVIDIA GB200 NVL72, a multi-node, liquid-cooled data center computer system designed specifically for AI training and inference tasks. It combines 36 GB200s (that’s 72 B200 GPUs and 36 Grace CPUs total), interconnected by fifth-generation NVLink, which links chips together to multiply performance.

A specification chart for the Nvidia GB200 NVL72 system.

Enlarge / A specification chart for the Nvidia GB200 NVL72 system.

“The GB200 NVL72 provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads and reduces cost and energy consumption by up to 25x,” Nvidia said.

That kind of speed-up could potentially save money and time while running today’s AI models, but it will also allow for more complex AI models to be built. Generative AI models—like the kind that power Google Gemini and AI image generators—are famously computationally hungry. Shortages of compute power have widely been cited as holding back progress and research in the AI field, and the search for more compute has led to figures like OpenAI CEO Sam Altman trying to broker deals to create new chip foundries.

While Nvidia’s claims about the Blackwell platform’s capabilities are significant, it’s worth noting that its real-world performance and adoption of the technology remain to be seen as organizations begin to implement and utilize the platform themselves. Competitors like Intel and AMD are also looking to grab a piece of Nvidia’s AI pie.

Nvidia says that Blackwell-based products will be available from various partners starting later this year.

Nvidia unveils Blackwell B200, the “world’s most powerful chip” designed for AI Read More »