The Data Center is the New Compute Unit: Nvidia's Vision for System-Level Scaling
Copper, Cooling, and Compute Density: The Pillars of Nvidia's Data Center Vision
Nvidia CEO Jensen Huang has repeatedly emphasized that the data center is the new unit of compute. While this concept seemed straightforward at first, the deeper implications became clearer after Nvidia's presentations at GTC and OFC 2024.
I only recently grasped exactly what is happening, and a simple reframing of the principles that drove Moore’s Law makes the entire picture clearer. In this new paradigm, the rack is the new chip, and framing it that way gives us a whole new vector for scaling performance and power. Let’s talk about Moore’s Law from the perspective of the data center.
Moore’s Law as a Fractal
It all starts with Moore’s Law. There is a profound beauty in semiconductors: the problem at the chip scale is the same problem at the data center scale. Moore’s Law is a fractal, and the principles that apply to nanometers apply to racks.
The first profound principle we have to talk about is miniaturization. Moore’s Law was built on the simple observation that shrinking transistors takes less power and gives you more performance, because the electrons physically don’t have to travel as far. That is why Moore’s Law was about halving the physical size of a transistor for decades. But in recent years, we have hit the economic limits of shrinking chips much further, so we have run into some asymptotes. This is the widely heralded end of Moore’s Law. Making chips that much smaller has become harder.
But what applies at the bottom (moving bits closer together) applies at the top. Moving electrons further takes more time and more energy, and the closer all of the data, power, and logic sit, the less energy is wasted on distance. The problem is the same at the nanometer scale as at the rack scale, and moving cables and logic closer together yields system-level performance gains. This applies to all networks: there are economies of scale in moving things closer, as long as there aren’t offsetting geographic costs.
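The distance-energy relationship can be made concrete with a back-of-envelope sketch. The pJ/bit/mm figure below is an illustrative assumption, not a measured value for any product; electrical signaling energy grows roughly linearly with the distance a bit travels:

```python
# Sketch: the energy cost of moving a bit scales with the distance it travels.
# The 0.1 pJ/bit/mm figure is an assumed, illustrative signaling cost.

PJ_PER_BIT_PER_MM = 0.1

def energy_joules(bits: float, distance_mm: float,
                  pj_per_bit_per_mm: float = PJ_PER_BIT_PER_MM) -> float:
    """Energy to move `bits` across `distance_mm` of electrical wire,
    assuming energy grows linearly with distance."""
    return bits * distance_mm * pj_per_bit_per_mm * 1e-12

gigabyte = 8e9  # bits in one gigabyte
for label, mm in [("on-die (5 mm)", 5),
                  ("on-package (50 mm)", 50),
                  ("across the rack (2,000 mm)", 2_000)]:
    print(f"{label:28s} {energy_joules(gigabyte, mm):.4f} J per GB moved")
```

Under this toy model, a gigabyte moved across a rack costs hundreds of times more energy than the same gigabyte moved on-die, which is the whole argument for packing things closer.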
So, what is the answer? Let’s not just move the electrons closer within the chip; let’s move them closer within the rack. And that’s exactly what Nvidia is doing.
Moore’s Law Shifts to STCO
The pursuit of better chips will continue, but I think Nvidia is intelligently pursuing better systems and will likely get multiple generations’ worth of chip-shrink improvement from intelligent system design. What is so attractive is that a generation or more of improvement is available with today’s technology.
While others will figure it out, Nvidia will benefit from boldly going first and achieving the total system vision before anyone else. Moreover, Nvidia is just at the leading edge of an old concept.
This concept is called System Technology Co-Optimization (STCO), and while many have talked about it as a potential new vector of progress, I don’t think anyone expected Nvidia to show up with an opinionated and compelling version of it in 2024. What’s more, if you squint, you can see how STCO fits neatly within the decades-long history of computing. This is no different from the moves from ICs to VLSI, and from SoCs to Nvidia’s System of Chips.
Let’s have a brief history lesson. First the transistor was created, and then the integrated circuit combined multiple transistors into electronic components. LSI and then VLSI made thousands of transistors work together, giving rise to the microprocessor. Next came the observation that you could put multiple semiconductor systems onto a single chip, the System on Chip (SoC). More recently, we have been scaling out of the chip and onto the package, a la chiplets, heterogeneous compute, and advanced packaging like CoWoS.
But I think Nvidia is taking the scaling game outside the chip to a System of Chips. I’m sure someone will eventually coin a more compelling acronym, but there’s a real chance the 2020s and 2030s are about scaling these larger systems rather than just the silicon. And it is all beautifully consistent with what came before.
If the data center is the new unit of compute, it’s time to apply Moore’s Law and hardware vendors' tricks to system-level optimizations. Nvidia has already shown us its hand, and Andy Bechtolsheim pretty much gave the entire scaling roadmap in a presentation at OFC.
The Data Center as a Giant Chip
Imagine the data center as a giant chip. It is just a scaled-out advanced package of memory and logic transistors for your problem. Instead of a multi-chip SoC, each GPU board is a “tile” in your chip. The tiles are connected with copper and packed as closely together as possible for more performance and speed.
Does moving things off the package, aka out of the rack, make sense? If performance and bandwidth are your objectives, it doesn’t make much sense, as it massively slows the entire chip. The key bottleneck is moving data back and forth to train the model while keeping the model in memory, so keeping the data together makes sense. That is precisely why we are trying to scale HBM: dies on the same package as the accelerator.
So, in the case of a data center as a chip, you’d try to package everything together as closely as possible for as cheap as possible. There are a few different packaging options, with the closest being chip-to-chip packaging, then HBM on the package, NVLink over passive copper, and scaling out to InfiniBand or Ethernet.
And wouldn’t you know it, Nvidia has been pursuing this problem through this exact lens. The goal is to scale out chip-to-chip interconnect, HBM, NVLink, and InfiniBand. They even have a handy graph that puts the entire debate into perspective. As expensive as HBM is, on a cost-per-bandwidth basis it is virtually free. If bandwidth is the problem, it makes sense to exhaust the cheapest bandwidth before relying on other scaling layers.
This slide was shown many times at OFC, and it captures my conclusion about where we will try to scale up as much as possible. It makes sense to scale the domain where bandwidth is cheapest before considering other domains.
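The "exhaust the cheapest tier first" logic can be sketched as a simple ordering. The dollar figures below are made-up placeholders chosen only to illustrate the rank ordering the slide implies, not real prices:

```python
# Sketch of the "scale the cheapest bandwidth first" logic. The $/GB/s
# figures are invented placeholders to show the ordering, not real prices.

tiers = {
    "chip-to-chip (on package)":     0.01,
    "HBM (on package)":              0.05,
    "NVLink (passive copper)":       1.00,
    "InfiniBand/Ethernet (optics)": 10.00,
}

# Exhaust each domain in order of cost-per-bandwidth before moving outward.
for name, cost in sorted(tiers.items(), key=lambda kv: kv[1]):
    print(f"{name:30s} ~${cost:.2f} per GB/s (assumed)")
```

The point is not the specific numbers but the monotonic ordering: each step out of the package costs roughly an order of magnitude more per unit of bandwidth.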
In Nvidia’s case, the point is to pack as much memory as possible into HBM3 before considering NVLink, and then keep as much computing as possible within NVLink before even considering scaling up to the network.
Put differently, connecting 1 million accelerators over Ethernet is wasteful, but connecting them over passive copper in short-reach interconnected nodes is economical and brilliant. Nvidia is pursuing as much scaling as possible over passive copper before needing to use optics. This will be the lowest-cost, highest-performance solution.
The copper backplane in the data center rack is effectively the new advanced packaging in the system-level Moore’s Law race. The new way to shrink the rack is to put as much silicon as possible into the most economical package, connected over the cheapest and most power-efficient interconnect, as closely together as possible. This is the design ethos of the GB200 NVL72.
This is the new Moore’s Law; you’re looking at the new compute unit. The goal is to increase the power of the rack and move as many chips into a single rack as possible. It is obviously the cheapest and most power-efficient way to scale. Jensen referenced this at GTC, noting that the new GB200 rack takes 1/4th the power and uses less space to train the same GPT 1.8T model.
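The "1/4th the power" claim is simple arithmetic over the figures Jensen cited at GTC for training a GPT-MoE-1.8T model. I am quoting those figures from memory, so treat them as approximate:

```python
# Rough arithmetic behind the "1/4th the power" claim, using the GTC keynote
# figures for training GPT-MoE-1.8T (quoted from memory; approximate).

hopper    = {"gpus": 8000, "power_mw": 15}  # H100-based cluster
blackwell = {"gpus": 2000, "power_mw": 4}   # GB200-based cluster

power_ratio = blackwell["power_mw"] / hopper["power_mw"]
gpu_ratio   = blackwell["gpus"] / hopper["gpus"]
print(f"Power:     {power_ratio:.0%} of the Hopper cluster")  # ~27%, i.e. roughly 1/4
print(f"GPU count: {gpu_ratio:.0%} of the Hopper cluster")
```

Fewer chips, less power, same model trained: that is the "less space, less power, more performance" trade in numbers.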
Less space, less power, more performance—that’s Moore’s Law in all but name. Welcome to the new system scaling era, so let’s discuss how we scale from here. Andy Bechtolsheim’s talks at OFC opened my eyes because we have at least a generation of scaling from here based on our current technology.
Liquid Cooling and Copper in the Data Center
Before summarizing Andy’s talk, I want to explain why we haven’t done this before. The critical change that makes this all possible is liquid cooling. Because of the shift to liquid cooling, we can cool another doubling of power in a rack. The 120 kW GB200 NVL72 rack roughly doubles the power of the current solution, and next generation I would expect another doubling.
In some respects, this is a new power-scaling envelope, and we will likely scale as quickly as possible until liquid cooling itself hits its asymptote. In some ways, this is a clever permutation of Dennard scaling.
The logical and obvious goal is to push cooling to the maximum power we can miniaturize into a data center rack. Andy spelled out exactly how much further we could go: he believes we can put 300+ kW in a single rack and cool it with liquid. That level of power density is worth at least a generation of process shrink.
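A quick sanity check on the headroom this implies. The numbers below are the ones in the text (a ~120 kW GB200-class rack, Andy's rough ~300 kW ceiling); the "doubling equals a shrink generation" framing assumes per-chip power stays roughly fixed, so doubling rack power doubles the silicon you can pack in:

```python
# Back-of-envelope: how many full doublings of rack power fit between
# today's ~120 kW liquid-cooled rack and Andy Bechtolsheim's rough
# ~300 kW ceiling. Assumes per-chip power is roughly fixed, so each
# doubling of rack power roughly doubles silicon per rack.

current_rack_kw = 120   # GB200 NVL72-class rack, from the text
cooling_limit_kw = 300  # Andy's rough liquid-cooling ceiling

doublings = 0
kw = current_rack_kw
while kw * 2 <= cooling_limit_kw:
    kw *= 2
    doublings += 1

print(f"{doublings} full doubling(s) of rack power headroom "
      f"({current_rack_kw} -> {kw} kW, ceiling ~{cooling_limit_kw} kW)")
```

One clean doubling (120 to 240 kW) fits under the ceiling with room to spare, which is exactly the "at least a generation of process shrink" available from the system side alone.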
It would look like this: we push HBM density as far as possible on a single chip, with 64 stacks of 16-hi HBM, or over 500 gigabytes of HBM. Andy even talked about using XDDR to scale memory further out of the package.
And what’s more, panel-level packaging means we can likely scale out even more silicon die area. With larger packages and substrates, advanced packaging can now scale out to 10x+ the size of current CoWoS packages. The goal would be to put as much silicon area as thermally possible into a single liquid-cooled rack.
So now imagine we can put 4-10x more silicon area in a single package and have the means to cool at least 2x more power. Assuming some power savings, the goal is to put as much silicon as possible into a rack, cool it as well as possible, and interconnect it with passive copper. Only after passive copper, all within the NVLink domain, do we talk about optics and DSPs. The goal is to scale up before you need to pay for networking, and Nvidia will push this system scaling while simultaneously pursuing silicon scaling. This is a compelling scale-up roadmap.
This system solution will be an order of magnitude cheaper than optics and probably the best price-to-performance you can buy. Nvidia is creating another layer of networking moat around its GPUs, and if they pull off LPO first, that will be yet another layer of networking advantage. This is also pretty much why CXL is dead: it makes zero sense to put compute or memory over the network; it costs too much and adds complexity compared to straightforward scale-up.
I conclude that copper will reign supreme at the rack scale and can push Moore’s Law scaling further. AI networking aims to push copper networking as hard as possible before we have to use optics, and Andy made the same point at OFC.
Now that liquid cooling has opened a relatively easy frontier of scaling, there will be a race to push it. Nvidia is already using this level of integration as a vast new vector of differentiation, looking for better solutions and ways to buy the most flops under the tightest power constraint. It’s time to think about Moore’s Law beyond the narrow confines of the chip and at the level of the system.
The new Moore’s Law is about pushing the most compute into a rack. Framing Nvidia’s networking moat as InfiniBand versus Ethernet completely misses the point. I think the NVLink domain over passive copper is the new benchmark of success, and it will make a lot of sense to buy GB200 NVL72 racks instead of just B200s.
It’s a new era in system design using larger substrates, denser memory, and passive copper to keep the information as close together as possible. This was already being pursued at the chip level; now, it’s happening at the rack level.
In this world, the leaves of the fat tree architecture become even denser. The fat leaves in this architecture will try to consume as much computing and memory as possible before scaling out to the network is even needed. Nvidia is cleverly trying to eat the network from the bottom up.
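The "fat leaves" point can be quantified. The 72-GPU NVLink domain is Nvidia's announced NVL72 spec; the 1-million-accelerator cluster size echoes the thought experiment earlier in the piece, and the framing is illustrative:

```python
# Sketch: how far scale-up (NVLink over copper) gets you before scale-out
# (optics) is required. 72 GPUs per rack is the announced NVL72 spec; the
# million-accelerator cluster is the thought experiment from earlier.

nvlink_domain = 72        # GPUs per GB200 NVL72 rack (the scale-up "leaf")
cluster_gpus = 1_000_000  # the "1 million accelerators" thought experiment

racks = -(-cluster_gpus // nvlink_domain)  # ceiling division
print(f"{racks} NVLink domains; only the links between them need optics")
```

Every link inside a leaf rides cheap passive copper; optics only pays for the inter-rack hops, so the fatter the leaf, the smaller the share of the cluster the network has to carry.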
Meanwhile, Broadcom is pursuing scale-out from the top of the rack down, but given the cost and scalability of copper, I think the energy and performance of scaling up from the leaves make a lot more sense. The tightly integrated mainframe-style solution Nvidia offers will be the best in performance. And nowhere in this conversation is AMD, which will be trying to scale as a component in a network via open consortiums.
The strategy of scaling out racks is clever and completely orthogonal to the previous ways we scaled chips. The hyperscalers, while likely aware of the benefits, probably didn’t foresee the roadmap as fully as Nvidia has defined and played it. I think it’s time to start thinking about scaling Systems of Chips, and Nvidia, as usual, has already thought through and deployed the first edition of that future.
I will post more competitive thoughts and specific aspects of rack scaling behind the paywall. I believe the hyperscalers’ infrastructure edge is one of the most overestimated competitive advantages in the market, and they are the most at risk in this data center scale-out regime.