The Rise of Amazon's Trainium 2
The second real hyperscaler custom silicon program will make big waves in the semiconductor industry. Amazon is waking up.
In my opinion, this week’s big news in the semiconductor industry is the ramp of Trainium. It’s a bit subtle, but I will walk you through the secondary impacts and then point to who I think benefits most.
Let’s start at the source and walk backward from there. Amazon finally woke up. SemiAnalysis’s piece on why Amazon will lose the future of AI has pretty much turned out correct. AWS has a fundamentally different view of how to run infrastructure, and most of that view comes down to pursuing an Amazon Basics approach at scale for the AWS business. It’s this mindset that has let Azure catch up.
But I think there’s a vibe shift happening here. Amazon is starting to crank capex higher, and most of that spend will go toward AWS and AI infrastructure. Here’s the relevant language from the latest earnings call:
Now turning to our capital investments. As a reminder, we define these as a combination of cash CapEx plus equipment finance leases. Year-to-date capital investments were $51.9 billion. We expect to spend approximately $75 billion in CapEx in 2024. The majority of the spend is to support the growing need for technology infrastructure. This primarily relates to AWS as we invest to support demand for our AI services while also including technology infrastructure to support our North America and international segments.
Now, the interesting part was the specific language from Andy Jassy. He suspects they will spend more in 2025, with most of the increase driven by generative AI. Amazon’s AI business is growing 3x faster than AWS did at the same stage of its evolution, and I think this phrasing marks Day 1 for Amazon’s GenAI push.
I suspect we'll spend more than that in 2025. And the majority of it is for AWS, and specifically, the increased bumps here are really driven by generative AI.
Our AI business is a multibillion-dollar business. It's growing triple-digit percentages year-over-year, and it's growing 3x faster at its stage of evolution than AWS did itself. We thought AWS grew pretty fast.
And so the thing to remember about the AWS business is the cash life cycle is such that the faster we grow demand, the faster we have to invest capital in data centers and networking gear and hardware.
That all sounds like accelerating investment to me. The wonderful thing about semiconductors is that the supply chain almost always picks this up, and I’ve seen read-throughs of the Trainium ramp at two big companies: Advantest and Monolithic Power.
Advantest surprised me, as this was a marked acceleration versus expectations. It was roughly a 30% beat, and I think it was likely the tester ramp for Trainium. Remember, Advantest holds the SoC tester segment that specifically serves AI devices.
Semiconductor and Component segment: ¥145.5B vs. StreetAccount consensus of ¥110.31B
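For reference, working the beat out from the figures above:

(¥145.5B − ¥110.31B) / ¥110.31B ≈ 31.9%

so roughly a 30% upside surprise versus consensus.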
On their call, Advantest discussed quickly expanding supply for a customer. While that capacity has historically gone to Nvidia’s systems, I believe the comment below lines up with Trainium’s timing and with Amazon’s own comments on supply. Here is Advantest saying they are expanding supply quickly:
I would say that in terms of our discussions with customers, since July, we are working to accelerate to our procurement to meet customer demand and expand our production. And for HPC/AI demand, we are -- we have expanded our supply capabilities in a very short period of time. And we think that our second quarter demonstrates our efforts, which is why our second quarter number was very strong.
Amazon specifically said that they have gone back to their manufacturing partners a couple of times now to ask for more supply.
And we have a lot of customer interest. We have gone back to our manufacturing partners a couple of times now to produce a lot more Trainium than we anticipated. Some of that, for sure, is due to the fact that we have very large demand, and we want more capacity and supply to be able to provide them.
Given that testing usually leads chip shipments by about a quarter, this matches up pretty well with the timing of the capacity additions. And that wasn’t the only read-through I found. Monolithic Power mentioned that a hyperscaler customer with its own “tensor processor” that isn’t Google’s TPU is ramping. That’s Trainium2.
We have other ones like SoC side of the market segments that hasn't really ramped up yet or started ramping. And the other ones like other hyperscales company, cloud computing companies and their own SoCs and their own TPUs, they call it the tensor processors. And those one is still small, and we're ramping in the next few quarters, okay? And as Bernie said, MPS, in the past, we always emphasize diversity. And we will not be known to be an AI power supply company.
I think the Trainium ramp is going to be a lot larger than most expect. Amazon is finally waking up to the generative AI battle and is coming in with its custom silicon in a big way. Trainium1 was not exactly the product they hoped for, and this is their second and likely better attempt at the market.
I think this is going to be narratively interesting. I do not think Amazon is incorporated into Nvidia’s numbers, and I believe that Nvidia’s GB200 is a much better product than Trainium2. That said, Nvidia has always looked a bit cheap on forward estimates, more as a function of fear of “peak revenue.”
To me, Trainium is the first qualitatively different data point for the peak-revenue case. Customers moving off your platform sounds scary, and until now it has just been Nvidia and Google’s TPU. There’s now a third high-volume training processor in production, and it’s ramping.
The market will soon focus on it myopically, and the Nvidia bears will come out to play. I don’t have a view on how successful it will be at training large models, but I do have a view that it will mean a lot of revenue for a few companies.
Now, let’s turn to the primary beneficiary (behind the paywall).