R2: What it Means to be 1 Less Than S3
In the battle of bandwidth and compute, Cloudflare has a strong hand. A dive into networking infrastructure.
What’s R2 and Why It Matters
Cloudflare launched R2 last week, a cloud storage platform meant to compete with Amazon S3. It’s cheaper than the incumbents, but the real linchpin is that Cloudflare has all but eliminated egress fees. Egress fees are what cloud platforms charge to move data out of their clouds. Cloudflare doesn’t charge egress for data that stays on its platform, only for data that leaves its cloud. Developers hate egress fees: they are the bane of the multi-cloud dream and encourage lock-in.
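To make that concrete, here’s a rough back-of-envelope sketch in Python of what moving data out of a big-three cloud tends to cost. The ~$0.09/GB figure is an approximate first-tier list price for internet egress, used purely for illustration; real pricing is tiered and changes over time.

```python
# Rough illustration of why developers hate egress fees.
# The per-GB rate below is an approximate first-tier list price for internet
# egress from a big-three cloud; treat it as illustrative, not a quote.

EGRESS_PER_GB = 0.09  # USD per GB, approximate first-tier list price

def monthly_egress_bill(tb_moved_out: float) -> float:
    """Cost of moving `tb_moved_out` terabytes out of the cloud in a month."""
    return tb_moved_out * 1000 * EGRESS_PER_GB

for tb in (1, 10, 100):
    print(f"{tb:>4} TB out per month -> ~${monthly_egress_bill(tb):,.0f}")

# ~$90 for 1 TB, ~$900 for 10 TB, ~$9,000 for 100 TB -- every month,
# which is exactly the line item that makes multi-cloud architectures painful.
```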
Check out Corey Quinn’s feed when you search “egress”:
Not exactly a lovefest there.
Cloudflare is genuinely disrupting AWS here, and I think it’s starting to look like a case of the classic innovator’s dilemma. That isn’t exactly a new observation. Ben Thompson of Stratechery makes the case better than I ever could in his recent post “Cloudflare’s Disruption”:
“What, though, if I already had built a worldwide network of cables for my initial core business of protecting websites from distributed denial-of-service attacks and offering a content delivery network, the value of which was such that ISPs everywhere gave me space in their facilities to place my servers? Well, then I would have massive amounts of bandwidth already in place, the use of which has zero marginal costs, and oh-by-the-way locations close to end-users to stick a whole bunch of hard drives.
In other words, I would be Cloudflare: I would charge marginal rates for my actual marginal costs (storage, and some as-yet-undetermined-but-promised-to-be-lower-than-S3 rate for operations), and give away my zero marginal cost product for free. S3’s margin is R2’s opportunity.”
But that’s not what I’m here to write about. I’m a semiconductor newsletter writer, not a general tech blogger. What I want to focus on is the infrastructure, and specifically why Cloudflare can compete at all, which I think is poorly understood. First, egress fees aren’t an arbitrary cost that AWS charges customers. Second, bandwidth versus compute is a fundamental tradeoff. And lastly, Cloudflare has an interesting bandwidth advantage given its role as a CDN (Content Delivery Network) rather than a hyperscaler.
What’s the Deal with Egress Fees?
If you read people’s takes on egress fees, you’d think they are the biggest rip-off of all time. It’s clear companies make a lot of money on egress, but it’s also genuinely in their interest to discourage people from moving data out of their datacenters. Why? Because bandwidth is the bottleneck in HPC, and the same is true of datacenters.
This is a theme I come back to over and over, and it plays out not only at the micro level (advanced packaging) but at the macro level (interconnection between datacenters). In Jensen Huang’s words, “the datacenter is the new unit of compute,” and the von Neumann bottleneck that limits the micro now also limits the total speed of the datacenter at the macro level.
Imagine for a second that egress cost zero and every customer could move data out cheaply. Demand on the links leaving the datacenter would rise so much that everyone would end up waiting on requests to move data in or out. Seen that way, egress fees aren’t just exorbitant profit; they also price in an opportunity cost. Let’s see why that is, dive a bit deeper into the infrastructure, and then compare it to Cloudflare.
First, we need an overview of a typical leaf-spine architecture. This is the standard in most datacenters, chosen because it can keep adding compute and networking within the datacenter almost without limit.
It’s called leaf and spine because the spine stays the same while new leaves can be added almost indefinitely, with a high degree of interconnect between the leaves. The goal is to minimize the number of “hops” within the datacenter to keep latency down, and in a spine-leaf architecture traffic between any two leaves only ever crosses a single spine switch, because every leaf switch in the network is directly connected to every spine. This is an improvement over the older three-tier architecture.
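For readers who like to see the rule written down, here is a minimal sketch (with hypothetical switch counts) of that connectivity property: every leaf links to every spine, so any leaf-to-leaf path crosses exactly one spine.

```python
# Minimal sketch of leaf-spine connectivity (hypothetical sizes).
# Rule: every leaf switch links to every spine switch, so traffic between
# any two leaves is always leaf -> spine -> leaf.

from itertools import combinations

SPINES = [f"spine{i}" for i in range(4)]
LEAVES = [f"leaf{i}" for i in range(16)]

# Full bipartite mesh between the leaf layer and the spine layer.
links = {(leaf, spine) for leaf in LEAVES for spine in SPINES}

def path(src_leaf: str, dst_leaf: str) -> list[str]:
    """Switch-level path between servers attached to two different leaves."""
    # Every leaf reaches every spine directly, so any single spine will do.
    return [src_leaf, SPINES[0], dst_leaf]

# Every pair of leaves is separated by exactly one spine switch.
assert all(len(path(a, b)) == 3 for a, b in combinations(LEAVES, 2))
print(f"{len(LEAVES)} leaves x {len(SPINES)} spines = {len(links)} links; "
      f"every leaf pair is one spine hop apart")

# Scaling out is just appending to LEAVES (and cabling the new leaf to every
# spine) -- the path length between servers never gets longer.
```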
There are massive benefits, specifically scale and the ability to keep scaling. More and more servers can be added to the network without much thought about latency or network complexity within the datacenter. This is perfect for AWS and the other hyperscalers: a performant infrastructure that can scale indefinitely.
The problem comes at the network level. While there is essentially unlimited bandwidth within the datacenter (East-West traffic), there’s a physical limit on what can leave the datacenter (North-South traffic). This hasn’t really been a concern, because egress fees disincentivize data leaving the datacenter and East-West traffic has been the major driver of data growth over the last few years. I’d guess a datacenter’s total interconnect within the building is orders of magnitude larger than the interconnect leaving it. A Cisco report, the source of the pie chart below, puts East-West traffic at 85% of datacenter data movement. There’s no point optimizing for something that isn’t the engineering problem.
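To get a feel for how lopsided that is, here’s a toy model with made-up but directionally reasonable numbers: aggregate East-West capacity versus the North-South uplinks leaving the building, and what happens if even a small fraction of servers tries to egress at once.

```python
# Toy model of the North-South bottleneck. All numbers are assumptions,
# chosen only to show the shape of the problem: a datacenter has far more
# aggregate East-West bandwidth than it has uplink capacity leaving the building.

SERVERS = 50_000
NIC_GBPS_PER_SERVER = 100        # assumed East-West capacity per server
DC_UPLINK_GBPS = 10_000          # assumed total North-South capacity (10 Tb/s)

east_west_capacity = SERVERS * NIC_GBPS_PER_SERVER   # 5,000,000 Gb/s
oversubscription = east_west_capacity / DC_UPLINK_GBPS

print(f"East-West capacity : {east_west_capacity:,} Gb/s")
print(f"North-South uplink : {DC_UPLINK_GBPS:,} Gb/s")
print(f"Oversubscription   : {oversubscription:,.0f}x")

# Even if only 1% of servers tried to push data out at line rate, demand would
# be 50,000 Gb/s against a 10,000 Gb/s exit -- 5x oversubscribed, and everyone
# queues. That queue is the opportunity cost egress fees are pricing in.
demand = 0.01 * east_west_capacity
print(f"1% of servers egressing -> {demand / DC_UPLINK_GBPS:.0f}x the uplink")
```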
This is the status quo of datacenter infrastructure today. Now compare it to CDN infrastructure, which is essentially a distributed mesh of computers around the globe connected to ISPs. Each CDN server looks similar to a server in a datacenter, but without many of the shared scale benefits hyperscalers enjoy. The one key difference is connectedness: the CDN essentially is the network. Below is a diagram of CDN topologies, with the third and most interconnected option being the one most modern CDNs deploy.
Take a look at the basic building block, the network device and host, and compare it to just the leaf from the spine-leaf topology. It’s very similar! In fact, it’s almost identical, except that the datacenter leaves usually have even faster SmartNICs (4x50Gb versus Cloudflare’s 2x25Gb), while CDN servers are directly connected to a network device (switch) that is in turn connected to an ISP.
This is the important point: a CDN skips a hop and avoids the dreaded ultimate bottleneck, the link out of the datacenter itself, because CDN nodes are optimized for bandwidth and interconnection. This is where the core tradeoff really shows up: bandwidth versus compute. Datacenters are optimized for unlimited bandwidth within the datacenter, while CDNs are optimized for unlimited bandwidth to the network. Scaled compute versus bandwidth.
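Here’s the same comparison as rough arithmetic. The NIC figures are the ones mentioned above; the hop counts are the simplified view used in this post, not an exact traceroute.

```python
# Back-of-envelope comparison of the two building blocks described above.
# NIC figures follow the rough specs mentioned in the text; hop counts are the
# simplified mental model used here, not measured paths.

hyperscaler_server = {
    "nic_gbps": 4 * 50,      # 200 Gb/s, but pointed at the East-West fabric
    "hops_to_internet": 3,   # leaf -> spine -> border router -> ISP (simplified)
}
cdn_server = {
    "nic_gbps": 2 * 25,      # 50 Gb/s, but pointed almost straight at the network
    "hops_to_internet": 1,   # switch -> ISP (simplified)
}

for name, node in [("hyperscaler", hyperscaler_server), ("cdn", cdn_server)]:
    print(f"{name:>12}: {node['nic_gbps']} Gb/s per server, "
          f"{node['hops_to_internet']} hop(s) to reach the internet")

# The hyperscaler server has 4x the raw NIC bandwidth, but that bandwidth is
# optimized for traffic that stays inside the building. The CDN server's
# smaller pipe terminates (almost) directly at an ISP.
```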
Compute vs Bandwidth
Each form factor (datacenter and CDN) has a different optimization in mind. At the hyperscaler datacenter, the point is economies of scale, and every bit of silicon deployed gets used at the lowest marginal rate per hour. With everything from deploying custom silicon to improve throughput (AWS Nitro) to buying energy in bulk off the grid, a hyperscaler datacenter is the leanest and meanest computer in the world.
Compare that to a CDN network, whose primary objective is to get content as close to the user as possible, prioritizing bandwidth and ease of access over raw compute. These are fully fledged servers, but they often sit in edge datacenters that profit by hosting CDN hardware for the likes of Cloudflare. The benefit is that each server sits close to users and connects to a switch that is almost directly attached to the internet.
Cloudflare is optimized for bandwidth, and how much they have is impressive. Each tradeoff makes sense for each player.
However, the hyperscalers have done something (or rather, not done something) over the last five years that lets a non-hyperscaler compete on price without cutting its own: they have barely lowered their prices. The CEO of Cloudflare made a very pointed tweet on this subject, aimed at S3:
So there’s now a vector to compete on price (because the hyperscalers have not dropped theirs), and Cloudflare also has much more bandwidth available than the bottlenecked datacenter. This is how Cloudflare can compete, and perhaps even reach price parity or an outright advantage once you account for the opportunity cost of egress. This is a classic bundle being unbundled, and Cloudflare has a strong hand given that bandwidth seems to be more in demand than compute.
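As a sanity check on the price-parity claim, here’s an illustrative comparison using approximate list prices at the time of writing (S3 at roughly $0.023/GB-month storage and $0.09/GB egress, R2’s announced ~$0.015/GB-month with zero egress). Per-operation charges are left out because R2’s weren’t final; treat every number as approximate.

```python
# Illustrative monthly bill comparison using rough list prices at the time of
# writing: S3 ~ $0.023/GB-month storage + ~$0.09/GB internet egress, R2's
# announced ~$0.015/GB-month storage with zero egress. Operation charges are
# omitted because R2's weren't finalized. All numbers are approximate.

S3_STORAGE_PER_GB = 0.023
S3_EGRESS_PER_GB = 0.09
R2_STORAGE_PER_GB = 0.015

def monthly_bill(stored_tb: float, egress_tb: float,
                 storage_rate: float, egress_rate: float) -> float:
    gb = 1000
    return stored_tb * gb * storage_rate + egress_tb * gb * egress_rate

stored, egressed = 10, 5   # e.g. 10 TB stored, half of it read out each month
s3 = monthly_bill(stored, egressed, S3_STORAGE_PER_GB, S3_EGRESS_PER_GB)
r2 = monthly_bill(stored, egressed, R2_STORAGE_PER_GB, 0.0)

print(f"S3: ~${s3:,.0f}/month   R2: ~${r2:,.0f}/month")
# ~$680 vs ~$150 for this workload -- the gap is almost entirely the egress line.
```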
Before the reader concludes that Cloudflare now holds all the cards, with cost parity and near-zero marginal bandwidth thanks to its CDN, I want to point out that the cloud players aren’t being left out in the cold. Amazon is actually already one of the largest CDNs globally, built as a consolidated CDN on top of its cloud. The hyperscalers can, for the most part, keep building more datacenters and get something akin to more network while still profiting from shared economies of scale.
In fact, this is already happening: Microsoft’s availability zones are not a single large footprint but multiple datacenters peered locally with fast networking. The hyperscalers are already starting to split the large datacenter into smaller parts, and datacenters will likely become more numerous as a way to increase total network availability. CDNs, in contrast, will likely become beefier if they’re going to offer storage and serverless compute such as R2 and Workers.
Both are shooting for the same thing, a global network of computers that customers can rent on demand. CDNs - or more accurately an edge network - approached this problem from the bottom up. Hyperscalers are approaching it from the top down.
But the battle between bandwidth and compute has just begun, and I think Cloudflare has an interesting hand to play. Ten years ago this would have been unthinkable, but even the greatest bundles can break, and this is the first compelling unbundling of the goliath hyperscalers.
If you made it this far, please consider subscribing to or sharing my Substack! I focus on technology broadly (like networking, for example) but on semiconductors specifically.
As software eats the world, semiconductors are the teeth, and I want you, my reader, to know exactly what is going on. I try to write jargon-free primers from an investor’s perspective. Have a great day!