Thank you very much for sharing this interview with us, Doug! I found it extremely helpful in shaping my thinking.
Thoroughly enjoyed this pod. Appreciate it!
Doug, so it seems we have two curves: exploding demand for inference meeting these disaggregation architectures that let GPUs be used much more efficiently.
Is this a net negative for NVDA, since its revenue hockey stick no longer ticks up one-for-one with demand? Does the demand driver of more agentic/persistent-context workloads translate into more of this new networked memory and slower growth in installed GPUs?
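To make the two-curves tension concrete, here is a hypothetical back-of-envelope; every number below is made up purely for illustration:

```python
# Hypothetical back-of-envelope: if demand for inference tokens grows faster
# than disaggregation improves per-GPU throughput, installed GPUs still grow,
# just more slowly than demand. Both numbers are invented for illustration.
demand_growth = 3.0      # assumed YoY growth in inference demand (3x)
efficiency_gain = 1.8    # assumed tokens-per-GPU uplift from disaggregation

gpu_growth = demand_growth / efficiency_gain
print(f"Installed-GPU growth: {gpu_growth:.2f}x vs {demand_growth:.1f}x demand")
# -> Installed-GPU growth: 1.67x vs 3.0x demand
```

So the question is really which curve is steeper.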
Hi Doug, that was a fantastic post. Thanks for sharing!
I'm trying to pinpoint the inflection point for disaggregated inference. As you mentioned, deployment at scale seems to be taking off right now, although interesting papers on the idea appeared 12-18 months ago.
In your view, is this shift primarily a 'demand-pull' driven by the changing nature of AI workloads (prefill-heavy, long-context 'agentic' models)?
Or is it more of a 'supply-push' from the technology/hardware finally maturing and enabling this solution to be deployed at scale?
A bit of both. I think it becomes the standard via vLLM, but the decode scale-out for agents' output tokens helps too.
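For anyone curious what "standard via vLLM" looks like in practice, here is a rough sketch modeled on vLLM's experimental disaggregated-prefill example (circa v0.6-0.7). Config field names and connectors vary by version, and the model and prompts are placeholders, so treat it as illustrative rather than canonical:

```python
# Sketch of disaggregated prefill/decode, loosely following vLLM's
# experimental disaggregated-prefill example. Run one process per role:
#   CUDA_VISIBLE_DEVICES=0 python disagg.py prefill
#   CUDA_VISIBLE_DEVICES=1 python disagg.py decode
import sys

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

role = sys.argv[1]  # "prefill" (KV producer) or "decode" (KV consumer)
prompts = ["Explain disaggregated inference in one paragraph."]
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

if role == "prefill":
    # Prefill worker: computes the KV cache and ships it over the connector
    # instead of keeping it pinned to this GPU.
    cfg = KVTransferConfig.from_cli(
        '{"kv_connector":"PyNcclConnector","kv_role":"kv_producer",'
        '"kv_rank":0,"kv_parallel_size":2}')
    llm = LLM(model=MODEL, kv_transfer_config=cfg)
    # max_tokens=1 forces a prefill-only pass; no real decoding happens here.
    llm.generate(prompts, SamplingParams(temperature=0, max_tokens=1))
else:
    # Decode worker: receives the KV cache and generates the output tokens,
    # never paying the prefill compute itself.
    cfg = KVTransferConfig.from_cli(
        '{"kv_connector":"PyNcclConnector","kv_role":"kv_consumer",'
        '"kv_rank":1,"kv_parallel_size":2}')
    llm = LLM(model=MODEL, kv_transfer_config=cfg)
    outputs = llm.generate(prompts,
                           SamplingParams(temperature=0, max_tokens=256))
    print(outputs[0].outputs[0].text)
```

The point of the split is that prefill is compute-bound while decode is memory-bandwidth-bound, so giving each its own pool (with the KV cache moving between them) lets both kinds of hardware run closer to full utilization, and decode capacity can scale out independently as agents emit more output tokens.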