The Inference Economy: Where AI Margins Actually Come From in 2026

Sign up for ARPU: Stay informed with our newsletter - or upgrade to bespoke intelligence.

In this complimentary report, we will go through:

The Inference Inflection: Why the AI industry's economic center of gravity has decisively shifted from training to inference, and why the cost of every user interaction, every agent task, and every API call is now the unit that determines the economics of the entire stack.
The Great Token Price Collapse: A data-driven breakdown of how frontier AI performance became 280× cheaper in under two years, what drove it, and why the compression is not over — with a forward view on where pricing floors may emerge.
Who Actually Captures the Margin: A layer-by-layer anatomy of the AI inference value chain — from silicon to networking to cloud to foundation models to applications — exposing which layers are structurally profitable, which are subsidising adoption, and which are hollowing out fastest.
The Inference-Time Compute Shift: How models like OpenAI o1 and DeepSeek R1 rewrote the cost function of AI by spending more compute at query time rather than training time — and what this means for GPU demand, token economics, and how capability will be priced going forward.
