Key Takeaways
- Inference costs dropped approximately 280x over two years (2023-2025), a deflation previously seen only in semiconductor manufacturing. If demand were constant, this would reduce AI spending by 99.6%.
- Instead, AI venture funding hit $258.7B in 2025, capturing 61% of all global VC (up from 30% in 2022). OpenAI is raising $100B+ at $850B valuation; Anthropic closed $30B at $380B.
- Three structural forces explain the paradox: (1) reasoning models use 150x more compute per complex query vs. standard models, (2) Samsung targets 800M Gemini-powered consumer devices by end 2026 (+100% YoY), and (3) agentic AI represents always-on inference workloads, not single-query interactions.
- Deloitte projects inference will consume 66% of all AI compute in 2026 (up from 33% in 2023), shifting the economic model from training-dominant to inference-dominant.
- 73% of AI VC concentrated in mega-deals above $100M, with IT infrastructure ($109.3B) receiving 3x more capital than generative AI models ($35.3B).
Jevons Paradox: A 160-Year-Old Economic Pattern
William Stanley Jevons observed in 1865 that James Watt's more efficient steam engine did not reduce coal consumption—it made coal-powered applications economically viable across industries that could not previously afford them, increasing total coal demand. The pattern is now classical economics: commodity cost reduction drives demand expansion such that total spending increases despite per-unit price collapse.
The AI compute market in 2026 is experiencing the identical dynamic at an accelerated pace.
Per-token inference costs have fallen approximately 280x over two years (2023-2025), according to multiple industry reports. To put this in perspective: if a company spent $1 billion on inference in 2023, the equivalent compute in 2025 would cost approximately $3.6 million. A deflation of this magnitude has previously been seen only in semiconductor manufacturing, where cost per transistor fell exponentially for decades. If demand were constant, it would translate to a 99.6% spending reduction.
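A quick sanity check on that arithmetic, as a minimal Python sketch (the 280x factor is the cited figure; the $1B baseline is illustrative):

```python
# Sanity check on the deflation arithmetic: a 280x per-token cost drop
# applied to a fixed 2023 workload. The $1B baseline is illustrative.
COST_REDUCTION_FACTOR = 280
spend_2023 = 1_000_000_000  # $1B inference spend in 2023

spend_2025 = spend_2023 / COST_REDUCTION_FACTOR
print(f"Equivalent 2025 cost: ${spend_2025:,.0f}")                 # $3,571,429
print(f"Implied spending cut: {1 - 1/COST_REDUCTION_FACTOR:.1%}")  # 99.6%
```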
Instead, the OECD documents that AI venture capital hit $258.7 billion in 2025, capturing 61% of all global VC (up from 30% in 2022). OpenAI is closing a $100B+ round at an $850B valuation. Anthropic closed $30B at a $380B valuation. Combined, these two companies alone are raising more capital than the entire AI VC market in 2023.
This is the Jevons Paradox in real time.
Three Structural Forces: Why Cheaper Compute Drives More Spending
Force 1: Reasoning Models Consume 150x More Compute Per Query
NVIDIA demonstrated at GTC that DeepSeek R1, answering a complex problem, generates 20x more tokens and uses 150x more compute than a standard model on the same query. OpenAI's o1/o3, Claude 3.7 with extended thinking, and DeepSeek R1 all employ test-time compute scaling, a technique that reinvests falling per-token costs into dramatically more tokens per inference event.
This is the critical mechanism: cheaper tokens make expensive inference strategies viable. Before the 280x cost reduction, using 150x more compute to answer a single question was economically nonsensical. Now, it is often the right trade-off. An enterprise can afford to use reasoning models for high-value queries that previously required human expert judgment.
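To see why the trade-off flipped, normalize the cost of a 2023 standard query to 1 and apply the two cited multipliers. This is a rough sketch that assumes the per-token price collapse applies uniformly to reasoning workloads:

```python
# Cost of one query, normalized so a 2023 standard query costs 1.0.
# Assumes the ~280x price drop applies uniformly to reasoning workloads.
standard_query_2023 = 1.0
price_drop = 280           # per-token cost reduction, 2023-2025
reasoning_multiple = 150   # NVIDIA GTC: DeepSeek R1 compute vs. standard model

# A 2025 reasoning query burns 150x the compute, but each unit is 280x cheaper.
reasoning_query_2025 = standard_query_2023 * reasoning_multiple / price_drop

print(f"{reasoning_query_2025:.2f}x")  # ~0.54x: the 'expensive' strategy now
                                       # costs about half the old cheap baseline
```

Under these assumptions, a query that was 150x too expensive to run in 2023 now undercuts the 2023 baseline in absolute dollars.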
Deloitte projects inference will consume 66% of all AI compute in 2026 (up from 33% in 2023). This is not a gradual shift; it is a structural inversion of AI economics.
Force 2: Consumer Device Proliferation at Scale
Samsung will embed Gemini in 800 million devices by end of 2026, doubling from 400 million in 2025. Google is investing $15B in Indian AI infrastructure, targeting a market where India already leads in daily student Gemini usage. When 800M+ devices each make dozens of inference calls daily, the aggregate compute demand dwarfs enterprise workloads.
This is pure demand expansion. These are not existing workloads running cheaper on better hardware—these are new inference endpoints that did not exist in 2023. Each additional Gemini device is a permanent addition to the global inference fleet.
The math is straightforward: 800M devices × 50 inferences/day × 365 days = 14.6 trillion inference calls per year. Even at 280x cheaper per-token pricing, this creates an enormous absolute compute bill. At $0.00001 per token (an aggressively low price) and an average of 200 tokens per call, those 14.6 trillion calls represent $29.2B in annual compute spend from Samsung devices alone.
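The same arithmetic in code, with the assumed inputs labeled (the call rate, tokens per call, and per-token price are illustrative, not reported figures):

```python
# Aggregate inference demand from the Samsung device fleet described above.
# Only the 800M device target is a cited figure; the per-device call rate,
# tokens per call, and per-token price are illustrative assumptions.
devices         = 800_000_000  # Samsung's end-2026 Gemini device target
calls_per_day   = 50           # assumed average per device
tokens_per_call = 200          # assumed average
price_per_token = 0.00001      # $ per token, deliberately aggressive low price

annual_calls  = devices * calls_per_day * 365   # 14.6 trillion calls/year
annual_tokens = annual_calls * tokens_per_call
annual_spend  = annual_tokens * price_per_token

print(f"Annual inference calls: {annual_calls:.2e}")       # ~1.46e+13
print(f"Annual compute spend:   ${annual_spend/1e9:.1f}B") # ~$29.2B
```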
Force 3: Agentic AI Represents Always-On Inference
Unlike chatbots (single query-response cycle), AI agents run continuously: monitoring, planning, executing, evaluating. Gartner predicts 40% of enterprise applications will embed task-specific agents by end of 2026. Each running agent represents a perpetual inference workload. Even at 280x cheaper per-token pricing, an agent running 24/7 and generating thousands of reasoning chains costs more than occasional human queries.
Consider a simple example: a supply chain optimization agent monitoring supplier inventory across 1,000 SKUs. It polls data every 5 minutes, generates a reasoning chain analyzing procurement options, and recommends actions. Assuming one reasoning chain per SKU per cycle, that is 288 inference cycles per day × 365 = 105,120 inference calls annually per SKU, or roughly 105 million calls per year across the 1,000 SKUs from a single agent. Multiply across the market (40% of enterprise apps embedding agents by end 2026) and you reach infrastructure-scale compute demand.
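The agent's inference volume in code, with the one non-obvious assumption made explicit (one reasoning chain per SKU per polling cycle):

```python
# Inference volume for the hypothetical supply chain agent above.
# Polling interval and SKU count come from the example; the assumption
# that each SKU triggers its own reasoning chain per cycle is explicit.
MINUTES_PER_DAY   = 24 * 60
poll_interval_min = 5
skus              = 1_000

cycles_per_day   = MINUTES_PER_DAY // poll_interval_min  # 288
calls_per_sku_yr = cycles_per_day * 365                  # 105,120
calls_per_agent  = calls_per_sku_yr * skus               # ~105 million

print(f"{cycles_per_day} cycles/day -> {calls_per_sku_yr:,} calls/SKU/year")
print(f"Fleet total for one agent: {calls_per_agent:,} calls/year")
```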
Again: cheaper inference makes this economically viable. It was not viable in 2023.
Capital Concentration Confirms the Thesis
The capital concentration pattern provides strong evidence that Jevons Paradox is operating. OECD data shows 73% of AI venture capital goes to mega-deals above $100M. The money is not flowing to cost-saving efficiency plays—it is flowing to companies building new compute-intensive capabilities that cheaper inference makes viable.
IT infrastructure and hosting absorbed $109.3B in 2025 AI VC, more than 3x the $35.3B going to generative AI models. The infrastructure layer is the primary beneficiary of the Jevons dynamic. Hyperscalers (AWS, Azure, Google Cloud) and specialized inference providers (Groq, Cerebras, Together AI) capture disproportionate capital because they control the hardware that serves inference at scale.
Frontier model labs (OpenAI, Anthropic) need massive capital to fund inference infrastructure, not to train new models. The $100B OpenAI is raising is primarily for inference capacity, not training compute. The shift from training-dominated to inference-dominated economics is well underway.
| Metric | Direction | Change | Time Period |
|---|---|---|---|
| Per-Token Inference Cost | Down | ÷280 (-99.6%) | 2023-2025 |
| AI VC (2025) | Up | $258.7B (61% of all VC) | 2022-2025 |
| Reasoning Model Compute/Query | Up | 150x vs. standard models | 2025-2026 |
| Gemini Consumer Devices (2026) | Up | 800M (+100% YoY) | 2025-2026 |
| Inference as % of AI Compute | Up | 66% (from 33%) | 2023-2026 |
Table: The Jevons Paradox in Numbers. Key metrics showing that cost reduction drives demand expansion, not spending reduction. Source: Deloitte, OECD, NVIDIA GTC, Samsung CES 2026.
Who Wins from the Jevons Paradox in AI
Infrastructure providers are the primary beneficiaries:
- NVIDIA: Cheaper inference increases demand for inference-optimized GPUs (H100, H200, Blackwell B200). The company wins on volume and specialization.
- Cloud Hyperscalers: AWS, Azure, and Google Cloud sit at the center of the $109.3B flowing to IT infrastructure. Cheaper per-token pricing expands the TAM for everyone, but hyperscalers capture share through scale economics.
- Specialized Inference Providers: Companies like Groq, Cerebras, and Together AI benefit from the expanded total market, even if they face commoditization from NVIDIA's optimized inference models (Nemotron 3).
Who does NOT win:
- Frontier model labs: OpenAI and Anthropic maintain high margins on API access but face pressure from open models achieving comparable quality at a fraction of the cost. The $100B capital raise signals defensiveness: infrastructure investment to maintain competitive advantage as open models improve.
- Traditional software vendors: Companies selling enterprise AI solutions face margin compression as inference costs fall. The value prop shifts from 'we have proprietary AI' to 'we have proprietary data + AI at commodity inference cost.'
Contrarian Perspective: When Jevons Paradox Fails
Jevons Paradox assumes demand elasticity is high enough that cost reduction stimulates more than proportional usage growth. If AI applications hit genuine utility ceilings, the paradox could invert.
Consider the evidence:
- Enterprise AI plateau risk: Most enterprise workflows do not benefit from chain-of-thought reasoning. A customer service chatbot using o1-style test-time compute is overkill; it needs fast, cheap inference, not expensive multi-step reasoning. If the median enterprise AI workload is this kind of 'cheap inference' use case, total compute demand may plateau despite cost reduction.
- Consumer awareness plateau: Samsung's 800M device target is ambitious. Galaxy AI consumer awareness is currently ~80%, but awareness does not translate to daily active usage. If Gemini remains a novelty feature rather than a daily utility, the per-device inference volume could be lower than assumed, flattening demand growth.
- Cost reduction deceleration: The 280x cost reduction is a trailing metric. The rate of improvement is decelerating as inference hardware approaches physical limits on power efficiency and latency. Future cost reductions may be 10x or 20x, not 280x, weakening the stimulus to demand expansion.
- Geopolitical fragmentation: US export controls on AI chips to China create two separate compute markets with different cost structures. This breaks the unified demand expansion thesis assumed by Jevons Paradox.
The capital inflows driving $258.7B in AI VC could represent peak cycle rather than structural demand growth. By 2028, we will know whether Jevons or Malthus wins.
What This Means for Practitioners
ML engineers and infrastructure teams should plan for INCREASING compute budgets despite falling per-unit costs. This is the core Jevons insight. Reasoning models and agentic workflows will consume dramatically more tokens per user session than chatbot-era workloads.
Specifically:
- Cost planning: Develop budget models that assume 2x-3x increases in per-user compute demand through 2026, despite 50% reductions in per-token pricing; the two effects partially offset (see the budget sketch after this list).
- Capacity planning: Inference capacity requirements will grow faster than user growth. Plan GPU allocations for a 100% user increase but a 300-400% inference compute increase.
- Model selection: Reasoning models (o1, o3, extended thinking variants) are now economically viable for high-value queries. Evaluate test-time compute scaling as a feature, not a cost outlier.
- Agent infrastructure: Agentic workloads create permanent inference demand. This is fundamentally different from chatbot economics. Build infrastructure assuming agents run 24/7, not on-demand.
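A minimal budget sketch tying these guidelines together. The growth and price-decline inputs are the planning assumptions from the list above, not forecasts:

```python
# Minimal inference budget model: per-token prices fall while per-user
# token demand and the user base grow, and the demand effect dominates.
# All growth and price inputs are the planning assumptions stated above.
def projected_inference_budget(
    current_annual_spend: float,
    demand_growth: float = 3.0,  # 3x per-user compute demand (upper planning bound)
    price_decline: float = 0.5,  # 50% cheaper per token
    user_growth: float = 2.0,    # 100% more users
) -> float:
    """Return projected annual spend after demand, price, and user-base shifts."""
    return current_annual_spend * demand_growth * price_decline * user_growth

# Example: a $10M/year inference budget today.
print(f"${projected_inference_budget(10_000_000)/1e6:.0f}M")  # $30M: spend triples
```

The point of the sketch is the sign of the result: even with per-token prices halving, the budget grows, which is the Jevons dynamic restated in planning terms.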