
The 280x Cost Collapse Is Accelerating AI Spending: Jevons Paradox in Action

AI inference costs dropped 280x from 2022 to 2024, yet enterprise AI spending surged 320% in 2025. This is Jevons Paradox: efficiency gains expand addressable markets rather than reduce total spend. NVIDIA Rubin and DeepSeek V4 compound this effect, while agentic AI creates continuous token consumption at scale.

jevons paradox · ai inference cost · nvidia rubin · deepseek · agentic ai · 3 min read · Mar 9, 2026

Key Takeaways

  • AI inference costs fell 280x from $20/M to $0.07/M tokens (2022-2024), yet enterprise spending surged 320% in 2025
  • NVIDIA Rubin (H2 2026) and DeepSeek V4 compound efficiency gains: 10x hardware × 10x software ≈ 100x theoretical improvement
  • Agentic AI is the demand multiplier: autonomous supply chain agents run 24/7, generating continuous token consumption vs sporadic chatbot usage
  • GPU supply remains constrained despite software deflation; NVIDIA cutting gaming production 30-40% to prioritize data center
  • The paradox holds: total compute demand grows faster than efficiency gains, benefiting hardware makers despite per-token price collapse

The Paradox: Cheaper = More Spending

The most counterintuitive dynamic in AI infrastructure today is that the fastest cost deflation in technology history is simultaneously creating the most acute hardware shortage. Stanford's 2025 AI Index Report documents a 280-fold reduction in inference costs from November 2022 to October 2024 — from $20 per million input tokens to $0.07/M for GPT-3.5-level performance.

Yet enterprise AI spending surged 320% in 2025. Microsoft's AI revenue reached $13 billion, up 175% year-over-year. Satya Nadella explicitly invoked Jevons Paradox — the 1865 observation that more efficient coal engines increased total coal consumption by making steam power economically viable for new applications.

The mechanism in AI is concrete: a customer service chatbot that cost roughly $2,000/month in 2022 now costs about $7/month. At $7/month, every department in every company can afford AI. The median enterprise went from one or two AI applications in 2023 to dozens by the end of 2025.
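The arithmetic is easy to verify with a back-of-the-envelope sketch in Python. The per-token prices are the Stanford AI Index figures cited above; the conversation volume and tokens per conversation are illustrative assumptions, chosen to land near the $2,000/month 2022 figure.

    # Monthly chatbot cost at 2022 vs 2024 token prices.
    # Prices: Stanford AI Index; workload numbers are illustrative assumptions.
    TOKENS_PER_CONVERSATION = 1_000   # assumed average tokens per conversation
    CONVERSATIONS_PER_DAY = 3_300     # assumed volume (~$2,000/mo at 2022 prices)
    DAYS_PER_MONTH = 30

    monthly_tokens = TOKENS_PER_CONVERSATION * CONVERSATIONS_PER_DAY * DAYS_PER_MONTH

    for label, price_per_m_tokens in [("Nov 2022", 20.00), ("Oct 2024", 0.07)]:
        cost = monthly_tokens / 1_000_000 * price_per_m_tokens
        print(f"{label}: ${cost:,.0f}/month at ${price_per_m_tokens}/M tokens")
    # Nov 2022: ~$1,980/month; Oct 2024: ~$7/month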

AI Cost Deflation vs Demand Expansion

Key metrics showing the Jevons Paradox dynamic: costs collapse but total spending accelerates.

  • 280x: inference cost reduction (2022-2024), from $20 to $0.07/M tokens
  • +320%: enterprise AI spending growth (2025), despite the unit cost collapse
  • 10x: Rubin MoE inference improvement vs Blackwell (H2 2026)
  • 20x: DeepSeek V4 vs GPT-5 cost gap ($0.30 vs $2.80/M tokens)

Source: Stanford AI Index / Analytics Week / NVIDIA / DeepSeek

Hardware and Software Efficiency Multiply

The 280x cost reduction from 2022-2024 is being compounded by two convergent 10x improvements arriving in H2 2026:

Software efficiency: DeepSeek V4 achieves frontier-competitive performance at $0.30 per million input tokens (V3 was originally reported at $0.14) through its Mixture-of-Experts architecture. The ~1 trillion parameter model activates only ~32B parameters per token, requiring approximately 250 GFLOPs per token versus 2,448 GFLOPs for a comparable dense model, a 10x compute reduction at the algorithmic level.
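A minimal sketch of top-k expert routing shows why sparse activation cuts compute. This is a toy illustration of the Mixture-of-Experts idea, not DeepSeek's actual router; the dimensions, expert count, and k below are arbitrary assumptions.

    import numpy as np

    # Toy top-k Mixture-of-Experts layer: only k of num_experts run per token,
    # so active parameters are a small slice of the total.
    def moe_forward(x, experts, router_w, k=2):
        scores = x @ router_w                  # router logits, one per expert
        top_k = np.argsort(scores)[-k:]        # indices of the k best experts
        gates = np.exp(scores[top_k])
        gates /= gates.sum()                   # softmax over selected experts only
        # Weighted sum of the k selected experts; the others never execute.
        return sum(g * experts[i](x) for g, i in zip(gates, top_k))

    rng = np.random.default_rng(0)
    d, num_experts = 16, 8
    experts = [lambda x, W=rng.standard_normal((d, d)) / np.sqrt(d): x @ W
               for _ in range(num_experts)]
    router_w = rng.standard_normal((d, num_experts))
    y = moe_forward(rng.standard_normal(d), experts, router_w, k=2)
    print(y.shape)  # (16,) -- computed with 2 of 8 experts active

With 2 of 8 experts active, only a quarter of the expert weights participate per token; the same mechanism is what keeps DeepSeek's active parameters at ~32B out of ~1T.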

Hardware efficiency: NVIDIA Rubin delivers up to 10x lower cost per token for MoE inference versus Blackwell, with the NVL72 providing 3.6 exaFLOPS of NVFP4 inference. Major hyperscalers and OpenAI have committed to H2 2026 deployments at gigawatt scale.

Running MoE models on Rubin hardware creates a theoretical ~100x inference cost improvement when both optimizations are combined. Cumulative cost deflation from 2022 baselines approaches 28,000x for equivalent capability — orders of magnitude beyond anything observed in semiconductor scaling.
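Stacking those multipliers reproduces the figures above:

    # Compounding the claimed gains (all multipliers from this article).
    realized_2022_2024 = 20.00 / 0.07      # ~286x, the "280x" deflation
    software_gain = 10                     # MoE vs dense, per-token compute
    hardware_gain = 10                     # Rubin vs Blackwell, MoE inference

    combined = software_gain * hardware_gain       # ~100x theoretical
    cumulative = realized_2022_2024 * combined     # vs the 2022 baseline
    print(f"Combined Rubin + MoE: ~{combined}x")
    print(f"Cumulative from 2022: ~{cumulative:,.0f}x (the article's ~28,000x)")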

Customer Service Chatbot Monthly Cost (10K Conversations/Day)

[Chart: real-world cost trajectory showing why AI deployment is expanding to every department. Source: Stanford AI Index / NVIDIA Rubin extrapolation]

Agentic AI: The Demand Multiplier

Agentic AI amplifies Jevons Paradox by converting per-token cost savings into 24/7 continuous consumption. SAP reports supply chain agents achieving 25% lead-time reductions through autonomous inventory rebalancing; each agentic workflow generates continuous token consumption at volumes that dwarf interactive chatbot usage.

Instead of a customer service bot responding to sporadic queries, autonomous agents execute transactions continuously: procurement, logistics, inventory rebalancing — without per-transaction human approval. Gartner projects 40% of enterprise applications will embed AI agents by end of 2026, up from less than 5% in 2025.
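A rough sizing sketch shows the scale gap between sporadic and always-on workloads. Every parameter below is an illustrative assumption, not a reported figure.

    # Daily token consumption: interactive chatbot vs always-on agent.
    # All workload parameters are illustrative assumptions.
    SECONDS_PER_DAY = 86_400

    # Chatbot: runs only when a user shows up.
    chatbot_tokens = 500 * 2_000                 # 500 chats/day x ~2k tokens

    # Agent: re-plans around the clock, e.g. one ~5k-token step every 30 s.
    agent_steps = SECONDS_PER_DAY // 30          # ~2,880 steps/day
    agent_tokens = agent_steps * 5_000

    print(f"Chatbot: {chatbot_tokens / 1e6:.1f}M tokens/day")
    print(f"Agent:   {agent_tokens / 1e6:.1f}M tokens/day "
          f"(~{agent_tokens / chatbot_tokens:.0f}x the chatbot)")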

This is the feedback loop that confirms Jevons Paradox in real time: the cheaper AI becomes, the more constantly it runs.

The Binding Constraint: Hardware Supply

NVIDIA is cutting gaming GPU production 30-40% in early 2026 to redirect fab capacity to data center chips, which generate 12x more revenue per unit. HBM memory and power infrastructure remain binding constraints. AMD's Helios (MI455X), launching H2 2026, adds competitive supply, but total demand growth is outpacing supply expansion.
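The incentive is straightforward to quantify, under the simplifying assumption that a gaming die and a data center die consume comparable fab capacity:

    # Fab reallocation incentive (12x figure from the article; unit equivalence
    # between gaming and data center dies is a simplifying assumption).
    gaming_units = 100              # baseline gaming output, arbitrary units
    cut = 0.35                      # midpoint of the reported 30-40% cut
    dc_revenue_multiple = 12        # data center revenue per unit vs gaming

    redirected = gaming_units * cut
    print(f"Redirect {redirected:.0f} units: give up {redirected:.0f} units of "
          f"gaming revenue, gain {redirected * dc_revenue_multiple:.0f} units of "
          f"data center revenue")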

This creates a paradoxical situation: the software efficiency that should reduce demand for hardware is instead increasing it faster than fabrication capacity can grow. The GPU shortage of 2025-2026 is not a temporary supply chain issue — it reflects structural demand growth that efficiency has amplified rather than dampened.

What This Means for Practitioners

Plan for aggregate compute demand growth of 5-10x even as per-token costs fall 10x. If total consumption grows at or beyond the rate costs fall, infrastructure budgets stay flat or rise even as capability per dollar climbs dramatically (a rough budget sketch follows the list below). Infrastructure teams should:

  • Assume total token consumption will grow 5-10x from current baselines over the next 18 months
  • Budget for continuous agentic workloads, not just interactive/batch AI usage
  • Prioritize inference optimization and MoE architecture adoption to stay ahead of hardware constraints
  • Plan multi-region deployments to distribute load across competing GPU suppliers (NVIDIA, AMD)
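A minimal budget sketch under those planning assumptions (the baseline spend is illustrative):

    # Projected monthly spend: demand grows 5-10x while per-token cost falls ~10x.
    baseline_spend = 100_000        # $/month today (illustrative)
    cost_deflation = 10             # per-token prices fall ~10x

    for demand_growth in (5, 10):
        projected = baseline_spend * demand_growth / cost_deflation
        print(f"{demand_growth}x demand / {cost_deflation}x cheaper: "
              f"${projected:,.0f}/month ({demand_growth / cost_deflation:.1f}x budget)")
    # Demand at or above the deflation rate keeps total spend flat or rising.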

The winning strategy is aggressive deployment expansion enabled by falling unit costs, not cost reduction on existing workloads.
