Open-Weight Edge Inference Economics
Key metrics showing why local deployment becomes economically dominant for commodity workloads
Source: Mistral AI, Lightricks, Intel, OpenAI
The API pricing market is being squeezed simultaneously from above and below. GPT-5.4 Nano at $0.20/$1.25 per million (input/output) tokens signals that frontier providers see the compression coming. Meanwhile, Mistral Small 3.1, a 24B multimodal model, runs at 150 tokens/sec on consumer hardware (an RTX 4090 or a 32GB Mac), and Intel's OpenVINO 2026 delivers a 3.8x NPU inference speedup on standard corporate hardware. LTX-2.3's 4K video generation on GPUs with 10GB of VRAM demonstrates that open-weight models now span text, image understanding, and video generation at production quality. The economic rationale for mid-tier API pricing ($2-5 per 1M tokens) is evaporating: enterprises can deploy locally at near-zero marginal cost, while frontier-only use cases (1M-token context, superhuman computer use) retain premium pricing. Only models with genuinely unique capabilities can sustain high prices.
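The break-even argument above can be sketched with a back-of-envelope amortization. This is a rough illustration, not a cost model from the source: the 150 tokens/sec throughput comes from the text, but the hardware price, utilization, amortization period, power draw, and electricity rate below are all assumed placeholder values.

```python
# Back-of-envelope: amortized cost per 1M tokens of local inference
# on a consumer GPU, for comparison against API pricing tiers.
# All parameter defaults except tokens_per_sec are illustrative assumptions.

def local_cost_per_million_tokens(
    hw_cost_usd: float = 1600.0,      # assumed RTX 4090 street price
    amortization_years: float = 3.0,  # assumed useful hardware life
    tokens_per_sec: float = 150.0,    # Mistral Small 3.1 throughput (from source)
    utilization: float = 0.25,        # assumed fraction of time the GPU is busy
    power_kw: float = 0.45,           # assumed system power draw while inferring
    usd_per_kwh: float = 0.15,        # assumed electricity price
) -> float:
    busy_seconds = amortization_years * 365 * 24 * 3600 * utilization
    total_tokens = tokens_per_sec * busy_seconds
    energy_cost = power_kw * (busy_seconds / 3600) * usd_per_kwh
    return (hw_cost_usd + energy_cost) / (total_tokens / 1e6)

cost = local_cost_per_million_tokens()
print(f"local cost: ${cost:.2f} per 1M tokens")
```

Under these assumptions the amortized local cost lands well under $1 per million tokens, i.e. below the $2-5 mid-tier API band, which is the squeeze the paragraph describes; the conclusion is robust to fairly large changes in the assumed inputs because throughput dominates the calculation.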