Key Takeaways
- Gemma 4 31B, released under Apache 2.0 with zero commercial restrictions, ranks #3 on Arena AI and is available immediately
- Llama 4 Maverick (open-weight) scores 69.8 on GPQA Diamond vs GPT-4o's 53.6: an open model exceeding the proprietary frontier on a key benchmark
- Anthropic's $30B run-rate depends on a 15-20% capability gap; open-weight models are closing to "good enough" monthly
- The anti-distillation coalition (OpenAI, Anthropic, Google) shares attack fingerprints, yet Google simultaneously gives away Gemma 4
- The ROI of illegal distillation drops as legal open-weight options approach proprietary capability levels
Google's Hedge: Defend Gemini While Giving Away Gemma
OpenAI, Anthropic, and Google formed an anti-distillation coalition in April 2026, sharing attack fingerprints and documenting the 16M suspicious API queries Anthropic traced to Chinese AI labs. The stated goal: defend frontier model IP from illegal distillation and model copying.
Yet simultaneously, Google released Gemma 4 under an Apache 2.0 license with no commercial restrictions, no per-unit pricing, and no mandatory user-count reporting. Gemma 4's 31B variant ranks #3 globally on Arena AI, ahead of OpenAI's GPT-4o mini and approaching Gemini 2.0's capability tier.
Google is playing both sides: defending Gemini's proprietary value while giving Gemma 4 away for free. The strategic logic is platform capture: Gemma drives Google Cloud adoption, lowering the cost of switching to Google's stack and then locking customers into BigQuery, Vertex AI, and Google's infrastructure. But the move directly undermines the coalition's narrative that frontier capability must be gated to protect against distillation.
Open-Weight Models Are Closing the Capability Gap Quarterly
Meta's Llama 4 Maverick, released under open-weight license with 700M monthly active user restriction, scores GPQA Diamond 69.8. Compare this to OpenAI's GPT-4o (proprietary, API-only) at GPQA Diamond 53.6. The open model exceeds the proprietary frontier on a key reasoning benchmark.
Llama 4 achieves 73.4% on MMMU and 73.7% on MathVista, well within proprietary frontier territory. Meta did not disclose extensive additional benchmarks, but Arena AI rankings place Llama 4 Maverick at #15 globally (factoring in the real model, not the Arena-optimized variant), competitive with many proprietary offerings.
This convergence is recent and accelerating. Six months ago, the capability gap between open-weight and proprietary models was 25-30 percentage points on reasoning tasks; today it is 5-15 points. At current quarterly closure rates, "good enough" convergence (defined as 90% of proprietary capability on 90% of common tasks) is 12-18 months away.
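The timeline claim above can be sanity-checked with back-of-envelope arithmetic. A sketch, assuming the gap shrinks by a constant fraction each quarter; the decay model and the 2-point "good enough" threshold are assumptions, not sourced figures:

```python
import math

# Midpoints of the gap ranges quoted in the text (percentage points).
gap_six_months_ago = 27.5   # midpoint of 25-30
gap_today = 10.0            # midpoint of 5-15

# Assume geometric decay: each quarter the gap shrinks by a fixed factor.
per_quarter_factor = (gap_today / gap_six_months_ago) ** (1 / 2)  # 2 quarters elapsed

# Assumed residual gap at which open-weight is "good enough" (not from the text).
threshold = 2.0

quarters_remaining = math.ceil(
    math.log(threshold / gap_today) / math.log(per_quarter_factor)
)
print(quarters_remaining * 3)  # → 12 months, consistent with the 12-18 month estimate
```

A linear extrapolation would predict convergence within a quarter or two; the geometric model, where each additional point is harder to close, is what lands in the article's 12-18 month window.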
Open-Weight vs Proprietary: How Close Is the Gap?
Benchmark comparison showing open-weight models approaching or exceeding proprietary frontier performance
| Model | License | MMMU | MathVista | GPQA Diamond | Active Params |
|---|---|---|---|---|---|
| GPT-4o (proprietary) | API only | 69.1% | 63.8% | 53.6% | Unknown |
| Llama 4 Maverick (open) | Open (700M MAU cap) | 73.4% | 73.7% | 69.8% | 17B |
| Gemma 4 31B (open) | Apache 2.0 | N/A | N/A | N/A | 31B |
| Gemma 4 26B MoE (open) | Apache 2.0 | N/A | N/A | N/A | 3.8B |
Source: Meta AI Blog, Google DeepMind, OpenAI public benchmarks
The ROI of Illegal Distillation Collapses
Anthropic documented 16M suspicious API queries from DeepSeek, Moonshot, MiniMax, and other Chinese labs, claiming these firms were systematically extracting Claude capability. The implicit accusation: these labs had no way to reach frontier capability except by distilling from proprietary APIs.
That premise is collapsing. Llama 4 Maverick and Gemma 4 provide 80-90% of what illegal distillation could extract, at zero cost. Chinese labs now face a choice:
Option A (illegal): Distill Claude via 16M+ expensive API calls, risking legal exposure, to gain uncertain additional capability.
Option B (legal): Fine-tune Llama 4 or Gemma 4 on domain-specific data, fully open-weight, zero legal risk.
Option B increasingly dominates on ROI. As the capability gap narrows, the cost-benefit analysis for illegal distillation shifts negative. The anti-distillation coalition may be defending a moat that is being drained from a different direction—not through IP theft, but through commoditization.
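The shifting cost-benefit can be made concrete with a toy model. Every unit cost, capability gain, and the legal-risk discount below is an illustrative assumption; only the 16M query count comes from the text:

```python
def roi(capability_gain_pts: float, cost_usd: float, risk_discount: float) -> float:
    """Capability points gained per dollar, discounted for expected legal exposure."""
    return capability_gain_pts * (1 - risk_discount) / cost_usd

# Option A: illegal distillation via a proprietary API (16M queries, per the text).
queries = 16_000_000
option_a = roi(capability_gain_pts=10.0,   # assumed marginal gain over open-weight
               cost_usd=queries * 0.01,    # assumed ~$0.01 blended cost per query
               risk_discount=0.5)          # assumed haircut for legal exposure

# Option B: legal fine-tuning of Llama 4 / Gemma 4 on domain data.
option_b = roi(capability_gain_pts=8.0,    # assumed gain from domain fine-tuning
               cost_usd=100_000,           # assumed compute budget
               risk_discount=0.0)          # no legal risk

print(option_b > option_a)  # True under these assumptions: legal fine-tuning wins
```

The point of the sketch is the direction, not the numbers: as open-weight baselines rise, Option A's `capability_gain_pts` shrinks while its cost and risk stay fixed, so the inequality tilts further toward Option B every quarter.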
Anthropic's $1M+ Customer Premium Depends on a Shrinking Gap
Anthropic's $30B annualized run-rate is built on 1,000+ enterprise customers with $1M+/year API contracts. These customers pay for Claude's frontier capability: superior reasoning, better safety, and stronger consistency on complex tasks.
If Gemma 4 (free, Apache 2.0) and Llama 4 (free, open-weight) reach 80-90% of Claude's capability on most tasks, the addressable market for $1M+/year API contracts shrinks. An enterprise that runs 80% of workloads on local open-weight models and reserves Claude for the 20% of high-touch tasks generates $200-300K/year for Anthropic, not $1M+.
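A sketch of the revenue math, assuming API spend scales roughly with workload share (a simplification; real contracts have minimums and volume tiers, and the self-hosting cost is an assumption):

```python
full_contract = 1_000_000    # frontier-only API contract, $/year (from the text)
frontier_share = 0.20        # share of workloads that still need frontier quality
local_serving_cost = 75_000  # assumed annual cost of self-hosting open-weight models

claude_revenue = full_contract * frontier_share         # what Anthropic now sees
enterprise_total = claude_revenue + local_serving_cost  # what the enterprise pays

print(claude_revenue)    # 200000.0, the low end of the text's $200-300K range
print(enterprise_total)  # 275000.0 under the assumed self-hosting cost
```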
The 3.5GW TPU commitment must fund capabilities that remain meaningfully ahead of free alternatives. This is feasible: frontier models can stay ahead via superior training data, better RLHF, and novel architecture innovations. But pricing power compresses as the gap closes.
What This Means for Practitioners
Enterprises evaluating build-vs-buy for AI should take a portfolio approach to deployment: (1) for tasks where 80-85% capability is sufficient (summarization, extraction, classification, routine reasoning), adopt open-weight models (Gemma 4, Llama 4) and fine-tune them on domain data; (2) for the 15-20% of tasks requiring frontier quality (complex multi-step reasoning, novel problem-solving, safety-critical decisions), reserve frontier API spend (Claude, GPT-4o).
This hybrid approach can reduce Claude spend from $1M+ annually to $150-300K while maintaining quality on high-impact tasks. For startups and mid-market firms, the economics are even more favorable: build on open-weight models and hire ML engineers to fine-tune rather than paying for API contracts.
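The split described above can be expressed as a simple routing policy. The task taxonomy and model labels here are illustrative placeholders, not a prescribed architecture:

```python
# Routine task types the text suggests are safe for open-weight models.
ROUTINE_TASKS = {"summarization", "extraction", "classification", "routine_reasoning"}

def route(task_type: str, safety_critical: bool = False) -> str:
    """Route routine work to a local open-weight model, the rest to a frontier API."""
    if safety_critical or task_type not in ROUTINE_TASKS:
        return "frontier_api"       # e.g. Claude or GPT-4o
    return "local_open_weight"      # e.g. fine-tuned Gemma 4 or Llama 4

print(route("summarization"))                          # local_open_weight
print(route("novel_problem_solving"))                  # frontier_api
print(route("classification", safety_critical=True))   # frontier_api
```

In practice the routing signal comes from a task classifier or explicit product configuration rather than a string label, but the cost structure is the same: the cheap path handles volume, the expensive path handles the long tail.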
The next 12 months are critical. As Llama 5 and Gemma 5 arrive (expected late 2026), open-weight models will likely cross the "good enough" threshold on 85-90% of common enterprise tasks. Procurement decisions made today determine whether your organization captures that cost shift or locks into expensive API contracts.