
US Export Controls Failed: Three Chinese Labs Achieve Simultaneous Frontier Parity, MoE Architecture Renders NVIDIA Restrictions Obsolete

In February 2026, Qwen 3.5 (397B/17B MoE), GLM-5 (744B/44B MoE), and InternVL3-78B collectively matched or exceeded Western frontier models on every major benchmark—while GLM-5 was trained entirely on Huawei Ascend chips with zero NVIDIA hardware. Mixture-of-Experts architecture fundamentally bypasses US export controls by shifting computational bottlenecks from per-chip FLOPS to inter-chip bandwidth.

TL;DR: Breakthrough 🟢
  • Three independent Chinese AI labs released frontier-parity models within a four-day window in February 2026, eliminating the capability gap across text reasoning, code generation, vision-language understanding, and video processing
  • Qwen 3.5: 87.8 MMLU-Pro, 87.5 VideoMME, 76.4% SWE-bench Verified at $0.48/M tokens (31x cheaper than Claude Opus)
  • GLM-5: 50.4% Humanity's Last Exam (beats Claude Opus 4.5), 77.8% SWE-bench Verified, trained entirely on 100,000 Huawei Ascend chips with zero NVIDIA dependency at $0.80/M tokens
  • InternVL3-78B: 72.2 MMMU (exceeds GPT-4o at 69.9 and Claude 3.5 Sonnet at 70.4) under Apache 2.0 license; InternVL3.5-8B achieves 73.4 MMMU with 10x fewer parameters
  • MoE architecture is the specific adaptation that makes frontier AI trainable on weaker individual accelerators, rendering export control frameworks structurally obsolete
Tags: Chinese AI models, export controls failure, Qwen 3.5, GLM-5, InternVL3 | 6 min read | Mar 2, 2026

The Synchronous Convergence: A Strategic Signal

Within a four-day window in February 2026, three Chinese AI labs released models that collectively eliminate the capability gap with Western frontier AI across every major modality:

Alibaba's Qwen 3.5 (February 16): 397B total parameters with 17B active via Mixture-of-Experts routing. Achieves 87.8 MMLU-Pro (competitive with frontier Western models), 76.4% SWE-bench Verified, 87.5 VideoMME with native video processing. 1M token context. Pricing: $0.48/M input tokens.

Zhipu AI's GLM-5 (February 11): 744B total parameters with 44B active. Achieves 50.4% on Humanity's Last Exam (exceeds Claude Opus 4.5), 77.8% SWE-bench Verified. MIT license. Pricing: $0.80/M tokens. Trained on 100,000 Huawei Ascend chips processing 28.5 trillion tokens.

Shanghai AI Lab's InternVL3-78B: 78B dense parameters achieving 72.2 MMMU on standardized vision-language understanding benchmarks (exceeds GPT-4o-latest at 69.9, Claude-3.5 Sonnet at 70.4). Built on Qwen2.5-72B language backbone. Apache 2.0 license. InternVL3.5-8B achieves 73.4 MMMU with only 8 billion parameters—exceeding the 78B version.

This is not incremental progress. This is synchronized capability arrival across three independent organizations, suggesting a structural explanation.

The MoE Architecture: How Export Controls Failed

The US export controls on NVIDIA H100 and A100 GPUs rest on a simplifying assumption: restricting peak per-chip compute limits total training capability. The assumption is false because of Mixture-of-Experts architectures.

A 744B-parameter model is not a monolithic compute problem. In a 744B MoE with 44B active parameters (GLM-5), each token routes to only 8 of 256 expert sub-networks. The computational bottleneck shifts from per-chip FLOPS (where Huawei's Ascend chips lag) to inter-chip communication bandwidth (where sufficient quantity can overcome quality gaps).
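To make the routing arithmetic concrete, here is a toy top-k router in Python. The 8-of-256 figures mirror the article; the routing function itself is a simplified sketch, not GLM-5's actual implementation:

```python
import numpy as np

def route_top_k(logits, k):
    """Toy MoE router: each token activates only its k highest-scoring experts."""
    idx = np.argsort(logits, axis=-1)[:, -k:]                  # (tokens, k) expert ids
    sel = np.take_along_axis(logits, idx, axis=-1)
    gates = np.exp(sel) / np.exp(sel).sum(-1, keepdims=True)   # softmax over the selected experts
    return idx, gates

rng = np.random.default_rng(0)
tokens, n_experts, k = 4, 256, 8                 # 8-of-256 routing, as described above
experts, gates = route_top_k(rng.normal(size=(tokens, n_experts)), k)

# Per token, only k/n_experts of the expert capacity does any work:
print(f"active expert fraction: {k / n_experts:.4f}")   # 0.0312
```

Because each accelerator holds only a few experts and every token skips most of them, the scarce resource becomes moving activations between chips, not raw FLOPS on any single chip.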

GLM-5's training on 100,000 Huawei Ascend chips is the proof case. The model achieves frontier reasoning (50.4% HLE, outperforming Opus 4.5) on purely domestic hardware—not as a workaround or degraded alternative, but as a complete architectural adaptation that makes the export control framework irrelevant for frontier model training.

Qwen 3.5's hybrid attention architecture—combining Gated Delta Networks (linear complexity) with full attention every fourth layer—pushes MoE efficiency further. The result: 8.6x to 19.0x throughput improvement over predecessors on 8x H100 clusters. Even where Chinese labs do use NVIDIA hardware, the MoE architecture extracts dramatically more value per chip.
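The interleaving pattern is easy to sketch. The 1-in-4 ratio comes from the article; the function and layer-type names below are illustrative, not Qwen's actual configuration:

```python
def layer_schedule(n_layers: int, full_attn_every: int = 4) -> list[str]:
    """Illustrative layer plan: linear-time gated-delta layers everywhere,
    with quadratic full attention only on every fourth layer."""
    return [
        "full_attention" if (i + 1) % full_attn_every == 0 else "gated_delta"
        for i in range(n_layers)
    ]

sched = layer_schedule(12)
print(sched.count("full_attention"), "of", len(sched), "layers are full attention")  # 3 of 12
```

With quadratic-cost layers making up only a quarter of the stack, attention compute and KV-cache memory grow far more gently across the 1M-token context.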

Native Multimodal Pre-Training Is the Settled Architecture

Both Qwen 3.5 and GLM-5 feature native multimodal early fusion—image, video, and text understanding emerges from unified pre-training, not bolted-on vision modules. InternVL3's native multimodal pre-training with Variable Visual Position Encoding (V2PE) and Mixed Preference Optimization (MPO) produced 72.2 MMMU, competitive with closed-source Gemini-2.5 Pro (75.0) and exceeding GPT-4o-latest (69.9).

The architectural convergence across three independent labs—all moving toward native multimodality rather than post-hoc adaptation—suggests this is the settled approach, not an experimental choice. Western labs using post-hoc multimodal adaptation may face a structural quality ceiling against competitors with native fusion architectures.

Chinese Frontier Models: Architecture and Performance Comparison

Three simultaneous releases with complementary strengths across text, vision, video, and code

| Model | License | Price/M | Hardware | SWE-bench | Multimodal | Architecture |
|---|---|---|---|---|---|---|
| Qwen 3.5 (Alibaba) | Apache 2.0 | $0.48 | H100 (stockpile) | 76.4% | VideoMME 87.5 | 397B/17B MoE |
| GLM-5 (Zhipu) | MIT | $0.80-1.00 | Huawei Ascend | 77.8% | Native fusion | 744B/44B MoE |
| InternVL3-78B (Shanghai) | Apache 2.0 | Open-weight | Standard GPU | N/A | MMMU 72.2 SOTA | 78B dense VLM |

Source: MarkTechPost / NxCode / arXiv:2504.10479 / official model pages

Ecosystem Compounding: The Self-Reinforcing Advantage

InternVL3-78B uses Qwen2.5-72B as its language backbone: one lab's vision-language model is built directly on top of another lab's foundation model. The Chinese open-source AI stack is becoming self-reinforcing: one lab's language model serves as the foundation for another lab's vision-language model, which in turn serves as a component in a third lab's agentic system.

This is an ecosystem-level compounding advantage that no single Western lab replicates internally. When one lab's innovation becomes another lab's foundation, the entire system moves faster than any single lab can move alone.

Additionally, Zhipu's $558M in post-IPO funding enables rapid iteration, while Qwen's Apache 2.0 license ensures global developer adoption. These labs are not competing in isolation; they are building a composable ecosystem.

The Pricing Moat Collapse: 31x Cost Differential Is Not Recoverable

A developer choosing between:

  • Qwen 3.5-Plus: $0.48/M input tokens, 87.8 MMLU-Pro, 1M context, Apache 2.0 license
  • Claude Opus 4.6: $15/M input tokens, comparable reasoning performance, proprietary

...faces a 31x cost differential for similar-tier performance on standardized benchmarks. This is not recoverable through marginal quality improvements. Even for enterprises that would never self-host, the API-accessible Chinese models via OpenRouter and compatible endpoints make economic substitution trivial.
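The arithmetic is worth spelling out. The per-token prices are the ones quoted above; the monthly token volume is a hypothetical workload chosen for illustration:

```python
qwen_price = 0.48    # $/M input tokens, Qwen 3.5-Plus (quoted above)
opus_price = 15.00   # $/M input tokens, Claude Opus 4.6 (quoted above)

print(f"cost ratio: {opus_price / qwen_price:.1f}x")            # 31.2x

monthly_tokens_m = 500   # hypothetical workload: 500M input tokens/month
print(f"Qwen bill: ${qwen_price * monthly_tokens_m:,.0f}/mo")   # $240/mo
print(f"Opus bill: ${opus_price * monthly_tokens_m:,.0f}/mo")   # $7,500/mo
```

At any realistic volume the gap is a budget line item, not a rounding error, which is why the decision tends to land with finance rather than engineering.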

The 'silent deployment' trend—Western applications using Chinese models without public disclosure—is driven by this arithmetic, not ideology. When CFOs see a 31x cost reduction with no measured quality penalty, the decision is economically determined.

The 201-language support in Qwen 3.5 (up from 82 in the prior generation) signals a deliberate play for global developer adoption outside the US-China binary. Markets in Southeast Asia, Africa, South America, and the Middle East—where neither US regulatory oversight nor Chinese government alignment is a primary concern—represent the next frontier for adoption.

Frontier Model API Pricing: Chinese vs Western (March 2026)

Chinese open-source models offer 15-31x cost advantage over Western proprietary equivalents

Source: Public API pricing / LLM-stats.com / OpenRouter

The Efficiency Curve Favors Edge Deployment, Not Cloud Proprietary

InternVL3.5-8B achieves 73.4 MMMU—exceeding the 78B version—with just 8 billion parameters. Qwen3.5-35B-A3B activates only 3 billion parameters while outperforming previous-generation 235B models. The efficiency curve means that vision-language understanding will soon run on smartphones and edge devices.

This creates an asymmetric moat erosion: open-source vision-language understanding scales to edge while remaining competitive with cloud proprietary. Desktop computer-use (OSWorld), by contrast, requires real-time inference with low latency on complex multi-step tasks—a capability that favors cloud-deployed proprietary models. This explains why Anthropic's 72.5% OSWorld remains unmatched: it requires continuous cloud infrastructure that open-source ecosystems have not needed to optimize for.

The Contrarian Case: Benchmarks Are Not Production

Self-reported benchmarks from Chinese labs often lack independent verification. Alibaba claims Qwen 3.5 'outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on 80% of benchmarks'—but this is self-selected benchmark cherry-picking. InternVL3-78B's MMMU of 72.2 is independently verified, but Qwen and GLM numbers await third-party reproduction as of March 2026.

Additionally, the 1,490GB memory requirement for full GLM-5 deployment means it is datacenter-only—the 'open-source' label does not mean 'anyone can run it.' The real threat is API-accessible pricing, not self-hosting ubiquity.
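The 1,490GB figure is consistent with holding every parameter at 16-bit precision. A quick estimate (the bytes-per-parameter values are our assumptions, and this counts weights only, not KV cache or activations):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights alone: 1e9 * params_billion * bytes_per_param bytes,
    divided by 1e9 bytes per GB -- the 1e9 factors cancel."""
    return params_billion * bytes_per_param

print(f"GLM-5 @ BF16: {weight_memory_gb(744, 2):.0f} GB")    # 1488 GB
print(f"GLM-5 @ INT4: {weight_memory_gb(744, 0.5):.0f} GB")  # 372 GB
```

Note the MoE asymmetry: only 44B parameters are active per token, but all 744B must stay resident in memory, so sparse routing cuts compute, not deployment footprint.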

Production quality also depends on instruction following, safety, hallucination rates, long-context reliability, and integration quality—dimensions where closed-source models may retain advantages that benchmarks do not capture.

What This Means for Practitioners

For Western AI labs, the implications are structural:

  • Pricing moat is collapsing. The remaining differentiators are: (a) computer-use/agentic capabilities where Anthropic leads at 72.5% OSWorld, (b) safety and compliance infrastructure for regulated industries, and (c) brand trust for sensitive enterprise data.
  • For practitioners building production AI: Default to Chinese open-source (Qwen 3.5, GLM-5, InternVL3) for cost-sensitive workloads. The 31x pricing advantage with no benchmark quality penalty is the rational economic choice. Reserve proprietary model spend for desktop computer-use agents (Anthropic only) and safety-critical applications requiring vendor accountability.
  • Export controls are now an archaeological artifact. The MoE architecture proves that restricting NVIDIA chip access does not prevent frontier model training—it just forces architectural innovation that, ironically, often increases efficiency. The policy failed at its core objective.
  • The ecosystem advantage is durable. Because one Chinese lab's model serves as another lab's foundation, the entire stack improves faster than Western labs can move internally. This compounding dynamic is not easily reversed by superior per-lab resources.