
Export Controls Backfired: GLM-5 on Huawei Ascend Proves Nvidia Restrictions Enabled Chinese Hardware Independence

GLM-5, trained on 100,000 Huawei Ascend 910B chips with zero Nvidia GPUs, reaches 77.8% on SWE-bench. Together with China's $33B digital trade surplus and ByteDance's $23B CapEx, it shows that export controls accelerated rather than prevented Chinese AI hardware independence.

Tags: china-ai, export-controls, huawei-ascend, hardware-independence, glm-5 · 5 min read · Mar 8, 2026

Key Takeaways

  • GLM-5 achievement: 744B MoE model (https://huggingface.co/zai-org/GLM-5) trained entirely on 100,000 Huawei Ascend 910B chips, zero Nvidia hardware, achieves 77.8% SWE-bench — frontier-level performance
  • Performance parity: GLM-5 within 3 points of Claude Opus 4.6 (80.8%), achieves 92.7% on AIME 2026, 86% on GPQA-Diamond, outperforms Opus on agent benchmarks (Humanity's Last Exam)
  • Financial sustainability: China's $33B digital services trade surplus (2x YoY), ByteDance $23B 2026 CapEx including $11.5B chip procurement, proves self-sustaining hardware independence ecosystem
  • Creative AI co-leadership: Chinese companies control 2 of 4 leading video AI models (Kling 3.0, Seedance 2.0) — first creative AI category with Chinese global co-leadership
  • Architectural innovation: Chinese labs consistently extract more capability per compute unit via MoE sparsity (3-5% activation) and specialized architectures — not catching up, but inventing

Why the Export Control Strategy Failed

The US export control strategy rested on a simple thesis: restricting Nvidia H100/A100 access would constrain Chinese AI training compute, slowing frontier development.

GLM-5 is the definitive counterexample. Zhipu AI trained a 744B parameter MoE model entirely on 100,000 Huawei Ascend 910B processors, achieving frontier-level benchmarks:

  • 77.8% SWE-bench Verified (highest open-source, within 3 points of Opus)
  • 92.7% AIME 2026 (mathematical reasoning)
  • 86% GPQA-Diamond (science benchmarks)
  • 96.9% HMMT November 2025 (high school math)
  • 50.4 vs 43.4 on Humanity's Last Exam with tools (beats Claude Opus 4.5)
  • 62.0 vs 37.0 on BrowseComp (agent-augmented reasoning)

These are not second-tier results. By any objective definition, GLM-5 is a frontier model. It was trained without a single Nvidia chip.

Architectural Response to Constraint: MoE Sparsity

The Ascend 910B is not an H100 equivalent in raw FLOPS. But the constraint produced architectural innovation, not capability reduction. GLM-5 uses 256 experts with top-8 routing and 5.4% activation rate — meaning only 40 billion active parameters per token despite 744B total.

This architectural response reflects a deliberate design choice:

  • Instead of maximizing dense compute, Chinese labs designed models that extract more capability per unit of compute via sparse gating
  • Instead of competing on FLOPS, they optimized for efficiency — fewer parameters active per token, lower memory footprint, faster inference
  • DeepSeek V4 (unreleased) extends this: a projected 32B active parameters out of 1 trillion total (3.2% activation rate), using Engram conditional memory and manifold-constrained hyper-connections

The constraint bred architectural innovation. When you cannot brute-force with maximum FLOPS, you optimize for elegance.
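
A rough sketch of the routing arithmetic, using only the figures cited above (256 experts, top-8 routing, 40B active of 744B total); the hidden dimension and router details here are hypothetical, not GLM-5's published internals:

```python
import numpy as np

# Figures cited above; the hidden size and router weights below are hypothetical.
NUM_EXPERTS = 256      # experts per MoE layer
TOP_K = 8              # experts selected per token (top-8 routing)
TOTAL_PARAMS = 744e9   # total parameters
ACTIVE_PARAMS = 40e9   # parameters touched per token

def route_token(hidden_state: np.ndarray, router_weights: np.ndarray, k: int = TOP_K):
    """Score all experts for one token, keep the top-k, and softmax their gates."""
    logits = hidden_state @ router_weights           # (NUM_EXPERTS,) routing scores
    top_ids = np.argsort(logits)[-k:]                # indices of the k best experts
    gates = np.exp(logits[top_ids] - logits[top_ids].max())
    gates /= gates.sum()                             # mixture weights over selected experts
    return top_ids, gates

# Toy forward step: a 4096-dim hidden state routed over 256 experts.
rng = np.random.default_rng(0)
hidden = rng.standard_normal(4096)
router = rng.standard_normal((4096, NUM_EXPERTS))
expert_ids, gates = route_token(hidden, router)

print(f"experts used per token: {len(expert_ids)} of {NUM_EXPERTS}")
print(f"activation rate: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")    # ~5.4%
```

Because only the selected experts' weights participate in the forward pass, compute and memory traffic per token scale with the 40B active parameters rather than the 744B total.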

Financial Sustainability of Hardware Independence

GLM-5 proves technical viability. The financials prove long-term sustainability.

China's $33B digital services trade surplus (doubling year over year) and ByteDance's $23B 2026 CapEx, including $11.5B in chip procurement, show that this is not reliance on US hardware. It is active, self-funded investment in hardware independence. The export control strategy inadvertently created a multi-year runway for Chinese labs to develop proprietary silicon and manufacturing partnerships.

Chinese Dominance in Creative AI: The New Category

Chinese AI leadership extends beyond LLMs into video generation — the first creative AI modality where China has achieved global co-leadership:

  • Kling 3.0 (Kuaishou): First native 4K 60fps video generation (industry first, not matching existing Western models)
  • Seedance 2.0 (ByteDance): Audio-video co-generation via Dual-Branch Diffusion Transformer (capability Western models lack)
  • Comparison: Sora 2 (OpenAI) and Veo 3.1 (Google) remain strong, but Chinese models are not catching up — they are inventing capabilities

This breaks the pattern where China was historically 12-18 months behind US labs. In video AI, the timeline collapsed to parallel development with specialization.

DeepSeek V4: The Unverified But Architecturally Sound Breakthrough

Flag: UNVERIFIED. DeepSeek V4 is pre-release and requires independent benchmarks before its claims can be credited. But the published architecture is technically sound:

DeepSeek V4's Engram conditional memory module separates static knowledge retrieval (O(1) N-gram hash lookups into system RAM) from dynamic reasoning (GPU compute). This means knowledge recall does not consume GPU compute cycles.
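
As an illustration only, here is a toy version of that split, with every name and size hypothetical rather than taken from the still-sparse published details:

```python
import numpy as np

class EngramStyleMemory:
    """Toy illustration of the claimed split: static n-gram knowledge sits in a
    hash table in system RAM and is fetched in O(1), while dynamic reasoning
    stays on the accelerator. Names and sizes here are hypothetical."""

    def __init__(self, d_model: int = 1024, n: int = 3):
        self.n = n
        self.d_model = d_model
        self.table = {}   # host-RAM hash table: n-gram hash -> memory vector

    def _hash(self, token_ids: tuple) -> int:
        return hash(token_ids) & 0xFFFFFFFF

    def write(self, token_ids: tuple, vector: np.ndarray) -> None:
        self.table[self._hash(token_ids)] = vector

    def read(self, context: list) -> np.ndarray:
        """O(1) lookup of the trailing n-gram; returns zeros on a miss."""
        key = self._hash(tuple(context[-self.n:]))
        return self.table.get(key, np.zeros(self.d_model))

# A forward step would add the looked-up vector to the GPU-resident hidden state
# instead of spending attention/FFN FLOPs to recall the same static knowledge.
mem = EngramStyleMemory()
mem.write((11, 42, 7), np.ones(1024))
retrieved = mem.read([3, 11, 42, 7])   # hits the stored 3-gram
print(retrieved[:4])
```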

If this architecture delivers as claimed, V4 could run a 1-trillion-parameter model on dual RTX 4090s — consumer hardware costing $3,000 versus enterprise GPU clusters at $500,000+. This would be transformative for deployment cost. But it remains UNVERIFIED.

The architectural papers, Engram (January 13, 2026) and mHC (December 31, 2025), are publicly available and technically rigorous. Independent benchmarks are still needed before any adoption claims.

Policy Implication: Export Controls Are Counterproductive

If export controls do not constrain capability (GLM-5 proves they do not) but do accelerate domestic hardware development (Ascend, SeedChip investment), then the controls are counterproductive on their own terms.

They convert Chinese dependency on US hardware into Chinese investment in hardware independence — creating a long-term strategic competitor in silicon rather than maintaining a dependency that could serve as leverage.

Alternative dynamic: if Nvidia hardware remained accessible, Chinese labs would continue to optimize for NVFP4 and the Blackwell architecture. Restricted access instead creates the incentive to develop parallel silicon stacks (Huawei Ascend, ByteDance's SeedChip) and architectural innovations (MoE sparsity) that may prove more efficient than Western approaches.

Competitive Implications for Western AI Labs

Chinese models are not just cheaper — they are architecturally different in ways that produce genuine efficiency advantages:

  • MoE sparsity at 3-5% activation rates enables per-token costs that match or beat dense-model approaches (rough numbers in the sketch after this list)
  • Engram-style memory separation (if validated) could reduce GPU memory footprint for long-context models by 5-10x
  • Ascend-optimized training pipelines represent alternative scaling philosophy that may be more sustainable than Western maximum-FLOPS approaches
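
Back-of-the-envelope numbers from the activation figures already cited, using the common approximation of roughly 2 FLOPs per active parameter per token; the dense comparison model is hypothetical:

```python
# Per-token compute comparison from the activation figures cited above.
# Uses the common ~2 FLOPs per active parameter per token approximation.
models = {
    "GLM-5 (MoE, 40B active / 744B total)":       (40e9, 744e9),
    "DeepSeek V4 projection (32B active / 1T)":   (32e9, 1000e9),
    "Hypothetical dense 744B model":              (744e9, 744e9),
}

for name, (active, total) in models.items():
    flops_per_token = 2 * active
    print(f"{name}: {active / total:.1%} activation, "
          f"~{flops_per_token / 1e12:.2f} TFLOPs per token")
```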

The strategic concern is not that China has matched US hardware (they have not). It is that China has demonstrated frontier AI does not require US hardware — a distinction that matters profoundly for geopolitics and long-term competition.

What This Means for Practitioners

For ML engineers evaluating open-source models:

1. Do not discount GLM-5 based on hardware origin. It performs at frontier level despite Ascend training. For self-hosting, it runs on standard Nvidia hardware (8x H200 GPUs with FP8 quantization). The MIT license removes all commercial restrictions. A minimal serving sketch follows this list.

2. Evaluate GLM-5 on your workloads. For coding tasks, measure quality against Claude Opus. If 97% of Opus quality at 20% of the cost meets your threshold, adoption is straightforward.

3. Plan for DeepSeek V4 evaluation. Monitor for independent benchmarks when V4 releases (March 2026 projected). If the Engram architecture validates, deployment economics shift dramatically for long-context and reasoning workloads.
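
A minimal self-hosting sketch, assuming the zai-org/GLM-5 checkpoint is (or becomes) supported by vLLM; the flags shown are standard vLLM options, but verify GPU count, quantization support, and memory requirements against the model card before relying on this:

```python
# Hedged sketch: serving GLM-5 with vLLM on an 8-GPU node. Assumes vLLM support
# for the zai-org/GLM-5 checkpoint; check the model card for exact requirements.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-5",       # HuggingFace checkpoint referenced above
    tensor_parallel_size=8,      # shard the model across 8 GPUs (e.g. H200s)
    quantization="fp8",          # FP8 weights, per the 8x H200 sizing in the text
    trust_remote_code=True,      # large MoE models often ship custom modeling code
)

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(
    ["Write a Python function that parses RFC 3339 timestamps."],
    params,
)
print(outputs[0].outputs[0].text)
```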

Timeline: GLM-5 is production-ready now. DeepSeek V4 evaluation requires 2-4 weeks post-release for rigorous benchmarking before production adoption.

Adoption barriers: the non-technical barrier is greater than the technical one. Enterprises may have organizational concerns about Chinese-origin model provenance, data governance, and regulatory compliance. Address these upfront with legal and compliance teams rather than discovering them in production.

Chinese AI Infrastructure Scale -- March 2026

Key metrics showing scale and financial sustainability of Chinese AI ecosystem independent of US hardware

  • Digital trade surplus: $33B (2x YoY)
  • ByteDance 2026 CapEx: $23B (incl. SeedChip)
  • GLM-5 SWE-bench: 77.8% (0 Nvidia GPUs)
  • Doubao DAU: 100M (lowest CAC ever)

Source: Bloomberg, EqualOcean, HuggingFace, 36Kr

Video AI Global Leadership -- Chinese Co-Dominance

Chinese companies control 2 of 4 leading video AI models, establishing first creative AI category with Chinese global co-leadership

| Model | Origin | Strength | Resolution | Audio |
|---|---|---|---|---|
| Sora 2 (OpenAI) | US | Cinematic quality | Up to 4K | No native audio |
| Kling 3.0 (Kuaishou) | China | Native 4K 60fps | Native 4K 60fps | No native audio |
| Seedance 2.0 (ByteDance) | China | Audio-video sync | Up to 4K | Dual-Branch co-generation |
| Veo 3.1 (Google) | US | Spatial audio | 4K | Spatial audio |

Source: Cliprise Medium, CNBC, imagine.art
