Key Takeaways
- GLM-5 (744B MoE, 77.8% SWE-bench) trained entirely on Huawei Ascend chips with MIT license distribution—proving Chinese labs achieved frontier capability without NVIDIA access
- DeepSeek V4 silently expanded to 1M tokens while targeting 1 trillion parameters on restricted hardware, demonstrating sustained capability scaling without US chips
- Seedance 2.0 delivers native 2K video generation with four-modality input (text, images, video clips, audio), extending frontier capability beyond text and code into video
- Multi-hardware portability (GLM-5 on 7 Chinese accelerator families) shows structural ecosystem independence, not one-off achievements
- Export controls failed on the supply side, but transparency regulations (AB 2013, EU AI Act) may be more effective at demand-side friction for Chinese model adoption in Western markets
Chinese AI Frontier: Multi-Modality Independence From NVIDIA
[Chart: key metrics for Chinese labs reaching frontier capability across modalities on restricted hardware. Source: Zhipu AI, DeepSeek, and ByteDance announcements, February 2026]
The Multi-Modality Evidence
US export controls on advanced AI chips—expanded in October 2024 and reinforced with Zhipu's Entity List placement in January 2025—rested on a specific assumption: that denying access to H100/H200 GPUs would create a capability ceiling for Chinese AI development. February 2026 produced simultaneous evidence across multiple modalities that this assumption has failed.
Text/Code: GLM-5's Huawei Ascend Achievement
Zhipu AI—explicitly on the US Entity List with zero access to NVIDIA hardware—trained a 744B-parameter Mixture-of-Experts model entirely on Huawei Ascend chips using the MindSpore framework. The results are unambiguous:
| Metric | GLM-5 | Claude Opus 4.6 | GPT-5.3-Codex |
|---|---|---|---|
| SWE-bench Verified | 77.8% | 80.8% | 56.8% (Pro) |
| Training Hardware | Huawei Ascend | NVIDIA | NVIDIA |
| Licensed Under | MIT | Proprietary | Proprietary |
| Price per 1M input tokens | $0.80 | $15.00 | TBD |
The 3-point gap between GLM-5 (77.8%) and Opus 4.6 (80.8%) is within sampling noise for SWE-bench Verified. GLM-5's score also tops GPT-5.3-Codex's published 56.8%, though that figure comes from SWE-Bench Pro, a different and harder benchmark, so the two numbers are not directly comparable. The Slime RL technique reportedly cut hallucination rates from 90% to 34%, the largest single-generation hallucination reduction in published results.
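A quick binomial check supports the noise claim; it assumes SWE-bench Verified's 500-instance test set and treats instances as independent Bernoulli trials, a simplification:

```python
# 95% confidence half-width for a pass rate p measured on n test instances,
# under a simple independent-Bernoulli model of SWE-bench Verified (n=500).
import math

def ci95_halfwidth(p: float, n: int = 500) -> float:
    return 1.96 * math.sqrt(p * (1 - p) / n)

for name, p in [("GLM-5", 0.778), ("Opus 4.6", 0.808)]:
    print(f"{name}: {p:.1%} +/- {ci95_halfwidth(p):.1%}")
# GLM-5:    77.8% +/- 3.6%
# Opus 4.6: 80.8% +/- 3.5%
# The intervals overlap substantially, so a 3-point gap on this test set
# is not statistically conclusive on its own.
```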
Critical detail: multi-hardware portability across seven domestic Chinese accelerator families (Huawei Ascend, Moore Threads, Cambricon, Kunlun Chip, MetaX, Enflame, Hygon) signals that this is not a one-chip workaround but a systematic ecosystem strategy.
Scale/Context: DeepSeek V4's Silent Expansion
DeepSeek V4 provides evidence of sustained scaling. A silent production update on February 11 expanded context from 128K to 1 million tokens, reportedly holding greater than 60% accuracy at full context length. The model is said to target 1 trillion parameters, in the same range as the rumored scale of GPT-4's MoE architecture.
Three architectural innovations address the computational challenges of trillion-parameter training on constrained hardware:
- Manifold-Constrained Hyper-Connections: Reduces gradient flow bottlenecks during MoE training
- Engram Conditional Memory with O(1) lookup: Enables trillion-scale model static knowledge access without linear scaling
- Dynamic Sparse Attention: Reduces KV-cache memory requirements for 1M-token context (see the sketch after this list)
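None of these mechanisms has a published reference implementation. As a minimal sketch of the third, assuming a block-wise top-k selection scheme (the block scoring heuristic, sizes, and function names are illustrative, not DeepSeek's actual design):

```python
# Hypothetical sketch: for each query, attend only to the top-k most
# relevant cached key blocks instead of the full 1M-token KV cache.
import numpy as np

def dynamic_sparse_attention(q, k_cache, v_cache, block_size=64, top_k_blocks=4):
    """q: (d,) single query; k_cache, v_cache: (T, d) cached keys/values."""
    T, d = k_cache.shape
    n_blocks = T // block_size  # trailing tokens beyond a full block are ignored here
    # Cheap relevance proxy: score each block by its mean key vector.
    block_means = k_cache[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    selected = np.argsort(block_means @ q)[-top_k_blocks:]
    # Gather only the selected blocks and run dense attention over them.
    idx = np.concatenate([np.arange(b * block_size, (b + 1) * block_size) for b in selected])
    scores = (k_cache[idx] @ q) / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache[idx]  # (d,) attended output

# Per-query cost drops from O(T) to O(top_k_blocks * block_size).
q = np.random.randn(128)
k_cache, v_cache = np.random.randn(4096, 128), np.random.randn(4096, 128)
out = dynamic_sparse_attention(q, k_cache, v_cache)
```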
Consumer hardware compatibility (dual RTX 4090 or a single RTX 5090 for quantized inference) further democratizes deployment. This is not a cloud-only model: it runs on gaming-grade consumer hardware, and the rough arithmetic below shows why that is plausible for an MoE despite the trillion-parameter total.
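The key is that MoE inference only needs the active experts resident in VRAM. A back-of-envelope estimate, with an assumed active-parameter count (DeepSeek has published no such figure):

```python
# Rough VRAM estimate for quantized inference: weights at `bits` precision
# plus ~20% overhead for activations and KV cache at small batch sizes.
def vram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    return params_billion * bits / 8 * overhead  # 1B params at 8 bits = 1 GB

print(f"all 1T params resident:  {vram_gb(1000):.0f} GB")  # ~600 GB: impossible on consumer GPUs
print(f"~35B active params only: {vram_gb(35):.0f} GB")    # ~21 GB: fits a 32 GB RTX 5090
# With inactive experts offloaded to system RAM/NVMe, only the active
# expert set plus shared layers need to sit on the GPU.
```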
Video: Seedance 2.0's Multi-Modality Integration
ByteDance's Seedance 2.0 extends the evidence beyond language to video. The model delivers native 2K resolution (2048x1080) with four-modality input (text, images, video clips, audio), the most comprehensive multimodal input control in the field. Audio-visual joint generation represents a genuine architectural first, not post-processing.
The implications extend beyond video generation to embodied AI: models that understand text, image, video, and audio simultaneously can control complex systems (industrial robots, autonomous vehicles) more effectively than single-modality models.
The Distribution Strategy: Open-Source as Export Control Circumvention
Capability is only half the story. Chinese labs have simultaneously developed distribution channels that export controls cannot restrict:
Open-Source Licensing as Distribution Channel
GLM-5 under the MIT license means any global enterprise can deploy it without commercial restriction. DeepSeek's anticipated open-weight release continues the pattern established by V3 and R1. This is deliberate strategy: open-source distribution circumvents export-control barriers by making frontier-capable weights freely available.
Compare with Western labs: OpenAI restricts GPT-5.3-Codex to API access, Anthropic restricts Claude to API access, and Google restricts Gemini 3.1 Pro to API access; none publishes open weights. Chinese labs are publishing model weights freely.
Public Market Capital as Funding Independence
Zhipu's Hong Kong IPO (HKD 4.35B raised, +34% stock surge on GLM-5 launch day) creates a market accountability mechanism and capital access independent of US venture capital. This is the first publicly traded foundation-model company globally: a capital-structure innovation that insulates Chinese AI labs from VC-dependent funding that could be pressured by US sanctions.
Aggressive Pricing as Economic Pull
GLM-5 at $0.80/M input tokens is 6x cheaper than Gemini 3.1 Pro and roughly 19x cheaper than Opus 4.6. DeepSeek V4 targets 10-40x lower cost than Western competitors. This pricing creates economic pull that requires no geopolitical alignment to adopt; it is pure cost optimization.
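To make that pull concrete, here is the monthly input-token bill at a moderate enterprise volume, using the prices above (the Gemini figure is inferred from the stated 6x multiple):

```python
# Monthly input-token cost at 1B tokens/month, prices in USD per 1M tokens.
PRICES = {
    "GLM-5": 0.80,
    "Gemini 3.1 Pro": 4.80,   # inferred from the "6x cheaper" claim
    "Claude Opus 4.6": 15.00,
}
tokens_per_month = 1_000_000_000

for model, price in PRICES.items():
    cost = tokens_per_month / 1_000_000 * price
    print(f"{model:>16}: ${cost:>9,.2f}/month")
# GLM-5: $800 vs Opus 4.6: $15,000 -- a gap that compounds at scale.
```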
The Regulatory Paradox: Export Controls vs. Transparency Regulations
The irony is that while export controls fail to constrain Chinese capability, US and EU transparency regulations create new pressure on Chinese model adoption:
California AB 2013 (Effective Jan 2026)
Requires disclosure of training data sources, including whether synthetic data was used. Enterprises using GLM-5 or DeepSeek V4 would need to disclose this in their AB 2013 compliance documentation—creating a transparency barrier that may limit adoption in regulated industries.
Example compliance statement: "This system uses GLM-5, which was trained on data sources including [opaque Chinese-language sources]. Use of synthetic data in training is [unknown to deployer]."
EU AI Act's GPAI Obligations (Effective Aug 2025)
Require adversarial testing, incident reporting, and copyright compliance for models above the 10^25 FLOP threshold. GLM-5 and DeepSeek V4 almost certainly exceed this threshold. EU enterprises deploying these models inherit the compliance obligation.
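The threshold claim is easy to sanity-check with the standard compute ~ 6 * N * D approximation; the training-token count here is an assumption, since neither lab has published one:

```python
# Training compute estimate: FLOPs ~= 6 * parameters * training tokens.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

glm5 = train_flops(744e9, 10e12)   # 744B params, assumed ~10T tokens
print(f"GLM-5 estimate: {glm5:.1e} FLOPs")            # ~4.5e25
print(f"Exceeds 1e25 GPAI threshold: {glm5 > 1e25}")  # True
# Even a conservative 3T-token assumption lands above 1e25; a 1T-parameter
# DeepSeek V4 would clear the bar by a wider margin still.
```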
The burden shifts from the developer to the deployer: if you use an open-weight Chinese model in the EU, you become responsible for its GPAI compliance. This creates enterprise legal friction that pricing advantages may not overcome.
Structural Implications for Western AI Strategy
Three observations reshape how Western labs should think about competitive strategy:
1. Supply-Side Export Controls Failed
The premise—that denying chips would create a capability ceiling—was wrong. Chinese labs adapted through:
- Domestic accelerator development (Huawei Ascend, Moore Threads, Cambricon)
- Software optimization (MindSpore framework improvements)
- Training methodology innovation (Slime RL for hallucination reduction)
Export controls slowed Chinese capability development by an estimated 6-12 months but did not prevent it. The cost premium was absorbed by state-subsidized infrastructure and cheaper talent.
2. Demand-Side Regulation May Be More Effective
California AB 2013 and EU AI Act GPAI obligations create friction for Chinese model adoption in regulated Western markets—not because the models are unavailable, but because deploying them creates compliance overhead. This is a more durable control than supply-side restrictions because it operates through enterprise legal teams, not government restrictions.
3. Bifurcation Is the Likely Outcome
Western markets standardize on OpenAI/Anthropic/Google with regulatory compliance built-in. Chinese markets and non-regulated use cases standardize on GLM-5/DeepSeek with cost leadership. The competitive landscape has bifurcated into capability competition (where China competes effectively) and ecosystem competition (where Western labs retain advantages).
What This Means for ML Engineers
The practical implications depend on your market and regulatory context:
For US/EU Enterprises (Regulated Markets)
- Evaluate Chinese models conservatively: GLM-5 offers remarkable cost efficiency, but the compliance documentation burden may exceed the savings for regulated use cases
- Prepare AB 2013/GPAI documentation: If you choose to deploy GLM-5 or DeepSeek, budget for legal review and compliance documentation
- Consider self-hosting carefully: While GLM-5's MIT license permits self-hosting, US enterprise deployment may face indirect legal exposure given Zhipu's Entity List status
For Internal/Non-Regulated Use Cases
- GLM-5 is operationally ready: For internal tools, research, and non-customer-facing systems, GLM-5 offers 77.8% SWE-bench capability at roughly 1/20th the cost of Opus
- Self-hosting requires GPU infrastructure: Budget for on-premise deployment; quantized variants reportedly run on consumer-grade (RTX 4090-class) hardware for internal use
- Multi-model strategy: Maintain API access to Western frontier models for regulated/customer-facing work; self-host Chinese models for internal cost optimization (a minimal routing sketch follows)
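A minimal sketch of that routing policy, assuming a simple per-request compliance flag; the endpoint names are illustrative, and real routing would hinge on how your legal team classifies each workload:

```python
# Route regulated or customer-facing traffic to a Western frontier API;
# send internal traffic to a self-hosted open-weight model.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    customer_facing: bool
    regulated: bool   # e.g., subject to AB 2013 / EU AI Act GPAI duties

def route(req: Request) -> str:
    if req.customer_facing or req.regulated:
        return "western-frontier-api"   # compliance documentation handled upstream
    return "self-hosted-glm5"           # cost-optimized internal path

assert route(Request("draft customer email", customer_facing=True, regulated=False)) == "western-frontier-api"
assert route(Request("refactor internal script", customer_facing=False, regulated=False)) == "self-hosted-glm5"
```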
For Western Labs
- Regulatory moats are durable: Your compliance advantage (AB 2013 disclosures, GPAI documentation, safety assessments) is less vulnerable to Chinese competition than pricing or benchmark performance
- Enterprise integration depth matters more: Anthropic's 500 customers at $1M+/year and OpenAI's Snowflake partnerships reflect ecosystem lock-in that GLM-5's lower price cannot replicate
- Compete on value, not tokens: Shift emphasis from per-token pricing to total-cost-of-ownership (compliance, integration, support, safety assurance)