
China's Spring Festival AI Blitz: Three Companies Launch Frontier Models While US Chip Controls Fall

Alibaba's Qwen 3.5 (397B, open-weight), Zhipu's GLM-5 (744B, trained entirely on Huawei Ascend chips), and ByteDance's Doubao 2.0, released simultaneously, demonstrate Chinese semiconductor independence and open-source dominance. US export controls have functionally failed within 28 months.

Tags: China AI, open-source models, Qwen, GLM-5, Huawei Ascend · 5 min read · Feb 23, 2026

Key Takeaways

  • GLM-5 was trained entirely on Huawei Ascend chips (744B/40B MoE, 28.5T tokens), demonstrating functional independence from US semiconductors for 100B+-scale model training. The explicit policy objective of US export controls (imposed October 2023) has failed within 28 months.
  • Qwen surpasses Meta's Llama as most-downloaded model family on HuggingFace with 201-language support vs Llama's English-centric training, creating inherent advantages in non-English markets (Southeast Asia, Africa, Middle East, Latin America).
  • Three simultaneous LLM releases (Qwen 3.5, GLM-5, Doubao 2.0), plus Kuaishou's Kling 3.0 in video, in the Lunar New Year window mirror and amplify the January 2025 DeepSeek R1 moment. Where R1 was a single-company breakthrough, the 2026 Spring Festival represents coordinated competitive acceleration across the Chinese AI ecosystem.
  • Kling 3.0 achieves multi-shot video consistency across camera angles, a capability that neither OpenAI's Sora nor Google's Veo 2 has demonstrated. China now leads video generation alongside text and image modalities.
  • Chinese MoE architectures (GLM-5, Qwen) are converging on the same efficiency response to compute constraints, creating an indigenous architecture ecosystem (mHC + MoE + sparse attention) that compounds independently of Western transformer improvements.

Coordinated Launch as Strategic Signal

The Lunar New Year 2026 (Snake Year) model avalanche from China's three largest tech companies is not coincidental. The timing pattern—concentrated releases in a two-week cultural window—mirrors and amplifies the January 2025 DeepSeek R1 moment that triggered $600B in NVIDIA market cap losses. But where R1 was a single company breakthrough, the 2026 Spring Festival represents coordinated competitive acceleration across the Chinese AI ecosystem.

The Three Releases

Alibaba Qwen 3.5: 397 billion parameters, open-weight under an Apache license, 201 language/dialect support (up from 82 in Qwen 3.0), and, by Alibaba's own figures, 60% cheaper to operate and 8x better at handling large workloads than its predecessor. The benchmark claims of parity with GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro are self-reported and require independent verification. But the multilingual coverage is independently verifiable and creates genuine differentiation for international deployment.
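To make the verification step concrete: the sketch below queries an open-weight Qwen checkpoint on a non-English prompt through the HuggingFace transformers library. The repo id is a placeholder assumption (check the actual model card on HuggingFace), and a 397B checkpoint needs a multi-GPU node or a quantized variant in practice.

```python
# Minimal sketch: load an open-weight Qwen checkpoint and test it on a
# low-resource-language prompt. The repo id below is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-397B-Instruct"  # placeholder; verify on HuggingFace
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"  # expects a multi-GPU node
)

# A Swahili prompt: the kind of query where 201-language coverage matters.
messages = [{"role": "user",
             "content": "Eleza kwa ufupi jinsi injini ya umeme inavyofanya kazi."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```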

Zhipu AI GLM-5: 744B total parameters, 40B active via MoE, scaling from 355B (GLM-4.5). The strategically consequential claim: trained entirely on Huawei Ascend chips with 28.5T tokens. If verified, this establishes complete independence from US semiconductors for frontier-scale model training. GLM-5 uses DeepSeek's sparse attention mechanism (DSA), demonstrating the cross-pollination within China's architectural ecosystem.
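The 40B-active figure is what makes the reported run tractable: in an MoE model, per-token training compute scales with active, not total, parameters. A back-of-envelope check using the standard ~6·N·D FLOPs approximation (N = active parameters, D = training tokens) gives a sense of the scale:

```python
# Back-of-envelope training compute for GLM-5's reported figures,
# using the common 6 * N * D approximation for forward + backward FLOPs.
active_params = 40e9   # reported active parameters per token (MoE)
total_params = 744e9   # reported total parameters
tokens = 28.5e12       # reported training tokens

moe_flops = 6 * active_params * tokens
print(f"MoE training compute: ~{moe_flops:.2e} FLOPs")  # ~6.84e+24

# A dense model with the same total parameter count would need
# roughly total/active (~18.6x) more compute over the same data.
print(f"Dense-equivalent multiplier: {total_params / active_params:.1f}x")
```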

ByteDance Doubao 2.0: Advanced reasoning and multi-step task execution, leveraging the 200-million-user Doubao platform—the largest AI chatbot user base in China. ByteDance secured exclusive AI integration with the CCTV Spring Festival Gala (the most-watched broadcast globally), ensuring maximum cultural visibility. Simultaneously, ByteDance's SeeDance 2.0 video generation model launched, competing directly with Kuaishou's Kling 3.0.

China's Spring Festival 2026 Model Releases: Key Specifications

Side-by-side comparison of the three major Chinese model releases and their strategic differentiators

| Model | Key Claim | Languages | Parameters | Architecture | Strategic Edge |
| --- | --- | --- | --- | --- | --- |
| Qwen 3.5 (Alibaba) | GPT-5.2 parity | 201 | 397B | MoE (open-weight) | HuggingFace #1 downloads |
| GLM-5 (Zhipu AI) | Ascend-only training | Not disclosed | 744B / 40B active | MoE + DSA | US chip independence |
| Doubao 2.0 (ByteDance) | ChatGPT parity | Chinese + English | Not disclosed | Proprietary | 200M MAU distribution |
| Kling 3.0 (Kuaishou) | Multi-shot consistency | Multilingual audio | Not disclosed | Diffusion + Consistency | Video generation leader |

Source: CNBC, Euronews, MIT Tech Review, TeamDay AI

The Hardware Independence Inflection

GLM-5's Huawei Ascend training achievement deserves separate analysis because of its geopolitical implications. US export controls on AI chips to China (October 2023) were designed to create a compute bottleneck preventing Chinese labs from training frontier models. GLM-5's 744B parameter training on domestic silicon demonstrates that this bottleneck has been circumvented—not through smuggling or workarounds, but through legitimate indigenous hardware development.

The architectural response is also significant: Chinese labs have converged on MoE (Mixture of Experts) architectures specifically because MoE allows larger total parameter counts with smaller active parameter counts, reducing per-inference compute requirements. GLM-5 (744B total / 40B active), Qwen 3.5 (likely MoE-based given the efficiency claims), and DeepSeek V3 (671B total / 37B active) all follow this pattern. MoE is not just an efficiency technique—it is China's architectural answer to compute constraints.
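To make the total-versus-active distinction concrete, here is a toy top-k-routed MoE feed-forward layer in PyTorch. It illustrates the routing pattern only; it is not GLM-5's or Qwen's actual implementation, and the dimensions and expert counts are arbitrary.

```python
# Toy Mixture-of-Experts layer: many experts hold the total parameter
# count, but each token activates only top_k of them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # learned gating network
        self.top_k = top_k

    def forward(self, x):                       # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = ToyMoE()
total = sum(p.numel() for p in moe.experts.parameters())
active = total * moe.top_k // len(moe.experts)  # only k of n experts run per token
print(f"expert params: {total/1e6:.1f}M total, ~{active/1e6:.1f}M active per token")
```

GLM-5's 744B/40B split follows the same logic at scale: roughly 5% of the parameters do the work on any given token.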

Open-Source Distribution Dominance

Qwen has surpassed Meta's Llama as the most-downloaded model family on HuggingFace. This is a structural shift, not a momentary spike. The 201-language support gives Qwen an inherent advantage in non-English markets where Llama's English-centric training is a limitation. For developers in Southeast Asia, Africa, the Middle East, and Latin America, Qwen's multilingual coverage makes it the practical default.

The open-source strategy creates a flywheel: open-weight releases build developer mindshare, developer fine-tuning creates an ecosystem, the ecosystem locks in deployment infrastructure, and the infrastructure creates distribution advantages for subsequent model releases. Meta initiated this playbook with Llama 2 in 2023; Chinese labs have now executed it more effectively.

The Video Generation Front

Kling 3.0's multi-shot video generation breakthrough, maintaining subject identity across different camera angles, is a capability that neither OpenAI's Sora nor Google's Veo 2 has demonstrated. Together with ByteDance's SeeDance 2.0, it puts Chinese companies ahead on both the quality and consistency frontiers of AI video generation. This matters because video is the next major content modality after text and images, and China's lead was established before Western competitors matured.

What This Means for Practitioners

Model selection strategy for international deployments:

  • Evaluate Qwen 3.5 as primary model for multilingual applications: The 201-language support and HuggingFace ecosystem dominance make it the natural default for international products. For any customer base outside English-dominant markets, Qwen's multilingual capabilities create genuine value over Llama's English-biased training.
  • Monitor GLM-5 for independent benchmark verification (1-3 months): If GLM-5 truly achieves Ascend-only training, third-party benchmarks will validate the claims. In the meantime, treat benchmark claims as self-reported and require independent evaluation before production deployment.
  • Evaluate Kling 3.0 for video generation workflows: The multi-shot consistency capabilities represent genuine capability differentiation. If you're building video generation features, test Kling 3.0 against Sora/Veo 2. The consistency advantage may be decisive.
  • Track the open-source ecosystem shift: If Qwen continues to lead HuggingFace downloads, the developer community will naturally migrate toward Qwen fine-tuning and ecosystem tools. Plan accordingly—lock-in happens at the ecosystem level, not the model level.
  • Monitor Chinese semiconductor independence progress: GLM-5's Ascend training is strategically significant, but Ascend efficiency vs NVIDIA is still uncertain. Benchmark Ascend-trained models against NVIDIA-trained models on inference speed and cost to quantify the trade-offs; a minimal harness is sketched below.
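A minimal version of that benchmarking step, assuming each model is served behind an OpenAI-compatible endpoint (the interface vLLM and most serving stacks expose). All URLs and model names below are placeholders:

```python
# Compare mean latency and completion length across two hypothetical
# deployments, one Ascend-hosted and one NVIDIA-hosted.
import time
import requests

ENDPOINTS = {
    "glm-5-ascend": "http://ascend-host:8000/v1/chat/completions",  # placeholder
    "glm-5-nvidia": "http://nvidia-host:8000/v1/chat/completions",  # placeholder
}
PROMPTS = [
    "Summarize the trade-offs of mixture-of-experts models in three bullets.",
    "Translate 'supply chain resilience' into Bahasa Indonesia.",
]

for name, url in ENDPOINTS.items():
    latencies, completion_tokens = [], 0
    for prompt in PROMPTS:
        start = time.time()
        resp = requests.post(url, timeout=120, json={
            "model": name,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        })
        latencies.append(time.time() - start)
        completion_tokens += resp.json()["usage"]["completion_tokens"]
    print(f"{name}: mean latency {sum(latencies)/len(latencies):.2f}s, "
          f"{completion_tokens} completion tokens")
```

Pair the latency numbers with per-hour instance pricing to get cost per million tokens, which is the figure that actually decides deployment.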

Strategic positioning: Western companies that view Chinese models as competition should focus on the dimensions where Western open-source remains strong (code understanding, English reasoning tasks), while ceding multilingual markets and video generation to Chinese competitors. The alternative—trying to compete across all dimensions with Chinese companies that have lower labor costs and unrestricted R&D—is likely unwinnable.
