
Qwen 3.5 DeltaNet Breakthrough May Outlive Its Creator Team After Sudden Exodus

Alibaba's Qwen 3.5 9B achieved a 13x efficiency breakthrough through the Gated DeltaNet architecture, outperforming models 120B in size. Within 24 hours of release, the founding team resigned, creating an 'orphan architecture' risk pattern. Independent convergence between Qwen and Kimi teams on the same 3:1 linear-to-full attention ratio suggests this is industry standard regardless.

TL;DR (Neutral)
  • Qwen 3.5 9B achieves GPQA Diamond 81.7 vs GPT-OSS-120B's 71.5 — a 9B model outperforming a model 13x its size on graduate-level reasoning through architectural innovation alone
  • Gated DeltaNet uses a 3:1 linear-to-full attention ratio (60 layers as repeating 4-block units: 3 linear attention + 1 full softmax), reducing memory complexity for long sequences while preserving precision for critical reasoning
  • Lin Junyang (Qwen tech lead, youngest P10 at Alibaba at age 32) resigned the day after release with 'bye my beloved qwen'; three other senior architects departed simultaneously
  • Alibaba stock fell 5.3% on leadership exodus, the steepest intraday drop since October 2025
  • The 3:1 linear-to-full ratio was independently discovered by both Qwen and Kimi teams without collaboration — this independent convergence is a strong signal of empirical optimality that will become industry standard within 12 months
Tags: Qwen 3.5, DeltaNet, linear attention, Alibaba, efficiency · 6 min read · Mar 28, 2026
Impact: Medium · Horizon: Medium-term
The 3:1 linear-to-full attention ratio is immediately actionable for ML engineers training custom models. Qwen 3.5 Small (0.8B-9B) is production-ready for on-device deployment under Apache 2.0. Monitor Alibaba's product signals carefully given organizational restructuring.
Adoption: Qwen 3.5 Small available now with video understanding. DeltaNet-style architectures will appear in other model families within 3-6 months. Edge AI chip integration (Apple, Qualcomm) within 12-18 months.

Cross-Domain Connections

Qwen 3.5 9B outperforms a 120B model through Gated DeltaNet's 3:1 linear-to-full ratio ↔ Kimi independently discovered the same 3:1 ratio without collaboration

Independent convergence on the same architectural ratio is the strongest possible evidence this is an empirical optimum. It will become the industry-standard hybrid attention pattern within 12 months, regardless of which team implements it.

Lin Junyang and 3 senior architects resign from Alibaba within 24 hours of the Qwen 3.5 release ↔ AMI Labs recruits Saining Xie (DiT inventor) from Google DeepMind for JEPA world models

Top AI talent is flowing from Big Tech to startups across both US and China simultaneously. The pattern is researchers choosing architectural freedom over corporate revenue optimization, creating a generation of 'architecture refugee' startups.

TurboQuant's 6x compression is applicable to any model without retraining ↔ Qwen 3.5 9B is already 13x more efficient than comparable models by architecture alone

Stacking architectural efficiency (DeltaNet, 13x) with inference compression (TurboQuant, 6x) creates 78x efficiency multiplier total. This is how frontier-class reasoning reaches smartphones and IoT devices within 18 months.

Key Takeaways

  • Qwen models have been downloaded over 1 billion times; the Alibaba AI app reached 203M MAU in February 2026, putting massive adoption at risk if the departure signals stagnation

The Efficiency Breakthrough: 13x Gains From Architecture Alone

On March 3, 2026, Alibaba released Qwen 3.5 Small (0.8B-9B models) with a remarkable result: the 9B model achieved 81.7% on GPQA Diamond, outperforming the previous generation's 80B model (77.2%) and beating GPT-OSS-120B (71.5%). This is not a marginal improvement; it is a 13x efficiency gain.

The mechanism is Gated DeltaNet, a hybrid attention architecture that combines:

  • Linear attention layers for efficient long-context processing with O(n) memory complexity
  • Full softmax attention for precision-critical reasoning steps
  • Mixture-of-Experts (MoE) layers within each attention module for task-specific computation gating
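The linear-attention component can be pictured as a fixed-size state updated once per token. Below is a minimal sketch of a gated delta-rule recurrence of the kind DeltaNet-style layers use, with the per-token gate `alpha` and write strength `beta` simplified to scalars; the names and shapes are illustrative, not Alibaba's production kernel:

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Sketch of a gated delta-rule linear-attention recurrence.

    The state S is a fixed (d_v, d_k) matrix updated per token, so memory
    stays O(d_v * d_k) regardless of sequence length -- the O(n) advantage
    over softmax attention's growing KV cache.
    """
    n, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    out = np.zeros((n, d_v))
    for t in range(n):
        kt, vt, qt = k[t], v[t], q[t]
        # Erase old content along direction k_t (delta rule), scaled by gate alpha_t
        S = alpha[t] * (S - beta[t] * (S @ kt)[:, None] * kt[None, :])
        # Write the new key-value association
        S = S + beta[t] * vt[:, None] * kt[None, :]
        out[t] = S @ qt
    return out
```

Note the contrast with softmax attention: nothing here grows with sequence length, which is why the linear layers carry the bulk of long-context processing.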

The specific ratio is 3:1 — three linear attention blocks followed by one full softmax block, repeated across 60 total layers. This enables the model to process long sequences efficiently while reserving full attention for the tasks where it matters most.
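The layer schedule described above is easy to express directly; this tiny sketch (function name illustrative) tiles the 4-block unit across the 60-layer stack:

```python
# Layer schedule for the 3:1 hybrid pattern: each repeating 4-block unit is
# 3 linear-attention layers followed by 1 full-softmax layer, tiled across
# the stack (60 layers = 15 units).
def hybrid_schedule(n_layers=60, linear_per_unit=3, full_per_unit=1):
    unit = ["linear"] * linear_per_unit + ["full"] * full_per_unit
    assert n_layers % len(unit) == 0, "layer count must tile evenly into units"
    return unit * (n_layers // len(unit))

layers = hybrid_schedule()
print(layers[:4], layers.count("linear"), layers.count("full"))
# ['linear', 'linear', 'linear', 'full'] 45 15
```

Of the 60 layers, 45 are linear attention and only 15 pay the full quadratic softmax cost.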

The practical implications are immediate: a 0.8B Qwen model can now process video input, making it the first mobile-class model with video understanding, and a 9B model can serve as a cost-effective replacement for 70B+ models in many reasoning tasks. The cost differential is massive: inference at roughly 1/13th the compute of a 120B model, in line with the 13x parameter reduction.

Qwen 3.5 9B vs Much Larger Models: Graduate-Level Reasoning

A 9B model outperforming models 13x its size demonstrates the DeltaNet efficiency breakthrough

Source: StableLearn, DataCamp March 2026

The Talent Exodus: 24 Hours After Breakthrough

On March 4, 2026 — one day after the Qwen 3.5 release — Lin Junyang, the technical lead and one of the most respected AI researchers in China, announced his resignation via Weibo with a cryptic message: 'bye my beloved qwen.' Three other senior architects departed simultaneously.

Lin Junyang is notable not for age (32, youngest P10 at Alibaba) but for technical vision. Under his leadership, Qwen became the world's most prolific open-source model series — over 1 billion downloads across 400+ model variants. His departure triggered immediate market reaction: Alibaba stock fell 5.3%, the steepest intraday drop since October 2025.

The official explanation was reorganization: Alibaba replaced its autonomous R&D structure with 'horizontal modularization' and hired Zhou Hao from Google Gemini as a replacement. In practice, this reads like a structural pivot from research velocity to enterprise revenue optimization — exactly the opposite direction from where DeltaNet's breakthrough suggests the industry should be heading.

Independent Convergence Suggests Empirical Optimality

One of the strongest indicators that the 3:1 linear-to-full attention ratio is more than an Alibaba innovation is that it converged independently. Kimi's research team, competing against Qwen, discovered the same 3:1 ratio without collaboration or knowledge of Qwen's approach. This independent convergence is a classic signal in science that the result is not an artifact of a particular team's choices but an empirical optimum.

This has major implications: the 3:1 ratio will likely become the industry standard within 12 months, adopted across OpenAI, Google, Anthropic, and open-source communities. Architecture efficiency breakthroughs usually take 18-24 months to propagate; independent convergence accelerates this to 6-12 months.

It also means the Gated DeltaNet innovation will survive the Qwen talent exodus. Even if Alibaba struggles to innovate further, the broader research community will adopt, refine, and extend the 3:1 principle. The architecture itself is decoupled from Alibaba's organizational fate.

Gated DeltaNet Architecture: The 3:1 Ratio

Independent discovery by Qwen and Kimi teams suggests empirical optimality

  • 3 of 4 — linear attention blocks per 4-layer group (60 layers total)
  • 1 of 4 — full softmax blocks, reserved for precision-critical reasoning
  • 2 teams — Qwen and Kimi discovered the ratio independently
  • 13x — model size reduction vs the 120B baseline

Source: Alibaba Qwen, Kimi Research March 2026

The Orphan Architecture Pattern: Third Chinese AI Exodus

Lin Junyang's departure is part of a larger pattern. Over the past 12 months, China has experienced three major AI talent exoduses:

  • Zhipu (ChatGLM) — key researchers departing to found or join competing ventures
  • Moonshot (Kimi) — structural reorganization followed by leadership changes
  • Alibaba (Qwen) — founding team departure one day after major breakthrough

The pattern is consistent: breakthrough innovation → organizational restructuring → key talent exits. This suggests a fundamental misalignment between research-driven culture (which produces DeltaNet breakthroughs) and enterprise-revenue-driven culture (which prioritizes deployment and customer metrics over continued innovation).

The broader implication is that Chinese AI talent is increasingly flowing toward independent ventures and startups. The departing Qwen team will likely surface at DeepSeek, ByteDance's AI initiatives, or entirely new startups within 3-6 months. The breakthroughs will continue, but decoupled from Alibaba's official product roadmap.

Architecture Survival Through Open Source

Crucially, Qwen 3.5 Small is released under Apache 2.0, meaning the Gated DeltaNet weights and code are publicly available. Even if Alibaba completely abandons the architecture, it cannot disappear:

  • Apple, Qualcomm, MediaTek can integrate DeltaNet into edge AI chips without Alibaba's involvement
  • Open-source communities (Hugging Face, Ollama) can create production-optimized versions
  • Competing labs (ByteDance, iFlytek, Baidu) can fork and improve the architecture
  • Startups can build products using DeltaNet as the foundation

This is the power of open-source: an architecture breakthrough survives its creator team's departure through decentralized adoption. The irony is that Alibaba's talent exodus, combined with open-source licensing, actually accelerates DeltaNet's adoption beyond what a centralized Alibaba team could achieve.

Stacking With TurboQuant: 78x Efficiency Multiplier

The DeltaNet breakthrough does not exist in isolation. It compounds multiplicatively with other efficiency improvements:

  • Gated DeltaNet achieves 13x efficiency through architecture (9B model = 120B performance)
  • TurboQuant adds 6x compression through KV-cache quantization, training-free
  • Theoretical compound multiplier = 13x × 6x = 78x total efficiency
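The stacking arithmetic above is purely multiplicative; a one-liner makes the idealization explicit (real-world gains rarely compose this cleanly, so treat the result as an upper bound):

```python
from math import prod

def compound_efficiency(gains):
    # Multiplicative stacking of independent efficiency gains.
    # An idealization: assumes each speedup composes cleanly with the others.
    return prod(gains)

print(compound_efficiency([13, 6]))  # DeltaNet (13x) x TurboQuant (6x) -> 78
```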

This means a 0.8B Qwen model with 6x KV-cache compression could serve performance equivalent to a 62B model from 2024, running on a single consumer GPU with 24GB VRAM. This is the threshold where frontier-class reasoning reaches smartphones and IoT devices within 18 months.

What This Means for Practitioners

For ML engineers building efficient models: The 3:1 linear-to-full attention ratio is immediately actionable. Whether you are training custom models or fine-tuning existing ones, experiment with this ratio. It applies to any transformer-based architecture and may provide 2-3x efficiency improvements. Start with your existing models and gradually phase in linear attention blocks.
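As a back-of-the-envelope for that phase-in (purely illustrative arithmetic, not a recipe from the Qwen team), hitting a 3:1 mix in an existing all-softmax stack means converting three of every four layers:

```python
def layers_to_convert(n_layers, linear_fraction=0.75):
    # How many softmax layers to swap for linear attention to reach the
    # target 3:1 linear-to-full mix (fraction 0.75 linear).
    return round(n_layers * linear_fraction)

print(layers_to_convert(60))  # 45 of 60 layers become linear attention
print(layers_to_convert(32))  # 24 of 32
```

In practice you would phase these in gradually and validate reasoning benchmarks at each step, rather than converting all of them at once.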

For edge AI teams: Qwen 3.5 0.8B with video understanding is production-ready today. Download the Apache 2.0-licensed weights and integrate into your on-device pipelines. The 0.8B model runs on mobile hardware that previously required cloud inference.

For teams evaluating Alibaba products: Monitor licensing and product development signals carefully. The organizational restructuring and talent exodus suggest uncertainty in Alibaba's AI strategy. The DeltaNet architecture is locked in (open-source), but Qwen's product trajectory is less certain. Plan for Qwen compatibility but do not bet your entire architecture on Alibaba's continued innovation.

For chip designers (Apple, Qualcomm, MediaTek): The Gated DeltaNet architecture is now the specification for next-generation edge AI silicon. If you are designing on-device AI accelerators, optimize for the 3:1 linear-to-full attention pattern. This will be the industry-standard architecture within 12 months.

For Chinese AI researchers: The departure of Lin Junyang and the Qwen team represents a broader capital formation opportunity. Expect these researchers to raise funding within 3-6 months for architecture-focused startups. The talent is in the market.
