The 31B Model That Beat a 109B Model: Dense Defeats MoE
Google's Gemma 4 31B dense model outperforms Meta's Llama 4 Scout 109B Mixture-of-Experts model on AIME (+0.9%), LiveCodeBench (+2.9%), and GPQA Diamond (+2.0%) while requiring only 20GB of VRAM versus Scout's 70-80GB. The result challenges the industry consensus that MoE is the inevitable scaling path. Maverick's quantization failures on critical layers and Behemoth's unreplicable 2T-parameter teacher model expose MoE's three-sided vulnerability: deployment complexity, ecosystem immaturity, and the ease with which a better training recipe beats raw parameter count.
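As a rough sanity check on the memory gap, the sketch below estimates serving footprints from parameter count and quantization bit-width. The 31B and 109B figures come from the post; the bit-widths and the ~10% overhead factor (KV cache, activations, runtime buffers) are illustrative assumptions, and the MoE case highlights that all expert weights must stay resident even though only a few experts fire per token.

```python
# Back-of-envelope VRAM estimate: weights at a given bit-width plus a rough
# runtime overhead factor. Bit-widths and the 10% overhead are assumptions
# for illustration, not vendor-published deployment numbers.

def vram_gb(params_billion: float, bits_per_param: float, overhead: float = 1.10) -> float:
    """Approximate serving memory in GB for a model with params_billion parameters."""
    weight_bytes = params_billion * 1e9 * (bits_per_param / 8)
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    # A 31B dense model at ~4-5 bits/param lands near the 20GB figure cited above.
    print(f"31B dense @ 4.5-bit: ~{vram_gb(31, 4.5):.0f} GB")
    # A 109B MoE keeps every expert's weights resident regardless of routing,
    # so its footprint spans roughly 60-120 GB between 4-bit and 8-bit weights;
    # the cited 70-80GB falls inside that range depending on precision and overhead.
    print(f"109B MoE  @ 4-bit:   ~{vram_gb(109, 4):.0f} GB")
    print(f"109B MoE  @ 8-bit:   ~{vram_gb(109, 8):.0f} GB")
```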
model-architecture · moe · gemma-4 · llama · inference-efficiency · 1 min read · Apr 13, 2026