Key Takeaways
- Google's MIRAS framework (February 2026) unifies all sequence models -- Transformers, Mamba, Titans, Engram, linear RNNs -- as associative memory operations varying across four design axes: memory structure, scoring rules, retention rates, and update rules
- MIRAS derived three new architectures (Moneta, Yaad, Memora) by exploring previously unoccupied regions of the design space; some outperform Mamba2 and standard Transformers on benchmarks
- The unification is simultaneously good news (the design space is tractable) and a strategic problem (enterprise architects must choose among five proven approaches with no clear winner for general use)
- GNN-RAG adds a sixth approach and an orthogonal design dimension that MIRAS does not address: structured relational reasoning, matching GPT-4 on KGQA benchmarks with just a 7B model (8.9-15.5 percentage-point improvements on multi-hop questions)
- The era of single-architecture AI infrastructure ("Transformers, scaled up") is ending; production systems may require memory ensembles combining Engram (factual), Titans (episodic), GNN-RAG (relational), and Transformer attention (in-window reasoning)
The MIRAS Unification and Why It Creates a New Problem
Since "Attention Is All You Need" in 2017, the AI industry has operated under a single architectural assumption: Transformers are the universal foundation. Differences between models came down to parameter count, training data, and fine-tuning strategy -- never architecture. February 2026 breaks this assumption. Five fundamentally different memory architectures have demonstrated production viability, and Google's MIRAS framework reveals that they are all points in a common design space.
The MIRAS Theoretical Framework
MIRAS identifies four design axes that define all sequence models:
- Memory architecture structure -- how information is physically stored
- Internal scoring rules -- how relevance is evaluated
- Retention (overwrite) rate -- how quickly old information is displaced
- Memory update rule -- how new information is incorporated
Every major sequence model maps to a specific point in this 4D design space. Transformers use global attention with fixed KV cache. Mamba uses compressed state with selective gating. Titans use dynamic weight updates via surprise gradients. MIRAS even derived three new architectures (Moneta, Yaad, Memora) by exploring previously unoccupied design space regions -- some outperforming established baselines. This is a significant scientific contribution: a theoretical framework that is both descriptive (explaining existing architectures) and generative (producing new ones).
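One way to make the mapping concrete is to treat each architecture as a point in the four-axis space. The sketch below is purely illustrative: the axis labels are informal descriptions from this article, not terminology from the MIRAS paper.

```python
# Hypothetical sketch: each architecture as a point along the four MIRAS axes.
# Axis values are illustrative labels, not the paper's formal terms.
from dataclasses import dataclass

@dataclass(frozen=True)
class MirasPoint:
    memory_structure: str   # how information is physically stored
    scoring_rule: str       # how relevance is evaluated
    retention_rate: str     # how old information is displaced
    update_rule: str        # how new information is incorporated

DESIGN_SPACE = {
    "Transformer": MirasPoint("global KV cache", "softmax attention",
                              "none (window-bound)", "append"),
    "Mamba":       MirasPoint("compressed state", "selective gating",
                              "input-dependent decay", "linear recurrence"),
    "Titans":      MirasPoint("neural memory weights", "surprise gradient",
                              "forgetting gate", "test-time gradient step"),
}

def differing_axes(a: MirasPoint, b: MirasPoint) -> list[str]:
    # Axis-by-axis comparison shows where two architectures actually differ.
    return [f for f in a.__dataclass_fields__ if getattr(a, f) != getattr(b, f)]
```

Framed this way, "which architecture?" becomes a question about which axis values a workload needs, which is the tractability MIRAS buys you.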
The Five Competing Approaches
1. Engram (DeepSeek V4) -- Static Knowledge via DRAM Hash Lookup
Design space position: O(1) retrieval from fixed external memory. No learning at inference time; knowledge is pre-encoded in hash tables and accessed during the forward pass. Makes 1M token context cost-equivalent to 128K. Best for large-context tasks with stable knowledge requirements (coding, document analysis, enterprise knowledge bases). Limitation: static -- cannot adapt to novel patterns not in the hash table.
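The O(1) property can be illustrated with a toy static hash memory. This is a hypothetical interface in the spirit of the description above, not DeepSeek's actual design: knowledge is written offline, and the forward pass only reads, so lookup cost is independent of context length.

```python
# Illustrative sketch of O(1) static memory lookup (hypothetical interface;
# the real Engram internals are not reproduced here). Knowledge is pre-encoded
# into a fixed table; inference only reads.
import hashlib

class StaticHashMemory:
    def __init__(self, num_slots: int, dim: int):
        self.num_slots = num_slots
        # Frozen knowledge vectors; never updated at inference time.
        self.table = [[0.0] * dim for _ in range(num_slots)]

    def _slot(self, key: str) -> int:
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return h % self.num_slots

    def write(self, key: str, vector: list) -> None:
        # Offline pre-encoding step only -- this is what makes the memory static.
        self.table[self._slot(key)] = vector

    def read(self, key: str) -> list:
        # O(1) per lookup regardless of how long the context is.
        return self.table[self._slot(key)]

mem = StaticHashMemory(num_slots=1024, dim=4)
mem.write("paris->capital_of", [1.0, 0.0, 0.0, 0.0])
```

The static limitation is visible in the code: anything not written offline simply is not there, which is why the approach suits stable knowledge bases rather than novel patterns.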
2. Titans (Google) -- Dynamic Test-Time Weight Updates via Surprise
Design space position: Adaptive internal memory updated during inference. The neural memory module's weights change when unexpected tokens trigger high-gradient "surprise" signals. A forgetting gate manages capacity. Best for tasks requiring continuous adaptation to novel information (agentic workflows, evolving conversations, real-time data analysis). Limitation: test-time weight updates add inference overhead; not yet validated at GPT-4 scale.
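A minimal numeric sketch of the surprise-gated idea follows. It is hypothetical: the real Titans memory is a neural module updated by gradients, whereas here a plain vector stands in for it, with prediction error as the "surprise" signal and a decay term as the forgetting gate.

```python
# Toy surprise-gated test-time memory in the spirit of Titans (illustrative;
# the real module is a neural network, not a vector). The memory moves toward
# inputs it predicts poorly; a forgetting gate decays old content.

def update_memory(memory, x, lr=0.5, forget=0.1):
    # "Surprise" = magnitude of the prediction error between memory and input.
    error = [xi - mi for xi, mi in zip(x, memory)]
    surprise = sum(e * e for e in error) ** 0.5
    gate = surprise / (1.0 + surprise)   # larger surprise -> larger update
    new_memory = [(1 - forget) * mi + lr * gate * ei
                  for mi, ei in zip(memory, error)]
    return new_memory, surprise
```

The per-token cost of computing the error and applying the update is exactly the inference overhead noted as this architecture's limitation.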
3. ODE Dynamics (Liquid AI LFM2.5) -- Continuous-Time State Evolution
Design space position: Weights are continuous functions of time and input, evolving according to first-order ODEs. No discrete update steps. LFM2.5 achieves robotic domain transfer without retraining, demonstrated in drone navigation in unseen environments. Best for robotics and embodied AI. Limitation: 1.2B parameter scale limits reasoning depth; ODE solver adds computational overhead.
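The continuous-time idea can be shown with a toy first-order ODE cell integrated by explicit Euler steps. This is an illustrative scalar model, not LFM2.5's actual equations: the state decays toward zero with time constant tau and is driven by a nonlinearity of the input.

```python
# Toy Euler integration of a continuous-time state cell, in the spirit of
# liquid/ODE networks (illustrative; not Liquid AI's published parameterization).
import math

def step(x: float, u: float, dt: float = 0.01, tau: float = 1.0,
         w: float = 0.5) -> float:
    dxdt = -x / tau + math.tanh(w * u)   # first-order ODE right-hand side
    return x + dt * dxdt                  # explicit Euler step

x = 0.0
for _ in range(100):   # integrate one simulated second with constant input
    x = step(x, u=1.0)
```

The solver loop is the computational overhead mentioned above: each output requires many small integration steps rather than one discrete update.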
4. Mamba/SSM -- Compressed State with Selective Gating
Design space position: Fixed-size compressed state with input-dependent gating controlling information flow. O(n) in sequence length. Best for efficient long-sequence processing where some information loss is acceptable (audio, genomics, time-series). Limitation: lossy compression means fine-grained retrieval from long contexts is unreliable.
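The core mechanism is a gated linear recurrence whose gate depends on the input. The scalar toy below captures that idea only; Mamba's real parameterization (structured state matrices, hardware-aware scan) is far richer.

```python
# Minimal sketch of a selective (input-gated) linear recurrence, the core idea
# behind SSMs like Mamba (scalar toy, not the real parameterization).
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def scan(inputs, wa: float = 2.0, wb: float = 1.0) -> float:
    h = 0.0
    for u in inputs:
        a = sigmoid(wa * u)   # input-dependent retention gate
        b = wb * (1.0 - a)    # complementary write gate
        h = a * h + b * u     # O(1) state per step => O(n) over the sequence
    return h
```

Because the state `h` has fixed size no matter how long `inputs` is, compression is inherently lossy, which is exactly the fine-grained-retrieval limitation noted above.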
5. Standard Transformer -- Global Attention with KV Cache
Design space position: O(n^2) global attention over all tokens in context. The most expressive but most expensive approach. Best for tasks requiring precise cross-sequence reasoning within the context window (code generation, mathematical proof, creative writing). Limitation: quadratic scaling makes long contexts prohibitively expensive; KV cache is the primary VRAM bottleneck.
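The KV-cache bottleneck is easy to quantify with the standard back-of-the-envelope formula. The formula itself is conventional; the example configuration below is an assumption for illustration, not a specific model's published numbers.

```python
# Back-of-the-envelope KV-cache size, illustrating why the cache is the
# primary VRAM bottleneck at long context. Example numbers are assumptions
# (a hypothetical 70B-class model), not a published config.

def kv_cache_bytes(seq_len: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values, stored per layer, per KV head, per token;
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return 2 * seq_len * layers * kv_heads * head_dim * bytes_per_elem

# Hypothetical config: 80 layers, 8 KV heads (GQA), head_dim 128, 128K context.
gib = kv_cache_bytes(seq_len=128_000, layers=80, kv_heads=8, head_dim=128) / 2**30
```

At these assumed settings the cache alone approaches 40 GiB before a single weight is loaded, which is why Engram's claim of 1M tokens at 128K-equivalent cost is significant.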
The Sixth Approach: GNN-RAG for Structural Reasoning
Graph Neural Networks add a capability none of the sequence models possess: structured multi-hop reasoning over relational data. GNN-RAG matches GPT-4 on knowledge graph QA benchmarks (WebQSP, CWQ) with just a 7B tuned LLM, achieving 8.9-15.5 percentage-point improvements on multi-hop questions. Google's Graph Foundation Models blog (February 10, 2026) validates this as a production direction.
The architectural insight is clean separation of concerns: GNNs handle structural reasoning (how entities relate), while sequence models handle language (what to say about those relationships). This is complementary to, not competitive with, the five memory architectures above. MIRAS addresses a 4D design space, but GNNs occupy an orthogonal dimension -- the full design space is 5+ dimensions.
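The division of labor can be sketched in a few lines: a graph component retrieves multi-hop reasoning paths, and the paths are verbalized into a prompt for the language model. Here a plain BFS stands in for the GNN's learned scoring, and the tiny knowledge graph and entity names are illustrative.

```python
# Sketch of the GNN-RAG division of labor: a graph component retrieves
# multi-hop paths (a plain BFS stands in for the GNN's learned relevance
# scoring), and the retrieved path is verbalized for the LLM. All names
# and triples are illustrative.
from collections import deque

KG = {  # adjacency list of (relation, tail) pairs per head entity
    "Einstein": [("born_in", "Ulm")],
    "Ulm": [("located_in", "Germany")],
}

def retrieve_path(start: str, target: str, max_hops: int = 3):
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) < max_hops:
            for rel, nxt in KG.get(node, []):
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

path = retrieve_path("Einstein", "Germany")
prompt = "Reasoning path: " + " -> ".join(f"{h} {r} {t}" for h, r, t in path)
```

The explicit path is also what makes this approach auditable: the reasoning chain handed to the LLM can be inspected hop by hop, which matters for the healthcare use case below.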
The Enterprise Selection Decision Matrix
For ML engineers building production systems, MIRAS creates a paradox: it reveals the design space is tractable, but does not specify which point is optimal for any given application. The decision matrix has become genuinely complex:
- Agentic AI with long-running tasks? Titans (adaptive memory) + GNN-RAG (structured reasoning)
- High-throughput coding assistance? Engram (O(1) lookup) or Transformer (precise attention)
- Edge robotics? LFM2.5 (ODE adaptation) + token compression
- Document analysis at million-token scale? Engram (cost-equivalent scaling) or Titans (validated at 2M+ token context)
- Healthcare with explainability requirements? GNN-RAG (auditable reasoning paths over structured medical knowledge graphs)
No single architecture wins across all dimensions. This is unprecedented -- since 2017, the answer to "which architecture?" was always "Transformer, scaled up."
The strongest signal from MIRAS is that these architectures are not fundamentally competing -- they address different memory requirements. The A-Mem framework (arXiv:2502.12110) for agentic memory with atomic notes points toward the orchestration layer a memory ensemble would require: typed memory stores served by architecturally appropriate backends, with an agent-level orchestrator routing queries to the appropriate backend.
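What such an orchestration layer might look like can be sketched as a query router over typed memory stores. Everything here is a hypothetical interface inspired by the A-Mem idea, with a toy keyword classifier standing in for a learned router.

```python
# Hypothetical orchestration sketch for a memory ensemble: classify the
# query's memory need, then route it to the architecturally appropriate
# backend. Backend names and the keyword classifier are illustrative.

BACKENDS = {
    "factual":    "engram",        # stable knowledge, O(1) lookup
    "episodic":   "titans",        # adapts to the running session
    "relational": "gnn_rag",       # multi-hop over a knowledge graph
    "in_window":  "transformer",   # precise reasoning over current context
}

def route(query: str) -> str:
    q = query.lower()
    # Toy keyword rules standing in for a learned routing model.
    if "related to" in q or "connected" in q:
        need = "relational"
    elif "earlier in this session" in q:
        need = "episodic"
    elif "define" in q or "what is" in q:
        need = "factual"
    else:
        need = "in_window"
    return BACKENDS[need]
```

A production router would be learned rather than rule-based, but the shape is the point: the orchestrator, not any single memory architecture, becomes the unit of system design.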
What This Means for Practitioners
ML architects should begin evaluating workloads against the MIRAS design space dimensions rather than defaulting to Transformer.
What is production-ready today:
- GNN-RAG: production-ready for structured KGQA. If you have a knowledge graph, this is available and validated against GPT-4-level benchmarks with 7B models.
- LFM2.5: production-ready for edge/robotics at 1.2B scale with AMD and Qualcomm NPU integrations.
- Mamba: production-ready for audio, genomics, and time-series sequences where information loss is acceptable.
What requires 12-18 months:
- Titans: not yet validated at frontier scale (100B+ parameters). The surprise-gradient test-time memory is architecturally promising but needs production-scale validation.
- Engram: depends on DeepSeek V4 open-weight release timeline. The architecture is described; adoption awaits the released weights.
Architectural debt risk: Companies locked into Transformer-only infrastructure face architectural debt if alternatives prove superior at scale. The risk is not that Transformers become obsolete -- it is that different workloads begin to have different optimal architectures, and teams without multi-architecture evaluation capability will be slower to optimize. Start building the evaluation capability now even if you do not switch architectures immediately. The MIRAS 4D framework gives you the vocabulary to reason about the tradeoffs systematically.
Post-Transformer Architecture Design Space: Memory Type vs Use Case
Six competing architectures mapped to their scaling behavior, memory type, best use case, key limitation, and production scale
| Scaling | Memory Type | Architecture | Best Use Case | Key Limitation | Production Scale |
|---|---|---|---|---|---|
| O(n^2) | Fixed KV cache | Transformer (Attention) | Precise in-window reasoning | Quadratic cost at length | 100B+ proven |
| O(1) lookup | Static DRAM hash | Engram (DeepSeek V4) | Large-context stable knowledge | Cannot adapt at inference | 1T (claimed) |
| O(n) + gradient | Dynamic weight updates | Titans (Google) | Continuous learning agents | Inference overhead | Research only |
| O(n) ODE solve | Continuous-time state | ODE/LNN (Liquid AI) | Robotics/embodied AI | Parameter scale ceiling | 1.2B production |
| O(n) linear | Compressed state | Mamba/SSM | Efficient long sequences | Lossy compression | 7B+ proven |
| O(k) traversal | Graph topology | GNN-RAG | Structured relational reasoning | Requires pre-built KG | 7B + KG proven |
Source: MIRAS framework (Google), GNN-RAG paper, Liquid AI research, DeepSeek Engram paper