Meta's TRIBE v2 Validates JEPA Representations Over Language Models for Brain Encoding

Meta FAIR's TRIBE v2 brain encoding model demonstrates that V-JEPA2 representations predict human neural activity better than text-centric language models. This independent validation of Yann LeCun's JEPA thesis comes as $2B+ in funding bets against LLM-centric AI through AMI Labs and World Labs.

TL;DR: Breakthrough 🟢
  • TRIBE v2 scales from 1,000 to 70,000 brain regions and is evaluated on 1,117 hours of fMRI data from 720 subjects, demonstrating that multimodal brain encoding follows log-linear scaling laws with no plateau observed
  • Zero-shot predictions using V-JEPA2 representations outperform individual human fMRI recordings in group-averaged brain response validation — a counter-intuitive inversion suggesting AI models capture more canonical neural signals than biological noise
  • TRIBE v2 uses V-JEPA2 (not a language model) as its video feature extractor, along with LLaMA 3.2 for text and Wav2Vec-BERT for audio, showing multimodal architectures don't require language models as the dominant backbone
  • LeCun's AMI Labs raised $1.03B and Fei-Fei Li's World Labs raised $1B in three weeks — $2B+ in capital now explicitly betting that JEPA-style world models, not LLM scaling, represent the path to advanced AI
  • V-JEPA2 demonstrates zero-shot robot planning with just 62 hours of training data — sample efficiency that pure language models cannot match for physical world tasks
TRIBE v2 · JEPA · brain encoding · neuroscience AI · LeCun | 6 min read | Mar 28, 2026
High Impact · Medium-term. ML engineers in robotics, BCI, and multimodal perception should evaluate V-JEPA2 as a feature backbone; for NLP and code tasks, LLMs remain optimal. The paradigm shift is domain-specific, not universal. TRIBE v2 demonstrates sample-efficiency gains for perception tasks that scale to real-world physical AI applications. Adoption: TRIBE v2 code and weights are available now for non-commercial use; AMI Labs' first product is expected within 12-18 months; practical JEPA-based tooling for production robotics is 2-3 years out.

Cross-Domain Connections

TRIBE v2 uses V-JEPA2 as its video feature extractor and outperforms individual fMRI recordings ↔ LeCun's AMI Labs raises $1.03B to build JEPA-based world models for robotics

Meta's own research team provides empirical ammunition for their departed chief scientist's startup thesis. V-JEPA2 representations align with human neural processing in ways that support LeCun's claim that JEPA captures 'what matters' rather than raw prediction.

V-JEPA2 zero-shot robot planning with 62 hours of training data ↔ China's national humanoid robot standard with 140+ manufacturers and 330+ models

Sample-efficient world models (JEPA) combined with China's mass manufacturing capability create a convergence point where Chinese robotics companies adopt Western-developed perception architectures for their hardware platforms.

NVIDIA invested in AMI Labs (anti-LLM world models) ↔ NVIDIA Rubin platform optimized for MoE LLM inference at 10x cost reduction

NVIDIA is simultaneously selling picks to both sides of the paradigm debate — Rubin for LLM inference scaling AND investment in AMI Labs for post-LLM world models. Their compute monopoly is paradigm-agnostic.

The Brain Outperformed by AI: Zero-Shot Predictions Beat Human Measurements

Meta FAIR released TRIBE v2 (TRImodal Brain Encoder), a foundation model that predicts how human brains respond to video, audio, and text simultaneously. The model was trained on 451.6 hours of fMRI data from 25 subjects and evaluated on 1,117.7 hours from 720 subjects, a scale that exceeds any prior brain encoding work.

The most striking result is counter-intuitive: TRIBE v2's zero-shot predictions of group-averaged brain responses are more accurate than individual human fMRI recordings. This is not a trivial statistical quirk. Individual fMRI scans contain substantial biological noise — scanner drift, subject motion, neural variability unrelated to stimulus. The model, trained on the aggregate of thousands of measurements, learns a canonical neural response that is cleaner than any single noisy measurement. This inversion of ground truth raises fundamental methodological questions: if the AI model produces a more accurate representation of human neural processing than actual human measurements, what does 'ground truth' mean in neuroscience?
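
The statistical logic is easy to check in a toy simulation. The numpy sketch below (illustrative numbers only, not the TRIBE pipeline) models each subject's recording as a shared canonical signal plus independent noise; an imperfect model of the canonical signal still correlates with the group average better than any single noisy recording does:

```python
# Toy numpy simulation of the averaging argument (not the TRIBE pipeline).
# Each subject's recording = shared canonical signal + independent noise.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_timepoints = 720, 1000
canonical = rng.standard_normal(n_timepoints)              # stimulus-driven signal
recordings = canonical + 1.5 * rng.standard_normal((n_subjects, n_timepoints))

group_avg = recordings.mean(axis=0)                        # noise averages toward zero
model_pred = canonical + 0.3 * rng.standard_normal(n_timepoints)  # imperfect model

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"one subject vs group average: {corr(recordings[0], group_avg):.2f}")  # ~0.55
print(f"model vs group average:       {corr(model_pred, group_avg):.2f}")     # ~0.96
```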

The JEPA Architecture Choice: Video Over Language

The technical composition of TRIBE v2 is deliberate. The model combines three frozen pretrained feature extractors:

  • V-JEPA2 for video — Yann LeCun's vision-based architecture
  • LLaMA 3.2 for text — standard language model
  • Wav2Vec-BERT for audio — speech-to-representation model

These feed into a unified Transformer that maps to 20,484 cortical vertices and 8,802 subcortical voxels across the entire brain. The key choice: V-JEPA2, not a vision-language model or text-dominant architecture, serves as the video backbone.
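
A schematic PyTorch sketch of this composition follows. The three extractors are frozen and their features precomputed; only the fusion Transformer and output head are trained. Dimensions, depth, and pooling here are hypothetical placeholders; only the modality choices and output target counts come from the article (the real code lives at github.com/facebookresearch/tribev2):

```python
# Schematic sketch of a tri-modal encoder over frozen feature extractors.
# V-JEPA2 / LLaMA 3.2 / Wav2Vec-BERT features are assumed precomputed
# offline and passed in as (batch, time, dim) tensors.
import torch
import torch.nn as nn

N_CORTICAL, N_SUBCORTICAL = 20_484, 8_802   # output targets cited above

class TriModalBrainEncoder(nn.Module):
    def __init__(self, d_model=512, video_dim=1024, text_dim=2048, audio_dim=1024):
        super().__init__()
        # One learned adapter per frozen backbone's feature stream.
        self.video_proj = nn.Linear(video_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Regression head over every cortical vertex and subcortical voxel.
        self.head = nn.Linear(d_model, N_CORTICAL + N_SUBCORTICAL)

    def forward(self, video_feats, text_feats, audio_feats):
        # Concatenate the three modality streams along the time axis.
        tokens = torch.cat([self.video_proj(video_feats),
                            self.text_proj(text_feats),
                            self.audio_proj(audio_feats)], dim=1)
        fused = self.fusion(tokens)
        return self.head(fused.mean(dim=1))  # one whole-brain prediction per clip
```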

This is significant because LeCun recently departed Meta to found AMI Labs, where he is explicitly building JEPA-based world models as an alternative to LLM-centric AI. TRIBE v2, published by his former team at Meta, provides empirical ammunition for his thesis: JEPA representations learned without language modeling capture something fundamental about human neural processing that pure language models miss.

To be clear, this does not prove JEPA is superior for all tasks. Language models are optimal for text-heavy reasoning. But TRIBE v2 demonstrates that for multimodal perception aligned with human biology, JEPA-style representations may be the preferred backbone.

Scaling Laws and No Plateau: 70,000 Brain Regions

TRIBE v2 scales from the original TRIBE v1's 1,000 brain regions to 70,000 — a 70x increase in spatial resolution. Subject count scaled from 4 to 720. The model follows log-linear scaling laws where prediction accuracy improves steadily as fMRI training data increases, with no performance plateau currently observed.

This matters because it means brain encoding models, like other foundation models, may benefit from continued scaling. More fMRI data + larger models = better brain predictions. The implication is that brain encoding could become a major research area attracting significant compute investment — a new benchmark for multimodal AI evaluation.
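
Concretely, a log-linear scaling law means encoding accuracy is well fit by a straight line in the logarithm of training hours, so it keeps climbing as data grows. A minimal fit with hypothetical numbers (not reported TRIBE v2 scores):

```python
# Illustrative fit of a log-linear scaling law: accuracy ≈ a + b * ln(hours).
# The data points are hypothetical placeholders, not reported TRIBE v2 scores.
import numpy as np

hours = np.array([10, 50, 100, 200, 451.6])          # fMRI training hours
accuracy = np.array([0.12, 0.19, 0.22, 0.25, 0.28])  # hypothetical encoding scores

b, a = np.polyfit(np.log(hours), accuracy, deg=1)    # slope first, then intercept
print(f"accuracy ≈ {a:.3f} + {b:.3f} * ln(hours)")
print(f"extrapolated at 2,000 h: {a + b * np.log(2000):.3f}")  # no plateau in a log-linear fit
```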

[Chart: TRIBE v2 scale, a 70x increase in brain region coverage from TRIBE v1 to v2, across brain regions and subjects. Source: Meta FAIR, MarkTechPost, March 2026]

The $1B Bet: AMI Labs and World Labs Funding Convergence

The timing of TRIBE v2's release (March 2026) coincides with two massive funding rounds explicitly betting against LLM-centric AI:

AMI Labs (Yann LeCun): Raised $1.03B at a $3.5B pre-money valuation. The founding team includes Saining Xie (co-creator of the Diffusion Transformer architecture behind OpenAI's Sora) and Alexandre LeBrun (co-founder of Wit.ai, acquired by Facebook in 2015). Investors include NVIDIA, Jeff Bezos personally, Samsung, and Toyota: an unusual coalition that signals serious industrial backing for world models.

World Labs (Fei-Fei Li): Closed $1B on February 18, 2026 for 3D world understanding models. Li is the founding co-director of the Stanford Institute for Human-Centered AI (HAI) and one of the most respected researchers in the field.

Together, LeCun (a Turing Award laureate) and Li raised $2B+ in three weeks, explicitly positioning their funding as opposition to LLM-centric AI. This is not a niche investment: it is institutional capital recognizing that the frontier of AI may lie in world models and physical intelligence, not pure language scaling.

Capital Flowing Into Post-LLM AI Paradigms (Feb-Mar 2026)

Major funding rounds explicitly betting against LLM-centric AI in a single month:

  • AMI Labs (LeCun): $1.03B for JEPA world models
  • World Labs (Fei-Fei Li): $1.0B for 3D world understanding
  • Combined anti-LLM capital: $2.03B in three weeks
  • V-JEPA2: zero-shot robot planning from 62 hours of training data

Source: TechCrunch, Crunchbase, Latent Space, March 2026

V-JEPA2 Zero-Shot Robot Planning: 62 Hours of Data

One of the most compelling pieces of evidence that JEPA-style representations are more sample-efficient than language models comes from Meta's V-JEPA2 results: the model demonstrated zero-shot robot planning capabilities using only 62 hours of training data (roughly 2.5 days of continuous video).

Language models require billions of tokens and months of compute to achieve useful reasoning. V-JEPA2 achieves useful physical task planning with a weekend's worth of video; the sample-efficiency gap spans orders of magnitude. For applications in robotics, manufacturing, and physical simulation, JEPA-style architectures may be fundamentally more practical than scaling language models to multimodal tasks.
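
To make "zero-shot planning with a world model" concrete, here is a hedged sketch of the generic recipe: random-shooting model-predictive control in a learned latent space. `encode` and `predict_next` are placeholders for a trained encoder and latent dynamics model; this illustrates the technique, not Meta's or AMI Labs' actual planner:

```python
# Generic random-shooting MPC with a learned world model (a sketch of the
# technique, not a specific planner). `encode` maps observations to latents;
# `predict_next` is the learned latent dynamics model.
import numpy as np

def plan_first_action(encode, predict_next, obs, goal_obs,
                      horizon=10, n_samples=256, act_dim=7):
    """Return the first action of the sampled sequence whose predicted
    final latent lands closest to the goal latent."""
    z_goal = encode(goal_obs)
    best_cost, best_action = np.inf, None
    for _ in range(n_samples):
        actions = np.random.uniform(-1, 1, size=(horizon, act_dim))
        z = encode(obs)
        for a in actions:                  # roll dynamics forward in latent space
            z = predict_next(z, a)
        cost = np.linalg.norm(z - z_goal)  # distance to goal, no reward labels
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action  # execute, observe, re-plan (receding horizon)
```

Because the cost is computed in the model's own latent space, no task-specific rewards or demonstrations are required, which is what makes this style of planning zero-shot.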

Open-Source Release: Non-Commercial Licensing Bifurcates Research

Meta open-sourced TRIBE v2 under a non-commercial license. Code, weights, and a live demo are available at github.com/facebookresearch/tribev2. This democratizes brain encoding research for academics but creates a commercial constraint: healthcare companies and brain-computer interface startups will need proprietary licensing discussions with Meta to commercialize TRIBE v2-based applications.

This licensing strategy is strategic. It accelerates research in a nascent domain (in-silico neuroscience) while reserving commercial upside for Meta. Healthcare applications of brain encoding — BCI development, pharmaceutical neuroscience, neurological diagnostics — represent substantial market opportunities. Meta is positioning itself as the essential foundation layer for this domain.

What This Means for Practitioners

For ML engineers in robotics and physical AI: JEPA-style representations may become the preferred backbone for any application requiring human-aligned perception. V-JEPA2's sample efficiency suggests that building perception systems for robotics using JEPA architectures could be significantly faster and cheaper than attempting the same with language models. If you are building robot learning systems, evaluating V-JEPA2 should be on your roadmap.

For neuroscience researchers: TRIBE v2 enables in-silico experiments that previously required recruiting human subjects for fMRI studies. This compresses the timeline for brain-computer interface development and hypothesis testing. Researchers can now validate neural encoding theories at compute cost rather than participant recruitment cost. The non-commercial license means academic use is free, but commercialization will require licensing negotiation.
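
As an illustration, an in-silico experiment might look like the sketch below: query a trained encoding model for predicted responses to two stimulus conditions and difference them in a region of interest. `model.predict` and the stimulus sets are hypothetical placeholders:

```python
# Hedged sketch of an in-silico experiment: contrast predicted responses to
# two stimulus conditions in a region of interest. `model.predict` and the
# stimulus sets are hypothetical placeholders for a trained encoding model.
import numpy as np

def in_silico_contrast(model, stimuli_a, stimuli_b, roi_indices):
    """Mean predicted response difference (condition A minus B) per voxel."""
    resp_a = np.stack([model.predict(s)[roi_indices] for s in stimuli_a])
    resp_b = np.stack([model.predict(s)[roi_indices] for s in stimuli_b])
    return resp_a.mean(axis=0) - resp_b.mean(axis=0)  # a synthetic contrast map
```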

For teams building multimodal AI systems: TRIBE v2 demonstrates that dominant language model backbones are not necessary for multimodal tasks. Vision (V-JEPA2) + audio (Wav2Vec-BERT) + text (LLaMA) as independent frozen components can outperform end-to-end language-centric fusion. This suggests architectural diversity may improve performance in domains where language is not the primary signal.

For investors and strategists: The convergence of $2B+ in funding (AMI Labs + World Labs) betting against LLM-centric paradigms signals institutional recognition that the frontier of AI may involve different computational paradigms for different domains. LLMs remain optimal for reasoning and language tasks. But JEPA-style world models may become the dominant compute workload in physical AI, robotics, and autonomous systems — a separate, potentially larger market than language applications.
