Key Takeaways
- AMI Labs (LeCun / JEPA), Physical Intelligence (π₀ VLA), and World Labs (Fei-Fei Li / spatial) represent three mutually incompatible architectures attracting overlapping capital; at most one can be the right bet for physical AI.
- V-JEPA 2 achieves 65–80% zero-shot robot success with 43x less training data than comparable generative models, but AMI Labs projects several years to first commercial product.
- Physical Intelligence's π₀ is the most deployment-proximate: trained across 8 robot platforms, 1–20 hours fine-tuning for new tasks, already open-sourced on GitHub with PyTorch support.
- The $13.8B raised in robotics in 2025 (up 77% from 2024) is partly a data-collection infrastructure bet — robot training data requires physical collection at a cost scale fundamentally different from text scraping.
- NVIDIA Cosmos (2M+ downloads) and Google DeepMind Genie 3 entering the world model space simultaneously signals that large labs are not just funding startups — they are competing directly.
The Architecture Fork
Three distinct technical bets are attracting overlapping capital pools, yet only one can be the winning architecture for physical AI:
AMI Labs / JEPA (LeCun, $1.03B seed): The most architecturally radical position. Joint Embedding Predictive Architecture abandons autoregressive next-token prediction entirely in favor of predicting outcomes in compressed representation space — learning what matters about how the world changes without reconstructing its surface appearance. VL-JEPA requires 43x less training data than comparable generative models, and the 790M parameter model achieves 2.85x inference speedup via selective decoding. V-JEPA 2 zero-shot robot results (65–80% pick-and-place success on novel objects in new environments, without robot-specific training data) represent the strongest empirical evidence yet that LeCun's 4-year thesis has merit beyond theory. AMI Labs is explicit: several years to first commercial product.
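The core JEPA idea, computing prediction error in a learned embedding space rather than in pixel space, can be illustrated with a toy sketch. Everything here is illustrative: the "encoder" is a hard-coded slice standing in for a learned network, and none of this reflects AMI Labs' actual implementation.

```python
import random

def encode(frame):
    # Toy "encoder": keep the semantic channels (object position) and
    # discard the texture channels (unpredictable surface detail).
    # A real JEPA encoder is a learned network, not a slice.
    return frame[:2]

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

random.seed(0)
# frame = [obj_x, obj_y, texture_1, texture_2]: a prediction can match
# the object state exactly while the surface noise differs.
true_next = [1.0, 2.0, random.random(), random.random()]
pred_next = [1.0, 2.0, random.random(), random.random()]

pixel_loss = l2(pred_next, true_next)                  # generative-style objective
embed_loss = l2(encode(pred_next), encode(true_next))  # JEPA-style objective

print(embed_loss)      # 0.0: the prediction is perfect where it matters
print(pixel_loss > 0)  # True: pixel loss still punishes unpredictable texture
```

This is the intuition behind the data-efficiency claim: a generative model spends capacity reconstructing detail that carries no information about how the world changes, while the embedding-space objective does not.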
Physical Intelligence / π₀ (Hausman): The Vision-Language-Action (VLA) approach. π₀ uses a PaliGemma 3B VLM backbone combined with a 315M parameter action expert, outputting motor commands at 50Hz via flow matching loss. Critically, it was trained on 10,000+ hours of data from 8 distinct robot platforms — cross-embodiment training that enables zero-shot generalization across hardware types. Performance claim: more than 2x improvement over OpenVLA and Octo baselines. The 1–20 hours of fine-tuning data required for task adaptation is the production feasibility argument. Physical Intelligence is targeting an $11B valuation — doubling from $5.6B in 4 months.
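Flow matching, the objective behind π₀'s action expert, trains a network to predict the velocity that transports noise to a real action along a straight-line path; at inference, that velocity field is integrated to sample motor commands. A minimal pure-Python sketch, with a hypothetical oracle velocity standing in for the learned 315M-parameter network:

```python
def flow_matching_loss(action, noise, predicted_velocity):
    # Linear path x_t = (1 - t) * noise + t * action has constant
    # target velocity (action - noise); the network regresses onto it.
    target = [a - n for a, n in zip(action, noise)]
    return sum((p - g) ** 2 for p, g in zip(predicted_velocity, target)) / len(action)

def sample(noise, velocity_fn, steps=10):
    # Euler-integrate the velocity field from t=0 (noise) to t=1 (action).
    x, dt = list(noise), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

action = [0.3, -0.1, 0.8]  # e.g. one 3-DoF motor command (illustrative)
noise = [1.0, 1.0, 1.0]
oracle = lambda x, t: [a - n for a, n in zip(action, noise)]

print(flow_matching_loss(action, noise, oracle(noise, 0.0)))  # 0.0 for the oracle
recovered = sample(noise, oracle)
print(recovered)  # converges to the action up to float rounding
```

The real model predicts whole action chunks conditioned on vision and language, which is what allows the 50Hz control rate; this toy only shows the loss and the sampling loop.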
World Labs / Spatial AI (Fei-Fei Li, $1B at ~$5B valuation): The 3D spatial world model approach. Focused on spatial and geometric understanding rather than direct robotics control — the Marble product targets 3D environment generation for design, simulation, and industrial planning. The Autodesk partnership ($200M of the $1B round) signals B2B industrial workflows, not general robotics.
[Chart — Physical AI Technical Benchmarks (March 2026): key performance metrics across the leading physical AI architectures, showing where empirical evidence currently stands. Source: Meta AI Blog / Physical Intelligence blog]
The Capital Paradox
The combined capital raise creates a paradox that experienced ML investors should flag: $2.03B raised in 3 weeks (AMI Labs + World Labs) against architectures that are several years from commercialization. Physical Intelligence's $11B valuation target applies to a company with no disclosed revenue after 2 years, and Skild AI reached a $14B valuation in the same week as Physical Intelligence's announcement.
This is not irrational — it reflects a specific VC calculation: the physical AI market is winner-take-most. The company that builds the general-purpose robot foundation model captures the robotics API market the way OpenAI captured the LLM API market. Capital required to establish training data moats is front-loaded. But the math requires at least one of these companies to become OpenAI-scale. The historical base rate for that outcome from any given startup is low.
The NVIDIA Cosmos validation is instructive: 2M+ downloads of an open-source world model platform from NVIDIA indicates industrial demand is real. But NVIDIA's entry also signals that large platforms are competing in the world model space — not just funding startups from the sidelines.
[Chart — Physical AI Capital Concentration, Q1 2026 ($M raised): total capital raised by leading physical AI companies and platforms in Q1 2026, illustrating the sector-wide investment surge. Source: TechCrunch / Bloomberg / Crunchbase]
The V-JEPA 2 Reliability Gap
A critical limitation rarely discussed in coverage: V-JEPA 2's 65–80% zero-shot success rate is the primary empirical claim justifying AMI Labs' $3.5B pre-money valuation. Industrial reliability standards are typically >99.5% uptime, and the gap between 72.5% (the midpoint of V-JEPA 2's range) and 99.5% (the industrial threshold) is not a rounding error. It is 27 percentage points of reliability improvement required before deployment. Physical Intelligence faces the same problem on the production reliability axis.
Neither architecture is deployable in enterprise robotics at required reliability standards today. The valuation premiums are therefore bets on which architecture reaches 99.5% first, not on current demonstrated performance.
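The size of that gap is clearer in failure-rate terms: moving from the middle of V-JEPA 2's reported range to the 99.5% industrial threshold means cutting the failure rate by roughly 55x, not closing a few points of success rate. A quick check using the figures above:

```python
demo_success = 0.725        # midpoint of V-JEPA 2's reported 65-80% range
industrial_success = 0.995  # the >99.5% industrial threshold

demo_failures = 1 - demo_success              # ~1 failure every 4 attempts
industrial_failures = 1 - industrial_success  # 1 failure every 200 attempts

print(round(demo_failures / industrial_failures))  # ~55x reduction required
```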
The Cross-Embodiment Training Moat
The genuine technical differentiation that may matter: Physical Intelligence's cross-embodiment training across 8 robot platforms is a data moat that is hard to replicate. A model trained on 10,000+ hours of data from diverse robot hardware generalizes to new hardware types in ways that robot-specific models cannot. This is the ImageNet moment for robotics data: whoever builds the largest, most diverse robot training corpus owns the foundation model.
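One concrete ingredient of cross-embodiment training is mapping heterogeneous action spaces onto a shared one, commonly by zero-padding each robot's action vector to a fixed maximum dimensionality so a single policy head can train across all platforms. The sketch below is a simplified assumption of that trick; the dimension 14 and the variable names are illustrative, not π₀'s actual configuration.

```python
def to_shared_action_space(action, max_dim=14):
    # Zero-pad a robot-specific action vector (joint targets, gripper
    # commands, base velocities, ...) to a shared dimensionality.
    # max_dim=14 is an illustrative choice, not pi_0's actual value.
    assert len(action) <= max_dim
    return action + [0.0] * (max_dim - len(action))

arm_7dof = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.4]  # 7-DoF manipulator
mobile_3dof = [0.2, 0.0, 1.0]                     # mobile base: vx, vy, yaw

batch = [to_shared_action_space(a) for a in (arm_7dof, mobile_3dof)]
print([len(a) for a in batch])  # [14, 14]: one tensor shape across hardware
```

A shared action interface is what lets hours collected on one platform improve the policy on another, which is why the diversity of the corpus, not just its size, is the moat.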
This explains why the valuations are leading the commercialization timeline: investors are not paying for current revenue but for the right to own the robotics training data ecosystem once one of these approaches achieves sufficient reliability.
Contrarian Perspective
What the bulls are missing: The data collection problem for physical AI is fundamentally different from text/image scraping. Language models were trained on trillions of tokens of freely available internet data. Robot training data requires expensive physical collection infrastructure — sensor arrays, robot arms, controlled environments, safety protocols. The $13.8B raised in robotics in 2025 is partly going toward data collection infrastructure, not just model training. This creates a capital intensity that scales differently from software-only AI.
What the bears are missing: The V-JEPA 2 zero-shot generalization results are genuinely novel. If JEPA's data efficiency claims (43x reduction) hold at scale, AMI Labs could reach production-quality reliability faster than the current success rate suggests, because the training data requirements are dramatically lower than VLA alternatives. A JEPA-LLM hybrid (LeCun himself has indicated this is likely the production path) may be the actual architecture that wins — making the JEPA vs. VLA framing a false dichotomy.
What This Means for Practitioners
ML engineers evaluating physical AI platforms should track three distinct deployment proxies: (1) reliability benchmarks in industrial conditions (>99.5% uptime threshold, not research demo success rates); (2) training data collection infrastructure each company is building (the real moat); and (3) API availability timeline.
Physical Intelligence's π₀ is the most production-proximate today — open-sourced on GitHub with PyTorch support. AMI Labs/JEPA offers higher data efficiency but projects several years to product. World Labs' Autodesk partnership suggests 12–18 months to first B2B integration. Industrial deployment at required reliability standards: 3–5 years for any of these approaches.
For hardware providers and cloud GPU vendors, all three architecture camps require significant compute — NVIDIA's Cosmos download figures suggest they are already positioned as the infrastructure winner regardless of which model architecture prevails.