Key Takeaways
- AMI Labs (LeCun / JEPA), Physical Intelligence (π₀ VLA), and World Labs (Fei-Fei Li / spatial) represent three mutually incompatible architectures attracting overlapping capital; at most one can be the right bet for physical AI.
- V-JEPA 2 achieves 65–80% zero-shot robot success with 43x less training data than comparable generative models, but AMI Labs projects several years to first commercial product.
- Physical Intelligence's π₀ is the most deployment-proximate: trained across 8 robot platforms, 1–20 hours fine-tuning for new tasks, already open-sourced on GitHub with PyTorch support.
- The $13.8B raised in robotics in 2025 (up 77% from 2024) is partly a data-collection infrastructure bet — robot training data requires physical collection at a cost scale fundamentally different from text scraping.
- NVIDIA Cosmos (2M+ downloads) and Google DeepMind Genie 3 entering the world model space simultaneously signals that large labs are not just funding startups — they are competing directly.
The Architecture Fork
Three distinct technical bets are attracting overlapping capital pools, yet only one can be the winning architecture for physical AI:
AMI Labs / JEPA (LeCun, $1.03B seed): The most architecturally radical position. Joint Embedding Predictive Architecture abandons autoregressive next-token prediction entirely in favor of predicting outcomes in compressed representation space — learning what matters about how the world changes without reconstructing its surface appearance. VL-JEPA requires 43x less training data than comparable generative models, and the 790M parameter model achieves 2.85x inference speedup via selective decoding. V-JEPA 2 zero-shot robot results (65–80% pick-and-place success on novel objects in new environments, without robot-specific training data) represent the strongest empirical evidence yet that LeCun's 4-year thesis has merit beyond theory. AMI Labs is explicit: several years to first commercial product.
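The core JEPA idea, computing prediction error in a learned embedding space rather than in pixel space, can be illustrated with a toy sketch. Everything here is illustrative: the "encoder" is a hard-coded slice standing in for a learned network, and none of this reflects AMI Labs' actual implementation.

```python
import random

def encode(frame):
    # Toy "encoder": keep the semantic channels (object position) and
    # discard the texture channels (unpredictable surface detail).
    # A real JEPA encoder is a learned network, not a slice.
    return frame[:2]

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

random.seed(0)
# frame = [obj_x, obj_y, texture_1, texture_2]: a prediction can match
# the object state exactly while the surface noise differs.
true_next = [1.0, 2.0, random.random(), random.random()]
pred_next = [1.0, 2.0, random.random(), random.random()]

pixel_loss = l2(pred_next, true_next)                  # generative-style objective
embed_loss = l2(encode(pred_next), encode(true_next))  # JEPA-style objective

print(embed_loss)      # 0.0: the prediction is perfect where it matters
print(pixel_loss > 0)  # True: pixel loss still punishes unpredictable texture
```

This is the intuition behind the data-efficiency claim: a generative model spends capacity reconstructing detail that carries no information about how the world changes, while the embedding-space objective does not.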
Physical Intelligence / π₀ (Hausman): The Vision-Language-Action (VLA) approach. π₀ uses a PaliGemma 3B VLM backbone combined with a 315M parameter action expert, outputting motor commands at 50Hz via flow matching loss. Critically, it was trained on 10,000+ hours of data from 8 distinct robot platforms — cross-embodiment training that enables zero-shot generalization across hardware types. Performance claim: more than 2x improvement over OpenVLA and Octo baselines. The 1–20 hours of fine-tuning data required for task adaptation is the production feasibility argument. Physical Intelligence is targeting an $11B valuation — doubling from $5.6B in 4 months.
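Flow matching, the objective behind π₀'s action expert, trains a network to predict the velocity that transports noise to a real action along a straight-line path; at inference, that velocity field is integrated to sample motor commands. A minimal pure-Python sketch, with a hypothetical oracle velocity standing in for the learned 315M-parameter network:

```python
def flow_matching_loss(action, noise, predicted_velocity):
    # Linear path x_t = (1 - t) * noise + t * action has constant
    # target velocity (action - noise); the network regresses onto it.
    target = [a - n for a, n in zip(action, noise)]
    return sum((p - g) ** 2 for p, g in zip(predicted_velocity, target)) / len(action)

def sample(noise, velocity_fn, steps=10):
    # Euler-integrate the velocity field from t=0 (noise) to t=1 (action).
    x, dt = list(noise), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

action = [0.3, -0.1, 0.8]  # e.g. one 3-DoF motor command (illustrative)
noise = [1.0, 1.0, 1.0]
oracle = lambda x, t: [a - n for a, n in zip(action, noise)]

print(flow_matching_loss(action, noise, oracle(noise, 0.0)))  # 0.0 for the oracle
recovered = sample(noise, oracle)
print(recovered)  # converges to the action up to float rounding
```

The real model predicts whole action chunks conditioned on vision and language, which is what allows the 50Hz control rate; this toy only shows the loss and the sampling loop.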
World Labs / Spatial AI (Fei-Fei Li, $1B at ~$5B valuation): The 3D spatial world model approach. Focused on spatial and geometric understanding rather than direct robotics control — the Marble product targets 3D environment generation for design, simulation, and industrial planning. The Autodesk partnership ($200M of the $1B round) signals B2B industrial workflows, not general robotics.
[Chart — Physical AI Technical Benchmarks (March 2026): key performance metrics across the leading physical AI architectures, showing where empirical evidence currently stands. Source: Meta AI Blog / Physical Intelligence blog]
The Capital Paradox
The combined capital raise creates a paradox that experienced ML investors should flag: $2.03B raised in 3 weeks (AMI Labs + World Labs) against architectures that are several years from commercialization. Physical Intelligence's $11B valuation target applies to a company with no disclosed revenue after 2 years, and Skild AI reached a $14B valuation in the same week as Physical Intelligence's announcement.
This is not irrational — it reflects a specific VC calculation: the physical AI market is winner-take-most. The company that builds the general-purpose robot foundation model captures the robotics API market the way OpenAI captured the LLM API market. Capital required to establish training data moats is front-loaded. But the math requires at least one of these companies to become OpenAI-scale. The historical base rate for that outcome from any given startup is low.
The NVIDIA Cosmos validation is instructive: 2M+ downloads of an open-source world model platform from NVIDIA indicates industrial demand is real. But NVIDIA's entry also signals that large platforms are competing in the world model space — not just funding startups from the sidelines.
[Chart — Physical AI Capital Concentration, Q1 2026 ($M raised): total capital raised by leading physical AI companies and platforms in Q1 2026, illustrating the sector-wide investment surge. Source: TechCrunch / Bloomberg / Crunchbase]
The V-JEPA 2 Reliability Gap
A critical limitation rarely discussed in coverage: V-JEPA 2's 65–80% zero-shot success rate is the primary empirical claim justifying AMI Labs' $3.5B pre-money valuation. Industrial reliability standards are typically >99.5% uptime, and the gap between 72.5% (the midpoint of V-JEPA 2's range) and 99.5% (the industrial threshold) is not a rounding error. It is 27 percentage points of reliability improvement required before deployment. Physical Intelligence faces the same problem on the production reliability axis.
Neither architecture is deployable in enterprise robotics at required reliability standards today. The valuation premiums are therefore bets on which architecture reaches 99.5% first, not on current demonstrated performance.
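The size of that gap is clearer in failure-rate terms: moving from the middle of V-JEPA 2's reported range to the 99.5% industrial threshold means cutting the failure rate by roughly 55x, not closing a few points of success rate. A quick check using the figures above:

```python
demo_success = 0.725        # midpoint of V-JEPA 2's reported 65-80% range
industrial_success = 0.995  # the >99.5% industrial threshold

demo_failures = 1 - demo_success              # ~1 failure every 4 attempts
industrial_failures = 1 - industrial_success  # 1 failure every 200 attempts

print(round(demo_failures / industrial_failures))  # ~55x reduction required
```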
The Cross-Embodiment Training Moat
The genuine technical differentiation that may matter: Physical Intelligence's cross-embodiment training across 8 robot platforms is a data moat that is hard to replicate. A model trained on 10,000+ hours of data from diverse robot hardware generalizes to new hardware types in ways that robot-specific models cannot. This is the ImageNet moment for robotics data: whoever builds the largest, most diverse robot training corpus owns the foundation model.
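One concrete ingredient of cross-embodiment training is mapping heterogeneous action spaces onto a shared one, commonly by zero-padding each robot's action vector to a fixed maximum dimensionality so a single policy head can train across all platforms. The sketch below is a simplified assumption of that trick; the dimension 14 and the variable names are illustrative, not π₀'s actual configuration.

```python
def to_shared_action_space(action, max_dim=14):
    # Zero-pad a robot-specific action vector (joint targets, gripper
    # commands, base velocities, ...) to a shared dimensionality.
    # max_dim=14 is an illustrative choice, not pi_0's actual value.
    assert len(action) <= max_dim
    return action + [0.0] * (max_dim - len(action))

arm_7dof = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.4]  # 7-DoF manipulator
mobile_3dof = [0.2, 0.0, 1.0]                     # mobile base: vx, vy, yaw

batch = [to_shared_action_space(a) for a in (arm_7dof, mobile_3dof)]
print([len(a) for a in batch])  # [14, 14]: one tensor shape across hardware
```

A shared action interface is what lets hours collected on one platform improve the policy on another, which is why the diversity of the corpus, not just its size, is the moat.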
This explains why the valuations are leading the commercialization timeline: investors are not paying for current revenue but for the right to own the robotics training data ecosystem once one of these approaches achieves sufficient reliability.
Contrarian Perspective
What the bulls are missing: The data collection problem for physical AI is fundamentally different from text/image scraping. Language models were trained on trillions of tokens of freely available internet data. Robot training data requires expensive physical collection infrastructure — sensor arrays, robot arms, controlled environments, safety protocols. The $13.8B raised in robotics in 2025 is partly going toward data collection infrastructure, not just model training. This creates a capital intensity that scales differently from software-only AI.
What the bears are missing: The V-JEPA 2 zero-shot generalization results are genuinely novel. If JEPA's data efficiency claims (43x reduction) hold at scale, AMI Labs could reach production-quality reliability faster than the current success rate suggests, because the training data requirements are dramatically lower than VLA alternatives. A JEPA-LLM hybrid (LeCun himself has indicated this is likely the production path) may be the actual architecture that wins — making the JEPA vs. VLA framing a false dichotomy.
What This Means for Practitioners
ML engineers evaluating physical AI platforms should track three distinct deployment proxies: (1) reliability benchmarks in industrial conditions (>99.5% uptime threshold, not research demo success rates); (2) training data collection infrastructure each company is building (the real moat); and (3) API availability timeline.
Physical Intelligence's π₀ is the most production-proximate today — open-sourced on GitHub with PyTorch support. AMI Labs/JEPA offers higher data efficiency but projects several years to product. World Labs' Autodesk partnership suggests 12–18 months to first B2B integration. Industrial deployment at required reliability standards: 3–5 years for any of these approaches.
For hardware providers and cloud GPU vendors, all three architecture camps require significant compute — NVIDIA's Cosmos download figures suggest they are already positioned as the infrastructure winner regardless of which model architecture prevails.