𝜋-Drive

Adapting a Manipulation VLA for Driving with Group-Relative Flow RL.

Open self-driving VLAs run well below real-time control rates: Alpamayo-R1 manages only ~0.1 Hz inference on a Jetson AGX Thor. We post-train 𝜋_0.5, a 10.8 Hz manipulation VLA, into a single-camera driving policy with behavior cloning and Flow-GRPO. The resulting policy beats front-only Alpamayo-R1 on ADE/FDE while being ~3× smaller and running at ~20× the speed.
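For readers unfamiliar with the metrics: ADE (average displacement error) is the mean Euclidean distance between predicted and ground-truth waypoints over the planning horizon, and FDE (final displacement error) is that distance at the last waypoint only. The sketch below shows the standard definitions. The function name `ade_fde` and the sample trajectories are hypothetical, and the evaluation code used for the results here may differ in horizon, units, or averaging.

```python
import math


def ade_fde(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) waypoints."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and the same length")
    # Euclidean error at each timestep of the horizon.
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(errors) / len(errors)  # average over all timesteps
    fde = errors[-1]                 # error at the final waypoint only
    return ade, fde


# Hypothetical 3-step trajectories (illustrative values, not real data).
pred = [(1.0, 0.0), (2.0, 0.0), (3.0, 4.0)]
truth = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
ade, fde = ade_fde(pred, truth)
```

Lower is better for both. FDE is the stricter of the two for long horizons, since errors that accumulate toward the end of a trajectory are not averaged away.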

Weights, code, demo videos, and full report drop this weekend. For now, the CS 224R poster has the methodology and results.
