Vision-Language-Action Driving

This research direction studies vision-language-action (VLA) models for autonomous driving, with an emphasis on how language, perception, and control interact in closed-loop driving systems. Our goal is to develop robust and interpretable driving agents that can follow instructions, reason over driving scenes, and generalize across diverse linguistic and environmental conditions.

The project connects multimodal learning, embodied driving systems, and diagnostic evaluation for autonomous vehicles. It includes our ICR-Drive line of work on instruction counterfactual robustness in language-conditioned driving and, more broadly, supports ongoing research on language-grounded driving, behavior understanding, and robust evaluation of VLA systems.
