A Survey of World Models for Autonomous Driving

T Feng, W Wang, Y Yang - arXiv preprint arXiv:2501.11260, 2025 - arxiv.org
Recent breakthroughs in autonomous driving have revolutionized the way vehicles perceive
and interact with their surroundings. In particular, world models have emerged as a linchpin …

Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment

R Tian, Y Wu, C Xu, M Tomizuka, J Malik… - arXiv preprint arXiv …, 2024 - arxiv.org
Visuomotor robot policies, increasingly pre-trained on large-scale datasets, promise
significant advancements across robotics domains. However, aligning these policies with …

Distilling Multi-modal Large Language Models for Autonomous Driving

D Hegde, R Yasarla, H Cai, S Han… - arXiv preprint arXiv …, 2025 - arxiv.org
Autonomous driving demands safe motion planning, especially in critical "long-tail"
scenarios. Recent end-to-end autonomous driving systems leverage large language models …

V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models

H Chiu, R Hachiuma, CY Wang, SF Smith… - arXiv preprint arXiv …, 2025 - arxiv.org
Current autonomous driving vehicles rely mainly on their individual sensors to understand
surrounding scenes and plan for future trajectories, which can be unreliable when the …

LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models

MN Azadani, J Riddell, S Sedwards… - arXiv preprint arXiv …, 2025 - arxiv.org
Enhanced visual understanding serves as a cornerstone for multimodal large language
models (MLLMs). Recent hybrid MLLMs incorporate a mixture of vision experts to address …

MTA: Multimodal Task Alignment for BEV Perception and Captioning

Y Ma, B Yaman, X Ye, F Tao, A Mallik, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Bird's eye view (BEV)-based 3D perception plays a crucial role in autonomous driving
applications. The rise of large language models has spurred interest in BEV-based …

World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving

M Zhai, C Li, Z Guo, N Yang, X Qin, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal Large Language Models (MLLMs) with extensive world knowledge have
revitalized autonomous driving, particularly in reasoning tasks within perceivable regions …

Scaling LLM Pre-training with Vocabulary Curriculum

F Yu - arXiv preprint arXiv:2502.17910, 2025 - arxiv.org
Modern language models rely on static vocabularies, fixed before pretraining, in contrast to
the adaptive vocabulary acquisition observed in human language learning. To bridge this …