Multi-modal data-efficient 3D scene understanding for autonomous driving

L Kong, X Xu, J Ren, W Zhang, L Pan… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Efficient data utilization is crucial for advancing 3D scene understanding in autonomous
driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …

Understanding World or Predicting Future? A Comprehensive Survey of World Models

J Ding, Y Zhang, Y Shang, Y Zhang, Z Zong… - arXiv preprint arXiv …, 2024 - arxiv.org
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …

SMARLA: A safety monitoring approach for deep reinforcement learning agents

A Zolfagharian, M Abdellatif, LC Briand… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep Reinforcement Learning (DRL) has made significant advancements in various fields,
such as autonomous driving, healthcare, and robotics, by enabling agents to learn optimal …

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as
pivotal components that significantly shape the capabilities of intelligent systems. In the …

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey

M Xu, D Niyato, J Kang, Z Xiong, A Jamalipour… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of
intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets …

An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training

H Zhang, Y Xue, X Yan, J Zhang, W Qiu, D Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of autonomous driving is experiencing a surge of interest in world models, which
aim to predict potential future scenarios based on historical observations. In this paper, we …

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

Y Huang, J Sansom, Z Ma, F Gervits, J Chai - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in foundation models (FMs) have unlocked new prospects in
autonomous driving, yet the experimental settings of these studies are preliminary, over …

UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations

Y Ren, G Wu, R Li, Z Yang, Y Liu, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Urban scene reconstruction is crucial for real-world autonomous driving simulators.
Although existing methods have achieved photorealistic reconstruction, they mostly focus on …