Using multimodal large language models (MLLMs) for automated detection of traffic safety-critical events

M Abu Tami, HI Ashqar, M Elhenawy, S Glaser… - Vehicles, 2024 - mdpi.com
Traditional approaches to safety event analysis in autonomous systems have relied on
complex machine and deep learning models and extensive datasets for high accuracy and …

V2X-VLM: End-to-end V2X cooperative autonomous driving through large vision-language models

J You, H Shi, Z Jiang, Z Huang, R Gan, K Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Advancements in autonomous driving have increasingly focused on end-to-end (E2E)
systems that manage the full spectrum of driving tasks, from environmental perception to …

GRID: Visual layout generation

C Wan, X Luo, Z Cai, Y Song, Y Zhao, Y Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce GRID, a novel paradigm that reframes a broad range of visual
generation tasks as the problem of arranging grids, akin to film strips. At its core, GRID …

VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving

K Long, H Shi, J Liu, X Li - arXiv preprint arXiv:2408.04821, 2024 - arxiv.org
Motivated by the emergent reasoning capabilities of Vision Language Models (VLMs) and
their potential to improve the comprehensibility of autonomous driving systems, this paper …

Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving

T Li, H Wang, X Li, W Liao, T He, P Peng - arXiv preprint arXiv:2501.08861, 2025 - arxiv.org
Autonomous driving is a challenging task that requires perceiving and understanding the
surrounding environment for safe trajectory planning. While existing vision-based end-to …

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training

A Cao, X Wei, Z Ma - arXiv preprint arXiv:2411.11927, 2024 - arxiv.org
Language-image pre-training faces significant challenges due to limited data in specific
formats and the constrained capacities of text encoders. While prevailing methods attempt to …

World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving

M Zhai, C Li, Z Guo, N Yang, X Qin, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
The Multi-modal Large Language Models (MLLMs) with extensive world knowledge have
revitalized autonomous driving, particularly in reasoning tasks within perceivable regions …