Vision language models in autonomous driving: A survey and outlook

X Zhou, M Liu, E Yurtsever, BL Zagar… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD)
have attracted widespread attention due to their outstanding performance and the ability to …

Prospective role of foundation models in advancing autonomous vehicles

J Wu, B Gao, J Gao, J Yu, H Chu, Q Yu, X Gong… - Research, 2024 - spj.science.org
With the development of artificial intelligence and breakthroughs in deep learning, large-
scale foundation models (FMs), such as generative pre-trained transformer (GPT), Sora, etc …

SketchQL: Video Moment Querying with a Visual Query Interface

R Wu, P Chunduri, A Payani, X Chu, J Arulraj… - Proceedings of the …, 2024 - dl.acm.org
Localizing video moments based on the movement patterns of objects is an important task in
video analytics. Existing video analytics systems offer two types of querying interfaces based …

V2V: Efficiently Synthesizing Video Results for Video Queries

D Winecki, A Nandi - 2024 IEEE 40th International Conference …, 2024 - ieeexplore.ieee.org
Querying video data has become increasingly popular and useful. Video queries can be
complex, ranging from retrieval tasks (“find me the top videos that have…”), to analytics …

Radar spectra-language model for automotive scene parsing

M Pushkareva, Y Feldman, C Domokos… - arXiv preprint arXiv …, 2024 - arxiv.org
Radar sensors are low cost, long-range, and weather-resilient. Therefore, they are widely
used for driver assistance functions, and are expected to be crucial for the success of …

Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report]

E Zhang, N Sullivan, B Haynes, R Krishna… - arXiv preprint arXiv …, 2024 - arxiv.org
Complex video queries can be answered by decomposing them into modular subtasks.
However, existing video data management systems assume the existence of predefined …

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

A Ziai, A Vartakavi - arXiv preprint arXiv:2402.06560, 2024 - arxiv.org
High-quality and consistent annotations are fundamental to the successful development of
robust machine learning models. Traditional data annotation methods are resource …

Large (Vision) Language Models for Autonomous Vehicles: Current Trends and Future Directions

H Tian, K Reddy, Y Feng, M Quddus, Y Demiris… - Authorea Preprints - techrxiv.org
As autonomous vehicles (AVs) advance, the integration of Large (Vision) Language Models
(L(V)LMs) has emerged as a promising approach to enhance AV capabilities in perception …