Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Filling the image information gap for VQA: Prompting large language models to proactively ask questions

Z Wang, C Chen, P Li, Y Liu - arXiv preprint arXiv:2311.11598, 2023 - arxiv.org
Large Language Models (LLMs) demonstrate impressive reasoning ability and the
maintenance of world knowledge not only in natural language tasks, but also in some vision …

Lang2LTL-2: Grounding spatiotemporal navigation commands using large language and vision-language models

JX Liu, A Shah, G Konidaris, S Tellex… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Grounding spatiotemporal navigation commands to structured task specifications enables
autonomous robots to understand a broad range of natural language and solve long-horizon …

Knowledge acquisition disentanglement for knowledge-based visual question answering with large language models

W An, F Tian, J Nie, W Shi, H Lin, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge-based Visual Question Answering (KVQA) requires both image and world
knowledge to answer questions. Current methods first retrieve knowledge from the image …

Enhancing few-shot KB-VQA with panoramic image captions guided by Large Language Models

P Qiang, H Tan, X Li, D Wang, R Li, X Sun, H Zhang… - Neurocomputing, 2025 - Elsevier
Current state-of-the-art (SOTA) KB-VQA techniques involve transforming images into image
captions as prompts to harness the potent reasoning capabilities of large language models …

LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification

R Qu, M Yatskar - arXiv preprint arXiv:2405.18672, 2024 - arxiv.org
(Renyi Qu's Master's Thesis) Recent advancements in interpretable models for vision-
language tasks have achieved competitive performance; however, their interpretability often …

Learning to Ask Denotative and Connotative Questions for Knowledge-based VQA

X Xing, P Xiong, L Fan, Y Li, Y Wu - Findings of the Association for …, 2024 - aclanthology.org
Large language models (LLMs) have attracted increasing attention due to their prominent
performance on various tasks. Recent works seek to leverage LLMs on knowledge-based …

Zero-Shot End-To-End Spoken Question Answering In Medical Domain

Y Labrak, A Moumen, R Dufour, M Rouvier - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving landscape of spoken question-answering (SQA), the integration of
large language models (LLMs) has emerged as a transformative development …

LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators

A Roush, E Zakirov, A Shirokov, P Lunina… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in text-to-image generation have revolutionized numerous fields,
including art and cinema, by automating the generation of high-quality, context-aware …