Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

VLN BERT: A recurrent vision-and-language BERT for navigation

Y Hong, Q Wu, Y Qi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Accuracy of many visiolinguistic tasks has benefited significantly from the application of
vision-and-language (V&L) BERT. However, its application for the task of vision-and …

Episodic transformer for vision-and-language navigation

A Pashevich, C Schmid, C Sun - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Interaction and navigation defined by natural language instructions in dynamic
environments pose significant challenges for neural agents. This paper focuses on …

Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …

TEACh: Task-driven embodied agents that chat

A Padmakumar, J Thomason, A Shrivastava… - Proceedings of the …, 2022 - ojs.aaai.org
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

VELMA: Verbalization embodiment of LLM agents for vision and language navigation in street view

R Schumann, W Zhu, W Feng, TJ Fu… - Proceedings of the …, 2024 - ojs.aaai.org
Incremental decision making in real-world environments is one of the most challenging tasks
in embodied artificial intelligence. One particularly demanding scenario is Vision and …

EnvEdit: Environment editing for vision-and-language navigation

J Li, H Tan, M Bansal - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …