Embodied navigation with multi-modal information: A survey from tasks to methodology
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …
NavGPT: Explicit reasoning in vision-and-language navigation with large language models
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …
VLN BERT: A recurrent vision-and-language BERT for navigation
Accuracy of many visiolinguistic tasks has benefited significantly from the application of
vision-and-language (V&L) BERT. However, its application for the task of vision-and …
Episodic transformer for vision-and-language navigation
Interaction and navigation defined by natural language instructions in dynamic
environments pose significant challenges for neural agents. This paper focuses on …
Room-Across-Room: Multilingual vision-and-language navigation with dense spatiotemporal grounding
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …
TEACh: Task-driven embodied agents that chat
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
VELMA: Verbalization embodiment of LLM agents for vision and language navigation in street view
Incremental decision making in real-world environments is one of the most challenging tasks
in embodied artificial intelligence. One particularly demanding scenario is Vision and …
EnvEdit: Environment editing for vision-and-language navigation
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …