Embodied navigation with multi-modal information: A survey from tasks to methodology
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …
History aware multimodal transformer for vision-and-language navigation
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …
Think global, act local: Dual-scale graph transformer for vision-and-language navigation
Following language instructions to navigate in unseen environments is a challenging
problem for autonomous embodied agents. The agent not only needs to ground languages …
VLN BERT: A recurrent vision-and-language BERT for navigation
Accuracy of many visiolinguistic tasks has benefited significantly from the application of
vision-and-language (V&L) BERT. However, its application for the task of vision-and …
Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …
Scaling data generation in vision-and-language navigation
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
EnvEdit: Environment editing for vision-and-language navigation
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …
HOP: History-and-order aware pre-training for vision-and-language navigation
Pre-training has been adopted in a few recent works for Vision-and-Language Navigation
(VLN). However, previous pre-training methods for VLN either lack the ability to predict …