Embodied navigation with multi-modal information: A survey from tasks to methodology
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …
Poni: Potential functions for objectgoal navigation with interaction-free learning
State-of-the-art approaches to ObjectGoal navigation (ObjectNav) rely on reinforcement
learning and typically require significant computational resources and time for learning. We …
Renderable neural radiance map for visual navigation
We propose a novel type of map for visual navigation, a renderable neural radiance map
(RNR-Map), which is designed to contain the overall visual information of a 3D environment …
Bird's-Eye-View Scene Graph for Vision-Language Navigation
Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …
Gridmm: Grid memory map for vision-and-language navigation
Vision-and-language navigation (VLN) enables the agent to navigate to a remote location
following the natural language instruction in 3D environments. To represent the previously …
Weakly-supervised multi-granularity map learning for vision-and-language navigation
We address a practical yet challenging problem of training robot agents to navigate in an
environment following a path described by some language instructions. The instructions …
Housekeep: Tidying virtual households using commonsense reasoning
We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home
for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging …
Semantic audio-visual navigation
Recent work on audio-visual navigation assumes a constantly-sounding target and restricts
the role of audio to signaling the target's position. We introduce semantic audio-visual …
Cross-modal map learning for vision and language navigation
We consider the problem of Vision-and-Language Navigation (VLN). The majority of current
methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or …
Auxiliary tasks and exploration enable objectgoal navigation
ObjectGoal Navigation (ObjectNav) is an embodied task wherein agents are to
navigate to an object instance in an unseen environment. Prior works have shown that end …