Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …

PONI: Potential functions for ObjectGoal navigation with interaction-free learning

SK Ramakrishnan, DS Chaplot… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art approaches to ObjectGoal navigation (ObjectNav) rely on reinforcement
learning and typically require significant computational resources and time for learning. We …

Renderable neural radiance map for visual navigation

O Kwon, J Park, S Oh - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
We propose a novel type of map for visual navigation, a renderable neural radiance map
(RNR-Map), which is designed to contain the overall visual information of a 3D environment …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

GridMM: Grid memory map for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) enables the agent to navigate to a remote location
following the natural language instruction in 3D environments. To represent the previously …

Weakly-supervised multi-granularity map learning for vision-and-language navigation

P Chen, D Ji, K Lin, R Zeng, T Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
We address a practical yet challenging problem of training robot agents to navigate in an
environment following a path described by some language instructions. The instructions …

Housekeep: Tidying virtual households using commonsense reasoning

Y Kant, A Ramachandran, S Yenamandra… - … on Computer Vision, 2022 - Springer
We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home
for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging …

Semantic audio-visual navigation

C Chen, Z Al-Halah, K Grauman - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Recent work on audio-visual navigation assumes a constantly-sounding target and restricts
the role of audio to signaling the target's position. We introduce semantic audio-visual …

Cross-modal map learning for vision and language navigation

G Georgakis, K Schmeckpeper… - Proceedings of the …, 2022 - openaccess.thecvf.com
We consider the problem of Vision-and-Language Navigation (VLN). The majority of current
methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or …

Auxiliary tasks and exploration enable ObjectGoal navigation

J Ye, D Batra, A Das, E Wijmans - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
ObjectGoal Navigation (ObjectNav) is an embodied task wherein agents are to
navigate to an object instance in an unseen environment. Prior works have shown that end …