Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also produced
significant successes in computer vision (CV). Due to their strong expression power …

MapGPT: Map-guided prompting with adaptive path planning for vision-and-language navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - Proceedings of the …, 2024 - aclanthology.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-
making and generalization abilities across various tasks. However, existing zero-shot agents …

ETPNav: Evolving topological planning for vision-language navigation in continuous environments

D An, H Wang, W Wang, Z Wang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Vision-language navigation is a task that requires an agent to follow instructions to navigate
in environments. It becomes increasingly crucial in the field of embodied AI, with potential …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …

Navigation instruction generation with BEV perception and large language models

S Fan, R Liu, W Wang, Y Yang - European Conference on Computer …, 2024 - Springer
Navigation instruction generation, which requires embodied agents to describe the
navigation routes, has been of great interest in robotics and human-computer interaction …

Controllable navigation instruction generation with chain of thought prompting

X Kong, J Chen, W Wang, H Su, X Hu, Y Yang… - European Conference on …, 2024 - Springer
Instruction generation is a vital and multidisciplinary research area with broad applications.
Existing instruction generation models are limited to generating instructions in a single style …

Frequency-enhanced data augmentation for vision-and-language navigation

K He, C Si, Z Lu, Y Huang, L Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent
to navigate through complex environments based on natural language instructions. In …

LLM as Copilot for Coarse-Grained Vision-and-Language Navigation

Y Qiao, Q Liu, J Liu, J Liu, Q Wu - European Conference on Computer …, 2024 - Springer
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor
environments using human-provided textual instructions. Coarse-grained VLN, with short …

Towards learning a generalist model for embodied navigation

D Zheng, S Huang, L Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Building a generalist agent that can interact with the world is an ultimate goal for humans
thus spurring the research for embodied navigation where an agent is required to navigate …