PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation

J Li, M Bansal - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

Dreamwalker: Mental planning for continuous vision-language navigation

H Wang, W Liang, L Van Gool… - Proceedings of the …, 2023 - openaccess.thecvf.com
VLN-CE is a recently released embodied task, where AI agents need to navigate a freely
traversable environment to reach a distant target location, given language instructions. It …

Local-global context aware transformer for language-guided video segmentation

C Liang, W Wang, T Zhou, J Miao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …

BEVBert: Multimodal map pre-training for language-guided navigation

D An, Y Qi, Y Li, Y Huang, L Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale pre-training has shown promising results on the vision-and-language
navigation (VLN) task. However, most existing pre-training methods employ discrete …

ETPNav: Evolving topological planning for vision-language navigation in continuous environments

D An, H Wang, W Wang, Z Wang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Vision-language navigation is a task that requires an agent to follow instructions to navigate
in environments. It becomes increasingly crucial in the field of embodied AI, with potential …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …

Navigation instruction generation with BEV perception and large language models

S Fan, R Liu, W Wang, Y Yang - European Conference on Computer …, 2024 - Springer
Navigation instruction generation, which requires embodied agents to describe the
navigation routes, has been of great interest in robotics and human-computer interaction …

Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

H Du, S Zhang, B Xie, G Nan, J Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video anomaly understanding (VAU) aims to automatically comprehend unusual
occurrences in videos thereby enabling various applications such as traffic surveillance and …

Controllable navigation instruction generation with chain of thought prompting

X Kong, J Chen, W Wang, H Su, X Hu, Y Yang… - European Conference on …, 2024 - Springer
Instruction generation is a vital and multidisciplinary research area with broad applications.
Existing instruction generation models are limited to generating instructions in a single style …