PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation
Abstract Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …
Bird's-Eye-View Scene Graph for Vision-Language Navigation
Abstract Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …
DreamWalker: Mental planning for continuous vision-language navigation
VLN-CE is a recently released embodied task, where AI agents need to navigate a freely
traversable environment to reach a distant target location, given language instructions. It …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
BEVBert: Multimodal map pre-training for language-guided navigation
Large-scale pre-training has shown promising results on the vision-and-language
navigation (VLN) task. However, most existing pre-training methods employ discrete …
ETPNav: Evolving topological planning for vision-language navigation in continuous environments
Vision-language navigation is a task that requires an agent to follow instructions to navigate
in environments. It becomes increasingly crucial in the field of embodied AI, with potential …
Vision-and-language navigation today and tomorrow: A survey in the era of foundation models
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …
Navigation instruction generation with BEV perception and large language models
Navigation instruction generation, which requires embodied agents to describe the
navigation routes, has been of great interest in robotics and human-computer interaction …
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Video anomaly understanding (VAU) aims to automatically comprehend unusual
occurrences in videos thereby enabling various applications such as traffic surveillance and …
Controllable navigation instruction generation with chain of thought prompting
Instruction generation is a vital and multidisciplinary research area with broad applications.
Existing instruction generation models are limited to generating instructions in a single style …