Routing evidence for unseen actions in video moment retrieval
Video moment retrieval (VMR) is a cutting-edge vision-language task locating a segment in
a video according to the query. Though the methods have achieved significant performance …
Source-Free Elastic Model Adaptation for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an agent to follow given instructions to
navigate. Despite the significant progress, the model trained on seen environments has a …
Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation
The ability to accurately comprehend natural language instructions and navigate to the
target location is essential for an embodied agent. Such agents are typically required to …