Routing evidence for unseen actions in video moment retrieval

G Wang, X Wu, Z Qin, L Shi - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
Video moment retrieval (VMR) is a cutting-edge vision-language task locating a segment in
a video according to the query. Though the methods have achieved significant performance …

Source-Free Elastic Model Adaptation for Vision-and-Language Navigation

M Tan, P Chen, H Zhi, J Mai, B Rosman… - IEEE Transactions …, 2025 - ieeexplore.ieee.org
Vision-and-Language Navigation (VLN) requires an agent to follow given instructions to
navigate. Despite the significant progress, the model trained on seen environments has a …

Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation

J Gao, X Yao, C Xu - Forty-first International Conference on Machine … - openreview.net
The ability to accurately comprehend natural language instructions and navigate to the
target location is essential for an embodied agent. Such agents are typically required to …