Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …
Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …
MRTNet: Multi-resolution temporal network for video sentence grounding
Video sentence grounding locates a specific moment in a video based on a text query.
Existing methods focus on single temporal resolution, ignoring multi-scale temporal …
Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition
Existing few-shot action recognition methods have placed primary focus on improving the
recognition accuracy while neglecting another important indicator in practical scenarios, i.e., …
Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval
This paper addresses the challenging task of language-driven moment retrieval. Previous
methods are typically trained to localize the target moment corresponding to a single …
Deep multimodal learning for information retrieval
Information retrieval (IR) is a fundamental technique that aims to acquire information from a
collection of documents, web pages, or other sources. While traditional text-based IR has …