WINNER: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …

MRTNet: Multi-resolution temporal network for video sentence grounding

W Ji, Y Qin, L Chen, Y Wei, Y Wu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Video sentence grounding locates a specific moment in a video based on a text query.
Existing methods focus on single temporal resolution, ignoring multi-scale temporal …

Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition

B Liu, T Zheng, P Zheng, D Liu, X Qu, J Gao… - Proceedings of the 31st …, 2023 - dl.acm.org
Existing few-shot action recognition methods have placed primary focus on improving the
recognition accuracy while neglecting another important indicator in practical scenarios, i.e. …

Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval

D Liu, X Qu, J Dong, G Nan, P Zhou, Z Xu… - Proceedings of the 31st …, 2023 - dl.acm.org
This paper addresses the challenging task of language-driven moment retrieval. Previous
methods are typically trained to localize the target moment corresponding to a single …

Deep multimodal learning for information retrieval

W Ji, Y Wei, Z Zheng, H Fei, T Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Information retrieval (IR) is a fundamental technique that aims to acquire information from a
collection of documents, web pages, or other sources. While traditional text-based IR has …