A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Learning 2D temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Dense regression network for video grounding

R Zeng, H Xu, W Huang, P Chen… - Proceedings of the …, 2020 - openaccess.thecvf.com
We address the problem of video grounding from natural language queries. The key
challenge in this task is that one training video might only contain a few annotated …

TubeDETR: Spatio-temporal video grounding with transformers

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2022 - openaccess.thecvf.com
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …

Deconfounded video moment retrieval with causal intervention

X Yang, F Feng, W Ji, M Wang, TS Chua - Proceedings of the 44th …, 2021 - dl.acm.org
We tackle the task of video moment retrieval (VMR), which aims to localize a specific
moment in a video according to a textual query. Existing methods primarily model the …

Dynamic modality interaction modeling for image-text retrieval

L Qu, M Liu, J Wu, Z Gao, L Nie - … of the 44th International ACM SIGIR …, 2021 - dl.acm.org
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although
much progress has been made in bridging vision and language, it remains challenging …

Context-aware biaffine localizing network for temporal sentence grounding

D Liu, X Qu, J Dong, P Zhou, Y Cheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper addresses the problem of temporal sentence grounding (TSG), which aims to
identify the temporal boundary of a specific segment from an untrimmed video by a sentence …

TVR: A large-scale dataset for video-subtitle moment retrieval

J Lei, L Yu, TL Berg, M Bansal - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
We introduce TV show Retrieval (TVR), a new multimodal retrieval dataset. TVR requires
systems to understand both videos and their associated subtitle (dialogue) texts, making it …

Fast video moment retrieval

J Gao, C Xu - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
This paper targets fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …