MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

MAD: A scalable dataset for language grounding in videos from movie audio descriptions

M Soldan, A Pardo, JL Alcázar… - Proceedings of the …, 2022 - openaccess.thecvf.com
The recent and increasing interest in video-language research has driven the development
of large-scale datasets that enable data-intensive machine learning techniques. In …

Fast video moment retrieval

J Gao, C Xu - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …

Compositional temporal grounding with structured variational cross-graph correspondence learning

J Li, J Xie, L Qian, L Zhu, S Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Temporal grounding in videos aims to localize one target video segment that semantically
corresponds to a given query sentence. Thanks to the semantic diversity of natural language …

Deep learning for weakly-supervised object detection and localization: A survey

F Shao, L Chen, J Shao, W Ji, S Xiao, L Ye, Y Zhuang… - Neurocomputing, 2022 - Elsevier
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e.,
detecting multiple and single instances with bounding boxes in an image using image-level …

Cross-sentence temporal and semantic relations in video activity localisation

J Huang, Y Liu, S Gong, H Jin - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Video activity localisation has recently attained increasing attention due to its practical
values in automatically localising the most salient visual segments corresponding to their …

A survey on temporal sentence grounding in videos

X Lan, Y Yuan, X Wang, Z Wang, W Zhu - ACM Transactions on …, 2023 - dl.acm.org
Temporal sentence grounding in videos (TSGV), which aims at localizing one target
segment from an untrimmed video with respect to a given sentence query, has drawn …

Natural language video localization with learnable moment proposals

S Xiao, L Chen, J Shao, Y Zhuang, J Xiao - arXiv preprint arXiv …, 2021 - arxiv.org
Given an untrimmed video and a natural language query, Natural Language Video
Localization (NLVL) aims to identify the video moment described by the query. To address …

Training-free video temporal grounding using large-scale pre-trained models

M Zheng, X Cai, Q Chen, Y Peng, Y Liu - European Conference on …, 2024 - Springer
Video temporal grounding aims to identify video segments within untrimmed videos that are
most relevant to a given natural language query. Existing video temporal localization models …