A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search within a video for a target
segment described by a given natural language query. Beyond the task of …

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, video moment retrieval and highlight detection (MR/HD) have been spotlighted as
the demand for video understanding has increased drastically. The key objective of MR/HD is …

UnLoc: A unified framework for video localization tasks

S Yan, X Xiong, A Nagrani, A Arnab… - Proceedings of the …, 2023 - openaccess.thecvf.com
While large-scale image-text pretrained models such as CLIP have been used for multiple
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Fine-grained temporal contrastive learning for weakly-supervised temporal action localization

J Gao, M Chen, C Xu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
We target the task of weakly-supervised action localization (WSAL), where only video-
level action labels are available during model training. Despite recent progress, existing …

Negative sample matters: A renaissance of metric learning for temporal grounding

Z Wang, L Wang, T Wu, T Li, G Wu - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Temporal grounding aims to localize a video moment which is semantically aligned with a
given natural language query. Existing methods typically apply a detection or regression …

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate the target
moment that semantically corresponds to a sentence query. Although previous respectable works …

G2L: Semantically aligned and uniform video grounding via geodesic and game theory

H Li, M Cao, X Cheng, Y Li, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent video grounding works attempt to introduce vanilla contrastive learning into video
grounding. However, we argue that this naive solution is suboptimal. Contrastive learning …

UMT: Unified multi-modal transformers for joint video moment retrieval and highlight detection

Y Liu, S Li, Y Wu, CW Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Finding relevant moments and highlights in videos according to natural language queries is
a natural and highly valuable need in the current era of exploding video content …