A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search for a target
segment within a video described by a given natural language query. Beyond the task of …
MomentDiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
MAD: A scalable dataset for language grounding in videos from movie audio descriptions
The recent and increasing interest in video-language research has driven the development
of large-scale datasets that enable data-intensive machine learning techniques. In …
Negative sample matters: A renaissance of metric learning for temporal grounding
Temporal grounding aims to localize a video moment which is semantically aligned with a
given natural language query. Existing methods typically apply a detection or regression …
Knowing where to focus: Event-aware transformer for video grounding
Recent DETR-based video grounding models have made the model directly predict moment
timestamps without any hand-crafted components, such as a pre-defined proposal or non …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …
G2L: Semantically aligned and uniform video grounding via geodesic and game theory
The recent video grounding works attempt to introduce vanilla contrastive learning into video
grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …
Rethinking weakly-supervised video temporal grounding from a game perspective
This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …
Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …