A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
segment within a video described by a given natural language query. Beyond the task of …
Momentdiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
temporal segments within an untrimmed video that correspond to a given language …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Detecting moments and highlights in videos via natural language queries
Detecting customized moments and highlights from videos given natural language (NL) user
queries is an important but under-studied topic. One of the challenges in pursuing this …
queries is an important but under-studied topic. One of the challenges in pursuing this …
Cross-modal causal relational reasoning for event-level visual question answering
Existing visual question answering methods often suffer from cross-modal spurious
correlations and oversimplified event-level reasoning processes that fail to capture event …
correlations and oversimplified event-level reasoning processes that fail to capture event …
Unloc: A unified framework for video localization tasks
While large-scale image-text pretrained models such as CLIP have been used for multiple
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …
Vidchapters-7m: Video chapters at scale
Segmenting untrimmed videos into chapters enables users to quickly navigate to the
information of their interest. This important topic has been understudied due to the lack of …
information of their interest. This important topic has been understudied due to the lack of …
Deconfounded video moment retrieval with causal intervention
We tackle the task of video moment retrieval (VMR), which aims to localize a specific
moment in a video according to a textual query. Existing methods primarily model the …
moment in a video according to a textual query. Existing methods primarily model the …
Negative sample matters: A renaissance of metric learning for temporal grounding
Temporal grounding aims to localize a video moment which is semantically aligned with a
given natural language query. Existing methods typically apply a detection or regression …
given natural language query. Existing methods typically apply a detection or regression …
Knowing where to focus: Event-aware transformer for video grounding
Recent DETR-based video grounding models have made the model directly predict moment
timestamps without any hand-crafted components, such as a pre-defined proposal or non …
timestamps without any hand-crafted components, such as a pre-defined proposal or non …