MomentDiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
MAD: A scalable dataset for language grounding in videos from movie audio descriptions
The recent and increasing interest in video-language research has driven the development
of large-scale datasets that enable data-intensive machine learning techniques. In …
Knowing where to focus: Event-aware transformer for video grounding
Recent DETR-based video grounding models have made the model directly predict moment
timestamps without any hand-crafted components, such as a pre-defined proposal or non …
Curriculum multi-negative augmentation for debiased video grounding
Video Grounding (VG) aims to locate the desired segment from a video given a sentence
query. Recent studies have found that current VG models are prone to over-rely on the …
The elements of temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Collaborative debias strategy for temporal sentence grounding in video
Temporal sentence grounding in video has witnessed significant advancements, but suffers
from substantial dataset bias, which undermines its generalization ability. Existing debias …
Overcoming weak visual-textual alignment for video moment retrieval
Video moment retrieval (VMR) aims to identify the specific moment in an untrimmed video for
a given natural language query. However, this task is prone to suffer from the weak visual-textual …
Self-supervised learning for semi-supervised temporal language grounding
Given a text description, Temporal Language Grounding (TLG) aims to localize temporal
boundaries of the segments that contain the specified semantics in an untrimmed video. TLG …
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding
This paper addresses temporal sentence grounding (TSG). Although existing methods
have made decent achievements on this task, they not only severely rely on abundant video …