MomentDiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
MAD: A scalable dataset for language grounding in videos from movie audio descriptions
The recent and increasing interest in video-language research has driven the development
of large-scale datasets that enable data-intensive machine learning techniques. In …
Fast video moment retrieval
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …
Compositional temporal grounding with structured variational cross-graph correspondence learning
Temporal grounding in videos aims to localize one target video segment that semantically
corresponds to a given query sentence. Thanks to the semantic diversity of natural language …
Deep learning for weakly-supervised object detection and localization: A survey
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e.,
detecting multiple and single instances with bounding boxes in an image using image-level …
Cross-sentence temporal and semantic relations in video activity localisation
Video activity localisation has recently attained increasing attention due to its practical
values in automatically localising the most salient visual segments corresponding to their …
A survey on temporal sentence grounding in videos
Temporal sentence grounding in videos (TSGV), which aims at localizing one target
segment from an untrimmed video with respect to a given sentence query, has drawn …
Natural language video localization with learnable moment proposals
Given an untrimmed video and a natural language query, Natural Language Video
Localization (NLVL) aims to identify the video moment described by the query. To address …
Training-free video temporal grounding using large-scale pre-trained models
Video temporal grounding aims to identify video segments within untrimmed videos that are
most relevant to a given natural language query. Existing video temporal localization models …