A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Learning 2D temporal adjacent networks for moment localization with natural language
We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Dense regression network for video grounding
We address the problem of video grounding from natural language queries. The key
challenge in this task is that one training video might only contain a few annotated …
TubeDETR: Spatio-temporal video grounding with transformers
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …
Deconfounded video moment retrieval with causal intervention
We tackle the task of video moment retrieval (VMR), which aims to localize a specific
moment in a video according to a textual query. Existing methods primarily model the …
Dynamic modality interaction modeling for image-text retrieval
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although
much progress has been made in bridging vision and language, it remains challenging …
Context-aware biaffine localizing network for temporal sentence grounding
This paper addresses the problem of temporal sentence grounding (TSG), which aims to
identify the temporal boundary of a specific segment from an untrimmed video by a sentence …
TVR: A large-scale dataset for video-subtitle moment retrieval
We introduce TV show Retrieval (TVR), a new multimodal retrieval dataset. TVR requires
systems to understand both videos and their associated subtitle (dialogue) texts, making it …
Fast video moment retrieval
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …