UniVTG: Towards unified video-language temporal grounding
Abstract Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (eg …
Query-dependent video representation for moment retrieval and highlight detection
Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …
Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection
Abstract Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted
significant attention due to the growing demand for video analysis. Recent approaches treat …
UMT: Unified multi-modal transformers for joint video moment retrieval and highlight detection
Finding relevant moments and highlights in videos according to natural language queries is
a natural and highly valuable common need in the current video content explosion era …
Joint visual and audio learning for video highlight detection
In video highlight detection, the goal is to identify the interesting moments within an unedited
video. Although the audio component of the video provides important cues for highlight …
R²-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to
ground relevant clips in untrimmed videos given natural language queries. Most existing …
Correlation-guided query-dependency calibration in video representation learning for temporal grounding
Temporal Grounding is to identify specific moments or highlights from a video corresponding
to textual descriptions. Typical approaches in temporal grounding treat all video clips …
MH-DETR: Video moment and highlight detection with cross-modal transformer
Y Xu, Y Sun, B Zhai, Y Jia, S Du - 2024 International Joint …, 2024 - ieeexplore.ieee.org
With the increasing demand for video understanding, video moment and highlight detection
(MHD) has emerged as a critical research topic. MHD aims to localize all moments and …
Contrastive learning for unsupervised video highlight detection
Video highlight detection can greatly simplify video browsing, potentially paving the way for
a wide range of applications. Existing efforts are mostly fully-supervised, requiring humans …
TR-DETR: Task-reciprocal transformer for joint moment retrieval and highlight detection
Video moment retrieval (MR) and highlight detection (HD) based on natural language
queries are two highly related tasks, which aim to obtain relevant moments within videos …