Progressive spatio-temporal prototype matching for text-video retrieval
The performance of text-video retrieval has been significantly improved by vision-language
cross-modal learning schemes. The typical solution is to directly align the global video-level …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …
Learning comprehensive representations with richer self for text-to-image person re-identification
Text-to-image person re-identification (TIReID) retrieves pedestrian images of the same
identity based on a query text. However, existing methods typically treat it as a one-to-one …
Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval
This paper addresses the challenging task of language-driven moment retrieval. Previous
methods are typically trained to localize the target moment corresponding to a single …
GQE: Generalized query expansion for enhanced text-video retrieval
In the rapidly expanding domain of web video content, the task of text-video retrieval has
become increasingly critical, bridging the semantic gap between textual queries and video …
Synthesizing Videos from Images for Image-to-Video Adaptation
We address the image-to-video adaptation task that aims to leverage labeled images and
unlabeled videos for video recognition. There are two major challenges in this task …
Linguistic hallucination for text-based video retrieval
S Fang, T Dang, S Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Text-based video retrieval is a crucial technology for video and multimodal applications.
Although in traditional Text-Video Retrieval caption-video pairs are supposed to be entirely …
Alignment-Enhanced Network for Temporal Language Grounding in Videos
H Yu, Y Zhang, Y Liu, H Li, H Liu - International Conference on Artificial …, 2024 - Springer
Temporal language grounding in videos aims to ground one video segment in an untrimmed
video based on a given sentence query. The main challenge in this task lies in how to align …