Progressive spatio-temporal prototype matching for text-video retrieval

P Li, CW Xie, L Zhao, H Xie, J Ge… - Proceedings of the …, 2023 - openaccess.thecvf.com
The performance of text-video retrieval has been significantly improved by vision-language
cross-modal learning schemes. The typical solution is to directly align the global video-level …

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Learning comprehensive representations with richer self for text-to-image person re-identification

S Yan, N Dong, J Liu, L Zhang, J Tang - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Text-to-image person re-identification (TIReID) retrieves pedestrian images of the same
identity based on a query text. However, existing methods typically treat it as a one-to-one …

Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval

D Liu, X Qu, J Dong, G Nan, P Zhou, Z Xu… - Proceedings of the 31st …, 2023 - dl.acm.org
This paper addresses the challenging task of language-driven moment retrieval. Previous
methods are typically trained to localize the target moment corresponding to a single …

GQE: Generalized query expansion for enhanced text-video retrieval

Z Bai, T Xiao, T He, P Wang, Z Zhang, T Brox… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly expanding domain of web video content, the task of text-video retrieval has
become increasingly critical, bridging the semantic gap between textual queries and video …

Synthesizing Videos from Images for Image-to-Video Adaptation

J Zhuo, X Zhao, S Wang, H Ma, Q Huang - Proceedings of the 31st ACM …, 2023 - dl.acm.org
We address the image-to-video adaptation task that aims to leverage labeled images and
unlabeled videos for video recognition. There are two major challenges in this task …

Linguistic hallucination for text-based video retrieval

S Fang, T Dang, S Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Text-based video retrieval is a crucial technology for video and multimodal applications.
Although in traditional Text-Video Retrieval caption-video pairs are supposed to be entirely …

Alignment-Enhanced Network for Temporal Language Grounding in Videos

H Yu, Y Zhang, Y Liu, H Li, H Liu - International Conference on Artificial …, 2024 - Springer
Temporal language grounding in videos aims to ground one video segment in an untrimmed
video based on a given sentence query. The main challenge in this task lies in how to align …