HiT: Hierarchical transformer with momentum contrast for video-text retrieval

S Liu, H Fan, S Qian, Y Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Abstract Video-Text Retrieval has been a hot research topic with the growth of multimedia
data on the internet. Transformer for video-text learning has attracted increasing attention …

Fast video moment retrieval

J Gao, C Xu - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …

A survey on deep hashing methods

X Luo, H Wang, D Wu, C Chen, M Deng… - ACM Transactions on …, 2023 - dl.acm.org
Nearest neighbor search aims at obtaining the samples in the database with the smallest
distances from them to the queries, which is a basic task in a range of fields, including …

Reading-strategy inspired visual representation learning for text-to-video retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …

Partially relevant video retrieval

J Dong, X Chen, M Zhang, X Yang, S Chen… - Proceedings of the 30th …, 2022 - dl.acm.org
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …

Unsupervised cross-modal hashing via semantic text mining

RC Tu, XL Mao, Q Lin, W Ji, W Qin… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Cross-modal hashing has been widely used in multimedia retrieval tasks due to its fast
retrieval speed and low storage cost. Recently, many deep unsupervised cross-modal …

Complementarity-aware space learning for video-text retrieval

J Zhu, P Zeng, L Gao, G Li, D Liao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In general, videos are powerful at recording physical patterns (e.g., spatial layout) while texts
are great at describing abstract symbols (e.g., emotion). When video and text are used in …

SEA: Sentence encoder assembly for video retrieval by textual queries

X Li, F Zhou, C Xu, J Ji, G Yang - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …

Transferring image-CLIP to video-text retrieval via temporal relations

H Fang, P Xiong, L Xu, W Luo - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
We present a novel network to transfer the image-language pre-trained model to video-text
retrieval in an end-to-end manner. Leading approaches in the domain of video-and …

GRATIS: Deep learning graph representation with task-specific topology and multi-dimensional edge features

S Song, Y Song, C Luo, Z Song, S Kuzucu, X Jia… - arXiv preprint arXiv …, 2022 - arxiv.org
Graph is powerful for representing various types of real-world data. The topology (edges'
presence) and edges' features of a graph decides the message passing mechanism among …