Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

MISSRec: Pre-training and transferring multi-modal interest-aware sequence representation for recommendation

J Wang, Z Zeng, Y Wang, Y Wang, X Lu, T Li… - Proceedings of the 31st …, 2023 - dl.acm.org
The goal of sequential recommendation (SR) is to predict a user's potential interested items
based on her/his historical interaction sequences. Most existing sequential recommenders …

Cross-modal retrieval: A review of methodologies, datasets, and future perspectives

Z Han, A Azman, MR Mustaffa, FB Khalid - IEEE Access, 2024 - ieeexplore.ieee.org
With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …

Contrastive masked autoencoders for self-supervised video hashing

Y Wang, J Wang, B Chen, Z Zeng, ST Xia - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Self-Supervised Video Hashing (SSVH) models learn to generate short binary
representations for videos without ground-truth supervision, facilitating large-scale video …

Hugs bring double benefits: Unsupervised cross-modal hashing with multi-granularity aligned transformers

J Wang, Z Zeng, B Chen, Y Wang, D Liao, G Li… - International Journal of …, 2024 - Springer
Unsupervised cross-modal hashing (UCMH) has been commonly explored to support large-
scale cross-modal retrieval of unlabeled data. Despite promising progress, most existing …

GMMFormer: Gaussian-mixture-model based transformer for efficient partially relevant video retrieval

Y Wang, J Wang, B Chen, Z Zeng, ST Xia - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos
containing pertinent moments in a database. For PRVR, clip modeling is essential to capture …

Deep self-supervised hashing with fine-grained similarity mining for cross-modal retrieval

L Han, R Wang, C Chen, H Zhang, Y Zhang… - IEEE …, 2024 - ieeexplore.ieee.org
With the efficiency of storage and retrieval speed, the hashing methods have attracted a lot
of attention for cross-modal retrieval applications. In contrast to traditional cross-modal …

Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval

Q Zou, S Cheng, A Du, J Chen - Entropy, 2024 - mdpi.com
Deep hashing technology, known for its low-cost storage and rapid retrieval, has become a
focal point in cross-modal retrieval research as multimodal data continue to grow. However …

Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives

A ZHICHAOHAN, MASRB MUSTAFFA, FB KHALID - 2024 - psasir.upm.edu.my
With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …