Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

HiT: Hierarchical transformer with momentum contrast for video-text retrieval

S Liu, H Fan, S Qian, Y Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video-Text Retrieval has been a hot research topic with the growth of multimedia
data on the internet. Transformer for video-text learning has attracted increasing attention …

User cold-start recommendation via inductive heterogeneous graph neural network

D Cai, S Qian, Q Fang, J Hu, C Xu - ACM Transactions on Information …, 2023 - dl.acm.org
Recently, user cold-start recommendations have attracted a lot of attention from industry and
academia. In user cold-start recommendation systems, the user attribute information is often …

Integrating multi-label contrastive learning with dual adversarial graph neural networks for cross-modal retrieval

S Qian, D Xue, Q Fang, C Xu - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
With the growing amount of multimodal data, cross-modal retrieval has attracted more and
more attention and become a hot research topic. To date, most of the existing techniques …

Variational causal inference network for explanatory visual question answering

D Xue, S Qian, C Xu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Explanatory Visual Question Answering (EVQA) is a recently proposed multimodal
reasoning task that requires answering visual questions and generating multimodal …

The State of the Art for Cross-Modal Retrieval: A Survey

K Zhou, FH Hassan, GK Hoon - IEEE Access, 2023 - ieeexplore.ieee.org
Cross-modal retrieval, which aims to search for semantically relevant data across different
modalities, has received increasing attention in recent years. Deep learning, with its ability to …

Heterogeneous graph contrastive learning network for personalized micro-video recommendation

D Cai, S Qian, Q Fang, J Hu, W Ding… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Personalized micro-video recommendation has attracted a lot of research attention with the
growing popularity of micro-video sharing platforms. Many efforts have been made to …

Self-supervised correlation learning for cross-modal retrieval

Y Liu, J Wu, L Qu, T Gan, J Yin… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Cross-modal retrieval aims to retrieve relevant data from another modality when given a
query of one modality. Although most existing methods that rely on the label information of …

RONO: robust discriminative learning with noisy labels for 2D-3D cross-modal retrieval

Y Feng, H Zhu, D Peng, X Peng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recently, with the advent of Metaverse and AI Generated Content, cross-modal retrieval
becomes popular with a burst of 2D and 3D data. However, this problem is challenging …

LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval

Z Yang, D Xue, S Qian, W Dong, C Xu - Proceedings of the 47th …, 2024 - dl.acm.org
Zero-Shot Composed Image Retrieval (ZS-CIR) has garnered increasing interest in recent
years, which aims to retrieve a target image based on a query composed of a reference …