Cross-modal retrieval: a systematic review of methods and future directions
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
Cross-modal alignment aims to build a bridge connecting vision and language. It is an
important multi-modal task that efficiently learns the semantic similarities between images …
Adapting vision-language models via learning to inject knowledge
Pre-trained vision-language models (VLMs), such as CLIP, have demonstrated impressive
zero-shot performance on various vision tasks. Trained on millions or even billions of image …
LPCR-IoT: Lightweight and privacy-preserving cross-modal Retrieval in IoT
M Li, Y Zhu, R Du, C Jia - IEEE Internet of Things Journal, 2025 - ieeexplore.ieee.org
As a pivotal link between visual and linguistic relationships, image-text cross-modal retrieval
has received widespread attention. However, existing studies primarily focus on intricate …
Semantic-Aware Representation of Multi-Modal Data for Data Ingress: A Literature Review
Machine Learning (ML) is continuously permeating a growing number of application
domains. Generative AI such as Large Language Models (LLMs) also sees broad adoption …
Knowledge Graph Enhanced Multimodal Transformer for Image-Text Retrieval
J Zheng, M Liang, Y Yu, Y Li… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Image-text retrieval is a fundamental cross-modal task that aims to align the representation
spaces between the image and text modalities. Existing cross-modal image-text retrieval …
Perceive, Reason, and Align: Context-guided cross-modal correlation learning for image–text retrieval
Z Liu, X Pei, S Gao, C Li, J Wang, J Xu - Applied Soft Computing, 2024 - Elsevier
Due to the inconsistency in feature representations between different modalities, namely
“Heterogeneous gap”, it remains a persistent challenge to correlate images and texts …
Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval
Z Wei, K **, X Zhou - arXiv preprint arXiv:2312.15840, 2023 - arxiv.org
Cross-modal medical image-report retrieval task plays a significant role in clinical diagnosis
and various medical generative tasks. Eliminating heterogeneity between different …
Ensemble Prototype Networks for Unsupervised Cross-modal Hashing with Cross-Task Consistency
In the swiftly advancing realm of information retrieval, unsupervised cross-modal hashing
has emerged as a focal point of research, taking advantage of the inherent advantages of …
SIRS: Multi-task Joint Learning for Remote Sensing Foreground-entity Image-text Retrieval
The essence of improving the effect of cross-modal image–text retrieval (CIR) lies in the
finer-grained modeling of homogeneous features between modalities. However, in remote …