Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

Cross-modal active complementary learning with self-refining correspondence

Y Qin, Y Sun, D Peng, JT Zhou… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recently, image-text matching has attracted more and more attention from academia and
industry, which is fundamental to understanding the latent correspondence across visual …

Robust multi-view clustering with noisy correspondence

Y Sun, Y Qin, Y Li, D Peng, X Peng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep multi-view clustering leverages deep neural networks to achieve promising
performance, but almost all existing methods implicitly assume that all views are aligned …

Noisy-correspondence learning for text-to-image person re-identification

Y Qin, Y Chen, D Peng, X Peng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal
community which aims to retrieve the target person based on a textual query. Although …

Robust object re-identification with coupled noisy labels

M Yang, Z Huang, X Peng - International Journal of Computer Vision, 2024 - Springer
In this paper, we reveal and study a new challenging problem faced by object
Re-IDentification (ReID), i.e., Coupled Noisy Labels (CNL), which refers to the Noisy Annotation …

Breaking through the noisy correspondence: A robust model for image-text matching

H Shi, M Liu, X Mu, X Song, Y Hu, L Nie - ACM Transactions on …, 2024 - dl.acm.org
Unleashing the power of image-text matching in real-world applications is hampered by
noisy correspondence. Manually curating high-quality datasets is expensive and time …

Senet: spatial information enhancement for semantic segmentation neural networks

Y Huang, P Shi, H He, H He, B Zhao - The Visual Computer, 2024 - Springer
Image semantic segmentation is a basic task of computer vision, and plays an important role
in automatic driving, robot navigation and many other fields. However, the expensive …

Noise-robust Vision-language Pre-training with Positive-negative Learning

Z Huang, M Yang, X Xiao, P Hu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision-Language Pre-training (VLP) has shown promising performance in various tasks by
learning a generic image-text representation space. However, most existing VLP methods …

Semantic-aware Contrastive Learning with Proposal Suppression for Video Semantic Role Grounding

M Liu, D Zhou, J Guo, X Luo, Z Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video semantic role grounding has gained substantial interest from both the academic and
industrial communities. While existing methods have demonstrated considerable …

Semi-supervised semi-paired cross-modal hashing

X Zhang, X Liu, X Nie, X Kang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Large-scale cross-modal hashing has drawn extensive attention due to its attractive
efficiency in both storage and retrieval. Existing methods exhibit poor performance when …