ECCV Caption: Correcting false negatives by collecting machine-and-human-verified image-caption associations for MS-COCO

S Chun, W Kim, S Park, M Chang, SJ Oh - European Conference on …, 2022 - Springer
Image-Text matching (ITM) is a common task for evaluating the quality of Vision and
Language (VL) models. However, existing ITM benchmarks have a significant limitation …

Point to rectangle matching for image text retrieval

Z Wang, Z Gao, X Xu, Y Luo, Y Yang… - Proceedings of the 30th …, 2022 - dl.acm.org
The difficulty of image-text retrieval is further exacerbated by the phenomenon of one-to-
many correspondence, where multiple semantic manifestations of the other modality could …

Improved probabilistic image-text representations

S Chun - arXiv preprint arXiv:2305.18171, 2023 - arxiv.org
Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the
inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic …

Bi-directional image–text matching deep learning-based approaches: Concepts, methodologies, benchmarks and challenges

DB Ebaid, MM Madbouly, AA El-Zoghabi - International Journal of …, 2023 - Springer
Nowadays, image–text matching (retrieval) has frequently attracted attention due to the
growth of multimodal data. This task returns the relevant images to a textual query or …

GSSF: Generalized structural sparse function for deep cross-modal metric learning

H Diao, Y Zhang, S Gao, J Zhu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Cross-modal metric learning is a prominent research topic that bridges the semantic
heterogeneity between vision and language. Existing methods frequently utilize simple …

Scene graph semantic inference for image and text matching

J Pei, K Zhong, Z Yu, L Wang… - ACM Transactions on …, 2023 - dl.acm.org
With the rapid development of information technology, image and text data have increased
dramatically. Image and text matching techniques enable computers to understand …

Multi-view inter-modality representation with progressive fusion for image-text matching

J Wu, L Wang, C Chen, J Lu, C Wu - Neurocomputing, 2023 - Elsevier
Recently, image-text matching has been intensively explored to bridge vision and language.
Previous methods explore an inter-modality relationship between an image-text pair from …

Deep boosting learning: A brand-new cooperative approach for image-text matching

H Diao, Y Zhang, S Gao, X Ruan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Image-text matching remains a challenging task due to heterogeneous semantic diversity
across modalities and insufficient distance separability within triplets. Different from previous …

Auxiliary cross-modal representation learning with triplet loss functions for online handwriting recognition

F Ott, D Rügamer, L Heublein, B Bischl… - IEEE Access, 2023 - ieeexplore.ieee.org
Cross-modal representation learning learns a shared embedding between two or more
modalities to improve performance in a given task compared to using only one of the …

Cross-modal independent matching network for image-text retrieval

X Ke, B Chen, X Yang, Y Cai, H Liu, W Guo - Pattern Recognition, 2025 - Elsevier
Image-text retrieval serves as a bridge connecting vision and language. Mainstream modal
cross matching methods can effectively perform cross-modal interactions with high …