ECCV Caption: Correcting false negatives by collecting machine-and-human-verified image-caption associations for MS-COCO
Image-Text matching (ITM) is a common task for evaluating the quality of Vision and
Language (VL) models. However, existing ITM benchmarks have a significant limitation …
Point to rectangle matching for image text retrieval
The difficulty of image-text retrieval is further exacerbated by the phenomenon of one-to-
many correspondence, where multiple semantic manifestations of the other modality could …
Improved probabilistic image-text representations
S Chun - arXiv preprint arXiv:2305.18171, 2023 - arxiv.org
Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the
inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic …
Bi-directional image–text matching deep learning-based approaches: Concepts, methodologies, benchmarks and challenges
Nowadays, image–text matching (retrieval) has frequently attracted attention due to the
growth of multimodal data. This task returns the relevant images to a textual query or …
GSSF: Generalized structural sparse function for deep cross-modal metric learning
Cross-modal metric learning is a prominent research topic that bridges the semantic
heterogeneity between vision and language. Existing methods frequently utilize simple …
Scene graph semantic inference for image and text matching
With the rapid development of information technology, image and text data have increased
dramatically. Image and text matching techniques enable computers to understand …
Multi-view inter-modality representation with progressive fusion for image-text matching
Recently, image-text matching has been intensively explored to bridge vision and language.
Previous methods explore an inter-modality relationship between an image-text pair from …
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Image-text matching remains a challenging task due to heterogeneous semantic diversity
across modalities and insufficient distance separability within triplets. Different from previous …
Auxiliary cross-modal representation learning with triplet loss functions for online handwriting recognition
Cross-modal representation learning learns a shared embedding between two or more
modalities to improve performance in a given task compared to using only one of the …
Cross-modal independent matching network for image-text retrieval
X Ke, B Chen, X Yang, Y Cai, H Liu, W Guo - Pattern Recognition, 2025 - Elsevier
Image-text retrieval serves as a bridge connecting vision and language. Mainstream modal
cross matching methods can effectively perform cross-modal interactions with high …