- Academic Search

ECCV Caption: Correcting false negatives by collecting machine-and-human-verified image-caption associations for MS-COCO

S Chun, W Kim, S Park, M Chang, SJ Oh - European Conference on …, 2022 - Springer

Image-Text matching (ITM) is a common task for evaluating the quality of Vision and
Language (VL) models. However, existing ITM benchmarks have a significant limitation …

Cited by 45

Point to rectangle matching for image text retrieval

Z Wang, Z Gao, X Xu, Y Luo, Y Yang… - Proceedings of the 30th …, 2022 - dl.acm.org

The difficulty of image-text retrieval is further exacerbated by the phenomenon of one-to-
many correspondence, where multiple semantic manifestations of the other modality could …

Cited by 25

Improved probabilistic image-text representations

S Chun - arXiv preprint arXiv:2305.18171, 2023 - arxiv.org

Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the
inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic …

Cited by 26

Bi-directional image–text matching deep learning-based approaches: Concepts, methodologies, benchmarks and challenges

DB Ebaid, MM Madbouly, AA El-Zoghabi - International Journal of …, 2023 - Springer

Nowadays, image–text matching (retrieval) has frequently attracted attention due to the
growth of multimodal data. This task returns the relevant images to a textual query or …

Cited by 4

GSSF: Generalized structural sparse function for deep cross-modal metric learning

H Diao, Y Zhang, S Gao, J Zhu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Cross-modal metric learning is a prominent research topic that bridges the semantic
heterogeneity between vision and language. Existing methods frequently utilize simple …

Cited by 2

Scene graph semantic inference for image and text matching

J Pei, K Zhong, Z Yu, L Wang… - ACM Transactions on …, 2023 - dl.acm.org

With the rapid development of information technology, image and text data have increased
dramatically. Image and text matching techniques enable computers to understand …

Cited by 27

Multi-view inter-modality representation with progressive fusion for image-text matching

J Wu, L Wang, C Chen, J Lu, C Wu - Neurocomputing, 2023 - Elsevier

Recently, image-text matching has been intensively explored to bridge vision and language.
Previous methods explore an inter-modality relationship between an image-text pair from …

Cited by 7

Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

H Diao, Y Zhang, S Gao, X Ruan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Image-text matching remains a challenging task due to heterogeneous semantic diversity
across modalities and insufficient distance separability within triplets. Different from previous …

Cited by 1

Auxiliary cross-modal representation learning with triplet loss functions for online handwriting recognition

F Ott, D Rügamer, L Heublein, B Bischl… - IEEE Access, 2023 - ieeexplore.ieee.org

Cross-modal representation learning learns a shared embedding between two or more
modalities to improve performance in a given task compared to using only one of the …

Cited by 14

Cross-modal independent matching network for image-text retrieval

X Ke, B Chen, X Yang, Y Cai, H Liu, W Guo - Pattern Recognition, 2025 - Elsevier

Image-text retrieval serves as a bridge connecting vision and language. Mainstream modal
cross matching methods can effectively perform cross-modal interactions with high …

Is an image worth five sentences? A new look into semantics for image-text matching