Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 197 · 7 versions

Image-text retrieval: A survey on recent research and development

M Cao, S Li, J Li, L Nie, M Zhang - arXiv preprint, 2022 - arxiv.org

… of visual and textual …

Cited by 174 · 5 versions

Vinvl: Revisiting visual representations in vision-language models

P Zhang, X Li, X Hu, J Yang, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

This paper presents a detailed study of improving vision features and develops an improved
object detection model for vision language (VL) tasks. Compared to the most widely used …

Cited by 1123 · 8 versions

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

X Li, X Yin, C Li, P Zhang, X Hu, L Zhang… - Computer Vision–ECCV …, 2020 - Springer

Large-scale pre-training methods of learning cross-modal representations on image-text
pairs are becoming popular for vision-language tasks. While existing methods simply …

Cited by 2229 · 6 versions

Similarity reasoning and filtration for image-text matching

H Diao, Y Zhang, L Ma, H Lu - Proceedings of the AAAI conference on …, 2021 - ojs.aaai.org

Image-text matching plays a critical role in bridging the vision and language, and great
progress has been made by exploiting the global alignment between image and sentence …

Cited by 359 · 9 versions

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org

Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Cited by 197 · 7 versions

Clip-driven fine-grained text-image person re-identification

S Yan, N Dong, L Zhang, J Tang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org

Text-Image Person Re-identification (TIReID) aims to retrieve the image corresponding to
the given text query from a pool of candidate images. Existing methods employ prior …

Cited by 147 · 7 versions

Imram: Iterative matching with recurrent attention memory for cross-modal image-text retrieval

H Chen, G Ding, X Liu, Z Lin, J Liu… - Proceedings of the …, 2020 - openaccess.thecvf.com

Enabling bi-directional retrieval of images and texts is important for understanding the
correspondence between vision and language. Existing methods leverage the attention …

Cited by 436 · 7 versions

Stacked cross attention for image-text matching

KH Lee, X Chen, G Hua, H Hu… - Proceedings of the …, 2018 - openaccess.thecvf.com

In this paper, we study the problem of image-text matching. Inferring the latent semantic
alignment between objects or other salient stuff (e.g. snow, sky, lawn) and the corresponding …

Cited by 1456 · 8 versions

Dual-path convolutional image-text embeddings with instance loss

Vision-language pre-training: Basics, recent advances, and future trends
