- Academic Search

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org

With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

Speichern Zitieren Zitiert von: 24 Ähnliche Artikel Alle 3 Versionen

[Free GPT-4]

[PDF] arxiv.org

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arxiv preprint arxiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Speichern Zitieren Zitiert von: 44 Ähnliche Artikel Alle 2 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Unifying large language models and knowledge graphs: A roadmap

S Pan, L Luo, Y Wang, C Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Large language models (LLMs), such as ChatGPT and GPT4, are making new waves in the
field of natural language processing and artificial intelligence, due to their emergent ability …

Speichern Zitieren Zitiert von: 806 Ähnliche Artikel Alle 5 Versionen

[Free GPT-4]

[PDF] thecvf.com

Vinvl: Revisiting visual representations in vision-language models

P Zhang, X Li, X Hu, J Yang, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

This paper presents a detailed study of improving vision features and develops an improved
object detection model for vision language (VL) tasks. Compared to the most widely used …

Speichern Zitieren Zitiert von: 1125 Ähnliche Artikel Alle 8 Versionen HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Negative-aware attention framework for image-text matching

K Zhang, Z Mao, Q Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Image-text matching, as a fundamental task, bridges the gap between vision and language.
The key of this task is to accurately measure similarity between these two modalities. Prior …

Speichern Zitieren Zitiert von: 147 Ähnliche Artikel Alle 4 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

X Li, X Yin, C Li, P Zhang, X Hu, L Zhang… - Computer Vision–ECCV …, 2020 - Springer

Large-scale pre-training methods of learning cross-modal representations on image-text
pairs are becoming popular for vision-language tasks. While existing methods simply …

Speichern Zitieren Zitiert von: 2233 Ähnliche Artikel Alle 6 Versionen

[Free GPT-4]

[PDF] aaai.org

Similarity reasoning and filtration for image-text matching

H Diao, Y Zhang, L Ma, H Lu - Proceedings of the AAAI conference on …, 2021 - ojs.aaai.org

Image-text matching plays a critical role in bridging the vision and language, and great
progress has been made by exploiting the global alignment between image and sentence …

Speichern Zitieren Zitiert von: 359 Ähnliche Artikel Alle 9 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org

Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Speichern Zitieren Zitiert von: 197 Ähnliche Artikel Alle 7 Versionen

Dual-level representation enhancement on characteristic and context for image-text retrieval

S Yang, Q Li, W Li, X Li, AA Liu - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org

Image-text retrieval is a fundamental and vital task in multi-media retrieval and has received
growing attention since it connects heterogeneous data. Previous methods that perform well …

Speichern Zitieren Zitiert von: 106 Ähnliche Artikel Alle 2 Versionen

[Free GPT-4]

[PDF] aaai.org

Unicoder-vl: A universal encoder for vision and language by cross-modal pre-training

G Li, N Duan, Y Fang, M Gong, D Jiang - Proceedings of the AAAI …, 2020 - aaai.org

We propose Unicoder-VL, a universal encoder that aims to learn joint representations of
vision and language in a pre-training manner. Borrow ideas from cross-lingual pre-trained …

Speichern Zitieren Zitiert von: 981 Ähnliche Artikel Alle 12 Versionen HTML-Version

Alert erstellen

Zitieren

Erweiterte Suche

In „Meine Bibliothek“ gespeichert

Knowledge Aware Semantic Concept Expansion for Image-Text Matching.

Cross-modal retrieval: a systematic review of methods and future directions

Knowledge graphs meet multi-modal learning: A comprehensive survey

Unifying large language models and knowledge graphs: A roadmap

Vinvl: Revisiting visual representations in vision-language models

Negative-aware attention framework for image-text matching

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

Similarity reasoning and filtration for image-text matching

Multi-modal knowledge graph construction and application: A survey

Dual-level representation enhancement on characteristic and context for image-text retrieval

Unicoder-vl: A universal encoder for vision and language by cross-modal pre-training