Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv…, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Unifying large language models and knowledge graphs: A roadmap

S Pan, L Luo, Y Wang, C Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Large language models (LLMs), such as ChatGPT and GPT-4, are making new waves in the
field of natural language processing and artificial intelligence, due to their emergent ability …

Large language models are visual reasoning coordinators

L Chen, B Li, S Shen, J Yang, C Li… - Advances in …, 2023 - proceedings.neurips.cc
Visual reasoning requires multimodal perception and commonsense cognition of the world.
Recently, multiple vision-language models (VLMs) have been proposed with excellent …

Fine-grained late-interaction multi-modal retrieval for retrieval-augmented visual question answering

W Lin, J Chen, J Mei, A Coca… - Advances in Neural …, 2023 - proceedings.neurips.cc
Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to
utilize knowledge from external knowledge bases to answer visually-grounded questions …

MEAformer: Multi-modal entity alignment transformer for meta modality hybrid

Z Chen, J Chen, W Zhang, L Guo, Y Fang… - Proceedings of the 31st …, 2023 - dl.acm.org
Multi-modal entity alignment (MMEA) aims to discover identical entities across different
knowledge graphs (KGs) whose entities are associated with relevant images. However …

Rethinking uncertainly missing and ambiguous visual modality in multi-modal entity alignment

Z Chen, L Guo, Y Fang, Y Zhang, J Chen… - International Semantic …, 2023 - Springer
As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to
identify identical entities across disparate knowledge graphs (KGs) by exploiting associated …

A symmetric dual encoding dense retrieval framework for knowledge-intensive visual question answering

A Salemi, J Altmayer Pizzorno, H Zamani - Proceedings of the 46th …, 2023 - dl.acm.org
Knowledge-Intensive Visual Question Answering (KI-VQA) refers to answering a question
about an image whose answer does not lie in the image. This paper presents a new pipeline …

Structure-CLIP: Enhance multi-modal language representations with structure knowledge

Y Huang, J Tang, Z Chen, R Zhang… - arXiv preprint arXiv…, 2023 - researchgate.net
Large-scale vision-language pre-training has shown promising advances on various
downstream tasks and achieved significant performance in multi-modal understanding and …

Structure-CLIP: Towards scene graph knowledge to enhance multi-modal structured representations

Y Huang, J Tang, Z Chen, R Zhang, X Zhang… - Proceedings of the …, 2024 - ojs.aaai.org
Large-scale vision-language pre-training has achieved significant performance in multi-
modal understanding and generation tasks. However, existing methods often perform poorly …

Tele-knowledge pre-training for fault analysis

Z Chen, W Zhang, Y Huang, M Chen… - 2023 IEEE 39th …, 2023 - ieeexplore.ieee.org
In this work, we share our experience on tele-knowledge pre-training for fault analysis, a
crucial task in telecommunication applications that requires a wide range of knowledge …