Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Text with knowledge graph augmented transformer for video captioning

X Gu, G Chen, Y Wang, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video captioning aims to describe the content of videos using natural language. Although
significant progress has been made, there is still much room to improve the performance for …

Hierarchical representation network with auxiliary tasks for video captioning and video question answering

L Gao, Y Lei, P Zeng, J Song, M Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recently, integrating vision and language for in-depth video understanding, e.g., video
captioning and video question answering, has become a promising direction for artificial …

Multi-modal relational graph for cross-modal video moment retrieval

Y Zeng, D Cao, X Wei, M Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims
to rank a video moment from pre-segmented video moment candidates that best matches …

Memory-based augmentation network for video captioning

S Jing, H Zhang, P Zeng, L Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …

Classification by attention: Scene graph classification with prior knowledge

S Sharifzadeh, SM Baharlou, V Tresp - Proceedings of the AAAI …, 2021 - ojs.aaai.org
A major challenge in scene graph classification is that the appearance of objects and
relations can be significantly different from one image to another. Previous works have …

Vision-enhanced and consensus-aware transformer for image captioning

S Cao, G An, Z Zheng, Z Wang - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Image captioning generates descriptions in a natural language for a given image. Due to its
great potential for a wide range of applications, many deep learning-based methods have …

Magic: Multimodal relational graph adversarial inference for diverse and unpaired text-based image captioning

W Zhang, H Shi, J Guo, S Zhang, Q Cai, J Li… - Proceedings of the …, 2022 - ojs.aaai.org
Text-based image captioning (TextCap) requires simultaneous comprehension of visual
content and reading the text of images to generate a natural language description. Although …