Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arxiv preprint arxiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 44

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Cited by 21

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org

Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Cited by 197

Text with knowledge graph augmented transformer for video captioning

X Gu, G Chen, Y Wang, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video captioning aims to describe the content of videos using natural language. Although
significant progress has been made, there is still much room to improve the performance for …

Cited by 55

Hierarchical representation network with auxiliary tasks for video captioning and video question answering

L Gao, Y Lei, P Zeng, J Song, M Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Recently, integrating vision and language for in-depth video understanding, e.g., video
captioning and video question answering, has become a promising direction for artificial …

Cited by 77

Multi-modal relational graph for cross-modal video moment retrieval

Y Zeng, D Cao, X Wei, M Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com

Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims
to rank a video moment from pre-segmented video moment candidates that best matches …

Cited by 86

Memory-based augmentation network for video captioning

S Jing, H Zhang, P Zeng, L Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Video captioning focuses on generating natural language descriptions according to the
video content. Existing works mainly explore this multimodal learning with the paired source …

Cited by 27

Classification by attention: Scene graph classification with prior knowledge

S Sharifzadeh, SM Baharlou, V Tresp - Proceedings of the AAAI …, 2021 - ojs.aaai.org

A major challenge in scene graph classification is that the appearance of objects and
relations can be significantly different from one image to another. Previous works have …

Cited by 62

Vision-enhanced and consensus-aware transformer for image captioning

S Cao, G An, Z Zheng, Z Wang - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org

Image captioning generates descriptions in a natural language for a given image. Due to its
great potential for a wide range of applications, many deep learning-based methods have …

Cited by 44

Magic: Multimodal relational graph adversarial inference for diverse and unpaired text-based image captioning

W Zhang, H Shi, J Guo, S Zhang, Q Cai, J Li… - Proceedings of the …, 2022 - ojs.aaai.org

Text-based image captioning (TextCap) requires simultaneous comprehension of visual
content and reading the text of images to generate a natural language description. Although …

Cited by 42
