Visual genome: Connecting language and vision using crowdsourced dense image annotations

R Krishna, Y Zhu, O Groth, J Johnson, K Hata… - International journal of …, 2017 - Springer
Despite progress in perceptual tasks such as image classification, computers still perform
poorly on cognitive tasks such as image description and question answering. Cognition is …

VQA: Visual question answering

S Antol, A Agrawal, J Lu, M Mitchell… - Proceedings of the …, 2015 - openaccess.thecvf.com
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given
an image and a natural language question about the image, the task is to provide an …

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

OK-VQA: A visual question answering benchmark requiring external knowledge

K Marino, M Rastegari, A Farhadi… - Proceedings of the …, 2019 - openaccess.thecvf.com
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the
joint space of vision and language and serves as a proxy for the AI task of scene …

Neural motifs: Scene graph parsing with global context

R Zellers, M Yatskar, S Thomson… - Proceedings of the …, 2018 - openaccess.thecvf.com
We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …

KRISP: Integrating implicit and symbolic knowledge for open-domain knowledge-based VQA

K Marino, X Chen, D Parikh, A Gupta… - Proceedings of the …, 2021 - openaccess.thecvf.com
One of the most challenging question types in VQA is when answering the question requires
outside knowledge not present in the image. In this work we study open-domain knowledge …

Visual translation embedding network for visual relation detection

H Zhang, Z Kyaw, SF Chang… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Visual relations, such as" person ride bike" and" bike next to car", offer a comprehensive
scene understanding of an image, and have already shown their great utility in connecting …

Human‐centered artificial intelligence and machine learning

MO Riedl - Human behavior and emerging technologies, 2019 - Wiley Online Library
Humans are increasingly coming into contact with artificial intelligence (AI) and machine
learning (ML) systems. Human‐centered AI is a perspective on AI and ML that algorithms …

Visual commonsense R-CNN

T Wang, J Huang, H Zhang… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a novel unsupervised feature representation learning method, Visual
Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an …

Cross-media analysis and reasoning: advances and directions

Y Peng, W Zhu, Y Zhao, C Xu, Q Huang, H Lu… - Frontiers of Information …, 2017 - Springer
Cross-media analysis and reasoning is an active research area in computer science, and a
promising direction for artificial intelligence. However, to the best of our knowledge, no …