Visual Genome: Connecting language and vision using crowdsourced dense image annotations
Despite progress in perceptual tasks such as image classification, computers still perform
poorly on cognitive tasks such as image description and question answering. Cognition is …
VQA: Visual question answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given
an image and a natural language question about the image, the task is to provide an …
Knowledge graphs meet multi-modal learning: A comprehensive survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …
OK-VQA: A visual question answering benchmark requiring external knowledge
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the
joint space of vision and language and serves as a proxy for the AI task of scene …
Neural motifs: Scene graph parsing with global context
We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …
KRISP: Integrating implicit and symbolic knowledge for open-domain knowledge-based VQA
One of the most challenging question types in VQA is when answering the question requires
outside knowledge not present in the image. In this work we study open-domain knowledge …
Visual translation embedding network for visual relation detection
Visual relations, such as "person ride bike" and "bike next to car", offer a comprehensive
scene understanding of an image, and have already shown their great utility in connecting …
Human‐centered artificial intelligence and machine learning
MO Riedl - Human behavior and emerging technologies, 2019 - Wiley Online Library
Humans are increasingly coming into contact with artificial intelligence (AI) and machine
learning (ML) systems. Human‐centered AI is a perspective on AI and ML that algorithms …
Visual Commonsense R-CNN
We present a novel unsupervised feature representation learning method, Visual
Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an …
Cross-media analysis and reasoning: advances and directions
Cross-media analysis and reasoning is an active research area in computer science, and a
promising direction for artificial intelligence. However, to the best of our knowledge, no …