A comprehensive survey of scene graphs: Generation and application
Scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …
Knowledge graphs meet multi-modal learning: A comprehensive survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …
Relation-aware graph attention network for visual question answering
In order to answer semantically-complicated questions about an image, a Visual Question
Answering (VQA) model needs to fully understand the visual scene in the image, especially …
Visual commonsense R-CNN
We present a novel unsupervised feature representation learning method, Visual
Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an …
KVQA: Knowledge-aware visual question answering
Visual Question Answering (VQA) has emerged as an important problem spanning
Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In …
Re-attention for visual question answering
A simultaneous understanding of questions and images is crucial in Visual Question
Answering (VQA). While the existing models have achieved satisfactory performance by …
Dual self-attention with co-attention networks for visual question answering
Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …
Learning visual commonsense for robust scene graph generation
Scene graph generation models understand the scene through object and predicate
recognition, but are prone to mistakes due to the challenges of perception in the wild …
ACMM: Aligned cross-modal memory for few-shot image and sentence matching
Image and sentence matching has drawn much attention recently, but due to the lack of
sufficient pairwise data for training, most previous methods still cannot well associate those …
A survey of methods, datasets and evaluation metrics for visual question answering
Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …