The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Scaling language-image pre-training via masking
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …
Bottom-up and top-down attention for image captioning and visual question answering
Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …
GQA: A new dataset for real-world visual reasoning and compositional question answering
We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …
Making the V in VQA matter: Elevating the role of image understanding in visual question answering
Problems at the intersection of vision and language are of significant importance both as
challenging research questions and for the rich set of applications they enable. However …
Deep modular co-attention networks for visual question answering
Visual Question Answering (VQA) requires a fine-grained and simultaneous
understanding of both the visual content of images and the textual content of questions …
Bilinear attention networks
Attention networks in multimodal learning provide an efficient way to utilize given visual
information selectively. However, the computational cost to learn attention distributions for …
Explainable deep learning: A field guide for the uninitiated
Deep neural networks (DNNs) are an indispensable machine learning tool despite the
difficulty of diagnosing what aspects of a model's input drive its decisions. In countless real …
Knowledge base graph embedding module design for Visual question answering model
In this paper, a knowledge base graph embedding module is constructed to extend the
versatility of knowledge-based Visual Question Answering (VQA) models. The knowledge …