Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Query and attention augmentation for knowledge-based explainable reasoning

Y Zhang, M Jiang, Q Zhao - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Explainable visual question answering (VQA) models have been developed with neural
modules and query-based knowledge incorporation to answer knowledge-requiring …

Dynamic key-value memory enhanced multi-step graph reasoning for knowledge-based visual question answering

M Li, MF Moens - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Knowledge-based visual question answering (VQA) is a vision-language task that
requires an agent to correctly answer image-related questions using knowledge that is not …

A survey on knowledge-enhanced multimodal learning

M Lymperaiou, G Stamou - Artificial Intelligence Review, 2024 - Springer
Multimodal learning has been a field of increasing interest, aiming to combine various
modalities in a single joint representation. Especially in the area of visiolinguistic (VL) …

Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning

W Zheng, L Yan, FY Wang - Artificial Intelligence, 2024 - Elsevier
Knowledge-based visual reasoning requires the ability to associate outside
knowledge that is not present in a given image for cross-modal visual understanding. Two …

A survey on interpretable cross-modal reasoning

D Xue, S Qian, Z Zhou, C Xu - arXiv preprint arXiv:2309.01955, 2023 - arxiv.org
In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning
across different modalities, has emerged as a pivotal area with applications spanning from …

Interpretable visual reasoning: A survey

F He, Y Wang, X Miao, X Sun - Image and Vision Computing, 2021 - Elsevier
Visual reasoning refers to the process of solving questions about visual information. At
present, most visual reasoning models are mainly based on deep learning and end-to-end …

VQA with no questions-answers training

BZ Vatashsky, S Ullman - … of the IEEE/CVF Conference on …, 2020 - openaccess.thecvf.com
Methods for teaching machines to answer visual questions have made significant progress
in recent years, but current methods still lack important human capabilities, including …

Towards One-to-Many Visual Question Answering

H Ji, Q Si, Z Lin, Y Cao, W Wang - Findings of the Association for …, 2024 - aclanthology.org
Most existing Visual Question Answering (VQA) systems are constrained to support
domain-specific questions, i.e., to train different models separately for different VQA tasks …