RUBi: Reducing unimodal biases for visual question answering

R Cadene, C Dancette, M Cord… - Advances in neural …, 2019 - proceedings.neurips.cc
Visual Question Answering (VQA) is the task of answering questions about an
image. Some VQA models often exploit unimodal biases to provide the correct answer …

Cross-modal knowledge reasoning for knowledge-based visual question answering

J Yu, Z Zhu, Y Wang, W Zhang, Y Hu, J Tan - Pattern Recognition, 2020 - Elsevier
Knowledge-based Visual Question Answering (KVQA) requires external knowledge
beyond the visible content to answer questions about an image. This ability is challenging …

MRA-Net: Improving VQA via multi-modal relation attention network

L Peng, Y Yang, Z Wang, Z Huang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Visual Question Answering (VQA) is a task to answer natural language questions tied to the
content of visual images. Most recent VQA approaches usually apply attention mechanism to …

Re-attention for visual question answering

W Guo, Y Zhang, J Yang, X Yuan - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org
A simultaneous understanding of questions and images is crucial in Visual Question
Answering (VQA). While the existing models have achieved satisfactory performance by …

Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey

J Kuang, Y Shen, J Xie, H Luo, Z Xu, R Li, Y Li… - ACM Computing …, 2024 - dl.acm.org
Visual Question Answering (VQA) is a challenge task that combines natural language
processing and computer vision techniques and gradually becomes a benchmark test task …

Multimedia intelligence: When multimedia meets artificial intelligence

W Zhu, X Wang, W Gao - IEEE Transactions on Multimedia, 2020 - ieeexplore.ieee.org
Owing to the rich emerging multimedia applications and services in the past decade, super
large amount of multimedia data has been produced for the purpose of advanced research …

Indonesian chatbot of university admission using a question answering system based on sequence-to-sequence model

YW Chandra, S Suyanto - Procedia Computer Science, 2019 - Elsevier
Question and Answering (QA) system is a problem in natural language processing that can
be used as the system of dialogs and chatbots. It can be used as a customer service that can …

Boosting the power of small multimodal reasoning models to match larger models with self-consistency training

C Tan, J Wei, Z Gao, L Sun, S Li, R Guo, B Yu… - European Conference on …, 2024 - Springer
Multimodal reasoning is a challenging task that requires models to reason across multiple
modalities to answer questions. Existing approaches have made progress by incorporating …

CRA-Net: Composed relation attention network for visual question answering

L Peng, Y Yang, Z Wang, X Wu, Z Huang - Proceedings of the 27th ACM …, 2019 - dl.acm.org
The task of Visual Question Answering (VQA) is to answer a natural language question tied
to the content of a visual image. Most existing VQA models either apply attention mechanism …

KM4: Visual reasoning via knowledge embedding memory model with mutual modulation

W Zheng, L Yan, C Gou, FY Wang - Information Fusion, 2021 - Elsevier
Visual reasoning is a special kind of visual question answering, which is essentially multi-
step and compositional, and also requires intensive text-visual interaction. The most …