Rubi: Reducing unimodal biases for visual question answering
Visual Question Answering (VQA) is the task of answering questions about an
image. Some VQA models often exploit unimodal biases to provide the correct answer …
Cross-modal knowledge reasoning for knowledge-based visual question answering
Knowledge-based Visual Question Answering (KVQA) requires external knowledge
beyond the visible content to answer questions about an image. This ability is challenging …
MRA-Net: Improving VQA via multi-modal relation attention network
Visual Question Answering (VQA) is a task to answer natural language questions tied to the
content of visual images. Most recent VQA approaches usually apply attention mechanism to …
Re-attention for visual question answering
A simultaneous understanding of questions and images is crucial in Visual Question
Answering (VQA). While the existing models have achieved satisfactory performance by …
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
Visual Question Answering (VQA) is a challenging task that combines natural language
processing and computer vision techniques and gradually becomes a benchmark test task …
Multimedia intelligence: When multimedia meets artificial intelligence
Owing to the rich emerging multimedia applications and services in the past decade, super
large amount of multimedia data has been produced for the purpose of advanced research …
Indonesian chatbot of university admission using a question answering system based on sequence-to-sequence model
YW Chandra, S Suyanto - Procedia Computer Science, 2019 - Elsevier
Question and Answering (QA) system is a problem in natural language processing that can
be used as the system of dialogs and chatbots. It can be used as a customer service that can …
Boosting the power of small multimodal reasoning models to match larger models with self-consistency training
Multimodal reasoning is a challenging task that requires models to reason across multiple
modalities to answer questions. Existing approaches have made progress by incorporating …
CRA-Net: Composed relation attention network for visual question answering
The task of Visual Question Answering (VQA) is to answer a natural language question tied
to the content of a visual image. Most existing VQA models either apply attention mechanism …
Km4: Visual reasoning via knowledge embedding memory model with mutual modulation
Visual reasoning is a special kind of visual question answering, which is essentially multi-
step and compositional, and also requires intensive text-visual interaction. The most …