- Academic Search

Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey

Q Lin, Y Zhu, X Mei, L Huang, J Ma, K He, Z Peng… - Information …, 2024 - Elsevier

The rapid development of artificial intelligence has constantly reshaped the field of
intelligent healthcare and medicine. As a vital technology, multimodal learning has …

Cited by 9

From image to language: A critical analysis of visual question answering (vqa) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA) encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by 23

Robust Visual Question Answering utilizing Bias Instances and Label Imbalance

L Zhao, K Li, J Qi, Y Sun, Z Zhu - Knowledge-Based Systems, 2024 - Elsevier

Abstract Visual Question Answering (VQA) models often suffer from bias issues which cause
their predictions to rely on superficial correlations in datasets rather than the intrinsic …

Cited by 1

Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey

J Kuang, Y Shen, J Xie, H Luo, Z Xu, R Li, Y Li… - ACM Computing …, 2024 - dl.acm.org

Visual Question Answering (VQA) is a challenge task that combines natural language
processing and computer vision techniques and gradually becomes a benchmark test task …

Show me what and where has changed? question answering and grounding for remote sensing change detection

K Li, F Dong, D Wang, S Li, Q Wang, X Gao… - arxiv preprint arxiv …, 2024 - arxiv.org

Remote sensing change detection aims to perceive changes occurring on the Earth's
surface from remote sensing data in different periods, and feed these changes back to …

Cited by 1

Bias-guided margin loss for robust Visual Question Answering

Y Sun, J Qi, Z Zhu, K Li, L Zhao, L Lv - Information Processing & …, 2025 - Elsevier

Abstract Visual Question Answering (VQA) suffers from language prior issue, where models
tend to rely on dataset biases to answer the questions while ignoring the image information …

A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

NM Foteinopoulou, E Ghorbel, D Aouada - arxiv preprint arxiv …, 2024 - arxiv.org

Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like
face forgery detection, where viewers often struggle to distinguish between real and …

Cited by 1

SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks

KC Kahl, S Erkan, J Traub, CT Lüth… - arxiv preprint arxiv …, 2024 - arxiv.org

Vision-Language Models (VLMs) have great potential in medical tasks, like Visual Question
Answering (VQA), where they could act as interactive assistants for both patients and …

Task Progressive Curriculum Learning for Robust Visual Question Answering

A Akl, A Khamis, Z Wang, A Cheraghian… - arxiv preprint arxiv …, 2024 - arxiv.org

Visual Question Answering (VQA) systems are known for their poor performance in out-of-
distribution datasets. An issue that was addressed in previous works through ensemble …

When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis

R Zhang, B Wang, J Zhang, Z Bian, C Feng… - arxiv preprint arxiv …, 2025 - arxiv.org

The increasing availability of traffic videos functioning on a 24/7/365 time scale has the great
potential of increasing the spatio-temporal coverage of traffic accidents, which will help …

Robust visual question answering: Datasets, methods, and future challenges

Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey
