Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey
The rapid development of artificial intelligence has constantly reshaped the field of
intelligent healthcare and medicine. As a vital technology, multimodal learning has …
From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities
The multimodal task of Visual Question Answering (VQA) encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …
Robust Visual Question Answering utilizing Bias Instances and Label Imbalance
L Zhao, K Li, J Qi, Y Sun, Z Zhu - Knowledge-Based Systems, 2024 - Elsevier
Visual Question Answering (VQA) models often suffer from bias issues which cause
their predictions to rely on superficial correlations in datasets rather than the intrinsic …
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
Visual Question Answering (VQA) is a challenge task that combines natural language
processing and computer vision techniques and gradually becomes a benchmark test task …
Show me what and where has changed? question answering and grounding for remote sensing change detection
Remote sensing change detection aims to perceive changes occurring on the Earth's
surface from remote sensing data in different periods, and feed these changes back to …
Bias-guided margin loss for robust Visual Question Answering
Y Sun, J Qi, Z Zhu, K Li, L Zhao, L Lv - Information Processing & …, 2025 - Elsevier
Visual Question Answering (VQA) suffers from language prior issue, where models
tend to rely on dataset biases to answer the questions while ignoring the image information …
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like
face forgery detection, where viewers often struggle to distinguish between real and …
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks
Vision-Language Models (VLMs) have great potential in medical tasks, like Visual Question
Answering (VQA), where they could act as interactive assistants for both patients and …
Task Progressive Curriculum Learning for Robust Visual Question Answering
Visual Question Answering (VQA) systems are known for their poor performance in out-of-
distribution datasets. An issue that was addressed in previous works through ensemble …
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
The increasing availability of traffic videos functioning on a 24/7/365 time scale has the great
potential of increasing the spatio-temporal coverage of traffic accidents, which will help …