The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
Vision-language models for medical report generation and visual question answering: A review
Medical vision-language models (VLMs) combine computer vision (CV) and natural
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …
Large-scale domain-specific pretraining for biomedical vision-language processing
Contrastive pretraining on parallel image-text data has attained great success in vision-
language processing (VLP), as exemplified by CLIP and related methods. However, prior …
SLAKE: A semantically-labeled knowledge-enhanced dataset for medical visual question answering
B Liu, LM Zhan, L Xu, L Ma, Y Yang… - 2021 IEEE 18th …, 2021 - ieeexplore.ieee.org
Medical visual question answering (Med-VQA) has tremendous potential in healthcare.
However, the development of this technology is hindered by the lack of publicly available …
Endora: Video Generation Models as Endoscopy Simulators
Generative models hold promise for revolutionizing medical education, robot-assisted
surgery, and data augmentation for machine learning. Despite progress in generating 2D …
Natural language processing for smart healthcare
Smart healthcare has achieved significant progress in recent years. Emerging artificial
intelligence (AI) technologies enable various smart applications across various healthcare …
Biomedical question answering: a survey of approaches and challenges
Automatic Question Answering (QA) has been successfully applied in various domains such
as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables …
MMBERT: Multimodal BERT pretraining for improved medical VQA
Images in the medical domain are fundamentally different from the general domain images.
Consequently, it is infeasible to directly employ general domain Visual Question Answering …
Multiple meta-model quantifying for medical visual question answering
Transfer learning is an important step to extract meaningful features and overcome the data
limitation in the medical Visual Question Answering (VQA) task. However, most of the …
Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge
Medical vision-and-language pre-training (Med-VLP) has received considerable attention
owing to its applicability to extracting generic vision-and-language representations from …