The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Abstract Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Natural language processing for smart healthcare

B Zhou, G Yang, Z Shi, S Ma - IEEE Reviews in Biomedical …, 2022 - ieeexplore.ieee.org
Smart healthcare has achieved significant progress in recent years. Emerging artificial
intelligence (AI) technologies enable various smart applications across various healthcare …

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

S Zhang, Y Xu, N Usuyama, H Xu, J Bagga… - arXiv preprint arXiv …, 2023 - arxiv.org
Biomedical data is inherently multimodal, comprising physical measurements and natural
language narratives. A generalist biomedical AI model needs to simultaneously process …

Large-scale domain-specific pretraining for biomedical vision-language processing

S Zhang, Y Xu, N Usuyama, J Bagga… - arXiv preprint arXiv …, 2023 - researchgate.net
Contrastive pretraining on parallel image-text data has attained great success in vision-
language processing (VLP), as exemplified by CLIP and related methods. However, prior …

PubMedCLIP: How much does CLIP benefit visual question answering in the medical domain?

S Eslami, C Meinel, G De Melo - Findings of the Association for …, 2023 - aclanthology.org
Abstract Contrastive Language–Image Pre-training (CLIP) has shown remarkable success
in learning with cross-modal supervision from extensive amounts of image–text pairs …

SLAKE: A semantically-labeled knowledge-enhanced dataset for medical visual question answering

B Liu, LM Zhan, L Xu, L Ma, Y Yang… - 2021 IEEE 18th …, 2021 - ieeexplore.ieee.org
Medical visual question answering (Med-VQA) has tremendous potential in healthcare.
However, the development of this technology is hindered by the lacking of publicly-available …

Vision-language models for medical report generation and visual question answering: A review

I Hartsock, G Rasool - Frontiers in Artificial Intelligence, 2024 - frontiersin.org
Medical vision-language models (VLMs) combine computer vision (CV) and natural
language processing (NLP) to analyze visual and textual medical data. Our paper reviews …

Foundation model for advancing healthcare: challenges, opportunities and future directions

Y He, F Huang, X Jiang, Y Nie, M Wang… - IEEE Reviews in …, 2024 - ieeexplore.ieee.org
Foundation model, trained on a diverse range of data and adaptable to a myriad of tasks, is
advancing healthcare. It fosters the development of healthcare artificial intelligence (AI) …

Medical visual question answering: A survey

Z Lin, D Zhang, Q Tao, D Shi, G Haffari, Q Wu… - Artificial Intelligence in …, 2023 - Elsevier
Abstract Medical Visual Question Answering (VQA) is a combination of medical artificial
intelligence and popular VQA challenges. Given a medical image and a clinically relevant …

Endora: Video Generation Models as Endoscopy Simulators

C Li, H Liu, Y Liu, BY Feng, W Li, X Liu, Z Chen… - … Conference on Medical …, 2024 - Springer
Generative models hold promise for revolutionizing medical education, robot-assisted
surgery, and data augmentation for machine learning. Despite progress in generating 2D …