Multi-modal masked autoencoders for medical vision-and-language pre-training

Z Chen, Y Du, J Hu, Y Liu, G Li, X Wan… - … Conference on Medical …, 2022 - Springer
Medical vision-and-language pre-training provides a feasible solution to extract effective
vision-and-language representations from medical images and texts. However, few studies …

Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge

Z Chen, G Li, X Wan - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Medical vision-and-language pre-training (Med-VLP) has received considerable attention
owing to its applicability to extracting generic vision-and-language representations from …

DUE: End-to-end document understanding benchmark

Ł Borchmann, M Pietruszka, T Stanislawek… - Thirty-fifth Conference …, 2021 - openreview.net
Understanding documents with rich layouts plays a vital role in digitization and
hyper-automation but remains a challenging topic in the NLP research community. Additionally, the …

Automatic related work generation: A meta study

X Li, J Ouyang - arXiv preprint arXiv …, 2022 - arxiv.org

Mapping medical image-text to a joint space via masked modeling

Z Chen, Y Du, J Hu, Y Liu, G Li, X Wan… - Medical Image Analysis, 2024 - Elsevier
Recently, masked autoencoders have demonstrated their feasibility in extracting effective
image and text features (eg, BERT for natural language processing (NLP) and MAE in …

Medical vision language pretraining: A survey

P Shrestha, S Amgain, B Khanal, CA Linte… - arXiv preprint arXiv …, 2023 - arxiv.org
Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to
the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and …

UniDCP: Unifying multiple medical vision-language tasks via dynamic cross-modal learnable prompts

C Zhan, Y Zhang, Y Lin, G Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Medical vision-language pre-training (Med-VLP) models have recently accelerated the
fast-growing medical diagnostics application. However, most Med-VLP models learn task …

Multimodality for NLP-centered applications: Resources, advances and frontiers

M Garg, S Wazarkar, M Singh… - Proceedings of the …, 2022 - aclanthology.org
With the development of multimodal systems and natural language generation techniques,
the resurgence of multimodal datasets has attracted significant research interest, which …

OVQA: A clinically generated visual question answering dataset

Y Huang, X Wang, F Liu, G Huang - … of the 45th International ACM SIGIR …, 2022 - dl.acm.org
Medical visual question answering (Med-VQA) is a challenging problem that aims to take a
medical image and a clinical question about the image as input and output a correct answer …

Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering

Y Liu, B Chen, S Wang, G Lu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task
that has garnered considerable attention in the medical domain. Several existing methods …
that garnered considerable attention in the medical domain. Several existing methods …