Multi-modal masked autoencoders for medical vision-and-language pre-training
Medical vision-and-language pre-training provides a feasible solution to extract effective
vision-and-language representations from medical images and texts. However, few studies …
Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge
Medical vision-and-language pre-training (Med-VLP) has received considerable attention
owing to its applicability to extracting generic vision-and-language representations from …
DUE: End-to-end document understanding benchmark
Understanding documents with rich layouts plays a vital role in digitization and hyper-
automation but remains a challenging topic in the NLP research community. Additionally, the …
Automatic related work generation: A meta study
Medical vision language pretraining: A survey
Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to
the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and …
UniDCP: Unifying multiple medical vision-language tasks via dynamic cross-modal learnable prompts
Medical vision-language pre-training (Med-VLP) models have recently accelerated the fast-
growing medical diagnostics application. However, most Med-VLP models learn task …
Multimodality for NLP-centered applications: Resources, advances and frontiers
With the development of multimodal systems and natural language generation techniques,
the resurgence of multimodal datasets has attracted significant research interests, which …
OVQA: A clinically generated visual question answering dataset
Medical visual question answering (Med-VQA) is a challenging problem that aims to take a
medical image and a clinical question about the image as input and output a correct answer …
Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task
that garnered considerable attention in the medical domain. Several existing methods …