CLIP in medical imaging: A comprehensive survey
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …
Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias
The scarcity of data presents a critical obstacle to the efficacy of medical vision-language pre-
training (VLP). A potential solution lies in the combination of datasets from various language …
A medical multimodal large language model for future pandemics
Deep neural networks have been integrated into the whole clinical decision procedure,
which can improve the efficiency of diagnosis and alleviate the heavy workload of …
Visual–language foundation models in medicine
By integrating visual and linguistic understanding, visual–language foundation models
(VLFMs) have great potential to advance the interpretation of medical data, thereby …
CXR-CLIP: Toward large-scale chest X-ray language-image pre-training
A large-scale image-text pair dataset has greatly contributed to the development of vision-
language pre-training (VLP) models, which enable zero-shot or few-shot classification …
IMITATE: Clinical prior guided hierarchical vision-language pre-training
In medical Vision-Language Pre-training (VLP), significant work focuses on extracting text
and image features from clinical reports and medical images. Yet, existing methods may …
Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning
Recently, multi-modal vision-language foundation models have gained significant attention
in the medical field. While these models offer great opportunities, they still face crucial …
CarZero: Cross-attention alignment for radiology zero-shot classification
The advancement of Zero-Shot Learning in the medical domain has been driven
forward by using pre-trained models on large-scale image-text pairs focusing on image-text …
Exploring scalable medical image encoders beyond text supervision
Language-supervised pretraining has proven to be a valuable method for extracting
semantically meaningful features from images, serving as a foundational element in …
Semi-supervised medical report generation via graph-guided hybrid feature consistency
Medical report generation produces a report corresponding to a given
radiology image and has been attracting increasing research interest. However, existing …