CLIP in medical imaging: A comprehensive survey

Z Zhao, Y Liu, H Wu, M Wang, Y Li, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …
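The contrastive pre-training paradigm this survey covers trains an image encoder and a text encoder jointly, so that matched image-report pairs score higher than all mismatched pairs in a batch, via a symmetric InfoNCE loss. A minimal NumPy sketch of that objective (function name and temperature value are illustrative, not taken from the survey):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalise so dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, sharpened by the temperature.
    logits = image_emb @ text_emb.T / temperature

    # Matched pairs lie on the diagonal.
    labels = np.arange(len(logits))

    def cross_entropy(lg, lb):
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

In the real models the embeddings come from large vision and text encoders and the temperature is a learned parameter; the loss itself is the part the snippet reproduces.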

Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias

Z Wan, C Liu, M Zhang, J Fu, B Wang… - Advances in …, 2024 - proceedings.neurips.cc
The scarcity of data presents a critical obstacle to the efficacy of medical vision-language pre-
training (VLP). A potential solution lies in the combination of datasets from various language …

A medical multimodal large language model for future pandemics

F Liu, T Zhu, X Wu, B Yang, C You, C Wang, L Lu… - NPJ Digital …, 2023 - nature.com
Deep neural networks have been integrated into the whole clinical decision procedure, which can improve the efficiency of diagnosis and alleviate the heavy workload of …

Visual–language foundation models in medicine

C Liu, Y **, Z Guan, T Li, Y Qin, B Qian, Z Jiang… - The Visual …, 2024 - Springer
By integrating visual and linguistic understanding, visual–language foundation models
(VLFMs) have the great potential to advance the interpretation of medical data, thereby …

CXR-CLIP: Toward large scale chest X-ray language-image pre-training

K You, J Gu, J Ham, B Park, J Kim, EK Hong… - … Conference on Medical …, 2023 - Springer
A large-scale image-text pair dataset has greatly contributed to the development of vision-
language pre-training (VLP) models, which enable zero-shot or few-shot classification …
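The zero-shot classification these VLP models enable works by embedding one text prompt per class and assigning each image to the class whose prompt embedding is closest in cosine similarity. A minimal sketch under that assumption (all names are illustrative; real systems would produce the embeddings with the pre-trained encoders):

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs):
    """Assign each image to the class whose prompt embedding is most similar.

    image_emb: (n_images, dim); class_text_embs: (n_classes, dim),
    e.g. embeddings of prompts like "a chest X-ray showing pneumonia".
    """
    # Normalise both sides so the matrix product gives cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    class_text_embs = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)

    sims = image_emb @ class_text_embs.T   # (n_images, n_classes)
    return sims.argmax(axis=1)             # predicted class index per image
```

No labeled training images are needed at classification time, which is what makes the approach attractive in label-scarce medical settings.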

IMITATE: Clinical prior guided hierarchical vision-language pre-training

C Liu, S Cheng, M Shi, A Shah, W Bai… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In medical Vision-Language Pre-training (VLP), significant work focuses on extracting text
and image features from clinical reports and medical images. Yet, existing methods may …

Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning

W Huang, C Li, HY Zhou, H Yang, J Liu, Y Liang… - Nature …, 2024 - nature.com
Recently, multi-modal vision-language foundation models have gained significant attention
in the medical field. While these models offer great opportunities, they still face crucial …

CARZero: Cross-attention alignment for radiology zero-shot classification

H Lai, Q Yao, Z Jiang, R Wang, Z He… - Proceedings of the …, 2024 - openaccess.thecvf.com
The advancement of Zero-Shot Learning in the medical domain has been driven
forward by using pre-trained models on large-scale image-text pairs focusing on image-text …

Exploring scalable medical image encoders beyond text supervision

F Pérez-García, H Sharma, S Bond-Taylor… - Nature Machine …, 2025 - nature.com
Language-supervised pretraining has proven to be a valuable method for extracting
semantically meaningful features from images, serving as a foundational element in …

Semi-supervised medical report generation via graph-guided hybrid feature consistency

K Zhang, H Jiang, J Zhang, Q Huang… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Medical report generation produces a report corresponding to a given radiology
image, a task that has been attracting increasing research interest. However, existing …