Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Transformers in medical imaging: A survey

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier
Following unprecedented success on natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

A foundation model for generalizable disease detection from retinal images

Y Zhou, MA Chia, SK Wagner, MS Ayhan… - Nature, 2023 - nature.com
Medical artificial intelligence (AI) offers great potential for recognizing signs of health
conditions in retinal images and expediting the diagnosis of eye diseases and systemic …

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

K He, R Mao, Q Lin, Y Ruan, X Lan, M Feng… - Information …, 2025 - Elsevier
The utilization of large language models (LLMs) for healthcare has generated both
excitement and concern due to their ability to effectively respond to free-text queries with …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Spatext: Spatio-textual representation for controllable image generation

O Avrahami, T Hayes, O Gafni… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-image diffusion models are able to generate convincing results of
unprecedented quality. However, it is nearly impossible to control the shapes of different …

Dynamicvit: Efficient vision transformers with dynamic token sparsification

Y Rao, W Zhao, B Liu, J Lu, J Zhou… - Advances in neural …, 2021 - proceedings.neurips.cc
Attention is sparse in vision transformers. We observe that the final prediction in vision
transformers is based on only a subset of the most informative tokens, which is sufficient for …
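The token-sparsification idea in this entry can be illustrated with a generic top-k pruning step: score each token, keep only the highest-scoring fraction, and drop the rest. This is a minimal NumPy sketch of the general idea, not the paper's learned prediction module or its end-to-end training scheme; the function name and scoring inputs are hypothetical.

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep only the highest-scoring tokens (generic top-k sketch).

    tokens: (n_tokens, dim) token embeddings
    scores: (n_tokens,) informativeness score per token
    keep_ratio: fraction of tokens to retain
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    keep_idx = np.argsort(scores)[-k:]   # indices of the k most informative tokens
    return tokens[np.sort(keep_idx)]     # preserve the original token order

# Toy example: 8 tokens with 4 features each, keep half of them.
rng = np.random.default_rng(0)
tokens = rng.random((8, 4))
scores = rng.random(8)
pruned = prune_tokens(tokens, scores, 0.5)
print(pruned.shape)  # (4, 4)
```

In DynamicViT itself the scores come from a small learned module applied between transformer blocks, so pruning is input-dependent; the fixed random scores here stand in only to show the selection step.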

Segclip: Patch aggregation with learnable centers for open-vocabulary semantic segmentation

H Luo, J Bao, Y Wu, X He, T Li - International Conference on …, 2023 - proceedings.mlr.press
Recently, contrastive language-image pre-training, e.g., CLIP, has demonstrated
promising results on various downstream tasks. The pre-trained model can capture enriched …

Break-a-scene: Extracting multiple concepts from a single image

O Avrahami, K Aberman, O Fried, D Cohen-Or… - SIGGRAPH Asia 2023 …, 2023 - dl.acm.org
Text-to-image model personalization aims to introduce a user-provided concept to the
model, allowing its synthesis in diverse contexts. However, current methods primarily focus …

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier
Transformer, one of the latest technological advances in deep learning, has gained
prevalence in natural language processing and computer vision. Since medical imaging bears …