A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier
Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing or computer vision. Since medical imaging bear …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Tip-adapter: Training-free adaption of clip for few-shot classification

R Zhang, W Zhang, R Fang, P Gao, K Li, J Dai… - European conference on …, 2022 - Springer
Abstract Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …

Point-m2ae: multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potentials in self-supervised pre-training for
language and 2D image transformers. However, it still remains an open question on how to …

Pointclip: Point cloud understanding by clip

R Zhang, Z Guo, W Zhang, K Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training
(CLIP) have shown inspirational performance on 2D visual recognition, which learns to …

Conditional detr for fast training convergence

D Meng, X Chen, Z Fan, G Zeng, H Li… - Proceedings of the …, 2021 - openaccess.thecvf.com
The recently-developed DETR approach applies the transformer encoder and decoder
architecture to object detection and achieves promising performance. In this paper, we …

Understanding the robustness in vision transformers

D Zhou, Z Yu, E **e, C **ao… - International …, 2022 - proceedings.mlr.press
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Cvt: Introducing convolutions to vision transformers

H Wu, B Xiao, N Codella, M Liu, X Dai… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present in this paper a new architecture, named Convolutional vision Transformer (CvT),
that improves Vision Transformer (ViT) in performance and efficiency by introducing …