Comparing vision transformers and convolutional neural networks for image classification: A literature review

J Maurício, I Domingues, J Bernardino - Applied Sciences, 2023 - mdpi.com
Transformers are models that implement a mechanism of self-attention, individually
weighting the importance of each part of the input data. Their use in image classification …

Transformer for object detection: Review and benchmark

Y Li, N Miao, L Ma, F Shuang, X Huang - Engineering Applications of …, 2023 - Elsevier
Object detection is a crucial task in computer vision (CV). With the rapid advancement of
Transformer-based models in natural language processing (NLP) and various visual tasks …

Diffusion models for adversarial purification

W Nie, B Guo, Y Huang, C Xiao, A Vahdat… - arXiv preprint arXiv …, 2022 - arxiv.org
Adversarial purification refers to a class of defense methods that remove adversarial
perturbations using a generative model. These methods do not make assumptions on the …

Understanding the robustness in vision transformers

D Zhou, Z Yu, E Xie, C Xiao… - International …, 2022 - proceedings.mlr.press
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

Towards robust vision transformer

X Mao, G Qi, Y Chen, X Li, R Duan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent advances on Vision Transformer (ViT) and its improved variants have
shown that self-attention-based networks surpass traditional Convolutional Neural Networks …

Understanding the robustness of 3D object detection with bird's-eye-view representations in autonomous driving

Z Zhu, Y Zhang, H Chen, Y Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D object detection is an essential perception task in autonomous driving to
understand the environments. The Bird's-Eye-View (BEV) representations have significantly …

On the adversarial robustness of vision transformers

R Shao, Z Shi, J Yi, PY Chen, CJ Hsieh - arXiv preprint arXiv:2103.15670, 2021 - arxiv.org
Following the success in advancing natural language processing and understanding,
transformers are expected to bring revolutionary changes to computer vision. This work …

Revisiting adversarial training for ImageNet: Architectures, training and generalization across threat models

ND Singh, F Croce, M Hein - Advances in Neural …, 2023 - proceedings.neurips.cc
While adversarial training has been extensively studied for ResNet architectures and low
resolution datasets like CIFAR-10, much less is known for ImageNet. Given the recent …

A comprehensive study on robustness of image classification models: Benchmarking and rethinking

C Liu, Y Dong, W Xiang, X Yang, H Su, J Zhu… - International Journal of …, 2024 - Springer
The robustness of deep neural networks is frequently compromised when faced with
adversarial examples, common corruptions, and distribution shifts, posing a significant …

MulT: An end-to-end multitask learning transformer

D Bhattacharjee, T Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to
simultaneously learn multiple high-level vision tasks, including depth estimation, semantic …