Comparing vision transformers and convolutional neural networks for image classification: A literature review

J Maurício, I Domingues, J Bernardino - Applied Sciences, 2023 - mdpi.com
Transformers are models that implement a mechanism of self-attention, individually
weighting the importance of each part of the input data. Their use in image classification …

Transformer for object detection: Review and benchmark

Y Li, N Miao, L Ma, F Shuang, X Huang - Engineering Applications of …, 2023 - Elsevier
Object detection is a crucial task in computer vision (CV). With the rapid advancement of
Transformer-based models in natural language processing (NLP) and various visual tasks …

Diffusion models for adversarial purification

W Nie, B Guo, Y Huang, C Xiao, A Vahdat… - arXiv preprint arXiv …, 2022 - arxiv.org
Adversarial purification refers to a class of defense methods that remove adversarial
perturbations using a generative model. These methods do not make assumptions on the …

Understanding the robustness in vision transformers

D Zhou, Z Yu, E Xie, C Xiao… - International …, 2022 - proceedings.mlr.press
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

Diffusion visual counterfactual explanations

M Augustin, V Boreiko, F Croce… - Advances in Neural …, 2022 - proceedings.neurips.cc
Visual Counterfactual Explanations (VCEs) are an important tool to understand the
decisions of an image classifier. They are “small” but “realistic” semantic changes of the …

On the adversarial robustness of vision transformers

R Shao, Z Shi, J Yi, PY Chen, CJ Hsieh - arXiv preprint arXiv:2103.15670, 2021 - arxiv.org
Following the success in advancing natural language processing and understanding,
transformers are expected to bring revolutionary changes to computer vision. This work …

Towards robust vision transformer

X Mao, G Qi, Y Chen, X Li, R Duan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent advances in Vision Transformer (ViT) and its improved variants have
shown that self-attention-based networks surpass traditional Convolutional Neural Networks …

Understanding the Robustness of 3D Object Detection With Bird's-Eye-View Representations in Autonomous Driving

Z Zhu, Y Zhang, H Chen, Y Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D object detection is an essential perception task in autonomous driving to
understand the environments. The Bird's-Eye-View (BEV) representations have significantly …

A comprehensive study on robustness of image classification models: Benchmarking and rethinking

C Liu, Y Dong, W Xiang, X Yang, H Su, J Zhu… - International Journal of …, 2024 - Springer
The robustness of deep neural networks is frequently compromised when faced with
adversarial examples, common corruptions, and distribution shifts, posing a significant …

MulT: An end-to-end multitask learning transformer

D Bhattacharjee, T Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to
simultaneously learn multiple high-level vision tasks, including depth estimation, semantic …