Comparing vision transformers and convolutional neural networks for image classification: A literature review
Transformers are models that implement a mechanism of self-attention, individually
weighting the importance of each part of the input data. Their use in image classification …
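For context on the mechanism these reviews discuss, below is a minimal, illustrative sketch of single-head scaled dot-product self-attention over image patch tokens in PyTorch. The shapes and names are assumptions for illustration only, not code from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch embeddings.

    x:   (batch, num_patches, dim) patch embeddings
    w_*: (dim, dim) learned projection matrices (assumed, single head for clarity)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Each patch attends to every other patch; the softmax weights express
    # how much each part of the input matters for the query patch.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Example: 196 patch tokens (a 224x224 image cut into 16x16 patches), dim 64.
x = torch.randn(1, 196, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # (1, 196, 64)
```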
Transformer for object detection: Review and benchmark
Y Li, N Miao, L Ma, F Shuang, X Huang - Engineering Applications of …, 2023 - Elsevier
Object detection is a crucial task in computer vision (CV). With the rapid advancement of
Transformer-based models in natural language processing (NLP) and various visual tasks …
Diffusion models for adversarial purification
Adversarial purification refers to a class of defense methods that remove adversarial
perturbations using a generative model. These methods do not make assumptions on the …
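As a rough sketch of the purification idea in general (not the specific algorithm of the cited paper), the snippet below diffuses a possibly-perturbed input with Gaussian noise and then runs a pretrained diffusion model's reverse process before handing the result to a classifier. The `diffusion_model` object, its `alpha_bar` schedule, and its `denoise` method are hypothetical placeholders.

```python
import torch

def purify(x_adv, diffusion_model, t_star=100):
    """Diffusion-based purification sketch built on assumed interfaces.

    1) Diffuse: add noise up to timestep t_star, washing out the small
       adversarial perturbation along with some image detail.
    2) Denoise: run the reverse process back to t = 0 to recover a clean
       image that any downstream classifier can label.
    """
    alpha_bar = diffusion_model.alpha_bar[t_star]             # assumed noise schedule lookup
    noise = torch.randn_like(x_adv)
    x_t = alpha_bar.sqrt() * x_adv + (1 - alpha_bar).sqrt() * noise
    return diffusion_model.denoise(x_t, t_start=t_star)       # assumed reverse-process API
```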
Understanding the robustness in vision transformers
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …
Diffusion visual counterfactual explanations
Visual Counterfactual Explanations (VCEs) are an important tool to understand the
decisions of an image classifier. They are “small” but “realistic” semantic changes of the …
On the adversarial robustness of vision transformers
Following the success in advancing natural language processing and understanding,
transformers are expected to bring revolutionary changes to computer vision. This work …
Towards robust vision transformer
Recent advances in Vision Transformer (ViT) and its improved variants have
shown that self-attention-based networks surpass traditional Convolutional Neural Networks …
Understanding the Robustness of 3D Object Detection With Bird's-Eye-View Representations in Autonomous Driving
3D object detection is an essential perception task in autonomous driving to
understand the environments. The Bird's-Eye-View (BEV) representations have significantly …
A comprehensive study on robustness of image classification models: Benchmarking and rethinking
The robustness of deep neural networks is frequently compromised when faced with
adversarial examples, common corruptions, and distribution shifts, posing a significant …
MulT: An end-to-end multitask learning transformer
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to
simultaneously learn multiple high-level vision tasks, including depth estimation, semantic …
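The shared-encoder, task-specific-head pattern that such multitask transformers build on can be sketched as follows; the module sizes and head choices here are illustrative assumptions, not the MulT architecture itself.

```python
import torch
import torch.nn as nn

class MultitaskViT(nn.Module):
    """Shared transformer encoder with one lightweight head per vision task."""

    def __init__(self, dim=256, num_seg_classes=21):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        # Task-specific heads read the same shared token representations.
        self.depth_head = nn.Linear(dim, 1)               # per-patch depth estimate
        self.seg_head = nn.Linear(dim, num_seg_classes)   # per-patch segmentation logits

    def forward(self, images):
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, dim)
        feats = self.encoder(tokens)
        return {"depth": self.depth_head(feats), "segmentation": self.seg_head(feats)}

model = MultitaskViT()
outputs = model(torch.randn(2, 3, 224, 224))  # 196 patch tokens per image
```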