A comprehensive survey of transformers for computer vision
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …
Survey: Image mixing and deleting for data augmentation
Neural networks are prone to overfitting and memorizing data patterns. To avoid over-fitting
and enhance their generalization and performance, various methods have been suggested …
End-to-end temporal action detection with transformer
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …
C-mixup: Improving generalization in regression
Improving the generalization of deep networks is an important open challenge, particularly
in domains without plentiful data. The mixup algorithm improves generalization by linearly …
Remax: Relaxing for better training on efficient panoptic segmentation
This paper presents a new mechanism to facilitate the training of mask transformers for
efficient panoptic segmentation, democratizing its deployment. We observe that due to the …
Tokenmix: Rethinking image mixing for data augmentation in vision transformers
CutMix is a popular augmentation technique commonly used for training modern
convolutional and transformer vision networks. It was originally designed to encourage …
Patch-mix transformer for unsupervised domain adaptation: A game perspective
Endeavors have been recently made to leverage the vision transformer (ViT) for the
challenging unsupervised domain adaptation (UDA) task. They typically adopt the cross …
Improving vision transformers by revisiting high-frequency components
The transformer models have shown promising effectiveness in dealing with various vision
tasks. However, compared with training Convolutional Neural Network (CNN) models …
Transface: Calibrating transformer training for face recognition from a data-centric perspective
Vision Transformers (ViTs) have demonstrated powerful representation ability in
various visual tasks thanks to their intrinsic data-hungry nature. However, we unexpectedly …
A multistage information complementary fusion network based on flexible-mixup for HSI-X image classification
Mixup-based data augmentation has been proven to be beneficial to the regularization of
models during training, especially in the remote-sensing field where the training data is …