A comprehensive survey of transformers for computer vision

S Jamil, M Jalil Piran, OJ Kwon - Drones, 2023 - mdpi.com
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …

Survey: Image mixing and deleting for data augmentation

H Naveed, S Anwar, M Hayat, K Javed… - Engineering Applications of …, 2024 - Elsevier
Neural networks are prone to overfitting and memorizing data patterns. To avoid over-fitting
and enhance their generalization and performance, various methods have been suggested …

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

C-mixup: Improving generalization in regression

H Yao, Y Wang, L Zhang, JY Zou… - Advances in neural …, 2022 - proceedings.neurips.cc
Improving the generalization of deep networks is an important open challenge, particularly
in domains without plentiful data. The mixup algorithm improves generalization by linearly …

Remax: Relaxing for better training on efficient panoptic segmentation

S Sun, W Wang, A Howard, Q Yu… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a new mechanism to facilitate the training of mask transformers for
efficient panoptic segmentation, democratizing its deployment. We observe that due to the …

Tokenmix: Rethinking image mixing for data augmentation in vision transformers

J Liu, B Liu, H Zhou, H Li, Y Liu - European Conference on Computer …, 2022 - Springer
CutMix is a popular augmentation technique commonly used for training modern
convolutional and transformer vision networks. It was originally designed to encourage …

Patch-mix transformer for unsupervised domain adaptation: A game perspective

J Zhu, H Bai, L Wang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Endeavors have been recently made to leverage the vision transformer (ViT) for the
challenging unsupervised domain adaptation (UDA) task. They typically adopt the cross …

Improving vision transformers by revisiting high-frequency components

J Bai, L Yuan, ST Xia, S Yan, Z Li, W Liu - European Conference on …, 2022 - Springer
The transformer models have shown promising effectiveness in dealing with various vision
tasks. However, compared with training Convolutional Neural Network (CNN) models …

Transface: Calibrating transformer training for face recognition from a data-centric perspective

J Dan, Y Liu, H Xie, J Deng, H Xie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have demonstrated powerful representation ability in
various visual tasks thanks to their intrinsic data-hungry nature. However, we unexpectedly …

A multistage information complementary fusion network based on flexible-mixup for HSI-X image classification

J Wang, M Zhang, W Li, R Tao - IEEE Transactions on Neural …, 2023 - ieeexplore.ieee.org
Mixup-based data augmentation has been proven to be beneficial to the regularization of
models during training, especially in the remote-sensing field where the training data is …