A comprehensive survey of transformers for computer vision

S Jamil, M Jalil Piran, OJ Kwon - Drones, 2023 - mdpi.com
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …

Fine-grained visual classification via internal ensemble learning transformer

Q Xu, J Wang, B Jiang, B Luo - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Recently, vision transformers (ViTs) have been investigated in fine-grained visual
recognition (FGVC) and are now considered state of the art. However, most ViT-based works …

Underwater target detection based on improved YOLOv7

K Liu, Q Sun, D Sun, L Peng, M Yang… - Journal of Marine Science …, 2023 - mdpi.com
Underwater target detection is a crucial aspect of ocean exploration. However, conventional
underwater target detection methods face several challenges such as inaccurate feature …

ViTCoD: Vision transformer acceleration via dedicated algorithm and accelerator co-design

H You, Z Sun, H Shi, Z Yu, Y Zhao… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision
tasks. However, ViTs' self-attention module is still arguably a major bottleneck, limiting their …

TransFace: Calibrating transformer training for face recognition from a data-centric perspective

J Dan, Y Liu, H Xie, J Deng, H Xie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have demonstrated powerful representation ability in
various visual tasks thanks to their intrinsic data-hungry nature. However, we unexpectedly …

Castling-ViT: Compressing self-attention via switching towards linear-angular attention at vision transformer inference

H You, Y Xiong, X Dai, B Wu, P Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have shown impressive performance but still require a
high computation cost as compared to convolutional neural networks (CNNs), one reason is …

[BOOK] Computational methods for deep learning

WQ Yan - 2021 - Springer
This book has been drafted based on my lectures and seminars from recent years for
postgraduate students at Auckland University of Technology (AUT), New Zealand. We have …

[HTML] Transformer-based decoder designs for semantic segmentation on remotely sensed images

T Panboonyuen, K Jitkajornwanich, S Lawawirojwong… - Remote Sensing, 2021 - mdpi.com
Transformers have demonstrated remarkable accomplishments in several natural language
processing (NLP) tasks as well as image processing tasks. Herein, we present a deep …

Solving masked jigsaw puzzles with diffusion vision transformers

J Liu, W Teshome, S Ghimire… - Proceedings of the …, 2024 - openaccess.thecvf.com
Solving image and video jigsaw puzzles poses the challenging task of rearranging image
fragments or video frames from unordered sequences to restore meaningful images and …

A learnable discrete-prior fusion autoencoder with contrastive learning for tabular data synthesis

R Zhang, Y Lou, D Xu, Y Cao, H Wang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The actual collection of tabular data for sharing involves confidentiality and privacy
constraints, leaving the potential risks of machine learning for interventional data analysis …