Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

POSTER++: A simpler and stronger facial expression recognition network

J Mao, R Xu, X Yin, Y Chang, B Nie, A Huang… - Pattern Recognition, 2024 - Elsevier
The POSTER has achieved SOTA performance in facial expression recognition (FER) by
effectively combining facial landmarks and image features through its two-stream pyramid …

Rethinking vision transformers for MobileNet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

EfficientFormer: Vision transformers at MobileNet speed

Y Li, G Yuan, Y Wen, J Hu… - Advances in …, 2022 - proceedings.neurips.cc
Vision Transformers (ViT) have shown rapid progress in computer vision tasks,
achieving promising results on various benchmarks. However, due to the massive number of …

Spikformer: When spiking neural network meets transformer

Z Zhou, Y Zhu, C He, Y Wang, S Yan, Y Tian… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the
self-attention mechanism. The former offers an energy-efficient and event-driven paradigm …

Neighborhood attention transformer

A Hassani, S Walton, J Li, S Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We present Neighborhood Attention (NA), the first efficient and scalable sliding
window attention mechanism for vision. NA is a pixel-wise operation, localizing self attention …

R2Former: Unified retrieval and reranking transformer for place recognition

S Zhu, L Yang, C Chen, M Shah… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual Place Recognition (VPR) estimates the location of query images by matching
them with images in a reference database. Conventional methods generally adopt …

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer
Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

Rethinking visual geo-localization for large-scale applications

G Berton, C Masone, B Caputo - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Visual Geo-localization (VG) is the task of estimating the position where a given photo was
taken by comparing it with a large database of images of known locations. To investigate …

Vision transformer and explainable transfer learning models for auto detection of kidney cyst, stone and tumor from CT-radiography

MN Islam, M Hasan, MK Hossain, MGR Alam… - Scientific Reports, 2022 - nature.com
Renal failure, a public health concern, and the scarcity of nephrologists around the globe
have necessitated the development of an AI-based system to auto-diagnose kidney …