Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …
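
For reference, the mechanism this survey traces is, in its now-standard Transformer form, scaled dot-product attention; a minimal sketch (toy shapes, PyTorch assumed, not code from the survey) is:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Minimal scaled dot-product attention.

    q, k, v: (batch, num_tokens, dim). Returns a weighted sum of v,
    weighted by the softmax of the query-key similarities.
    """
    dim = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(dim)  # (batch, tokens, tokens)
    weights = scores.softmax(dim=-1)                   # rows sum to 1
    return weights @ v                                 # (batch, tokens, dim)

# Toy usage: 2 images, 16 tokens each, 64-dim embeddings; self-attention uses q = k = v.
x = torch.randn(2, 16, 64)
print(scaled_dot_product_attention(x, x, x).shape)     # torch.Size([2, 16, 64])
```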

A survey on label-efficient deep image segmentation: Bridging the gap between weak supervision and dense prediction

W Shen, Z Peng, X Wang, H Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The rapid development of deep learning has made great progress in image segmentation,
one of the fundamental tasks of computer vision. However, the current segmentation …

Vision transformers for single image dehazing

Y Song, Z He, H Qian, X Du - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Image dehazing is a representative low-level vision task that estimates latent haze-free
images from hazy images. In recent years, convolutional neural network-based methods …

DaViT: Dual attention vision transformers

M Ding, B Xiao, N Codella, P Luo, J Wang… - European conference on …, 2022 - Springer
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …
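
DaViT pairs spatial window attention with attention computed across feature channels; as a rough illustration of the channel-wise half of that idea (not the authors' implementation, shapes and scaling are assumptions):

```python
import torch

def channel_self_attention(x):
    """Toy channel-wise self-attention: treat each channel as a token and
    attend across channels instead of spatial positions (illustrative only).

    x: (batch, num_tokens, channels)
    """
    xt = x.transpose(1, 2)                      # (batch, channels, num_tokens)
    scale = xt.size(-1) ** -0.5
    scores = xt @ xt.transpose(-2, -1) * scale  # channel-to-channel similarity
    weights = scores.softmax(dim=-1)
    out = weights @ xt                          # mix information across channels
    return out.transpose(1, 2)                  # back to (batch, tokens, channels)

x = torch.randn(2, 196, 96)                     # e.g. 14x14 tokens, 96 channels
print(channel_self_attention(x).shape)          # torch.Size([2, 196, 96])
```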

DilateFormer: Multi-scale dilated transformer for visual recognition

J Jiao, YM Tang, KY Lin, Y Gao, AJ Ma… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
As a de facto solution, the vanilla Vision Transformers (ViTs) are encouraged to model long-
range dependencies between arbitrary image patches while the global attended receptive …
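
The paper's sliding-window dilated attention restricts each query to a sparsely sampled neighborhood; a simplified PyTorch sketch of that restriction (single head, no projections, not the DilateFormer code) might be:

```python
import torch
import torch.nn.functional as F

def sliding_window_dilated_attention(q, k, v, kernel=3, dilation=2):
    """Toy sliding-window dilated attention: each query position attends only
    to a kernel x kernel neighborhood sampled with the given dilation.

    q, k, v: (batch, channels, height, width); single head, no projections.
    """
    b, c, h, w = q.shape
    pad = dilation * (kernel - 1) // 2
    # Gather the dilated neighborhood of keys/values around every position.
    k_n = F.unfold(k, kernel, dilation=dilation, padding=pad).view(b, c, kernel * kernel, h * w)
    v_n = F.unfold(v, kernel, dilation=dilation, padding=pad).view(b, c, kernel * kernel, h * w)
    q_flat = q.view(b, c, 1, h * w)
    scores = (q_flat * k_n).sum(dim=1, keepdim=True) * c ** -0.5   # (b, 1, k*k, h*w)
    weights = scores.softmax(dim=2)                                # over the neighborhood
    out = (weights * v_n).sum(dim=2)                               # (b, c, h*w)
    return out.view(b, c, h, w)

x = torch.randn(1, 32, 16, 16)
print(sliding_window_dilated_attention(x, x, x).shape)             # torch.Size([1, 32, 16, 16])
```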

MPViT: Multi-path vision transformer for dense prediction

Y Lee, J Kim, J Willette… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Dense computer vision tasks such as object detection and segmentation require effective
multi-scale feature representation for detecting or classifying objects or regions with varying …

N-gram in Swin transformers for efficient lightweight image super-resolution

H Choi, J Lee, J Yang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
While some studies have proven that Swin Transformer (Swin) with window self-attention
(WSA) is suitable for single image super-resolution (SR), the plain WSA ignores the broad …
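
Window self-attention (WSA) first partitions the feature map into non-overlapping windows and attends within each one; a minimal partition helper in the Swin style (simplified, divisible sizes assumed, not the paper's code) is sketched below:

```python
import torch

def window_partition(x, window_size):
    """Split a feature map into non-overlapping windows; window self-attention
    then runs standard self-attention inside each window independently.

    x: (batch, height, width, channels); height and width are assumed to be
    divisible by window_size in this simplified sketch.
    """
    b, h, w, c = x.shape
    x = x.view(b, h // window_size, window_size, w // window_size, window_size, c)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, c)
    return windows                          # (batch * num_windows, tokens_per_window, channels)

x = torch.randn(1, 8, 8, 32)
print(window_partition(x, 4).shape)         # torch.Size([4, 16, 32])
```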

Multi-scale high-resolution vision transformer for semantic segmentation

J Gu, H Kwon, D Wang, W Ye, M Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision Transformers (ViTs) have emerged with superior performance on computer
vision tasks compared to convolutional neural network (CNN)-based models. However, ViTs …

SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …
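
Token pruning reduces cost by dropping or merging uninformative tokens between blocks; the sketch below is a simplified score-based variant that keeps the top tokens and folds the rest into one aggregated token (an illustration of the general idea, not SPViT's latency-aware soft pruning):

```python
import torch

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Illustrative token pruning: keep the highest-scoring tokens and fold the
    rest into one aggregated token so their information is not discarded outright.

    tokens: (batch, num_tokens, dim); scores: (batch, num_tokens), higher = keep.
    """
    num_keep = max(1, int(tokens.size(1) * keep_ratio))
    idx = scores.topk(num_keep, dim=1).indices                           # (batch, num_keep)
    keep = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

    # Weight the remaining (pruned) tokens by their scores and merge them.
    keep_mask = torch.zeros_like(scores, dtype=torch.bool)
    keep_mask.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))
    w = scores.masked_fill(keep_mask, float("-inf")).softmax(dim=1)      # zero weight on kept tokens
    packed = (w.unsqueeze(-1) * tokens).sum(dim=1, keepdim=True)         # one "package" token
    return torch.cat([keep, packed], dim=1)                              # (batch, num_keep + 1, dim)

x = torch.randn(2, 16, 64)
s = torch.rand(2, 16)
print(prune_tokens(x, s).shape)                                          # torch.Size([2, 9, 64])
```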

Accurate image restoration with attention retractable transformer

J Zhang, Y Zhang, J Gu, Y Zhang, L Kong… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, Transformer-based image restoration networks have achieved promising
improvements over convolutional neural networks due to parameter-independent global …