Swin transformer: Hierarchical vision transformer using shifted windows
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …
CvT: Introducing convolutions to vision transformers
We present in this paper a new architecture, named Convolutional vision Transformer (CvT),
that improves Vision Transformer (ViT) in performance and efficiency by introducing …
CSWin transformer: A general vision transformer backbone with cross-shaped windows
Abstract We present CSWin Transformer, an efficient and effective Transformer-based
backbone for general-purpose vision tasks. A challenging issue in Transformer design is …
Transformer in transformer
The Transformer is a new kind of neural architecture that encodes the input data as powerful
features via the attention mechanism. Basically, visual transformers first divide the input …
Multiscale vision transformers
Abstract We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …
LeViT: a vision transformer in ConvNet's clothing for faster inference
We design a family of image classification architectures that optimize the trade-off between
accuracy and efficiency in a high-speed regime. Our work exploits recent findings in …
UniFormer: Unifying convolution and self-attention for visual recognition
It is a challenging task to learn discriminative representations from images and videos, due to
the large local redundancy and complex global dependencies in these visual data. Convolution …
Rethinking and improving relative position encoding for vision transformer
Relative position encoding (RPE) is important for transformers to capture the sequence ordering
of input tokens. Its general efficacy has been proven in natural language processing. However …
P2T: Pyramid pooling transformer for scene understanding
Recently, the vision transformer has achieved great success by pushing the state of the art
in various vision tasks. One of the most challenging problems in the vision transformer is that …
Towards robust vision transformer
Abstract Recent advances on Vision Transformer (ViT) and its improved variants have
shown that self-attention-based networks surpass traditional Convolutional Neural Networks …