A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …

Swin transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …
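The title's "shifted windows" idea can be illustrated at the level of token grouping. The sketch below is written in plain Python, and its helper names (`cyclic_shift`, `window_partition`) are hypothetical. It partitions an H x W token grid into non-overlapping M x M windows, then repeats the partition after cyclically rolling the grid by M // 2, so each shifted window straddles several of the regular ones. It is not the paper's implementation: it works on token ids rather than feature tensors, and it omits the attention mask Swin applies to wrapped-around regions.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows (H and W divisible by m)."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid size must be a multiple of the window size"
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([row[left:left + m] for row in grid[top:top + m]])
    return windows


if __name__ == "__main__":
    H = W = 4
    M = 2
    tokens = [[i * W + j for j in range(W)] for i in range(H)]  # token ids 0..15
    regular = window_partition(tokens, M)
    shifted = window_partition(cyclic_shift(tokens, M // 2), M)
    print(regular[0])  # [[0, 1], [4, 5]]
    print(shifted[0])  # [[5, 6], [9, 10]]
```

The first shifted window holds tokens 5, 6, 9 and 10, which sit in four different regular windows. Alternating the two partitions across layers is what lets information cross window borders while attention stays local.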

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
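The snippet above names self-attention as the core mechanism. Below is a minimal single-head sketch of scaled dot-product self-attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It assumes token embeddings arrive as rows of a matrix and that the projection matrices `wq`, `wk`, `wv` are given. All helper names are illustrative, and there is no multi-head splitting, masking, or batching.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: rows of a times columns of b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over the token rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # every token scores every other token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted mix of value rows


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for row in self_attention(tokens, identity, identity, identity):
        print([round(val, 3) for val in row])
```

Every token attends to every other token in one step. This is the "long-range dependency" property that several of the entries below contrast with the local receptive field of convolutions.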

AdaptFormer: Adapting vision transformers for scalable visual recognition

S Chen, C Ge, Z Tong, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
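One common way to adapt a pretrained ViT cheaply is to keep its weights frozen and train a small bottleneck branch beside them. AdaptFormer does something of this kind with a module placed in parallel to the MLP block. The sketch below is a loose, hypothetical illustration of that pattern on a single token vector, not the paper's module. The names `adapt_mlp`, `w_down`, `w_up`, and `scale` are assumptions, and layer normalization and the outer residual connection are omitted.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix) -> Vector:
    """Row vector x (length d_in) times weight matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def adapt_mlp(x: Vector, frozen_mlp: Callable[[Vector], Vector],
              w_down: Matrix, w_up: Matrix, scale: float) -> Vector:
    """Frozen MLP output plus a scaled bottleneck branch (down-project -> ReLU -> up-project)."""
    frozen = frozen_mlp(x)                           # pretrained path, not updated
    adapter = linear(relu(linear(x, w_down)), w_up)  # only w_down / w_up would be trained
    return [f + scale * a for f, a in zip(frozen, adapter)]


def toy_frozen_mlp(v: Vector) -> Vector:
    """Hypothetical stand-in for a pretrained MLP block."""
    return [2.0 * t for t in v]


if __name__ == "__main__":
    d = 4
    w_down = [[0.5] for _ in range(d)]  # d x 1 bottleneck
    w_up_zero = [[0.0] * d]             # 1 x d, zero-initialised
    x = [1.0, -1.0, 0.5, 2.0]
    print(adapt_mlp(x, toy_frozen_mlp, w_down, w_up_zero, scale=0.1))  # [2.0, -2.0, 1.0, 4.0]
```

With the up-projection zero-initialised (a common adapter choice, assumed here), the block initially reproduces the frozen model exactly. Training then moves only the few bottleneck parameters.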

CMT: Convolutional neural networks meet vision transformers

J Guo, K Han, H Wu, Y Tang, X Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Dynamic head: Unifying object detection heads with attentions

X Dai, Y Chen, B Xiao, D Chen, M Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
The complex nature of combining localization and classification in object detection has
resulted in the flourished development of methods. Previous works tried to improve the …

Learning to prompt for open-vocabulary object detection with vision-language model

Y Du, F Wei, Z Zhang, M Shi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, vision-language pre-training shows great potential in open-vocabulary object
detection, where detectors trained on base classes are devised for detecting new classes …

Rethinking transformer-based set prediction for object detection

Z Sun, S Cao, Y Yang… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
DETR is a recently proposed Transformer-based method which views object detection as a
set prediction problem and achieves state-of-the-art performance but demands extra-long …
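Treating detection as set prediction means each ground-truth object is assigned to exactly one prediction by a minimum-cost bipartite matching. Predictions left unmatched are trained as "no object". Below is a toy sketch of that matching step in plain Python. It brute-forces all injective assignments as a stand-in for the Hungarian algorithm, which in practice is usually solved with something like `scipy.optimize.linear_sum_assignment`. The cost uses only the class probability and an L1 box distance. DETR's matching cost also includes a generalized-IoU term, which is omitted here. The names `match_cost` and `set_match` and the `box_weight` parameter are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalised


def l1(a: Box, b: Box) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred_probs: Dict[str, float], pred_box: Box,
               gt_label: str, gt_box: Box, box_weight: float = 1.0) -> float:
    """Lower is better: reward probability of the true class, penalise box distance."""
    return -pred_probs.get(gt_label, 0.0) + box_weight * l1(pred_box, gt_box)


def set_match(preds: Sequence[Tuple[Dict[str, float], Box]],
              gts: Sequence[Tuple[str, Box]]) -> Optional[List[int]]:
    """For each ground-truth object, return the index of its matched prediction.

    Brute force over injective assignments; only suitable for toy sizes.
    """
    assert len(preds) >= len(gts), "need at least as many predictions as objects"
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(*preds[p], *gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best


if __name__ == "__main__":
    preds = [
        ({"cat": 0.1, "dog": 0.8}, (0.2, 0.2, 0.1, 0.1)),
        ({"cat": 0.9, "dog": 0.05}, (0.7, 0.7, 0.2, 0.2)),
        ({"cat": 0.3, "dog": 0.3}, (0.5, 0.5, 0.5, 0.5)),
    ]
    gts = [("cat", (0.7, 0.7, 0.2, 0.2)), ("dog", (0.2, 0.2, 0.1, 0.1))]
    print(set_match(preds, gts))  # [1, 0]; prediction 2 becomes "no object"
```

Because the loss is computed only after this one-to-one assignment, duplicate detections are penalised directly. This is why DETR-style detectors do not need non-maximum suppression.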

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …