A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
Swin transformer: Hierarchical vision transformer using shifted windows
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …
A survey on vision transformer
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
AdaptFormer: Adapting vision transformers for scalable visual recognition
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
CMT: Convolutional neural networks meet vision transformers
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …
Conformer: Local features coupling global representations for visual recognition
Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but have difficulty capturing global representations. Within …
Dynamic head: Unifying object detection heads with attentions
The complex nature of combining localization and classification in object detection has
resulted in the flourishing development of methods. Previous works tried to improve the …
Learning to prompt for open-vocabulary object detection with vision-language model
Recently, vision-language pre-training shows great potential in open-vocabulary object
detection, where detectors trained on base classes are devised for detecting new classes …
Rethinking transformer-based set prediction for object detection
DETR is a recently proposed Transformer-based method which views object detection as a
set prediction problem and achieves state-of-the-art performance but demands extra-long …
A survey on visual transformer
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …