A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
AdaptFormer: Adapting vision transformers for scalable visual recognition
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
Learning to prompt for open-vocabulary object detection with vision-language model
Recently, vision-language pre-training shows great potential in open-vocabulary object
detection, where detectors trained on base classes are devised for detecting new classes …
CMT: Convolutional neural networks meet vision transformers
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …
Dynamic head: Unifying object detection heads with attentions
The complex nature of combining localization and classification in object detection has
resulted in the flourished development of methods. Previous works tried to improve the …
Conformer: Local features coupling global representations for visual recognition
Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …
Swin Transformer: Hierarchical vision transformer using shifted windows
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …
A survey on vision transformer
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
Transformers meet visual learning understanding: A comprehensive review
Dynamic attention mechanism and global modeling ability make Transformer show strong
feature learning ability. In recent years, Transformer has become comparable to CNNs …
Group-free 3D object detection via transformers
Recently, directly detecting 3D objects from 3D point clouds has received increasing
attention. To extract object representation from an irregular point cloud, existing methods …