Attention mechanisms in computer vision: A survey
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
A review on the attention mechanism of deep learning
Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans that tend to focus on the distinctive …
InternImage: Exploring large-scale vision foundation models with deformable convolutions
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …
Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
Dynamic neural networks: A survey
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …
FcaNet: Frequency channel attention networks
Attention mechanism, especially channel attention, has gained great success in the
computer vision field. Many works focus on how to design efficient channel attention …
Deformable DETR: Deformable transformers for end-to-end object detection
DETR has been recently proposed to eliminate the need for many hand-designed
components in object detection while demonstrating good performance. However, it suffers …
MaX-DeepLab: End-to-end panoptic segmentation with mask transformers
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation.
Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks …
Involution: Inverting the inherence of convolution for visual recognition
Convolution has been the core ingredient of modern neural networks, triggering the surge of
deep learning in vision. In this work, we rethink the inherent principles of standard …