Attention mechanisms in computer vision: A survey
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
SegNeXt: Rethinking convolutional attention design for semantic segmentation
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …
Visual attention network
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
SRFormer: Permuted self-attention for single image super-resolution
Previous works have shown that increasing the window size for Transformer-based image
super-resolution models (e.g., SwinIR) can significantly improve the model performance but …
FlowFormer: A transformer architecture for optical flow
We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural
network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built …
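The FlowFormer abstract above refers to a 4D cost volume. As a rough illustration only (assumed details, not FlowFormer's actual construction or tokenization), the sketch below builds the standard all-pairs correlation volume used in optical flow, where entry (i, j, u, v) is the similarity between the feature vector at position (i, j) in the first frame and position (u, v) in the second.

```python
import numpy as np

def all_pairs_cost_volume(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """Build a 4D cost volume from two feature maps of shape (C, H, W).

    cost[i, j, u, v] = <feat1[:, i, j], feat2[:, u, v]> / sqrt(C)

    Generic all-pairs correlation as a sketch; FlowFormer's own pipeline
    differs in how the volume is built and tokenized.
    """
    c, h, w = feat1.shape
    f1 = feat1.reshape(c, h * w)          # (C, H*W)
    f2 = feat2.reshape(c, h * w)          # (C, H*W)
    cost = f1.T @ f2 / np.sqrt(c)         # (H*W, H*W) pairwise dot products
    return cost.reshape(h, w, h, w)       # (H, W, H, W)

# Toy usage: two random 64-channel feature maps on a 32x32 grid.
feat1 = np.random.randn(64, 32, 32).astype(np.float32)
feat2 = np.random.randn(64, 32, 32).astype(np.float32)
print(all_pairs_cost_volume(feat1, feat2).shape)  # (32, 32, 32, 32)
```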
FlowFormer++: Masked cost volume autoencoding for pretraining optical flow estimation
FlowFormer introduces a transformer architecture into optical flow estimation and achieves
state-of-the-art performance. The core component of FlowFormer is the transformer-based …
Multimodal token fusion for vision transformers
Many adaptations of transformers have emerged to address the single-modal vision tasks,
where self-attention modules are stacked to handle input sources like images. Intuitively …
ProPainter: Improving propagation and transformer for video inpainting
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms
in video inpainting (VI). Despite the effectiveness of these components, they still suffer from …
TransFlow: Transformer as flow learner
Optical flow is an indispensable building block for various important computer vision tasks,
including motion estimation, object tracking, and disparity measurement. In this work, we …
Towards an end-to-end framework for flow-guided video inpainting
Optical flow, which captures motion information across frames, is exploited in recent video
inpainting methods through propagating pixels along its trajectories. However, the hand …
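The last abstract describes exploiting optical flow by propagating pixels along its trajectories. As a minimal sketch of that basic operation (assumed details, not the paper's method), the snippet below backward-warps a reference frame into the current frame's coordinates using a dense flow field; flow-guided propagation in video inpainting builds on this kind of warping.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(reference: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp `reference` (H, W, C) into the current frame using `flow` (H, W, 2).

    flow[y, x] = (dx, dy) points from the current frame to the reference frame,
    so the warped pixel at (y, x) is sampled from reference[y + dy, x + dx].
    """
    h, w, _ = reference.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])  # (2, H, W)
    # Bilinear sampling (order=1), per channel; out-of-range samples clamp to the edge.
    return np.stack(
        [map_coordinates(reference[..., c], coords, order=1, mode="nearest")
         for c in range(reference.shape[-1])],
        axis=-1,
    )

# Toy usage: a constant flow of 2 px to the right shifts the frame accordingly.
frame = np.random.rand(48, 64, 3)
flow = np.zeros((48, 64, 2))
flow[..., 0] = 2.0
warped = backward_warp(frame, flow)
print(warped.shape)  # (48, 64, 3)
```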