A comprehensive survey on applications of transformers for deep learning tasks
Abstract Transformers are Deep Neural Networks (DNN) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
mechanism to capture contextual relationships within sequential data. Unlike traditional …
Advances in medical image analysis with vision transformers: a comprehensive review
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …
has recently also triggered broad interest in Computer Vision. Among other merits …
Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …
objects from an open set of categories in diverse environments. One way to address this …
Large selective kernel network for remote sensing object detection
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …
representation of oriented bounding boxes but has overlooked the unique prior knowledge …
Segnext: Rethinking convolutional attention design for semantic segmentation
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of se-mantic …
segmentation. Recent transformer-based models have dominated the field of se-mantic …
PIXART-: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …
Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of
different modalities, eg, functional highlight and detailed textures. To tackle the challenge in …
different modalities, eg, functional highlight and detailed textures. To tackle the challenge in …
Adaptformer: Adapting vision transformers for scalable visual recognition
Abstract Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
recognition. A following scenario is to adapt a ViT to various image and video recognition …
Vision transformer adapter for dense predictions
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …
recent visual transformers that introduce vision-specific inductive biases into their …
SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer
This study proposes a novel general image fusion framework based on cross-domain long-
range learning and Swin Transformer, termed as SwinFusion. On the one hand, an attention …
range learning and Swin Transformer, termed as SwinFusion. On the one hand, an attention …