Flatten transformer: Vision transformer using focused linear attention
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
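For context on the complexity gap this abstract refers to, here is a minimal sketch contrasting standard softmax attention, which materializes an N x N score matrix, with a generic kernel-based linear attention that reorders the matrix products to avoid it. This is an illustrative formulation only, not the paper's focused linear attention module.

```python
import torch

def softmax_attention(Q, K, V):
    # Standard attention: the N x N score matrix makes cost O(N^2 * d).
    scores = Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ V

def linear_attention(Q, K, V):
    # Generic kernel-based linear attention (illustrative, not the paper's
    # focused variant): computing K^T V first yields a d x d matrix, so the
    # cost grows as O(N * d^2) instead of O(N^2 * d).
    Qp = torch.nn.functional.elu(Q) + 1      # simple positive feature map
    Kp = torch.nn.functional.elu(K) + 1
    kv = Kp.transpose(-2, -1) @ V                              # (d, d)
    z = Qp @ Kp.sum(dim=-2, keepdim=True).transpose(-2, -1)    # normalizer
    return (Qp @ kv) / (z + 1e-6)

N, d = 196, 64                               # e.g. 14 x 14 tokens
Q = K = V = torch.randn(1, N, d)
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```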
Agent attention: On the integration of softmax and linear attention
The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …
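One way to combine softmax and linear attention, loosely in the spirit of this title, is to route attention through a small set of pooled "agent" tokens so that softmax is retained while the quadratic N x N interaction is avoided. The sketch below is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def agent_style_attention(Q, K, V, num_agents=16):
    # Pool M agent tokens from the queries, let them attend to all keys with
    # softmax, then let the queries attend to the agents with softmax.
    # Both score matrices are only M x N and N x M, so the cost is linear in N.
    B, N, d = Q.shape
    A = F.adaptive_avg_pool1d(Q.transpose(1, 2), num_agents).transpose(1, 2)   # (B, M, d)
    agent_out = torch.softmax(A @ K.transpose(-2, -1) / d ** 0.5, dim=-1) @ V  # (B, M, d)
    return torch.softmax(Q @ A.transpose(-2, -1) / d ** 0.5, dim=-1) @ agent_out

x = torch.randn(2, 196, 64)
print(agent_style_attention(x, x, x).shape)  # torch.Size([2, 196, 64])
```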
Adaptive rotated convolution for rotated object detection
Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …
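As a rough illustration of the general idea behind rotating convolution kernels to match object orientation, the sketch below resamples a kernel on a grid rotated by a given angle before applying it. The helper name and resampling scheme are assumptions for illustration, not the paper's adaptive rotated convolution module.

```python
import torch
import torch.nn.functional as F

def rotate_kernel(weight, theta):
    # Resample a (C_out, C_in, k, k) kernel on a grid rotated by `theta`
    # radians, so the same weights can be applied at another orientation.
    c_out, c_in, k, _ = weight.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, k),
                            torch.linspace(-1, 1, k), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                 # (k, k, 2) in [-1, 1]
    c, s = torch.cos(theta), torch.sin(theta)
    rot = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    grid = (grid @ rot.T).unsqueeze(0).expand(c_out, k, k, 2)
    return F.grid_sample(weight, grid, padding_mode="zeros", align_corners=True)

w = torch.randn(8, 3, 3, 3)
theta = torch.tensor(0.5)                                # e.g. a predicted angle
x = torch.randn(1, 3, 32, 32)
print(F.conv2d(x, rotate_kernel(w, theta), padding=1).shape)  # (1, 8, 32, 32)
```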
Rank-DETR for high quality object detection
Modern detection transformers (DETRs) use a set of object queries to predict a list of
bounding boxes, sort them by their classification confidence scores, and select the top …
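The selection step described in this abstract (score each query by classification confidence, sort, keep the top-k) can be sketched as below. The flat sigmoid-max scoring is an assumption, since DETR variants differ in how they rank class-query pairs; this is not Rank-DETR's specific ranking-oriented design.

```python
import torch

def select_top_k(logits, boxes, k=100):
    # Score each object query by its classification confidence, then keep the
    # k highest-scoring predictions. Shapes: logits (num_queries, num_classes),
    # boxes (num_queries, 4).
    scores, labels = logits.sigmoid().max(dim=-1)        # per-query confidence
    top_scores, idx = scores.topk(k=min(k, scores.numel()))
    return top_scores, labels[idx], boxes[idx]

logits = torch.randn(300, 80)                            # e.g. 300 queries, 80 classes
boxes = torch.rand(300, 4)                               # normalized (cx, cy, w, h)
scores, labels, kept = select_top_k(logits, boxes, k=100)
print(scores.shape, labels.shape, kept.shape)            # (100,) (100,) (100, 4)
```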
Degradation-resistant unfolding network for heterogeneous image fusion
Heterogeneous image fusion (HIF) techniques aim to enhance image quality by merging
complementary information from images captured by different sensors. Among these …
GSVA: Generalized segmentation via multimodal large language models
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …
Efficient diffusion transformer with step-wise dynamic attention mediators
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …
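The redundancy claimed in this abstract can be made concrete by comparing the attention maps produced by the query-key interaction at consecutive denoising steps. The metric below is an illustrative assumption, not the paper's own measurement or its mediator mechanism.

```python
import torch

def attention_map(Q, K):
    # Attention map induced by the query-key interaction at one denoising step.
    return torch.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)

def step_redundancy(Q_t, K_t, Q_prev, K_prev):
    # Cosine similarity between the attention maps of two consecutive steps,
    # averaged over queries; values near 1 indicate redundant interactions.
    A_t, A_prev = attention_map(Q_t, K_t), attention_map(Q_prev, K_prev)
    return torch.nn.functional.cosine_similarity(A_t, A_prev, dim=-1).mean()

N, d = 256, 64
Q_t, K_t = torch.randn(1, N, d), torch.randn(1, N, d)
Q_prev = Q_t + 0.01 * torch.randn(1, N, d)               # nearly identical step
K_prev = K_t + 0.01 * torch.randn(1, N, d)
print(float(step_redundancy(Q_t, K_t, Q_prev, K_prev)))  # close to 1.0
```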
Mask grounding for referring image segmentation
Referring Image Segmentation (RIS) is a challenging task that requires an
algorithm to segment objects referred by free-form language expressions. Despite significant …
GRA: Detecting oriented objects through group-wise rotating and attention
Oriented object detection, an emerging task in recent years, aims to identify and locate
objects across varied orientations. This requires the detector to accurately capture the …
Fine-grained recognition with learnable semantic data augmentation
Fine-grained image recognition is a longstanding computer vision challenge that focuses on
differentiating objects belonging to multiple subordinate categories within the same meta …