Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
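The snippet contrasts the quadratic cost of self-attention with linear attention. Below is a minimal pure-Python sketch of that general contrast. It assumes the common kernelized formulation with the feature map φ(x) = elu(x) + 1. That map is a generic stand-in, not the focused mapping proposed in the paper, and every function name here is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Standard attention: every query scores every key, O(N^2 * d)."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out


def feature_map(x: List[float]) -> List[float]:
    """Assumed kernel map phi(x) = elu(x) + 1, which is always positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Kernelized attention: summarize all keys/values once, O(N * d * d_v)."""
    phi_K = [feature_map(k) for k in K]
    d, dv = len(phi_K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d                       # z = sum_j phi(k_j)
    for pk, v in zip(phi_K, V):
        for a in range(d):
            z[a] += pk[a]
            for c in range(dv):
                S[a][c] += pk[a] * v[c]
    out = []
    for q in Q:  # each query reuses S and z instead of visiting every key
        pq = feature_map(q)
        denom = sum(pq[a] * z[a] for a in range(d))
        out.append([sum(pq[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


def quadratic_kernel_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Same kernel similarity computed the O(N^2) way, to check linear_attention."""
    out = []
    for q in Q:
        pq = feature_map(q)
        w = [sum(a * b for a, b in zip(pq, feature_map(k))) for k in K]
        total = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out
```

The design choice is the order of multiplication. Computing φ(K)ᵀV first avoids ever forming the N×N score matrix, so the cost grows linearly with the number of tokens. The price is replacing softmax with a kernel similarity.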

Agent attention: On the integration of softmax and linear attention

D Han, T Ye, Y Han, Z Xia, S Pan, P Wan… - … on Computer Vision, 2024 - Springer
The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …

Adaptive rotated convolution for rotated object detection

Y Pu, Y Wang, Z Xia, Y Han, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …
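Rotated detection usually describes each object with an oriented box rather than an axis-aligned one. Below is a minimal sketch of that representation. It assumes the common (cx, cy, w, h, θ) parameterization with θ in radians, measured counterclockwise. It shows how an axis-aligned box around a tilted object encloses extra background. The helper names are hypothetical and do not come from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def obb_corners(cx: float, cy: float, w: float, h: float, theta: float) -> List[Point]:
    """Corners of an oriented box, rotating the half-extents by theta (counterclockwise)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def polygon_area(pts: List[Point]) -> float:
    """Shoelace formula for a simple polygon given in order."""
    acc = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0


def enclosing_hbb(pts: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) containing the points."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


# A 4 x 1 object tilted by 45 degrees: the oriented box keeps area 4, while the
# axis-aligned box around it has area (5 / sqrt(2))^2 = 12.5.
corners = obb_corners(0.0, 0.0, 4.0, 1.0, math.pi / 4)
x0, y0, x1, y1 = enclosing_hbb(corners)
print(polygon_area(corners), (x1 - x0) * (y1 - y0))
```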

Rank-DETR for high quality object detection

Y Pu, W Liang, Y Hao, Y Yuan… - Advances in …, 2024 - proceedings.neurips.cc
Modern detection transformers (DETRs) use a set of object queries to predict a list of
bounding boxes, sort them by their classification confidence scores, and select the top …
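The snippet describes the usual DETR post-processing: one box per object query, ranked by classification confidence, with only the top predictions kept. Below is a minimal sketch of that ranking step. It assumes confidence is the maximum per-query class probability and that the cutoff is a parameter `k`. The listing truncates the actual selection rule, so `k`, the box format, and the function name are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (cx, cy, w, h) format


class Prediction(NamedTuple):
    box: Box
    label: int
    score: float


def select_top_predictions(boxes: Sequence[Box],
                           class_probs: Sequence[Sequence[float]],
                           k: int) -> List[Prediction]:
    """Rank per-query predictions by confidence and keep the top k.

    class_probs holds per-query class probabilities, assumed already normalized.
    """
    preds = []
    for box, probs in zip(boxes, class_probs):
        label = max(range(len(probs)), key=probs.__getitem__)  # most likely class
        preds.append(Prediction(box, label, probs[label]))
    preds.sort(key=lambda p: p.score, reverse=True)  # highest confidence first
    return preds[:k]


boxes = [(10.0, 10.0, 4.0, 4.0), (20.0, 20.0, 6.0, 6.0), (30.0, 30.0, 2.0, 2.0)]
class_probs = [[0.1, 0.7], [0.9, 0.05], [0.2, 0.3]]
for p in select_top_predictions(boxes, class_probs, k=2):
    print(p)
```

Some DETR variants instead take the top k over all (query, class) pairs, which lets one query contribute several labels. The per-query maximum used here is the simpler of the two choices.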

Degradation-resistant unfolding network for heterogeneous image fusion

C He, K Li, G Xu, Y Zhang, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Heterogeneous image fusion (HIF) techniques aim to enhance image quality by merging
complementary information from images captured by different sensors. Among these …

GSVA: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2024 - Springer
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

Mask grounding for referring image segmentation

YX Chng, H Zheng, Y Han, X Qiu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Referring Image Segmentation (RIS) is a challenging task that requires an
algorithm to segment objects referred by free-form language expressions. Despite significant …

GRA: Detecting oriented objects through group-wise rotating and attention

J Wang, Y Pu, Y Han, J Guo, Y Wang, X Li… - European Conference on …, 2024 - Springer
Oriented object detection, an emerging task in recent years, aims to identify and locate
objects across varied orientations. This requires the detector to accurately capture the …

Fine-grained recognition with learnable semantic data augmentation

Y Pu, Y Han, Y Wang, J Feng, C Deng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Fine-grained image recognition is a longstanding computer vision challenge that focuses on
differentiating objects belonging to multiple subordinate categories within the same meta …