Agent attention: On the integration of softmax and linear attention

D Han, T Ye, Y Han, Z Xia, S Pan, P Wan… - … on Computer Vision, 2024 - Springer
The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …

CATNet: Cascaded attention transformer network for marine species image classification

W Zhang, G Chen, P Zhuang, W Zhao… - Expert Systems with …, 2024 - Elsevier
Complex physicochemical environmental effects give underwater species images
highly intricate and diverse backgrounds, which pose significant challenges for identifying …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

SAM-6D: Segment Anything Model meets zero-shot 6D object pose estimation

J Lin, L Liu, D Lu, K Jia - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D
poses in cluttered scenes, presenting significant challenges for model generalizability …

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2024 - Springer
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

ViT-CoMer: Vision transformer with convolutional multi-scale feature interaction for dense predictions

C Xia, X Wang, F Lv, X Hao… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Abstract Although Vision Transformer (ViT) has achieved significant success in computer
vision, it does not perform well in dense prediction tasks due to the lack of inner-patch …

Hypformer: Exploring efficient transformer fully in hyperbolic space

M Yang, H Verma, DC Zhang, J Liu, I King… - Proceedings of the 30th …, 2024 - dl.acm.org
Hyperbolic geometry has shown significant potential in modeling complex structured data,
particularly data with underlying tree-like and hierarchical structures. Despite the …

P-Mamba: Marrying Perona-Malik diffusion with Mamba for efficient pediatric echocardiographic left ventricular segmentation

Z Ye, T Chen, F Wang, H Zhang, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
In pediatric cardiology, the accurate and immediate assessment of cardiac function through
echocardiography is crucial since it can determine whether urgent intervention is required in …

A novel state space model with local enhancement and state sharing for image fusion

Z Cao, X Wu, LJ Deng, Y Zhong - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
In image fusion tasks, images from different sources possess distinct characteristics. This
has driven the development of numerous methods to explore better ways of fusing them …

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Y Yue, Y Wang, B Kang, Y Han… - Advances in …, 2025 - proceedings.neurips.cc
Abstract Multimodal Large Language Models (MLLMs) have demonstrated remarkable
comprehension and reasoning capabilities with complex language and visual data. These …