A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Recent developments in low-power AI accelerators: A survey

C Åleskog, H Grahn, A Borg - Algorithms, 2022 - mdpi.com
As machine learning and AI continue to rapidly develop, and with the ever-closer end of
Moore's law, new avenues and novel ideas in architecture design are being created and …

H3DAtten: Heterogeneous 3-D integrated hybrid analog and digital compute-in-memory accelerator for vision transformer self-attention

W Li, M Manley, J Read, A Kaul… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
After the success of the transformer networks on natural language processing (NLP), the
application of transformers to computer vision (CV) has followed suit to deliver …

Hardware-software co-design enabling static and dynamic sparse attention mechanisms

J Zhao, P Zeng, G Shen, Q Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …
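
The quadratic cost mentioned above comes from forming an n-by-n score matrix between every pair of tokens. A minimal NumPy sketch of scaled dot-product self-attention, shown only to make that n^2 term visible; the shapes and names (n, d_k) are illustrative assumptions, not this paper's accelerator dataflow:

    import numpy as np

    def self_attention(Q, K, V):
        # Q, K, V: (n, d_k) token matrices. The (n, n) score matrix below is
        # what makes compute and memory grow quadratically with sequence length n.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # (n, n)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)                    # row-wise softmax
        return w @ V                                     # (n, d_k)

Static or dynamic sparsity, as studied in the paper above, amounts to skipping entries of that (n, n) score matrix.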

Drift: Leveraging Distribution-based Dynamic Precision Quantization for Efficient Deep Neural Network Acceleration

L Liu, Z Xu, Y He, Y Wang, H Li, X Li… - Proceedings of the 61st …, 2024 - dl.acm.org
Quantization is one of the most hardware-efficient ways to reduce inference costs for deep
neural network (DNN) models. Nevertheless, with the continuous increase of DNN model …
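
As background for the dynamic-precision idea above, a minimal sketch of uniform symmetric quantization, the static per-tensor baseline that such schemes refine; the 8-bit width and function names are assumptions for illustration, not Drift's method:

    import numpy as np

    def quantize_int8(x):
        # One scale for the whole tensor, chosen from its maximum magnitude.
        scale = np.abs(x).max() / 127.0
        if scale == 0:
            scale = 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale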

Hardware-efficient Softmax Approximation for Self-Attention Networks

NA Koca, AT Do, CH Chang - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Self-attention networks such as Transformer have become state-of-the-art models for natural
language processing (NLP) problems. Softmax function, which serves as a normalizer to …
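
To make the hardware cost concrete: exact softmax needs exponentials and a normalizing division for every attention row. The sketch below contrasts it with a base-2 variant often used as a hardware-friendly approximation (powers of two map more naturally onto shift-based logic); this is a generic illustration, not the approximation proposed in the paper:

    import numpy as np

    def softmax_exact(x):
        # Exponentials plus a division, both relatively costly in silicon.
        e = np.exp(x - x.max())
        return e / e.sum()

    def softmax_base2(x):
        # Replace e^x with 2^x; the normalization structure stays the same.
        e = np.exp2(x - x.max())
        return e / e.sum()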

RAWAtten: Reconfigurable accelerator for window attention in hierarchical vision transformers

W Li, Y Luo, S Yu - 2023 Design, Automation & Test in Europe …, 2023 - ieeexplore.ieee.org
After the success of the transformer networks on natural language processing (NLP), the
application of transformers to computer vision has followed suit to deliver unprecedented …
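
Window attention, the mechanism RAWAtten targets, restricts each token to a small local window so the quadratic cost applies per window rather than over the whole image. A rough partitioning sketch; the window size and the assumption that H and W divide evenly are illustrative, not the accelerator's actual mapping:

    import numpy as np

    def window_partition(x, window=7):
        # x: (H, W, C) feature map; split into non-overlapping window x window
        # tiles so attention runs per tile instead of over all H*W tokens.
        H, W, C = x.shape
        x = x.reshape(H // window, window, W // window, window, C)
        return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)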

DEQ: Dynamic element-wise quantization for efficient attention architecture

X Wang, Z Song, Q Huang… - 2023 IEEE 41st …, 2023 - ieeexplore.ieee.org
Attention-based models, such as transformers, have achieved remarkable success across
various tasks. However, their deployment is hindered by challenges such as high memory …
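
"Dynamic" here means the quantization parameters are derived at run time from the actual activations rather than fixed offline. A rough per-row (per-token) sketch of that idea, not the element-wise scheme proposed in DEQ:

    import numpy as np

    def dynamic_per_row_quant(x, bits=8):
        # One scale per row (e.g., per token), recomputed from the observed
        # dynamic range of that row at inference time.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(x).max(axis=1, keepdims=True) / qmax
        scale = np.where(scale == 0, 1.0, scale)
        q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
        return q, scale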

A Unified Accelerator for All-in-One Image Restoration Based on Prompt Degradation Learning

S Zhang, Q Dong, W Mao… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
All-in-one image restoration (IR) recovers images from various unknown distortions by a
single model, such as rain, haze, and blur. Transformer-based IR methods have significantly …

Vision Transformer Acceleration via a Versatile Attention Optimization Framework

X Wang, Q Huang, X Li, H Jiang, Q Xu… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved remarkable success across various tasks.
However, their deployment is hindered by challenges such as high memory requirements …