A Review on Edge Large Language Models: Design, Execution, and Applications
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …
Recent developments in low-power AI accelerators: A survey
As machine learning and AI continue to develop rapidly, and with the end of Moore's law
drawing ever closer, new avenues and novel ideas in architecture design are being created and …
H3DAtten: Heterogeneous 3-D integrated hybrid analog and digital compute-in-memory accelerator for vision transformer self-attention
After the success of transformer networks in natural language processing (NLP), the
application of transformers to computer vision (CV) has followed suit to deliver …
Hardware-software co-design enabling static and dynamic sparse attention mechanisms
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …
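To make the quadratic-cost point concrete, the following minimal NumPy sketch contrasts dense self-attention, whose n × n score matrix drives the O(n²) cost, with a static windowed variant that masks distant keys. The window radius `w` and the masking scheme here are illustrative assumptions, not the co-designed static/dynamic sparse attention mechanisms of the cited work.

```python
import numpy as np

def dense_attention(Q, K, V):
    """Standard self-attention: the n x n score matrix is the
    source of the quadratic cost in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # O(n^2 * d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over n keys
    return weights @ V                                  # O(n^2 * d)

def windowed_sparse_attention(Q, K, V, w=4):
    """Illustrative static sparsity: each query attends only to keys
    within a local window of radius w; all other scores are masked."""
    n = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > w      # True = pruned
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 16, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(windowed_sparse_attention(Q, K, V, w=2).shape)    # (16, 8)
```

With a fixed window, each query touches O(w) keys instead of O(n), which is the basic trade a hardware accelerator can exploit; dynamic variants prune scores based on runtime values rather than a fixed pattern.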
Drift: Leveraging Distribution-based Dynamic Precision Quantization for Efficient Deep Neural Network Acceleration
Quantization is one of the most hardware-efficient ways to reduce inference costs for deep
neural network (DNN) models. Nevertheless, with the continuous increase of DNN model …
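As a rough illustration of why quantization reduces inference cost, the sketch below applies uniform symmetric quantization and picks the lowest bit-width whose reconstruction error stays under a tolerance. The candidate bit-widths and the error-based selection heuristic are assumptions for illustration only; they are not Drift's distribution-based dynamic precision policy.

```python
import numpy as np

def symmetric_quantize(x, bits):
    """Uniform symmetric quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.max(np.abs(x))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale

def pick_bits(x, candidate_bits=(4, 8), tol=1e-2):
    """Toy precision selector: lowest bit-width whose mean squared
    reconstruction error is below tol (illustrative heuristic only)."""
    for bits in candidate_bits:
        q, scale = symmetric_quantize(x, bits)
        if np.mean((x - q * scale) ** 2) < tol:
            return bits
    return candidate_bits[-1]

rng = np.random.default_rng(1)
activations = rng.normal(0.0, 0.5, size=4096)
bits = pick_bits(activations)
q, scale = symmetric_quantize(activations, bits)
print(bits, float(np.mean((activations - q * scale) ** 2)))
```

Lower bit-widths shrink both memory traffic and multiplier cost, which is why choosing precision per tensor (or per region of the value distribution) is attractive for accelerators.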
Hardware-efficient Softmax Approximation for Self-Attention Networks
Self-attention networks such as Transformer have become state-of-the-art models for natural
language processing (NLP) problems. The softmax function, which serves as a normalizer to …
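A common hardware-friendly direction is to replace the exponential in softmax with base-2 arithmetic, since 2^z splits into an integer part (a bit shift) plus a small fractional correction. The sketch below shows that generic idea next to a reference softmax; it is not the specific approximation proposed in the cited paper.

```python
import numpy as np

def softmax_reference(x):
    """Standard softmax with max subtraction for numerical stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_base2_approx(x):
    """Hardware-oriented sketch: rewrite e^x as 2^(x * log2 e), then
    approximate 2^z with an integer shift and a linear fractional term
    (2^f ~= 1 + f for 0 <= f < 1). Illustrative assumption, not the
    cited paper's method."""
    z = (x - np.max(x)) * np.log2(np.e)   # non-positive exponents
    zi = np.floor(z)                      # integer part -> bit shift in hardware
    zf = z - zi                           # fractional part in [0, 1)
    p = (1.0 + zf) * np.power(2.0, zi)    # 2^z ~= (1 + zf) << zi
    return p / p.sum()

x = np.array([1.2, -0.3, 0.7, 2.5])
print(softmax_reference(x))
print(softmax_base2_approx(x))
```

Avoiding the exponential and the division-heavy normalization is what makes such approximations cheap to realize in fixed-function attention hardware.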
RAWAtten: Reconfigurable accelerator for window attention in hierarchical vision transformers
After the success of transformer networks in natural language processing (NLP), the
application of transformers to computer vision has followed suit to deliver unprecedented …
DEQ: Dynamic element-wise quantization for efficient attention architecture
X Wang, Z Song, Q Huang… - 2023 IEEE 41st …, 2023 - ieeexplore.ieee.org
Attention-based models, such as transformers, have achieved remarkable success across
various tasks. However, their deployment is hindered by challenges such as high memory …
A Unified Accelerator for All-in-One Image Restoration Based on Prompt Degradation Learning
All-in-one image restoration (IR) recovers images from various unknown distortions, such as
rain, haze, and blur, with a single model. Transformer-based IR methods have significantly …
Vision Transformer Acceleration via a Versatile Attention Optimization Framework
X Wang, Q Huang, X Li, H Jiang, Q Xu… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved remarkable success across various tasks.
However, their deployment is hindered by challenges such as high memory requirements …