A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Recent developments in low-power AI accelerators: A survey

C Åleskog, H Grahn, A Borg - Algorithms, 2022 - mdpi.com
As machine learning and AI continue to rapidly develop, and with the ever-closer end of
Moore's law, new avenues and novel ideas in architecture design are being created and …

H3DAtten: Heterogeneous 3-D integrated hybrid analog and digital compute-in-memory accelerator for vision transformer self-attention

W Li, M Manley, J Read, A Kaul… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
After the success of the transformer networks on natural language processing (NLP), the
application of transformers to computer vision (CV) has followed suit to deliver …

Hardware-software co-design enabling static and dynamic sparse attention mechanisms

J Zhao, P Zeng, G Shen, Q Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …
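
The quadratic cost mentioned above comes from forming an n-by-n score matrix between every pair of tokens. A minimal NumPy sketch of scaled dot-product self-attention, shown only to make that n^2 term visible; the shapes and names (n, d_k) are illustrative assumptions, not this paper's accelerator dataflow:

    import numpy as np

    def self_attention(Q, K, V):
        # Q, K, V: (n, d_k) token matrices. The (n, n) score matrix below is
        # what makes compute and memory grow quadratically with sequence length n.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # (n, n)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)                    # row-wise softmax
        return w @ V                                     # (n, d_k)

Static or dynamic sparsity, as studied in the paper above, amounts to skipping entries of that (n, n) score matrix.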

Drift: Leveraging Distribution-based Dynamic Precision Quantization for Efficient Deep Neural Network Acceleration

L Liu, Z Xu, Y He, Y Wang, H Li, X Li… - Proceedings of the 61st …, 2024 - dl.acm.org
Quantization is one of the most hardware-efficient ways to reduce inference costs for deep
neural network (DNN) models. Nevertheless, with the continuous increase of DNN model …
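
As background for the dynamic-precision idea above, a minimal sketch of uniform symmetric quantization, the static per-tensor baseline that such schemes refine; the 8-bit width and function names are assumptions for illustration, not Drift's method:

    import numpy as np

    def quantize_int8(x):
        # One scale for the whole tensor, chosen from its maximum magnitude.
        scale = np.abs(x).max() / 127.0
        if scale == 0:
            scale = 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale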

Hardware-efficient Softmax Approximation for Self-Attention Networks

NA Koca, AT Do, CH Chang - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Self-attention networks such as Transformer have become state-of-the-art models for natural
language processing (NLP) problems. Softmax function, which serves as a normalizer to …
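
To make the hardware cost concrete: exact softmax needs exponentials and a normalizing division for every attention row. The sketch below contrasts it with a base-2 variant often used as a hardware-friendly approximation (powers of two map more naturally onto shift-based logic); this is a generic illustration, not the approximation proposed in the paper:

    import numpy as np

    def softmax_exact(x):
        # Exponentials plus a division, both relatively costly in silicon.
        e = np.exp(x - x.max())
        return e / e.sum()

    def softmax_base2(x):
        # Replace e^x with 2^x; the normalization structure stays the same.
        e = np.exp2(x - x.max())
        return e / e.sum()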

RAWAtten: Reconfigurable accelerator for window attention in hierarchical vision transformers

W Li, Y Luo, S Yu - 2023 Design, Automation & Test in Europe …, 2023 - ieeexplore.ieee.org
After the success of the transformer networks on natural language processing (NLP), the
application of transformers to computer vision has followed suit to deliver unprecedented …
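
Window attention, the mechanism RAWAtten targets, restricts each token to a small local window so the quadratic cost applies per window rather than over the whole image. A rough partitioning sketch; the window size and the assumption that H and W divide evenly are illustrative, not the accelerator's actual mapping:

    import numpy as np

    def window_partition(x, window=7):
        # x: (H, W, C) feature map; split into non-overlapping window x window
        # tiles so attention runs per tile instead of over all H*W tokens.
        H, W, C = x.shape
        x = x.reshape(H // window, window, W // window, window, C)
        return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)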

DEQ: Dynamic element-wise quantization for efficient attention architecture

X Wang, Z Song, Q Huang… - 2023 IEEE 41st …, 2023 - ieeexplore.ieee.org
Attention-based models, such as transformers, have achieved remarkable success across
various tasks. However, their deployment is hindered by challenges such as high memory …
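
"Dynamic" here means the quantization parameters are derived at run time from the actual activations rather than fixed offline. A rough per-row (per-token) sketch of that idea, not the element-wise scheme proposed in DEQ:

    import numpy as np

    def dynamic_per_row_quant(x, bits=8):
        # One scale per row (e.g., per token), recomputed from the observed
        # dynamic range of that row at inference time.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(x).max(axis=1, keepdims=True) / qmax
        scale = np.where(scale == 0, 1.0, scale)
        q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
        return q, scale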

A Unified Accelerator for All-in-One Image Restoration Based on Prompt Degradation Learning

S Zhang, Q Dong, W Mao… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
All-in-one image restoration (IR) recovers images from various unknown distortions by a
single model, such as rain, haze, and blur. Transformer-based IR methods have significantly …

Vision Transformer Acceleration via a Versatile Attention Optimization Framework

X Wang, Q Huang, X Li, H Jiang, Q Xu… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved remarkable success across various tasks.
However, their deployment is hindered by challenges such as high memory requirements …