A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
Beyond efficiency: A systematic survey of resource-efficient large language models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …
Hardware acceleration of LLMs: A comprehensive survey and comparison
N Koilia, C Kachris - arXiv preprint arXiv:2409.03384, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as powerful tools for natural language
processing tasks, revolutionizing the field with their ability to understand and generate …
A Review on Edge Large Language Models: Design, Execution, and Applications
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …
FedSpaLLM: Federated pruning of large language models
Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to
deploy due to their high computational and storage demands. Pruning can reduce model …
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
Transformers have revolutionized deep learning and generative modeling, enabling
unprecedented advancements in natural language processing tasks. However, the size of …
Hardware-software co-design enabling static and dynamic sparse attention mechanisms
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …
A survey of FPGA and ASIC designs for transformer inference acceleration and optimization
BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …
EdgeTran: Device-aware co-search of transformers for efficient inference on mobile edge platforms
Automated design of efficient transformer models has recently attracted significant attention
from industry and academia. However, most works only focus on certain metrics while …
A survey on sparsity exploration in transformer-based accelerators
Transformer models have emerged as the state-of-the-art in many natural language
processing and computer vision applications due to their capability of attending to longer …