A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
Beyond efficiency: A systematic survey of resource-efficient large language models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …
Hardware acceleration of LLMs: A comprehensive survey and comparison
N Koilia, C Kachris - arXiv preprint arXiv:2409.03384, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as powerful tools for natural language
processing tasks, revolutionizing the field with their ability to understand and generate …
A Review on Edge Large Language Models: Design, Execution, and Applications
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …
FedSpaLLM: Federated pruning of large language models
Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to
deploy due to their high computational and storage demands. Pruning can reduce model …
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
Transformers have revolutionized deep learning and generative modeling, enabling
unprecedented advancements in natural language processing tasks. However, the size of …
Hardware-software co-design enabling static and dynamic sparse attention mechanisms
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …
A survey of FPGA and ASIC designs for transformer inference acceleration and optimization
BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …
EdgeTran: Device-aware co-search of transformers for efficient inference on mobile edge platforms
Automated design of efficient transformer models has recently attracted significant attention
from industry and academia. However, most works only focus on certain metrics while …
A survey on sparsity exploration in transformer-based accelerators
Transformer models have emerged as the state-of-the-art in many natural language
processing and computer vision applications due to their capability of attending to longer …