Model quantization and hardware acceleration for vision transformers: A comprehensive survey
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related …
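As a minimal illustration of the model quantization this survey covers, the sketch below symmetrically quantizes one linear layer's weights to int8 and dequantizes before the matmul. The layer shape, token count, and helper name are illustrative assumptions, not details taken from the survey.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

# Illustrative: quantize the weight of one ViT MLP projection (assumed shape).
w = torch.randn(768, 3072)          # fp32 weights
q, scale = quantize_int8(w)
w_hat = q.float() * scale           # dequantized weights
x = torch.randn(1, 197, 768)        # one image's patch tokens
y = x @ w_hat                       # int8-simulated linear layer
print((w - w_hat).abs().max())      # worst-case rounding error
```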
A survey of FPGA and ASIC designs for transformer inference acceleration and optimization
BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However …
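The FPGA and ASIC designs such a survey catalogs largely revolve around tiled matrix multiplication, sized so operand blocks fit in on-chip buffers. A software analogue of that tiling is sketched below; the tile size and matrix shapes are arbitrary assumptions, and this models only the blocking, not any particular design from the survey.

```python
import numpy as np

def tiled_matmul(a, b, tile=64):
    """Blocked matmul: processes tile x tile blocks, mimicking how an
    accelerator streams operand tiles through on-chip buffers."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # One "buffer-resident" block multiply-accumulate.
                c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return c

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```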
Kangaroo: Lossless self-speculative decoding via double early exiting
Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the …
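The "consistent sampling distribution" claim comes from speculative decoding's verify-then-accept loop. Below is a toy greedy variant of that loop with stand-in bigram "models"; it illustrates generic speculative decoding, not Kangaroo's double-early-exit drafting.

```python
import torch

def speculative_step(target, draft, prefix, k=4):
    """One greedy speculative-decoding step: a cheap draft model proposes k
    tokens autoregressively; the target model then scores the whole proposal
    in a single pass and keeps the longest agreeing prefix, plus its own
    token at the first disagreement. Output matches plain greedy decoding."""
    proposal = prefix.clone()
    for _ in range(k):                                   # cheap drafting loop
        nxt = draft(proposal)[:, -1, :].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)
    tgt = target(proposal)[:, -k - 1:-1, :].argmax(-1)   # target's pick at each drafted slot
    drafted = proposal[:, -k:]
    n_ok = int((tgt == drafted).long().cumprod(-1).sum())  # accepted prefix length
    return proposal if n_ok == k else torch.cat([prefix, tgt[:, :n_ok + 1]], dim=-1)

# Stand-in causal "models": bigram logit tables over a 100-token vocabulary.
V = 100
table_d = torch.randn(V, V)
table_t = table_d + 0.5 * torch.randn(V, V)              # target mostly agrees with draft
draft = lambda ids: table_d[ids]                         # (B, T, V) logits
target = lambda ids: table_t[ids]

prefix = torch.randint(0, V, (1, 8))
print(speculative_step(target, draft, prefix).shape)     # prefix grows by 1..k tokens
```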
On energy complexity of fully-connected layers
The massive increase in the size of deep neural networks (DNNs) is accompanied by a significant increase in the energy consumption of their hardware implementations, which is …
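A first-order version of the cost such analyses start from: a fully-connected layer with n_in inputs and n_out outputs performs n_in * n_out multiply-accumulates, so energy scales with that product times the per-MAC and per-weight-fetch energies. The sketch below uses placeholder ballpark constants, not figures from the paper.

```python
def fc_energy(n_in, n_out, e_mac=4.6e-12, e_mem=2.5e-12):
    """First-order energy estimate (joules) for one dense-layer inference:
    n_in * n_out MACs, each paired with one weight fetch. The per-op
    energies are placeholder ballpark values, not measurements."""
    macs = n_in * n_out
    return macs * (e_mac + e_mem)

# Example: one 4096 -> 4096 layer.
print(f"{fc_energy(4096, 4096):.2e} J per forward pass")
```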
A Survey on Large Language Model Acceleration based on KV Cache Management
H Li, Y Li, A Tian, T Tang, Z Xu, X Chen, N Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to …
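The KV cache these techniques manage is the per-layer store of past keys and values that decoding appends to instead of recomputing. A minimal single-head sketch, with assumed shapes and names, follows.

```python
import torch

def decode_step(wq, wk, wv, x_t, cache):
    """One decoding step with a KV cache: project only the new token,
    append its key/value, and attend over the whole cached history."""
    q, k, v = x_t @ wq, x_t @ wk, x_t @ wv          # (1, d) each
    cache["k"] = torch.cat([cache["k"], k], dim=0)  # cache grows by one row
    cache["v"] = torch.cat([cache["v"], v], dim=0)
    attn = torch.softmax(q @ cache["k"].T / cache["k"].shape[-1] ** 0.5, dim=-1)
    return attn @ cache["v"]                        # (1, d) context vector

d = 64
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
cache = {"k": torch.empty(0, d), "v": torch.empty(0, d)}
for t in range(5):              # cache memory grows linearly with sequence length
    out = decode_step(wq, wk, wv, torch.randn(1, d), cache)
```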
Analysis and Behavioral Modeling Using Augmented Transformer for Satellite Communication Power Amplifiers
To meet the demand for high-speed, high-quality communication in next-generation 6G satellite systems, it is necessary and urgent to study the behavioral modeling of 6G …
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Transformers have become foundational architectures for both natural language and computer vision tasks. However, the high computational cost makes it quite challenging to …
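Linear attention of the kind SLAB simplifies replaces softmax(QK^T)V with a feature map phi so that (phi(Q)phi(K)^T)V regroups as phi(Q)(phi(K)^T V), cutting cost from O(N^2 d) to O(N d^2). The sketch below uses the common elu+1 feature map, which is a generic choice and not necessarily SLAB's exact formulation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention: phi(q) @ (phi(k)^T v), computed without ever
    forming the N x N attention matrix. phi = elu + 1 keeps scores positive."""
    q, k = F.elu(q) + 1, F.elu(k) + 1            # (N, d) feature maps
    kv = k.T @ v                                 # (d, d): summary of all keys/values
    z = q @ k.sum(dim=0, keepdim=True).T + eps   # (N, 1) normalizer
    return (q @ kv) / z

q, k, v = (torch.randn(197, 64) for _ in range(3))
out = linear_attention(q, k, v)                  # (197, 64), O(N d^2) cost
```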
Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression, substantially reducing the dependency on the original training data …
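Whatever supplies the data (in DFKD, a generator or, as here, diffusion-based augmentation), the distillation step itself is typically a KL loss between temperature-softened teacher and student logits. A generic sketch of that loss, with stand-in inputs, is below.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style KD: KL divergence between temperature-softened
    distributions, scaled by T^2 to keep gradients comparable across T."""
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

# In DFKD the batch comes from a generator or diffusion model, not real data.
x = torch.randn(32, 3, 224, 224)          # stand-in for synthesized images
# loss = distill_loss(student(x), teacher(x).detach())
```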
Efficient model compression and knowledge distillation on Llama 2: Achieving high performance with reduced computational cost
Q Huangpu, H Gao - 2024 - files.osf.io
This study investigates the application of model compression and knowledge distillation techniques to enhance the computational efficiency of Llama 2, a Large Language Model …
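The compression side of such studies often includes unstructured magnitude pruning: zeroing the smallest-magnitude weights of each linear layer. The sketch below shows that generic operation; the sparsity level and layer shape are illustrative, not the paper's recipe.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Unstructured magnitude pruning: zero the smallest-|w| entries so that
    roughly `sparsity` fraction of the weights are removed."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(4096, 4096)               # stand-in for one LLM projection matrix
w_sparse = magnitude_prune(w, sparsity=0.5)
print((w_sparse == 0).float().mean())      # ~0.5
```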
Hotfixing Large Language Models for Code
Large Language Models for Code (LLM4Code) have become an integral part of developers' workflows, assisting with tasks such as code completion and generation. However, these …