Model quantization and hardware acceleration for vision transformers: A comprehensive survey

D Du, G Gong, X Chu - arXiv preprint arXiv:2405.00314, 2024 - arxiv.org
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …
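
As a concrete illustration of the kind of technique such a survey covers, here is a minimal sketch of uniform symmetric post-training weight quantization; the function names, bit width, and random stand-in weights are illustrative assumptions, not details taken from the paper:

import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniformly quantize a weight tensor to signed integers; return the
    quantized values and the scale needed to dequantize them."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = np.max(np.abs(w)) / qmax            # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)    # stand-in for a ViT linear layer
q, s = quantize_uniform(w)
print(np.abs(w - dequantize(q, s)).max())       # worst-case quantization error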

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

Kangaroo: Lossless self-speculative decoding via double early exiting

F Liu, Y Tang, Z Liu, Y Ni, K Han, Y Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Speculative decoding has demonstrated its effectiveness in accelerating the inference of
large language models while maintaining a consistent sampling distribution. However, the …
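
The generic draft-and-verify loop behind speculative decoding can be sketched in a few lines; this greedy toy version assumes abstract draft_step/target_step callables and does not reproduce Kangaroo's double early-exiting mechanism:

def speculative_decode(target_step, draft_step, prompt, k=4, max_len=32):
    """Greedy draft-and-verify: a cheap draft model proposes k tokens,
    the target model keeps the longest prefix it agrees with."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        draft = []
        for _ in range(k):                       # cheap model proposes k tokens
            draft.append(draft_step(tokens + draft))
        for i in range(k):                       # target verifies the proposals
            t = target_step(tokens + draft[:i])
            if t != draft[i]:
                tokens.extend(draft[:i] + [t])   # keep agreed prefix + correction
                break
        else:
            tokens.extend(draft)                 # every proposal was accepted
    return tokens[:max_len]

toy = lambda seq: (seq[-1] + 1) % 10             # trivial "model" for a smoke test
print(speculative_decode(toy, toy, [0], k=4, max_len=12))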

On energy complexity of fully-connected layers

J Šíma, J Cabessa, P Vidnerová - Neural Networks, 2024 - Elsevier
The massive increase in the size of deep neural networks (DNNs) is accompanied by a
significant increase in the energy consumption of their hardware implementations, which is …
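
A back-of-the-envelope version of the quantity studied here is easy to write down: the energy of a fully-connected layer is dominated by its multiply-accumulate and memory-access counts. The per-operation energy constants below are generic placeholder figures, not values from the paper:

def fc_layer_energy(n_in, n_out, e_mac=4.6e-12, e_read=2.5e-12):
    """Rough energy estimate (joules) for one dense layer: one MAC per
    weight plus one memory read per weight and per activation."""
    macs = n_in * n_out
    reads = n_in * n_out + n_in + n_out          # weights, inputs, outputs
    return macs * e_mac + reads * e_read

print(fc_layer_energy(768, 3072))                # e.g. a transformer MLP block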

A Survey on Large Language Model Acceleration based on KV Cache Management

H Li, Y Li, A Tian, T Tang, Z Xu, X Chen, N Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized a wide range of domains such as
natural language processing, computer vision, and multi-modal tasks due to their ability to …
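
At its core, KV cache management revolves around storing each step's keys and values so the prefix is not re-encoded at every decoding step. A minimal single-head sketch follows; the shapes and the NumPy implementation are illustrative assumptions:

import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention for one query over cached keys/values."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 16
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for step in range(5):                            # autoregressive decoding loop
    q, k, v = (np.random.randn(d) for _ in range(3))
    K_cache = np.vstack([K_cache, k])            # append this step's key/value
    V_cache = np.vstack([V_cache, v])            # instead of recomputing the prefix
    out = attend(q, K_cache, V_cache)
print(K_cache.shape, out.shape)                  # (5, 16) (16,)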

Analysis and Behavioral Modeling Using Augmented Transformer for Satellite Communication Power Amplifiers

G Zhao, K Ying, Q Wen, L Zhao, J Pang… - IEEE Internet of …, 2024 - ieeexplore.ieee.org
To meet the demand for high-speed, high-quality communication in next-generation 6G satellite
communication, it is both necessary and urgent to study the behavioral modeling of 6G …

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

J Guo, X Chen, Y Tang, Y Wang - arXiv preprint arXiv:2405.11582, 2024 - arxiv.org
Transformers have become foundational architectures for both natural language and
computer vision tasks. However, the high computational cost makes it quite challenging to …
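
Linear attention, one of the two ingredients in the title, replaces the softmax with a kernel feature map so that attention cost grows linearly in sequence length. The sketch below shows only the generic kernelized form; SLAB's simplified variant and the progressive re-parameterized batch normalization are not reproduced, and the ReLU feature map is an assumption:

import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """softmax(QK^T)V approximated as phi(Q)(phi(K)^T V): the (d x d_v)
    summary kv is independent of sequence length."""
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                # (d, d_v)
    z = Qp @ Kp.sum(axis=0)                      # per-query normalizer
    return (Qp @ kv) / z[:, None]

n, d = 128, 32
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(linear_attention(Q, K, V).shape)           # (128, 32)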

Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation

M Li, D Zhang, T He, X Xie, YF Li, K Qin - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain
of model compression, substantially reducing the dependency on the original training data …
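
The distillation objective at the heart of DFKD is the standard temperature-scaled teacher/student divergence, computed on synthesized rather than real inputs. The sketch below shows only that loss; the diffusion-based augmentation that generates the inputs is not reproduced, and the random logits are placeholders:

import numpy as np

def softmax(x, T=1.0):
    e = np.exp((x - x.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, T=4.0):
    """Temperature-scaled KL divergence between teacher and student."""
    p, q = softmax(teacher_logits, T), softmax(student_logits, T)
    return (T * T) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

t = np.random.randn(8, 10)                       # teacher logits on a synthetic batch
s = t + 0.1 * np.random.randn(8, 10)             # imperfect student logits
print(kd_loss(t, s))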

[PDF] Efficient model compression and knowledge distillation on Llama 2: Achieving high performance with reduced computational cost

Q Huangpu, H Gao - 2024 - files.osf.io
This study investigates the application of model compression and knowledge distillation
techniques to enhance the computational efficiency of Llama 2, a Large Language Model …

Hotfixing Large Language Models for Code

Z Yang, D Lo - arXiv preprint arXiv:2408.05727, 2024 - arxiv.org
Large Language Models for Code (LLM4Code) have become an integral part of developers'
workflows, assisting with tasks such as code completion and generation. However, these …