A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
Spvit: Enabling faster vision transformers via latency-aware soft token pruning
Abstract Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …
Green ai: Do deep learning frameworks have different costs?
The use of Artificial Intelligence (ai), and more specifically of Deep Learning (dl), in modern
software systems, is nowadays widespread and continues to grow. At the same time, its …
Compressing large-scale transformer-based models: A case study on bert
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …
Pruning self-attentions into convolutional layers in single path
Vision Transformers (ViTs) have achieved impressive performance over various computer
vision tasks. However, modeling global correlations with multi-head self-attention (MSA) …
A survey of deep learning on cpus: opportunities and co-optimizations
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …
Accelerating framework of transformer by hardware design and model compression co-optimization
P Qi, EHM Sha, Q Zhuge, H Peng… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
State-of-the-art Transformer-based models, with gigantic parameters, are difficult to
accommodate on resource-constrained embedded devices. Moreover, with the …
Gradient-free structured pruning with unlabeled data
Abstract Large Language Models (LLMs) have achieved great success in solving difficult
tasks across many domains, but such success comes with a high computation cost, and …
Joint structured pruning and dense knowledge distillation for efficient transformer model compression
In this paper, we develop a novel Joint Model Compression (referred to as JMC) method by
combining structured pruning and dense knowledge distillation techniques to significantly …