Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
A simple and effective pruning approach for large language models
As their size increases, Large Language Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …
SparseGPT: Massive language models can be accurately pruned in one-shot
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …
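The SparseGPT result above relies on second-order weight reconstruction to reach 50% one-shot sparsity without retraining. As an illustrative point of reference only (not the paper's method), the simplest one-shot baseline it improves on, plain magnitude pruning, can be sketched as:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """One-shot unstructured pruning: zero the smallest-magnitude weights
    so that roughly `sparsity` fraction of entries become zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))
W_pruned = magnitude_prune(W, sparsity=0.5)
achieved = 1.0 - np.count_nonzero(W_pruned) / W_pruned.size  # ≈ 0.5
```

This baseline ignores weight interactions entirely, which is exactly the gap the second-order, calibration-data-driven approaches in these papers address.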
Optimal brain compression: A framework for accurate post-training quantization and pruning
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
Pruning vs quantization: Which is better?
Neural network pruning and quantization techniques are almost as old as neural networks
themselves. However, to date, only ad-hoc comparisons between the two have been …
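The comparison above treats pruning and quantization as alternative compression operators applied to the same weight tensor. A minimal sketch of the quantization side, symmetric per-tensor INT8 (an illustrative baseline only, not the paper's experimental setup):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ≈ scale * q, with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
# Rounding error per element is bounded by half a quantization step
max_err = np.abs(w - dequantize(q, scale)).max()
```

Whereas pruning zeroes a subset of weights and keeps the rest exact, quantization perturbs every weight by a small bounded amount; which error mode degrades accuracy less is precisely the question this comparison studies.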
Group fisher pruning for practical network compression
Network compression has been widely studied since it is able to reduce the memory and
computation cost during inference. However, previous methods seldom deal with …
The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …
Fluctuation-based adaptive structured pruning for large language models
Network Pruning is a promising way to address the huge computing resource demands of
the deployment and inference of Large Language Models (LLMs). Retraining-free is …