A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …

A simple and effective pruning approach for large language models

M Sun, Z Liu, A Bair, JZ Kolter - arXiv preprint arXiv …, 2023 - arxiv.org

Quantizable transformers: Removing outliers by helping attention heads do nothing
Y Bondarenko, M Nagel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Transformer models have been widely adopted in various domains over the last years and
especially large language models have advanced the field of AI significantly. Due to their …

The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …