A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

QuaRot: Outlier-free 4-bit inference in rotated LLMs

S Ashkboos, A Mohtashami, ML Croci, B Li… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to
quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot …
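QuaRot's premise is that multiplying the activations and weights of a linear layer by a shared orthogonal matrix leaves the output unchanged while spreading activation outliers across channels, so 4-bit quantization loses less precision. Below is a minimal numpy sketch of that rotate-then-quantize idea; the random orthogonal matrix, toy shapes, and per-tensor INT4 quantizer are illustrative assumptions, whereas the paper uses Hadamard rotations fused into the model and also covers the KV cache.

```python
import numpy as np

def quantize_int4(t):
    """Symmetric per-tensor 4-bit quantization (illustrative, not QuaRot's exact scheme)."""
    scale = np.abs(t).max() / 7.0
    q = np.clip(np.round(t / scale), -8, 7)
    return q * scale  # dequantized values, used here to measure quantization error

rng = np.random.default_rng(0)
d, n = 512, 64
W = rng.normal(size=(d, d))        # weights of a linear layer: y = x @ W.T
x = rng.normal(size=(n, d))
x[:, :4] *= 50.0                   # a few outlier channels, as seen in LLM activations

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal rotation (QuaRot uses Hadamard)

y_ref = x @ W.T

# Quantize directly: the outlier channels dominate the scale and hurt accuracy.
y_plain = quantize_int4(x) @ quantize_int4(W).T

# Rotate first: (x Q)(W Q).T == x W.T exactly, but the rotated tensors have no extreme channels.
y_rot = quantize_int4(x @ Q) @ quantize_int4(W @ Q).T

print("error without rotation:", np.linalg.norm(y_ref - y_plain) / np.linalg.norm(y_ref))
print("error with rotation:   ", np.linalg.norm(y_ref - y_rot) / np.linalg.norm(y_ref))
```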

A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arxiv preprint arxiv …, 2024 - ai.radensa.ru
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

Compact language models via pruning and knowledge distillation

S Muralidharan, ST Sreenivas, RB Joshi… - The Thirty-eighth …, 2024 - openreview.net
Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …
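The title names the two ingredients: prune a large pretrained model, then recover accuracy by distilling from the original model instead of retraining the small variant from scratch. As a rough illustration of the distillation half, here is a sketch of a standard temperature-scaled distillation loss; the temperature, mixing weight, and toy logits are assumptions for illustration, not the paper's recipe.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with KL(teacher || student) on temperature-softened logits."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    kd = (p_t * (np.log(p_t + 1e-12) - log_p_s)).sum(axis=-1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kd + (1 - alpha) * ce

# Toy usage: 4 tokens, vocabulary of 8.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))
student = rng.normal(size=(4, 8))
labels = rng.integers(0, 8, size=4)
print(distillation_loss(student, teacher, labels))
```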

Assessing the brittleness of safety alignment via pruning and low-rank modifications

B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as
evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This …

ShortGPT: Layers in large language models are more redundant than you expect

X Men, M Xu, Q Zhang, B Wang, H Lin, Y Lu… - arxiv preprint arxiv …, 2024 - arxiv.org
As Large Language Models (LLMs) continue to advance in performance, their size has
escalated significantly, with current LLMs containing billions or even trillions of parameters …
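The redundancy claim can be made concrete with a simple layer-importance score: a block whose output is nearly parallel to its input contributes little and is a candidate for removal. The sketch below scores layers by one minus the mean cosine similarity between a block's input and output hidden states, in the spirit of ShortGPT's Block Influence metric; the toy model, dimensions, and random hidden states are placeholders.

```python
import numpy as np

def block_influence(h_in, h_out):
    """Score a layer by how much it changes its input: 1 - mean cosine similarity
    between the hidden states entering and leaving the block (low score = redundant)."""
    cos = (h_in * h_out).sum(-1) / (np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1))
    return 1.0 - cos.mean()

# Placeholder hidden states for a toy 6-layer model (tokens x hidden dim at each layer boundary).
rng = np.random.default_rng(0)
states = [rng.normal(size=(32, 128))]
for _ in range(6):
    # Each "layer" is simulated as a residual update of varying magnitude.
    states.append(states[-1] + rng.normal(scale=rng.uniform(0.01, 1.0), size=(32, 128)))

scores = [block_influence(states[i], states[i + 1]) for i in range(6)]
prune_order = np.argsort(scores)  # layers with the smallest scores are candidates for removal
print("layer scores:", np.round(scores, 3))
print("prune first:", prune_order[:2])
```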

MaskLLM: Learnable semi-structured sparsity for large language models

G Fang, H Yin, S Muralidharan, G Heinrich… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …
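Semi-structured (N:M) sparsity keeps N nonzero weights in every group of M consecutive weights, a pattern modern GPUs can accelerate. For reference, the sketch below builds a plain magnitude-based 2:4 mask; MaskLLM instead learns which two weights per group to keep via differentiable mask sampling, which this toy example does not attempt.

```python
import numpy as np

def magnitude_2_4_mask(W):
    """Build a 2:4 mask: in every group of 4 consecutive weights along the input dim,
    keep the 2 with the largest magnitude. (Baseline heuristic; MaskLLM learns the mask.)"""
    rows, cols = W.shape
    groups = np.abs(W).reshape(rows, cols // 4, 4)
    order = np.argsort(groups, axis=-1)                       # ascending by magnitude
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, order[..., :2], False, axis=-1)   # zero out the 2 smallest per group
    return mask.reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
mask = magnitude_2_4_mask(W)
W_sparse = W * mask
assert (mask.reshape(8, -1, 4).sum(-1) == 2).all()  # exactly 2 of every 4 weights survive
print("kept fraction:", mask.mean())                 # 0.5
```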

The LLM surgeon

TFA van der Ouderaa, M Nagel, M Van Baalen… - arxiv preprint arxiv …, 2023 - arxiv.org
State-of-the-art language models are becoming increasingly large in an effort to achieve the
highest performance on large corpora of available textual data. However, the sheer size of …

MobiLlama: Towards accurate and lightweight fully transparent GPT

O Thawakar, A Vayani, S Khan, H Cholakal… - arxiv preprint arxiv …, 2024 - arxiv.org
" Bigger the better" has been the predominant trend in recent Large Language Models
(LLMs) development. However, LLMs do not suit well for scenarios that require on-device …

A deeper look at depth pruning of LLMs

SA Siddiqui, X Dong, G Heinrich, T Breuel… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …