A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

GPTQ: Accurate post-training quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, D Alistarh - arXiv preprint arXiv …, 2022 - arxiv.org
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …
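
To make the setting concrete, here is a rough numpy sketch of the layer-wise error-compensation idea behind GPTQ: columns are quantized one at a time and the rounding error is spread over the not-yet-quantized columns using a Hessian built from calibration activations. This is an illustrative simplification (the paper processes columns in blocks and uses a Cholesky factorization of the inverse Hessian); the 4-bit symmetric grid, the damping constant, and all variable names are assumptions for the example.

```python
import numpy as np

def rtn(w, scale):
    """Round-to-nearest onto a symmetric 4-bit grid, then dequantize."""
    return np.clip(np.round(w / scale), -8, 7) * scale

def gptq_like_quantize(W, X, damp=0.01):
    """Illustrative layer-wise quantization with error compensation (not the paper's
    optimized algorithm). W: (out_features, in_features) weights;
    X: (in_features, n_samples) calibration activations."""
    H = 2.0 * X @ X.T                                     # proxy Hessian of the layer-wise loss
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])  # damping for numerical stability
    Hinv = np.linalg.inv(H)

    W = W.astype(np.float64).copy()
    Q = np.zeros_like(W)
    scale = np.abs(W).max(axis=1, keepdims=True) / 7.0    # per-row symmetric scale

    for j in range(W.shape[1]):                           # quantize one input column at a time
        q = rtn(W[:, j], scale[:, 0])
        Q[:, j] = q
        err = (W[:, j] - q) / Hinv[j, j]
        # spread the rounding error over the remaining, not-yet-quantized columns
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q, scale
```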

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

OPTQ: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler… - … Conference on Learning …, 2022 - openreview.net
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B van Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Speculative decoding with big little decoder

S Kim, K Mangalam, S Moon, J Malik… - Advances in …, 2024 - proceedings.neurips.cc
The recent emergence of Large Language Models based on the Transformer architecture
has enabled dramatic advancements in the field of Natural Language Processing. However …
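
The big little decoder belongs to the draft-and-verify family: a small model proposes a short continuation cheaply and the large model checks it, keeping the agreed prefix. Below is a minimal greedy sketch of that loop; `draft_next` and `target_next` are hypothetical next-token callables, the paper's actual fallback and rollback policies (based on the small model's confidence) are not reproduced, and a real implementation would verify all drafted positions in a single large-model forward pass rather than one token at a time.

```python
from typing import Callable, List

def draft_and_verify(draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     prompt: List[int],
                     draft_len: int = 4,
                     max_new_tokens: int = 64) -> List[int]:
    """Greedy draft-and-verify decoding sketch (hypothetical model callables)."""
    tokens = list(prompt)
    new = 0
    while new < max_new_tokens:
        # 1. the small model drafts a short continuation
        ctx = list(tokens)
        draft = []
        for _ in range(draft_len):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. the large model verifies; keep tokens until the first disagreement,
        #    then fall back to the large model's own token
        for t in draft:
            verified = target_next(tokens)   # one token at a time here, for clarity only
            tokens.append(verified)
            new += 1
            if verified != t or new >= max_new_tokens:
                break
    return tokens
```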

ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation

Z Yao, X Wu, C Li, S Youn, Y He - arXiv preprint arXiv:2303.08302, 2023 - arxiv.org
Post-training quantization (PTQ) has emerged as a promising technique for mitigating
memory consumption and computational costs in large language models (LLMs). However …
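
The low-rank compensation (LoRC) idea named in the title can be sketched in a few lines: factorize the quantization error with a truncated SVD and add the low-rank term back on top of the quantized weights at inference time. The rank, shapes, and names below are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def low_rank_compensation(W: np.ndarray, W_q: np.ndarray, rank: int = 8):
    """Approximate the quantization error W - W_q with a rank-`rank` factorization
    (a sketch of the low-rank compensation idea)."""
    E = W - W_q                                   # quantization error
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    U_k = U[:, :rank] * S[:rank]                  # fold singular values into U
    V_k = Vt[:rank, :]
    return U_k, V_k                               # use W_q + U_k @ V_k at inference
```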

Understanding INT4 quantization for language models: latency speedup, composability, and failure cases

X Wu, C Li, RY Aminabadi, Z Yao… - … Conference on Machine …, 2023 - proceedings.mlr.press
Improving the deployment efficiency of transformer-based language models has been
challenging given their high computation and memory cost. While INT8 quantization has …
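
As a concrete illustration of why INT4 weights help memory and latency, the sketch below performs symmetric group-wise 4-bit quantization and packs two values per byte, roughly 4x smaller than FP16 storage. The group size, packing layout, and function names are assumptions; production kernels fuse the unpacking into the matmul.

```python
import numpy as np

def quantize_int4(w: np.ndarray, group_size: int = 128):
    """Symmetric group-wise INT4 quantization of a 1-D weight tensor (a sketch).
    Assumes w.size is a multiple of group_size."""
    g = w.reshape(-1, group_size)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero groups
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    # pack two 4-bit values into each byte
    u = (q + 8).astype(np.uint8).reshape(-1, 2)
    packed = (u[:, 0] << 4) | u[:, 1]
    return packed, scale

def dequantize_int4(packed: np.ndarray, scale: np.ndarray, group_size: int = 128):
    hi = (packed >> 4).astype(np.int8) - 8
    lo = (packed & 0x0F).astype(np.int8) - 8
    q = np.stack([hi, lo], axis=1).reshape(-1, group_size)
    return (q * scale).reshape(-1)
```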

Exploring post-training quantization in LLMs from comprehensive study to low rank compensation

Z Yao, X Wu, C Li, S Youn, Y He - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Post-training quantization (PTQ) has emerged as a promising technique for mitigating
memory consumption and computational costs in large language models (LLMs). However …

ZeroQuant-FP: A leap forward in LLMs post-training W4A8 quantization using floating-point formats

X Wu, Z Yao, Y He - arXiv preprint arXiv:2307.09782, 2023 - arxiv.org
In the complex domain of large language models (LLMs), striking a balance between
computational efficiency and maintaining model quality is a formidable challenge …
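
W4A8 with floating-point formats means 4-bit floating-point (FP4, E2M1) weights combined with 8-bit floating-point activations. The sketch below only simulates the FP4 weight side by snapping scaled weights onto the E2M1 grid; FP8 activation handling and the scale granularity are left out, and all names are assumptions for illustration.

```python
import numpy as np

# Non-negative values representable by a 4-bit E2M1 float (FP4)
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(w: np.ndarray) -> np.ndarray:
    """Simulate FP4 (E2M1) weight quantization with a per-row scale (a sketch)."""
    scale = np.abs(w).max(axis=-1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)         # guard all-zero rows
    x = w / scale
    # snap each magnitude to the nearest representable FP4 value
    idx = np.argmin(np.abs(np.abs(x)[..., None] - FP4_GRID), axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale
```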