Lightweight deep learning for resource-constrained environments: A survey

HI Liu, M Galindo, H Xie, LK Wong, HH Shuai… - ACM Computing …, 2024 - dl.acm.org
Over the past decade, the dominance of deep learning has prevailed across various
domains of artificial intelligence, including natural language processing, computer vision …

Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
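
The core recipe, sketched minimally in NumPy below (the paper's real kernels are fused GPU ops; `int8_matmul_with_outliers` and its default threshold are illustrative stand-ins): keep the few outlier feature dimensions in floating point and compute the rest of the product in int8 with per-row/per-column absmax scales.

```python
import numpy as np

def absmax_quantize(x, axis):
    """Symmetric int8 quantization with per-axis absmax scaling."""
    scale = 127.0 / np.maximum(np.abs(x).max(axis=axis, keepdims=True), 1e-8)
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Mixed-precision matmul: int8 for most feature dimensions, floating
    point for the outlier columns of X whose magnitude exceeds `threshold`."""
    outlier = np.abs(X).max(axis=0) > threshold
    regular = ~outlier

    # Outlier features stay in high precision (a tiny slice of the matmul).
    out_hi = X[:, outlier] @ W[outlier, :]

    # Regular features: per-row scales for X, per-column scales for W.
    Xq, sx = absmax_quantize(X[:, regular], axis=1)
    Wq, sw = absmax_quantize(W[regular, :], axis=0)

    # Accumulate the integer product in int32, then dequantize.
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
    return out_hi + acc / (sx * sw)
```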

QuIP: 2-bit quantization of large language models with guarantees

J Chee, Y Cai, V Kuleshov… - Advances in Neural …, 2023 - proceedings.neurips.cc
This work studies post-training parameter quantization in large language models (LLMs).
We introduce quantization with incoherence processing (QuIP), a new method based on the …
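
A simplified sketch of the incoherence idea, assuming NumPy and plain nearest rounding onto a 2-bit grid (QuIP's actual rounding is the adaptive LDLQ procedure, which this omits): conjugating W by random orthogonal matrices spreads large entries out before quantization, and the rotation is undone afterwards.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix for uniformity

def quantize_2bit(w):
    """Nearest rounding onto the 4-level grid {-1.5, -0.5, 0.5, 1.5} * scale."""
    scale = max(np.abs(w).max() / 1.5, 1e-8)
    q = np.clip(np.round(w / scale - 0.5), -2, 1)
    return (q + 0.5) * scale

def quip_sketch(W, rng):
    """Incoherence processing: rotate, quantize, rotate back."""
    U = random_orthogonal(W.shape[0], rng)
    V = random_orthogonal(W.shape[1], rng)
    W_inc = U @ W @ V.T            # incoherent representation of W
    W_hat = quantize_2bit(W_inc)   # QuIP uses LDLQ adaptive rounding here
    return U.T @ W_hat @ V         # approximation of the original W
```

On a weight matrix with a few large outliers, the rotated matrix quantizes with noticeably lower reconstruction error than quantizing W directly, which is the point of the processing step.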

Q-Diffusion: Quantizing diffusion models

X Li, Y Liu, L Lian, H Yang, Z Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …
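
The difficulty is structural: the noise estimator is invoked at every one of the T denoising steps, so any quantization error in it is fed back into the sample repeatedly. A minimal DDPM-style loop makes this visible; `eps_model` (the quantized network) and `alphas_cumprod` are placeholders, and the simple sigma_t^2 = beta_t noise variance is used rather than anything specific to the paper.

```python
import numpy as np

def ddpm_sample(eps_model, alphas_cumprod, shape, rng):
    """Iterative denoising: the quantized eps_model runs once per step,
    so its error compounds over the whole trajectory."""
    T = len(alphas_cumprod)
    x = rng.standard_normal(shape)          # start from pure noise
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
        alpha_t = a_bar / a_bar_prev        # per-step alpha
        eps = eps_model(x, t)               # quantized noise estimate
        # Posterior mean of x_{t-1} given the predicted noise.
        x = (x - (1.0 - alpha_t) / np.sqrt(1.0 - a_bar) * eps) / np.sqrt(alpha_t)
        if t > 0:                           # add fresh noise except at t = 0
            x = x + np.sqrt(1.0 - alpha_t) * rng.standard_normal(shape)
    return x
```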

ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers

Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
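
The title names the mechanism: split W into a small sparse matrix that keeps outlier weights in full precision plus a dense low-bit remainder. A hedged NumPy sketch (SqueezeLLM's dense part actually uses sensitivity-weighted non-uniform, k-means-style quantization; uniform rounding and the 0.5% outlier fraction here are stand-ins):

```python
import numpy as np

def dense_and_sparse(W, outlier_frac=0.005, bits=3):
    """Decompose W ~= W_dense_hat + W_sparse, quantizing only the dense part."""
    # Keep the largest-magnitude fraction of weights in full precision.
    cutoff = np.quantile(np.abs(W), 1.0 - outlier_frac)
    mask = np.abs(W) >= cutoff
    W_sparse = np.where(mask, W, 0.0)   # stored in a sparse format in practice

    # Uniformly quantize the dense remainder with an absmax scale.
    W_dense = np.where(mask, 0.0, W)
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(W_dense).max() / qmax, 1e-8)
    W_dense_hat = np.clip(np.round(W_dense / scale), -qmax - 1, qmax) * scale

    return W_dense_hat, W_sparse
```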

A white paper on neural network quantization

M Nagel, M Fournarakis, RA Amjad… - arXiv preprint arXiv …, 2021 - arxiv.org
While neural networks have advanced the frontiers in many applications, they often come at
a high computational cost. Reducing the power and latency of neural network inference is …
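
The workhorse these methods build on is uniform affine quantization: q = clamp(round(x/s) + z, 0, 2^b - 1) with dequantization x_hat = s * (q - z). A minimal NumPy rendering with per-tensor min/max calibration (the white paper also covers per-channel scales, symmetric grids, and quantization-aware training):

```python
import numpy as np

def affine_quantize(x, bits=8):
    """Map the observed range [x.min(), x.max()] onto the unsigned int grid."""
    qmin, qmax = 0, 2 ** bits - 1
    scale = max(float(x.max() - x.min()) / (qmax - qmin), 1e-8)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Recover a floating-point approximation x_hat = scale * (q - z)."""
    return scale * (q.astype(np.float32) - zero_point)
```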

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-power computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …

DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale

S Rajbhandari, C Li, Z Yao, M Zhang… - International …, 2022 - proceedings.mlr.press
As the training of giant dense models hits the boundary on the availability and capability of
the hardware resources today, Mixture-of-Experts (MoE) models have become one of the …
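
At its core, an MoE layer routes each token to a few expert FFNs chosen by a learned gate, which is how parameter count grows without growing per-token compute. A toy top-k gate in NumPy (a generic sketch, not DeepSpeed-MoE's implementation; `experts` and `gate_W` are placeholders):

```python
import numpy as np

def moe_layer(x, experts, gate_W, k=1):
    """x: (tokens, d); experts: list of callables mapping (d,) -> (d,);
    gate_W: (d, n_experts). Only k experts run per token."""
    logits = x @ gate_W                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # chosen expert indices
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over chosen experts

    out = np.zeros_like(x)
    for token in range(x.shape[0]):
        for j in range(k):
            e = topk[token, j]
            out[token] += weights[token, j] * experts[e](x[token])
    return out
```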