A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Lightweight deep learning for resource-constrained environments: A survey

HI Liu, M Galindo, H Xie, LK Wong, HH Shuai… - ACM Computing …, 2024 - dl.acm.org
Over the past decade, the dominance of deep learning has prevailed across various
domains of artificial intelligence, including natural language processing, computer vision …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
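
The snippet stops before describing the procedure. As a rough illustration of plain 8-bit matrix multiplication (symmetric absmax quantization, integer accumulation, float rescale), and not of the paper's mixed-precision outlier handling, a minimal NumPy sketch might look like the following; all function names are invented for the example.

import numpy as np

def absmax_quantize(x, axis):
    # Symmetric (absmax) quantization of a float matrix to int8 along one axis.
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)            # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    # Quantize both operands, multiply in int32, then rescale back to float.
    qa, sa = absmax_quantize(a, axis=1)                 # per-row scales for activations
    qb, sb = absmax_quantize(b, axis=0)                 # per-column scales for weights
    acc = qa.astype(np.int32) @ qb.astype(np.int32)     # integer accumulation
    return acc * sa * sb                                # dequantize the result

a = np.random.randn(4, 64).astype(np.float32)
b = np.random.randn(64, 8).astype(np.float32)
print(np.max(np.abs(int8_matmul(a, b) - a @ b)))        # small quantization error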

Quip: 2-bit quantization of large language models with guarantees

J Chee, Y Cai, V Kuleshov… - Advances in Neural …, 2023 - proceedings.neurips.cc
This work studies post-training parameter quantization in large language models (LLMs).
We introduce quantization with incoherence processing (QuIP), a new method based on the …
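
For context on what a 2-bit post-training quantizer has to beat, the round-to-nearest baseline below quantizes each weight row to four levels; QuIP's incoherence processing and adaptive rounding are deliberately not shown, and the helper names are made up for this sketch.

import numpy as np

def quantize_2bit_rtn(w):
    # Round-to-nearest 2-bit quantization with one (offset, step) pair per row.
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    step = (hi - lo) / 3.0                              # 4 levels -> 3 intervals
    step = np.where(step == 0, 1.0, step)
    codes = np.clip(np.round((w - lo) / step), 0, 3).astype(np.uint8)
    return codes, lo, step

def dequantize_2bit(codes, lo, step):
    return lo + codes.astype(np.float32) * step

w = np.random.randn(8, 16).astype(np.float32)
codes, lo, step = quantize_2bit_rtn(w)
print(np.mean((dequantize_2bit(codes, lo, step) - w) ** 2))   # reconstruction error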

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
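
The title's dense-and-sparse idea can be illustrated (only loosely; the paper's sensitivity-based non-uniform scheme is not reproduced here) by pulling the largest-magnitude weights into a sparse full-precision matrix and quantizing the remaining dense part; the outlier fraction and bit-width below are arbitrary.

import numpy as np

def dense_and_sparse_split(w, outlier_frac=0.005, bits=4):
    # Keep the largest-magnitude weights in full precision (sparse part),
    # quantize everything else symmetrically to the given bit-width (dense part).
    k = max(1, int(outlier_frac * w.size))
    thresh = np.partition(np.abs(w).ravel(), -k)[-k]
    sparse = np.where(np.abs(w) >= thresh, w, 0.0)
    dense = w - sparse
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(dense).max()) / qmax
    scale = scale if scale > 0 else 1.0
    q = np.clip(np.round(dense / scale), -qmax, qmax).astype(np.int8)
    return q, scale, sparse

w = np.random.randn(128, 128).astype(np.float32)
q, scale, sparse = dense_and_sparse_split(w)
w_hat = q.astype(np.float32) * scale + sparse           # dense part + outliers
print(np.mean((w_hat - w) ** 2))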

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …
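
The one-shot/post-training setting is usually posed layer by layer: given the trained weights W and a small batch of calibration inputs X, choose compressed weights that preserve the layer's output. A standard way to write that objective (shown here as background on the setting, not as the paper's exact derivation) is

\min_{\widehat{W}} \left\lVert W X - \widehat{W} X \right\rVert_2^2 \quad \text{subject to} \quad \widehat{W} \in \mathcal{C},

where \mathcal{C} is the set of admissible compressed weights, e.g. matrices restricted to a quantization grid or meeting a sparsity budget.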

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-power computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …
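
As a concrete example of the basic operation such methods build on, uniform affine quantization maps a float tensor to low-bit integers through a scale and a zero point; a minimal NumPy version (names invented for the sketch) is:

import numpy as np

def uniform_affine_quantize(x, bits=8):
    # Map floats to unsigned integers through a scale and a zero point.
    qmax = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(1000).astype(np.float32)
q, s, z = uniform_affine_quantize(x)
print(np.max(np.abs(dequantize(q, s, z) - x)))          # error bounded by ~scale/2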

RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers

Z Li, J Xiao, L Yang, Q Gu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Post-training quantization (PTQ), which only requires a tiny dataset for calibration
without end-to-end retraining, is a light and practical model compression technique …
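
The calibration step the snippet refers to typically amounts to running a few batches through the frozen model and recording activation ranges from which quantization scales are derived. The sketch below shows only that generic recipe, not RepQ-ViT's scale reparameterization, and the layer names are placeholders.

import numpy as np

def calibrate_activation_scales(layer_outputs, bits=8):
    # One symmetric scale per layer, derived from observed activation ranges.
    qmax = 2 ** (bits - 1) - 1
    scales = {}
    for name, batches in layer_outputs.items():
        max_abs = max(float(np.abs(b).max()) for b in batches)
        scales[name] = max_abs / qmax if max_abs > 0 else 1.0
    return scales

# Stand-in for activations collected from a handful of calibration images.
layer_outputs = {"block0.mlp": [np.random.randn(4, 197, 768) for _ in range(8)]}
print(calibrate_activation_scales(layer_outputs))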

I-BERT: Integer-only BERT quantization

S Kim, A Gholami, Z Yao… - … on machine learning, 2021 - proceedings.mlr.press
Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results
in many Natural Language Processing tasks. However, their memory footprint, inference …
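
Integer-only here means that inference never falls back to floating point. A generic and heavily simplified example of that recipe for a linear layer, with int8 operands, int32 accumulation, and a fixed-point requantization multiplier, is sketched below; it is not I-BERT's actual kernels or its polynomial approximations of the nonlinear ops, and all values are illustrative.

import numpy as np

def integer_linear(x_q, w_q, s_x, s_w, s_out):
    # int8 inputs/weights, int32 accumulation, fixed-point rescale back to int8.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    m = int(round(s_x * s_w / s_out * (1 << 16)))        # 16-bit fixed-point multiplier
    out = (acc * m) >> 16                                 # integer-only requantization
    return np.clip(out, -127, 127).astype(np.int8)

x_q = np.random.randint(-127, 128, size=(2, 64), dtype=np.int8)
w_q = np.random.randint(-127, 128, size=(64, 16), dtype=np.int8)
print(integer_linear(x_q, w_q, s_x=0.02, s_w=0.01, s_out=0.2))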

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …