Microscaling data formats for deep learning
BD Rouhani, R Zhao, A More, M Hall… - arXiv preprint, 2023
Optimal clipping and magnitude-aware differentiation for improved quantization-aware training
Data clipping is crucial in reducing noise in quantization operations and improving the
achievable accuracy of quantization-aware training (QAT). Current practices rely on …
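The trade-off this snippet alludes to, a tighter clip threshold shrinks the quantization step (less rounding noise for inliers) at the cost of clipping error on outliers, can be illustrated with a minimal fake-quantizer as used in QAT forward passes. This is a generic sketch, not the paper's method; `clipped_quantize` and its parameters are invented for illustration:

```python
import numpy as np

def clipped_quantize(x, clip, bits=8):
    """Clip-then-quantize sketch: values are clipped to [-clip, clip],
    then uniformly quantized to a signed integer grid and mapped back
    to floats ("fake quantization"), as in a QAT forward pass."""
    qmax = 2 ** (bits - 1) - 1        # e.g. 7 for signed 4-bit
    scale = clip / qmax               # smaller clip -> finer step size
    q = np.round(np.clip(x, -clip, clip) / scale)
    return q * scale

# Outliers saturate at +/- clip; inliers see rounding error <= scale / 2.
y = clipped_quantize(np.array([-2.0, -0.4, 0.1, 3.0]), clip=1.0, bits=4)
```

Choosing `clip` (by heuristic, search, or gradient-based learning) is exactly the knob that clipping strategies for quantization-aware training tune.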
With shared microexponents, a little shifting goes a long way
This paper introduces Block Data Representations (BDR), a framework for exploring and
evaluating a wide spectrum of narrow-precision formats for deep learning. It enables …
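The block-scaling idea behind these shared-exponent formats can be sketched in a few lines. Note this is a deliberately simplified single-level version: actual BDR/MX-style formats add shared microexponents at a finer (sub-block) granularity on top of the coarse block scale, and the function and parameter names here are illustrative, not from any spec:

```python
import numpy as np

def mx_quantize(block, elem_bits=8):
    """One-level block quantization sketch: every element in `block`
    shares a single power-of-two scale chosen from the block's largest
    magnitude; each element is then stored as a narrow signed integer."""
    block = np.asarray(block, dtype=np.float64)
    max_mag = np.max(np.abs(block))
    if max_mag == 0.0:
        return np.zeros(block.shape, dtype=np.int32), 1.0
    qmax = 2 ** (elem_bits - 1) - 1
    # Smallest power-of-two scale such that max_mag / scale <= qmax.
    shared_exp = int(np.ceil(np.log2(max_mag / qmax)))
    scale = 2.0 ** shared_exp
    q = np.clip(np.round(block / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def mx_dequantize(q, scale):
    return q.astype(np.float64) * scale
```

Because the scale is a power of two, "rescaling" in hardware is just an exponent shift, which is what makes these formats cheap to implement.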
Computers Can Learn from the Heuristic Designs and Master Internet Congestion Control
In this work, for the first time, we demonstrate that computers can automatically learn from
observing the heuristic efforts of the last four decades, stand on the shoulders of the existing …
A 95.6-TOPS/W deep learning inference accelerator with per-vector scaled 4-bit quantization in 5 nm
The energy efficiency of deep neural network (DNN) inference can be improved with custom
accelerators. DNN inference accelerators often employ specialized hardware techniques to …
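Per-vector scaling, as named in this entry's title, assigns each short contiguous vector its own scale factor so that local dynamic range is tracked much more tightly than with one scale per tensor or channel. A minimal sketch follows; the real accelerator uses a two-level scheme (narrow per-vector scales refined by a coarser floating-point scale), which is collapsed here into plain floating-point scales for clarity, and all names are illustrative:

```python
import numpy as np

def per_vector_quantize(x, vec_len=16, bits=4):
    """Quantize each contiguous vector of `vec_len` elements with its
    own scale, derived from that vector's max magnitude."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, vec_len)
    qmax = 2 ** (bits - 1) - 1            # 7 for signed 4-bit
    scales = np.max(np.abs(x), axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0             # avoid divide-by-zero on all-zero vectors
    q = np.clip(np.round(x / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def per_vector_dequantize(q, scales):
    return q.astype(np.float64) * scales
```

The per-element rounding error is bounded by half of the local scale, which is why fine-grained scaling preserves accuracy at very low bit widths.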
Daq: Channel-wise distribution-aware quantization for deep image super-resolution networks
Since the resurgence of deep neural networks (DNNs), image super-resolution (SR) has
recently seen huge progress in improving the quality of low-resolution images, however at …
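The channel-wise idea in this entry's title, giving each channel its own quantization parameters so channels with very different distributions do not share one coarse range, can be sketched with a basic per-channel affine quantizer. This is a generic illustration, not the DAQ method itself, and the names are invented:

```python
import numpy as np

def channelwise_quantize(w, bits=8):
    """Per-channel (axis 0) asymmetric quantization sketch: each row
    gets its own scale and zero-point from its own min/max range."""
    w = np.asarray(w, dtype=np.float64)
    levels = 2 ** bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale[scale == 0] = 1.0               # constant channels: any scale works
    zero = np.round(-w_min / scale)       # unsigned code that maps back to 0.0
    q = np.clip(np.round(w / scale) + zero, 0, levels).astype(np.int32)
    return q, scale, zero

def channelwise_dequantize(q, scale, zero):
    return (q.astype(np.float64) - zero) * scale
```

With a single tensor-wide range, a wide-range channel would force a coarse step on every narrow-range channel; per-channel parameters avoid that.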
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
M Neseem, C McCullough, R Hsin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Low-precision quantization is recognized for its efficacy in neural network optimization. Our
analysis reveals that non-quantized elementwise operations, which are prevalent in layers …
Pareto-optimal quantized resnet is mostly 4-bit
Quantization has become a popular technique to compress neural networks and reduce
compute cost, but most prior work focuses on studying quantization without changing the …
Model compression and efficient inference for large language models: A survey
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …