A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, autonomous driving, and other fields and …

Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization

J Kim, JH Lee, S Kim, J Park, KM Yoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Large language models (LLMs) face challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …
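
As a rough illustration of the kind of sub-4-bit integer quantization this entry refers to, the sketch below implements plain symmetric per-row 4-bit quantization in NumPy. The function names and the per-row scaling choice are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row 4-bit quantization: floats -> integers in [-8, 7].
    (Illustrative; the paper's scheme may group and scale differently.)"""
    scale = np.max(np.abs(w), axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)   # guard all-zero rows
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float matrix from integers and scales."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```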

Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation

Z Liu, KT Cheng, D Huang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The nonuniform quantization strategy for compressing neural networks usually achieves
better performance than its counterpart, i.e., the uniform strategy, due to its superior …
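
To make the straight-through idea behind such methods concrete: in the forward pass, values snap to a (possibly nonuniform) codebook; in the backward pass, the gradient is passed through as if the rounding were the identity. A minimal NumPy sketch, with the forward/backward pair written out by hand since NumPy has no autodiff; the generalized estimator in the paper is more involved.

```python
import numpy as np

def quantize_forward(x, levels):
    """Snap each value to its nearest entry in a (nonuniform) codebook."""
    idx = np.argmin(np.abs(x[..., None] - levels), axis=-1)
    return levels[idx]

def quantize_backward(grad_out):
    """Straight-through estimator: treat the snap as the identity in the
    backward pass, so gradients flow through unchanged."""
    return grad_out

# Codebook denser near zero, as nonuniform schemes typically are.
levels = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])
x = np.linspace(-1.2, 1.2, 7)
print(quantize_forward(x, levels))
```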

RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!

T Andrulis, JS Emer, V Sze - … of the 50th Annual International Symposium …, 2023 - dl.acm.org
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …

Scalable and programmable neural network inference accelerator based on in-memory computing

H Jia, M Ozatay, Y Tang, H Valavi… - IEEE Journal of Solid …, 2021 - ieeexplore.ieee.org
This work demonstrates a programmable in-memory-computing (IMC) inference accelerator
for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise …

FlexRound: Learnable rounding based on element-wise division for post-training quantization

JH Lee, J Kim, SJ Kwon, D Lee - … Conference on Machine …, 2023 - proceedings.mlr.press
Post-training quantization (PTQ) has been gaining popularity for the deployment of deep
neural networks on resource-limited devices since, unlike quantization-aware training, …
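
A minimal sketch of rounding after an element-wise division, which is the mechanism this entry's title names: the divisor tensor lets each weight shift which grid point it rounds to. Here `div` is fixed to ones, whereas the paper learns it during post-training quantization; the function name is made up for illustration.

```python
import numpy as np

def flexround_like(w, step, div):
    """Quantize by rounding w after an element-wise division; a learnable
    `div` can nudge individual weights across rounding boundaries."""
    return step * np.round(w / (step * div))

w = np.random.randn(3, 4).astype(np.float32)
div = np.ones_like(w)     # learned per element in the paper; identity here
print(flexround_like(w, step=0.05, div=div))
```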

Learnable companding quantization for accurate low-bit neural networks

K Yamamoto - Proceedings of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Quantizing deep neural networks is an effective method for reducing memory consumption
and improving inference speed, and is thus useful for implementation in resource …
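
Companding quantization composes a compressing nonlinearity, a uniform quantizer, and the inverse expansion, yielding nonuniform levels overall. Below is a NumPy sketch using the classic mu-law compander as a stand-in; the paper learns the companding function rather than fixing it.

```python
import numpy as np

def mu_compress(x, mu=255.0):
    """Mu-law compression: expands resolution near zero."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255.0):
    """Exact inverse of mu_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def companding_quantize(x, bits=3, mu=255.0):
    """Compress, quantize uniformly, expand: levels end up denser near zero."""
    n = 2 ** (bits - 1) - 1                  # symmetric integer range
    yq = np.round(mu_compress(x, mu) * n) / n
    return mu_expand(yq, mu)

x = np.linspace(-1.0, 1.0, 9)
print(companding_quantize(x))
```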

Q-ViT: Fully differentiable quantization for vision transformer

Z Li, T Yang, P Wang, J Cheng - arXiv preprint arXiv:2201.07703, 2022 - arxiv.org
In this paper, we propose a fully differentiable quantization method for vision transformers
(ViT), named Q-ViT, in which both the quantization scales and bit-widths are learnable …
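
For intuition, the sketch below shows a uniform quantizer parameterized by a scale and a bit-width, the two quantities such methods learn with gradients. Here both are plain floats and the integer rounding of `bits` stands in for the differentiable relaxation used in training.

```python
import numpy as np

def quantize(x, scale, bits):
    """Uniform quantizer controlled by the two parameters that
    Q-ViT-style methods learn: the step `scale` and the bit-width."""
    b = int(round(bits))          # a differentiable relaxation in practice
    lo, hi = -2 ** (b - 1), 2 ** (b - 1) - 1
    return scale * np.clip(np.round(x / scale), lo, hi)

x = np.random.randn(8).astype(np.float32)
print(quantize(x, scale=0.1, bits=4.0))
```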

DKM: Differentiable k-means clustering layer for neural network compression

M Cho, KA Vahid, S Adya, M Rastegari - arXiv preprint arXiv:2108.12659, 2021 - arxiv.org
Deep neural network (DNN) model compression for efficient on-device inference is
becoming increasingly important to reduce memory requirements and keep user data on …
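
The core trick of a differentiable k-means layer is replacing hard nearest-centroid assignment with a softmax over negative distances, which is differentiable with respect to both weights and centroids. A NumPy sketch under that reading; the attention-based formulation in the paper may differ in detail.

```python
import numpy as np

def soft_kmeans_reconstruct(w, centers, temperature=0.05):
    """Soft-assign each weight to centroids via a softmax over negative
    squared distances, then reconstruct it as the weighted centroid mix.
    As temperature -> 0 this approaches hard k-means assignment."""
    d = (w[:, None] - centers[None, :]) ** 2
    d -= d.min(axis=1, keepdims=True)          # stabilize the softmax
    a = np.exp(-d / temperature)
    a /= a.sum(axis=1, keepdims=True)
    return a @ centers

w = np.random.randn(6).astype(np.float32)
centers = np.array([-0.5, 0.0, 0.5], dtype=np.float32)
print(soft_kmeans_reconstruct(w, centers))
```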

Improving low-precision network quantization via bin regularization

T Han, D Li, J Liu, L Tian… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Model quantization is an important mechanism for energy-efficient deployment of
deep neural networks on resource-constrained devices by reducing the bit precision of …
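
One simple way to read "bin regularization" is as an auxiliary loss that pulls each full-precision weight toward the center of the quantization bin it falls in. The sketch below implements that reading in NumPy; the paper's exact loss may be shaped differently (e.g., using per-bin statistics rather than a global mean).

```python
import numpy as np

def bin_regularizer(w, step):
    """Mean squared distance from each weight to its nearest quantization
    grid point; minimizing it sharpens the per-bin weight distribution."""
    centers = step * np.round(w / step)
    return np.mean((w - centers) ** 2)

w = np.random.randn(1000).astype(np.float32)
print("bin regularization loss:", bin_regularizer(w, step=0.1))
```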