A review of the optimal design of neural networks based on FPGA
C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, automatic driving, and other fields and …
Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization
Large language models (LLMs) face challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …
Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation
The nonuniform quantization strategy for compressing neural networks usually achieves
better performance than its counterpart, i.e., the uniform strategy, due to its superior …
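The entry above contrasts nonuniform quantization with the uniform strategy trained via straight-through estimation (STE). A minimal pure-Python sketch of the uniform baseline and the STE's surrogate gradient follows; the function names, bit-width, and step size are illustrative assumptions, not taken from the paper:

```python
def uniform_quantize(x, bits=2, step=0.5):
    """Uniform quantization: snap x to one of 2**bits levels of width `step`.
    (Illustrative signature and defaults, not the paper's code.)"""
    levels = 2 ** bits - 1
    q = round(x / step)          # nearest quantization level
    q = max(0, min(levels, q))   # clip to the representable range
    return q * step

def ste_grad(x, bits=2, step=0.5):
    """Straight-through estimator: round() has zero gradient almost
    everywhere, so STE treats it as identity inside the clipping range,
    giving d(quantize)/dx ~= 1 there and 0 outside."""
    levels = 2 ** bits - 1
    return 1.0 if 0.0 <= x / step <= levels else 0.0
```

With this surrogate, gradients flow through the rounding step during training while the forward pass still uses discrete levels.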
RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …
Scalable and programmable neural network inference accelerator based on in-memory computing
This work demonstrates a programmable in-memory-computing (IMC) inference accelerator
for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise …
Flexround: Learnable rounding based on element-wise division for post-training quantization
Post-training quantization (PTQ) has been gaining popularity for the deployment of deep
neural networks on resource-limited devices since unlike quantization-aware training …
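The FlexRound entry describes learnable rounding based on element-wise division. A minimal sketch of that idea, with the learned divisor reduced to a plain scalar `d` (an illustrative assumption; the names and signature are not the authors' implementation):

```python
def flexround(w, s, d):
    """Post-training quantization of weight w with step size s.
    Standard rounding would be s * round(w / s); a FlexRound-style
    scheme additionally divides element-wise by a learnable d, so the
    optimizer can nudge individual weights across rounding boundaries.
    (Illustrative sketch, not the paper's code.)"""
    return s * round(w / (s * d))
```

Shrinking `d` below 1 scales `w / (s * d)` upward, so gradient descent on `d` can flip a weight to the next rounding level without changing the shared step size `s`.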
Learnable companding quantization for accurate low-bit neural networks
K Yamamoto - Proceedings of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Quantizing deep neural networks is an effective method for reducing memory consumption
and improving inference speed, and is thus useful for implementation in resource …
Q-ViT: Fully differentiable quantization for vision transformer
In this paper, we propose a fully differentiable quantization method for vision transformers
(ViT), named Q-ViT, in which both the quantization scales and bit-widths are learnable …
DKM: Differentiable k-means clustering layer for neural network compression
Deep neural network (DNN) model compression for efficient on-device inference is
becoming increasingly important to reduce memory requirements and keep user data on …
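The DKM entry concerns a differentiable k-means layer for weight compression. Hard cluster assignment is non-differentiable; one standard way to soften it is a softmax over negative distances to the centroids. A pure-Python sketch of that relaxation (an illustrative formulation, not necessarily the paper's exact one):

```python
import math

def soft_assign(weights, centroids, temperature=1.0):
    """Soft cluster assignment: softmax over negative |w - c| distances,
    then replace each weight by its expected centroid. Lower temperature
    approaches hard (nearest-centroid) assignment. (Illustrative sketch.)"""
    out = []
    for w in weights:
        logits = [-abs(w - c) / temperature for c in centroids]
        m = max(logits)                          # for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        # differentiable "quantized" weight: expectation over centroids
        out.append(sum(p * c for p, c in zip(probs, centroids)))
    return out
```

Because every step is smooth in both the weights and the centroids, gradients can flow through the clustering layer during training.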
Improving low-precision network quantization via bin regularization
Model quantization is an important mechanism for energy-efficient deployment of
deep neural networks on resource-constrained devices by reducing the bit precision of …