A review of the optimal design of neural networks based on FPGA
Deep learning based on neural networks has been widely used in image recognition, speech recognition, natural language processing, automatic driving, and other fields and …
Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization
Large language models (LLMs) face challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning …
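The snippet cuts off before the method itself; for orientation, here is a minimal sketch of plain symmetric 4-bit integer quantization in Python. This is generic background on what "sub-4-bit integer quantization" operates on, not the paper's specific memory-efficient fine-tuning scheme.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = np.abs(w).max() / 7.0                 # largest positive 4-bit value is 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
print(np.abs(w - dequantize_int4(q, s)).max())    # worst-case quantization error
```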
Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation
The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its uniform counterpart due to its superior …
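The abstract is truncated, but the title's key ingredient, straight-through estimation, is standard enough to illustrate. Below is a minimal PyTorch sketch of the vanilla straight-through estimator, i.e. the baseline that the paper generalizes, not the paper's generalized version itself.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Forward: round(x). Backward: identity gradient (the straight-through trick).
    return x + (torch.round(x) - x).detach()

scale = torch.tensor(0.1, requires_grad=True)
w = torch.randn(8)
w_q = ste_round(w / scale) * scale   # fake-quantized weights, differentiable w.r.t. scale
loss = (w_q - w).pow(2).sum()
loss.backward()
print(scale.grad)                    # gradient flows despite the non-differentiable round
```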
RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM …
Scalable and programmable neural network inference accelerator based on in-memory computing
This work demonstrates a programmable in-memory-computing (IMC) inference accelerator for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise …
FlexRound: Learnable rounding based on element-wise division for post-training quantization
Post-training quantization (PTQ) has been gaining popularity for the deployment of deep neural networks on resource-limited devices since unlike quantization-aware training …
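The idea named in the title, a learnable element-wise division applied before rounding so that the rounding grid itself adapts during PTQ, can be sketched roughly as follows. The class name and parameterization here are illustrative assumptions, not the paper's exact formulation.

```python
import torch

class FlexibleRounding(torch.nn.Module):
    """Illustrative sketch: divide each weight by a learnable per-element factor
    before rounding, so the effective rounding grid adapts during PTQ."""
    def __init__(self, weight: torch.Tensor, scale: float):
        super().__init__()
        self.weight = weight
        self.scale = scale                                       # base quantization step
        self.div = torch.nn.Parameter(torch.ones_like(weight))   # learnable divisor

    def forward(self) -> torch.Tensor:
        x = self.weight / (self.scale * self.div)
        x_q = x + (torch.round(x) - x).detach()   # straight-through rounding
        return x_q * self.scale                   # back to the original range

layer = FlexibleRounding(torch.randn(4, 4), scale=0.1)
layer().sum().backward()
print(layer.div.grad)   # the division factors receive gradients and can be tuned
```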
Learnable companding quantization for accurate low-bit neural networks
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource …
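The general companding recipe behind the title, compress values with a nonlinear function, quantize uniformly in the compressed domain, then expand back, can be sketched with μ-law compression standing in for the paper's learned function. Treating μ as the learnable parameter is an assumption made purely for illustration.

```python
import torch

def mu_law_quantize(x: torch.Tensor, mu: torch.Tensor, levels: int = 16) -> torch.Tensor:
    # Compress: nonlinear squashing allocates more quantization levels near zero.
    c = torch.sign(x) * torch.log1p(mu * x.abs()) / torch.log1p(mu)
    # Uniform quantization in the compressed domain (straight-through rounding).
    step = 2.0 / (levels - 1)
    q = c / step
    q = (q + (torch.round(q) - q).detach()) * step
    # Expand: invert the compression.
    return torch.sign(q) * ((1 + mu).pow(q.abs()) - 1) / mu

mu = torch.tensor(255.0, requires_grad=True)   # learnable companding strength
x = torch.rand(8) * 2 - 1                      # inputs assumed normalized to [-1, 1]
y = mu_law_quantize(x, mu)
y.sum().backward()
print(mu.grad)                                 # the companding shape is trainable
```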
Q-ViT: Fully differentiable quantization for vision transformer
In this paper, we propose a fully differentiable quantization method for vision transformer (ViT), named Q-ViT, in which both the quantization scales and bit-widths are learnable …
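A hedged sketch of one way to make both the scale and the bit-width differentiable is to keep each as a continuous parameter and apply straight-through rounding to both; the paper's actual parameterization may differ.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Round in the forward pass, pass gradients straight through in the backward pass.
    return x + (torch.round(x) - x).detach()

scale = torch.nn.Parameter(torch.tensor(0.05))   # learnable quantization scale
bits = torch.nn.Parameter(torch.tensor(4.0))     # learnable (continuous) bit-width

def quantize(x: torch.Tensor) -> torch.Tensor:
    b = ste_round(bits)                  # effective integer bit-width
    qmax = 2.0 ** (b - 1) - 1            # e.g. 7 for 4-bit signed values
    q = ste_round(x / scale)
    q = torch.minimum(torch.maximum(q, -qmax - 1), qmax)   # clamp to the signed range
    return q * scale

x = torch.randn(8)
quantize(x).sum().backward()
print(scale.grad, bits.grad)             # both parameters receive gradients
```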
DKM: Differentiable k-means clustering layer for neural network compression
Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on …
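A differentiable k-means layer can be sketched as a soft, softmax-based cluster assignment, so that gradients flow to both the weights and the centroids. This is a minimal version; the temperature tau and the flat per-tensor clustering are illustrative choices, not the paper's exact attention-based formulation.

```python
import torch

def soft_kmeans_compress(w: torch.Tensor, centroids: torch.Tensor, tau: float = 0.01):
    """Replace each weight by a softmax-weighted mix of cluster centroids.
    A hard k-means assignment would use argmin; softmax keeps it differentiable."""
    d = (w.reshape(-1, 1) - centroids.reshape(1, -1)) ** 2   # squared distances
    attn = torch.softmax(-d / tau, dim=1)                    # soft assignments
    return (attn @ centroids).reshape(w.shape)

w = torch.randn(64, requires_grad=True)
centroids = torch.nn.Parameter(torch.linspace(-1, 1, 16))   # 16 clusters ~ 4-bit weights
w_hat = soft_kmeans_compress(w, centroids)
w_hat.sum().backward()    # gradients reach both the weights and the centroids
```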
Improving low-precision network quantization via bin regularization
Model quantization is an important mechanism for energy-efficient deployment of deep neural networks on resource-constrained devices by reducing the bit precision of …
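The snippet breaks off, but the title's idea, regularizing full-precision weights toward the center of their assigned quantization bin, can be sketched as an auxiliary loss term. This is a rough illustration under the assumption of a uniform quantization grid; the function name and plain squared-error form are illustrative, not the paper's exact loss.

```python
import torch

def bin_regularization(w: torch.Tensor, scale: float) -> torch.Tensor:
    """Penalize each weight's distance to the center of its quantization bin,
    encouraging tight, low-variance clusters that quantize with little error."""
    centers = (torch.round(w / scale) * scale).detach()   # nearest bin center, held fixed
    return ((w - centers) ** 2).mean()

w = torch.randn(128, requires_grad=True)
loss = bin_regularization(w, scale=0.1)
loss.backward()   # added to the task loss during low-precision training
```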