Review of lightweight deep convolutional neural networks
F Chen, S Li, J Han, F Ren, Z Yang - Archives of Computational Methods …, 2024 - Springer
Lightweight deep convolutional neural networks (LDCNNs) are vital components of mobile
intelligence, particularly in mobile vision. Although various heavy networks with increasingly …
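
A minimal PyTorch sketch of the depthwise separable convolution, the building block behind many of the lightweight CNNs such reviews cover (e.g. the MobileNet family). Layer sizes here are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        # Pointwise: 1x1 conv mixes channels; most of the FLOP savings come from here.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

x = torch.randn(1, 32, 56, 56)
y = DepthwiseSeparableConv(32, 64)(x)   # -> torch.Size([1, 64, 56, 56])
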
Applications and techniques for fast machine learning in science
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …
QuIP: 2-bit quantization of large language models with guarantees
This work studies post-training parameter quantization in large language models (LLMs).
We introduce quantization with incoherence processing (QuIP), a new method based on the …
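
A numpy sketch of the incoherence-processing idea named in the snippet: rotate the weight matrix with random orthogonal matrices so no single entry is an outlier, round to a low-bit grid, and undo the rotation afterwards. The full method also uses an adaptive rounding scheme (LDLQ) and comes with guarantees; plain round-to-nearest here is a simplification.

import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))        # sign fix makes Q Haar-distributed

def quantize_2bit(w):
    scale = np.abs(w).max() / 1.5         # 4 levels: {-1.5, -0.5, 0.5, 1.5} * scale
    return (np.clip(np.round(w / scale - 0.5), -2, 1) + 0.5) * scale

W = rng.standard_normal((64, 64))
W[0, 0] = 25.0                            # a single outlier blows up the naive grid
U, V = random_orthogonal(64), random_orthogonal(64)

W_rot   = U @ W @ V.T                     # incoherence processing spreads the outlier
W_hat   = U.T @ quantize_2bit(W_rot) @ V  # quantize, then rotate back
W_naive = quantize_2bit(W)

print(np.linalg.norm(W - W_hat), np.linalg.norm(W - W_naive))  # rotated error is far lower
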
A survey of quantization methods for efficient neural network inference
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
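
A minimal sketch of uniform affine (asymmetric) quantization, the baseline scheme most of the methods in such surveys build on: map a float range onto b-bit integers with a scale and zero point, then dequantize.

import numpy as np

def quantize(x, bits=8):
    qmin, qmax = 0, 2**bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).max())   # worst-case rounding error is about s/2
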
Q-Diffusion: Quantizing diffusion models
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale
As the training of giant dense models hits the boundary on the availability and capability of
the hardware resources today, Mixture-of-Experts (MoE) models have become one of the …
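
A minimal PyTorch sketch of the top-k gated mixture-of-experts layer that systems like DeepSpeed-MoE serve: a router picks k experts per token, so only a fraction of the parameters is touched per forward pass. The dense loop over experts is illustrative; production systems dispatch tokens to experts in parallel across devices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                              # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.k, dim=-1)      # (tokens, k)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):      # send each token to its k experts
            mask = (idx == e)
            if mask.any():
                rows = mask.any(-1).nonzero(as_tuple=True)[0]
                w = (weights * mask)[rows].sum(-1, keepdim=True)
                out[rows] += w * expert(x[rows])
        return out

y = MoELayer()(torch.randn(16, 256))                   # -> torch.Size([16, 256])
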
Post-training quantization for vision transformer
Recently, transformer has achieved remarkable performance on a variety of computer vision
applications. Compared with mainstream convolutional neural networks, vision transformers …
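
A numpy sketch of the calibration step at the heart of post-training quantization: search over clipping thresholds for the activation range that minimizes reconstruction error on a small calibration set, with no retraining. The percentile grid and MSE criterion are common generic choices, not the ranking-aware metric this particular paper proposes.

import numpy as np

def fake_quant(x, clip, bits=8):
    scale = clip / (2**(bits - 1) - 1)
    return np.clip(np.round(x / scale), -(2**(bits - 1)), 2**(bits - 1) - 1) * scale

def calibrate(acts, bits=8, grid=np.linspace(0.5, 1.0, 20)):
    best_clip, best_err = None, np.inf
    for p in grid:                           # try clipping at several fractions of max|x|
        clip = p * np.abs(acts).max()
        err = np.mean((acts - fake_quant(acts, clip, bits)) ** 2)
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip

acts = np.random.standard_cauchy(10000)      # heavy-tailed, like some transformer activations
print(calibrate(acts))                       # a clip well below max|x| usually wins
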
SqueezeLLM: Dense-and-sparse quantization
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
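
A numpy sketch of the dense-and-sparse decomposition the title refers to: pull the few largest-magnitude outlier weights into a sparse full-precision matrix and quantize the remaining dense part at low bit width. SqueezeLLM additionally uses sensitivity-based non-uniform codebooks; plain uniform 3-bit rounding here is a simplification.

import numpy as np

def dense_and_sparse(W, bits=3, outlier_frac=0.005):
    thresh = np.quantile(np.abs(W), 1 - outlier_frac)
    sparse = np.where(np.abs(W) > thresh, W, 0.0)      # outliers kept in full precision
    dense = W - sparse                                  # small-magnitude remainder
    levels = 2**(bits - 1) - 1
    scale = np.abs(dense).max() / levels
    dense_q = np.clip(np.round(dense / scale), -levels - 1, levels) * scale
    return dense_q, sparse

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
W.ravel()[rng.choice(W.size, 100, replace=False)] *= 30    # inject outliers
dq, sp = dense_and_sparse(W)
print(np.linalg.norm(W - (dq + sp)) / np.linalg.norm(W))   # small relative error
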
HAWQ-V3: Dyadic neural network quantization
Current low-precision quantization algorithms often have the hidden cost of conversion back
and forth from floating point to quantized integer values. This hidden cost limits the latency …
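
A minimal sketch of the dyadic-rational trick the title alludes to: an integer accumulator is rescaled by a factor expressed as b / 2^c (b, c integers), so requantization is one integer multiply and a bit shift, with no float conversion on the inference path. Values and bit widths are illustrative.

import numpy as np

def dyadic_approx(m, shift_bits=16):
    """Approximate a float multiplier m in (0, 1) as b / 2^c."""
    b = int(round(m * (1 << shift_bits)))
    return b, shift_bits

m = 0.0072315                                 # e.g. s_w * s_x / s_out from the scales
b, c = dyadic_approx(m)

# int64 here only so the multiply by b cannot overflow in this demo.
acc = np.array([123456, -987654, 41210], dtype=np.int64)
requant_int = (acc * b + (1 << (c - 1))) >> c              # integer-only, rounding shift
requant_ref = np.round(acc * m).astype(np.int64)           # float reference
print(requant_int, requant_ref)               # agree up to rounding of exact ties
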
ZeroQ: A novel zero shot quantization framework
Quantization is a promising approach for reducing the inference time and memory footprint
of neural networks. However, most existing quantization methods require access to the …
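
A PyTorch sketch of the data-free idea behind zero-shot quantization: when no training data is available, synthesize a "distilled" calibration batch by optimizing random noise until its batch-norm statistics match the running mean/variance stored in the pretrained model, then calibrate the quantizer on that batch. Model choice and step counts are illustrative.

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None).eval()        # pretrained weights assumed in practice
for p in model.parameters():
    p.requires_grad_(False)
bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

stats = []
def hook(mod, inp, out):                     # record the statistics each BN layer sees
    x = inp[0]
    stats.append((x.mean(dim=(0, 2, 3)), x.var(dim=(0, 2, 3))))

handles = [m.register_forward_hook(hook) for m in bn_layers]

x = torch.randn(8, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(50):                          # short optimization, as in the paper
    stats.clear()
    opt.zero_grad()
    model(x)
    loss = sum((mu - m.running_mean).pow(2).sum() + (var - m.running_var).pow(2).sum()
               for (mu, var), m in zip(stats, bn_layers))
    loss.backward()
    opt.step()
for h in handles:
    h.remove()
# x now serves as a calibration set for choosing quantization ranges.
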