Structured pruning for deep convolutional neural networks: A survey

Y He, L **ao - IEEE transactions on pattern analysis and …, 2023 - ieeexplore.ieee.org
The remarkable performance of deep Convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …

Squeezellm: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arxiv preprint arxiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-power computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arxiv preprint arxiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

LungNet: A hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data

N Faruqui, MA Yousuf, M Whaiduzzaman… - Computers in Biology …, 2021 - Elsevier
Lung cancer, also known as pulmonary cancer, is one of the deadliest cancers, but yet
curable if detected at the early stage. At present, the ambiguous features of the lung cancer …

The optimal bert surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arxiv preprint arxiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …

Squant: On-the-fly data-free quantization via diagonal hessian approximation

C Guo, Y Qiu, J Leng, X Gao, C Zhang, Y Liu… - arxiv preprint arxiv …, 2022 - arxiv.org
Quantization of deep neural networks (DNN) has been proven effective for compressing and
accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the …

Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

Global vision transformer pruning with hessian-aware saliency

H Yang, H Yin, M Shen, P Molchanov… - Proceedings of the …, 2023 - openaccess.thecvf.com
Transformers yield state-of-the-art results across many tasks. However, their heuristically
designed architecture impose huge computational costs during inference. This work aims on …