A fast post-training pruning framework for transformers

W Kwon, S Kim, MW Mahoney… - Advances in …, 2022 - proceedings.neurips.cc
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
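
The visible snippet stops before the method. As a rough illustration of retraining-free Transformer pruning, the sketch below scores attention heads with a diagonal-Fisher proxy (squared gradient of a per-head gate on a small calibration batch) and keeps the most important ones. The toy model, gate mechanism, and batch are assumptions for illustration, not the paper's exact algorithm.

```python
# Minimal sketch (assumed setup): diagonal-Fisher head importance from one
# calibration batch, used to mask attention heads without retraining.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_heads, head_dim, seq, batch = 4, 16, 8, 32
d_model = num_heads * head_dim

qkv = nn.Linear(d_model, 3 * d_model)
out_proj = nn.Linear(d_model, d_model)
classifier = nn.Linear(d_model, 2)
gates = torch.ones(num_heads, requires_grad=True)  # one gate per head

x = torch.randn(batch, seq, d_model)   # stand-in calibration batch
y = torch.randint(0, 2, (batch,))

q, k, v = qkv(x).chunk(3, dim=-1)
def split(t):  # reshape to (batch, heads, seq, head_dim)
    return t.view(batch, seq, num_heads, head_dim).transpose(1, 2)
q, k, v = map(split, (q, k, v))
att = F.softmax(q @ k.transpose(-2, -1) / head_dim ** 0.5, dim=-1)
heads = (att @ v) * gates.view(1, num_heads, 1, 1)   # gate each head's output
merged = heads.transpose(1, 2).reshape(batch, seq, d_model)
loss = F.cross_entropy(classifier(out_proj(merged).mean(dim=1)), y)
loss.backward()

importance = gates.grad.pow(2)   # diagonal-Fisher proxy for head saliency
keep = importance.argsort(descending=True)[: num_heads // 2]
print("head importance:", importance.tolist())
print("kept heads:", sorted(keep.tolist()))
```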

Accurate post training quantization with small calibration sets

I Hubara, Y Nahshan, Y Hanani… - International …, 2021 - proceedings.mlr.press
Lately, post-training quantization methods have gained considerable attention, as they are
simple to use, and require only a small unlabeled calibration set. This small dataset cannot …

Structural pruning via latency-saliency knapsack

M Shen, H Yin, P Molchanov, L Mao… - Advances in Neural …, 2022 - proceedings.neurips.cc
Structural pruning can simplify network architecture and improve inference speed. We
propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a …
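
The snippet cuts off mid-sentence; per the title, HALP casts structural pruning as a knapsack over neuron groups: maximize total saliency subject to a latency budget. Below is a hedged toy version with made-up saliency and latency numbers, solved with the classic greedy density heuristic as a stand-in; the actual method uses hardware latency lookup tables and a more careful solver.

```python
# Toy latency-saliency knapsack (illustrative numbers, not from the paper):
# keep the neuron groups that maximize summed saliency under a latency budget.
groups = [  # (name, saliency score, latency cost in ms if kept)
    ("layer1.g0", 0.90, 1.2),
    ("layer1.g1", 0.10, 1.2),
    ("layer2.g0", 0.70, 0.8),
    ("layer2.g1", 0.40, 0.8),
    ("layer3.g0", 0.55, 2.0),
]
budget_ms = 3.0

kept, total_latency, total_saliency = [], 0.0, 0.0
# Greedy density heuristic: cheap, important groups are kept first.
for name, saliency, latency in sorted(groups, key=lambda g: g[1] / g[2], reverse=True):
    if total_latency + latency <= budget_ms:
        kept.append(name)
        total_latency += latency
        total_saliency += saliency

print(f"kept {kept} | latency {total_latency:.1f} ms | saliency {total_saliency:.2f}")
```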

Coaching a teachable student

J Zhang, Z Huang, E Ohn-Bar - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose a novel knowledge distillation framework for effectively teaching a sensorimotor
student agent to drive from the supervision of a privileged teacher agent. Current distillation …
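
The snippet describes a teacher-student setup but not its loss. As background only, here is the standard distillation objective such frameworks typically build on (Hinton-style temperature-softened KL plus task cross-entropy); the temperature, weighting, and stand-in logits are assumptions, not this paper's formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Generic KD objective: task cross-entropy plus temperature-softened
    KL between teacher and student distributions (scaled by T^2)."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kd

torch.manual_seed(0)
student = torch.randn(8, 5, requires_grad=True)  # stand-in student logits
teacher = torch.randn(8, 5)                      # stand-in privileged-teacher logits
labels = torch.randint(0, 5, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```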

Improving post training neural quantization: Layer-wise calibration and integer programming

I Hubara, Y Nahshan, Y Hanani, R Banner… - arXiv preprint arXiv …, 2020 - arxiv.org
Lately, post-training quantization methods have gained considerable attention, as they are
simple to use, and require only a small unlabeled calibration set. This small dataset cannot …
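
A minimal sketch of the layer-wise calibration idea named in the title: quantize one layer at a time and pick, per layer, the quantization scale that minimizes output error against the full-precision model on a small unlabeled calibration batch. The toy MLP, scale grid, and 8-bit setting are assumptions; the paper additionally allocates per-layer bit-widths with an integer program, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, num_bits, scale):
    """Uniform symmetric weight quantization at a given scale."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Toy two-layer MLP and a small unlabeled calibration batch (assumed).
W1, W2 = rng.normal(size=(64, 32)), rng.normal(size=(32, 10))
X = rng.normal(size=(128, 64))

def forward(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2

fp_out = forward(X, W1, W2)          # full-precision reference outputs
layers = [W1.copy(), W2.copy()]
for i in range(len(layers)):
    # Scan candidate scales for this layer; keep the one minimizing the MSE
    # of the whole network's output against the full-precision reference.
    best_scale, best_err = None, np.inf
    for mult in np.linspace(0.5, 1.5, 21):                 # scale grid (assumed)
        scale = mult * np.abs(layers[i]).max() / (2 ** 7 - 1)  # 8-bit symmetric
        trial = [w.copy() for w in layers]
        trial[i] = quantize(layers[i], 8, scale)
        err = np.mean((forward(X, *trial) - fp_out) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    layers[i] = quantize(layers[i], 8, best_scale)
    print(f"layer {i}: best scale {best_scale:.4f}, calibration MSE {best_err:.5f}")
```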

FP-AGL: Filter pruning with adaptive gradient learning for accelerating deep convolutional neural networks

NJ Kim, H Kim - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
Filter pruning is a technique that reduces computational complexity, inference time, and
memory footprint by removing unnecessary filters in convolutional neural networks (CNNs) …
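
The snippet explains what filter pruning does in general. For illustration, the sketch below ranks filters by the common L1-norm baseline criterion and builds a thinner layer from the survivors; this is generic filter pruning, not FP-AGL's adaptive-gradient criterion, which the visible text does not describe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

# Rank filters by the L1 norm of their weights (a standard baseline score).
l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # one score per filter
keep = l1.argsort(descending=True)[: conv.out_channels // 2]

# Build a thinner conv layer containing only the kept filters.
pruned = nn.Conv2d(3, len(keep), kernel_size=3)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])
    pruned.bias.copy_(conv.bias[keep])

x = torch.randn(1, 3, 32, 32)
print("original:", tuple(conv(x).shape), "pruned:", tuple(pruned(x).shape))
```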

SPDY: Accurate pruning with speedup guarantees

E Frantar, D Alistarh - International Conference on Machine …, 2022 - proceedings.mlr.press
The recent focus on the efficiency of deep neural networks (DNNs) has led to significant
work on model compression approaches, of which weight pruning is one of the most …
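
The snippet ends before the method; per the title, SPDY searches for per-layer sparsity profiles that meet a target speedup. A hedged toy version: given assumed per-layer timing and accuracy-loss tables, choose one sparsity level per layer to satisfy the speedup target while minimizing total loss, here by brute force over a tiny space (SPDY itself uses dynamic programming plus local search over measured profiles).

```python
from itertools import product

# Per-layer options: (sparsity, runtime in ms, calibration loss increase).
# All numbers are made up; SPDY derives them from real timings and data.
layer_options = [
    [(0.0, 4.0, 0.00), (0.5, 2.6, 0.02), (0.75, 1.8, 0.08)],
    [(0.0, 6.0, 0.00), (0.5, 3.8, 0.01), (0.75, 2.5, 0.05)],
    [(0.0, 2.0, 0.00), (0.5, 1.4, 0.03), (0.75, 1.0, 0.10)],
]
dense_time = sum(opts[0][1] for opts in layer_options)
target_speedup = 1.8

best = None
for profile in product(*layer_options):   # brute force: 3^3 candidate profiles
    time = sum(opt[1] for opt in profile)
    loss = sum(opt[2] for opt in profile)
    if dense_time / time >= target_speedup and (best is None or loss < best[1]):
        best = (profile, loss, dense_time / time)

profile, loss, speedup = best
print("per-layer sparsities:", [opt[0] for opt in profile])
print(f"estimated speedup {speedup:.2f}x, total loss proxy {loss:.3f}")
```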

HardCoRe-NAS: Hard constrained differentiable neural architecture search

N Nayman, Y Aflalo, A Noy… - … Conference on Machine …, 2021 - proceedings.mlr.press
Realistic use of neural networks often requires adhering to multiple constraints on latency,
energy and memory among others. A popular approach to find fitting networks is through …

Enhanced sparsification via stimulative training

S Tang, W Lin, H Ye, P Ye, C Yu, B Li… - European Conference on …, 2024 - Springer
Sparsification-based pruning has been an important category in model compression.
Existing methods commonly set sparsity-inducing penalty terms to suppress the importance …
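
The snippet refers to the conventional practice of sparsity-inducing penalty terms. A minimal sketch of that baseline, an L1 penalty on BatchNorm scale factors in the style of network slimming, is below; it illustrates the approach existing methods use, not this paper's stimulative-training alternative. The penalty strength and toy model are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
l1_lambda = 1e-2   # penalty strength (assumed)

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for _ in range(20):
    opt.zero_grad()
    task_loss = nn.functional.cross_entropy(model(x), y)
    # Sparsity-inducing penalty: L1 on BatchNorm scale factors pushes
    # unimportant channels' gammas toward zero so they can be pruned.
    l1 = sum(m.weight.abs().sum() for m in model.modules()
             if isinstance(m, nn.BatchNorm2d))
    (task_loss + l1_lambda * l1).backward()
    opt.step()

gammas = model[1].weight.detach().abs()
print("channels with |gamma| < 0.05:", int((gammas < 0.05).sum()))
```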

What can we learn from the selective prediction and uncertainty estimation performance of 523 ImageNet classifiers

I Galil, M Dabbah, R El-Yaniv - arXiv preprint arXiv:2302.11874, 2023 - arxiv.org
When deployed for risk-sensitive tasks, deep neural networks must include an uncertainty
estimation mechanism. Here we examine the relationship between deep architectures and …
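
A minimal sketch of the selective-prediction mechanism such studies evaluate: threshold the softmax-response confidence to abstain on uncertain inputs, then report the risk-coverage trade-off (error on accepted examples vs. fraction accepted). The synthetic logits and thresholds are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, classes = 1000, 10

# Synthetic stand-ins for classifier outputs, with injected label signal so
# that higher confidence correlates with correctness (assumed data).
logits = rng.normal(size=(n, classes))
labels = rng.integers(0, classes, size=n)
logits[np.arange(n), labels] += 2.0 * rng.random(n)

probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
confidence = probs.max(axis=1)          # softmax-response (SR) score
correct = probs.argmax(axis=1) == labels

# Selective prediction: abstain below a confidence threshold tau and
# measure the induced risk-coverage trade-off.
for tau in [0.0, 0.2, 0.3, 0.4]:
    accept = confidence >= tau
    coverage = accept.mean()
    risk = 1.0 - correct[accept].mean() if accept.any() else 0.0
    print(f"tau={tau:.2f}  coverage={coverage:.2f}  selective risk={risk:.3f}")
```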