Model compression for deep neural networks: A survey

Z Li, H Li, L Meng - Computers, 2023 - mdpi.com
With the rapid development of deep learning, deep neural networks (DNNs) have been widely applied to various computer vision tasks. However, in the pursuit of …

Distilling knowledge via knowledge review

P Chen, S Liu, H Zhao, J Jia - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network. Previous …
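
For context across the entries below, the logit-based objective that these methods build on or depart from can be sketched as follows. This is a minimal PyTorch sketch of the classic soft-label KD loss, not the cross-stage feature "review" mechanism this particular paper proposes; the temperature T and weight alpha are illustrative values.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic soft-label distillation: KL divergence between temperature-
    softened teacher and student distributions, mixed with the usual
    cross-entropy on the ground-truth labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # The T**2 factor keeps the soft-label gradients on the same scale as CE.
    soft = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```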

Logit standardization in knowledge distillation

S Sun, W Ren, J Li, R Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Knowledge distillation involves transferring soft labels from a teacher to a student using a shared temperature-based softmax function. However, the assumption of a shared …
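
The remedy proposed here is to standardize the logits before the softened softmax, so teacher and student need not share a logit scale. A minimal sketch of that z-score idea (the exact weighting used in the paper may differ):

```python
import torch.nn.functional as F

def zscore(logits, eps=1e-7):
    """Per-sample z-score: zero mean, unit variance across classes."""
    return (logits - logits.mean(dim=1, keepdim=True)) / (logits.std(dim=1, keepdim=True) + eps)

def standardized_kd_loss(student_logits, teacher_logits, T=2.0):
    """KD on z-scored logits: the temperature acts as a global scale rather
    than a quantity the student must share with the teacher."""
    p_t = F.softmax(zscore(teacher_logits) / T, dim=1)
    log_p_s = F.log_softmax(zscore(student_logits) / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)
```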

Decoupled knowledge distillation

B Zhao, Q Cui, R Song, Y Qiu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …
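
The decoupling splits the classic logit-KD term into a target-class part (TCKD) and a non-target-class part (NCKD) that can be weighted independently. A hedged sketch, with alpha, beta, and T as illustrative hyper-parameters:

```python
import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: TCKD matches the binary target/non-target split,
    NCKD matches the distribution over the non-target classes only."""
    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    mask = F.one_hot(target, num_classes=p_s.size(1)).bool()

    # TCKD: 2-way distributions (probability of the target class vs. the rest)
    b_s = torch.stack([p_s[mask], 1.0 - p_s[mask]], dim=1)
    b_t = torch.stack([p_t[mask], 1.0 - p_t[mask]], dim=1)
    tckd = F.kl_div(b_s.clamp_min(1e-7).log(), b_t, reduction="batchmean")

    # NCKD: re-normalised distributions over non-target classes
    log_ps_nt = F.log_softmax(student_logits / T - 1000.0 * mask, dim=1)
    p_t_nt = F.softmax(teacher_logits / T - 1000.0 * mask, dim=1)
    nckd = F.kl_div(log_ps_nt, p_t_nt, reduction="batchmean")

    return (alpha * tckd + beta * nckd) * (T ** 2)
```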

Anomaly detection via reverse distillation from one-class embedding

H Deng, X Li - Proceedings of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Knowledge distillation (KD) achieves promising results on the challenging problem of unsupervised anomaly detection (AD). The representation discrepancy of anomalies in …
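
The setup is inverted relative to the usual KD pipeline: a frozen teacher encodes the image, a one-class bottleneck embedding compresses the features, and a student decoder reconstructs them; anomalies surface where reconstruction fails. A sketch of the per-pixel anomaly score from multi-scale feature discrepancy (the encoder, bottleneck, and decoder modules are assumed to exist elsewhere):

```python
import torch
import torch.nn.functional as F

def anomaly_map(teacher_feats, student_feats, out_size=(256, 256)):
    """Accumulate 1 - cosine similarity between teacher-encoder and
    student-decoder features at each scale into a per-pixel anomaly map."""
    b = teacher_feats[0].size(0)
    amap = torch.zeros(b, 1, *out_size, device=teacher_feats[0].device)
    for t, s in zip(teacher_feats, student_feats):
        d = 1.0 - F.cosine_similarity(t, s, dim=1)           # (B, H, W)
        d = F.interpolate(d.unsqueeze(1), size=out_size,
                          mode="bilinear", align_corners=False)
        amap = amap + d                                       # higher = more anomalous
    return amap
```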

Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc
Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong and competitive as state-of-the-art …
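
With a very strong teacher, exactly matching its outputs can hurt the student; the relaxation here replaces exact KL matching with correlation-based relation matching. A sketch of Pearson-correlation losses over inter-class relations (per sample) and intra-class relations (per class, across the batch), with beta and gamma as illustrative weights:

```python
import torch.nn.functional as F

def pearson_distance(a, b, eps=1e-8):
    """1 - Pearson correlation along the last dim: invariant to shifting and
    scaling of the predictions, unlike an exact KL match."""
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    a = F.normalize(a, dim=-1, eps=eps)
    b = F.normalize(b, dim=-1, eps=eps)
    return 1.0 - (a * b).sum(dim=-1).mean()

def relation_kd_loss(student_logits, teacher_logits, T=1.0, beta=1.0, gamma=1.0):
    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    inter = pearson_distance(p_s, p_t)           # each sample's class profile
    intra = pearson_distance(p_s.t(), p_t.t())   # each class across the batch
    return beta * inter + gamma * intra
```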

Curriculum temperature for knowledge distillation

Z Li, X Li, L Yang, B Zhao, R Song, L Luo, J Li… - Proceedings of the …, 2023 - ojs.aaai.org
Most existing distillation methods ignore the flexible role of the temperature in the loss
function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In …
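
Instead of grid-searching, the temperature is learned: a gradient-reversal layer trains it adversarially (to increase the distillation loss) while the student minimizes it, and the reversal strength can be ramped up over training as a curriculum. A hedged sketch; the clamp range and the single scalar (rather than instance-wise) temperature are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class LearnableTemperature(nn.Module):
    """A scalar temperature trained adversarially against the KD loss; lam is
    ramped from 0 toward 1 over training to act as a curriculum."""
    def __init__(self, init_t=4.0):
        super().__init__()
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, lam):
        return GradReverse.apply(self.t, lam).clamp(min=1.0)

def curriculum_kd_loss(student_logits, teacher_logits, temp_module, lam):
    T = temp_module(lam)
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T.detach() ** 2)
```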

Point-to-voxel knowledge distillation for lidar semantic segmentation

Y Hou, X Zhu, Y Ma, CC Loy… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
This article addresses the problem of distilling knowledge from a large teacher model to a
slim student network for LiDAR semantic segmentation. Directly employing previous …
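
The point-to-voxel idea distills at two granularities: per-point predictions and predictions pooled into voxels, so sparse, noisy point-level supervision is complemented by more stable voxel-level statistics. A loose sketch under the assumption that each point carries a precomputed voxel index; the paper's actual formulation (e.g. its affinity terms) is more involved:

```python
import torch
import torch.nn.functional as F

def point_to_voxel_kd(student_logits, teacher_logits, voxel_idx, num_voxels, T=1.0):
    """Per-point KL plus KL between per-voxel mean class distributions."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    p_s = F.softmax(student_logits / T, dim=1)
    point_kd = F.kl_div(p_s.clamp_min(1e-7).log(), p_t, reduction="batchmean")

    def voxel_mean(p):
        # scatter-mean of point probabilities into their voxels
        num_classes = p.size(1)
        acc = torch.zeros(num_voxels, num_classes, device=p.device).index_add_(0, voxel_idx, p)
        cnt = torch.zeros(num_voxels, device=p.device).index_add_(
            0, voxel_idx, torch.ones(voxel_idx.size(0), device=p.device))
        return acc / cnt.clamp_min(1.0).unsqueeze(1)

    v_s, v_t = voxel_mean(p_s), voxel_mean(p_t)
    voxel_kd = F.kl_div(v_s.clamp_min(1e-7).log(), v_t, reduction="batchmean")
    return point_kd + voxel_kd
```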

Multi-level logit distillation

Y Jin, J Wang, D Lin - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) aims at distilling the knowledge from the large teacher model to a lightweight student model. Mainstream KD methods can be divided into two …
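
The "multi-level" here refers to aligning teacher and student predictions at instance, batch, and class granularity, typically over several temperatures. A hedged sketch using KL at the instance level and Gram-matrix matching at the batch and class levels; the exact alignment terms in the paper may differ:

```python
import torch.nn.functional as F

def multi_level_logit_kd(student_logits, teacher_logits, temps=(2.0, 3.0, 4.0)):
    """Instance-level KL plus batch-level (sample-sample) and class-level
    (class-class) prediction-correlation matching, averaged over temperatures."""
    loss = 0.0
    for T in temps:
        p_s = F.softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits / T, dim=1)
        loss = loss + F.kl_div(p_s.clamp_min(1e-7).log(), p_t,
                               reduction="batchmean") * (T ** 2)    # instance level
        loss = loss + F.mse_loss(p_s @ p_s.t(), p_t @ p_t.t())      # batch level
        loss = loss + F.mse_loss(p_s.t() @ p_s, p_t.t() @ p_t)      # class level
    return loss / len(temps)
```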

Knowledge distillation with the reused teacher classifier

D Chen, JP Mei, H Zhang, C Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose …
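
The reuse works by discarding the student's own classification head: a small projector maps student features into the teacher's feature space, an L2 loss aligns them, and the frozen teacher classifier produces the logits at inference. A sketch assuming flattened (post-pooling) features and a hypothetical MLP projector; the paper uses a convolutional projector on pre-pooling features:

```python
import torch.nn as nn
import torch.nn.functional as F

class ReusedClassifierStudent(nn.Module):
    """Student backbone + projector into the teacher's feature space; the
    teacher's classifier is frozen and reused instead of training a new head."""
    def __init__(self, student_backbone, teacher_classifier, s_dim, t_dim):
        super().__init__()
        self.backbone = student_backbone
        self.projector = nn.Sequential(
            nn.Linear(s_dim, t_dim), nn.ReLU(inplace=True), nn.Linear(t_dim, t_dim))
        self.classifier = teacher_classifier
        for p in self.classifier.parameters():
            p.requires_grad = False

    def forward(self, x, teacher_feat=None):
        f_s = self.projector(self.backbone(x))
        logits = self.classifier(f_s)
        if teacher_feat is None:                       # inference
            return logits
        return logits, F.mse_loss(f_s, teacher_feat)   # training: feature-matching loss
```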