Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Enabling all in-edge deep learning: A literature review

P Joshi, M Hasanuzzaman, C Thapa, H Afli… - IEEE Access, 2023 - ieeexplore.ieee.org
In recent years, deep learning (DL) models have demonstrated remarkable achievements
on non-trivial tasks such as speech recognition, image processing, and natural language …

A survey of model compression strategies for object detection

Z Lyu, T Yu, F Pan, Y Zhang, J Luo, D Zhang… - Multimedia tools and …, 2024 - Springer
Deep neural networks (DNNs) have achieved great success in many object detection tasks.
However, such DNN-based large object detection models are generally computationally …

Distilling global and local logits with densely connected relations

Y Kim, J Park, YH Jang, M Ali… - Proceedings of the …, 2021 - openaccess.thecvf.com
In prevalent knowledge distillation, logits in most image recognition models are computed by
global average pooling, then used to learn to encode the high-level and task-relevant …
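
For orientation only: the pipeline the snippet refers to pools convolutional features into logits, which are then distilled with a temperature-softened KL term. Below is a minimal sketch under those assumptions, not the paper's densely connected global/local relation scheme; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def gap_logits(feature_map: torch.Tensor, classifier: torch.nn.Linear) -> torch.Tensor:
    """Global average pooling over spatial dims, then a linear classifier -> logits."""
    pooled = feature_map.mean(dim=(2, 3))          # (N, C, H, W) -> (N, C)
    return classifier(pooled)                      # (N, num_classes)

def logit_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # T^2 rescaling keeps gradient magnitudes comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

In practice this soft-label term is usually weighted against an ordinary cross-entropy loss on ground-truth labels.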

A Survey on Knowledge Distillation: Recent Advancements

A Moslemi, A Briskina, Z Dang, J Li - Machine Learning with Applications, 2024 - Elsevier
Deep learning has achieved notable success across academia, medicine, and industry. Its
ability to identify complex patterns in large-scale data and to manage millions of parameters …

Collaborative multi-teacher knowledge distillation for learning low bit-width deep neural networks

C Pham, T Hoang, TT Do - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Knowledge distillation, which learns a lightweight student model by distilling
knowledge from a cumbersome teacher model, is an attractive approach for learning …
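
As a hedged illustration of the teacher-student setup the abstract names, the sketch below averages several teachers' softened predictions and blends the distillation term with the hard-label loss. It is a generic multi-teacher objective, not the paper's collaborative low-bit-width method, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=4.0, alpha=0.7):
    """Hard-label cross-entropy blended with KL against the averaged teacher distribution."""
    # Average the teachers' softened distributions (uniform weights for simplicity).
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        teacher_probs, reduction="batchmean",
    ) * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```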

Conditional pseudo-supervised contrast for data-free knowledge distillation

R Shao, W Zhang, J Wang - Pattern Recognition, 2023 - Elsevier
Data-free knowledge distillation (DFKD) is an effective way to address model compression
and transmission restrictions while retaining privacy protection, and it has attracted …
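
Since the snippet only names DFKD, here is a hedged sketch of the usual data-free loop: a generator synthesizes pseudo-inputs from noise and the student is trained to match a frozen teacher on them. This is a generic loop under those assumptions, not the paper's conditional pseudo-supervised contrast; the generator, teacher, student, and optimizer arguments are illustrative.

```python
import torch
import torch.nn.functional as F

def dfkd_student_step(generator, teacher, student, optimizer,
                      batch_size=64, latent_dim=100, temperature=4.0):
    """One data-free distillation step: synthesize inputs, match teacher outputs."""
    z = torch.randn(batch_size, latent_dim)
    fake_images = generator(z)                       # pseudo-samples, no real data used
    with torch.no_grad():
        teacher_logits = teacher(fake_images)        # frozen teacher provides supervision
    student_logits = student(fake_images)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()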

Quantized feature distillation for network quantization

K Zhu, YY He, J Wu - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Neural network quantization aims to accelerate and trim full-precision neural network
models by using low-bit approximations. Methods adopting the quantization-aware training …
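
To make "low-bit approximations" concrete, here is a minimal sketch of uniform symmetric fake quantization with a straight-through estimator, the building block most quantization-aware training methods share; it is an assumption-laden illustration, not the paper's quantized feature distillation.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake quantization with a straight-through estimator (STE)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # STE: forward uses the quantized value, backward passes gradients through unchanged.
    return x + (x_q - x).detach()
```

During QAT such a function would typically wrap both weights and activations inside each layer's forward pass.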

FBI-LLM: Scaling up fully binarized LLMs from scratch via autoregressive distillation

L Ma, M Sun, Z Shen - arXiv preprint arXiv:2407.07093, 2024 - arxiv.org
This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for
the first time how to train a large-scale binary language model from scratch (not the partial …
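
As a rough illustration of the two ingredients the title names, the sketch below binarizes a weight matrix to {-1, +1} with a per-row scale (straight-through gradients) and matches the teacher's next-token distribution at every sequence position. It is a generic sketch under those assumptions, not FBI-LLM's actual training recipe.

```python
import torch
import torch.nn.functional as F

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Binarize a weight matrix to {-1, +1} with a per-row scale; STE for gradients."""
    scale = w.detach().abs().mean(dim=1, keepdim=True)   # per-output-row scaling factor
    w_bin = torch.sign(w) * scale
    return w + (w_bin - w).detach()                      # straight-through estimator

def autoregressive_distillation_loss(student_logits, teacher_logits):
    """Match the teacher's next-token distribution; softmax is over the vocabulary dim."""
    # logits: (batch, seq_len, vocab)
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
```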

Self-Supervised Quantization-Aware Knowledge Distillation

K Zhao, M Zhao - arXiv preprint arXiv:2403.11106, 2024 - arxiv.org
Quantization-aware training (QAT) and Knowledge Distillation (KD) are combined to achieve
competitive performance in creating low-bit deep learning models. However, existing works …
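
Reading the abstract's combination literally, here is a hedged sketch of one training step in which the student's forward pass is assumed to apply fake quantization internally (as in the quantization sketch above) and the loss blends hard labels with teacher guidance. This is a generic QAT-plus-KD step, not the paper's self-supervised variant, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def qat_kd_step(images, labels, quantized_student, teacher, optimizer,
                temperature=4.0, alpha=0.5):
    """One combined QAT + KD training step for a fake-quantized student."""
    with torch.no_grad():
        teacher_logits = teacher(images)             # full-precision teacher
    student_logits = quantized_student(images)       # forward pass uses fake-quantized ops
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    loss = alpha * hard + (1.0 - alpha) * soft
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```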