From knowledge distillation to self-knowledge distillation: A unified approach with normalized loss and customized soft labels

Z Yang, A Zeng, Z Li, T Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract: Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to
guide the student, while self-KD does not need a real teacher to obtain the soft labels. This …
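
For reference, the conventional KD objective alluded to here (temperature-softened teacher logits serving as soft labels, mixed with the hard-label cross-entropy) can be sketched as follows; the temperature T, mixing weight alpha, and function name are illustrative assumptions and do not reproduce the paper's normalized loss or customized soft labels.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Hinton-style KD sketch: KL divergence between temperature-softened
    teacher and student distributions, mixed with cross-entropy on hard labels."""
    soft_labels = F.softmax(teacher_logits / T, dim=1)      # teacher's soft labels
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, soft_labels, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, targets)           # hard-label term
    return alpha * kd + (1.0 - alpha) * ce
```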

C2kd: Bridging the modality gap for cross-modal knowledge distillation

F Huo, W Xu, J Guo, H Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Abstract: Existing Knowledge Distillation (KD) methods typically focus on transferring
knowledge from a large-capacity teacher to a low-capacity student model, achieving …

Tolerant self-distillation for image classification

M Liu, Y Yu, Z Ji, J Han, Z Zhang - Neural Networks, 2024 - Elsevier
Deep neural networks tend to suffer from overfitting when the training data are insufficient.
In this paper, we introduce two metrics from the intra-class distribution of correct …

Neighbor self-knowledge distillation

P Liang, W Zhang, J Wang, Y Guo - Information Sciences, 2024 - Elsevier
Abstract: Self-Knowledge Distillation (Self-KD), a technique that enables neural networks to
learn from themselves, often relies on auxiliary modules or networks to generate supervisory …

Task-specific parameter decoupling for class incremental learning

R Chen, XY Jing, F Wu, W Zheng, Y Hao - Information Sciences, 2023 - Elsevier
Class incremental learning (CIL) enables deep networks to progressively learn new tasks
while remembering previously learned knowledge. A popular design for CIL involves …

Aligned objective for soft-pseudo-label generation in supervised learning

N Xu, Y Hu, C Qiao, X Geng - Forty-first International Conference on …, 2024 - openreview.net
Soft pseudo-labels, generated by the softmax predictions of the trained networks, offer a
probabilistic rather than binary form, and have been shown to improve the performance of …
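
As a rough illustration of the setup this abstract describes, soft pseudo-labels are the softmax outputs of a trained network, used as probabilistic training targets in place of one-hot labels; a minimal sketch is given below. The temperature and function names are assumptions, and the paper's aligned objective itself is not reproduced here.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def soft_pseudo_labels(model, inputs, T=2.0):
    """Soft pseudo-labels: softmax of a trained network's logits,
    optionally softened with a temperature (T is an illustrative knob)."""
    model.eval()
    return F.softmax(model(inputs) / T, dim=1)

def soft_label_loss(logits, soft_labels):
    """Cross-entropy against probabilistic targets instead of one-hot labels."""
    return -(soft_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```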

Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks

L Zuo, Y Ding, M Jing, K Yang, Y Yu - arXiv preprint arXiv:2406.07862, 2024 - arxiv.org
Spiking neural networks (SNNs) have attracted considerable attention for their event-driven,
low-power characteristics and high biological interpretability. Inspired by knowledge …

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

J Lv, H Yang, P Li - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Since the pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler
Divergence (KL-Div) has been predominant, and recently its variants have achieved …
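
To make the comparison concrete, the two distillation terms can be contrasted as below. The Wasserstein variant here uses the closed-form 1-D Wasserstein-1 distance (the L1 gap between CDFs), which presumes a fixed ordering of the classes; this is a simplifying assumption for illustration, not necessarily the paper's formulation.

```python
import torch.nn.functional as F

def kl_distill(student_logits, teacher_logits, T=4.0):
    """Standard KL-divergence distillation term."""
    p = F.softmax(teacher_logits / T, dim=1)
    log_q = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_q, p, reduction="batchmean") * (T * T)

def w1_distill(student_logits, teacher_logits, T=4.0):
    """Wasserstein-1 between the two categorical distributions, computed as
    the L1 distance between their CDFs under an assumed 1-D class ordering."""
    p = F.softmax(teacher_logits / T, dim=1)
    q = F.softmax(student_logits / T, dim=1)
    return (p.cumsum(dim=1) - q.cumsum(dim=1)).abs().sum(dim=1).mean()
```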

Self-knowledge distillation based on knowledge transfer from soft to hard examples

Y Tang, Y Chen, L Xie - Image and Vision Computing, 2023 - Elsevier
To fully exploit knowledge from a self-knowledge distillation network, in which a student model
is progressively trained to distill its own knowledge without a pre-trained teacher model, a …
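
A generic self-KD step of this kind can be sketched as follows, with a frozen snapshot of the same network (for example, from the previous epoch) supplying the soft labels; the names, temperature, and weighting are assumptions, and the paper's soft-to-hard example transfer is not reproduced here.

```python
import copy
import torch
import torch.nn.functional as F

def self_kd_loss(model, snapshot, inputs, targets, T=3.0, alpha=0.3):
    """Self-KD without a pre-trained teacher: a frozen copy of the same
    network provides the soft labels for the current update."""
    with torch.no_grad():
        soft_labels = F.softmax(snapshot(inputs) / T, dim=1)
    logits = model(inputs)
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft_labels,
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(logits, targets)
    return alpha * kd + (1.0 - alpha) * ce

# The snapshot can be refreshed at epoch boundaries, e.g.:
# snapshot = copy.deepcopy(model).eval()
```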

AI-KD: Adversarial learning and Implicit regularization for self-Knowledge Distillation

H Kim, S Suh, S Baek, D Kim, D Jeong, H Cho… - Knowledge-Based …, 2024 - Elsevier
We present a novel adversarial penalized self-knowledge distillation method, named
adversarial learning and implicit regularization for self-knowledge distillation (AI-KD), which …