Categories of response-based, feature-based, and relation-based knowledge distillation

C Yang, X Yu, Z An, Y Xu - … Distillation: Towards New Horizons of Intelligent …, 2023 - Springer
Deep neural networks have achieved remarkable performance on artificial intelligence
tasks. The success behind intelligent systems often relies on large-scale models with high …

Simple Unsupervised Knowledge Distillation With Space Similarity

A Singh, H Wang - European Conference on Computer Vision, 2024 - Springer
According to recent studies, self-supervised learning (SSL) does not readily extend to smaller
architectures. One direction to mitigate this shortcoming while simultaneously training a …

Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance

S Leelaluk, C Tang, V Švábenský… - arXiv preprint arXiv …, 2024 - arxiv.org
Educational data mining (EDM) is a part of applied computing that focuses on automatically
analyzing data from learning contexts. Early prediction for identifying at-risk students is a …

KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer

K Zhao, N Ukita - arXiv preprint arXiv:2302.11208, 2023 - arxiv.org
Scaled dot-product attention applies a softmax function to the scaled dot product of queries
and keys to calculate weights, and then multiplies the weights by the values. In this work, we …
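
A minimal sketch of that computation for a single attention head, written with NumPy; the array shapes and the function name are illustrative assumptions, not code from the KS-DETR paper:

import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Scale the dot product of queries and keys by sqrt(d_k).
    d_k = queries.shape[-1]
    scores = queries @ keys.swapaxes(-2, -1) / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Multiply the weights by the values (a weighted sum over the keys).
    return weights @ values

# Example: 4 queries attending over 6 key/value pairs of dimension 8.
q, k, v = np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)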

Exemplar-Free Continual Learning in Vision Transformers via Feature Attention Distillation

X Dai, J Cheng, Z Wei, B Du - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
In this paper, we propose a new approach for continual learning based on Vision
Transformers (ViTs). The purpose of continual learning is to address the catastrophic …