Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021 - ieeexplore.ieee.org
Deep neural models, in recent years, have been successful in almost every field, even
solving the most complex problem statements. However, these models are huge in size with …

TinyML meets IoT: A comprehensive survey

L Dutta, S Bharali - Internet of Things, 2021 - Elsevier
The rapid growth in miniaturization of low-power embedded devices and advancement in
the optimization of machine learning (ML) algorithms have opened up a new prospect of the …

Decoupled knowledge distillation

B Zhao, Q Cui, R Song, Y Qiu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …
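
The snippet contrasts feature-level distillation with logit distillation. As a reference point, below is a minimal sketch of classic logit (soft-target) distillation in PyTorch; the temperature, weighting, and function name are illustrative assumptions, and this is the standard baseline rather than the decoupled formulation the paper proposes.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, labels,
                            temperature=4.0, alpha=0.5):
    """Classic soft-target (logit) distillation: KL divergence between
    softened teacher and student distributions, mixed with the usual
    cross-entropy on the ground-truth labels."""
    # Softened distributions; temperature > 1 exposes the teacher's
    # relative confidences over non-target classes.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```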

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Self-training with noisy student improves ImageNet classification

Q **e, MT Luong, E Hovy… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …
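
The self-training recipe behind this result alternates between a teacher that pseudo-labels unlabeled images and a noised student trained on labeled plus pseudo-labeled data. The sketch below covers one such round under that reading; the models, data loaders, and soft-label loss are placeholders, and the paper's specific noise (strong augmentation, stochastic depth) and iteration schedule are omitted.

```python
import torch
import torch.nn.functional as F

def noisy_student_round(teacher, student, labeled_loader, unlabeled_loader,
                        optimizer, device="cpu"):
    """One round of teacher-to-student self-training: the (un-noised)
    teacher pseudo-labels unlabeled images, and the student is trained
    on labeled data plus soft pseudo-labels. Loaders are placeholders:
    labeled batches yield (x, y), unlabeled batches yield x."""
    teacher.eval()
    student.train()
    for (x_l, y_l), x_u in zip(labeled_loader, unlabeled_loader):
        x_l, y_l, x_u = x_l.to(device), y_l.to(device), x_u.to(device)
        with torch.no_grad():
            pseudo = teacher(x_u).softmax(dim=1)   # soft pseudo-labels
        loss = F.cross_entropy(student(x_l), y_l)
        # Soft-label cross-entropy on the pseudo-labeled batch.
        log_p = F.log_softmax(student(x_u), dim=1)
        loss = loss - (pseudo * log_p).sum(dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```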

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

Adversarial examples are not bugs, they are features

A Ilyas, S Santurkar, D Tsipras… - Advances in neural …, 2019 - proceedings.neurips.cc
Adversarial examples have attracted significant attention in machine learning, but the
reasons for their existence and pervasiveness remain unclear. We demonstrate that …
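
As a generic illustration of the object under study (not the paper's analysis), the sketch below crafts an adversarial example with the standard fast gradient sign method against an arbitrary differentiable classifier; the model, input range, and epsilon are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=8 / 255):
    """Fast Gradient Sign Method: perturb x a small step in the direction
    that increases the loss, yielding a visually similar input that the
    model is likely to misclassify. Assumes inputs scaled to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```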

R-Drop: Regularized dropout for neural networks

L Wu, J Li, Y Wang, Q Meng, T Qin… - Advances in …, 2021 - proceedings.neurips.cc
Dropout is a powerful and widely used technique to regularize the training of deep neural
networks. Though effective and performing well, the randomness introduced by dropout …
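
R-Drop, as proposed in the paper, regularizes this randomness by passing each batch through the model twice (two independent dropout masks) and penalizing the disagreement between the two predicted distributions with a symmetric KL term. A minimal sketch under that reading, with a placeholder model and weight:

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    """Two stochastic forward passes of the same batch (dropout active),
    a cross-entropy term averaged over both, and a symmetric KL term
    that pushes the two output distributions to agree. `alpha` is an
    illustrative weight."""
    logits1, logits2 = model(x), model(x)       # different dropout masks
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))
    p1 = F.log_softmax(logits1, dim=1)
    p2 = F.log_softmax(logits2, dim=1)
    kl = 0.5 * (F.kl_div(p1, p2, reduction="batchmean", log_target=True)
                + F.kl_div(p2, p1, reduction="batchmean", log_target=True))
    return ce + alpha * kl
```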

Rethinking few-shot image classification: a good embedding is all you need?

Y Tian, Y Wang, D Krishnan, JB Tenenbaum… - Computer Vision–ECCV …, 2020 - Springer
The focus of recent meta-learning research has been on the development of learning
algorithms that can quickly adapt to test time tasks with limited data and low computational …
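
The paper's point, as the title suggests, is that a strong pre-trained embedding followed by a simple linear classifier fitted on the few support examples is a very competitive few-shot baseline. A minimal sketch of that evaluation protocol is below; the frozen embedding network and the L2-normalization choice are placeholders for illustration.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def few_shot_eval(embed_net, support_x, support_y, query_x):
    """Few-shot classification as 'embedding + linear probe': embed the
    support set with a frozen feature extractor trained on the base
    classes, fit a logistic regression on it, classify the query set."""
    embed_net.eval()
    with torch.no_grad():
        z_support = embed_net(support_x).cpu().numpy()
        z_query = embed_net(query_x).cpu().numpy()
    # L2-normalizing features is a common choice before the linear probe.
    z_support /= np.linalg.norm(z_support, axis=1, keepdims=True)
    z_query /= np.linalg.norm(z_query, axis=1, keepdims=True)
    y_s = support_y.cpu().numpy() if torch.is_tensor(support_y) else np.asarray(support_y)
    clf = LogisticRegression(max_iter=1000).fit(z_support, y_s)
    return clf.predict(z_query)
```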

Relational knowledge distillation

W Park, D Kim, Y Lu, M Cho - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Knowledge distillation aims at transferring knowledge acquired in one model (a
teacher) to another model (a student) that is typically smaller. Previous approaches can be …
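
Where classic distillation matches each example's output individually, relational KD transfers relations among examples in a batch, such as pairwise distances between embeddings. The sketch below shows a distance-wise term of that kind; the mean-distance normalization and Huber loss follow common implementations and are stated here as assumptions rather than a verbatim reproduction of the paper's losses.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(z):
    """Euclidean distance matrix for a batch of embeddings of shape (B, D)."""
    return torch.cdist(z, z, p=2)

def rkd_distance_loss(student_emb, teacher_emb):
    """Distance-wise relational KD: match the structure of pairwise
    distances (normalized by their mean nonzero value) between the
    teacher's and the student's embedding spaces."""
    with torch.no_grad():
        d_t = pairwise_distances(teacher_emb)
        d_t = d_t / (d_t[d_t > 0].mean() + 1e-8)
    d_s = pairwise_distances(student_emb)
    d_s = d_s / (d_s[d_s > 0].mean() + 1e-8)
    # Smooth L1 (Huber) keeps outlier pairs from dominating the loss.
    return F.smooth_l1_loss(d_s, d_t)
```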