DistillSleepNet: Heterogeneous multi-level knowledge distillation via teacher assistant for sleep staging

Z Jia, H Liang, Y Liu, H Wang… - IEEE Transactions on Big …, 2024 - ieeexplore.ieee.org
Accurate sleep staging is crucial for the diagnosis of diseases such as sleep disorders.
Existing sleep staging models with excellent performance are usually large and require a lot …
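
The title refers to distillation through a teacher assistant, i.e. an intermediate-capacity model that bridges the gap between a large teacher and a small student. A minimal sketch of that generic two-step idea is below; the model sizes, the single-step soft/hard loss, and the 5-class setup are illustrative assumptions, not the multi-level scheme used in DistillSleepNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(teacher, student, x, labels, T=4.0, alpha=0.7):
    """One generic distillation step: soft targets from a frozen teacher plus hard labels."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    soft = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(s_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Hypothetical models of decreasing capacity for a 5-class staging problem.
teacher   = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 5))
assistant = nn.Sequential(nn.Linear(128, 64),  nn.ReLU(), nn.Linear(64, 5))
student   = nn.Linear(128, 5)

x, y = torch.randn(32, 128), torch.randint(0, 5, (32,))
loss_ta      = distill_step(teacher,   assistant, x, y)  # step 1: teacher -> assistant
loss_student = distill_step(assistant, student,   x, y)  # step 2: assistant -> student
```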

MJPNet-S*: Multistyle Joint-Perception Network with Knowledge Distillation for Drone RGB-Thermal Crowd Density Estimation in Smart Cities

W Zhou, X Yang, X Dong, M Fang… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org
Crowd density estimation has gained significant research interest owing to its potential in
various industries and social applications. Therefore, this article proposes a multistyle joint …

Frequency attention for knowledge distillation

C Pham, VA Nguyen, T Le, D Phung… - Proceedings of the …, 2024 - openaccess.thecvf.com
Knowledge distillation is an attractive approach for learning compact deep neural
networks, which learns a lightweight student model by distilling knowledge from a complex …
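
The snippet describes feature-based distillation from a complex teacher to a lightweight student. As background, here is a minimal sketch of feature matching carried out in the frequency domain via an FFT; this is only an illustrative reading of the title, and the actual frequency attention module of the paper is not reproduced. The shapes and the 1x1-projection assumption are hypothetical.

```python
import torch
import torch.nn.functional as F

def frequency_feature_loss(f_student, f_teacher):
    """L2 distance between magnitude spectra of student and teacher feature maps."""
    # rfft over the last (spatial) dimension; shapes are assumed to already match,
    # e.g. via a 1x1 projection applied to the student features beforehand.
    s_mag = torch.fft.rfft(f_student, dim=-1).abs()
    t_mag = torch.fft.rfft(f_teacher, dim=-1).abs()
    return F.mse_loss(s_mag, t_mag)

f_s = torch.randn(8, 64, 32)   # hypothetical student features (B, C, L)
f_t = torch.randn(8, 64, 32)   # hypothetical teacher features, same shape
loss = frequency_feature_loss(f_s, f_t)
```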

A Survey on Knowledge Distillation: Recent Advancements

A Moslemi, A Briskina, Z Dang, J Li - Machine Learning with Applications, 2024 - Elsevier
Deep learning has achieved notable success across academia, medicine, and industry. Its
ability to identify complex patterns in large-scale data and to manage millions of parameters …

FBI-LLM: Scaling up fully binarized LLMs from scratch via autoregressive distillation

L Ma, M Sun, Z Shen - arXiv preprint arXiv:2407.07093, 2024 - arxiv.org
This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for
the first time how to train a large-scale binary language model from scratch (not the partial …
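
The title combines two ingredients: binarized weights and autoregressive (token-level) distillation from a full-precision teacher. The sketch below illustrates both in their simplest generic form; the layer sizes, the sign-based binarizer with a straight-through estimator, and the plain KL loss are assumptions for illustration, not the FBI-LLM training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLinear(nn.Module):
    """Linear layer whose weights are binarized to {-1, +1} in the forward pass."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

    def forward(self, x):
        w_bin = torch.sign(self.weight)
        # Straight-through estimator: gradients flow to the latent full-precision weights.
        w = self.weight + (w_bin - self.weight).detach()
        return F.linear(x, w)

vocab, d = 1000, 64
teacher_head = nn.Linear(d, vocab)      # stands in for a full-precision teacher LM head
student_head = BinaryLinear(d, vocab)   # binarized student head

h = torch.randn(4, 16, d)               # hidden states: batch of 4 sequences, length 16
with torch.no_grad():
    t_logits = teacher_head(h)
s_logits = student_head(h)

# Token-level distillation: KL between teacher and student next-token distributions.
loss = F.kl_div(
    F.log_softmax(s_logits, dim=-1),
    F.softmax(t_logits, dim=-1),
    reduction="batchmean",
)
```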

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks

E Kinderman, I Hubara, H Maron, D Soudry - arXiv preprint arXiv …, 2024 - arxiv.org
Many recent methods aim to merge neural networks (NNs) with identical architectures
trained on different tasks to obtain a single multi-task model. Most existing works tackle the …
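
For context on what "merging" means here, a minimal sketch of the simplest baseline, element-wise parameter averaging of identically shaped models, is shown below. This baseline generally breaks down when the models start from different initializations, which is the harder setting this line of work targets; it is background only, not the Foldable SuperNets method.

```python
import copy
import torch
import torch.nn as nn

def average_merge(models):
    """Return a new model whose parameters are the element-wise mean of the inputs."""
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, p in merged.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            p.copy_(stacked.mean(dim=0))
    return merged

models = [nn.Linear(16, 4) for _ in range(3)]  # hypothetical task-specific models
merged = average_merge(models)
```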

Multi-dataset fusion for multi-task learning on face attribute recognition

H Lu, S Xu, J Wang - Pattern Recognition Letters, 2023 - Elsevier
The goal of face attribute recognition (FAR) is to recognize the attributes of face images,
such as gender, race, etc. Multi-dataset fusion aims to train a network with multiple datasets …

3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection

RC Cabral, S Luo, J Poon, SC Han - Proceedings of the 33rd ACM …, 2024 - dl.acm.org
The significance of mental health classification is paramount in contemporary society, where
digital platforms serve as crucial sources for monitoring individuals' well-being. However …

ATMKD: adaptive temperature guided multi-teacher knowledge distillation

Y Lin, S Yin, Y Ding, X Liang - Multimedia Systems, 2024 - Springer
Knowledge distillation is a technique that aims to distill the knowledge from a large
well-trained teacher model to a lightweight student model. In recent years, multi-teacher …
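
In multi-teacher distillation the student matches a combination of several teachers' softened outputs. A minimal sketch is below; the confidence-based teacher weighting and the fixed temperature are illustrative assumptions, not ATMKD's adaptive temperature scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=4.0):
    probs = [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    # Weight each teacher by its average prediction confidence (an assumption).
    conf = torch.stack([p.max(dim=-1).values.mean() for p in probs])
    w = F.softmax(conf, dim=0)
    target = sum(w_i * p for w_i, p in zip(w, probs))
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1), target,
                    reduction="batchmean") * (T * T)

student = nn.Linear(32, 10)
teachers = [nn.Linear(32, 10) for _ in range(3)]  # hypothetical pre-trained teachers
x = torch.randn(16, 32)
with torch.no_grad():
    t_logits = [t(x) for t in teachers]
loss = multi_teacher_kd_loss(student(x), t_logits)
```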

Self-Supervised Quantization-Aware Knowledge Distillation

K Zhao, M Zhao - arXiv preprint arXiv:2403.11106, 2024 - arxiv.org
Quantization-aware training (QAT) and Knowledge Distillation (KD) are combined to achieve
competitive performance in creating low-bit deep learning models. However, existing works …
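
The snippet describes combining quantization-aware training with distillation. Below is a minimal sketch of that combination: a fake-quantized student layer with a straight-through estimator, trained to match a full-precision teacher's outputs without labels. The bit-width, the symmetric per-tensor quantizer, and the plain KL loss are assumptions for illustration, not the SQAKD formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, bits=4):
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()

class QuantLinear(nn.Module):
    def __init__(self, d_in, d_out, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.bits = bits

    def forward(self, x):
        return F.linear(x, fake_quant(self.weight, self.bits))

teacher = nn.Linear(64, 10)        # full-precision teacher
student = QuantLinear(64, 10)      # low-bit student trained with QAT

x = torch.randn(32, 64)
with torch.no_grad():
    t_logits = teacher(x)
s_logits = student(x)
# Label-free distillation loss: the student matches the teacher's output distribution.
loss = F.kl_div(F.log_softmax(s_logits, dim=-1), F.softmax(t_logits, dim=-1),
                reduction="batchmean")
```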