Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Lightweight deep learning for resource-constrained environments: A survey

HI Liu, M Galindo, H Xie, LK Wong, HH Shuai… - ACM Computing …, 2024 - dl.acm.org
Over the past decade, the dominance of deep learning has prevailed across various
domains of artificial intelligence, including natural language processing, computer vision …

Knowledge distillation with the reused teacher classifier

D Chen, JP Mei, H Zhang, C Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation aims to compress a powerful yet cumbersome teacher model
into a lightweight student model without much sacrifice of performance. For this purpose …

Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet

L Yuan, Y Chen, T Wang, W Yu, Y Shi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers, which are popular for language modeling, have been explored for solving
vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model …

From knowledge distillation to self-knowledge distillation: A unified approach with normalized loss and customized soft labels

Z Yang, A Zeng, Z Li, T Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to
guide the student, while self-KD does not require a real teacher to provide the soft labels. This …
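
A minimal sketch of the soft-label mechanism this entry refers to: the standard temperature-scaled KD term in PyTorch. The function name and temperature value are illustrative assumptions; the paper's normalized loss and customized soft labels are not reproduced here.

```python
import torch
import torch.nn.functional as F

def kd_soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """Classic soft-label distillation term: KL divergence between the
    temperature-softened teacher and student distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 8 samples over 100 classes.
loss = kd_soft_label_loss(torch.randn(8, 100), torch.randn(8, 100))
```

In a typical training recipe this term is combined with the ordinary cross-entropy on ground-truth labels via a weighting factor.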

VOLO: Vision outlooker for visual recognition

L Yuan, Q Hou, Z Jiang, J Feng… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. Due
to their low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the …

L2G: A simple local-to-global knowledge transfer framework for weakly supervised semantic segmentation

PT Jiang, Y Yang, Q Hou, Y Wei - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Mining precise class-aware attention maps, a.k.a. class activation maps, is essential for
weakly supervised semantic segmentation. In this paper, we present L2G, a simple online …
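
Since this snippet hinges on class activation maps, here is a plain CAM computation (the classic global-average-pooling formulation, not the L2G local-to-global transfer itself); the tensor shapes and normalization are illustrative assumptions.

```python
import torch

def class_activation_map(feature_maps, classifier_weight, class_idx):
    """Vanilla CAM: weight the final convolutional feature maps by the
    linear classifier's weights for one class, then rectify and normalize.
    feature_maps: (C, H, W); classifier_weight: (num_classes, C)."""
    weights = classifier_weight[class_idx]                 # (C,)
    cam = torch.einsum("c,chw->hw", weights, feature_maps)
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)                        # scale to [0, 1]
```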

Comparing Kullback-Leibler divergence and mean squared error loss in knowledge distillation

T Kim, J Oh, NY Kim, S Cho, SY Yun - arXiv preprint arXiv:2105.08919, 2021 - arxiv.org
Knowledge distillation (KD), transferring knowledge from a cumbersome teacher model to a
lightweight student model, has been investigated to design efficient neural architectures …
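
The comparison in this entry comes down to the penalty applied to teacher and student outputs: the usual KL term matches temperature-softened probabilities (as in the sketch further above), whereas the MSE variant matches the raw logits directly. A minimal, assumed PyTorch form of the logit-matching loss:

```python
import torch.nn.functional as F

def kd_mse_loss(student_logits, teacher_logits):
    """Direct logit matching: penalize the squared difference between
    student and teacher logits instead of applying KL divergence to
    softened probabilities. A generic sketch, not the paper's exact recipe."""
    return F.mse_loss(student_logits, teacher_logits.detach())
```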

Cross-layer distillation with semantic calibration

D Chen, JP Mei, Y Zhang, C Wang, Z Wang… - Proceedings of the …, 2021 - ojs.aaai.org
Recently proposed knowledge distillation approaches based on feature-map transfer
validate that intermediate layers of a teacher model can serve as effective targets for training …
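
As a rough illustration of the feature-map transfer these approaches build on, the sketch below aligns one student feature map to one teacher layer with a 1x1 projection and an MSE penalty. The layer pairing and projection are assumptions for illustration; the paper's semantic calibration across layers is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapDistillation(nn.Module):
    """Generic feature-map transfer: project the student's intermediate
    feature map to the teacher's channel count with a 1x1 convolution,
    then penalize the squared difference against the (frozen) teacher."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        aligned = self.proj(student_feat)
        # Resize spatially if the two backbones downsample differently.
        if aligned.shape[-2:] != teacher_feat.shape[-2:]:
            aligned = F.interpolate(aligned, size=teacher_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
        return F.mse_loss(aligned, teacher_feat.detach())

# Toy usage: student layer with 64 channels, teacher layer with 256.
distill = FeatureMapDistillation(64, 256)
loss = distill(torch.randn(2, 64, 28, 28), torch.randn(2, 256, 28, 28))
```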

General instance distillation for object detection

X Dai, Z Jiang, Z Wu, Y Bao, Z Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
In recent years, knowledge distillation has proven to be an effective solution for model
compression. This approach enables lightweight student models to acquire the knowledge …