Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks
L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
Deep neural models, in recent years, have been successful in almost every field, even
solving the most complex problem statements. However, these models are huge in size with …
Review of lightweight deep convolutional neural networks
F Chen, S Li, J Han, F Ren, Z Yang - Archives of Computational Methods …, 2024 - Springer
Lightweight deep convolutional neural networks (LDCNNs) are vital components of mobile
intelligence, particularly in mobile vision. Although various heavy networks with increasingly …
H2O: Heavy-hitter oracle for efficient generative inference of large language models
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
VideoMamba: State space model for efficient video understanding
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …
Deja Vu: Contextual sparsity for efficient LLMs at inference time
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
Logit standardization in knowledge distillation
Knowledge distillation involves transferring soft labels from a teacher to a student
using a shared temperature-based softmax function. However, the assumption of a shared …
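For reference, the shared-temperature soft-label loss this snippet questions is the classic Hinton-style formulation; a minimal PyTorch sketch follows, where the temperature value T=4.0 and the function name are illustrative assumptions, not taken from the paper:

import torch
import torch.nn.functional as F

def kd_soft_label_loss(student_logits, teacher_logits, T=4.0):
    # Classic shared-temperature soft-label loss (Hinton et al.); the
    # logit-standardization paper questions exactly this shared-T setup.
    # Both logit tensors have shape (batch, num_classes).
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)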
Decoupled knowledge distillation
State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …
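For contrast with the logit route, here is a generic sketch of the intermediate-feature distillation family this snippet mentions; the 1x1 projection and MSE objective are common choices assumed for illustration, not the paper's method, which reformulates the logit loss instead:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    # MSE between student and teacher feature maps, with a learned 1x1
    # convolution aligning channel counts. A generic feature-distillation
    # sketch, not the decoupled logit loss proposed in the paper above.
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student, f_teacher):
        # f_student: (B, Cs, H, W); f_teacher: (B, Ct, H, W)
        return F.mse_loss(self.align(f_student), f_teacher.detach())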
Knowledge distillation from a stronger teacher
Unlike existing knowledge distillation methods that focus on baseline settings, where the
teacher models and training strategies are not as strong and competitive as state-of-the-art …
Multi-level logit distillation
Knowledge Distillation (KD) aims at distilling the knowledge from the large teacher
model to a lightweight student model. Mainstream KD methods can be divided into two …
Masked generative distillation
Knowledge distillation has been applied to various tasks successfully. Current
distillation algorithms usually improve students' performance by imitating the output of the …