Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021 - ieeexplore.ieee.org
In recent years, deep neural models have succeeded in almost every field, even on the most complex problems. However, these models are huge in size, with …
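
As a concrete reference point for the family of methods this survey covers, here is a minimal sketch of the classic soft-label distillation loss (temperature-softened KL plus cross-entropy). The temperature and weighting values are illustrative, not taken from the survey.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a softened KL term against the teacher with the usual CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps soft/hard gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```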

Review of lightweight deep convolutional neural networks

F Chen, S Li, J Han, F Ren, Z Yang - Archives of Computational Methods …, 2024 - Springer
Lightweight deep convolutional neural networks (LDCNNs) are vital components of mobile
intelligence, particularly in mobile vision. Although various heavy networks with increasingly …
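
One canonical lightweight design such reviews cover is the depthwise-separable convolution (popularized by MobileNets), which factorizes a dense KxK convolution into a per-channel spatial filter plus a 1x1 pointwise channel mix. The sketch below is illustrative, not a block from this particular review.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, padding=1,
                                   groups=in_ch, bias=False)  # per-channel spatial filter
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # 1x1 channel mixing
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))
```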

H2o: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2023 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments, are notably cost-prohibitive to deploy, particularly for applications involving long-content …
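
Read from the title, the core idea is to evict KV-cache entries while retaining "heavy hitters", the tokens that have accumulated the most attention. The sketch below is a schematic eviction policy under that reading; the budget split, recency window, and shapes are my assumptions, not the paper's exact algorithm.

```python
import torch

def h2o_keep_indices(acc_attn, budget, recent=32):
    """acc_attn: (seq_len,) accumulated attention each cached token has received."""
    seq_len = acc_attn.numel()
    if seq_len <= budget:
        return torch.arange(seq_len)
    recent_idx = torch.arange(seq_len - recent, seq_len)    # always keep the tail
    older = acc_attn[: seq_len - recent]
    heavy_idx = torch.topk(older, budget - recent).indices  # heavy-hitter tokens
    return torch.sort(torch.cat([heavy_idx, recent_idx])).values

# Usage sketch: keep = h2o_keep_indices(scores, budget=512); k, v = k[:, keep], v[:, keep]
```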

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer
Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work adapts Mamba to the video domain. The proposed …
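
To make the state-space mechanism concrete, below is a schematic (non-selective) linear SSM scan over a flattened token sequence, h_t = A h_{t-1} + B x_t, y_t = C h_t. VideoMamba's actual selective scan and bidirectional spatiotemporal ordering are considerably more involved; all shapes here are illustrative.

```python
import torch

def ssm_scan(x, A, B, C):
    """x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)."""
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                  # sequential scan over spatiotemporal tokens
        h = A @ h + B @ x_t        # state update
        ys.append(C @ h)           # readout
    return torch.stack(ys)

tokens = torch.randn(16, 8)        # e.g. 16 patch tokens from a short clip
y = ssm_scan(tokens, 0.9 * torch.eye(4), torch.randn(4, 8), torch.randn(2, 4))
```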

Deja vu: Contextual sparsity for efficient llms at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
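
The title's "contextual sparsity" suggests predicting, per input, which parts of the network matter and computing only those. Below is a schematic sketch of that idea for a single MLP: a cheap linear predictor scores neurons and only the top-k rows/columns are evaluated. The predictor and sizes are assumptions, not Deja Vu's exact design.

```python
import torch
import torch.nn.functional as F

d, d_ff, k = 64, 256, 32
W1, W2 = torch.randn(d_ff, d) / d**0.5, torch.randn(d, d_ff) / d_ff**0.5
predictor = torch.randn(d_ff, d) / d**0.5   # cheap proxy for neuron importance

def sparse_mlp(x):                           # x: (d,)
    idx = torch.topk(predictor @ x, k).indices  # contextually active neurons
    h = F.relu(W1[idx] @ x)                     # compute only those rows
    return W2[:, idx] @ h                       # and the matching columns

y = sparse_mlp(torch.randn(d))
```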

Logit standardization in knowledge distillation

S Sun, W Ren, J Li, R Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Knowledge distillation involves transferring soft labels from a teacher to a student using a shared temperature-based softmax function. However, the assumption of a shared …
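
A minimal sketch of the standardization idea the abstract points to: z-score each logit vector before applying the temperature softmax, so teacher and student need not share one logit scale. Details such as the paper's exact weighting may differ.

```python
import torch
import torch.nn.functional as F

def standardize(logits, eps=1e-7):
    mu = logits.mean(dim=-1, keepdim=True)
    sd = logits.std(dim=-1, keepdim=True)
    return (logits - mu) / (sd + eps)       # per-sample z-score of the logits

def std_kd_loss(student_logits, teacher_logits, T=2.0):
    s = F.log_softmax(standardize(student_logits) / T, dim=-1)
    t = F.softmax(standardize(teacher_logits) / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)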

Decoupled knowledge distillation

B Zhao, Q Cui, R Song, Y Qiu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …
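
A sketch of the decoupled logit-distillation loss this points to: split classic KD into a target-class term (TCKD, a binary target-vs-rest KL) and a non-target-class term (NCKD, a KL over the remaining classes), weighted independently. The alpha/beta values are illustrative.

```python
import torch
import torch.nn.functional as F

def dkd_loss(z_s, z_t, labels, T=4.0, alpha=1.0, beta=8.0):
    mask = F.one_hot(labels, z_s.size(-1)).bool()
    p_s, p_t = F.softmax(z_s / T, -1), F.softmax(z_t / T, -1)

    # TCKD: binary target-vs-rest distributions
    bt_s = torch.stack([p_s[mask], 1.0 - p_s[mask]], dim=-1)
    bt_t = torch.stack([p_t[mask], 1.0 - p_t[mask]], dim=-1)
    tckd = F.kl_div(bt_s.log(), bt_t, reduction="batchmean") * T**2

    # NCKD: softmax over non-target classes (target logit pushed far down)
    push = 1000.0 * mask.float()
    nckd = F.kl_div(F.log_softmax(z_s / T - push, -1),
                    F.softmax(z_t / T - push, -1),
                    reduction="batchmean") * T**2
    return alpha * tckd + beta * nckd
```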

Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc
Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong and competitive as state-of-the-art …
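
This line of work (DIST) relaxes exact KL matching to relation matching; a minimal sketch under that reading aligns Pearson correlations of the prediction matrix row-wise (inter-class, per sample) and column-wise (intra-class, across the batch). The temperature and equal term weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def pearson_dist(a, b, eps=1e-8):
    a = (a - a.mean(-1, keepdim=True)) / (a.std(-1, keepdim=True) + eps)
    b = (b - b.mean(-1, keepdim=True)) / (b.std(-1, keepdim=True) + eps)
    return (1.0 - (a * b).mean(-1)).mean()   # 1 - correlation, averaged

def dist_loss(z_s, z_t, T=1.0):
    p_s, p_t = F.softmax(z_s / T, -1), F.softmax(z_t / T, -1)
    inter = pearson_dist(p_s, p_t)           # rows: relations among classes
    intra = pearson_dist(p_s.t(), p_t.t())   # columns: relations across the batch
    return inter + intra
```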

Multi-level logit distillation

Y Jin, J Wang, D Lin - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) aims to distill the knowledge from a large teacher model into a lightweight student model. Mainstream KD methods can be divided into two …
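
A schematic reading of the "multi-level" idea in the title: align teacher and student predictions at the instance level (per-sample KL), the batch level (sample-to-sample similarity), and the class level (class-to-class similarity). The Gram-matrix formulation and single temperature below are my assumptions; the paper's exact alignment terms may differ.

```python
import torch
import torch.nn.functional as F

def multilevel_kd(z_s, z_t, T=4.0):
    p_s, p_t = F.softmax(z_s / T, -1), F.softmax(z_t / T, -1)
    inst = F.kl_div(p_s.log(), p_t, reduction="batchmean") * T**2  # instance level
    batch = F.mse_loss(p_s @ p_s.t(), p_t @ p_t.t())               # batch-level Gram
    cls = F.mse_loss(p_s.t() @ p_s, p_t.t() @ p_t)                 # class-level Gram
    return inst + batch + cls
```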

Masked generative distillation

Z Yang, Z Li, M Shao, D Shi, Z Yuan, C Yuan - European conference on …, 2022 - Springer
Knowledge distillation has been applied successfully to various tasks. Current distillation algorithms usually improve students' performance by imitating the output of the …
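
A sketch of the masked-generation twist the title hints at: randomly mask the student's feature map and train a small generation block to reconstruct the teacher's full features, instead of direct imitation. The block architecture and mask ratio below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGDHead(nn.Module):
    def __init__(self, channels, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.gen = nn.Sequential(                 # small generation block
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, feat_s, feat_t):
        n, _, h, w = feat_s.shape
        keep = (torch.rand(n, 1, h, w, device=feat_s.device) > self.mask_ratio).float()
        rec = self.gen(feat_s * keep)             # generate from masked student features
        return F.mse_loss(rec, feat_t)            # match the teacher's full feature map
```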