On student-teacher deviations in distillation: does it pay to disobey?

V Nagarajan, AK Menon… - Advances in …, 2023 - proceedings.neurips.cc
Abstract Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network …

Cluster-aware semi-supervised learning: Relational knowledge distillation provably learns clustering

Y Dong, K Miller, Q Lei, R Ward - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the empirical success and practical significance of (relational) knowledge distillation
that matches (the relations of) features between teacher and student models, the …
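One common way to "match the relations of features" is a distance-wise relational loss that compares pairwise feature distances across a batch. The sketch below is a generic instantiation of that idea, not necessarily the exact objective analyzed in the paper.

```python
# Sketch of a distance-wise relational KD term: match pairwise feature distances
# between teacher and student representations of the same batch.
import torch
import torch.nn.functional as F

def pairwise_distances(feats):
    # feats: (batch, dim) -> (batch, batch) matrix of Euclidean distances.
    return torch.cdist(feats, feats, p=2)

def relational_kd_loss(student_feats, teacher_feats):
    d_s = pairwise_distances(student_feats)
    d_t = pairwise_distances(teacher_feats)
    # Normalize by the mean distance so the two matrices are on comparable scales.
    d_s = d_s / (d_s.mean() + 1e-8)
    d_t = d_t / (d_t.mean() + 1e-8)
    return F.smooth_l1_loss(d_s, d_t)
```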

Data upcycling knowledge distillation for image super-resolution

Y Zhang, W Li, S Li, H Chen, Z Tu, W Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation (KD) compresses deep neural networks by transferring task-related
knowledge from cumbersome pre-trained teacher models to compact student models …

A little help goes a long way: Efficient llm training by leveraging small lms

AS Rawat, V Sadhanala, A Rostamizadeh… - arXiv preprint arXiv …, 2024 - arxiv.org
A primary challenge in large language model (LLM) development is their onerous pre-
training cost. Typically, such pre-training involves optimizing a self-supervised objective …
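The self-supervised objective referred to here is, in the standard setup, next-token prediction over unlabeled text. A minimal sketch of that loss (independent of how the small LM is leveraged, which the snippet does not specify) is:

```python
# Minimal sketch of the self-supervised next-token objective used in LLM pre-training.
import torch
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    # logits: (batch, seq_len, vocab); token_ids: (batch, seq_len)
    # Predict token t+1 from positions up to t, so shift logits and targets by one.
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_targets = token_ids[:, 1:].reshape(-1)
    return F.cross_entropy(shift_logits, shift_targets)
```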

Towards the fundamental limits of knowledge transfer over finite domains

Q Zhao, B Zhu - arXiv preprint arXiv:2310.07838, 2023 - arxiv.org
We characterize the statistical efficiency of knowledge transfer through $n$ samples from a
teacher to a probabilistic student classifier with input space $\mathcal{S}$ over labels …

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information

D Wu, IV Modoranu, M Safaryan… - Advances in …, 2025 - proceedings.neurips.cc
The rising footprint of machine learning has led to a focus on imposing model sparsity as a
means of reducing computational and memory costs. For deep neural networks (DNNs), the …
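For reference, the classical Optimal Brain Surgeon framework that this line of work builds on removes the weight whose deletion least increases a second-order (Hessian-based) approximation of the loss, and compensates the remaining weights via the inverse Hessian; in standard notation (not the paper's iterative variant):

```latex
% Classical Optimal Brain Surgeon: prune the weight with the smallest saliency
% and apply a compensating update using the inverse Hessian H^{-1}.
\begin{align}
  q^\star &= \arg\min_q \; \frac{w_q^2}{2\,[H^{-1}]_{qq}}
    && \text{(saliency of removing weight } w_q\text{)} \\
  \delta w &= -\frac{w_{q^\star}}{[H^{-1}]_{q^\star q^\star}}\, H^{-1} e_{q^\star}
    && \text{(compensating update; } e_{q^\star}\text{ is the unit vector)}
\end{align}
```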

Learning Neural Networks with Sparse Activations

P Awasthi, N Dikkala, P Kamath… - The Thirty Seventh …, 2024 - proceedings.mlr.press
A core component present in many successful neural network architectures is an MLP block
of two fully connected layers with a non-linear activation in between. An intriguing …
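Concretely, the MLP block described here is the familiar two-layer feed-forward unit; a minimal sketch follows, with ReLU chosen as the non-linearity (the choice that makes the hidden activations exactly sparse).

```python
# Sketch of the MLP block described above: two fully connected layers with a
# non-linearity in between; ReLU produces exact zeros, i.e. sparse activations.
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```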

Progressive distillation induces an implicit curriculum

A Panigrahi, B Liu, S Malladi, A Risteski… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation leverages a teacher model to improve the training of a student model.
A persistent challenge is that a better teacher does not always yield a better student, to …

Distillation Scaling Laws

D Busbridge, A Shidani, F Weers, J Ramapuram… - arXiv preprint arXiv …, 2025 - arxiv.org
We provide a distillation scaling law that estimates distilled model performance based on a
compute budget and its allocation between the student and teacher. Our findings reduce the …
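For orientation only: pre-training scaling laws of this kind typically take a power-law form in model size $N$ and data $D$ (the Chinchilla-style expression below). The snippet does not give the paper's actual distillation scaling law, which additionally depends on how compute is split between student and teacher.

```latex
% Generic Chinchilla-style scaling-law form, shown only as an illustration of
% the functional shape such laws take; not the paper's distillation law.
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
```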

On information captured by neural networks: connections with memorization and generalization

H Harutyunyan - arXiv preprint arXiv:2306.15918, 2023 - arxiv.org
Despite the popularity and success of deep learning, there is limited understanding of when,
how, and why neural networks generalize to unseen examples. Since learning can be seen …