Understanding self-distillation in the presence of label noise

R Das, S Sanghavi - International Conference on Machine …, 2023 - proceedings.mlr.press
Self-distillation (SD) is the process of first training a "teacher" model and then using its
predictions to train a "student" model that has the same architecture. Specifically, the …
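
For illustration, a minimal PyTorch sketch of this two-stage procedure follows; the toy dataset, the `make_model` helper, and all hyperparameters are placeholders of ours, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-ins for the data and the shared architecture (purely illustrative).
X = torch.randn(1000, 20)                      # 1,000 points, 20 features
labels = torch.randint(0, 5, (1000,))          # 5 classes, possibly noisy labels
train_loader = DataLoader(TensorDataset(X, labels), batch_size=64, shuffle=True)

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))

def fit(model, loader, target_fn, epochs=5, lr=1e-3):
    """Generic training loop; target_fn maps a batch to its training targets."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), target_fn(x, y)).backward()
            opt.step()
    return model

# Step 1: train the teacher directly on the (possibly noisy) hard labels.
teacher = fit(make_model(), train_loader, lambda x, y: y)

# Step 2: train a same-architecture student on the teacher's soft predictions.
@torch.no_grad()
def teacher_targets(x, y):
    return F.softmax(teacher(x), dim=-1)       # soft teacher outputs replace the labels

student = fit(make_model(), train_loader, teacher_targets)
```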

SeqNAS: Neural architecture search for event sequence classification

I Udovichenko, E Shvetsov, D Divitsky, D Osin… - IEEE …, 2024 - ieeexplore.ieee.org
Neural Architecture Search (NAS) methods are widely used in various industries to obtain
high-quality, task-specific solutions with minimal human intervention. Event Sequences …

On student-teacher deviations in distillation: does it pay to disobey?

V Nagarajan, AK Menon… - Advances in …, 2023 - proceedings.neurips.cc
Knowledge distillation (KD) has been widely used to improve the test accuracy of a
"student" network, by training it to mimic the soft probabilities of a trained "teacher" network …

Induced Model Matching: Restricted Models Help Train Full-Featured Models

U Muneeb, MI Ohannessian - Advances in Neural …, 2025 - proceedings.neurips.cc
We consider scenarios where a very accurate (often small) predictive model using restricted
features is available when training a full-featured (often larger) model. This restricted model …

What Mechanisms Does Knowledge Distillation Distill?

C Wu, ES Lubana, BK Mlodozeniec… - … of UniReps: the …, 2024 - proceedings.mlr.press
Knowledge distillation is a commonly-used compression method in ML due to the
popularity of increasingly large-scale models, but it is unclear if all the information a teacher …

Trans-LoRA: Towards Data-Free Transferable Parameter Efficient Finetuning

R Wang, S Ghosh, D Cox, D Antognini… - arXiv preprint arXiv …, 2024 - openreview.net
Low-rank adapters (LoRA) and their variants are popular parameter-efficient finetuning
(PEFT) techniques that closely match full model fine-tune performance while requiring only a …
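
As a reminder of the mechanism, a minimal LoRA-style adapter sketch in PyTorch: a frozen base layer is augmented with a trainable low-rank update, so only a small number of parameters are tuned. The class name, rank, and scaling below are illustrative choices of ours.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Example: wrap one 768-dim projection; only A and B (2 * 8 * 768 values) are trainable.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(4, 768))
```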

Bayesian Optimization Meets Self-Distillation

HJ Lee, H Song, H Lee, G Lee… - Proceedings of the …, 2023 - openaccess.thecvf.com
Bayesian optimization (BO) has contributed greatly to improving model performance by
suggesting promising hyperparameter configurations iteratively based on observations from …
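
For context, a minimal Bayesian-optimization loop sketch: fit a Gaussian-process surrogate to past (configuration, score) observations and pick the next configuration with an upper-confidence-bound acquisition. The toy objective and search range are ours, not the paper's BO-with-self-distillation setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(lr):                      # placeholder "model score" to maximize
    return -(np.log10(lr) + 2.5) ** 2   # peaks around lr ~ 3e-3

rng = np.random.default_rng(0)
X = rng.uniform(-5, -1, size=(3, 1))    # initial log10(lr) observations
y = np.array([objective(10 ** x[0]) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)   # surrogate model
    cand = rng.uniform(-5, -1, size=(256, 1))                   # candidate configurations
    mu, sd = gp.predict(cand, return_std=True)
    nxt = cand[np.argmax(mu + 2.0 * sd)]                        # UCB acquisition
    X = np.vstack([X, nxt])
    y = np.append(y, objective(10 ** nxt[0]))

print("best lr:", 10 ** X[np.argmax(y), 0], "score:", y.max())
```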

LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation

F Farhadzadeh, D Das, S Borse, F Porikli - arXiv preprint arXiv …, 2025 - arxiv.org
The rising popularity of large foundation models has led to a heightened demand for
parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), which offer …

Incremental Soft Pruning to Get the Sparse Neural Network During Training

K Zhu, F Hu, Y Ding, Y Dong… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
The traditional three-stage pruning pipeline first trains an original dense network, then
identifies redundant parts of the network for pruning based on the evaluation metrics of the …
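
For context, a minimal sketch of the pruning step in that traditional pipeline, using weight magnitude as the evaluation metric; a soft, incremental scheme like the paper's would instead keep pruned weights updatable during training rather than fixing them at zero.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    """Zero out the smallest-magnitude fraction of weights in each Linear/Conv layer."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            w = module.weight
            k = int(w.numel() * sparsity)
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            mask = (w.abs() > threshold).float()
            module.weight.mul_(mask)   # hard pruning: pruned weights are fixed at zero
            masks[name] = mask         # keep masks to re-apply during fine-tuning
    return masks

# Example: prune a small dense model (untrained here, trained in practice) to 50% sparsity,
# then fine-tune the surviving weights.
dense = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
masks = magnitude_prune(dense, sparsity=0.5)
```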