Understanding self-distillation in the presence of label noise
Self-distillation (SD) is the process of first training a "teacher" model and then using its
predictions to train a "student" model that has the same architecture. Specifically, the …
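As an aside to the snippet above, a minimal PyTorch sketch of the two-stage self-distillation recipe it describes (make_model and loader are assumed placeholders, not names from the paper):

    # Self-distillation sketch: train a teacher on (possibly noisy) hard labels,
    # then train a student of the same architecture on the teacher's predictions.
    import torch
    import torch.nn.functional as F

    def train(model, loader, loss_fn, epochs=10, lr=1e-2):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), x, y).backward()
                opt.step()
        return model

    # Stage 1: teacher fit to the observed (possibly noisy) labels.
    teacher = train(make_model(), loader,
                    lambda logits, x, y: F.cross_entropy(logits, y))

    # Stage 2: student of the same architecture fit to the teacher's soft predictions.
    def sd_loss(student_logits, x, y):
        with torch.no_grad():
            soft_targets = F.softmax(teacher(x), dim=-1)
        return F.kl_div(F.log_softmax(student_logits, dim=-1),
                        soft_targets, reduction="batchmean")

    student = train(make_model(), loader, sd_loss)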
SeqNAS: Neural architecture search for event sequence classification
Neural Architecture Search (NAS) methods are widely used in various industries to obtain
high-quality, task-specific solutions with minimal human intervention. Event Sequences …
On student-teacher deviations in distillation: does it pay to disobey?
Knowledge distillation (KD) has been widely used to improve the test accuracy of a
"student" network, by training it to mimic the soft probabilities of a trained "teacher" network …
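One way to make the "deviation" in the title concrete is the small diagnostic below (illustrative only, not the paper's metric), which measures how far a distilled student strays from its teacher on held-out data:

    # Illustrative student-teacher deviation check: top-1 agreement rate and
    # mean per-example KL divergence between student and teacher predictions.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def deviation_stats(student, teacher, loader):
        agree, total, kl_sum = 0, 0, 0.0
        for x, _ in loader:
            s_log = F.log_softmax(student(x), dim=-1)
            t_prob = F.softmax(teacher(x), dim=-1)
            agree += (s_log.argmax(dim=-1) == t_prob.argmax(dim=-1)).sum().item()
            kl_sum += F.kl_div(s_log, t_prob, reduction="sum").item()
            total += x.size(0)
        return agree / total, kl_sum / total   # agreement in [0, 1], mean KL >= 0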
Induced Model Matching: Restricted Models Help Train Full-Featured Models
U Muneeb, MI Ohannessian - Advances in Neural …, 2025 - proceedings.neurips.cc
We consider scenarios where a very accurate (often small) predictive model using restricted
features is available when training a full-featured (often larger) model. This restricted model …
What Mechanisms Does Knowledge Distillation Distill?
Knowledge distillation is a commonly-used compression method in ML due to the
popularity of increasingly large-scale models, but it is unclear if all the information a teacher …
Trans-LoRA: Towards Data-Free Transferable Parameter Efficient Finetuning
Low-rank adapters (LoRA) and their variants are popular parameter-efficient finetuning
(PEFT) techniques that closely match full model fine-tune performance while requiring only a …
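Since the snippet leans on the LoRA formulation, here is a minimal sketch of a low-rank adapter around a frozen linear layer (the generic LoRA update y = Wx + (alpha/r)·BAx, not the Trans-LoRA transfer procedure itself):

    # Generic LoRA adapter: base weights stay frozen, only the low-rank
    # factors A and B are trained.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():   # freeze the pretrained weights
                p.requires_grad = False
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # y = W x + (alpha / r) * B A x
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)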
Bayesian Optimization Meets Self-Distillation
Bayesian optimization (BO) has contributed greatly to improving model performance by
suggesting promising hyperparameter configurations iteratively based on observations from …
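To ground the snippet's description of BO, a minimal loop with a Gaussian-process surrogate and expected-improvement acquisition over a single hyperparameter (objective is a placeholder black box; the paper's coupling with self-distillation is not shown here):

    # Minimal Bayesian-optimization sketch: GP surrogate + expected improvement.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def expected_improvement(mu, sigma, best, xi=0.01):
        sigma = np.maximum(sigma, 1e-9)
        z = (best - mu - xi) / sigma        # minimisation: improvement = best - mu
        return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

    def bayes_opt(objective, bounds=(1e-4, 1e-1), n_init=3, n_iter=15, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.uniform(*bounds, size=(n_init, 1))
        y = np.array([objective(x[0]) for x in X])
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        for _ in range(n_iter):
            gp.fit(X, y)
            cand = rng.uniform(*bounds, size=(256, 1))
            mu, sigma = gp.predict(cand, return_std=True)
            x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
            X = np.vstack([X, x_next[None, :]])
            y = np.append(y, objective(x_next[0]))
        return X[np.argmin(y)], y.min()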
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation
The rising popularity of large foundation models has led to a heightened demand for
parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), which offer …
Incremental Soft Pruning to Get the Sparse Neural Network During Training
K Zhu, F Hu, Y Ding, Y Dong… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
The traditional three-stage pruning pipeline is first to train an original dense network, then
identify redundant parts of the network for pruning based on the evaluation metrics of the …
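For contrast with the incremental scheme, a sketch of the traditional pipeline's pruning step described in the snippet: global magnitude pruning applied after dense training (the baseline, not the paper's soft-pruning method):

    # Global magnitude pruning: zero out the smallest-magnitude weights after
    # the dense network has been trained; masks can be re-applied while fine-tuning.
    import torch

    @torch.no_grad()
    def magnitude_prune(model, sparsity=0.8):
        weights = [p for p in model.parameters() if p.dim() > 1]
        scores = torch.cat([w.abs().flatten() for w in weights])
        k = max(1, int(sparsity * scores.numel()))
        threshold = scores.kthvalue(k).values     # k-th smallest magnitude
        masks = []
        for w in weights:
            mask = (w.abs() > threshold).float()
            w.mul_(mask)                          # prune in place
            masks.append(mask)
        return masks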