On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …
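
To make the survey's topic concrete: the best-known instance of implicit bias (due to Soudry et al. and covered in this survey) is that gradient descent on the unregularized logistic loss over linearly separable data diverges in norm but converges in direction to the maximum-margin separator. A minimal numpy sketch with synthetic data and an illustrative step count; sklearn's SVC is used only to approximate the hard-margin direction:

```python
import numpy as np
from sklearn.svm import SVC  # used only to obtain the max-margin direction

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([1.0, -2.0]))  # linearly separable labels

# Plain gradient descent on the unregularized logistic loss.
w, lr = np.zeros(2), 0.5
for _ in range(100_000):
    m = np.clip(y * (X @ w), -30, 30)   # margins, clipped for stability
    p = 1.0 / (1.0 + np.exp(m))         # sigmoid(-margin)
    w += lr * (y * p) @ X / len(X)      # negative gradient step

# Hard-margin SVM direction (a very large C approximates the hard margin).
u = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()

# ||w|| grows without bound, but the direction aligns with max-margin.
cos = w @ u / (np.linalg.norm(w) * np.linalg.norm(u))
print(f"|w| = {np.linalg.norm(w):.1f}, cosine to max-margin = {cos:.4f}")
```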

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - Journal of Machine Learning Research, 2024 - jmlr.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
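
A well-known construction from this line of work (not necessarily the paper's exact parameterization) is that a single *linear* self-attention head can implement one step of gradient descent on the in-context least-squares objective, so its prediction on a query matches the one-step GD predictor. A hypothetical numpy sketch of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 32                        # feature dim, in-context examples

w_star = rng.normal(size=d)         # this prompt's task vector
X = rng.normal(size=(n, d))         # in-context inputs
y = X @ w_star                      # in-context labels
x_q = rng.normal(size=d)            # query input

# One GD step from w = 0 on the in-context loss (1/2n) sum_i (x_i.w - y_i)^2.
eta = 0.1
w_gd = (eta / n) * (y @ X)
pred_gd = x_q @ w_gd

# Linear attention over tokens e_i = [x_i; y_i] with query token [x_q; 0]:
# unnormalized scores <x_q, x_i> (no softmax), with the y_i as values.
scores = X @ x_q
pred_attn = (eta / n) * (scores @ y)

print(pred_gd, pred_attn)           # identical up to floating point
```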

Surgical fine-tuning improves adaptation to distribution shifts

Y Lee, AS Chen, F Tajwar, A Kumar, H Yao… - arXiv preprint arXiv …, 2022 - arxiv.org
A common approach to transfer learning under distribution shift is to fine-tune the last few
layers of a pre-trained model, preserving learned features while also adapting to the new …
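
The idea is to fine-tune only a contiguous subset of layers chosen to match the shift (the paper reports that input-level shifts favor early layers, while output-level shifts favor later ones). A hedged PyTorch sketch, with torchvision's ResNet-18 (weights argument as in torchvision ≥ 0.13) and the first residual stage as illustrative choices:

```python
import torch
from torchvision.models import resnet18

# Adapt a pretrained ResNet-18 to an input-level shift (e.g., image
# corruptions), the regime where tuning early layers is reported to help.
model = resnet18(weights="IMAGENET1K_V1")

for p in model.parameters():            # freeze everything...
    p.requires_grad = False
for p in model.layer1.parameters():     # ...then unfreeze one "surgical" block
    p.requires_grad = True

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9,
)
# A standard training loop on the shifted target data follows; for an
# output-level shift, one would instead unfreeze a late block or the head.
```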

Fine-tuning can distort pretrained features and underperform out-of-distribution

A Kumar, A Raghunathan, R Jones, T Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
When transferring a pretrained model to a downstream task, two popular methods are full
fine-tuning (updating all the model parameters) and linear probing (updating only the last …
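
The paper's analysis motivates the two-stage LP-FT recipe: first linear-probe a new head on frozen features, then fine-tune everything from that initialization so early gradients from a randomly initialized head do not distort the pretrained features. A sketch under assumed hyperparameters:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 10)    # fresh head for the task

def set_backbone_trainable(trainable: bool):
    for name, p in model.named_parameters():
        if not name.startswith("fc."):
            p.requires_grad = trainable

# Stage 1 (LP): train only the head on frozen pretrained features.
set_backbone_trainable(False)
probe_opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ...train the head to convergence...

# Stage 2 (FT): fine-tune all parameters from the probed head with a
# small learning rate, limiting feature distortion early in training.
set_backbone_trainable(True)
ft_opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# ...fine-tune all parameters...
```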

Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models

X **e, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …
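
A simplified, single-tensor sketch of the Adan update as I read the paper: an EMA of gradients, an EMA of gradient differences (the Nesterov-style correction), a second-moment estimate of the corrected gradient, and decoupled weight decay. The constants are illustrative and the recursions are my reading of the paper; the authors' released implementation should be treated as authoritative:

```python
import numpy as np

def adan_step(theta, grad, prev_grad, m, v, n,
              lr=1e-3, b1=0.02, b2=0.08, b3=0.01, eps=1e-8, wd=0.0):
    """One Adan update (simplified sketch, not the official implementation)."""
    diff = grad - prev_grad
    m = (1 - b1) * m + b1 * grad                            # EMA of gradients
    v = (1 - b2) * v + b2 * diff                            # EMA of grad diffs
    n = (1 - b3) * n + b3 * (grad + (1 - b2) * diff) ** 2   # second moment
    theta = theta - lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)
    return theta / (1 + lr * wd), m, v, n                   # decoupled decay

# Toy usage: minimize ||theta||^2.
theta = np.array([5.0, -3.0])
m, v, n, prev = (np.zeros(2) for _ in range(4))
for _ in range(3000):
    g = 2 * theta
    theta, m, v, n = adan_step(theta, g, prev, m, v, n, lr=0.05)
    prev = g
print(theta)  # should end up near the minimum at 0
```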

Pruning neural networks without any data by iteratively conserving synaptic flow

H Tanaka, D Kunin, DL Yamins… - Advances in neural …, 2020 - proceedings.neurips.cc
Pruning the parameters of deep neural networks has generated intense interest due to
potential savings in time, memory and energy both during training and at test time. Recent …
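
The method scores each parameter by its "synaptic flow": send an all-ones input through the network with every weight replaced by its absolute value, and take |θ ⊙ ∂(sum of outputs)/∂θ|; pruning is iterative so flow is re-balanced after each removal. A hedged PyTorch sketch on a small MLP (the schedule and step count are illustrative):

```python
import torch
import torch.nn as nn

def synflow_scores(model, input_shape):
    """Data-free saliency |theta * d(sum of outputs)/d(theta)|, computed with
    all parameters replaced by their absolute values and an all-ones input."""
    signs = {k: v.sign() for k, v in model.state_dict().items()}
    for v in model.state_dict().values():
        v.abs_()                                   # linearize the network
    out = model(torch.ones(1, *input_shape)).sum()
    grads = torch.autograd.grad(out, list(model.parameters()))
    scores = [(p * g).abs().detach() for p, g in zip(model.parameters(), grads)]
    for k, v in model.state_dict().items():
        v.mul_(signs[k])                           # restore original signs
    return scores

# Iterative pruning with an exponential sparsity schedule (illustrative).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
keep, steps = 0.1, 100                             # target density, iterations
for step in range(steps):
    scores = synflow_scores(model, (784,))
    flat = torch.cat([s.flatten() for s in scores])
    k = int(len(flat) * (1 - keep ** ((step + 1) / steps)))
    if k == 0:
        continue
    threshold = flat.kthvalue(k).values
    with torch.no_grad():
        for p, s in zip(model.parameters(), scores):
            p[s <= threshold] = 0.0                # prune the weakest weights

density = sum((p != 0).sum().item() for p in model.parameters()) / len(flat)
print(f"final density: {density:.3f}")
```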

Understanding self-supervised learning dynamics without contrastive pairs

Y Tian, X Chen, S Ganguli - International Conference on …, 2021 - proceedings.mlr.press
While contrastive approaches of self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
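
The setting analyzed is BYOL/SimSiam-style training: an online encoder with a predictor is matched to a stop-gradient (and optionally EMA) target on two views of the same input, with no negative pairs; the paper studies why the predictor and stop-gradient prevent representational collapse. A minimal PyTorch sketch, with Gaussian noise standing in for real augmentations and an illustrative architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Online encoder + predictor, EMA target, stop-gradient, no negative pairs.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
target = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
target.load_state_dict(encoder.state_dict())
predictor = nn.Linear(16, 16)

opt = torch.optim.SGD([*encoder.parameters(), *predictor.parameters()], lr=0.05)
tau = 0.996                                  # EMA rate for the target

for step in range(1000):
    x = torch.randn(128, 32)
    v1 = x + 0.1 * torch.randn_like(x)       # two "augmented" views
    v2 = x + 0.1 * torch.randn_like(x)

    p = predictor(encoder(v1))               # online branch
    with torch.no_grad():                    # stop-gradient on target branch
        z = target(v2)
    loss = -F.cosine_similarity(p, z, dim=-1).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():                    # slow EMA update of the target
        for pt, po in zip(target.parameters(), encoder.parameters()):
            pt.mul_(tau).add_((1 - tau) * po)
```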

On exact computation with an infinitely wide neural net

S Arora, SS Du, W Hu, Z Li… - Advances in neural …, 2019 - proceedings.neurips.cc
How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard
dataset such as CIFAR-10 when its “width”—namely, number of channels in convolutional …
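
For the fully connected case the paper builds on, the infinite-width NTK has a closed-form layerwise recursion via the ReLU arc-cosine kernels, and training the infinite net to convergence on squared loss is exact kernel regression with that kernel (the paper's main contribution is the analogous exact convolutional NTK, not sketched here). A numpy sketch of the dense recursion, assuming unit-norm inputs and no biases:

```python
import numpy as np

def relu_ntk(X, depth=3):
    """Exact NTK of an infinitely wide fully-connected ReLU net (no biases);
    rows of X are assumed unit-norm."""
    def k0(rho):  # 2 * E[relu'(u) relu'(v)] for unit-variance (u, v)
        return (np.pi - np.arccos(rho)) / np.pi
    def k1(rho):  # 2 * E[relu(u) relu(v)] for unit-variance (u, v)
        return (np.sqrt(1 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi

    sigma = X @ X.T                      # Sigma^(0), the input gram matrix
    theta = sigma.copy()                 # Theta^(0)
    for _ in range(depth):
        d = np.sqrt(np.outer(np.diag(sigma), np.diag(sigma)))
        rho = np.clip(sigma / d, -1.0, 1.0)
        sigma_new = d * k1(rho)          # next NNGP kernel
        theta = sigma_new + theta * k0(rho)
        sigma = sigma_new
    return theta

# Training the infinite-width net to zero squared loss equals kernel
# regression with the NTK; a tiny ridge keeps the solve well conditioned.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(3 * X[:, 0])
K = relu_ntk(X)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(K)), y)
print("train residual:", np.linalg.norm(K @ alpha - y))
```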

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

Implicit regularization in deep matrix factorization

S Arora, N Cohen, W Hu, Y Luo - Advances in neural …, 2019 - proceedings.neurips.cc
Efforts to understand the generalization mystery in deep learning have led to the belief that
gradient-based optimization induces a form of implicit regularization, a bias towards models …
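
The paper's setting can be reproduced in a few lines: parameterize a matrix as a product of several factors, run plain gradient descent on observed entries from a small initialization, and the recovered matrix tends toward low effective rank, a bias the paper argues is not fully captured by nuclear-norm minimization. A numpy sketch with illustrative sizes, learning rate, and step count:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank = 20, 2

# Ground-truth low-rank matrix, observed on a random 40% subset of entries.
W_star = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, n))
mask = rng.random((n, n)) < 0.4
m_obs = mask.sum()

# Depth-3 factorization W = W3 @ W2 @ W1 with small initialization,
# trained by plain gradient descent on the observed squared error.
W1, W2, W3 = (0.1 * rng.normal(size=(n, n)) for _ in range(3))
lr = 0.3
for _ in range(20_000):
    R = mask * (W3 @ W2 @ W1 - W_star)     # residual on observed entries
    g1 = (W3 @ W2).T @ R                   # dL/dW1
    g2 = W3.T @ R @ W1.T                   # dL/dW2
    g3 = R @ (W2 @ W1).T                   # dL/dW3
    W1 -= lr * g1 / m_obs
    W2 -= lr * g2 / m_obs
    W3 -= lr * g3 / m_obs

s = np.linalg.svd(W3 @ W2 @ W1, compute_uv=False)
print("top singular values:", np.round(s[:5], 3))
# Under the implicit low-rank bias, the spectrum should concentrate on
# roughly `rank` dominant values even though nothing enforces low rank.
```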