Towards explaining the regularization effect of initial large learning rate in training neural networks

Y Li, C Wei, T Ma - Advances in neural information …, 2019 - proceedings.neurips.cc
Stochastic gradient descent with a large initial learning rate is widely used for training
modern neural net architectures. Although a small initial learning rate allows for faster …

Deep learning through the lens of example difficulty

R Baldock, H Maennel… - Advances in Neural …, 2021 - proceedings.neurips.cc
Existing work on understanding deep learning often employs measures that compress all
data-dependent information into a few numbers. In this work, we adopt a perspective based …

DAVINZ: Data valuation using deep neural networks at initialization

Z Wu, Y Shu, BKH Low - International Conference on …, 2022 - proceedings.mlr.press
Recent years have witnessed a surge of interest in developing trustworthy methods to
evaluate the value of data in many real-world applications (e.g., collaborative machine …

How does learning rate decay help modern neural networks?

K You, M Long, J Wang, MI Jordan - arXiv preprint arXiv:1908.01878, 2019 - arxiv.org
Learning rate decay (lrDecay) is a de facto technique for training modern neural
networks. It starts with a large learning rate and then decays it multiple times. It is empirically …

Mechanistic mode connectivity

ES Lubana, EJ Bigelow, RP Dick… - International …, 2023 - proceedings.mlr.press
We study neural network loss landscapes through the lens of mode connectivity, the
observation that minimizers of neural networks retrieved via training on a dataset are …

When do curricula work?

X Wu, E Dyer, B Neyshabur - arXiv preprint arXiv:2012.03107, 2020 - arxiv.org
Inspired by human learning, researchers have proposed ordering examples during training
based on their difficulty. Both curriculum learning, exposing a network to easier examples …

T-MARS: Improving visual representations by circumventing text feature learning

P Maini, S Goyal, ZC Lipton, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
Large web-sourced multimodal datasets have powered a slew of new methods for learning
general-purpose visual representations, advancing the state of the art in computer vision …

Characterizing datapoints via second-split forgetting

P Maini, S Garg, Z Lipton… - Advances in Neural …, 2022 - proceedings.neurips.cc
Researchers investigating example hardness have increasingly focused on the dynamics by
which neural networks learn and forget examples throughout training. Popular metrics …

Estimating example difficulty using variance of gradients

C Agarwal, D D'souza… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
In machine learning, a question of great interest is understanding what examples are
challenging for a model to classify. Identifying atypical examples ensures the safe …

Detecting shortcut learning for fair medical AI using shortcut testing

A Brown, N Tomasev, J Freyberg, Y Liu… - Nature …, 2023 - nature.com
Machine learning (ML) holds great promise for improving healthcare, but it is critical
to ensure that its use will not propagate or amplify health disparities. An important step is to …