Towards explaining the regularization effect of initial large learning rate in training neural networks
Stochastic gradient descent with a large initial learning rate is widely used for training
modern neural net architectures. Although a small initial learning rate allows for faster …
modern neural net architectures. Although a small initial learning rate allows for faster …
Deep learning through the lens of example difficulty
Existing work on understanding deep learning often employs measures that compress all
data-dependent information into a few numbers. In this work, we adopt a perspective based …
data-dependent information into a few numbers. In this work, we adopt a perspective based …
Davinz: Data valuation using deep neural networks at initialization
Recent years have witnessed a surge of interest in develo** trustworthy methods to
evaluate the value of data in many real-world applications (eg, collaborative machine …
evaluate the value of data in many real-world applications (eg, collaborative machine …
How does learning rate decay help modern neural networks?
Learning rate decay (lrDecay) is a\emph {de facto} technique for training modern neural
networks. It starts with a large learning rate and then decays it multiple times. It is empirically …
networks. It starts with a large learning rate and then decays it multiple times. It is empirically …
Mechanistic mode connectivity
We study neural network loss landscapes through the lens of mode connectivity, the
observation that minimizers of neural networks retrieved via training on a dataset are …
observation that minimizers of neural networks retrieved via training on a dataset are …
When do curricula work?
Inspired by human learning, researchers have proposed ordering examples during training
based on their difficulty. Both curriculum learning, exposing a network to easier examples …
based on their difficulty. Both curriculum learning, exposing a network to easier examples …
T-mars: Improving visual representations by circumventing text feature learning
Large web-sourced multimodal datasets have powered a slew of new methods for learning
general-purpose visual representations, advancing the state of the art in computer vision …
general-purpose visual representations, advancing the state of the art in computer vision …
Characterizing datapoints via second-split forgetting
Researchers investigating example hardness have increasingly focused on the dynamics by
which neural networks learn and forget examples throughout training. Popular metrics …
which neural networks learn and forget examples throughout training. Popular metrics …
Estimating example difficulty using variance of gradients
In machine learning, a question of great interest is understanding what examples are
challenging for a model to classify. Identifying atypical examples ensures the safe …
challenging for a model to classify. Identifying atypical examples ensures the safe …
Detecting shortcut learning for fair medical AI using shortcut testing
Abstract Machine learning (ML) holds great promise for improving healthcare, but it is critical
to ensure that its use will not propagate or amplify health disparities. An important step is to …
to ensure that its use will not propagate or amplify health disparities. An important step is to …