Studying large language model generalization with influence functions
When trying to gain better visibility into a machine learning model in order to understand and
mitigate the associated risks, a potentially valuable source of evidence is: which training …
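The estimate at the core of this line of work is the classic influence-function formula, I(z, z_query) = -∇L(z_query)ᵀ H⁻¹ ∇L(z). Below is a minimal JAX sketch on a toy ridge-regression model; the paper itself scales this to LLMs with EK-FAC approximations of H, and the model, data, and exact Hessian solve here are purely illustrative:

```python
import jax
import jax.numpy as jnp

# Tiny ridge-regression model; loss_all is the full training objective.
def loss_one(w, x, y):
    return 0.5 * (x @ w - y) ** 2 + 1e-3 * jnp.sum(w ** 2)

def loss_all(w, X, Y):
    return jnp.mean(jax.vmap(loss_one, (None, 0, 0))(w, X, Y))

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 5))
Y = X @ jnp.arange(5.0)
# Closed-form optimum of the regularized objective.
w = jnp.linalg.solve(X.T @ X / 32 + 2e-3 * jnp.eye(5), X.T @ Y / 32)

H = jax.hessian(loss_all)(w, X, Y)                              # d x d Hessian
g_query = jax.grad(loss_one)(w, X[0], Y[0])                     # query gradient
g_train = jax.vmap(jax.grad(loss_one), (None, 0, 0))(w, X, Y)   # per-example grads

# Influence of upweighting each training point on the query loss.
influence = -g_train @ jnp.linalg.solve(H, g_query)
print(influence.shape)  # (32,): one score per training example
```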
Tutorial on amortized optimization
B Amos - Foundations and Trends® in Machine Learning, 2023 - nowpublishers.com
Optimization is a ubiquitous modeling tool and is often deployed in settings which
repeatedly solve similar instances of the same problem. Amortized optimization methods …
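The core idea is to learn a solution map x̂_φ(c) ≈ argmin_x f(x; c) across a family of related problem instances c, trained directly on the objective values. A minimal sketch on a quadratic family, where the true solution map is A⁻¹c; the quadratic family and the linear model are illustrative, not the tutorial's:

```python
import jax
import jax.numpy as jnp

A = jnp.diag(jnp.array([1.0, 2.0, 4.0]))     # fixed problem structure

def f(x, c):                                  # objective for context c
    return 0.5 * x @ A @ x - c @ x            # argmin is A^{-1} c

def amortized(W, c):                          # linear solution map x_hat_W(c)
    return W @ c

def amortization_loss(W, C):
    # Objective-based amortization: push f(x_hat(c); c) down on sampled contexts.
    return jnp.mean(jax.vmap(lambda c: f(amortized(W, c), c))(C))

key = jax.random.PRNGKey(0)
C = jax.random.normal(key, (256, 3))          # sampled contexts
W = jnp.zeros((3, 3))
for _ in range(500):
    W -= 0.1 * jax.grad(amortization_loss)(W, C)

print(jnp.round(W, 2))                         # approaches A^{-1} = diag(1, .5, .25)
```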
A closer look at learned optimization: Stability, robustness, and inductive biases
Learned optimizers---neural networks that are trained to act as optimizers---have the
potential to dramatically accelerate training of machine learning models. However, even …
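A minimal meta-training sketch: real learned optimizers parameterize the update with a small neural network over gradient features, whereas here the "optimizer" is just a pair of learned per-coordinate log step sizes. The meta-training recipe, differentiating through an unrolled inner optimization, is the standard one; everything else is illustrative:

```python
import jax
import jax.numpy as jnp

def inner_loss(theta):
    # Ill-conditioned quadratic: one flat and one steep coordinate.
    return 0.5 * jnp.sum(jnp.array([1.0, 10.0]) * theta ** 2)

def update(grad, phi):
    # "Learned optimizer": per-coordinate log step sizes (a real one would
    # be a small neural network applied to gradient features).
    return -jnp.exp(phi) * grad

def meta_loss(phi):
    theta = jnp.array([1.0, 1.0])
    for _ in range(20):                     # unrolled inner optimization
        theta = theta + update(jax.grad(inner_loss)(theta), phi)
    return inner_loss(theta)                # meta-objective: final inner loss

phi = jnp.log(jnp.array([0.01, 0.01]))      # init: step size 0.01 everywhere
for _ in range(200):
    phi -= 0.1 * jax.grad(meta_loss)(phi)   # meta-gradient through the unroll

print(jnp.exp(phi))                          # larger step on the flat coordinate
```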
On amortizing convex conjugates for optimal transport
B Amos - arXiv preprint arXiv:2210.12153, 2022 - arxiv.org
This paper focuses on computing the convex conjugate operation that arises when solving
Euclidean Wasserstein-2 optimal transport problems. This conjugation, which is also …
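The conjugate in question is f*(y) = sup_x ⟨x, y⟩ − f(x). The paper amortizes the prediction of the maximizer with a learned model; the sketch below shows only the underlying solve, gradient ascent fine-tuned from an initial guess, on an illustrative toy potential:

```python
import jax
import jax.numpy as jnp

def f(x):
    return 0.25 * jnp.sum(x ** 4)          # a convex potential

def conjugate_objective(x, y):
    return x @ y - f(x)                    # maximized at the conjugate's argmax

def conjugate(y, x0, steps=200, lr=0.1):
    # Fine-tune the (possibly amortized) initial guess x0 by gradient ascent;
    # the paper's point is that a learned x0 makes this loop cheap or unneeded.
    x = x0
    g = jax.grad(conjugate_objective)      # ascent direction in x
    for _ in range(steps):
        x = x + lr * g(x, y)
    return conjugate_objective(x, y), x

y = jnp.array([1.0, -2.0])
val, xstar = conjugate(y, jnp.zeros(2))
print(xstar, jnp.cbrt(y))                  # grad f(x) = x**3, so x*(y) = y**(1/3)
```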
Connecting NTK and NNGP: A unified theoretical framework for neural network learning dynamics in the kernel regime
Artificial neural networks have revolutionized machine learning in recent years, but a
complete theoretical framework for their learning process is still lacking. Substantial …
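The object this framework studies is concrete: the empirical neural tangent kernel Θ(x, x′) = ∇_θ f(x) · ∇_θ f(x′), which governs learning dynamics in the kernel regime. A minimal sketch computing its Gram matrix for a toy MLP; the architecture and initialization are illustrative, not the paper's:

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def mlp(params, x):                        # scalar-output two-layer net
    h = jnp.tanh(params['W1'] @ x + params['b1'])
    return params['w2'] @ h

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {'W1': jax.random.normal(k1, (16, 4)) / 2.0,
          'b1': jnp.zeros(16),
          'w2': jax.random.normal(k2, (16,)) / 4.0}

def flat_grad(x):
    g = jax.grad(mlp)(params, x)           # gradient w.r.t. all parameters
    return ravel_pytree(g)[0]              # flatten the pytree to one vector

X = jax.random.normal(k3, (8, 4))
J = jax.vmap(flat_grad)(X)                 # (8, n_params) parameter Jacobian
ntk = J @ J.T                              # empirical NTK Gram matrix
print(ntk.shape)                           # (8, 8), a PSD kernel over the inputs
```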
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
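The scalar baseline being generalized is Armijo backtracking: shrink the step size until a sufficient-decrease condition holds. A minimal sketch of that classic scalar version; the paper instead searches over a per-coordinate step-size vector, which is not implemented here:

```python
import jax
import jax.numpy as jnp

def f(x):
    return 0.5 * x @ jnp.diag(jnp.array([1.0, 100.0])) @ x   # badly scaled quadratic

def backtracking_step(x, lr0=1.0, shrink=0.5, c=1e-4):
    # Halve the step size until the Armijo sufficient-decrease condition
    # f(x - lr*g) <= f(x) - c*lr*||g||^2 is satisfied.
    g = jax.grad(f)(x)
    lr = lr0
    while f(x - lr * g) > f(x) - c * lr * (g @ g):
        lr *= shrink
    return x - lr * g, lr

x = jnp.array([1.0, 1.0])
for _ in range(5):
    x, lr = backtracking_step(x)
    print(lr, f(x))        # accepted step size and the decreasing objective
```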
Efficient parametric approximations of neural network function space distance
It is often useful to compactly summarize important properties of model parameters and
training data so that they can be used later without storing and/or iterating over the entire …
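One concrete instance of such a summary: function-space distance E_x[(f(θ₁, x) − f(θ₂, x))²] can be approximated by linearizing the network around θ₁, so the data is summarized once as a Gauss-Newton-style matrix. A sketch under that assumption, on an illustrative toy model (the paper's parametric approximations are more refined):

```python
import jax
import jax.numpy as jnp

def net(w, x):
    return jnp.tanh(x @ w)                      # scalar-output toy model

X = jax.random.normal(jax.random.PRNGKey(0), (64, 5))
w1 = 0.3 * jax.random.normal(jax.random.PRNGKey(1), (5,))
w2 = w1 + 0.05 * jax.random.normal(jax.random.PRNGKey(2), (5,))

# Exact function-space distance needs the data X every time:
out1 = jax.vmap(net, (None, 0))(w1, X)
out2 = jax.vmap(net, (None, 0))(w2, X)
fsd = jnp.mean((out1 - out2) ** 2)

# Parametric approximation: summarize X once as a Gauss-Newton-style matrix
# G = E[J^T J], then estimate FSD for any nearby w2 without touching X.
J = jax.vmap(jax.grad(net), (None, 0))(w1, X)   # (64, 5) per-example gradients
G = J.T @ J / X.shape[0]
d = w2 - w1
fsd_approx = d @ G @ d

print(fsd, fsd_approx)                           # close for small parameter changes
```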
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Second-order optimization algorithms exhibit excellent convergence properties for training
deep learning models, but often incur significant computation and memory overheads. This …
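For context, the update being approximated is a damped Newton step. A minimal sketch with the exact Hessian on a tiny logistic-regression problem; Eva's actual contribution, a vectorized Kronecker-factored rank-one approximation of the curvature, is not implemented here:

```python
import jax
import jax.numpy as jnp

def loss(w, X, Y):
    p = jax.nn.sigmoid(X @ w)                   # tiny logistic-regression model
    return -jnp.mean(Y * jnp.log(p + 1e-8) + (1 - Y) * jnp.log(1 - p + 1e-8))

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (128, 6))
Y = (X @ jnp.arange(6.0) > 0).astype(jnp.float32)
w = jnp.zeros(6)

for _ in range(10):
    g = jax.grad(loss)(w, X, Y)
    H = jax.hessian(loss)(w, X, Y)              # exact curvature: O(d^2) memory
    # Damped Newton update; second-order methods like Eva replace H with a
    # cheap structured approximation so this matrix is never materialized.
    w = w - jnp.linalg.solve(H + 1e-3 * jnp.eye(6), g)

print(loss(w, X, Y))                             # converges in a few iterations
```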
Training Data Attribution via Approximate Unrolling
Many training data attribution (TDA) methods aim to estimate how a model's behavior would
change if one or more data points were removed from the training set. Methods based on …
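A minimal sketch of the unrolling idea: upweight each training example by ε, differentiate the final test loss with respect to ε straight through the training run, and read off one attribution score per example. The full-batch gradient descent and toy regression setup are illustrative; the paper's method approximates this unroll rather than storing it:

```python
import jax
import jax.numpy as jnp

def loss_i(w, x, y):
    return 0.5 * (x @ w - y) ** 2

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (16, 4))
Y = X @ jnp.ones(4)
x_test, y_test = X[0], Y[0]

def train_loss(w, eps):
    # eps upweights each training example; eps = 0 recovers normal training.
    per_ex = jax.vmap(loss_i, (None, 0, 0))(w, X, Y)
    return jnp.mean((1.0 + eps) * per_ex)

def final_test_loss(eps):
    w = jnp.zeros(4)
    for _ in range(100):                     # unrolled training (full-batch here)
        w = w - 0.05 * jax.grad(train_loss)(w, eps)
    return loss_i(w, x_test, y_test)

# TDA score: how the test loss reacts to upweighting each training point,
# obtained by differentiating straight through the training unroll.
scores = jax.grad(final_test_loss)(jnp.zeros(16))
print(scores.shape)                          # (16,): one attribution per example
```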