Studying large language model generalization with influence functions

R Grosse, J Bae, C Anil, N Elhage, A Tamkin… - arXiv preprint arXiv …, 2023 - arxiv.org
When trying to gain better visibility into a machine learning model in order to understand and
mitigate the associated risks, a potentially valuable source of evidence is: which training …
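
For reference, the classical influence-function estimate that this line of work scales up: the effect of a training example z on a query quantity f is approximated through the damped training Hessian. A sketch of the standard formula, not the paper's exact estimator (the paper approximates H with EK-FAC):

    \mathcal{I}_f(z) \;=\; -\,\nabla_\theta f(\theta^\star)^\top \,(H + \lambda I)^{-1}\, \nabla_\theta \mathcal{L}(z, \theta^\star),
    \qquad
    H \;=\; \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta^2 \mathcal{L}(z_i, \theta^\star)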

Tutorial on amortized optimization

B Amos - Foundations and Trends® in Machine Learning, 2023 - nowpublishers.com
Optimization is a ubiquitous modeling tool and is often deployed in settings which
repeatedly solve similar instances of the same problem. Amortized optimization methods …
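
The setup being amortized: repeatedly solving min_x f(x; c) across a family of contexts c. One standard recipe from this literature is objective-based amortization, training a model to map c directly to an approximate minimizer. A minimal PyTorch sketch; the quadratic family and all names are illustrative, not taken from the tutorial:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    d = 8
    A = torch.randn(d, d)
    A = A @ A.T / d + torch.eye(d)          # fixed SPD matrix defining the family

    def f(x, c):                            # problem instance with context c
        return 0.5 * (x @ A * x).sum(-1) - (c * x).sum(-1)

    model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(2000):
        c = torch.randn(256, d)             # sample contexts
        loss = f(model(c), c).mean()        # push predictions toward the argmin
        opt.zero_grad(); loss.backward(); opt.step()

    # model(c) now approximates argmin_x f(x; c) = A^{-1} c in one forward pass.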

A closer look at learned optimization: Stability, robustness, and inductive biases

J Harrison, L Metz… - Advances in Neural …, 2022 - proceedings.neurips.cc
Learned optimizers, neural networks that are trained to act as optimizers, have the
potential to dramatically accelerate training of machine learning models. However, even …
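
In contrast to hand-designed rules like SGD or Adam, a learned optimizer is itself a small network applied per parameter. A hypothetical minimal sketch (the input features and output scale here are illustrative; the part the paper studies, meta-training such a rule stably by unrolling inner training runs, is omitted):

    import torch
    import torch.nn as nn

    class LearnedOptimizer(nn.Module):
        """Tiny per-parameter update rule: an MLP reads gradient features
        and emits an update for each scalar parameter."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

        def step(self, params, grads):
            new_params = []
            for p, g in zip(params, grads):
                feats = torch.stack([g, g.sign()], dim=-1).reshape(-1, 2)
                update = self.net(feats).reshape(p.shape)
                new_params.append(p - 0.01 * update)
            return new_params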

On amortizing convex conjugates for optimal transport

B Amos - arXiv preprint arXiv:2210.12153, 2022 - arxiv.org
This paper focuses on computing the convex conjugate operation that arises when solving
Euclidean Wasserstein-2 optimal transport problems. This conjugation, which is also …
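
The operation in question is the Legendre-Fenchel conjugate; the paper's theme is amortizing the inner maximization with a learned model that predicts x*(y), then fine-tuning the prediction. Standard definitions:

    f^\star(y) \;=\; \sup_{x \in \mathbb{R}^n} \;\langle x, y \rangle - f(x),
    \qquad
    x^\star(y) \;=\; \operatorname*{arg\,max}_{x} \;\langle x, y \rangle - f(x)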

Connecting NTK and NNGP: A unified theoretical framework for neural network learning dynamics in the kernel regime

Y Avidan, Q Li, H Sompolinsky - arXiv preprint arXiv:2309.04522, 2023 - arxiv.org
Artificial neural networks have revolutionized machine learning in recent years, but a
complete theoretical framework for their learning process is still lacking. Substantial …
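
Background on the two objects being connected: under gradient flow on the squared loss, a sufficiently wide network's predictions evolve linearly under the neural tangent kernel Θ, while the NNGP kernel describes the Gaussian-process behavior of the network at initialization. The standard kernel-regime dynamics on training data (X, Y):

    \frac{d f_t(x)}{dt} \;=\; -\,\eta\,\Theta(x, X)\,\big(f_t(X) - Y\big)
    \quad\Rightarrow\quad
    f_t(x) \;=\; f_0(x) + \Theta(x, X)\,\Theta(X, X)^{-1}\big(I - e^{-\eta\,\Theta(X,X)\,t}\big)\big(Y - f_0(X)\big)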

Searching for optimal per-coordinate step-sizes with multidimensional backtracking

F Kunstner, V Sanches Portella… - Advances in Neural …, 2023 - proceedings.neurips.cc
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
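
For reference, the scalar Armijo backtracking rule that the paper generalizes: rather than one step-size alpha shared by every coordinate, multidimensional backtracking searches over a whole vector of per-coordinate step-sizes. A minimal sketch of the classic scalar version only:

    import numpy as np

    def backtracking_step(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
        """One gradient step with Armijo backtracking: shrink alpha until
        the sufficient-decrease condition holds."""
        g = grad_f(x)
        alpha = alpha0
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= beta
        return x - alpha * g

    x_new = backtracking_step(lambda x: 0.5 * x @ x, lambda x: x,
                              np.array([3.0, -2.0]))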

Efficient parametric approximations of neural network function space distance

N Dhawan, S Huang, J Bae… - … Conference on Machine …, 2023 - proceedings.mlr.press
It is often useful to compactly summarize important properties of model parameters and
training data so that they can be used later without storing and/or iterating over the entire …
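
Function space distance compares two parameter vectors by their outputs on the training data rather than in weight space; the paper's contribution is a compact parametric approximation that avoids storing or iterating over that data. The quantity being approximated is, roughly:

    D_{\mathrm{FSD}}(\theta, \theta') \;=\; \frac{1}{N}\sum_{i=1}^{N} \rho\big(f_\theta(x_i),\, f_{\theta'}(x_i)\big),
    \qquad \rho \;\in\; \{\text{squared error},\ \mathrm{KL},\ \dots\}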

Eva: A General Vectorized Approximation Framework for Second-order Optimization

L Zhang, S Shi, B Li - arXiv preprint arXiv:2308.02123, 2023 - arxiv.org
Second-order optimization algorithms exhibit excellent convergence properties for training
deep learning models, but often incur significant computation and memory overheads. This …
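
Context for the overhead being attacked: second-order methods precondition the gradient with a curvature matrix F, which KFAC-style methods approximate per layer as a Kronecker product of activation and output-gradient second moments; Eva's "vectorized" approximation further cheapens building and inverting these factors. The generic update being approximated:

    \theta \;\leftarrow\; \theta - \eta\,(F + \lambda I)^{-1}\nabla_\theta \mathcal{L},
    \qquad
    F_{\text{layer}} \;\approx\; \mathbb{E}\big[a\,a^\top\big] \otimes \mathbb{E}\big[g\,g^\top\big]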

Training Data Attribution via Approximate Unrolling

J Bae, W Lin, J Lorraine, RB Grosse - The Thirty-eighth Annual …, 2024 - openreview.net
Many training data attribution (TDA) methods aim to estimate how a model's behavior would
change if one or more data points were removed from the training set. Methods based on …
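
The unrolling idea: give each training example a weight 1/N + eps_i, differentiate a query on the final parameters through the whole training run with respect to eps, and read off d(query)/d(eps_i) as that example's attribution; setting eps_i = -1/N approximates removal. A heavily simplified, memory-hungry sketch with hypothetical names (the paper's point is to approximate this cheaply, not to store the full unrolled graph):

    import torch

    def unrolled_attribution(per_example_loss, params0, batches, query, lr=0.1):
        """per_example_loss(params, batch) -> vector of losses, one per example;
        query(params) -> scalar. Returns d query / d eps for every example."""
        n = sum(len(b) for b in batches)
        eps = torch.zeros(n, requires_grad=True)        # one weight per example
        params = [p.detach().clone().requires_grad_(True) for p in params0]
        i = 0
        for batch in batches:                           # differentiable SGD loop
            w = 1.0 / n + eps[i:i + len(batch)]
            loss = (w * per_example_loss(params, batch)).sum()
            grads = torch.autograd.grad(loss, params, create_graph=True)
            params = [p - lr * g for p, g in zip(params, grads)]
            i += len(batch)
        return torch.autograd.grad(query(params), eps)[0]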
