Transformers as algorithms: Generalization and stability in in-context learning
In-context learning (ICL) is a type of prompting where a transformer model operates on a
sequence of (input, output) examples and performs inference on-the-fly. In this work, we …
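To make the prompting format concrete, here is a minimal sketch of how an (input, output) demonstration sequence plus a query might be serialized for a generic text-completion model; `build_icl_prompt`, `icl_predict`, and the `complete` callback are illustrative names, not code from the paper.

```python
# Minimal sketch of in-context learning (ICL) prompting: the model sees a
# sequence of (input, output) demonstrations followed by a query input and is
# expected to produce the query's output on the fly, with no weight updates.
# `complete` is a placeholder for any text-completion model.

from typing import Callable, List, Tuple

def build_icl_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Serialize (input, output) examples and the query into one prompt."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

def icl_predict(complete: Callable[[str], str],
                demos: List[Tuple[str, str]], query: str) -> str:
    prompt = build_icl_prompt(demos, query)
    return complete(prompt).strip()

# Example usage with a stand-in completion function:
# icl_predict(my_model.complete, [("2+2", "4"), ("3+5", "8")], "7+6")
```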
FedAvg with fine tuning: Local updates lead to representation learning
Abstract The Federated Averaging (FedAvg) algorithm, which consists of alternating
between a few local stochastic gradient updates at client nodes, followed by a model …
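As a rough illustration of the alternation described here, a minimal FedAvg-style loop assuming each client exposes a stochastic gradient oracle; `fedavg` and `client_grads` are illustrative names, not the paper's implementation.

```python
# Minimal sketch of the FedAvg loop: each round, clients run a few local
# stochastic gradient steps from the current global model, and the server
# averages the resulting client models.

import numpy as np

def fedavg(global_w, client_grads, rounds=10, local_steps=5, lr=0.1):
    """client_grads: list of functions, each mapping weights -> stochastic gradient."""
    w = np.array(global_w, dtype=float)
    for _ in range(rounds):
        client_models = []
        for grad_fn in client_grads:          # one pass per client node
            w_local = w.copy()
            for _ in range(local_steps):      # a few local SGD updates
                w_local -= lr * grad_fn(w_local)
            client_models.append(w_local)
        w = np.mean(client_models, axis=0)    # server-side model averaging
    return w
```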
Learning to generate image embeddings with user-level differential privacy
Small on-device models have been successfully trained with user-level differential privacy
(DP) for next word prediction and image classification tasks in the past. However, existing …
A conditional gradient-based method for simple bilevel optimization with convex lower-level problem
In this paper, we study a class of bilevel optimization problems, also known as simple bilevel
optimization, where we minimize a smooth objective function over the optimal solution set of …
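The simple bilevel problem referred to here can be written, in standard notation (an assumption, not necessarily the paper's exact symbols), as:

```latex
% Simple bilevel formulation (amsmath assumed): minimize a smooth upper-level
% objective f over the optimal solution set of a convex lower-level problem
% with objective g and closed convex constraint set Z.
\begin{equation*}
  \min_{x} \; f(x)
  \qquad \text{s.t.} \qquad
  x \in \operatorname*{arg\,min}_{z \in \mathcal{Z}} \; g(z) .
\end{equation*}
```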
Provable multi-task representation learning by two-layer ReLU neural networks
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on
many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear …
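A minimal sketch of the adaptation step mentioned here, assuming a two-layer ReLU network whose first-layer weights `W` are pretrained and frozen, with a new linear head fit by ridge regression on the downstream data; names are illustrative, not the paper's method.

```python
# Adapt a pretrained two-layer ReLU network to a downstream task by
# re-training only the last linear layer on top of the frozen representation.

import numpy as np

def relu_features(X, W):
    """Hidden representation of a two-layer ReLU net: phi(x) = ReLU(W x)."""
    return np.maximum(X @ W.T, 0.0)

def retrain_last_layer(X, y, W, ridge=1e-3):
    """Fit a new linear head on frozen ReLU features via ridge regression."""
    H = relu_features(X, W)                       # frozen first layer
    k = H.shape[1]
    head = np.linalg.solve(H.T @ H + ridge * np.eye(k), H.T @ y)
    return head   # downstream prediction: relu_features(X_new, W) @ head
```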
Holistic transfer: towards non-disruptive fine-tuning with partial target data
We propose a learning problem involving adapting a pre-trained source model to the target
domain for classifying all classes that appeared in the source data, using target data that …
Metalearning with very few samples per task
Metalearning and multitask learning are two frameworks for solving a group of related
learning tasks more efficiently than we could hope to solve each of the individual tasks on …
Generalization error for portable rewards in transfer imitation learning
The reward transfer paradigm in transfer imitation learning (TIL) leverages the reward
learned via inverse reinforcement learning (IRL) in the source environment to re-optimize a …
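A schematic sketch of the reward-transfer paradigm described here, assuming generic `run_irl` and `run_rl` procedures (both hypothetical placeholders, not the paper's algorithms):

```python
# Reward transfer in transfer imitation learning: recover a reward via IRL
# from expert demonstrations in the source environment, then reuse it to
# re-optimize a policy in the target environment.

def transfer_imitation(source_env, target_env, expert_demos, run_irl, run_rl):
    reward_fn = run_irl(source_env, expert_demos)   # learn reward in the source
    target_policy = run_rl(target_env, reward_fn)   # re-optimize in the target
    return target_policy
```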
Understanding inverse scaling and emergence in multitask representation learning
Large language models exhibit strong multitasking capabilities; however, their learning
dynamics as a function of task characteristics, sample size, and model complexity remain …
Provable pathways: Learning multiple tasks over multiple paths
Constructing useful representations across a large number of tasks is a key requirement for
sample-efficient intelligent systems. A traditional idea in multitask learning (MTL) is building …
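A minimal sketch of the shared-representation idea in multitask learning alluded to here, with one shared linear map and per-task heads; the class and attribute names are illustrative and not taken from the paper.

```python
# Traditional MTL setup: a representation shared across tasks, with one
# lightweight linear head per task on top of it.

import numpy as np

class SharedRepresentationMTL:
    def __init__(self, rep_matrix, num_tasks):
        self.B = rep_matrix                                  # shared map: d -> r
        r = rep_matrix.shape[0]
        self.heads = [np.zeros(r) for _ in range(num_tasks)] # per-task heads

    def predict(self, x, task_id):
        return self.heads[task_id] @ (self.B @ x)            # task-specific prediction
```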