Transformers as algorithms: Generalization and stability in in-context learning
In-context learning (ICL) is a type of prompting where a transformer model operates on a
sequence of (input, output) examples and performs inference on-the-fly. In this work, we …
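A minimal sketch of the in-context learning setup this abstract describes: the prompt is a serialized sequence of (input, output) demonstration pairs followed by a query input, and the model is asked to produce the query's output on-the-fly, with no weight updates. The serialization format and the toy demonstrations below are illustrative assumptions, not the paper's protocol.

# Build an ICL prompt from (input, output) demonstrations plus a query
# (illustrative serialization; the paper's exact format may differ).
demonstrations = [("3 + 4", "7"), ("10 + 2", "12"), ("6 + 9", "15")]
query = "8 + 5"

prompt = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demonstrations)
prompt += f"\nInput: {query}\nOutput:"

print(prompt)
# A pretrained transformer would be conditioned on `prompt` and decode the answer;
# no parameters are updated, so all "learning" happens in the forward pass.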
FedAvg with fine tuning: Local updates lead to representation learning
The Federated Averaging (FedAvg) algorithm, which consists of alternating
between a few local stochastic gradient updates at client nodes, followed by a model …
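A minimal NumPy sketch of the FedAvg loop the abstract refers to, under simplifying assumptions (least-squares clients, full-batch local gradients, equal client weighting): each client runs a few local gradient steps from the current global model, and the server averages the resulting local models.

import numpy as np

def fedavg_round(w_global, client_data, local_steps=5, lr=0.1):
    """One FedAvg round on least-squares clients (simplified sketch)."""
    local_models = []
    for X, y in client_data:
        w = w_global.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)  # full-batch least-squares gradient
            w -= lr * grad
        local_models.append(w)
    return np.mean(local_models, axis=0)  # server-side model averaging

# Toy example: 3 clients holding linear-regression data from the same ground truth.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print("estimated weights:", w)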
Architecture, dataset and model-scale agnostic data-free meta-learning
The goal of data-free meta-learning is to learn useful prior knowledge from a collection of
pre-trained models without accessing their training data. However, existing works only solve …
Meta-learning without data via Wasserstein distributionally-robust model fusion
Existing meta-learning works assume that each task has available training and testing data.
However, in practice many pre-trained models are available without access to their training data …
The neural process family: Survey, applications and perspectives
The standard approaches to neural network implementation yield powerful function
approximation capabilities but are limited in their abilities to learn meta representations and …
Provable multi-task representation learning by two-layer ReLU neural networks
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on
many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear …
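A minimal NumPy sketch of the adaptation pattern this abstract mentions (pretrain a representation on many tasks, then adapt downstream by re-training only the last linear layer). The frozen ReLU feature map and the least-squares head below are illustrative assumptions, not the paper's construction.

import numpy as np

rng = np.random.default_rng(0)

# Pretend these first-layer weights were learned during multi-task pretraining
# and are now frozen (illustrative assumption, not the paper's construction).
d, k = 10, 32
W_pretrained = rng.normal(size=(k, d))

def features(X):
    """Frozen two-layer-style representation: ReLU of the pretrained first layer."""
    return np.maximum(X @ W_pretrained.T, 0.0)

# Downstream task: adapt by re-training only the last linear layer (a linear probe),
# here fit by least squares on the frozen features.
X_down = rng.normal(size=(200, d))
y_down = np.sin(X_down[:, 0]) + 0.1 * rng.normal(size=200)

Phi = features(X_down)
head, *_ = np.linalg.lstsq(Phi, y_down, rcond=None)

preds = Phi @ head
print("downstream train MSE:", np.mean((preds - y_down) ** 2))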
Understanding benign overfitting in gradient-based meta learning
Meta learning has demonstrated tremendous success in few-shot learning with limited
supervised data. In those settings, the meta model is usually overparameterized. While the …
Offline multi-task transfer RL with representational penalization
We study the problem of representation transfer in offline Reinforcement Learning (RL),
where a learner has access to episodic data from a number of source tasks collected a …
Understanding inverse scaling and emergence in multitask representation learning
Large language models exhibit strong multitasking capabilities, however, their learning
dynamics as a function of task characteristics, sample size, and model complexity remain …
Provable pathways: Learning multiple tasks over multiple paths
Constructing useful representations across a large number of tasks is a key requirement for
sample-efficient intelligent systems. A traditional idea in multitask learning (MTL) is building …