Transformers as algorithms: Generalization and stability in in-context learning

Y Li, ME Ildiz, D Papailiopoulos… - … on Machine Learning, 2023 - proceedings.mlr.press
In-context learning (ICL) is a type of prompting where a transformer model operates on a
sequence of (input, output) examples and performs inference on-the-fly. In this work, we …
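As a rough illustration of the prompting setup this entry describes (not code from the paper): in-context learning concatenates (input, output) demonstrations into a single prompt and asks the pretrained model to complete the output for a new query, with no weight updates. The task and any model call below are hypothetical.

```python
# Minimal sketch of an in-context learning prompt; the task is hypothetical.
examples = [("2 + 3", "5"), ("7 + 1", "8"), ("4 + 4", "8")]
query = "6 + 2"

# Demonstrations are serialized as (input, output) pairs, then the query is
# appended with its output left blank for the model to fill in.
prompt = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
prompt += f"\nInput: {query}\nOutput:"

print(prompt)
# completion = model.generate(prompt)  # hypothetical call to a pretrained transformer
```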

FedAvg with fine tuning: Local updates lead to representation learning

L Collins, H Hassani, A Mokhtari… - Advances in Neural …, 2022 - proceedings.neurips.cc
The Federated Averaging (FedAvg) algorithm, which consists of alternating
between a few local stochastic gradient updates at client nodes, followed by a model …
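A minimal sketch of the FedAvg loop the snippet describes, local gradient steps at each client followed by averaging of the client models, is given below. This is illustrative only (not the authors' code); the synthetic linear-regression clients and all names are assumptions.

```python
# Minimal FedAvg sketch on synthetic linear-regression clients (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def local_updates(w, X, y, lr=0.1, steps=5):
    """A few local gradient steps on squared loss at one client."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Each client holds data from a shared ground-truth model plus noise.
d, n_clients, n_samples = 5, 10, 50
w_true = rng.normal(size=d)
clients = []
for _ in range(n_clients):
    X = rng.normal(size=(n_samples, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_samples)
    clients.append((X, y))

# FedAvg: broadcast the global model, run local updates, then average.
w_global = np.zeros(d)
for _ in range(20):
    local_models = [local_updates(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_models, axis=0)

print("distance to ground truth:", np.linalg.norm(w_global - w_true))
```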

Architecture, dataset and model-scale agnostic data-free meta-learning

Z Hu, L Shen, Z Wang, T Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The goal of data-free meta-learning is to learn useful prior knowledge from a collection of
pre-trained models without accessing their training data. However, existing works only solve …

Meta-learning without data via Wasserstein distributionally-robust model fusion

Z Wang, X Wang, L Shen, Q Suo… - Uncertainty in …, 2022 - proceedings.mlr.press
Existing meta-learning works assume that each task has available training and testing data.
However, many pre-trained models are available without access to their training data …

The neural process family: Survey, applications and perspectives

S Jha, D Gong, X Wang, RE Turner, L Yao - arXiv preprint arXiv …, 2022 - arxiv.org
The standard approaches to neural network implementation yield powerful function
approximation capabilities but are limited in their abilities to learn meta representations and …

Provable multi-task representation learning by two-layer ReLU neural networks

L Collins, H Hassani, M Soltanolkotabi… - … of machine learning …, 2024 - pmc.ncbi.nlm.nih.gov
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on
many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear …

Understanding benign overfitting in gradient-based meta learning

L Chen, S Lu, T Chen - Advances in neural information …, 2022 - proceedings.neurips.cc
Meta learning has demonstrated tremendous success in few-shot learning with limited
supervised data. In those settings, the meta model is usually overparameterized. While the …

Offline multi-task transfer RL with representational penalization

A Bose, SS Du, M Fazel - arXiv preprint arXiv:2402.12570, 2024 - arxiv.org
We study the problem of representation transfer in offline Reinforcement Learning (RL),
where a learner has access to episodic data from a number of source tasks collected a …

Understanding inverse scaling and emergence in multitask representation learning

ME Ildiz, Z Zhao, S Oymak - International Conference on …, 2024 - proceedings.mlr.press
Large language models exhibit strong multitasking capabilities; however, their learning
dynamics as a function of task characteristics, sample size, and model complexity remain …

Provable pathways: Learning multiple tasks over multiple paths

Y Li, S Oymak - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Constructing useful representations across a large number of tasks is a key requirement for
sample-efficient intelligent systems. A traditional idea in multitask learning (MTL) is building …