Transformers as statisticians: Provable in-context learning with in-context algorithm selection
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
Machine-learning models for medical tasks can match or surpass the performance
of clinical experts. However, in settings differing from those of the training dataset, the …
Transformers as algorithms: Generalization and stability in in-context learning
In-context learning (ICL) is a type of prompting where a transformer model operates on a
sequence of (input, output) examples and performs inference on-the-fly. In this work, we …
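As a rough illustration of the (input, output) prompting setup described in this snippet, the sketch below serializes a few demonstrations plus a query into a single prompt that a pretrained transformer could complete in one forward pass, with no weight updates. The prompt format and the `model.generate` call are assumptions for illustration, not the paper's actual protocol.

```python
# Minimal sketch of in-context learning as prompting (illustrative only).
def build_icl_prompt(demos, query):
    """Serialize (input, output) demonstrations followed by a query input."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

demos = [("2 + 3", "5"), ("7 + 1", "8")]
prompt = build_icl_prompt(demos, "4 + 4")
print(prompt)
# prediction = model.generate(prompt)  # hypothetical pretrained transformer call
```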
Rethinking few-shot image classification: a good embedding is all you need?
The focus of recent meta-learning research has been on the development of learning
algorithms that can quickly adapt to test time tasks with limited data and low computational …
Universal prompt tuning for graph neural networks
In recent years, prompt tuning has sparked a research surge in adapting pre-trained models.
Unlike the unified pre-training strategy employed in the language field, the graph field …
What makes multi-modal learning better than single (provably)
The world provides us with data of multiple modalities. Intuitively, models fusing data from
different modalities outperform their uni-modal counterparts, since more information is …
Variational model inversion attacks
Given the ubiquity of deep neural networks, it is important that these models do not reveal
information about sensitive data that they have been trained on. In model inversion attacks …
Revisiting scalarization in multi-task learning: A theoretical perspective
Linear scalarization, i.e., combining all loss functions by a weighted sum, has been the
default choice in the literature of multi-task learning (MTL) since its inception. In recent years …
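For reference, the weighted-sum objective this snippet describes can be written as below; the notation (T tasks, fixed weights w_t) is assumed for illustration and is not necessarily the paper's.

```latex
% Linear scalarization of T task losses L_1, ..., L_T with fixed weights w_t
% (notation assumed for illustration):
\min_{\theta} \; \sum_{t=1}^{T} w_t \, L_t(\theta),
\qquad w_t \ge 0, \quad \sum_{t=1}^{T} w_t = 1 .
```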
A kernel-based view of language model fine-tuning
It has become standard to solve NLP tasks by fine-tuning pre-trained language models
(LMs), especially in low-data settings. There is minimal theoretical understanding of …
Spectral methods for data science: A statistical perspective
Spectral methods have emerged as a simple yet surprisingly effective approach for
extracting information from massive, noisy and incomplete data. In a nutshell, spectral …
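A minimal sketch of the generic spectral recipe this snippet alludes to, under assumed notation: estimate a low-rank signal from a noisy symmetric observation by keeping its top eigenvectors. This is an illustration only, not the survey's own example.

```python
# Spectral estimation sketch: recover a rank-r signal from a noisy matrix
# by truncating to the top-r eigenpairs of the observation.
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 2
U = rng.standard_normal((n, r))
signal = U @ U.T                       # rank-r ground-truth matrix
noise = 0.5 * rng.standard_normal((n, n))
obs = signal + (noise + noise.T) / 2   # symmetric noisy observation

eigvals, eigvecs = np.linalg.eigh(obs) # eigenvalues in ascending order
top = eigvecs[:, -r:]                  # top-r eigenvectors = spectral estimate
estimate = top @ np.diag(eigvals[-r:]) @ top.T
rel_err = np.linalg.norm(estimate - signal) / np.linalg.norm(signal)
print(f"relative estimation error: {rel_err:.3f}")
```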