A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

Robust learning with progressive data expansion against spurious correlation

Y Deng, Y Yang, B Mirzasoleiman… - Advances in neural …, 2023 - proceedings.neurips.cc
While deep learning models have shown remarkable performance in various tasks, they are
susceptible to learning non-generalizable _spurious features_ rather than the core features …
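The core-versus-spurious distinction in this snippet can be made concrete with a toy example (illustrative only, not the paper's method; all data and parameters below are made up): a linear classifier trained where a spurious feature tracks the label most of the time will lean on that feature and degrade once the correlation is broken at test time.

```python
# Toy illustration of a spurious feature: strongly correlated with the label
# at train time, uninformative at test time. Not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    y = rng.integers(0, 2, size=n) * 2 - 1              # labels in {-1, +1}
    core = y * 0.5 + rng.normal(scale=1.0, size=n)       # weak but invariant signal
    aligned = rng.random(n) < spurious_corr              # fraction aligned with y
    spurious = np.where(aligned, y, -y) * 2.0 + rng.normal(scale=0.1, size=n)
    X = np.column_stack([core, spurious])
    return X, (y > 0).astype(int)

X_tr, y_tr = make_data(5000, spurious_corr=0.95)   # spurious feature aligned 95% of the time
X_te, y_te = make_data(5000, spurious_corr=0.50)   # alignment broken at test time

clf = LogisticRegression().fit(X_tr, y_tr)
print("train acc:", clf.score(X_tr, y_tr))
print("test acc :", clf.score(X_te, y_te))
print("weights (core, spurious):", clf.coef_[0])
```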

The limits and potentials of local SGD for distributed heterogeneous learning with intermittent communication

KK Patel, M Glasgow, A Zindari… - The Thirty Seventh …, 2024 - proceedings.mlr.press
Local SGD is a popular optimization method in distributed learning, often outperforming mini-
batch SGD. Despite this practical success, proving the efficiency of local SGD has been …
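For intuition, here is a minimal sketch of the local SGD communication pattern the snippet refers to (toy quadratic objective, worker count, and step sizes are assumptions, not the paper's setting): each of M workers runs several local gradient steps between communication rounds, after which the models are averaged.

```python
# Minimal sketch of local SGD with intermittent communication on a toy
# least-squares problem with heterogeneous workers (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
M, d = 4, 5                                   # number of workers, dimension
W_star = rng.normal(size=(M, d))              # each worker has its own optimum (heterogeneity)

def stoch_grad(w, m):
    # Gradient of 0.5 * ||w - w_m*||^2 plus noise, standing in for a data minibatch.
    return (w - W_star[m]) + 0.1 * rng.normal(size=d)

def local_sgd(rounds=50, local_steps=10, lr=0.1):
    w = np.zeros(d)                           # shared model after each communication round
    for _ in range(rounds):
        local_models = []
        for m in range(M):
            wm = w.copy()
            for _ in range(local_steps):      # local steps without communication
                wm -= lr * stoch_grad(wm, m)
            local_models.append(wm)
        w = np.mean(local_models, axis=0)     # intermittent communication: average models
    return w

w_hat = local_sgd()
w_opt = W_star.mean(axis=0)                   # minimizer of the average objective
print("distance to optimum:", np.linalg.norm(w_hat - w_opt))
```

Mini-batch SGD would instead communicate after every step, averaging gradients rather than models; the interesting regime is how much the local steps save in communication without hurting convergence.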

Optimal multi-distribution learning

Z Zhang, W Zhan, Y Chen, SS Du… - The Thirty Seventh …, 2024 - proceedings.mlr.press
Multi-distribution learning (MDL), which seeks to learn a shared model that
minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a …
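The worst-case-risk objective the snippet refers to is usually written as

$$ \min_{h \in \mathcal{H}} \; \max_{i \in [k]} \; \mathbb{E}_{(x,y)\sim \mathcal{D}_i}\big[\ell(h(x), y)\big], $$

where $\mathcal{H}$ is the hypothesis class, $\ell$ a loss function, and $\mathcal{D}_1,\dots,\mathcal{D}_k$ the $k$ data distributions.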

A unifying perspective on multi-calibration: Game dynamics for multi-objective learning

N Haghtalab, M Jordan, E Zhao - Advances in Neural …, 2023 - proceedings.neurips.cc
We provide a unifying framework for the design and analysis of multi-calibrated predictors.
By placing the multi-calibration problem in the general setting of multi-objective learning …
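For reference, one common way to state the multi-calibration condition (notation assumed here, not quoted from the paper): a predictor $f$ is $\alpha$-multicalibrated with respect to a collection of groups $\mathcal{C}$ if, for every $S \in \mathcal{C}$ and every level $v$ in the range of $f$,

$$ \Big| \, \mathbb{E}\big[(y - f(x)) \cdot \mathbf{1}\{x \in S,\; f(x) = v\}\big] \, \Big| \le \alpha. $$

Variants of the definition condition on the event $\{x \in S, f(x) = v\}$ instead of weighting by its indicator; the multi-objective view treats each group-level pair as one objective to be satisfied simultaneously.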

Stochastic approximation approaches to group distributionally robust optimization

L Zhang, P Zhao, ZH Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper investigates group distributionally robust optimization (GDRO), with the goal
of learning a model that performs well over $m$ different distributions. First, we formulate …
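One common stochastic approach to GDRO (an illustrative sketch, not necessarily this paper's algorithm; the data, model, and step sizes below are assumptions) treats it as the minimax problem $\min_w \max_{q \in \Delta_m} \sum_i q_i L_i(w)$ and alternates an SGD step on the model with a multiplicative-weights step that upweights high-loss groups.

```python
# Sketch: stochastic minimax approach to group DRO on synthetic data.
# SGD on the q-weighted loss, multiplicative-weights ascent on the group weights q.
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 3, 10, 2000

# Synthetic binary-classification groups with different optimal directions.
groups = []
for i in range(m):
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)
    groups.append((X, y))

def group_loss_and_grad(w, i, batch=64):
    X, y = groups[i]
    idx = rng.integers(0, len(y), size=batch)
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))                 # logistic model
    loss = -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))
    grad = Xb.T @ (p - yb) / batch
    return loss, grad

w = np.zeros(d)
q = np.ones(m) / m                                     # weights over the m groups
lr_w, lr_q = 0.1, 0.05
for t in range(3000):
    losses = np.empty(m)
    grads = np.empty((m, d))
    for i in range(m):
        losses[i], grads[i] = group_loss_and_grad(w, i)
    w -= lr_w * (q @ grads)                            # descent on the q-weighted loss
    q *= np.exp(lr_q * losses)                         # ascent: upweight high-loss groups
    q /= q.sum()

print("final group weights:", np.round(q, 3))
print("final group losses :", np.round(losses, 3))
```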

The sample complexity of multi-distribution learning

B Peng - The Thirty Seventh Annual Conference on Learning …, 2024 - proceedings.mlr.press
Multi-distribution learning generalizes the classic PAC learning to handle data coming from
multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC …
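For context, the rate at stake in this line of work, settled up to logarithmic factors by this paper and the optimal-MDL paper above, is

$$ \tilde{\Theta}\!\left(\frac{d + k}{\varepsilon^2}\right) $$

samples to reach worst-case risk within $\varepsilon$ of optimal, where $d$ is the VC dimension of the hypothesis class and $k$ the number of distributions.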

Why does throwing away data improve worst-group error?

K Chaudhuri, K Ahuja, M Arjovsky… - … on Machine Learning, 2023 - proceedings.mlr.press
When facing data with imbalanced classes or groups, practitioners often follow an intriguing
strategy to achieve the best results. They throw away examples until the classes or groups are …
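A sketch of the subsampling heuristic the snippet describes (function and variable names are illustrative): discard examples from the larger groups until every group is as small as the smallest one.

```python
# Group-balanced subsampling: throw away examples from larger groups
# until all groups have the size of the smallest group.
import numpy as np

def balanced_subsample(X, y, group, seed=0):
    """Return a subset of (X, y) in which every group appears equally often."""
    rng = np.random.default_rng(seed)
    group = np.asarray(group)
    labels, counts = np.unique(group, return_counts=True)
    n_keep = counts.min()                            # size of the smallest group
    keep_idx = []
    for g in labels:
        idx = np.flatnonzero(group == g)
        keep_idx.append(rng.choice(idx, size=n_keep, replace=False))
    keep_idx = np.concatenate(keep_idx)
    return X[keep_idx], y[keep_idx]

# Example: group 0 is nine times larger than group 1 before subsampling.
X = np.random.randn(1000, 5)
y = np.random.randint(0, 2, size=1000)
group = (np.arange(1000) >= 900).astype(int)         # 900 in group 0, 100 in group 1
Xb, yb = balanced_subsample(X, y, group)
print(Xb.shape)                                      # (200, 5): 100 examples per group
```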

Open problem: The sample complexity of multi-distribution learning for VC classes

P Awasthi, N Haghtalab, E Zhao - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Multi-distribution learning is a natural generalization of PAC learning to settings with multiple
data distributions. There remains a significant gap between the known upper and lower …

Derandomizing Multi-Distribution Learning

KG Larsen, O Montasser… - Advances in Neural …, 2025 - proceedings.neurips.cc
Multi-distribution or collaborative learning involves learning a single predictor that works
well across multiple data distributions, using samples from each during training. Recent …