Dataset distillation: A comprehensive review

R Yu, S Liu, X Wang - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
The recent success of deep learning is largely attributed to the sheer amount of data used for
training deep neural networks. Despite the unprecedented success, the massive data …

A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

Generalizing dataset distillation via deep generative prior

G Cazenavette, T Wang, A Torralba… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a …
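
The bi-level objective shared by most of the work in this list is compact enough to sketch. Below is a minimal pixel-space version in PyTorch, assuming a toy linear classifier and random stand-in data: the synthetic set is optimized so that a model trained on it does well on real data. All sizes and names are placeholders, and this paper's actual contribution, optimizing the latent codes of a pretrained generative model rather than raw pixels, is deliberately left out for brevity.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the dataset-distillation bi-level objective.
# Hypothetical toy setup: one synthetic example per class, linear model.
num_classes, dim = 10, 64
syn_x = torch.randn(num_classes, dim, requires_grad=True)  # learnable synthetic data
syn_y = torch.arange(num_classes)                          # fixed labels
opt = torch.optim.Adam([syn_x], lr=1e-2)

real_x = torch.randn(256, dim)                             # stand-in for real data
real_y = torch.randint(0, num_classes, (256,))

for step in range(100):
    w = torch.zeros(dim, num_classes, requires_grad=True)  # fresh inner model
    # Inner loop: a few gradient steps on the synthetic set, kept
    # differentiable w.r.t. syn_x via create_graph=True.
    for _ in range(5):
        inner_loss = F.cross_entropy(syn_x @ w, syn_y)
        (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - 0.1 * g
    # Outer loop: the model trained on synthetic data should fit real data.
    outer_loss = F.cross_entropy(real_x @ w, real_y)
    opt.zero_grad()
    outer_loss.backward()
    opt.step()
```

In the generative-prior setting, `syn_x` would instead be produced by a frozen generator from learnable latent codes, so the outer gradient flows into the latents and the synthesized images stay on the generator's manifold.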

LESS: Selecting influential data for targeted instruction tuning

M Xia, S Malladi, S Gururangan, S Arora… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning has unlocked powerful capabilities in large language models (LLMs),
effectively using combined datasets to develop general-purpose chatbots. However, real …
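
Stripped of the engineering, the selection recipe can be caricatured as: score each candidate example by how well its loss gradient aligns with the mean gradient of a small target set, then keep the top scorers. The sketch below does this naively with a toy linear model; the actual method's LoRA gradient features, Adam-aware reweighting, and random projections are all omitted, and every tensor here is synthetic stand-in data.

```python
import torch
import torch.nn.functional as F

# Caricature of influence-style selection via gradient similarity.
torch.manual_seed(0)
dim, num_classes = 32, 4
model = torch.nn.Linear(dim, num_classes)

def grad_feature(x, y):
    """Flattened loss gradient for a single example."""
    model.zero_grad()
    F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

pool_x = torch.randn(100, dim)                       # candidate training pool
pool_y = torch.randint(0, num_classes, (100,))
val_x = torch.randn(8, dim)                          # small target-task set
val_y = torch.randint(0, num_classes, (8,))

target = torch.stack([grad_feature(x, y) for x, y in zip(val_x, val_y)]).mean(0)
scores = torch.stack([F.cosine_similarity(grad_feature(x, y), target, dim=0)
                      for x, y in zip(pool_x, pool_y)])
selected = scores.topk(10).indices                   # examples to tune on
print(selected.tolist())
```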

Dataset distillation by matching training trajectories

G Cazenavette, T Wang, A Torralba… - Proceedings of the …, 2022 - openaccess.thecvf.com
Dataset distillation is the task of synthesizing a small dataset such that a model trained on
the synthetic set will match the test accuracy of the model trained on the full dataset. In this …
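
The matching loss itself is short: initialize a student at an expert checkpoint, take a few differentiable steps on the synthetic set, and penalize the distance to a later expert checkpoint, normalized by how far the expert moved. In the hedged sketch below the expert trajectory is faked with random tensors purely to keep the snippet self-contained; the model, step counts, and learning rates are placeholders.

```python
import torch
import torch.nn.functional as F

# Sketch of trajectory matching with a toy linear model.
dim, num_classes = 64, 10
syn_x = torch.randn(num_classes, dim, requires_grad=True)
syn_y = torch.arange(num_classes)
opt = torch.optim.Adam([syn_x], lr=1e-2)

# Stand-in expert checkpoints: parameters at step t and at a later step t+M.
theta_start = torch.randn(dim, num_classes)
theta_target = theta_start + 0.01 * torch.randn(dim, num_classes)

for step in range(100):
    w = theta_start.clone().requires_grad_(True)  # student starts at expert step t
    for _ in range(5):                            # N differentiable student steps
        loss = F.cross_entropy(syn_x @ w, syn_y)
        (g,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - 0.1 * g
    # Parameter-matching loss, normalized by the expert's own movement.
    match = (w - theta_target).pow(2).sum() / (theta_start - theta_target).pow(2).sum()
    opt.zero_grad()
    match.backward()
    opt.step()
```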

Scaling up dataset distillation to ImageNet-1K with constant memory

J Cui, R Wang, S Si, CJ Hsieh - International Conference on …, 2023 - proceedings.mlr.press
Dataset Distillation is a newly emerging area that aims to distill large datasets into much
smaller and highly informative synthetic ones to accelerate training and reduce storage …
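
Trajectory matching as sketched above must hold the entire unrolled student graph in memory, which is what blocks ImageNet-scale synthetic sets; this paper shows the matching gradient can instead be assembled step by step with memory independent of the trajectory length. The toy two-pass sketch below conveys the flavor only: it uses a closed-form linear-model gradient and keeps just each step's direct dependence on the synthetic data, whereas the published decomposition is exact. Every size and name is a placeholder.

```python
import torch
import torch.nn.functional as F

# Simplified constant-memory idea: roll the trajectory out once without a
# graph, then revisit one step at a time, building and freeing a single
# step's graph while accumulating its gradient contribution. For SGD,
# w_N = w_0 - lr * sum_t g_t, so the matching loss splits into per-step terms.
dim, C, lr_w, N = 64, 10, 0.1, 20
syn_x = torch.randn(C, dim, requires_grad=True)
onehot = torch.eye(C)   # labels assumed to be 0..C-1, one image per class

theta_start = torch.randn(dim, C)
theta_target = theta_start + 0.01 * torch.randn(dim, C)

def ce_grad(x, w):
    """Closed-form cross-entropy gradient for a linear classifier."""
    return x.t() @ (F.softmax(x @ w, dim=1) - onehot) / len(x)

# Pass 1: plain rollout with no graph; memory does not grow with N.
with torch.no_grad():
    traj, w = [theta_start], theta_start.clone()
    for _ in range(N):
        w = w - lr_w * ce_grad(syn_x, w)
        traj.append(w)

norm = (theta_start - theta_target).pow(2).sum()
dL_dg = -2 * lr_w * (traj[-1] - theta_target) / norm  # d(match)/d(g_t)

# Pass 2: one small graph per step, gradients accumulated into syn_x.grad.
syn_x.grad = torch.zeros_like(syn_x)
for t in range(N):
    g_t = ce_grad(syn_x, traj[t])        # graph covers this single step
    (dL_dg * g_t).sum().backward()       # graph freed right after backward
# syn_x.grad now holds the accumulated matching gradient; an optimizer
# step on syn_x would follow.
```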

Dataset distillation via factorization

S Liu, K Wang, X Yang, J Ye… - Advances in neural …, 2022 - proceedings.neurips.cc
In this paper, we study dataset distillation (DD) from a novel perspective and introduce
a dataset factorization approach, termed HaBa, which is a plug-and-play …
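
The factorization is easy to picture: instead of storing each synthetic image independently, store a few shared bases plus several small hallucinator networks, and let every (basis, hallucinator) pair generate one training image, so B bases and H hallucinators yield B×H examples from a much smaller storage budget. A hedged sketch, with shapes and hallucinator design chosen only for illustration:

```python
import torch
import torch.nn as nn

# Toy factorization: shared bases combined with small hallucinator networks.
num_bases, num_halls = 8, 4
bases = nn.Parameter(torch.randn(num_bases, 3, 32, 32))
hallucinators = nn.ModuleList(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1))
    for _ in range(num_halls)
)

def synthetic_set():
    # Every (basis, hallucinator) pair yields one synthetic image.
    return torch.cat([h(bases) for h in hallucinators])

imgs = synthetic_set()   # 8 bases x 4 hallucinators -> 32 images
print(imgs.shape)        # torch.Size([32, 3, 32, 32])
```

Both the bases and the hallucinators are trainable, which is what makes the factorization plug-and-play: `synthetic_set()` can feed any of the distillation objectives above.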

CAFE: Learning to condense dataset by aligning features

K Wang, B Zhao, X Peng, Z Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Dataset condensation aims to reduce the network training effort by condensing a
cumbersome training set into a compact synthetic one. State-of-the-art approaches largely …
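
Where gradient- and parameter-matching methods operate in weight space, feature alignment asks the synthetic batch to reproduce the layer-wise feature statistics of real batches of the same class. The sketch below matches per-layer mean features in a tiny frozen convnet; the network and the choice of statistic are stand-ins for the paper's actual alignment and discrimination losses.

```python
import torch
import torch.nn as nn

# Rough sketch of layer-wise feature alignment in a tiny frozen convnet.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
).requires_grad_(False)

def layer_means(x):
    feats = []
    for layer in net:
        x = layer(x)
        if isinstance(layer, nn.ReLU):
            feats.append(x.mean(dim=0))  # batch-averaged feature map per layer
    return feats

real = torch.randn(64, 3, 32, 32)                      # real images, one class
syn = torch.randn(4, 3, 32, 32, requires_grad=True)    # synthetic images

loss = sum((r - s).pow(2).mean()
           for r, s in zip(layer_means(real), layer_means(syn)))
loss.backward()  # gradient flows only into the synthetic batch
```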

Improved distribution matching for dataset condensation

G Zhao, G Li, Y Qin, Y Yu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Dataset Condensation aims to condense a large dataset into a smaller one while
maintaining its ability to train a well-performing model, thus reducing the storage cost and …

Dataset condensation with distribution matching

B Zhao, H Bilen - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
The computational cost of training state-of-the-art deep models in many learning problems is
rapidly increasing due to more sophisticated models and larger datasets. A recent promising …
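
The appeal of distribution matching is that it removes the inner training loop entirely: the synthetic images are optimized so that their embeddings match those of real images of the same class under randomly sampled embedding networks. A minimal per-class sketch, with a placeholder architecture and loop sizes:

```python
import torch
import torch.nn as nn

# Minimal sketch of distribution matching: no inner training loop at all.
def random_embedder():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Flatten(),
    ).requires_grad_(False)  # only the synthetic images are optimized

real = torch.randn(256, 3, 32, 32)                     # real images, one class
syn = torch.randn(10, 3, 32, 32, requires_grad=True)   # synthetic images
opt = torch.optim.SGD([syn], lr=1.0)

for step in range(100):
    f = random_embedder()                  # fresh random features each step
    target = f(real).mean(dim=0)           # mean real embedding (constant)
    loss = (f(syn).mean(dim=0) - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```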