Dataset distillation: A comprehensive review

R Yu, S Liu, X Wang - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent success of deep learning is largely attributed to the sheer amount of data used for
training deep neural networks. Despite the unprecedented success, the massive data …

A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

Scaling up dataset distillation to imagenet-1k with constant memory

J Cui, R Wang, S Si, CJ Hsieh - International Conference on …, 2023 - proceedings.mlr.press
Dataset Distillation is a newly emerging area that aims to distill large datasets into much
smaller and highly informative synthetic ones to accelerate training and reduce storage …

Dataset distillation via factorization

S Liu, K Wang, X Yang, J Ye… - Advances in Neural …, 2022 - proceedings.neurips.cc
In this paper, we study dataset distillation (DD) from a novel perspective and introduce
a dataset factorization approach, termed HaBa, which is a plug-and-play …
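
One way to read the factorization idea named here (my own interpretation of "HaBa" as hallucinator networks plus basis images, not the authors' code): instead of storing every synthetic image directly, a small set of learnable bases is combined with several tiny "hallucinator" networks, so the number of usable training images grows multiplicatively while the stored parameters stay small. A hedged sketch:

```python
# Hypothetical factorization sketch: |bases| x |hallucinators| synthetic images
# are generated from far fewer stored parameters. Sizes are illustrative only.
import torch
import torch.nn as nn

class Hallucinator(nn.Module):
    """A tiny image-to-image network that re-renders a basis image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

bases = nn.Parameter(torch.randn(5, 3, 32, 32))          # 5 learnable basis images
hallucinators = nn.ModuleList([Hallucinator() for _ in range(4)])

def expand():
    # Cartesian product of bases and hallucinators -> 20 synthetic images.
    return torch.cat([h(bases) for h in hallucinators], dim=0)
```

Both the bases and the hallucinator weights would then be optimized jointly against whatever distillation objective the surrounding method plugs into.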

Improved distribution matching for dataset condensation

G Zhao, G Li, Y Qin, Y Yu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Dataset Condensation aims to condense a large dataset into a smaller one while
maintaining its ability to train a well-performing model, thus reducing the storage cost and …

Dataset condensation with distribution matching

B Zhao, H Bilen - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
Computational cost of training state-of-the-art deep models in many learning problems is
rapidly increasing due to more sophisticated models and larger datasets. A recent promising …
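
The distribution matching idea named in this title can be sketched roughly as follows (a minimal simplification of the general technique, not the authors' code): learnable synthetic images are optimized so that their feature statistics under randomly initialized embedding networks match those of real data, which avoids the inner training loop of bi-level methods.

```python
# Hedged sketch of distribution matching for dataset condensation.
# Shapes (CIFAR-sized images, 10 synthetic images) are illustrative assumptions.
import torch
import torch.nn as nn

def random_embedder():
    # A small randomly initialized conv net used only as a feature extractor.
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

syn_images = torch.randn(10, 3, 32, 32, requires_grad=True)
opt = torch.optim.SGD([syn_images], lr=0.1)

def dm_step(real_batch):
    """One update: match mean embeddings of real and synthetic images."""
    net = random_embedder()            # re-sampled every step, never trained
    for p in net.parameters():
        p.requires_grad_(False)
    loss = ((net(real_batch).mean(0) - net(syn_images).mean(0)) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In practice the matching is typically done per class and averaged over many random embedders; the sketch collapses that to a single batch for brevity.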

Dataset distillation using neural feature regression

Y Zhou, E Nezhadarya, J Ba - Advances in Neural …, 2022 - proceedings.neurips.cc
Dataset distillation aims to learn a small synthetic dataset that preserves most of the
information from the original dataset. Dataset distillation can be formulated as a bi-level …
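
The bi-level view mentioned in this snippet is the standard formulation across the dataset distillation literature; a sketch in my own notation (not taken from the paper): the synthetic set is chosen so that a model trained to convergence on it performs well on the real data.

```latex
% Standard bi-level dataset distillation objective (notation assumed, not the authors'):
% \mathcal{T} is the real dataset, \mathcal{S} the small synthetic set, \ell a training loss.
\begin{aligned}
  \mathcal{S}^{*} &= \operatorname*{arg\,min}_{\mathcal{S}}\;
      \mathbb{E}_{(x,y)\sim\mathcal{T}}\!\left[\ell\!\left(f_{\theta^{*}(\mathcal{S})}(x),\,y\right)\right] \\
  \text{s.t.}\quad \theta^{*}(\mathcal{S}) &= \operatorname*{arg\,min}_{\theta}\;
      \mathbb{E}_{(\tilde{x},\tilde{y})\sim\mathcal{S}}\!\left[\ell\!\left(f_{\theta}(\tilde{x}),\,\tilde{y}\right)\right]
\end{aligned}
```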

A comprehensive survey of dataset distillation

S Lei, D Tao - IEEE Transactions on Pattern Analysis and …, 2023 - ieeexplore.ieee.org
Deep learning technology has developed unprecedentedly in the last decade and has
become the primary choice in many application domains. This progress is mainly attributed …

Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective

Z Yin, E Xing, Z Shen - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We present a new dataset condensation framework termed Squeeze, Recover and Relabel
(SRe²L) that decouples the bilevel optimization of model and synthetic data during …
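
One way to picture the decoupling described here (my reconstruction from the abstract, not the authors' code): a model is first trained on the full dataset ("squeeze"), then frozen while synthetic images are recovered by model inversion against it, and finally the same frozen model provides soft labels ("relabel"). A hedged sketch of the recover step, assuming a classification loss toward a target class plus a penalty keeping batch feature statistics close to the stored BatchNorm running statistics:

```python
# Rough sketch of a recover-style model-inversion step. The resnet18 teacher,
# image sizes, loss weight, and step count are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

teacher = resnet18(num_classes=1000).eval()   # assume weights from the squeeze stage
for p in teacher.parameters():
    p.requires_grad_(False)

bn_losses = []
def bn_hook(module, inputs, _output):
    # Compare per-batch feature statistics with the BN running statistics.
    x = inputs[0]
    mean = x.mean(dim=(0, 2, 3))
    var = x.var(dim=(0, 2, 3), unbiased=False)
    bn_losses.append(((mean - module.running_mean) ** 2).sum()
                     + ((var - module.running_var) ** 2).sum())

for m in teacher.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.register_forward_hook(bn_hook)

syn = torch.randn(4, 3, 224, 224, requires_grad=True)   # hypothetical batch
targets = torch.tensor([0, 1, 2, 3])
opt = torch.optim.Adam([syn], lr=0.1)

for _ in range(100):
    bn_losses.clear()
    logits = teacher(syn)
    loss = F.cross_entropy(logits, targets) + 0.01 * sum(bn_losses)
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Relabel": store the frozen teacher's soft predictions as training targets.
soft_labels = F.softmax(teacher(syn.detach()), dim=1)
```

Because only the synthetic images (not the model) are optimized in this stage, memory stays roughly constant in the number of classes, which is what makes the recipe attractive at ImageNet scale.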

Datadam: Efficient dataset distillation with attention matching

A Sajedi, S Khaki, E Amjadian, LZ Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Researchers have long tried to minimize training costs in deep learning while maintaining
strong generalization across diverse datasets. Emerging research on dataset distillation …
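
The attention matching named in this title can be sketched, very roughly, as follows (my own simplification, not the DataDAM code): spatial attention maps, obtained by aggregating feature activations across channels, are computed for real and synthetic batches at several layers of a (here randomly initialized) network, and their difference is minimized with respect to the synthetic images.

```python
# Hedged sketch of layer-wise attention matching. Architecture, power, and
# normalization choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

layers = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU()),
])
for p in layers.parameters():
    p.requires_grad_(False)

def attention_map(feat, power=2):
    # Aggregate channels into a normalized spatial attention map.
    a = feat.abs().pow(power).mean(dim=1)            # (N, H, W)
    return F.normalize(a.flatten(1), dim=1)          # (N, H*W)

def attention_matching_loss(real, syn):
    """Sum of squared differences between mean attention maps, layer by layer."""
    loss = 0.0
    x_r, x_s = real, syn
    for layer in layers:
        x_r, x_s = layer(x_r), layer(x_s)
        loss = loss + ((attention_map(x_r).mean(0)
                        - attention_map(x_s).mean(0)) ** 2).sum()
    return loss
```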