InfoBatch: Lossless training speed up by unbiased dynamic data pruning

Z Qin, K Wang, Z Zheng, J Gu, X Peng, Z Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Data pruning aims to obtain lossless performance with less overall cost. A common
approach is to filter out samples that contribute less to training. This could lead to …

The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline

H Wang, Q Shen, Y Tong, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The commercialization of text-to-image diffusion models (DMs) brings forth potential
copyright concerns. Despite numerous attempts to protect DMs from copyright issues, the …

Self-supervised dataset distillation: A good compression is all you need

M Zhou, Z Yin, S Shao, Z Shen - arXiv preprint arXiv:2404.07976, 2024 - arxiv.org
Dataset distillation aims to compress information from a large-scale original dataset to a new
compact dataset while striving to preserve the utmost degree of the original data …

DD-RobustBench: An adversarial robustness benchmark for dataset distillation

Y Wu, J Du, P Liu, Y Lin, W Xu, W Cheng - arXiv preprint arXiv:2403.13322, 2024 - arxiv.org
Dataset distillation is an advanced technique aimed at compressing datasets into
significantly smaller counterparts, while preserving formidable training performance …

Unlocking the potential of federated learning: The symphony of dataset distillation via deep generative latents

Y Jia, S Vahidian, J Sun, J Zhang, V Kungurtsev… - … on Computer Vision, 2024 - Springer
Data heterogeneity presents significant challenges for federated learning (FL). Recently,
dataset distillation techniques, performed at the client level, have been introduced to …

Generative dataset distillation based on diffusion model

D Su, J Hou, G Li, R Togo, R Song, T Ogawa… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents our method for the generative track of The First Dataset Distillation
Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative …

Group distributionally robust dataset distillation with risk minimization

S Vahidian, M Wang, J Gu, V Kungurtsev… - arXiv preprint arXiv …, 2024 - arxiv.org
Dataset distillation (DD) has emerged as a widely adopted technique for crafting a synthetic
dataset that captures the essential information of a training dataset, facilitating the training of …

Prioritize Alignment in Dataset Distillation

Z Li, Z Guo, W Zhao, T Zhang, ZQ Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Dataset Distillation aims to compress a large dataset into a significantly more compact,
synthetic one without compromising the performance of the trained models. To achieve this …

The Evolution of Dataset Distillation: Toward Scalable and Generalizable Solutions

P Liu, J Du - arXiv preprint arXiv:2502.05673, 2025 - arxiv.org
Dataset distillation, which condenses large-scale datasets into compact synthetic
representations, has emerged as a critical solution for training modern deep learning …

Emphasizing discriminative features for dataset distillation in complex scenarios

K Wang, Z Li, ZQ Cheng, S Khaki, A Sajedi… - arXiv preprint arXiv …, 2024 - arxiv.org
Dataset distillation has demonstrated strong performance on simple datasets like CIFAR,
MNIST, and TinyImageNet but struggles to achieve similar results in more complex …