Towards lossless dataset distillation via difficulty-aligned trajectory matching

Z Guo, K Wang, G Cazenavette, H Li, K Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
The ultimate goal of Dataset Distillation is to synthesize a small synthetic dataset such that a
model trained on this synthetic set will perform equally well as a model trained on the full …

The Evolution of Dataset Distillation: Toward Scalable and Generalizable Solutions

P Liu, J Du - arXiv preprint arXiv:2502.05673, 2025 - arxiv.org
Dataset distillation, which condenses large-scale datasets into compact synthetic
representations, has emerged as a critical solution for training modern deep learning …

ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation

M Yanuka, M Alper, H Averbuch-Elor… - arXiv preprint arXiv …, 2024 - arxiv.org
Web-scale training on paired text-image data is becoming increasingly central to multimodal
learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data …

Distill gold from massive ores: Bi-level data pruning towards efficient dataset distillation

Y Xu, YL Li, K Cui, Z Wang, C Lu, YW Tai… - European Conference on …, 2024 - Springer
Data-efficient learning has garnered significant attention, especially given the current trend
of large multi-modal models. Recently, dataset distillation has become an effective approach …

From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search

J Sun, H Fei, Z Zheng, G Ding - arXiv preprint arXiv:2404.10292, 2024 - arxiv.org
In text-based person search endeavors, data generation has emerged as a prevailing
practice, addressing concerns over privacy preservation and the arduous task of manual …

Audio-Visual Dataset Distillation

SS Kushwaha, SSN Vasireddy, K Wang… - … on Machine Learning …, 2024 - openreview.net
In this article, we introduce audio-visual dataset distillation, a task to construct a
smaller yet representative synthetic audio-visual dataset that maintains the cross-modal …

A Large-Scale Study on Video Action Dataset Condensation

Y Chen, S Guo, L Wang - arXiv preprint arXiv:2412.21197, 2024 - arxiv.org
Dataset condensation has made significant progress in the image domain. Unlike images,
videos possess an additional temporal dimension, which harbors considerable redundant …

Breaking Boundaries Between Linguistics and Artificial Intelligence: Innovation in Vision-Language Matching for Multi-Modal Robots

J Wang, Y Tie, X Jiang, Y Xu - Journal of Organizational and End …, 2023 - igi-global.com
There is a wide connection between linguistics and artificial intelligence (AI), including the
multimodal language matching. Multi-modal robots possess the capability to process various …

Vision language distillation by clustering bitrajectory matching

J Zhou, S Hao, Q Zhang - Fourth International Conference on …, 2024 - spiedigitallibrary.org
Dataset distillation is often used to create compact datasets that can be used to achieve
similar training performance, making it a good choice for addressing the challenges of data …