Spanning training progress: Temporal dual-depth scoring (TDDS) for enhanced dataset pruning

X Zhang, J Du, Y Li, W Xie… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Dataset pruning aims to construct a coreset capable of achieving performance comparable
to the original full dataset. Most existing dataset pruning methods rely on snapshot-based …
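
As an illustrative aside (not taken from the cited paper), the snippet's core idea of score-based dataset pruning can be sketched in a few lines of Python. The per-sample `scores` array and the `keep_fraction` parameter are assumptions for illustration; the scoring rule itself is exactly where methods such as TDDS differ from snapshot-based approaches.

    import numpy as np

    def prune_dataset(scores: np.ndarray, keep_fraction: float = 0.5) -> np.ndarray:
        """Return the indices of a coreset: the highest-scoring samples.

        `scores` stands in for any per-sample importance metric (e.g. an
        error- or gradient-based score accumulated over training).
        """
        n_keep = int(len(scores) * keep_fraction)
        # Sort descending by importance and keep the top fraction.
        return np.argsort(scores)[::-1][:n_keep]

    # Toy usage: ten samples with random importance scores.
    rng = np.random.default_rng(0)
    print(prune_dataset(rng.random(10), keep_fraction=0.5))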

Selectivity drives productivity: efficient dataset pruning for enhanced transfer learning

Y Zhang, Y Zhang, A Chen, J Liu… - Advances in …, 2024 - proceedings.neurips.cc
Massive data is often considered essential for deep learning applications, but it also incurs
significant computational and infrastructural costs. Therefore, dataset pruning (DP) has …

Data distillation can be like vodka: Distilling more times for better quality

X Chen, Y Yang, Z Wang, B Mirzasoleiman - arXiv preprint arXiv …, 2023 - arxiv.org
Dataset distillation aims to minimize the time and memory needed for training deep networks
on large datasets, by creating a small set of synthetic images that has a similar …
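
For intuition only (my toy illustration, not the paper's method), the synthetic-set idea can be reduced to learning a handful of points whose statistics match the real data. Real distillation objectives match features, gradients, or training trajectories; the raw mean used here, and all names below, are simplifying assumptions.

    import numpy as np

    def distill_by_mean_matching(real: np.ndarray, n_synthetic: int,
                                 steps: int = 200, lr: float = 0.5) -> np.ndarray:
        """Learn a small synthetic set whose mean matches the real data mean."""
        rng = np.random.default_rng(0)
        synth = rng.normal(size=(n_synthetic, real.shape[1]))
        target = real.mean(axis=0)
        for _ in range(steps):
            # Gradient of ||mean(synth) - target||^2 w.r.t. each synthetic point.
            grad = 2.0 * (synth.mean(axis=0) - target) / n_synthetic
            synth -= lr * grad
        return synth

    real = np.random.default_rng(1).normal(loc=3.0, size=(1000, 4))
    synth = distill_by_mean_matching(real, n_synthetic=10)
    print(real.mean(axis=0), synth.mean(axis=0))  # the two means nearly agree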

A framework for measuring the training efficiency of a neural architecture

E Cueto-Mendoza, J Kelleher - Artificial Intelligence Review, 2024 - Springer
Measuring efficiency in neural network system development is an open research problem.
This paper presents an experimental framework to measure the training efficiency of a …

Active Data Collection and Management for Real-World Continual Learning via Pretrained Oracle

V Chavan, P Koch, M Schlüter… - Proceedings of the …, 2024 - openaccess.thecvf.com
Incremental Learning (IL) deals with learning from continuous streams of data while
minimising catastrophic forgetting. This field of Machine Learning (ML) research has …

KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training

TT Nguyen, B Gerofi… - Advances in …, 2024 - proceedings.neurips.cc
This paper proposes a method for hiding the least-important samples during the training of
deep neural networks to increase efficiency, i.e., to reduce the cost of training. Using …
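
A minimal sketch of the general idea (not the paper's exact algorithm), under the assumption that "least important" is proxied by low loss; KAKURENBO is adaptive, whereas this fixed-fraction version with the hypothetical names `losses` and `hide_fraction` only illustrates the skipping step.

    import numpy as np

    def select_visible_samples(losses: np.ndarray, hide_fraction: float) -> np.ndarray:
        """Indices of samples kept for the next epoch; the rest are hidden.

        The `hide_fraction` lowest-loss (easiest) samples are skipped,
        saving their forward/backward passes.
        """
        n_hide = int(len(losses) * hide_fraction)
        # Ascending sort: the first n_hide entries are the easiest samples.
        order = np.argsort(losses)
        return order[n_hide:]

    # Toy usage: hide the 30% lowest-loss samples out of ten.
    losses = np.array([0.9, 0.1, 0.4, 0.05, 0.7, 0.3, 0.02, 0.6, 0.2, 0.8])
    print(select_visible_samples(losses, hide_fraction=0.3))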

Optimizing Data Acquisition to Enhance Machine Learning Performance

T Wang, S Huang, Z Bao, JS Culpepper… - Proceedings of the …, 2024 - dl.acm.org
In this paper, we study how to acquire labeled data points from a large data pool to enrich a
training set for enhancing supervised machine learning (ML) performance. The state-of-the …
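
For illustration only (not the acquisition strategy proposed in the paper), a common baseline for picking pool points to label is uncertainty sampling; `pool_probs` and `budget` are hypothetical names, and entropy is just one possible uncertainty measure.

    import numpy as np

    def acquire_by_uncertainty(pool_probs: np.ndarray, budget: int) -> np.ndarray:
        """Pick the `budget` pool points the current model is least sure about.

        `pool_probs` is an (n_pool, n_classes) array of predicted class
        probabilities from a model trained on the current labeled set.
        """
        eps = 1e-12
        entropy = -(pool_probs * np.log(pool_probs + eps)).sum(axis=1)
        # Highest-entropy points are the most informative to label next.
        return np.argsort(entropy)[::-1][:budget]

    probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]])
    print(acquire_by_uncertainty(probs, budget=2))  # -> [0 2]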

Data optimization in deep learning: A survey

O Wu, R Yao - IEEE Transactions on Knowledge and Data …, 2025 - ieeexplore.ieee.org
Large-scale, high-quality data are considered an essential factor for the successful
application of many deep learning techniques. Meanwhile, numerous real-world deep …

CADC: Encoding user-item interactions for compressing recommendation model training data

HE Zarch, A Alshabanah, C Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep learning recommendation models (DLRMs) are at the heart of the current e-commerce
industry. However, the amount of training data used to train these large models is growing …

Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity

S Joshi, A Jain, A Payani… - International …, 2024 - proceedings.mlr.press
Contrastive Language-Image Pre-training (CLIP) on large-scale image-caption
datasets learns representations that can achieve remarkable zero-shot generalization …