DsDm: Model-aware dataset selection with datamodels

L Engstrom, A Feldmann, A Madry - arXiv preprint arXiv:2401.12926, 2024 - arxiv.org
When selecting data for training large-scale models, standard practice is to filter for
examples that match human notions of data quality. Such filtering yields qualitatively clean …

Spanning training progress: Temporal dual-depth scoring (TDDS) for enhanced dataset pruning

X Zhang, J Du, Y Li, W Xie… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Dataset pruning aims to construct a coreset capable of achieving performance comparable
to the original full dataset. Most existing dataset pruning methods rely on snapshot-based …

A survey on batch training in genetic programming

L Rosenfeld, L Vanneschi - Genetic Programming and Evolvable …, 2025 - Springer
In Machine Learning (ML), the use of subsets of training data, referred to as
batches, rather than the entire dataset, has been extensively researched to reduce …

Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques

F Folino, G Folino, FS Pisani, L Pontieri, P Sabatino - Journal of Big Data, 2024 - Springer
In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is
proposed for the federated learning and application of a distributed classification model in …

Neural network empowered liquidity pricing in a two-price economy under conic finance settings

M Michielon, D Franquinho, A Gentile… - Quantitative …, 2024 - Taylor & Francis
In the article at hand neural networks are used to model liquidity in financial markets, under
conic finance settings, in two different contexts. That is, on the one hand this paper illustrates …

On-device deep learning: survey on techniques improving energy efficiency of DNNs

A Boumendil, W Bechkit… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Providing high-quality predictions is no longer the sole goal for neural networks. As we live
in an increasingly interconnected world, these models need to match the constraints of …

General Information Metrics for Improving AI Model Training Efficiency

J Xu, C Liu, X Tan, X Zhu, A Wu, H Wan… - arXiv preprint arXiv …, 2025 - arxiv.org
To address the growing size of AI model training data and the lack of a universal data
selection methodology, factors that significantly drive up training costs, this paper presents …

Play it Straight: An Intelligent Data Pruning Technique for Green-AI

F Scala, S Flesca, L Pontieri - International Conference on Discovery …, 2024 - Springer
The escalating climate crisis demands urgent action to mitigate the environmental impact of
energy-intensive technologies, including Artificial Intelligence (AI). Lowering AI's …

Training-Free Dataset Pruning for Instance Segmentation

Y Dai, L Xiao, I Tsang, Y He - The Thirteenth International Conference on … - openreview.net
Existing dataset pruning techniques primarily focus on classification tasks, limiting their
applicability to more complex and practical tasks like instance segmentation. Instance …

Modyn: Data-Centric Machine Learning Pipeline Orchestration

M Böther, T Robroek, V Gsteiger, R Holzinger… - Proceedings of the …, 2025 - dl.acm.org
In real-world machine learning (ML) pipelines, datasets are continuously growing. Models
must incorporate this new training data to improve generalization and adapt to potential …