Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement

D Graur, O Mraz, M Li, S Pourghannad… - 2024 USENIX Annual …, 2024 - usenix.org
Input data preprocessing is a common bottleneck in machine learning (ML) jobs that can
significantly increase training time and cost as expensive GPUs or TPUs idle waiting for …

A Selective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training

M Wang, G Waldspurger, S Sundararaman - Proceedings of the 16th …, 2024 - dl.acm.org
Deep learning (DL) training is data-intensive and often bottlenecked by fetching data from
remote storage. Recognizing that many samples' sizes diminish during data preprocessing …

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

R Bachkaniwala, H Lanka, K Rong… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Preprocessing input data is a crucial step in machine learning pipelines, involving tasks
such as loading, decoding, and applying transformations. Prior works have identified …

Multi-Level Erasure Coded Storage Design and Its Relationship to Deep Learning Workloads

M Wang - 2024 - knowledge.uchicago.edu
Large-scale data centers store vast amounts of user data across numerous disks,
necessitating redundancy mechanisms like erasure coding (EC) to protect against disk …

[CITATION] Analysis of Deep Learning Preprocessing Stage and Selective Offloading for Reducing Training Data Traffic (Working Draft, Private View Only)

M Wang, G Waldspurger, S Sundararaman, H Gunawi