MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …

Fundamentals, algorithms, and technologies of occupancy detection for smart buildings using IoT sensors

P Chaudhari, Y Xiao, MMC Cheng, T Li - Sensors, 2024 - mdpi.com
Smart buildings use advanced technologies to automate building functions. One important
function is occupancy detection using Internet of Things (IoT) sensors for smart buildings …

Characterization of large language model development in the datacenter

Q Hu, Z Ye, Z Wang, G Wang, M Zhang… - … USENIX Symposium on …, 2024 - usenix.org
Large Language Models (LLMs) have presented impressive performance across several
transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster …

FFCV: Accelerating training by removing data bottlenecks

G Leclerc, A Ilyas, L Engstrom… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present FFCV, a library for easy, fast, resource-efficient training of machine learning
models. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from …

Looking beyond GPUs for DNN scheduling on multi-tenant clusters

J Mohan, A Phanishayee, J Kulkarni… - … USENIX Symposium on …, 2022 - usenix.org
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …

AI-coupled HPC workflow applications, middleware and performance

W Brewer, A Gainaru, F Suter, F Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
AI integration is revolutionizing the landscape of HPC simulations, enhancing the
importance, use, and performance of AI-driven HPC workflows. This paper surveys the …

Orion: Interference-aware, fine-grained GPU sharing for ML applications

F Strati, X Ma, A Klimovic - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …

Understanding data storage and ingestion for large-scale deep recommendation model training: Industrial product

M Zhao, N Agarwal, A Basant, B Gedik, S Pan… - Proceedings of the 49th …, 2022 - dl.acm.org
Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators
(DSA) are used to train increasingly-complex deep learning models. These clusters rely on a …

Multi-resource interleaving for deep learning training

Y Zhao, Y Liu, Y Peng, Y Zhu, X Liu, X Jin - Proceedings of the ACM …, 2022 - dl.acm.org
Training a Deep Learning (DL) model requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …

FastFlow: Accelerating deep learning model training with smart offloading of input data pipeline

T Um, B Oh, B Seo, M Kweun, G Kim… - Proceedings of the VLDB …, 2023 - dl.acm.org
When training a deep learning (DL) model, input data are pre-processed on CPUs and
transformed into tensors, which are then fed into GPUs for gradient computations of model …