Orion: Interference-aware, fine-grained GPU sharing for ML applications

F Strati, X Ma, A Klimovic - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …

Training and serving system of foundation models: A comprehensive survey

J Zhou, Y Chen, Z Hong, W Chen, Y Yu… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-Σ) have demonstrated
extraordinary performance in key technological areas, such as natural language processing …

Decentralized bilevel optimization

X Chen, M Huang, S Ma - Optimization Letters, 2024 - Springer
Bilevel optimization has been successfully applied to many important machine learning
problems. Algorithms for solving bilevel optimization have been studied under various …
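
For context, the standard bilevel problem (the generic formulation, not anything specific to this paper's decentralized setting) nests an inner minimization inside an outer one:

    \[
    \min_{x \in \mathbb{R}^p} \; f\bigl(x, y^*(x)\bigr)
    \quad \text{s.t.} \quad
    y^*(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^q} g(x, y),
    \]

where f is the upper-level (outer) objective and g the lower-level (inner) objective; hyperparameter optimization and meta-learning are familiar machine learning instances of this structure.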

Merak: An efficient distributed DNN training framework with automated 3D parallelism for giant foundation models

Z Lai, S Li, X Tang, K Ge, W Liu, Y Duan… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Foundation models are in the process of becoming the dominant deep learning technology.
Pretraining a foundation model is always time-consuming due to the large scale of both the …
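
As background on the title's term, "3D parallelism" conventionally combines data, tensor (intra-layer), and pipeline (inter-layer) parallelism. The sketch below only illustrates the assumed sizing convention that the three degrees multiply to the total device count; it is not Merak's actual configuration API.

    # Generic 3D-parallelism sizing sketch (assumed convention, not Merak's API):
    # total GPUs = data-parallel degree x tensor-parallel degree x pipeline-parallel degree.
    data_parallel = 4
    tensor_parallel = 2
    pipeline_parallel = 4
    total_gpus = data_parallel * tensor_parallel * pipeline_parallel
    print(total_gpus)  # 32 GPUs needed for this configuration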

Daphne: An open and extensible system infrastructure for integrated data analysis pipelines

P Damme, M Birkenbach, C Bitsakos… - … on Innovative Data …, 2022 - pure.itu.dk
Integrated data analysis (IDA) pipelines---that combine data management (DM) and query
processing, high-performance computing (HPC), and machine learning (ML) training and …

Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters

X Lian, B Yuan, X Zhu, Y Wang, Y He, H Wu… - Proceedings of the 28th …, 2022 - dl.acm.org
Recent years have witnessed an exponential growth of model scale in deep learning-based
recommender systems---from Google's 2016 model with 1 billion parameters to the latest …

Bluefog: Make decentralized algorithms practical for optimization and deep learning

B Ying, K Yuan, H Hu, Y Chen, W Yin - arXiv preprint arXiv:2111.04287, 2021 - arxiv.org
A decentralized algorithm is a form of computation that achieves a global goal through local
dynamics that rely on low-cost communication between directly-connected agents. On …
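
To illustrate the "local dynamics plus communication with directly-connected agents" pattern, here is a minimal decentralized-SGD step for a single agent in plain NumPy; it is a generic sketch under assumed mixing weights, not Bluefog's actual API.

    import numpy as np

    def decentralized_sgd_step(own_params, own_grad, neighbor_params, mix_weights, lr=0.1):
        # Generic decentralized SGD for one agent (illustrative, not Bluefog's API):
        # 1) average parameters with directly-connected neighbors using mixing weights
        #    (first weight is the agent's own, the rest match neighbor_params, summing to 1);
        # 2) take a local gradient step.
        mixed = mix_weights[0] * own_params
        mixed += sum(w * p for w, p in zip(mix_weights[1:], neighbor_params))
        return mixed - lr * own_grad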

Fine-tuning language models over slow networks using activation quantization with guarantees

J Wang, B Yuan, L Rimanic, Y He… - Advances in …, 2022 - proceedings.neurips.cc
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
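
As a concrete, generic example of communication compression, the snippet below uniformly quantizes a float32 activation tensor to int8 with a per-tensor scale before it is sent, cutting wire traffic roughly 4x; this is an assumed illustration, not the specific guaranteed scheme the paper proposes.

    import numpy as np

    def quantize_int8(x):
        # Per-tensor symmetric int8 quantization (illustrative, not the paper's scheme).
        max_abs = float(np.max(np.abs(x)))
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale  # transmit q (1 byte/element) plus one float scale

    def dequantize_int8(q, scale):
        # Receiver reconstructs an approximation of the original activations.
        return q.astype(np.float32) * scale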

Prophet: Fine-grained load balancing for parallel training of large-scale MoE models

W Wang, Z Lai, S Li, W Liu, K Ge, Y Liu… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Mixture of Experts (MoE) has received increasing attention for scaling DNN models to
extra-large size with negligible increases in computation. The MoE model has achieved the …
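
For context on why MoE adds capacity "with negligible increases in computation": each token is routed to only a few experts, so parameters scale with the number of experts while per-token compute stays roughly fixed. The following top-k routing sketch assumes a single token vector and a list of expert callables; it is generic, not Prophet's load-balancing method.

    import numpy as np

    def moe_forward(x, experts, gate_w, k=2):
        # Generic top-k MoE routing for one token (illustrative, not Prophet's method).
        # x: (d,) token vector; experts: list of callables; gate_w: (d, num_experts).
        logits = x @ gate_w                                   # gating score per expert
        topk = np.argsort(logits)[-k:]                        # k highest-scoring experts
        probs = np.exp(logits[topk]) / np.sum(np.exp(logits[topk]))  # renormalize over top-k
        return sum(p * experts[i](x) for p, i in zip(probs, topk))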

A multidimensional communication scheduling method for hybrid parallel DNN training

S Li, K Lu, Z Lai, W Liu, K Ge… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformer-based deep neural network (DNN) models have shown considerable
success across diverse tasks, prompting widespread adoption of distributed training …