Towards demystifying serverless machine learning training

J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered a growing interest in how to use it in data-
intensive applications such as ETL, query processing, or machine learning (ML). Several …

Distributed deep learning on data systems: a comparative analysis of approaches

Y Zhang, F Mcquillan, N Jayaram, N Kak… - Proceedings of the …, 2021 - par.nsf.gov
Deep learning (DL) is growing in popularity for many data analytics applications, including
in enterprise settings. Large business-critical datasets in such settings typically reside in …

Parallel training of knowledge graph embedding models: a comparison of techniques

A Kochsiek, R Gemulla - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
Knowledge graph embedding (KGE) models represent the entities and relations of a
knowledge graph (KG) using dense continuous representations called embeddings. KGE …
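
For readers unfamiliar with the embedding idea this snippet refers to, the following minimal sketch illustrates how a translation-style KGE model (TransE, used here purely as an assumed example; it is not necessarily one of the techniques compared in the paper) scores a (head, relation, tail) triple with dense vectors:

import numpy as np

# Hypothetical TransE-style scoring sketch: each entity and relation is a
# dense vector; a triple (head, relation, tail) is considered plausible
# when head + relation lies close to tail in embedding space.
rng = np.random.default_rng(0)
dim = 50
entity_emb = rng.normal(size=(1000, dim))   # 1000 entities (illustrative sizes)
relation_emb = rng.normal(size=(20, dim))   # 20 relations

def transe_score(head, rel, tail):
    # Negative L2 distance: higher score means a more plausible triple.
    return -np.linalg.norm(entity_emb[head] + relation_emb[rel] - entity_emb[tail])

print(transe_score(3, 5, 7))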

HET-GMP: A graph-based system approach to scaling large embedding model training

X Miao, Y Shi, H Zhang, X Zhang, X Nie… - Proceedings of the …, 2022 - dl.acm.org
Embedding models have been recognized as an effective learning paradigm for high-
dimensional data. However, a major obstacle in training embedding models is that updating …

The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format

U Sirin, S Idreos - Proceedings of the ACM on Management of Data, 2024 - dl.acm.org
Numerous applications today rely on artificial intelligence over images. Image AI is,
however, extremely expensive. In particular, the inference cost of image AI dominates the …

NuPS: A parameter server for machine learning with non-uniform parameter access

A Renz-Wieland, R Gemulla, Z Kaoudi… - Proceedings of the 2022 …, 2022 - dl.acm.org
Parameter servers (PSs) facilitate the implementation of distributed training for large
machine learning tasks. In this paper, we argue that existing PSs are inefficient for tasks that …
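
As background for the parameter-server abstraction these systems extend, the toy single-process sketch below shows the generic pull/push interface; it is an assumed illustration, not NuPS's actual API, and a real PS shards keys across servers and deals with the non-uniform access patterns the paper targets:

from collections import defaultdict
import numpy as np

# Toy parameter server: workers pull current values by key and push
# gradient updates; a real PS distributes the key space across machines.
class ToyParameterServer:
    def __init__(self, dim, lr=0.1):
        self.params = defaultdict(lambda: np.zeros(dim))
        self.lr = lr

    def pull(self, keys):
        return {k: self.params[k].copy() for k in keys}

    def push(self, grads):
        for k, g in grads.items():
            self.params[k] -= self.lr * g

ps = ToyParameterServer(dim=4)
values = ps.pull(["w:0", "w:1"])   # fetch parameters before a step
ps.push({"w:0": np.ones(4)})       # apply a gradient update to one key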

DRPS: efficient disk-resident parameter servers for distributed machine learning

Z Song, Y Gu, Z Wang, G Yu - Frontiers of Computer Science, 2022 - Springer
The parameter server (PS), as the state-of-the-art distributed framework for large-scale iterative
machine learning tasks, has been extensively studied. However, existing PS-based systems …

SparDL: Distributed deep learning training with efficient sparse communication

M Zhao, Y Yin, Y Mao, Q Liu, L Chen… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Top-k sparsification has recently been widely used to reduce the communication volume in
distributed deep learning. However, due to the Sparse Gradient Accumulation (SGA) …
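
To make the Top-k sparsification idea concrete, the generic sketch below (an assumed illustration, not SparDL's actual communication scheme) sends only the k largest-magnitude gradient entries per round and carries the remainder forward as a local residual, which is the accumulation behavior the SGA discussion refers to:

import numpy as np

# Generic top-k gradient sparsification with local error accumulation:
# only k entries are communicated; the rest is kept as residual error
# and folded back into the next round's gradient.
def topk_sparsify(grad, k, residual):
    g = grad + residual                         # fold in carried-over error
    idx = np.argpartition(np.abs(g), -k)[-k:]   # indices of the k largest |g|
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]
    return idx, g[idx], g - sparse              # indices, values, new residual

grad = np.random.randn(1_000_000)
residual = np.zeros_like(grad)
idx, vals, residual = topk_sparsify(grad, k=1000, residual=residual)
print(idx.shape, vals.shape)                    # only 0.1% of entries are sent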

Optimizing tensor computations: From applications to compilation and runtime techniques

M Boehm, M Interlandi, C Jermaine - Companion of the 2023 …, 2023 - dl.acm.org
Machine learning (ML) training and scoring fundamentally rely on linear algebra programs
and more general tensor computations. Most ML systems utilize distributed parameter …

Just move it! Dynamic parameter allocation in action

A Renz-Wieland, T Drobisch, Z Kaoudi… - Proceedings of the …, 2021 - dl.acm.org
Parameter servers (PSs) ease the implementation of distributed machine learning systems,
but their performance can fall behind that of single-machine baselines due to communication …