SANCUS: staleness-aware communication-avoiding full-graph decentralized training in large-scale graph neural networks
Graph neural networks (GNNs) have emerged due to their success at modeling graph data.
Yet, it is challenging for GNNs to efficiently scale to large graphs. Thus, distributed GNNs …
SpotServe: Serving generative large language models on preemptible instances
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …
BlindFL: Vertical federated machine learning without peeking into your data
Due to rising concerns over privacy protection, how to build machine learning (ML) models
over different data sources with security guarantees is attracting growing attention. Vertical …
Galvatron: Efficient transformer training over multiple GPUs using automatic parallelism
Transformer models have achieved state-of-the-art performance across various application
domains and have gradually become the foundation of advanced large deep learning …
HET: scaling out huge embedding model training via cache-enabled distributed framework
Embedding models have been an effective learning paradigm for high-dimensional data.
However, one open issue of embedding models is that their representations (latent factors) …
Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity
With the rapid growth of parameter sizes, it becomes increasingly challenging to deploy large
generative models, as they typically incur large GPU memory consumption and massive …
Distributed Machine Learning in Edge Computing: Challenges, Solutions and Future Directions
J Tu, L Yang, J Cao - ACM Computing Surveys, 2024 - dl.acm.org
Distributed machine learning at the edge is widely used in intelligent transportation, smart
home, industrial manufacturing, and underground pipe network monitoring to achieve low …
DeAR: Accelerating distributed deep learning with fine-grained all-reduce pipelining
Communication scheduling has been shown to be effective in accelerating distributed
training, as it enables all-reduce communication to be overlapped with backpropagation …
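A minimal sketch of the underlying idea, overlapping gradient all-reduce with the backward pass, assuming PyTorch >= 2.1 with torch.distributed already initialized; it illustrates generic gradient/communication overlap rather than DeAR's fine-grained pipelining, and the helper names are hypothetical:

```python
import torch
import torch.distributed as dist


def attach_overlap_hooks(model, pending):
    """Launch an asynchronous all-reduce for each parameter's gradient as soon
    as it is accumulated, so communication overlaps with the remaining backward
    computation (hypothetical helper, illustrative only)."""
    def hook(param):
        work = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
        pending.append((param, work))

    for p in model.parameters():
        if p.requires_grad:
            # Fires right after the gradient has been written into p.grad (PyTorch >= 2.1).
            p.register_post_accumulate_grad_hook(hook)


def finish_overlap(pending):
    """Wait for the outstanding all-reduces and average the gradients."""
    world_size = dist.get_world_size()
    for param, work in pending:
        work.wait()
        param.grad.div_(world_size)
    pending.clear()
```

In a training step, one would call loss.backward() and then finish_overlap(pending) before optimizer.step(), so that most all-reduces complete while later layers are still being differentiated.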
SDPipe: A semi-decentralized framework for heterogeneity-aware pipeline-parallel training
The increasing size of both deep learning models and training data necessitates the ability
to scale out model training through pipeline-parallel training, which combines pipelined …
HET-GMP: A graph-based system approach to scaling large embedding model training
Embedding models have been recognized as an effective learning paradigm for
high-dimensional data. However, a major obstacle in embedding model training is that updating …