AlpaServe: Statistical multiplexing with model parallelism for deep learning serving
Model parallelism is conventionally viewed as a method to scale a single large deep
learning model beyond the memory limits of a single device. In this paper, we demonstrate …
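The snippet is cut off, but the title pairs model parallelism with statistical multiplexing for serving. A minimal sketch of the general idea, with hypothetical names and data rather than AlpaServe's actual API, partitions each model's layers into per-GPU stages so every GPU hosts a slice of every model and bursty per-model request streams share the whole device pool:

```python
# Hypothetical sketch: partition each model's layers across all GPUs so that
# every GPU serves a slice of every model (model parallelism), letting bursty
# per-model request streams statistically multiplex the same device pool.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    layer_mem_gb: list  # memory footprint of each layer, in GB

def partition(model, num_gpus):
    """Greedily split layers into num_gpus contiguous stages with roughly balanced memory."""
    target = sum(model.layer_mem_gb) / num_gpus
    stages, current, acc = [], [], 0.0
    for mem in model.layer_mem_gb:
        current.append(mem)
        acc += mem
        if acc >= target and len(stages) < num_gpus - 1:
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)
    return stages

models = [Model("llm-a", [4, 4, 4, 4]), Model("llm-b", [2, 2, 2, 2, 2, 2])]
NUM_GPUS = 2
per_gpu_mem = [0.0] * NUM_GPUS
for m in models:
    for gpu, stage in enumerate(partition(m, NUM_GPUS)):
        per_gpu_mem[gpu] += sum(stage)
        print(f"{m.name} stage {gpu} -> GPU {gpu}: {sum(stage)} GB")
print("per-GPU memory:", per_gpu_mem)
```

With this placement, neither GPU has to hold any model in full, and a burst of requests for either model can draw on both GPUs at once.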
Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary
The practical needs of the "right to be forgotten" and poisoned data removal call for efficient
machine unlearning techniques, which enable machine learning models to unlearn, or to …
Resource-efficient algorithms and systems of foundation models: A survey
Large foundation models, including large language models, vision transformers, diffusion,
and large language model based multimodal models, are revolutionizing the entire machine …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
Power-aware Deep Learning Model Serving with μ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …
Oobleck: Resilient distributed training of large models using pipeline templates
Oobleck enables resilient distributed training of large DNN models with guaranteed fault
tolerance. It takes a planning-execution co-design approach, where it first generates a set of …
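A rough sketch of what a planning-execution co-design with precomputed pipeline templates could look like (hypothetical helpers, not Oobleck's real interfaces): plans are generated ahead of time for every feasible node count, and on a failure training re-instantiates the largest template the surviving nodes can host instead of replanning from scratch:

```python
# Hypothetical sketch of pipeline templates: plans are precomputed for a range
# of node counts, and after a failure training falls back to the largest
# template that fits the surviving nodes (no full replanning on the hot path).
def build_templates(max_nodes, num_stages):
    """Precompute a stage -> node-count mapping for every feasible node count."""
    templates = {}
    for n in range(num_stages, max_nodes + 1):
        base, extra = divmod(n, num_stages)
        templates[n] = [base + (1 if s < extra else 0) for s in range(num_stages)]
    return templates

def reinstantiate(templates, alive_nodes):
    """Pick the largest precomputed template that the surviving nodes can host."""
    usable = [n for n in templates if n <= alive_nodes]
    if not usable:
        raise RuntimeError("not enough nodes for any template")
    return templates[max(usable)]

templates = build_templates(max_nodes=8, num_stages=4)
print("plan for 8 nodes:", templates[8])                                  # [2, 2, 2, 2]
print("after losing 3 nodes:", reinstantiate(templates, alive_nodes=5))   # [2, 1, 1, 1]
```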
Looking beyond GPUs for DNN scheduling on Multi-Tenant clusters
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …
MAST: Global scheduling of ML training across Geo-Distributed datacenters at hyperscale
A Choudhury, Y Wang, T Pelkonen… - 18th USENIX …, 2024 - yangwang83.github.io
In public clouds, users must manually select a datacenter region to upload their ML training
data and launch ML training workloads in the same region to ensure data and computation …
Multi-resource interleaving for deep learning training
Training a Deep Learning (DL) model requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …
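The snippet trails off, but the title's idea of multi-resource interleaving can be illustrated with a toy packer (hypothetical demand numbers and helpers, not the paper's scheduler) that co-schedules jobs whose bottlenecks sit on different resources, so CPU, GPU, storage IO, and network IO stay busy at the same time:

```python
# Toy sketch: pack jobs into concurrent slots so that jobs dominated by
# different resources (GPU, CPU, storage IO, network IO) run together,
# instead of serializing jobs that would leave most resources idle.
jobs = {
    "resnet-train": {"GPU": 0.9, "CPU": 0.2, "storage": 0.1, "network": 0.1},
    "data-preproc": {"GPU": 0.0, "CPU": 0.8, "storage": 0.6, "network": 0.1},
    "ckpt-upload":  {"GPU": 0.0, "CPU": 0.1, "storage": 0.3, "network": 0.8},
}

def fits(usage, demand, capacity=1.0):
    """A job fits a slot if no resource dimension would be oversubscribed."""
    return all(usage.get(r, 0.0) + d <= capacity for r, d in demand.items())

def interleave(jobs):
    """First-fit packing across resource dimensions; jobs in one slot run concurrently."""
    slots = []  # each slot: (job names, per-resource usage)
    for name, demand in jobs.items():
        for names, usage in slots:
            if fits(usage, demand):
                names.append(name)
                for r, d in demand.items():
                    usage[r] = usage.get(r, 0.0) + d
                break
        else:
            slots.append(([name], dict(demand)))
    return slots

for i, (names, usage) in enumerate(interleave(jobs)):
    print(f"slot {i}: {names} usage={usage}")
```

Here the GPU-bound training job and the CPU/storage-bound preprocessing job land in the same slot, while the network-heavy upload is deferred only because it would oversubscribe the CPU dimension.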