Resource-efficient algorithms and systems of foundation models: A survey
Large foundation models, including large language models, vision transformers, diffusion models,
and large language model-based multimodal models, are revolutionizing the entire machine …
A survey of resource-efficient LLM and multimodal foundation models
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …
On optimizing the communication of model parallelism
We study a novel and important communication pattern in large-scale model-parallel deep
learning (DL), which we call cross-mesh resharding. This pattern emerges when the two …
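A minimal sketch of the idea behind cross-mesh resharding, under assumed layouts (the mesh sizes, tensor shape, and slicing below are illustrative, not the paper's implementation): a tensor held as row shards on one device mesh must be redistributed as column shards on another mesh, and the pairwise source/destination slices define the cross-mesh traffic whose scheduling such work optimizes.

    # Illustrative cross-mesh resharding (assumed layouts, not the paper's method):
    # a tensor sharded by rows on a 4-device source mesh is redistributed into
    # column shards on a 2-device destination mesh.
    import numpy as np

    tensor = np.arange(64, dtype=np.float32).reshape(8, 8)
    src_devices, dst_devices = 4, 2

    # Source mesh: each device holds a contiguous block of rows.
    src_shards = {s: tensor[s * 2:(s + 1) * 2, :] for s in range(src_devices)}

    # Destination mesh: each device should end up with a contiguous block of columns.
    dst_shards = {d: np.zeros((8, 4), dtype=np.float32) for d in range(dst_devices)}

    # Every source device sends one sub-block to every destination device,
    # an all-to-all-like pattern across the two meshes.
    for s in range(src_devices):
        for d in range(dst_devices):
            block = src_shards[s][:, d * 4:(d + 1) * 4]   # slice to transmit
            dst_shards[d][s * 2:(s + 1) * 2, :] = block   # placement on the receiver

    # Each destination shard now equals the corresponding column slice.
    assert all(np.array_equal(dst_shards[d], tensor[:, d * 4:(d + 1) * 4])
               for d in range(dst_devices))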
Optimus-CC: Efficient large NLP model training with 3D parallelism aware communication compression
In training modern large natural language processing (NLP) models, it has become common
practice to split models across multiple GPUs using 3D parallelism. Such a technique …
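As a rough illustration of what 3D parallelism refers to (the sizes and rank ordering below are assumptions for illustration, not Optimus-CC itself): the GPUs are factored into data-, pipeline-, and tensor-parallel dimensions, and every rank gets one coordinate along each axis. Ranks that differ only in the data-parallel coordinate hold replicas of the same parameters and exchange gradients, which is the kind of traffic that communication compression schemes aim to shrink.

    # Illustrative 3D parallelism layout (assumed sizes, not Optimus-CC):
    # 32 GPUs factored into data x pipeline x tensor parallel groups.
    DATA_PARALLEL = 4      # replicas of the whole model
    PIPELINE_PARALLEL = 4  # model split into sequential stages
    TENSOR_PARALLEL = 2    # individual layers split across GPUs

    WORLD_SIZE = DATA_PARALLEL * PIPELINE_PARALLEL * TENSOR_PARALLEL  # 32 ranks

    def rank_to_coords(rank: int) -> tuple[int, int, int]:
        """Map a global GPU rank to (data, pipeline, tensor) coordinates,
        with the tensor-parallel axis varying fastest."""
        tp = rank % TENSOR_PARALLEL
        pp = (rank // TENSOR_PARALLEL) % PIPELINE_PARALLEL
        dp = rank // (TENSOR_PARALLEL * PIPELINE_PARALLEL)
        return dp, pp, tp

    # The data-parallel group containing rank 0: all ranks with the same
    # pipeline stage (0) and tensor slice (0) but different replicas.
    dp_group = [r for r in range(WORLD_SIZE) if rank_to_coords(r)[1:] == (0, 0)]
    print(dp_group)  # [0, 8, 16, 24]: these ranks all-reduce their gradients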
Efficient training of large language models on distributed infrastructures: a survey
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …
A survey on auto-parallelism of large-scale deep learning training
P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has achieved great success in recent years, leading to state-of-the-art
performance in the research community and in industrial fields such as computer vision and natural …
Taming throughput-latency tradeoff in LLM inference with Sarathi-Serve
Each LLM serving request goes through two phases. The first is prefill, which processes the
entire input prompt to produce one output token; the second is decode, which generates …
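A minimal sketch of the two serving phases described above; the model object and its prefill and decode methods are hypothetical stand-ins, not Sarathi-Serve's API. Prefill processes the whole prompt in a single pass and yields the first output token along with the KV cache, after which decode reuses that cache to generate the remaining tokens one step at a time.

    # Sketch of the prefill/decode split in LLM serving.
    # `model` and its methods are hypothetical stand-ins, not Sarathi-Serve's API.
    def serve_request(model, prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
        # Prefill: process the entire prompt in one pass, building the KV cache
        # and producing the first output token.
        kv_cache, next_token = model.prefill(prompt_tokens)
        output = [next_token]

        # Decode: generate the remaining tokens one at a time, reusing and
        # extending the KV cache from the prefill phase.
        for _ in range(max_new_tokens - 1):
            kv_cache, next_token = model.decode(kv_cache, next_token)
            output.append(next_token)
            if next_token == model.eos_token_id:
                break
        return output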
ElasticFlow: An elastic serverless training platform for distributed deep learning
This paper proposes ElasticFlow, an elastic serverless training platform for distributed deep
learning. ElasticFlow provides a serverless interface with two distinct features: (i) users …
FLUX: fast software-based communication overlap on GPUs through kernel fusion
Large deep learning models have demonstrated a strong ability to solve many tasks across a
wide range of applications. These large models typically require training and inference to be …
Distributed analytics for big data: A survey
In recent years, constant and rapid information growth has characterized digital
applications in most real-life scenarios. Thus, a new information asset, namely Big …