Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model-based multimodal models, are revolutionizing the entire machine …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

On optimizing the communication of model parallelism

Y Zhuang, L Zheng, Z Li, E Xing, Q Ho… - Proceedings of …, 2023 - proceedings.mlsys.org
We study a novel and important communication pattern in large-scale model-parallel deep
learning (DL), which we call cross-mesh resharding. This pattern emerges when the two …

Optimus-CC: Efficient large NLP model training with 3D parallelism-aware communication compression

J Song, J Yim, J Jung, H Jang, HJ Kim, Y Kim… - Proceedings of the 28th …, 2023 - dl.acm.org
In training of modern large natural language processing (NLP) models, it has become a
common practice to split models across multiple GPUs using 3D parallelism. Such technique …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

A survey on auto-parallelism of large-scale deep learning training

P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has gained great success in recent years, leading to state-of-the-art
performance in the research community and in industrial fields like computer vision and natural …

Taming throughput-latency tradeoff in LLM inference with Sarathi-Serve

A Agrawal, N Kedia, A Panwar, J Mohan… - arXiv preprint arXiv …, 2024 - arxiv.org
Each LLM serving request goes through two phases. The first is prefill, which processes the
entire input prompt to produce one output token, and the second is decode, which generates …
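The prefill/decode split described in the snippet above can be sketched as follows. This is a minimal illustration, not Sarathi-Serve's implementation: `toy_model` is a hypothetical stand-in for a transformer forward pass, used only to show that prefill consumes the whole prompt once while decode emits tokens one at a time.

```python
def toy_model(tokens):
    # Hypothetical stand-in for a forward pass: the "next token" is
    # simply the sum of all input tokens modulo 100.
    return sum(tokens) % 100

def generate(prompt_tokens, max_new_tokens):
    # Prefill phase: one pass over the entire prompt yields the first
    # output token (this is the compute-heavy, batched phase).
    first = toy_model(prompt_tokens)
    output = [first]
    # Decode phase: each step conditions on all previous tokens and
    # produces exactly one new token (latency-sensitive, sequential).
    for _ in range(max_new_tokens - 1):
        output.append(toy_model(prompt_tokens + output))
    return output

print(generate([1, 2, 3], 4))  # → [6, 12, 24, 48]
```

The throughput-latency tradeoff the paper targets arises because batching long prefills alongside ongoing decodes stalls the per-token decode steps of other requests.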

ElasticFlow: An elastic serverless training platform for distributed deep learning

D Gu, Y Zhao, Y Zhong, Y Xiong, Z Han… - Proceedings of the 28th …, 2023 - dl.acm.org
This paper proposes ElasticFlow, an elastic serverless training platform for distributed deep
learning. ElasticFlow provides a serverless interface with two distinct features: (i) users …

FLUX: fast software-based communication overlap on GPUs through kernel fusion

LW Chang, W Bao, Q Hou, C Jiang, N Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large deep learning models have demonstrated strong ability to solve many tasks across a
wide range of applications. Those large models typically require training and inference to be …

Distributed analytics for big data: A survey

F Berloco, V Bevilacqua, S Colucci - Neurocomputing, 2024 - Elsevier
In recent years, constant and rapid information growth has characterized digital
applications in the majority of real-life scenarios. Thus, a new information asset, namely Big …