Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model-based multimodal models, are revolutionizing the entire machine …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

On optimizing the communication of model parallelism

Y Zhuang, L Zheng, Z Li, E Xing, Q Ho… - Proceedings of …, 2023 - proceedings.mlsys.org
We study a novel and important communication pattern in large-scale model-parallel deep
learning (DL), which we call cross-mesh resharding. This pattern emerges when the two …

Optimus-CC: Efficient large NLP model training with 3D parallelism-aware communication compression

J Song, J Yim, J Jung, H Jang, HJ Kim, Y Kim… - Proceedings of the 28th …, 2023 - dl.acm.org
In training of modern large natural language processing (NLP) models, it has become a
common practice to split models across multiple GPUs using 3D parallelism. Such technique …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

A survey on auto-parallelism of large-scale deep learning training

P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has gained great success in recent years, leading to state-of-the-art
performance in the research community and in industrial fields like computer vision and natural …

Taming throughput-latency tradeoff in LLM inference with Sarathi-Serve

A Agrawal, N Kedia, A Panwar, J Mohan… - arXiv preprint arXiv …, 2024 - arxiv.org
Each LLM serving request goes through two phases. The first is prefill, which processes the
entire input prompt to produce one output token, and the second is decode, which generates …
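The prefill/decode split described in the snippet above can be sketched as follows. This is a minimal illustration, not Sarathi-Serve's implementation: `toy_model` is a hypothetical stand-in for a transformer forward pass, used only to show that prefill consumes the whole prompt once while decode emits tokens one at a time.

```python
def toy_model(tokens):
    # Hypothetical stand-in for a forward pass: the "next token" is
    # simply the sum of all input tokens modulo 100.
    return sum(tokens) % 100

def generate(prompt_tokens, max_new_tokens):
    # Prefill phase: one pass over the entire prompt yields the first
    # output token (this is the compute-heavy, batched phase).
    first = toy_model(prompt_tokens)
    output = [first]
    # Decode phase: each step conditions on all previous tokens and
    # produces exactly one new token (latency-sensitive, sequential).
    for _ in range(max_new_tokens - 1):
        output.append(toy_model(prompt_tokens + output))
    return output

print(generate([1, 2, 3], 4))  # → [6, 12, 24, 48]
```

The throughput-latency tradeoff the paper targets arises because batching long prefills alongside ongoing decodes stalls the per-token decode steps of other requests.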

ElasticFlow: An elastic serverless training platform for distributed deep learning

D Gu, Y Zhao, Y Zhong, Y Xiong, Z Han… - Proceedings of the 28th …, 2023 - dl.acm.org
This paper proposes ElasticFlow, an elastic serverless training platform for distributed deep
learning. ElasticFlow provides a serverless interface with two distinct features: (i) users …

FLUX: fast software-based communication overlap on GPUs through kernel fusion

LW Chang, W Bao, Q Hou, C Jiang, N Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large deep learning models have demonstrated strong ability to solve many tasks across a
wide range of applications. Those large models typically require training and inference to be …

Distributed analytics for big data: A survey

F Berloco, V Bevilacqua, S Colucci - Neurocomputing, 2024 - Elsevier
In recent years, constant and rapid information growth has characterized digital
applications in the majority of real-life scenarios. Thus, a new information asset, namely Big …