Performance enhancement of artificial intelligence: A survey

M Krichen, MS Abdalzaher - Journal of Network and Computer Applications, 2024 - Elsevier
The advent of machine learning (ML) and artificial intelligence (AI) has brought about a
significant transformation across multiple industries, as it has facilitated the automation of …

A survey of resource-efficient llm and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Efficient large-scale language model training on gpu clusters using megatron-lm

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
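Megatron-LM's answer to the memory limit is to split individual layers across GPUs (tensor model parallelism), in addition to pipeline and data parallelism. The following is a minimal PyTorch sketch of the column-parallel linear layer idea only; the class name and structure are invented for illustration and are not Megatron-LM's API.

    # Column-parallel linear layer in the Megatron-LM style: the weight
    # matrix is split column-wise so each rank stores and multiplies
    # only its shard. Illustrative sketch, not Megatron-LM's actual code.
    import torch

    class ColumnParallelLinear(torch.nn.Module):
        def __init__(self, in_features, out_features, world_size):
            super().__init__()
            assert out_features % world_size == 0
            # Each rank keeps out_features / world_size output columns.
            self.weight = torch.nn.Parameter(
                torch.empty(out_features // world_size, in_features))
            torch.nn.init.xavier_uniform_(self.weight)

        def forward(self, x):
            # The local matmul yields this rank's shard of the output;
            # an all-gather (omitted here) would reassemble the full tensor.
            return torch.nn.functional.linear(x, self.weight)

    layer = ColumnParallelLinear(in_features=1024, out_features=4096,
                                 world_size=8)
    shard = layer(torch.randn(2, 1024))  # per-rank output shape: (2, 512)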

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale Pretrained Language Models (PLMs) have become the new paradigm for
Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model based multimodal models, are revolutionizing the entire machine …

Decentralized training of foundation models in heterogeneous environments

B Yuan, Y He, J Davis, T Zhang… - Advances in …, 2022 - proceedings.neurips.cc
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often
involving tens of thousands of GPUs running continuously for months. These models are …

GNNLab: a factored system for sample-based GNN training over GPUs

J Yang, D Tang, X Song, L Wang, Q Yin… - Proceedings of the …, 2022 - dl.acm.org
We propose GNNLab, a sample-based GNN training system in a single machine multi-GPU
setup. GNNLab adopts a factored design for multiple GPUs, where each GPU is dedicated to …
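To make the factored design concrete, here is a toy Python sketch in which dedicated sampler workers produce mini-batches and trainer workers consume them through a shared queue; the role names and the queue handoff are assumptions for illustration, not GNNLab's implementation.

    # Toy factored design: one worker per GPU role. "Samplers" produce
    # mini-batch seeds (standing in for GPU-side neighbor sampling) and
    # "trainers" consume them (standing in for forward/backward passes).
    import queue
    import random
    import threading

    batch_queue = queue.Queue(maxsize=8)

    def sampler(gpu_id, num_batches):
        for i in range(num_batches):
            seeds = [random.randrange(1000) for _ in range(32)]
            batch_queue.put((gpu_id, i, seeds))

    def trainer(gpu_id, num_batches):
        for _ in range(num_batches):
            src, i, seeds = batch_queue.get()
            print(f"trainer GPU {gpu_id}: batch {i} from sampler GPU {src}")

    workers = [threading.Thread(target=sampler, args=(0, 4)),
               threading.Thread(target=trainer, args=(1, 4))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()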

P3: Distributed deep graph learning at scale

S Gandhi, AP Iyer - 15th USENIX Symposium on Operating Systems …, 2021 - usenix.org
Graph Neural Networks (GNNs) have gained significant attention in the recent past, and
become one of the fastest growing subareas in deep learning. While several new GNN …

Oobleck: Resilient distributed training of large models using pipeline templates

I Jang, Z Yang, Z Zhang, X Jin… - Proceedings of the 29th …, 2023 - dl.acm.org
Oobleck enables resilient distributed training of large DNN models with guaranteed fault
tolerance. It takes a planning-execution co-design approach, where it first generates a set of …
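As a rough illustration of the planning step, assume a pipeline template is just a precomputed partition of layers into stages for each node count that might survive a failure; recovery then becomes a table lookup instead of a full restart. The template structure below is invented for the sketch.

    # Toy pipeline templates: for every possible surviving node count,
    # precompute an even partition of layers into pipeline stages.
    def make_templates(num_layers, min_nodes, max_nodes):
        templates = {}
        for n in range(min_nodes, max_nodes + 1):
            base, extra = divmod(num_layers, n)
            stages, start = [], 0
            for s in range(n):
                size = base + (1 if s < extra else 0)
                stages.append(list(range(start, start + size)))
                start += size
            templates[n] = stages
        return templates

    templates = make_templates(num_layers=24, min_nodes=4, max_nodes=8)
    # After a node failure, reinstantiate from the matching template.
    print(templates[7])  # stage layout for 7 surviving nodes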

Lyra: Elastic scheduling for deep learning clusters

J Li, H Xu, Y Zhu, Z Liu, C Guo, C Wang - Proceedings of the Eighteenth …, 2023 - dl.acm.org
Organizations often build separate training and inference clusters for deep learning, and use
separate schedulers to manage them. This leads to problems for both: inference clusters …
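A toy sketch of the capacity-loaning idea behind elastic scheduling: when the inference cluster has idle GPUs, loan them to training, and reclaim them as inference load rises. The greedy loan/reclaim policy below is an assumption for illustration, not Lyra's actual scheduler.

    # Greedy capacity loaning between an inference cluster and training.
    def schedule(load, total_gpus, loaned):
        idle = total_gpus - load - loaned
        if idle > 0:
            loaned += idle                 # loan spare GPUs to training
        elif idle < 0:
            loaned -= min(loaned, -idle)   # preempt training for inference
        return loaned

    loaned = 0
    for load in [2, 3, 6, 8]:              # inference demand over time
        loaned = schedule(load, total_gpus=8, loaned=loaned)
        print(f"inference load={load}, GPUs loaned to training={loaned}")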