Performance enhancement of artificial intelligence: A survey

M Krichen, MS Abdalzaher - Journal of Network and Computer Applications, 2024 - Elsevier
The advent of machine learning (ML) and artificial intelligence (AI) has brought about a
significant transformation across multiple industries, as it has facilitated the automation of …

Efficient large-scale language model training on gpu clusters using megatron-lm

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
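
Megatron-LM's central technique is intra-layer (tensor) model parallelism combined with pipeline parallelism, so that a model too large for one GPU's memory is sharded across many. Below is a minimal NumPy sketch of the column-/row-parallel split that Megatron-style tensor parallelism applies to a transformer MLP block; the two "GPUs" are simulated in-process, and all shapes and names are illustrative, not taken from the paper:

```python
# Minimal NumPy sketch of Megatron-style tensor parallelism for a 2-layer MLP.
# The two "workers" are simulated in-process; shapes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16                      # hidden size and MLP expansion size
X = rng.standard_normal((4, d))   # a batch of 4 token embeddings

A = rng.standard_normal((d, h))   # first linear layer
B = rng.standard_normal((h, d))   # second linear layer

# Column-parallel split of A, row-parallel split of B across 2 workers:
# each worker computes gelu(X @ A_i) @ B_i locally; a single all-reduce
# (here, a plain sum) recombines the partial outputs.
A_shards = np.split(A, 2, axis=1)
B_shards = np.split(B, 2, axis=0)

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

partials = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]
Y_parallel = sum(partials)        # stands in for the all-reduce

Y_serial = gelu(X @ A) @ B        # reference: unsharded computation
assert np.allclose(Y_parallel, Y_serial)
```

Because gelu is elementwise, splitting A by columns and B by rows keeps the nonlinearity local to each worker, so only one collective per MLP block is needed.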

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale Pretrained Language Models (PLMs) have become the new paradigm for
Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as …

A comprehensive survey on training acceleration for large machine learning models in IoT

H Wang, Z Qu, Q Zhou, H Zhang, B Luo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
The ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …

Decentralized training of foundation models in heterogeneous environments

B Yuan, Y He, J Davis, T Zhang… - Advances in …, 2022 - proceedings.neurips.cc
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often
involving tens of thousands of GPUs running continuously for months. These models are …

P3: Distributed deep graph learning at scale

S Gandhi, AP Iyer - 15th USENIX Symposium on Operating Systems …, 2021 - usenix.org
Graph Neural Networks (GNNs) have gained significant attention in the recent past and have
become one of the fastest-growing subareas in deep learning. While several new GNN …

GNNLab: a factored system for sample-based GNN training over GPUs

J Yang, D Tang, X Song, L Wang, Q Yin… - Proceedings of the …, 2022 - dl.acm.org
We propose GNNLab, a sample-based GNN training system in a single machine multi-GPU
setup. GNNLab adopts a factored design for multiple GPUs, where each GPU is dedicated to …
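
The factored design the abstract refers to dedicates some GPUs to neighbor sampling and others to model training, decoupled by a shared queue of mini-batches. A toy sketch of that producer/consumer split follows, with threads standing in for GPUs; the graph, fanout, and "loss" are all made up for illustration:

```python
# Toy sketch of GNNLab-style factored training: a "sampler" role produces
# mini-batch subgraphs while a "trainer" role consumes them, decoupled by a
# bounded queue. Real GNNLab dedicates whole GPUs to each role; here both
# roles are plain threads and a "subgraph" is just sampled neighbor ids.
import queue
import random
import threading

graph = {v: random.sample(range(100), 5) for v in range(100)}  # toy adjacency
batches = queue.Queue(maxsize=4)     # bounded queue decouples the two roles

def sampler(num_batches, fanout=3):
    for _ in range(num_batches):
        seeds = random.sample(range(100), 8)
        block = {v: random.sample(graph[v], fanout) for v in seeds}
        batches.put(block)
    batches.put(None)                # sentinel: sampling finished

def trainer():
    while (block := batches.get()) is not None:
        # stand-in for a forward/backward pass over the sampled block
        loss = sum(len(nbrs) for nbrs in block.values()) / len(block)
        print(f"trained on {len(block)} seeds, fake loss {loss:.1f}")

s = threading.Thread(target=sampler, args=(5,))
t = threading.Thread(target=trainer)
s.start(); t.start(); s.join(); t.join()
```

The bounded queue is the key design point: sampling and training proceed at their own rates, and neither role's hardware sits idle waiting for the other.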

Oobleck: Resilient distributed training of large models using pipeline templates

I Jang, Z Yang, Z Zhang, X Jin… - Proceedings of the 29th …, 2023 - dl.acm.org
Oobleck enables resilient distributed training of large DNN models with guaranteed fault
tolerance. It takes a planning-execution co-design approach, where it first generates a set of …
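
In the planning half, a pipeline layout is precomputed for every feasible node count, so that after a failure the execution half can fall back to the template matching the surviving nodes instead of replanning from scratch. A hedged sketch of that lookup is below; the even stage-split heuristic is illustrative and stands in for Oobleck's actual planner:

```python
# Hedged sketch of the pipeline-template idea: precompute one pipeline layout
# per feasible node count, then re-instantiate from the matching template
# after a failure. The even-split layout heuristic is illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineTemplate:
    num_nodes: int
    stages_per_node: tuple  # how many model stages each node hosts

def make_template(num_nodes: int, num_stages: int) -> PipelineTemplate:
    base, extra = divmod(num_stages, num_nodes)
    return PipelineTemplate(
        num_nodes,
        tuple(base + (1 if i < extra else 0) for i in range(num_nodes)),
    )

NUM_STAGES = 8
templates = {n: make_template(n, NUM_STAGES) for n in range(1, 9)}

alive = 8
print("initial layout:", templates[alive])
alive -= 2                                  # two nodes fail mid-training
print("after failure:", templates[alive])   # instant fallback, no replanning
```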

Towards efficient post-training quantization of pre-trained language models

H Bai, L Hou, L Shang, X Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network quantization has gained increasing attention with the rapid growth of large pre-
trained language models (PLMs). However, most existing quantization methods for PLMs …
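
Post-training quantization starts from a simple primitive: rescale pretrained weights onto a low-bit integer grid without any retraining. A minimal NumPy sketch of symmetric per-tensor int8 quantization follows; the paper's contribution layers efficient reconstruction techniques on top of this basic step, and the values below are synthetic:

```python
# Minimal NumPy sketch of symmetric per-tensor int8 post-training
# quantization: the basic primitive PTQ methods for PLMs build on.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)  # a "pretrained" weight

scale = np.abs(W).max() / 127.0           # map [-max|W|, max|W|] onto int8
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale     # dequantize to measure the error

err = np.abs(W - W_dq).max()
print(f"scale={scale:.4f}, max abs quantization error={err:.4f}")
```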

A survey of resource-efficient llm and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …