Performance enhancement of artificial intelligence: A survey
The advent of machine learning (ML) and artificial intelligence (AI) has brought about a
significant transformation across multiple industries, as it has facilitated the automation of …
Efficient large-scale language model training on GPU clusters using Megatron-LM
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Large-scale Pretrained Language Models (PLMs) have become the new paradigm for
Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as …
A comprehensive survey on training acceleration for large machine learning models in IoT
The ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …
Decentralized training of foundation models in heterogeneous environments
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often
involving tens of thousands of GPUs running continuously for months. These models are …
P3: Distributed deep graph learning at scale
Graph Neural Networks (GNNs) have gained significant attention recently and have
become one of the fastest-growing subareas in deep learning. While several new GNN …
GNNLab: a factored system for sample-based GNN training over GPUs
We propose GNNLab, a sample-based GNN training system in a single machine multi-GPU
setup. GNNLab adopts a factored design for multiple GPUs, where each GPU is dedicated to …
Oobleck: Resilient distributed training of large models using pipeline templates
Oobleck enables resilient distributed training of large DNN models with guaranteed fault
tolerance. It takes a planning-execution co-design approach, where it first generates a set of …
Towards efficient post-training quantization of pre-trained language models
Network quantization has gained increasing attention with the rapid growth of large pre-
trained language models (PLMs). However, most existing quantization methods for PLMs …
A survey of resource-efficient LLM and multimodal foundation models
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …