Efficient large-scale language model training on GPU clusters using Megatron-LM

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale Pretrained Language Models (PLMs) have become the new paradigm for
Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as …

Decentralized training of foundation models in heterogeneous environments

B Yuan, Y He, J Davis, T Zhang… - Advances in …, 2022 - proceedings.neurips.cc
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often
involving tens of thousands of GPUs running continuously for months. These models are …

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training

Z Lin, Y Miao, Q Zhang, F Yang, Y Zhu, C Li… - … USENIX Symposium on …, 2024 - usenix.org
With the growing model size of deep neural networks (DNN), deep learning training is
increasingly relying on handcrafted search spaces to find efficient parallelization execution …

Memory-efficient pipeline-parallel DNN training

D Narayanan, A Phanishayee, K Shi… - International …, 2021 - proceedings.mlr.press
Many state-of-the-art ML results have been obtained by scaling up the number of
parameters in existing models. However, parameters and activations for such large models …

Varuna: scalable, low-cost training of massive deep learning models

S Athlur, N Saran, M Sivathanu, R Ramjee… - Proceedings of the …, 2022 - dl.acm.org
Systems for training massive deep learning models (billions of parameters) today assume
and require specialized "hyperclusters": hundreds or thousands of GPUs wired with …

Chimera: efficiently training large-scale neural networks with bidirectional pipelines

S Li, T Hoefler - Proceedings of the International Conference for High …, 2021 - dl.acm.org
Training large deep learning models at scale is very challenging. This paper proposes
Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for …
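
For intuition, a minimal Python sketch of the stage placement behind a bidirectional pipeline (an assumed layout for illustration, not Chimera's implementation or its scheduling logic): two pipelines traverse the same workers in opposite directions, so each worker holds one early and one late stage, and micro-batches injected from both ends keep more workers busy than a single unidirectional pipeline.

```python
# Illustrative sketch of stage placement in a bidirectional pipeline
# (assumed layout, not Chimera's code): W workers jointly host a "down"
# pipeline (stage 0 on worker 0, ...) and an "up" pipeline running the
# opposite way, so each worker owns one early and one late stage.

def bidirectional_stage_map(num_workers: int) -> dict[int, tuple[int, int]]:
    """Per worker, the stage it owns in the down pipeline and in the up pipeline."""
    return {w: (w, num_workers - 1 - w) for w in range(num_workers)}

if __name__ == "__main__":
    for worker, (down_stage, up_stage) in bidirectional_stage_map(4).items():
        print(f"worker {worker}: down stage {down_stage}, up stage {up_stage}")
    # Micro-batches injected from both ends of the worker chain fill bubbles
    # that a single unidirectional pipeline would leave idle.
```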

Oobleck: Resilient distributed training of large models using pipeline templates

I Jang, Z Yang, Z Zhang, X Jin… - Proceedings of the 29th …, 2023 - dl.acm.org
Oobleck enables resilient distributed training of large DNN models with guaranteed fault
tolerance. It takes a planning-execution co-design approach, where it first generates a set of …
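
A rough sketch of the pipeline-template idea (hypothetical helper names; the paper's planner also balances stages and throughput, which this omits): templates are generated ahead of time for each feasible node count, and after a failure the surviving nodes are covered by templates rather than replanning from scratch.

```python
# Rough sketch of the pipeline-template idea (hypothetical helpers, not
# Oobleck's planner): keep one template per feasible node count, then cover
# whatever nodes survive a failure with templates instead of replanning.
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineTemplate:
    num_nodes: int     # nodes consumed by one pipeline built from this template
    num_stages: int    # how the model's layers are grouped for that node count

def build_templates(min_nodes: int, max_nodes: int) -> dict[int, PipelineTemplate]:
    """One template per node count in the tolerated range (stage balancing omitted)."""
    return {n: PipelineTemplate(num_nodes=n, num_stages=n)
            for n in range(min_nodes, max_nodes + 1)}

def reinstantiate(available_nodes: int,
                  templates: dict[int, PipelineTemplate]) -> list[PipelineTemplate]:
    """Greedily cover the surviving nodes with the largest templates that fit;
    the resulting pipelines then train data-parallel copies of the model."""
    plan, remaining = [], available_nodes
    sizes = sorted(templates, reverse=True)
    while remaining >= min(sizes):
        size = next(s for s in sizes if s <= remaining)
        plan.append(templates[size])
        remaining -= size
    return plan

if __name__ == "__main__":
    templates = build_templates(min_nodes=2, max_nodes=4)
    # 7 surviving nodes -> one 4-node and one 3-node pipeline keep training.
    print(reinstantiate(7, templates))
```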

GSPMD: general and scalable parallelization for ML computation graphs

Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang… - arXiv preprint arXiv …, 2021 - arxiv.org
We present GSPMD, an automatic, compiler-based parallelization system for common
machine learning computations. It allows users to write programs in the same way as for a …
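
As one concrete frontend, JAX's sharding annotations lower to GSPMD-style automatic partitioning in XLA; the sketch below (assuming a recent JAX version, with illustrative mesh and axis names and a batch sized as a multiple of the device count) writes the computation as single-device code and only annotates input shardings, leaving the rest to compiler propagation.

```python
# Sketch using JAX's sharding annotations, one frontend that lowers to
# GSPMD-style automatic partitioning in XLA (mesh/axis names are illustrative).
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))        # 1-D device mesh

@jax.jit
def layer(x, w):
    # Written exactly as single-device code; no communication is spelled out.
    return jnp.tanh(x @ w)

batch = 8 * len(devices)                          # keep the batch divisible by #devices
x = jax.device_put(jnp.ones((batch, 128)),
                   NamedSharding(mesh, P("data", None)))   # shard rows across devices
w = jax.device_put(jnp.ones((128, 128)),
                   NamedSharding(mesh, P(None, None)))     # replicate the weights

y = layer(x, w)                                   # compiler propagates the shardings
print(y.sharding)                                 # output stays sharded along "data"
```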

TeraPipe: Token-level pipeline parallelism for training large-scale language models

Z Li, S Zhuang, S Guo, D Zhuo… - International …, 2021 - proceedings.mlr.press
Model parallelism has become a necessity for training modern large-scale deep
language models. In this work, we identify a new and orthogonal dimension from existing …
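
A toy Python/NumPy sketch of token-level pipelining (not TeraPipe's system; the per-stage math is invented purely to preserve causality): because causal self-attention makes a token's representation independent of later tokens, a long sequence can be cut into token slices that flow through the pipeline stages one after another, so a later stage can start on an earlier slice while earlier stages continue with later slices. The sketch runs sequentially but preserves that dependency structure.

```python
# Toy sketch of token-level pipelining (not TeraPipe's implementation).
# Causal attention means token t never depends on tokens after t, so a long
# sequence can be cut into token slices and the slices pipelined across stages.
import numpy as np

def causal_stage(hidden, prefix_cache):
    """Stand-in for one pipeline stage: each slice 'attends' only to itself and
    to the cached earlier slices (invented toy math; causality is the point)."""
    visible = np.concatenate(prefix_cache + [hidden], axis=0) if prefix_cache else hidden
    return hidden + visible.mean(axis=0, keepdims=True)

def token_level_pipeline(sequence, num_stages=2, num_slices=4):
    """Split along the token dimension; in a real system stage s works on slice i
    while stage s+1 works on slice i-1, instead of waiting for the full sequence."""
    slices = np.array_split(sequence, num_slices, axis=0)
    caches = [[] for _ in range(num_stages)]      # per-stage prefix of earlier slices
    outputs = []
    for sl in slices:                             # slices enter in token order
        h = sl
        for s in range(num_stages):
            h_in = h
            h = causal_stage(h_in, caches[s])
            caches[s].append(h_in)                # later slices may attend to this
        outputs.append(h)
    return np.concatenate(outputs, axis=0)

if __name__ == "__main__":
    seq = np.random.randn(16, 8)                  # (tokens, hidden_dim)
    print(token_level_pipeline(seq).shape)        # (16, 8)
```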