ZeRO-Offload: Democratizing billion-scale model training

J Ren, S Rajbhandari, RY Aminabadi… - 2021 USENIX Annual …, 2021 - usenix.org
Large-scale model training has been a playing field for a limited few requiring complex
model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload …

ZeRO: Memory optimizations toward training trillion parameter models

S Rajbhandari, J Rasley, O Ruwase… - … Conference for High …, 2020 - ieeexplore.ieee.org
Large deep learning models offer significant accuracy gains, but training billions to trillions
of parameters is challenging. Existing solutions such as data and model parallelisms exhibit …
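As a rough, back-of-the-envelope illustration of why model states alone overwhelm a single GPU: the 16-bytes-per-parameter accounting for mixed-precision Adam follows the ZeRO paper, while the even split across GPUs is an idealization of full state partitioning, and the Python helper below is only a sketch.

    # Memory consumed by model states under mixed-precision Adam,
    # using the ZeRO paper's ~16 bytes per parameter accounting.
    def model_state_gib(num_params: int, num_gpus: int = 1) -> float:
        fp16_params_and_grads = 2 + 2        # fp16 parameters + fp16 gradients
        fp32_optimizer_states = 4 + 4 + 4    # master weights, momentum, variance
        bytes_per_param = fp16_params_and_grads + fp32_optimizer_states  # 16 bytes
        return num_params * bytes_per_param / num_gpus / 2**30

    print(f"1B params, no partitioning : {model_state_gib(10**9):.1f} GiB")
    print(f"1B params, split on 8 GPUs : {model_state_gib(10**9, 8):.1f} GiB")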

8-bit optimizers via block-wise quantization

T Dettmers, M Lewis, S Shleifer… - arXiv preprint arXiv …, 2021 - arxiv.org
Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed
sum (SGD with momentum) or squared sum (Adam) of past gradient values. This state can …
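A minimal sketch of the block-wise idea on a single optimizer-state tensor: the tensor is split into fixed-size blocks, each block is scaled by its own absolute maximum, and values are stored in 8 bits. This uses plain linear int8 quantization for clarity; the actual 8-bit optimizers use a dynamic quantization map and fused GPU kernels that are not reproduced here.

    import numpy as np

    def blockwise_quantize(state: np.ndarray, block_size: int = 2048):
        """Flat fp32 tensor -> int8 values plus one absmax scale per block."""
        flat = state.ravel().astype(np.float32)
        pad = (-flat.size) % block_size
        flat = np.pad(flat, (0, pad))
        blocks = flat.reshape(-1, block_size)
        scales = np.abs(blocks).max(axis=1, keepdims=True) + 1e-12
        q = np.clip(np.round(blocks / scales * 127), -127, 127).astype(np.int8)
        return q, scales, state.shape, pad

    def blockwise_dequantize(q, scales, shape, pad):
        flat = (q.astype(np.float32) / 127 * scales).ravel()
        return (flat[:-pad] if pad else flat).reshape(shape)

    momentum = np.random.randn(10_000).astype(np.float32)   # e.g. an Adam state
    q, s, shape, pad = blockwise_quantize(momentum)
    err = np.abs(blockwise_dequantize(q, s, shape, pad) - momentum).max()
    print(f"~4x smaller state, max abs error {err:.4f}")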

BPipe: Memory-balanced pipeline parallelism for training large language models

T Kim, H Kim, GI Yu, BG Chun - International Conference on …, 2023 - proceedings.mlr.press
Pipeline parallelism is a key technique for training large language models within GPU
clusters. However, it often leads to a memory imbalance problem, where certain GPUs face …
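A back-of-the-envelope sketch of where the imbalance comes from, assuming a standard 1F1B schedule in which stage i holds activations for up to (num_stages - i) in-flight microbatches; the per-microbatch activation size below is a made-up figure for illustration, and the memory-balancing schedule proposed in the paper is not shown.

    # Peak activation memory per pipeline stage under a 1F1B schedule.
    num_stages = 8
    act_gb_per_microbatch = 1.5   # hypothetical activation footprint per stage

    for stage in range(num_stages):
        in_flight = num_stages - stage   # warm-up microbatches this stage keeps alive
        print(f"stage {stage}: ~{in_flight * act_gb_per_microbatch:.1f} GB of activations")
    # The first stage holds roughly num_stages times the activations of the last one,
    # which is the imbalance that memory-balanced schedules aim to remove.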

Petals: Collaborative inference and fine-tuning of large models

A Borzunov, D Baranchuk, T Dettmers… - arXiv preprint arXiv …, 2022 - arxiv.org
Many NLP tasks benefit from using large language models (LLMs) that often have more than
100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can …

Distributed inference and fine-tuning of large language models over the internet

A Borzunov, M Ryabinin… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) are useful in many NLP tasks and become more capable
with size, with the best open-source models having over 50 billion parameters. However …

Efficient combination of rematerialization and offloading for training dnns

O Beaumont, L Eyraud-Dubois… - Advances in Neural …, 2021 - proceedings.neurips.cc
Rematerialization and offloading are two well known strategies to save memory during the
training phase of deep neural networks, allowing data scientists to consider larger models …
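A minimal PyTorch sketch that combines the two mechanisms on a toy model: rematerialization via torch.utils.checkpoint (activations of some blocks are recomputed during backward instead of being stored) and offloading via the save_on_cpu saved-tensor hook (stored activations are parked in host memory). The model, sizes, and the every-other-block policy are illustrative only; the paper's contribution is the joint scheduling of the two, which is not reproduced.

    import torch
    from torch import nn
    from torch.utils.checkpoint import checkpoint

    model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                            for _ in range(8)])
    x = torch.randn(32, 1024, requires_grad=True)

    # Offloading: tensors saved for backward are moved to (pinned) CPU memory
    # and copied back on demand; this pays off when the model lives on a GPU.
    with torch.autograd.graph.save_on_cpu(pin_memory=True):
        h = x
        for i, block in enumerate(model):
            if i % 2 == 0:
                # Rematerialization: drop this block's activations, recompute in backward.
                h = checkpoint(block, h, use_reentrant=False)
            else:
                h = block(h)
        h.sum().backward()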

Matching guided distillation

K Yue, J Deng, F Zhou - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
Feature distillation is an effective way to improve the performance of a smaller student
model, which has fewer parameters and lower computation cost compared to the larger …
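A minimal sketch of plain feature distillation, i.e. matching an intermediate student feature map to the teacher's through a small learned projector; the channel widths are hypothetical, and the matching step that gives the paper its name is not reproduced here.

    import torch
    from torch import nn
    import torch.nn.functional as F

    teacher_feat = torch.randn(8, 256, 14, 14)               # from a frozen teacher
    student_feat = torch.randn(8, 128, 14, 14, requires_grad=True)

    # 1x1 conv lifts the narrower student features to the teacher's width.
    projector = nn.Conv2d(128, 256, kernel_size=1)

    distill_loss = F.mse_loss(projector(student_feat), teacher_feat.detach())
    distill_loss.backward()   # gradients reach the student features and the projector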

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Mpress: Democratizing billion-scale model training on multi-gpu servers via memory-saving inter-operator parallelism

Q Zhou, H Wang, X Yu, C Li, Y Bai… - … Symposium on High …, 2023 - ieeexplore.ieee.org
It remains challenging to train billion-scale DNN models on a single modern multi-GPU
server due to the GPU memory wall. Unfortunately, existing memory-saving techniques such …