Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model based multimodal models, are revolutionizing the entire machine …

MAST: Global scheduling of ML training across Geo-Distributed datacenters at hyperscale

A Choudhury, Y Wang, T Pelkonen… - 18th USENIX …, 2024 - yangwang83.github.io
In public clouds, users must manually select a datacenter region to upload their ML training
data and launch ML training workloads in the same region to ensure data and computation …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Reducing energy bloat in large model training

JW Chung, Y Gu, I Jang, L Meng, N Bansal… - Proceedings of the …, 2024 - dl.acm.org
Training large AI models on numerous GPUs consumes a massive amount of energy,
making power delivery one of the largest limiting factors in building and operating …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Tenplex: Dynamic parallelism for deep learning using parallelizable tensor collections

M Wagenländer, G Li, B Zhao, L Mai… - Proceedings of the ACM …, 2024 - dl.acm.org
Deep learning (DL) jobs use multi-dimensional parallelism, i.e., combining data, model, and
pipeline parallelism, to use large GPU clusters efficiently. Long-running jobs may …
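
To make the multi-dimensional parallelism mentioned in this entry concrete, here is a minimal sketch (not taken from the Tenplex paper) of a 3-D device mesh with data, tensor (model), and pipeline axes, written with JAX's sharding API; the 2×2×2 split, the axis names, and the array shapes are illustrative assumptions.

```python
import os
# Simulate 8 devices on a CPU-only host so the sketch runs anywhere.
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange 8 devices as a 2 x 2 x 2 mesh: data x tensor x pipeline.
devices = np.array(jax.devices()).reshape(2, 2, 2)
mesh = Mesh(devices, axis_names=("data", "tensor", "pipeline"))

# Activations are split along the data axis, weights along the tensor axis;
# the pipeline axis would hold different layer groups (not shown here).
x = jax.device_put(jnp.ones((8, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 512)), NamedSharding(mesh, P(None, "tensor")))

print(mesh.shape)  # {'data': 2, 'tensor': 2, 'pipeline': 2}
print(x.sharding, w.sharding)
```

Reshaping such a mesh while a job is running, e.g. after GPUs are added or removed, is the kind of resource change that motivates dynamic parallelism.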

ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation

S Gandhi, M Zhao, A Skiadopoulos… - Proceedings of the ACM …, 2024 - dl.acm.org
Training large Deep Neural Network (DNN) models requires thousands of GPUs over the
course of several days or weeks. At this scale, failures are frequent and can have a big …

DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models

Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (LLMs) have demonstrated significant potential in a wide
range of AI applications. Yet, training multimodal LLMs suffers from low efficiency and …

HybridFlow: A flexible and efficient RLHF framework

G Sheng, C Zhang, Z Ye, X Wu, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language
Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node …
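
A minimal sketch of the dataflow view mentioned in this entry: nodes stand for model computations (generation, scoring, updates) and edges for the data passed between them. The node names (actor_generate, reward_score, etc.) and the graph topology are hypothetical illustrations, not HybridFlow's actual API.

```python
# Hypothetical RLHF dataflow: each node is a computation and edges point to
# the nodes that consume its output.
rlhf_dataflow = {
    "actor_generate":     ["reward_score", "critic_value", "reference_logprobs"],
    "reward_score":       ["actor_update", "critic_update"],
    "critic_value":       ["actor_update", "critic_update"],
    "reference_logprobs": ["actor_update"],
    "actor_update":       [],
    "critic_update":      [],
}

def execution_order(graph):
    """Topological sort: an execution order that respects every data dependency."""
    indegree = {node: 0 for node in graph}
    for consumers in graph.values():
        for c in consumers:
            indegree[c] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for c in graph[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

print(execution_order(rlhf_dataflow))
# e.g. ['actor_generate', 'reference_logprobs', 'critic_value', 'reward_score', ...]
```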