PyTorch distributed: Experiences on accelerating data parallel training

S Li, Y Zhao, R Varma, O Salpekar, P Noordhuis… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the design, implementation, and evaluation of the PyTorch distributed
data parallel module. PyTorch is a widely-adopted scientific computing package used in …

A comprehensive survey on training acceleration for large machine learning models in IoT

H Wang, Z Qu, Q Zhou, H Zhang, B Luo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
The ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …

MegaScale: Scaling large language model training to more than 10,000 GPUs

Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen… - … USENIX Symposium on …, 2024 - usenix.org
We present the design, implementation and engineering experience in building and
deploying MegaScale, a production system for training large language models (LLMs) at the …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Machine learning in real-time Internet of Things (IoT) systems: A survey

J Bian, A Al Arafat, H Xiong, J Li, L Li… - IEEE Internet of …, 2022 - ieeexplore.ieee.org
Over the last decade, machine learning (ML) and deep learning (DL) algorithms have
significantly evolved and been employed in diverse applications, such as computer vision …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Parallelizing DNN training on GPUs: Challenges and opportunities

W Xu, Y Zhang, X Tang - … Proceedings of the Web Conference 2021, 2021 - dl.acm.org
In recent years, Deep Neural Networks (DNNs) have emerged as a widely adopted
approach in many application domains. Training DNN models is also becoming a significant …

Elastic parameter server load distribution in deep learning clusters

Y Chen, Y Peng, Y Bao, C Wu, Y Zhu… - Proceedings of the 11th …, 2020 - dl.acm.org
In distributed DNN training, parameter servers (PS) can become performance bottlenecks
due to PS stragglers, caused by imbalanced parameter distribution, bandwidth contention …

Robust searching-based gradient collaborative management in intelligent transportation system

H Shi, H Wang, R Ma, Y Hua, T Song, H Gao… - ACM Transactions on …, 2023 - dl.acm.org
With the rapid development of big data and the Internet of Things (IoT), traffic data from an
Intelligent Transportation System (ITS) is becoming more and more accessible. To …

SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

RIS Khan, AH Yazdani, Y Fu, AK Paul, B Ji… - … USENIX Conference on …, 2023 - usenix.org
Deep learning training (DLT) applications exhibit unique I/O workload behaviors that pose
new challenges for storage system design. DLT is I/O intensive since data samples need to …