Distributed artificial intelligence empowered by end-edge-cloud computing: A survey
As the computing paradigm shifts from cloud computing to end-edge-cloud computing, it
also enables artificial intelligence to evolve from a centralized into a distributed paradigm …
Edge-cloud polarization and collaboration: A comprehensive survey for AI
Influenced by the great success of deep learning via cloud computing and the rapid
development of edge chips, research in artificial intelligence (AI) has shifted to both of the …
Pre-trained models: Past, present and future
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …
Scaling distributed machine learning with in-network aggregation
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …
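
The core idea here is to aggregate gradients inside the network instead of at end hosts. Below is a minimal single-process sketch of that aggregation primitive; the names (SwitchAggregator, contribute) and the per-slot sum/count state are illustrative assumptions, not the paper's actual interface, and a real design runs on programmable switch hardware with fixed-point arithmetic and scarce slot memory.

# Hypothetical single-process emulation of switch-style aggregation.
class SwitchAggregator:
    def __init__(self, num_workers: int, num_slots: int):
        self.num_workers = num_workers
        # Each slot keeps only a partial sum and a contribution count,
        # mirroring the limited per-slot state a switch can hold.
        self.sums = [0] * num_slots
        self.counts = [0] * num_slots

    def contribute(self, slot: int, value: int):
        """Add one worker's value; return the full sum once all arrive."""
        self.sums[slot] += value
        self.counts[slot] += 1
        if self.counts[slot] == self.num_workers:
            result = self.sums[slot]
            # Reset the slot for reuse, since switch memory is far
            # smaller than a full gradient.
            self.sums[slot], self.counts[slot] = 0, 0
            return result
        return None

# Three workers push one gradient element each; the "switch" emits
# the aggregate only after the last contribution arrives.
agg = SwitchAggregator(num_workers=3, num_slots=1)
for grad in (2, 5, 7):
    total = agg.contribute(0, grad)
print(total)  # 14
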
Decentralized training of foundation models in heterogeneous environments
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often
involving tens of thousands of GPUs running continuously for months. These models are …
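
One recurring sub-problem in this setting is placing pipeline stages on devices of unequal speed. The sketch below shows a simple greedy placement that minimizes the bottleneck stage time; the device names, relative speeds, and the greedy rule itself are assumptions for illustration, not the paper's actual allocation algorithm.

def assign_stages(stage_costs, device_speeds):
    """Greedily give the most expensive stage to the fastest device."""
    stages = sorted(range(len(stage_costs)), key=lambda s: -stage_costs[s])
    devices = sorted(device_speeds, key=device_speeds.get, reverse=True)
    placement = dict(zip(stages, devices))
    # Pipeline throughput is limited by the slowest stage.
    bottleneck = max(stage_costs[s] / device_speeds[d]
                     for s, d in placement.items())
    return placement, bottleneck

# Four transformer stages (relative cost) over heterogeneous GPUs.
placement, t = assign_stages(
    stage_costs=[4.0, 2.0, 2.0, 1.0],
    device_speeds={"A100": 4.0, "V100": 2.0, "T4": 1.0, "T4b": 1.0},
)
print(placement, round(t, 2))  # stage 2 on a T4 is the bottleneck
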
ATP: In-network aggregation for multi-tenant learning
Distributed deep neural network training (DT) systems are widely deployed in clusters where
the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …
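
ATP's distinguishing constraint is that switch aggregation resources are shared across tenants, so a job must make progress whether or not it wins a slot. A toy sketch of that best-effort pattern follows, with hypothetical names (SharedSwitch, try_reserve) standing in for the real protocol.

class SharedSwitch:
    """Scarce in-switch aggregator slots shared by all tenants."""
    def __init__(self, num_slots: int):
        self.free_slots = num_slots

    def try_reserve(self) -> bool:
        if self.free_slots > 0:
            self.free_slots -= 1
            return True
        return False  # slots held by other tenants' DT jobs

def aggregate(grads, switch):
    # Fast path: sum on the switch; otherwise fall back to host-side
    # aggregation, which preserves correctness at lower speed.
    path = "in-switch" if switch.try_reserve() else "host-fallback"
    return path, sum(grads)

switch = SharedSwitch(num_slots=1)
print(aggregate([1, 2, 3], switch))  # ('in-switch', 6)
print(aggregate([4, 5, 6], switch))  # ('host-fallback', 15)
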
RDMA over Ethernet for distributed training at Meta scale
The rapid growth in both computational density and scale in AI models in recent years
motivates the construction of an efficient and reliable dedicated network infrastructure. This …
Power-aware Deep Learning Model Serving with μ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …
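
The underlying lever is that serving latency often has slack against its SLO, which can be traded for a lower GPU clock and thus lower power. A hedged sketch of that trade-off, assuming latency scales inversely with frequency; the frequency table and latency model are made-up illustrations, not μ-Serve's measured profiles.

def pick_frequency(freqs_mhz, base_latency_ms, slo_ms):
    """Choose the lowest clock whose predicted latency meets the SLO."""
    for f in sorted(freqs_mhz):  # try the lowest-power setting first
        latency = base_latency_ms * max(freqs_mhz) / f
        if latency <= slo_ms:
            return f, latency
    return max(freqs_mhz), base_latency_ms  # SLO demands full speed

freq, lat = pick_frequency(
    freqs_mhz=[900, 1200, 1500], base_latency_ms=40.0, slo_ms=60.0)
print(freq, lat)  # 1200 50.0: meets the 60 ms SLO below peak clock
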
SRNIC: A scalable architecture for RDMA NICs
RDMA is expected to be highly scalable: to perform well in large-scale data center networks
where packet losses are inevitable (i.e., high network scalability), and to support a large …
MAST: Global scheduling of ML training across geo-distributed datacenters at hyperscale
In public clouds, users must manually select a datacenter region to upload their ML training
data and launch ML training workloads in the same region to ensure data and computation …
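
A toy sketch of the placement decision MAST automates: route each training job to a region that already holds its data and has spare accelerators, rather than having users pin a region by hand. The region names, capacities, and scoring rule are assumptions for illustration only.

def place_job(job, regions):
    """Pick a region with enough free GPUs, preferring data locality."""
    candidates = [r for r, info in regions.items()
                  if info["free_gpus"] >= job["gpus"]]
    if not candidates:
        return None  # no region can host the job right now
    # Data locality (zero transfer cost) outranks raw free capacity.
    best = max(candidates,
               key=lambda r: (r == job["data_region"],
                              regions[r]["free_gpus"]))
    regions[best]["free_gpus"] -= job["gpus"]
    return best

regions = {"us-east": {"free_gpus": 128}, "eu-west": {"free_gpus": 512}}
job = {"gpus": 100, "data_region": "us-east"}
print(place_job(job, regions))  # us-east: locality beats spare capacity
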