On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
PyTorch FSDP: experiences on scaling Fully Sharded Data Parallel
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …
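Since this entry concerns FSDP, a minimal sketch of wrapping a model with torch.distributed.fsdp may help; the toy model, sizes, and launch assumptions (torchrun having set RANK/WORLD_SIZE) are illustrative, not taken from the paper.

```python
# Minimal sketch of sharded data-parallel training with PyTorch FSDP.
# Assumes launch via torchrun (RANK/WORLD_SIZE set); the toy model is illustrative.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # gathering full parameters only around each unit's forward/backward.
    model = FSDP(model)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).square().mean()
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```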
Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems
Deep learning for recommendation data is one of the most pervasive and challenging AI workloads in recent times. State-of-the-art recommendation models are one of the largest …
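A rough sketch of the block-hashing idea the title refers to: each row's d-dimensional embedding is read as a contiguous block of one shared array at a hashed offset, so memory is fixed regardless of vocabulary size. The array size, dimension, and hash constants below are assumptions for illustration.

```python
# Sketch of a ROBE-style compressed embedding lookup: every row's vector
# is read as a contiguous block from one shared circular array, so total
# memory is ROBE_SIZE floats no matter how large the vocabulary is.
# Sizes and hash constants here are illustrative assumptions.
import numpy as np

ROBE_SIZE = 1 << 20                       # shared array replacing the full table
DIM = 64                                  # embedding dimension
A, B, P = 1000003, 1009, (1 << 61) - 1    # hash parameters (assumed)

memory = (np.random.randn(ROBE_SIZE) * 0.01).astype(np.float32)

def robe_lookup(row_id: int) -> np.ndarray:
    """Return the DIM-dimensional embedding for row_id."""
    start = (A * row_id + B) % P % ROBE_SIZE    # random block offset
    idx = (start + np.arange(DIM)) % ROBE_SIZE  # circular block read
    return memory[idx]

# Two ids map to (usually) different blocks of the same shared array.
print(robe_lookup(42)[:4], robe_lookup(12345)[:4])
```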
Training personalized recommendation systems from (GPU) scratch: Look forward not backwards
Personalized recommendation models (RecSys) are one of the most popular machine learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its …
A survey on auto-parallelism of large-scale deep learning training
P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has gained great success in recent years, leading to state-of-the-art performance in the research community and in industrial fields like computer vision and natural …
Enabling compute-communication overlap in distributed deep learning training platforms
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes per second (GB/s) of bandwidth …
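The overlap in the title can be illustrated with PyTorch's asynchronous collectives: launch an all-reduce with async_op=True, keep computing on independent data, and wait on the returned handle before consuming the result. Tensor sizes and process-group setup below are illustrative assumptions, not the paper's platform.

```python
# Sketch of compute-communication overlap: kick off an asynchronous
# all-reduce on one gradient bucket, do independent compute, then wait.
# Assumes launch via torchrun; sizes are illustrative.
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

grad_bucket = torch.randn(1 << 20, device="cuda")
next_layer_in = torch.randn(4096, 4096, device="cuda")

# Launch the collective without blocking the compute stream.
work = dist.all_reduce(grad_bucket, op=dist.ReduceOp.SUM, async_op=True)

# Independent computation (e.g., the next layer's backward) proceeds
# while NCCL moves the bucket over the interconnect.
out = next_layer_in @ next_layer_in

work.wait()                      # synchronize before using the reduced bucket
grad_bucket /= dist.get_world_size()
dist.destroy_process_group()
```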
GraphPipe: Improving performance and scalability of DNN training with graph pipeline parallelism
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to
train on a single device. Pipeline parallelism is commonly used in existing DNN systems to …
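As a toy illustration of the micro-batch pipelining this line of work builds on (a plain GPipe-style schedule, not GraphPipe's graph pipeline parallelism itself), the following prints which stage works on which micro-batch at each tick, making the fill/drain bubbles visible; stage and micro-batch counts are arbitrary.

```python
# Toy simulation of a GPipe-style pipeline-parallel forward schedule:
# with S stages and M micro-batches, at clock tick t stage s works on
# micro-batch t - s. Counts are illustrative assumptions.
S, M = 4, 6   # pipeline stages, micro-batches

for t in range(S + M - 1):                 # total ticks until the pipe drains
    busy = []
    for s in range(S):
        mb = t - s
        if 0 <= mb < M:
            busy.append(f"stage{s}:mb{mb}")
    fill = "bubble" if len(busy) < S else "full"
    print(f"t={t:2d} [{fill:6s}] " + "  ".join(busy))
```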
Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters
Recent years have witnessed an exponential growth of model scale in deep learning-based
recommender systems---from Google's 2016 model with 1 billion parameters to the latest …
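To make the title's scale concrete, a quick check of weight storage alone, assuming 4 bytes per fp32 parameter (no optimizer state or activations):

```python
# Scale check for the title's numbers: bytes needed just to store weights.
# Assumes fp32 (4 bytes/parameter); optimizer state would multiply this.
for params in (1e9, 1e12, 1e14):   # 2016-era model -> trillion -> 100 trillion
    tb = params * 4 / 1e12
    print(f"{params:.0e} params -> {tb:8.1f} TB of fp32 weights")
# 100 trillion fp32 parameters alone occupy 400 TB, far beyond any single
# node, which is why such recommenders shard embeddings across a cluster.
```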
The trade-offs of model size in large recommendation models: 100 GB to 10 MB Criteo-TB DLRM model
A Desai, A Shrivastava - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Embedding tables dominate industrial-scale recommendation model sizes, using up to
terabytes of memory. A popular and the largest publicly available machine learning MLPerf …
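The claim that embedding tables reach terabytes follows from simple arithmetic, rows x dimension x bytes per weight; the vocabulary sizes below are hypothetical, not the Criteo-TB figures.

```python
# Back-of-the-envelope embedding-table memory: rows x dim x bytes/weight.
# Vocabulary sizes are illustrative assumptions, not the Criteo-TB figures.
DIM, BYTES = 128, 4                      # fp32 embeddings

vocab_sizes = [10**6, 10**7, 10**8]      # hypothetical sparse-feature cardinalities
for rows in vocab_sizes:
    gib = rows * DIM * BYTES / 2**30
    print(f"{rows:>12,d} rows x {DIM} dims -> {gib:8.1f} GiB")
# A single 10^8-row table already needs ~47.7 GiB, so a handful of such
# features pushes a model into the hundreds-of-GB regime.
```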
EmbedX: A Versatile, Efficient and Scalable Platform to Embed Both Graphs and High-Dimensional Sparse Data
In modern online services, it is of growing importance to process web-scale graph data and
high-dimensional sparse data together into embeddings for downstream tasks, such as …