Modern computing: Vision and challenges

SS Gill, H Wu, P Patros, C Ottaviani, P Arora… - … and Informatics Reports, 2024 - Elsevier
Over the past six decades, the computing systems field has experienced significant
transformations, profoundly impacting society with transformational developments, such as …

Gemini: a family of highly capable multimodal models

G Team, R Anil, S Borgeaud, JB Alayrac, J Yu… - ar…

… what we believe to be the world's first large-scale
production deployments of lightwave fabrics used for both datacenter networking and …

TopoOpt: Co-optimizing network topology and parallelization strategy for distributed training jobs

W Wang, M Khazraee, Z Zhong, M Ghobadi… - … USENIX Symposium on …, 2023 - usenix.org
We propose TopoOpt, a novel direct-connect fabric for deep neural network (DNN) training
workloads. TopoOpt co-optimizes the distributed training process across three dimensions …

Optical switching in future fiber-optic networks utilizing spectral and spatial degrees of freedom

DM Marom, Y Miyamoto, DT Neilson… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Forthcoming capacity scaling requirements of optical networks and advances in optical fiber
communications beyond the omnipresent single-mode fiber operating over the conventional …

Alibaba HPN: A data center network for large language model training

K Qian, Y **, J Cao, J Gao, Y Xu, Y Guan, B Fu… - Proceedings of the …, 2024 - dl.acm.org
This paper presents HPN, Alibaba Cloud's data center network for large language model
(LLM) training. Due to the differences between LLMs and general cloud computing (e.g., in …

Resiliency at Scale: Managing Google's TPUv4 Machine Learning Supercomputer

Y Zu, A Ghaffarkhah, HV Dang, B Towles… - … USENIX Symposium on …, 2024 - usenix.org
TPUv4 (Tensor Processing Unit) is Google's 3rd generation accelerator for machine learning
training, deployed as a 4096-node supercomputer with a custom 3D torus interconnect. In …

State-of-the-art 800G/1.6T datacom interconnects and outlook for 3.2T

X Zhou, CF Lam, R Urata, H Liu - Optical Fiber Communication …, 2023 - opg.optica.org
OFC 2023 invited paper. State-of-the-Art 800G/1.6T Datacom Interconnects
and Outlook for 3.2T. Xiang Zhou, Cedric F. Lam, Ryohei Urata, and Hong Liu, Google Inc …