Optimizing Distributed ML Communication with Fused Computation-Collective Operations
Machine learning models are distributed across multiple nodes using numerous parallelism
strategies. The resulting collective communication is often on the critical path due to a lack of …
Model Parameter Prediction Method for Accelerating Distributed DNN Training
W Liu, D Chen, M Tan, K Chen, Y Yin, WL Shang, J Li… - Computer Networks, 2024 - Elsevier
As the size of deep neural network (DNN) models and datasets increases, distributed
training becomes popular to reduce the training time. However, a severe communication …
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
To overcome the memory capacity wall of large-scale AI and big data applications, Compute
Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of …
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Large Language Models increasingly rely on distributed techniques for their training and
inference. These techniques require communication across devices which can reduce …
Echo: Simulating Distributed Training At Scale
Simulation offers unique values for both enumeration and extrapolation purposes, and is
becoming increasingly important for managing the massive machine learning (ML) clusters …
Accelerating CPU to Memory Access in SoC Architecture Design
H Wan, Y Zou, G Wen, J Hu - SoutheastCon 2024, 2024 - ieeexplore.ieee.org
The evolution and revolution of technologies in artificial intelligence, internet of things, and
5G communication mark the dawn of a new fully digitalized era. Correspondingly, the …
Cross-Stack Optimizations for Sequence-Based Models on GPUs
S Pati - 2024 - search.proquest.com
Advancements in the field of machine learning have made deep neural networks (DNNs)
ubiquitous. Their application in the domain of natural language processing (NLP) with …
Tianshu: Towards Accurate Measuring, Modeling and Simulation of Deep Neural Networks
H Huang, H Xu, N Guan - … Symposium, EMSS 2024, Held at the …, 2024 - henryhxu.github.io
To help train DNN models more efficiently, researchers have developed a series of new
computation devices and parallel training strategies. Choosing which device and strategy to …