Optimizing Distributed ML Communication with Fused Computation-Collective Operations

K Punniyamurthy, K Hamidouche… - … Conference for High …, 2024 - ieeexplore.ieee.org
Machine learning models are distributed across multiple nodes using numerous parallelism
strategies. The resulting collective communication is often on the critical path due to a lack of …

Model Parameter Prediction Method for Accelerating Distributed DNN Training

W Liu, D Chen, M Tan, K Chen, Y Yin, WL Shang, J Li… - Computer Networks, 2024 - Elsevier
As the size of deep neural network (DNN) models and datasets increases, distributed
training becomes popular to reduce the training time. However, a severe communication …

Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

H Ham, J Hong, G Park, Y Shin, O Woo, W Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
To overcome the memory capacity wall of large-scale AI and big data applications, Compute
Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of …

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives

S Pati, S Aga, M Islam, N Jayasena… - Proceedings of the 29th …, 2024 - dl.acm.org
Large Language Models increasingly rely on distributed techniques for their training and
inference. These techniques require communication across devices which can reduce …

Echo: Simulating Distributed Training At Scale

Y Feng, Y Chen, K Chen, J Li, T Wu, P Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Simulation offers unique values for both enumeration and extrapolation purposes, and is
becoming increasingly important for managing the massive machine learning (ML) clusters …

Accelerating CPU to Memory Access in SoC Architecture Design

H Wan, Y Zou, G Wen, J Hu - SoutheastCon 2024, 2024 - ieeexplore.ieee.org
The evolution and revolution of technologies in artificial intelligence, internet of things, and
5G communication mark the dawn of a new fully digitalized era. Correspondingly, the …

Cross-Stack Optimizations for Sequence-Based Models on GPUs

S Pati - 2024 - search.proquest.com
Advancements in the field of machine learning have made deep neural networks (DNNs)
ubiquitous. Their application in the domain of natural language processing (NLP) with …

Tianshu: Towards Accurate Measuring, Modeling and Simulation of Deep Neural Networks

H Huang, H Xu, N Guan - … Symposium, EMSS 2024, Held at the …, 2024 - henryhxu.github.io
To help train DNN models more efficiently, researchers have developed a series of new
computation devices and parallel training strategies. Choosing which device and strategy to …