P4INC-AOI: All-Optical Interconnect Empowered by In-Network Computing for DML Workloads

X Xie, B Tang, X Chen, Z Zhu - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Increasing demands for distributed machine learning (DML) have posed significant pressure
on data-center networks (DCNs). This promotes the adoption of reconfigurable all-optical …

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution

H Wang, C Ruan, J He, J Ruan, C Tang, X Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
The growth of Large Language Models (LLMs) has necessitated large-scale distributed
training. Highly optimized frameworks, however, still suffer significant losses in Model …