Tabi: An efficient multi-level inference system for large language models

Y Wang, K Chen, H Tan, K Guo - Proceedings of the Eighteenth …, 2023 - dl.acm.org
Today's trend of building ever larger language models (LLMs), while pushing the
performance of natural language processing, adds significant latency to the inference stage …

Computers Can Learn from the Heuristic Designs and Master Internet Congestion Control

CY Yen, S Abbasloo, HJ Chao - … of the ACM SIGCOMM 2023 Conference, 2023 - dl.acm.org
In this work, for the first time, we demonstrate that computers can automatically learn from
observing the heuristic efforts of the last four decades, stand on the shoulders of the existing …

Spine: An efficient DRL-based congestion control with ultra-low overhead

H Tian, X Liao, C Zeng, J Zhang, K Chen - Proceedings of the 18th …, 2022 - dl.acm.org
Previous congestion control (CC) algorithms based on deep reinforcement learning (DRL)
directly adjust flow sending rate to respond to dynamic bandwidth change, resulting in high …

Astraea: Towards Fair and Efficient Learning-based Congestion Control

X Liao, H Tian, C Zeng, X Wan, K Chen - Proceedings of the Nineteenth …, 2024 - dl.acm.org
Recent years have witnessed a plethora of learning-based solutions for congestion control
(CC) that demonstrate better performance over traditional TCP schemes. However, they fail …

Efficient DRL-Based Congestion Control With Ultra-Low Overhead

H Tian, X Liao, C Zeng, D Sun… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Previous congestion control (CC) algorithms based on deep reinforcement learning (DRL)
directly adjust flow sending rate to respond to dynamic bandwidth change, resulting in high …

Towards fair and efficient learning-based congestion control

X Liao, H Tian, C Zeng, X Wan, K Chen - arXiv preprint arXiv:2403.01798, 2024 - arxiv.org
Recent years have witnessed a plethora of learning-based solutions for congestion control
(CC) that demonstrate better performance over traditional TCP schemes. However, they fail …

SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing

S Ray, X Jiang, J Luo, N Feamster, J Jiang - arXiv preprint arXiv …, 2024 - arxiv.org
Low Latency, Low Loss, and Scalable Throughput (L4S), as an emerging router-queue
management technique, has seen steady deployment in the industry. An L4S-enabled router …

Design and Operation of Shared Machine Learning Clusters on Campus

K Xu, D Sun, H Wang, Z Ren, X Wan… - Proceedings of the …, 2025 - xcwanandy.github.io
Amid the rapid advancements in large machine learning (ML) models, universities
worldwide are investing substantial funds and efforts into GPU clusters. However, managing …

MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

W Wang, Y Zhang, Y Jin, H Tian… - 2023 19th International …, 2023 - ieeexplore.ieee.org
Distributed edge inference emerges to be a promising paradigm to speed up inference.
Previous works make physical partitions on CNNs to realize it, but there are the following …

RNN-based Congestion Control in the Linux Kernel

Y Kojima, R Kazama, H Abe… - 2024 Twelfth International …, 2024 - ieeexplore.ieee.org
This paper presents a lightweight in-kernel design of RNN-CUBIC, a loss-based machine
learning (ML)-based congestion control algorithm (CCA), which optimizes the TCP sending …