Tabi: An efficient multi-level inference system for large language models
Today's trend of building ever larger language models (LLMs), while pushing the
performance of natural language processing, adds significant latency to the inference stage …
Computers Can Learn from the Heuristic Designs and Master Internet Congestion Control
In this work, for the first time, we demonstrate that computers can automatically learn from
observing the heuristic efforts of the last four decades, stand on the shoulders of the existing …
Spine: An efficient DRL-based congestion control with ultra-low overhead
Previous congestion control (CC) algorithms based on deep reinforcement learning (DRL)
directly adjust flow sending rate to respond to dynamic bandwidth change, resulting in high …
Astraea: Towards Fair and Efficient Learning-based Congestion Control
Recent years have witnessed a plethora of learning-based solutions for congestion control
(CC) that demonstrate better performance over traditional TCP schemes. However, they fail …
Efficient DRL-Based Congestion Control With Ultra-Low Overhead
H Tian, X Liao, C Zeng, D Sun… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Previous congestion control (CC) algorithms based on deep reinforcement learning (DRL)
directly adjust flow sending rate to respond to dynamic bandwidth change, resulting in high …
Towards fair and efficient learning-based congestion control
Recent years have witnessed a plethora of learning-based solutions for congestion control
(CC) that demonstrate better performance over traditional TCP schemes. However, they fail …
SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing
Low Latency, Low Loss, and Scalable Throughput (L4S), as an emerging router-queue
management technique, has seen steady deployment in the industry. An L4S-enabled router …
Design and Operation of Shared Machine Learning Clusters on Campus
Amid the rapid advancements in large machine learning (ML) models, universities
worldwide are investing substantial funds and efforts into GPU clusters. However, managing …
MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference
W Wang, Y Zhang, Y **, H Tian… - 2023 19th International …, 2023 - ieeexplore.ieee.org
Distributed edge inference emerges to be a promising paradigm to speed up inference.
Previous works make physical partitions on CNNs to realize it, but there are the following …
RNN-based Congestion Control in the Linux Kernel
Y Kojima, R Kazama, H Abe… - 2024 Twelfth International …, 2024 - ieeexplore.ieee.org
This paper presents a lightweight in-kernel design of RNN-CUBIC, a loss-based machine
learning (ML)-based congestion control algorithm (CCA), which optimizes the TCP sending …