dLoRA: Dynamically orchestrating requests and adapters for LoRA LLM serving

B Wu, R Zhu, Z Zhang, P Sun, X Liu, X ** - 18th USENIX Symposium on …, 2024 - usenix.org
Low-rank adaptation (LoRA) is a popular approach to finetune pre-trained large language
models (LLMs) to specific domains. This paper introduces dLoRA, an inference serving …

An exhaustive survey on P4 programmable data plane switches: Taxonomy, applications, challenges, and future trends

EF Kfoury, J Crichigno, E Bou-Harb - IEEE Access, 2021 - ieeexplore.ieee.org
Traditionally, the data plane has been designed with fixed functions to forward packets using
a small set of protocols. This closed-design paradigm has limited the capability of the …

Hermod: principled and practical scheduling for serverless functions

K Kaffes, NJ Yadwadkar, C Kozyrakis - … of the 13th Symposium on Cloud …, 2022 - dl.acm.org
Serverless computing has seen rapid growth due to the ease-of-use and cost-efficiency it
provides. However, function scheduling, a critical component of serverless systems, has …

Mind: In-network memory management for disaggregated data centers

S Lee, Y Yu, Y Tang, A Khandelwal, L Zhong… - Proceedings of the …, 2021 - dl.acm.org
Memory disaggregation promises transparent elasticity, high resource utilization and
hardware heterogeneity in data centers by physically separating memory and compute into …

When should the network be the computer?

DRK Ports, J Nelson - Proceedings of the Workshop on Hot Topics in …, 2019 - dl.acm.org
Researchers have repurposed programmable network devices to place small amounts of
application computation in the network, sometimes yielding orders-of-magnitude …

SketchINT: Empowering INT with TowerSketch for per-flow per-switch measurement

K Yang, S Long, Q Shi, Y Li, Z Liu, Y Wu… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Network measurement is indispensable to network operations. INT solutions that can
provide fine-grained per-switch per-packet information serve as promising solutions for per …

Unlocking the power of inline Floating-Point operations on programmable switches

Y Yuan, O Alama, J Fei, J Nelson, DRK Ports… - … USENIX Symposium on …, 2022 - usenix.org
The advent of switches with programmable dataplanes has enabled the rapid development
of new network functionality, as well as providing a platform for acceleration of a broad …

Rambda: RDMA-driven acceleration framework for memory-intensive µs-scale datacenter applications

Y Yuan, J Huang, Y Sun, T Wang… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …

DINC: Toward distributed in-network computing

C Zheng, H Tang, M Zang, X Hong, A Feng… - Proceedings of the …, 2023 - dl.acm.org
In-network computing provides significant performance benefits, load reduction, and power
savings. Still, an in-network service's functionality is strictly limited to a single hardware …

Bidl: A high-throughput, low-latency permissioned blockchain framework for datacenter networks

J Qi, X Chen, Y Jiang, J Jiang, T Shen, S Zhao… - Proceedings of the …, 2021 - dl.acm.org
A permissioned blockchain framework typically runs an efficient Byzantine consensus
protocol and is attractive to deploy fast trading applications among a large number of …