dLoRA: Dynamically orchestrating requests and adapters for LoRA LLM serving
Low-rank adaptation (LoRA) is a popular approach to finetune pre-trained large language
models (LLMs) to specific domains. This paper introduces dLoRA, an inference serving …
An exhaustive survey on P4 programmable data plane switches: Taxonomy, applications, challenges, and future trends
Traditionally, the data plane has been designed with fixed functions to forward packets using
a small set of protocols. This closed-design paradigm has limited the capability of the …
Hermod: principled and practical scheduling for serverless functions
Serverless computing has seen rapid growth due to the ease-of-use and cost-efficiency it
provides. However, function scheduling, a critical component of serverless systems, has …
MIND: In-network memory management for disaggregated data centers
Memory disaggregation promises transparent elasticity, high resource utilization and
hardware heterogeneity in data centers by physically separating memory and compute into …
When should the network be the computer?
Researchers have repurposed programmable network devices to place small amounts of
application computation in the network, sometimes yielding orders-of-magnitude …
SketchINT: Empowering INT with TowerSketch for per-flow per-switch measurement
Network measurement is indispensable to network operations. INT solutions that can
provide fine-grained per-switch per-packet information serve as promising solutions for per …
Unlocking the power of inline Floating-Point operations on programmable switches
The advent of switches with programmable dataplanes has enabled the rapid development
of new network functionality, as well as providing a platform for acceleration of a broad …
Rambda: RDMA-driven acceleration framework for memory-intensive µs-scale datacenter applications
Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …
DINC: Toward distributed in-network computing
In-network computing provides significant performance benefits, load reduction, and power
savings. Still, an in-network service's functionality is strictly limited to a single hardware …
BIDL: A high-throughput, low-latency permissioned blockchain framework for datacenter networks
A permissioned blockchain framework typically runs an efficient Byzantine consensus
protocol and is attractive to deploy fast trading applications among a large number of …