Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to large-
scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Enzian: an open, general, CPU/FPGA platform for systems software research

D Cock, A Ramdas, D Schwyn, M Giardino… - Proceedings of the 27th …, 2022 - dl.acm.org
Hybrid computing platforms, comprising CPU cores and FPGA logic, are increasingly used
for accelerating data-intensive workloads in cloud deployments, and are a growing topic of …

FpgaNIC: An FPGA-based versatile 100Gb SmartNIC for GPUs

Z Wang, H Huang, J Zhang, F Wu… - 2022 USENIX Annual …, 2022 - usenix.org
Given that the increasing rate of network bandwidth is far ahead of that of the compute
capacity of host CPU, which by default processes network packets, SmartNIC has been …

Co-design hardware and algorithm for vector search

W Jiang, S Li, Y Zhu, J de Fine Licht, Z He… - Proceedings of the …, 2023 - dl.acm.org
Vector search has emerged as the foundation for large-scale information retrieval and
machine learning systems, with search engines like Google and Bing processing tens of …

RecPipe: Co-designing models and hardware to jointly optimize recommendation quality and performance

U Gupta, S Hsia, J Zhang, M Wilkening… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Deep learning recommendation systems must provide high quality, personalized content
under strict tail-latency targets and high system loads. This paper presents RecPipe, a …

RM-SSD: In-storage computing for large-scale recommendation inference

X Sun, H Wan, Q Li, CL Yang, TW Kuo… - … Symposium on High …, 2022 - ieeexplore.ieee.org
To meet the strict service level agreement requirements of recommendation systems, the
entire set of embeddings in recommendation systems needs to be loaded into the memory …

ACCL+: an FPGA-Based Collective Engine for Distributed Applications

Z He, D Korolija, Y Zhu, B Ramhorst, T Laan… - … USENIX Symposium on …, 2024 - usenix.org
FPGAs are increasingly prevalent in cloud deployments, serving as Smart-NICs or network-
attached accelerators. To facilitate the development of distributed applications with FPGAs …

MP-Rec: Hardware-software co-design to enable multi-path recommendation

S Hsia, U Gupta, B Acun, N Ardalani, P Zhong… - Proceedings of the 28th …, 2023 - dl.acm.org
Deep learning recommendation systems serve personalized content under diverse tail-
latency targets and input-query loads. In order to do so, state-of-the-art recommendation …