- Academic Search

Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Cited by 23 · All 5 versions

[PDF] arxiv.org

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org

Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Cited by 111 · All 6 versions

[PDF] arxiv.org

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Cited by 35 · All 3 versions

[PDF] ethz.ch

Enzian: an open, general, CPU/FPGA platform for systems software research

D Cock, A Ramdas, D Schwyn, M Giardino… - Proceedings of the 27th …, 2022 - dl.acm.org

Hybrid computing platforms, comprising CPU cores and FPGA logic, are increasingly used
for accelerating data-intensive workloads in cloud deployments, and are a growing topic of …

Cited by 73 · All 9 versions

[PDF] usenix.org

FpgaNIC: An FPGA-based versatile 100Gb SmartNIC for GPUs

Z Wang, H Huang, J Zhang, F Wu… - 2022 USENIX Annual …, 2022 - usenix.org

Given that the increasing rate of network bandwidth is far ahead of that of the compute
capacity of host CPU, which by default processes network packets, SmartNIC has been …

Cited by 40 · All 7 versions

[PDF] arxiv.org

Co-design hardware and algorithm for vector search

W Jiang, S Li, Y Zhu, J de Fine Licht, Z He… - Proceedings of the …, 2023 - dl.acm.org

Vector search has emerged as the foundation for large-scale information retrieval and
machine learning systems, with search engines like Google and Bing processing tens of …

Cited by 18 · All 25 versions

[PDF] acm.org

RecPipe: Co-designing models and hardware to jointly optimize recommendation quality and performance

U Gupta, S Hsia, J Zhang, M Wilkening… - MICRO-54: 54th Annual …, 2021 - dl.acm.org

Deep learning recommendation systems must provide high quality, personalized content
under strict tail-latency targets and high system loads. This paper presents RecPipe, a …

Cited by 44 · All 7 versions

[PDF] cityu.edu.hk

RM-SSD: In-storage computing for large-scale recommendation inference

X Sun, H Wan, Q Li, CL Yang, TW Kuo… - … Symposium on High …, 2022 - ieeexplore.ieee.org

To meet the strict service level agreement requirements of recommendation systems, the
entire set of embeddings in recommendation systems needs to be loaded into the memory …

Cited by 32 · All 4 versions

[PDF] usenix.org

ACCL+: an FPGA-Based Collective Engine for Distributed Applications

Z He, D Korolija, Y Zhu, B Ramhorst, T Laan… - … USENIX Symposium on …, 2024 - usenix.org

FPGAs are increasingly prevalent in cloud deployments, serving as Smart-NICs or
network-attached accelerators. To facilitate the development of distributed applications with FPGAs …

Cited by 6 · All 7 versions

[PDF] arxiv.org

MP-Rec: Hardware-software co-design to enable multi-path recommendation

S Hsia, U Gupta, B Acun, N Ardalani, P Zhong… - Proceedings of the 28th …, 2023 - dl.acm.org

Deep learning recommendation systems serve personalized content under diverse
tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation …

Cited by 16 · All 4 versions

FleetRec: Large-scale recommendation inference on hybrid GPU-FPGA clusters

Deep learning workload scheduling in GPU datacenters: A survey
