Performance interference of virtual machines: A survey

W Lin, C Xiong, W Wu, F Shi, K Li, M Xu - ACM Computing Surveys, 2023 - dl.acm.org
The rapid development of cloud computing with virtualization technology has benefited both
academia and industry. For any cloud data center at scale, one of the primary challenges is …

OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

Llumnix: Dynamic scheduling for large language model serving

B Sun, Z Huang, H Zhao, W Xiao, X Zhang… - … USENIX Symposium on …, 2024 - usenix.org
Inference serving for large language models (LLMs) is the key to unleashing their potential
in people's daily lives. However, efficient LLM serving remains challenging today because …

Serving heterogeneous machine learning models on Multi-GPU servers with Spatio-Temporal sharing

S Choi, S Lee, Y Kim, J Park, Y Kwon… - 2022 USENIX Annual …, 2022 - usenix.org
As machine learning (ML) techniques are applied to a widening range of applications, high
throughput ML inference serving has become critical for online services. Such ML inference …

Gandiva: Introspective cluster scheduling for deep learning

W Xiao, R Bhardwaj, R Ramjee, M Sivathanu… - … USENIX Symposium on …, 2018 - usenix.org
We introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific
knowledge to improve latency and efficiency of training deep learning models in a GPU …

Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Adaptive resource efficient microservice deployment in cloud-edge continuum

K Fu, W Zhang, Q Chen, D Zeng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …

FaaSFlow: Enable efficient workflow execution for function-as-a-service

Z Li, Y Liu, L Guo, Q Chen, J Cheng, W Zheng… - Proceedings of the 27th …, 2022 - dl.acm.org
Serverless computing (Function-as-a-Service) provides fine-grain resource sharing by
running functions (or Lambdas) in containers. Data-dependent functions are required to be …

AntMan: Dynamic scaling on GPU clusters for deep learning

W Xiao, S Ren, Y Li, Y Zhang, P Hou, Z Li… - … USENIX Symposium on …, 2020 - usenix.org
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …

Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks

S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …