OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

Serving heterogeneous machine learning models on multi-GPU servers with spatio-temporal sharing

S Choi, S Lee, Y Kim, J Park, Y Kwon… - 2022 USENIX Annual …, 2022 - usenix.org
As machine learning (ML) techniques are applied to a widening range of applications, high
throughput ML inference serving has become critical for online services. Such ML inference …

Neurosurgeon: Collaborative intelligence between the cloud and mobile edge

Y Kang, J Hauswald, C Gao, A Rovinski… - ACM SIGARCH …, 2017 - dl.acm.org
The computation for today's intelligent personal assistants such as Apple Siri, Google Now,
and Microsoft Cortana, is performed in the cloud. This cloud-only approach requires …

Analysis of large-scale multi-tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Orion: Interference-aware, fine-grained GPU sharing for ML applications

F Strati, X Ma, A Klimovic - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …

Adaptive resource efficient microservice deployment in cloud-edge continuum

K Fu, W Zhang, Q Chen, D Zeng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …

AntMan: Dynamic scaling on GPU clusters for deep learning

W Xiao, S Ren, Y Li, Y Zhang, P Hou, Z Li… - … USENIX Symposium on …, 2020 - usenix.org
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …

Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks

S Ghodrati, BH Ahn, JK Kim, S Kinzer… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on
learning patterns of data and are permeating into different industries and markets. Cloud …

GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks

RS Kannan, L Subramanian, A Raju, J Ahn… - Proceedings of the …, 2019 - dl.acm.org
The microservice architecture has dramatically reduced user effort in adopting and
maintaining servers by providing a catalog of functions as services that can be used as …

PREMA: A predictive multi-task scheduling algorithm for preemptible neural processing units

Y Choi, M Rhu - 2020 IEEE International Symposium on High …, 2020 - ieeexplore.ieee.org
To amortize cost, cloud vendors providing DNN acceleration as a service to end-users
employ consolidation and virtualization to share the underlying resources among multiple …