CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

Resource scheduling techniques in cloud from a view of coordination: a holistic survey

Y Wang, J Yu, Z Yu - Frontiers of Information Technology & Electronic …, 2023 - Springer
The management of resource contention in shared clouds remains an open problem.
The evolution and deployment of new application paradigms (e.g., deep learning …

CD-MSA: cooperative and deadline-aware scheduling for efficient multi-tenancy on DNN accelerators

C Wang, Y Bai, D Sun - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
With DNNs becoming the backbone of AI cloud services and propelling the emergence of
INFerence-as-a-Service (INFaaS), DNN-specific accelerators have become the …

Global Optimizations & Lightweight Dynamic Logic for Concurrency

S Pati, S Aga, N Jayasena, MD Sinclair - arXiv preprint arXiv:2409.02227, 2024 - arxiv.org
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …

Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach

W Chen, Z Mo, H Xu, K Ye, C Xu - Proceedings of the International …, 2023 - dl.acm.org
A common strategy for improving efficiency in deep learning training is to multiplex
tasks on a single GPU. To mitigate the interference caused by multiplexing, existing …

GPUPool: A holistic approach to fine-grained gpu sharing in the cloud

XS Tan, P Golikov, N Vijaykumar… - Proceedings of the …, 2022 - dl.acm.org
As Graphics Processing Units (GPUs) have evolved into popular hardware accelerators for many
compute-hungry applications in the cloud, GPU virtualization has become a highly desirable …

GPU domain specialization via composable on-package architecture

Y Fu, E Bolotin, N Chatterjee, D Nellans… - ACM Transactions on …, 2021 - dl.acm.org
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL)
performance, they upset the balance between math throughput and memory system …

RELIEF: Relieving Memory Pressure In SoCs Via Data Movement-Aware Accelerator Scheduling

S Gupta, S Dwarkadas - 2024 IEEE International Symposium …, 2024 - ieeexplore.ieee.org
Data movement latency when using on-chip accelerators in emerging heterogeneous
architectures is a serious performance bottleneck. While hardware/software mechanisms …

PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers

G Yeo, J Kim, Y Choi, M Rhu - arXiv preprint arXiv:2411.19114, 2024 - arxiv.org
NVIDIA's Multi-Instance GPU (MIG) is a feature that enables system designers to reconfigure
one large GPU into multiple smaller GPU slices. This work characterizes this emerging GPU …

Optimizing Goodput of Real-time Serverless Functions using Dynamic Slicing with vGPUs

C Prakash, A Garg, U Bellur, P Kulkarni… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
As the popularity and relevance of the Function-as-a-Service (FaaS) model keep growing,
we believe newer avatars of the service will support computationally intensive SIMT …