Dirigent: Enforcing QoS for latency-critical tasks on shared multicore systems

H Zhu, M Erez - Proceedings of the twenty-first international conference …, 2016 - dl.acm.org
Latency-critical applications suffer from both average performance degradation and reduced
completion time predictability when collocated with batch tasks. Such variation forces the …

MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency

R Ausavarungnirun, V Miller, J Landgraf… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) exploit large amounts of thread-level parallelism to
provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …

DASH: Deadline-aware high-performance memory scheduler for heterogeneous systems with hardware accelerators

H Usui, L Subramanian, KKW Chang… - ACM Transactions on …, 2016 - dl.acm.org
Modern SoCs integrate multiple CPU cores and hardware accelerators (HWAs) that share
the same main memory system, causing interference among memory requests from different …

Exploiting inter-warp heterogeneity to improve GPGPU performance

R Ausavarungnirun, S Ghose, O Kayiran… - 2015 International …, 2015 - ieeexplore.ieee.org
In a GPU, all threads within a warp execute the same instruction in lockstep. For a memory
instruction, this can lead to memory divergence: the memory requests for some threads are …

Kelp: QoS for accelerated machine learning systems

H Zhu, D Lo, L Cheng, R Govindaraju… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Development and deployment of machine learning (ML) accelerators in Warehouse Scale
Computers (WSCs) demand significant capital investments and engineering efforts …

Enhancing QoS in Multicore Systems with Heterogeneous Memory Configurations

J Kim, H Park, J Hong - Electronics, 2024 - mdpi.com
Quality of service (QoS) has evolved to ensure performance across various computing
environments, focusing on data bandwidth, response time, throughput, and stability …

Investigating fairness in disaggregated non-volatile memories

VR Kommareddy, C Hughes… - 2019 IEEE Computer …, 2019 - ieeexplore.ieee.org
Many applications have growing demands for memory, particularly in the HPC space,
making the memory system a potential bottleneck of next-generation computing systems …

Providing high and controllable performance in multicore systems through shared resource management

L Subramanian - arXiv preprint arXiv:1508.03087, 2015 - arxiv.org
Multiple applications executing concurrently on a multicore system interfere with each other
at different shared resources such as main memory and shared caches. Such inter …

A memory controller with row buffer locality awareness for hybrid memory systems

HB Yoon, J Meza, R Ausavarungnirun… - arXiv preprint arXiv …, 2018 - arxiv.org
Non-volatile memory (NVM) is a class of promising scalable memory technologies that can
potentially offer higher capacity than DRAM at the same cost point. Unfortunately, the access …

Exploiting the DRAM microarchitecture to increase memory-level parallelism

Y Kim, V Seshadri, D Lee, J Liu, O Mutlu - arXiv preprint arXiv:1805.01966, 2018 - arxiv.org
This paper summarizes the idea of Subarray-Level Parallelism (SALP) in DRAM, which was
published in ISCA 2012, and examines the work's significance and future potential. Modern …