Rubik: Fast analytical power management for latency-critical systems

H Kasture, DB Bartolini, N Beckmann… - Proceedings of the 48th …, 2015 - dl.acm.org
Latency-critical workloads (e.g., web search), common in datacenters, require stable tail (e.g.,
95th percentile) latencies of a few milliseconds. Servers running these workloads are kept …

DeSC: Decoupled supply-compute communication management for heterogeneous architectures

TJ Ham, JL Aragón, M Martonosi - Proceedings of the 48th International …, 2015 - dl.acm.org
Today's computers employ significant heterogeneity to meet performance targets at
manageable power. In adopting increased compute specialization, however, the relative …

Lock–unlock: Is that all? A pragmatic analysis of locking in software systems

R Guerraoui, H Guiroux, R Lachaize, V Quéma… - ACM Transactions on …, 2019 - dl.acm.org
A plethora of optimized mutex lock algorithms have been designed over the past 25 years to
mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is …

Fix the code. Don't tweak the hardware: A new compiler approach to voltage-frequency scaling

A Jimborean, K Koukos, V Spiliopoulos… - Proceedings of Annual …, 2014 - dl.acm.org
Traditional compiler approaches to optimize power efficiency aim to adjust voltage and
frequency at runtime to match the code characteristics to the hardware (e.g., running memory …

Unlocking energy

B Falsafi, R Guerraoui, J Picorel… - 2016 USENIX Annual …, 2016 - usenix.org
Locks are a natural place for improving the energy efficiency of software systems. First,
concurrent systems are mainstream and when their threads synchronize, they typically do it …

Resource-aware task scheduling

M Tillenius, E Larsson, RM Badia… - ACM Transactions on …, 2015 - dl.acm.org
Dependency-aware task-based parallel programming models have proven to be successful
for developing efficient application software for multicore-based computer architectures. The …

Clairvoyance: Look-ahead compile-time scheduling

KA Tran, TE Carlson, K Koukos… - 2017 IEEE/ACM …, 2017 - ieeexplore.ieee.org
To enhance the performance of memory-bound applications, hardware designs have been
developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the …

Freeway: Maximizing MLP for slice-out-of-order execution

R Kumar, M Alipour… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level
cache access latencies. While out-of-order (OoO) cores, and techniques building on them …

Beyond the roofline: Cache-aware power and energy-efficiency modeling for multi-cores

A Ilic, F Pratas, L Sousa - IEEE Transactions on Computers, 2016 - ieeexplore.ieee.org
To foster the energy-efficiency in current and future multi-core processors, the benefits and
trade-offs of a large set of optimization solutions must be evaluated. For this purpose, it is …

Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs

K Koukos, P Ekemark, G Zacharopoulos… - Proceedings of the 25th …, 2016 - dl.acm.org
Computer architecture design faces an era of great challenges in an attempt to
simultaneously improve performance and energy efficiency. Previous hardware techniques …