Rubik: Fast analytical power management for latency-critical systems
Latency-critical workloads (e.g., web search), common in datacenters, require stable tail (e.g.,
95th percentile) latencies of a few milliseconds. Servers running these workloads are kept …
DeSC: Decoupled supply-compute communication management for heterogeneous architectures
Today's computers employ significant heterogeneity to meet performance targets at
manageable power. In adopting increased compute specialization, however, the relative …
Lock–unlock: Is that all? A pragmatic analysis of locking in software systems
A plethora of optimized mutex lock algorithms have been designed over the past 25 years to
mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is …
Fix the code. Don't tweak the hardware: A new compiler approach to voltage-frequency scaling
Traditional compiler approaches to optimize power efficiency aim to adjust voltage and
frequency at runtime to match the code characteristics to the hardware (e.g., running memory …
Unlocking energy
Locks are a natural place for improving the energy efficiency of software systems. First,
concurrent systems are mainstream and when their threads synchronize, they typically do it …
Resource-aware task scheduling
Dependency-aware task-based parallel programming models have proven to be successful
for developing efficient application software for multicore-based computer architectures. The …
Clairvoyance: Look-ahead compile-time scheduling
To enhance the performance of memory-bound applications, hardware designs have been
developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the …
Freeway: Maximizing MLP for slice-out-of-order execution
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level
cache access latencies. While out-of-order (OoO) cores, and techniques building on them …
Beyond the roofline: Cache-aware power and energy-efficiency modeling for multi-cores
To foster the energy-efficiency in current and future multi-core processors, the benefits and
trade-offs of a large set of optimization solutions must be evaluated. For this purpose, it is …
Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
Computer architecture design faces an era of great challenges in an attempt to
simultaneously improve performance and energy efficiency. Previous hardware techniques …