A mechanistic performance model for superscalar out-of-order processors
A mechanistic model for out-of-order superscalar processors is developed and then applied
to the study of microarchitecture resource scaling. The model divides execution time into …
Distributed microarchitectural protocols in the TRIPS prototype processor
Growing on-chip wire delays will cause many future microarchitectures to be distributed, in
which hardware resources within a single processor become nodes on one or more …
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
A multiprocessor's memory consistency model imposes ordering constraints among loads,
stores, atomic operations, and memory fences. Even for consistency models that relax …
DeSC: Decoupled supply-compute communication management for heterogeneous architectures
Today's computers employ significant heterogeneity to meet performance targets at
manageable power. In adopting increased compute specialization, however, the relative …
Redefining the Role of the CPU in the Era of CPU-GPU Integration
We've seen the quick adoption of GPUs as general-purpose computing engines in recent
years, fueled by high computational throughput and energy efficiency. There is heavier …
[BOOK] Multithreading architecture
M Nemirovsky, D Tullsen - 2022 - books.google.com
Multithreaded architectures now appear across the entire range of computing devices, from
the highest-performing general purpose devices to low-end embedded processors …
Kilo-instruction processors: Overcoming the memory wall
Historically, advances in integrated circuit technology have driven improvements in
processor microarchitecture and led to today's microprocessors with sophisticated pipelines …
iCFP: Tolerating all-level cache misses in in-order processors
Growing concerns about power have revived interest in in-order pipelines. In-order pipelines
sacrifice single-thread performance. Specifically, they do not allow execution to flow freely …
Non-speculative load-load reordering in TSO
In Total Store Order memory consistency (TSO), loads can be speculatively reordered to
improve performance. If a load-load reordering is seen by other cores, speculative loads …
Long term parking (LTP): Criticality-aware resource allocation in OOO processors
Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose instruction-
level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically …