Execution-based prediction using speculative slices

C Zilles, G Sohi - Proceedings of the 28th annual international …, 2001 - dl.acm.org
A relatively small set of static instructions has significant leverage on program execution
performance. These problem instructions contribute a disproportionate number of cache …

Opening Pandora's box: A systematic study of new ways microarchitecture can leak private data

JRS Vicarte, P Shome, N Nayak… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Microarchitectural attacks have plunged Computer Architecture into a security crisis. Yet, as
the slowing of Moore's law justifies the use of ever more exotic microarchitecture, it is likely …

Speculative data-driven multithreading

A Roth, GS Sohi - Proceedings HPCA Seventh International …, 2001 - ieeexplore.ieee.org
Mispredicted branches and loads that miss in the cache cause the majority of retirement
stalls experienced by sequential processors; we call these critical instructions. Despite their …

BDD based decomposition of logic functions with application to FPGA synthesis

YT Lai, M Pedram, SBK Vrudhula - Proceedings of the 30th international …, 1993 - dl.acm.org
This paper presents a theory for (disjunctive and nondisjunctive) function decomposition
using the BDD representation of Boolean functions. Incompletely specified as well as multi …

R2D2: Removing redundancy utilizing linearity of address generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A generally used GPU programming methodology is that adjacent threads access data in
neighbor or specific-stride memory addresses and perform computations with the fetched …

Store vulnerability window (SVW): Re-execution filtering for enhanced load optimization

A Roth - … Symposium on Computer Architecture (ISCA'05), 2005 - ieeexplore.ieee.org
The load-store unit is a performance critical component of a dynamically-scheduled
processor. It is also a complex and non-scalable component. Several recently proposed …

WiDGET: Wisconsin decoupled grid execution tiles

Y Watanabe, JD Davis, DA Wood - ACM SIGARCH Computer …, 2010 - dl.acm.org
The recent paradigm shift to multi-core systems results in high system throughput within a
specified power budget. However, future systems still require good single thread …

RENO: a rename-based instruction optimizer

V Petric, T Sha, A Roth - 32nd International Symposium on …, 2005 - ieeexplore.ieee.org
RENO is a modified MIPS R10000 register renamer that uses map-table "short-circuiting" to
implement dynamic versions of several well-known static optimizations: move elimination …

WIR: Warp instruction reuse to minimize repeated computations in GPUs

K Kim, WW Ro - 2018 IEEE International Symposium on High …, 2018 - ieeexplore.ieee.org
Warp instructions with an identical arithmetic operation on same input values produce the
identical computation results. This paper proposes warp instruction reuse to allow such …

Control flow optimization via dynamic reconvergence prediction

JD Collins, DM Tullsen, H Wang - … International Symposium on …, 2004 - ieeexplore.ieee.org
This paper presents a novel microarchitecture technique for accurately predicting control
flow reconvergence dynamically. A reconvergence point is the earliest dynamic instruction in …