SimPoint 3.0: Faster and more flexible program phase analysis
This paper describes the new features available in the SimPoint 3.0 release. The release
provides two techniques for drastically reducing the run-time of SimPoint: faster searching to …
Understanding and optimizing asynchronous low-precision stochastic gradient descent
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in
machine learning and other domains. Since this is likely to continue for the foreseeable …
Cloak and dagger: from two permissions to complete control of the UI feedback loop
The effectiveness of the Android permission system fundamentally hinges on the user's
correct understanding of the capabilities of the permissions being granted. In this paper, we …
RFVP: Rollback-free value prediction with safe-to-approximate loads
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth
(bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach …
LoopPoint: Checkpoint-driven sampled simulation for multi-threaded applications
Generic multi-threaded sampled simulation has been a long-standing, challenging problem
with the potential to help change how researchers study modern, complex computing …
Memory centric characterization and analysis of SPEC CPU2017 suite
In this paper, we provide a comprehensive, memory-centric characterization of the SPEC
CPU2017 benchmark suite, using a number of mechanisms including dynamic binary …
HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs
Data dependences in sequential programs limit parallelization because extracted threads
cannot run independently. Although thread-level speculation can avoid the need for precise …
Efficient design space exploration via statistical sampling and AdaBoost learning
Design space exploration (DSE) has become a notoriously difficult problem due to the
exponentially increasing size of design space of microprocessors and time-consuming …
Characterizing and comparing prevailing simulation techniques
Due to the simulation time of the reference input set, architects often use alternative
simulation techniques. Although these alternatives reduce the simulation time, what has not …
BioPerf: A benchmark suite to evaluate high-performance computer architecture on bioinformatics applications
The exponential growth in the amount of genomic data has spurred growing interest in large
scale analysis of genetic information. Bioinformatics applications, which explore …