SimPoint 3.0: Faster and more flexible program phase analysis

G Hamerly, E Perelman, J Lau… - Journal of Instruction …, 2005 - cseweb.ucsd.edu
This paper describes the new features available in the SimPoint 3.0 release. The release
provides two techniques for drastically reducing the run-time of SimPoint: faster searching to …

Understanding and optimizing asynchronous low-precision stochastic gradient descent

C De Sa, M Feldman, C Ré, K Olukotun - Proceedings of the 44th annual …, 2017 - dl.acm.org
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in
machine learning and other domains. Since this is likely to continue for the foreseeable …

Cloak and dagger: from two permissions to complete control of the UI feedback loop

Y Fratantonio, C Qian, SP Chung… - 2017 IEEE Symposium …, 2017 - ieeexplore.ieee.org
The effectiveness of the Android permission system fundamentally hinges on the user's
correct understanding of the capabilities of the permissions being granted. In this paper, we …

RFVP: Rollback-free value prediction with safe-to-approximate loads

A Yazdanbakhsh, G Pekhimenko, B Thwaites… - ACM Transactions on …, 2016 - dl.acm.org
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth
(bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach …

LoopPoint: Checkpoint-driven sampled simulation for multi-threaded applications

A Sabu, H Patil, W Heirman… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Generic multi-threaded sampled simulation has been a long-standing, challenging problem
with the potential to help change how researchers study modern, complex computing …

Memory centric characterization and analysis of SPEC CPU2017 suite

S Singh, M Awasthi - Proceedings of the 2019 ACM/SPEC International …, 2019 - dl.acm.org
In this paper, we provide a comprehensive, memory-centric characterization of the SPEC
CPU2017 benchmark suite, using a number of mechanisms including dynamic binary …

HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs

S Campanoni, K Brownell, S Kanev, TM Jones… - ACM SIGARCH …, 2014 - dl.acm.org
Data dependences in sequential programs limit parallelization because extracted threads
cannot run independently. Although thread-level speculation can avoid the need for precise …

Efficient design space exploration via statistical sampling and AdaBoost learning

D Li, S Yao, YH Liu, S Wang, XH Sun - Proceedings of the 53rd Annual …, 2016 - dl.acm.org
Design space exploration (DSE) has become a notoriously difficult problem due to the
exponentially increasing size of design space of microprocessors and time-consuming …

Characterizing and comparing prevailing simulation techniques

JJ Yi, SV Kodakara, R Sendag, DJ Lilja… - … Symposium on High …, 2005 - ieeexplore.ieee.org
Due to the simulation time of the reference input set, architects often use alternative
simulation techniques. Although these alternatives reduce the simulation time, what has not …

BioPerf: A benchmark suite to evaluate high-performance computer architecture on bioinformatics applications

DA Bader, Y Li, T Li, V Sachdeva - IEEE International. 2005 …, 2005 - ieeexplore.ieee.org
The exponential growth in the amount of genomic data has spurred growing interest in large
scale analysis of genetic information. Bioinformatics applications, which explore …