An evaluation of high-level mechanistic core models

TE Carlson, W Heirman, S Eyerman, I Hur… - ACM Transactions on …, 2014 - dl.acm.org
Large core counts and complex cache hierarchies are increasing the burden placed on
commonly used simulation and modeling techniques. Although analytical models provide …

Personalized route recommendation using big trajectory data

J Dai, B Yang, C Guo, Z Ding - 2015 IEEE 31st international …, 2015 - ieeexplore.ieee.org
When planning routes, drivers usually consider a multitude of different travel costs, e.g.,
distances, travel times, and fuel consumption. Different drivers may choose different routes …

Cache decay: Exploiting generational behavior to reduce cache leakage power

S Kaxiras, Z Hu, M Martonosi - Proceedings of the 28th annual …, 2001 - dl.acm.org
Power dissipation is increasingly important in CPUs ranging from those intended for mobile
use, all the way up to high-performance processors for high-end servers. While the bulk of …

Using Hardware Performance Monitors to Understand the Behavior of Java Applications

PF Sweeney, M Hauswirth, B Cahoon… - … Machine Research and …, 2004 - usenix.org
Modern Java programs, such as middleware and application servers, include many complex
software components. Improving the performance of these Java applications requires a …

Continuous profiling: Where have all the cycles gone?

JM Anderson, LM Berc, J Dean, S Ghemawat… - ACM Transactions on …, 1997 - dl.acm.org
This article describes the Digital Continuous Profiling Infrastructure, a sampling-based
profiling system designed to run continuously on production systems. The system supports …

A new memory monitoring scheme for memory-aware scheduling and partitioning

GE Suh, S Devadas, L Rudolph - … International Symposium on …, 2002 - ieeexplore.ieee.org
We propose a low overhead, online memory monitoring scheme utilizing a set of novel
hardware counters. The counters indicate the marginal gain in cache hits as the size of the …

Optimizing main-memory join on modern hardware

S Manegold, P Boncz, M Kersten - IEEE transactions on …, 2002 - ieeexplore.ieee.org
In the past decade, the exponential growth in commodity CPU's speed has far outpaced
advances in memory latency. A second trend is that CPU performance advances are not …

A performance counter architecture for computing accurate CPI components

S Eyerman, L Eeckhout, T Karkhanis, JE Smith - ACM SIGPLAN Notices, 2006 - dl.acm.org
A common way of representing processor performance is to use Cycles per Instruction
(CPI) 'stacks' which break performance into a baseline CPI plus a number of individual miss …

ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis

R Bell, AD Malony, S Shende - Euro-Par 2003 Parallel Processing: 9th …, 2003 - Springer
This paper presents the design, implementation, and application of ParaProf, a portable,
extensible, and scalable tool for parallel performance profile analysis. ParaProf attempts to …

Analytical cache models with applications to cache partitioning

GE Suh, S Devadas, L Rudolph - ACM International Conference on …, 2001 - dl.acm.org
An accurate, tractable, analytic cache model for time-shared systems is presented, which
estimates the overall cache miss-rate of a multiprocessing system with any cache size and …