[BOOK][B] Memory systems: cache, DRAM, disk
B Jacob, D Wang, S Ng - 2010 - books.google.com
Is your memory hierarchy stopping your microprocessor from performing at the high level it
should be? Memory Systems: Cache, DRAM, Disk shows you how to resolve this problem …
Predicting whole-program locality through reuse distance analysis
C Ding, Y Zhong - Proceedings of the ACM SIGPLAN 2003 conference …, 2003 - dl.acm.org
Profiling can accurately analyze program behavior for select data inputs. We show that
profiling can also predict program locality for inputs other than profiled ones. Here locality is …
Locality phase prediction
As computer memory hierarchy becomes adaptive, its performance increasingly depends on
forecasting the dynamic program locality. This paper presents a method that predicts the …
Improving cache performance in dynamic applications through data and computation reorganization at run time
C Ding, K Kennedy - ACM SIGPLAN Notices, 1999 - dl.acm.org
With the rapid improvement of processor speed, performance of the memory hierarchy has
become the principal bottleneck for most applications. A number of compiler transformations …
Program locality analysis using reuse distance
On modern computer systems, the memory performance of an application depends on its
locality. For a single execution, locality-correlated measures like average miss rate or …
[BOOK][B] The compiler design handbook: optimizations and machine code generation
YN Srikant, P Shankar - 2002 - taylorfrancis.com
The widespread use of object-oriented languages and Internet security concerns are just the
beginning. Add embedded systems, multiple memory banks, highly pipelined units …
Loci: A rule-based framework for parallel multi-disciplinary simulation synthesis
EA Luke, T George - Journal of Functional Programming, 2005 - cambridge.org
We present a rule-based framework for the development of scalable parallel high
performance simulations for a broad class of scientific applications (with particular emphasis …
Array regrouping and structure splitting using whole-program reference affinity
While the memory of most machines is organized as a hierarchy, program data are laid out
in a uniform address space. This paper defines a model of reference affinity, which …
Compile-time composition of run-time data and iteration reorderings
MM Strout, L Carter, J Ferrante - Proceedings of the ACM SIGPLAN 2003 …, 2003 - dl.acm.org
Many important applications, such as those using sparse data structures, have memory
reference patterns that are unknown at compile-time. Prior work has developed run-time …
swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …