ScaAnalyzer: A tool to identify memory scalability bottlenecks in parallel programs

X Liu, B Wu - Proceedings of the International Conference for High …, 2015 - dl.acm.org
It is difficult to scale parallel programs in a system that employs a large number of cores. To
identify scalability bottlenecks, existing tools principally pinpoint poor thread synchronization …

Parallelism-centric what-if and differential analyses

A Yoga, S Nagarakatte - Proceedings of the 40th ACM SIGPLAN …, 2019 - dl.acm.org
This paper proposes TaskProf2, a parallelism profiler and an adviser for task parallel
programs. As a parallelism profiler, TaskProf2 pinpoints regions with serialization …

Persistent unfairness arising from cache residency imbalance

D Dice, VJ Marathe, N Shavit - Proceedings of the 26th ACM symposium …, 2014 - dl.acm.org
We describe a counter-intuitive performance phenomenon relevant to concurrency research.
On a modern multicore system with a shared last-level cache, a set of concurrently running …

Implicit acceleration of critical sections via unsuccessful speculation

J Izraelevitz, A Kogan, Y Lev - 11th ACM SIGPLAN Wkshp …, 2016 - anon.cs.rochester.edu
The speculative execution of critical sections, whether done using HTM via the transactional
lock elision pattern or using a software solution such as STM or a sequence lock, has the …

Unification of static and dynamic analyses to enable vectorization

A Rane, R Krishnaiyer, CJ Newburn, J Browne… - … and Compilers for …, 2015 - Springer
Modern compilers execute sophisticated static analyses to enable optimization across a
wide spectrum of code patterns. However, there are many cases where even the most …

Performance model based on memory footprint for OpenMP memory bound applications

C Allande, J Jorba, A Sikora… - Parallel Computing: On …, 2016 - ebooks.iospress.nl
Performance of memory intensive applications executed on multi-core multi-socket
environments is closely related to the utilization of shared resources in the memory …

Performance Profilers and Debugging Tools for OpenMP Applications

NB Moradi - 2021 - search.proquest.com
OpenMP is a popular application programming interface (API) used to write shared-memory
parallel programs. It supports a wide range of parallel constructs to express different types of …

HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

JH Lee, N Nigania, H Kim, B Brett - 2015 - repository.gatech.edu
Heterogeneous computing has emerged as one of the major computing platforms in many
domains. Although there have been several proposals to aid programming for …

Parallelism-Driven Performance Analysis Techniques for Task Parallel Programs

A Yoga - 2019 - search.proquest.com
Performance analysis of parallel programs continues to be challenging for programmers.
Programmers have to account for several factors to extract the best possible performance …

Hardware, software and algorithm to precisely predict performance of SoC when a processor and other masters access single-port memory simultaneously

Y Li, E Simard, X Sun - US Patent 10,891,071, 2021 - Google Patents
A method, system, program control code, and hardware circuit are provided
for predicting performance of a system-on-chip (SOC) (100) having a processor (105) and …