Scaanalyzer: A tool to identify memory scalability bottlenecks in parallel programs
It is difficult to scale parallel programs in a system that employs a large number of cores. To
identify scalability bottlenecks, existing tools principally pinpoint poor thread synchronization …
Parallelism-centric what-if and differential analyses
This paper proposes TaskProf2, a parallelism profiler and an adviser for task parallel
programs. As a parallelism profiler, TaskProf2 pinpoints regions with serialization …
Persistent unfairness arising from cache residency imbalance
We describe a counter-intuitive performance phenomenon relevant to concurrency research.
On a modern multicore system with a shared last-level cache, a set of concurrently running …
Implicit acceleration of critical sections via unsuccessful speculation
The speculative execution of critical sections, whether done using HTM via the transactional
lock elision pattern or using a software solution such as STM or a sequence lock, has the …
Unification of static and dynamic analyses to enable vectorization
Modern compilers execute sophisticated static analyses to enable optimization across a
wide spectrum of code patterns. However, there are many cases where even the most …
Performance model based on memory footprint for OpenMP memory bound applications
Performance of memory intensive applications executed on multi-core multi-socket
environments is closely related to the utilization of shared resources in the memory …
Performance Profilers and Debugging Tools for OpenMP Applications
NB Moradi - 2021 - search.proquest.com
OpenMP is a popular application programming interface (API) used to write shared-memory
parallel programs. It supports a wide range of parallel constructs to express different types of …
HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms
Heterogeneous computing has emerged as one of the major computing platforms in many
domains. Although there have been several proposals to aid programming for …
Parallelism-Driven Performance Analysis Techniques for Task Parallel Programs
A Yoga - 2019 - search.proquest.com
Performance analysis of parallel programs continues to be challenging for programmers.
Programmers have to account for several factors to extract the best possible performance …
Hardware, software and algorithm to precisely predict performance of SoC when a processor and other masters access single-port memory simultaneously
Y Li, E Simard, X Sun - US Patent 10,891,071, 2021 - Google Patents
A method, system, program control code, and hardware circuit are provided
for predicting performance of a system-on-chip (SOC) (100) having a processor (105) and …