Evaluating the cost of atomic operations on modern architectures

H Schweizer, M Besta, T Hoefler - … International Conference on …, 2015 - ieeexplore.ieee.org
Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are
ubiquitous in parallel programming. Yet, performance tradeoffs between these operations …

Memory performance of AMD EPYC Rome and Intel Cascade Lake SP server processors

M Velten, R Schöne, T Ilsche… - … of the 2022 ACM/SPEC on …, 2022 - dl.acm.org
Modern processors, in particular within the server segment, integrate more cores with each
generation. This increases their complexity in general, and that of the memory hierarchy in …

A comparison of binarization methods for historical archive documents

J He, QDM Do, AC Downton… - … Conference on Document …, 2005 - ieeexplore.ieee.org
This paper compares several alternative binarization algorithms for historical archive
documents, by evaluating their effect on end-to-end word recognition performance in a …

Test-driving Intel Xeon Phi

J Fang, H Sips, L Zhang, C Xu, Y Che… - Proceedings of the 5th …, 2014 - dl.acm.org
Based on Intel's Many Integrated Core (MIC) architecture, Intel Xeon Phi is one of the few
truly many-core CPUs, featuring around 60 fairly powerful cores, two levels of caches, and …

Capability models for manycore memory systems: A case-study with Xeon Phi KNL

S Ramos, T Hoefler - 2017 IEEE International Parallel and …, 2017 - ieeexplore.ieee.org
Increasingly complex memory systems and on-chip interconnects are developed to mitigate
the data movement bottlenecks in manycore processors. One example of such a complex …

A survey of performance modeling and simulation techniques for accelerator-based computing

U Lopez-Novoa, A Mendiburu… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
The high performance computing landscape is shifting from collections of homogeneous
nodes towards heterogeneous systems, in which nodes consist of a combination of …

Parallel transposition of sparse data structures

H Wang, W Liu, K Hou, W Feng - … of the 2016 international conference on …, 2016 - dl.acm.org
Many applications in computational sciences and social sciences exploit sparsity and
connectivity of acquired data. Even though many parallel sparse primitives such as sparse …

Exploiting locality in sparse matrix-matrix multiplication on many-core architectures

K Akbudak, C Aykanat - IEEE Transactions on Parallel and …, 2017 - ieeexplore.ieee.org
Exploiting spatial and temporal localities is investigated for efficient row-by-row
parallelization of general sparse matrix-matrix multiplication (SpGEMM) operation of the …

Ultra-scalable CPU-MIC acceleration of mesoscale atmospheric modeling on Tianhe-2

W Xue, C Yang, H Fu, X Wang, Y Xu… - IEEE Transactions …, 2014 - ieeexplore.ieee.org
In this work an ultra-scalable algorithm is designed and optimized to accelerate a 3D
compressible Euler atmospheric model on the CPU-MIC hybrid system of Tianhe-2. We first …

Energy, memory, and runtime tradeoffs for implementing collective communication operations

T Hoefler, D Moor - Supercomputing frontiers and innovations, 2014 - superfri.susu.ru
Collective operations are among the most important communication operations in shared-
and distributed-memory parallel applications. In this paper, we analyze the tradeoffs …