Evaluating the cost of atomic operations on modern architectures

H Schweizer, M Besta, T Hoefler - … International Conference on …, 2015 - ieeexplore.ieee.org

Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are
ubiquitous in parallel programming. Yet, performance tradeoffs between these operations …

Cited by 125; 27 versions

Memory performance of AMD EPYC Rome and Intel Cascade Lake SP server processors

M Velten, R Schöne, T Ilsche… - … of the 2022 ACM/SPEC on …, 2022 - dl.acm.org

Modern processors, in particular within the server segment, integrate more cores with each
generation. This increases their complexity in general, and that of the memory hierarchy in …

Cited by 36; 4 versions

A comparison of binarization methods for historical archive documents

J He, QDM Do, AC Downton… - … Conference on Document …, 2005 - ieeexplore.ieee.org

This paper compares several alternative binarization algorithms for historical archive
documents, by evaluating their effect on end-to-end word recognition performance in a …

Cited by 178; 6 versions

Test-driving Intel Xeon Phi

J Fang, H Sips, L Zhang, C Xu, Y Che… - Proceedings of the 5th …, 2014 - dl.acm.org

Based on Intel's Many Integrated Core (MIC) architecture, Intel Xeon Phi is one of the few
truly many-core CPUs, featuring around 60 fairly powerful cores, two levels of caches, and …

Cited by 123; 11 versions

Capability models for manycore memory systems: A case-study with Xeon Phi KNL

S Ramos, T Hoefler - 2017 IEEE International Parallel and …, 2017 - ieeexplore.ieee.org

Increasingly complex memory systems and on-chip interconnects are developed to mitigate
the data movement bottlenecks in manycore processors. One example of such a complex …

Cited by 68; 28 versions

A survey of performance modeling and simulation techniques for accelerator-based computing

U Lopez-Novoa, A Mendiburu… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org

The high performance computing landscape is shifting from collections of homogeneous
nodes towards heterogeneous systems, in which nodes consist of a combination of …

Cited by 89; 5 versions

Parallel transposition of sparse data structures

H Wang, W Liu, K Hou, W Feng - … of the 2016 international conference on …, 2016 - dl.acm.org

Many applications in computational sciences and social sciences exploit sparsity and
connectivity of acquired data. Even though many parallel sparse primitives such as sparse …

Cited by 55; 8 versions

Exploiting locality in sparse matrix-matrix multiplication on many-core architectures

K Akbudak, C Aykanat - IEEE Transactions on Parallel and …, 2017 - ieeexplore.ieee.org

Exploiting spatial and temporal localities is investigated for efficient row-by-row
parallelization of general sparse matrix-matrix multiplication (SpGEMM) operation of the …

Cited by 47; 8 versions

Ultra-scalable CPU-MIC acceleration of mesoscale atmospheric modeling on Tianhe-2

W Xue, C Yang, H Fu, X Wang, Y Xu… - IEEE Transactions …, 2014 - ieeexplore.ieee.org

In this work an ultra-scalable algorithm is designed and optimized to accelerate a 3D
compressible Euler atmospheric model on the CPU-MIC hybrid system of Tianhe-2. We first …

Cited by 64; 6 versions

Energy, memory, and runtime tradeoffs for implementing collective communication operations

T Hoefler, D Moor - Supercomputing Frontiers and Innovations, 2014 - superfri.susu.ru

Collective operations are among the most important communication operations in shared-
and distributed-memory parallel applications. In this paper, we analyze the tradeoffs …

Cited by 56; 28 versions
