- Academic Search

Y Zheng, A Kamil, MB Driscoll, H Shan… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org

Partitioned Global Address Space (PGAS) languages are convenient for expressing
algorithms with large, random-access data, and they have proven to provide high …

Cited by 258

Sequoia: Programming the memory hierarchy

K Fatahalian, DR Horn, TJ Knight, L Leem… - Proceedings of the …, 2006 - dl.acm.org

We present Sequoia, a programming language designed to facilitate the development of
memory hierarchy aware parallel programs that remain portable across modern machines …

Cited by 684

SPIRAL: Extreme performance portability

F Franchetti, TM Low, DT Popovici… - Proceedings of the …, 2018 - ieeexplore.ieee.org

In this paper, we address the question of how to automatically map computational kernels to
highly efficient code for a wide range of computing platforms and establish the correctness of …

Cited by 137

Trends in data locality abstractions for HPC systems

D Unat, A Dubey, T Hoefler, J Shalf… - … on Parallel and …, 2017 - ieeexplore.ieee.org

The cost of data movement has always been an important concern in high performance
computing (HPC) systems. It has now become the dominant factor in terms of both energy …

Cited by 124

Exascale computing trends: Adjusting to the "new normal" for computer architecture

P Kogge, J Shalf - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org

We now have 20 years of data under our belt about the performance of supercomputers
against at least a single floating-point benchmark from dense linear algebra. Until about …

Cited by 152

UPC++: A high-performance communication framework for asynchronous computation

J Bachan, SB Baden, S Hofmeyr… - 2019 IEEE …, 2019 - ieeexplore.ieee.org

UPC++ is a C++ library that supports high-performance computation via an asynchronous
communication framework. This paper describes a new incarnation that differs substantially …

Cited by 84

The locality descriptor: A holistic cross-layer abstraction to express data locality in GPUs

N Vijaykumar, E Ebrahimi, K Hsieh… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org

Exploiting data locality in GPUs is critical to making more efficient use of the existing caches
and the NUMA-based memory hierarchy expected in future GPUs. While modern GPU …

Cited by 79

Runnemede: An architecture for ubiquitous high-performance computing

NP Carter, A Agrawal, S Borkar… - 2013 IEEE 19th …, 2013 - ieeexplore.ieee.org

DARPA's Ubiquitous High-Performance Computing (UHPC) program asked researchers to
develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt …

Cited by 136

Partitioning streaming parallelism for multi-cores: a machine learning based approach

Z Wang, MFP O'Boyle - Proceedings of the 19th international conference …, 2010 - dl.acm.org

Stream based languages are a popular approach to expressing parallelism in modern
applications. The efficient mapping of streaming parallelism to multi-core processors is …

Cited by 133

Hierarchical place trees: A portable abstraction for task parallelism and data movement

Y Yan, J Zhao, Y Guo, V Sarkar - … , LCPC 2009, Newark, DE, USA, October …, 2010 - Springer

Modern computer systems feature multiple homogeneous or heterogeneous computing
units with deep memory hierarchies, and expect a high degree of thread-level parallelism …

Cited by 143
