- Academic Search

Compiling affine loop nests for distributed-memory parallel architectures

U Bondhugula - Proceedings of the International Conference on High …, 2013 - dl.acm.org

We present new techniques for compilation of arbitrarily nested loops with affine
dependences for distributed-memory parallel architectures. Our framework is implemented …

Cited by 119

Work stealing and persistence-based load balancers for iterative overdecomposed applications

J Lifflander, S Krishnamoorthy, LV Kale - Proceedings of the 21st …, 2012 - dl.acm.org

Applications often involve iterative execution of identical or slowly evolving calculations.
Such applications require incremental rebalancing to improve load balance across …

Cited by 88

Matching memory access patterns and data placement for NUMA systems

Z Majo, TR Gross - Proceedings of the Tenth International Symposium …, 2012 - dl.acm.org

Many recent multicore multiprocessors are based on a nonuniform memory architecture
(NUMA). A mismatch between the data access patterns of programs and the mapping of …

Cited by 78

User-defined distributions and layouts in Chapel: Philosophy and framework

BL Chamberlain, SJ Deitz, D Iten, SE Choi - Proceedings of the 2nd …, 2010 - dl.acm.org

This paper introduces user-defined domain maps, a novel concept for implementing
distributions and memory layouts for parallel data aggregates. Our domain maps implement …

Cited by 69

Digital data processing method and system

US Patent 8,429,625, 2013 - Google Patents

A method and system for processing generic formatted data, including first
data describing a sequence of generic operations without any loops, in view of providing …

Cited by 59

An approach to data distributions in Chapel

RE Diaconescu, HP Zima - The International Journal of High …, 2007 - journals.sagepub.com

A key characteristic of today's high performance computing systems is a physically
distributed memory, which makes the efficient management of locality essential for taking …

Cited by 57

Morphology and phase diagram of linear triblock copolymers: Parallel real-space self-consistent-field-theory simulation

M Sun, P Wang, F Qiu, P Tang, H Zhang, Y Yang - Physical Review E …, 2008 - APS

A parallel algorithm designed for widely used distributed computer clusters is developed for
the real-space self-consistent-field theory for polymers. We adopt an efficient data partition …

Cited by 43

Automatic distributed-memory parallelization and code generation using the polyhedral framework

U Bondhugula - Technical report, IISc-CSA-TR-2011-3 (Sept. …, 2011 - csa.iisc.ac.in

Compiling for distributed-memory parallel architectures is considered very challenging. In
spite of the large amount of work done to address this problem, no practical and efficient …

Cited by 28

Effective automatic computation placement and data allocation for parallelization of regular programs

C Reddy, U Bondhugula - Proceedings of the 28th ACM international …, 2014 - dl.acm.org

This paper proposes techniques for data allocation and computation mapping when
compiling affine loop nest sequences for distributed-memory clusters. Techniques for …

Cited by 25

Optimally maximizing iteration-level loop parallelism

D Liu, Y Wang, Z Shao, M Guo… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org

Loops are the main source of parallelism in many applications. This paper solves the open
problem of extracting the maximal number of iterations from a loop to run parallel on chip …

Cited by 28
