Compiling affine loop nests for distributed-memory parallel architectures

U Bondhugula - Proceedings of the International Conference on High …, 2013 - dl.acm.org
We present new techniques for compilation of arbitrarily nested loops with affine
dependences for distributed-memory parallel architectures. Our framework is implemented …

Work stealing and persistence-based load balancers for iterative overdecomposed applications

J Lifflander, S Krishnamoorthy, LV Kale - Proceedings of the 21st …, 2012 - dl.acm.org
Applications often involve iterative execution of identical or slowly evolving calculations.
Such applications require incremental rebalancing to improve load balance across …

Matching memory access patterns and data placement for NUMA systems

Z Majo, TR Gross - Proceedings of the Tenth International Symposium …, 2012 - dl.acm.org
Many recent multicore multiprocessors are based on a nonuniform memory architecture
(NUMA). A mismatch between the data access patterns of programs and the mapping of …

User-defined distributions and layouts in Chapel: Philosophy and framework

BL Chamberlain, SJ Deitz, D Iten, SE Choi - Proceedings of the 2nd …, 2010 - dl.acm.org
This paper introduces user-defined domain maps, a novel concept for implementing
distributions and memory layouts for parallel data aggregates. Our domain maps implement …

Digital data processing method and system

US Patent 8,429,625, 2013 - Google Patents
A method and system for processing generic formatted data, including first
data describing a sequence of generic operations without any loops, in view of providing …

An approach to data distributions in Chapel

RE Diaconescu, HP Zima - The International Journal of High …, 2007 - journals.sagepub.com
A key characteristic of today's high performance computing systems is a physically
distributed memory, which makes the efficient management of locality essential for taking …

Morphology and phase diagram of linear triblock copolymers: Parallel real-space self-consistent-field-theory simulation

M Sun, P Wang, F Qiu, P Tang, H Zhang, Y Yang - Physical Review E …, 2008 - APS
A parallel algorithm designed for widely used distributed computer clusters is developed for
the real-space self-consistent-field theory for polymers. We adopt an efficient data partition …

[PDF] Automatic distributed-memory parallelization and code generation using the polyhedral framework

U Bondhugula - Technical report, ISc-CSA-TR-2011-3 (Sept …, 2011 - csa.iisc.ac.in
Compiling for distributed-memory parallel architectures is considered very challenging. In
spite of the large amount of work done to address this problem, no practical and efficient …

Effective automatic computation placement and data allocation for parallelization of regular programs

C Reddy, U Bondhugula - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
This paper proposes techniques for data allocation and computation mapping when
compiling affine loop nest sequences for distributed-memory clusters. Techniques for …

Optimally maximizing iteration-level loop parallelism

D Liu, Y Wang, Z Shao, M Guo… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Loops are the main source of parallelism in many applications. This paper solves the open
problem of extracting the maximal number of iterations from a loop to run parallel on chip …