From the future Si technology perspective: Challenges and opportunities

K Kim - 2010 International Electron Devices Meeting, 2010 - ieeexplore.ieee.org
As silicon technology enters sub-20nm nodes, new materials, structures and processes are
being introduced in order to continue with the advantages of dimensional scaling, e.g., 3D …

Communication lower bounds and optimal algorithms for numerical linear algebra

G Ballard, E Carson, J Demmel, M Hoemmen… - Acta Numerica, 2014 - cambridge.org
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …

[BOOK][B] Space-filling curves: an introduction with applications in scientific computing

M Bader - 2012 - books.google.com
The present book provides an introduction to using space-filling curves (SFC) as tools in
scientific computing. Special focus is laid on the representation of SFC and on resulting …

A dependency-aware task-based programming environment for multi-core architectures

JM Perez, RM Badia, J Labarta - 2008 IEEE international …, 2008 - ieeexplore.ieee.org
Parallel programming on SMP and multi-core architectures is hard. In this paper we present
a programming model for those environments based on automatic function level parallelism …

Communication-optimal parallel recursive rectangular matrix multiplication

J Demmel, D Eliahu, A Fox, S Kamil… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Communication-optimal algorithms are known for square matrix multiplication. Here, we
obtain the first communication-optimal algorithm for all dimensions of rectangular matrices …

GraphGrind: Addressing load imbalance of graph partitioning

J Sun, H Vandierendonck… - Proceedings of the …, 2017 - dl.acm.org
We investigate how graph partitioning adversely affects the performance of graph analytics.
We demonstrate that graph partitioning induces extra work during graph traversal and that …

An algorithm for the optimal control of the driving of trains

R Franke, P Terwiesch, M Meyer - Proceedings of the 39th IEEE …, 2000 - ieeexplore.ieee.org
We discuss an algorithm that optimizes the driving style of a train. The objective is to
minimize the electrical energy used for traction subject to constraints such as the travel time …

SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures

E Chan, ES Quintana-Orti, G Quintana-Orti… - Proceedings of the …, 2007 - dl.acm.org
We discuss the high-performance parallel implementation and execution of dense linear
algebra matrix operations on SMP architectures, with an eye towards multi-core processors …

The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation

RA Chowdhury, V Ramachandran - Proceedings of the nineteenth …, 2007 - dl.acm.org
The Gaussian Elimination Paradigm (GEP) was introduced by the authors in [6] to represent
the triply-nested loop computation that occurs in several important algorithms including …

ULCC: a user-level facility for optimizing shared cache performance on multicores

X Ding, K Wang, X Zhang - Proceedings of the 16th ACM symposium on …, 2011 - dl.acm.org
Scientific applications face serious performance challenges on multicore processors, one of
which is caused by access contention in last level shared caches from multiple running …