Dark silicon aware runtime mapping for many-core systems: A patterning approach
Limitation on power budget in many-core systems leaves a fraction of on-chip resources
inactive, referred to as dark silicon. In such systems, an efficient run-time application …
adBoost: Thermal aware performance boosting through dark silicon patterning
Increasing power densities of many-core systems leaves a fraction of on-chip resources
inactive, referred to as dark silicon. Efficient management of critical interlinked parameters …
Pacmap: Topology mapping of unstructured communication patterns onto non-contiguous allocations
In high performance computing (HPC), applications usually have many parallel tasks
running on multiple machine nodes. As these tasks intensively communicate with each …
Adjustable contiguity of run-time task allocation in networked many-core systems
In this paper, we propose a run-time mapping algorithm, CASqA, for networked many-core
systems. In this algorithm, the level of contiguousness of the allocated processors (α) can be …
Simulation and optimization of HPC job allocation for jointly reducing communication and cooling costs
Performance and energy are critical aspects in high performance computing (HPC) data
centers. Highly parallel HPC applications that require multiple nodes usually run for long …
Parallel job scheduling policies to improve fairness: A case study
Balancing fairness, user performance, and system performance is a critical concern when
developing and installing parallel schedulers. Sandia uses a customized scheduler to …
Task scheduling for many-cores with S-NUCA caches
A many-core processor may comprise a large number of processing cores on a single chip.
The many-core's last-level shared cache can potentially be physically distributed alongside …
Efficient top-k spatial locality search for co-located spatial web objects
In step with the web being used widely by mobile users, user location is becoming an
essential signal in services, including local intent search. Given a large set of spatial web …
A multi-faceted approach to job placement for improved performance on extreme-scale systems
Job placement plays a pivotal role in application performance on supercomputers. We
present a multi-faceted exploration to influence placement in extreme-scale systems, to …
Using task migration to improve non-contiguous processor allocation in NoC-based CMPs
In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is
presented. Processor allocation is a well-known problem in parallel computer systems and …