Enabling preemptive multiprogramming on GPUs
GPUs are being increasingly adopted as compute accelerators in many domains, spanning
environments from mobile systems to cloud computing. These systems are usually running …
SPIDER-based out-of-order execution scheme for Ht-MPSOC
In this work, the influence of the dynamic task scheduling process is examined. Out-of-order
(OoO) implementation processes show remarkable promise for task-level parallelism in …
Hybrid dataflow/von-Neumann architectures
General-purpose hybrid dataflow/von-Neumann architectures are gaining traction as
effective parallel platforms. Although different implementations differ in the way they merge …
Adaptive, efficient, parallel execution of parallel programs
S Sridharan, G Gupta, GS Sohi - Proceedings of the 35th ACM SIGPLAN …, 2014 - dl.acm.org
Future multicore processors will be heterogeneous, be increasingly less reliable, and
operate in dynamically changing operating conditions. Such environments will result in a …
Accelerating RTL Simulation with Hardware-Software Co-Design
Fast simulation of digital circuits is crucial to build modern chips. But RTL (Register-Transfer-
Level) simulators are slow, as they cannot exploit multicores well. Slow simulation lengthens …
A ubiquitous machine learning accelerator with automatic parallelization on FPGA
Machine learning has been widely applied in various emerging data-intensive applications,
and has to be optimized and accelerated by powerful engines to process very large scale …
TERAFLUX: Harnessing dataflow in next generation teradevices
The improvements in semiconductor technologies are gradually enabling extreme-scale
systems such as teradevices (i.e., chips composed of 1000 billion transistors), most likely …
A scalable architecture for reprioritizing ordered parallelism
Many algorithms schedule their work, or tasks, according to a priority order for correctness or
faster convergence. While priority schedulers commonly implement task enqueue and …
MUSA: a multi-level simulation approach for next-generation HPC machines
The complexity of High Performance Computing (HPC) systems is increasing in the number
of components and their heterogeneity. Interactions between software and hardware involve …
Dataflow execution of sequential imperative programs on multicore architectures
G Gupta, GS Sohi - Proceedings of the 44th annual IEEE/ACM …, 2011 - dl.acm.org
As multicore processors become the default, researchers are aggressively looking for
program execution models that make it easier to use the available resources. Multithreaded …