On-the-fly pipeline parallelism
Pipeline parallelism organizes a parallel program as a linear sequence of stages. Each
stage processes elements of a data stream, passing each processed data element to the …
Scheduling parallel computations by work stealing: A survey
J Yang, Q He - International Journal of Parallel Programming, 2018 - Springer
Work stealing has been proven to be an efficient technique for scheduling parallel
computations, and has been gaining popularity as the multiprocessor/multicore-processor …
ThreadScan: Automatic and scalable memory reclamation
The concurrent memory reclamation problem is that of devising a way for a deallocating
thread to verify that no other concurrent threads hold references to a memory block being …
OpenCilk: A modular and extensible software infrastructure for fast task-parallel code
This paper presents OpenCilk, an open-source software infrastructure for task-parallel
programming that allows for substantial code reuse and easy exploration of design choices …
Manycore clique enumeration with fast set intersections
Listing all maximal cliques of a given graph has important applications in the analysis of
social and biological networks. Parallelisation of maximal clique enumeration (MCE) …
Proactive work stealing for futures
The use of futures provides a flexible way to express parallelism and can generate arbitrary
dependences among parallel subcomputations. The additional flexibility that futures provide …
BWS: balanced work stealing for time-sharing multicores
Running multithreaded programs in multicore systems has become a common practice for
many application domains. Work stealing is a widely-adopted and effective approach for …
Libfork: portable continuation-stealing with stackless coroutines
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to
its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved …
Heartbeat scheduling: Provable efficiency for nested parallelism
A classic problem in parallel computing is to take a high-level parallel program written, for
example, in nested-parallel style with fork-join constructs and run it efficiently on a real …
Fence-free work stealing on bounded TSO processors
Work stealing is the method of choice for load balancing in task parallel programming
languages and frameworks. Yet despite considerable effort invested in optimizing work …