On-the-fly pipeline parallelism

ITA Lee, CE Leiserson, TB Schardl, Z Zhang… - ACM Transactions on …, 2015 - dl.acm.org
Pipeline parallelism organizes a parallel program as a linear sequence of stages. Each
stage processes elements of a data stream, passing each processed data element to the …
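The snippet describes a pipeline as a linear sequence of stages, each handing processed elements to the next. Below is a minimal sketch of that structure using plain threads and FIFO queues. It is a generic illustration under our own assumptions, not the on-the-fly pipeline construct proposed in the paper. The function name `run_pipeline` and the sentinel `_DONE` are hypothetical.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the stream


def run_pipeline(items, stages):
    """Run `stages` (one-argument functions) as a linear pipeline.

    Each stage runs in its own thread and reads from the queue filled by the
    previous stage, so different elements can occupy different stages at once.
    """
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate end-of-stream downstream
                return
            outbox.put(fn(item))

    threads = [
        threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10]))
```

Because each stage is a single thread reading a FIFO queue, output order matches input order; stages that could process elements out of order would need extra bookkeeping.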

Scheduling parallel computations by work stealing: A survey

J Yang, Q He - International Journal of Parallel Programming, 2018 - Springer
Work stealing has been proven to be an efficient technique for scheduling parallel
computations, and has been gaining popularity as the multiprocessor/multicore-processor …
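To make the idea concrete, here is a round-based, single-threaded simulation of randomized work stealing: each worker owns a deque, pops its own newest task from the bottom, and when idle steals the oldest task from the top of a random victim's deque. This is a sketch for illustration only; real runtimes use concurrent deques (for example the Chase–Lev deque) rather than a Python `deque`, and the names `simulate` and `tree` are hypothetical.

```python
import random
from collections import deque


class WorkStealingDeque:
    """Per-worker deque: the owner works at the bottom, thieves take from the top."""

    def __init__(self):
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def push(self, task):
        # Owner pushes newly spawned work at the bottom.
        self._items.append(task)

    def pop(self):
        # Owner takes its most recently pushed task (LIFO).
        return self._items.pop() if self._items else None

    def steal(self):
        # A thief takes the oldest task from the top (FIFO).
        return self._items.popleft() if self._items else None


def simulate(num_workers, root_task, seed=0):
    """Round-based simulation of randomized work stealing.

    A task is a callable that returns a list of child tasks. In each round,
    every worker runs one task from its own deque or, if that deque is empty,
    tries to steal one from a randomly chosen victim. Returns the number of
    tasks executed by each worker.
    """
    rng = random.Random(seed)
    deques = [WorkStealingDeque() for _ in range(num_workers)]
    deques[0].push(root_task)
    executed = [0] * num_workers
    while any(len(d) for d in deques):
        for w in range(num_workers):
            task = deques[w].pop()
            if task is None:
                victim = rng.randrange(num_workers)
                if victim != w:
                    task = deques[victim].steal()
            if task is not None:
                executed[w] += 1
                for child in task():
                    deques[w].push(child)
    return executed


def tree(depth):
    """Hypothetical workload: a complete binary tree of tasks."""
    return lambda: [tree(depth - 1), tree(depth - 1)] if depth > 0 else []


counts = simulate(4, tree(6))
print(counts, sum(counts))  # the sum is always 2**7 - 1 = 127
```

Stealing from the top tends to hand thieves the oldest, and in divide-and-conquer code usually the largest, pieces of remaining work, while the owner keeps working depth-first on its newest tasks.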

ThreadScan: Automatic and scalable memory reclamation

D Alistarh, W Leiserson, A Matveev… - ACM Transactions on …, 2018 - dl.acm.org
The concurrent memory reclamation problem is that of devising a way for a deallocating
thread to verify that no other concurrent threads hold references to a memory block being …
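The problem statement above can be illustrated with a hazard-pointer-style scheme: readers publish the object they are about to dereference, and a deallocating thread frees a retired object only if no published slot refers to it. Note that this is a different, well-known technique, not ThreadScan's approach; "freeing" here just moves objects to a `freed` list, and the class `HazardDomain` is hypothetical.

```python
import threading


class HazardDomain:
    """Minimal hazard-pointer-style reclamation (illustrative, coarse-locked)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hazards = {}   # thread id -> object that thread is protecting
        self._retired = []   # unlinked objects waiting to be reclaimed
        self.freed = []      # objects "reclaimed" (kept here for inspection)

    def protect(self, obj):
        # Publish a hazard before dereferencing obj. A real implementation
        # would re-check that obj is still reachable after publishing.
        with self._lock:
            self._hazards[threading.get_ident()] = obj

    def clear(self):
        with self._lock:
            self._hazards.pop(threading.get_ident(), None)

    def retire(self, obj):
        with self._lock:
            self._retired.append(obj)

    def scan(self):
        """Reclaim every retired object that no thread currently protects."""
        with self._lock:
            in_use = {id(o) for o in self._hazards.values()}
            keep = []
            for obj in self._retired:
                (keep if id(obj) in in_use else self.freed).append(obj)
            self._retired = keep


domain = HazardDomain()
node = {"value": 1}
reader_ready, reader_done = threading.Event(), threading.Event()


def reader():
    domain.protect(node)
    reader_ready.set()
    reader_done.wait()
    domain.clear()


t = threading.Thread(target=reader)
t.start()
reader_ready.wait()
domain.retire(node)
domain.scan()
print(len(domain.freed))  # 0: the reader still protects node
reader_done.set()
t.join()
domain.scan()
print(len(domain.freed))  # 1: no hazards remain, so node is reclaimed
```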

OpenCilk: A modular and extensible software infrastructure for fast task-parallel code

TB Schardl, ITA Lee - Proceedings of the 28th ACM SIGPLAN Annual …, 2023 - dl.acm.org
This paper presents OpenCilk, an open-source software infrastructure for task-parallel
programming that allows for substantial code reuse and easy exploration of design choices …

Manycore clique enumeration with fast set intersections

J Blanuša, R Stoica, P Ienne, K Atasu - Proceedings of the VLDB …, 2020 - dl.acm.org
Listing all maximal cliques of a given graph has important applications in the analysis of
social and biological networks. Parallelisation of maximal clique enumeration (MCE) …
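For readers unfamiliar with MCE, the classic sequential baseline is Bron–Kerbosch with Tomita-style pivoting, whose inner loop is dominated by set intersections, the operation the paper's title refers to. The sketch below is that textbook algorithm using Python sets, not the paper's manycore algorithm; the function name `maximal_cliques` is hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron–Kerbosch plus pivoting.

    `adj` maps each vertex to the set of its neighbours. The candidate set p
    and exclusion set x shrink by intersection with N(v) at every step.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex whose neighbourhood covers the most candidates.
        u = max(p | x, key=lambda v: len(p & adj[v]))
        for v in list(p - adj[u]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


# Triangle {1, 2, 3} plus the edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(c) for c in maximal_cliques(graph)))  # [[1, 2, 3], [3, 4]]
```

Independent recursive calls in the `for` loop are a natural source of task parallelism, which is one reason MCE is often studied as a work-stealing workload.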

Proactive work stealing for futures

K Singer, Y Xu, ITA Lee - Proceedings of the 24th Symposium on …, 2019 - dl.acm.org
The use of futures provides a flexible way to express parallelism and can generate arbitrary
dependences among parallel subcomputations. The additional flexibility that futures provide …
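As a small illustration of the "arbitrary dependences" that futures allow, the sketch below uses Python's standard `concurrent.futures`; it shows the programming model only and says nothing about the scheduling technique in the paper. The functions `load` and `combine` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load(x):
    return x * 2


def combine(a, b):
    return a + b


# Enough workers for every task: tasks that block on other futures could
# otherwise exhaust the pool and deadlock.
with ThreadPoolExecutor(max_workers=4) as pool:
    fa = pool.submit(load, 1)
    fb = pool.submit(load, 2)
    # fc depends on fa and fb, but fd depends only on fa. With spawn/sync,
    # a sync placed before fd would typically also wait for fb; a future
    # expresses the exact dependence.
    fc = pool.submit(lambda: combine(fa.result(), fb.result()))
    fd = pool.submit(lambda: fa.result() * 10)
    results = (fc.result(), fd.result())

print(results)  # (6, 20)
```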

BWS: Balanced work stealing for time-sharing multicores

X Ding, K Wang, PB Gibbons, X Zhang - Proceedings of the 7th ACM …, 2012 - dl.acm.org
Running multithreaded programs in multicore systems has become a common practice for
many application domains. Work stealing is a widely-adopted and effective approach for …

Libfork: portable continuation-stealing with stackless coroutines

CJ Williams, J Elliott - arXiv preprint arXiv:2402.18480, 2024 - arxiv.org
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to
its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved …
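For context, the classic guarantees for randomized work stealing on fully strict computations (Blumofe and Leiserson) are usually stated as below, where $T_1$ is the work, $T_\infty$ the span, $S_1$ the stack space of the serial execution, and $P$ the number of workers. We assume these are the time and memory bounds the snippet alludes to; the time bound holds in expectation.

```latex
T_P \le \frac{T_1}{P} + O(T_\infty), \qquad S_P \le P \, S_1
```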

Heartbeat scheduling: Provable efficiency for nested parallelism

UA Acar, A Charguéraud, A Guatto, M Rainey… - Proceedings of the 39th …, 2018 - dl.acm.org
A classic problem in parallel computing is to take a high-level parallel program written, for
example, in nested-parallel style with fork-join constructs and run it efficiently on a real …

Fence-free work stealing on bounded TSO processors

A Morrison, Y Afek - ACM SIGARCH Computer Architecture News, 2014 - dl.acm.org
Work stealing is the method of choice for load balancing in task parallel programming
languages and frameworks. Yet despite considerable effort invested in optimizing work …