Repairing sequential consistency in C/C++11
The C/C++11 memory model defines the semantics of concurrent memory accesses in
C/C++, and in particular supports racy "atomic" accesses at a range of different consistency …
Sledge: A serverless-first, light-weight wasm runtime for the edge
Emerging IoT applications with real-time latency constraints require new data processing
systems operating at the Edge. Serverless computing offers a new compelling paradigm …
Taskflow: A lightweight parallel and heterogeneous task graph computing system
Taskflow aims to streamline the building of parallel and heterogeneous applications using a
lightweight task graph-based approach. Taskflow introduces an expressive task graph …
CDSchecker: checking concurrent data structures written with C/C++ atomics
B Norris, B Demsky - Proceedings of the 2013 ACM SIGPLAN …, 2013 - dl.acm.org
Writing low-level concurrent software has traditionally required intimate knowledge of the
entire toolchain and often has involved coding in assembly. New language standards have …
Compass: strong and compositional library specifications in relaxed memory separation logic
Several functional correctness criteria have been proposed for relaxed-memory consistency
libraries, but most lack support for modular client reasoning. Mével and Jourdan recently …
Promising-ARM/RISC-V: a simpler and faster operational concurrency model
For ARMv8 and RISC-V, there are concurrency models in two styles, extensionally
equivalent: axiomatic models, expressing the concurrency semantics in terms of global …
TaroRTL: Accelerating RTL Simulation using Coroutine-based Heterogeneous Task Graph Scheduling
RTL simulation is critical for validating hardware designs. However, RTL simulation can be
time-consuming for large designs. Existing RTL simulators have leveraged task graph …
Scheduling parallel computations by work stealing: A survey
J Yang, Q He - International Journal of Parallel Programming, 2018 - Springer
Work stealing has been proven to be an efficient technique for scheduling parallel
computations, and has been gaining popularity as the multiprocessor/multicore-processor …
An efficient work-stealing scheduler for task dependency graph
Work-stealing is a key component of many parallel task graph libraries such as Intel
Threading Building Blocks (TBB) FlowGraph, Microsoft Task Parallel Library (TPL) Batch …
COATCheck: Verifying memory ordering at the hardware-OS interface
Modern computer systems include numerous compute elements, from CPUs to GPUs to
accelerators. Harnessing their full potential requires well-defined, properly-implemented …