Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency

K Kaffes, T Chong, JT Humphries, A Belay… - … USENIX Symposium on …, 2019 - usenix.org
The recently proposed dataplanes for microsecond scale applications, such as IX and
ZygOS, use non-preemptive policies to schedule requests to cores. For the many real-world …

Habanero-Java: the new adventures of old X10

V Cavé, J Zhao, J Shirako, V Sarkar - Proceedings of the 9th …, 2011 - dl.acm.org
In this paper, we present the Habanero-Java (HJ) language developed at Rice University as
an extension to the original Java-based definition of the X10 language. HJ includes a …

Xkaapi: A runtime system for data-flow task programming on heterogeneous architectures

T Gautier, JVF Lima, N Maillard… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and
accelerators, like GPUs. Programming such nodes is typically based on a combination of …

Optimizing load balancing and data-locality with data-aware scheduling

K Wang, X Zhou, T Li, D Zhao, M Lang… - … Conference on Big …, 2014 - ieeexplore.ieee.org
Load balancing techniques (e.g., work stealing) are important to obtain the best performance
for distributed task scheduling systems that have multiple schedulers making scheduling …

Personal data lake with data gravity pull

C Walker, H Alrehamy - … Conference on Big Data and Cloud …, 2015 - ieeexplore.ieee.org
This paper presents Personal Data Lake, a unified storage facility for storing, analyzing and
querying personal data. A data lake stores data regardless of format and thus provides an …

Understanding energy behaviors of thread management constructs

G Pinto, F Castor, YD Liu - Proceedings of the 2014 ACM International …, 2014 - dl.acm.org
Java programmers are faced with numerous choices in managing concurrent execution on
multicore platforms. These choices often have different trade-offs (e.g., performance …

Hierarchical work stealing on manycore clusters

SJ Min, C Iancu, K Yelick - Fifth Conference on Partitioned Global Address …, 2011 - Citeseer
Partitioned Global Address Space languages like UPC offer a convenient way of
expressing large shared data structures, especially for irregular structures that require …

Customizable domain-specific computing

J Cong, V Sarkar, G Reinman… - IEEE Design & Test of …, 2010 - ieeexplore.ieee.org
To meet computing needs and overcome power density limitations, the computing industry
has entered the era of parallelization. However, highly parallel, general-purpose computing …

Scalable and precise dynamic datarace detection for structured parallelism

R Raman, J Zhao, V Sarkar, M Vechev, E Yahav - ACM SIGPLAN Notices, 2012 - dl.acm.org
Existing dynamic race detectors suffer from at least one of the following three limitations: (i)
space overhead per memory location grows linearly with the number of parallel threads [13] …

Taskstream: Accelerating task-parallel workloads by recovering program structure

V Dadu, T Nowatzki - Proceedings of the 27th ACM International …, 2022 - dl.acm.org
Reconfigurable accelerators, like CGRAs and dataflow architectures, have come to
prominence for addressing data-processing problems. However, they are largely limited to …