Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency

K Kaffes, T Chong, JT Humphries, A Belay… - … USENIX Symposium on …, 2019 - usenix.org
The recently proposed dataplanes for microsecond scale applications, such as IX and
ZygOS, use non-preemptive policies to schedule requests to cores. For the many real-world …

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

VW Lee, C Kim, J Chhugani, M Deisher, D Kim… - Proceedings of the 37th …, 2010 - dl.acm.org
Recent advances in computing have led to an explosion in the amount of data being
generated. Processing the ever-growing data in a timely manner has made throughput …

Memory coherence in shared virtual memory systems

K Li, P Hudak - ACM Transactions on Computer Systems (TOCS), 1989 - dl.acm.org
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …

Arachne: Core-Aware Thread Management

H Qin, Q Li, J Speiser, P Kraft… - 13th USENIX Symposium …, 2018 - usenix.org
Arachne is a new user-level implementation of threads that provides both low latency and
high throughput for applications with extremely short-lived threads (only a few …

Relax: An architectural framework for software recovery of hardware faults

M De Kruijf, S Nomura, K Sankaralingam - ACM SIGARCH Computer …, 2010 - dl.acm.org
As technology scales ever further, device unreliability is creating excessive complexity for
hardware to maintain the illusion of perfect operation. In this paper, we consider whether …

Zero-shot kernel learning

H Zhang, P Koniusz - … of the IEEE conference on computer …, 2018 - openaccess.thecvf.com
In this paper, we address an open problem of zero-shot learning. Its principle is based on
learning a mapping that associates feature vectors extracted from i.e. images and attribute …

Scheduling parallel programs by work stealing with private deques

UA Acar, A Charguéraud, M Rainey - Proceedings of the 18th ACM …, 2013 - dl.acm.org
Work stealing has proven to be an effective method for scheduling parallel programs on
multicore computers. To achieve high performance, work stealing distributes tasks between …

Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

A Bhattacharjee, M Martonosi - ACM SIGARCH Computer Architecture …, 2009 - dl.acm.org
With the shift towards chip multiprocessors (CMPs), exploiting and managing parallelism
has become a central problem in computing systems. Many issues of parallelism …

A scalable architecture for ordered parallelism

MC Jeffrey, S Subramanian, C Yan, J Emer… - Proceedings of the 48th …, 2015 - dl.acm.org
We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is
abundant but hard to mine with current software and hardware techniques. In this …

Flexible architectural support for fine-grain scheduling

D Sanchez, RM Yoo, C Kozyrakis - ACM SIGARCH Computer …, 2010 - dl.acm.org
To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit
fine-grain parallelism. However, managing tasks of a few thousand instructions is …