Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency
The recently proposed dataplanes for microsecond scale applications, such as IX and
ZygOS, use non-preemptive policies to schedule requests to cores. For the many real-world …
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Recent advances in computing have led to an explosion in the amount of data being
generated. Processing the ever-growing data in a timely manner has made throughput …
Memory coherence in shared virtual memory systems
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …
Arachne: Core-Aware Thread Management
Arachne is a new user-level implementation of threads that provides both low latency and
high throughput for applications with extremely short-lived threads (only a few …
Relax: An architectural framework for software recovery of hardware faults
As technology scales ever further, device unreliability is creating excessive complexity for
hardware to maintain the illusion of perfect operation. In this paper, we consider whether …
Zero-shot kernel learning
In this paper, we address an open problem of zero-shot learning. Its principle is based on
learning a mapping that associates feature vectors extracted from, i.e., images and attribute …
Scheduling parallel programs by work stealing with private deques
Work stealing has proven to be an effective method for scheduling parallel programs on
multicore computers. To achieve high performance, work stealing distributes tasks between …
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors
With the shift towards chip multiprocessors (CMPs), exploiting and managing parallelism
has become a central problem in computing systems. Many issues of parallelism …
A scalable architecture for ordered parallelism
We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is
abundant but hard to mine with current software and hardware techniques. In this …
Flexible architectural support for fine-grain scheduling
To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit
fine-grain parallelism. However, managing tasks of a few thousand instructions is …