Cilk: An efficient multithreaded runtime system
RD Blumofe, CF Joerg, BC Kuszmaul… - ACM SigPlan …, 1995 - dl.acm.org
Cilk (pronounced “silk”) is a C-based runtime system for multi-threaded parallel
programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler …
The data locality of work stealing
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-
controlled shared-memory machines. We present lower and upper bounds on the number of …
Dead-block prediction & dead-block correlating prefetchers
AC Lai, C Fide, B Falsafi - ACM SIGARCH Computer Architecture News, 2001 - dl.acm.org
Effective data prefetching requires accurate mechanisms to predict both “which” cache
blocks to prefetch and “when” to prefetch them. This paper proposes the Dead-Block …
Capriccio: Scalable threads for internet services
This paper presents Capriccio, a scalable thread package for use with high-concurrency
servers. While recent work has advocated event-based systems, we believe that thread …
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture
Processing-in-memory (PIM) chips that integrate processor logic into memory
devices offer a new opportunity for bridging the growing gap between processor and …
Supporting dynamic data structures on distributed-memory machines
Compiling for distributed-memory machines has been a very active research area in recent
years. Much of this work has concentrated on programs that use arrays as their primary data …
A dynamic compilation framework for controlling microprocessor energy and performance
Dynamic voltage and frequency scaling (DVFS) is an effective technique for controlling
microprocessor energy and performance. Existing DVFS techniques are primarily based on …
A large, fast instruction window for tolerating cache misses
Instruction window size is an important design parameter for many modern processors.
Large instruction windows offer the potential advantage of exposing large amounts of …
CRL: High-performance all-software distributed shared memory
The C Region Library (CRL) is a new all-software distributed shared memory (DSM)
system. CRL requires no special compiler, hardware, or operating system support beyond …
Executing multithreaded programs efficiently
RD Blumofe - 1995 - dspace.mit.edu
Executing Multithreaded Programs Efficiently by Robert D. Blumofe, Sc.B., Brown University (1988), SM …