Cilk: An efficient multithreaded runtime system

RD Blumofe, CF Joerg, BC Kuszmaul… - ACM SIGPLAN …, 1995 - dl.acm.org
Cilk (pronounced “silk”) is a C-based runtime system for multithreaded parallel
programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler …

The data locality of work stealing

UA Acar, GE Blelloch, RD Blumofe - Proceedings of the twelfth annual …, 2000 - dl.acm.org
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-
controlled shared-memory machines. We present lower and upper bounds on the number of …

Dead-block prediction & dead-block correlating prefetchers

AC Lai, C Fide, B Falsafi - ACM SIGARCH Computer Architecture News, 2001 - dl.acm.org
Effective data prefetching requires accurate mechanisms to predict both “which” cache
blocks to prefetch and “when” to prefetch them. This paper proposes the Dead-Block …

Capriccio: Scalable threads for internet services

R Von Behren, J Condit, F Zhou, GC Necula… - ACM SIGOPS …, 2003 - dl.acm.org
This paper presents Capriccio, a scalable thread package for use with high-concurrency
servers. While recent work has advocated event-based systems, we believe that thread …

Mapping irregular applications to DIVA, a PIM-based data-intensive architecture

M Hall, P Kogge, J Koller, P Diniz, J Chame… - Proceedings of the …, 1999 - dl.acm.org
Processing-in-memory (PIM) chips that integrate processor logic into memory
devices offer a new opportunity for bridging the growing gap between processor and …

Supporting dynamic data structures on distributed-memory machines

A Rogers, MC Carlisle, JH Reppy… - ACM Transactions on …, 1995 - dl.acm.org
Compiling for distributed-memory machines has been a very active research area in recent
years. Much of this work has concentrated on programs that use arrays as their primary data …

A dynamic compilation framework for controlling microprocessor energy and performance

Q Wu, VJ Reddi, Y Wu, J Lee… - 38th Annual IEEE …, 2005 - ieeexplore.ieee.org
Dynamic voltage and frequency scaling (DVFS) is an effective technique for controlling
microprocessor energy and performance. Existing DVFS techniques are primarily based on …

A large, fast instruction window for tolerating cache misses

AR Lebeck, J Koppanalil, T Li, J Patwardhan… - ACM SIGARCH …, 2002 - dl.acm.org
Instruction window size is an important design parameter for many modern processors.
Large instruction windows offer the potential advantage of exposing large amounts of …

CRL: High-performance all-software distributed shared memory

KL Johnson, MF Kaashoek, DA Wallach - Proceedings of the fifteenth …, 1995 - dl.acm.org
The C Region Library (CRL) is a new all-software distributed shared memory (DSM)
system. CRL requires no special compiler, hardware, or operating system support beyond …

Executing multithreaded programs efficiently

RD Blumofe - 1995 - dspace.mit.edu
Executing Multithreaded Programs Efficiently, by Robert D. Blumofe. Sc.B., Brown
University (1988) SM …