Cache-conscious wavefront scheduling
This paper studies the effects of hardware thread scheduling on cache management in
GPUs. We propose Cache-Conscious Wavefront Scheduling (CCWS), an adaptive …
Divergence-aware warp scheduling
This paper uses hardware thread scheduling to improve the performance and energy
efficiency of divergent applications on GPUs. We propose Divergence-Aware Warp …
Low-latency, high-throughput garbage collection
To achieve short pauses, state-of-the-art concurrent copying collectors such as C4,
Shenandoah, and ZGC use substantially more CPU cycles and memory than simpler …
Performance analysis of content matching intrusion detection systems
Although network intrusion detection systems (nIDS) are widely used, there is limited
understanding of how these systems perform in different settings and how they should be …
GPUs as an opportunity for offloading garbage collection
GPUs have become part of most commodity systems. Nonetheless, they are often
underutilized when not executing graphics-intensive or special-purpose numerical …
Data structure aware garbage collector
Garbage collection may benefit greatly from knowledge about program behavior, but most
managed languages do not provide means for the programmer to deliver such knowledge …
Memory management for many-core processors with software configurable locality policies
J Zhou, B Demsky - ACM SIGPLAN Notices, 2012 - dl.acm.org
As processors evolve towards higher core counts, architects will develop more sophisticated
memory systems to satisfy the cores' increasing thirst for memory bandwidth. Early many …
On the limits of modeling generational garbage collector performance
Garbage collection is an element of many contemporary software platforms whose
performance is determined by complex interactions and is therefore difficult to quantify and …
Improving Garbage Collection Observability with Performance Tracing
Debugging garbage collectors for performance and correctness is notoriously difficult.
Among the arsenal of tools available to systems engineers, support for one of the most …
Linear-Mark: Locality vs. Accuracy in Mark-Sweep Garbage Collection
Tracing garbage collectors are widely deployed in modern programming languages. But
tracing an arbitrary heap shape incurs poor locality and may hinder scalability. In this paper …