A survey of processors with explicit multithreading
T Ungerer, B Robič, J Šilc - ACM Computing Surveys (CSUR), 2003 - dl.acm.org
Hardware multithreading is becoming a generally applied technique in the next generation
of microprocessors. Several multithreaded processors are announced by industry or already …
Multithreaded processors
T Ungerer, B Robič, J Šilc - The Computer Journal, 2002 - academic.oup.com
The instruction-level parallelism found in a conventional instruction stream is limited. Studies
have shown the limits of processor utilization even for today's superscalar microprocessors …
The WaveScalar architecture
S Swanson, A Schwerin, M Mercaldi… - ACM Transactions on …, 2007 - dl.acm.org
Silicon technology will continue to provide an exponential increase in the availability of raw
transistors. Effectively translating this resource into application performance, however, is an …
Handling long-latency loads in a simultaneous multithreading processor
DM Tullsen, JA Brown - Proceedings. 34th ACM/IEEE …, 2001 - ieeexplore.ieee.org
Simultaneous multithreading architectures have been defined previously with fully shared
execution resources. When one thread in such an architecture experiences a very long …
Decoupled software pipelining with the synchronization array
Despite the success of instruction-level parallelism (ILP) optimizations in increasing the
performance of microprocessors, certain codes remain elusive. In particular, codes …
Persistent processor architecture
This paper presents PPA (Persistent Processor Architecture), simple microarchitectural
support for lightweight yet performant whole-system persistence. PPA offers fully transparent …
Initial observations of the simultaneous multithreading Pentium 4 processor
N Tuck, DM Tullsen - 2003 12th International Conference on …, 2003 - ieeexplore.ieee.org
We analyze an Intel Pentium 4 hyper-threading processor. The focus is to understand its
performance and the underlying reasons behind that performance. Particular attention is …
Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures
Efficient fine-grain synchronization is extremely important to effectively harness the
computational power of many-core architectures. However, designing and implementing …
Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors
D Kim, SSW Liao, PH Wang… - … Symposium on Code …, 2004 - ieeexplore.ieee.org
Pre-execution techniques have received much attention as an effective way of prefetching
cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution …
Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers
We examine the ability of CMPs, due to their lower on-chip communication latencies, to
exploit data parallelism at inner-loop granularities similar to that commonly targeted by …