A survey of processors with explicit multithreading

T Ungerer, B Robič, J Šilc - ACM Computing Surveys (CSUR), 2003 - dl.acm.org
Hardware multithreading is becoming a generally applied technique in the next generation
of microprocessors. Several multithreaded processors are announced by industry or already …

Multithreaded processors

T Ungerer, B Robič, J Šilc - The Computer Journal, 2002 - academic.oup.com
The instruction-level parallelism found in a conventional instruction stream is limited. Studies
have shown the limits of processor utilization even for today's superscalar microprocessors …

The WaveScalar architecture

S Swanson, A Schwerin, M Mercaldi… - ACM Transactions on …, 2007 - dl.acm.org
Silicon technology will continue to provide an exponential increase in the availability of raw
transistors. Effectively translating this resource into application performance, however, is an …

Handling long-latency loads in a simultaneous multithreading processor

DM Tullsen, JA Brown - Proceedings. 34th ACM/IEEE …, 2001 - ieeexplore.ieee.org
Simultaneous multithreading architectures have been defined previously with fully shared
execution resources. When one thread in such an architecture experiences a very long …

Decoupled software pipelining with the synchronization array

R Rangan, N Vachharajani… - … , 2004. PACT 2004., 2004 - ieeexplore.ieee.org
Despite the success of instruction-level parallelism (ILP) optimizations in increasing the
performance of microprocessors, certain codes remain elusive. In particular, codes …

Persistent processor architecture

J Zeng, J Jeong, C Jung - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
This paper presents PPA (Persistent Processor Architecture), simple microarchitectural
support for lightweight yet performant whole-system persistence. PPA offers fully transparent …

Initial observations of the simultaneous multithreading Pentium 4 processor

N Tuck, DM Tullsen - 2003 12th International Conference on …, 2003 - ieeexplore.ieee.org
We analyze an Intel Pentium 4 hyper-threading processor. The focus is to understand its
performance and the underlying reasons behind that performance. Particular attention is …

Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures

W Zhu, VC Sreedhar, Z Hu, GR Gao - Proceedings of the 34th annual …, 2007 - dl.acm.org
Efficient fine-grain synchronization is extremely important to effectively harness the
computational power of many-core architectures. However, designing and implementing …

Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors

D Kim, SSW Liao, PH Wang… - … Symposium on Code …, 2004 - ieeexplore.ieee.org
Pre-execution techniques have received much attention as an effective way of prefetching
cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution …

Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers

J Sampson, R Gonzalez, JF Collard… - 2006 39th Annual …, 2006 - ieeexplore.ieee.org
We examine the ability of CMPs, due to their lower on-chip communication latencies, to
exploit data parallelism at inner-loop granularities similar to that commonly targeted by …