Single-ISA heterogeneous multi-core architectures for multithreaded workload performance

R Kumar, DM Tullsen, P Ranganathan… - ACM SIGARCH …, 2004 - dl.acm.org
A single-ISA heterogeneous multi-core architecture is a chip multiprocessor composed of
cores of varying size, performance, and complexity. This paper demonstrates that …

Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

K Sankaralingam, R Nagarajan, H Liu, C Kim… - Proceedings of the 30th …, 2003 - dl.acm.org
This paper describes the polymorphous TRIPS architecture which can be configured for
different granularities and types of parallelism. TRIPS contains mechanisms that enable the …

Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams

MB Taylor, W Lee, J Miller, D Wentzlaff, I Bratt… - ACM SIGARCH …, 2004 - dl.acm.org
This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a
general-purpose architecture that performs well on a larger class of stream and embedded …

Dynamic warp subdivision for integrated branch and memory divergence tolerance

J Meng, D Tarjan, K Skadron - Proceedings of the 37th annual …, 2010 - dl.acm.org
SIMD organizations amortize the area and power of fetch, decode, and issue logic across
multiple processing units in order to maximize throughput for a given area and power …

A case for high performance computing with virtual machines

W Huang, J Liu, B Abali, DK Panda - Proceedings of the 20th annual …, 2006 - dl.acm.org
Virtual machine (VM) technologies are experiencing a resurgence in both industry and
research communities. VMs offer many desirable features such as security, ease of …

Auto-vectorization of interleaved data for SIMD

D Nuzman, I Rosen, A Zaks - ACM SIGPLAN Notices, 2006 - dl.acm.org
Most implementations of the Single Instruction Multiple Data (SIMD) model available today
require that data elements be packed in vector registers. Operations on disjoint vector …

The vector-thread architecture

R Krashinsky, C Batten, M Hampton… - ACM SIGARCH …, 2004 - dl.acm.org
The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute
models. The VT abstraction provides the programmer with a control processor and a vector of …

Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks

C Kozyrakis, D Patterson - 35th Annual IEEE/ACM International …, 2002 - ieeexplore.ieee.org
Multimedia processing on embedded devices requires an architecture that leads to high
performance, low power consumption, reduced design complexity, and small code size. In …

Convergence of recognition, mining, and synthesis workloads and its implications

YK Chen, J Chhugani, P Dubey… - Proceedings of the …, 2008 - ieeexplore.ieee.org
This paper examines the growing need for a general-purpose "analytics engine"
that can enable next-generation processing platforms to effectively model events, objects …

Data processing apparatus having cache and translation lookaside buffer

ML Böttcher, D Kershaw - US Patent 9,684,601, 2017 - Google Patents
A data processing apparatus has a cache and a translation lookaside buffer (TLB). A way
table is provided for identifying which of a plurality of cache ways stores required data. Each …