Single-ISA heterogeneous multi-core architectures for multithreaded workload performance
A single-ISA heterogeneous multi-core architecture is a chip multiprocessor composed of
cores of varying size, performance, and complexity. This paper demonstrates that …
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
This paper describes the polymorphous TRIPS architecture which can be configured for
different granularities and types of parallelism. TRIPS contains mechanisms that enable the …
Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams
This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a
general-purpose architecture that performs well on a larger class of stream and embedded …
Dynamic warp subdivision for integrated branch and memory divergence tolerance
SIMD organizations amortize the area and power of fetch, decode, and issue logic across
multiple processing units in order to maximize throughput for a given area and power …
A case for high performance computing with virtual machines
Virtual machine (VM) technologies are experiencing a resurgence in both industry and
research communities. VMs offer many desirable features such as security, ease of …
Auto-vectorization of interleaved data for SIMD
D Nuzman, I Rosen, A Zaks - ACM SIGPLAN Notices, 2006 - dl.acm.org
Most implementations of the Single Instruction Multiple Data (SIMD) model available today
require that data elements be packed in vector registers. Operations on disjoint vector …
The vector-thread architecture
R Krashinsky, C Batten, M Hampton… - ACM SIGARCH …, 2004 - dl.acm.org
The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute
models. The VT abstraction provides the programmer with a control processor and a vector of …
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks
Multimedia processing on embedded devices requires an architecture that leads to high
performance, low power consumption, reduced design complexity, and small code size. In …
Convergence of recognition, mining, and synthesis workloads and its implications
This paper examines the growing need for a general-purpose "analytics engine"
that can enable next-generation processing platforms to effectively model events, objects …
Data processing apparatus having cache and translation lookaside buffer
ML Böttcher, D Kershaw - US Patent 9,684,601, 2017 - Google Patents
A data processing apparatus has a cache and a translation look aside buffer (TLB). A way
table is provided for identifying which of a plurality of cache ways stores required data. Each …