Throughput-effective on-chip networks for manycore accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics
Processing Units (GPU) increases, so does the importance of on-chip interconnection …
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Y Lee, R Avizienis, A Bishara, R Xia… - Proceedings of the 38th …, 2011 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …
WAYPOINT: scaling coherence to thousand-core architectures
JH Kelm, MR Johnson, SS Lumetta… - Proceedings of the 19th …, 2010 - dl.acm.org
In this paper, we evaluate a set of coherence architectures in the context of a 1024-core chip
multiprocessor (CMP) tailored to throughput-oriented parallel workloads. Based on our …
Cohesion: a hybrid memory model for accelerators
JH Kelm, DR Johnson, W Tuohy, SS Lumetta… - Proceedings of the 37th …, 2010 - dl.acm.org
Two broad classes of memory models are available today: models with hardware cache
coherence, used in conventional chip multiprocessors, and models that rely upon software …
Rigel: A 1,024-core single-chip accelerator architecture
Rigel is a single-chip accelerator architecture with 1,024 independent processing cores
targeted at a broad class of data- and task-parallel computation. This article discusses …
Cohesion: An adaptive hybrid memory model for accelerators
JH Kelm, DR Johnson, W Tuohy, SS Lumetta… - IEEE Micro, 2011 - ieeexplore.ieee.org
Cohesion is a hybrid memory model that enables fine-grained temporal data reassignment
between hardware- and software-managed coherence domains, allowing systems to support …
Simplified vector-thread architectures for flexible and efficient data-parallel accelerators
CF Batten - 2010 - dspace.mit.edu
This thesis explores a new approach to building data-parallel accelerators that is based on
simplifying the instruction set, microarchitecture, and programming methodology for a vector …
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Y Lee, R Avizienis, A Bishara, R Xia… - ACM Transactions on …, 2013 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …
Designing on-chip networks for throughput accelerators
As the number of cores and threads in throughput accelerators such as Graphics Processing
Units (GPU) increases, so does the importance of on-chip interconnection network design …
[BOOK][B] Efficient embedded computing
JD Balfour - 2010 - search.proquest.com
This dissertation describes Elm, an efficient programmable system for high-performance
embedded applications. Elm is significantly more efficient than conventional embedded …