- Academic Search

Massively parallel processing core with plural chains of processing elements and respective smart memory storing select data received from each chain

S Cadambi, A Majumdar, M Becchi… - US Patent …, 2013 - Google Patents

An accelerator system is shown that includes a plurality of processing cores. Each
processing core includes a plurality of processing chains configured to perform parallel …

Cited by 471

Why on-chip cache coherence is here to stay

MMK Martin, MD Hill, DJ Sorin - Communications of the ACM, 2012 - dl.acm.org

Communications of the ACM, July 2012, Vol. 55, No. 7, p. 78 (contributed articles).
Shared memory is the dominant low-level communication …

Cited by 364

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

GF Diamos, AR Kerr, S Yalamanchili… - Proceedings of the 19th …, 2010 - dl.acm.org

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel
execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms …

Cited by 352

An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth

DH Woo, NH Seong, DL Lewis… - HPCA-16 2010 The …, 2010 - ieeexplore.ieee.org

Memory bandwidth has become a major performance bottleneck as more and more cores
are integrated onto a single die, demanding more and more data from the system memory …

Cited by 359

Relax: An architectural framework for software recovery of hardware faults

M De Kruijf, S Nomura, K Sankaralingam - ACM SIGARCH Computer …, 2010 - dl.acm.org

As technology scales ever further, device unreliability is creating excessive complexity for
hardware to maintain the illusion of perfect operation. In this paper, we consider whether …

Cited by 307

Thread block compaction for efficient SIMT control flow

WWL Fung, TM Aamodt - 2011 IEEE 17th international …, 2011 - ieeexplore.ieee.org

Manycore accelerators such as graphics processor units (GPUs) organize processing units
into single-instruction, multiple data “cores” to improve throughput per unit hardware cost …

Cited by 276

DeNovo: Rethinking the memory hierarchy for disciplined parallelism

B Choi, R Komuravelli, H Sung… - 2011 International …, 2011 - ieeexplore.ieee.org

For parallelism to become tractable for mass programmers, shared-memory languages and
environments must evolve to enforce disciplined practices that ban "wild shared-memory …

Cited by 264

Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption …

Cited by 205

An asymmetric distributed shared memory model for heterogeneous parallel systems

I Gelado, JE Stone, J Cabezas, S Patel… - Proceedings of the …, 2010 - dl.acm.org

Heterogeneous computing combines general purpose CPUs with accelerators to efficiently
execute both sequential control-intensive and data-parallel phases of applications. Existing …

Cited by 269

Goldmine: Automatic assertion generation using data mining and static analysis

S Vasudevan, D Sheridan, S Patel… - … , Automation & Test …, 2010 - ieeexplore.ieee.org

We present GOLDMINE, a methodology for generating assertions automatically. Our method
involves a combination of data mining and static analysis of the Register Transfer Level …

Cited by 201

Rigel: An architecture and scalable programming interface for a 1000-core accelerator
