Google Scholar

A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer

Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Cited by 243. All 6 versions.

[PDF] cmu.edu

Processing-in-memory: A workload-driven perspective

S Ghose, A Boroumand, JS Kim… - IBM Journal of …, 2019 - ieeexplore.ieee.org

Many modern and emerging applications must process increasingly large volumes of data.
Unfortunately, prevalent computing paradigms are not designed to efficiently handle such …

Cited by 219. All 15 versions.

[PDF] ieee.org

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org

Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Cited by 108. All 10 versions.

[PDF] arxiv.org

Figaro: Improving system performance via fine-grained in-DRAM data relocation and caching

Y Wang, L Orosa, X Peng, Y Guo… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org

Main memory, composed of DRAM, is a performance bottleneck for many applications, due
to the high DRAM access latency. In-DRAM caches work to mitigate this latency by …

Cited by 93. All 22 versions.

[PDF] arxiv.org

SMASH: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations

K Kanellopoulos, N Vijaykumar, C Giannoula… - Proceedings of the …, 2019 - dl.acm.org

Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …

Cited by 117. All 6 versions.

[PDF] acm.org

MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org

The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

Cited by 118. All 7 versions.

[PDF] acm.org

PAVER: Locality graph-based thread block scheduling for GPUs

D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on …, 2021 - dl.acm.org

The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …

Cited by 34. All 6 versions.

[PDF] acm.org

Stream-based memory access specialization for general purpose processors

Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org

Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g., vector instructions …

Cited by 56. All 4 versions.

[PDF] ucla.edu

Architecting waferscale processors-a GPU case study

S Pal, D Petrisko, M Tomei, P Gupta… - … Symposium on High …, 2019 - ieeexplore.ieee.org

Increasing communication overheads are already threatening computer system scaling. One
approach to dramatically reduce communication overheads is waferscale processing …

Cited by 62. All 10 versions.

[PDF] google.com

Understanding the future of energy efficiency in multi-module GPUs

A Arunkumar, E Bolotin, D Nellans… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org

As Moore's law slows down, GPUs must pivot towards multi-module designs to continue
scaling performance at historical rates. Prior work on multi-module GPUs has focused on …

Cited by 54. All 4 versions.

The locality descriptor: A holistic cross-layer abstraction to express data locality in GPUs

A modern primer on processing in memory