A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Processing-in-memory: A workload-driven perspective

S Ghose, A Boroumand, JS Kim… - IBM Journal of …, 2019 - ieeexplore.ieee.org
Many modern and emerging applications must process increasingly large volumes of data.
Unfortunately, prevalent computing paradigms are not designed to efficiently handle such …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against
improving performance, scalability, and energy efficiency in modern systems. Computer systems …

SMASH: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations

K Kanellopoulos, N Vijaykumar, C Giannoula… - Proceedings of the …, 2019 - dl.acm.org
Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …

MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

FIGARO: Improving system performance via fine-grained in-DRAM data relocation and caching

Y Wang, L Orosa, X Peng, Y Guo… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Main memory, composed of DRAM, is a performance bottleneck for many applications, due
to the high DRAM access latency. In-DRAM caches work to mitigate this latency by …

Architecting waferscale processors: A GPU case study

S Pal, D Petrisko, M Tomei, P Gupta… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Increasing communication overheads are already threatening computer system scaling. One
approach to dramatically reduce communication overheads is waferscale processing …

Stream-based memory access specialization for general purpose processors

Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org
Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g., vector instructions …

PAVER: Locality graph-based thread block scheduling for GPUs

D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on …, 2021 - dl.acm.org
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …

Common counters: Compressed encryption counters for secure GPU memory

S Na, S Lee, Y Kim, J Park, J Huh - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Hardware-based trusted execution has opened a promising new opportunity for enabling
secure cloud computing. Nevertheless, the current trusted execution environments are …