Google Scholar

H Kwon, P Chatarasi, M Pellauer, A Parashar… - Proceedings of the …, 2019 - dl.acm.org

The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse
and perform staging are known as dataflow, which directly impacts the performance and …

Cited by 343 · 10 versions

[PDF] iitdh.ac.in

A survey of cache simulators

H Brais, R Kalayappan, PR Panda - ACM Computing Surveys (CSUR), 2020 - dl.acm.org

Computer architecture simulation tools are essential for implementing and evaluating new
ideas in the domain and can be useful for understanding the behavior of programs and …

Cited by 44 · 6 versions

[PDF] acm.org

Analytical characterization and design space exploration for optimization of CNNs

R Li, Y Xu, A Sukumaran-Rajam, A Rountev… - Proceedings of the 26th …, 2021 - dl.acm.org

Moving data through the memory hierarchy is a fundamental bottleneck that can limit the
performance of core algorithms of machine learning, such as convolutional neural networks …

Cited by 72 · 7 versions

[PDF] futhark-lang.org

Incremental flattening for nested data parallelism

T Henriksen, F Thorøe, M Elsman… - Proceedings of the 24th …, 2019 - dl.acm.org

Compilation techniques for nested-parallel applications that can adapt to hardware and
dataset characteristics are vital for unlocking the power of modern hardware. This paper …

Cited by 52 · 7 versions

[PDF] acm.org

PolyDL: Polyhedral optimizations for creation of high-performance DL primitives

S Tavarageri, A Heinecke, S Avancha, B Kaul… - ACM Transactions on …, 2021 - dl.acm.org

Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of
DNNs is becoming ubiquitous, including in software for image recognition, speech …

Cited by 30 · 4 versions

[PDF] arxiv.org

A fast analytical model of fully associative caches

T Gysi, T Grosser, L Brandner, T Hoefler - Proceedings of the 40th ACM …, 2019 - dl.acm.org

While the cost of computation is an easy to understand local property, the cost of data
movement on cached architectures depends on global state, does not compose, and is hard …

Cited by 36 · 33 versions

[PDF] acm.org

Fast and exact analysis for LRU caches

V Touzeau, C Maïza, D Monniaux… - Proceedings of the ACM on …, 2019 - dl.acm.org

For applications in worst-case execution time analysis and in security, it is desirable to
statically classify memory accesses into those that result in cache hits, and those that result …

Cited by 38 · 6 versions

[PDF] acm.org

Falcon: A scalable analytical cache model

A Pitchanathan, K Grover, T Grosser - Proceedings of the ACM on …, 2024 - dl.acm.org

Compilers often use performance models to decide how to optimize code. This is often
preferred over using hardware performance measurements, since hardware measurements …

Cited by 1 · 2 versions

[PDF] whiterose.ac.uk

A methodology for efficient tile size selection for affine loop kernels

V Kelefouras, K Djemame, G Keramidas… - International Journal of …, 2022 - Springer

Reducing the number of data accesses in memory hierarchy is of paramount importance on
modern computer systems. One of the key optimizations addressing this problem is loop …

Cited by 10 · 9 versions

[PDF] acm.org

Parallel Loop Locality Analysis for Symbolic Thread Counts

F Liu, Y Zhu, S Sun, C Ding, W Smith… - Proceedings of the 2024 …, 2024 - dl.acm.org

Data movement limits program performance. This bottleneck is more significant in multi-
thread programs but more difficult to analyze, especially for multiple thread counts. For …

Cited by 2 · 4 versions

Analytical modeling of cache behavior for affine programs

Understanding reuse, performance, and hardware cost of DNN dataflow: A data-centric approach
