Google Scholar

MLIR: Scaling compiler infrastructure for domain specific computation

C Lattner, M Amini, U Bondhugula… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org

This work presents MLIR, a novel approach to building reusable and extensible compiler
infrastructure. MLIR addresses software fragmentation, compilation for heterogeneous …

Cited by 559 · All 10 versions

[PDF] arxiv.org

MLIR: A compiler infrastructure for the end of Moore's law

C Lattner, M Amini, U Bondhugula, A Cohen… - arXiv preprint arXiv …, 2020 - arxiv.org

This work presents MLIR, a novel approach to building reusable and extensible compiler
infrastructure. MLIR aims to address software fragmentation, improve compilation for …

Cited by 326 · All 2 versions

[PDF] jmlr.org

Kernel operations on the GPU, with autodiff, without memory overflows

B Charlier, J Feydy, JA Glaunes, FD Collin… - Journal of Machine …, 2021 - jmlr.org

The KeOps library provides fast and memory-efficient GPU support for tensors whose
entries are given by a mathematical formula, such as kernel and distance matrices. KeOps …

Cited by 211 · All 11 versions
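
The KeOps snippet describes tensors whose entries are computed from a formula instead of being read from stored memory. As a minimal CPU sketch of that memory-saving idea (plain NumPy, not the KeOps API; the helper name gaussian_kernel_matvec, the Gaussian kernel, sigma and the block size are illustrative assumptions), a kernel-vector product can be evaluated one block of rows at a time, so the full N-by-M kernel matrix is never materialized:

```python
import numpy as np


def gaussian_kernel_matvec(x, y, b, sigma=1.0, block=256):
    """Hypothetical helper: a_i = sum_j exp(-|x_i - y_j|^2 / (2 sigma^2)) * b_j.

    x: (N, D) points, y: (M, D) points, b: (M,) or (M, E) weights.
    Kernel rows are generated block by block from the formula, so at most
    a (block, M) slice of the kernel matrix is held in memory at once.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    y_sq = (y ** 2).sum(axis=1)                      # (M,) squared norms, reused for every block
    out = np.empty((x.shape[0],) + b.shape[1:])
    for start in range(0, x.shape[0], block):
        xi = x[start:start + block]                  # (B, D) current block of rows
        # |x - y|^2 = |x|^2 + |y|^2 - 2 x.y, clipped at 0 to absorb rounding error
        d2 = (xi ** 2).sum(axis=1)[:, None] + y_sq[None, :] - 2.0 * xi @ y.T
        k = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))  # (B, M) kernel slice
        out[start:start + block] = k @ b             # reduce over j for this block
    return out
```

Peak memory for the kernel here scales with block times M rather than N times M. The source describes KeOps as GPU-based; this sketch only illustrates the streaming-reduction pattern on the CPU.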

[PDF] acm.org

GraphIt: A high-performance graph DSL

Y Zhang, M Yang, R Baghdadi, S Kamil… - Proceedings of the …, 2018 - dl.acm.org

The performance bottlenecks of graph applications depend not only on the algorithm and
the underlying hardware, but also on the size and structure of the input graph. As a result …

Cited by 208 · All 8 versions

[PDF] acm.org

Exocompilation for productive programming of hardware accelerators

Y Ikarashi, GL Bernstein, A Reinking, H Genc… - Proceedings of the 43rd …, 2022 - dl.acm.org

High-performance kernel libraries are critical to exploiting accelerators and specialized
instructions in many applications. Because compilers are difficult to extend to support …

Cited by 59 · All 9 versions

[PDF] mlsys.org

DietCode: Automatic optimization for dynamic tensor programs

B Zheng, Z Jiang, CH Yu, H Shen… - Proceedings of …, 2022 - proceedings.mlsys.org

Achieving high performance for compute-intensive operators in machine learning (ML)
workloads is a crucial but challenging task. Many ML and system practitioners rely on …

Cited by 40 · All 4 versions

[PDF] arxiv.org

Accelerating reduction and scan using tensor core units

A Dakkak, C Li, J Xiong, I Gelado, W Hwu - Proceedings of the ACM …, 2019 - dl.acm.org

Driven by deep learning, there has been a surge of specialized processors for matrix
multiplication, referred to as Tensor Core Units (TCUs). These TCUs are capable of …

Cited by 112 · All 11 versions
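
The snippet is cut off before it describes the method. Assuming the formulation usually associated with this line of work (a reduction is a matrix product with a vector of ones, and an inclusive scan is a product with an upper-triangular matrix of ones, both of which are matrix multiplications a TCU can execute), the algebra can be checked with a small NumPy sketch. The function names and the tile width of 16 are hypothetical, and NumPy runs on the CPU, so this shows only the math, not tensor-core code:

```python
import numpy as np


def reduce_rows_via_matmul(a):
    """Row sums of an (N, K) matrix, computed as a @ ones(K, 1)."""
    ones = np.ones((a.shape[1], 1), dtype=a.dtype)
    return (a @ ones)[:, 0]


def scan_rows_via_matmul(a):
    """Inclusive prefix sums along each row, computed as a @ U,
    where U[k, j] = 1 if k <= j else 0 (upper-triangular ones)."""
    k = a.shape[1]
    upper = np.triu(np.ones((k, k), dtype=a.dtype))
    return a @ upper


def scan_vector(v, tile=16):
    """Inclusive scan of a 1-D vector split into fixed-width tiles (hypothetical tile size)."""
    n = v.shape[0]
    pad = (-n) % tile                                    # pad up to a multiple of the tile width
    t = np.concatenate([v, np.zeros(pad, dtype=v.dtype)]).reshape(-1, tile)
    local = scan_rows_via_matmul(t)                      # per-tile scan via one matrix product
    totals = local[:, -1]                                # sum of each tile
    # exclusive prefix sum of tile totals (np.cumsum here for brevity) = carry into each tile
    carry = np.concatenate([[0], np.cumsum(totals)[:-1]])
    return (local + carry[:, None]).reshape(-1)[:n]
```

The two-level scan_vector reflects how a fixed tile size forces long inputs to be split: each tile is scanned by one matrix product, then a running carry of tile totals is added back.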

[PDF] umich.edu

Domain-specific architectures: Research problems and promising approaches

A Krishnakumar, U Ogras, R Marculescu… - ACM Transactions on …, 2023 - dl.acm.org

Process technology-driven performance and energy efficiency improvements have slowed
down as we approach physical design limits. General-purpose manycore architectures …

Cited by 27 · All 2 versions

[PDF] arxiv.org

Optimizing tensor programs on flexible storage

M Schleich, A Shaikhha, D Suciu - … of the ACM on Management of Data, 2023 - dl.acm.org

Tensor programs often need to process large tensors (vectors, matrices, or higher order
tensors) that require a specialized storage format for their memory layout. Several such …

Cited by 28 · All 4 versions

[PDF] acm.org

Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies

B Hagedorn, J Lenfers, T Koehler, X Qin… - Proceedings of the …, 2020 - dl.acm.org

Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for
many applications. The predominantly used imperative languages, like C or OpenCL, force …

Cited by 63 · All 8 versions

Halide: Decoupling algorithms from schedules for high-performance image processing
