FPGA HLS today: successes, challenges, and opportunities

J Cong, J Lau, G Liu, S Neuendorffer, P Pan… - ACM Transactions on …, 2022 - dl.acm.org
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …

Pytorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …
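For reference, torch.compile is used as a one-line wrapper around an existing module; a minimal sketch, assuming a PyTorch 2.x installation (the toy model and tensor shapes below are illustrative, not from the paper):

```python
import torch

# A small model to demonstrate torch.compile (TorchDynamo + TorchInductor under the hood).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

# torch.compile captures Python bytecode via TorchDynamo and lowers the captured
# graph through TorchInductor; the returned module is a drop-in replacement.
compiled_model = torch.compile(model)

x = torch.randn(32, 64)
out = compiled_model(x)  # first call triggers compilation; later calls reuse the compiled graph
print(out.shape)
```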

Pathways: Asynchronous distributed dataflow for ml

P Barham, A Chowdhery, J Dean… - Proceedings of …, 2022 - proceedings.mlsys.org
We present the design of a new large scale orchestration layer for accelerators. Our system,
Pathways, is explicitly designed to enable exploration of new systems and ML research …

Tensorir: An abstraction for automatic tensorized program optimization

S Feng, B Hou, H Jin, W Lin, J Shao, R Lai… - Proceedings of the 28th …, 2023 - dl.acm.org
Deploying deep learning models on various devices has become an important topic. The
wave of hardware specialization brings a diverse set of acceleration primitives for multi …

A survey on deep learning hardware accelerators for heterogeneous hpc platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …

Challenges and opportunities to enable large-scale computing via heterogeneous chiplets

Z Yang, S Ji, X Chen, J Zhuang… - 2024 29th Asia and …, 2024 - ieeexplore.ieee.org
Fast-evolving artificial intelligence (AI) algorithms such as large language models have
been driving the ever-increasing computing demands in today's data centers …

Allo: A programming model for composable accelerator design

H Chen, N Zhang, S Xiang, Z Zeng, M Dai… - Proceedings of the ACM …, 2024 - dl.acm.org
Special-purpose hardware accelerators are increasingly pivotal for sustaining performance
improvements in emerging applications, especially as the benefits of technology scaling …

SecretFlow-SPU: A performant and user-friendly framework for privacy-preserving machine learning

J Ma, Y Zheng, J Feng, D Zhao, H Wu, W Fang… - 2023 USENIX Annual …, 2023 - usenix.org
With the increasing public attention to data security and privacy protection, privacy-
preserving machine learning (PPML) has become a research hotspot in recent years …

Apollo: Automatic partition-based operator fusion through layer by layer optimization

J Zhao, X Gao, R Xia, Z Zhang… - Proceedings of …, 2022 - proceedings.mlsys.org
We study fusion for deep neural networks (DNNs) in a just-in-time (JIT) compilation
framework Apollo. It considers both memory- and compute-bound tensor operators for fusion …

AKG: automatic kernel generation for neural processing units using polyhedral transformations

J Zhao, B Li, W Nie, Z Geng, R Zhang, X Gao… - Proceedings of the …, 2021 - dl.acm.org
Existing tensor compilers have proven their effectiveness in deploying deep neural networks
on general-purpose hardware like CPU and GPU, but optimizing for neural processing units …