Google Scholar

Efficiently emulating high-bitwidth computation with low-bitwidth hardware

Mad macce: Supporting multiply-add operations for democratizing matrix-multiplication accelerators

S Sung, S Hur, S Kim, D Ha, Y Oh, WW Ro - Proceedings of the 56th …, 2023 - dl.acm.org

Modern GPUs commonly employ specialized matrix multiplication units (MXUs) to accelerate matrix multiplication, the core computation of deep learning workloads. However …

Cited by 2 · All 5 versions

[PDF] github.io

MixPert: Optimizing Mixed-Precision Floating-Point Emulation on GPU Integer Tensor Cores

Z Lin, A Sun, X Zhang, Y Lu - Proceedings of the 25th ACM SIGPLAN …, 2024 - dl.acm.org

Featuring mixed-precision tensor operations, accelerators significantly enhance performance for many error-tolerant computing tasks, but their applicability is limited in …

Cited by 1 · All 4 versions

LE-GEMM: A lightweight emulation-based GEMM with precision refinement on GPU

Y Zhang, L Lu, Z Yang, Z Liang, S Suo - Journal of Systems Architecture, 2025 - Elsevier

Many special hardware units, such as Matrix Core and Tensor Core, have recently been designed and applied in various scientific computing scenarios. These units support tensor …

[PDF] ucr.edu

M³XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs

D Ha, Y Zhang, CC Kao, CJ Hughes… - … Conference for High …, 2024 - ieeexplore.ieee.org

Beyond the high-profile artificial intelligence and machine learning (AI/ML) workloads, the demand for high-performance matrix operations on standard and complex floating-point …

All 5 versions

[PDF] arxiv.org

Mixed-precision numerics in scientific applications: survey and perspectives

A Kashi, H Lu, W Brewer, D Rogers… - arXiv preprint arXiv …, 2024 - arxiv.org

The explosive demand for artificial intelligence (AI) workloads has led to a significant increase in silicon area dedicated to lower-precision computations on recent high …

All 2 versions · HTML version

[PDF] escholarship.org

[BOOK] Democratizing Tensor Processors: Efficient and Generalized Tensor Computation with Architectural Support

Y Zhang - 2024 - search.proquest.com

Tensor processors, notably matrix units (MXUs), have become indispensable in accelerating matrix operations for machine learning. However, their specialized design and limited …

All 3 versions
