- Academic Search

Advancing DSP into HPC, AI, and beyond: challenges, mechanisms, and future directions

Y Wang, C Li, C Liu, S Liu, Y Lei, J Zhang… - CCF Transactions on …, 2021 - Springer

Abstract Digital Signal Processors (DSPs) have been widely used in embedded domains,
delivering high performance with ultra-low power consumption. Such promises make it …

Cited by 24

Stash: Have your scratchpad and cache it too

R Komuravelli, MD Sinclair, J Alsop, M Huzaifa… - ACM SIGARCH …, 2015 - dl.acm.org

Heterogeneous systems employ specialization for energy efficiency. Since data movement
is expected to be a dominant consumer of energy, these systems employ specialized …

Cited by 104

ApproxHPVM: a portable compiler IR for accuracy-aware optimizations

H Sharif, P Srivastava, M Huzaifa… - Proceedings of the …, 2019 - dl.acm.org

We propose ApproxHPVM, a compiler IR and system designed to enable accuracy-aware
performance and energy tuning on heterogeneous systems with multiple compute units and …

Cited by 30

Toward cache-friendly hardware accelerators

YS Shao, S Xi, V Srinivasan, GY Wei… - HPCA Sensors and Cloud …, 2015 - samxi.org

Increasing demand for power-efficient, high-performance computing has spurred a growing
number and diversity of hardware accelerators in mobile Systems on Chip (SoCs) as well as …

Cited by 37

A novel DSP architecture for scientific computing and deep learning

C Yang, S Chen, J Zhang, Z Lv, Z Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Exascale computing requires accelerators with ultrahigh power efficiency. Digital signal
processors (DSPs), the most important embedded processors widely known for high power …

Cited by 20

WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization

NC Crago, S Damani, K Sankaralingam… - … Symposium on High …, 2024 - ieeexplore.ieee.org

Graphics processing units (GPUs) are an important class of parallel processors that offer
high compute throughput and memory bandwidth. GPUs are used in a variety of important …

Cited by 3

Coordinated DMA: improving the DRAM access efficiency for matrix multiplication

S Ma, Z Liu, S Chen, L Huang, Y Guo… - … on Parallel and …, 2019 - ieeexplore.ieee.org

High performance implementation of matrix multiplication is essential for scientific
computing. The memory access procedure is quite possible to be the bottleneck of matrix …

Cited by 13

An efficient direct memory access (DMA) controller for scientific computing accelerators

S Ma, L Huang, Y Lei, Y Guo… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org

We design an efficient DMA controller for scientific computing accelerators. It supports
several flexible and powerful transfers, including reshape transfers, parameter linking …

Cited by 10

ELF: Maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling

JJK Park, Y Park, S Mahlke - … of the International Conference for High …, 2015 - dl.acm.org

Graphics processing units (GPUs) are increasingly utilized as throughput engines in the
modern computer systems. GPUs rely on fast context switching between thousands of …

Cited by 15

CIAO: Cache interference-aware throughput-oriented architecture and scheduling for GPUs

J Zhang, S Gao, NS Kim, M Jung - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

A modern GPU aims to simultaneously execute more warps for higher Thread-Level
Parallelism (TLP) and performance. When generating many memory requests, however …

Cited by 11

D2MA: Accelerating coarse-grained data transfer for GPUs

Advancing DSP into HPC, AI, and beyond: challenges, mechanisms, and future directions