A survey of design and optimization for systolic array-based DNN accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, the systolic array has proven to be a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays has also encountered many …
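
To make the dataflow concrete, here is a minimal Python sketch of the weight-stationary systolic idea this survey covers: each processing element PE(i, j) holds one weight, activations stream across rows, and partial sums accumulate down columns. This is a functional model only (no cycle-level skewing), and all names in it are illustrative rather than taken from the paper.

    # Functional sketch of weight-stationary systolic matrix-vector multiply.
    # PE(i, j) holds W[i][j]; activation x[i] streams across row i, and each
    # column j accumulates a partial sum flowing downward.
    def systolic_matvec(x, W):
        rows, cols = len(W), len(W[0])
        psum = [0] * cols                  # one partial sum per column
        for i in range(rows):              # x[i] enters row i ...
            for j in range(cols):          # ... and is reused by every PE in it
                psum[j] += x[i] * W[i][j]  # PE(i, j): multiply-accumulate
        return psum                        # y[j] = sum_i x[i] * W[i][j]

    x = [1, 2, 3]
    W = [[1, 0], [0, 1], [1, 1]]
    print(systolic_matvec(x, W))           # [4, 5]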

Full stack optimization of Transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …
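
A toy example of what "analytical" modeling means here (the formula and numbers are illustrative assumptions, not Sparseloop's actual cost model): under independent random sparsity, a multiply is effectual only when both operands are nonzero, so expected work scales with the product of the operand densities.

    # First-order estimate of effectual MACs in an M x K x N matmul with
    # operand densities d_a and d_b (illustrative, not Sparseloop's model).
    def expected_macs(M, K, N, d_a, d_b):
        return M * K * N * d_a * d_b

    # A 30%-dense operand times a 50%-dense operand needs ~15% of dense work.
    print(expected_macs(128, 128, 128, 0.3, 0.5) / 128**3)  # 0.15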

S2TA: Exploiting structured sparsity for energy-efficient mobile CNN acceleration

ZG Liu, PN Whatmough, Y Zhu… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network
(CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit …
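
For intuition, the sketch below prunes weights to a density-bound block pattern, in the spirit of the structured sparsity such accelerators exploit; the block size and keep-count are hypothetical parameters, not necessarily S2TA's configuration.

    # Keep only the `keep` largest-magnitude weights in each block of size
    # `block`, zeroing the rest, so hardware can bound nonzeros per block.
    def prune_blocks(weights, block=8, keep=2):
        pruned = []
        for i in range(0, len(weights), block):
            blk = weights[i:i + block]
            threshold = sorted((abs(w) for w in blk), reverse=True)[keep - 1]
            pruned += [w if abs(w) >= threshold else 0.0 for w in blk]
        return pruned

    w = [0.9, -0.1, 0.05, -0.8, 0.2, 0.01, -0.3, 0.4]
    print(prune_blocks(w))  # [0.9, 0.0, 0.0, -0.8, 0.0, 0.0, 0.0, 0.0]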

Freely scalable and reconfigurable optical hardware for deep learning

L Bernstein, A Sludds, R Hamerly, V Sze, J Emer… - Scientific reports, 2021 - nature.com
As deep neural network (DNN) models grow ever larger, they can achieve higher accuracy
and solve more complex problems. This trend has been enabled by an increase in available …

LLMCompass: Enabling efficient hardware design for large language model inference

H Zhang, A Ning, RB Prabhakar… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …

Transform quantization for CNN compression

SI Young, W Zhe, D Taubman… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this paper, we compress convolutional neural network (CNN) weights post-training via
transform quantization. Previous CNN quantization techniques tend to ignore the joint …
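
The snippet below shows only the generic uniform post-training quantization step; the paper's contribution, jointly optimizing a decorrelating transform together with the quantizer, is omitted here, so treat this as background rather than the authors' method.

    # Uniform affine quantization of a weight vector to `bits` bits,
    # returning integer codes and their dequantized reconstruction.
    def quantize(weights, bits=8):
        lo, hi = min(weights), max(weights)
        scale = (hi - lo) / (2**bits - 1)
        codes = [round((w - lo) / scale) for w in weights]
        recon = [lo + c * scale for c in codes]
        return codes, recon

    codes, recon = quantize([-0.51, 0.02, 0.37, 1.24], bits=4)
    print(codes)                          # [0, 5, 8, 15]
    print([round(v, 3) for v in recon])   # [-0.51, 0.073, 0.423, 1.24]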

Automatic domain-specific SoC design for autonomous unmanned aerial vehicles

S Krishnan, Z Wan, K Bhardwaj… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Building domain-specific accelerators is becoming increasingly important for meeting high-
performance requirements under stringent power and real-time constraints. However …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
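
A back-of-envelope accounting (assumed sizes, not TileFlow's model) of why fusion helps: an unfused schedule round-trips the intermediate tensor through DRAM, while a fused schedule keeps each intermediate tile on chip, so the saving is roughly one write plus one read of that tensor.

    # Off-chip traffic attributable to the intermediate tensor between two
    # layers; input/output traffic is the same under both schedules.
    def intermediate_traffic(intermediate_bytes, fused):
        return 0 if fused else 2 * intermediate_bytes  # write + read back

    inter = 4 * 1024 * 1024  # assume a 4 MiB intermediate activation tensor
    saved = intermediate_traffic(inter, False) - intermediate_traffic(inter, True)
    print(saved // (1024 * 1024), "MiB of DRAM traffic avoided by fusing")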

Technology prospects for data-intensive computing

K Akarvardar, HSP Wong - Proceedings of the IEEE, 2023 - ieeexplore.ieee.org
For many decades, progress in computing hardware has been closely associated with
CMOS logic density, performance, and cost. As such, slowdown in 2-D scaling, frequency …