A survey of design and optimization for systolic array-based DNN accelerators
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …
Full stack optimization of transformer inference: a survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
Sparseloop: An analytical approach to sparse tensor accelerator modeling
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …
S2TA: Exploiting structured sparsity for energy-efficient mobile CNN acceleration
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network
(CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit …
Freely scalable and reconfigurable optical hardware for deep learning
As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy
and solve more complex problems. This trend has been enabled by an increase in available …
LLMCompass: Enabling efficient hardware design for large language model inference
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …
Transform quantization for CNN compression
In this paper, we compress convolutional neural network (CNN) weights post-training via
transform quantization. Previous CNN quantization techniques tend to ignore the joint …
Automatic domain-specific SoC design for autonomous unmanned aerial vehicles
Building domain-specific accelerators is becoming increasingly paramount to meet the high-
performance requirements under stringent power and real-time constraints. However …
TileFlow: A framework for modeling fusion dataflow via tree-based analysis
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
Technology prospects for data-intensive computing
K Akarvardar, HSP Wong - Proceedings of the IEEE, 2023 - ieeexplore.ieee.org
For many decades, progress in computing hardware has been closely associated with
CMOS logic density, performance, and cost. As such, slowdown in 2-D scaling, frequency …