Hardware accelerator design for sparse DNN inference and training: A tutorial
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence-generated content (AIGC) and robotics. To efficiently support these tasks, the model pruning …
Optimizing memory access efficiency in CUDA kernel via data layout technique
Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-
performance computing, playing pivotal roles in advancing fields like IoT, autonomous …
Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link
Deep learning (DL) models are becoming bigger, easily exceeding the memory capacity of
a single accelerator. The recent progress in large DL training utilizes CPU memory as an …
Torch2Chip: An end-to-end customizable deep neural network compression and deployment toolkit for prototype hardware accelerator design
Deep neural network (DNN) compression (eg, quantization, pruning) has been widely
investigated in various deep learning tasks (e.g., vision and language). The development of …
Fusemax: Leveraging extended einsums to optimize attention accelerator design
Attention for transformers is a critical workload that has recently received significant
'attention' as a target for custom acceleration. Yet, while prior work succeeds in reducing …
Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications
Accelerating matrix multiplication is crucial to achieve high performance in many application
domains, including neural networks, graph analytics, and scientific computing. These …
BBS: Bi-directional bit-level sparsity for deep learning acceleration
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable
within bit-serial deep learning accelerators. This type of sparsity at the bit-level is especially …
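The idea the snippet above describes can be sketched in a few lines: in a bit-serial multiply, only the nonzero bits of an operand contribute shifted adds, so zero bits are "ineffectual" work that a sparsity-aware accelerator can skip. The function below is a minimal software sketch of this principle (the name `bit_serial_mul` is illustrative, not from the paper):

```python
def bit_serial_mul(weight: int, activation: int) -> int:
    """Multiply by iterating only over the set bits of `weight`.

    Each set bit at position `pos` contributes `activation << pos`;
    zero bits contribute nothing, which is the work a bit-level
    sparsity accelerator avoids entirely.
    """
    acc = 0
    pos = 0
    w = weight
    while w:
        if w & 1:  # effectual (nonzero) bit: perform a shifted add
            acc += activation << pos
        # an ineffectual zero bit falls through with no arithmetic
        w >>= 1
        pos += 1
    return acc


# Example: 13 = 0b1101 has three set bits, so only three shifted
# adds are needed instead of four bit-serial cycles.
print(bit_serial_mul(13, 7))  # → 91
```

In hardware, the savings come from encoding only the nonzero bit positions so the zero-bit cycles never occur at all; this sketch only models the arithmetic, not the cycle-level scheduling.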
Towards cognitive ai systems: Workload and characterization of neuro-symbolic ai
The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural
networks, are facing challenges surrounding unsustainable computational trajectories …
SOFA: A compute-memory optimized sparsity accelerator via cross-stage coordinated tiling
H Wang, J Fang, X Tang, Z Yue, J Li… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Benefiting from the self-attention mechanism, Transformer models have attained impressive
contextual comprehension capabilities for lengthy texts. The requirements of high …
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Large language models (LLMs) have demonstrated remarkable performance across various
machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders …