Hardware accelerator design for sparse DNN inference and training: A tutorial

W Mao, M Wang, X Xie, X Wu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence-
generated content (AIGC) and robotics. To efficiently support these tasks, the model pruning …

Optimizing memory access efficiency in CUDA kernel via data layout technique

N Seifi, A Al-Mamun - Journal of Computer and Communications, 2024 - scirp.org
Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-
performance computing, playing pivotal roles in advancing fields like IoT, autonomous …

Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link

D Xu, Y Feng, K Shin, D Kim, H Jeon… - … Conference for High …, 2024 - ieeexplore.ieee.org
Deep learning (DL) models are becoming bigger, easily exceeding the memory capacity of
a single accelerator. Recent progress in large DL training utilizes CPU memory as an …

Torch2Chip: An end-to-end customizable deep neural network compression and deployment toolkit for prototype hardware accelerator design

J Meng, Y Liao, A Anupreetham… - Proceedings of …, 2024 - proceedings.mlsys.org
Deep neural network (DNN) compression (e.g., quantization, pruning) has been widely
investigated in various deep learning tasks (e.g., vision and language). The development of …

Fusemax: Leveraging extended einsums to optimize attention accelerator design

N Nayak, X Wu, TO Odemuyiwa… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Attention for transformers is a critical workload that has recently received significant
'attention' as a target for custom acceleration. Yet, while prior work succeeds in reducing …

Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications

Y Yang, JS Emer, D Sanchez - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
Accelerating matrix multiplication is crucial to achieve high performance in many application
domains, including neural networks, graph analytics, and scientific computing. These …

BBS: Bi-directional bit-level sparsity for deep learning acceleration

Y Chen, J Meng, J Seo… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable
within bit-serial deep learning accelerators. This type of sparsity at the bit-level is especially …

Towards cognitive ai systems: Workload and characterization of neuro-symbolic ai

Z Wan, CK Liu, H Yang, R Raj, C Li… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural
networks, are facing challenges surrounding unsustainable computational trajectories …

SOFA: A compute-memory optimized sparsity accelerator via cross-stage coordinated tiling

H Wang, J Fang, X Tang, Z Yue, J Li… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Benefiting from the self-attention mechanism, Transformer models have attained impressive
contextual comprehension capabilities for lengthy texts. The requirements of high …

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration

Y Chen, AF AbouElhamayed, X Dai, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across various
machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders …