A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Hardware acceleration of LLMs: A comprehensive survey and comparison

N Koilia, C Kachris - arXiv preprint arXiv:2409.03384, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as powerful tools for natural language
processing tasks, revolutionizing the field with their ability to understand and generate …

A survey on neural network hardware accelerators

T Mohaidat, K Khalil - IEEE Transactions on Artificial …, 2024 - ieeexplore.ieee.org
Artificial intelligence hardware accelerators are an emerging research area across several applications
and domains. The aim of hardware accelerators is to provide high computational speed …

TransFusionNet: Semantic and spatial features fusion framework for liver tumor and vessel segmentation under JetsonTX2

X Wang, X Zhang, G Wang, Y Zhang… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Liver cancer is one of the most common malignant diseases worldwide. Segmentation and
reconstruction of liver tumors and vessels in CT images can provide convenience for …

A 109-GOPS/W FPGA-based vision transformer accelerator with weight-loop dataflow featuring data reusing and resource saving

Y Zhang, L Feng, H Shan, Z Zhu - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The Vision Transformer (ViT) models have demonstrated excellent performance in computer
vision tasks, but a large amount of computation and memory access for massive matrix …

A high-performance pixel-level fully pipelined hardware accelerator for neural networks

Z Li, Z Zhang, J Hu, Q Meng, X Shi… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
The design of convolutional neural network (CNN) hardware accelerators based on a single
computing engine (CE) architecture or multi-CE architecture has received widespread …

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

Accelerating transformer neural networks on FPGAs for high energy physics experiments

F Wojcicki, Z Que, AD Tapper… - … Conference on Field …, 2022 - ieeexplore.ieee.org
High Energy Physics studies the fundamental forces and elementary particles of the
Universe. With the unprecedented scale of experiments comes the challenge of accurate …

COSA: Co-Operative Systolic Arrays for Multi-head Attention Mechanism in Neural Network using Hybrid Data Reuse and Fusion Methodologies

Z Wang, G Wang, H Jiang, N Xu… - 2023 60th ACM/IEEE …, 2023 - ieeexplore.ieee.org
Attention mechanism acceleration is becoming increasingly vital to achieve superior
performance in deep learning tasks. Existing accelerators are commonly devised …

SIMD Dataflow Co-optimization for Efficient Neural Networks Inferences on CPUs

C Zhou, Z Hassman, R Xu, D Shah, V Richard… - arXiv preprint arXiv …, 2023 - arxiv.org
We address the challenges associated with deploying neural networks on CPUs, with a
particular focus on minimizing inference time while maintaining accuracy. Our novel …