A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

LAMP-Q: Layer Sensitivity-Aware Mixed-Precision Quantization for MobileNetV3

S Yoon, N Kim, H Kim - 2025 International Conference on …, 2025 - ieeexplore.ieee.org
Quantization is an effective technique for reducing memory usage and power consumption
in deep neural networks (DNNs) by decreasing parameter size. However, conventional …
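The snippet above describes quantization as shrinking parameter size to cut memory and power. As a minimal illustration of the basic idea (plain symmetric per-tensor int8 quantization, not the layer-sensitivity-aware mixed-precision scheme LAMP-Q proposes), weights can be mapped to 8-bit integers plus one floating-point scale:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map max |w| to 127."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# 4x4 float32 weight tensor: 64 bytes; its int8 version: 16 bytes + one scale.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step.
max_err = float(np.abs(w - w_hat).max())
```

Mixed-precision approaches like the one in the cited paper go further by assigning different bit widths per layer according to each layer's sensitivity; the uniform int8 scheme here is only the simplest baseline.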