A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint, 2024 - arxiv.org

Jumping through local minima: Quantization in the loss landscape of vision transformers

N Frumkin, D Gope… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantization scale and bit-width are the most important parameters when considering how
to quantize a neural network. Prior work focuses on optimizing quantization scales in a …

SSR: Spatial sequential hybrid architecture for latency throughput tradeoff in transformer acceleration

J Zhuang, Z Yang, S Ji, H Huang, AK Jones… - Proceedings of the …, 2024 - dl.acm.org
With the increase in the computation intensity of the chip, the mismatch between
computation layer shapes and the available computation resource significantly limits the …

ViTA: A vision transformer inference accelerator for edge applications

S Nag, G Datta, S Kundu… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer,
have recently gained significant traction in computer vision tasks due to their ability to …

Lightening-transformer: A dynamically-operated optically-interconnected photonic transformer accelerator

H Zhu, J Gu, H Wang, Z Jiang, Z Zhang… - … Symposium on High …, 2024 - ieeexplore.ieee.org
The wide adoption and significant computing resource cost of attention-based transformers,
e.g., Vision Transformers and large language models, have driven the demand for efficient …