A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic in the backdrop of improvement
slow down for general-purpose processors due to the foreseeable end of Moore's Law …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Z Yuan, C Xue, Y Chen, Q Wu, G Sun - European conference on computer …, 2022 - Springer
Quantization is one of the most effective methods to compress neural networks, which has
achieved great success on convolutional neural networks (CNNs). Recently, vision …

LUT-GEMM: Quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models

G Park, B Park, M Kim, S Lee, J Kim, B Kwon… - arXiv preprint arXiv …, 2022 - arxiv.org
The recent advancements in self-supervised learning, combined with the Transformer
architecture, have enabled natural language processing (NLP) to achieve remarkably low …

CLIP-Q: Deep network compression learning by in-parallel pruning-quantization

F Tung, G Mori - Proceedings of the IEEE conference on …, 2018 - openaccess.thecvf.com
Deep neural networks enable state-of-the-art accuracy on visual recognition tasks such as
image classification and object detection. However, modern deep networks contain millions …

Review of lightweight deep convolutional neural networks

F Chen, S Li, J Han, F Ren, Z Yang - Archives of Computational Methods …, 2024 - Springer
Lightweight deep convolutional neural networks (LDCNNs) are vital components of mobile
intelligence, particularly in mobile vision. Although various heavy networks with increasingly …

A survey on methods and theories of quantized neural networks

Y Guo - arXiv preprint arXiv:1808.04752, 2018 - arxiv.org
Deep neural networks are the state-of-the-art methods for many real-world tasks, such as
computer vision, natural language processing and speech recognition. For all its popularity …

Compression of deep learning models for text: A survey

M Gupta, P Agrawal - ACM Transactions on Knowledge Discovery from …, 2022 - dl.acm.org
In recent years, the fields of natural language processing (NLP) and information retrieval (IR)
have made tremendous progress thanks to deep learning models like Recurrent Neural …

Structured binary neural networks for accurate image classification and semantic segmentation

B Zhuang, C Shen, M Tan, L Liu… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we propose to train convolutional neural networks (CNNs) with both binarized
weights and activations, leading to quantized models specifically for mobile devices with …