Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Annealing knowledge distillation

A Jafari, M Rezagholizadeh, P Sharma… - arXiv preprint arXiv …, 2021 - arxiv.org
Significant memory and computational requirements of large deep neural networks restrict
their application on edge devices. Knowledge distillation (KD) is a prominent model …
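
The snippet introduces knowledge distillation (KD) as a compression technique; the paper's contribution is an annealed teacher signal. Below is a minimal sketch of the standard Hinton-style soft-target KD loss; the annealing schedule itself is not reproduced, and `tau` and `alpha` are illustrative parameters, not the paper's values:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, tau=4.0, alpha=0.5):
    """Standard soft-target distillation loss. Annealing KD would vary
    `tau` (or the teacher signal) over training; a fixed temperature
    is used here for illustration."""
    # Soften both distributions with temperature tau.
    soft_teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_soft_student = F.log_softmax(student_logits / tau, dim=-1)
    # The KL term is scaled by tau^2 to keep gradient magnitudes comparable.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (tau ** 2)
    # Ordinary cross-entropy on the hard labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard
```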

One-shot model for mixed-precision quantization

I Koryakovskiy, A Yakovleva… - Proceedings of the …, 2023 - openaccess.thecvf.com
Neural network quantization is a popular approach for model compression. Modern
hardware supports quantization in mixed-precision mode, which allows for greater …
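
Mixed-precision quantization assigns each layer its own bit-width instead of one network-wide precision. A minimal sketch of a symmetric uniform quantizer driven by a per-layer bit plan follows; the one-shot search the paper uses to pick the assignment is not reproduced, and the layer-to-bits mapping below is a made-up example:

```python
import torch

def quantize_uniform(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization to `bits` bits, then dequantize.
    Mixed precision means calling this with a different `bits` per
    layer (e.g. 8 for sensitive layers, 4 elsewhere)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized ("fake-quantized") tensor

# Hypothetical per-layer bit-width assignment for a toy model.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Linear(32, 4))
bit_plan = {0: 8, 1: 4}  # layer index -> bit-width
with torch.no_grad():
    for idx, layer in enumerate(model):
        layer.weight.copy_(quantize_uniform(layer.weight, bit_plan[idx]))
```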

Autofreeze: Automatically freezing model blocks to accelerate fine-tuning

Y Liu, S Agarwal, S Venkataraman - arXiv preprint arXiv:2102.01386, 2021 - arxiv.org
With the rapid adoption of machine learning (ML), many domains now fine-tune models
that were pre-trained on a large corpus of data. However …
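
The speedup from freezing comes from skipping backward computation and optimizer updates for the frozen blocks. A minimal sketch that statically freezes the first N layers of a transformer encoder; AutoFreeze's actual contribution, choosing N adaptively during training, is not attempted here:

```python
import torch.nn as nn

def freeze_first_blocks(blocks: nn.ModuleList, n_frozen: int) -> None:
    """Disable gradients for the first `n_frozen` blocks so the optimizer
    skips them and they cost no backward compute. AutoFreeze chooses
    `n_frozen` adaptively; a fixed value is used here for illustration."""
    for block in blocks[:n_frozen]:
        for p in block.parameters():
            p.requires_grad = False
        block.eval()  # also fix dropout behavior in frozen blocks

# Hypothetical 6-layer encoder; freeze the bottom 4 blocks.
encoder = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4) for _ in range(6)
)
freeze_first_blocks(encoder, n_frozen=4)
trainable = [p for p in encoder.parameters() if p.requires_grad]
```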

4-bit conformer with native quantization aware training for speech recognition

S Ding, P Meadowlark, Y He, L Lew, S Agrawal… - arXiv preprint arXiv …, 2022 - arxiv.org
Reducing latency and model size has long been a significant research problem for
live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model …
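
Quantization-aware training (QAT) runs the forward pass through a quantize-dequantize step while gradients update the full-precision weights, usually via a straight-through estimator. A minimal 4-bit fake-quantization sketch; the paper's native QAT pipeline and Conformer specifics are not reproduced:

```python
import torch

def fake_quant_ste(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Forward: symmetric `bits`-bit quantize-dequantize.
    Backward: straight-through estimator (the gradient passes
    through unchanged), via the detach trick."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # value of w_q, gradient of w

# Toy QAT step: quantized weights in the forward, FP32 update after.
w = torch.randn(8, 8, requires_grad=True)
x = torch.randn(4, 8)
y = x @ fake_quant_ste(w, bits=4).t()
y.sum().backward()  # w.grad is defined thanks to the STE
```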

USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models

S Ding, D Qiu, D Rim, Y He, O Rybakov… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
End-to-end automatic speech recognition (ASR) models have seen revolutionary quality
gains with the recent development of large-scale universal speech models (USM). However …
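
Sparsity-aware fine-tuning typically applies a binary pruning mask during the forward pass so the network learns to compensate for the zeroed weights. A minimal magnitude-pruning sketch, showing only the sparsity half of the paper's quantization-plus-sparsity recipe, with an assumed 50% sparsity level:

```python
import torch

def magnitude_mask(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude (1 - sparsity)
    fraction of weights; the rest are zeroed each forward pass."""
    k = int(w.numel() * sparsity)
    if k == 0:
        return torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values
    return (w.abs() > threshold).float()

# Hypothetical sparsity-aware forward: mask applied on the fly.
w = torch.randn(32, 32, requires_grad=True)
mask = magnitude_mask(w.detach(), sparsity=0.5)
x = torch.randn(8, 32)
y = x @ (w * mask).t()  # gradients reach only the surviving weights
```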

Integer-only zero-shot quantization for efficient speech recognition

S Kim, A Gholami, Z Yao, N Lee, P Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
End-to-end neural network models achieve improved performance on various automatic
speech recognition (ASR) tasks. However, these models perform poorly on edge hardware …
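
Integer-only inference means the matrix multiplies themselves run in integer arithmetic, with a single rescale at the output (real deployments fold this into a fixed-point multiplier). A minimal int8 sketch follows; the zero-shot aspect, calibrating without real data, is only gestured at with synthetic Gaussian inputs, which is an assumption rather than the paper's method:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-8)  # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

# Zero-shot flavor: calibrate on synthetic Gaussian data instead of real
# speech features (an assumption, not the paper's actual method).
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16)).astype(np.float32)
x = rng.standard_normal((4, 16)).astype(np.float32)

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)
# Integer matmul accumulated in int32, as integer-only hardware would do.
acc = qx.astype(np.int32) @ qw.astype(np.int32).T
y = acc.astype(np.float32) * (sx * sw)  # one rescale at the output
```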

Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution

Z Yu, S Li, L Sun, L Liu, W Haining - Connection Science, 2022 - Taylor & Francis
With the development of deep learning, neural networks are widely used in various fields,
and improvements in model performance come with a considerable number of parameters …

Transformer-based Arabic dialect identification

W Lin, M Madhavi, RK Das, H Li - … International Conference on …, 2020 - ieeexplore.ieee.org
This paper presents a dialect identification (DID) system based on the transformer neural
network architecture. The conventional convolutional neural network (CNN)-based systems …
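
As a rough illustration of what a transformer-based DID system looks like, here is a minimal encoder-plus-pooling classifier over acoustic features; all dimensions, layer counts, and the mean-pooling choice are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class TransformerDID(nn.Module):
    """Minimal transformer-encoder dialect classifier: encode the frame
    sequence, mean-pool over time, classify. Sizes are illustrative."""
    def __init__(self, feat_dim=80, d_model=128, n_dialects=5):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_dialects)

    def forward(self, feats):            # feats: (batch, frames, feat_dim)
        h = self.encoder(self.proj(feats))
        return self.head(h.mean(dim=1))  # pool over time, then classify

logits = TransformerDID()(torch.randn(2, 100, 80))  # shape (2, 5)
```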