FunASR: A fundamental end-to-end speech recognition toolkit

Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces FunASR, an open-source speech recognition toolkit designed to
bridge the gap between academic research and industrial applications. FunASR offers …

Accelerating inference for pretrained language models by unified multi-perspective early exiting

J Kong, J Wang, LC Yu, X Zhang - Proceedings of the 29th …, 2022 - aclanthology.org
Conditional computation algorithms, such as the early exiting (EE) algorithm, can be applied
to accelerate the inference of pretrained language models (PLMs) while maintaining …

Multimodality self-distillation for fast inference of vision and language pretrained models

J Kong, J Wang, LC Yu, X Zhang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The computational cost of the vision and language pretrained models (VL-PTMs) limits their
deployment in resource-constrained devices that require low latency. One existing solution …

Omni-sparsity DNN: Fast sparsity optimization for on-device streaming E2E ASR via supernet

H Yang, Y Shangguan, D Wang, M Li… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
From wearables to powerful smart devices, modern automatic speech recognition (ASR)
models run on a variety of edge devices with different computational budgets. To navigate …

Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning

S Tian, K Deng, Z Li, L Ye, G Cheng, T Li, Y Yan - Interspeech, 2022 - isca-archive.org
Recently, end-to-end ASR models based on connectionist temporal classification (CTC)
have achieved impressive results, but their performance is limited in lightweight models …

ResidualTransformer: Residual low-rank learning with weight-sharing for transformer layers

Y Wang, J Li - … 2024-2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
Memory constraint of always-on devices is one of the major concerns when deploying
speech processing models on these devices. While larger models trained with sufficiently …

Distilling multi-level x-vector knowledge for small-footprint speaker verification

X Liu, M Sahidullah, T Kinnunen - arXiv preprint arXiv:2303.01125, 2023 - arxiv.org
Even though deep speaker models have demonstrated impressive accuracy in speaker
verification tasks, this often comes at the expense of increased model size and computation …

Dynamic ASR pathways: An adaptive masking approach towards efficient pruning of a multilingual ASR model

J Xie, K Li, J Guo, A Tjandra… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Neural network pruning offers an effective method for compressing a multilingual automatic
speech recognition (ASR) model with minimal performance loss. However, it entails several …

Adaptive Ensemble Self-Distillation With Consistent Gradients for Fast Inference of Pretrained Language Models

J Kong, J Wang, X Zhang - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Conditional computation algorithms, e.g., the early exiting (EE) strategy, can accelerate the
inference of pretrained language models (PLMs) by exiting shallow layers without …

Factorized and progressive knowledge distillation for CTC-based ASR models

S Tian, Z Li, Z Lyv, G Cheng, Q Xiao, T Li, Q Zhao - Speech Communication, 2024 - Elsevier
Knowledge distillation (KD) is a popular model compression method to improve the
performance of lightweight models by transferring knowledge from a teacher model to a …