Contextual adapters for personalized speech recognition in neural transducers

KM Sathyendra, T Muniyappa, FJ Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …

NAM+: Towards scalable end-to-end contextual biasing for adaptive ASR

T Munkhdalai, Z Wu, G Pundak, KC Sim… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Attention-based biasing techniques for end-to-end ASR systems are able to achieve large
accuracy gains without requiring the inference algorithm adjustments and parameter tuning …

Improving hybrid CTC/attention architecture for agglutinative language speech recognition

Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com
Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …

Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …

MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder

Q Zheng, Z Chen, Z Wang, H Liu, M Lin - Expert Systems with Applications, 2024 - Elsevier
Transformer models have demonstrated superior performance across various domains,
including computer vision, natural language processing, and speech recognition. The …

Dual-mode NAM: Effective top-k context injection for end-to-end ASR

Z Wu, T Munkhdalai, P Rondon, G Pundak… - Proc …, 2023 - isca-archive.org
ASR systems in real applications must be adapted on the fly to correctly recognize
task-specific contextual terms, such as contacts, application names and media entities. However …

Improving contextual biasing with text injection

TN Sainath, R Prabhavalkar, D Caseiro… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this work, we present a model-based approach to improving contextual biasing that
improves quality without drastically increasing model computation during inference …

Adaptive contextual biasing for transducer based streaming speech recognition

T Xu, Z Yang, K Huang, P Guo, A Zhang, B Li… - arXiv preprint arXiv …, 2023 - arxiv.org
By incorporating additional contextual information, deep biasing methods have emerged as
a promising solution for speech recognition of personalized words. However, for real-world …

Robust acoustic and semantic contextual biasing in neural transducers for speech recognition

X Fu, KM Sathyendra, A Gandhe, J Liu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …

ConvRNN-T: Convolutional augmented recurrent neural network transducers for streaming speech recognition

M Radfar, R Barnwal, RV Swaminathan… - arXiv preprint arXiv …, 2022 - arxiv.org
The recurrent neural network transducer (RNN-T) is a prominent streaming end-to-end
(E2E) ASR technology. In RNN-T, the acoustic encoder commonly consists of stacks of …