Contextual adapters for personalized speech recognition in neural transducers
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …
NAM+: Towards scalable end-to-end contextual biasing for adaptive ASR
Attention-based biasing techniques for end-to-end ASR systems are able to achieve large
accuracy gains without requiring the inference algorithm adjustments and parameter tuning …
Improving hybrid CTC/attention architecture for agglutinative language speech recognition
Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com
Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …
Contextualized end-to-end speech recognition with contextual phrase prediction network
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …
MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder
Transformer models have demonstrated superior performance across various domains,
including computer vision, natural language processing, and speech recognition. The …
Dual-mode NAM: Effective top-k context injection for end-to-end ASR
ASR systems in real applications must be adapted on the fly to correctly recognize task-
specific contextual terms, such as contacts, application names and media entities. However …
Improving contextual biasing with text injection
In this work, we present a model-based approach to improving contextual biasing that
improves quality without drastically increasing model computation during inference …
Adaptive contextual biasing for transducer based streaming speech recognition
By incorporating additional contextual information, deep biasing methods have emerged as
a promising solution for speech recognition of personalized words. However, for real-world …
Robust acoustic and semantic contextual biasing in neural transducers for speech recognition
Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …
ConvRNN-T: Convolutional augmented recurrent neural network transducers for streaming speech recognition
The recurrent neural network transducer (RNN-T) is a prominent streaming end-to-end
(E2E) ASR technology. In RNN-T, the acoustic encoder commonly consists of stacks of …