Low-rank adaptation of large language model rescoring for parameter-efficient speech recognition
We propose a neural language modeling system based on low-rank adaptation (LoRA) for
speech recognition output rescoring. Although pretrained language models (LMs) like BERT …
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
Existing research suggests that automatic speech recognition (ASR) models can benefit
from additional contexts (e.g., contact lists, user-specified vocabulary). Rare words and …
Contextualized end-to-end automatic speech recognition with intermediate biasing loss
Contextualized end-to-end automatic speech recognition has been an active research area,
with recent efforts focusing on the implicit learning of contextual phrases based on the final …
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
Contextual biasing enables speech recognizers to transcribe important phrases in the
speaker's context, such as contact names, even if they are rare in, or absent from, the …
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
In speech recognition applications, it is important to recognize context-specific rare words,
such as proper nouns. Tree-constrained Pointer Generator (TCPGen) has shown promise …
An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition
End-to-end (E2E) automatic speech recognition (ASR) models have become standard
practice for various commercial applications. However, in real-world scenarios, the long …
Locality enhanced dynamic biasing and sampling strategies for contextual ASR
Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant
rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually …
Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition
This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike
conventional Transducers where the decoder embeddings for different tokens are trained …
Promptformer: Prompted conformer transducer for ASR
S Duarte-Torres, A Sen, A Rana… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Context cues carry information which can improve multi-turn interactions in automatic
speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired …
An Effective Contextualized Automatic Speech Recognition Approach Leveraging Self-Supervised Phoneme Features
Years of scholarly efforts have led to extensive studies on end-to-end automatic speech
recognition (E2E ASR), now demonstrating robust performance in everyday applications …