Low-rank adaptation of large language model rescoring for parameter-efficient speech recognition

Y Yu, CHH Yang, J Kolehmainen… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We propose a neural language modeling system based on low-rank adaptation (LoRA) for
speech recognition output rescoring. Although pretrained language models (LMs) like BERT …

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation

R Huang, M Yarmohammadi, S Khudanpur… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing research suggests that automatic speech recognition (ASR) models can benefit
from additional contexts (e.g., contact lists, user-specified vocabulary). Rare words and …

Contextualized end-to-end automatic speech recognition with intermediate biasing loss

M Shakeel, Y Sudo, Y Peng, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Contextualized end-to-end automatic speech recognition has been an active research area,
with recent efforts focusing on the implicit learning of contextual phrases based on the final …

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Z Wu, G Song, C Li, P Rondon, Z Meng, X Velez… - arXiv preprint arXiv …, 2024 - arxiv.org
Contextual biasing enables speech recognizers to transcribe important phrases in the
speaker's context, such as contact names, even if they are rare in, or absent from, the …

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

H Futami, E Tsunoo, Y Kashiwagi… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In speech recognition applications, it is important to recognize context-specific rare words,
such as proper nouns. Tree-constrained Pointer Generator (TCPGen) has shown promise …

An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition

YC Wang, LT Pai, BC Yan, HW Wang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) models have become standard
practice for various commercial applications. However, in real-world scenarios, the long …

Locality enhanced dynamic biasing and sampling strategies for contextual ASR

MA Jalal, PP Parada, G Pavlidis… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant
rare phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually …

Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition

H Xu, Z Chen, F Jia, B Ginsburg - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike
conventional Transducers where the decoder embeddings for different tokens are trained …

Promptformer: Prompted conformer transducer for ASR

S Duarte-Torres, A Sen, A Rana… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Context cues carry information which can improve multi-turn interactions in automatic
speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired …

An Effective Contextualized Automatic Speech Recognition Approach Leveraging Self-Supervised Phoneme Features

LT Pai, YC Wang, BC Yan, HW Wang… - 2024 Asia Pacific …, 2024 - ieeexplore.ieee.org
Years of scholarly efforts have led to extensive studies on end-to-end automatic speech
recognition (E2E ASR), now demonstrating robust performance in everyday applications …