Scaling laws for discriminative speech recognition rescoring models

Y Gu, PG Shivakumar, J Kolehmainen… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies have found that model performance has a smooth power-law relationship, or
scaling laws, with training data and model size, for a wide range of problems. These scaling …

Transformer-based Model for ASR N-Best Rescoring and Rewriting

IE Kang, C Van Gysel, MH Siu - arXiv preprint arXiv:2406.08207, 2024 - arxiv.org
Voice assistants increasingly use on-device Automatic Speech Recognition (ASR) to ensure
speed and privacy. However, due to resource constraints on the device, queries pertaining …

EEL: Efficiently encoding lattices for reranking

P Singhal, J Xu, X Ye, G Durrett - arXiv preprint arXiv:2306.00947, 2023 - arxiv.org
Standard decoding approaches for conditional text generation tasks typically search for an
output hypothesis with high model probability, but this may not yield the best hypothesis …

Personalization for BERT-based discriminative speech recognition rescoring

J Kolehmainen, Y Gu, A Gourav… - arXiv preprint arXiv …, 2023 - arxiv.org
Recognition of personalized content remains a challenge in end-to-end speech recognition.
We explore three novel approaches that use personalized content in a neural rescoring step …

Leveraging Cross-Utterance Context For ASR Decoding

R Flynn, A Ragni - arXiv preprint arXiv:2306.16903, 2023 - arxiv.org
While external language models (LMs) are often incorporated into the decoding stage of
automated speech recognition systems, these models usually operate with limited context …

[HTML] RNN-T lattice enhancement by grafting of pruned paths

M Novak, P Papadopoulos - 2022 - amazon.science
Recurrent Neural Network Transducers (RNN-T), a streaming variant of end-to-end
models, became very popular in recent years. Since RNN-T networks condition the …