Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Modeling spoken information queries for virtual assistants: Open problems, challenges and opportunities

C Van Gysel - Proceedings of the 46th International ACM SIGIR …, 2023 - dl.acm.org
Virtual assistants are becoming increasingly important speech-driven Information Retrieval
platforms that assist users with various tasks. We discuss open problems and challenges …

Internal language model training for domain-adaptive end-to-end speech recognition

Z Meng, N Kanda, Y Gaur… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …

Bayesian neural network language modeling for speech recognition

B Xue, S Hu, J Xu, M Geng, X Liu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
State-of-the-art neural network language models (NNLMs) represented by long short-term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …

Semantic distance: A new metric for ASR performance analysis towards spoken language understanding

S Kim, A Arora, D Le, CF Yeh, C Fuegen… - arXiv preprint arXiv …, 2021 - arxiv.org
Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …

Adaptive contextual biasing for transducer based streaming speech recognition

T Xu, Z Yang, K Huang, P Guo, A Zhang, B Li… - arXiv preprint arXiv …, 2023 - arxiv.org
By incorporating additional contextual information, deep biasing methods have emerged as
a promising solution for speech recognition of personalized words. However, for real-world …

Dissecting user-perceived latency of on-device E2E speech recognition

Y Shangguan, R Prabhavalkar, H Su… - arXiv preprint arXiv …, 2021 - arxiv.org
As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …

Deep fusion framework for speech command recognition using acoustic and linguistic features

S Mehra, S Susan - Multimedia Tools and Applications, 2023 - Springer
The research problem addressed in this study is how to effectively combine multimodal data
from imperfect text transcripts and raw audio in a deep framework for automatic speech …