Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …
Modeling spoken information queries for virtual assistants: Open problems, challenges and opportunities
C Van Gysel - Proceedings of the 46th International ACM SIGIR …, 2023 - dl.acm.org
Virtual assistants are becoming increasingly important speech-driven Information Retrieval
platforms that assist users with various tasks. We discuss open problems and challenges …
Internal language model training for domain-adaptive end-to-end speech recognition
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …
Bayesian neural network language modeling for speech recognition
State-of-the-art neural network language models (NNLMs) represented by long short term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …
Tree-constrained pointer generator for end-to-end contextual speech recognition
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …
Contextualized end-to-end speech recognition with contextual phrase prediction network
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …
Semantic distance: A new metric for ASR performance analysis towards spoken language understanding
Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …
Adaptive contextual biasing for transducer based streaming speech recognition
By incorporating additional contextual information, deep biasing methods have emerged as
a promising solution for speech recognition of personalized words. However, for real-world …
Dissecting user-perceived latency of on-device E2E speech recognition
As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …
Deep fusion framework for speech command recognition using acoustic and linguistic features
The research problem addressed in this study is how to effectively combine multimodal data
from imperfect text transcripts and raw audio in a deep framework for automatic speech …