XLS-R: Self-supervised cross-lingual speech representation learning at scale
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
Unsupervised cross-lingual representation learning for speech recognition
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …
XTREME-S: Evaluating cross-lingual speech representations
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech
representations in many languages. XTREME-S covers four task families: speech …
Confidence estimation and deletion prediction using bidirectional recurrent neural networks
The standard approach to assess reliability of automatic speech transcriptions is through the
use of confidence scores. If accurate, these scores provide a flexible mechanism to flag …
The Kaldi OpenKWS System: Improving Low Resource Keyword Search.
The IARPA BABEL program has stimulated worldwide research in keyword search
technology for low resource languages, and the NIST OpenKWS evaluations are the de …
Dynamic acoustic unit augmentation with BPE-dropout for low-resource end-to-end speech recognition
With the rapid development of speech assistants, adapting server-intended automatic
speech recognition (ASR) solutions to a direct device has become crucial. For on-device …
Keyword spotting in continuous speech using convolutional neural network
AM Rostami, A Karimi, MA Akhaee - Speech Communication, 2022 - Elsevier
Keyword spotting is a process of finding some specific words or phrases in recorded
speeches by computers. Deep neural network algorithms, as a powerful engine, can handle …
The multi-domain international search on speech 2020 ALBAYZIN evaluation: Overview, systems, results, discussion and post-evaluation analyses
The large amount of information stored in audio and video repositories makes search on
speech (SoS) a challenging area that is continuously receiving much interest. Within SoS …
Advanced recurrent network-based hybrid acoustic models for low resource speech recognition
Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies.
However, the problem of exploding or vanishing gradients has limited their application. In …
Constructing sub-word units for spoken term detection
C Van Heerden, D Karakos… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the
use of sub-word systems. We experiment with different language-independent approaches …