XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Unsupervised cross-lingual representation learning for speech recognition

A Conneau, A Baevski, R Collobert… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …

XTREME-S: Evaluating cross-lingual speech representations

A Conneau, A Bapna, Y Zhang, M Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech
representations in many languages. XTREME-S covers four task families: speech …

Confidence estimation and deletion prediction using bidirectional recurrent neural networks

A Ragni, Q Li, MJF Gales… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
The standard approach to assess reliability of automatic speech transcriptions is through the
use of confidence scores. If accurate, these scores provide a flexible mechanism to flag …

The Kaldi OpenKWS System: Improving Low Resource Keyword Search.

J Trmal, M Wiesner, V Peddinti, X Zhang… - Interspeech, 2017 - researchgate.net
The IARPA BABEL program has stimulated worldwide research in keyword search
technology for low resource languages, and the NIST OpenKWS evaluations are the de …

Dynamic acoustic unit augmentation with BPE-dropout for low-resource end-to-end speech recognition

A Laptev, A Andrusenko, I Podluzhny, A Mitrofanov… - Sensors, 2021 - mdpi.com
With the rapid development of speech assistants, adapting server-intended automatic
speech recognition (ASR) solutions to a direct device has become crucial. For on-device …

Keyword spotting in continuous speech using convolutional neural network

AM Rostami, A Karimi, MA Akhaee - Speech Communication, 2022 - Elsevier
Keyword spotting is a process of finding some specific words or phrases in recorded
speeches by computers. Deep neural network algorithms, as a powerful engine, can handle …

The multi-domain international search on speech 2020 ALBAYZIN evaluation: Overview, systems, results, discussion and post-evaluation analyses

J Tejedor, DT Toledano, JM Ramirez, AR Montalvo… - Applied Sciences, 2021 - mdpi.com
The large amount of information stored in audio and video repositories makes search on
speech (SoS) a challenging area that is continuously receiving much interest. Within SoS …

Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

J Kang, WQ Zhang, WW Liu, J Liu… - EURASIP Journal on Audio …, 2018 - Springer
Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies.
However, the problem of exploding or vanishing gradients has limited their application. In …

Constructing sub-word units for spoken term detection

C Van Heerden, D Karakos… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the
use of sub-word systems. We experiment with different language-independent approaches …