XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Unsupervised speech recognition

A Baevski, WN Hsu, A Conneau… - Advances in Neural …, 2021 - proceedings.neurips.cc
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …

Unsupervised cross-lingual representation learning for speech recognition

A Conneau, A Baevski, R Collobert… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …

Unsupervised pretraining transfers well across languages

M Riviere, A Joulin, PE Mazaré… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been
extensively investigated in the supervised setting. This assumes the existence of a parallel …

Towards end-to-end unsupervised speech recognition

AH Liu, WN Hsu, M Auli… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …

Feature extraction methods in language identification: a survey

D Deshwal, P Sangwan, D Kumar - Wireless Personal Communications, 2019 - Springer
Language Identification (LI) is one of the widely emerging fields in the areas of
speech processing to accurately identify the language from the database based on some …

PARP: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Spoken language recognization based on features and classification methods: A review

P Bam, S Degadwala, R Upadhyay… - … Conference on Artificial …, 2022 - ieeexplore.ieee.org
In Western countries, speech-recognition applications are accepted. In East Asia, it isn't as
common. The complexity of the language might be one of the main reasons for this latency …

Advancing stuttering detection via data augmentation, class-balanced loss and multi-contextual deep learning

SA Sheikh, M Sahidullah, F Hirsch… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Stuttering is a neuro-developmental speech impairment characterized by uncontrolled
utterances (interjections) and core behaviors (blocks, repetitions, and prolongations), and is …

Language learning using speech to image retrieval

D Merkx, SL Frank, M Ernestus - arXiv preprint arXiv:1909.03795, 2019 - arxiv.org
Humans learn language by interaction with their environment and listening to other humans.
It should also be possible for computational models to learn language directly from speech …