Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Machine learning paradigms for speech recognition: An overview

L Deng, X Li - IEEE Transactions on Audio, Speech, and …, 2013 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) has historically been a driving force behind many
machine learning (ML) techniques, including the ubiquitously used hidden Markov model …

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

Pushing the limits of semi-supervised learning for automatic speech recognition

Y Zhang, J Qin, DS Park, W Han, CC Chiu… - arXiv preprint arXiv …, 2020 - arxiv.org
We employ a combination of recent developments in semi-supervised learning for automatic
speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled …

Improved noisy student training for automatic speech recognition

DS Park, Y Zhang, Y Jia, W Han, CC Chiu, B Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Recently, a semi-supervised learning method known as "noisy student training" has been
shown to improve image classification performance of deep networks significantly. Noisy …

Iterative pseudo-labeling for speech recognition

Q Xu, T Likhomanenko, J Kahn, A Hannun… - arXiv preprint arXiv …, 2020 - arxiv.org
Pseudo-labeling has recently shown promise in end-to-end automatic speech recognition
(ASR). We study Iterative Pseudo-Labeling (IPL), a semi-supervised algorithm which …

The application of hidden Markov models in speech recognition

M Gales, S Young - Foundations and Trends® in Signal …, 2008 - nowpublishers.com
Full text available at: http://dx.doi.org/10.1561/2000000004 …

Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams

Y Zhang, JR Glass - 2009 IEEE Workshop on Automatic Speech …, 2009 - ieeexplore.ieee.org
In this paper, we present an unsupervised learning framework to address the problem of
detecting spoken keywords. Without any transcription information, a Gaussian Mixture Model …

Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription

H Liao, E McDermott, A Senior - 2013 IEEE Workshop on …, 2013 - ieeexplore.ieee.org
YouTube is a highly visited video sharing website where over one billion people watch six
billion hours of video every month. Improving accessibility to these videos for the hearing …