Unsupervised automatic speech recognition: A review

H Aldarmaki, A Ullah, S Ram, N Zaki - Speech Communication, 2022 - Elsevier
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
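
Here, a "relative" reduction is measured against the baseline word error rate rather than in absolute percentage points; a short worked example in LaTeX (the numbers are purely illustrative, not figures from the survey):

\text{relative WER reduction} = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{baseline}}},
\qquad \text{e.g.}\ \frac{8\% - 4\%}{8\%} = 50\%.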

MLS: A large-scale multilingual dataset for speech research

V Pratap, Q Xu, A Sriram, G Synnaeve… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus
suitable for speech research. The dataset is derived from read audiobooks from LibriVox …

ContextNet: Improving convolutional neural networks for automatic speech recognition with global context

W Han, Z Zhang, Y Zhang, J Yu, CC Chiu, J Qin… - arXiv preprint arXiv …, 2020 - arxiv.org
Convolutional neural networks (CNNs) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …
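
The "global context" in ContextNet comes from squeeze-and-excitation-style gating added to the convolutional blocks: a whole-utterance summary of each channel rescales every frame. Below is a minimal PyTorch sketch of such a 1D squeeze-and-excitation module, as an illustration of the general mechanism rather than the paper's exact block (layer names, sizes, and the reduction factor are assumptions):

import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    """Channel gating from a global (whole-utterance) summary of a 1D feature map.
    Illustrative sketch of squeeze-and-excitation-style global context; the
    reduction factor and layer layout are assumptions, not the paper's block."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        context = x.mean(dim=2)          # "squeeze": average over the time axis
        gate = self.fc(context)          # "excite": per-channel weights in (0, 1)
        return x * gate.unsqueeze(-1)    # rescale every frame by the global gate

x = torch.randn(2, 256, 100)             # 2 utterances, 256 channels, 100 frames
print(SqueezeExcite1d(256)(x).shape)     # torch.Size([2, 256, 100])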

QuartzNet: Deep automatic speech recognition with 1D time-channel separable convolutions

S Kriman, S Beliaev, B Ginsburg… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We propose a new end-to-end neural acoustic model for automatic speech recognition. The
model is composed of multiple blocks with residual connections between them. Each block …
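
The "1D time-channel separable convolution" of the title factors an ordinary 1D convolution into a depthwise convolution over time followed by a pointwise (1x1) convolution that mixes channels, shrinking the parameter count from roughly C_in * C_out * K to C_in * K + C_in * C_out. A minimal PyTorch sketch of that factorization (kernel size, normalization, activation, and the omitted residual wiring are assumptions, not the exact QuartzNet block):

import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    """Depthwise conv over time followed by a pointwise (1x1) conv across channels.
    Illustrative sketch only; the residual connections and block repeats of the
    real QuartzNet architecture are not shown."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.norm = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.act(self.norm(self.pointwise(self.depthwise(x))))

feats = torch.randn(4, 64, 200)                         # 64 feature channels, 200 frames
layer = TimeChannelSeparableConv1d(64, 128, kernel_size=33)
print(layer(feats).shape)                               # torch.Size([4, 128, 200])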

A comparison of Transformer and LSTM encoder-decoder models for ASR

A Zeyer, P Bahar, K Irie, R Schlüter… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
We present competitive results using a Transformer encoder-decoder-attention model for
end-to-end speech recognition, needing less training time compared to a similarly …

End-to-end ASR: From supervised to semi-supervised learning with modern architectures

G Synnaeve, Q Xu, J Kahn, T Likhomanenko… - arXiv preprint arXiv …, 2019 - arxiv.org
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth
Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq …

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation

C Lüscher, E Beck, K Irie, M Kitza, W Michel… - arXiv preprint arXiv …, 2019 - arxiv.org
We present state-of-the-art automatic speech recognition (ASR) systems employing a
standard hybrid DNN/HMM architecture, compared to an attention-based encoder-decoder …

Self-training for end-to-end speech recognition

J Kahn, A Lee, A Hannun - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We revisit self-training in the context of end-to-end speech recognition. We demonstrate that
training with pseudo-labels can substantially improve the accuracy of a baseline model. Key …
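
The self-training recipe revisited here follows a simple loop: train a seed model on the transcribed data, decode the untranscribed audio with it, keep confident hypotheses as pseudo-labels, and retrain on the union. A schematic Python sketch, where train, transcribe, and confidence are hypothetical placeholders standing in for an actual ASR toolkit rather than a real API:

def self_train(labeled, unlabeled, rounds=3, threshold=0.9):
    """Schematic pseudo-labeling loop; `train`, `transcribe`, and `confidence`
    are hypothetical helpers, not functions from any specific library."""
    model = train(labeled)                          # seed model from transcribed speech
    for _ in range(rounds):
        pseudo = []
        for audio in unlabeled:
            hyp = transcribe(model, audio)          # decode the unlabeled utterance
            if confidence(model, audio, hyp) >= threshold:
                pseudo.append((audio, hyp))         # keep only confident pseudo-labels
        model = train(labeled + pseudo)             # retrain on real + pseudo-labeled data
    return model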