Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Large-scale multilingual speech recognition with a streaming end-to-end model

A Kannan, A Datta, TN Sainath, E Weinstein… - arXiv preprint arXiv …, 2019 - arxiv.org
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic
speech recognition (ASR) coverage of the world's languages. They have shown …

Effectiveness of self-supervised pre-training for speech recognition

A Baevski, M Auli, A Mohamed - arXiv preprint arXiv:1911.03912, 2019 - arxiv.org
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …

Multilingual and code-switching ASR challenges for low resource Indian languages

A Diwan, R Vaideeswaran, S Shah, A Singh… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, there is increasing interest in multilingual automatic speech recognition (ASR)
where a speech recognition system caters to multiple low resource languages by taking …

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

S Bansal, H Kamper, K Livescu, A Lopez… - arXiv preprint arXiv …, 2018 - arxiv.org
We present a simple approach to improve direct speech-to-text translation (ST) when the
source language is low-resource: we pre-train the model on a high-resource automatic …

PARP: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

End-to-end ASR-free keyword search from speech

K Audhkhasi, A Rosenberg, A Sethy… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
Conventional keyword search (KWS) systems for speech databases match the input text
query to the set of word hypotheses generated by an automatic speech recognition (ASR) …