Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
Deep representation learning in speech processing: Challenges, recent advances, and future trends
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …
engineered acoustic features (feature engineering) as a separate distinct problem from the …
Graph neural networks: foundation, frontiers and applications
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …
recent years. Graph neural networks, also known as deep learning on graphs, graph …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
Large-scale multilingual speech recognition with a streaming end-to-end model
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic
speech recognition (ASR) coverage of the world's languages. They have shown …
speech recognition (ASR) coverage of the world's languages. They have shown …
Effectiveness of self-supervised pre-training for speech recognition
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
quantize the audio data or learn representations without quantization. We find the former to …
Multilingual and code-switching ASR challenges for low resource Indian languages
Recently, there is increasing interest in multilingual automatic speech recognition (ASR)
where a speech recognition system caters to multiple low resource languages by taking …
where a speech recognition system caters to multiple low resource languages by taking …
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
We present a simple approach to improve direct speech-to-text translation (ST) when the
source language is low-resource: we pre-train the model on a high-resource automatic …
source language is low-resource: we pre-train the model on a high-resource automatic …
Parp: Prune, adjust and re-prune for self-supervised speech recognition
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
End-to-end ASR-free keyword search from speech
Conventional keyword search (KWS) systems for speech databases match the input text
query to the set of word hypotheses generated by an automatic speech recognition (ASR) …
query to the set of word hypotheses generated by an automatic speech recognition (ASR) …