Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
[HTML][HTML] Unsupervised automatic speech recognition: A review
Abstract Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
remarkable performance given large amounts of manually transcribed speech, but large …
Layer-wise analysis of a self-supervised speech representation model
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …
training speech representation models. The utility of these learned representations has been …
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
We present a simple approach to improve direct speech-to-text translation (ST) when the
source language is low-resource: we pre-train the model on a high-resource automatic …
source language is low-resource: we pre-train the model on a high-resource automatic …
Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner
Spectacular progress in the information processing sciences (machine learning, wearable
sensors) promises to revolutionize the study of cognitive development. Here, we analyse the …
sensors) promises to revolutionize the study of cognitive development. Here, we analyse the …
Deep convolutional acoustic word embeddings using word-pair side information
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …
recognition and query applications, instead of phonetic units. Such whole-word segmental …
Evaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline
We present a new framework for the evaluation of speech rep-resentations in zero-resource
settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1] …
settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1] …
Efficient spoken term discovery using randomized algorithms
Spoken term discovery is the task of automatically identifying words and phrases in speech
data by searching for long repeated acoustic patterns. Initial solutions relied on exhaustive …
data by searching for long repeated acoustic patterns. Initial solutions relied on exhaustive …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …
improving performance and data efficiency on various speech tasks. However, these …
Unsupervised neural network based feature extraction using weak top-down constraints
Deep neural networks (DNNs) have become a standard component in supervised ASR,
used in both data-driven feature extraction and acoustic modelling. Supervision is typically …
used in both data-driven feature extraction and acoustic modelling. Supervision is typically …