Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Unsupervised automatic speech recognition: A review

H Aldarmaki, A Ullah, S Ram, N Zaki - Speech Communication, 2022 - Elsevier
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …

Layer-wise analysis of a self-supervised speech representation model

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

S Bansal, H Kamper, K Livescu, A Lopez… - arXiv preprint arXiv …, 2018 - arxiv.org
We present a simple approach to improve direct speech-to-text translation (ST) when the
source language is low-resource: we pre-train the model on a high-resource automatic …

Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner

E Dupoux - Cognition, 2018 - Elsevier
Spectacular progress in the information processing sciences (machine learning, wearable
sensors) promises to revolutionize the study of cognitive development. Here, we analyse the …

Deep convolutional acoustic word embeddings using word-pair side information

H Kamper, W Wang, K Livescu - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …

Evaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline

T Schatz, V Peddinti, F Bach, A Jansen… - … 2013: 14th Annual …, 2013 - hal.science
We present a new framework for the evaluation of speech representations in zero-resource
settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1] …

Efficient spoken term discovery using randomized algorithms

A Jansen, B Van Durme - 2011 IEEE Workshop on Automatic …, 2011 - ieeexplore.ieee.org
Spoken term discovery is the task of automatically identifying words and phrases in speech
data by searching for long repeated acoustic patterns. Initial solutions relied on exhaustive …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Unsupervised neural network based feature extraction using weak top-down constraints

H Kamper, M Elsner, A Jansen… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Deep neural networks (DNNs) have become a standard component in supervised ASR,
used in both data-driven feature extraction and acoustic modelling. Supervision is typically …