Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Unsupervised automatic speech recognition: A review

H Aldarmaki, A Ullah, S Ram, N Zaki - Speech Communication, 2022 - Elsevier
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …

Deep convolutional acoustic word embeddings using word-pair side information

H Kamper, W Wang, K Livescu - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …

Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings

K Levin, K Henry, A Jansen… - 2013 IEEE workshop on …, 2013 - ieeexplore.ieee.org
Measures of acoustic similarity between words or other units are critical for segmental
exemplar-based acoustic models, spoken term discovery, and query-by-example search …

Discriminative acoustic word embeddings: Recurrent neural network-based approaches

S Settle, K Livescu - 2016 IEEE Spoken Language Technology …, 2016 - ieeexplore.ieee.org
Acoustic word embeddings (fixed-dimensional vector representations of variable-length
spoken word segments) have begun to be considered for tasks such as speech recognition …

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

H Kamper - ICASSP 2019-2019 IEEE International Conference …, 2019 - ieeexplore.ieee.org
We investigate unsupervised models that can map a variable-duration speech segment to a
fixed-dimensional representation. In settings where unlabelled speech is the only available …

Multi-view recurrent neural acoustic word embeddings

W He, W Wang, K Livescu - arXiv preprint arXiv:1611.04496, 2016 - arxiv.org
Recent work has begun exploring neural acoustic word embeddings---fixed-dimensional
vector representations of arbitrary-length speech segments corresponding to words. Such …

Analyzing acoustic word embeddings from pre-trained self-supervised speech models

R Sanabria, H Tang, S Goldwater - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Given the strong results of self-supervised models on various tasks, there have been
surprisingly few studies exploring self-supervised representations for acoustic word …

Audio word2vec: Sequence-to-sequence autoencoding for unsupervised learning of audio segmentation and representation

YC Chen, SF Huang, H Lee, YH Wang… - … /ACM Transactions on …, 2019 - ieeexplore.ieee.org
In text, word2vec transforms each word into a fixed-size vector used as the basic component
in applications of natural language processing. Given a large collection of unannotated …