Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
[HTML][HTML] Unsupervised automatic speech recognition: A review
Abstract Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
remarkable performance given large amounts of manually transcribed speech, but large …
Deep convolutional acoustic word embeddings using word-pair side information
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …
recognition and query applications, instead of phonetic units. Such whole-word segmental …
Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings
Measures of acoustic similarity between words or other units are critical for segmental
exemplar-based acoustic models, spoken term discovery, and query-by-example search …
exemplar-based acoustic models, spoken term discovery, and query-by-example search …
Discriminative acoustic word embeddings: Tecurrent neural network-based approaches
Acoustic word embeddings-fixed-dimensional vector representations of variable-length
spoken word segments-have begun to be considered for tasks such as speech recognition …
spoken word segments-have begun to be considered for tasks such as speech recognition …
Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models
H Kamper - ICASSP 2019-2019 IEEE International Conference …, 2019 - ieeexplore.ieee.org
We investigate unsupervised models that can map a variable-duration speech segment to a
fixed-dimensional representation. In settings where unlabelled speech is the only available …
fixed-dimensional representation. In settings where unlabelled speech is the only available …
Multi-view recurrent neural acoustic word embeddings
Recent work has begun exploring neural acoustic word embeddings---fixed-dimensional
vector representations of arbitrary-length speech segments corresponding to words. Such …
vector representations of arbitrary-length speech segments corresponding to words. Such …
Analyzing acoustic word embeddings from pre-trained self-supervised speech models
Given the strong results of self-supervised models on various tasks, there have been
surprisingly few studies exploring self-supervised representations for acoustic word …
surprisingly few studies exploring self-supervised representations for acoustic word …
Audio word2vec: Sequence-to-sequence autoencoding for unsupervised learning of audio segmentation and representation
In text, word2vec transforms each word into a fixed-size vector used as the basic component
in applications of natural language processing. Given a large collection of unannotated …
in applications of natural language processing. Given a large collection of unannotated …