Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Deep spoken keyword spotting: An overview
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …
Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech
In this paper, we propose a novel deep neural network architecture, Speech2Vec, for
learning fixed-length vector representations of audio segments excised from a speech …
Effectiveness of self-supervised pre-training for speech recognition
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder
The vector representations of fixed dimensionality for words (in text) offered by Word2Vec
have been shown to be very useful in many application scenarios, in particular due to the …
Effectiveness of self-supervised pre-training for ASR
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
Query-by-example keyword spotting using long short-term memory networks
We present a novel approach to query-by-example keyword spotting (KWS) using a long
short-term memory (LSTM) recurrent neural network-based feature extractor. In our …
Deep convolutional acoustic word embeddings using word-pair side information
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …
Unsupervised automatic speech recognition: A review
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
End-to-end ASR-free keyword search from speech
Conventional keyword search (KWS) systems for speech databases match the input text
query to the set of word hypotheses generated by an automatic speech recognition (ASR) …