Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
Layer-wise analysis of a self-supervised speech representation model
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …
training speech representation models. The utility of these learned representations has been …
[HTML][HTML] Unsupervised automatic speech recognition: A review
Abstract Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
remarkable performance given large amounts of manually transcribed speech, but large …
Medical image classification using synergic deep learning
The classification of medical images is an essential task in computer-aided diagnosis,
medical image retrieval and mining. Although deep learning has shown proven advantages …
medical image retrieval and mining. Although deep learning has shown proven advantages …
Jointly discovering visual objects and spoken words from raw sensory input
In this paper, we explore neural network models that learn to associate segments of spoken
audio captions with the semantically relevant portions of natural images that they refer to …
audio captions with the semantically relevant portions of natural images that they refer to …
Effectiveness of self-supervised pre-training for speech recognition
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
quantize the audio data or learn representations without quantization. We find the former to …
Unsupervised learning of semantic audio representations
Even in the absence of any explicit semantic annotation, vast collections of audio recordings
provide valuable information for learning the categorical structure of sounds. We consider …
provide valuable information for learning the categorical structure of sounds. We consider …
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
We present a simple approach to improve direct speech-to-text translation (ST) when the
source language is low-resource: we pre-train the model on a high-resource automatic …
source language is low-resource: we pre-train the model on a high-resource automatic …
[PDF][PDF] The zero resource speech challenge 2015.
Abstract The Interspeech 2015 Zero Resource Speech Challenge aims at discovering
subword and word units from raw speech. The challenge provides the first unified and open …
subword and word units from raw speech. The challenge provides the first unified and open …
Deep convolutional acoustic word embeddings using word-pair side information
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …
recognition and query applications, instead of phonetic units. Such whole-word segmental …