Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …
modern tools such as Tensorflow or Keras, and open-source trained models, along with …
Unsupervised learning of semantic audio representations
Even in the absence of any explicit semantic annotation, vast collections of audio recordings
provide valuable information for learning the categorical structure of sounds. We consider …
provide valuable information for learning the categorical structure of sounds. We consider …
The zero resource speech challenge 2017
We describe a new challenge aimed at discovering subword and word units from raw
speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It …
speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It …
Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner
Spectacular progress in the information processing sciences (machine learning, wearable
sensors) promises to revolutionize the study of cognitive development. Here, we analyse the …
sensors) promises to revolutionize the study of cognitive development. Here, we analyse the …
Word discovery in visually grounded, self-supervised speech models
We present a method for visually-grounded spoken term discovery. After training either a
HuBERT or wav2vec2. 0 model to associate spoken captions with natural images, we show …
HuBERT or wav2vec2. 0 model to associate spoken captions with natural images, we show …
Learning hierarchical discrete linguistic units from visually-grounded speech
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …
vector quantization layers into neural models of visually grounded speech. We show that our …
[HTML][HTML] Unsupervised automatic speech recognition: A review
Abstract Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
remarkable performance given large amounts of manually transcribed speech, but large …
A segmental framework for fully-unsupervised large-vocabulary speech recognition
Zero-resource speech technology is a growing research area that aims to develop methods
for speech processing in the absence of transcriptions, lexicons, or language modelling text …
for speech processing in the absence of transcriptions, lexicons, or language modelling text …
VQVAE unsupervised unit discovery and multi-scale code2spec inverter for zerospeech challenge 2019
We describe our submitted system for the ZeroSpeech Challenge 2019. The current
challenge theme addresses the difficulty of constructing a speech synthesizer without any …
challenge theme addresses the difficulty of constructing a speech synthesizer without any …