AudioLM: A language modeling approach to audio generation
We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …
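The abstract's central idea, turning audio into discrete tokens and then treating generation as next-token prediction, can be sketched in a few lines of Python. The uniform 8-bit quantizer and small GRU below are illustrative stand-ins only, not the tokenizers or model stack used by AudioLM.

```python
# Hedged sketch of "audio as discrete tokens + language model".
# The tokenizer is a hypothetical uniform quantizer, not AudioLM's actual tokenization.
import torch
import torch.nn as nn

def toy_tokenize(waveform: torch.Tensor, n_tokens: int = 256) -> torch.Tensor:
    """Map a waveform in [-1, 1] to a sequence of discrete token ids (toy stand-in)."""
    return ((waveform.clamp(-1, 1) + 1) / 2 * (n_tokens - 1)).long()

class TokenLM(nn.Module):
    """Autoregressive next-token predictor over audio token sequences."""
    def __init__(self, n_tokens: int = 256, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_tokens)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # logits over the next token at each position

# Usage: predict token t+1 from tokens <= t, trained with cross-entropy.
wave = torch.rand(1, 16000) * 2 - 1          # one second of fake 16 kHz audio
tokens = toy_tokenize(wave)                  # (1, 16000) discrete ids
logits = TokenLM()(tokens[:, :-1])           # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 256), tokens[:, 1:].reshape(-1)
)
```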
Unsupervised automatic speech recognition: A review
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …
An unsupervised autoregressive model for speech representation learning
This paper proposes a novel unsupervised autoregressive neural model for learning generic
speech representations. In contrast to other speech representation learning methods that …
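To make the autoregressive-prediction idea concrete, here is a minimal sketch: a causal model is trained to predict future acoustic frames from past ones, and its hidden states can then serve as speech representations. The log-mel input, GRU backbone, and prediction horizon k are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of autoregressive "predict future frames" representation learning.
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Predict the acoustic frame k steps ahead using past frames only."""
    def __init__(self, n_mels: int = 80, dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, dim, batch_first=True)
        self.out = nn.Linear(dim, n_mels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(frames)   # h[:, t] depends only on frames up to t (causal)
        return self.out(h)

k = 3                                      # prediction horizon (assumption)
mels = torch.randn(8, 200, 80)             # batch of fake log-mel sequences
model = FramePredictor()
pred = model(mels[:, :-k])                 # predict frame t+k from frames <= t
loss = nn.functional.l1_loss(pred, mels[:, k:])
# After training, the recurrent hidden states are reused as generic speech features.
```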
Unsupervised speech representation learning using WaveNet autoencoders
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …
Unsupervised pretraining transfers well across languages
Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been
extensively investigated in the supervised setting. This assumes the existence of a parallel …
Libri-Light: A benchmark for ASR with limited or no supervision
We introduce a new collection of spoken English audio suitable for training speech
recognition systems under limited or no supervision. It is derived from open-source audio …
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge
In this paper, we explore vector quantization for acoustic unit discovery. Leveraging
unlabelled data, we aim to learn discrete representations of speech that separate phonetic …
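The vector-quantization step at the heart of this approach can be sketched briefly: continuous frame-level features are snapped to their nearest codebook entries, yielding a sequence of discrete acoustic units. The codebook size and feature dimension below are assumptions for illustration, not the values used in the challenge submission.

```python
# Hedged sketch of vector quantization for acoustic unit discovery.
import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map features (T, D) to indices of their nearest codes in codebook (K, D)."""
    dists = torch.cdist(features, codebook)   # (T, K) pairwise Euclidean distances
    return dists.argmin(dim=1)                # (T,) discrete acoustic-unit ids

frames = torch.randn(100, 64)       # 100 frames of 64-dim encoder features (fake)
codebook = torch.randn(512, 64)     # 512 code vectors, e.g. learned by VQ training
units = quantize(frames, codebook)  # discrete units usable for a downstream spoken LM
```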
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
We introduce a new unsupervised task, spoken language modeling: the learning of linguistic
representations from raw audio signals without any labels, along with the Zero Resource …
The Zero Resource Speech Challenge 2017
We describe a new challenge aimed at discovering subword and word units from raw
speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It …