Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Audio self-supervised learning: A survey
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …
wav2vec: Unsupervised pre-training for speech recognition
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …
Unsupervised speech recognition
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …
Unsupervised speech representation learning using WaveNet autoencoders
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …
Deep partial multi-view learning
Although multi-view learning has made significant progress over the past few decades, it is
still challenging due to the difficulty in modeling complex correlations among different views …
Libri-Light: A benchmark for ASR with limited or no supervision
We introduce a new collection of spoken English audio suitable for training speech
recognition systems under limited or no supervision. It is derived from open-source audio …
Unified speech-text pre-training for speech translation and recognition
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling
framework for speech translation and recognition. The proposed method incorporates four …
Towards end-to-end unsupervised speech recognition
Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …
FireRisk: A remote sensing dataset for fire risk assessment with benchmarks using supervised and self-supervised learning
In recent decades, wildfires have caused tremendous property losses, fatalities, and
extensive damage to forest ecosystems. Inspired by the abundance of publicly available …