Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

wav2vec: Unsupervised pre-training for speech recognition

S Schneider, A Baevski, R Collobert, M Auli - arxiv preprint arxiv …, 2019 - arxiv.org
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …

Unsupervised speech recognition

A Baevski, WN Hsu, A Conneau… - Advances in Neural …, 2021 - proceedings.neurips.cc
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …

Unsupervised speech representation learning using wavenet autoencoders

J Chorowski, RJ Weiss, S Bengio… - … /ACM transactions on …, 2019 - ieeexplore.ieee.org
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …

Deep partial multi-view learning

C Zhang, Y Cui, Z Han, JT Zhou… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Although multi-view learning has made significant progress over the past few decades, it is
still challenging due to the difficulty in modeling complex correlations among different views …

Libri-light: A benchmark for asr with limited or no supervision

J Kahn, M Riviere, W Zheng… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We introduce a new collection of spoken English audio suitable for training speech
recognition systems under limited or no supervision. It is derived from open-source audio …

Unified speech-text pre-training for speech translation and recognition

Y Tang, H Gong, N Dong, C Wang, WN Hsu… - arxiv preprint arxiv …, 2022 - arxiv.org
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling
framework for speech translation and recognition. The proposed method incorporates four …

Towards end-to-end unsupervised speech recognition

AH Liu, WN Hsu, M Auli… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …

Firerisk: A remote sensing dataset for fire risk assessment with benchmarks using supervised and self-supervised learning

S Shen, S Seneviratne, X Wanyan… - … Conference on Digital …, 2023 - ieeexplore.ieee.org
In recent decades, wildfires have caused tremendous property losses, fatalities, and
extensive damage to forest ecosystems. Inspired by the abundance of publicly available …