A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

An unsupervised autoregressive model for speech representation learning

YA Chung, WN Hsu, H Tang, J Glass - arXiv preprint arXiv:1904.03240, 2019 - arxiv.org
This paper proposes a novel unsupervised autoregressive neural model for learning generic
speech representations. In contrast to other speech representation learning methods that …

Generative pre-training for speech with autoregressive predictive coding

YA Chung, J Glass - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Learning meaningful and general representations from unannotated speech that are
applicable to a wide range of tasks remains challenging. In this paper we propose to use …

Non-autoregressive predictive coding for learning speech representations from local dependencies

AH Liu, YA Chung, J Glass - arXiv preprint arXiv:2011.00406, 2020 - arxiv.org
Self-supervised speech representations have been shown to be effective in a variety of
speech applications. However, existing representation learning methods generally rely on …

Unsupervised pre-training of bidirectional speech encoders via masked reconstruction

W Wang, Q Tang, K Livescu - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We propose an approach for pre-training speech representations via a masked
reconstruction loss. Our pre-trained encoder networks are bidirectional and can therefore be …

Learning hierarchical discrete linguistic units from visually-grounded speech

D Harwath, WN Hsu, J Glass - arXiv preprint arXiv:1911.09602, 2019 - arxiv.org
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …

Improved speech representations with multi-target autoregressive predictive coding

YA Chung, J Glass - arXiv preprint arXiv:2004.05274, 2020 - arxiv.org
Training objectives based on predictive coding have recently been shown to be very
effective at learning meaningful representations from unlabeled speech. One example is …

Pre-training audio representations with self-supervision

M Tagliasacchi, B Gfeller… - IEEE Signal …, 2020 - ieeexplore.ieee.org
We explore self-supervision as a way to learn general purpose audio representations.
Specifically, we propose two self-supervised tasks: Audio2Vec, which aims at reconstructing …

A brief overview of unsupervised neural speech representation learning

L Borgholt, JD Havtorn, J Edin, L Maaløe… - arXiv preprint arXiv …, 2022 - arxiv.org
Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …