A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Hubert: Self-supervised speech representation learning by masked prediction of hidden units

WN Hsu, B Bolte, YHH Tsai, K Lakhotia… - … ACM transactions on …, 2021 - ieeexplore.ieee.org
Self-supervised approaches for speech representation learning are challenged by three
unique problems:(1) there are multiple sound units in each input utterance,(2) there is no …

[HTML][HTML] Deep speech 2: End-to-end speech recognition in english and mandarin

D Amodei, S Ananthanarayanan… - International …, 2016 - proceedings.mlr.press
We show that an end-to-end deep learning approach can be used to recognize either
English or Mandarin Chinese speech–two vastly different languages. Because it replaces …

Deep learning in neural networks: An overview

J Schmidhuber - Neural networks, 2015 - Elsevier
In recent years, deep artificial neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical survey …

Understanding LSTM--a tutorial into long short-term memory recurrent neural networks

RC Staudemeyer, ER Morris - arxiv preprint arxiv:1909.09586, 2019 - arxiv.org
Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are one of the most
powerful dynamic classifiers publicly known. The network itself and the related learning …

Speech recognition with deep recurrent neural networks

A Graves, A Mohamed, G Hinton - 2013 IEEE international …, 2013 - ieeexplore.ieee.org
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end
training methods such as Connectionist Temporal Classification make it possible to train …

[PDF][PDF] Long short-term memory recurrent neural network architectures for large scale acoustic modeling

H Sak, AW Senior, F Beaufays - 2014 - isca-archive.org
Abstract Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN)
architecture that was designed to model temporal sequences and their long-range …

Deep learning: methods and applications

L Deng, D Yu - Foundations and trends® in signal processing, 2014 - nowpublishers.com
This monograph provides an overview of general deep learning methodology and its
applications to a variety of signal and information processing tasks. The application areas …

Convolutional neural networks for speech recognition

O Abdel-Hamid, A Mohamed, H Jiang… - … on audio, speech …, 2014 - ieeexplore.ieee.org
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been
shown to significantly improve speech recognition performance over the conventional …