Speech recognition using deep neural networks: A systematic review

AB Nassif, I Shahin, I Attili, M Azzeh, K Shaalan - IEEE Access, 2019 - ieeexplore.ieee.org
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …

SpecAugment: A simple data augmentation method for automatic speech recognition

DS Park, W Chan, Y Zhang, CC Chiu, B Zoph… - arXiv preprint arXiv …, 2019 - arxiv.org
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …

SeqGAN: Sequence generative adversarial nets with policy gradient

L Yu, W Zhang, J Wang, Y Yu - Proceedings of the AAAI conference on …, 2017 - ojs.aaai.org
As a new way of training generative models, Generative Adversarial Net (GAN) that uses a
discriminative model to guide the training of the generative model has enjoyed considerable …

Deep Speech 2: End-to-end speech recognition in English and Mandarin

D Amodei, S Ananthanarayanan… - International …, 2016 - proceedings.mlr.press
We show that an end-to-end deep learning approach can be used to recognize either
English or Mandarin Chinese speech–two vastly different languages. Because it replaces …

Light gated recurrent units for speech recognition

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

W Chan, N Jaitly, Q Le, O Vinyals - 2016 IEEE international …, 2016 - ieeexplore.ieee.org
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes
speech utterances directly to characters without pronunciation models, HMMs or other …

The Microsoft 2017 conversational speech recognition system

W Xiong, L Wu, F Alleva, J Droppo… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …

Purely sequence-trained neural networks for ASR based on lattice-free MMI.

D Povey, V Peddinti, D Galvez, P Ghahremani… - Interspeech, 2016 - isca-archive.org
In this paper we describe a method to perform sequence-discriminative training of neural
network acoustic models without the need for frame-level cross-entropy pre-training. We use …

Transformer-based acoustic modeling for hybrid speech recognition

Y Wang, A Mohamed, D Le, C Liu… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech
recognition. Several modeling choices are discussed in this work, including various …

Deep Speech: Scaling up end-to-end speech recognition

A Hannun, C Case, J Casper, B Catanzaro… - arXiv preprint arXiv …, 2014 - arxiv.org
We present a state-of-the-art speech recognition system developed using end-to-end deep
learning. Our architecture is significantly simpler than traditional speech systems, which rely …