Large-vocabulary continuous speech recognition systems: A look at some recent advances

G Saon, JT Chien - IEEE signal processing magazine, 2012 - ieeexplore.ieee.org
Over the past decade or so, several advances have been made to the design of modern
large vocabulary continuous speech recognition (LVCSR) systems to the point where their …

The Microsoft 2017 conversational speech recognition system

W Xiong, L Wu, F Alleva, J Droppo… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …

Purely sequence-trained neural networks for ASR based on lattice-free MMI

D Povey, V Peddinti, D Galvez, P Ghahremani… - Interspeech, 2016 - isca-archive.org
In this paper we describe a method to perform sequence-discriminative training of neural
network acoustic models without the need for frame-level cross-entropy pre-training. We use …

Achieving human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide, M Seltzer… - arXiv preprint arXiv …, 2016 - arxiv.org
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure the human …

Toward human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure a human …

End-to-end Speech Recognition Using Lattice-free MMI

H Hadian, H Sameti, D Povey, S Khudanpur - Interspeech, 2018 - danielpovey.com
We present our work on end-to-end training of acoustic models using the lattice-free
maximum mutual information (LF-MMI) objective function in the context of hidden Markov …

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices

T Yoshioka, N Ito, M Delcroix, A Ogawa… - … IEEE Workshop on …, 2015 - ieeexplore.ieee.org
CHiME-3 is a research community challenge organised in 2015 to evaluate speech
recognition systems for mobile multi-microphone devices used in noisy daily environments …

Conversion of non-back-off language models for efficient speech decoding

E Arisoy, B Ramabhadran, A Sethy, S Chen - US Patent 9,484,023, 2016 - Google Patents
BACKGROUND As is well known, a language model is used to represent the language that
an automatic speech recognition (ASR) system is intended to recognize or decode. One of …

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization

B Kingsbury, TN Sainath, H Soltau - Interspeech, 2012 - isca-archive.org
Training neural network acoustic models with sequence-discriminative criteria, such as state-
level minimum Bayes risk (sMBR), has been shown to produce large improvements in …

Comparing human and machine errors in conversational speech transcription

A Stolcke, J Droppo - arXiv preprint arXiv:1708.08615, 2017 - arxiv.org
Recent work in automatic recognition of conversational telephone speech (CTS) has
achieved accuracy levels comparable to human transcribers, although there is some debate …