Automatic speech recognition and speech variability: A review

M Benzeghiba, R De Mori, O Deroo, S Dupont… - Speech …, 2007 - Elsevier
Major progress is being recorded regularly on both the technology and exploitation of
automatic speech recognition (ASR) and spoken language systems. However, there are still …

Large-vocabulary continuous speech recognition systems: A look at some recent advances

G Saon, JT Chien - IEEE Signal Processing Magazine, 2012 - ieeexplore.ieee.org
Over the past decade or so, several advances have been made to the design of modern
large vocabulary continuous speech recognition (LVCSR) systems to the point where their …

Feature engineering in context-dependent deep neural networks for conversational speech transcription

F Seide, G Li, X Chen, D Yu - 2011 IEEE Workshop on …, 2011 - ieeexplore.ieee.org
We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-
DNN-HMMs, from a feature-engineering perspective. Recently, we had shown that for …

The application of hidden Markov models in speech recognition

M Gales, S Young - Foundations and Trends® in Signal …, 2008 - nowpublishers.com
Full text available at: http://dx.doi.org/10.1561/2000000004 …

Discriminative training for large vocabulary speech recognition

D Povey - 2005 - researchgate.net
This thesis investigates the use of discriminative criteria for training HMM parameters for
speech recognition, in particular the Maximum Mutual Information (MMI) criterion and a new …
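To make the MMI criterion named in this snippet concrete, here is a minimal sketch on toy numbers; it is not code from the thesis. It assumes that, for each training utterance, acoustic log-likelihoods log p(O_r | M_w) and language-model log-probabilities log P(w) are already available for the reference transcription and a short list of competing hypotheses (real systems use lattices). The acoustic scale `kappa`, the function names and all values are hypothetical, and scaling conventions differ between formulations. The objective sums, over utterances, the log posterior of the reference: its scaled joint score minus the log-sum-exp of the same score over all hypotheses.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-hypothesis scores: (log p(O|M_w), log P(w))
Scores = Dict[str, Tuple[float, float]]


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(utterances: List[Tuple[str, Scores]], kappa: float = 0.1) -> float:
    """Sum over utterances of the log posterior of the reference transcription.

    Each utterance is (reference, hyps), where hyps maps every candidate word
    sequence to its scores. The reference must appear in hyps, because the
    MMI denominator sums over all sequences, including the correct one.
    """
    total = 0.0
    for reference, hyps in utterances:
        # Scaled joint log-score: kappa * log p(O|M_w) + log P(w)
        joint = {w: kappa * ac + lm for w, (ac, lm) in hyps.items()}
        numerator = joint[reference]
        denominator = log_sum_exp(list(joint.values()))
        total += numerator - denominator
    return total
```

For example, `mmi_objective([("a b", {"a b": (-100.0, -2.0), "a c": (-105.0, -1.0)})])` returns about -0.974, meaning the reference has a posterior of about 0.38; discriminative training adjusts the model parameters to raise this value, whereas maximum-likelihood training would only raise the numerator.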

Maximum F1-score discriminative training criterion for automatic mispronunciation detection

H Huang, H Xu, X Wang… - IEEE/ACM Transactions on …, 2015 - ieeexplore.ieee.org
We carry out an in-depth investigation on a newly proposed Maximum F1-score Criterion
(MFC) discriminative training objective function for Goodness of Pronunciation (GOP) based …
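The F1 score that the MFC objective targets can be illustrated with a minimal sketch. It assumes, hypothetically, that a phone is flagged as mispronounced when its GOP score falls below a threshold, and compares those flags with human labels, so that F1 = 2TP / (2TP + FP + FN). The second function replaces the hard decision with a sigmoid so the score becomes differentiable in the threshold; this is one generic smoothing and may differ from the formulation used in the paper. All names and values are illustrative.

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def f1_score(gop: List[float], mispronounced: List[bool], threshold: float) -> float:
    """Hard F1 when a phone is flagged as mispronounced if its GOP < threshold."""
    tp = fp = fn = 0
    for g, y in zip(gop, mispronounced):
        flagged = g < threshold
        if flagged and y:
            tp += 1
        elif flagged and not y:
            fp += 1
        elif not flagged and y:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def smoothed_f1(gop: List[float], mispronounced: List[bool],
                threshold: float, alpha: float = 10.0) -> float:
    """Differentiable surrogate: soft flag = sigmoid(alpha * (threshold - GOP))."""
    soft = [_sigmoid(alpha * (threshold - g)) for g in gop]
    tp = sum(s for s, y in zip(soft, mispronounced) if y)
    fp = sum(s for s, y in zip(soft, mispronounced) if not y)
    fn = sum(1.0 - s for s, y in zip(soft, mispronounced) if y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

With GOP scores `[-5.0, -0.5, -4.0, -0.2]`, labels `[True, False, False, True]` and threshold `-2.0`, there is one true positive, one false positive and one false negative, so `f1_score` returns 0.5; `smoothed_f1` approaches the same value as `alpha` grows.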

Bayesian recurrent neural network for language modeling

JT Chien, YC Ku - IEEE Transactions on Neural Networks and …, 2015 - ieeexplore.ieee.org
A language model (LM) is calculated as the probability of a word sequence that provides the
solution to word prediction for a variety of information systems. A recurrent neural network …
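The probability of a word sequence factorizes by the chain rule, P(w_1, ..., w_T) = ∏_t P(w_t | w_1, ..., w_{t-1}). An RNN LM such as the one in this paper conditions on the full history; the minimal sketch below instead swaps in a much simpler unsmoothed maximum-likelihood bigram model, P(w_t | w_{t-1}), purely to show how a sequence probability is assembled. The `<s>`/`</s>` boundary tokens and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary tokens


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count history words and bigrams over boundary-padded sentences."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        tokens = [BOS] + words + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts


def sentence_logprob(words: List[str], history_counts: Counter,
                     bigram_counts: Counter) -> float:
    """Chain rule with a bigram approximation:
    log P(w_1..w_T) = sum_t log P(w_t | w_{t-1}), using unsmoothed MLE estimates."""
    tokens = [BOS] + words + [EOS]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        count = bigram_counts[(prev, cur)]
        if count == 0:
            return float("-inf")  # unseen bigram gets zero probability under MLE
        logp += math.log(count / history_counts[prev])
    return logp
```

Trained on "the cat sat" and "the dog sat", the model gives P(the cat sat) = 1 × 0.5 × 1 × 1 = 0.5, while any sentence containing an unseen bigram gets probability zero, which is one reason practical LMs add smoothing or, as in neural LMs, share information across histories.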

Interacting with computers by voice: automatic speech recognition and synthesis

D O'Shaughnessy - Proceedings of the IEEE, 2003 - ieeexplore.ieee.org
This paper examines how people communicate with computers using speech. Automatic
speech recognition (ASR) transforms speech into text, while automatic speech synthesis [or …

Developing a Speech Activity Detection System for the DARPA RATS Program.

T Ng, B Zhang, L Nguyen, S Matsoukas, X Zhou… - Interspeech, 2012 - isca-archive.org
This paper describes the speech activity detection (SAD) system developed by the Patrol
team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) …

Bankruptcy analysis with self-organizing maps in learning metrics

S Kaski, J Sinkkonen, J Peltonen - IEEE Transactions on …, 2001 - ieeexplore.ieee.org
We introduce a method for deriving a metric, locally based on the Fisher information matrix,
into the data space. A self-organizing map (SOM) is computed in the new metric to explore …
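In the learning-metrics approach, the local squared distance is d²(x, x + dx) = dxᵀ J(x) dx, where J(x) is the Fisher information matrix of the conditional distribution p(c | x) of auxiliary data c given x (in this application, presumably bankruptcy-related labels). The minimal sketch below assumes, hypothetically, a logistic model p(c = 1 | x) = σ(w·x + b), for which J(x) = p(1 - p) w wᵀ in closed form; the paper's own estimator of p(c | x) may differ, and the SOM training itself is not shown.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fisher_matrix(x: Vector, w: Vector, b: float) -> List[List[float]]:
    """Fisher information J(x) of p(c|x) for a hypothetical logistic model.

    With p = p(c=1|x) = sigmoid(w.x + b):
      grad_x log p(c=1|x) = (1 - p) w,   grad_x log p(c=0|x) = -p w,
    so J(x) = p (1-p)^2 w w^T + (1-p) p^2 w w^T = p (1 - p) w w^T.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    scale = p * (1.0 - p)
    return [[scale * wi * wj for wj in w] for wi in w]


def local_distance_sq(x: Vector, dx: Vector, w: Vector, b: float) -> float:
    """Squared local learning-metric distance d^2(x, x + dx) = dx^T J(x) dx."""
    J = fisher_matrix(x, w, b)
    n = len(dx)
    return sum(dx[i] * J[i][j] * dx[j] for i in range(n) for j in range(n))
```

With w = [1, 0], b = 0 and x = [0, 0] (on the decision boundary, p = 0.5), a step dx = [0.1, 0] has squared length 0.25 × 0.01 = 0.0025, while dx = [0, 0.1] has length zero because it does not change p(c | x); far from the boundary, at x = [10, 0], the same step is much shorter. This is the sense in which the metric emphasizes directions relevant to the auxiliary data.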