Survey of deep learning paradigms for speech processing

KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, a particular focus is given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …

Large-vocabulary continuous speech recognition systems: A look at some recent advances

G Saon, JT Chien - IEEE Signal Processing Magazine, 2012 - ieeexplore.ieee.org
Over the past decade or so, several advances have been made to the design of modern
large vocabulary continuous speech recognition (LVCSR) systems to the point where their …

The application of hidden Markov models in speech recognition

M Gales, S Young - Foundations and Trends® in Signal …, 2008 - nowpublishers.com
The Application of Hidden Markov Models in Speech Recognition. Full text available at: http://dx.doi.org/10.1561/2000000004 …

Text-to-speech synthesis

P Taylor - 2009 - 103.203.175.90
Text-to-Speech Synthesis provides a complete, end-to-end account of the process of
generating speech by computer. Giving an in-depth explanation of all aspects of current …

Unsupervised training and directed manual transcription for LVCSR

K Yu, M Gales, L Wang, PC Woodland - Speech Communication, 2010 - Elsevier
A significant cost in obtaining acoustic training data is the generation of accurate
transcriptions. When no transcription is available, unsupervised training techniques must be …

HMMs and related speech recognition technologies

S Young - Springer Handbook of Speech Processing, 2008 - Springer
Almost all present-day continuous speech recognition (CSR) systems are based on hidden
Markov models (HMMs). Although the fundamentals of HMM-based CSR have been …
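Young's snippet notes that almost all continuous speech recognition systems are built on HMMs. As a minimal sketch of one core computation such systems rely on, the forward algorithm below evaluates the likelihood of an observation sequence under an HMM. It assumes a toy model with discrete output symbols; practical CSR systems use continuous output densities and log-domain or scaled arithmetic. The function name `forward_likelihood` and all model numbers are hypothetical.

```python
# Minimal forward algorithm for a discrete-observation HMM.
# All model values below are illustrative toy numbers, not from any cited work.


def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) via the forward recursion.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)


# Toy 2-state left-to-right model with 2 output symbols.
pi = [1.0, 0.0]
A = [[0.6, 0.4],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]

print(forward_likelihood(pi, A, B, [0, 0, 1]))
```

Summing `forward_likelihood` over every possible observation sequence of a fixed length gives 1, which is a quick sanity check on the recursion.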

Fundamental technologies in modern speech recognition

T OCKPH - IEEE Signal Processing Magazine, 2012 - Citeseer
There is a vast body of literature on LVCSR research and some limitation is necessary in the
scope of this article. We will focus primarily on the techniques that have been successful in …

Mandarin tone classification without pitch tracking

N Ryant, J Yuan, M Liberman - 2014 IEEE international …, 2014 - ieeexplore.ieee.org
A deep neural network (DNN) based classifier achieved 27.38% frame error rate (FER) and
15.62% segment error rate (SER) in recognizing five tonal categories in Mandarin Chinese …

Highly accurate mandarin tone classification in the absence of pitch information

N Ryant, M Slaney, M Liberman… - … of Speech Prosody, 2014 - researchgate.net
A deep neural network (DNN) classifier based only on 40 mel-frequency cepstral coefficients
(MFCCs) achieved 29.99% frame error rate (FER) and 16.86% segment error rate (SER) in …
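Both Ryant et al. entries report a frame error rate (FER) and a segment error rate (SER). FER is the fraction of misclassified frames and SER the fraction of misclassified segments. The snippets do not say how frame outputs are turned into segment decisions, so the sketch below assumes one simple rule: a segment's tone is the argmax of its frames' mean posterior. The function names and toy posteriors are hypothetical.

```python
# Hypothetical sketch: frame and segment error rates for a 5-class tone classifier.
# Assumption: a segment's predicted tone is the argmax of the mean frame posterior
# over the segment; the cited papers may aggregate frames differently.


def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=lambda k: xs[k])


def frame_error_rate(posteriors, frame_labels):
    """Fraction of frames whose top-scoring class differs from the reference."""
    errors = sum(argmax(p) != y for p, y in zip(posteriors, frame_labels))
    return errors / len(frame_labels)


def segment_error_rate(posteriors, segments):
    """segments: list of (start, end, label) tuples, end exclusive."""
    errors = 0
    for start, end, label in segments:
        frames = posteriors[start:end]
        n_classes = len(frames[0])
        # Average the frame posteriors over the segment, class by class.
        mean = [sum(f[k] for f in frames) / len(frames) for k in range(n_classes)]
        errors += int(argmax(mean) != label)
    return errors / len(segments)


# Toy posteriors over 5 tone classes (indices 0-4) for 6 frames.
posteriors = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.05, 0.05],  # misclassified frame
    [0.80, 0.10, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.05, 0.05],
    [0.10, 0.50, 0.30, 0.05, 0.05],  # misclassified frame
]
frame_labels = [0, 0, 0, 2, 2, 2]
segments = [(0, 3, 0), (3, 6, 2)]

print("FER:", frame_error_rate(posteriors, frame_labels))
print("SER:", segment_error_rate(posteriors, segments))
```

In this toy example two of six frames are wrong (FER of 1/3), yet averaging over each segment recovers both tones (SER of 0).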

Use of contexts in language model interpolation and adaptation

X Liu, MJF Gales, PC Woodland - Computer Speech & Language, 2013 - Elsevier
Language models (LMs) are often constructed by building multiple individual component
models that are combined using context independent interpolation weights. By tuning these …
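Liu, Gales and Woodland describe the common baseline of combining component language models with context-independent interpolation weights, P(w | h) = Σ_m λ_m P_m(w | h), with the λ_m summing to one. The sketch below shows that baseline, tuning the weights by expectation-maximization (EM) on held-out data, a standard way to estimate them. It does not reproduce the paper's own use of contexts, which the truncated snippet does not detail; the component probabilities and function names are hypothetical.

```python
import math

# Hypothetical sketch: linear interpolation of component language models with
# context-independent weights, tuned by EM on held-out data.


def interpolate(probs, weights):
    """P(w | h) = sum_m lambda_m * P_m(w | h); same weights for every history h."""
    return sum(lam * p for lam, p in zip(weights, probs))


def em_weights(heldout_probs, iters=50):
    """heldout_probs[t][m]: probability component m assigns to held-out word t."""
    n_comp = len(heldout_probs[0])
    weights = [1.0 / n_comp] * n_comp  # start from uniform weights
    for _ in range(iters):
        counts = [0.0] * n_comp
        for probs in heldout_probs:
            mix = interpolate(probs, weights)
            # E-step: posterior responsibility of each component for this word.
            for m in range(n_comp):
                counts[m] += weights[m] * probs[m] / mix
        # M-step: new weights are the average responsibilities.
        weights = [c / len(heldout_probs) for c in counts]
    return weights


def perplexity(heldout_probs, weights):
    """Perplexity of the interpolated model on the held-out words."""
    log_prob = sum(math.log(interpolate(p, weights)) for p in heldout_probs)
    return math.exp(-log_prob / len(heldout_probs))


# Toy held-out data: probabilities from two hypothetical component LMs.
heldout = [[0.10, 0.02], [0.05, 0.20], [0.08, 0.01], [0.30, 0.05]]

w = em_weights(heldout)
print("weights:", w)
print("uniform perplexity:", perplexity(heldout, [0.5, 0.5]))
print("EM perplexity:", perplexity(heldout, w))
```

Because an EM iteration never lowers held-out likelihood, the tuned weights give a perplexity no higher than the uniform starting point.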