Speech production knowledge in automatic speech recognition

S King, J Frankel, K Livescu, E McDermott… - The Journal of the …, 2007 - pubs.aip.org
Although much is known about how speech is produced, and research into speech
production has resulted in measured articulatory data, feature systems of different kinds, and …

Interacting with computers by voice: automatic speech recognition and synthesis

D O'Shaughnessy - Proceedings of the IEEE, 2003 - ieeexplore.ieee.org
This paper examines how people communicate with computers using speech. Automatic
speech recognition (ASR) transforms speech into text, while automatic speech synthesis [or …

Robust speech recognition using articulatory information

K Kirchhoff - PhD thesis, University of Bielefeld, Bielefeld, Germany, 1999 - Citeseer
Whereas most state-of-the-art speech recognition systems use spectral or cepstral
representations of the speech signal, there have also been some promising attempts at …

Subword modeling for automatic speech recognition: Past, present, and emerging approaches

K Livescu, E Fosler-Lussier… - IEEE Signal Processing …, 2012 - ieeexplore.ieee.org
Modern automatic speech recognition systems handle large vocabularies of words, making
it infeasible to collect enough repetitions of each word to train individual word models …

Exploring emergent syllables in end-to-end automatic speech recognizers through model explainability technique

VN Vitale, F Cutugno, A Origlia, G Coro - Neural Computing and …, 2024 - Springer
Automatic speech recognition systems based on end-to-end models (E2E-ASRs) can
achieve comparable performance to conventional ASR systems while reproducing all their …

Moving beyond the 'beads-on-a-string' model of speech

M Ostendorf - Proc. IEEE ASRU Workshop, 1999 - Citeseer
The notion that a word is composed of a sequence of phone segments, sometimes referred
to as 'beads on a string', has formed the basis of most speech recognition work for over 15 …

Visual speech recognition with loosely synchronized feature streams

K Saenko, K Livescu, M Siracusa… - … on Computer Vision …, 2005 - ieeexplore.ieee.org
We present an approach to detecting and recognizing spoken isolated phrases based solely
on visual input. We adopt an architecture that first employs discriminative detection of visual …

Fundamental technologies in modern speech recognition

T OCKPH - IEEE Signal Processing Magazine, 2012 - Citeseer
There is a vast body of literature on LVCSR research and some limitation is necessary in the
scope of this article. We will focus primarily on the techniques that have been successful in …

Articulatory features for robust visual speech recognition

K Saenko, T Darrell, JR Glass - … of the 6th international conference on …, 2004 - dl.acm.org
Visual information has been shown to improve the performance of speech recognition
systems in noisy acoustic environments. However, most audio-visual speech recognizers …

Deep neural network based place and manner of articulation detection and classification for Bengali continuous speech

T Bhowmik, A Chowdhury, SKD Mandal - Procedia Computer Science, 2018 - Elsevier
The phonological features are the most basic unit of a speech knowledge hierarchy. This
paper reports about detection and classification of phonological features from Bengali …