Speaker recognition by machines and humans: A tutorial review

JHL Hansen, T Hasan - IEEE Signal Processing Magazine, 2015 - ieeexplore.ieee.org
Identifying a person by his or her voice is an important human trait most take for granted in
natural human-to-human interaction/communication. Speaking to someone over the …

An overview of noise-robust automatic speech recognition

J Li, L Deng, Y Gong… - IEEE/ACM Transactions …, 2014 - ieeexplore.ieee.org
New waves of consumer-centric applications, such as voice search and voice interaction
with mobile devices and home entertainment systems, increasingly require automatic …

Machine learning paradigms for speech recognition: An overview

L Deng, X Li - IEEE Transactions on Audio, Speech, and …, 2013 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) has historically been a driving force behind many
machine learning (ML) techniques, including the ubiquitously used hidden Markov model …

Statistical parametric speech synthesis

H Zen, K Tokuda, AW Black - Speech Communication, 2009 - Elsevier
This review gives a general overview of techniques used in statistical parametric speech
synthesis. One instance of these techniques, called hidden Markov model (HMM)-based …

Speech synthesis based on hidden Markov models

K Tokuda, Y Nankaku, T Toda, H Zen… - Proceedings of the …, 2013 - ieeexplore.ieee.org
This paper gives a general overview of hidden Markov model (HMM)-based speech
synthesis, which has recently been demonstrated to be very effective in synthesizing …

An overview of speaker identification: Accuracy and robustness issues

R Togneri, D Pullella - IEEE Circuits and Systems Magazine, 2011 - ieeexplore.ieee.org
This paper presents the main paradigms for speaker identification, and recent work on
missing data methods to increase robustness. The feature extraction, speaker modeling and …

Eigenvoice modeling with sparse training data

P Kenny, G Boulianne… - IEEE Transactions on …, 2005 - ieeexplore.ieee.org
We derive an exact solution to the problem of maximum likelihood estimation of the
supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

The subspace Gaussian mixture model—A structured model for speech recognition

D Povey, L Burget, M Agarwal, P Akyazi, F Kai… - Computer Speech & …, 2011 - Elsevier
We describe a new approach to speech recognition, in which all Hidden Markov Model
(HMM) states share the same Gaussian Mixture Model (GMM) structure with the same …

[BOOK][B] Distant speech recognition

M Wölfel, J McDonough - 2009 - books.google.com
A complete overview of distant automatic speech recognition. The performance of
conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon …