Speaker adaptation of neural network acoustic models using i-vectors

G Saon, H Soltau, D Nahamoo… - 2013 IEEE Workshop on …, 2013 - ieeexplore.ieee.org
We propose to adapt deep neural network (DNN) acoustic models to a target speaker by
supplying speaker identity vectors (i-vectors) as input features to the network in parallel with …

Speakerbeam: Speaker aware neural network for target speaker extraction in speech mixtures

K Žmolíková, M Delcroix, K Kinoshita… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
The processing of speech corrupted by interfering overlapping speakers is one of the
challenging problems with regards to today's automatic speech recognition systems …

Method and apparatus for automated speaker parameters adaptation in a deployed speaker verification system

DE Colibro, C Vair, KR Farrell - US Patent 9,865,266, 2018 - Google Patents
Typical speaker verification systems usually employ speakers' audio data collected during
an enrollment phase when users enroll with the system and provide respective voice …

Language recognition in ivectors space

D Martinez, O Plchot, L Burget, O Glembek… - … annual conference of …, 2011 - isca-archive.org
The concept of so-called iVectors, where each utterance is represented by a fixed-length
low-dimensional feature vector, has recently become very successful in speaker verification. In …

Bi-modal person recognition on a mobile phone: using mobile phone data

C McCool, S Marcel, A Hadid… - … on multimedia and …, 2012 - ieeexplore.ieee.org
This paper presents a novel fully automatic bi-modal, face and speaker, recognition system
which runs in real-time on a mobile phone. The implemented system runs in real-time on a …

Neural Network Bottleneck Features for Language Identification

P Matejka, L Zhang, T Ng, O Glembek, JZ Ma… - Odyssey, 2014 - isca-archive.org
This paper presents the application of Neural Network Bottleneck (BN) features in Language
Identification (LID). BN features are generally used for Large Vocabulary Speech …

Mobile biometrics (mobio): Joint face and voice verification for a mobile platform

P Tresadern, C McCool, N Poh, P Matejka… - IEEE pervasive …, 2012 - academia.edu

ALIZE 3.0-open source toolkit for state-of-the-art speaker recognition

A Larcher, JF Bonastre, B Fauve, KA Lee… - Annual Conference of …, 2013 - hal.science
ALIZE is an open-source platform for speaker recognition. The ALIZE library implements a
low-level statistical engine based on the well-known Gaussian mixture modelling. The toolkit …

A small footprint i-vector extractor

P Kenny - Odyssey, 2012 - pdfs.semanticscholar.org
In the case $c = 1$ and $N_c = 1$ this is the same as the PPCA posterior calculation (Bishop).
Accumulate the matrix $\sum_c N_c V_c^* V_c$ for each utterance. The matrices $V_c^* V_c$ are typically …

Towards speaker adaptive training of deep neural network acoustic models

Y Miao, H Zhang, F Metze - Interspeech, 2014 - isca-archive.org
We investigate the concept of speaker adaptive training (SAT) in the context of deep neural
network (DNN) acoustic models. Previous studies have shown success of performing …