Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding

L Schönherr, K Kohls, S Zeiler, T Holz… - arXiv preprint arXiv …, 2018 - arxiv.org
Voice interfaces are becoming accepted widely as input methods for a diverse set of
devices. This development is driven by rapid improvements in automatic speech recognition …

Noise invariant frame selection: a simple method to address the background noise problem for text-independent speaker verification

S Song, S Zhang, BW Schuller, L Shen… - … Joint Conference on …, 2018 - ieeexplore.ieee.org
The performance of speaker-related systems usually degrades heavily in practical
applications largely due to the presence of background noise. To improve the robustness of …

Age group classification and gender recognition from speech with temporal convolutional neural networks

HA Sánchez-Hevia, R Gil-Pita, M Utrilla-Manso… - Multimedia Tools and …, 2022 - Springer
This paper analyses the performance of different types of Deep Neural Networks to jointly
estimate age and identify gender from speech, to be applied in Interactive Voice Response …

An improved deep embedding learning method for short duration speaker verification

Z Gao, Y Song, IV McLoughlin, W Guo, LR Dai - 2018 - kar.kent.ac.uk
This paper presents an improved deep embedding learning method based on convolutional
neural networks (CNN) for short-duration speaker verification (SV). Existing deep learning …

Noise robust speaker recognition based on adaptive frame weighting in GMM for i-vector extraction

X Zhang, X Zou, M Sun, TF Zheng, C Jia… - IEEE Access, 2019 - ieeexplore.ieee.org
Even though speaker recognition has gained significant progress in recent years, its
performance is known to be deteriorated severely with the existence of strong background …

Time-contrastive learning based deep bottleneck features for text-dependent speaker verification

AK Sarkar, ZH Tan, H Tang, S Shon… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
There are a number of studies about extraction of bottleneck (BN) features from deep neural
networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states for …

Age and gender recognition from speech using deep neural networks

HA Sánchez-Hevia, R Gil-Pita, M Utrilla-Manso… - Advances in Physical …, 2021 - Springer
This paper deals with joint gender identification and age group classification from speech,
aimed at improving the functionalities of Interactive Voice Response Systems. Deep Neural …

Voice-based gender identification using co-occurrence-based features

A Ghosal, C Pathak, P Singh, S Dutta - Computational Intelligence in …, 2020 - Springer
Automatic detection of gender based on audio is gaining its popularity day-by-day because
of its several applications in several domains. But most of the past research works are …

Optimizing neural network embeddings using a pair-wise loss for text-independent speaker verification

H Dhamyal, T Zhou, B Raj… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
This paper proposes a new loss function called the “quartet” loss for the better optimization
of the neural networks for matching tasks. For such tasks, where neural network embeddings …