A survey on text-dependent and text-independent speaker verification

Y Tu, W Lin, MW Mak - IEEE Access, 2022 - ieeexplore.ieee.org
Speaker verification (SV) aims to detect an individual's identity from his/her voice. SV has
been successfully applied in various areas such as access control, remote service …

Wav2Spk: A simple DNN architecture for learning speaker embeddings from waveforms

W Lin, MW Mak - 2020 - ira.lib.polyu.edu.hk
Speaker recognition has seen impressive advances with the advent of deep neural networks
(DNNs). However, state-of-the-art speaker recognition systems still rely on human …

Text-independent speaker verification employing CNN-LSTM-TDNN hybrid networks

J Alam, A Fathan, WH Kang - International Conference on Speech and …, 2021 - Springer
Time Delay Neural Network (TDNN)-based speaker embedding extraction has
become the dominant approach for text-independent speaker verification. Several single …

Robust speaker verification using deep weight space ensemble

W Lin, MW Mak - IEEE/ACM Transactions on Audio, Speech …, 2023 - ieeexplore.ieee.org
Domain shift is one of the most challenging problems in speaker verification. Although
numerous methods have been proposed to address domain shift, most approaches optimize …

Mixture representation learning for deep speaker embedding

W Lin, MW Mak - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
How to effectively convert a sequence of variable-length acoustic features to a fixed-dimension
representation has always been a research focus in speaker recognition. In state …
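
The snippet above concerns mapping a variable-length sequence of frame-level features to one fixed-dimension vector. As an illustration only, here is a minimal sketch of plain statistics pooling (per-dimension mean and standard deviation, concatenated). It is a common baseline and not the mixture representation method of this paper. It assumes features arrive as a list of equal-length frames; the function name `statistics_pooling` is hypothetical.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool T frames of D-dim features into one 2D-dim vector (mean || std).

    Hypothetical illustrative baseline; not the method proposed in the cited paper.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    n = len(frames)
    # Per-dimension mean over time.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-dimension (population) standard deviation over time.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # Output size depends only on D, not on the number of frames T.
    return means + stds


short = [[1.0, 2.0], [3.0, 4.0]]    # T = 2 frames, D = 2
long = [[0.5, -1.0]] * 5            # T = 5 frames, D = 2
print(statistics_pooling(short))    # [2.0, 3.0, 1.0, 1.0]
print(len(statistics_pooling(long)))  # 4
```

Both utterances yield a 4-dimensional vector despite having 2 and 5 frames, which is the property that lets utterance-level embeddings be compared with a fixed-size scoring function.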

Robust speaker verification using population-based data augmentation

W Lin, MW Mak - … 2022-2022 IEEE International Conference on …, 2022 - ieeexplore.ieee.org
Speaker recognition in environments with a low signal-to-noise ratio (SNR) and a high
reverberation level has always been challenging. Data augmentation can be applied to …

Promoting independence of depression and speaker features for speaker disentanglement in speech-based depression detection

L Zuo, MW Mak, Y Tu - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
Recent studies have demonstrated the effectiveness of speaker disentanglement in
mitigating the interference caused by speaker features in speech-based depression …

Aggregating frame-level information in the spectral domain with self-attention for speaker embedding

Y Tu, MW Mak - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
Most pooling methods in state-of-the-art speaker embedding networks are implemented in
the temporal domain. However, due to the high non-stationarity in the feature maps …

Mutual Information Enhanced Training for Speaker Embedding

Y Tu, MW Mak - Interspeech, 2021 - isca-archive.org
Mutual information (MI) is useful in unsupervised and self-supervised learning. Maximizing
the MI between the low-level features and the learned embeddings can preserve meaningful …

Short-time spectral aggregation for speaker embedding

Y Tu, MW Mak - … 2021-2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org
State-of-the-art speaker verification systems take frame-level acoustic features as input and
produce fixed-dimensional embeddings as utterance-level representations. Thus, how to …