Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Residual neural networks (ResNets) demonstrate impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …
Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024
This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …
Multi-stage Face-voice Association Learning with Keynote Speaker Diarization
The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …
Prompt-driven target speech diarization
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …
Towards quantifying and reducing language mismatch effects in cross-lingual speech anti-spoofing
The effects of language mismatch impact speech anti-spoofing systems, while investigations
and quantification of these effects remain limited. Existing anti-spoofing datasets are mainly …
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
It was shown that pre-trained models with self-supervised learning (SSL) techniques are
effective in various downstream speech tasks. However, most such models are trained on …
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
This paper addresses the extraction of the bird vocalization embedding from the whole song
level using disentangled representation learning (DRL). Bird vocalization embeddings are …
Enhanced text-independent speaker recognition using MFCC, Bi-LSTM, and CNN-based noise removal techniques
M Tiwari, DK Verma - International Journal of Speech Technology, 2024 - Springer
This research article introduces a novel approach to text-independent speaker recognition
by integrating Mel-Frequency Cepstral Coefficients (MFCC) and Bidirectional Long Short …
Introducing Euclidean Distance Optimization into Softmax Loss under Neural Collapse
Q Zhang, X Zhang, J Yang, M Sun, T Cao - Pattern Recognition, 2025 - Elsevier
The choice of loss function is crucial in training convolutional neural networks (CNNs).
Cross-entropy loss with Softmax and its variations have demonstrated excellent performance in …