CAM++: A fast and efficient network for speaker verification using context-aware masking
Time delay neural network (TDNN) has been proven to be efficient for speaker verification.
One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the …
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training
speaker embedding extractors. First, we provide an open-source platform for researchers in …
Unsupervised anomalous detection based on unsupervised pretrained models
Unsupervised pretrained models have been widely applied in lots of scenarios successfully.
DCASE 2023 Challenge Task 2 is about first-shot unsupervised anomalous sound detection …
Weakly-supervised speech pre-training: A case study on target speech recognition
Self-supervised learning (SSL) based speech pre-training has attracted much attention for
its capability of extracting rich representations learned from massive unlabeled data. On the …
Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection
Machine anomalous sound detection is a useful technique for various applications, but it
often suffers from poor generalization due to the challenges of data collection and complex …
Wespeaker baselines for VoxSRC2023
This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023
Challenge. Our aim is to provide participants, especially those with limited experience, with …
Phantom in the opera: adversarial music attack for robot dialogue system
This study explores the vulnerability of robot dialogue systems' automatic speech
recognition (ASR) module to adversarial music attacks. Specifically, we explore music as a …
Adversarial data augmentation for robust speaker verification
Data augmentation (DA) has gained widespread popularity in deep speaker models due to
its ease of implementation and significant effectiveness. It enriches training data by …
SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
Deploying a well-optimized pre-trained speaker recognition model in a new domain often
leads to a significant decline in performance. While fine-tuning is a commonly employed …
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
Data augmentation (DA) has played a pivotal role in the success of deep speaker
recognition. Current DA techniques primarily focus on speaker-preserving augmentation …