CAM++: A fast and efficient network for speaker verification using context-aware masking

H Wang, S Zheng, Y Chen, L Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Time delay neural network (TDNN) has been proven to be efficient for speaker verification.
One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the …

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

J Jung, W Zhang, J Shi, Z Aldeneh, T Higuchi… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training
speaker embedding extractors. First, we provide an open-source platform for researchers in …

Unsupervised anomalous detection based on unsupervised pretrained models

Z Lv, B Han, Z Chen, Y Qian, J Ding… - DCASE 2023 Challenge …, 2023 - dcase.community
Unsupervised pretrained models have been widely applied in lots of scenarios successfully.
DCASE 2023 Challenge Task 2 is about first-shot unsupervised anomalous sound detection …

Weakly-supervised speech pre-training: A case study on target speech recognition

W Zhang, Y Qian - arXiv preprint arXiv:2305.16286, 2023 - arxiv.org
Self-supervised learning (SSL) based speech pre-training has attracted much attention for
its capability of extracting rich representations learned from massive unlabeled data. On the …

Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection

B Han, Z Lv, A Jiang, W Huang, Z Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Machine anomalous sound detection is a useful technique for various applications, but it
often suffers from poor generalization due to the challenges of data collection and complex …

Wespeaker baselines for VoxSRC2023

S Wang, C Liang, X Xiang, B Han, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023
Challenge. Our aim is to provide participants, especially those with limited experience, with …

Phantom in the opera: adversarial music attack for robot dialogue system

S Li, J Li, Y Cao - Frontiers in Computer Science, 2024 - frontiersin.org
This study explores the vulnerability of robot dialogue systems' automatic speech
recognition (ASR) module to adversarial music attacks. Specifically, we explore music as a …

Adversarial data augmentation for robust speaker verification

Z Zhou, J Chen, N Wang, L Li, D Wang - Proceedings of the 2023 9th …, 2023 - dl.acm.org
Data augmentation (DA) has gained widespread popularity in deep speaker models due to
its ease of implementation and significant effectiveness. It enriches training data by …

SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

T Wang, L Li, D Wang - arXiv preprint arXiv:2406.07832, 2024 - arxiv.org
Deploying a well-optimized pre-trained speaker recognition model in a new domain often
leads to a significant decline in performance. While fine-tuning is a commonly employed …

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Z Zhou, S Xu, S Yin, L Li, D Wang - arXiv preprint arXiv:2406.07421, 2024 - arxiv.org
Data augmentation (DA) has played a pivotal role in the success of deep speaker
recognition. Current DA techniques primarily focus on speaker-preserving augmentation …