AISHELL-3: A multi-speaker Mandarin TTS corpus and the baselines

Y Shi, H Bu, X Xu, S Zhang, M Li - arXiv preprint arXiv:2010.11567, 2020 - arxiv.org
In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin
speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems …

Pushing the limits of raw waveform speaker recognition

J Jung, YJ Kim, HS Heo, BJ Lee, Y Kwon… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, speaker recognition systems based on raw waveform inputs have received
increasing attention. However, the performance of such systems are typically inferior to the …

The SpeakIn system for VoxCeleb speaker recognition challange 2021

M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org
This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …

Simple attention module based speaker verification with iterative noisy label detection

X Qin, N Li, C Weng, D Su, M Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Recently, the attention mechanism such as squeeze-and-excitation module (SE) and
convolutional block attention module (CBAM) has achieved great success in deep learning …

Self-supervised learning with Cluster-Aware-DINO for high-performance robust speaker verification

B Han, Z Chen, Y Qian - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
The automatic speaker verification task has achieved great success using deep learning
approaches with a large-scale, manually annotated dataset. However, collecting a …

Self-supervised speaker verification using dynamic loss-gate and label correction

B Han, Z Chen, Y Qian - arXiv preprint arXiv:2208.01928, 2022 - arxiv.org
For self-supervised speaker verification, the quality of pseudo labels decides the upper
bound of the system due to the massive unreliable labels. In this work, we propose dynamic …

Leveraging ASR pretrained Conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

An enhanced Res2Net with local and global feature fusion for speaker verification

Y Chen, S Zheng, H Wang, L Cheng, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective fusion of multi-scale features is crucial for improving speaker verification
performance. While most existing methods aggregate multi-scale features in a layer-wise …

Cross-channel attention-based target speaker voice activity detection: Experimental results for the M2MeT challenge

W Wang, X Qin, M Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
DukeECE. As the highly overlapped speech exists in the dataset, we employ an x-vector-
based target-speaker voice activity detection (TS-VAD) to find the overlap between …

The multi-speaker multi-style voice cloning challenge 2021

Q Xie, X Tian, G Liu, K Song, L Xie, Z Wu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common
sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning …