AISHELL-3: A multi-speaker Mandarin TTS corpus and the baselines
In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin
speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems …
Pushing the limits of raw waveform speaker recognition
In recent years, speaker recognition systems based on raw waveform inputs have received
increasing attention. However, the performance of such systems is typically inferior to the …
The SpeakIn system for VoxCeleb speaker recognition challange 2021
M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org
This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …
Simple attention module based speaker verification with iterative noisy label detection
Recently, the attention mechanism such as squeeze-and-excitation module (SE) and
convolutional block attention module (CBAM) has achieved great success in deep learning …
Self-supervised learning with cluster-aware-DINO for high-performance robust speaker verification
The automatic speaker verification task has achieved great success using deep learning
approaches with a large-scale, manually annotated dataset. However, collecting a …
Self-supervised speaker verification using dynamic loss-gate and label correction
For self-supervised speaker verification, the quality of pseudo labels decides the upper
bound of the system due to the massive unreliable labels. In this work, we propose dynamic …
Leveraging ASR pretrained Conformers for speaker verification through transfer learning and knowledge distillation
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …
An enhanced Res2Net with local and global feature fusion for speaker verification
Effective fusion of multi-scale features is crucial for improving speaker verification
performance. While most existing methods aggregate multi-scale features in a layer-wise …
Cross-channel attention-based target speaker voice activity detection: Experimental results for the M2MeT challenge
DukeECE. As the highly overlapped speech exists in the dataset, we employ an x-vector-
based target-speaker voice activity detection (TS-VAD) to find the overlap between …
The multi-speaker multi-style voice cloning challenge 2021
The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common
sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning …