YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone
E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …
Deep speaker embeddings for Speaker Verification: Review and experimental comparison
The construction of speaker-specific acoustic models for automatic speaker recognition is
almost exclusively based on deep neural network-based speaker embeddings. This work …
Audio-visual person-of-interest deepfake detection
Face manipulation technology is advancing very rapidly, and new methods are being
proposed day by day. The aim of this work is to propose a deepfake detector that can cope …
HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis
This paper presents HierSpeech, a high-quality end-to-end text-to-speech (TTS) system
based on a hierarchical conditional variational autoencoder (VAE) utilizing self-supervised …
VoxSRC 2021: The third VoxCeleb speaker recognition challenge
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in
conjunction with Interspeech 2021. The aim of this challenge was to assess how well current …
Deepfake audio detection by speaker verification
Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays,
that produce extremely realistic synthetic speech. However, malicious uses of such tools are …
The ins and outs of speaker recognition: lessons from VoxSRC 2020
The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 offers a
challenging evaluation for speaker recognition systems, which includes celebrities playing …
ZMM-TTS: Zero-shot multilingual and multispeaker speech synthesis conditioned on self-supervised discrete speech representations
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker,
single-language synthesis. Multilingual TTS systems are limited to resource-rich languages …
Transfer learning framework for low-resource text-to-speech using a large-scale unlabeled speech corpus
Training a text-to-speech (TTS) model requires a large scale text labeled speech corpus,
which is troublesome to collect. In this paper, we propose a transfer learning framework for …
AntiFake: Using adversarial audio to prevent unauthorized speech synthesis
The rapid development of deep neural networks and generative AI has catalyzed growth in
realistic speech synthesis. While this technology has great potential to improve lives, it also …