Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …
An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
Voicebox: Text-guided multilingual universal speech generation at scale
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
ASVspoof 2021: Towards spoofed and deepfake speech detection in the wild
Benchmarking initiatives support the meaningful comparison of competing solutions to
prominent problems in speech and language processing. Successive benchmarking …
Multi-task learning for detecting and segmenting manipulated facial images and videos
Detecting manipulated images and videos is an important topic in digital media forensics.
Most detection methods use binary classification to determine the probability of a query …
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …
StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks
This paper proposes a method that allows non-parallel many-to-many voice conversion (VC)
by using a variant of a generative adversarial network (GAN) called StarGAN. Our method …
The VoiceMOS Challenge 2022
We present the first edition of the VoiceMOS Challenge, a scientific event that aims to
promote the study of automatic prediction of the mean opinion score (MOS) of synthetic …
Generalization ability of MOS prediction networks
Automatic methods to predict listener opinions of synthesized speech remain elusive since
listeners, systems being evaluated, characteristics of the speech, and even the instructions …