A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …
An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …
Emotional voice conversion: Theory, databases and ESD
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …
Transformers in speech processing: A survey
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …
The singing voice conversion challenge 2023
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …
A text-guided protein design framework
Current AI-assisted protein design mainly utilizes protein sequential and structural
information. Meanwhile, there exists tremendous knowledge curated by humans in the text …
StarGANv2-VC: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion
We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …
i-Code: An integrative and composable multimodal learning framework
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to
maintain a holistic worldview. Most current pretraining methods, however, are limited to one …