A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Y Zhao, WC Huang, X Tian, J Yamagishi… - arXiv preprint arXiv …, 2020 - arxiv.org
The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

A text-guided protein design framework

S Liu, Y Li, Z Li, A Gitter, Y Zhu, J Lu, Z Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Current AI-assisted protein design mainly utilizes protein sequential and structural
information. Meanwhile, there exists tremendous knowledge curated by humans in the text …

StarGANv2-VC: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion

YA Li, A Zare, N Mesgarani - arXiv preprint arXiv:2107.10394, 2021 - arxiv.org
We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …

i-Code: An integrative and composable multimodal learning framework

Z Yang, Y Fang, C Zhu, R Pryzant, D Chen… - Proceedings of the …, 2023 - ojs.aaai.org
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to
maintain a holistic worldview. Most current pretraining methods, however, are limited to one …