- Academic Search

Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-s...

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 230

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by 422

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 421

Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Y Zhao, WC Huang, X Tian, J Yamagishi… - arXiv preprint arXiv …, 2020 - arxiv.org

The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …

Cited by 243

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 187

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org

The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Cited by 69

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

Cited by 59

A text-guided protein design framework

S Liu, Y Li, Z Li, A Gitter, Y Zhu, J Lu, Z Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Current AI-assisted protein design mainly utilizes protein sequential and structural
information. Meanwhile, there exists tremendous knowledge curated by humans in the text …

Cited by 60

StarGANv2-VC: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion

YA Li, A Zare, N Mesgarani - arXiv preprint arXiv:2107.10394, 2021 - arxiv.org

We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …

Cited by 112

i-Code: An integrative and composable multimodal learning framework

Z Yang, Y Fang, C Zhu, R Pryzant, D Chen… - Proceedings of the …, 2023 - ojs.aaai.org

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to
maintain a holistic worldview. Most current pretraining methods, however, are limited to one …

Cited by 44

A review of deep learning techniques for speech processing
