Google Scholar

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 419 · All 9 versions

[PDF] arxiv.org

An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org

Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

Cited by 67 · All 12 versions

[PDF] mlr.press

ContentVec: An improved self-supervised speech representation by disentangling speakers

K Qian, Y Zhang, H Gao, J Ni, CI Lai… - International …, 2022 - proceedings.mlr.press

Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …

Cited by 126 · All 9 versions

[PDF] nowpublishers.com

[CITATION][C] An introduction to variational autoencoders

DP Kingma, M Welling - Foundations and Trends® in …, 2019 - nowpublishers.com

An Introduction to Variational Autoencoders Page 1 An Introduction to Variational Autoencoders
Page 2 Other titles in Foundations and Trends® in Machine Learning Computational Optimal …

Cited by 3376 · All 11 versions

[PDF] mlr.press

AutoVC: Zero-shot voice style transfer with only autoencoder loss

K Qian, Y Zhang, S Chang, X Yang… - International …, 2019 - proceedings.mlr.press

Despite the progress in voice conversion, many-to-many voice conversion trained on non-parallel
data, as well as zero-shot voice conversion, remains under-explored. Deep style …

Cited by 595 · All 8 versions

[PDF] sciencedirect.com

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 191 · All 7 versions

[PDF] sciencedirect.com

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier

Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

Cited by 435 · All 14 versions

[PDF] arxiv.org

StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks

H Kameoka, T Kaneko, K Tanaka… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

This paper proposes a method that allows non-parallel many-to-many voice conversion (VC)
by using a variant of a generative adversarial network (GAN) called StarGAN. Our method …

Cited by 510 · All 5 versions

[PDF] ntt.co.jp

CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - 2018 26th European Signal …, 2018 - ieeexplore.ieee.org

We propose a non-parallel voice-conversion (VC) method that can learn a mapping from
source to target speech without relying on parallel data. The proposed method is particularly …

Cited by 368 · All 7 versions

[PDF] mlr.press

Unsupervised speech decomposition via triple information bottleneck

K Qian, Y Zhang, S Chang… - International …, 2020 - proceedings.mlr.press

Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …

Cited by 219 · All 9 versions

Voice conversion from non-parallel corpora using variational auto-encoder

An overview of voice conversion and its challenges: From statistical modeling to deep learning
