Google Scholar

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by: 435 · All versions (11)

[PDF] ieee.org

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by: 419 · All versions (9)

[PDF] thecvf.com

Deep stable learning for out-of-distribution generalization

X Zhang, P Cui, R Xu, L Zhou… - Proceedings of the …, 2021 - openaccess.thecvf.com

Approaches based on deep neural networks have achieved striking performance when
testing data and training data share similar distribution, but can significantly fail otherwise …

Cited by: 328 · All versions (7)

[PDF] mlr.press

Unsupervised speech decomposition via triple information bottleneck

K Qian, Y Zhang, S Chang… - International …, 2020 - proceedings.mlr.press

Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …

Cited by: 219 · All versions (9)

[PDF] arxiv.org

StarGANv2-VC: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion

YA Li, A Zare, N Mesgarani - arXiv preprint arXiv:2107.10394, 2021 - arxiv.org

We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …

Cited by: 114 · All versions (4)

[PDF] ieee.org

Audio deepfake approaches

OA Shaaban, R Yildirim, AA Alguttar - IEEE Access, 2023 - ieeexplore.ieee.org

This paper presents a review of techniques involved in the creation and detection of audio
deepfakes; the first section provides information about general deep fakes. In the second …

Cited by: 18 · All versions (4)

[PDF] arxiv.org

Privacy-preserving voice analysis via disentangled representations

R Aloufi, H Haddadi, D Boyle - Proceedings of the 2020 ACM SIGSAC …, 2020 - dl.acm.org

Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home
assistants, and Internet of Things (IoT) devices. Despite offering an always-on convenient …

Cited by: 75 · All versions (4)

[PDF] arxiv.org

Anonymizing speech: Evaluating and designing speaker anonymization techniques

P Champion - arXiv preprint arXiv:2308.04455, 2023 - arxiv.org

The growing use of voice user interfaces has led to a surge in the collection and storage of
speech data. While data collection allows for the development of efficient tools powering …

Cited by: 21 · All versions (12)

[PDF] mlr.press

Global prosody style transfer without text transcriptions

K Qian, Y Zhang, S Chang, J Xiong… - International …, 2021 - proceedings.mlr.press

Prosody plays an important role in characterizing the style of a speaker or an emotion, but
most non-parallel voice or emotion style transfer algorithms do not convert any prosody …

Cited by: 40 · All versions (9)

[PDF] arxiv.org

The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS

WC Huang, T Hayashi, S Watanabe, T Toda - arXiv preprint arXiv …, 2020 - arxiv.org

This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice
conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …

Cited by: 47 · All versions (6)

Saved in your library:

Unsupervised representation disentanglement using cross domain features and adversarial learning...
