Google Scholar

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

Cited by: 450. All versions: 7.

Deep speaker embeddings for Speaker Verification: Review and experimental comparison

M Jakubec, R Jarina, E Lieskovska, P Kasak - Engineering Applications of …, 2024 - Elsevier

The construction of speaker-specific acoustic models for automatic speaker recognition is
almost exclusively based on deep neural network-based speaker embeddings. This work …

Cited by: 22. All versions: 2.

Audio-visual person-of-interest deepfake detection

D Cozzolino, A Pianese, M Nießner… - Proceedings of the …, 2023 - openaccess.thecvf.com

Face manipulation technology is advancing very rapidly, and new methods are being
proposed day by day. The aim of this work is to propose a deepfake detector that can cope …

Cited by: 74. All versions: 6.

HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis

SH Lee, SB Kim, JH Lee, E Song… - Advances in Neural …, 2022 - proceedings.neurips.cc

This paper presents HierSpeech, a high-quality end-to-end text-to-speech (TTS) system
based on a hierarchical conditional variational autoencoder (VAE) utilizing self-supervised …

Cited by: 49. All versions: 6.

VoxSRC 2021: The third VoxCeleb speaker recognition challenge

A Brown, J Huh, JS Chung, A Nagrani… - arXiv preprint arXiv …, 2022 - arxiv.org

The third instalment of the VoxCeleb Speaker Recognition Challenge was held in
conjunction with Interspeech 2021. The aim of this challenge was to assess how well current …

Cited by: 57. All versions: 2.

Deepfake audio detection by speaker verification

A Pianese, D Cozzolino, G Poggi… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org

Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays,
that produce extremely realistic synthetic speech. However, malicious uses of such tools are …

Cited by: 57. All versions: 4.

The ins and outs of speaker recognition: lessons from VoxSRC 2020

Y Kwon, HS Heo, BJ Lee… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 offers a
challenging evaluation for speaker recognition systems, which includes celebrities playing …

Cited by: 73. All versions: 5.

ZMM-TTS: Zero-shot multilingual and multispeaker speech synthesis conditioned on self-supervised discrete speech representations

C Gong, X Wang, E Cooper, D Wells… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker,
single-language synthesis. Multilingual TTS systems are limited to resource-rich languages …

Cited by: 18. All versions: 2.

Transfer learning framework for low-resource text-to-speech using a large-scale unlabeled speech corpus

M Kim, M Jeong, BJ Choi, S Ahn, JY Lee… - arXiv preprint arXiv …, 2022 - arxiv.org

Training a text-to-speech (TTS) model requires a large scale text labeled speech corpus,
which is troublesome to collect. In this paper, we propose a transfer learning framework for …

Cited by: 29. All versions: 5.

AntiFake: Using adversarial audio to prevent unauthorized speech synthesis

Z Yu, S Zhai, N Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org

The rapid development of deep neural networks and generative AI has catalyzed growth in
realistic speech synthesis. While this technology has great potential to improve lives, it also …

Cited by: 22. All versions: 5.

Clova baseline system for the VoxCeleb speaker recognition challenge 2020

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone
