Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

VALL-E 2: Neural codec language models are human parity zero-shot text to speech synthesizers

S Chen, S Liu, L Zhou, Y Liu, X Tan, J Li, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …

OneFi: One-shot recognition for unseen gesture via COTS WiFi

R Xiao, J Liu, J Han, K Ren - Proceedings of the 19th ACM Conference …, 2021 - dl.acm.org
WiFi-based Human Gesture Recognition (HGR) becomes increasingly promising for device-
free human-computer interaction. However, existing WiFi-based approaches have not been …

Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech

SF Huang, CJ Lin, DR Liu, YC Chen… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Personalizing a speech synthesis system is a highly desired application, where the system
can generate speech in the user's voice from only a few enrolled recordings. There are two main …

USAT: A universal speaker-adaptive text-to-speech approach

W Wang, Y Song, S Jha - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Conventional text-to-speech (TTS) research has predominantly focused on enhancing the
quality of synthesized speech for speakers in the training dataset. The challenge of …

The multi-speaker multi-style voice cloning challenge 2021

Q Xie, X Tian, G Liu, K Song, L Xie, Z Wu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common
sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning …

Voice filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

A Gabryś, G Huybrechts, MS Ribeiro… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data
to generate high-quality synthetic speech. When using reduced amounts of training data …

Neural codec language models are zero-shot text to speech synthesizers

S Chen, C Wang, Y Wu, Z Zhang, L Zhou… - … on Audio, Speech …, 2025 - ieeexplore.ieee.org
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

J Wang, J Li, X Zhao, Z Wu, S Kang, H Meng - arXiv preprint arXiv …, 2021 - arxiv.org
Factorizing speech as disentangled speech representations is vital to achieve highly
controllable style transfer in voice conversion (VC). Conventional speech representation …

Takin-VC: Zero-shot voice conversion via jointly hybrid content and memory-augmented context-aware timbre modeling

Y Yang, Y Pan, J Yao, X Zhang, J Ye, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Zero-shot voice conversion (VC) aims to transform the source speaker timbre into an
arbitrary unseen one without altering the original speech content. While recent …