
Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling

Y Yang, Y Pan, J Yao, X Zhang, J Ye, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Zero-shot voice conversion (VC) aims to transform the source speaker timbre into an
arbitrary unseen one without altering the original speech content. While recent …

Cited by 2 · All 2 versions



MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model

M Baas, P Scholtz, A Mehta, E Dyson… - arXiv preprint arXiv …, 2025 - arxiv.org

Codec-based text-to-speech (TTS) models have shown impressive quality with zero-shot
voice cloning abilities. However, they often struggle with more expressive references or …


Neural codec language models for disentangled and textless voice conversion
