Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling

Y Yang, Y Pan, J Yao, X Zhang, J Ye, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Zero-shot voice conversion (VC) aims to transform the source speaker timbre into an
arbitrary unseen one without altering the original speech content. While recent …

MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model

M Baas, P Scholtz, A Mehta, E Dyson… - arXiv preprint arXiv …, 2025 - arxiv.org
Codec-based text-to-speech (TTS) models have shown impressive quality with zero-shot
voice cloning abilities. However, they often struggle with more expressive references or …