Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling
Zero-shot voice conversion (VC) aims to transform the source speaker timbre into an
arbitrary unseen one without altering the original speech content. While recent …
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model
M Baas, P Scholtz, A Mehta, E Dyson… - arXiv preprint arXiv …, 2025 - arxiv.org
Codec-based text-to-speech (TTS) models have shown impressive quality with zero-shot
voice cloning abilities. However, they often struggle with more expressive references or …