Disentangling prosody and timbre embeddings via voice conversion

N Gengembre, O Le Blouch… - Proc. Interspeech …, 2024 - anr-eva.gitlabpages.inria.fr
Modern voice conversion and anonymization architectures generally share a design
preserving source linguistic content and expressivity while modifying speaker timbre …

Text-to-Speech With Lip Synchronization Based on Speech-Assisted Text-to-Video Alignment and Masked Unit Prediction

Y Ahn, J Chae, JW Shin - IEEE Signal Processing Letters, 2025 - ieeexplore.ieee.org
Text-to-speech (TTS) with lip synchronization (TTSLS) is the task of generating a speech
signal synchronized with the lip movements in a video given the text transcription and the …

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

R Fu, X Qi, Z Wen, J Tao, T Wang, C Qiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-
Speech task, has garnered significant interest due to its numerous applications in multi …

Latent Filling: Latent Space Data Augmentation for Zero-Shot Speech Synthesis

JS Bae, JY Lee, JH Lee, S Mun, T Kang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems
by enlarging the training data through crowd-sourcing or augmenting existing speech data …

Synthesis and Restoration of Traditional Ethnic Musical Instrument Timbres Based on Time-Frequency Analysis

M Chen, Y Xiang, C Xiong - Traitement du Signal, 2024 - search.ebscohost.com
With the advent of the digital age, the preservation and restoration of the timbres of
traditional ethnic musical instruments have emerged as significant areas of study in …