[PDF] Disentangling prosody and timbre embeddings via voice conversion
Modern voice conversion and anonymization architectures generally share a design
preserving source linguistic content and expressivity while modifying speaker timbre …
Text-to-Speech With Lip Synchronization Based on Speech-Assisted Text-to-Video Alignment and Masked Unit Prediction
Y Ahn, J Chae, JW Shin - IEEE Signal Processing Letters, 2025 - ieeexplore.ieee.org
Text-to-speech (TTS) with lip synchronization (TTSLS) is the task of generating a speech
signal synchronized with the lip movements in a video given the text transcription and the …
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-
Speech task, has garnered significant interest due to its numerous applications in multi …
Latent Filling: Latent Space Data Augmentation for Zero-Shot Speech Synthesis
Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems
by enlarging the training data through crowd-sourcing or augmenting existing speech data …
Synthesis and Restoration of Traditional Ethnic Musical Instrument Timbres Based on Time-Frequency Analysis.
M Chen, Y Xiang, C Xiong - Traitement du Signal, 2024 - search.ebscohost.com
With the advent of the digital age, the preservation and restoration of the timbres of
traditional ethnic musical instruments have emerged as significant areas of study in …