- Academic Search

Flashspeech: Efficient zero-shot speech synthesis

Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun… - Proceedings of the …, 2024 - dl.acm.org

Recent progress in large-scale zero-shot speech synthesis has been significantly advanced
by language models and diffusion models. However, the generation process of both …

Cited by 12

[PDF] arxiv.org

Autoregressive diffusion transformer for text-to-speech synthesis

Z Liu, S Wang, S Inoue, Q Bai, H Li - arXiv preprint arXiv:2406.05551, 2024 - arxiv.org

Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …

Cited by 12

[PDF] neurips.cc

Songcreator: Lyrics-based universal song generation

S Lei, Y Zhou, B Tang, MWY Lam… - Advances in …, 2025 - proceedings.neurips.cc

Music is an integral part of human culture, embodying human intelligence and creativity, of
which songs compose an essential part. While various aspects of song generation have …

Cited by 3

[PDF] arxiv.org

Speech Editing--a Summary

T Kässmann, Y Liu, D Liu - arXiv preprint arXiv:2407.17172, 2024 - arxiv.org

With the rise of video production and social media, speech editing has become crucial for
creators to address issues like mispronunciations, missing words, or stuttering in audio …

E³ TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications

Z Liang, Z Ma, C Du, K Yu… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Text-based speech editing aims at manipulating part of real audio by modifying the
corresponding transcribed text, without being discernible by human auditory system. With …

[PDF] arxiv.org

Fluenteditor: Text-based speech editing by considering acoustic and prosody consistency

R Liu, J Xi, Z Jiang, H Li - arXiv preprint arXiv:2309.11725, 2023 - arxiv.org

Text-based speech editing (TSE) techniques are designed to enable users to edit the output
audio by modifying the input text transcript instead of the audio itself. Despite much progress …

Cited by 2

[PDF] arxiv.org

FluentEditor+: Text-based Speech Editing by Modeling Local Hierarchical Acoustic Smoothness and Global Prosody Consistency

R Liu, J Xi, Z Jiang, H Li - arXiv preprint arXiv:2410.03719, 2024 - arxiv.org

Text-based speech editing (TSE) allows users to modify speech by editing the
corresponding text and performing operations such as cutting, copying, and pasting to …

[PDF] arxiv.org

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

H Wang, M Yu, J Hai, C Chen, Y Hu, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for
stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis …

[PDF] arxiv.org

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model

J Zuo, S Ji, M Fang, Z Jiang, X Cheng, Q Yang… - arXiv preprint arXiv …, 2025 - arxiv.org

This paper introduces PFlow-VC, a conditional flow matching voice conversion model that
leverages fine-grained discrete pitch tokens and target speaker prompt information for …

[PDF] arxiv.org

DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency

Y Chen, Y Jia, S Zhao, Z Jiang, H Li, J Kang… - arXiv preprint arXiv …, 2024 - arxiv.org

As text-based speech editing becomes increasingly prevalent, the demand for unrestricted
free-text editing continues to grow. However, existing speech editing techniques encounter …

FluentSpeech: Stutter-oriented automatic speech editing with context-aware diffusion models

Flashspeech: Efficient zero-shot speech synthesis