VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

P Peng, PY Huang, SW Li, A Mohamed… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce VoiceCraft, a token infilling neural codec language model that achieves state-of-the-art
performance on both speech editing and zero-shot text-to-speech (TTS) on …

Cited by 48 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun… - Proceedings of the …, 2024 - dl.acm.org

Recent progress in large-scale zero-shot speech synthesis has been significantly advanced
by language models and diffusion models. However, the generation process of both …

Cited by 12 · Related articles · All 2 versions

[PDF] arxiv.org

SongCreator: Lyrics-based Universal Song Generation

S Lei, Y Zhou, B Tang, MWY Lam, F Liu, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Music is an integral part of human culture, embodying human intelligence and creativity, of
which songs compose an essential part. While various aspects of song generation have …

Cited by 3 · Related articles · All 4 versions · HTML version

[PDF] arxiv.org

Speech Editing--a Summary

T Kässmann, Y Liu, D Liu - arXiv preprint arXiv:2407.17172, 2024 - arxiv.org

With the rise of video production and social media, speech editing has become crucial for
creators to address issues like mispronunciations, missing words, or stuttering in audio …

Related articles · All 2 versions · HTML version

E TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications

Z Liang, Z Ma, C Du, K Yu… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Text-based speech editing aims at manipulating part of real audio by modifying the
corresponding transcribed text, without the change being discernible by the human auditory system. With …

Related articles · All 2 versions

[PDF] arxiv.org

FluentEditor+: Text-based Speech Editing by Modeling Local Hierarchical Acoustic Smoothness and Global Prosody Consistency

R Liu, J Xi, Z Jiang, H Li - arXiv preprint arXiv:2410.03719, 2024 - arxiv.org

Text-based speech editing (TSE) allows users to modify speech by editing the
corresponding text and performing operations such as cutting, copying, and pasting to …

Related articles · HTML version

[PDF] arxiv.org

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

H Wang, M Yu, J Hai, C Chen, Y Hu, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for
stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis …

Related articles · HTML version

[PDF] arxiv.org

DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency

Y Chen, Y Jia, S Zhao, Z Jiang, H Li, J Kang… - arXiv preprint arXiv …, 2024 - arxiv.org

As text-based speech editing becomes increasingly prevalent, the demand for unrestricted
free-text editing continues to grow. However, existing speech editing techniques encounter …

Related articles · All 2 versions · HTML version

[PDF] arxiv.org

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

Z Liu, S Wang, S Inoue, Q Bai, H Li - arXiv preprint arXiv:2406.05551, 2024 - arxiv.org

Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …

Cited by 11 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

MMSD-Net: Towards Multi-modal Stuttering Detection

L Nie, SR Kadiri, R Agrawal - arXiv preprint arXiv:2407.11492, 2024 - arxiv.org

Stuttering is a common speech impediment that is caused by irregular disruptions in speech
production, affecting over 70 million people across the world. Standard automatic speech …

Related articles · All 4 versions · HTML version

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
