
On the design fundamentals of diffusion models: A survey

Z Chang, GA Koulieris, HPH Shum - arXiv preprint arXiv:2306.04542, 2023 - arxiv.org

Diffusion models are generative models, which gradually add and remove noise to learn the
underlying distribution of training data for data generation. The components of diffusion …

Cited by 60 | All 2 versions

[PDF] arxiv.org

Enhancing emotional text-to-speech controllability with natural language guidance through contrastive learning and diffusion models

X Jing, K Zhou, A Triantafyllopoulos… - arXiv preprint arXiv …, 2024 - arxiv.org

While current emotional text-to-speech (TTS) systems can generate highly intelligible
emotional speech, achieving fine control over emotion rendering of the output speech still …

Cited by 2

[PDF] arxiv.org

SF-Speech: Straightened flow for zero-shot voice clone on small-scale dataset

X Li, Z Shang, H Hua, P Shi, C Yang, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Large-scale speech generation models have achieved impressive performance in the zero-
shot voice clone tasks relying on large-scale datasets. However, exploring how to achieve …

Cited by 2

[PDF] arxiv.org

DEX-TTS: Diffusion-based expressive text-to-speech with style modeling on time variability

HJ Park, JS Kim, W Shin, SW Han - arXiv preprint arXiv:2406.19135, 2024 - arxiv.org

Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to
synthesize natural speech, but there are limitations to obtaining well-represented styles and …

Cited by 2 | All 2 versions

[PDF] arxiv.org

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

X Ma, G Fang, MB Mi, X Wang - arXiv preprint arXiv:2406.01733, 2024 - arxiv.org

Diffusion Transformers have recently demonstrated unprecedented generative capabilities
for various tasks. The encouraging results, however, come with the cost of slow inference …

Cited by 17 | All 2 versions

[PDF] arxiv.org

Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising

G Fang, X Ma, X Wang - arXiv preprint arXiv:2412.05628, 2024 - arxiv.org

Transformer-based diffusion models have achieved significant advancements across a
variety of generative tasks. However, producing high-quality outputs typically necessitates …

All 2 versions

[PDF] arxiv.org

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

X Qi, R Fu, Z Wen, T Wang, C Qiang, J Tao, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, speech diffusion models have advanced rapidly. Alongside the widely used
U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have …

[PDF] ieee.org

Personalized and Controllable Voice Style Transfer with Speech Diffusion Transformer

HY Choi, SH Lee, SW Lee - IEEE Transactions on Audio …, 2025 - ieeexplore.ieee.org

Although speech synthesis systems have remarkably advanced with their expansion into
various applications, achieving robust voice style transfer while maintaining high-quality in …

U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech

On the design fundamentals of diffusion models: A survey
