- Academic Search


A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 242. All versions: 7.

VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature

C Du, Y Guo, X Chen, K Yu - arXiv preprint arXiv:2204.00768, 2022 - arxiv.org

The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an
acoustic model (AM) that predicts acoustic feature from the input transcript and a vocoder …

Cited by 69. All versions: 4.

Controllable accented text-to-speech synthesis with fine and coarse-grained intensity rendering

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a
variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …

Cited by 16. All versions: 2.

EmoDiff: Intensity controllable emotional text-to-speech with soft-label guidance

Y Guo, C Du, X Chen, K Yu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Although current neural text-to-speech (TTS) models are able to generate high-quality
speech, intensity controllable emotional TTS is still a challenging task. Most existing …

Cited by 36. All versions: 3.

Autoregressive diffusion transformer for text-to-speech synthesis

Z Liu, S Wang, S Inoue, Q Bai, H Li - arXiv preprint arXiv:2406.05551, 2024 - arxiv.org

Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …

Cited by 14. All versions: 2.

PromptTTS++: Controlling speaker identity in prompt-based text-to-speech using natural language descriptions

R Shimizu, R Yamamoto, M Kawamura… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that
allows control over speaker identity using natural language descriptions. To control speaker …

Cited by 22. All versions: 4.

DiffProsody: Diffusion-based latent prosody generation for expressive speech synthesis with prosody conditional adversarial training

HS Oh, SH Lee, SW Lee - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

Expressive text-to-speech systems have undergone significant advancements owing to
prosody modeling, but conventional methods can still be improved. Traditional approaches …

Cited by 17. All versions: 5.

ControlSpeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec

S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully
cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style …

Cited by 8. All versions: 3.

Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature

C Du, Y Guo, X Chen, K Yu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

Achieving high fidelity and speaker similarity in text-to-speech speaker adaptation with
limited amount of data is a challenging task. Most existing methods only consider adapting …

Cited by 10. All versions: 2.

Acoustic modeling for end-to-end empathetic dialogue speech synthesis using linguistic and prosodic contexts of dialogue history

Y Nishimura, Y Saito, S Takamichi, K Tachibana… - arXiv preprint arXiv …, 2022 - arxiv.org

We propose an end-to-end empathetic dialogue speech synthesis (DSS) model that
considers both the linguistic and prosodic contexts of dialogue history. Empathy is the active …

Cited by 13. All versions: 8.
