Google Scholar

Audiobox: Unified audio generation with natural language prompts

A Vyas, B Shi, M Le, A Tjandra, YC Wu, B Guo… - arxiv preprint arxiv …, 2023 - arxiv.org

Audio is an essential part of our life, but creating it often requires expertise and is
time-consuming. Research communities have made great progress over the past year advancing …

Cited by 86

Wavchat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 6

Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec

S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen… - arxiv preprint arxiv …, 2024 - arxiv.org

In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully
cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style …

Cited by 7

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji… - The Twelfth …, 2024 - openreview.net

Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts,
which significantly reduces the data and computation requirements for voice cloning by …

Cited by 25

Unistyle: Unified style modeling for speaking style captioning and stylistic speech synthesis

X Zhu, W Tian, X Wang, L He, Y Xiao, X Wang… - Proceedings of the …, 2024 - dl.acm.org

Understanding the speaking style, such as the emotion of the interlocutor's speech, and
responding with speech in an appropriate style is a natural occurrence in human …

Cited by 5

Speechcraft: A fine-grained expressive speech dataset with natural language description

Z Jin, J Jia, Q Wang, K Li, S Zhou, S Zhou… - Proceedings of the …, 2024 - dl.acm.org

Speech-language multi-modal learning presents a significant challenge due to the fine
nuanced information inherent in speech styles. Therefore, a large-scale dataset providing …

Cited by 4

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey

T Xie, Y Rong, P Zhang, L Liu - arxiv preprint arxiv:2412.06602, 2024 - arxiv.org

Text-to-speech (TTS), also known as speech synthesis, is a prominent research area that
aims to generate natural-sounding human speech from text. Recently, with the increasing …

Cited by 1

Hierspeech++: Bridging the gap between semantic and acoustic representation of speech by hierarchical variational inference for zero-shot speech synthesis

SH Lee, HY Choi, SB Kim, SW Lee - arxiv preprint arxiv:2311.12454, 2023 - arxiv.org

Large language models (LLM)-based speech synthesis has been widely adopted in
zero-shot speech synthesis. However, they require a large-scale data and possess the same …

Cited by 27

Voxinstruct: Expressive human instruction-to-speech generation with unified multilingual codec language modelling

Y Zhou, X Qin, Z Jin, S Zhou, S Lei, S Zhou… - Proceedings of the …, 2024 - dl.acm.org

Recent AIGC systems possess the capability to generate digital multimedia content based
on human language instructions, such as text, image and video. However, when it comes to …

Cited by 3

Natural language guidance of high-fidelity text-to-speech with synthetic annotations

D Lyth, S King - arxiv preprint arxiv:2402.01912, 2024 - arxiv.org

Text-to-speech models trained on large-scale datasets have demonstrated impressive
in-context learning capabilities and naturalness. However, control of speaker identity and style …

Cited by 30
