- Academic Search

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arxiv preprint arxiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 467

Adaspeech 4: Adaptive text to speech in zero-shot scenarios

Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin… - arxiv preprint arxiv …, 2022 - arxiv.org

Adaptive text to speech (TTS) can synthesize new voices in zero-shot scenarios efficiently,
by using a well-trained source TTS model without adapting it on the speech data of new …

Cited by 72

A vector quantized approach for text to speech synthesis on real-world spontaneous speech

LW Chen, S Watanabe, A Rudnicky - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Abstract Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have
achieved near human-level naturalness. The diversity of human speech, however, often …

Cited by 41

Dailytalk: Spoken dialogue dataset for conversational text-to-speech

K Lee, K Park, D Kim - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

The majority of current Text-to-Speech (TTS) datasets, which are collections of individual
utterances, contain few conversational aspects. In this paper, we introduce DailyTalk, a high …

Cited by 40

Navigating the Soundscape of Deception: A Comprehensive Survey on Audio Deepfake Generation, Detection, and Future Horizons

TM Wani, SAA Qadri, FA Wani… - Foundations and Trends …, 2024 - nowpublishers.com

The rise of audio deepfakes presents a significant security threat that undermines trust in
digital communications and media. These synthetic audio technologies can convincingly …

Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis

Y Zhou, C Song, X Li, L Zhang, Z Wu, Y Bian… - arxiv preprint arxiv …, 2022 - arxiv.org

Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …

Cited by 27

[BOOK] Neural text-to-speech synthesis

X Tan - 2023 - Springer

Speaking is one of the most important language capabilities (the others being listening,
reading, and writing) of human beings. Text-to-speech synthesis (TTS for short), which aims …

Cited by 16

Spontaneous style text-to-speech synthesis with controllable spontaneous behaviors based on language models

W Li, P Yang, Y Zhong, Y Zhou, Z Wang, Z Wu… - arxiv preprint arxiv …, 2024 - arxiv.org

Spontaneous style speech synthesis, which aims to generate human-like speech, often
encounters challenges due to the scarcity of high-quality data and limitations in model …

Cited by 4

MAV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Z Chen, H Liu, W Yu, G Sun, H Liu, J Wu… - arxiv preprint arxiv …, 2024 - arxiv.org

Publishing open-source academic video recordings is an emergent and prevalent approach
to sharing knowledge online. Such videos carry rich multimodal information including …

Cited by 3

Spontts: modeling and transferring spontaneous style for tts

H Li, X Zhu, L Xue, Y Song, Y Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Spontaneous speaking style exhibits notable differences from other speaking styles due to
various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody …

Cited by 6

Adaspeech 3: Adaptive text to speech for spontaneous style

A survey on neural speech synthesis
