A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text-to-speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for a Tacotron-based text-to-speech (TTS) system that
improves speech styling at the utterance level. One of the key challenges in prosody …

Teacher-student training for robust Tacotron-based TTS

R Liu, B Sisman, J Li, F Bao, G Gao… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods
in many ways, the exposure bias problem in autoregressive models remains an issue to …

The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS

WC Huang, T Hayashi, S Watanabe, T Toda - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the sequence-to-sequence (seq2seq) baseline system for the Voice
Conversion Challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …

Modeling prosodic phrasing with multi-task learning in Tacotron-based TTS

R Liu, B Sisman, F Bao, G Gao… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
Tacotron-based end-to-end speech synthesis has shown remarkable voice quality.
However, the rendering of prosody in the synthesized speech remains to be improved …

Efficient neural speech synthesis for low-resource languages through multilingual modeling

M de Korte, J Kim, E Klabbers - arXiv preprint arXiv:2008.09659, 2020 - arxiv.org
Recent advances in neural TTS have led to models that can produce high-quality synthetic
speech. However, these models typically require large amounts of training data, which can …

Deepfake defense: Constructing and evaluating a specialized Urdu deepfake audio dataset

S Munir, W Sajjad, M Raza, E Abbas… - Findings of the …, 2024 - aclanthology.org
Deepfakes, particularly in the auditory domain, have become a significant threat,
necessitating the development of robust countermeasures. This paper addresses the …

ArmSpeech: Armenian spoken language corpus

VH Baghdasaryan - International Journal of Scientific Advances (IJSCIA), 2022 - ijscia.com
The Armenian language is an independent branch of the Indo-European language family
and the official language of the Republic of Armenia and the Republic of Artsakh. According …

Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis

K Fujita, A Ando, Y Ijima - Interspeech, 2021 - isca-archive.org
This paper proposes a novel speech-rhythm-based method for speaker embeddings.
Conventionally, spectral-feature-based speaker embedding vectors such as the x-vector are …

Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation

K Mitsui, T Koriyama, H Saruwatari - Speech Communication, 2021 - Elsevier
This paper proposes deep Gaussian process (DGP)-based frameworks for multi-speaker
speech synthesis and speaker representation learning. A DGP has a deep architecture of …