Google Scholar

X Wang, S Takaki, J Yamagishi - IEEE/ACM Transactions on …, 2019 - ieeexplore.ieee.org

Neural waveform models have demonstrated better performance than conventional
vocoders for statistical parametric speech synthesis. One of the best models, called …

Cited by 168, all 8 versions

[PDF] arxiv.org

Neural source-filter-based waveform model for statistical parametric speech synthesis

X Wang, S Takaki, J Yamagishi - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

Neural waveform models such as the WaveNet are used in many recent text-to-speech
systems, but the original WaveNet is quite slow in waveform generation because of its …

Cited by 178, all 8 versions

[PDF] arxiv.org

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

Y Yasuda, X Wang, S Takaki… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

End-to-end speech synthesis is a promising approach that directly converts raw text to
speech. Although it was shown that Tacotron2 outperforms classical pipeline systems with …

Cited by 112, all 8 versions

[PDF] arxiv.org

I'm sorry for your loss: Spectrally-based audio distances are bad at pitch

J Turian, M Henry - arXiv preprint arXiv:2012.04572, 2020 - arxiv.org

Growing research demonstrates that synthetic failure modes imply poor generalization. We
compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the …

Cited by 37, all 3 versions

[PDF] sciencedirect.com

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

Y Yasuda, X Wang, J Yamagishi - Computer Speech & Language, 2021 - Elsevier

Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality
speech directly from text or simple linguistic features such as phonemes. Unlike traditional …

Cited by 37, all 6 versions

[PDF] ieice.org

Prosodic features control by symbols as input of sequence-to-sequence acoustic modeling for neural TTS

K Kurihara, N Seiyama, T Kumano - IEICE Transactions on …, 2021 - search.ieice.org

This paper describes a method to control prosodic features using phonetic and prosodic
symbols as input of attention-based sequence-to-sequence (seq2seq) acoustic modeling …

Cited by 33, all 6 versions

[PDF] arxiv.org

Neural harmonic-plus-noise waveform model with trainable maximum voice frequency for text-to-speech synthesis

X Wang, J Yamagishi - arXiv preprint arXiv:1908.10256, 2019 - arxiv.org

Neural source-filter (NSF) models are deep neural networks that produce waveforms given
input acoustic features. They use dilated-convolution-based neural filter modules to filter …

Cited by 41, all 8 versions

[PDF] arxiv.org

Training multi-speaker neural text-to-speech systems using speaker-imbalanced speech corpora

HT Luong, X Wang, J Yamagishi… - arXiv preprint arXiv …, 2019 - arxiv.org

When the available data of a target speaker is insufficient to train a high quality speaker-
dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers …

Cited by 31, all 10 versions

[PDF] arxiv.org

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

Y Yasuda, X Wang, J Yamagishi - arXiv preprint arXiv:1908.11535, 2019 - arxiv.org

End-to-end text-to-speech (TTS) synthesis is a method that directly converts input text to
output acoustic features using a single network. A recent advance of end-to-end TTS is due …

Cited by 25, all 9 versions

[PDF] ieee.org

Modeling of Rakugo speech and its limitations: Toward speech synthesis that entertains audiences

S Kato, Y Yasuda, X Wang, E Cooper, S Takaki… - IEEE …, 2020 - ieeexplore.ieee.org

We have been investigating rakugo speech synthesis as a challenging example of speech
synthesis that entertains audiences. Rakugo is a traditional Japanese form of verbal …

Cited by 13, all 7 versions

Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis...

Neural source-filter waveform models for statistical parametric speech synthesis
