Neural source-filter waveform models for statistical parametric speech synthesis
Neural waveform models have demonstrated better performance than conventional
vocoders for statistical parametric speech synthesis. One of the best models, called …
Neural source-filter-based waveform model for statistical parametric speech synthesis
Neural waveform models such as the WaveNet are used in many recent text-to-speech
systems, but the original WaveNet is quite slow in waveform generation because of its …
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language
End-to-end speech synthesis is a promising approach that directly converts raw text to
speech. Although it was shown that Tacotron2 outperforms classical pipeline systems with …
I'm sorry for your loss: Spectrally-based audio distances are bad at pitch
Growing research demonstrates that synthetic failure modes imply poor generalization. We
compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the …
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality
speech directly from text or simple linguistic features such as phonemes. Unlike traditional …
Prosodic features control by symbols as input of sequence-to-sequence acoustic modeling for neural TTS
K Kurihara, N Seiyama, T Kumano - IEICE Transactions on …, 2021 - search.ieice.org
This paper describes a method to control prosodic features using phonetic and prosodic
symbols as input of attention-based sequence-to-sequence (seq2seq) acoustic modeling …
Neural harmonic-plus-noise waveform model with trainable maximum voice frequency for text-to-speech synthesis
Neural source-filter (NSF) models are deep neural networks that produce waveforms given
input acoustic features. They use dilated-convolution-based neural filter modules to filter …
Training multi-speaker neural text-to-speech systems using speaker-imbalanced speech corpora
When the available data of a target speaker is insufficient to train a high quality speaker-
dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers …
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments
End-to-end text-to-speech (TTS) synthesis is a method that directly converts input text to
output acoustic features using a single network. A recent advance of end-to-end TTS is due …
Modeling of Rakugo speech and its limitations: Toward speech synthesis that entertains audiences
We have been investigating rakugo speech synthesis as a challenging example of speech
synthesis that entertains audiences. Rakugo is a traditional Japanese form of verbal …