WORLD: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

D4C, a band-aperiodicity estimator for high-quality speech synthesis

M Morise - Speech Communication, 2016 - Elsevier
An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …

Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals

M Morise - INTERSPEECH, 2017 - isca-archive.org
A fundamental frequency (F0) estimator named Harvest is described. The unique points of
Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced …

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

S Takaki, J Yamagishi - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org
In the state-of-the-art statistical parametric speech synthesis system, a speech analysis
module, e.g. STRAIGHT spectral analysis, is generally used for obtaining accurate and stable …

Sound quality comparison among high-quality vocoders by using re-synthesized speech

M Morise, Y Watanabe - Acoustical Science and Technology, 2018 - jstage.jst.go.jp
Since we have released WORLD on GitHub and have been continuously updating
WORLD to improve the sound quality of the synthesized speech, there is no information on …

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

N Makishima, S Suzuki, A Ando… - ar…

… and finely tuned WaveNet vocoder

PL Tobing, YC Wu, T Hayashi, K Kobayashi… - IEEE Access, 2019 - ieeexplore.ieee.org
In this paper, we present a novel framework for a voice conversion (VC) system based on a
cyclic recurrent neural network (CycleRNN) and a finely tuned WaveNet vocoder. Even …

QHM-GAN: Neural Vocoder based on Quasi-Harmonic Modeling

S Chen, T Toda - Proc. Interspeech 2024, 2024 - isca-archive.org
Neural vocoder has been studied for years, aiming at modeling speech signals and
enabling speech signal reconstruction from acoustic features. Unfortunately, the existing end …

Efficient shallow WaveNet vocoder using multiple samples output based on Laplacian distribution and linear prediction

PL Tobing, YC Wu, T Hayashi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper presents a novel way for an efficient implementation scheme of shallow WaveNet
vocoder with multiple samples (segment) output based on the use of Laplacian distribution …