Generative adversarial networks for speech processing: A review
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …
Glotnet—a raw waveform model for the glottal excitation in statistical parametric speech synthesis
L Juvela, B Bollepalli, V Tsiaras… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
Recently, generative neural network models which operate directly on raw audio, such as
WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover …
OPENGLOT–An open environment for the evaluation of glottal inverse filtering
Glottal inverse filtering (GIF) refers to technology to estimate the source of voiced speech,
the glottal flow, from speech signals. When a new GIF algorithm is proposed, its accuracy …
A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis
A vocoder is used to express a speech waveform with a controllable parametric
representation that can be converted back into a speech waveform. Vocoders representing …
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
Recent studies have shown that text-to-speech synthesis quality can be improved by using
glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal …
Speaker-independent raw waveform model for glottal excitation
Recent speech technology research has seen a growing interest in using WaveNets as
statistical vocoders, i.e., generating speech waveforms from acoustic features. These models …
Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU
K Matsubara, T Okamoto, R Takashima… - IEEE …, 2021 - ieeexplore.ieee.org
This paper investigates a real-time neural speech synthesis system on CPUs that can
synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range …
An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features
Although a WaveNet vocoder can synthesize more natural-sounding speech waveforms
than conventional vocoders with sampling frequencies of 16 and 24 kHz, it is difficult to …
Vocal effort based speaking style conversion using vocoder features and parallel learning
Speaking style conversion (SSC) is the technology of converting natural speech signals from
one style to another. In this study, we aim to provide a general SSC system for converting …
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
LPCNet neural vocoder and its variants have shown the ability to synthesize high-quality
speech in small footprint by exploiting domain knowledge in speech. In this paper, we …