Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

GlotNet—a raw waveform model for the glottal excitation in statistical parametric speech synthesis

L Juvela, B Bollepalli, V Tsiaras… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
Recently, generative neural network models which operate directly on raw audio, such as
WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover …

OPENGLOT – An open environment for the evaluation of glottal inverse filtering

P Alku, T Murtola, J Malinen, J Kuortti, B Story… - Speech …, 2019 - Elsevier
Glottal inverse filtering (GIF) refers to technology to estimate the source of voiced speech,
the glottal flow, from speech signals. When a new GIF algorithm is proposed, its accuracy …

A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis

M Airaksinen, L Juvela, B Bollepalli… - … on Audio, Speech …, 2018 - ieeexplore.ieee.org
A vocoder is used to express a speech waveform with a controllable parametric
representation that can be converted back into a speech waveform. Vocoders representing …

Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis

B Bollepalli, L Juvela, P Alku - arXiv preprint arXiv:1903.05955, 2019 - arxiv.org
Recent studies have shown that text-to-speech synthesis quality can be improved by using
glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal …

Speaker-independent raw waveform model for glottal excitation

L Juvela, V Tsiaras, B Bollepalli, M Airaksinen… - arXiv preprint arXiv …, 2018 - arxiv.org
Recent speech technology research has seen a growing interest in using WaveNets as
statistical vocoders, i.e., generating speech waveforms from acoustic features. These models …

Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU

K Matsubara, T Okamoto, R Takashima… - IEEE …, 2021 - ieeexplore.ieee.org
This paper investigates a real-time neural speech synthesis system on CPUs that can
synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range …

An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features

T Okamoto, K Tachibana, T Toda… - … on Acoustics, Speech …, 2018 - ieeexplore.ieee.org
Although a WaveNet vocoder can synthesize more natural-sounding speech waveforms
than conventional vocoders with sampling frequencies of 16 and 24 kHz, it is difficult to …

Vocal effort based speaking style conversion using vocoder features and parallel learning

S Seshadri, L Juvela, O Räsänen, P Alku - IEEE Access, 2019 - ieeexplore.ieee.org
Speaking style conversion (SSC) is the technology of converting natural speech signals from
one style to another. In this study, we aim to provide a general SSC system for converting …

An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis

Y Cui, X Wang, L He, FK Soong - INTERSPEECH, 2020 - isca-archive.org
LPCNet neural vocoder and its variants have shown the ability to synthesize high-quality
speech in small footprint by exploiting domain knowledge in speech. In this paper, we …