Deep neural network-guided unit selection synthesis

T Merritt, RAJ Clark, Z Wu… - … on Acoustics, Speech …, 2016 - ieeexplore.ieee.org
Vocoding of speech is a standard part of statistical parametric speech synthesis systems. It
imposes an upper bound on the naturalness that can possibly be achieved. Hybrid systems …

A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis

M Airaksinen, L Juvela, B Bollepalli… - … on Audio, Speech …, 2018 - ieeexplore.ieee.org
A vocoder is used to express a speech waveform with a controllable parametric
representation that can be converted back into a speech waveform. Vocoders representing …

GlottDNN-A full-band glottal vocoder for statistical parametric speech synthesis

M Airaksinen, B Bollepalli, L Juvela, Z Wu… - Interspeech …, 2016 - research.ed.ac.uk
GlottHMM is a previously developed vocoder that has been successfully used in HMM-
based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according …

Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system

B Bollepalli, L Juvela, P Alku - Interspeech, 2019 - research.aalto.fi
Currently, there is increasing interest in using sequence-to-sequence models with attention, like
those in Tacotron, for text-to-speech (TTS) synthesis. These models are end-to-end …

Normal-to-Lombard adaptation of speech synthesis using long short-term memory recurrent neural networks

B Bollepalli, L Juvela, M Airaksinen… - Speech …, 2019 - Elsevier
In this article, three adaptation methods are compared based on how well they change the
speaking style of a neural network based text-to-speech (TTS) voice. The speaking style …

A Sound Engineering Approach to Near End Listening Enhancement

C Chermaz, S King - INTERSPEECH, 2020 - isca-archive.org
We present the beta version of ASE (the Automatic Sound Engineer), a NELE (Near End
Listening Enhancement) algorithm based on audio engineering knowledge. Generations of …

Augmented CycleGANs for continuous scale normal-to-Lombard speaking style conversion

S Seshadri, L Juvela, P Alku, O Räsänen - Interspeech, 2019 - research.aalto.fi
Lombard speech is a speaking style associated with increased vocal effort that is naturally
used by humans to improve intelligibility in the presence of noise. It is hence desirable to …

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks

C Valentini-Botinhao, MS Ribeiro, O Watts… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatically predicting the outcome of subjective listening tests is a challenging task.
Ratings may vary from person to person even if preferences are consistent across listeners …

Adaptive gain control for enhanced speech intelligibility under reverberation

PN Petkov, Y Stylianou - IEEE Signal Processing Letters, 2016 - ieeexplore.ieee.org
Overlap-masking reduces speech intelligibility in reverberant environments. In contrast to
additive noise, the masking signal depends on the past of the speech signal. An increase in …

Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis

T Merritt, T Raitio, S King - Interspeech, 2014 - isca-archive.org
This paper presents an investigation of the separate perceptual degradations introduced by
the modelling of source and filter features in statistical parametric speech synthesis. This is …