Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org
As an indispensable part of modern human-computer interaction systems, speech synthesis
technology helps users get the output of intelligent machines more easily and intuitively, thus …

The emotional voices database: Towards controlling the emotion dimension in voice generation systems

A Adigwe, N Tits, KE Haddad, S Ostadabbas… - arXiv preprint arXiv …, 2018 - arxiv.org
In this paper, we present a database of emotional speech intended to be open-sourced and
used for synthesis and generation purposes. It contains data for male and female actors in …

Exploring transfer learning for low resource emotional TTS

N Tits, K El Haddad, T Dutoit - … and Applications: Proceedings of the 2019 …, 2020 - Springer
During the last few years, spoken language technologies have known a big improvement
thanks to Deep Learning. However, Deep Learning-based algorithms require amounts of …

Multi-label extreme learning machine (MLELMs) for Bangla regional speech recognition

PS Hossain, A Chakrabarty, K Kim, MJ Piran - Applied Sciences, 2022 - mdpi.com
Extensive research has been conducted in the past to determine age, gender, and words
spoken in Bangla speech, but no work has been conducted to identify the regional language …

The Blizzard Challenge 2023

O Perrotin, B Stephenson, S Gerber… - 18th Blizzard Challenge …, 2023 - hal.science
The Blizzard Challenge 2023 is the eighteenth edition of the text-to-speech synthesis
Blizzard Challenge. This year, two French datasets were provided to participants and two …

Learning and controlling the source-filter representation of speech with a variational autoencoder

S Sadok, S Leglaive, L Girin, X Alameda-Pineda… - Speech …, 2023 - Elsevier
Understanding and controlling latent representations in deep generative models is a
challenging yet important problem for analyzing, transforming and generating various types …

A methodology for controlling the emotional expressiveness in synthetic speech - a deep learning approach

N Tits - 2019 8th International Conference on Affective …, 2019 - ieeexplore.ieee.org
In this project, we aim to build a Text-to-Speech system able to produce speech with a
controllable emotional expressiveness. We propose a methodology for solving this problem …

Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control

M Lenglet, O Perrotin, G Bailly - 12th ISCA Speech Synthesis Workshop …, 2023 - hal.science
Neural Text-To-Speech (TTS) models achieve great performances regarding naturalness,
but modeling expressivity remains an ongoing challenge. Some success was found through …

FastLips: an End-to-End Audiovisual Text-to-Speech System with Lip Features Prediction for Virtual Avatars

M Lenglet, O Perrotin, G Bailly - Interspeech 2024, 2024 - hal.science
In this paper, we introduce FastLips, an end-to-end neural model designed to generate
speech and co-verbal facial movements from text, animating a virtual avatar. Based on the …

Impact of Segmentation and Annotation in French end-to-end Synthesis

M Lenglet, O Perrotin, G Bailly - SSW 11th ISCA Speech Synthesis …, 2021 - hal.science
Audio books are commonly used to train text-to-speech models (TTS), as they offer large
phonetic content with rather expressive pronunciation, but number and sizes of publicly …