Technical challenges for smooth interaction with seniors with dementia: Lessons from humanitude™

H Sumioka, M Shiomi, M Honda… - Frontiers in Robotics and …, 2021 - frontiersin.org
Due to cognitive and socio-emotional decline and mental diseases, senior citizens,
especially people with dementia (PwD), struggle to interact smoothly with their caregivers …

Controllable emotion transfer for end-to-end speech synthesis

T Li, S Yang, L Xue, L Xie - 2021 12th International Symposium …, 2021 - ieeexplore.ieee.org
Emotion embedding space learned from references is a straight-forward approach for
emotion transfer in encoder-decoder structured emotional text to speech (TTS) systems …

StarGAN for emotional speech conversion: Validated by data augmentation of end-to-end emotion recognition

G Rizos, A Baird, M Elliott… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
In this paper, we propose an adversarial network implementation for speech emotion
conversion as a data augmentation method, validated by a multi-class speech affect …

Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis

J Lorenzo-Trueba, GE Henter, S Takaki… - Speech …, 2018 - Elsevier
In this paper, we investigate the simultaneous modeling of multiple emotions in DNN-based
expressive speech synthesis, and how to represent the emotional labels, such as emotional …

Multi-type features separating fusion learning for Speech Emotion Recognition

X Xu, D Li, Y Zhou, Z Wang - Applied Soft Computing, 2022 - Elsevier
Speech Emotion Recognition (SER) is a challengeable task to improve human–
computer interaction. Speech data have different representations, and choosing the …

Speech melody matters—how robots profit from using charismatic speech

K Fischer, O Niebuhr, LC Jensen… - ACM Transactions on …, 2019 - dl.acm.org
In this article, we address to what extent the proverb “the sound makes the music” also
applies to human-robot interaction, and whether robots could profit from using speech …

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey

T Xie, Y Rong, P Zhang, L Liu - arXiv preprint arXiv:2412.06602, 2024 - arxiv.org
Text-to-speech (TTS), also known as speech synthesis, is a prominent research area that
aims to generate natural-sounding human speech from text. Recently, with the increasing …

Deep encoder-decoder models for unsupervised learning of controllable speech synthesis

GE Henter, J Lorenzo-Trueba, X Wang… - arXiv preprint arXiv …, 2018 - arxiv.org
Generating versatile and appropriate synthetic speech requires control over the output
expression separate from the spoken text. Important non-textual speech variation is seldom …

End-to-end triplet loss based emotion embedding system for speech emotion recognition

P Kumar, S Jain, B Raman, PP Roy… - … Conference on Pattern …, 2021 - ieeexplore.ieee.org
In this paper, an end-to-end neural embedding system based on triplet loss and residual
learning has been proposed for speech emotion recognition. The proposed system learns …

A survey on speech synthesis techniques in Indian languages

SP Panda, AK Nayak, SC Rai - Multimedia Systems, 2020 - Springer
The text to speech technology has achieved significant progress during the past decade and
is an active area of research and development in providing different human–computer …