Modelling individual and cross-cultural variation in the mapping of emotions to speech prosody

P van Rijn, P Larrouy-Maestri - Nature human behaviour, 2023 - nature.com
The existence of a mapping between emotions and speech prosody is commonly assumed.
We propose a Bayesian modelling framework to analyse this mapping. Our models are fitted …

Giving robots a voice: Human-in-the-loop voice creation and open-ended labeling

P van Rijn, S Mertes, K Janowski, K Weitz… - Proceedings of the CHI …, 2024 - dl.acm.org
Speech is a natural interface for humans to interact with robots. Yet, aligning a robot's voice
to its appearance is challenging due to the rich vocabulary of both modalities. Previous …

Words are all you need? Capturing human sensory similarity with textual descriptors

R Marjieh, P van Rijn, I Sucholutsky… - arXiv preprint arXiv …, 2022 - researchgate.net
Recent advances in multimodal training use textual descriptions to significantly enhance
machine understanding of images and videos. Yet, it remains unclear to what extent …

VoiceMe: Personalized voice generation in TTS

P van Rijn, S Mertes, D Schiller, P Dura… - arXiv preprint arXiv …, 2022 - arxiv.org
Novel text-to-speech systems can generate entirely new voices that were not seen during
training. However, it remains a difficult task to efficiently create personalized voices from a …

Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control

M Lenglet, O Perrotin, G Bailly - 12th ISCA Speech Synthesis Workshop …, 2023 - hal.science
Neural Text-To-Speech (TTS) models achieve great performances regarding naturalness,
but modeling expressivity remains an ongoing challenge. Some success was found through …

Speaking Rate Control of end-to-end TTS Models by Direct Manipulation of the Encoder's Output Embeddings

M Lenglet, O Perrotin, G Bailly - … 2022-23rd Annual Conference of the …, 2022 - hal.science
Since neural Text-To-Speech models have achieved such high standards in terms of
naturalness, the main focus of the field has gradually shifted to gaining more control over the …

Analysis by synthesis: Using an expressive TTS model as feature extractor for paralinguistic speech classification

D Schiller, S Mertes, P Rijn, E André - 2021 - opus.bibliothek.uni-augsburg.de
Modeling adequate features of speech prosody is one key factor to good performance in
affective speech classification. However, the distinction between the prosody that is induced …

Using Gibbs Sampling with People to characterize perceptual and aesthetic evaluations in multidimensional visual stimulus space

E Van Geert, N Jacoby - Proceedings of the Annual Meeting of the …, 2024 - escholarship.org
Aesthetic appreciation is inherently multidimensional: many different stimulus dimensions
(e.g., colors, shapes, sizes) contribute to our aesthetic experience. However, most studies in …

Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People

DM Huang, P Van Rijn, I Sucholutsky, R Marjieh… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational tones--the manners and attitudes in which speakers communicate--are
essential to effective communication. Amidst the increasing popularization of Large …

VoiceX: A Text-To-Speech Framework for Custom Voices

S Mertes, DW Don, O Grothe, J Kuch… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern TTS systems are capable of creating highly realistic and natural-sounding speech.
Despite these developments, the process of customizing TTS voices remains a complex …