Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Learning efficient representations for keyword spotting with triplet loss

R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …

A survey on automatic multimodal emotion recognition in the wild

G Sharma, A Dhall - Advances in data science: Methodologies and …, 2021 - Springer
Affective computing has been an active area of research for the past two decades. One of
the major components of affective computing is automatic emotion recognition. This chapter …

Self-supervised endoscopic image key-points matching

M Farhat, H Chaabouni-Chouayakh… - Expert Systems with …, 2023 - Elsevier
Feature matching and finding correspondences between endoscopic images is a key step in
many clinical applications such as patient follow-up and generation of panoramic image …

Multi-cultural speech emotion recognition using language and speaker cues

SK Pandey, HS Shekhawat, SRM Prasanna - … Signal Processing and …, 2023 - Elsevier
Speech Emotion Recognition (SER) has been an active area of research to make
Human–Computer Interaction (HCI) smoother and more natural. However, due to the …

Quantifying emotional similarity in speech

J Harvill, SG Leem, M AbdelWahab… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
This study proposes the novel formulation of measuring emotional similarity between
speech recordings. This formulation explores the ordinal nature of emotions by comparing …

Domain generalization with triplet network for cross-corpus speech emotion recognition

S Lee - 2021 IEEE Spoken Language Technology Workshop …, 2021 - ieeexplore.ieee.org
Domain generalization is a major challenge for cross-corpus speech emotion recognition.
The recognition performance built on "seen" source corpora is inevitably degraded when the …

MSP-face corpus: a natural audiovisual emotional database

A Vidal, A Salman, WC Lin, C Busso - Proceedings of the 2020 …, 2020 - dl.acm.org
Expressive behaviors conveyed during daily interactions are difficult to determine, because
they often consist of a blend of different emotions. The complexity in expressive human …

Learning low-dimensional embeddings of audio shingles for cross-version retrieval of classical music

F Zalkow, M Müller - Applied Sciences, 2019 - mdpi.com
Cross-version music retrieval aims at identifying all versions of a given piece of music using
a short query audio fragment. One previous approach, which is particularly suited for …

Use of triplet-loss function to improve driving anomaly detection using conditional generative adversarial network

Y Qiu, T Misu, C Busso - 2020 IEEE 23rd International …, 2020 - ieeexplore.ieee.org
Driving anomaly detection is an important problem in advanced driver assistance systems
(ADAS). The ability to immediately detect potentially hazardous scenarios will prevent …