Speech synthesis with mixed emotions
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …
Learning efficient representations for keyword spotting with triplet loss
R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …
A survey on automatic multimodal emotion recognition in the wild
Affective computing has been an active area of research for the past two decades. One of
the major components of affective computing is automatic emotion recognition. This chapter …
Self-supervised endoscopic image key-points matching
M Farhat, H Chaabouni-Chouayakh… - Expert Systems with …, 2023 - Elsevier
Feature matching and finding correspondences between endoscopic images is a key step in
many clinical applications such as patient follow-up and generation of panoramic image …
Multi-cultural speech emotion recognition using language and speaker cues
Speech Emotion Recognition (SER) has been an active area of research to make
Human–Computer Interaction (HCI) smoother and more natural. However, due to the …
Quantifying emotional similarity in speech
This study proposes the novel formulation of measuring emotional similarity between
speech recordings. This formulation explores the ordinal nature of emotions by comparing …
Domain generalization with triplet network for cross-corpus speech emotion recognition
S Lee - 2021 IEEE Spoken Language Technology Workshop …, 2021 - ieeexplore.ieee.org
Domain generalization is a major challenge for cross-corpus speech emotion recognition.
The recognition performance built on "seen" source corpora is inevitably degraded when the …
MSP-face corpus: a natural audiovisual emotional database
Expressive behaviors conveyed during daily interactions are difficult to determine, because
they often consist of a blend of different emotions. The complexity in expressive human …
Learning low-dimensional embeddings of audio shingles for cross-version retrieval of classical music
Cross-version music retrieval aims at identifying all versions of a given piece of music using
a short query audio fragment. One previous approach, which is particularly suited for …
Use of triplet-loss function to improve driving anomaly detection using conditional generative adversarial network
Driving anomaly detection is an important problem in advanced driver assistance systems
(ADAS). The ability to immediately detect potentially hazardous scenarios will prevent …