Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers

MB Akçay, K Oğuz - Speech Communication, 2020 - Elsevier
Speech is the most natural way of expressing ourselves as humans. It is only natural then to
extend this communication medium to computer applications. We define speech emotion …

A comprehensive survey and analysis of generative models in machine learning

GM Harshvardhan, MK Gourisaria, M Pandey… - Computer Science …, 2020 - Elsevier
Generative models have been in existence for many decades. In the field of machine
learning, we come across many scenarios when directly learning a target is intractable …

Deep learning techniques for speech emotion recognition, from databases to models

BJ Abbaschian, D Sierra-Sosa, A Elmaghraby - Sensors, 2021 - mdpi.com
The advancements in neural networks and the on-demand need for accurate and near real-time
Speech Emotion Recognition (SER) in human–computer interactions make it …

A review on speech emotion recognition using deep learning and attention mechanism

E Lieskovská, M Jakubec, R Jarina, M Chmulík - Electronics, 2021 - mdpi.com
Emotions are an integral part of human interactions and are significant factors in determining
user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play …

Survey of deep representation learning for speech emotion recognition

S Latif, R Rana, S Khalifa, R Jurdak… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Traditionally, speech emotion recognition (SER) research has relied on manually
handcrafted acoustic features using feature engineering. However, the design of …

Att-Net: Enhanced emotion recognition system using lightweight self-attention module

S Kwon - Applied Soft Computing, 2021 - Elsevier
Speech emotion recognition (SER) is an active research field of digital signal processing
and plays a crucial role in numerous applications of Human–computer interaction (HCI) …

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Towards learning a universal non-semantic representation of speech

J Shor, A Jansen, R Maor, O Lang, O Tuval… - arXiv preprint arXiv …, 2020 - arxiv.org
The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a
pre-existing embedding model trained for different datasets or tasks. The visual and …

Jointly fine-tuning "BERT-like" self supervised models to improve multimodal speech emotion recognition

S Siriwardhana, A Reis, R Weerasekera… - arXiv preprint arXiv …, 2020 - arxiv.org
Multimodal emotion recognition from speech is an important area in affective computing.
Fusing multiple data modalities and learning representations with limited amounts of labeled …

Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …