Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2024 - Elsevier
Emotion recognition has recently attracted extensive interest due to its significant
applications to human–computer interaction. The expression of human emotion depends on …

Automated assessment of psychiatric disorders using speech: A systematic review

DM Low, KH Bentley, SS Ghosh - Laryngoscope Investigative …, 2020 - Wiley Online Library
Objective: There are many barriers to accessing mental health assessments including cost
and stigma. Even when individuals receive professional care, assessments are intermittent …

Will affective computing emerge from foundation models and general artificial intelligence? A first evaluation of ChatGPT

MM Amin, E Cambria, BW Schuller - IEEE Intelligent Systems, 2023 - ieeexplore.ieee.org
ChatGPT has shown the potential of emerging general artificial intelligence capabilities, as it
has demonstrated competent performance across many natural language processing tasks …

AST: Audio spectrogram transformer

Y Gong, YA Chung, J Glass - arXiv preprint arXiv:2104.01778, 2021 - arxiv.org
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the
main building block for end-to-end audio classification models, which aim to learn a direct …

Deep learning for human affect recognition: Insights and new developments

PV Rouast, MTP Adam, R Chiong - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Automatic human affect recognition is a key step towards more natural human-computer
interaction. Recent trends include recognition in the wild using a fusion of audiovisual and …

Speech emotion recognition from 3D log-mel spectrograms with deep learning network

H Meng, T Yan, F Yuan, H Wei - IEEE Access, 2019 - ieeexplore.ieee.org
Speech emotion recognition is a vital and challenging task that the feature extraction plays a
significant role in the SER performance. With the development of deep learning, we put our …

End-to-end multimodal emotion recognition using deep neural networks

P Tzirakis, G Trigeorgis, MA Nicolaou… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
Automatic affect recognition is a challenging task due to the various modalities emotions can
be expressed with. Applications can be found in many domains including multimedia …

The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing

F Eyben, KR Scherer, BW Schuller… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Work on voice sciences over recent decades has led to a proliferation of acoustic
parameters that are used quite selectively and are not always extracted in a similar fashion …

Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching

S Zhang, S Zhang, T Huang… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Speech emotion recognition is challenging because of the affective gap between the
subjective emotions and low-level features. Integrating multilevel feature learning and model …

Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network

G Trigeorgis, F Ringeval, R Brueckner… - … on acoustics, speech …, 2016 - ieeexplore.ieee.org
The automatic recognition of spontaneous emotions from speech is a challenging task. On
the one hand, acoustic features need to be robust enough to capture the emotional content …