Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2024 - Elsevier
Emotion recognition has recently attracted extensive interest due to its significant
applications to human–computer interaction. The expression of human emotion depends on …

Progress, achievements, and challenges in multimodal sentiment analysis using deep learning: A survey

A Pandey, DK Vishwakarma - Applied Soft Computing, 2024 - Elsevier
Sentiment analysis is a computational technique that analyses the subjective information
conveyed within a given expression. This encompasses appraisals, opinions, attitudes or …

MM-DFN: Multimodal dynamic fusion network for emotion recognition in conversations

D Hu, X Hou, L Wei, L Jiang… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Emotion Recognition in Conversations (ERC) has considerable prospects for developing
empathetic machines. For multimodal ERC, it is vital to understand context and fuse modality …

Deep imbalanced learning for multimodal emotion recognition in conversations

T Meng, Y Shou, W Ai, N Yin, K Li - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The main task of multimodal emotion recognition in conversations (MERC) is to identify the
emotions in modalities, e.g., text, audio, image, and video, which is a significant development …

Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning

B Mocanu, R Tapu, T Zaharia - Image and Vision Computing, 2023 - Elsevier
In the last few years, the multi-modal emotion recognition has become an important research
issue in the affective computing community due to its wide range of applications that include …

Dense graph convolutional with joint cross-attention network for multimodal emotion recognition

C Cheng, W Liu, L Feng, Z Jia - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multimodal emotion recognition (MER) has attracted much attention since it can leverage
consistency and complementary relationships across multiple modalities. However, previous …

A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition

Z Fu, F Liu, H Wang, J Qi, X Fu, A Zhou, Z Li - arXiv preprint arXiv …, 2021 - arxiv.org
The audio-video based multimodal emotion recognition has attracted a lot of attention due to
its robust performance. Most of the existing methods focus on proposing different cross …

Temporal sentiment localization: Listen and look in untrimmed videos

Z Zhang, J Yang - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Video sentiment analysis aims to uncover the underlying attitudes of viewers, which has a
wide range of applications in real world. Existing works simply classify a video into a single …

A multi-stage hierarchical relational graph neural network for multimodal sentiment analysis

P Gong, J Liu, X Zhang, X Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Multimodal sentiment analysis targets at accurately perceiving the emotional states by
incorporating related information from multiple sources. However, existing methods mostly …

HiT-MST: Dynamic facial expression recognition with hierarchical transformers and multi-scale spatiotemporal aggregation

X Xia, D Jiang - Information Sciences, 2023 - Elsevier
Facial expression recognition rarely explores complex spatiotemporal dependencies among
facial regions at different scales. This paper proposes a transformer-based three-layer …