A survey of deep learning-based multimodal emotion recognition: Speech, text, and face

H Lian, C Lu, S Li, Y Zhao, C Tang, Y Zong - Entropy, 2023 - mdpi.com
Multimodal emotion recognition (MER) refers to the identification and understanding of
human emotional states by combining different signals, including—but not limited to—text …

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Decoupled multimodal distilling for emotion recognition

Y Li, Y Wang, Z Cui - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …

Disentangled representation learning for multimodal emotion recognition

D Yang, S Huang, H Kuang, Y Du… - Proceedings of the 30th …, 2022 - dl.acm.org
Multimodal emotion recognition aims to identify human emotions from text, audio, and visual
modalities. Previous methods either explore correlations between different modalities or …

Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis

L Sun, Z Lian, B Liu, J Tao - IEEE Transactions on Affective …, 2023 - ieeexplore.ieee.org
With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA)
has attracted increasing attention recently. Despite significant progress, there are still two …

Incomplete multimodality-diffused emotion recognition

Y Wang, Y Li, Z Cui - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Human multimodal emotion recognition (MER) aims to perceive and understand human
emotions via various heterogeneous modalities, such as language, vision, and acoustic …

IIFDD: Intra and inter-modal fusion for depression detection with multi-modal information from Internet of Medical Things

J Chen, Y Hu, Q Lai, W Wang, J Chen, H Liu… - Information …, 2024 - Elsevier
Depression is now a prevalent mental illness and multimodal data-based depression
detection is an essential topic of research. Internet of Medical Things devices can provide …

Learning language-guided adaptive hyper-modality representation for multimodal sentiment analysis

H Zhang, Y Wang, G Yin, K Liu, Y Liu, T Yu - arXiv preprint arXiv …, 2023 - arxiv.org
Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information
from multiple sources (e.g., language, video, and audio), the potential sentiment-irrelevant …

Learning modality-specific and -agnostic representations for asynchronous multimodal language sequences

D Yang, H Kuang, S Huang, L Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Understanding human behaviors and intents from videos is a challenging task. Video flows
usually involve time-series data from different modalities, such as natural language, facial …