Multimodal co-learning: Challenges, applications with datasets, recent advances and future directions

A Rahate, R Walambe, S Ramanna, K Kotecha - Information Fusion, 2022 - Elsevier
Multimodal deep learning systems that employ multiple modalities like text, image, audio,
video, etc., are showing better performance than individual modalities (i.e., unimodal) …

Using transformers for multimodal emotion recognition: Taxonomies and state of the art review

S Hazmoune, F Bougamouza - Engineering Applications of Artificial …, 2024 - Elsevier
Emotion recognition is an aspect of human-computer interaction, affective computing, and
social robotics. Conventional unimodal approaches for emotion recognition, depending on …

M2FNet: Multi-modal fusion network for emotion recognition in conversation

V Chudasama, P Kar, A Gudmalwar… - Proceedings of the …, 2022 - openaccess.thecvf.com
Abstract Emotion Recognition in Conversations (ERC) is crucial in developing sympathetic
human-machine interaction. In conversational videos, emotion can be present in multiple …

Deep learning based multimodal emotion recognition using model-level fusion of audio–visual modalities

AI Middya, B Nag, S Roy - Knowledge-Based Systems, 2022 - Elsevier
Emotion identification based on multimodal data (e.g., audio, video, text, etc.) is one of the
most demanding and important research fields, with various uses. In this context, this …

Marlin: Masked autoencoder for facial video representation learning

Z Cai, S Ghosh, K Stefanov, A Dhall… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper proposes a self-supervised approach to learn universal facial representations
from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute …

CubeMLP: An MLP-based model for multimodal sentiment analysis and depression estimation

H Sun, H Wang, J Liu, YW Chen, L Lin - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Multimodal sentiment analysis and depression estimation are two important research topics
that aim to predict human mental states using multimodal data. Previous research has …

COGMEN: COntextualized GNN based multimodal emotion recognitioN

A Joshi, A Bhat, A Jain, AV Singh, A Modi - arXiv preprint arXiv …, 2022 - arxiv.org
Emotions are an inherent part of human interactions, and consequently, it is imperative to
develop AI systems that understand and recognize human emotions. During a conversation …

AOBERT: All-modalities-in-One BERT for multimodal sentiment analysis

K Kim, S Park - Information Fusion, 2023 - Elsevier
Multimodal sentiment analysis utilizes various modalities such as Text, Vision and Speech to
predict sentiment. As these modalities have unique characteristics, methods have been …

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Deep emotional arousal network for multimodal sentiment analysis and emotion recognition

F Zhang, XC Li, CP Lim, Q Hua, CR Dong, JH Zhai - Information Fusion, 2022 - Elsevier
Multimodal sentiment analysis and emotion recognition have become an increasingly popular
research area, where the biggest challenge is to efficiently fuse the input information from …