Multimodal co-learning: Challenges, applications with datasets, recent advances and future directions
Multimodal deep learning systems that employ multiple modalities like text, image, audio,
video, etc., are showing better performance than individual modalities (i.e., unimodal) …
Using transformers for multimodal emotion recognition: Taxonomies and state of the art review
Emotion recognition is an aspect of human-computer interaction, affective computing, and
social robotics. Conventional unimodal approaches for emotion recognition, depending on …
M2fnet: Multi-modal fusion network for emotion recognition in conversation
Emotion Recognition in Conversations (ERC) is crucial in developing sympathetic
human-machine interaction. In conversational videos, emotion can be present in multiple …
Deep learning based multimodal emotion recognition using model-level fusion of audio–visual modalities
Emotion identification based on multimodal data (e.g., audio, video, text, etc.) is one of the
most demanding and important research fields, with various uses. In this context, this …
Marlin: Masked autoencoder for facial video representation learning
This paper proposes a self-supervised approach to learn universal facial representations
from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute …
CubeMLP: An MLP-based model for multimodal sentiment analysis and depression estimation
Multimodal sentiment analysis and depression estimation are two important research topics
that aim to predict human mental states using multimodal data. Previous research has …
COGMEN: COntextualized GNN based multimodal emotion recognitioN
Emotions are an inherent part of human interactions, and consequently, it is imperative to
develop AI systems that understand and recognize human emotions. During a conversation …
AOBERT: All-modalities-in-One BERT for multimodal sentiment analysis
Multimodal sentiment analysis utilizes various modalities such as Text, Vision and Speech to
predict sentiment. As these modalities have unique characteristics, methods have been …
Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Deep emotional arousal network for multimodal sentiment analysis and emotion recognition
Multimodal sentiment analysis and emotion recognition have become an increasingly popular
research area, where the biggest challenge is to efficiently fuse the input information from …