A survey of deep learning-based multimodal emotion recognition: Speech, text, and face
Multimodal emotion recognition (MER) refers to the identification and understanding of
human emotional states by combining different signals, including—but not limited to—text …
Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Multimodal learning with transformers: A survey
The Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Decoupled multimodal distilling for emotion recognition
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …
Disentangled representation learning for multimodal emotion recognition
Multimodal emotion recognition aims to identify human emotions from text, audio, and visual
modalities. Previous methods either explore correlations between different modalities or …
Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis
With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA)
has attracted increasing attention recently. Despite significant progress, there are still two …
Incomplete multimodality-diffused emotion recognition
Human multimodal emotion recognition (MER) aims to perceive and understand human
emotions via various heterogeneous modalities, such as language, vision, and acoustic …
IIFDD: Intra and inter-modal fusion for depression detection with multi-modal information from Internet of Medical Things
Depression is now a prevalent mental illness and multimodal data-based depression
detection is an essential topic of research. Internet of Medical Things devices can provide …
Learning language-guided adaptive hyper-modality representation for multimodal sentiment analysis
Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information
from multiple sources (e.g., language, video, and audio), the potential sentiment-irrelevant …
Learning modality-specific and -agnostic representations for asynchronous multimodal language sequences
Understanding human behaviors and intents from videos is a challenging task. Video flows
usually involve time-series data from different modalities, such as natural language, facial …