Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Disentangled representation learning
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying
and disentangling the underlying factors hidden in the observable data in representation …
Decoupled multimodal distilling for emotion recognition
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …
Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis
Representation Learning is a significant and challenging task in multimodal
learning. Effective modality representations should contain two parts of characteristics: the …
Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis
In multimodal sentiment analysis (MSA), the performance of a model highly depends on the
quality of synthesized embeddings. These embeddings are generated from the upstream …
Disentangled representation learning for multimodal emotion recognition
Multimodal emotion recognition aims to identify human emotions from text, audio, and visual
modalities. Previous methods either explore correlations between different modalities or …
Are multimodal transformers robust to missing modality?
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …
MISA: Modality-invariant and -specific representations for multimodal sentiment analysis
Multimodal Sentiment Analysis is an active area of research that leverages multimodal
signals for affective understanding of user-generated videos. The predominant approach …
Multimodal transformer for unaligned multimodal language sequences
Human language is often multimodal, which comprehends a mixture of natural language,
facial gestures, and acoustic behaviors. However, two major challenges in modeling such …