Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis
Representation Learning is a significant and challenging task in multimodal
learning. Effective modality representations should contain two parts of characteristics: the …
Self-supervised multimodal versatile networks
Videos are a rich source of multi-modal supervision. In this work, we learn representations
using self-supervision by leveraging three modalities naturally present in videos: visual …
Multimodal sentiment analysis based on fusion methods: A survey
Sentiment analysis is an emerging technology that aims to explore people's attitudes toward
an entity. It can be applied in a variety of different fields and scenarios, such as product …
MISA: Modality-invariant and -specific representations for multimodal sentiment analysis
Multimodal Sentiment Analysis is an active area of research that leverages multimodal
signals for affective understanding of user-generated videos. The predominant approach …
Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis
In multimodal sentiment analysis (MSA), the performance of a model highly depends on the
quality of synthesized embeddings. These embeddings are generated from the upstream …
Sign language transformers: Joint end-to-end sign language recognition and translation
Prior work on Sign Language Translation has shown that having a mid-level sign
gloss representation (effectively recognizing the individual signs) improves the translation …
CDTrans: Cross-domain transformer for unsupervised domain adaptation
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled
source domain to a different unlabeled target domain. Most existing UDA methods focus on …