Multimodal machine learning: A survey and taxonomy
Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …
Early vs late fusion in multimodal convolutional neural networks
Combining machine learning in neural networks with multimodal fusion strategies offers
interesting potential for classification tasks, but the optimum fusion strategies for many …
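Below is a minimal PyTorch sketch contrasting the two strategies named in the title, assuming two pre-extracted feature vectors; the 128/64 dimensions and layer sizes are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate modality features first, then classify them jointly."""
    def __init__(self, dim_a=128, dim_b=64, num_classes=10):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(dim_a + dim_b, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x_a, x_b):
        return self.classifier(torch.cat([x_a, x_b], dim=-1))

class LateFusion(nn.Module):
    """Score each modality separately, then combine the predictions."""
    def __init__(self, dim_a=128, dim_b=64, num_classes=10):
        super().__init__()
        self.head_a = nn.Linear(dim_a, num_classes)
        self.head_b = nn.Linear(dim_b, num_classes)

    def forward(self, x_a, x_b):
        # Simple average of per-modality logits; weighted sums or learned
        # gates are common alternatives.
        return 0.5 * (self.head_a(x_a) + self.head_b(x_b))

x_a, x_b = torch.randn(4, 128), torch.randn(4, 64)
print(EarlyFusion()(x_a, x_b).shape, LateFusion()(x_a, x_b).shape)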
Forecasting power demand in China with a CNN-LSTM model including multimodal information
D Wang, J Gan, J Mao, F Chen, L Yu - Energy, 2023 - Elsevier
Accurate forecasting of social power demand is the country's primary task in making
decisions on overall power planning, coal power withdrawal, and renewable energy …
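A generic CNN-LSTM forecasting sketch in PyTorch follows: a 1-D convolution extracts local patterns from a multivariate input window and an LSTM summarizes the sequence before a linear head produces a one-step-ahead forecast. The feature count, window length, and layer widths are illustrative assumptions, not the model reported in the paper.

import torch
import torch.nn as nn

class CNNLSTMForecaster(nn.Module):
    def __init__(self, num_features=6, conv_channels=32, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(num_features, conv_channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                             # x: (batch, time, num_features)
        z = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, channels, time)
        out, _ = self.lstm(z.transpose(1, 2))         # (batch, time, hidden)
        return self.head(out[:, -1])                  # one-step-ahead forecast

window = torch.randn(8, 48, 6)            # 8 samples, 48 time steps, 6 features
print(CNNLSTMForecaster()(window).shape)  # torch.Size([8, 1])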
Multimodal categorization of crisis events in social media
Recent developments in image classification and natural language processing, coupled with
the rapid growth in social media usage, have enabled fundamental advances in detecting …
CTFN: Hierarchical learning for multimodal sentiment analysis using coupled-translation fusion network
Multimodal sentiment analysis is a challenging research area that addresses the fusion of
multiple heterogeneous modalities. The main challenge is the occurrence of some missing …
Event-based media processing and analysis: A survey of the literature
Research on event-based processing and analysis of media is receiving increasing
attention from the scientific community due to its relevance to an abundance of applications …
DFMKE: A dual fusion multi-modal knowledge graph embedding framework for entity alignment
Entity alignment is critical for integrating multiple knowledge graphs (KGs). Although
researchers have made significant efforts to explore the relational embeddings between …
Audio-visual event localization via recursive fusion by joint co-attention
The major challenge in the audio-visual event localization task lies in how to fuse information
from multiple modalities effectively. Recent works have shown that the attention mechanism …
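A minimal co-attention fusion sketch in PyTorch, assuming both streams are already projected to a common dimension; it illustrates cross-modal attention in general, not the recursive joint co-attention proposed in the paper.

import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.audio_to_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, audio, visual):       # both: (batch, time, dim)
        # Audio features query the visual stream, and vice versa.
        a_att, _ = self.audio_to_visual(audio, visual, visual)
        v_att, _ = self.visual_to_audio(visual, audio, audio)
        return self.proj(torch.cat([a_att, v_att], dim=-1))

audio, visual = torch.randn(2, 10, 256), torch.randn(2, 10, 256)
print(CoAttentionFusion()(audio, visual).shape)  # torch.Size([2, 10, 256])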
Dynamic multimodal fusion via meta-learning towards micro-video recommendation
Multimodal information (e.g., visual, acoustic, and textual) has been widely used to enhance
representation learning for micro-video recommendation. For integrating multimodal …
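One simple way to make fusion item-dependent is a gating network that predicts per-modality weights from the concatenated features, sketched below; this is an illustrative assumption, not the meta-learning procedure of the paper.

import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, dim=128, num_modalities=3):
        super().__init__()
        self.gate = nn.Linear(num_modalities * dim, num_modalities)

    def forward(self, modalities):           # list of (batch, dim) tensors
        stacked = torch.stack(modalities, dim=1)                         # (batch, M, dim)
        weights = torch.softmax(self.gate(torch.cat(modalities, dim=-1)), dim=-1)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)              # (batch, dim)

feats = [torch.randn(4, 128) for _ in range(3)]   # visual, acoustic, textual
print(DynamicFusion()(feats).shape)               # torch.Size([4, 128])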
The imagenet shuffle: Reorganized pre-training for video event detection
This paper strives for video event detection using a representation learned from deep
convolutional neural networks. Unlike the leading approaches, which all learn from the …
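The general recipe of reusing an ImageNet-pretrained CNN as a frame encoder for video-level event classification can be sketched as follows, using a plain torchvision ResNet-18 with mean pooling over frames; the 20-class head and frame count are assumptions, and this is not the paper's reorganized pre-training.

import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()               # keep the 512-d pooled features
backbone.eval()

event_head = nn.Linear(512, 20)           # 20 event classes is an assumption

frames = torch.randn(16, 3, 224, 224)     # 16 sampled frames from one video
with torch.no_grad():
    frame_feats = backbone(frames)        # (16, 512) per-frame features
video_feat = frame_feats.mean(dim=0, keepdim=True)   # average pooling over frames
print(event_head(video_feat).shape)       # torch.Size([1, 20])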