Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

Early vs late fusion in multimodal convolutional neural networks

K Gadzicki, R Khamsehashari… - 2020 IEEE 23rd …, 2020 - ieeexplore.ieee.org
Combining machine learning in neural networks with multimodal fusion strategies offers an
interesting potential for classification tasks but the optimum fusion strategies for many …

Forecasting power demand in China with a CNN-LSTM model including multimodal information

D Wang, J Gan, J Mao, F Chen, L Yu - Energy, 2023 - Elsevier
Accurate forecasting of social power demand is the country's primary task in making
decisions on power overall planning, coal power withdrawal, and renewable energy …

Multimodal categorization of crisis events in social media

M Abavisani, L Wu, S Hu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Recent developments in image classification and natural language processing, coupled with
the rapid growth in social media usage, have enabled fundamental advances in detecting …

CTFN: Hierarchical learning for multimodal sentiment analysis using coupled-translation fusion network

J Tang, K Li, X Jin, A Cichocki, Q Zhao… - Proceedings of the 59th …, 2021 - aclanthology.org
Multimodal sentiment analysis is the challenging research area that attends to the fusion of
multiple heterogeneous modalities. The main challenge is the occurrence of some missing …

Event-based media processing and analysis: A survey of the literature

C Tzelepis, Z Ma, V Mezaris, B Ionescu… - Image and Vision …, 2016 - Elsevier
Research on event-based processing and analysis of media is receiving an increasing
attention from the scientific community due to its relevance for an abundance of applications …

DFMKE: A dual fusion multi-modal knowledge graph embedding framework for entity alignment

J Zhu, C Huang, P De Meo - Information Fusion, 2023 - Elsevier
Entity alignment is critical for multiple knowledge graphs (KGs) integration. Although
researchers have made significant efforts to explore the relational embeddings between …

Audio-visual event localization via recursive fusion by joint co-attention

B Duan, H Tang, W Wang, Z Zong… - Proceedings of the …, 2021 - openaccess.thecvf.com
The major challenge in audio-visual event localization task lies in how to fuse information
from multiple modalities effectively. Recent works have shown that the attention mechanism …

Dynamic multimodal fusion via meta-learning towards micro-video recommendation

H Liu, Y Wei, F Liu, W Wang, L Nie… - ACM Transactions on …, 2023 - dl.acm.org
Multimodal information (e.g., visual, acoustic, and textual) has been widely used to enhance
representation learning for micro-video recommendation. For integrating multimodal …

The ImageNet shuffle: Reorganized pre-training for video event detection

P Mettes, DC Koelma, CGM Snoek - Proceedings of the 2016 ACM on …, 2016 - dl.acm.org
This paper strives for video event detection using a representation learned from deep
convolutional neural networks. Different from the leading approaches, who all learn from the …