A survey on deep learning for multimodal data fusion

J Gao, P Li, Z Chen, J Zhang - Neural Computation, 2020 - direct.mit.edu
With the wide deployment of heterogeneous networks, huge amounts of data with
characteristics of high volume, high variety, high velocity, and high veracity are generated …

An analytical study of information extraction from unstructured and multidimensional big data

K Adnan, R Akbar - Journal of Big Data, 2019 - Springer
The process of information extraction (IE) is used to extract useful information from unstructured
or semi-structured data. Big data raises new challenges for IE techniques with the rapid …

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
Research in multimodal learning has progressed rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …

Negative-aware attention framework for image-text matching

K Zhang, Z Mao, Q Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Image-text matching, as a fundamental task, bridges the gap between vision and language.
The key to this task is to accurately measure the similarity between these two modalities. Prior …

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Recent years have witnessed the resurgence of knowledge engineering, marked by
the rapid growth of knowledge graphs. However, most existing knowledge graphs are …

TediGAN: Text-guided diverse face image generation and manipulation

W Xia, Y Yang, JH Xue, B Wu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
In this work, we propose TediGAN, a novel framework for multi-modal image generation and
manipulation with textual descriptions. The proposed method consists of three components …

Unicoder-VL: A universal encoder for vision and language by cross-modal pre-training

G Li, N Duan, Y Fang, M Gong, D Jiang - Proceedings of the AAAI …, 2020 - aaai.org
We propose Unicoder-VL, a universal encoder that aims to learn joint representations of
vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained …

IMRAM: Iterative matching with recurrent attention memory for cross-modal image-text retrieval

H Chen, G Ding, X Liu, Z Lin, J Liu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Enabling bi-directional retrieval of images and texts is important for understanding the
correspondence between vision and language. Existing methods leverage the attention …

Stock price prediction using deep learning and frequency decomposition

H Rezaei, H Faaljou, G Mansourfar - Expert Systems with Applications, 2021 - Elsevier
The nonlinearity and high volatility of financial time series have made it difficult to predict stock
prices. However, thanks to recent developments in deep learning and methods such as long …

Context-aware attention network for image-text retrieval

Q Zhang, Z Lei, Z Zhang, SZ Li - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
As a typical cross-modal problem, image-text bi-directional retrieval relies heavily on
joint embedding learning and a similarity measure for each image-text pair. It remains …