A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org
Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

Cross-modal text and visual generation: A systematic review. Part 1: Image to text

M Żelaszczyk, J Mańdziuk - Information Fusion, 2023 - Elsevier
We review the existing literature on generating text from visual data under the cross-modal
generation umbrella, which allows us to compare and contrast various approaches taking …

Recurrent multimodal interaction for referring image segmentation

C Liu, Z Lin, X Shen, J Yang, X Lu… - Proceedings of the …, 2017 - openaccess.thecvf.com
In this paper we are interested in the problem of image segmentation given natural
language descriptions, i.e., referring expressions. Existing works tackle this problem by first …

Stack-captioning: Coarse-to-fine learning for image captioning

J Gu, J Cai, G Wang, T Chen - Proceedings of the AAAI Conference on …, 2018 - ojs.aaai.org
The existing image captioning approaches typically train a one-stage sentence decoder,
which makes it difficult to generate rich fine-grained descriptions. On the other hand, multi-stage …

Abstractive text-image summarization using multi-modal attentional hierarchical RNN

J Chen, H Zhuge - Proceedings of the 2018 Conference on …, 2018 - aclanthology.org
Rapid growth of multi-modal documents on the Internet makes multi-modal summarization
research necessary. Most previous research summarizes texts or images separately. Recent …

Multi-level policy and reward-based deep reinforcement learning framework for image captioning

N Xu, H Zhang, AA Liu, W Nie, Y Su… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Image captioning is one of the most challenging tasks in AI because it requires an
understanding of both complex visuals and natural language. Because image captioning is …

Transformer-based local-global guidance for image captioning

H Parvin, AR Naghsh-Nilchi, HM Mohammadi - Expert Systems with …, 2023 - Elsevier
Image captioning, which requires compressing huge amounts of image content into descriptive language, is a difficult problem for machine learning algorithms. Recurrent models are popularly used as …

Unpaired image captioning by language pivoting

J Gu, S Joty, J Cai, G Wang - Proceedings of the European …, 2018 - openaccess.thecvf.com
Image captioning is a multimodal task involving computer vision and natural language
processing, where the goal is to learn a mapping from the image to its natural language …

Image difference captioning with instance-level fine-grained feature representation

Q Huang, Y Liang, J Wei, Y Cai, H Liang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …

A systematic literature review on image captioning

R Staniūtė, D Šešok - Applied Sciences, 2019 - mdpi.com
Natural language problems have already been investigated for around five years. Recent
progress in artificial intelligence (AI) has greatly improved the performance of models …