- Academic Search

A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org

Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

Cited by 56 · All 5 versions

Cross-modal text and visual generation: A systematic review. Part 1: Image to text

M Żelaszczyk, J Mańdziuk - Information Fusion, 2023 - Elsevier

We review the existing literature on generating text from visual data under the cross-modal
generation umbrella, which affords us to compare and contrast various approaches taking …

Cited by 19 · All 4 versions

[PDF] thecvf.com

Recurrent multimodal interaction for referring image segmentation

C Liu, Z Lin, X Shen, J Yang, X Lu… - Proceedings of the …, 2017 - openaccess.thecvf.com

In this paper we are interested in the problem of image segmentation given natural
language descriptions, i.e., referring expressions. Existing works tackle this problem by first …

Cited by 288 · All 10 versions

[PDF] aaai.org

Stack-captioning: Coarse-to-fine learning for image captioning

J Gu, J Cai, G Wang, T Chen - Proceedings of the AAAI conference on …, 2018 - ojs.aaai.org

The existing image captioning approaches typically train a one-stage sentence decoder,
which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage …

Cited by 229 · All 10 versions

[PDF] aclanthology.org

Abstractive text-image summarization using multi-modal attentional hierarchical RNN

J Chen, H Zhuge - Proceedings of the 2018 conference on …, 2018 - aclanthology.org

Rapid growth of multi-modal documents on the Internet makes multi-modal summarization
research necessary. Most previous research summarizes texts or images separately. Recent …

Cited by 115 · All 2 versions

Multi-level policy and reward-based deep reinforcement learning framework for image captioning

N Xu, H Zhang, AA Liu, W Nie, Y Su… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

Image captioning is one of the most challenging tasks in AI because it requires an
understanding of both complex visuals and natural language. Because image captioning is …

Cited by 107

Transformer-based local-global guidance for image captioning

H Parvin, AR Naghsh-Nilchi, HM Mohammadi - Expert Systems with …, 2023 - Elsevier

Image captioning is a difficult problem for machine learning algorithms to compress huge
amounts of images into descriptive languages. The recurrent models are popularly used as …

Cited by 23 · All 2 versions

[PDF] thecvf.com

Unpaired image captioning by language pivoting

J Gu, S Joty, J Cai, G Wang - Proceedings of the European …, 2018 - openaccess.thecvf.com

Image captioning is a multimodal task involving computer vision and natural language
processing, where the goal is to learn a mapping from the image to its natural language …

Cited by 101 · All 10 versions

Image difference captioning with instance-level fine-grained feature representation

Q Huang, Y Liang, J Wei, Y Cai, H Liang… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …

Cited by 46 · All 2 versions

[HTML] mdpi.com

[HTML] A systematic literature review on image captioning

R Staniūtė, D Šešok - Applied Sciences, 2019 - mdpi.com

Natural language problems have already been investigated for around five years. Recent
progress in artificial intelligence (AI) has greatly improved the performance of models …

Cited by 74 · All 6 versions

MAT: A multimodal attentive translator for image captioning
