- Academic Search

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 396 · All 11 versions

[PDF] arxiv.org

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Cited by 1005 · All 8 versions

[PDF] aclanthology.org

Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning

P Sharma, N Ding, S Goodman… - Proceedings of the 56th …, 2018 - aclanthology.org

We present a new dataset of image caption annotations, Conceptual Captions, which
contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) …

Cited by 2683 · All 3 versions

[PDF] ieee.org

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Cited by 540 · All 4 versions

[PDF] arxiv.org

Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

Cited by 3869 · All 12 versions

[PDF] jair.org

Survey of the state of the art in natural language generation: Core tasks, applications and evaluation

A Gatt, E Krahmer - Journal of Artificial Intelligence Research, 2018 - jair.org

This paper surveys the current state of the art in Natural Language Generation (NLG),
defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …

Cited by 1148 · All 15 versions

[PDF] arxiv.org

SPICE: Semantic propositional image caption evaluation

P Anderson, B Fernando, M Johnson… - Computer Vision–ECCV …, 2016 - Springer

There is considerable interest in the task of automatically generating image captions.
However, evaluation is challenging. Existing automatic evaluation metrics are primarily …

Cited by 2338 · All 13 versions

[PDF] arxiv.org

Remind your neural network to prevent catastrophic forgetting

TL Hayes, K Kafle, R Shrestha, M Acharya… - European conference on …, 2020 - Springer

People learn throughout life. However, incrementally updating conventional neural networks
leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the …

Cited by 361 · All 10 versions

[PDF] thecvf.com

What makes training multi-modal classification networks hard?

W Wang, D Tran, M Feiszli - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

Consider end-to-end training of a multi-modal vs. a uni-modal network on a task with
multiple input modalities: the multi-modal network receives more information, so it should …

Cited by 445 · All 8 versions

[PDF] thecvf.com

Visual translation embedding network for visual relation detection

H Zhang, Z Kyaw, SF Chang… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

Visual relations, such as "person ride bike" and "bike next to car", offer a comprehensive
scene understanding of an image, and have already shown their great utility in connecting …

Cited by 664 · All 8 versions
