Google Scholar

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 391 · All 12 versions

[PDF] arxiv.org

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Cited by 1004 · All 9 versions

[PDF] jair.org

Survey of the state of the art in natural language generation: Core tasks, applications and evaluation

A Gatt, E Krahmer - Journal of Artificial Intelligence Research, 2018 - jair.org

This paper surveys the current state of the art in Natural Language Generation (NLG),
defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …

Cited by 1128 · All 15 versions

[PDF] ieee.org

Show and tell: Lessons learned from the 2015 mscoco image captioning challenge

O Vinyals, A Toshev, S Bengio… - IEEE transactions on …, 2016 - ieeexplore.ieee.org

Automatically describing the content of an image is a fundamental problem in artificial
intelligence that connects computer vision and natural language processing. In this paper …

Cited by 1147 · All 20 versions

[PDF] arxiv.org

Microsoft coco captions: Data collection and evaluation server

X Chen, H Fang, TY Lin, R Vedantam, S Gupta… - arxiv preprint arxiv …, 2015 - arxiv.org

In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When
completed, the dataset will contain over one and a half million captions describing over …

Cited by 2875 · All 5 versions

[PDF] mlr.press

Show, attend and tell: Neural image caption generation with visual attention

K Xu, J Ba, R Kiros, K Cho, A Courville… - International …, 2015 - proceedings.mlr.press

Inspired by recent work in machine translation and object detection, we introduce an
attention based model that automatically learns to describe the content of images. We …

Cited by 13218 · All 24 versions

[PDF] cv-foundation.org

Deep visual-semantic alignments for generating image descriptions

A Karpathy, L Fei-Fei - Proceedings of the IEEE conference on …, 2015 - cv-foundation.org

We present a model that generates natural language descriptions of images and their
regions. Our approach leverages datasets of images and their sentence descriptions to …

Cited by 7383 · All 39 versions

[PDF] thecvf.com

Long-term recurrent convolutional networks for visual recognition and description

J Donahue, L Anne Hendricks… - Proceedings of the …, 2015 - openaccess.thecvf.com

Models comprised of deep convolutional network layers have dominated recent
image interpretation tasks; we investigate whether models which are also compositional, or" …

Cited by 8175 · All 24 versions

[PDF] cv-foundation.org

Show and tell: A neural image caption generator

O Vinyals, A Toshev, S Bengio… - Proceedings of the IEEE …, 2015 - cv-foundation.org

Automatically describing the content of an image is a fundamental problem in artificial
intelligence that connects computer vision and natural language processing. In this paper …

Cited by 8086 · All 26 versions

[PDF] arxiv.org

Unifying visual-semantic embeddings with multimodal neural language models

R Kiros, R Salakhutdinov, RS Zemel - arxiv preprint arxiv:1411.2539, 2014 - arxiv.org

Inspired by recent advances in multimodal learning and machine translation, we introduce
an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with …

Cited by 1739 · All 8 versions

Treetalk: Composition and compression of trees for image descriptions

From show to tell: A survey on deep learning-based image captioning‏
