From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Automated identification of media bias in news articles: an interdisciplinary literature review

F Hamborg, K Donnay, B Gipp - International Journal on Digital Libraries, 2019 - Springer
Media bias, i.e., slanted news coverage, can strongly impact the public perception of the
reported topics. In the social sciences, research over the past decades has developed …

Microsoft COCO Captions: Data collection and evaluation server

X Chen, H Fang, TY Lin, R Vedantam, S Gupta… - arXiv preprint arXiv …, 2015 - arxiv.org
In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When
completed, the dataset will contain over one and a half million captions describing over …

ReferItGame: Referring to objects in photographs of natural scenes

S Kazemzadeh, V Ordonez, M Matten… - Proceedings of the 2014 …, 2014 - aclanthology.org
In this paper we introduce a new game to crowd-source natural language referring
expressions. By designing a two-player game, we can both collect and verify referring …

A survey on automatic image caption generation

S Bai, S An - Neurocomputing, 2018 - Elsevier
Image captioning means automatically generating a caption for an image. As a recently
emerged research area, it is attracting more and more attention. To achieve the goal of …

Grounded compositional semantics for finding and describing images with sentences

R Socher, A Karpathy, QV Le, CD Manning… - Transactions of the …, 2014 - direct.mit.edu
Previous work on Recursive Neural Networks (RNNs) shows that these models can
produce compositional feature vectors for accurately representing and classifying sentences …

Automatic description generation from images: A survey of models, datasets, and evaluation measures

R Bernardi, R Cakici, D Elliott, A Erdem, E Erdem… - Journal of Artificial …, 2016 - jair.org
Automatic description generation from natural images is a challenging problem that has
recently received a large amount of interest from the computer vision and natural language …

See, say, and segment: Teaching LMMs to overcome false premises

TH Wu, G Biamby, D Chan, L Dunlap… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-
vocabulary language grounding and segmentation but can suffer under false premises …

Good news, everyone! Context driven entity-aware captioning for news images

AF Biten, L Gomez, M Rusinol… - Proceedings of the …, 2019 - openaccess.thecvf.com
Current image captioning systems perform at a merely descriptive level, essentially
enumerating the objects in the scene and their relations. Humans, on the contrary, interpret …

TreeTalk: Composition and Compression of Trees for Image Descriptions

P Kuznetsova, V Ordonez, TL Berg… - Transactions of the …, 2014 - direct.mit.edu
We present a new tree based approach to composing expressive image descriptions that
makes use of naturally occurring web images with captions. We investigate two related tasks …