A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

Stacked cross attention for image-text matching

KH Lee, X Chen, G Hua, H Hu… - Proceedings of the …, 2018 - openaccess.thecvf.com
In this paper, we study the problem of image-text matching. Inferring the latent semantic
alignment between objects or other salient stuff (e.g. snow, sky, lawn) and the corresponding …

Multimodal transformer with multi-view visual representation for image captioning

J Yu, J Li, Z Yu, Q Huang - … on circuits and systems for video …, 2019 - ieeexplore.ieee.org
Image captioning aims to automatically generate a natural language description of a given
image, and most state-of-the-art models have adopted an encoder-decoder framework. The …

Survey of the state of the art in natural language generation: Core tasks, applications and evaluation

A Gatt, E Krahmer - Journal of Artificial Intelligence Research, 2018 - jair.org
This paper surveys the current state of the art in Natural Language Generation (NLG),
defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …

SPICE: Semantic propositional image caption evaluation

P Anderson, B Fernando, M Johnson… - Computer Vision–ECCV …, 2016 - Springer
There is considerable interest in the task of automatically generating image captions.
However, evaluation is challenging. Existing automatic evaluation metrics are primarily …

Towards diverse and natural image descriptions via a conditional GAN

B Dai, S Fidler, R Urtasun, D Lin - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Despite the substantial progress in recent years, the problem of image captioning remains
far from being satisfactorily tackled. Sentences produced by existing methods, e.g. those …

Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge

O Vinyals, A Toshev, S Bengio… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
Automatically describing the content of an image is a fundamental problem in artificial
intelligence that connects computer vision and natural language processing. In this paper …

Boosting image captioning with attributes

T Yao, Y Pan, Y Li, Z Qiu, T Mei - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Automatically describing an image with a natural language has been an emerging
challenge in both fields of computer vision and natural language processing. In this paper …

Summarizing source code using a neural attention model

S Iyer, I Konstas, A Cheung… - 54th Annual Meeting …, 2016 - researchportal.hw.ac.uk
High quality source code is often paired with high level summaries of the computation it
performs, for example in code documentation or in descriptions posted in online forums …