From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

ZeroCap: Zero-shot image-to-text generation for visual-semantic arithmetic

Y Tewel, Y Shalev, I Schwartz… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recent text-to-image matching models apply contrastive learning to large corpora of
uncurated pairs of images and sentences. While such models can provide a powerful score …

A survey of zero-shot learning: Settings, methods, and applications

W Wang, VW Zheng, H Yu, C Miao - ACM Transactions on Intelligent …, 2019 - dl.acm.org
Most machine-learning methods focus on classifying instances whose classes have already
been seen in training. In practice, many applications require classifying instances whose …

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: a review

CK Sahu, C Young, R Rai - International Journal of Production …, 2021 - Taylor & Francis
Augmented reality (AR) has proven to be an invaluable interactive medium to reduce
cognitive load by bridging the gap between the task-at-hand and relevant information by …

Beyond IID: three levels of generalization for question answering on knowledge bases

Y Gu, S Kase, M Vanni, B Sadler, P Liang… - Proceedings of the Web …, 2021 - dl.acm.org
Existing studies on question answering on knowledge bases (KBQA) mainly operate with
the standard i.i.d. assumption, i.e., training distribution over questions is the same as the test …

Neural baby talk

J Lu, J Yang, D Batra, D Parikh - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
We introduce a novel framework for image captioning that can produce natural language
explicitly grounded in entities that object detectors find in the image. Our approach …

Improved image captioning via policy gradient optimization of SPIDEr

S Liu, Z Zhu, N Ye, S Guadarrama… - Proceedings of the …, 2017 - openaccess.thecvf.com
Current image captioning methods are usually trained via maximum likelihood estimation.
However, the log-likelihood score of a caption does not correlate well with human …

Nocaps: Novel object captioning at scale

H Agrawal, K Desai, Y Wang, X Chen… - Proceedings of the …, 2019 - openaccess.thecvf.com
Image captioning models have achieved impressive results on datasets containing limited
visual concepts and large amounts of paired image-caption training data. However, if these …