A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Deep learning in medical imaging: general overview

JG Lee, S Jun, YW Cho, H Lee… - Korean journal of …, 2017 - synapse.koreamed.org
The artificial neural network (ANN)–a machine learning technique inspired by the human
neuronal synapse system–was introduced in the 1950s. However, the ANN was previously …

RSTNet: Captioning with adaptive attention on visual and non-visual words

X Zhang, X Sun, Y Luo, J Ji, Y Zhou… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent progress on visual question answering has explored the merits of grid features for
vision language tasks. Meanwhile, transformer-based models have shown remarkable …

X-linear attention networks for image captioning

Y Pan, T Yao, Y Li, T Mei - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Recent progress on fine-grained visual recognition and visual question answering has
featured Bilinear Pooling, which effectively models the 2nd order interactions across multi …

Task-adaptive attention for image captioning

C Yan, Y Hao, L Li, J Yin, A Liu, Z Mao… - … on Circuits and …, 2021 - ieeexplore.ieee.org
Attention mechanisms are now widely used in image captioning models. However, most
attention models only focus on visual features. When generating syntax related words, little …

Making the V in VQA matter: Elevating the role of image understanding in visual question answering

Y Goyal, T Khot, D Summers-Stay… - Proceedings of the …, 2017 - openaccess.thecvf.com
Problems at the intersection of vision and language are of significant importance both as
challenging research questions and for the rich set of applications they enable. However …

Visual Genome: Connecting language and vision using crowdsourced dense image annotations

R Krishna, Y Zhu, O Groth, J Johnson, K Hata… - International journal of …, 2017 - Springer
Despite progress in perceptual tasks such as image classification, computers still perform
poorly on cognitive tasks such as image description and question answering. Cognition is …

Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge

O Vinyals, A Toshev, S Bengio… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
Automatically describing the content of an image is a fundamental problem in artificial
intelligence that connects computer vision and natural language processing. In this paper …

Boosting image captioning with attributes

T Yao, Y Pan, Y Li, Z Qiu, T Mei - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Automatically describing an image with a natural language has been an emerging
challenge in both fields of computer vision and natural language processing. In this paper …

VQA: Visual question answering

S Antol, A Agrawal, J Lu, M Mitchell… - Proceedings of the …, 2015 - openaccess.thecvf.com
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given
an image and a natural language question about the image, the task is to provide an …