From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Visual clues: Bridging vision and language foundations for image paragraph captioning

Y Xie, L Zhou, X Dai, L Yuan, N Bach… - Advances in Neural …, 2022 - proceedings.neurips.cc
People say, "A picture is worth a thousand words". Then how can we get the rich information
out of the image? We argue that by using visual clues to bridge large pretrained vision …

Chinese image caption generation via visual attention and topic modeling

M Liu, H Hu, L Li, Y Yu, W Guan - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Automatic image captioning is to conduct the cross-modal conversion from image visual
content to natural language text. Involving computer vision (CV) and natural language …

DialogueTRM: Exploring the intra- and inter-modal emotional behaviors in the conversation

Y Mao, Q Sun, G Liu, X Wang, W Gao, X Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Emotion Recognition in Conversations (ERC) is essential for building empathetic human-
machine systems. Existing studies on ERC primarily focus on summarizing the context …

Intention oriented image captions with guiding objects

Y Zheng, Y Li, S Wang - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
Although existing image caption models can produce promising results using recurrent
neural networks (RNNs), it is difficult to guarantee that an object we care about is contained …

Effective multimodal encoding for image paragraph captioning

TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …

Dual-CNN: A Convolutional language decoder for paragraph image captioning

R Li, H Liang, Y Shi, F Feng, X Wang - Neurocomputing, 2020 - Elsevier
The task of paragraph image captioning aims to generate a coherent paragraph describing
a given image. However, due to their limited ability to capture long-term dependency …

Curiosity-driven reinforcement learning for diverse visual paragraph generation

Y Luo, Z Huang, Z Zhang, Z Wang, J Li… - Proceedings of the 27th …, 2019 - dl.acm.org
Visual paragraph generation aims to automatically describe a given image from different
perspectives and organize sentences in a coherent way. In this paper, we address three …

Image captioning with novel topics guidance and retrieval-based topics re-weighting

M Al-Qatf, X Wang, A Hawbani… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Topic modelling (TM) has shown significant progress in boosting the effectiveness of image
captioning in the last few years. Although important improvements have been shown in …

Exploring global and local linguistic representations for text-to-image synthesis

R Li, N Wang, F Feng, G Zhang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The task of text-to-image synthesis is to generate photographic images conditioned on given
textual descriptions. This challenging task has recently attracted considerable attention from …