From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …
reason, large research efforts have been devoted to image captioning, ie describing images …
Visual clues: Bridging vision and language foundations for image paragraph captioning
People say," A picture is worth a thousand words". Then how can we get the rich information
out of the image? We argue that by using visual clues to bridge large pretrained vision …
out of the image? We argue that by using visual clues to bridge large pretrained vision …
Chinese image caption generation via visual attention and topic modeling
M Liu, H Hu, L Li, Y Yu, W Guan - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Automatic image captioning is to conduct the cross-modal conversion from image visual
content to natural language text. Involving computer vision (CV) and natural language …
content to natural language text. Involving computer vision (CV) and natural language …
Dialoguetrm: Exploring the intra-and inter-modal emotional behaviors in the conversation
Emotion Recognition in Conversations (ERC) is essential for building empathetic human-
machine systems. Existing studies on ERC primarily focus on summarizing the context …
machine systems. Existing studies on ERC primarily focus on summarizing the context …
Intention oriented image captions with guiding objects
Y Zheng, Y Li, S Wang - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
Although existing image caption models can produce promising results using recurrent
neural networks (RNNs), it is difficult to guarantee that an object we care about is contained …
neural networks (RNNs), it is difficult to guarantee that an object we care about is contained …
Effective multimodal encoding for image paragraph captioning
TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …
Dual-CNN: A Convolutional language decoder for paragraph image captioning
The task of paragraph image captioning aims to generate a coherent paragraph describing
a given image. However, due to their limited ability to capture long-term dependency …
a given image. However, due to their limited ability to capture long-term dependency …
Curiosity-driven reinforcement learning for diverse visual paragraph generation
Visual paragraph generation aims to automatically describe a given image from different
perspectives and organize sentences in a coherent way. In this paper, we address three …
perspectives and organize sentences in a coherent way. In this paper, we address three …
Image captioning with novel topics guidance and retrieval-based topics re-weighting
Topic modelling (TM) has shown significant progress in boosting the effectiveness of image
captioning in the last few years. Although important improvements have been shown in …
captioning in the last few years. Although important improvements have been shown in …
Exploring global and local linguistic representations for text-to-image synthesis
The task of text-to-image synthesis is to generate photographic images conditioned on given
textual descriptions. This challenging task has recently attracted considerable attention from …
textual descriptions. This challenging task has recently attracted considerable attention from …