From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
A comprehensive survey of deep learning for image captioning
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …
ZeroCap: Zero-shot image-to-text generation for visual-semantic arithmetic
Recent text-to-image matching models apply contrastive learning to large corpora of
uncurated pairs of images and sentences. While such models can provide a powerful score …
A survey of zero-shot learning: Settings, methods, and applications
Most machine-learning methods focus on classifying instances whose classes have already
been seen in training. In practice, many applications require classifying instances whose …
VisualGPT: Data-efficient adaptation of pretrained language models for image captioning
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …
Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: a review
Augmented reality (AR) has proven to be an invaluable interactive medium to reduce
cognitive load by bridging the gap between the task-at-hand and relevant information by …
Beyond IID: three levels of generalization for question answering on knowledge bases
Existing studies on question answering on knowledge bases (KBQA) mainly operate with
the standard i.i.d. assumption, i.e., the training distribution over questions is the same as the test …
Neural baby talk
We introduce a novel framework for image captioning that can produce natural language
explicitly grounded in entities that object detectors find in the image. Our approach …
Improved image captioning via policy gradient optimization of SPIDEr
Current image captioning methods are usually trained via maximum likelihood estimation.
However, the log-likelihood score of a caption does not correlate well with human …
nocaps: Novel object captioning at scale
Image captioning models have achieved impressive results on datasets containing limited
visual concepts and large amounts of paired image-caption training data. However, if these …