A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …
From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
Meshed-memory transformer for image captioning
Transformer-based architectures represent the state of the art in sequence modeling tasks
like machine translation and language understanding. Their applicability to multi-modal …
GRIT: Faster and better image captioning transformer using dual visual features
Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …
CPTR: Full transformer network for image captioning
In this paper, we consider the image captioning task from a new sequence-to-sequence
prediction perspective and propose CaPtion TransformeR (CPTR) which takes the …
Improving image captioning by leveraging intra- and inter-layer global representation in transformer network
Transformer-based architectures have shown great success in image captioning, where
object regions are encoded and then attended into the vectorial representations to guide the …
Region-aware image captioning via interaction learning
Image captioning is one of the primary goals in computer vision which aims to automatically
generate natural descriptions for images. Intuitively, the human visual system can notice some …
Image Captioning in news report scenario
Image captioning strives to generate pertinent captions for specified images, situating itself
at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This …
Beyond a pre-trained object detector: Cross-modal textual and visual context for image captioning
Significant progress has been made on visual captioning, largely relying on pre-trained
features and later fixed object detectors that serve as rich inputs to auto-regressive models …
Trends in integration of vision and language research: A survey of tasks, datasets, and methods
A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …