A comprehensive survey of deep learning for image captioning
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …
Adversarial text-to-image synthesis: A review
With the advent of generative adversarial networks, synthesizing images from text
descriptions has recently become an active research area. It is a flexible and intuitive way for …
DiffusionDet: Diffusion model for object detection
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …
Beyond transmitting bits: Context, semantics, and task-oriented communications
Communication systems to date primarily aim at reliably communicating bit sequences.
Such an approach provides efficient engineering designs that are agnostic to the meanings …
Visual genome: Connecting language and vision using crowdsourced dense image annotations
Despite progress in perceptual tasks such as image classification, computers still perform
poorly on cognitive tasks such as image description and question answering. Cognition is …
Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age
Simultaneous localization and mapping (SLAM) consists in the concurrent construction of a
model of the environment (the map), and the estimation of the state of the robot moving …
GQA: A new dataset for real-world visual reasoning and compositional question answering
We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …
CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning
When building artificial intelligence systems that can reason and answer questions about
visual data, we need diagnostic tests to analyze our progress and discover short-comings …
Semantic communications: Principles and challenges
Semantic communication, regarded as the breakthrough beyond the Shannon paradigm,
aims at the successful transmission of semantic information conveyed by the source rather …
SPICE: Semantic propositional image caption evaluation
There is considerable interest in the task of automatically generating image captions.
However, evaluation is challenging. Existing automatic evaluation metrics are primarily …