A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Adversarial text-to-image synthesis: A review

S Frolov, T Hinz, F Raue, J Hees, A Dengel - Neural Networks, 2021 - Elsevier
With the advent of generative adversarial networks, synthesizing images from text
descriptions has recently become an active research area. It is a flexible and intuitive way for …

DiffusionDet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

Beyond transmitting bits: Context, semantics, and task-oriented communications

D Gündüz, Z Qin, IE Aguerri, HS Dhillon… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
Communication systems to date primarily aim at reliably communicating bit sequences.
Such an approach provides efficient engineering designs that are agnostic to the meanings …

Visual genome: Connecting language and vision using crowdsourced dense image annotations

R Krishna, Y Zhu, O Groth, J Johnson, K Hata… - International journal of …, 2017 - Springer
Despite progress in perceptual tasks such as image classification, computers still perform
poorly on cognitive tasks such as image description and question answering. Cognition is …

Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age

C Cadena, L Carlone, H Carrillo, Y Latif… - IEEE Transactions …, 2016 - ieeexplore.ieee.org
Simultaneous localization and mapping (SLAM) consists in the concurrent construction of a
model of the environment (the map), and the estimation of the state of the robot moving …

GQA: A new dataset for real-world visual reasoning and compositional question answering

DA Hudson, CD Manning - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning

J Johnson, B Hariharan… - Proceedings of the …, 2017 - openaccess.thecvf.com
When building artificial intelligence systems that can reason and answer questions about
visual data, we need diagnostic tests to analyze our progress and discover shortcomings …

Semantic communications: Principles and challenges

Z Qin, X Tao, J Lu, W Tong, GY Li - arXiv preprint arXiv:2201.01389, 2021 - arxiv.org
Semantic communication, regarded as the breakthrough beyond the Shannon paradigm,
aims at the successful transmission of semantic information conveyed by the source rather …

SPICE: Semantic propositional image caption evaluation

P Anderson, B Fernando, M Johnson… - Computer Vision–ECCV …, 2016 - Springer
There is considerable interest in the task of automatically generating image captions.
However, evaluation is challenging. Existing automatic evaluation metrics are primarily …