Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion

S Wu, H Fei, H Zhang, TS Chua - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts …

Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey

D Sharma, C Dhiman, D Kumar - Expert Systems with Applications, 2023 - Elsevier
Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …

Graph neural networks in vision-language image understanding: A survey

H Senior, G Slabaugh, S Yuan, L Rossi - The Visual Computer, 2024 - Springer
2D image understanding is a complex problem within computer vision, but it holds
the key to providing human-level scene comprehension. It goes further than identifying the …

Hierarchical cross-modality semantic correlation learning model for multimodal summarization

L Zhang, X Zhang, J Pan - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Multimodal summarization with multimodal output (MSMO) generates a summary with both
textual and visual content. Multimodal news report contains heterogeneous contents, which …

Integrating object-aware and interaction-aware knowledge for weakly supervised scene graph generation

X Li, L Chen, W Ma, Y Yang, J Xiao - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Recently, increasing efforts have been focused on Weakly Supervised Scene Graph
Generation (WSSGG). The mainstream solution for WSSGG typically follows the same …

Image captioning based on scene graphs: A survey

J Jia, X Ding, S Pang, X Gao, X Xin, R Hu… - Expert Systems with …, 2023 - Elsevier
Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …

Effective multimodal encoding for image paragraph captioning

TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …

IC3: Image captioning by committee consensus

DM Chan, A Myers, S Vijayanarasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org
If you ask a human to describe an image, they might do so in a thousand different ways.
Traditionally, image captioning models are trained to generate a single "best" (most like a …

Compute to tell the tale: Goal-driven narrative generation

Y Wong, S Fan, Y Guo, Z Xu, K Stephen… - Proceedings of the 30th …, 2022 - dl.acm.org
Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …

Video scene graph generation from single-frame weak supervision

S Chen, J Xiao, L Chen - The Eleventh International Conference on …, 2023 - openreview.net
Video scene graph generation (VidSGG) aims to generate a sequence of graph-structure
representations for the given video. However, all existing VidSGG methods are fully …