Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion
In this work, we investigate the task of text-to-image (T2I) synthesis under the
abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts …
Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey
Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …
Graph neural networks in vision-language image understanding: A survey
2D image understanding is a complex problem within computer vision, but it holds
the key to providing human-level scene comprehension. It goes further than identifying the …
Hierarchical cross-modality semantic correlation learning model for multimodal summarization
Multimodal summarization with multimodal output (MSMO) generates a summary with both
textual and visual content. Multimodal news report contains heterogeneous contents, which …
Integrating object-aware and interaction-aware knowledge for weakly supervised scene graph generation
Recently, increasing efforts have been focused on Weakly Supervised Scene Graph
Generation (WSSGG). The mainstream solution for WSSGG typically follows the same …
Image captioning based on scene graphs: A survey
J Jia, X Ding, S Pang, X Gao, X **n, R Hu… - Expert Systems with …, 2023 - Elsevier
Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …
Effective multimodal encoding for image paragraph captioning
In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …
IC3: Image captioning by committee consensus
If you ask a human to describe an image, they might do so in a thousand different ways.
Traditionally, image captioning models are trained to generate a single "best" (most like a …
Compute to tell the tale: Goal-driven narrative generation
Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …
Video scene graph generation from single-frame weak supervision
Video scene graph generation (VidSGG) aims to generate a sequence of graph-structure
representations for the given video. However, all existing VidSGG methods are fully …