- Academic Search

Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion

S Wu, H Fei, H Zhang, TS Chua - Advances in Neural …, 2024 - proceedings.neurips.cc

In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-intricate
setting, i.e., generating intricate visual content from simple abstract text prompts …

Cited by 47

Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey

D Sharma, C Dhiman, D Kumar - Expert Systems with Applications, 2023 - Elsevier

Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …

Cited by 15

Graph neural networks in vision-language image understanding: A survey

H Senior, G Slabaugh, S Yuan, L Rossi - The Visual Computer, 2024 - Springer

2D image understanding is a complex problem within computer vision, but it holds
the key to providing human-level scene comprehension. It goes further than identifying the …

Cited by 18

Hierarchical cross-modality semantic correlation learning model for multimodal summarization

L Zhang, X Zhang, J Pan - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org

Multimodal summarization with multimodal output (MSMO) generates a summary with both
textual and visual content. Multimodal news report contains heterogeneous contents, which …

Cited by 57

Integrating object-aware and interaction-aware knowledge for weakly supervised scene graph generation

X Li, L Chen, W Ma, Y Yang, J Xiao - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Recently, increasing efforts have been focused on Weakly Supervised Scene Graph
Generation (WSSGG). The mainstream solution for WSSGG typically follows the same …

Cited by 27

Image captioning based on scene graphs: A survey

J Jia, X Ding, S Pang, X Gao, X Xin, R Hu… - Expert Systems with …, 2023 - Elsevier

Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …

Cited by 15

Effective multimodal encoding for image paragraph captioning

TS Nguyen, B Fernando - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org

In this paper, we present a regularization-based image paragraph generation method. We
propose a novel multimodal encoding generator (MEG) to generate effective multimodal …

Cited by 12

IC3: Image captioning by committee consensus

DM Chan, A Myers, S Vijayanarasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org

If you ask a human to describe an image, they might do so in a thousand different ways.
Traditionally, image captioning models are trained to generate a single "best" (most like a …

Cited by 18

Compute to tell the tale: Goal-driven narrative generation

Y Wong, S Fan, Y Guo, Z Xu, K Stephen… - Proceedings of the 30th …, 2022 - dl.acm.org

Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …

Cited by 15

Video scene graph generation from single-frame weak supervision

S Chen, J Xiao, L Chen - The Eleventh International Conference on …, 2023 - openreview.net

Video scene graph generation (VidSGG) aims to generate a sequence of graph-structure
representations for the given video. However, all existing VidSGG methods are fully …

Cited by 7

Hierarchical scene graph encoder-decoder for image paragraph captioning