A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arxiv preprint arxiv …, 2023 - arxiv.org
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

Scaling up vision-language pre-training for image captioning

X Hu, Z Gan, J Wang, Z Yang, Z Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
In recent years, we have witnessed significant performance boost in the image captioning
task based on vision-language pre-training (VLP). Scale is believed to be an important factor …

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …

A survey of natural language generation

C Dong, Y Li, H Gong, M Chen, J Li, Y Shen… - ACM Computing …, 2022 - dl.acm.org
This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …

Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion

S Wu, H Fei, H Zhang, TS Chua - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-
intricate setting, ie, generating intricate visual content from simple abstract text prompts …

Similarity reasoning and filtration for image-text matching

H Diao, Y Zhang, L Ma, H Lu - Proceedings of the AAAI conference on …, 2021 - ojs.aaai.org
Image-text matching plays a critical role in bridging the vision and language, and great
progress has been made by exploiting the global alignment between image and sentence …

Meshed-memory transformer for image captioning

M Cornia, M Stefanini, L Baraldi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Transformer-based architectures represent the state of the art in sequence modeling tasks
like machine translation and language understanding. Their applicability to multi-modal …

Unbiased scene graph generation from biased training

K Tang, Y Niu, J Huang, J Shi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, eg, collapsing diverse" human walk on/sit on/lay on beach" into" human …