Evolutionary computation in the era of large language model: Survey and roadmap

X Wu, S Wu, J Wu, L Feng, KC Tan - arXiv preprint arXiv:2401.10034, 2024 - arxiv.org
Large Language Models (LLMs), built upon Transformer-based architectures with massive
pretraining on diverse data, have not only revolutionized natural language processing but …

MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing

M Cao, X Wang, Z Qi, Y Shan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite the success in large-scale text-to-image generation and text-conditioned image
editing, existing methods still struggle to produce consistent generation and editing results …

T2I-CompBench: A comprehensive benchmark for open-world compositional text-to-image generation

K Huang, K Sun, E Xie, Z Li… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the stunning ability to generate high-quality images by recent text-to-image models,
current approaches often struggle to effectively compose objects with different attributes and …

LayoutGPT: Compositional visual planning and generation with large language models

W Feng, W Zhu, T Fu, V Jampani… - Advances in …, 2023 - proceedings.neurips.cc
Attaining a high degree of user controllability in visual generation often requires intricate,
fine-grained inputs like layouts. However, such inputs impose a substantial burden on users …

SVDiff: Compact parameter space for diffusion fine-tuning

L Han, Y Li, H Zhang, P Milanfar… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, diffusion models have achieved remarkable success in text-to-image generation,
enabling the creation of high-quality images from text prompts and various conditions …

FastComposer: Tuning-free multi-subject image generation with localized attention

G Xiao, T Yin, WT Freeman, F Durand… - International Journal of …, 2024 - Springer
Diffusion models excel at text-to-image generation, especially in subject-driven generation
for personalized images. However, existing methods are inefficient due to the subject …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

BoxDiff: Text-to-image synthesis with training-free box-constrained diffusion

J Xie, Y Li, Y Huang, H Liu, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-image diffusion models have demonstrated an astonishing capacity to
generate high-quality images. However, researchers mainly studied the way of synthesizing …

TokenFlow: Consistent diffusion features for consistent video editing

M Geyer, O Bar-Tal, S Bagon, T Dekel - arXiv preprint arXiv:2307.10373, 2023 - arxiv.org
The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-
the-art video models are still lagging behind image models in terms of visual quality and …

A survey on generative diffusion models

H Cao, C Tan, Z Gao, Y Xu, G Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep generative models have unlocked another profound realm of human creativity. By
capturing and generalizing patterns within data, we have entered the epoch of all …