Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net
Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

Revision: Rendering tools enable spatial fidelity in vision-language models

A Chatterjee, Y Luo, T Gokhale, Y Yang… - European Conference on …, 2024 - Springer
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been
adopted in solutions for several computer vision and multimodal learning tasks. However, it …

Retrieval-augmented generation for ai-generated content: A survey

P Zhao, H Zhang, Q Yu, Z Wang, Y Geng, F Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …

Structure-Guided Adversarial Training of Diffusion Models

L Yang, H Qian, Z Zhang, J Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Diffusion models have demonstrated exceptional efficacy in various generative applications.
While existing models focus on minimizing a weighted sum of denoising score matching …

GenArtist: Multimodal LLM as an agent for unified image generation and editing

Z Wang, A Li, Z Li, X Liu - arXiv preprint arXiv:2407.05600, 2024 - arxiv.org
Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …

Cross-modal contextualized diffusion models for text-guided visual generation and editing

L Yang, Z Zhang, Z Yu, J Liu, M Xu… - The Twelfth …, 2024 - openreview.net
Conditional diffusion models have exhibited superior performance in high-fidelity text-
guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion …

Lion: Implicit vision prompt tuning

H Wang, J Chang, Y Zhai, X Luo, J Sun, Z Lin… - Proceedings of the …, 2024 - ojs.aaai.org
Despite recent promising performances across a range of vision tasks, vision Transformers
still have an issue of high computational costs. Recently, vision prompt learning has …

DiT4Edit: Diffusion transformer for image editing

K Feng, Y Ma, B Wang, C Qi, H Chen, Q Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite recent advances in UNet-based image editing, methods for shape-aware object
editing in high-resolution images are still lacking. Compared to UNet, Diffusion Transformers …

Gloss-driven Conditional Diffusion Models for Sign Language Production

S Tang, F Xue, J Wu, S Wang, R Hong - ACM Transactions on …, 2024 - dl.acm.org
Sign Language Production (SLP) aims to convert text or audio sentences into sign language
videos corresponding to their semantics, which is challenging due to the diversity and …