Diffusion models: A comprehensive survey of methods and applications
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …
Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs
Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …
Revision: Rendering tools enable spatial fidelity in vision-language models
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been
adopted in solutions for several computer vision and multimodal learning tasks. However, it …
Retrieval-augmented generation for AI-generated content: A survey
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …
Structure-Guided Adversarial Training of Diffusion Models
Diffusion models have demonstrated exceptional efficacy in various generative applications.
While existing models focus on minimizing a weighted sum of denoising score matching …
Genartist: Multimodal LLM as an agent for unified image generation and editing
Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …
Cross-modal contextualized diffusion models for text-guided visual generation and editing
Conditional diffusion models have exhibited superior performance in high-fidelity text-
guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion …
Lion: Implicit vision prompt tuning
Despite recent promising performances across a range of vision tasks, vision Transformers
still have an issue of high computational costs. Recently, vision prompt learning has …
Dit4edit: Diffusion transformer for image editing
Despite recent advances in UNet-based image editing, methods for shape-aware object
editing in high-resolution images are still lacking. Compared to UNet, Diffusion Transformers …
Gloss-driven Conditional Diffusion Models for Sign Language Production
Sign Language Production (SLP) aims to convert text or audio sentences into sign language
videos corresponding to their semantics, which is challenging due to the diversity and …