BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion
Image inpainting, the process of restoring corrupted images, has seen significant
advancements with the advent of diffusion models (DMs). Despite these advancements …
Self-rectifying diffusion sampling with perturbed-attention guidance
Recent studies have demonstrated that diffusion models can generate high-quality samples,
but their quality heavily depends on sampling guidance techniques, such as classifier …
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …
Renaissance: A survey into AI text-to-image generation in the era of large model
Text-to-image generation (TTI) refers to the usage of models that could process text input
and generate high fidelity images based on text descriptions. Text-to-image generation …
Deep compression autoencoder for efficient high-resolution diffusion models
We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models
for accelerating high-resolution diffusion models. Existing autoencoder models have …
DiffH2O: Diffusion-based synthesis of hand-object interactions from textual descriptions
We introduce DiffH2O, a new diffusion-based framework for synthesizing realistic, dexterous
hand-object interactions from natural language. Our model employs a temporal two-stage …
SATO: Stable Text-to-Motion Framework
Is the Text to Motion model robust? Recent advancements in Text to Motion models primarily
stem from more accurate predictions of specific actions. However, the text modality typically …
Erasing concepts from text-to-image diffusion models with few-shot unlearning
M Fuchi, T Takagi - arXiv preprint arXiv:2405.07288, 2024 - bmva-archive.org.uk
Generating images from text has become easier because of the scaling of diffusion models
and advancements in the field of vision and language. These models are trained using vast …
Revisit large-scale image-caption data in pre-training multimodal foundation models
Recent advancements in multimodal models highlight the value of rewritten captions for
improving performance, yet key challenges remain. For example, while synthetic captions …
Bigger is not always better: Scaling properties of latent diffusion models
We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their
sampling efficiency. While improved network architecture and inference algorithms have …