BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion

X Ju, X Liu, X Wang, Y Bian, Y Shan, Q Xu - European Conference on …, 2024 - Springer
Image inpainting, the process of restoring corrupted images, has seen significant
advancements with the advent of diffusion models (DMs). Despite these advancements …
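
The snippet above only names the task, so here is a minimal sketch of one common way diffusion models are applied to inpainting (RePaint-style blending of re-noised known pixels). This is generic background, not BrushNet's decomposed dual-branch design; `model`, `alphas_cumprod`, and the simplified sampler update are assumptions for illustration.

```python
import torch

def inpaint_sample(model, x_known, mask, alphas_cumprod, num_steps):
    """Diffusion inpainting via known-region blending (generic sketch).

    x_known:        corrupted/original image, shape (B, C, H, W).
    mask:           1.0 where content must be generated, 0.0 where pixels are known.
    alphas_cumprod: 1-D tensor of cumulative noise-schedule terms.
    model(x_t, t) is assumed to predict the clean image x0 at step t.
    """
    x_t = torch.randn_like(x_known)                  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        a_t = alphas_cumprod[t]
        # Re-noise the known pixels to the current noise level and paste them in,
        # so the model only has to synthesize the masked region.
        known_t = a_t.sqrt() * x_known + (1 - a_t).sqrt() * torch.randn_like(x_known)
        x_t = mask * x_t + (1 - mask) * known_t
        # One reverse step (simplified update around the predicted x0).
        x0_pred = model(x_t, t)
        if t > 0:
            a_prev = alphas_cumprod[t - 1]
            x_t = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * torch.randn_like(x_known)
        else:
            x_t = x0_pred
    # Final composite: generated content inside the mask, original pixels outside.
    return mask * x_t + (1 - mask) * x_known
```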

Self-rectifying diffusion sampling with perturbed-attention guidance

D Ahn, H Cho, J Min, W Jang, J Kim, SH Kim… - … on Computer Vision, 2024 - Springer
Recent studies have demonstrated that diffusion models can generate high-quality samples,
but their quality heavily depends on sampling guidance techniques, such as classifier …
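
Since the snippet mentions sampling guidance, the sketch below shows how guidance terms are typically combined into a single noise prediction. The first term is standard classifier-free guidance; the optional second term follows the general pattern of guiding away from a deliberately weakened ("perturbed") forward pass. The exact perturbation used by perturbed-attention guidance is defined in the paper; the scales and names here are illustrative assumptions.

```python
def guided_eps(eps_cond, eps_uncond, eps_perturbed=None, cfg_scale=7.5, pag_scale=3.0):
    """Combine denoiser outputs into one guided noise prediction.

    eps_cond / eps_uncond: conditional and unconditional predictions (CFG).
    eps_perturbed:         prediction from a weakened pass (PAG-style term, optional).
    Scales are arbitrary illustrative values, not the paper's settings.
    """
    eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)       # classifier-free guidance
    if eps_perturbed is not None:
        eps = eps + pag_scale * (eps_cond - eps_perturbed)       # push away from weak pass
    return eps

# Usage inside a sampling loop, given three forward passes of the same denoiser:
# eps = guided_eps(eps_text, eps_null, eps_perturbed=eps_weakened)
```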

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

A Chatterjee, GBM Stan, E Aflalo, S Paul… - … on Computer Vision, 2024 - Springer
One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …

RenAIssance: A survey into AI text-to-image generation in the era of large model

F Bie, Y Yang, Z Zhou, A Ghanem… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Text-to-image generation (TTI) refers to the use of models that process text input and
generate high-fidelity images based on text descriptions. Text-to-image generation …

Deep compression autoencoder for efficient high-resolution diffusion models

J Chen, H Cai, J Chen, E **e, S Yang, H Tang… - arxiv preprint arxiv …, 2024 - arxiv.org
We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models
for accelerating high-resolution diffusion models. Existing autoencoder models have …
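
To make the compression idea concrete, the toy autoencoder below shows why a higher spatial downsampling factor accelerates high-resolution diffusion: the diffusion model then operates on a much smaller latent, and its cost (roughly quadratic in the number of spatial tokens) drops sharply. This is an illustrative stand-in, not the DC-AE architecture; all layer choices and channel counts are assumptions.

```python
import math
import torch
import torch.nn as nn

class ToyLatentAutoencoder(nn.Module):
    """Toy convolutional autoencoder with a configurable spatial downsampling factor.

    With factor=32, a 1024x1024 image maps to a 32x32 latent, versus 128x128 at the
    factor of 8 used by typical latent-diffusion autoencoders.
    """
    def __init__(self, in_ch=3, latent_ch=16, factor=32, width=64):
        super().__init__()
        n_down = int(math.log2(factor))
        enc, ch = [], in_ch
        for _ in range(n_down):
            enc += [nn.Conv2d(ch, width, 4, stride=2, padding=1), nn.SiLU()]
            ch = width
        enc += [nn.Conv2d(ch, latent_ch, 3, padding=1)]
        self.encoder = nn.Sequential(*enc)

        dec, ch = [nn.Conv2d(latent_ch, width, 3, padding=1), nn.SiLU()], width
        for _ in range(n_down):
            dec += [nn.ConvTranspose2d(ch, width, 4, stride=2, padding=1), nn.SiLU()]
            ch = width
        dec += [nn.Conv2d(ch, in_ch, 3, padding=1)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)            # (B, latent_ch, H/factor, W/factor)
        return self.decoder(z), z

x = torch.randn(1, 3, 1024, 1024)
recon, z = ToyLatentAutoencoder()(x)
print(z.shape)                         # torch.Size([1, 16, 32, 32])
```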

Diffh2o: Diffusion-based synthesis of hand-object interactions from textual descriptions

S Christen, S Hampali, F Sener, E Remelli… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce DiffH2O, a new diffusion-based framework for synthesizing realistic, dexterous
hand-object interactions from natural language. Our model employs a temporal two-stage …

SATO: Stable Text-to-Motion Framework

W Chen, H Xiao, E Zhang, L Hu, L Wang… - Proceedings of the …, 2024 - dl.acm.org
Is the text-to-motion model robust? Recent advancements in text-to-motion models primarily
stem from more accurate predictions of specific actions. However, the text modality typically …

Erasing concepts from text-to-image diffusion models with few-shot unlearning

M Fuchi, T Takagi - arXiv preprint arXiv:2405.07288, 2024 - bmva-archive.org.uk
Generating images from text has become easier because of the scaling of diffusion models
and advancements in the field of vision and language. These models are trained using vast …

Revisit large-scale image-caption data in pre-training multimodal foundation models

Z Lai, V Saveris, C Chen, HY Chen, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in multimodal models highlight the value of rewritten captions for
improving performance, yet key challenges remain. For example, while synthetic captions …

Bigger is not always better: Scaling properties of latent diffusion models

K Mei, Z Tu, M Delbracio, H Talebi… - … on Machine Learning …, 2024 - openreview.net
We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their
sampling efficiency. While improved network architecture and inference algorithms have …
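
The snippet frames scaling in terms of sampling efficiency; as a back-of-the-envelope illustration (assumed numbers, not figures from the paper), if per-step cost grows roughly with parameter count, a fixed inference compute budget buys a smaller latent diffusion model proportionally more denoising steps than a larger one.

```python
def affordable_steps(budget_gflops: float, gflops_per_step: float) -> int:
    """Number of denoising steps that fit in a fixed per-image compute budget."""
    return int(budget_gflops // gflops_per_step)

budget = 50_000.0  # hypothetical total GFLOPs allowed per generated image
for name, cost in [("0.4B-param LDM", 500.0), ("2B-param LDM", 2500.0)]:
    print(f"{name}: {affordable_steps(budget, cost)} denoising steps within budget")
# 0.4B-param LDM: 100 denoising steps within budget
# 2B-param LDM: 20 denoising steps within budget
```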