Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

IMPRINT: Generative object compositing by learning identity-preserving representation

Y Song, Z Zhang, Z Lin, S Cohen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generative object compositing emerges as a promising new avenue for compositional
image editing. However the requirement of object identity preservation poses a significant …

MP5: A multi-modal open-ended embodied system in Minecraft via active perception

Y Qin, E Zhou, Q Liu, Z Yin, L Sheng… - 2024 IEEE/CVF …, 2024 - ieeexplore.ieee.org
It is a long-lasting goal to design an embodied system that can solve long-horizon open-
world tasks in human-like ways. However, existing approaches usually struggle with …

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most popular and sought-after generative models in the recent years, diffusion
models have sparked the interests of many researchers and steadily shown excellent …

EditShield: Protecting Unauthorized Image Editing by Instruction-Guided Diffusion Models

R Chen, H Jin, Y Liu, J Chen, H Wang… - European Conference on …, 2024 - Springer
Text-to-image diffusion models have emerged as an evolutionary for producing creative
content in image synthesis. Based on the impressive generation abilities of these models …

GenArtist: Multimodal LLM as an agent for unified image generation and editing

Z Wang, A Li, Z Li, X Liu - arXiv preprint arXiv:2407.05600, 2024 - arxiv.org
Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance

W Sun, B Cui, J Tang, XM Dong - arXiv preprint arXiv:2412.12974, 2024 - arxiv.org
Recently, diffusion models have emerged as promising newcomers in the field of generative
models, shining brightly in image generation. However, when employed for object removal …

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

J Yang, X Niu, N Jiang, R Zhang, S Huang - European Conference on …, 2024 - Springer
Existing 3D human object interaction (HOI) datasets and models simply align global
descriptions with the long HOI sequence, while lacking a detailed understanding of …

UnifiedMLLM: Enabling unified representation for multi-modal multi-tasks with large language model

Z Li, W Wang, YQ Cai, X Qi, P Wang, D Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Significant advancements have recently been achieved in the field of multi-modal large
language models (MLLMs), demonstrating their remarkable capabilities in understanding …