Google Scholar

Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

Cited by 68 · All 2 versions

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Cited by 43 · All 4 versions

IMPRINT: Generative object compositing by learning identity-preserving representation

Y Song, Z Zhang, Z Lin, S Cohen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Generative object compositing emerges as a promising new avenue for compositional
image editing. However, the requirement of object identity preservation poses a significant …

Cited by 18 · All 3 versions

MP5: A multi-modal open-ended embodied system in Minecraft via active perception

Y Qin, E Zhou, Q Liu, Z Yin, L Sheng… - 2024 IEEE/CVF …, 2024 - ieeexplore.ieee.org

It is a long-lasting goal to design an embodied system that can solve long-horizon open-
world tasks in human-like ways. However, existing approaches usually struggle with …

Cited by 20 · All 4 versions

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

As one of the most popular and sought-after generative models in recent years, diffusion
models have sparked the interest of many researchers and steadily shown excellent …

Cited by 2 · All 2 versions

EditShield: Protecting Unauthorized Image Editing by Instruction-Guided Diffusion Models

R Chen, H Jin, Y Liu, J Chen, H Wang… - European Conference on …, 2024 - Springer

Text-to-image diffusion models have emerged as an evolutionary for producing creative
content in image synthesis. Based on the impressive generation abilities of these models …

Cited by 10 · All 2 versions

GenArtist: Multimodal LLM as an agent for unified image generation and editing

Z Wang, A Li, Z Li, X Liu - arXiv preprint arXiv:2407.05600, 2024 - arxiv.org

Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …

Cited by 9 · All 4 versions

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance

W Sun, B Cui, J Tang, XM Dong - arXiv preprint arXiv:2412.12974, 2024 - arxiv.org

Recently, diffusion models have emerged as promising newcomers in the field of generative
models, shining brightly in image generation. However, when employed for object removal …

Cited by 7 · All 2 versions

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

J Yang, X Niu, N Jiang, R Zhang, S Huang - European Conference on …, 2024 - Springer

Existing 3D human object interaction (HOI) datasets and models simply align global
descriptions with the long HOI sequence, while lacking a detailed understanding of …

Cited by 2 · All 7 versions

UnifiedMLLM: Enabling unified representation for multi-modal multi-tasks with large language model

Z Li, W Wang, YQ Cai, X Qi, P Wang, D Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Significant advancements have recently been achieved in the field of multi-modal large
language models (MLLMs), demonstrating their remarkable capabilities in understanding …

Cited by 6 · All 3 versions
