InstanceDiffusion: Instance-level control for image generation

X Wang, T Darrell, SS Rambhatla… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image diffusion models produce high quality images but do not offer control over
individual instances in the image. We introduce InstanceDiffusion that adds precise instance …

Grounded text-to-image synthesis with attention refocusing

Q Phung, S Ge, JB Huang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Driven by scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …

Direct-a-Video: Customized video generation with user-directed camera movement and object motion

S Yang, L Hou, H Huang, C Ma, P Wan… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …

CoMat: Aligning text-to-image diffusion model with image-to-text concept matching

D Jiang, G Song, X Wu, R Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Diffusion models have demonstrated great success in the field of text-to-image generation.
However, alleviating the misalignment between the text prompts and images is still …

Be yourself: Bounded attention for multi-subject text-to-image generation

O Dahary, O Patashnik, K Aberman… - European Conference on …, 2024 - Springer
Text-to-image diffusion models have an unprecedented ability to generate diverse and high-
quality images. However, they often struggle to faithfully capture the intended semantics of …

ControlMLLM: Training-free visual prompt learning for multimodal large language models

M Wu, X Cai, J Ji, J Li, O Huang… - Advances in …, 2025 - proceedings.neurips.cc
In this work, we propose a training-free method to inject visual prompts into Multimodal
Large Language Models (MLLMs) through learnable latent variable optimization. We …

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

PLACE: Adaptive layout-semantic fusion for semantic image synthesis

Z Lv, Y Wei, W Zuo, KYK Wong - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent advancements in large-scale pre-trained text-to-image models have led to
remarkable progress in semantic image synthesis. Nevertheless, synthesizing high-quality …

Neural Assets: 3D-aware multi-object scene synthesis with image diffusion models

Z Wu, Y Rubanova, R Kabra… - Advances in …, 2025 - proceedings.neurips.cc
We address the problem of multi-object 3D pose control in image diffusion models. Instead
of conditioning on a sequence of text tokens, we propose to use a set of per-object …

Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …