- Academic Search

Animate-X: Universal character image animation with enhanced motion representation

S Tan, B Gong, X Wang, S Zhang, D Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org

Character image animation, which generates high-quality videos from a reference image
and target pose sequence, has seen significant progress in recent years. However, most …

Cited by 5 · All 2 versions

ReCorD: Reasoning and correcting diffusion for HOI generation

JY Jiang-Lin, KY Huang, L Lo, YN Huang… - Proceedings of the …, 2024 - dl.acm.org

Diffusion models revolutionize image generation by leveraging natural language to guide
the creation of multimedia content. Despite significant advancements in such generative …

Cited by 3 · All 5 versions

[PDF] arxiv.org

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

H Zhang, D Hong, T Gao, Y Wang, J Shao… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion models have been recognized for their ability to generate images that are not only
visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) …

Cited by 1 · All 2 versions

[PDF] arxiv.org

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

T Hu, L Li, J van de Weijer, H Gao, FS Khan… - arXiv preprint arXiv …, 2024 - arxiv.org

Although text-to-image (T2I) models exhibit remarkable generation capabilities, they
frequently fail to accurately bind semantically related objects or attributes in the input …

All 3 versions

[PDF] arxiv.org

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

S Shi, B Gong, X Chen, D Zheng, S Tan, Z Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

The image-to-video (I2V) generation is conditioned on the static image, which has been
enhanced recently by the motion intensity as an additional control signal. These motion …

All 2 versions

[PDF] arxiv.org

Mimir: Improving Video Diffusion Models for Precise Text Understanding

S Tan, B Gong, Y Feng, K Zheng, D Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org

Text serves as the key control signal in video generation due to its narrative nature. To
render text descriptions into video clips, current video diffusion models borrow features from …

All 2 versions

[PDF] arxiv.org

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Z Chen, L Yang, Y Qi, H Zhang, K Pang, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite the rapid advancements in text-to-image (T2I) synthesis, enabling precise visual
control remains a significant challenge. Existing works attempted to incorporate multi-facet …

All 2 versions

[HTML] amazon.science

[HTML] Improving faithfulness of text-to-image diffusion models through inference intervention

D Guo, S Agarwal, YH Lin, JY Kao, T Chung, N Peng… - 2025 - amazon.science

Text-to-Image diffusion models have shown remarkable capabilities in generating high-quality
images. However, current models often struggle to adhere to the complete set of …

All 2 versions

Saved to My Library

Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Animate-X: Universal character image animation with enhanced motion representation

ReCorD: Reasoning and correcting diffusion for HOI generation

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Mimir: Improving Video Diffusion Models for Precise Text Understanding

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Improving faithfulness of text-to-image diffusion models through inference intervention