CamI2V: Camera-Controlled Image-to-Video Diffusion Model

G Zheng, T Li, R Jiang, Y Lu, T Wu, X Li - arXiv preprint arXiv:2410.15957, 2024 - arxiv.org
Recently, camera pose, as a user-friendly and physics-related condition, has been
introduced into text-to-video diffusion model for camera control. However, existing methods …

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

H Zhang, D Hong, T Gao, Y Wang, J Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been recognized for their ability to generate images that are not only
visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) …

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Z Li, C Meng, Y Li, L Yang, S Zhang, J Ma, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in text-to-image (T2I) generation have shown remarkable success in
producing high-quality images from text. However, existing T2I models show decayed …

CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis

M Zhang, Y Liu, Y Liu, H Yu, Q Ye - arXiv preprint arXiv:2412.08464, 2024 - arxiv.org
Accurately depicting real-world landscapes in remote sensing (RS) images requires precise
alignment between objects and their environment. However, most existing synthesis …

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

T Li, G Zheng, R Jiang, T Wu, Y Lu, Y Lin, X Li - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advancements in camera-trajectory-guided image-to-video generation offer higher
precision and better support for complex camera control compared to text-based …

EliGen: Entity-Level Controlled Image Generation with Regional Attention

H Zhang, Z Duan, X Wang, Y Chen, Y Zhang - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advancements in diffusion models have significantly advanced text-to-image
generation, yet global text prompts alone remain insufficient for achieving fine-grained …

3DIS-FLUX: Simple and Efficient Multi-Instance Generation with DiT Rendering

D Zhou, J Xie, Z Yang, Y Yang - arXiv preprint arXiv:2501.05131, 2025 - arxiv.org
The growing demand for controllable outputs in text-to-image generation has driven
significant advancements in multi-instance generation (MIG), enabling users to define both …