InstanceDiffusion: Instance-level control for image generation
Text-to-image diffusion models produce high quality images but do not offer control over
individual instances in the image. We introduce InstanceDiffusion that adds precise instance …
Grounded text-to-image synthesis with attention refocusing
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …
Direct-a-video: Customized video generation with user-directed camera movement and object motion
Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …
CoMat: Aligning text-to-image diffusion model with image-to-text concept matching
D Jiang, G Song, X Wu, R Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Diffusion models have demonstrated great success in the field of text-to-image generation.
However, alleviating the misalignment between the text prompts and images is still …
Be yourself: Bounded attention for multi-subject text-to-image generation
Text-to-image diffusion models have an unprecedented ability to generate diverse and high-
quality images. However, they often struggle to faithfully capture the intended semantics of …
ControlMLLM: Training-free visual prompt learning for multimodal large language models
In this work, we propose a training-free method to inject visual prompts into Multimodal
Large Language Models (MLLMs) through learnable latent variable optimization. We …
T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation
Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …
PLACE: Adaptive layout-semantic fusion for semantic image synthesis
Recent advancements in large-scale pre-trained text-to-image models have led to
remarkable progress in semantic image synthesis. Nevertheless, synthesizing high-quality …
Neural assets: 3D-aware multi-object scene synthesis with image diffusion models
Z Wu, Y Rubanova, R Kabra… - Advances in …, 2025 - proceedings.neurips.cc
We address the problem of multi-object 3D pose control in image diffusion models. Instead
of conditioning on a sequence of text tokens, we propose to use a set of per-object …
Controllable generation with text-to-image diffusion models: A survey
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …