Diffusion models in vision: A survey

FA Croitoru, V Hondru, RT Ionescu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Denoising diffusion models represent a recent emerging topic in computer vision,
demonstrating remarkable results in the area of generative modeling. A diffusion model is a …

AI-generated content (AIGC) for various data modalities: A survey

LG Foo, H Rahmani, J Liu - arXiv preprint arXiv:2308.14177, 2023 - arxiv.org
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and
other media using AI algorithms. Due to its wide range of applications and the demonstrated …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Conditional image-to-video generation with latent flow diffusion models

H Ni, C Shi, K Li, SX Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video
starting from an image (e.g., a person's face) and a condition (e.g., an action class label like …

Collaborative diffusion for multi-modal face generation and editing

Z Huang, KCK Chan, Y Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion models have recently arisen as a powerful generative tool. Despite the great progress,
existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is …

GauHuman: Articulated Gaussian splatting from monocular human videos

S Hu, T Hu, Z Liu - … of the IEEE/CVF conference on …, 2024 - openaccess.thecvf.com
We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1-2
minutes) and real-time rendering (up to 189 FPS) compared with existing NeRF-based …

DatasetDM: Synthesizing data with perception annotations using diffusion models

W Wu, Y Zhao, H Chen, Y Gu, R Zhao… - Advances in …, 2023 - proceedings.neurips.cc
Current deep networks are very data-hungry and benefit from training on large-scale
datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data …

Controlling text-to-image diffusion by orthogonal finetuning

Z Qiu, W Liu, H Feng, Y Xue, Y Feng… - Advances in …, 2023 - proceedings.neurips.cc
Large text-to-image diffusion models have impressive capabilities in generating
photorealistic images from text prompts. How to effectively guide or control these powerful …

AvatarCLIP: Zero-shot text-driven generation and animation of 3D avatars

F Hong, M Zhang, L Pan, Z Cai, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
3D avatar creation plays a crucial role in the digital age. However, the whole production
process is prohibitively time-consuming and labor-intensive. To democratize this technology …

VideoBooth: Diffusion-based video generation with image prompts

Y Jiang, T Wu, S Yang, C Si, D Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …