An overview of diffusion models: Applications, guided generation, statistical rates and optimization

M Chen, S Mei, J Fan, M Wang - arxiv preprint arxiv:2404.07771, 2024 - arxiv.org
Diffusion models, a powerful and universal generative AI technology, have achieved
tremendous success in computer vision, audio, reinforcement learning, and computational …

Scaling rectified flow transformers for high-resolution image synthesis

P Esser, S Kulal, A Blattmann, R Entezari… - … on machine learning, 2024 - openreview.net
Diffusion models create data from noise by inverting the forward paths of data towards noise
and have emerged as a powerful generative modeling technique for high-dimensional …

Opportunities and challenges of diffusion models for generative AI

M Chen, S Mei, J Fan, M Wang - National Science Review, 2024 - academic.oup.com
Diffusion models, a powerful and universal generative artificial intelligence technology, have
achieved tremendous success and opened up new possibilities in diverse applications. In …

Reconfusion: 3d reconstruction with diffusion priors

R Wu, B Mildenhall, P Henzler, K Park… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at
rendering photorealistic novel views of complex scenes. However recovering a high-quality …

Lumiere: A space-time diffusion model for video generation

O Bar-Tal, H Chefer, O Tov, C Herrmann… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce Lumiere–a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion–a pivotal challenge in video synthesis. To this …

Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation

Y Xu, Z Shi, W Yifan, H Chen, C Yang, S Peng… - … on Computer Vision, 2024 - Springer
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from
sparse-view images in around 0.1 s. GRM is a feed-forward transformer-based model that …

4d-fy: Text-to-4d generation using hybrid score distillation sampling

S Bahmani, I Skorokhodov, V Rong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-
video models to generate dynamic 3D scenes. However current text-to-4D methods face a …

Gpt-4v (ision) is a human-aligned evaluator for text-to-3d generation

T Wu, G Yang, Z Li, K Zhang, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Despite recent advances in text-to-3D generative methods there is a notable absence of
reliable evaluation metrics. Existing metrics usually focus on a single criterion each such as …

Dmv3d: Denoising multi-view diffusion using 3d large reconstruction model

Y Xu, H Tan, F Luan, S Bi, P Wang, J Li, Z Shi… - arxiv preprint arxiv …, 2023 - arxiv.org
We propose\textbf {DMV3D}, a novel 3D generation approach that uses a transformer-based
3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model …

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

L Zhang, Z Wang, Q Zhang, Q Qiu, A Pang… - ACM Transactions on …, 2024 - dl.acm.org
In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is
often hampered by the limitations of existing digital tools, which demand extensive expertise …