Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

OmniGen: Unified image generation

S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce OmniGen, a new diffusion model for unified image generation.
Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires …

VD3D: Taming large video diffusion transformers for 3D camera control

S Bahmani, I Skorokhodov, A Siarohin… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of
complex videos from a text description. However, most existing models lack fine-grained …

ChronoMagic-Bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

S Yuan, J Huang, Y Xu, Y Liu, S Zhang, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of T2V models (e.g., Sora and …

MARS: Mixture of auto-regressive models for fine-grained text-to-image synthesis

W He, S Fu, M Liu, X Wang, W Xiao, F Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Auto-regressive models have made significant progress in the realm of language
generation, yet they do not perform on par with diffusion models in the domain of image …

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most popular and sought-after generative models in recent years, diffusion
models have sparked the interest of many researchers and steadily shown excellent …

VEnhancer: Generative space-time enhancement for video generation

J He, T Xue, D Liu, X Lin, P Gao, D Lin, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org
We present VEnhancer, a generative space-time enhancement framework that improves
existing text-to-video results by adding more detail in the spatial domain and synthetic detailed …

MonoFormer: One transformer for both diffusion and autoregression

C Zhao, Y Song, W Wang, H Feng, E Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Most existing multimodality methods use separate backbones for autoregression-based
discrete text generation and diffusion-based continuous visual generation, or the same …

Scaling diffusion transformers to 16 billion parameters

Z Fei, M Fan, C Yu, D Li, J Huang - arXiv preprint arXiv:2407.11633, 2024 - arxiv.org
In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer that is
scalable and competitive with dense networks while exhibiting highly optimized inference …

MarDini: Masked autoregressive diffusion for video generation at scale

H Liu, S Liu, Z Zhou, M Xu, Y Xie, X Han… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MarDini, a new family of video diffusion models that integrate the advantages
of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR …