Visual autoregressive modeling: Scalable image generation via next-scale prediction

K Tian, Y Jiang, Z Yuan, B Peng… - Advances in neural …, 2025 - proceedings.neurips.cc
Abstract We present Visual AutoRegressive modeling (VAR), a new generation paradigm
that redefines the autoregressive learning on images as coarse-to-fine "next-scale …

MiraData: A large-scale video dataset with long durations and structured captions

X Ju, Y Gao, Z Zhang, Z Yuan… - Advances in …, 2025 - proceedings.neurips.cc
Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …

Emu3: Next-token prediction is all you need

X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …

Open-Sora: Democratizing efficient video production for all

Z Zheng, X Peng, T Yang, C Shen, S Li, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision and language are the two foundational senses for humans, and they build up our
cognitive ability and intelligence. While significant breakthroughs have been made in AI …

Improved distribution matching distillation for fast image synthesis

T Yin, M Gharbi, T Park, R Zhang, E Shechtman… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent approaches have shown promise in distilling diffusion models into efficient one-step
generators. Among them, Distribution Matching Distillation (DMD) produces one-step …

GenAI Arena: An open evaluation platform for generative models

D Jiang, M Ku, T Li, Y Ni, S Sun… - Advances in Neural …, 2025 - proceedings.neurips.cc
Generative AI has made remarkable strides in revolutionizing fields such as image and video
generation. These advancements are driven by innovative algorithms, architecture, and …

DreamLIP: Language-image pre-training with long captions

K Zheng, Y Zhang, W Wu, F Lu, S Ma… - … on Computer Vision, 2024 - Springer
Abstract Language-image pre-training largely relies on how precisely and thoroughly a text
describes its paired image. In practice, however, the contents of an image can be so rich that …

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

Representation alignment for generation: Training diffusion transformers is easier than you think

S Yu, S Kwak, H Jang, J Jeong, J Huang, J Shin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …

DiTFastAttn: Attention compression for diffusion transformer models

Z Yuan, H Zhang, L Pu, X Ning… - Advances in …, 2025 - proceedings.neurips.cc
Abstract Diffusion Transformers (DiT) excel at image and video generation but face
computational challenges due to the quadratic complexity of self-attention operators. We …