Google Scholar

Emu3: Next-token prediction is all you need

X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …

Cited by 80 · All 3 versions

Lumina-mgpt: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arxiv preprint arxiv …, 2024 - arxiv.org

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

Cited by 30 · All 2 versions

Unidream: Unifying diffusion priors for relightable text-to-3d generation

Z Liu, Y Li, Y Lin, X Yu, S Peng, YP Cao, X Qi… - … on Computer Vision, 2024 - Springer

Recent advancements in text-to-3D generation technology have significantly advanced the
conversion of textual descriptions into imaginative well-geometrical and finely textured 3D …

Cited by 24 · All 6 versions

Sana: Efficient high-resolution image synthesis with linear diffusion transformers

E Xie, J Chen, J Chen, H Cai, H Tang, Y Lin… - arxiv preprint arxiv …, 2024 - arxiv.org

We introduce Sana, a text-to-image framework that can efficiently generate images up to
4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images …

Cited by 13 · All 2 versions

Janus-pro: Unified multimodal understanding and generation with data and model scaling

X Chen, Z Wu, X Liu, Z Pan, W Liu, Z Xie, X Yu… - arxiv preprint arxiv …, 2025 - arxiv.org

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus.
Specifically, Janus-Pro incorporates (1) an optimized training strategy,(2) expanded training …

Cited by 8 · All 2 versions

PixWizard: Versatile image-to-image visual assistant with open-language instructions

W Lin, X Wei, R Zhang, L Zhuo, S Zhao… - arxiv preprint arxiv …, 2024 - arxiv.org

This paper presents a versatile image-to-image visual assistant, PixWizard, designed for
image generation, manipulation, and translation based on free-form language instructions …

Cited by 3 · All 3 versions

Janusflow: Harmonizing autoregression and rectified flow for unified multimodal understanding and generation

Y Ma, X Liu, X Chen, W Liu, C Wu, Z Wu, Z Pan… - arxiv preprint arxiv …, 2024 - arxiv.org

We present JanusFlow, a powerful framework that unifies image understanding and
generation in a single model. JanusFlow introduces a minimalist architecture that integrates …

Cited by 4 · All 2 versions

Customize your visual autoregressive recipe with set autoregressive modeling

W Liu, L Zhuo, Y Xin, S Xia, P Gao, X Yue - arxiv preprint arxiv …, 2024 - arxiv.org

We introduce a new paradigm for AutoRegressive (AR) image generation, termed Set
AutoRegressive Modeling (SAR). SAR generalizes the conventional AR to the next-set …

Cited by 5 · All 2 versions

SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers

E Xie, J Chen, J Chen, H Cai, H Tang, Y Lin… - The Thirteenth …, 2025 - openreview.net

We introduce Sana, a text-to-image framework that can efficiently generate images up to
4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images …

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

D Hu, J Chen, X Huang, H Coskun, A Sahni… - arxiv preprint arxiv …, 2024 - arxiv.org

Existing text-to-image (T2I) diffusion models face several limitations, including large model
sizes, slow runtime, and low-quality generation on mobile devices. This paper aims to …

Cited by 1 · All 2 versions
