Google Scholar

Emu3: Next-token prediction is all you need

X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …

Cited by 66; 2 versions

Show-o: One single transformer to unify multimodal understanding and generation

J Xie, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin… - arxiv preprint arxiv …, 2024 - arxiv.org

We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …

Cited by 71; 3 versions

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arxiv preprint arxiv …, 2024 - arxiv.org

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

Cited by 26; 2 versions

Janus: Decoupling visual encoding for unified multimodal understanding and generation

C Wu, X Chen, Z Wu, Y Ma, X Liu, Z Pan, W Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

In this paper, we introduce Janus, an autoregressive framework that unifies multimodal
understanding and generation. Prior research often relies on a single visual encoder for …

Cited by 22

Loong: Generating minute-level long videos with autoregressive language models

Y Wang, T Xiong, D Zhou, Z Lin, Y Zhao, B Kang… - arxiv preprint arxiv …, 2024 - arxiv.org

It is desirable but challenging to generate content-rich long videos in the scale of minutes.
Autoregressive large language models (LLMs) have achieved great success in generating …

Cited by 20; 2 versions

Open-MAGVIT2: An open-source project toward democratizing auto-regressive visual generation

Z Luo, F Shi, Y Ge, Y Yang, L Wang, Y Shan - arxiv preprint arxiv …, 2024 - arxiv.org

We present Open-MAGVIT2, a family of auto-regressive image generation models ranging
from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of …

Cited by 26; 2 versions

MaskBit: Embedding-free image generation via bit tokens

M Weber, L Yu, Q Yu, X Deng, X Shen… - arxiv preprint arxiv …, 2024 - arxiv.org

Masked transformer models for class-conditional image generation have become a
compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN …

Cited by 16; 3 versions

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arxiv preprint arxiv …, 2024 - arxiv.org

Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

Cited by 2

DART: Denoising autoregressive transformer for scalable text-to-image generation

J Gu, Y Wang, Y Zhang, Q Zhang, D Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

Diffusion models have become the dominant approach for visual generation. They are
trained by denoising a Markovian process which gradually adds noise to the input. We …

Cited by 9

Randomized autoregressive visual generation

Q Yu, J He, X Deng, X Shen, LC Chen - arxiv preprint arxiv:2411.00776, 2024 - arxiv.org

This paper presents Randomized AutoRegressive modeling (RAR) for visual generation,
which sets a new state-of-the-art performance on the image generation task while …

Cited by 9

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Emu3: Next-token prediction is all you need
