Maskbit: Embedding-free image generation via bit tokens

M Weber, L Yu, Q Yu, X Deng, X Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Masked transformer models for class-conditional image generation have become a
compelling alternative to diffusion models. Typically comprising two stages, an initial VQGAN …

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

Continuous speculative decoding for autoregressive image generation

Z Wang, R Zhang, K Ding, Q Yang, F Li… - Advances in Neural …, 2024 - jalms.net
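Continuous-valued autoregressive (AR) image generation models have shown notable superiority over
discrete-token models, demonstrating excellent reconstruction quality and high generation fidelity. However, the autoregressive framework's computational …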

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

D Kim, J He, Q Yu, C Yang, X Shen, S Kwak… - arXiv preprint arXiv …, 2025 - arxiv.org
Image tokenizers form the foundation of modern text-to-image generative models but are
notoriously difficult to train. Furthermore, most existing text-to-image models rely on large …

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Z Pang, T Zhang, F Luan, Y Man, H Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of
generating images in arbitrary token orders. Unlike previous decoder-only AR models that …

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching

S Ren, Q Yu, J He, X Shen, A Yuille… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive (AR) modeling has achieved remarkable success in natural language
processing by enabling models to generate text with coherence and contextual …

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

H Chen, Z Wang, X Li, X Sun, F Chen, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient image tokenization with high compression ratios remains a critical challenge for
training generative models. We present SoftVQ-VAE, a continuous image tokenizer that …

3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes

T Medi, A Rampini, P Reddy, PK Jayaraman… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive (AR) models have achieved remarkable success in natural language and
image generation, but their application to 3D shape modeling remains largely unexplored …

Exact Sampling for Classical and Quantum Many-Body Systems

D Wu - 2024 - infoscience.epfl.ch
Many-body systems at low temperature have revealed non-trivial phases of materials, such
as spin liquids, which have found applications in the evolving fields of superconductivity …