X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Z Sun, Z Chu, P Zhang, T Wu, X Dong, Y Zang… - arXiv preprint arXiv …, 2024 - arxiv.org
In-context generation is a key component of large language models' (LLMs) open-task
generalization capability. By leveraging a few examples as context, LLMs can perform both …

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

S Kou, J **, C Liu, Y Ma, J Jia, Q Chen, P Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Orthus, an autoregressive (AR) transformer that excels in generating images
given textual prompts, answering questions based on visual inputs, and even crafting …

Continuous speculative decoding for autoregressive image generation

Z Wang, R Zhang, K Ding, Q Yang, F Li… - Advances in Neural …, 2024 - jalms.net
Continuous-valued autoregressive (AR) image generation models show notable advantages over discrete-token models,
demonstrating superior reconstruction quality and high generation fidelity. However, the autoregressive framework's computational …

Autoregressive Models in Vision: A Survey

J Xiong, G Liu, L Huang, C Wu, T Wu, Y Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive modeling has been a huge success in the field of natural language
processing (NLP). Recently, autoregressive models have emerged as a significant area of …

SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer

E Xie, J Chen, Y Zhao, J Yu, L Zhu, Y Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
This paper presents SANA-1.5, a linear Diffusion Transformer for efficient scaling in text-to-
image generation. Building upon SANA-1.0, we introduce three key innovations: (1) Efficient …

A Survey on Vision Autoregressive Model

K Jiang, J Huang - arXiv preprint arXiv:2411.08666, 2024 - arxiv.org
Autoregressive models have demonstrated great performance in natural language
processing (NLP) with impressive scalability, adaptability and generalizability. Inspired by …