- Academic Search

GIVT: Generative infinite-vocabulary transformers

M Tschannen, C Eastwood, F Mentzer - European Conference on …, 2024 - Springer

We introduce Generative Infinite-Vocabulary Transformers (GIVT) which generate
vector sequences with real-valued entries, instead of discrete tokens from a finite …

Cited by 32

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

Cited by 26

MaskBit: Embedding-free image generation via bit tokens

M Weber, L Yu, Q Yu, X Deng, X Shen… - arXiv preprint arXiv …, 2024 - arxiv.org

Masked transformer models for class-conditional image generation have become a
compelling alternative to diffusion models. Typically comprising two stages, an initial VQGAN …

Cited by 16

EMOVA: Empowering language models to see, hear and speak with vivid emotions

K Chen, Y Gou, R Huang, Z Liu, D Tan, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and
tones, marks a milestone for omni-modal foundation models. However, empowering Large …

Cited by 12

Visual autoregressive modeling: Scalable image generation via next-scale prediction

K Tian, Y Jiang, Z Yuan, B Peng, L Wang - arXiv preprint arXiv:2404.02905, 2024 - arxiv.org

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that
redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "…

Cited by 139

MaskGCT: Zero-shot text-to-speech with masked generative codec transformer

Y Wang, H Zhan, L Liu, R Zeng, H Guo, J Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent large-scale text-to-speech (TTS) systems are usually grouped as autoregressive
and non-autoregressive systems. The autoregressive systems implicitly model duration but …

Cited by 12

AdaNAT: Exploring adaptive policy for token-based image generation

Z Ni, Y Wang, R Zhou, R Lu, J Guo, J Hu, Z Liu… - … on Computer Vision, 2024 - Springer

Recent studies have demonstrated the effectiveness of token-based methods for visual
content generation. As a representative work, non-autoregressive Transformers (NATs) are …

Cited by 3

WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 6

QueST: Self-supervised skill abstractions for learning continuous control

A Mete, H Xue, A Wilcox, Y Chen, A Garg - arXiv preprint arXiv …, 2024 - arxiv.org

Generalization capabilities, or rather a lack thereof, is one of the most important unsolved
problems in the field of robot learning, and while several large scale efforts have set out to …

Cited by 5

Vector Quantization for Recommender Systems: A Review and Outlook

Q Liu, X Dong, J Xiao, N Chen, H Hu, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

Vector quantization, renowned for its unparalleled feature compression capabilities, has
been a prominent topic in signal processing and machine learning research for several …

Cited by 9

