Muse: Text-to-image generation via masked generative transformers
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image
generation performance while being significantly more efficient than diffusion or …
4M: Massively multimodal masked modeling
Current machine learning models for vision are often highly specialized and limited to a
single modality and task. In contrast, recent large language models exhibit a wide range of …
Learning vision from models rivals learning vision from data
We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images, without any real data. We synthesize a large dataset of image captions …
GIVT: Generative infinite-vocabulary transformers
We introduce Generative Infinite-Vocabulary Transformers (GIVT), which generate
vector sequences with real-valued entries, instead of discrete tokens from a finite …
Is Sora a world simulator? A comprehensive survey on general world models and beyond
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …
Revisiting non-autoregressive transformers for efficient image synthesis
The field of image synthesis is currently flourishing due to the advancements in diffusion
models. While diffusion models have been successful, their computational intensity has …
Representation alignment for generation: Training diffusion transformers is easier than you think
Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …
MoMask: Generative masked modeling of 3D human motions
We introduce MoMask, a novel masked modeling framework for text-driven 3D human
motion generation. In MoMask, a hierarchical quantization scheme is employed to represent …
Masked modeling for self-supervised representation learning on vision and beyond
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …
VPP: Efficient conditional 3D generation via voxel-point progressive representation
Conditional 3D generation is undergoing significant advancement, enabling the free
creation of 3D content from inputs such as text or 2D images. However, previous …