Is Sora a world simulator? A comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Transformer-based generative adversarial networks in computer vision: A comprehensive survey

SR Dubey, SK Singh - IEEE Transactions on Artificial …, 2024 - ieeexplore.ieee.org
Generative adversarial networks (GANs) have been very successful for synthesizing the
images in a given dataset. The artificially generated images by GANs are very realistic. The …

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Visual prompting via image inpainting

A Bar, Y Gandelsman, T Darrell… - Advances in Neural …, 2022 - proceedings.neurips.cc
How does one adapt a pre-trained visual model to novel downstream tasks without task-
specific finetuning or any model modification? Inspired by prompting in NLP, this paper …

Regularized vector quantization for tokenized image synthesis

J Zhang, F Zhan, C Theobalt… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantizing images into discrete representations has been a fundamental problem in unified
generative modeling. Predominant approaches learn the discrete representation either in a …

RePaint: Inpainting using denoising diffusion probabilistic models

A Lugmayr, M Danelljan, A Romero… - Proceedings of the …, 2022 - openaccess.thecvf.com
Free-form inpainting is the task of adding new content to an image in the regions specified
by an arbitrary binary mask. Most existing approaches train for a certain distribution of …

MAT: Mask-aware transformer for large hole image inpainting

W Li, Z Lin, K Zhou, L Qi, Y Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent studies have shown the importance of modeling long-range interactions in the
inpainting problem. To achieve this goal, existing approaches exploit either standalone …

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2023 - proceedings.neurips.cc
Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Deep learning for image inpainting: A survey

H Xiang, Q Zou, MA Nawaz, X Huang, F Zhang, H Yu - Pattern Recognition, 2023 - Elsevier
Image inpainting has been widely exploited in the field of computer vision and image
processing. The main purpose of image inpainting is to produce visually plausible structure …

Modulated contrast for versatile image synthesis

F Zhan, J Zhang, Y Yu, R Wu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Perceiving the similarity between images has been a long-standing and fundamental
problem underlying various visual generation tasks. Predominant approaches measure the …