Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

Show-1: Marrying pixel and latent diffusion models for text-to-video generation

DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu… - International Journal of …, 2024 - Springer
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-
video Diffusion Models (VDMs). However, previous methods either rely solely on pixel …

Lumiere: A space-time diffusion model for video generation

O Bar-Tal, H Chefer, O Tov, C Herrmann… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce Lumiere, a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis. To this …

SparseCtrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2024 - Springer
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

When do we not need larger vision models?

B Shi, Z Wu, M Mao, X Wang, T Darrell - European Conference on …, 2024 - Springer
Scaling up the size of vision models has been the de facto standard to obtain more powerful
visual representations. In this work, we discuss the point beyond which larger vision models …

Emu3: Next-token prediction is all you need

X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …

Show-o: One single transformer to unify multimodal understanding and generation

J **e, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin… - arxiv preprint arxiv …, 2024 - arxiv.org
We present a unified transformer, ie, Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …

PhysDreamer: Physics-based interaction with 3D objects via video generation

T Zhang, HX Yu, R Wu, BY Feng, C Zheng… - … on Computer Vision, 2024 - Springer
Realistic object interactions are crucial for creating immersive virtual experiences, yet
synthesizing realistic 3D object dynamics in response to novel interactions remains a …