Academic Search

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 77

[PDF] arxiv.org

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 151

[PDF] thecvf.com

Align your latents: High-resolution video synthesis with latent diffusion models

A Blattmann, R Rombach, H Ling… - Proceedings of the …, 2023 - openaccess.thecvf.com

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …

Cited by 934

[PDF] thecvf.com

Generating diverse and natural 3D human motions from text

C Guo, S Zou, X Zuo, S Wang, W Ji… - Proceedings of the …, 2022 - openaccess.thecvf.com

Automated generation of 3D human motions from text is a challenging problem. The
generated motions are expected to be sufficiently diverse to explore the text-grounded …

Cited by 514

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc

The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Cited by 285

[PDF] openreview.net

Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net

We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …

Cited by 383

[PDF] arxiv.org

Stable video diffusion: Scaling latent video diffusion models to large datasets

A Blattmann, T Dockhorn, S Kulal… - arXiv preprint arXiv …, 2023 - arxiv.org

We present Stable Video Diffusion, a latent video diffusion model for high-resolution,
state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …

Cited by 709

[PDF] arxiv.org

Mastering Atari with discrete world models

D Hafner, T Lillicrap, M Norouzi, J Ba - arXiv preprint arXiv:2010.02193, 2020 - arxiv.org

Intelligent agents need to generalize from past experience to achieve goals in complex
environments. World models facilitate such generalization and allow learning behaviors …

Cited by 931

[PDF] arxiv.org

DriveDreamer: Towards Real-World-Drive World Models for Autonomous Driving

X Wang, Z Zhu, G Huang, X Chen, J Zhu… - European Conference on …, 2024 - Springer

World models, especially in autonomous driving, are trending and drawing extensive
attention due to their capacity for comprehending driving environments. The established …

Cited by 120

[PDF] thecvf.com

VideoBERT: A joint model for video and language representation learning

C Sun, A Myers, C Vondrick… - Proceedings of the …, 2019 - openaccess.thecvf.com

Self-supervised learning has become increasingly important to leverage the abundance of
unlabeled data available on platforms like YouTube. Whereas most existing approaches …

Cited by 1496

Stochastic video generation with a learned prior
