Towards world simulator: Crafting physical commonsense-based benchmark for video generation

F Meng, J Liao, X Tan, W Shao, Q Lu, K Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
Text-to-video (T2V) models like Sora have made significant strides in visualizing complex
prompts, which is increasingly viewed as a promising path towards constructing the …

Motion prompting: Controlling video generation with motion trajectories

D Geng, C Herrmann, J Hur, F Cole, S Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …

Improving dynamic object interactions in text-to-video generation with AI feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arxiv preprint arxiv …, 2024 - arxiv.org
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …

Artificial intelligence for biomedical video generation

L Li, J Qiu, A Saha, L Li, P Li, M He, Z Guo… - arxiv preprint arxiv …, 2024 - arxiv.org
As a prominent subfield of Artificial Intelligence Generated Content (AIGC), video generation
has achieved notable advancements in recent years. The introduction of Sora-alike models …

Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC

S Meng, Y Luo, P Liu - arxiv preprint arxiv:2502.07007, 2025 - arxiv.org
Recent advancements in AI-generated content have significantly improved the realism of 3D
and 4D generation. However, most existing methods prioritize appearance consistency …

Generative Physical AI in Vision: A Survey

D Liu, J Zhang, AD Dinh, E Park, S Zhang… - arxiv preprint arxiv …, 2025 - arxiv.org
Generative Artificial Intelligence (AI) has rapidly advanced the field of computer vision by
enabling machines to create and interpret visual data with unprecedented sophistication …

PhysMotion: Physics-Grounded Dynamics From a Single Image

X Tan, Y Jiang, X Li, Z Zong, T Xie, Y Yang… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce PhysMotion, a novel framework that leverages principled physics-based
simulations to guide intermediate 3D representations generated from a single image and …

SurgSora: Decoupled RGBD-flow diffusion model for controllable surgical video generation

T Chen, S Yang, J Wang, L Bai, H Ren… - arxiv preprint arxiv …, 2024 - arxiv.org
Medical video generation has transformative potential for enhancing surgical understanding
and pathology insights through precise and controllable visual representations. However …

A Survey of Sustainability in Large Language Models: Applications, Economics, and Challenges

A Singh, NP Patel, A Ehtesham, S Kumar… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) have transformed numerous domains by providing
advanced capabilities in natural language understanding, generation, and reasoning …

LLMPhy: Complex physical reasoning using large language models and world models

A Cherian, R Corcodel, S Jain, D Romeres - arxiv preprint arxiv …, 2024 - arxiv.org
Physical reasoning is an important skill needed for robotic agents when operating in the real
world. However, solving such reasoning problems often involves hypothesizing and …