Aligning cyber space with physical world: A comprehensive survey on embodied AI

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …

DriveDreamer4D: World models are effective data machines for 4D driving scene representation

G Zhao, C Ni, X Wang, Z Zhu, X Zhang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Closed-loop simulation is essential for advancing end-to-end autonomous driving systems.
Contemporary sensor simulation methods, such as NeRF and 3DGS, rely predominantly on …

Understanding World or Predicting Future? A Comprehensive Survey of World Models

J Ding, Y Zhang, Y Shang, Y Zhang, Z Zong… - arXiv preprint arXiv …, 2024 - arxiv.org
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …

Towards world simulator: Crafting physical commonsense-based benchmark for video generation

F Meng, J Liao, X Tan, W Shao, Q Lu, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video (T2V) models like Sora have made significant strides in visualizing complex
prompts, which is increasingly viewed as a promising path towards constructing the …

AVID: Adapting video diffusion models to world models

M Rigter, T Gupta, A Hilmkil, C Ma - arXiv preprint arXiv:2410.12822, 2024 - arxiv.org
Large-scale generative models have achieved remarkable success in a number of domains.
However, for sequential decision-making problems, such as robotics, action-labelled data is …

ACDC: Autoregressive coherent multimodal generation using diffusion correction

H Chung, D Lee, JC Ye - arXiv preprint arXiv:2410.04721, 2024 - arxiv.org
Autoregressive models (ARMs) and diffusion models (DMs) represent two leading
paradigms in generative modeling, each excelling in distinct areas: ARMs in global context …

DrivingGPT: Unifying driving world modeling and planning with multi-modal autoregressive transformers

Y Chen, Y Wang, Z Zhang - arXiv preprint arXiv:2412.18607, 2024 - arxiv.org
World model-based searching and planning are widely recognized as a promising path
toward human-level physical intelligence. However, current driving world models primarily …

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

C Ni, G Zhao, X Wang, Z Zhu, W Qin, G Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Closed-loop simulation is crucial for end-to-end autonomous driving. Existing sensor
simulation methods (e.g., NeRF and 3DGS) reconstruct driving scenes based on conditions …

Autoregressive Models in Vision: A Survey

J Xiong, G Liu, L Huang, C Wu, T Wu, Y Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive modeling has been a huge success in the field of natural language
processing (NLP). Recently, autoregressive models have emerged as a significant area of …

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

X Chi, Y Wang, A Cheng, P Fang, Z Tian, Y He… - arXiv preprint arXiv …, 2024 - arxiv.org
Massive multi-modality datasets play a significant role in facilitating the success of large
video-language models. However, current video-language datasets primarily provide text …