Aligning cyber space with physical world: A comprehensive survey on embodied AI
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …
DriveDreamer4D: World models are effective data machines for 4D driving scene representation
Closed-loop simulation is essential for advancing end-to-end autonomous driving systems.
Contemporary sensor simulation methods, such as NeRF and 3DGS, rely predominantly on …
Understanding World or Predicting Future? A Comprehensive Survey of World Models
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …
Towards world simulator: Crafting physical commonsense-based benchmark for video generation
Text-to-video (T2V) models like Sora have made significant strides in visualizing complex
prompts, which is increasingly viewed as a promising path towards constructing the …
AVID: Adapting video diffusion models to world models
Large-scale generative models have achieved remarkable success in a number of domains.
However, for sequential decision-making problems, such as robotics, action-labelled data is …
ACDC: Autoregressive coherent multimodal generation using diffusion correction
Autoregressive models (ARMs) and diffusion models (DMs) represent two leading
paradigms in generative modeling, each excelling in distinct areas: ARMs in global context …
DrivingGPT: Unifying driving world modeling and planning with multi-modal autoregressive transformers
World model-based searching and planning are widely recognized as a promising path
toward human-level physical intelligence. However, current driving world models primarily …
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Closed-loop simulation is crucial for end-to-end autonomous driving. Existing sensor
simulation methods (e.g., NeRF and 3DGS) reconstruct driving scenes based on conditions …
Autoregressive Models in Vision: A Survey
Autoregressive modeling has been a huge success in the field of natural language
processing (NLP). Recently, autoregressive models have emerged as a significant area of …
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Massive multi-modality datasets play a significant role in facilitating the success of large
video-language models. However, current video-language datasets primarily provide text …