Aligning cyber space with physical world: A comprehensive survey on embodied AI

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …

Hallucination detection in foundation models for decision-making: A flexible definition and review of the state of the art

N Chakraborty, M Ornik, K Driggs-Campbell - ACM Computing Surveys, 2025 - dl.acm.org
Autonomous systems are soon to be ubiquitous, spanning manufacturing, agriculture,
healthcare, entertainment, and other industries. Most of these systems are developed with …

SceneCraft: An LLM agent for synthesizing 3D scenes as Blender code

Z Hu, A Iscen, A Jain, T Kipf, Y Yue… - … on Machine Learning, 2024 - openreview.net
This paper introduces SceneCraft, a Large Language Model (LLM) Agent converting text
descriptions into Blender-executable Python scripts which render complex scenes with up to …

Disentangled 3D scene generation with layout learning

D Epstein, B Poole, B Mildenhall, AA Efros… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce a method to generate 3D scenes that are disentangled into their component
objects. This disentanglement is unsupervised, relying only on the knowledge of a large …

GALA3D: Towards text-to-3D complex scene generation via layout-guided generative Gaussian splatting

X Zhou, X Ran, Y Xiong, J He, Z Lin, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective
compositional text-to-3D generation. We first utilize large language models (LLMs) to …

AnyHome: Open-vocabulary generation of structured and textured 3D homes

R Fu, Z Wen, Z Liu, S Sridhar - European Conference on Computer Vision, 2024 - Springer
Inspired by cognitive theories, we introduce AnyHome, a framework that translates any text
into well-structured and textured indoor scenes at a house-scale. By prompting Large …

LLMs meet multimodal generation and editing: A survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

BlenderAlchemy: Editing 3D graphics with vision-language models

I Huang, G Yang, L Guibas - European Conference on Computer Vision, 2024 - Springer
Graphics design is important for various applications, including movie production and game
design. To create a high-quality scene, designers usually need to spend hours in software …

When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models

X Ma, Y Bhalgat, B Smart, S Chen, X Li, J Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs)
has seen rapid progress, offering unprecedented capabilities for understanding and …

Understanding user experience in large language model interactions

J Wang, W Ma, P Sun, M Zhang, JY Nie - arXiv preprint arXiv:2401.08329, 2024 - arxiv.org
In the rapidly evolving landscape of large language models (LLMs), most research has
primarily viewed them as independent individuals, focusing on assessing their capabilities …