CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

L Zhang, Z Wang, Q Zhang, Q Qiu, A Pang… - ACM Transactions on …, 2024 - dl.acm.org
In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is
often hampered by the limitations of existing digital tools, which demand extensive expertise …

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

S Chen, X Chen, C Zhang, M Li, G Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However …

MotionChain: Conversational Motion Controllers via Multimodal Prompts

B Jiang, X Chen, C Zhang, F Yin, Z Li, G Yu… - European Conference on …, 2024 - Springer
Recent advancements in language models have demonstrated their adeptness in
conducting multi-turn dialogues and retaining conversational context. However, this …

SemGrasp: Semantic Grasp Generation via Language Aligned Discretization

K Li, J Wang, L Yang, C Lu, B Dai - European Conference on Computer …, 2024 - Springer
Generating natural human grasps necessitates consideration of not just object geometry but
also semantic information. Solely depending on object shape for grasp generation confines …

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

Y Yang, F Yin, W Liu, J Fan, X Chen, G Yu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Recent advancements in implicit neural representations have contributed to high-fidelity
surface reconstruction and photorealistic novel view synthesis. However, with the expansion …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

S Chen, X Chen, A Pang, X Zeng, W Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed,
and storage efficiency, which is widely preferred in various applications. However, given its …

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

X Ma, Y Bhalgat, B Smart, S Chen, X Li, J Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs)
has seen rapid progress, offering unprecedented capabilities for understanding and …

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions

M Li, X Chen, C Zhang, S Chen, H Zhu, F Yin… - … on Computer Vision, 2024 - Springer
Recently, the understanding of the 3D world has garnered increased attention, facilitating
autonomous agents to perform further decision-making. However, the majority of existing 3D …

Text‐to‐Microstructure Generation Using Generative Deep Learning

X Zheng, I Watanabe, J Paik, J Li, X Guo, M Naito - Small, 2024 - Wiley Online Library
Designing novel materials is greatly dependent on understanding the design principles,
physical mechanisms, and modeling methods of material microstructures, requiring …