ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

4D contrastive superflows are dense 3D representation learners

X Xu, L Kong, H Shuai, W Zhang, L Pan, K Chen… - … on Computer Vision, 2024 - Springer
In the realm of autonomous driving, accurate 3D perception is the foundation. However,
developing such models relies on extensive human annotations, a process that is both …

Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling

Y Hao, J Zhang, T Zhuo, F Wen, H Fan - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Hands are the main medium when people interact with the world. Generating proper 3D
motion for hand-object interaction is vital for applications such as virtual reality and robotics …

EqvAfford: SE(3) equivariance for point-level affordance learning

Y Chen, C Tie, R Wu, H Dong - arXiv preprint arXiv:2408.01953, 2024 - arxiv.org
Humans perceive and interact with the world with an awareness of equivariance, which
facilitates manipulating different objects in diverse poses. For robotic manipulation, such …

MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

J Liu, J Han, L Liu, AI Aviles-Rivero, C Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Point cloud videos effectively capture real-world spatial geometries and temporal dynamics,
which are essential for enabling intelligent agents to understand the dynamically changing …