LLM-Grounder: Open-vocabulary 3D visual grounding with large language model as an agent

J Yang, X Chen, S Qian, N Madaan… - … on Robotics and …, 2024 - ieeexplore.ieee.org
3D visual grounding is a critical skill for household robots, enabling them to navigate,
manipulate objects, and answer questions based on their environment. While existing …

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

ReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation

W Huang, C Wang, Y Li, R Zhang, L Fei-Fei - arXiv preprint arXiv …, 2024 - arxiv.org
Representing robotic manipulation tasks as constraints that associate the robot and the
environment is a promising way to encode desired robot behaviors. However, it remains …

Visual programming for zero-shot open-vocabulary 3D visual grounding

Z Yuan, J Ren, CM Feng, H Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual
descriptions. Conventional supervised methods for 3DVG often necessitate extensive …

What's left? concept grounding with logic-enhanced foundation models

J Hsu, J Mao, J Tenenbaum… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent works such as VisProg and ViperGPT have smartly composed foundation models for
visual reasoning—using large language models (LLMs) to produce programs that can be …

Recent advances in multi-modal 3D scene understanding: A comprehensive survey and evaluation

Y Lei, Z Wang, F Chen, G Wang, P Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Multi-modal 3D scene understanding has gained considerable attention due to its wide
applications in many areas, such as autonomous driving and human-computer interaction …

Towards data- and knowledge-driven artificial intelligence: A survey on neuro-symbolic computing

W Wang, Y Yang, F Wu - arXiv preprint arXiv:2210.15889, 2022 - arxiv.org
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and
statistical paradigms of cognition, has been an active research area of Artificial Intelligence …

Naturally supervised 3D visual grounding with language-regularized concept learners

C Feng, J Hsu, W Liu, J Wu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
3D visual grounding is a challenging task that often requires direct and dense
supervision, notably the semantic label for each object in the scene. In this paper we instead …

Motion question answering via modular motion programs

M Endo, J Hsu, J Li, J Wu - International Conference on …, 2023 - proceedings.mlr.press
In order to build artificial intelligence systems that can perceive and reason with human
behavior in the real world, we must first design models that conduct complex spatio-temporal …

Towards data- and knowledge-driven AI: a survey on neuro-symbolic computing

W Wang, Y Yang, F Wu - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and
statistical paradigms of cognition, has been an active research area of Artificial Intelligence …