LLM-Grounder: Open-vocabulary 3D visual grounding with large language model as an agent
3D visual grounding is a critical skill for household robots, enabling them to navigate,
manipulate objects, and answer questions based on their environment. While existing …
Multi3DRefer: Grounding text description to multiple 3D objects
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …
ReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation
Representing robotic manipulation tasks as constraints that associate the robot and the
environment is a promising way to encode desired robot behaviors. However, it remains …
Visual programming for zero-shot open-vocabulary 3D visual grounding
3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual
descriptions. Conventional supervised methods for 3DVG often necessitate extensive …
What's left? concept grounding with logic-enhanced foundation models
Recent works such as VisProg and ViperGPT have smartly composed foundation models for
visual reasoning—using large language models (LLMs) to produce programs that can be …
Recent advances in multi-modal 3D scene understanding: A comprehensive survey and evaluation
Multi-modal 3D scene understanding has gained considerable attention due to its wide
applications in many areas, such as autonomous driving and human-computer interaction …
Towards data-and knowledge-driven artificial intelligence: A survey on neuro-symbolic computing
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and
statistical paradigms of cognition, has been an active research area of Artificial Intelligence …
Naturally supervised 3D visual grounding with language-regularized concept learners
3D visual grounding is a challenging task that often requires direct and dense
supervision, notably the semantic label for each object in the scene. In this paper we instead …
Motion question answering via modular motion programs
In order to build artificial intelligence systems that can perceive and reason with human
behavior in the real world, we must first design models that conduct complex spatio-temporal …
Towards data-and knowledge-driven AI: a survey on neuro-symbolic computing
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and
statistical paradigms of cognition, has been an active research area of Artificial Intelligence …