The dawn of LMMs: Preliminary explorations with GPT-4V(ision)
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …
GAPartNet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts
For years, researchers have been devoted to generalizable object perception and
manipulation, where cross-category generalizability is highly desired yet underexplored. In …
Where2Explore: Few-shot affordance learning for unseen novel categories of articulated objects
Articulated object manipulation is a fundamental yet challenging task in robotics. Due to
significant geometric and semantic variations across object categories, previous …
Learning environment-aware affordance for 3D articulated object manipulation under occlusions
Perceiving and manipulating 3D articulated objects in diverse environments is essential for
home-assistant robots. Recent studies have shown that point-level affordance provides …
Ditto: Building digital twins of articulated objects from interaction
Digitizing physical objects into the virtual world has the potential to unlock new research and
applications in embodied AI and mixed reality. This work focuses on recreating interactive …
Continuous scene representations for embodied AI
We propose Continuous Scene Representations (CSR), a scene representation
constructed by an embodied agent navigating within a space, where objects and their …
Unsupervised part discovery from contrastive reconstruction
The goal of self-supervised visual representation learning is to learn strong, transferable
image representations, with the majority of research focusing on object or scene level. On …
PARIS: Part-level reconstruction and motion analysis for articulated objects
We address the task of simultaneous part-level reconstruction and motion parameter
estimation for articulated objects. Given two sets of multi-view images of an object in two …
Universal manipulation policy network for articulated objects
We introduce the Universal Manipulation Policy Network (UMPNet), a single image-based
policy network that infers closed-loop action sequences for manipulating articulated objects …
Learning foresightful dense visual affordance for deformable object manipulation
Understanding and manipulating deformable objects (e.g., ropes and fabrics) is an essential
yet challenging task with broad applications. Difficulties come from complex states and …