The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

GAPartNet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts

H Geng, H Xu, C Zhao, C Xu, L Yi… - Proceedings of the …, 2023 - openaccess.thecvf.com
For years, researchers have been devoted to generalizable object perception and
manipulation, where cross-category generalizability is highly desired yet underexplored. In …

Where2Explore: Few-shot affordance learning for unseen novel categories of articulated objects

C Ning, R Wu, H Lu, K Mo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Articulated object manipulation is a fundamental yet challenging task in robotics. Due to
significant geometric and semantic variations across object categories, previous …

Learning environment-aware affordance for 3D articulated object manipulation under occlusions

R Wu, K Cheng, Y Zhao, C Ning… - Advances in Neural …, 2024 - proceedings.neurips.cc
Perceiving and manipulating 3D articulated objects in diverse environments is essential for
home-assistant robots. Recent studies have shown that point-level affordance provides …

Ditto: Building digital twins of articulated objects from interaction

Z Jiang, CC Hsu, Y Zhu - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Digitizing physical objects into the virtual world has the potential to unlock new research and
applications in embodied AI and mixed reality. This work focuses on recreating interactive …

Continuous scene representations for embodied AI

SY Gadre, K Ehsani, S Song… - Proceedings of the …, 2022 - openaccess.thecvf.com
We propose Continuous Scene Representations (CSR), a scene representation
constructed by an embodied agent navigating within a space, where objects and their …

Unsupervised part discovery from contrastive reconstruction

S Choudhury, I Laina, C Rupprecht… - Advances in Neural …, 2021 - proceedings.neurips.cc
The goal of self-supervised visual representation learning is to learn strong, transferable
image representations, with the majority of research focusing on object or scene level. On …

PARIS: Part-level reconstruction and motion analysis for articulated objects

J Liu, A Mahdavi-Amiri, M Savva - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We address the task of simultaneous part-level reconstruction and motion parameter
estimation for articulated objects. Given two sets of multi-view images of an object in two …

Universal manipulation policy network for articulated objects

Z Xu, Z He, S Song - IEEE Robotics and Automation Letters, 2022 - ieeexplore.ieee.org
We introduce the Universal Manipulation Policy Network (UMPNet), a single image-based
policy network that infers closed-loop action sequences for manipulating articulated objects …

Learning foresightful dense visual affordance for deformable object manipulation

R Wu, C Ning, H Dong - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Understanding and manipulating deformable objects (e.g., ropes and fabrics) is an essential
yet challenging task with broad applications. Difficulties come from complex states and …