Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

ManiGaussian: Dynamic Gaussian splatting for multi-task robotic manipulation

G Lu, S Zhang, Z Wang, C Liu, J Lu, Y Tang - European Conference on …, 2024 - Springer
Performing language-conditioned robotic manipulation tasks in unstructured environments
is highly demanded for general intelligent robots. Conventional robotic manipulation …

CoPa: General robotic manipulation through spatial constraints of parts with foundation models

H Huang, F Lin, Y Hu, S Wang… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Foundation models pre-trained on web-scale data are shown to encapsulate extensive
world knowledge beneficial for robotic manipulation in the form of task planning. However …

From 3D point‐cloud data to explainable geometric deep learning: State‐of‐the‐art and future challenges

A Saranti, B Pfeifer, C Gollob… - … : Data Mining and …, 2024 - Wiley Online Library
We present an exciting journey from 3D point‐cloud data (PCD) to the state of the art in
graph neural networks (GNNs) and their evolution with explainable artificial intelligence …

SUGAR: Pre-training 3D visual representations for robotics

S Chen, R Garcia, I Laptev… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Learning generalizable visual representations from Internet data has yielded promising
results for robotics. Yet prevailing approaches focus on pre-training 2D representations …

RISE: 3D perception makes real-world robot imitation simple and effective

C Wang, H Fang, HS Fang, C Lu - 2024 IEEE/RSJ International …, 2024 - ieeexplore.ieee.org
Precise robot manipulations require rich spatial information in imitation learning. Image-
based policies model object positions from fixed cameras, which are sensitive to camera …

SAM-E: Leveraging visual foundation model with sequence imitation for embodied manipulation

J Zhang, C Bai, H He, W Xia, Z Wang, B Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of
scene understanding and action prediction. Current methods employ both 3D representation …

CAGE: Causal attention enables data-efficient generalizable robotic manipulation

S Xia, H Fang, C Lu, HS Fang - arXiv preprint arXiv:2410.14974, 2024 - arxiv.org
Generalization in robotic manipulation remains a critical challenge, particularly when scaling
to new environments with limited demonstrations. This paper introduces CAGE, a novel …

Visual grounding for object-level generalization in reinforcement learning

H Jiang, Z Lu - European Conference on Computer Vision, 2024 - Springer
Generalization is a pivotal challenge for agents following natural language instructions. To
approach this goal, we leverage a vision-language model (VLM) for visual grounding and …

Leveraging locality to boost sample efficiency in robotic manipulation

T Zhang, Y Hu, J You, Y Gao - arXiv preprint arXiv:2406.10615, 2024 - arxiv.org
Given the high cost of collecting robotic data in the real world, sample efficiency is a
consistently compelling pursuit in robotics. In this paper, we introduce SGRv2, an imitation …