Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

B Jia, Y Chen, H Yu, Y Wang, X Niu, T Liu, Q Li… - … on Computer Vision, 2024 - Springer
3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …

GAPartNet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts

H Geng, H Xu, C Zhao, C Xu, L Yi… - Proceedings of the …, 2023 - openaccess.thecvf.com
For years, researchers have been devoted to generalizable object perception and
manipulation, where cross-category generalizability is highly desired yet underexplored. In …

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers

L Wang, X Chen, J Zhao, K He - arXiv preprint arXiv:2409.20537, 2024 - arxiv.org
One of the roadblocks for training generalist robotic models today is heterogeneity. Previous
robot learning methods often collect data to train with one specific embodiment for one task …

ManipLLM: Embodied multimodal large language model for object-centric robotic manipulation

X Li, M Zhang, Y Geng, H Geng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Robot manipulation relies on accurately predicting contact points and end-effector directions
to ensure successful operation. However, learning-based robot manipulation trained on a …

Large language models as generalizable policies for embodied tasks

A Szot, M Schwarzer, H Agrawal… - The Twelfth …, 2023 - openreview.net
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …

An embodied generalist agent in 3D world

J Huang, S Yong, X Ma, X Linghu, P Li, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Leveraging massive knowledge and learning schemes from large language models (LLMs),
recent machine learning models show notable successes in building generalist agents that …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

PhyScene: Physically interactable 3D scene synthesis for embodied AI

Y Yang, B Jia, P Zhi, S Huang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
With recent developments in Embodied Artificial Intelligence (EAI) research, there has been
a growing demand for high-quality, large-scale interactive scene generation. While prior …