Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

B Jia, Y Chen, H Yu, Y Wang, X Niu, T Liu, Q Li… - … on Computer Vision, 2024 - Springer
3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …

GAPartNet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts

H Geng, H Xu, C Zhao, C Xu, L Yi… - Proceedings of the …, 2023 - openaccess.thecvf.com
For years, researchers have been devoted to generalizable object perception and
manipulation, where cross-category generalizability is highly desired yet underexplored. In …

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers

L Wang, X Chen, J Zhao, K He - arXiv preprint arXiv:2409.20537, 2024 - arxiv.org
One of the roadblocks for training generalist robotic models today is heterogeneity. Previous
robot learning methods often collect data to train with one specific embodiment for one task …

ManipLLM: Embodied multimodal large language model for object-centric robotic manipulation

X Li, M Zhang, Y Geng, H Geng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Robot manipulation relies on accurately predicting contact points and end-effector directions
to ensure successful operation. However, learning-based robot manipulation trained on a …

Large language models as generalizable policies for embodied tasks

A Szot, M Schwarzer, H Agrawal… - The Twelfth …, 2023 - openreview.net
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …

An embodied generalist agent in 3D world

J Huang, S Yong, X Ma, X Linghu, P Li, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Leveraging massive knowledge and learning schemes from large language models (LLMs),
recent machine learning models show notable successes in building generalist agents that …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

PhyScene: Physically interactable 3D scene synthesis for embodied AI

Y Yang, B Jia, P Zhi, S Huang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
With recent developments in Embodied Artificial Intelligence (EAI) research, there has been
a growing demand for high-quality, large-scale interactive scene generation. While prior …