Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

A survey on integration of large language models with intelligent robots

Y Kim, D Kim, J Choi, J Park, N Oh, D Park - Intelligent Service Robotics, 2024 - Springer
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …

Lerf: Language embedded radiance fields

J Kerr, CM Kim, K Goldberg… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans describe the physical world using natural language to refer to specific 3D locations
based on a vast range of properties: visual appearance, semantics, abstract associations, or …

Gaussian grouping: Segment and edit anything in 3d scenes

M Ye, M Danelljan, F Yu, L Ke - European Conference on Computer …, 2024 - Springer
The recent Gaussian Splatting achieves high-quality and real-time novel-view
synthesis of 3D scenes. However, it is solely concentrated on the appearance and …

Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning

Q Gu, A Kuwajerwala, S Morin… - … on Robotics and …, 2024 - ieeexplore.ieee.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …

Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians

L Hu, H Zhang, Y Zhang, B Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present GaussianAvatar, an efficient approach to creating realistic human avatars with
dynamic 3D appearances from a single video. We start by introducing animatable 3D …

Segment anything in 3d with nerfs

J Cen, Z Zhou, J Fang, W Shen, L Xie… - Advances in …, 2023 - proceedings.neurips.cc
Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model which is capable of segmenting anything in 2D images. This paper aims to …

Physically grounded vision-language models for robotic manipulation

J Gao, B Sarkar, F Xia, T Xiao, J Wu… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Recent advances in vision-language models (VLMs) have led to improved performance on
tasks such as visual question answering and image captioning. Consequently, these models …

Openshape: Scaling up 3d shape representation towards open-world understanding

M Liu, R Shi, K Kuang, Y Zhu, X Li… - Advances in neural …, 2024 - proceedings.neurips.cc
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …

Shapellm: Universal 3d object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …