Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

A survey on integration of large language models with intelligent robots

Y Kim, D Kim, J Choi, J Park, N Oh, D Park - Intelligent Service Robotics, 2024 - Springer
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …

LERF: Language Embedded Radiance Fields

J Kerr, CM Kim, K Goldberg… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans describe the physical world using natural language to refer to specific 3D locations
based on a vast range of properties: visual appearance, semantics, abstract associations, or …
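As a rough illustration of the open-vocabulary queries such a language field supports (not LERF's actual implementation), the sketch below scores rendered per-pixel language features against an embedded text prompt with cosine similarity; the feature shapes and the random stand-in tensors are assumptions.

```python
import numpy as np

def relevancy_map(rendered_feats: np.ndarray, text_feat: np.ndarray) -> np.ndarray:
    """Cosine-similarity relevancy of each rendered pixel's language feature to a text query.

    rendered_feats: (H, W, D) per-pixel language features rendered from the field
    text_feat:      (D,) embedding of the query (e.g. from a CLIP text encoder)
    """
    feats = rendered_feats / (np.linalg.norm(rendered_feats, axis=-1, keepdims=True) + 1e-8)
    query = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    return feats @ query  # (H, W) map in [-1, 1]

# Toy usage with random stand-ins for real rendered features and a real text embedding.
H, W, D = 120, 160, 512
heatmap = relevancy_map(np.random.randn(H, W, D), np.random.randn(D))
print(heatmap.shape, float(heatmap.max()))
```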

ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning

Q Gu, A Kuwajerwala, S Morin… - … on Robotics and …, 2024 - ieeexplore.ieee.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …
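To make the idea of an open-vocabulary 3D scene graph concrete, here is a minimal sketch of one possible data structure: nodes carry a 3D centroid and an open-vocabulary feature, edges carry named relations, and a text query retrieves the best-matching object. The class names and fields are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ObjectNode:
    name: str             # open-vocabulary label, e.g. "coffee mug"
    centroid: np.ndarray  # (3,) position in the world frame
    embedding: np.ndarray # (D,) open-vocabulary feature (e.g. CLIP-space)

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (i, j, relation) tuples, e.g. "on top of"

    def add_object(self, node: ObjectNode) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1

    def relate(self, i: int, j: int, relation: str) -> None:
        self.edges.append((i, j, relation))

    def query(self, text_embedding: np.ndarray) -> ObjectNode:
        """Return the node whose embedding best matches a language query."""
        sims = [
            float(n.embedding @ text_embedding /
                  (np.linalg.norm(n.embedding) * np.linalg.norm(text_embedding) + 1e-8))
            for n in self.nodes
        ]
        return self.nodes[int(np.argmax(sims))]

# Toy usage with random embeddings standing in for real open-vocabulary features.
D = 512
graph = SceneGraph()
mug = graph.add_object(ObjectNode("mug", np.array([0.4, 0.1, 0.8]), np.random.randn(D)))
table = graph.add_object(ObjectNode("table", np.array([0.5, 0.0, 0.7]), np.random.randn(D)))
graph.relate(mug, table, "on top of")
print(graph.query(np.random.randn(D)).name)
```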

GaussianAvatar: Towards realistic human avatar modeling from a single video via animatable 3D Gaussians

L Hu, H Zhang, Y Zhang, B Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present GaussianAvatar, an efficient approach to creating realistic human avatars with
dynamic 3D appearances from a single video. We start by introducing animatable 3D …

Segment Anything in 3D with NeRFs

J Cen, Z Zhou, J Fang, W Shen, L Xie… - Advances in …, 2023 - proceedings.neurips.cc
Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model capable of segmenting anything in 2D images. This paper aims to …

OpenShape: Scaling up 3D shape representation towards open-world understanding

M Liu, R Shi, K Kuang, Y Zhu, X Li… - Advances in neural …, 2024 - proceedings.neurips.cc
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …
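For readers unfamiliar with the contrastive objective mentioned here, the sketch below shows a generic CLIP-style symmetric InfoNCE loss that aligns one batch of embeddings (e.g. point-cloud features) with their paired text or image embeddings; the tensor shapes and temperature value are assumptions, not OpenShape's exact training recipe.

```python
import numpy as np

def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    a, b: (B, D) embeddings of matched pairs (e.g. point-cloud features vs.
    paired text/image features); row i of `a` is the positive for row i of `b`.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()   # diagonal entries are the positives

    return 0.5 * (xent(logits) + xent(logits.T))

# Toy usage: random point-cloud and text embeddings standing in for real encoders.
B, D = 8, 512
loss = info_nce(np.random.randn(B, D), np.random.randn(B, D))
print(float(loss))
```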

GNFactor: Multi-task real robot learning with generalizable neural feature fields

Y Ze, G Yan, YH Wu, A Macaluso… - … on Robot Learning, 2023 - proceedings.mlr.press
It is a long-standing problem in robotics to develop agents capable of executing diverse
manipulation tasks from visual observations in unstructured real-world environments. To …

LangSplat: 3D language Gaussian splatting

M Qin, W Li, J Zhou, H Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Humans live in a 3D world and commonly use natural language to interact with a 3D scene.
Modeling a 3D language field to support open-ended language queries in 3D has gained …
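As a rough illustration of querying an explicit language-augmented Gaussian representation (not LangSplat's actual pipeline), the sketch below stores a language feature per 3D Gaussian and selects the Gaussians whose features are similar to an embedded text prompt; the array layout and threshold are assumptions.

```python
import numpy as np

def select_gaussians(means: np.ndarray, lang_feats: np.ndarray,
                     text_feat: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return centers of 3D Gaussians whose language feature matches a text query.

    means:      (N, 3) Gaussian centers
    lang_feats: (N, D) per-Gaussian language features (e.g. distilled CLIP-space)
    text_feat:  (D,) embedding of the query text
    """
    f = lang_feats / (np.linalg.norm(lang_feats, axis=1, keepdims=True) + 1e-8)
    q = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    return means[f @ q > threshold]

# Toy usage with random stand-ins for real per-Gaussian features and a text embedding.
N, D = 1000, 32
picked = select_gaussians(np.random.randn(N, 3), np.random.randn(N, D), np.random.randn(D))
print(picked.shape)
```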

Toward general-purpose robots via foundation models: A survey and meta-analysis

Y Hu, Q Xie, V Jain, J Francis, J Patrikar… - arXiv preprint arXiv …, 2023 - arxiv.org
Building general-purpose robots that operate seamlessly in any environment, with any
object, and utilize various skills to complete diverse tasks has been a long-standing goal in …