GPT-4V(ision) for robotics: Multimodal task planning from human demonstration

N Wake, A Kanehira, K Sasabuchi… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
We introduce a pipeline that enhances a general-purpose Vision Language Model,
GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. This system analyzes …

ChatGPT empowered long-step robot control in various environments: A case application

N Wake, A Kanehira, K Sasabuchi, J Takamatsu… - IEEE …, 2023 - ieeexplore.ieee.org
This paper introduces a novel method for translating natural-language instructions into
executable robot actions using OpenAI's ChatGPT in a few-shot setting. We propose …

A multi-modal framework for robots to learn manipulation tasks from human demonstrations

C Yin, Q Zhang - Journal of Intelligent & Robotic Systems, 2023 - Springer
Enabling robots to learn manipulation tasks by observing human demonstrations remains a
major challenge. Recent advances in video captioning tasks provide an end-to-end method …

Semantic constraints to represent common sense required in household actions for multimodal learning-from-observation robot

K Ikeuchi, N Wake, K Sasabuchi… - … Journal of Robotics …, 2024 - journals.sagepub.com
The learning-from-observation (LfO) paradigm allows a robot to learn how to perform actions
by observing human actions. Previous research in top-down learning-from-observation has …

Multi-modal LLM-enabled Long-horizon Skill Learning for Robotic Manipulation

R Tan, S Lou, Y Zhou, C Lv - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
The advent of Large Language Models (LLMs) has empowered robots to execute tasks
based on human instructions. Nonetheless, the challenge still persists in endowing robots …

Interactive task encoding system for learning-from-observation

N Wake, A Kanehira, K Sasabuchi… - 2023 IEEE/ASME …, 2023 - ieeexplore.ieee.org
We present the Interactive Task Encoding System (ITES) for teaching robots to perform
manipulative tasks. ITES is designed as an input system for the Learning-from-Observation …

A Prompt-driven Task Planning Method for Multi-drones based on Large Language Model

Y Liu - arXiv preprint arXiv:2406.00006, 2024 - arxiv.org
With the rapid development of drone technology, the application of multi-drones is becoming
increasingly widespread in various fields. However, the task planning technology for multi …

PrISM-Tracker: A framework for multimodal procedure tracking using wearable sensors and state transition information with user-driven handling of errors and …

R Arakawa, H Yakura, V Mollyn, S Nie… - Proceedings of the …, 2023 - dl.acm.org
A user often needs training and guidance while performing several daily life procedures, e.g.,
cooking, setting up a new appliance, or doing a COVID test. Watch-based human activity …

A Human-Robot Interaction Dual-Arm Robot System For Power Distribution Network

H He, Y Li, J Chen, Y Guo, X Bi… - 2023 China Automation …, 2023 - ieeexplore.ieee.org
The current live-line maintenance robot in the power distribution network faces a significant
challenge. The commonly used operational methods have their limitations. Teleoperation …