Robot learning in the era of foundation models: A survey

X ** under human intent uncertainty using POMDPs
JA Yow, NP Garg, WT Ang - IEEE Transactions on Robotics, 2023 - ieeexplore.ieee.org
In shared autonomy (SA), accurate user intent prediction is crucial for good robot assistance
and avoiding user–robot conflicts. Prior works have relied on passive observation of joystick …

InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500K Dialogues

H Zhang, J Xu, Y Mo, T Kong - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates leading to reduced …

HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models

V Bhat, P Krishnamurthy, R Karri, F Khorrami - ar**
GC Kang, J Kim, J Kim, BT Zhang - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object
via human-robot natural language interaction. Current IOG systems assume that a human …

BoxGrounder: 3D Visual Grounding Using Object Size Estimates

M Piccolrovazzi, MG Adam, M Zakour… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Recent advances in simultaneous localization and mapping (SLAM) systems have
significantly enhanced the process of creating 3D digital replicas of real-world environments …

SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction

J Xu, H Zhang, X Li, H Liu, X Lan, T Kong - arXiv preprint arXiv …, 2024 - arxiv.org
Linguistic ambiguity is ubiquitous in our daily lives. Previous works adopted interaction
between robots and humans for language disambiguation. Nevertheless, when interactive …

Towards unified interactive visual grounding in the wild

J Xu, H Zhang, Q Si, Y Li, X Lan… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical
due to the inevitable ambiguity in natural languages. It requires robots to disambiguate the …