RoboPoint: A vision-language model for spatial affordance prediction for robotics

W Yuan, J Duan, V Blukis, W Pumacay… - arXiv preprint arXiv …, 2024 - arxiv.org

GLOVER: Generalizable open-vocabulary affordance reasoning for task-oriented grasping

T Ma, Z Wang, J Zhou, M Wang, J Liang - arXiv preprint arXiv:2411.12286, 2024 - arxiv.org
Inferring affordable (i.e., graspable) parts of arbitrary objects based on human specifications
is essential for robots advancing toward open-vocabulary manipulation. Current grasp …

UniAff: A unified representation of affordances for tool usage and articulation with vision-language models

Q Yu, S Huang, X Yuan, Z Jiang, C Hao, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous studies on robotic manipulation are based on a limited understanding of the
underlying 3D motion constraints and affordances. To address these challenges, we …

ShowUI: One vision-language-action model for GUI visual agent

KQ Lin, L Li, D Gao, Z Yang, S Wu, Z Bai, W Lei… - arXiv preprint arXiv …, 2024 - arxiv.org
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing
human workflow productivity. While most agents are language-based, relying on closed …

Improving Vision-Language-Action Models via Chain-of-Affordance

J Li, Y Zhu, Z Tang, J Wen, M Zhu, X Liu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Robot foundation models, particularly Vision-Language-Action (VLA) models, have
garnered significant attention for their ability to enhance robot policy learning, greatly …

Objects and Actions: Learning Representations for Open-World Robotics

W Yuan - 2024 - search.proquest.com
Advancing robotics involves enabling systems to generalize across diverse and unseen
environments, known as "the open world." Traditional approaches rely on state estimators …

Understanding Depth and Height Perception in Large Visual-Language Models

S Azad, Y Jain, R Garg, YS Rawat, V Vineet - openreview.net
Geometric understanding—including depth and height perception—is fundamental to
intelligence and crucial for navigating our environment. Despite the impressive capabilities …