ShowUI: One Vision-Language-Action Model for Generalist GUI Agent

KQ Lin, L Li, D Gao, Z Yang, Z Bai, W Lei… - … 2024 Workshop on …, 2024 - openreview.net
Graphical User Interface (GUI) automation holds significant promise for enhancing human
productivity by assisting with digital tasks. While recent Large Language Models (LLMs) and …

Visual prompting in multimodal large language models: A survey

J Wu, Z Zhang, Y **
T Ma, Z Wang, J Zhou, M Wang, J Liang - arXiv preprint arXiv:2411.12286, 2024 - arxiv.org
Inferring affordable (i.e., graspable) parts of arbitrary objects based on human specifications
is essential for robots advancing toward open-vocabulary manipulation. Current grasp …

Improving Vision-Language-Action Models via Chain-of-Affordance

J Li, Y Zhu, Z Tang, J Wen, M Zhu, X Liu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Robot foundation models, particularly Vision-Language-Action (VLA) models, have
garnered significant attention for their ability to enhance robot policy learning, greatly …

Objects and Actions: Learning Representations for Open-World Robotics

W Yuan - 2024 - search.proquest.com
Advancing robotics involves enabling systems to generalize across diverse and unseen
environments, known as "the open world." Traditional approaches rely on state estimators …

Understanding Depth and Height Perception in Large Visual-Language Models

S Azad, Y Jain, R Garg, YS Rawat, V Vineet - openreview.net
Geometric understanding—including depth and height perception—is fundamental to
intelligence and crucial for navigating our environment. Despite the impressive capabilities …