- Academic Search

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - ar** and
challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots …

Cited by 3

A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities

Y Liu, X Cao, T Chen, Y Jiang, J You, M Wu… - arxiv preprint arxiv …, 2025 - arxiv.org

Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …

Cited by 1

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

Z Zhang, R Rossi, T Yu, F Dernoncourt… - arxiv preprint arxiv …, 2024 - arxiv.org

While vision-language models (VLMs) have demonstrated remarkable performance across
various tasks combining textual and visual information, they continue to struggle with fine …

Cited by 1

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arxiv preprint arxiv …, 2024 - arxiv.org

Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

Cited by 1

Vision-language-action model and diffusion policy switching enables dexterous control of an anthropomorphic hand

C Pan, K Junge, J Hughes - arxiv preprint arxiv:2410.14022, 2024 - arxiv.org

To advance autonomous dexterous manipulation, we propose a hybrid control method that
combines the relative advantages of a fine-tuned Vision-Language-Action (VLA) model and …

Cited by 2

Investigating the role of instruction variety and task difficulty in robotic manipulation tasks

A Parekh, N Vitsakis, A Suglia, I Konstas - arxiv preprint arxiv:2407.03967, 2024 - arxiv.org

Evaluating the generalisation capabilities of multimodal models based solely on their
performance on out-of-distribution data fails to capture their true robustness. This work …

Cited by 1

A Survey on Vision-Language-Action Models for Embodied AI

Aligning cyber space with physical world: A comprehensive survey on embodied ai
