A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use
While vision-language models (VLMs) have demonstrated remarkable performance across
various tasks combining textual and visual information, they continue to struggle with fine …
Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …
Vision-language-action model and diffusion policy switching enables dexterous control of an anthropomorphic hand
To advance autonomous dexterous manipulation, we propose a hybrid control method that
combines the relative advantages of a fine-tuned Vision-Language-Action (VLA) model and …
Investigating the role of instruction variety and task difficulty in robotic manipulation tasks
Evaluating the generalisation capabilities of multimodal models based solely on their
performance on out-of-distribution data fails to capture their true robustness. This work …