Octopus: Embodied vision-language programmer from environmental feedback

J Yang, Y Dong, S Liu, B Li, Z Wang, H Tan… - … on Computer Vision, 2024 - Springer
Large vision-language models (VLMs) have achieved substantial progress in multimodal
perception and reasoning. When integrated into an embodied agent, existing embodied …

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

S Xie, L Kong, Y Dong, C Sima, W Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advancements in Vision-Language Models (VLMs) have sparked interest in their use
for autonomous driving, particularly in generating interpretable driving decisions through …

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

X Wu, Y Ding, B Li, P Lu, D Yin, KW Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
The ability of large vision-language models (LVLMs) to critique and correct their reasoning is
an essential building block towards their self-improvement. However, a systematic analysis …