Critic-V: VLM critics help catch VLM errors in multimodal reasoning

D Zhang, J Lei, J Li, X Wang, Y Liu, Z Yang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) have shown remarkable advancements in multimodal
reasoning tasks. However, they still often generate inaccurate or irrelevant responses due to …