Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning

Y Gou, H Yang, Z Liu, K Chen, Y Zeng, L Hong… - arXiv preprint arXiv …, 2025 - arxiv.org
Visual Instruction Tuning (VIT) enhances Multimodal Large Language Models (MLLMs), but it is hindered by corrupted datasets containing hallucinated content, incorrect responses, and …

FCoT-VL: Advancing Text-oriented Large Vision-Language Models with Efficient Visual Token Compression

J Li, J Fan, F Tang, G Huang, S Zhu, S Liu… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid success of Vision Large Language Models (VLLMs) often depends on high-resolution images with abundant visual tokens, which hinders training and deployment …