MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

K Zhu, P Xia, Y Li, H Zhu, S Wang, H Yao - arXiv preprint arXiv …, 2024 - arxiv.org
The advancement of Large Vision-Language Models (LVLMs) has propelled their
application in the medical field. However, Medical LVLMs (Med-LVLMs) encounter factuality …

ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

VM Rao, S Zhang, JN Acosta, S Adithan… - … 2025: Proceedings of …, 2024 - World Scientific
Accurately interpreting medical images and writing radiology reports is a critical but
challenging task in healthcare. Both human-written and AI-generated reports can contain …

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

J Chen, T Zhang, C Liu, H Ding, Y Shi, F Cheng… - arXiv preprint arXiv …, 2025 - arxiv.org
Multimodal visual language models are gaining prominence in open-world applications,
driven by advancements in model architectures, training techniques, and high-quality data …

Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation

C Wang, W Zhou, S Ghosh, K Batmanghelich… - arXiv preprint arXiv …, 2024 - arxiv.org
Radiology report generation (RRG) has shown great potential in assisting radiologists by
automating the labor-intensive task of report writing. While recent advancements have …