A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - ar…

… Contrastive Pre-training for Data Efficiency
Y Guo, M Kankanhalli - arXiv preprint arXiv:2411.09126, 2024 - arxiv.org
While contrastive pre-training is widely employed, its data efficiency problem has remained
relatively under-explored thus far. Existing methods often rely on static coreset selection …