The revolution of multimodal large language models: a survey | D Caffagni, F Cocchi, L Barsellotti, N Moratelli, S Sarto, L Baraldi, ... | arXiv preprint arXiv:2402.12451, 2024 | 46 | 2024 |
Wiki-LLaVA: Hierarchical retrieval-augmented generation for multimodal LLMs | D Caffagni, F Cocchi, N Moratelli, S Sarto, M Cornia, L Baraldi, ... | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 29 | 2024 |
SynthCap: Augmenting transformers with synthetic data for image captioning | D Caffagni, M Barraco, M Cornia, L Baraldi, R Cucchiara | International Conference on Image Analysis and Processing, 112-123, 2023 | 7 | 2023 |
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization | N Moratelli, D Caffagni, M Cornia, L Baraldi, R Cucchiara | arXiv preprint arXiv:2408.14547, 2024 | 4 | 2024 |
Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature | D Caffagni, F Cocchi, A Mambelli, F Tutrone, M Zanella, M Cornia, ... | Proceedings of the 21st Conference on Information and Research Science …, 2025 | | 2025 |