Gemini: a family of highly capable multimodal models | G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... | arXiv preprint arXiv:2312.11805, 2023 | 3308 | 2023 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... | arXiv preprint arXiv:2403.05530, 2024 | 1234 | 2024 |
A suite of generative tasks for multi-level multimodal webpage understanding | A Burns, K Srinivasan, J Ainslie, G Brown, BA Plummer, K Saenko, J Ni, ... | arXiv preprint arXiv:2305.03668, 2023 | 12 | 2023 |
Non-intrusive adaptation: Input-centric parameter-efficient fine-tuning for versatile multimodal modeling | Y Wang, J Wu, T Dabral, J Zhang, G Brown, CT Lu, F Liu, Y Liang, B Pang, ... | arXiv preprint arXiv:2310.12100, 2023 | 10 | 2023 |
WikiWeb2M: A page-level multimodal Wikipedia dataset | A Burns, K Srinivasan, J Ainslie, G Brown, BA Plummer, K Saenko, J Ni, ... | arXiv preprint arXiv:2305.05432, 2023 | 3 | 2023 |