MIO: A Foundation Model on Multimodal Tokens Z Wang, K Zhu, C Xu, W Zhou, J Liu, Y Zhang, J Wang, N Shi, S Li, Y Li, ... arXiv preprint arXiv:2409.17692, 2024 | 2 | 2024 |
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis B Zeng, L Yang, S Li, J Liu, Z Zhang, J Tian, K Zhu, Y Guo, FY Wang, ... arXiv preprint arXiv:2410.07155, 2024 | 1 | 2024 |
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions H Lin, S Li, G Nan, C Tang, X Wang, J Xu, R Yankai, Z Zhou, Y Gao, Q Cui, ... arXiv preprint arXiv:2405.19226, 2024 | | 2024 |