| Publication | Cited by | Year |
|---|---|---|
| Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. C Fu, Y Dai, Y Luo, L Li, S Ren, R Zhang, Z Wang, C Zhou, Y Shen, ... arXiv preprint arXiv:2405.21075, 2024. | 145 | 2024 |
| Simple and Scalable Nearest Neighbor Machine Translation. Y Dai, Z Zhang, Q Liu, Q Cui, W Li, Y Du, T Xu. arXiv preprint arXiv:2302.12188, 2023. | 19 | 2023 |
| T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs. S Yin, C Fu, S Zhao, Y Shen, C Ge, Y Yang, Z Long, Y Dai, T Xu, X Sun, ... arXiv preprint arXiv:2411.19951, 2024. | 1 | 2024 |
| Datastore Distillation for Nearest Neighbor Machine Translation. Y Dai, Z Zhang, Y Du, S Liu, L Liu, T Xu. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023. | | 2023 |