Video-MME: The first-ever comprehensive evaluation benchmark of multi-modal LLMs in video analysis

C Fu, Y Dai, Y Luo, L Li, S Ren, R Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs)
have emerged as a focal point in recent advancements. However, the predominant focus …

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …