LinVT: Empower Your Image-level Large Language Model to Understand Videos

L Gao, Y Zhong, Y Zeng, H Tan, D Li, Z Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have been widely used in various tasks, motivating us to
develop an LLM-based assistant for videos. Instead of training from scratch, we propose a …