Video understanding with large language models: A survey
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …
Videotree: Adaptive tree-based video representation for llm reasoning on long videos
Long-form video understanding has been a challenging task due to the high redundancy in
video data and the abundance of query-irrelevant information. To tackle this challenge, we …
Vamos: Versatile action models for video understanding
What makes good representations for video understanding, such as anticipating future
activities, or answering video-conditioned questions? While earlier approaches focus on …
Language repository for long video understanding
Language has become a prominent modality in computer vision with the rise of LLMs.
Despite supporting long context-lengths, their effectiveness in handling long-term …
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
Recent advancements in image understanding have benefited from the extensive use of
web image-text pairs. However, video understanding remains a challenge despite the …
Videoqa in the era of llms: An empirical study
Video Large Language Models (Video-LLMs) are flourishing and have advanced
many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) …
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
In the video-language domain, recent works in leveraging zero-shot Large Language Model-
based reasoning for video understanding have become competitive challengers to previous …
Drvideo: Document retrieval based long video understanding
Existing methods for long video understanding primarily focus on videos only lasting tens of
seconds, with limited exploration of techniques for handling longer videos. The increased …
Too many frames, not all useful: Efficient strategies for long-form video qa
Long-form videos that span across wide temporal intervals are highly information redundant
and contain multiple distinct events or entities that are often loosely related. Therefore, when …
Episodic memory verbalization using hierarchical representations of life-long robot experience
Verbalization of robot experience, i.e., summarization of and question answering about a
robot's past, is a crucial ability for improving human-robot interaction. Previous works …