Google Scholar

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Cited by 63; 3 versions

VideoTree: Adaptive tree-based video representation for LLM reasoning on long videos

Z Wang, S Yu, E Stengel-Eskin, J Yoon… - arXiv preprint arXiv …, 2024 - arxiv.org

Long-form video understanding has been a challenging task due to the high redundancy in
video data and the abundance of query-irrelevant information. To tackle this challenge, we …

Cited by 27; 4 versions

Vamos: Versatile action models for video understanding

S Wang, Q Zhao, MQ Do, N Agarwal, K Lee… - European Conference on …, 2024 - Springer

What makes good representations for video understanding, such as anticipating future
activities, or answering video-conditioned questions? While earlier approaches focus on …

Cited by 17; 5 versions

Language repository for long video understanding

K Kahatapitiya, K Ranasinghe, J Park… - arXiv preprint arXiv …, 2024 - arxiv.org

Language has become a prominent modality in computer vision with the rise of LLMs.
Despite supporting long context-lengths, their effectiveness in handling long-term …

Cited by 19; 5 versions

TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment

W Li, H Fan, Y Wong… - Advances in Neural …, 2025 - proceedings.neurips.cc

Recent advancements in image understanding have benefited from the extensive use of
web image-text pairs. However, video understanding remains a challenge despite the …

Cited by 3; 4 versions

VideoQA in the era of LLMs: An empirical study

J Xiao, N Huang, H Qin, D Li, Y Li, F Zhu, Z Tao… - International Journal of …, 2025 - Springer

Video Large Language Models (Video-LLMs) are flourishing and have advanced
many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) …

Cited by 7; 3 versions

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

R Liao, M Erler, H Wang, G Zhai, G Zhang, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

In the video-language domain, recent works in leveraging zero-shot Large Language Model-
based reasoning for video understanding have become competitive challengers to previous …

Cited by 5; 4 versions

DrVideo: Document retrieval based long video understanding

Z Ma, C Gou, H Shi, B Sun, S Li, H Rezatofighi… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing methods for long video understanding primarily focus on videos only lasting tens of
seconds, with limited exploration of techniques for handling longer videos. The increased …

Cited by 6; 2 versions

Too many frames, not all useful: Efficient strategies for long-form video QA

J Park, K Ranasinghe, K Kahatapitiya, W Ryoo… - arXiv preprint arXiv …, 2024 - arxiv.org

Long-form videos that span across wide temporal intervals are highly information redundant
and contain multiple distinct events or entities that are often loosely related. Therefore, when …

Cited by 7; 4 versions

Episodic memory verbalization using hierarchical representations of life-long robot experience

L Bärmann, C DeChant, J Plewnia… - arXiv preprint arXiv …, 2024 - arxiv.org

Verbalization of robot experience, i.e., summarization of and question answering about a
robot's past, is a crucial ability for improving human-robot interaction. Previous works …

Cited by 4; 4 versions

MoReVQA: Exploring modular reasoning models for video question answering

Video understanding with large language models: A survey
