Revisiting feature prediction for learning visual representations from video

A Bardes, Q Garrido, J Ponce, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper explores feature prediction as a stand-alone objective for unsupervised learning
from video and introduces V-JEPA, a collection of vision models trained solely using a …

Cited by 65

VideoPrism: A foundational visual encoder for video understanding

L Zhao, NB Gundavarapu, L Yuan, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video
understanding tasks with a single frozen model. We pretrain VideoPrism on a …

Cited by 31

Foundation models for video understanding: A survey

N Madan, A Møgelmose, R Modi, YS Rawat… - Authorea …, 2024 - techrxiv.org

Video Foundation Models (ViFMs) aim to develop general-purpose representations for
various video understanding tasks by leveraging large-scale datasets and powerful models …

Cited by 19

Multi-perspective traffic video description model with fine-grained refinement approach

TA To, MN Tran, TB Ho, TL Ha… - Proceedings of the …, 2024 - openaccess.thecvf.com

The analysis of traffic patterns is crucial for enhancing safety and optimizing flow within
urban cities. While urban cities possess extensive camera networks for monitoring the raw …

Cited by 3

Video-language understanding: A survey from model architecture, model training, and data perspectives

T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Humans use multiple senses to comprehend the environment. Vision and language are two
of the most vital senses since they allow us to easily communicate our thoughts and …

Cited by 7

V-JEPA: Latent video prediction for visual representation learning

A Bardes, Q Garrido, J Ponce, X Chen, M Rabbat… - 2023 - openreview.net

This paper shows that the masked-modelling principle driving the success of large
foundational language models can be effectively applied to video by making predictions in …

Cited by 10

VideoEval: Comprehensive benchmark suite for low-cost evaluation of video foundation model

X Li, Z Huang, J Wang, K Li, L Wang - arXiv preprint arXiv:2407.06491, 2024 - arxiv.org

With the growth of high-quality data and advancement in visual pre-training paradigms,
Video Foundation Models (VFMs) have made significant progress recently, demonstrating …

Cited by 2

LVS: A Learned Video Storage for Fast and Efficient Video Understanding

Y Lee, J Park - Proceedings of the IEEE/CVF Conference …, 2024 - openaccess.thecvf.com

As video understanding (VU) promises unprecedented capabilities in the era of video data
explosion, its efficient computation plays a critical role in practicalizing the algorithmic …

Video Foundation Models for Animal Behavior Analysis

JJ Sun, H Zhou, L Zhao, L Yuan, B Seybold, D Hendon… - bioRxiv, 2024 - biorxiv.org

Computational approaches leveraging computer vision and machine learning have
transformed the quantification of animal behavior from video. However, existing methods …

Cited by 2

Video Creation by Demonstration

Y Sun, H Zhou, L Yuan, JJ Sun, Y Li, X Jia… - arXiv preprint arXiv …, 2024 - arxiv.org

We explore a novel video creation experience, namely Video Creation by Demonstration.
Given a demonstration video and a context image from a different scene, we generate a …

VideoGLUE: Video general understanding evaluation of foundation models

Revisiting feature prediction for learning visual representations from video‏
