InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei… - … on Computer Vision, 2024 - Springer
We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve the
state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our …

HNSSL: Hard Negative-Based Self-Supervised Learning

W Zhu, J Liu, Y Huang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Recently, learning from vast unlabeled data, especially self-supervised learning, has been
emerging and attracting widespread attention. Self-supervised learning followed by …

[BOOK] Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXIV

A Leonardis, E Ricci, S Roth, O Russakovsky, T Sattler… - 2024 - books.google.com
The multi-volume set of LNCS books with volume numbers 15059 up to 15147 constitutes
the refereed proceedings of the 18th European Conference on Computer Vision, ECCV …

YouTube SFV+HDR Quality Dataset

Y Wang, JG Yim, N Birkbeck… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
The popularity of short-form videos (SFV) has grown dramatically in the past few years, and
has become a phenomenal video category with billions of viewers. Meanwhile, High …

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

Y Wu, W Zhu, J Cao, Y Lu, B Li, W Chi, Z Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
The demand for producing short-form videos for sharing on social media platforms has
experienced significant growth in recent times. Despite notable advancements in the fields …

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis (Video-LLMs),
particularly in understanding and interpreting long videos. However, existing Video …

LinVT: Empower Your Image-level Large Language Model to Understand Videos

L Gao, Y Zhong, Y Zeng, H Tan, D Li, Z Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have been widely used in various tasks, motivating us to
develop an LLM-based assistant for videos. Instead of training from scratch, we propose a …

An Empirical Comparison of Video Frame Sampling Methods for Multi-modal RAG Retrieval

M Kandhare, T Gisselbrecht - arXiv preprint arXiv:2408.03340, 2024 - arxiv.org
Numerous video frame sampling methodologies detailed in the literature present a
significant challenge in determining the optimal video frame method for Video RAG pattern …

PyCinemetrics: Computational film studies tool based on deep learning and PySide2

C Li, J Lu, Y Pei, Y Shen, Y Hu, Y Fan, Y Tian, X Linghu… - SoftwareX, 2024 - Elsevier
Although computer vision offers ample possibilities, there is currently a lack of general film
measurement software. Here, we propose a software called PyCinemetrics, which is a …

The Short Video Popularity Prediction Using Internet of Things and Deep Learning

Z He, D Li - IEEE Access, 2024 - ieeexplore.ieee.org
In order to furnish valuable insights and solutions applicable to content creators, social
media platforms, academic research, and general users, this investigation integrates the …