Google Scholar

Generalized video anomaly event detection: Systematic taxonomy and comparison of deep models

Y Liu, D Yang, Y Wang, J Liu, J Liu… - ACM Computing …, 2024 - dl.acm.org

Video Anomaly Detection (VAD) serves as a pivotal technology in the intelligent surveillance
systems, enabling the temporal or spatial identification of anomalous events within videos …

MVBench: A comprehensive multi-modal video understanding benchmark

K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of
diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities …

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2023 - proceedings.neurips.cc

The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

InternVideo2: Scaling foundation models for multimodal video understanding

Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei… - … on Computer Vision, 2024 - Springer

We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve the
state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

InternVideo: General video foundation models via generative and discriminative learning

Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org

The foundation models have recently shown excellent performance on a variety of
downstream tasks in computer vision. However, most existing vision foundation models …