Transfer learning and its extensive appositeness in human activity recognition: A survey

A Ray, MH Kolekar - Expert Systems with Applications, 2024 - Elsevier
In this competitive world, the supervision and monitoring of human resources are primary
and necessary tasks to drive context-aware applications. Advancement in sensor and …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - European Conference on …, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Fine-tuned CLIP models are efficient video learners

H Rasheed, MU Khattak, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP
model. Since training on a similar scale for videos is infeasible, recent approaches focus on …

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

ActionCLIP: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org
The canonical approach to video action recognition dictates that a neural model perform a classic
and standard 1-of-N majority vote task. Such models are trained to predict a fixed set of predefined …

Vita-CLIP: Video and text adaptive CLIP via multimodal prompting

ST Wasim, M Naseer, S Khan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Adopting contrastive image-text pretrained models like CLIP towards video classification has
gained attention due to its cost-effectiveness and competitive performance. However, recent …

EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

Revisiting classifier: Transferring vision-language models for video recognition

W Wu, Z Sun, W Ouyang - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …

OmniVid: A generative framework for universal video understanding

J Wang, D Chen, C Luo, B He, L Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
The core of video understanding tasks such as recognition, captioning, and tracking is to
automatically detect objects or actions in a video and analyze their temporal evolution …