Google Scholar

Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Cited by 653 | All 18 versions

[PDF] arxiv.org

Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

Cited by 71 | All 8 versions

[PDF] thecvf.com

Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Cited by 134 | All 11 versions

[PDF] arxiv.org

Videoclip: Contrastive pre-training for zero-shot video-text understanding

H Xu, G Ghosh, PY Huang, D Okhonko… - ar…

… from natural language instructions and egocentric …

Cited by 834 | All 11 versions

[PDF] thecvf.com

Actbert: Learning global-local video-text representations

L Zhu, Y Yang - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com

In this paper, we introduce ActBERT for self-supervised learning of joint video-text
representations from unlabeled data. First, we leverage global action information to catalyze …

Cited by 507 | All 11 versions

[PDF] thecvf.com

Howto100m: Learning a text-video embedding by watching hundred million narrated video clips

A Miech, D Zhukov, JB Alayrac… - Proceedings of the …, 2019 - openaccess.thecvf.com

Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …

Cited by 1320 | All 10 versions

Saved to my library

Unsupervised learning from narrated instruction videos

Human action recognition from various data modalities: A review

Temporal action segmentation: An analysis of modern techniques

Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives

Videoclip: Contrastive pre-training for zero-shot video-text understanding

Actbert: Learning global-local video-text representations

Howto100m: Learning a text-video embedding by watching hundred million narrated video clips