A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks, including gaming …

Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

Embodiedgpt: Vision-language pre-training via embodied chain of thought

Y Mu, Q Zhang, M Hu, W Wang… - Advances in …, 2023 - proceedings.neurips.cc
Embodied AI is a crucial frontier in robotics, capable of planning and executing action
sequences for robots to accomplish long-horizon tasks in physical environments. In this …

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Ego4d: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Egocentric video-language pretraining

KQ Lin, J Wang, M Soldan, M Wray… - Advances in …, 2022 - proceedings.neurips.cc
Video-Language Pretraining (VLP), which aims to learn transferable representation
to advance a wide range of video-text downstream tasks, has recently received increasing …

Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Future transformer for long-term action anticipation

D Gong, J Lee, M Kim, SJ Ha… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The task of predicting future actions from a video is crucial for a real-world agent interacting
with others. When anticipating actions in the distant future, we humans typically consider …

Learning video representations using contrastive bidirectional transformer

C Sun, F Baradel, K Murphy, C Schmid - arXiv preprint arXiv:1906.05743, 2019 - arxiv.org
This paper proposes a self-supervised learning approach for video features that results in
significantly improved performance on downstream tasks (such as video classification …

Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

H Mittal, N Agarwal, SY Lo… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We introduce PlausiVL, a large video-language model for anticipating action sequences that
are plausible in the real world. While significant efforts have been made towards anticipating …