Google Scholar

Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org

Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

TriDet: Temporal action detection with relative boundary modeling

D Shi, Y Zhong, Q Cao, L Ma, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we present a one-stage framework TriDet for temporal action detection.
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer

Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Egocentric video-language pretraining

KQ Lin, J Wang, M Soldan, M Wray… - Advances in …, 2022 - proceedings.neurips.cc

Video-Language Pretraining (VLP), which aims to learn transferable representation
to advance a wide range of video-text downstream tasks, has recently received increasing …

EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

UnLoc: A unified framework for video localization tasks

S Yan, X Xiong, A Nagrani, A Arnab… - Proceedings of the …, 2023 - openaccess.thecvf.com

While large-scale image-text pretrained models such as CLIP have been used for multiple
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …

TallFormer: Temporal Action Localization with a Long-Memory Transformer

F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer

Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …