Google Scholar

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 198, all versions (7)

[PDF] arxiv.org

Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org

Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

Cited by 77, all versions (8)

[PDF] thecvf.com

Advancing high-resolution video-language representation with large-scale video transcriptions

H Xue, T Hang, Y Zeng, Y Sun, B Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com

We study joint video and language (VL) pre-training to enable cross-modality learning and
benefit plentiful downstream VL tasks. Existing works either extract low-quality video …

Cited by 195, all versions (6)

[PDF] arxiv.org

TallFormer: Temporal Action Localization with a Long-Memory Transformer

F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer

Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …

Cited by 125, all versions (7)

[PDF] thecvf.com

Temporal action detection with structured segment networks

Y Zhao, Y Xiong, L Wang, Z Wu… - Proceedings of the …, 2017 - openaccess.thecvf.com

Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we
present the structured segment network (SSN), a novel framework which models the …

Cited by 1133, all versions (16)

[PDF] thecvf.com

MAN: Moment alignment network for natural language moment retrieval via iterative graph adjustment

D Zhang, X Dai, X Wang, YF Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com

This research strives for natural language moment retrieval in long, untrimmed video
streams. The problem is not trivial especially when a video contains multiple moments of …

Cited by 360, all versions (10)

[PDF] thecvf.com

Weakly-supervised action localization by generative attention modeling

B Shi, Q Dai, Y Mu, J Wang - Proceedings of the IEEE/CVF …, 2020 - openaccess.thecvf.com

Weakly-supervised temporal action localization is a problem of learning an action
localization model with only video-level action labeling available. The general framework …

Cited by 192, all versions (9)

[PDF] thecvf.com

Exploring denoised cross-video contrast for weakly-supervised temporal action localization

J Li, T Yang, W Ji, J Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Weakly-supervised temporal action localization aims to localize actions in untrimmed videos
with only video-level labels. Most existing methods address this problem with a "localization …

Cited by 73, all versions (4)

[PDF] arxiv.org

An efficient spatio-temporal pyramid transformer for action detection

Y Weng, Z Pan, M Han, X Chang, B Zhuang - European Conference on …, 2022 - Springer

The task of action detection aims at deducing both the action category and localization of the
start and end moment for each action instance in a long, untrimmed video. While vision …

Cited by 37, all versions (9)

[PDF] futuretechsci.org

Top-heavy CapsNets based on spatiotemporal non-local for action recognition

MH Ha - Journal of Computing Theories and Applications, 2024 - dl.futuretechsci.org

To effectively comprehend human actions, we have developed a Deep Neural Network
(DNN) that utilizes inner spatiotemporal non-locality to capture meaningful semantic context …

Cited by 8

S3d: single shot multi-span detector via fully 3d convolutional networks
