Google Scholar

Vision transformers are parameter-efficient audio-visual learners

YB Lin, YL Sung, J Lei, M Bansal… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision transformers (ViTs) have achieved impressive results on various computer vision tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained …

Cited by 78 · All 5 versions · HTML version

[PDF] thecvf.com

A joint cross-attention model for audio-visual fusion in dimensional emotion recognition

RG Praveen, WC de Melo, N Ullah… - Proceedings of the …, 2022 - openaccess.thecvf.com

Multi-modal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities, such as audio, visual …

Cited by 73 · All 9 versions · HTML version

[PDF] thecvf.com

Annotation-free audio-visual segmentation

J Liu, Y Wang, C Ju, C Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks. To …

Cited by 35 · All 6 versions · HTML version

[PDF] thecvf.com

Collecting cross-modal presence-absence evidence for weakly-supervised audio-visual event perception

J Gao, M Chen, C Xu - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com

With only video-level event labels, this paper targets at the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events …

Cited by 31 · All 6 versions · HTML version

[PDF] ieee.org

Temporal action localization in the deep learning era: A survey

B Wang, Y Zhao, L Yang, T Long… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

The temporal action localization research aims to discover action instances from untrimmed videos, representing a fundamental step in the field of intelligent video understanding. With …

Cited by 28 · All 7 versions

[PDF] thecvf.com

Boosting weakly-supervised temporal action localization with text information

G Li, D Cheng, X Ding, N Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Due to the lack of temporal annotation, current Weakly-supervised Temporal Action Localization (WTAL) methods are generally stuck into over-complete or incomplete …

Cited by 29 · All 6 versions · HTML version

[PDF] thecvf.com

Learning action completeness from points for weakly-supervised temporal action localization

P Lee, H Byun - Proceedings of the IEEE/CVF international …, 2021 - openaccess.thecvf.com

We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training. Owing to label sparsity, existing work fails to learn …

Cited by 93 · All 9 versions · HTML version

[PDF] neurips.cc

Exploring cross-video and cross-modality signals for weakly-supervised audio-visual video parsing

YB Lin, HY Tseng, HY Lee, YY Lin… - Advances in Neural …, 2021 - proceedings.neurips.cc

The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories. However, it is labor intensive to temporally annotate audio and visual …

Cited by 77 · All 10 versions · HTML version

[PDF] thecvf.com

Audio-visual segmentation via unlabeled frame exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames. Although great progress has been witnessed we experimentally reveal that current methods …

Cited by 6 · All 7 versions · HTML version

[PDF] thecvf.com

Audio-adaptive activity recognition across video domains

Y Zhang, H Doughty, L Shao… - Proceedings of the …, 2022 - openaccess.thecvf.com

This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint. The leading approaches reduce the shift in activity …

Cited by 50 · All 8 versions · HTML version

Cross-attentional audio-visual fusion for weakly-supervised action localization