Google Scholar

Label-anticipated event disentanglement for audio-visual video parsing

J Zhou, D Guo, Y Mao, Y Zhong, X Chang… - European Conference on …, 2024 - Springer

The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate
events within audio and visual modalities. Multiple events can overlap in the timeline …

Cited by 7

Category-adaptive label discovery and noise rejection for multi-label recognition with partial positive labels

T Pu, Q Lao, H Wu, T Chen, L Tian… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

As a cost-effective alternative to standard multi-label learning, the multi-label image
recognition with partial positive labels (MLR-PPL) task attracts increasing attention, in which …

Cited by 1

[PDF] thecvf.com

Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling

KK Rachavarapu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

In this paper, we address the weakly-supervised Audio-Visual Video Parsing (AVVP)
problem, which aims at labeling events in a video as audible, visible, or both, and temporally …

Resisting Noise in Pseudo Labels: Audible Video Event Parsing With Evidential Learning

X Jiang, X Xu, L Zhu, Z Sun, A Cichocki… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Perceiving temporal events and discriminating their modality types in audible videos, which
is also called audio–visual video parsing (AVVP), is becoming a research hotspot in …

Segment-level event perception with semantic dictionary for weakly supervised audio-visual video parsing

Z Xie, Y Yang, Y Yu, J Wang, Y Liu, Y Jiang - Knowledge-Based Systems, 2025 - Elsevier

Videos capture auditory and visual signals, each conveying distinct events. Simultaneously
analyzing these multimodal signals enhances human comprehension of the video content …

[PDF] arxiv.org

Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

G Li, H Du, D Hu - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org

The Audio Visual Question Answering (AVQA) task aims to answer questions related to
various visual objects, sounds, and their interactions in videos. Such naturally multimodal …

[PDF] arxiv.org

SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering

T Yang, Y Nan, L Dai, Z Liang, Y Tian… - arXiv preprint arXiv …, 2024 - arxiv.org

Audio-Visual Question Answering (AVQA) is a challenging task that involves answering
questions based on both auditory and visual information in videos. A significant challenge is …

[PDF] arxiv.org

UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization

T Geng, T Wang, Y Zhang, J Duan, W Guan… - arXiv preprint arXiv …, 2024 - arxiv.org

Video localization tasks aim to temporally locate specific instances in videos, including
temporal action localization (TAL), sound event detection (SED) and audio-visual event …

[PDF] arxiv.org

Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

Y Gao, X Sun, G Lv, D Yu, S Niu - arXiv preprint arXiv:2412.19563, 2024 - arxiv.org

Audio-visual video parsing (AVVP) aims to recognize audio and visual event labels with
precise temporal boundaries, which is quite challenging since audio or visual modality might …

Boosting positive segments for weakly-supervised audio-visual video parsing

Label-anticipated event disentanglement for audio-visual video parsing
