Google Academic

Multimodal variational auto-encoder based audio-visual segmentation

Heterogeneous federated domain generalization network with common representation learning for cross-load machinery fault diagnosis

Q Qian, J Luo, Y Qin - IEEE Transactions on Systems, Man, and …, 2024 - ieeexplore.ieee.org

Various federated transfer learning (FTL) methods have been proposed to address domain
shift and safeguard data privacy in the field of fault diagnosis. However, the effectiveness of …

Cited by 14

[PDF] thecvf.com

Audio-visual segmentation via unlabeled frame exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Cited by 6

[PDF] aaai.org

Improving audio-visual segmentation with bidirectional generation

D Hao, Y Mao, B He, X Han, Y Dai… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects
within videos down to the pixel level. Traditional approaches often tackle this challenge by …

Cited by 28

[PDF] ieee.org

Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey

A Shahabaz, S Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org

The joint analysis of audio and video is a powerful tool that can be applied to various
contexts, including action, speech, and sound recognition, audio-visual video parsing …

Cited by 3

[PDF] arxiv.org

BAVS: Bootstrapping audio-visual segmentation by integrating foundation knowledge

C Liu, P Li, H Zhang, L Li, Z Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …

Cited by 22

[PDF] arxiv.org

Label-anticipated event disentanglement for audio-visual video parsing

J Zhou, D Guo, Y Mao, Y Zhong, X Chang… - European Conference on …, 2024 - Springer

Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate
events within audio and visual modalities. Multiple events can overlap in the timeline …

Cited by 7

[PDF] aaai.org

Object-aware adaptive-positivity learning for audio-visual question answering

Z Li, D Guo, J Zhou, J Zhang, M Wang - Proceedings of the AAAI …, 2024 - ojs.aaai.org

This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to
answer questions derived from untrimmed audible videos. To generate accurate answers …

Cited by 14

[PDF] thecvf.com

Unveiling the power of audio-visual early fusion transformers with dense interactions through masked modeling

S Mo, P Morgado - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Humans possess a remarkable ability to integrate auditory and visual information, enabling a
deeper understanding of the surrounding environment. This early fusion of audio and visual …

Cited by 11

[PDF] arxiv.org

Meerkat: Audio-visual large language model for grounding in space and time

S Chowdhury, S Nag, S Dasgupta, J Chen… - … on Computer Vision, 2024 - Springer

Leveraging Large Language Models' remarkable proficiency in text-based tasks,
recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and …

Cited by 6

[PDF] arxiv.org

EchoTrack: Auditory referring multi-object tracking for autonomous driving

J Lin, J Chen, K Peng, X He, Z Li… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which
dynamically tracks specific objects in a video sequence based on audio expressions and …

Cited by 6
