Heterogeneous federated domain generalization network with common representation learning for cross-load machinery fault diagnosis

Q Qian, J Luo, Y Qin - IEEE Transactions on Systems, Man, and …, 2024 - ieeexplore.ieee.org
Various federated transfer learning (FTL) methods have been proposed to address domain
shift and safeguard data privacy in the field of fault diagnosis. However, the effectiveness of …

Audio-visual segmentation via unlabeled frame exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Improving audio-visual segmentation with bidirectional generation

D Hao, Y Mao, B He, X Han, Y Dai… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects
within videos down to the pixel level. Traditional approaches often tackle this challenge by …

Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey

A Shahabaz, S Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org
The joint analysis of audio and video is a powerful tool that can be applied to various
contexts, including action, speech, and sound recognition, audio-visual video parsing …

Bavs: Bootstrapping audio-visual segmentation by integrating foundation knowledge

C Liu, P Li, H Zhang, L Li, Z Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …

Label-anticipated event disentanglement for audio-visual video parsing

J Zhou, D Guo, Y Mao, Y Zhong, X Chang… - European Conference on …, 2024 - Springer
Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate
events within audio and visual modalities. Multiple events can overlap in the timeline …

Object-aware adaptive-positivity learning for audio-visual question answering

Z Li, D Guo, J Zhou, J Zhang, M Wang - Proceedings of the AAAI …, 2024 - ojs.aaai.org
This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to
answer questions derived from untrimmed audible videos. To generate accurate answers …

Unveiling the power of audio-visual early fusion transformers with dense interactions through masked modeling

S Mo, P Morgado - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Humans possess a remarkable ability to integrate auditory and visual information, enabling a
deeper understanding of the surrounding environment. This early fusion of audio and visual …

Meerkat: Audio-visual large language model for grounding in space and time

S Chowdhury, S Nag, S Dasgupta, J Chen… - … on Computer Vision, 2024 - Springer
Leveraging Large Language Models' remarkable proficiency in text-based tasks,
recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and …

EchoTrack: Auditory referring multi-object tracking for autonomous driving

J Lin, J Chen, K Peng, X He, Z Li… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which
dynamically tracks specific objects in a video sequence based on audio expressions and …