Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

Y Wang, P Sun, D Zhou, G Li, H Zhang… - European Conference on …, 2024 - Springer
Traditional reference segmentation tasks have predominantly focused on silent visual
scenes, neglecting the integral role of multimodal perception and interaction in human …

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

Y Wang, P Sun, Y Li, H Zhang, D Hu - European Conference on Computer …, 2024 - Springer
The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in
the visual space using audio cues. However, in this work, it is recognized that previous AVS …

Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?

J Li, W Zhao, Z Huang, Y Guo, Y Tian - arXiv preprint arXiv:2502.00358, 2025 - arxiv.org
Unlike traditional visual segmentation, audio-visual segmentation (AVS) requires the model
not only to identify and segment objects but also to determine whether they are sound …

Unveiling and Mitigating Bias in Audio Visual Segmentation

P Sun, H Zhang, D Hu - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Community researchers have developed a range of advanced audio-visual segmentation
models aimed at improving the quality of sounding objects' masks. While masks created by …