Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Self-supervised multimodal learning: A survey
Multimodal learning, which aims to understand and analyze information from multiple
modalities, has achieved substantial progress in the supervised regime in recent years …
Learning to answer questions in dynamic audio-visual scenarios
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to
answer questions regarding different visual objects, sounds, and their associations in …
A light weight model for active speaker detection
Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …
Annotation-free audio-visual segmentation
The objective of Audio-Visual Segmentation (AVS) is to localise the sounding
objects within visual scenes by accurately predicting pixel-wise segmentation masks. To …
Audio-visual segmentation via unlabeled frame exploitation
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …
Progressive spatio-temporal perception for audio-visual question answering
Audio-Visual Question Answering (AVQA) task aims to answer questions about different
visual objects, sounds, and their associations in videos. Such naturally multi-modal videos …
Egocentric auditory attention localization in conversations
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …
Prompting segmentation with sound is generalizable audio-visual source localizer
Never having seen an object and heard its sound simultaneously, can the model still
accurately localize its visual position from the input audio? In this work, we concentrate on …
Semantic and relation modulation for audio-visual event localization
We study the problem of localizing audio-visual events that are both audible and visible in a
video. Existing works focus on encoding and aligning audio and visual features at the …