STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

A Politis, K Shimada, P Sudarsanam… - arXiv preprint arXiv …, 2022 - arxiv.org
This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset
for sound event localization and detection, comprised of spatial recordings of real scenes …

STARSS23: An audio-visual dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

K Shimada, A Politis, P Sudarsanam… - Advances in …, 2024 - proceedings.neurips.cc
While direction of arrival (DOA) of sound events is generally estimated from multichannel
audio data recorded in a microphone array, sound events usually derive from visually …

SELD U-Net: Joint Optimization of Sound Event Localization and Detection with Noise Reduction

Y Shin, YG Kim, CH Choi, DJ Kim, C Chun - IEEE Access, 2023 - ieeexplore.ieee.org
Sound event localization and detection (SELD) is a combined task that classifies acoustic
events from audio signals, estimates temporal boundaries, and identifies event locations …

TF-Mamba: A time-frequency network for sound source localization

Y Xiao, RK Das - arXiv preprint arXiv:2409.05034, 2024 - arxiv.org
Sound source localization (SSL) determines the position of sound sources using multi-
channel audio data. It is commonly used to improve speech enhancement and separation …

FN-SSL: Full-band and narrow-band fusion for sound source localization

Y Wang, B Yang, X Li - arXiv preprint arXiv:2305.19610, 2023 - arxiv.org
Extracting direct-path spatial features is critical for sound source localization in adverse
acoustic environments. This paper proposes a full-band and narrow-band fusion network for …

Audio inputs for active speaker detection and localization via microphone array

D Berghi, PJB Jackson - … of Signal Processing to Audio and …, 2023 - ieeexplore.ieee.org
This study considers the problem of detecting and locating an active talker's horizontal
position from multichannel audio captured by a microphone array. We refer to this as active …

Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization

D Berghi, PJB Jackson - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Conventional audio-visual approaches for active speaker detection (ASD) typically rely on
visually pre-extracted face tracks and the corresponding single-channel audio to find the …

Sound event localization and detection with pre-trained audio spectrogram transformer and multichannel separation network

R Scheibler, T Komatsu, Y Fujita, M Hentschel - omni (1ch), 2022 - dcase.community
We propose a sound event localization and detection system based on a CNN-Conformer
base network. Our main contribution is to evaluate the use of pre-trained elements in this …

Text-Queried Target Sound Event Localization

J Zhao, X Qian, Y Xu, H Liu, Y Cao… - 2024 32nd …, 2024 - ieeexplore.ieee.org
Sound event localization and detection (SELD) aims to determine the appearance of sound
classes, together with their Direction of Arrival (DOA). However, current SELD systems can …

Learning multi-target TDOA features for sound event localization and detection

A Berg, J Engman, J Gulin, K Åström… - arXiv preprint arXiv …, 2024 - arxiv.org
Sound event localization and detection (SELD) systems using audio recordings from a
microphone array rely on spatial cues for determining the location of sound events. As a …