Unified multisensory perception: Weakly-supervised audio-visual video parsing

Y Tian, D Li, C Xu - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
In this paper, we introduce a new problem, named audio-visual video parsing, which aims to
parse a video into temporal event segments and label them as either audible, visible, or …

Sound event detection in domestic environments with weakly labeled data and soundscape synthesis

N Turpault, R Serizel, AP Shah… - Workshop on Detection …, 2019 - inria.hal.science
This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and
Events (DCASE) 2019 challenge and provides a first analysis of the challenge results. The …

A framework for the robust evaluation of sound event detection

Ç Bilen, G Ferroni, F Tuveri, J Azcarreta… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This work defines a new framework for performance evaluation of polyphonic sound event
detection (SED) systems, which overcomes the limitations of the conventional collar-based …

General-purpose tagging of Freesound audio with AudioSet labels: Task description, dataset, and baseline

E Fonseca, M Plakal, F Font, DPW Ellis… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio
tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle …

Sound event detection of weakly labelled data with CNN-Transformer and automatic threshold optimization

Q Kong, Y Xu, W Wang… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Sound event detection (SED) is a task to detect sound events in an audio recording. One
challenge of the SED task is that many datasets such as the Detection and Classification of …

Sound event detection in synthetic domestic environments

R Serizel, N Turpault, A Shah… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We present a comparative analysis of the performance of state-of-the-art sound event
detection systems. In particular, we study the robustness of the systems to noise and signal …

Training sound event detection on a heterogeneous dataset

N Turpault, R Serizel - arXiv preprint arXiv:2007.03931, 2020 - arxiv.org
Training a sound event detection algorithm on a heterogeneous dataset including both
recorded and synthetic soundscapes that can have various labeling granularity is a non …

A transformer-based audio captioning model with keyword estimation

Y Koizumi, R Masumura, K Nishida, M Yasuda… - arXiv preprint arXiv …, 2020 - arxiv.org
One of the problems with automated audio captioning (AAC) is the indeterminacy in word
selection corresponding to the audio event/scene. Since one acoustic event/scene can be …

Sound event detection in the DCASE 2017 challenge

A Mesaros, A Diment, B Elizalde… - … on Audio, Speech …, 2019 - ieeexplore.ieee.org
Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events
(DCASE) contained several tasks involving sound event detection in different setups …

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Q Kong, Y Cao, T Iqbal, Y Xu, W Wang… - arXiv preprint arXiv …, 2019 - arxiv.org
The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge
focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 …