Sound event detection: A tutorial

A Mesaros, T Heittola, T Virtanen… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Imagine standing on a street corner in the city. With your eyes closed you can hear and
recognize a succession of sounds: cars passing by, people speaking, their footsteps when …

FSD50K: an open dataset of human-labeled sound events

E Fonseca, X Favory, J Pons, F Font… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-
specific, with the exception of AudioSet, based on over 2 M tracks from YouTube videos and …

Learning sound event classifiers from web audio with noisy labels

E Fonseca, M Plakal, DPW Ellis, F Font… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
As sound event classification moves towards larger datasets, issues of label noise become
inevitable. Web sites can supply large volumes of user-contributed audio and metadata, but …

Acoustic scene classification using deep residual networks with late fusion of separated high and low frequency paths

MD McDonnell, W Gao - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
We investigate the problem of acoustic scene classification, using a deep residual network
applied to log-mel spectrograms complemented by log-mel deltas and delta-deltas. We …

The receptive field as a regularizer in deep convolutional neural networks for acoustic scene classification

K Koutini, H Eghbal-Zadeh, M Dorfer… - 2019 27th European …, 2019 - ieeexplore.ieee.org
Convolutional Neural Networks (CNNs) have had great success in many machine vision as
well as machine audition tasks. Many image recognition network architectures have …

Detecting spoofing attacks using VGG and SincNet: BUT-Omilia submission to ASVspoof 2019 challenge

H Zeinali, T Stafylakis, G Athanasopoulou… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we present the system description of the joint efforts of Brno University of
Technology (BUT) and Omilia--Conversational Intelligence for the ASVSpoof2019 Spoofing …

Receptive-field-regularized CNN variants for acoustic scene classification

K Koutini, H Eghbal-Zadeh, G Widmer - arXiv preprint arXiv:1909.02859, 2019 - arxiv.org
Acoustic scene classification and related tasks have been dominated by Convolutional
Neural Networks (CNNs). Top-performing CNNs use mainly audio spectrograms as input and …

Musical tempo and key estimation using convolutional neural networks with directional filters

H Schreiber, M Müller - arXiv preprint arXiv:1903.10839, 2019 - arxiv.org
In this article we explore how the different semantics of spectrograms' time and frequency
axes can be exploited for musical tempo and key estimation using Convolutional Neural …

Semi-supervised triplet loss based learning of ambient audio embeddings

N Turpault, R Serizel, E Vincent - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Deep neural networks are particularly useful to learn relevant representations from data.
Recent studies have demonstrated the potential of unsupervised representation learning for …

Emotion and theme recognition in music with frequency-aware RF-regularized CNNs

K Koutini, S Chowdhury, V Haunschmid… - arXiv preprint arXiv …, 2019 - arxiv.org
We present CP-JKU submission to MediaEval 2019; a Receptive Field-(RF)-regularized and
Frequency-Aware CNN approach for tagging music with emotion/mood labels. We perform …