Face mask recognition from audio: The MASC database and an overview on the mask challenge

MM Mohamed, MA Nessiem, A Batliner, C Bergler… - Pattern Recognition, 2022 - Elsevier
The sudden outbreak of COVID-19 has resulted in tough challenges for the field of
biometrics due to its spread via physical contact, and the regulations of wearing face masks …

Dilated convolution neural network with LeakyReLU for environmental sound classification

X Zhang, Y Zou, W Shi - 2017 22nd international conference …, 2017 - ieeexplore.ieee.org
Environmental sound classification task (ESC) is still open and challenging. In contrast to
speech, sounds of a specific acoustic event may be produced by a wide variety of sources …

The INTERSPEECH 2021 computational paralinguistics challenge: COVID-19 cough, COVID-19 speech, escalation & primates

BW Schuller, A Batliner, C Bergler, C Mascolo… - arXiv preprint arXiv …, 2021 - arxiv.org
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different
problems for the first time in a research competition under well-defined conditions: In the …

Time-frequency visual representation and texture features for audio applications: a comprehensive review, recent trends, and challenges

YD Mistry, GK Birajdar, AM Khodke - Multimedia Tools and Applications, 2023 - Springer
The conventional audio feature extraction methods employed in the audio analysis are
categorized into time-domain and frequency-domain. Recently, a new audio feature …

Deep convolutional neural networks and data augmentation for acoustic event detection

N Takahashi, M Gygli, B Pfister, L Van Gool - arXiv preprint arXiv …, 2016 - arxiv.org
We propose a novel method for Acoustic Event Detection (AED). In contrast to speech,
sounds coming from acoustic events may be produced by a wide variety of sources …

openXBOW--Introducing the Passau open-source crossmodal Bag-of-Words toolkit

M Schmitt, B Schuller - Journal of Machine Learning Research, 2017 - jmlr.org
We introduce openXBOW, an open-source toolkit for the generation of bag-of-words (BoW)
representations from multimodal input. In the BoW principle, word histograms were first used …

AENet: Learning deep audio features for video analysis

N Takahashi, M Gygli… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
We propose a new deep network for audio event recognition, called AENet. In contrast to
speech, sounds coming from audio events may be produced by a wide variety of sources …

The INTERSPEECH 2017 computational paralinguistics challenge: Addressee, cold & snoring

B Schuller, S Steidl, A Batliner, E Bergelson… - Computational …, 2017 - pure.mpg.de
The INTERSPEECH 2017 Computational Paralinguistics Challenge
addresses three different problems for the first time in research competition under well …

At the border of acoustics and linguistics: Bag-of-audio-words for the recognition of emotions in speech

M Schmitt, F Ringeval, B Schuller - 2016 - opus.bibliothek.uni-augsburg.de
Recognition of natural emotion in speech is a challenging task. Different methods have been
proposed to tackle this complex task, such as acoustic feature brute-forcing or even end-to …

The INTERSPEECH 2018 computational paralinguistics challenge: Atypical & self-assessed affect, crying & heart beats

BW Schuller, S Steidl, A Batliner… - … Annual Conference of …, 2018 - oulurepo.oulu.fi
The INTERSPEECH 2018 Computational Paralinguistics Challenge addresses four
different problems for the first time in a research competition under well-defined conditions …