Survey of deep learning paradigms for speech processing

KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, a particular focus is given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …

Music emotion recognition using convolutional long short term memory deep neural networks

S Hizlisoy, S Yildirim, Z Tufekci - Engineering Science and Technology, an …, 2021 - Elsevier
In this paper, we propose an approach for music emotion recognition based on
convolutional long short term memory deep neural network (CLDNN) architecture. In …

Multiclass audio segmentation based on recurrent neural networks for broadcast domain data

P Gimeno, I Viñals, A Ortega, A Miguel… - EURASIP Journal on …, 2020 - Springer
This paper presents a new approach based on recurrent neural networks (RNN) to the
multiclass audio segmentation task whose goal is to classify an audio signal as speech …

A multi-resolution CRNN-based approach for semi-supervised sound event detection in DCASE 2020 challenge

D De Benito-Gorrón, D Ramos, DT Toledano - IEEE Access, 2021 - ieeexplore.ieee.org
Sound Event Detection is a task with a rising relevance over the recent years in the field of
audio signal processing, due to the creation of specific datasets such as Google AudioSet or …

Ethio-Semitic language identification using convolutional neural networks with data augmentation

AA Alemu, MD Melese, AO Salau - Multimedia Tools and Applications, 2024 - Springer
In today's digital world, natural language is used to exchange information among humans,
and it has now advanced to the point of being an evolution criteria for technology. The …

Towards cross-modal pre-training and learning tempo-spatial characteristics for audio recognition with convolutional and recurrent neural networks

S Amiriparian, M Gerczuk, S Ottl, L Stappen… - EURASIP Journal on …, 2020 - Springer
In this paper, we investigate the performance of two deep learning paradigms for the
audio-based tasks of acoustic scene, environmental sound and domestic activity classification. In …

End-to-end sequence labeling via convolutional recurrent neural network with a connectionist temporal classification layer

X Huang, L Qiao, W Yu, J Li, Y Ma - International Journal of Computational …, 2020 - Springer
Sequence labeling is a common machine-learning task which not only needs the most likely
prediction of label for a local input but also seeks the most suitable annotation for the whole …

A large TV dataset for speech and music activity detection

YN Hung, CW Wu, I Orife, A Hipple, W Wolcott… - EURASIP Journal on …, 2022 - Springer
Automatic speech and music activity detection (SMAD) is an enabling task that can help
segment, index, and pre-process audio content in radio broadcast and TV programs …

Semi-supervised machine condition monitoring by learning deep discriminative audio features

I Thoidis, M Giouvanakis, G Papanikolaou - Electronics, 2021 - mdpi.com
In this study, we aim to learn highly descriptive representations for a wide set of machinery
sounds and exploit this knowledge to perform condition monitoring of mechanical …

Music emotion recognition based on a modified brain emotional learning model

M Jandaghian, S Setayeshi, F Razzazi… - Multimedia Tools and …, 2023 - Springer
Listening to music can evoke different emotions in humans. Music emotion recognition
(MER) can predict a person's emotions before listening to a song. However, there are three …