Audio2gestures: Generating diverse gestures from speech audio with conditional variational autoencoders

J Li, D Kang, W Pei, X Zhe, Y Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Generating conversational gestures from speech audio is challenging due to the inherent
one-to-many mapping between audio and body motions. Conventional CNNs/RNNs …

pyannote.audio: neural building blocks for speaker diarization

H Bredin, R Yin, JM Coria, G Gelly… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We introduce pyannote.audio, an open-source toolkit written in Python for speaker
diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end …

A multimodal hierarchical approach to speech emotion recognition from audio and text

P Singh, R Srivastava, KPS Rana, V Kumar - Knowledge-Based Systems, 2021 - Elsevier
Speech emotion recognition (SER) plays a crucial role in improving the quality of man–
machine interfaces in various fields like distance learning, medical science, virtual …

An emotion-based personalized music recommendation framework for emotion improvement

Z Liu, W Xu, W Zhang, Q Jiang - Information Processing & Management, 2023 - Elsevier
Music has a close relationship with people's emotion and mental status. Music
recommendation has both economic and social benefits. Unfortunately, most existing music …

Hybrid LSTM-transformer model for emotion recognition from speech audio files

F Andayani, LB Theng, MT Tsun, C Chua - IEEE Access, 2022 - ieeexplore.ieee.org
Emotion is a vital component in daily human communication and it helps people understand
each other. Emotion recognition plays a crucial role in developing human-computer …

Automated dysarthria severity classification: A study on acoustic features and deep learning techniques

AA Joshy, R Rajan - IEEE Transactions on Neural Systems and …, 2022 - ieeexplore.ieee.org
Assessing the severity level of dysarthria can provide an insight into the patient's
improvement, assist pathologists to plan therapy, and aid automatic dysarthric speech …

Music ControlNet: Multiple time-varying controls for music generation

SL Wu, C Donahue, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

D Bhattacharya, NK Sharma, D Dutta, SR Chetupalli… - Scientific Data, 2023 - nature.com
This paper presents the Coswara dataset, a dataset containing a diverse set of respiratory
sounds and rich meta-data, recorded between April-2020 and February-2022 from 2635 …

Few-shot sound event detection

Y Wang, J Salamon, NJ Bryan… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Locating perceptually similar sound events within a continuous recording is a common task
for various audio applications. However, current tools require users to manually listen to and …

Few-shot continual learning for audio classification

Y Wang, NJ Bryan, M Cartwright… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Supervised learning for audio classification typically imposes a fixed class vocabulary,
which can be limiting for real-world applications where the target class vocabulary is not …