Audio2gestures: Generating diverse gestures from speech audio with conditional variational autoencoders
Generating conversational gestures from speech audio is challenging due to the inherent
one-to-many mapping between audio and body motions. Conventional CNNs/RNNs …
pyannote.audio: neural building blocks for speaker diarization
We introduce pyannote.audio, an open-source toolkit written in Python for speaker
diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end …
A multimodal hierarchical approach to speech emotion recognition from audio and text
Speech emotion recognition (SER) plays a crucial role in improving the quality of
man–machine interfaces in various fields such as distance learning, medical science, virtual …
An emotion-based personalized music recommendation framework for emotion improvement
Z Liu, W Xu, W Zhang, Q Jiang - Information Processing & Management, 2023 - Elsevier
Music has a close relationship with people's emotions and mental states. Music
recommendation has both economic and social benefits. Unfortunately, most existing music …
Hybrid LSTM-transformer model for emotion recognition from speech audio files
Emotion is a vital component of daily human communication, and it helps people understand
each other. Emotion recognition plays a crucial role in developing human–computer …
Automated dysarthria severity classification: A study on acoustic features and deep learning techniques
Assessing the severity level of dysarthria can provide an insight into the patient's
improvement, assist pathologists to plan therapy, and aid automatic dysarthric speech …
Music ControlNet: Multiple time-varying controls for music generation
Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection
This paper presents the Coswara dataset, a dataset containing a diverse set of respiratory
sounds and rich metadata, recorded between April 2020 and February 2022 from 2635 …
Few-shot sound event detection
Locating perceptually similar sound events within a continuous recording is a common task
for various audio applications. However, current tools require users to manually listen to and …
Few-shot continual learning for audio classification
Supervised learning for audio classification typically imposes a fixed class vocabulary,
which can be limiting for real-world applications where the target class vocabulary is not …