Dawn of the transformer era in speech emotion recognition: closing the valence gap
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …
Transfer learning based physics-informed neural networks for solving inverse problems in engineering structures under different loading scenarios
Recently, a class of machine learning methods called physics-informed neural networks
(PINNs) has been proposed and gained prevalence in solving various scientific computing …
The singing voice conversion challenge 2023
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …
S3PRL-VC: Open-source voice conversion framework with self-supervised speech representations
This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based
on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech …
Phonetic analysis of self-supervised representations of English speech
We present an analysis of discrete units discovered via self-supervised representation
learning on English speech. We focus on units produced by a pre-trained HuBERT model …
Overlapped speech and gender detection with WavLM pre-trained features
This article focuses on overlapped speech and gender detection in order to study
interactions between women and men in French audiovisual media (Gender Equality …
A comparative study of self-supervised speech representation based voice conversion
We present a large-scale comparative study of self-supervised speech representation (S3R)-
based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive …
Tandem multitask training of speaker diarisation and speech recognition for meeting transcription
Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0
(W2V2), have become the backbone of many speech tasks. In this paper, to achieve speaker …
Unstructured Pruning and Low Rank Factorisation of Self-Supervised Pre-Trained Speech Models
H Wang, WQ Zhang - IEEE Journal of Selected Topics in Signal …, 2024 - ieeexplore.ieee.org
Self-supervised pre-trained speech models require significant memory and computational
resources, limiting their applicability to many speech tasks. Unstructured pruning is a …
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
N Inoue, S Otake, T Hirose, M Ohi… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Self-supervised learning has emerged as a key approach for learning generic
representations from speech data. Despite promising results in downstream tasks such as …