Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Transfer learning based physics-informed neural networks for solving inverse problems in engineering structures under different loading scenarios

C Xu, BT Cao, Y Yuan, G Meschke - Computer Methods in Applied …, 2023 - Elsevier
Recently, a class of machine learning methods called physics-informed neural networks
(PINNs) has been proposed and gained prevalence in solving various scientific computing …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

S3PRL-VC: Open-source voice conversion framework with self-supervised speech representations

WC Huang, SW Yang, T Hayashi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based
on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech …

Phonetic analysis of self-supervised representations of English speech

D Wells, H Tang, K Richmond - 23rd Annual Conference of the …, 2022 - research.ed.ac.uk
We present an analysis of discrete units discovered via self-supervised representation
learning on English speech. We focus on units produced by a pre-trained HuBERT model …

Overlapped speech and gender detection with WavLM pre-trained features

M Lebourdais, M Tahon, A Laurent… - arXiv preprint arXiv …, 2022 - arxiv.org
This article focuses on overlapped speech and gender detection in order to study
interactions between women and men in French audiovisual media (Gender Equality …

A comparative study of self-supervised speech representation based voice conversion

WC Huang, SW Yang, T Hayashi… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
We present a large-scale comparative study of self-supervised speech representation (S3R)-
based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive …

Tandem multitask training of speaker diarisation and speech recognition for meeting transcription

X Zheng, C Zhang, PC Woodland - arXiv preprint arXiv:2207.03852, 2022 - arxiv.org
Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0
(W2V2), have become the backbone of many speech tasks. In this paper, to achieve speaker …

Unstructured Pruning and Low Rank Factorisation of Self-Supervised Pre-Trained Speech Models

H Wang, WQ Zhang - IEEE Journal of Selected Topics in Signal …, 2024 - ieeexplore.ieee.org
Self-supervised pre-trained speech models require significant memory and computational
resources, limiting their applicability to many speech tasks. Unstructured pruning is a …

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks

N Inoue, S Otake, T Hirose, M Ohi… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Self-supervised learning has emerged as a key approach for learning generic
representations from speech data. Despite promising results in downstream tasks such as …