Effects of Data Resampling on Predicting Customer Churn via a Comparative Tree-based Random Forest and XGBoost

RE Ako, FO Aghware, MD Okpor, MI Akazue… - Journal of Computing …, 2024 - unidel.edu.ng
Customer attrition has become the focus of many businesses today, since the online market
space has continued to proffer customers various choices and alternatives to goods …

Gradient remedy for multi-task learning in end-to-end noise-robust speech recognition

Y Hu, C Chen, R Li, Q Zhu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Speech enhancement (SE) is proved effective in reducing noise from noisy speech signals
for downstream automatic speech recognition (ASR), where multi-task learning strategy is …

Speech separation with pretrained frontend to minimize domain mismatch

W Wang, Z Pan, X Li, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of …

Learning video temporal dynamics with cross-modal attention for robust audio-visual speech recognition

S Kim, K Jang, S Bae, H Kim… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio
and video modalities. In practical environments with noise-corrupted audio, the role of video …

Selective HuBERT: Self-supervised pre-training for target speaker in clean and mixture speech

J Lin, M Ge, W Wang, H Li… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org
Self-supervised pre-trained speech models were shown effective for various downstream
speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo …

ACA-Net: Towards lightweight speaker verification using asymmetric cross attention

JQ Yip, T Truong, D Ng, C Zhang, Y Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding
extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric …

Dual-memory multimodal learning for continual spoken keyword spotting with confidence selection and diversity enhancement

Z Yang, D Ng, X Li, C Zhang, R Jiang, W **, Y Ma… - Proc …, 2023 - isca-archive.org
Enabling continual learning (CL) from an ever-changing environment is highly valuable, but
it poses significant challenges for spoken keyword spotting (KWS), which simultaneously …

Environment-aware knowledge distillation for improved resource-constrained edge speech recognition

A Pimentel, HR Guimarães, A Avila, TH Falk - Applied Sciences, 2023 - mdpi.com
Recent advances in self-supervised learning have allowed automatic speech recognition
(ASR) systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring …

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

HJ Chang, J Glass - arXiv preprint arXiv:2311.09117, 2023 - arxiv.org
This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific
self-supervision method for speaker and noise-invariant speech representations by learning …

Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?

D Ng, C Zhang, R Zhang, Y Ma… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Large self-supervised pre-trained speech models require computationally expensive
fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient …