The influence of dataset partitioning on dysfluency detection systems

SP Bayerl, D Wagner, E Nöth, T Bocklet… - … Conference on Text …, 2022 - Springer
This paper empirically investigates the influence of different data splits and splitting
strategies on the performance of dysfluency detection systems. For this, we perform …

Summary of the DISPLACE challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments

S Baghel, S Ramoji, S Jain, PR Chowdhuri… - Speech …, 2024 - Elsevier
In multi-lingual societies, where multiple languages are spoken in a small geographic
vicinity, informal conversations often involve a mix of languages. Existing speech technologies …

HATS: An open data set integrating human perception applied to the evaluation of automatic speech recognition metrics

T Bañeras-Roux, J Wottawa, M Rouvier… - … Conference on Text …, 2023 - Springer
Conventionally, Automatic Speech Recognition (ASR) systems are evaluated on
their ability to correctly recognize each word contained in a speech signal. In this context, the …

CRDNN-BiLSTM knowledge distillation model towards enhancing the automatic speech recognition

L Ashok Kumar, D Karthika Renuka, KS Naveena… - SN Computer …, 2024 - Springer
Numerous automatic speech recognition (ASR) models have been developed in recent
years, but they suffer from the drawback of being large models that take more time to train …

A Paradigm for Interpreting Metrics and Measuring Error Severity in Automatic Speech Recognition

T Bañeras-Roux, M Rouvier, J Wottawa… - … Conference on Text …, 2024 - Springer
The evaluation of automatic speech transcriptions relies heavily on metrics such as Word
Error Rate (WER) and Character Error Rate (CER). However, these metrics have faced …

Multi‐modal video search by examples—A video quality impact analysis

G Wu, A Haider, X Tian, E Loweimi… - IET Computer …, 2024 - Wiley Online Library
As the proliferation of video content continues and many video archives lack suitable
metadata, video retrieval, particularly through example‐based search, has …

Integrating Voice Activity Detection to Enhance Robustness of On-Device Speaker Verification

KA Hoang, K Duong, TNV Minh, T Le… - Pacific Rim International …, 2024 - Springer
Mobile devices are integral to daily life, necessitating secure authentication methods like
speaker verification for enhanced security and convenience. While deep neural networks …

Multilingual non-intrusive binaural intelligibility prediction based on phone classification

J Roßbach, KC Wagener, BT Meyer - Computer Speech & Language, 2025 - Elsevier
Speech intelligibility (SI) prediction models are a valuable tool for the development of
speech processing algorithms for hearing aids or consumer electronics. For the use in …

Dysphonia Diagnosis Using Self-supervised Speech Models in Mono and Cross-Lingual Settings

D Aziz, D Sztahó - International Conference on Text, Speech, and …, 2024 - Springer
Voice disorders like dysphonia can significantly impact a person's quality of life, so proper
diagnostic methods are crucial. Previous approaches have primarily used datasets of a …

Deep Speaker Embeddings for Speaker Verification of Children

MH Abed, D Sztahó - International Conference on Text, Speech, and …, 2024 - Springer
Currently, deep speaker embedding models are the most advanced feature extraction
methods for speaker verification. However, their effectiveness in identifying children's voices …