VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking

Q Wang, H Muckenhirn, K Wilson, P Sridhar… - arXiv preprint arXiv …, 2018 - arxiv.org
In this paper, we present a novel system that separates the voice of a target speaker from
multi-speaker signals, by making use of a reference signal from the target speaker. We …

Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems.

S Basak, H Agrawal, S Jena, S Gite… - … in Engineering & …, 2023 - cdn.techscience.cn
Speech recognition systems have become a unique human-computer interaction (HCI)
family. Speech is one of the most naturally developed human abilities; speech signal …

Noise robust automatic speech recognition: review and analysis

M Dua, Akanksha, S Dua - International Journal of Speech Technology, 2023 - Springer
Automatic Speech Recognition (ASR) system is an emerging technology used in
various fields such as robotics, traffic controls, and healthcare, etc. The leading cause of …

Speech Robust Bench: a robustness benchmark for speech recognition

MA Shah, DS Noguero, MA Heikkila, B Raj… - arXiv preprint arXiv …, 2024 - arxiv.org
As Automatic Speech Recognition (ASR) models become ever more pervasive, it is
important to ensure that they make reliable predictions under corruptions present in the …

Deaf and hard-of-hearing users' preferences for hearing speakers' behavior during technology-mediated in-person and remote conversations

M Seita, S Andrew, M Huenerfauth - … of the 18th International Web for All …, 2021 - dl.acm.org
Various technologies mediate synchronous audio-visual one-on-one communication
(SAVOC) between Deaf and Hard-of-Hearing (DHH) and hearing colleagues, including …

Sortformer: Seamless integration of speaker diarization and ASR by bridging timestamps and tokens

T Park, I Medennikov, K Dhawan, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …

Context-sensitive evaluation of automatic speech recognition: considering user experience & language variation

N Markl, C Lai - Proceedings of the First Workshop on Bridging …, 2021 - aclanthology.org
Commercial Automatic Speech Recognition (ASR) systems tend to show systemic
predictive bias for marginalised speaker/user groups. We highlight the need for an …

Predicting the understandability of imperfect English captions for people who are deaf or hard of hearing

S Kafle, M Huenerfauth - ACM Transactions on Accessible Computing …, 2019 - dl.acm.org
Automatic Speech Recognition (ASR) technology has seen major advancements in its
accuracy and speed in recent years, making it a possible mechanism for supporting …

Methods for evaluation of imperfect captioning tools by deaf or hard-of-hearing users at different reading literacy levels

L Berke, S Kafle, M Huenerfauth - … of the 2018 CHI Conference on …, 2018 - dl.acm.org
As Automatic Speech Recognition (ASR) improves in accuracy, it may become useful for
transcribing spoken text in real-time for Deaf and Hard-of-Hearing (DHH) individuals. To …

Behavioral changes in speakers who are automatically captioned in meetings with deaf or hard-of-hearing peers

M Seita, K Albusays, S Kafle, M Stinson… - Proceedings of the 20th …, 2018 - dl.acm.org
Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group
meetings with hearing peers; we examine generation of captions on mobile devices by …