VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking
In this paper, we present a novel system that separates the voice of a target speaker from
multi-speaker signals, by making use of a reference signal from the target speaker. We …
Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems.
S Basak, H Agrawal, S Jena, S Gite… - … in Engineering & …, 2023 - cdn.techscience.cn
Speech recognition systems have become a unique human-computer interaction (HCI)
family. Speech is one of the most naturally developed human abilities; speech signal …
Noise robust automatic speech recognition: review and analysis
M Dua, Akanksha, S Dua - International Journal of Speech Technology, 2023 - Springer
Automatic Speech Recognition (ASR) system is an emerging technology used in
various fields such as robotics, traffic controls, and healthcare, etc. The leading cause of …
Speech robust bench: a robustness benchmark for speech recognition
As Automatic Speech Recognition (ASR) models become ever more pervasive, it is
important to ensure that they make reliable predictions under corruptions present in the …
Deaf and hard-of-hearing users' preferences for hearing speakers' behavior during technology-mediated in-person and remote conversations
Various technologies mediate synchronous audio-visual one-on-one communication
(SAVOC) between Deaf and Hard-of-Hearing (DHH) and hearing colleagues, including …
Sortformer: Seamless integration of speaker diarization and ASR by bridging timestamps and tokens
We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …
Context-sensitive evaluation of automatic speech recognition: considering user experience & language variation
Commercial Automatic Speech Recognition (ASR) systems tend to show systemic
predictive bias for marginalised speaker/user groups. We highlight the need for an …
Predicting the understandability of imperfect English captions for people who are deaf or hard of hearing
Automatic Speech Recognition (ASR) technology has seen major advancements in its
accuracy and speed in recent years, making it a possible mechanism for supporting …
Methods for evaluation of imperfect captioning tools by deaf or hard-of-hearing users at different reading literacy levels
As Automatic Speech Recognition (ASR) improves in accuracy, it may become useful for
transcribing spoken text in real-time for Deaf and Hard-of-Hearing (DHH) individuals. To …
Behavioral changes in speakers who are automatically captioned in meetings with deaf or hard-of-hearing peers
Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group
meetings with hearing peers; we examine generation of captions on mobile devices by …