The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios
S Cornell, M Wiesner, S Watanabe, D Raj… - … speech captured by a distant microphone array with an arbitrary …
One model to rule them all? Towards end-to-end joint speaker diarization and speech recognition
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …
On word error rate definitions and their efficient computation for multi-speaker speech recognition systems
We propose a general framework to compute the word error rate (WER) of ASR systems that
process recordings containing multiple speakers at their input and that produce multiple …
Conformer-based target-speaker automatic speech recognition for single-channel audio
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain
architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The …
A sidecar separator can convert a single-talker speech recognition system to a multi-talker one
Although automatic speech recognition (ASR) can perform well in common non-overlapping
environments, sustaining performance in multi-talker overlapping speech recognition …
Empowering Whisper as a joint multi-talker and target-talker speech recognition system
Multi-talker speech recognition and target-talker speech recognition, both of which involve
transcription in multi-talker contexts, remain significant challenges. However, existing …