One model to rule them all? Towards end-to-end joint speaker diarization and speech recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

On word error rate definitions and their efficient computation for multi-speaker speech recognition systems

T von Neumann, C Boeddeker… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose a general framework to compute the word error rate (WER) of ASR systems that
process recordings containing multiple speakers at their input and that produce multiple …

Conformer-based target-speaker automatic speech recognition for single-channel audio

Y Zhang, KC Puvvada, V Lavrukhin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain
architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The …

A sidecar separator can convert a single-talker speech recognition system to a multi-talker one

L Meng, J Kang, M Cui, Y Wang, X Wu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Although automatic speech recognition (ASR) can perform well in common non-overlapping
environments, sustaining performance in multi-talker overlapping speech recognition …

Empowering Whisper as a joint multi-talker and target-talker speech recognition system

L Meng, J Kang, Y Wang, Z Jin, X Wu, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-talker speech recognition and target-talker speech recognition, both involve
transcription in multi-talker contexts, remain significant challenges. However, existing …