End-to-end neural speaker diarization with self-attention

Y Fujita, N Kanda, S Horiguchi, Y Xue… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems; i.e., (i) it is not …

Guided source separation meets a strong ASR backend: Hitachi/Paderborn University joint investigation for dinner party ASR

N Kanda, C Boeddeker, J Heitkaemper, Y Fujita… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we present Hitachi and Paderborn University's joint effort for automatic speech
recognition (ASR) in a dinner party scenario. The main challenges of ASR systems for …

Online end-to-end neural diarization with speaker-tracing buffer

Y Xue, S Horiguchi, Y Fujita… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
This paper proposes a novel online speaker diarization algorithm based on a fully
supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a …

Investigation of end-to-end speaker-attributed ASR for continuous multi-talker recordings

N Kanda, X Chang, Y Gaur, X Wang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Recently, an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR)
model was proposed as a joint model of speaker counting, speech recognition and speaker …

BW-EDA-EEND: Streaming end-to-end neural speaker diarization for a variable number of speakers

E Han, C Lee, A Stolcke - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that
processes data incrementally for a variable number of speakers. The system is based on the …

End-to-end neural diarization: Reformulating speaker diarization as simple multi-label classification

Y Fujita, S Watanabe, S Horiguchi, Y Xue… - arXiv preprint arXiv …, 2020 - arxiv.org
The most common approach to speaker diarization is clustering of speaker embeddings.
However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized …