Y Fu, L Cheng, S Lv, Y Jv, Y Kong, Z Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by an 8-channel circular microphone array for speech processing in conference …
This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker …
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor …