Backdoor attacks against voice recognition systems: A survey

B Yan, J Lan, Z Yan - ACM Computing Surveys, 2024 - dl.acm.org
Voice Recognition Systems (VRSs) employ deep learning for speech recognition and
speaker recognition. They have been widely deployed in various real-world applications …

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings

L Serafini, S Cornell, G Morrone, E Zovato… - Computer Speech & …, 2023 - Elsevier
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Residual neural networks (ResNet) demonstrate impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge

F Yu, S Zhang, P Guo, Y Fu, Z Du… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech …

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

F Yu, S Zhang, P Guo, Y Liang, Z Du… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Recently, cross-channel attention, which better leverages multi-channel signals from
a microphone array, has shown promising results in the multi-party meeting scenario. Cross …

Speaker overlap-aware neural diarization for multi-party meeting analysis

Z Du, S Zhang, S Zheng, Z Yan - arXiv preprint arXiv:2211.10243, 2022 - arxiv.org
Recently, hybrid systems of clustering and neural diarization models have been successfully
applied in multi-party meeting analysis. However, current models always treat overlapped …

Multi-input multi-output target-speaker voice activity detection for unified, flexible, and robust audio-visual speaker diarization

M Cheng, M Li - arXiv preprint arXiv:2401.08052, 2024 - arxiv.org
Audio-visual learning has demonstrated promising results in many classical speech tasks
(e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that …

End-to-end Online Speaker Diarization with Target Speaker Tracking

W Wang, M Li - arXiv preprint arXiv:2310.08696, 2023 - arxiv.org
This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …

Online target speaker voice activity detection for speaker diarization

W Wang, Q Lin, M Li - arXiv preprint arXiv:2207.05920, 2022 - arxiv.org
This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR

Y Liang, M Shi, F Yu, Y Li, S Zhang, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
With the success of the first Multi-channel Multi-party Meeting Transcription challenge
(M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to …