Backdoor attacks against voice recognition systems: A survey

B Yan, J Lan, Z Yan - ACM Computing Surveys, 2024 - dl.acm.org

Voice Recognition Systems (VRSs) employ deep learning for speech recognition and
speaker recognition. They have been widely deployed in various real-world applications …

Cited by 11

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings

L Serafini, S Cornell, G Morrone, E Zovato… - Computer Speech & …, 2023 - Elsevier

We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …

Cited by 8

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Cited by 14

Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge

F Yu, S Zhang, P Guo, Y Fu, Z Du… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech …

Cited by 30

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

F Yu, S Zhang, P Guo, Y Liang, Z Du… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Recently cross-channel attention, which better leverages multi-channel signals from
microphone array, has shown promising results in the multi-party meeting scenario. Cross …

Cited by 13

Speaker overlap-aware neural diarization for multi-party meeting analysis

Z Du, S Zhang, S Zheng, Z Yan - arXiv preprint arXiv:2211.10243, 2022 - arxiv.org

Recently, hybrid systems of clustering and neural diarization models have been successfully
applied in multi-party meeting analysis. However, current models always treat overlapped …

Cited by 15

Multi-input multi-output target-speaker voice activity detection for unified, flexible, and robust audio-visual speaker diarization

M Cheng, M Li - arXiv preprint arXiv:2401.08052, 2024 - arxiv.org

Audio-visual learning has demonstrated promising results in many classical speech tasks
(e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that …

Cited by 8

End-to-end Online Speaker Diarization with Target Speaker Tracking

W Wang, M Li - arXiv preprint arXiv:2310.08696, 2023 - arxiv.org

This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …

Cited by 6

Online target speaker voice activity detection for speaker diarization

W Wang, Q Lin, M Li - arXiv preprint arXiv:2207.05920, 2022 - arxiv.org

This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …

Cited by 12

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR

Y Liang, M Shi, F Yu, Y Li, S Zhang, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

With the success of the first Multi-channel Multi-party Meeting Transcription challenge
(M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to …

Cited by 5
