A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Y Dai, H Chen, J Du, R Wang, S Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract: Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed
to be sensitive to missing video frames, performing even worse than single-modality models …

VoxBlink: A large-scale speaker verification dataset on camera

Y Lin, X Qin, G Zhao, M Cheng, N Jiang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a large-scale and high-quality audiovisual speaker verification
dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data …

VoxBlink2: A 100K+ speaker recognition corpus and the open-set speaker-identification benchmark

Y Lin, M Cheng, F Zhang, Y Gao, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which
includes approximately 10M utterances with videos from 110K+ speakers in the wild. This …

Multi-input multi-output target-speaker voice activity detection for unified, flexible, and robust audio-visual speaker diarization

M Cheng, M Li - arXiv preprint arXiv:2401.08052, 2024 - arxiv.org
Audio-visual learning has demonstrated promising results in many classical speech tasks
(e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that …

The DKU-MSXF diarization system for the VoxCeleb Speaker Recognition Challenge 2023

M Cheng, W Wang, X Qin, Y Lin, N Jiang… - National Conference on …, 2023 - Springer
This paper describes the DKU-MSXF submission to track 4 of the VoxCeleb Speaker
Recognition Challenge 2023 (VoxSRC-23). Our system pipeline contains voice activity …

Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation

M Cheng, Y Lin, M Li - arXiv preprint arXiv:2411.13849, 2024 - arxiv.org
This paper proposes a novel Sequence-to-Sequence Neural Diarization (SSND) framework
to perform online and offline speaker diarization. It is developed from the sequence-to …

Summary on the multimodal information based speech processing (MISP) 2022 challenge

H Chen, S Wu, Y Dai, Z Wang, J Du… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The Multimodal Information based Speech Processing (MISP) 2022 challenge aimed to
enhance speech processing performance in harsh acoustic environments by leveraging …

Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

H Zhao, L Zhang, Y Li, Y Wang, H Wang, W Rao… - National Conference on …, 2023 - Springer
The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual
speaker diarization systems. To improve the performance of audio-visual speaker …