Google Scholar

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 453 - All 8 versions

[PDF] arxiv.org

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 432 - All 7 versions

[PDF] arxiv.org

M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge

F Yu, S Zhang, Y Fu, L Xie, S Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …

Cited by 103 - All 3 versions

[PDF] arxiv.org

Streaming multi-talker ASR with token-level serialized output training

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …

Cited by 63 - All 4 versions

[PDF] arxiv.org

GPU-accelerated guided source separation for meeting transcription

D Raj, D Povey, S Khudanpur - arXiv preprint arXiv:2212.05271, 2022 - arxiv.org

Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …

Cited by 38 - All 11 versions

[PDF] arxiv.org

Joint speaker counting, speech recognition, and speaker identification for overlapped speech of any number of speakers

N Kanda, Y Gaur, X Wang, Z Meng, Z Chen… - arXiv preprint arXiv …, 2020 - arxiv.org

We propose an end-to-end speaker-attributed automatic speech recognition model that
unifies speaker counting, speech recognition, and speaker identification on monaural …

Cited by 91 - All 6 versions

[PDF] ieee.org

Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org

Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e. unaccompanied …

Cited by 35 - All 4 versions

[PDF] arxiv.org

One model to rule them all? Towards end-to-end joint speaker diarization and speech recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

Cited by 19 - All 4 versions

[PDF] neurips.cc

CoVoMix: Advancing zero-shot speech generation for human-like multi-talker conversations

L Zhang, Y Qian, L Zhou, S Liu… - Advances in …, 2025 - proceedings.neurips.cc

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant
strides in generating high-fidelity and diverse speech. However, dialogue generation, along …

Cited by 2 - All 3 versions

[PDF] arxiv.org

Extending Whisper with prompt tuning to target-speaker ASR

H Ma, Z Peng, M Shao, J Li, J Liu - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech
of a target speaker from multi-talker overlapped utterances. Most of the existing target …

Cited by 20 - All 3 versions


Serialized output training for end-to-end overlapped speech recognition

