WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Streaming multi-talker ASR with token-level serialized output training

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …

A deep hierarchical fusion network for fullband acoustic echo cancellation

H Zhao, N Li, R Han, L Chen, X Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Deep learning-based wideband (16 kHz) acoustic echo cancellation (AEC) approaches have
surpassed traditional methods. This work proposes a deep hierarchical fusion (DHF) …

Speech separation with large-scale self-supervised learning

Z Chen, N Kanda, J Wu, Y Wu, X Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Self-supervised learning (SSL) methods such as WavLM have shown promising speech
separation (SS) results in small-scale simulation-based experiments. In this work, we extend …

Serialized output training by learned dominance

Y Shi, L Li, S Yin, D Wang, J Han - arXiv preprint arXiv:2407.03966, 2024 - arxiv.org
Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker
speech recognition by sequentially decoding the speech of individual speakers. To address …

On Speaker Attribution with SURT

D Raj, M Wiesner, M Maciejewski… - arXiv preprint arXiv …, 2024 - arxiv.org
The Streaming Unmixing and Recognition Transducer (SURT) has recently become a
popular framework for continuous, streaming, multi-talker speech recognition (ASR). With …

PolyScriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics
intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e., a …

Multi-stage and multi-loss training for fullband non-personalized and personalized speech enhancement

L Chen, C Xu, X Zhang, X Ren, X Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Deep learning-based wideband (16 kHz) speech enhancement approaches have surpassed
traditional methods. This work further extends the existing wideband systems to enable full …

Keyword Guided Target Speech Recognition

Y Shi, L Li, D Wang, J Han - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
This letter presents a new target speech recognition problem, where the target speech is
defined by a keyword. For instance, when a person speaks “Hey Google” or “Help Me”, we …

Leveraging real conversational data for multi-channel continuous speech separation

X Wang, D Wang, N Kanda, SE Eskimez… - arXiv preprint arXiv …, 2022 - arxiv.org
Existing multi-channel continuous speech separation (CSS) models are heavily dependent
on supervised data: either simulated data, which causes data mismatch between the training …