Google Scholar

Dual-path Mamba: Short and long-term bidirectional selective structured state space models for speech separation

X Jiang, C Han, N Mesgarani - arXiv preprint arXiv:2403.18257, 2024 - arxiv.org

Transformers have been the most successful architecture for various speech modeling tasks,
including speech separation. However, the self-attention mechanism in transformers with …

Cited by 38 · All 2 versions · HTML version

[PDF] arxiv.org

ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …

Cited by 8 · All 3 versions

[PDF] openreview.net

MSFNet: Multi-scale fusion network for brain-controlled speaker extraction

C Fan, J Zhang, H Zhang, W **ang, J Tao, X Li… - Proceedings of the …, 2024 - dl.acm.org

Speaker extraction aims to selectively extract the target speaker from the multi-talker
environment under the guidance of auxiliary reference. Recent studies have shown that the …

Cited by 5 · All 2 versions

[PDF] arxiv.org

TF-Locoformer: Transformer with local modeling by convolution for speech separation and enhancement

K Saijo, G Wichern, FG Germain, Z Pan… - … on Acoustic Signal …, 2024 - ieeexplore.ieee.org

Time-frequency (TF) domain dual-path models achieve high-fidelity speech separation.
While some previous state-of-the-art (SoTA) models rely on RNNs, this reliance means they …

Cited by 4 · All 8 versions

[PDF] arxiv.org

LibriheavyMix: a 20,000-hour dataset for single-channel reverberant multi-talker speech separation, ASR and speaker diarization

Z **, Y Yang, M Shi, W Kang, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org

The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …

Cited by 2 · All 4 versions · HTML version

[PDF] arxiv.org

Towards audio codec-based speech separation

JQ Yip, S Zhao, D Ng, ES Chng, B Ma - arXiv preprint arXiv:2406.12434, 2024 - arxiv.org

Recent improvements in neural audio codec (NAC) models have generated interest in
adopting pre-trained codecs for a variety of speech processing applications to take …

Cited by 4 · All 7 versions · HTML version

[PDF] arxiv.org

USEF-TSE: Universal speaker embedding free target speaker extraction

B Zeng, M Li - arXiv preprint arXiv:2409.02615, 2024 - arxiv.org

Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech.
Traditionally, this process has relied on extracting a speaker embedding from a reference …

Cited by 2 · All 2 versions · HTML version

[PDF] arxiv.org

Separate and reconstruct: Asymmetric encoder-decoder for speech separation

UH Shin, S Lee, T Kim, HM Park - arXiv preprint arXiv:2406.05983, 2024 - arxiv.org

In speech separation, time-domain approaches have successfully replaced the time-frequency
domain with latent sequence feature from a learnable encoder. Conventionally …

Cited by 1 · All 4 versions · HTML version

Enhanced speech separation through a supervised approach using bidirectional long short-term memory in dual domains

S Basir, MS Hosen, MN Hossain… - Computers and …, 2024 - Elsevier

The process of separating individual sound sources from mono audio is a complex yet
essential endeavor in audio signal processing and analysis. This article presents an …

All 2 versions

[PDF] arxiv.org

Early joint learning of emotion information makes multimodal model understand you better

M Ge, M Li, D Tang, P Li, K Liu, S Deng, S Pu… - Proceedings of the 2nd …, 2024 - dl.acm.org

In this paper, we present our solutions for emotion recognition in the sub-challenges of
Multimodal Emotion Recognition Challenge (MER2024). For the tasks MER-SEMI and MER …

Cited by 1 · All 3 versions


Mossformer2: Combining transformer and rnn-free recurrent network for enhanced time-domain...
