Dual-path Mamba: Short and long-term bidirectional selective structured state space models for speech separation
Transformers have been the most successful architecture for various speech modeling tasks,
including speech separation. However, the self-attention mechanism in transformers with …
ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …
MSFNet: Multi-scale fusion network for brain-controlled speaker extraction
Speaker extraction aims to selectively extract the target speaker from the multi-talker
environment under the guidance of auxiliary reference. Recent studies have shown that the …
TF-Locoformer: Transformer with local modeling by convolution for speech separation and enhancement
Time-frequency (TF) domain dual-path models achieve high-fidelity speech separation.
While some previous state-of-the-art (SoTA) models rely on RNNs, this reliance means they …
LibriheavyMix: a 20,000-hour dataset for single-channel reverberant multi-talker speech separation, ASR and speaker diarization
The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …
Towards audio codec-based speech separation
Recent improvements in neural audio codec (NAC) models have generated interest in
adopting pre-trained codecs for a variety of speech processing applications to take …
USEF-TSE: Universal speaker embedding free target speaker extraction
Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech.
Traditionally, this process has relied on extracting a speaker embedding from a reference …
Separate and reconstruct: Asymmetric encoder-decoder for speech separation
In speech separation, time-domain approaches have successfully replaced the time-
frequency domain with latent sequence feature from a learnable encoder. Conventionally …
Enhanced speech separation through a supervised approach using bidirectional long short-term memory in dual domains
The process of separating individual sound sources from mono audio is a complex yet
essential endeavor in audio signal processing and analysis. This article presents an …
Early joint learning of emotion information makes multimodal model understand you better
M Ge, M Li, D Tang, P Li, K Liu, S Deng, S Pu… - Proceedings of the 2nd …, 2024 - dl.acm.org
In this paper, we present our solutions for emotion recognition in the sub-challenges of
Multimodal Emotion Recognition Challenge (MER2024). For the tasks MER-SEMI and MER …