TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …

A Survey on Low-Latency DNN-Based Speech Enhancement

S Drgas - Sensors, 2023 - mdpi.com
This paper presents recent advances in low-latency, single-channel, deep neural network-
based speech enhancement systems. The sources of latency and their acceptable values in …

EarSpeech: Exploring in-ear occlusion effect on earphones for data-efficient airborne speech enhancement

F Han, P Yang, Y Zuo, F Shang, F Xu… - Proceedings of the ACM on …, 2024 - dl.acm.org
Earphones have become a popular voice input and interaction device. However, airborne
speech is susceptible to ambient noise, making it necessary to improve the quality and …

Low bit rate binaural link for improved ultra low-latency low-complexity multichannel speech enhancement in Hearing Aids

NL Westhausen, BT Meyer - … of Signal Processing to Audio and …, 2023 - ieeexplore.ieee.org
Speech enhancement in hearing aids is a challenging task since the hardware limits the
number of possible operations and the latency needs to be in the range of only a few …

Neural speech enhancement with very low algorithmic latency and complexity via integrated full- and sub-band modeling

ZQ Wang, S Cornell, S Choi, Y Lee… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that
integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech …

A simple RNN model for lightweight, low-compute and low-latency multichannel speech enhancement in the time domain

A Pandey, K Tan, B Xu - INTERSPEECH, 2023 - isca-archive.org
Deep learning has led to unprecedented advances in speech enhancement. However, deep
neural networks (DNNs) typically require a large amount of computation, memory, signal …

DPSNN: spiking neural network for low-latency streaming speech enhancement

T Sun, S Bohté - Neuromorphic Computing and Engineering, 2024 - iopscience.iop.org
Speech enhancement improves communication in noisy environments, affecting areas such
as automatic speech recognition (ASR), hearing aids, and telecommunications. With these …

Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach

NL Westhausen, BT Meyer - IEEE Open Journal of Signal …, 2023 - ieeexplore.ieee.org
In this article, we introduce a causal low-latency low-complexity approach for binaural
multichannel blind speaker separation in noisy reverberant conditions. The model, referred …

Multi-channel target speaker extraction with refinement: The WAVLab submission to the second clarity enhancement challenge

S Cornell, ZQ Wang, Y Masuyama, S Watanabe… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper describes our submission to the Second Clarity Enhancement Challenge
(CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy …

Single-microphone speaker separation and voice activity detection in noisy and reverberant environments

R Opochinsky, M Moradi, S Gannot - arXiv preprint arXiv:2401.03448, 2024 - arxiv.org
Speech separation involves extracting an individual speaker's voice from a multi-speaker
audio signal. The increasing complexity of real-world environments, where multiple …