Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Separate what you describe: Language-queried audio source separation

X Liu, H Liu, Q Kong, X Mei, J Zhao, Q Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce the task of language-queried audio source separation (LASS),
which aims to separate a target source from an audio mixture based on a natural language …

Personalized speech enhancement: New models and comprehensive evaluation

SE Eskimez, T Yoshioka, H Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Personalized speech enhancement (PSE) models utilize additional cues, such as speaker
embeddings like d-vectors, to remove background noise and interfering speech in real-time …

AdaptiveNet: Post-deployment neural architecture adaptation for diverse edge environments

H Wen, Y Li, Z Zhang, S Jiang, X Ye, Y Ouyang… - Proceedings of the 29th …, 2023 - dl.acm.org
Deep learning models are increasingly deployed to edge devices for real-time applications.
To ensure stable service quality across diverse edge environments, it is highly desirable to …

TEA-PSE: Tencent-Ethereal-Audio-Lab personalized speech enhancement system for ICASSP 2022 DNS challenge

Y Ju, W Rao, X Yan, Y Fu, S Lv, L Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This paper describes Tencent Ethereal Audio Lab–Northwestern Polytechnical University
personalized speech enhancement (TEA-PSE) system submitted to track 2 of the ICASSP …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Fast real-time personalized speech enhancement: End-to-end enhancement network (E3Net) and knowledge distillation

M Thakker, SE Eskimez, T Yoshioka… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper investigates how to improve the runtime speed of personalized speech
enhancement (PSE) networks while maintaining the model quality. Our approach includes …

Towards neural diarization for unlimited numbers of speakers using global and local attractors

S Horiguchi, S Watanabe, P García… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully
tuned conventional clustering-based methods on challenging datasets. However, the main …

Online neural diarization of unlimited numbers of speakers using global and local attractors

S Horiguchi, S Watanabe, P García… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …