A real-time automated defect detection system for ceramic pieces manufacturing process based on computer vision with deep learning

E Cumbajin, N Rodrigues, P Costa, R Miragaia… - Sensors, 2023 - mdpi.com
Defect detection is a key element of quality control in today's industries, and the process
requires the incorporation of automated methods, including image sensors, to detect any …

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

P Guo, X Chang, H Lv, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Benefiting from massive and diverse data sources, speech foundation models exhibit strong
generalization and knowledge transfer capabilities to a wide range of downstream tasks …

Weakly-supervised speech pre-training: A case study on target speech recognition

W Zhang, Y Qian - arXiv preprint arXiv:2305.16286, 2023 - arxiv.org
Self-supervised learning (SSL) based speech pre-training has attracted much attention for
its capability of extracting rich representations learned from massive unlabeled data. On the …

Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR

Y Yang, A Pandey, DL Wang - arXiv preprint arXiv:2403.06387, 2024 - arxiv.org
It has been shown that the intelligibility of noisy speech can be improved by speech
enhancement (SE) algorithms. However, monaural SE has not been established as an …

Jointist: Simultaneous improvement of multi-instrument transcription and music source separation via joint training

KW Cheuk, K Choi, Q Kong, B Li, M Won… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is
capable of transcribing, recognizing, and separating multiple musical instruments from an …

Direct enhancement of pre-trained speech embeddings for speech processing in noisy conditions

MN Ali, A Brutti, D Falavigna - Computer Speech & Language, 2023 - Elsevier
Lately, the development of deep learning algorithms has marked milestones in the field of
speech processing. In particular, the release of pre-trained feature extraction models has …

Time-domain speech enhancement for robust automatic speech recognition

Y Yang, A Pandey, DL Wang - arXiv preprint arXiv:2210.13318, 2022 - arxiv.org
It has been shown that the intelligibility of noisy speech can be improved by speech
enhancement algorithms. However, speech enhancement has not been established as an …

SURT 2.0: Advances in transducer-based multi-talker speech recognition

D Raj, D Povey, S Khudanpur - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
The Streaming Unmixing and Recognition Transducer (SURT) model was proposed recently
as an end-to-end approach for continuous, streaming, multi-talker speech recognition (ASR) …

Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription

C Cui, IA Sheikh, M Sadeghi, E Vincent - arXiv preprint arXiv:2410.21849, 2024 - arxiv.org
Distant-microphone meeting transcription is a challenging task. State-of-the-art end-to-end
speaker-attributed automatic speech recognition (SA-ASR) architectures lack a multichannel …

Speech Separation in Noisy Reverberant Acoustic Environments

W Ravenscroft - 2024 - etheses.whiterose.ac.uk
Speech separation remains a vital area of research for many modern technologies. The
ubiquitous spread of deep neural networks (DNNs) in many areas of signal processing (SP) …