rVAD: An unsupervised segment-based robust voice activity detection method
This paper presents an unsupervised segment-based method for robust voice activity
detection (rVAD). The method consists of two passes of denoising followed by a voice …
detection (rVAD). The method consists of two passes of denoising followed by a voice …
Unsupervised speech activity detection using voicing measures and perceptual spectral flux
Effective speech activity detection (SAD) is a necessary first step for robust speech
applications. In this letter, we propose a robust and unsupervised SAD solution that …
applications. In this letter, we propose a robust and unsupervised SAD solution that …
Boosting contextual information for deep neural network based voice activity detection
Voice activity detection (VAD) is an important topic in audio signal processing. Contextual
information is important for improving the performance of VAD at low signal-to-noise ratios …
information is important for improving the performance of VAD at low signal-to-noise ratios …
Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions
Convolutional neural networks (CNN) are extensions to deep neural networks (DNN) which
are used as alternate acoustic models with state-of-the-art performances for speech …
are used as alternate acoustic models with state-of-the-art performances for speech …
[PDF][PDF] Develo** a Speech Activity Detection System for the DARPA RATS Program.
This paper describes the speech activity detection (SAD) system developed by the Patrol
team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) …
team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) …
Study of senone-based deep neural network approaches for spoken language recognition
This paper compares different approaches for using deep neural networks (DNNs) trained to
predict senone posteriors for the task of spoken language recognition (SLR). These …
predict senone posteriors for the task of spoken language recognition (SLR). These …
Spoken language identification system using convolutional recurrent neural network
Following recent advancements in deep learning and artificial intelligence, spoken
language identification applications are playing an increasingly significant role in our day-to …
language identification applications are playing an increasingly significant role in our day-to …
[PDF][PDF] Robust CNN-based speech recognition with Gabor filter kernels.
As has been extensively shown, acoustic features for speech recognition can be learned
from neural networks with multiple hidden layers. However, the learned transformations may …
from neural networks with multiple hidden layers. However, the learned transformations may …
Progress of machine learning based automatic phoneme recognition and its prospect
M Malakar, RB Keskar - Speech Communication, 2021 - Elsevier
A phoneme is the smallest perceptually distinct sound unit that can be distinguished among
words in a particular language. Every language has its own set of phonemes, and all …
words in a particular language. Every language has its own set of phonemes, and all …
[PDF][PDF] How to train your speaker embeddings extractor
With the recent introduction of speaker embeddings for text-independent speaker
recognition, many fundamental questions require addressing in order to fast-track the …
recognition, many fundamental questions require addressing in order to fast-track the …