rVAD: An unsupervised segment-based robust voice activity detection method

ZH Tan, N Dehak - Computer speech & language, 2020 - Elsevier
This paper presents an unsupervised segment-based method for robust voice activity
detection (rVAD). The method consists of two passes of denoising followed by a voice …

Unsupervised speech activity detection using voicing measures and perceptual spectral flux

SO Sadjadi, JHL Hansen - IEEE signal processing letters, 2013 - ieeexplore.ieee.org
Effective speech activity detection (SAD) is a necessary first step for robust speech
applications. In this letter, we propose a robust and unsupervised SAD solution that …

Boosting contextual information for deep neural network based voice activity detection

XL Zhang, DL Wang - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
Voice activity detection (VAD) is an important topic in audio signal processing. Contextual
information is important for improving the performance of VAD at low signal-to-noise ratios …

Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions

S Thomas, S Ganapathy, G Saon… - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
Convolutional neural networks (CNN) are extensions to deep neural networks (DNN) which
are used as alternate acoustic models with state-of-the-art performances for speech …

Developing a Speech Activity Detection System for the DARPA RATS Program.

T Ng, B Zhang, L Nguyen, S Matsoukas, X Zhou… - Interspeech, 2012 - isca-archive.org
This paper describes the speech activity detection (SAD) system developed by the Patrol
team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) …

Study of senone-based deep neural network approaches for spoken language recognition

L Ferrer, Y Lei, M McLaren… - IEEE/ACM Transactions …, 2015 - ieeexplore.ieee.org
This paper compares different approaches for using deep neural networks (DNNs) trained to
predict senone posteriors for the task of spoken language recognition (SLR). These …

Spoken language identification system using convolutional recurrent neural network

AA Alashban, MA Qamhan, AH Meftah, YA Alotaibi - Applied Sciences, 2022 - mdpi.com
Following recent advancements in deep learning and artificial intelligence, spoken
language identification applications are playing an increasingly significant role in our day-to …

Robust CNN-based speech recognition with Gabor filter kernels.

SY Chang, N Morgan - Interspeech, 2014 - isca-archive.org
As has been extensively shown, acoustic features for speech recognition can be learned
from neural networks with multiple hidden layers. However, the learned transformations may …

Progress of machine learning based automatic phoneme recognition and its prospect

M Malakar, RB Keskar - Speech Communication, 2021 - Elsevier
A phoneme is the smallest perceptually distinct sound unit that can be distinguished among
words in a particular language. Every language has its own set of phonemes, and all …

How to train your speaker embeddings extractor

ML McLaren, D Castan, MK Nandwana, L Ferrer… - 2018 - repository.ubn.ru.nl
With the recent introduction of speaker embeddings for text-independent speaker
recognition, many fundamental questions require addressing in order to fast-track the …