Speech recognition using deep neural networks: A systematic review
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …
A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
A survey on voice assistant security: Attacks and countermeasures
Voice assistants (VA) have become prevalent on a wide range of personal devices such as
smartphones and smart speakers. As companies build voice assistants with extra …
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
YouTube is a highly visited video sharing website where over one billion people watch six
billion hours of video every month. Improving accessibility to these videos for the hearing …
Voice activity detection using an adaptive context attention model
Voice activity detection (VAD) classifies incoming signal segments into speech or
background noise; its performance is crucial in various speech-related applications …
Voice activity detection in the wild: A data-driven approach using teacher-student training
Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …
Optimization of RNN-based speech activity detection
Speech activity detection (SAD) is an essential component of automatic speech recognition
systems impacting the overall system performance. This paper investigates an optimization …
An open-source voice type classifier for child-centered daylong recordings
Spontaneous conversations in real-world settings such as those found in child-centered
recordings have been shown to be amongst the most challenging audio files to process …
End-to-end automatic speech recognition integrated with CTC-based voice activity detection
This paper integrates a voice activity detection (VAD) function with end-to-end automatic
speech recognition toward an online speech interface and transcribing very long audio …
A hybrid CNN-BiLSTM voice activity detector
This paper presents a new hybrid architecture for voice activity detection (VAD)
incorporating both convolutional neural network (CNN) and bidirectional long short-term …