Speech recognition using deep neural networks: A systematic review

AB Nassif, I Shahin, I Attili, M Azzeh, K Shaalan - IEEE access, 2019 - ieeexplore.ieee.org
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

A survey on voice assistant security: Attacks and countermeasures

C Yan, X Ji, K Wang, Q Jiang, Z Jin, W Xu - ACM Computing Surveys, 2022 - dl.acm.org
Voice assistants (VA) have become prevalent on a wide range of personal devices such as
smartphones and smart speakers. As companies build voice assistants with extra …

Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription

H Liao, E McDermott, A Senior - 2013 IEEE Workshop on …, 2013 - ieeexplore.ieee.org
YouTube is a highly visited video sharing website where over one billion people watch six
billion hours of video every month. Improving accessibility to these videos for the hearing …

Voice activity detection using an adaptive context attention model

J Kim, M Hahn - IEEE Signal Processing Letters, 2018 - ieeexplore.ieee.org
Voice activity detection (VAD) classifies incoming signal segments into speech or
background noise; its performance is crucial in various speech-related applications …

Voice activity detection in the wild: A data-driven approach using teacher-student training

H Dinkel, S Wang, X Xu, M Wu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …

Optimization of RNN-based speech activity detection

G Gelly, JL Gauvain - IEEE/ACM Transactions on Audio …, 2017 - ieeexplore.ieee.org
Speech activity detection (SAD) is an essential component of automatic speech recognition
systems impacting the overall system performance. This paper investigates an optimization …

An open-source voice type classifier for child-centered daylong recordings

M Lavechin, R Bousbib, H Bredin, E Dupoux… - arXiv preprint arXiv …, 2020 - arxiv.org
Spontaneous conversations in real-world settings such as those found in child-centered
recordings have been shown to be amongst the most challenging audio files to process …

End-to-end automatic speech recognition integrated with CTC-based voice activity detection

T Yoshimura, T Hayashi, K Takeda… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper integrates a voice activity detection (VAD) function with end-to-end automatic
speech recognition toward an online speech interface and transcribing very long audio …

A hybrid CNN-BiLSTM voice activity detector

N Wilkinson, T Niesler - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
This paper presents a new hybrid architecture for voice activity detection (VAD)
incorporating both convolutional neural network (CNN) and bidirectional long short-term …