A comprehensive empirical review of modern voice activity detection approaches for movies and TV shows

M Sharma, S Joshi, T Chatterjee, R Hamid - Neurocomputing, 2022 - Elsevier
A robust and language agnostic Voice Activity Detection (VAD) is crucial for Digital
Entertainment Content (DEC). Primary examples of DEC include movies and TV series …

Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies

F Eyben, F Weninger, S Squartini… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
A novel, data-driven approach to voice activity detection is presented. The approach is
based on Long Short-Term Memory Recurrent Neural Networks trained on standard RASTA …

Recurrent neural networks for voice activity detection

T Hughes, K Mierle - 2013 IEEE International Conference on …, 2013 - ieeexplore.ieee.org
We present a novel recurrent neural network (RNN) model for voice activity detection. Our
multi-layer RNN model, in which nodes compute quadratic polynomials, outperforms a much …

Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection

R Zazo, TN Sainath, G Simko, C Parada - Interspeech, 2016 - isca-archive.org
Voice Activity Detection (VAD) is an important preprocessing step in any state-of-the-art
speech recognition system. Choosing the right set of features and model architecture can …

Voice activity detection: Merging source and filter-based information

T Drugman, Y Stylianou, Y Kida… - IEEE Signal Processing …, 2015 - ieeexplore.ieee.org
Voice Activity Detection (VAD) refers to the problem of distinguishing speech segments from
background noise. Numerous approaches have been proposed for this purpose. Some are …

Deep artwork detection and retrieval for automatic context-aware audio guides

L Seidenari, C Baecchi, T Uricchio… - ACM Transactions on …, 2017 - dl.acm.org
In this article, we address the problem of creating a smart audio guide that adapts to the
actions and interests of museum visitors. As an autonomous agent, our guide perceives the …

Speaker and noise independent voice activity detection

FG Germain, DL Sun, GJ Mysore - Interspeech, 2013 - isca-archive.org
Voice activity detection (VAD) in the presence of heavy, nonstationary noise is a challenging
problem that has attracted attention in recent years. Most modern VAD systems require …

Context-aware voice-based interaction in smart home - VocADom@A4H corpus collection and empirical assessment of its usefulness

F Portet, S Caffiau, F Ringeval, M Vacher… - 2019 IEEE Intl Conf …, 2019 - ieeexplore.ieee.org
Smart homes aim at enhancing the quality of life of people at home by the use of home
automation systems and Ambient Intelligence. Most of these smart homes provide enhanced …

A crowdsourcing caption editor for educational videos

R Deshpande, T Tuna, J Subhlok… - 2014 IEEE Frontiers in …, 2014 - ieeexplore.ieee.org
Video of a classroom lecture has been shown to be a versatile learning resource
comparable to a textbook. Captions in videos are highly valued by students, especially those …

Bayesian semi-supervised audio event transcription based on Markov Indian buffet process

Y Ohishi, D Mochihashi, T Matsui… - … on Acoustics, Speech …, 2013 - ieeexplore.ieee.org
We present a novel generative model for audio event transcription that recognizes “events”
on audio signals including multiple kinds of overlapping sounds. In the proposed model …