Automatic speech recognition and speech variability: A review

M Benzeghiba, R De Mori, O Deroo, S Dupont… - Speech …, 2007 - Elsevier
Major progress is being recorded regularly on both the technology and exploitation of
automatic speech recognition (ASR) and spoken language systems. However, there are still …

Speech production knowledge in automatic speech recognition

S King, J Frankel, K Livescu, E McDermott… - The Journal of the …, 2007 - pubs.aip.org
Although much is known about how speech is produced, and research into speech
production has resulted in measured articulatory data, feature systems of different kinds, and …

Hawkes processes for events in social media

MA Rizoiu, Y Lee, S Mishra, L Xie - Frontiers of multimedia research, 2017 - dl.acm.org
This chapter provides an accessible introduction for point processes, and especially Hawkes
processes, for modeling discrete, inter-dependent events over continuous time. We start by …

Deep learning for video classification and captioning

Z Wu, T Yao, Y Fu, YG Jiang - Frontiers of multimedia research, 2017 - dl.acm.org
Today's digital contents are inherently multimedia: text, audio, image, video, and so on.
Video, in particular, has become a new way of communication between Internet users with …

Making deep belief networks effective for large vocabulary continuous speech recognition

TN Sainath, B Kingsbury… - … IEEE Workshop on …, 2011 - ieeexplore.ieee.org
To date, there has been limited work in applying Deep Belief Networks (DBNs) for acoustic
modeling in LVCSR tasks, with past work using standard speech features. However, a …

Real-world acoustic event detection

X Zhuang, X Zhou, MA Hasegawa-Johnson… - Pattern recognition …, 2010 - Elsevier
Acoustic Event Detection (AED) aims to identify both timestamps and types of events in an
audio stream. This becomes very challenging when going beyond restricted highlight events …

Short-time phase spectrum in speech processing: A review and some experimental results

LD Alsteris, KK Paliwal - Digital signal processing, 2007 - Elsevier
Incorporating information from the short-time phase spectrum into a feature set for automatic
speech recognition (ASR) may possibly serve to improve recognition accuracy. Currently …

Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU summer workshop

K Livescu, O Cetin… - … , Speech and Signal …, 2007 - ieeexplore.ieee.org
We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of
articulatory features (AFs) for observation and pronunciation models in speech recognition …

Cross-modal speaker verification and recognition: A multilingual perspective

S Nawaz, MS Saeed, P Morerio… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent years have seen a surge in finding association between faces and voices within a
cross-modal biometric application along with speaker recognition. Inspired from this, we …

A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice.

M Van Segbroeck, A Tsiartas, SS Narayanan - Interspeech, 2013 - sail.usc.edu
Reliable automatic detection of speech/non-speech activity in degraded, noisy audio signals
is a fundamental and challenging task in robust signal processing. As various speech …