Unsupervised speech recognition
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …
labeled training data which limits this technology to a small fraction of the languages spoken …
[HTML][HTML] Unsupervised automatic speech recognition: A review
Abstract Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
remarkable performance given large amounts of manually transcribed speech, but large …
Unsupervised learning of spoken language with visual context
Humans learn to speak before they can read or write, so why can't computers do the same?
In this paper, we present a deep neural network model capable of rudimentary spoken …
In this paper, we present a deep neural network model capable of rudimentary spoken …
Query-by-example keyword spotting using long short-term memory networks
We present a novel approach to query-by-example keyword spotting (KWS) using a long
short-term memory (LSTM) recurrent neural network-based feature extractor. In our …
short-term memory (LSTM) recurrent neural network-based feature extractor. In our …
[PDF][PDF] A nonparametric Bayesian approach to acoustic model discovery
We investigate the problem of acoustic modeling in which prior language-specific
knowledge and transcribed data are unavailable. We present an unsupervised model that …
knowledge and transcribed data are unavailable. We present an unsupervised model that …
Deep convolutional acoustic word embeddings using word-pair side information
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …
recognition and query applications, instead of phonetic units. Such whole-word segmental …
Recent developments in spoken term detection: a survey
A Mandal, KR Prasanna Kumar, P Mitra - International Journal of Speech …, 2014 - Springer
Spoken term detection (STD) provides an efficient means for content based indexing of
speech. However, achieving high detection performance, faster speed, detecting ot-of …
speech. However, achieving high detection performance, faster speed, detecting ot-of …
Rio: A pervasive rfid-based touch gesture interface
In this paper, we design and develop RIO, a novel battery-free touch sensing user interface
(UI) primitive for future IoT and smart spaces. RIO enables UIs to be constructed using off-the …
(UI) primitive for future IoT and smart spaces. RIO enables UIs to be constructed using off-the …
Learning hierarchical discrete linguistic units from visually-grounded speech
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …
vector quantization layers into neural models of visually grounded speech. We show that our …
Mispronunciation detection and diagnosis in l2 english speech using multidistribution deep neural networks
K Li, X Qian, H Meng - IEEE/ACM Transactions on Audio …, 2016 - ieeexplore.ieee.org
This paper investigates the use of multidistribution deep neural networks (DNNs) for
mispronunciation detection and diagnosis (MDD), to circumvent the difficulties encountered …
mispronunciation detection and diagnosis (MDD), to circumvent the difficulties encountered …