Unsupervised speech recognition

A Baevski, WN Hsu, A Conneau… - Advances in Neural …, 2021 - proceedings.neurips.cc
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …

Unsupervised automatic speech recognition: A review

H Aldarmaki, A Ullah, S Ram, N Zaki - Speech Communication, 2022 - Elsevier
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …

Unsupervised learning of spoken language with visual context

D Harwath, A Torralba, J Glass - Advances in Neural …, 2016 - proceedings.neurips.cc
Humans learn to speak before they can read or write, so why can't computers do the same?
In this paper, we present a deep neural network model capable of rudimentary spoken …

Query-by-example keyword spotting using long short-term memory networks

G Chen, C Parada, TN Sainath - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
We present a novel approach to query-by-example keyword spotting (KWS) using a long
short-term memory (LSTM) recurrent neural network-based feature extractor. In our …

A nonparametric Bayesian approach to acoustic model discovery

C Lee, J Glass - Proceedings of the 50th Annual Meeting of the …, 2012 - aclanthology.org
We investigate the problem of acoustic modeling in which prior language-specific
knowledge and transcribed data are unavailable. We present an unsupervised model that …

Deep convolutional acoustic word embeddings using word-pair side information

H Kamper, W Wang, K Livescu - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …

Recent developments in spoken term detection: a survey

A Mandal, KR Prasanna Kumar, P Mitra - International Journal of Speech …, 2014 - Springer
Spoken term detection (STD) provides an efficient means for content based indexing of
speech. However, achieving high detection performance, faster speed, detecting out-of …

RIO: A pervasive RFID-based touch gesture interface

S Pradhan, E Chai, K Sundaresan, L Qiu… - Proceedings of the 23rd …, 2017 - dl.acm.org
In this paper, we design and develop RIO, a novel battery-free touch sensing user interface
(UI) primitive for future IoT and smart spaces. RIO enables UIs to be constructed using off-the …

Learning hierarchical discrete linguistic units from visually-grounded speech

D Harwath, WN Hsu, J Glass - arXiv preprint arXiv:1911.09602, 2019 - arxiv.org
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …

Mispronunciation detection and diagnosis in L2 English speech using multidistribution deep neural networks

K Li, X Qian, H Meng - IEEE/ACM Transactions on Audio …, 2016 - ieeexplore.ieee.org
This paper investigates the use of multidistribution deep neural networks (DNNs) for
mispronunciation detection and diagnosis (MDD), to circumvent the difficulties encountered …