Deep spoken keyword spotting: An overview
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …
Language model modification for local speech recognition systems using remote sources
A language model is modified for a local speech recognition system using remote speech
recognition sources. In one example, a speech utterance is received. The speech utterance …
Query-by-example keyword spotting using long short-term memory networks
We present a novel approach to query-by-example keyword spotting (KWS) using a long
short-term memory (LSTM) recurrent neural network-based feature extractor. In our …
Compressed time delay neural network for small-footprint keyword spotting
In this paper we investigate a time delay neural network (TDNN) for a keyword spotting task
that requires low CPU, memory and latency. The TDNN is trained with transfer learning and …
Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting
We propose a max-pooling based loss function for training Long Short-Term Memory
(LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and …
Spoken content retrieval—beyond cascading speech recognition with text retrieval
Spoken content retrieval refers to directly indexing and retrieving spoken content based on
the audio rather than text descriptions. This potentially eliminates the requirement of …
Monophone-based background modeling for two-stage on-device wake word detection
Accurate on-device wake word detection is crucial to products with far-field voice control
such as the Amazon Echo. It is quite challenging to build a wake word system with both low …
Neural-network lexical translation for cross-lingual IR from text and speech
We propose a neural network model to estimate word translation probabilities for Cross-
Lingual Information Retrieval (CLIR). The model estimates better probabilities for word …
The Kaldi OpenKWS System: Improving Low Resource Keyword Search
The IARPA BABEL program has stimulated worldwide research in keyword search
technology for low resource languages, and the NIST OpenKWS evaluations are the de …
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
Existing research suggests that automatic speech recognition (ASR) models can benefit
from additional contexts (e.g., contact lists, user-specified vocabulary). Rare words and …