Matching latent encoding for audio-text based keyword spotting
Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown
high-quality results, but the key challenge of how to semantically align two embeddings for multi …
Flexible keyword spotting based on homogeneous audio-text embedding
Spotting user-defined/flexible keywords represented in text frequently uses an expensive
text encoder for joint analysis with an audio encoder in an embedding space, which can …
PhonMatchNet: phoneme-guided zero-shot keyword spotting for user-defined keywords
YH Lee, N Cho - arXiv preprint arXiv:2308.16511, 2023 - arxiv.org
This study presents a novel zero-shot user-defined keyword spotting model that utilizes the
audio-phoneme relationship of the keyword to improve performance. Unlike the previous …
A multitask training approach to enhance Whisper with open-vocabulary keyword spotting
The recognition of rare named entities, such as personal names and terminologies, is
challenging for automatic speech recognition (ASR) systems, especially when they are not …
Open-Vocabulary Keyword-Spotting with Adaptive Instance Normalization
Open vocabulary keyword spotting is a crucial and challenging task in automatic speech
recognition (ASR) that focuses on detecting user-defined keywords within a spoken …
U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has
attracted increasingly more interest. However, existing methods based on acoustic models …
Open vocabulary keyword spotting through transfer learning from speech synthesis
Identifying keywords in an open-vocabulary context is crucial for personalizing interactions
with smart devices. Previous approaches to open vocabulary keyword spotting depend on a …
Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations
Acoustic word embeddings (AWEs) are vector representations of spoken words. An effective
method for obtaining AWEs is the Correspondence Auto-Encoder (CAE). In the past, the …
Contrastive Learning with Audio Discrimination for Customizable Keyword Spotting in Continuous Speech
Customizable keyword spotting (KWS) in continuous speech has attracted increasing
attention due to its real-world application potential. While contrastive learning (CL) has been …
Fully unsupervised training of few-shot keyword spotting
For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset
containing massive target keywords has been known to be essential to generalize to arbitrary …