Matching latent encoding for audio-text based keyword spotting

K Nishu, M Cho, D Naik - arXiv preprint arXiv:2306.05245, 2023 - arxiv.org
Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown high-
quality results, but the key challenge of how to semantically align two embeddings for multi …
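
As a rough illustration of matching in a joint audio-text embedding space: once an audio encoder and a text encoder map into a shared space, a keyword can be accepted when the two embeddings are similar enough. The sketch below assumes both encoders already exist and emit vectors of the same dimension. The toy vectors, the cosine-similarity score, the function names, and the 0.7 threshold are illustrative assumptions, not details taken from Nishu et al.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def detect_keyword(audio_emb, text_emb, threshold=0.7):
    """Hypothetical decision rule: accept when audio and text embeddings are close.

    Assumption: cosine similarity plus a fixed threshold; this is not the
    scoring used in the paper.
    """
    score = cosine_similarity(audio_emb, text_emb)
    return score >= threshold, score

# Made-up vectors standing in for encoder outputs.
audio = [0.9, 0.1, 0.3]
keyword_text = [1.0, 0.0, 0.2]
other_text = [0.0, 1.0, 0.0]
print(detect_keyword(audio, keyword_text))
print(detect_keyword(audio, other_text))
```

In practice such a threshold would be tuned on held-out data to trade false accepts against false rejects.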

Flexible keyword spotting based on homogeneous audio-text embedding

K Nishu, M Cho, P Dixon, D Naik - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Spotting user-defined/flexible keywords represented in text frequently uses an expensive
text encoder for joint analysis with an audio encoder in an embedding space, which can …

PhonMatchNet: phoneme-guided zero-shot keyword spotting for user-defined keywords

YH Lee, N Cho - arXiv preprint arXiv:2308.16511, 2023 - arxiv.org
This study presents a novel zero-shot user-defined keyword spotting model that utilizes the
audio-phoneme relationship of the keyword to improve performance. Unlike the previous …

A multitask training approach to enhance whisper with open-vocabulary keyword spotting

Y Li, M Zhang, C Su, Y Li, X Qiao, M Ren, M Ma… - Interspeech, 2024 - isca-archive.org
The recognition of rare named entities, such as personal names and terminologies, is
challenging for automatic speech recognition (ASR) systems, especially when they are not …

Open-Vocabulary Keyword-Spotting with Adaptive Instance Normalization

A Navon, A Shamsian, N Glazer… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Open vocabulary keyword spotting is a crucial and challenging task in automatic speech
recognition (ASR) that focuses on detecting user-defined keywords within a spoken …

U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias

A Zhang, P Zhou, K Huang, Y Zou… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has
attracted increasingly more interest. However, existing methods based on acoustic models …

Open vocabulary keyword spotting through transfer learning from speech synthesis

V Kesavaraj, A Vuppala - 2024 International Conference on …, 2024 - ieeexplore.ieee.org
Identifying keywords in an open-vocabulary context is crucial for personalizing interactions
with smart devices. Previous approaches to open vocabulary keyword spotting depend on a …

Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations

A Meghanani, T Hain - arXiv preprint arXiv:2403.08738, 2024 - arxiv.org
Acoustic word embeddings (AWEs) are vector representations of spoken words. An effective
method for obtaining AWEs is the Correspondence Auto-Encoder (CAE). In the past, the …
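
The correspondence idea behind the CAE is that the model receives one spoken instance of a word as input and is trained to reconstruct a different instance of the same word, which is intended to encourage features that capture word identity rather than speaker or channel details. The sketch below covers only the pair-construction step and assumes word-labelled segments are already available. The helper name `correspondence_pairs` is hypothetical, strings stand in for acoustic feature sequences, and the network itself and any frame-level alignment between paired tokens are omitted.

```python
import itertools
import random
from collections import defaultdict

def correspondence_pairs(segments, seed=0):
    """Hypothetical helper: build (input, target) pairs for correspondence training.

    segments: list of (word_label, features) tuples, one per spoken word token.
    Returns every ordered pair of *different* tokens of the *same* word.
    """
    by_word = defaultdict(list)
    for label, feats in segments:
        by_word[label].append(feats)
    pairs = []
    for tokens in by_word.values():
        # permutations(..., 2) yields both (a, b) and (b, a) but never (a, a);
        # words with a single token contribute no pairs.
        pairs.extend(itertools.permutations(tokens, 2))
    # Shuffle so training batches mix word types (fixed seed for reproducibility).
    random.Random(seed).shuffle(pairs)
    return pairs

segments = [("hello", "hello_1"), ("hello", "hello_2"), ("hello", "hello_3"),
            ("stop", "stop_1"), ("stop", "stop_2"), ("go", "go_1")]
pairs = correspondence_pairs(segments)
print(len(pairs))  # 3*2 + 2*1 + 0 = 8
```

Using both orderings of each pair doubles the training signal per word type without adding identity (input equals target) pairs, which would let the model learn a trivial copy.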

Contrastive Learning with Audio Discrimination for Customizable Keyword Spotting in Continuous Speech

Y **, B Yang, H Li, J Guo, K Yu - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Customizable keyword spotting (KWS) in continuous speech has attracted increasing
attention due to its real-world application potential. While contrastive learning (CL) has been …
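
One common formulation of contrastive learning for customizable KWS pulls matched audio and text embeddings together while pushing apart mismatched pairs in the same batch. The sketch below implements a generic InfoNCE-style loss under that framing; it is not the audio-discrimination objective proposed in the paper above. The function names, the L2 normalization, and the 0.1 temperature are assumptions for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def info_nce(audio_embs, text_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    audio_embs[i] and text_embs[i] form the positive pair; every other
    text embedding in the batch acts as a negative for audio i.
    """
    if len(audio_embs) != len(text_embs) or not audio_embs:
        raise ValueError("batches must be non-empty and aligned")
    texts = [l2_normalize(t) for t in text_embs]
    total = 0.0
    for i, a in enumerate(audio_embs):
        a = l2_normalize(a)
        # Similarity logits of audio i against every text in the batch.
        logits = [sum(x * y for x, y in zip(a, t)) / temperature for t in texts]
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching text (index i) as the target.
        total += log_denom - logits[i]
    return total / len(audio_embs)

audio = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]
shuffled_text = [aligned_text[1], aligned_text[0]]
print(info_nce(audio, aligned_text), info_nce(audio, shuffled_text))
```

A much lower loss on the aligned batch than on the shuffled one is the behaviour this kind of objective is meant to reward; the temperature controls how sharply the softmax separates the positive from the negatives.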

Fully unsupervised training of few-shot keyword spotting

D Lee, M Kim, SH Mun, MH Han… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset
containing massive target keywords has been known to be essential to generalize to arbitrary …