Google Scholar

D Wagner, A Churchill, S Sigtia… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Interactions with virtual assistants typically start with a predefined trigger phrase followed by
the user command. To make interactions with the assistant more intuitive, we explore …

Cited by 9

Leveraging Contrastive Language–Image Pre-Training and Bidirectional Cross-attention for Multimodal Keyword Spotting

D Liu, Q Mao, L Gao, G Wang - Engineering Applications of Artificial …, 2024 - Elsevier

In resource-limited keyword spotting scenarios, the scarcity of annotated corpora hinders
deep learning's ability to develop robust models for representing acoustic features. Recent …

Cited by 2

Small footprint multi-channel network for keyword spotting with centroid based awareness

D Ng, Y Xiao, JQ Yip, Z Yang, B Tian, Q Fu… - Proc …, 2023 - isca-archive.org

Spoken Keyword Spotting (KWS) in noisy far-field environments is challenging for
small-footprint models, given the restrictions on computational resources (e.g., model size …

Cited by 10

Self-supervised learning-for underwater acoustic signal classification with mixup

Q Xu, J Jiang, K Xu, Y Dou, C Gao… - IEEE Journal of …, 2023 - ieeexplore.ieee.org

Underwater acoustic signal classification is a critical task that involves identifying different
types of signals in a complex and dynamic underwater environment, which is often …

Cited by 6

Multimodal data and resource efficient device-directed speech detection with large foundation models

D Wagner, A Churchill, S Sigtia, P Georgiou… - arXiv preprint arXiv …, 2023 - arxiv.org

Interactions with virtual assistants typically start with a trigger phrase followed by a
command. In this work, we explore the possibility of making these interactions more natural …

Cited by 4

ACA-Net: Towards lightweight speaker verification using asymmetric cross attention

JQ Yip, T Truong, D Ng, C Zhang, Y Ma… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding
extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric …

Cited by 5

Dual-memory multimodal learning for continual spoken keyword spotting with confidence selection and diversity enhancement

Z Yang, D Ng, X Li, C Zhang, R Jiang, W Xi, Y Ma… - Proc …, 2023 - isca-archive.org

Enabling continual learning (CL) from an ever-changing environment is highly valuable, but
it poses significant challenges for spoken keyword spotting (KWS), which simultaneously …

Cited by 3

Efficient time and energy optimization in NOMA-enabled mobile edge computing through partial offloading

D Liu, Y Liu, L Khoukhi, A Hafid… - Tsinghua Science …, 2024 - ieeexplore.ieee.org

Customized keyword spotting needs to adapt quickly to small user samples. Current
methods primarily solve the problem under moderate noise conditions. Recent work …

Machine Learning Analysis of Radio Data to Uncover Community Perceptions on the Ebola Outbreak in Uganda

J Nakatumba-Nabende, J Mukiibi, TS Bateesa… - ACM Journal on …, 2024 - dl.acm.org

Radio is vital for people, especially in rural areas, to share their concerns through interactive
talk shows. Understanding public perceptions of pandemics is crucial because they …

SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

D Wagner, A Churchill, S Sigtia, E Marchi - arXiv preprint arXiv …, 2025 - arxiv.org

In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for
virtual Assistant interactions that integrates audio and text as inputs to a Large Language …

Contrastive speech mixup for low-resource keyword spotting

A multimodal approach to device-directed speech detection with large language models
