Google Scholar

A multimodal approach to device-directed speech detection with large language models

D Wagner, A Churchill, S Sigtia… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Interactions with virtual assistants typically start with a predefined trigger phrase followed by
the user command. To make interactions with the assistant more intuitive, we explore …

Cited by 9 · All 4 versions

Leveraging Contrastive Language–Image Pre-Training and Bidirectional Cross-attention for Multimodal Keyword Spotting

D Liu, Q Mao, L Gao, G Wang - Engineering Applications of Artificial …, 2024 - Elsevier

In resource-limited keyword spotting scenarios, the scarcity of annotated corpora hinders
deep learning's ability to develop robust models for representing acoustic features. Recent …

Cited by 2

[PDF] isca-archive.org

[PDF] Small footprint multi-channel network for keyword spotting with centroid based awareness

D Ng, Y Xiao, JQ Yip, Z Yang, B Tian, Q Fu… - Proc …, 2023 - isca-archive.org

Spoken Keyword Spotting (KWS) in noisy far-field environments is challenging for
small-footprint models, given the restrictions on computational resources (e.g., model size …

Cited by 10 · All 3 versions

[PDF] ieee.org

Self-supervised learning-for underwater acoustic signal classification with mixup

Q Xu, J Jiang, K Xu, Y Dou, C Gao… - IEEE Journal of …, 2023 - ieeexplore.ieee.org

Underwater acoustic signal classification is a critical task that involves identifying different
types of signals in a complex and dynamic underwater environment, which is often …

Cited by 6 · All 2 versions

[PDF] arxiv.org

Multimodal data and resource efficient device-directed speech detection with large foundation models

D Wagner, A Churchill, S Sigtia, P Georgiou… - arXiv preprint arXiv …, 2023 - arxiv.org

Interactions with virtual assistants typically start with a trigger phrase followed by a
command. In this work, we explore the possibility of making these interactions more natural …

Cited by 4 · All 3 versions

[PDF] arxiv.org

ACA-Net: Towards lightweight speaker verification using asymmetric cross attention

JQ Yip, T Truong, D Ng, C Zhang, Y Ma… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding
extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric …

Cited by 5 · All 4 versions

[PDF] isca-archive.org

[PDF] Dual-memory multimodal learning for continual spoken keyword spotting with confidence selection and diversity enhancement

Z Yang, D Ng, X Li, C Zhang, R Jiang, W Xi, Y Ma… - Proc …, 2023 - isca-archive.org

Enabling continual learning (CL) from an ever-changing environment is highly valuable, but
it poses significant challenges for spoken keyword spotting (KWS), which simultaneously …

Cited by 3 · All 3 versions

[PDF] ieee.org

Efficient time and energy optimization in NOMA-enabled mobile edge computing through partial offloading

D Liu, Y Liu, L Khoukhi, A Hafid… - Tsinghua Science …, 2024 - ieeexplore.ieee.org

Customized keyword spotting needs to adapt quickly to small user samples. Current
methods primarily solve the problem under moderate noise conditions. Recent work …

[PDF] acm.org

Machine Learning Analysis of Radio Data to Uncover Community Perceptions on the Ebola Outbreak in Uganda

J Nakatumba-Nabende, J Mukiibi, TS Bateesa… - ACM Journal on …, 2024 - dl.acm.org

Radio is vital for people, especially in rural areas, to share their concerns through interactive
talk shows. Understanding public perceptions of pandemics is crucial because they …

[PDF] arxiv.org

SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

D Wagner, A Churchill, S Sigtia, E Marchi - arXiv preprint arXiv …, 2025 - arxiv.org

In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for
virtual Assistant interactions that integrates audio and text as inputs to a Large Language …

All 2 versions

Contrastive speech mixup for low-resource keyword spotting
