Knowledge-aware audio-grounded generative slot filling for limited annotated data

G Sun, C Zhang, I Vulić, P Budzianowski… - Computer Speech & …, 2025 - Elsevier
Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems
is an expensive and time-consuming endeavour. This motivates research into slot-filling …

Chain-of-Thought Prompting for Speech Translation

K Hu, Z Chen, CHH Yang, P Żelasko… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable advancements in language
understanding and generation. Building on the success of text-based LLMs, recent research …

Improving contextual spelling correction by external acoustics attention and semantic aware data augmentation

X Wang, Y Liu, J Li, S Zhao - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We previously proposed contextual spelling correction (CSC) to correct the output of
end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as …

Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding

S Arora, H Futami, Y Kashiwagi, E Tsunoo… - arXiv preprint arXiv …, 2023 - arxiv.org
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …

A study on the integration of pipeline and E2E SLU systems for spoken semantic parsing toward STOP quality challenge

S Arora, H Futami, SL Wu, J Huynh… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recently there have been efforts to introduce new benchmark tasks for spoken language
understanding (SLU), like semantic parsing. In this paper, we describe our proposed spoken …

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Augmenting text for spoken language understanding with Large Language Models

R Sharma, S Kim, D Lazar, T Le, A Shrivastava… - arXiv preprint arXiv …, 2023 - arxiv.org
Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from
input speech. Training robust models for existing application domains represented in …

Tensor decomposition for minimization of E2E SLU model toward on-device processing

Y Kashiwagi, S Arora, H Futami, J Huynh… - arXiv preprint arXiv …, 2023 - arxiv.org
Spoken Language Understanding (SLU) is a critical speech recognition application and is
often deployed on edge devices. Consequently, on-device processing plays a significant …

Introducing semantics into speech encoders

D Xu, S Dong, C Wang, S Kim, Z Lin… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent studies find existing self-supervised speech encoders contain primarily acoustic
rather than semantic information. As a result, pipelined supervised automatic speech …

PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

T Le, D Lazar, S Kim, S Jiang, D Le, A Sagar… - arXiv preprint arXiv …, 2024 - arxiv.org
Spoken Language Understanding (SLU) is a critical component of voice assistants; it
consists of converting speech to semantic parses for task execution. Previous works have …