CWCL: Cross-modal transfer with continuously weighted contrastive loss
This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-trained model in one modality is used for representation learning in another domain using …
SLUE Phase-2: A benchmark suite of diverse spoken language understanding tasks
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …
WhiSLU: End-to-end spoken language understanding with Whisper
Abstract Spoken Language Understanding (SLU) systems commonly use cascading
structures. However, these systems are prone to error propagation, information loss, high …
UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network
Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …
A comparative study on E-Branchformer vs Conformer in speech recognition, translation, and understanding tasks
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder
architecture for speech processing due to its superior performance in various tasks …
Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …
Deliberation model for on-device spoken language understanding
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language
understanding (SLU), where a streaming automatic speech recognition (ASR) model …
Retrieval augmented correction of named entity speech recognition errors
E Pusateri, A Walia, A Kashi, B Bandyopadhyay… - arXiv preprint arXiv…, 2024 - arxiv.org
In recent years, end-to-end automatic speech recognition (ASR) systems have proven
themselves remarkably accurate and performant, but these systems still have a significant …
Improving end-to-end speech processing by efficient text data utilization with latent synthesis
Training a high-performance end-to-end (E2E) speech processing model requires an
enormous amount of labeled speech data, especially in the era of data-centric artificial …
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …