ESPnet-SLU: Advancing spoken language understanding through ESPnet

S Arora, S Dalmia, P Denisov, X Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
As Automatic Speech Recognition (ASR) systems are getting better, there is an increasing
interest in using the ASR output to do downstream Natural Language Processing (NLP) …

CWCL: Cross-modal transfer with continuously weighted contrastive loss

RS Srinivasa, J Cho, C Yang… - Advances in …, 2023 - proceedings.neurips.cc
This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-
trained model in one modality is used for representation learning in another domain using …

A study on the integration of pre-trained SSL, ASR, LM and SLU models for spoken language understanding

Y Peng, S Arora, Y Higuchi, Y Ueda… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …

Integration of pre-trained networks with continuous token interface for end-to-end spoken language understanding

S Seo, D Kwak, B Lee - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Most End-to-End (E2E) Spoken Language Understanding (SLU) networks leverage the pre-
trained Automatic Speech Recognition (ASR) networks but still lack the capability to …

UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network

S Arora, H Futami, J Jung, Y Peng, R Sharma… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …

Two-pass low latency end-to-end spoken language understanding

S Arora, S Dalmia, X Chang, B Yan, A Black… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end (E2E) models are becoming increasingly popular for spoken language
understanding (SLU) systems and are beginning to achieve competitive performance to …

Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding

S Arora, H Futami, Y Kashiwagi, E Tsunoo… - arXiv preprint arXiv …, 2023 - arxiv.org
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …

On the use of semantically-aligned speech representations for spoken language understanding

G Laperrière, V Pelloin, M Rouvier… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
In this paper we examine the use of semantically-aligned speech representations for end-to-
end spoken language understanding (SLU). We employ the recently-introduced SAMU …

End-to-end spoken language understanding with tree-constrained pointer generator

G Sun, C Zhang, PC Woodland - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
End-to-end spoken language understanding (SLU) suffers from the long-tail word problem.
This paper exploits contextual biasing, a technique to improve the speech recognition of rare …

Improving Spoken Language Understanding with Cross-Modal Contrastive Learning

J Dong, J Fu, P Zhou, H Li, X Wang - Interspeech, 2022 - researchgate.net
Spoken language understanding (SLU) is conventionally based on pipeline architecture
with error propagation issues. To mitigate this problem, end-to-end (E2E) models are …